Introduction
Proper use of program structure is the key to producing fast, reliable software. This paper discusses three key types of structures and their corresponding structure classes. It then shows how to implement objects and object classes in C with modules, how modules should be organized into levels within a program, and the different types of levels. It also discusses ways to deal with errors and improve performance in a well-structured program.
The Elements of Program Structure
Let’s start by defining precisely what is meant by program structure.
A structure in a computer program is a named program element created by binding together other program elements, either code or data, in a way that allows them to be treated as a unit for one or more purposes.
This is a fairly general definition. Less formally, if we bind together some things, give them a name, and can do something with the result, we have a structure. There are three common kinds of structures: structs, functions, and objects. A struct is a data structure; functions and objects are structures of both code and data.
The power of a structure is the result of its basic nature. Whenever program elements belong together, putting them in a structure almost always represents a simplification and a clarification. Referencing of the structure replaces referencing of the comprised elements.
The greatest benefit of structure occurs when it is reusable across many applications. Then it can be written once, correctly, and used countless times. Very complex functionality can be made easily available to all applications. This greatly increases program reliability and development speed.
Please note that for now I am not addressing the issue of classes. A class is a structure template which allows for the creation of multiple instances of a structure. This is addressed later.
Now let’s look at the common types of program structures.
Struct
A struct is a data structure which binds together data elements, each of which can be either another struct or a basic data element. Any language which allows structs always provides a means for accessing the data elements they comprise. The operations allowed on a struct vary by language. In C, structs may be initialized, assigned, passed to functions and returned by functions.
A struct should be used whenever there are data elements which always belong together. Typically this is the case when the struct represents a thing, and the comprised data elements represent attributes of that thing. (I have used the term thing rather than object here because of the technical meaning of object given below). For example a struct representing a person might have data elements for the person’s name, height, weight, birth date, and gender. In this example the use of a struct provides simplification and clarification wherever the struct can be used; for example in assignments or function parameters. Just to make this clear, compare the two function calls:
display_person(person)
display_person(name, height, weight, birthdate, gender)
The first of these is superior because of clarity and simplicity. One might argue that the second provides more information, but this would be incorrect. The first, in conjunction with the definition of person, provides all of the information of the second. It further supplies the information that all of the attributes of a person are being passed. The second function, by passing all of the elements individually, actually obscures the fact that all attributes are being passed.
As most programmers know there is a strong maintenance advantage to the first method. To understand this, just think of what must be done to the program to add a new attribute to a person.
In COBOL and Pascal, the name for a struct is a record. Early programming languages did not have structs. Examples are: early versions of Fortran and BASIC, and, on the HP3000, (all versions of) SPL.
Function
A function is a code structure which binds together code statements. Its execution can be called from various points within a program, and it typically returns control to the statement following the call. Functions may be parameterized, which means that the calling code provides data which the function may read and/or modify.
A function may have data within it, persistent between calls, which only it may access. In C, this is the function’s static variables. There are two important consequences of this:
Functions derive their importance from the fact that there are operations comprising multiple statements that are of general utility. The examples are everywhere in computing, and they are familiar, so only a few need be mentioned. There are mathematical functions such as sine and cosine, operating system facilities such as file read and write, and the functions you wrote for your latest program.
Virtually every procedural language has functions (called, perhaps, something else), including assembler. This has been true in electronic computing from very early on. I think it is fair to say that the function has been, historically, the most important type of structure.
Object
An object is a structure of both code and data which has the ability to encapsulate its data. It can be thought of as a struct and zero or more functions. A zero-function object is a struct. A one-function object is a function. Thus an object is a fairly straightforward (once you see it!) generalization of struct and function, which allows an indefinite number of functions to act on the same data.
A distinguishing aspect of the object is data encapsulation. Before objects were invented, programmers could operate on data with multiple functions, but unless care was taken the data could be shared not only by the functions but by other code as well. This put it at risk of corruption. Worse than that was the maintenance burden that occurs when operations on data were spread throughout a program.
As mentioned above, functions also provide encapsulation of data, but restricting the data to being operated on by only one function is too limiting.
Objects are not part of the C language, but are easily implemented nonetheless by modules. We will discuss this later.
Classes
We have defined program structure and have given three examples: structs, functions, and objects. Now let’s examine how structures are defined and created.
A structure definition, or class, specifies the nature of one or more structures, but does not actually create a structure.
For example, here is a struct definition in C:
struct complex {
float real;
float im;
};
This can now be used to declare any number of objects:
struct complex c1, c2, c3;
Corresponding to the three types of structure we have discussed, there are three types of class: struct class, function class, and object class.
A struct class is as shown above for C. Pascal also has struct classes. Interestingly, early versions of COBOL had structs but not struct classes. This meant that creating identical structs required repeating the struct definition in the code, with corresponding risk of error.
An object class is the familiar type of class seen in C++ and other object-oriented languages.
To my knowledge, no language has a function class. The reason, I suspect, is that if a function returns to the same state between invocations, there is no need for a class, because all functions of the class would behave the same. However if a function can change state because of static variables, there is a use for having multiple functions of the same type.
If you are a C programmer, you are probably familiar with the asctime() function, which returns a pointer to a static variable which it maintains and which contains the results of its computation. Each call to asctime() wipes out the results of the previous call. Thus the possible usefulness of multiple instances of a function.
The recent popularity of threads has highlighted this issue. Imagine one thread calling asctime() before another thread has made use of the results of its call to asctime(). The solutions to this problem that I have seen don’t make use of function classes, however. Instead they invent new functions that don’t have static variables.
The C Module
C does not directly support either objects or object classes. However C is flexible enough that they can be simulated using a pair of files which we will call a module.
A module in C is a pair of files, the header file and the body file, which together implement an object or object class.
The header file provides the module interface; that is, what the module looks like to users of it. It is included by other modules which make use of it. The body file provides the module implementation; that is, the actual code and data to implement the interface. Thus, for example, the header file contains function headers for functions callable by the module users. The body file contains the functions themselves. The header file contains externs for global data provided by the module. The body file actually defines the data.
The Header File
The header file is a ".h" file, to be included by users of the module. It may contain any of the following things. The order listed is an appropriate order for them to appear in the file.
Occasionally #defines precede #includes to affect what is included. Otherwise it is best to keep the defines after the #include’s to make it clear that this is not occurring.
#defines are often used in typedefs and externs, and therefore precede them.
The header file allocates no data. Non-static data allocations would result in linking errors when the header file is included by more than one module. Static data allocations are certainly possible but not useful, since the allocations would only be available to the user of the module being implemented, not to the module itself.
The Need to Know
The key guideline for the header file is that it contains only what the module user needs to know, nothing more. If it contains information relevant to the implementation of the module, but not the use of it, a mistake has been made.
Naming
It is very useful to have a naming convention that identifies the items defined in the header file as belonging to the module. I have used a three letter prefix followed by an underbar for all the names. This convention is used in the following example.
Example: a keyed access module
As an example, code listing 1 presents the header file for a simple keyed access module:
#ifndef __key
#define __key
#include <types.h>
typedef
Code Listing 1
Header file for a Keyed Access Module
Comments on the Example
if (! key_func(…)) {
Some programmers prefer to return 0 for success, and non-zero for failure. I find it more logical to equate function success with boolean truth.
The Body File
The body file, or ".c" file, contains the functions and global variables promised by the header for general use. It may contain any of the following:
Occasionally #defines precede #includes to affect what is included. Otherwise it is best to keep the defines after the #include’s to make it clear that this is not occurring.
#defines are often used in typedefs and externs, and therefore precede them.
Objects and Object classes
As mentioned above, a module can represent either an object or an class, depending on whether it supports a single instance or multiple instances. Usually the slight amount of extra programming required to implement an object class is worthwhile. There is almost always a use for more than one object of a class.
The key_ module above implements an object classs. The module uses malloc()’s on each key_open() or key_create() to create the data representing the new object.
If a module just represents an object rather than an object class, the data representing the object (its struct) can just be static variables within the module body. There is no need to malloc() the data unless it is desired to preserve space prior to object creation.
Module Self-Test
The key_ module contains a function key_test(). As stated earlier, this is a module self-test function. Just as one can press a self-test button on a piece of modern electronic equipment, so one can call key_test() to test the key_ module. When called it does something like the following: create a keyed file, write data to it, check to be sure the data is there, close the file, open the file, check again to be sure the data is there, etc. It might even generate random data with random keys and run for a long time, as a thorough test.
The module self-test is an example of unit testing. This is a good practice; modules should be tested individually. So keep that code as part of the module itself. Otherwise it could get lost!
I have used this technique on my last three projects, and I am very happy with it. When I change a module, I rerun its self-test. It’s amazing how many "simple" changes break the test.
There are a number of ways to call the self-test function. A main() which calls it could be added to the module, but only compiled in when testing is to be done. A separate test program could call specific self-test functions on request, or the application under development could have a similar mechanism.
Levels And Level Types
We now know about objects and object classes, and how to implement them in C with modules. We now examine the different types of modules that an application can contain.
In a well-written application the modules are organized into levels. A module is at a higher level than a module it calls, and at a lower level than a module which calls it. If modules don’t use one another, they may be at the same level, though not necessarily.
It is possible of course for two modules to use one another, or for there to be loops of the sort: module A uses module B which uses module C which uses module A. There may on occasion be a reason to do this, but the necessity of it should be well understood. It is dangerous.
Diagram 1 below shows some modules organized into levels. A box represents a module. When there is an arrow between modules, the module at the tail of the arrow uses the one at the head. The arrows all point down.
Diagram 1
Modules organized into levels
Simplicity from Complexity
A good programmer manages complexity so well that it becomes simplicity. Levels are a programmer’s main tool in achieving this. Each level, with the levels below it, establishes a programming environment for the level above. Each higher level should provide an environment closer to the solution to the overall task of the program. Every level should be simply expressed in terms of the levels below.
This is the real key to writing reliable software:
Every level is simply expressed in terms of the levels below it.
It may sound easy. It isn’t. Not many programmers do it.
Level Types
The levels of an application are of four types, as follows:
Application
The Application Levels are the highest. They are the application. They define the overall behavior that the user sees. The lower levels provide an environment appropriate for writing the application, but don’t define the nature of the application.
Application Library
Just below the Application levels are the Application Library levels. Modules in these levels have general utility within the application, but probably not in other applications. They may display awareness of the application in which they operate. They may communicate with the user, and they may terminate.
General Library
Just below the Application Library levels are the General Library levels. Modules in these levels have general usefulness, and are written to be usable in other applications. Possibly they are more powerful than they need to be for this application alone. These modules may not terminate nor communicate with the user. The key_ module used as an example above could be a module of this type.
Shim
The lowest levels are the Shim levels. Modules in these levels contain all calls to functions outside of the application. With luck, these are the only modules that would have to be rewritten to move the application to a different DBMS or operating system. These modules provide only the functionality of the outside functions that is needed by the program. These modules may not terminate, nor communicate with the user.
The Module Chart
I have found it useful while programming an application to maintain a module chart organized into levels as described above. As an example, diagram 2 shows a module chart for a hypothetical text editor application.
Diagram 2
Module Chart for a Hypothetical Text Editor
Error Handling
Proper error handling is usually difficult. The normal program flow must be interrupted in favor of error-handling code. This can vary depending on the type of error. If the program needs to terminate, modules need to be called to do cleanup.
The cleanup and terminate code should be at a high level, higher than all the modules that need to be called. At startup, use the setjmp() function to set a location for longjmp() to reach at any time. Here’s a sample:
A low-level module has a function goto-cleanup() that can be called to do transfer control to cleanup(). Its header contains:
Its body contains:
When it’s time to terminate, goto_cleanup() is called, which then transfers control to the spot in main() which calls cleanup().
Error Reporting
Error reporting is another task that is usually difficult to do well. The goal when an error occurs is to present helpful messages to the user and also possibly to the programmer. Programmer messages are usually more of a problem than user messages.
Suppose, for instance, that in the text editor example the OS returns a file system error to the OS File module, which returns an error to the Keyed File module, which returns an error to the Work File module.
The Work File module is in the Application Library level. It is permitted to communicate with the user. It could for instance, issue the error message
Error writing to work file, cannot proceed
to the user, and then call a terminate function (see the section Error Handling, above). We would like to make detailed information about the error available to the programmer. This would include information on the system call failure, and the consequent failures in the OS File, Keyed File and Work File modules. The information should include function names and failure numbers.
The way that I have solved this is to provide each module in the Shim and General Library layers with an error message function that puts an error message for the most recent error into a buffer provided by the caller. The function key_status_msg() performs this task for the key_ module. If a module’s failure is due to the failure of a lower-level module, that module’s error message function would be called, and its text would be included in the text generated by the higher level module. In this way information is collected for all involved modules.
In the example we are considering, the Work File module would call the Keyed File module for an error message, which would call the OS File module for its message, which would include the system call error number in its message. The Work File module makes the error message available to the programmer.
I have found this technique to be quite general and quite useful.
Performance
The most important aspect of a program is that it is correct, that it does what it is supposed to do. We have seen how structuring contributes to reliability by helping a programmer make the complex simple. Another very important aspect is that it perform well. Structuring helps in achieving good performance, also.
The reason has to do with the nature of performance problems and the way in which they are identified and repaired.
A program’s performance is improved by reducing its demands for memory, CPU, and/or disk I/O. This is done by examining what tasks of the program consume significant amounts of these resources, and modifying these program parts to consume less.
The way that structuring helps is this: a well-structured program does each task in only one place. For example, keyed file access is done in the Keyed File module and nowhere else. Sines are calculated in the sine() function, and nowhere else. There are two beneficial performance consequences of this:
Program speedups often require tradeoffs among resources. A common example is that a buffering scheme can significantly reduce time spent waiting for disk I/O, at the cost of some additional code (increasing memory consumption) and CPU usage.
Let’s consider an example of a speedup in a structured program.
Performance Example: Buffering
Your program processes customer orders and it’s taking too long to do it. You suspect that access to the database is the cause. The DBMS is one of those modern super-relational-hyper-object models and it is also super-hyper slow. You have about twenty database read/write calls in the database access module, so you put in some code for each to count the number of times it is called and the total elapsed time.
You find out that a lot of time is spent getting data from the state information table. For each order processed your program reads information about the state from which the order was placed. You modify your database access module to buffer the state information. At the cost of a little extra code and a 60-record (don’t forget Puerto Rico!) array, you increase the program speed by 20 per cent. You don’t even hash the array. It takes you less than a day. Your program needs some additional speedups, but this is a good start.
If your program didn’t have a database access module, obtaining the performance data would be a lot more work because database calls would be spread throughout the program. Then fixing it would be a lot more work for the same reason. In fact, to do the fix you would probably replace the calls which read the state information table with calls to your own function which reads the table and does the buffering. You would be adding some of the structure that should have been there in the first place!
Or perhaps without the database access module, no performance measurements would be done because of the work involved. There might instead be discussions about replacing the DBMS with something faster.
Summary and Conclusions
Program structure is a useful binding of program elements. Three important types of structure are: structs, functions, and objects. Structs and functions are actually just special cases of objects. A structure class is a mechanism for producing an unlimited number of structures of the same type. In C, modules implement objects and object classes. In a well-structured program, modules are organized into levels of four types: Application, Application Library, General Library, and Shim. There are useful techniques to help a program handle errors cleanly.
Proper structuring is the key to writing reliable software and to analyzing and solving performance problems.