TakeFive Software, Inc.
20813 Stevens Creek Blvd., Ste 200
Cupertino, CA 95014
Phone: 408-777-1440
Fax: 408-777-1444
Email: mklaus@takefive.com
What is it all about?
Testing is critical to determine if an application is ready for the Year 2000. However, reading and understanding the entire source base, which might comprise several million lines of code, is rarely practical. How do you find out which parts of your data represent the year with only two digits instead of the full four?
The purpose of this paper is to go beyond testing and investigate what the Year 2000-critical parts of an application are. It will provide practical advice for developers to identify and fix Year 2000-problematic code. It will also explain the criteria by which you should select tools that can help you tackle the magnitude of the problem without getting lost in the complexity of the application.
First, we will decide what class of software is of concern. Then we will define what the common date formats are. After exploring which segments make an application non-Year 2000-compliant, we will show how to identify Year 2000-critical areas and how to choose tools to enhance the process of fixing the Year 2000 problem.
What kinds of software are affected?
Operating Systems
Application software usually incorporates calls to the underlying operating system, for example, to query time stamps of files. Hence, the accuracy of the data provided by the operating system determines whether or not the application software calculates time and date correctly. Ask the vendor about the Year 2000-readiness of the version of the operating system on which your applications are running.
Third Party Software
Any vendor's software that interfaces with your applications and communicates time and date information to them can undermine the Year 2000-readiness of your application. Check where the time and date are being read and written. For example, if your system has an interface to an underlying version control tool, you might check whether your system receives correct information about modification times of repository files. If necessary, contact third-party vendors about their latest software versions.
Your Software
This needs to be the center of your concern. You are responsible (and possibly liable) for the correct functioning of your products. Regardless of your industry, you must ensure that your system does not crash and that date-based calculations do not fail after the change to the Year 2000. Most importantly, however, your application must not lose, alter, or misinterpret data due to false assumptions about genuine date values.
Date Formats
Different applications may require different date formats. For operating systems, it may be necessary to calculate the current date from the number of milliseconds that have elapsed since a fixed origin in order to create unique time stamps. A savings account application, on the other hand, might compute interest from the number of days per year a particular amount of money has been saved.
Dates in the standard format CCYYMMDD are called Gregorian Dates where CC is the century, YY is the year, MM is the month, and DD is the day. If the century is omitted, 19YY is assumed.
Julian dates have the format CCYYDDD. The three digits of DDD indicate the day of the year.
Serial dates are most common in operating systems and represent the amount of time that has elapsed since origin (January 1, 1970 for most current computer systems).
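The three formats above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the serial date is counted in whole days since January 1, 1970, which is an assumption made for brevity (real systems typically count seconds or milliseconds).

```python
from datetime import date, timedelta

# Origin assumed for serial dates, as named in the text.
EPOCH = date(1970, 1, 1)

def parse_gregorian(text):
    """Parse CCYYMMDD, or YYMMDD with the century assumed to be 19."""
    if len(text) == 6:
        text = "19" + text  # century omitted: 19YY is assumed
    return date(int(text[0:4]), int(text[4:6]), int(text[6:8]))

def to_julian(d):
    """Format a date as CCYYDDD, where DDD is the day of the year."""
    return f"{d.year:04d}{d.timetuple().tm_yday:03d}"

def parse_julian(text):
    """Parse CCYYDDD back into a calendar date."""
    first_day = date(int(text[0:4]), 1, 1)
    return first_day + timedelta(days=int(text[4:7]) - 1)

def to_serial_days(d):
    """Serial date: whole days elapsed since the assumed origin."""
    return (d - EPOCH).days

print(parse_gregorian("000101"))        # the 19YY assumption at work
print(to_julian(date(1999, 12, 31)))
print(to_serial_days(date(2000, 1, 1)))
```

Note how the six-digit input "000101" is read as a date in 1900: this is exactly the default assumption that makes two-digit years hazardous.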
What are Year 2000-critical components?
Year 2000-critical components are not big modules or libraries. They are more like small tokens (variables and data types) which can create great hazards when dealing with sensitive data. Basically, these critical components can be divided into four groups:
Literals
Literals are any hard-coded dates or short forms of dates. For example, a hard-coded "19" used in input masks, or "01/01/00" used for variable initializations.
Containers
Containers are any symbol (constant, variable) of a given symbol type (character array, class) storing any representation of time and date. For example, a variable currentDate may contain a string representation of December 12, 1999 ("12/12/99"). Candidates for containers usually match names like dt, Date, yr, Year, etc. Containers can also be nested to build more complex data structures or database records.
Modifiers
Modifiers are any operations or functions that read or write time and date containers. For example, a modifier NumberOfDays(StartDate, EndDate) computes the number of days between two given dates. To track modifiers, watch out for symbol names containing strings such as update, validate, and date.
References
References are any association of containers, modifiers, or combinations of both. For example, a class Person might have a reference to a data member birthDate that stores the birth date of a person (Container - Container reference). The data member birthDate may be read by a member function Age() in order to calculate the current age of a person (Container - Modifier reference). The member function Person::Age() could be called by a sophisticated function OptimumWeight() which might take the age into account to compute the optimum weight of a person (Modifier - Modifier reference). References to literals are encapsulated by containers and modifiers.
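The chain of references in this example can be made concrete with a small Python sketch. Everything in it is hypothetical: the "MM/DD/YY" storage format is borrowed from the container example above, age() ignores month and day for brevity, and the weight formula is a meaningless placeholder whose only purpose is to depend on the age.

```python
class Person:
    def __init__(self, birth_date):
        # Container: birth date stored as "MM/DD/YY" (two-digit year).
        self.birth_date = birth_date

    def age(self, current_date):
        """Modifier that reads the container (Container - Modifier)."""
        birth_yy = int(self.birth_date[6:8])
        current_yy = int(current_date[6:8])
        return current_yy - birth_yy  # month and day ignored for brevity

def optimum_weight(person, current_date, height_cm):
    """Modifier calling another modifier (Modifier - Modifier).

    Placeholder formula, invented for illustration only.
    """
    return height_cm - 100 + person.age(current_date) // 10

p = Person("06/15/60")
print(p.age("12/12/99"), optimum_weight(p, "12/12/99", 180))
print(p.age("01/01/00"), optimum_weight(p, "01/01/00", 180))
```

After the rollover the age becomes negative, and the error silently propagates into optimum_weight(), a function whose name gives no hint that it depends on a date.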
How can Year 2000-critical components be identified and tracked?
This section deals with the more practical aspects of how tools can help locate Year 2000-containers and modifiers, and how to track references across the entire source code. The simplest tool you could use for this purpose might be the grep command on UNIX, or Find on Windows. More advanced tools come with a comfortable user interface, generate detailed reports, or have hyper-links from the search results back to the source code, etc. Most important, however, is the ability to provide flexible filters for complex queries. Avoid any tool that provides only a fixed number of predefined queries.
Tracking containers, modifiers and literals
In C/C++, Year 2000-critical containers can be variables, data members, structures, constants, enumerations, macros, and type definitions. Good tools help locate symbols by their name. For example, if a structure's name contains the string "date", it is very likely that a member of that structure stores a year in one way or another. Other potential containers of Year 2000-critical data might be named year, yr, yy, yyyy, dt, time, month, day, etc., or have names containing these strings. Symbol browsers of this type are extremely useful. The user can filter and/or find objects by using regular expressions. A good tool should be able to search the entire project, not just part of it.
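A name-based search of this kind can be approximated with two regular expressions. The sketch below is not a symbol browser: it treats the source as plain text, so it also sees identifiers in comments and strings, and short patterns such as "dt" or "day" will produce false positives. The function name and the sample declarations are made up for illustration.

```python
import re

# Substrings suggested in the text as hints for date containers.
SUSPICIOUS = re.compile(r"date|year|yr|yy|dt|time|month|day", re.IGNORECASE)
# Rough approximation of a C/C++ identifier.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def find_candidate_symbols(source):
    """Return identifiers whose names hint at date content, in first-seen order."""
    candidates = []
    for name in IDENTIFIER.findall(source):
        if SUSPICIOUS.search(name) and name not in candidates:
            candidates.append(name)
    return candidates

SAMPLE_C = """struct Invoice { char dueDate[9]; int yr; double amount; };
int NumberOfDays(const char *StartDate, const char *EndDate);
"""

print(find_candidate_symbols(SAMPLE_C))
```

The result is a list of candidates to review by hand, not a verdict; a real tool would work on parsed symbols and search every file in the project.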
Modifiers are harder to identify than containers, and almost impossible to locate as far as operators are concerned. Above, we provided the example of OptimumWeight(), a quite innocent-looking function. Nobody would guess from the name that OptimumWeight() is a potential Year 2000-violator. You can recognize the potential violation only if you know that this function computes the age of a person from the birth date and the current date.
Functions may contain any number of operators, both intrinsic and user-defined, which actually calculate with the given data. Depending on the internal representation of a date, operators can also be problematic. It is hard to predict whether the result of an operation is still valid after the size of the operands has been altered, for example from two digits to four.
Containers and modifiers are the "handles" through which Year 2000-critical data is represented, stored, or accessed, and they can be retrieved by name. But how do we find literals and hard-coded dates in input masks or initialization values of variables? The best search results can be achieved with a textual search based on regular expressions. For example, a date of the format "MM/DD/YY" can be translated into the regular expression "[0-1][0-9]/[0-3][0-9]/[0-9][0-9]". In a second step, filters could eliminate dates in comments, or help locate all occurrences of dates in assignments and comparisons.
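The regular expression given above can be tried out directly. The following sketch uses it unchanged and adds a deliberately naive second-step filter that drops everything after a "//" comment marker; block comments and "//" inside string literals are not handled. The function name and the sample lines are assumptions for illustration.

```python
import re

# Pattern taken from the text: MM/DD/YY with a two-digit year.
DATE_LITERAL = re.compile(r"[0-1][0-9]/[0-3][0-9]/[0-9][0-9]")

def find_date_literals(source, skip_comments=True):
    """Return (line number, literal) pairs for every MM/DD/YY match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if skip_comments:
            # Naive second-step filter: ignore everything after "//".
            line = line.split("//", 1)[0]
        for match in DATE_LITERAL.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits

SAMPLE = '''char init[] = "01/01/00";
// last changed 03/15/98
if (strcmp(d, "12/31/99") == 0) { expired = 1; }
'''

print(find_date_literals(SAMPLE))
print(find_date_literals(SAMPLE, skip_comments=False))
```

One caveat: the pattern also matches the first eight characters of a four-digit date such as "12/31/1999", so already compliant literals show up as hits and have to be filtered out or reviewed by hand.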
Tracking Year 2000-critical references
Literals and the majority of Year 2000-critical containers and modifiers can be identified with relatively simple tools that perform a textual search for suspicious declarations in the source code. However, just by knowing these "hot spots", we cannot determine what the impact and side-effects of changes on other parts of the application are.
The example of OptimumWeight() shows that just by looking at a symbol's name, potential Year 2000-violators cannot clearly be identified. Hence, it is necessary to take a different approach, i.e. starting with containers, and tracing references to other containers and modifiers. Queries run at a higher depth help to cross borders of abstraction levels and to get better results.
The ideal Year 2000-tool should offer cross referencing capabilities in the following areas:
The results of a cross reference query should be clear and show what kinds of references (forward, backward) and access types (read/write, call/called-by) are displayed. Hyper-links from the references to the actual statement would facilitate browsing and navigating in the source code. An interface to the version control tool used for the project could help select files for creating branch versions and configurations. During the early stage of Year 2000-assessment, a cross referencer helps figure out the impact of necessary changes on the entire code base.
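The idea of starting at a container and running queries at increasing depth can be sketched as a breadth-first walk over a table of backward references. The table below is written by hand and is purely hypothetical (PrintHealthReport in particular is an invented caller); a real cross referencer would extract it from the source code.

```python
from collections import deque

# Hypothetical backward references: symbol -> symbols that read or call it.
USED_BY = {
    "Person::birthDate": ["Person::Age"],
    "Person::Age": ["OptimumWeight"],
    "OptimumWeight": ["PrintHealthReport"],
}

def impacted_symbols(start, max_depth, used_by=USED_BY):
    """Follow backward references from `start`, at most max_depth levels.

    Returns a dict mapping each symbol reached to its distance from start.
    """
    found = {}
    queue = deque([(start, 0)])
    while queue:
        symbol, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look past the requested depth
        for user in used_by.get(symbol, []):
            if user != start and user not in found:
                found[user] = depth + 1
                queue.append((user, depth + 1))
    return found

print(impacted_symbols("Person::birthDate", 1))
print(impacted_symbols("Person::birthDate", 3))
```

At depth 1 only Person::Age is reported; at depth 2 the innocent-looking OptimumWeight appears, which a search by name alone would not have found.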
How to fix Year 2000-critical code
Once the critical areas in a software system have been identified, several techniques can be applied to correct the code. Depending on the application as well as how much time and resources are available, you might select one or a combination of these. In the following, we describe the fundamental ideas behind each technique as well as major advantages and drawbacks:
Expansion
Expanding a two digit representation for the year to a full four digits is the only unambiguous and permanent solution. However, this technique requires a considerable amount of effort, because the data must be converted and the program must be changed.
Windowing
This technique determines the century based on the value of the two-digit representation of the year. For example, the interval might be 100 years from 1/1/1950 to 12/31/2049. If the user enters a value YY of 50 or higher, it would mean 19YY; a value less than 50 would mean 20YY. If the interval does not advance with the system date, it is called a "fixed window" technique, as opposed to a "sliding window", which advances with the system date. With a sliding window, for example, when the current year advances by one, the upper bound of the interval moves from 2049 to 2050 and the lower bound from 1950 to 1951. Other flavors of this technique split the interval at a different ratio, e.g. 50/50, 60/40, or 70/30. A popular implementation of fixed windowing can be found in Microsoft Excel.
Windowing ensures that changes are necessary only to programs; reports and screen layouts need not be affected. Major drawbacks are that this technique does not work for historical dates or for dates spanning more than 100 years, and that applications become more difficult to maintain. If data is passed on to other systems, these systems must use the same date window. In essence, doing less work now might buy enough lead time for a thorough and final solution.
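Both variants of windowing fit in a few lines. The sketch below assumes the 1950 to 2049 interval from the example above for the fixed window, and for the sliding window a 100-year interval that begins a configurable number of years before the current year. The function and parameter names are hypothetical.

```python
def fixed_window(yy, pivot=50):
    """Fixed window: pivot..99 means 19YY, 0..pivot-1 means 20YY."""
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year expected")
    return 1900 + yy if yy >= pivot else 2000 + yy

def sliding_window(yy, current_year, years_back=50):
    """Sliding window: 100 years starting years_back before current_year."""
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year expected")
    lower = current_year - years_back
    # Try the century of the lower bound first, then move up if needed.
    candidate = lower - lower % 100 + yy
    if candidate < lower:
        candidate += 100
    return candidate

print(fixed_window(50), fixed_window(49))
print(sliding_window(49, current_year=1999))
print(sliding_window(49, current_year=2000))
```

The last two calls show the window advancing: the same input 49 is read as 1949 while the current year is 1999, and as 2049 one year later. This is also why systems that exchange data must agree on the window.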
Encoding
The encoding technique allows you to compress a 4-digit year into an existing 2-byte field. The year is converted into unsigned packed decimal or hexadecimal. For example, "07CE" is the hexadecimal equivalent of 1998 and can be stored in two bytes.
The encoding technique is a permanent solution, because date sorts and database keys work correctly. Data does not require more space, which matters where space is precious. However, data conversion and bridge programs for legacy files must be written, and programs must be changed.
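Both encodings can be illustrated with Python's byte handling. This sketch assumes a big-endian byte order for the binary form and one decimal digit per half-byte for the packed form; the function names are hypothetical, and a real conversion would follow the field layout of the actual files.

```python
def encode_year_binary(year):
    """Store the year as a 16-bit unsigned value: 1998 -> bytes 07 CE."""
    return year.to_bytes(2, "big")

def decode_year_binary(field):
    return int.from_bytes(field, "big")

def encode_year_packed(year):
    """Unsigned packed decimal, one digit per half-byte: 1998 -> bytes 19 98."""
    if not 0 <= year <= 9999:
        raise ValueError("four-digit year expected")
    return bytes.fromhex(f"{year:04d}")

def decode_year_packed(field):
    return int(field.hex())

print(encode_year_binary(1998).hex().upper())
print(encode_year_packed(1998).hex())
print(decode_year_binary(encode_year_binary(2000)))
```

With these layouts a plain byte-wise comparison orders the encoded fields the same way as the years themselves, which is what keeps sorts and keys working.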
Bridge Programs
Bridge programs translate data between applications or databases with different date formats. If data is converted between databases, the bridge is called a filter. If data is converted between a program and a database, the bridge is called a wrapper. Filters are used to convert legacy data into data with expanded or encoded dates. Wrappers and library routines that implement the Year 2000-conversion can also be used to facilitate testing while some modifiers are converted and others are not yet.
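A filter and a wrapper might look as follows. The record layout is invented for illustration: a fixed-width text record with a YYMMDD field at a known offset. The century is assumed to be 19 for all legacy data, in line with the default stated under Date Formats; a real bridge could apply windowing instead.

```python
def expand_date_field(record, start, century="19"):
    """Turn the YYMMDD field at offset `start` into CCYYMMDD."""
    yymmdd = record[start:start + 6]
    if len(yymmdd) != 6 or not yymmdd.isdigit():
        raise ValueError(f"no YYMMDD field at offset {start}: {record!r}")
    return record[:start] + century + record[start:]

def filter_records(legacy_records, start):
    """Filter: convert a stream of legacy records to the expanded format."""
    for record in legacy_records:
        yield expand_date_field(record, start)

class ExpandedDateWrapper:
    """Wrapper: lets a converted program read from an unconverted source."""

    def __init__(self, read_legacy_record, start):
        self._read = read_legacy_record  # callable returning one record
        self._start = start

    def read(self):
        return expand_date_field(self._read(), self._start)

legacy = ["SMITH     600615", "JONES     991231"]
print(list(filter_records(legacy, start=10)))
```

The filter is run once to migrate stored data, whereas the wrapper sits between a converted program and a source that still delivers the old format.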
Upgrading system software
Please note that the techniques described above do not help with dates in serial format on 32-bit operating systems. If the lifetime of such a system started on 1/1/1970 and the elapsed seconds are counted in a signed 32-bit value, an overflow error will occur on 1/19/2038, and the resulting behavior is undefined. Upgrading to a 64-bit version will eliminate this problem.
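The overflow date can be checked with a short calculation. The sketch assumes a counter of seconds since 1/1/1970 held in a signed 32-bit integer, and it simulates one possible outcome of the overflow, a two's complement wraparound; as noted above, actual behavior is undefined and varies by system.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit counter

def serial_to_datetime(seconds):
    """Convert seconds since the origin into a calendar date and time."""
    return EPOCH + timedelta(seconds=seconds)

def wrap_int32(value):
    """Simulate storing a value in a signed 32-bit field (wraparound)."""
    return (value + 2**31) % 2**32 - 2**31

last_valid = serial_to_datetime(INT32_MAX)
after_wrap = serial_to_datetime(wrap_int32(INT32_MAX + 1))
print(last_valid.isoformat())
print(after_wrap.isoformat())
```

Under the wraparound assumption, the second after the last representable moment in January 2038 is interpreted as a date in December 1901.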
Conclusion
In this paper we have shown what Year 2000-critical components are, how they can be tracked across source code, and what common solutions for Year 2000-conversions are. Despite all these efforts, testing remains an integral part of making software ready for the Year 2000. Before and after changes are applied, unit, integration, and system tests reveal whether an application or parts of it can cope with the rollover to the next millennium. The support of efficient tools during all phases of a large-scale Year 2000-project, however, is invaluable for success.