I l@ve RuBoard

16.8 Designing File Formats

Suppose you are designing a program to produce a graph. The height, width, limits, and scales are to be defined in a graph configuration file. You are also assigned to write a user-friendly program that asks the operator questions and writes a configuration file so he or she does not have to learn the text editor. How should you design a configuration file?

One way would be as follows:

height (in inches)

width (in inches)

x lower limit

x upper limit

y lower limit

y upper limit

x-scale

y-scale

A typical plotter configuration file might look like:

This file does contain all the data, but in looking at it, you have trouble identifying what, for example, is the value of the Y lower limit. A solution is to comment the file so the configuration program writes out not only the data, but also a string describing the data.

10.0			height (in inches) 
7.0			width (in inches) 
0			x lower limit 
100			x upper limit 
30			y lower limit 
300			y upper limit 
0.5			x-scale  
2.0			y-scale

Now the file is human-readable. But suppose a user runs the plot program and types in the wrong filename, and the program gets the lunch menu for today instead of a plot configuration file. The program is probably going to get very upset when it tries to construct a plot whose dimensions are "BLT on white" versus "Meatloaf and gravy."

The result is that you wind up with egg on your face. There should be some way of identifying a file as a plot configuration file. One method of doing this is to put the words "Plot Configuration File" on the first line of the file. Then, when someone tries to give your program the wrong file, the program will print an error message.

This takes care of the wrong file problem, but what happens when you are asked to enhance the program and add optional logarithmic plotting? You could simply add another line to the configuration file, but what about all those old files? It's not reasonable to ask everyone to throw them away. The best thing to do (from a user's point of view) is to accept old format files. You can make this easier by putting a version number in the file.

A typical file now looks like:

Plot Configuration File V1.0 
log			Logarithmic or normal plot 
10.0			height (in inches) 
7.0			width (in inches) 
0			x lower limit 
100			x upper limit 
30			y lower limit 
300			y upper limit 
0.5			x-scale  
2.0			y-scale

In binary files, it is common practice to put an identification number in the first four bytes of the file. This is called the magic number . The magic number should be different for each type of file.

One method for choosing a magic number is to start with the first four letters of the program name (e.g., list) and convert them to hex: 0x6c607374. Then add 0x80808080 to the number: 0xECE0F3F4.

This generates a magic number that is probably unique. The high bit is set on each byte to make the byte non-ASCII and avoid confusion between ASCII and binary files. On most Unix systems and Linux, you'll find a file called /etc/magic, which contains information on other magic numbers used by various programs.

When reading and writing a binary file containing many different types of structures, it is easy to get lost. For example, you might read a name structure when you expected a size structure. This is usually not detected until later in the program. To locate this problem early, you can put magic numbers at the beginning of each structure. Then if the program reads the name structure and the magic number is not correct, it knows something is wrong.

Magic numbers for structures do not need to have the high bit set on each byte. Making the magic number just four ASCII characters makes it easy to pick out the beginning of structures in a file dump.

I l@ve RuBoard