16.8 Designing File Formats
Suppose you are designing a program to produce a graph. The height, width,
limits, and scales are to be defined in a graph configuration file.
You are also assigned to write a user-friendly program that asks the
operator questions and writes a configuration file so he or she does
not have to learn the text editor. How should you design a
configuration file?
One way would be as follows:
- height (in inches)
- width (in inches)
- x lower limit
- x upper limit
- y lower limit
- y upper limit
- x-scale
- y-scale
A typical plotter configuration file might look like:
10.0
7.0
0
100
30
300
0.5
2.0
This file contains all the data, but when you look at it, you cannot
easily tell which value is, for example, the y lower limit. A solution
is to comment the file so the configuration program writes out not
only the data, but also a string describing the data:
10.0 height (in inches)
7.0 width (in inches)
0 x lower limit
100 x upper limit
30 y lower limit
300 y upper limit
0.5 x-scale
2.0 y-scale
Now the file is human-readable. But suppose a user runs the plot
program and types in the wrong filename, and the program gets the
lunch menu for today instead of a plot configuration file. The
program is probably going to get very upset when it tries to
construct a plot whose dimensions are "BLT on white" versus
"Meatloaf and gravy."
The result is that you wind up with egg on your face. There should be
some way of identifying a file as a plot configuration file. One
method of doing this is to put the words "Plot Configuration File"
on the first line of the file.
Then, when someone tries to give your program the wrong file, the
program will print an error message.
This takes care of the wrong file problem, but what happens when you
are asked to enhance the program and add optional logarithmic
plotting? You could simply add another line to the configuration
file, but what about all those old files? It's not reasonable to ask
everyone to throw them away. The best thing to do (from a user's
point of view) is to accept old-format files. You can make this
easier by putting a version number in the file.
A typical file now looks like:
Plot Configuration File V1.0
log Logarithmic or normal plot
10.0 height (in inches)
7.0 width (in inches)
0 x lower limit
100 x upper limit
30 y lower limit
300 y upper limit
0.5 x-scale
2.0 y-scale
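A reader that accepts both old and new files has to pull the version out of the first line before deciding which lines to expect. The sketch below shows one possible approach. The name `parse_header` is hypothetical, and treating a header with no version as version 0.0 is an assumption about how the older files were written.

```cpp
#include <sstream>
#include <string>

// Parse the first line of a plot configuration file.
// Returns false if the line is not a valid header. On success,
// ver_major and ver_minor hold the version number; a header with
// nothing after the identifying words is reported as 0.0
// (assumed here to mean a file written before versions existed).
bool parse_header(const std::string& line, int& ver_major, int& ver_minor) {
    const std::string header = "Plot Configuration File";
    if (line.compare(0, header.size(), header) != 0) return false;

    ver_major = 0;
    ver_minor = 0;
    std::istringstream rest(line.substr(header.size()));
    char v = 0;
    if (!(rest >> v)) return true;      // no version: old-format file

    int major_part = 0, minor_part = 0;
    char dot = 0;
    if (v != 'V' || !(rest >> major_part >> dot >> minor_part) || dot != '.')
        return false;                   // text after the header is malformed
    ver_major = major_part;
    ver_minor = minor_part;
    return true;
}
```

The program can then read the logarithmic-plot line only when the version says the file contains one, and fall back to a normal plot for older files.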
In binary files, it is common practice to put an identification
number in the first four bytes of the file. This is called the
magic number. The magic number should be different for each type
of file.
One method for choosing a magic number is to start with the first
four letters of the program name (e.g., list) and convert them to
hex: 0x6C697374. Then add 0x80808080 to the number: 0xECE9F3F4.
This generates a magic number that is probably unique. The high bit
is set on each byte to make the byte non-ASCII and avoid confusion
between ASCII and binary files. On many Unix and Linux systems,
you'll find a file called /etc/magic, which contains information on
other magic numbers used by various programs.
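The arithmetic can be sketched in a few lines. The function name `make_magic` is made up for illustration, and padding names shorter than four characters with zero bytes is an assumption, since that case is not covered here. For plain ASCII letters, setting the high bit of each byte with a bitwise OR gives the same result as adding 0x80808080.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Build a magic number from the first four characters of a name:
// pack their character codes into 32 bits, first character in the
// most significant byte, then set the high bit of every byte.
std::uint32_t make_magic(const std::string& name) {
    std::uint32_t packed = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // Assumption: missing characters are treated as zero bytes.
        const unsigned char c =
            i < name.size() ? static_cast<unsigned char>(name[i]) : 0;
        packed = (packed << 8) | c;
    }
    return packed | 0x80808080u;
}
```

For the name list, the letters l, i, s, and t are 0x6C, 0x69, 0x73, and 0x74, so `make_magic("list")` returns 0xECE9F3F4.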
When reading and writing a binary
file containing many different types of structures, it is easy to get
lost. For example, you might read a name structure when you expected
a size structure. This is usually not detected until later in the
program. To locate this problem early, you can put magic numbers at
the beginning of each structure. Then if the program reads the name
structure and the magic number is not correct, it knows something is
wrong.
Magic numbers for structures do not need to have the high bit set on
each byte. Making the magic number just four ASCII characters makes
it easy to pick out the beginning of structures in a file
dump.
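A minimal sketch of per-structure magic numbers follows. The record layouts, the tags "NAME" and "SIZE", and the helper `has_tag` are all hypothetical; the point is only that each structure starts with four readable ASCII characters that can be checked before the rest is trusted.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical records; each begins with a four-character ASCII tag.
struct NameRecord {
    char magic[4];          // always 'N','A','M','E'
    char name[28];
};

struct SizeRecord {
    char magic[4];          // always 'S','I','Z','E'
    std::int32_t width;
    std::int32_t height;
};

// Return true if the record starts with the expected four-character tag.
// Only the first four bytes of each argument are examined.
bool has_tag(const void* record, const char* tag) {
    return std::memcmp(record, tag, 4) == 0;
}
```

After reading a block of bytes that is supposed to be a name structure, the program can call `has_tag(&record, "NAME")` and report the problem immediately if the check fails.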