16.4 The End-of-Line Puzzle

Back in the dark ages BC (Before Computers), there existed a magical device called a Teletype Model 33. This amazing machine contained a shift register made out of a motor and a rotor as well as a keyboard ROM consisting solely of levers and springs.

The Teletype contained a keyboard, printer, and paper tape reader/punch. It could transmit messages over telephones using a modem at the blazing rate of 10 characters per second.

But the Teletype had a problem. It took 0.2 seconds to move the printhead from the right side to the left. At 10 characters per second, 0.2 seconds is two character times. If a second character came while the printhead was in the middle of a return, that character was lost.

The Teletype people solved this problem by making end-of-line two characters: <carriage return> to position the printhead at the left margin, and <line feed> to move the paper up one line. That way the <line feed> "printed" while the printhead was racing back to the left margin.

When the early computers came out, some designers realized that using two characters for end-of-line wasted storage (at this time storage was very expensive). Some picked <line feed> for their end-of-line, and some chose <carriage return>. Some of the die-hards stayed with the two-character sequence.

Unix uses <line feed> for end-of-line. The newline character \n is code 0xA (LF or <line feed>).

MS-DOS/Windows uses the two characters <carriage return><line feed>. Compiler designers had problems dealing with the old C programs that thought newline was just <line feed>. The solution was to add code to the I/O library that stripped out the <carriage return> characters from ASCII input files and changed <line feed> to <carriage return><line feed> on output.
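
As a rough illustration of that translation, here is a minimal sketch written as two ordinary functions on strings. The names text_mode_output and text_mode_input are hypothetical and not part of any library. A real I/O library does this work inside the stream, and it may treat a lone <carriage return> differently from the simple stripping described above.

```cpp
#include <string>

// Hypothetical helper: mimics what a text-mode write does on
// MS-DOS/Windows. Every <line feed> becomes
// <carriage return><line feed>.
std::string text_mode_output(const std::string& data)
{
    std::string result;
    for (char ch : data) {
        if (ch == '\n')
            result += '\r';     // insert the extra <carriage return>
        result += ch;
    }
    return result;
}

// Hypothetical helper: mimics what a text-mode read does.
// Following the description above, every <carriage return> is dropped.
std::string text_mode_input(const std::string& data)
{
    std::string result;
    for (char ch : data) {
        if (ch != '\r')
            result += ch;
    }
    return result;
}
```

With these helpers, text_mode_output turns the 8-character string "one\ntwo\n" into the 10-character string "one\r\ntwo\r\n", and text_mode_input reverses the change.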

In MS-DOS/Windows, it matters whether a file is opened as ASCII or binary. The flag std::ios::binary indicates a binary file:

// Open ASCII file for reading
ascii_file.open("name", std::ios::in);         

// Open binary file for reading 
binary_file.open("name", std::ios::in|std::ios::binary);       
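
To see what the flag changes, the following sketch writes a string to a file and then counts the bytes that actually ended up on disk. The helper name written_size and any file name passed to it are made up for this illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: write `text` to `path` using the open mode
// `mode`, then return the number of bytes in the resulting file.
// Returns 0 if the file cannot be opened for writing.
std::size_t written_size(const std::string& path,
                         const std::string& text,
                         std::ios::openmode mode)
{
    {
        std::ofstream out_file(path.c_str(), mode);
        if (out_file.fail())
            return 0;
        out_file << text;
    }   // out_file is closed here, so all output is flushed

    // Read the file back in binary mode so that no end-of-line
    // translation affects the count.
    std::ifstream in_file(path.c_str(), std::ios::in | std::ios::binary);
    std::size_t count = 0;
    char ch;
    while (in_file.get(ch))
        ++count;
    return count;
}
```

Writing "line\n" with std::ios::out | std::ios::binary produces 5 bytes on any system. With plain std::ios::out, the count is 5 on Unix and 6 on MS-DOS/Windows, because the <line feed> is expanded to <carriage return><line feed>.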

Unix programmers don't have to worry about the C++ library automatically fixing their ASCII files. In Unix, a file is a file, and ASCII is no different from binary. In fact, you can write a half-ASCII/half-binary file if you want to.

Question 16-1: The member function put can be used to write out a single byte of a binary file. The following program (shown in Example 16-4) writes numbers 0 to 127 to a file called test.out. It works just fine in Unix, creating a 128-byte long file; however, in MS-DOS/Windows, the file contains 129 bytes. Why?

Example 16-4. wbin/wbin.cpp
#include <iostream>
#include <fstream>
#include <cstdlib>

int main(  )
{
    int cur_char;   // current character to write 
    std::ofstream out_file; // output file 

    out_file.open("test.out", std::ios::out);
    if (out_file.fail(  )) {
        (std::cerr << "Can not open output file\n");
        exit (8);
    }

    for (cur_char = 0; cur_char < 128; ++cur_char) {
        out_file.put(static_cast<char>(cur_char));
    }
    return (0);
}

Hint: Here is a hex dump of the MS-DOS/Windows file:

000:0001 0203 0405 0607 0809 0d0a 0b0c 0d0e 
010:0f10 1112 1314 1516 1718 191a 1b1c 1d1e 
020:1f20 2122 2324 2526 2728 292a 2b2c 2d2e 
030:2f30 3132 3334 3536 3738 393a 3b3c 3d3e 
040:3f40 4142 4344 4546 4748 494a 4b4c 4d4e 
050:4f50 5152 5354 5556 5758 595a 5b5c 5d5e 
060:5f60 6162 6364 6566 6768 696a 6b6c 6d6e 
070:6f70 7172 7374 7576 7778 797a 7b7c 7d7e 
080:7f 