Running Into Trouble (Running Linux)

3.3.1. Problems with Booting the Installation Media

When attempting to boot the installation media for the first time, you may encounter a number of problems. Note that the following problems are not related to booting your newly installed Linux system. See Section 3.3.4, "Problems After Installing Linux," for information on these kinds of pitfalls.

Floppy or media error occurs when attempting to boot.

If you received your boot floppy from a mail-order vendor or some other distributor, instead of downloading and creating it yourself, contact the distributor and ask for a new boot floppy--but only after verifying that this is indeed the problem.

System "hangs" during boot or after booting.

After the installation media boots, you see a number of messages from the kernel itself, indicating which devices were detected and configured. After this, you are usually presented with a login prompt, allowing you to proceed with installation (some distributions instead drop you right into an installation program of some kind). The system may appear to "hang" during several of these steps. Be patient; loading software from floppy is very slow. In many cases, the system has not hung at all, but is merely taking a long time. Verify that there is no drive or system activity for at least several minutes before assuming that the system is hung.

The proper boot sequence is:

After booting from the LILO prompt, the system must load the kernel image from floppy. This may take several seconds; you know things are going well if the floppy drive light is still on.

While the kernel boots, SCSI devices must be probed for. If you do not have any SCSI devices installed, the system will "hang" for up to 15 seconds while the SCSI probe continues; this usually occurs after the line:
```
lp_init: lp1 exists (0), using polling driver
```
appears on your screen.

After the kernel is finished booting, control is transferred to the system bootup files on the floppy. Finally, you will be presented with a login prompt, or be dropped into an installation program. If you are presented with a login prompt such as:
```
Linux login:
```
you should then log in (usually as root or install--this varies with each distribution). After entering the username, the system may pause for 20 seconds or more while the installation program or shell is being loaded from floppy. Again, the floppy drive light should be on. Don't assume the system is hung.

Each of the preceding activities may cause a delay that makes you think the system has stopped. However, it is possible that the system actually may "hang" while booting, which can be due to several causes. First of all, you may not have enough available RAM to boot the installation media. (See the following item for information on disabling the ramdisk to free up memory.)

Hardware incompatibility causes many system hangs. Section 1.9, "Hardware Requirements," in Chapter 1, "Introduction to Linux," presents an overview of supported hardware under Linux. Even if your hardware is supported, you may run into problems with incompatible hardware configurations that are causing the system to hang. See the next section, Section 3.3.2, "Hardware Problems," for a discussion of hardware incompatibilities.

System reports out-of-memory errors while attempting to boot or install the software.

This problem relates to the amount of RAM you have available. On systems with 4 MB of RAM or less, you may run into trouble booting the installation media or installing the software itself. This is because many distributions use a "ramdisk," which is a filesystem loaded directly into RAM, for operations while using the installation media. The entire image of the installation boot floppy, for example, may be loaded into a ramdisk, which may require more than 1 MB of RAM.

The solution to this problem is to disable the ramdisk option when booting the install media. Each release has a different procedure for doing this; on the SLS release, for example, you type floppy at the LILO prompt when booting the a1 disk. See your distribution's documentation for details.

You may not see an "out of memory" error when attempting to boot or install the software; instead, the system may unexpectedly hang or fail to boot. If your system hangs, and none of the explanations in the previous section seem to be the cause, try disabling the ramdisk.

Keep in mind that Linux itself requires at least 4 MB of RAM to run at all; almost all current distributions of Linux require 8 MB or more.

The system reports an error, such as "Permission denied" or "File not found" while booting.

This is an indication that your installation boot media is corrupt. If you attempt to boot from the installation media (and you're sure you're doing everything correctly), you should not see errors like these. Contact the distributor of your Linux software to find out about the problem, and obtain another copy of the boot media if necessary. If you downloaded the boot disk yourself, try recreating the boot disk, and see if this solves your problem.

The system reports the error "VFS: Unable to mount root" when booting.

This error message means that the root filesystem (found on the boot media itself) could not be found. This means that either your boot media is corrupt or that you are not booting the system correctly.

For example, many CD-ROM distributions require you to have the CD-ROM in the drive when booting. Also be sure that the CD-ROM drive is on, and check for any activity. It's also possible the system is not locating your CD-ROM drive at boot time; see the next section, Section 3.3.2, "Hardware Problems," for more information.

If you're sure you are booting the system correctly, then your boot media may indeed be corrupt. This is an uncommon problem, so try other solutions before attempting to use another boot floppy or tape.

3.3.2. Hardware Problems

The most common problem encountered when attempting to install or use Linux is an incompatibility with hardware. Even if all your hardware is supported by Linux, a misconfiguration or hardware conflict can sometimes cause strange results: your devices may not be detected at boot time, or the system may hang.

It is important to isolate these hardware problems if you suspect they may be the source of your trouble. In the following sections, we describe some common hardware problems and how to resolve them.

3.3.2.1. Isolating hardware problems

If you experience a problem you believe is hardware-related, the first thing to do is attempt to isolate the problem. This means eliminating all possible variables and (usually) taking the system apart, piece-by-piece, until the offending piece of hardware is isolated.

This is not as frightening as it may sound. Basically, you should remove all nonessential hardware from your system (after turning the power off), and then determine which device is actually causing the trouble--possibly by reinserting each device, one at a time. This means you should remove all hardware other than the floppy and video controllers, and, of course, the keyboard. Even innocent-looking devices, such as mouse controllers, can wreak unknown havoc on your peace of mind unless you consider them nonessential.

For example, let's say the system hangs during the Ethernet board detection sequence at boot time. You might hypothesize that there is a conflict or problem with the Ethernet board in your machine. The quick and easy way to find out is to pull the Ethernet board and try booting again. If everything goes well when you reboot, then you know that either the Ethernet board is not supported by Linux (see Section 1.9, "Hardware Requirements," in Chapter 1, "Introduction to Linux," for a list of compatible boards), or there is an address or IRQ conflict with the board. In addition, some badly designed network boards (mostly NE2000 clones) can hang the entire system when they are auto-probed. If this appears to be the case for you, your best bet is to remove the network board from the system during the installation and put it back in later, or pass the appropriate kernel parameters during boot-up so that auto-probing of the network board can be avoided. The most permanent fix is to dump that card and get a new one from another vendor that designs its hardware more carefully.

"Address or IRQ conflict?" What on earth does that mean? All devices in your machine use an interrupt request line, or IRQ, to tell the system they need something done on their behalf. You can think of the IRQ as a cord the device tugs when it needs the system to take care of some pending request. If more than one device is tugging on the same cord, the kernel won't be able to determine which device it needs to service. Instant mayhem.

Therefore, be sure all your installed devices are using unique IRQ lines. In general, the IRQ for a device can be set by jumpers on the card; see the documentation for the particular device for details. Some devices do not require an IRQ at all, but it is suggested you configure them to use one if possible (the Seagate ST01 and ST02 SCSI controllers being good examples).

In some cases, the kernel provided on your installation media is configured to use a certain IRQ for certain devices. For example, on some distributions of Linux, the kernel is preconfigured to use IRQ 5 for the TMC-950 SCSI controller, the Mitsumi CD-ROM controller, and the busmouse driver. If you want to use two or more of these devices, you'll first need to install Linux with only one of these devices enabled, then recompile the kernel in order to change the default IRQ for one of them. (See Section 7.4, "Building a New Kernel," in Chapter 7, "Upgrading Software and the Kernel," for information on recompiling the kernel.)
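
To make the idea of an IRQ conflict concrete, here is a minimal Python sketch that groups devices by the IRQ they claim and reports any line with more than one claimant. The three devices on IRQ 5 come from the example above; the Ethernet board on IRQ 10 and the function name `find_irq_conflicts` are made up for illustration.

```python
from collections import defaultdict


def find_irq_conflicts(irq_by_device):
    """Return {irq: [devices]} for every IRQ claimed by more than one device."""
    devices_by_irq = defaultdict(list)
    for device, irq in irq_by_device.items():
        devices_by_irq[irq].append(device)
    # Keep only the IRQ lines that more than one device is "tugging" on.
    return {
        irq: sorted(devices)
        for irq, devices in devices_by_irq.items()
        if len(devices) > 1
    }


# IRQ 5 entries follow the preconfigured defaults described in the text;
# the Ethernet board on IRQ 10 is a hypothetical extra device.
configured = {
    "TMC-950 SCSI controller": 5,
    "Mitsumi CD-ROM controller": 5,
    "busmouse": 5,
    "Ethernet board": 10,
}

conflicts = find_irq_conflicts(configured)
for irq, devices in conflicts.items():
    print(f"IRQ {irq} is claimed by: {', '.join(devices)}")
```

A table like `configured` has to be filled in by hand from your jumper settings and device documentation; the sketch only does the bookkeeping.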

Another area where hardware conflicts can arise is with direct memory access (DMA) channels, I/O addresses, and shared memory addresses. All these terms describe mechanisms through which the system interfaces with hardware devices. Some Ethernet boards, for example, use a shared memory address as well as an IRQ to interface with the system. If any of these are in conflict with other devices, the system may behave unexpectedly. You should be able to change the DMA channel, I/O, or shared memory addresses for your various devices with jumper settings. (Unfortunately, some devices don't allow you to change these settings.)

The documentation for your various hardware devices should specify the IRQ, DMA channel, I/O address, or shared memory address the devices use, and how to configure them. Again, the simple way to get around these problems is to temporarily disable the conflicting devices until you have time to determine the cause of the problem.

Table 3-1 lists the I/O addresses, IRQs, and DMA channels used by various "standard" devices found on most systems. Almost all systems have some of these devices, so you should avoid setting the IRQ or DMA of other devices to these values.

Table 3-1. Common Device Settings

| Device | I/O address (hex) | IRQ | DMA |
|---|---|---|---|
| ttyS0 (COM1) | 3f8 | 4 | n/a |
| ttyS1 (COM2) | 2f8 | 3 | n/a |
| ttyS2 (COM3) | 3e8 | 4 | n/a |
| ttyS3 (COM4) | 2e8 | 3 | n/a |
| lp0 (LPT1) | 378 - 37f | 7 | n/a |
| lp1 (LPT2) | 278 - 27f | 5 | n/a |
| fd0, fd1 (floppies 1 and 2) | 3f0 - 3f7 | 6 | 2 |
| fd2, fd3 (floppies 3 and 4) | 370 - 377 | 10 | 3 |
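
If you are choosing jumper settings for a new card, you can check a candidate configuration against Table 3-1 mechanically. The following Python sketch encodes the table as data and reports any overlap. Treating each serial port as a single-address range is a simplification made here, and the card at 0x300 - 0x31f is a hypothetical example, not a recommendation for any real board.

```python
# Table 3-1 as data: name -> (first I/O port, last I/O port, IRQ, DMA or None).
# Serial ports are listed with a single address in the table, so they are
# modeled as one-port ranges (a simplification for this sketch).
STANDARD_DEVICES = {
    "ttyS0 (COM1)": (0x3F8, 0x3F8, 4, None),
    "ttyS1 (COM2)": (0x2F8, 0x2F8, 3, None),
    "ttyS2 (COM3)": (0x3E8, 0x3E8, 4, None),
    "ttyS3 (COM4)": (0x2E8, 0x2E8, 3, None),
    "lp0 (LPT1)": (0x378, 0x37F, 7, None),
    "lp1 (LPT2)": (0x278, 0x27F, 5, None),
    "fd0, fd1": (0x3F0, 0x3F7, 6, 2),
    "fd2, fd3": (0x370, 0x377, 10, 3),
}


def clashes_with_standard(io_first, io_last, irq, dma=None):
    """List every way the proposed settings collide with a Table 3-1 device."""
    problems = []
    for name, (first, last, std_irq, std_dma) in STANDARD_DEVICES.items():
        # Two inclusive ranges overlap when each starts before the other ends.
        if io_first <= last and first <= io_last:
            problems.append(f"I/O range overlaps {name}")
        if irq == std_irq:
            problems.append(f"IRQ {irq} is used by {name}")
        if dma is not None and dma == std_dma:
            problems.append(f"DMA {dma} is used by {name}")
    return problems


# Hypothetical card at I/O 0x300 - 0x31f, tried first on IRQ 5, then on IRQ 11.
print(clashes_with_standard(0x300, 0x31F, irq=5))
print(clashes_with_standard(0x300, 0x31F, irq=11))
```

An empty list only means the settings avoid the standard devices in the table; other cards in your machine can still conflict with them.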

3.3.2.2. Problems recognizing hard drive or controller

When Linux boots, you see a series of messages on your screen such as the following:

Console: colour EGA+ 80x25, 8 virtual consoles 
Serial driver version 3.96 with no serial options enabled 
tty00 at 0x03f8 (irq = 4) is a 16450 
tty03 at 0x02e8 (irq = 3) is a 16550A 
lp_init: lp1 exists (0), using polling driver 
…

Here, the kernel is detecting the various hardware devices present on your system. At some point, you should see the line:

Partition check:

followed by a list of recognized partitions, for example:

Partition check: 
  hda: hda1 hda2 
  hdb: hdb1 hdb2 hdb3

If, for some reason, your drives or partitions are not recognized, you will not be able to access them in any way.
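
If you have captured the boot messages in a file, a short script can tell you which drives the kernel reported. This Python sketch parses text in the format shown above; the function name `parse_partition_check` and the `expected` set are assumptions made for illustration, and real kernels may print the partition list in a slightly different layout.

```python
def parse_partition_check(boot_log):
    """Map each drive listed after 'Partition check:' to its partitions."""
    drives = {}
    in_section = False
    for line in boot_log.splitlines():
        stripped = line.strip()
        if stripped == "Partition check:":
            in_section = True
            continue
        if in_section:
            drive, sep, rest = stripped.partition(":")
            # Stop at the first line that is not of the form "name: parts".
            if not sep or not drive or " " in drive:
                break
            drives[drive] = rest.split()
    return drives


# Sample text copied from the messages shown in this section.
sample = """\
lp_init: lp1 exists (0), using polling driver
Partition check:
  hda: hda1 hda2
  hdb: hdb1 hdb2 hdb3
"""

found = parse_partition_check(sample)
expected = {"hda", "hdb"}          # drives you believe are installed
missing = expected - found.keys()  # drives the kernel did not report
print(found, missing)
```

Any drive left in `missing` is one the kernel did not recognize, which is the situation the rest of this section helps you diagnose.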

There are several conditions that can cause this to happen:

Hard drive or controller not supported: If you are using a hard drive controller (IDE, SCSI, or otherwise) not supported by Linux, the kernel will not recognize your partitions at boot time.
Drive or controller improperly configured

3.3.2.3. Problems with SCSI controllers and devices

Presented here are some of the most common problems with SCSI controllers and devices, such as CD-ROMs, hard drives, and tape drives. If you are having problems getting Linux to recognize your drive or controller, read on. Let us again emphasize that most distributions use a modularized kernel and that you might have to load a module supporting your hardware during an early phase of the installation process. This might also be done automatically for you.

The Linux SCSI HOWTO contains much useful information on SCSI devices in addition to that listed here. SCSI can be particularly tricky to configure at times.

Using cheap cables, for example, is a false economy, especially if you use wide SCSI. Cheap cables are a major source of problems and can cause all kinds of failures, as well as major headaches. If you use SCSI, use proper cabling.

Here are common problems and possible solutions:

A SCSI device is detected at all possible IDs: This problem occurs when the system straps the device to the same address as the controller. You need to change the jumper settings so that the drive uses a different address from the controller itself.
Linux reports sense errors, even if the devices are known to be error-free.

If your SCSI controller is not recognized, you may need to force hardware detection at boot time. This is particularly important for SCSI controllers without BIOS. Most distributions allow you to specify the controller IRQ and shared memory address when booting the installation media. For example, if you are using a TMC-8xx controller, you may be able to enter:

boot: linux tmc8xx=interrupt,memory-address

at the LILO boot prompt, where interrupt is the IRQ of the controller, and memory-address is the shared memory address. Whether you can do this depends on the distribution of Linux you are using; consult your documentation for details.

3.3.3. Problems Installing the Software

Installing the Linux software should be trouble free if you're lucky. The only problems you might experience would be related to corrupt installation media or lack of space on your Linux filesystems. Here is a list of common problems:

System reports "Read error, file not found," or other errors while attempting to install the software.

If you have other strange errors when installing Linux (especially if you downloaded the software yourself), be sure you actually obtained all of the necessary files when downloading.

For example, some people use the FTP command:

mget *.*

when downloading the Linux software via FTP. This will download only those files that contain a "." in their filenames; if there are any files without the "." you will miss them. The correct command to use in this case is:

mget *
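
The difference between the two patterns is easy to see with a few file names. The Python sketch below uses `fnmatchcase` from the standard library, which only approximates the Unix-style wildcard matching an FTP client or server performs; the file names are hypothetical and do not come from any real distribution.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing on an FTP server.
remote_files = ["README", "install.txt", "a1.img", "rawrite.exe", "CHECKSUMS"]


def matching(pattern, names):
    """Return the names that match a Unix-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


with_dot = matching("*.*", remote_files)    # what "mget *.*" would fetch
everything = matching("*", remote_files)    # what "mget *" would fetch
missed = sorted(set(everything) - set(with_dot))

print("fetched by *.*:", with_dot)
print("missed by *.*: ", missed)
```

Here `README` and `CHECKSUMS` are left behind by the `*.*` pattern because neither name contains a period.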

The best advice is to retrace your steps when something goes wrong. You may think that you have done everything correctly, when in fact you forgot a small but important step somewhere along the way. In many cases, just attempting to re-download or reinstall the Linux software can solve the problem. Don't beat your head against the wall any longer than you have to!

Also, if Linux unexpectedly hangs during installation, there may be a hardware problem of some kind. See Section 3.3.2, "Hardware Problems," for hints.

3.3.4. Problems After Installing Linux

You've spent an entire afternoon installing Linux. In order to make space for it, you wiped your Windows and OS/2 partitions and tearfully deleted your copies of SimCity 2000 and Railroad Tycoon 2. You reboot the system and nothing happens. Or, even worse, something happens, but it's not what should happen. What do you do?

In Section 3.3.1, "Problems with Booting the Installation Media," earlier in this chapter, we covered the most common problems that can occur when booting the Linux installation media; many of those problems may apply here. In addition, you may fall victim to one of the following maladies.

3.3.4.1. Problems booting Linux from floppy

If you are using a floppy to boot Linux, you may need to specify the location of your Linux root partition at boot time. This is especially true if you are using the original installation floppy itself and not a custom boot floppy created during installation.

While booting the floppy, hold down the Shift or Control key. This should present you with a boot menu; press Tab to see a list of available options. For example, many distributions allow you to boot from a floppy by entering:

boot: linux root=partition

at the boot menu, where partition is the name of the Linux root partition, such as /dev/hda2. SuSE Linux offers a menu entry early in the installation program that boots your newly created Linux system from the installation boot floppy. Consult the documentation for your distribution for details.

3.3.4.2. Problems booting Linux from the hard drive

If you opted to install LILO instead of creating a boot floppy, you should be able to boot Linux from the hard drive. However, the automated LILO installation procedure used by many distributions is not always perfect. It may make incorrect assumptions about your partition layout, in which case you need to reinstall LILO to get everything right. Installing LILO is covered in Section 5.2.2, "Using LILO," in Chapter 5, "Essential System Management."

Here are some common problems:

System reports "Drive not bootable-Please insert system disk."

You will get this error message if the hard drive's master boot record is corrupt in some way. In most cases, it's harmless, and everything else on your drive is still intact. There are several ways around this:

While partitioning your drive using fdisk, you may have deleted the partition that was marked as "active." MS-DOS and other operating systems attempt to boot the "active" partition at boot time (Linux, in general, pays no attention to whether the partition is "active," but the Master Boot Records installed by some distributions like Debian do). You may be able to boot MS-DOS from floppy and run FDISK to set the active flag on your MS-DOS partition, and all will be well.

Another command to try (with MS-DOS 5.0 and higher) is:
```
FDISK /MBR
```
This command will attempt to rebuild the hard drive master boot record for booting MS-DOS, overwriting LILO. If you no longer have MS-DOS on your hard drive, you'll need to boot Linux from floppy and attempt to install LILO later.

If you created an MS-DOS partition using Linux's version of fdisk, or vice versa, you may get this error. You should create MS-DOS partitions only by using MS-DOS's version of FDISK. (The same applies to operating systems other than MS-DOS.) The best solution here is either to start from scratch and repartition the drive correctly, or to merely delete and recreate the offending partitions using the correct version of fdisk.

The LILO installation procedure may have failed. In this case, you should boot either from your Linux boot floppy (if you have one), or from the original installation media. Either of these should provide options for specifying the Linux root partition to use when booting. At boot time, hold down the Shift or Control key and press Tab from the boot menu for a list of options.
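
For the curious, the "active" flag discussed above is a single byte in the master boot record. In the classic PC layout, the partition table starts at byte 446 of the first sector and holds four 16-byte entries; the first byte of an entry is 0x80 when the partition is active, the fifth byte is the partition type, and the sector ends with the signature bytes 0x55 0xAA. The Python sketch below inspects such a sector. It runs on a synthetic sector built in the example itself, not on a real disk, and the name `inspect_mbr` is made up for illustration.

```python
SECTOR_SIZE = 512
TABLE_OFFSET = 446       # the partition table starts here
ENTRY_SIZE = 16          # four entries of 16 bytes each
SIGNATURE_OFFSET = 510   # the last two bytes must be 0x55 0xAA


def inspect_mbr(sector):
    """Return (signature_ok, numbers of the partitions flagged active)."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("a master boot record is exactly 512 bytes")
    signature_ok = sector[SIGNATURE_OFFSET:] == b"\x55\xaa"
    active = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        boot_flag, partition_type = entry[0], entry[4]
        # Type 0 means the slot is unused; 0x80 marks the active partition.
        if partition_type != 0 and boot_flag == 0x80:
            active.append(index + 1)
    return signature_ok, active


# Synthetic sector: partition 1 is a DOS partition (type 0x06) that is not
# active, partition 2 is a Linux partition (type 0x83) that is active.
sector = bytearray(SECTOR_SIZE)
sector[TABLE_OFFSET + 4] = 0x06
sector[TABLE_OFFSET + ENTRY_SIZE] = 0x80
sector[TABLE_OFFSET + ENTRY_SIZE + 4] = 0x83
sector[SIGNATURE_OFFSET:] = b"\x55\xaa"

print(inspect_mbr(bytes(sector)))
```

On a real system you would read the first 512 bytes of the drive's device file as root instead of building the sector by hand. A result with no active partition, or with a bad signature, points to the first or second remedy in the list above.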

When you boot the system from the hard drive, MS-DOS (or another operating system) starts instead of Linux.

First of all, be sure you actually installed LILO when installing the Linux software. If not, the system will still boot MS-DOS (or whatever other operating system you may have) when you attempt to boot from the hard drive. In order to boot Linux from the hard drive, you need to install LILO (see Section 5.2.2, "Using LILO," in Chapter 5, "Essential System Management").

On the other hand, if you did install LILO, and another operating system boots instead of Linux, then you have LILO configured to boot that other operating system by default. While the system is booting, hold down the Shift or Control key and press Tab at the boot prompt. This should present you with a list of possible operating systems to boot; select the appropriate option (usually just linux) to boot Linux.

If you wish to select Linux as the default operating system to boot, you will need to reinstall LILO.

It also may be possible that you attempted to install LILO, but the installation procedure failed in some way. See the previous item on installation.

3.3.4.3. Problems logging in

After booting Linux, you should be presented with a login prompt:

Linux login:

At this point, either the distribution's documentation or the system itself will tell you what to do. For many distributions, you simply log in as root, with no password. Other possible usernames to try are guest or test.

Most Linux distributions ask you for an initial root password. Hopefully, you have remembered what you typed in during installation; you will need it again now. If your distribution does not ask you for a root password during installation, you can try using an empty password.

If you simply can't log in, consult your distribution's documentation; the username and password to use may be buried in there somewhere. The username and password may have been given to you during the installation procedure, or they may be printed on the login banner.

One possible cause of this password impasse may be a problem with installing the Linux login and initialization files. If this is the case, you may need to reinstall (at least parts of) the Linux software, or boot your installation media and attempt to fix the problem by hand.

3.3.4.4. Problems using the system

If login is successful, you should be presented with a shell prompt (such as # or $) and can happily roam around your system. The next step in this case is to try the procedures in Chapter 4, "Basic Unix Commands and Concepts." However, there are some initial problems with using the system that sometimes crop up.

The most common initial configuration problem is incorrect file or directory permissions. This can cause the error message:

Shell-init: permission denied

to be printed after logging in. (In fact, any time you see the message permission denied, you can be fairly certain it is a problem with file permissions.)

In many cases, it's a simple matter of using the chmod command to fix the permissions of the appropriate files or directories. For example, some distributions of Linux once used the incorrect file mode 0644 for the root directory (/). The fix was to issue the command:

# chmod 755 /

as root. (File permissions are covered in Section 4.13, "File Ownership and Permissions," in Chapter 4, "Basic Unix Commands and Concepts.") However, in order to issue this command, you needed to boot from the installation media and mount your Linux root filesystem by hand--a hairy task for most newcomers.
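
To see why mode 0644 is wrong for a directory, it helps to spell out the permission bits. This Python sketch uses the standard `stat` module to render both modes the way ls -l would show them and to test the execute bit, which on a directory grants the right to enter it. The helper names are made up for illustration, and checking only the "other" class is a simplification.

```python
import stat


def describe_dir_mode(mode):
    """Return the ls-style permission string for a directory with this mode."""
    return stat.filemode(stat.S_IFDIR | mode)


def others_can_enter(mode):
    """On a directory, the execute bit for 'other' lets ordinary users enter it."""
    return bool(mode & stat.S_IXOTH)


broken, fixed = 0o644, 0o755

for mode in (broken, fixed):
    print(oct(mode), describe_dir_mode(mode), others_can_enter(mode))
```

With mode 0644, no execute bit is set at all, so users other than root cannot traverse the root directory to reach anything below it. That is why even the login shell fails with a permission error.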

As you use the system, you may run into places where file and directory permissions are incorrect, or software does not work as configured. Welcome to the world of Linux! While most distributions are quite trouble-free, you can't expect them to be perfect. We don't want to cover all of those problems here. Instead, throughout the book we help you to solve many of these configuration problems by teaching you how to find them and fix them yourself. In Chapter 1, "Introduction to Linux", we discussed this philosophy in some detail. In Chapter 5, "Essential System Management", we give hints for fixing many of these common configuration problems.

