The Linux Guide Online

Chapter 01 - System Basics

1. System Basics

Before you start working on a system, you need to know how it is organized: how it responds to commands, how it starts and manages processes, and how it represents devices and stores files. This chapter introduces these aspects of a Linux system, with particular reference to the Red Hat distribution.

1.1 System organization

A Linux system, and a Red Hat system in particular, is well organized and fully featured compared with other UNIX systems. Red Hat complies with the Linux file system standard, FSSTND (since superseded by the Filesystem Hierarchy Standard, FHS). Details are available at www.pathname.com/fhs/.

One feature of the FSSTND is that the root directory '/' is kept uncluttered and holds only the essential entries. The main entries look something like the following:

bin/
etc/
lost+found/
sbin/
var/
boot/
home/
mnt/
tmp/
dev/
lib/
proc/
usr/

The following sections describe most of these directories; /dev, /proc and /boot are covered in more detail in section 1.4.

/bin and the /sbin

Most of the essential programs for using and maintaining a Linux system are found in these directories. The name bin comes from the fact that executables on a Linux system are binary files, usually called binaries.

The bin/ directory mostly holds the commonly used essential user programs, such as login, the shells (bash, csh, ksh), file utilities (cp, mv, rm, ln), file system utilities (dd, df, mount, sync), system utilities (uname, hostname, arch) and the vi editor.

It also holds archiving utilities such as tar, gzip and gunzip.

The sbin/ directory, on the other hand, contains programs used for system maintenance (the s stands for system). Most of the utilities in sbin/ can be run only by a user with administrator privileges. Some programs found here are:

fsck, fdisk, mkfs, shutdown, lilo and init. All these programs are very powerful; using them without proper knowledge can cause system-wide damage and corrupt the Linux installation.

/etc

This directory stores the system-wide configuration files required by many programs. Some of the important files are:

passwd, shadow, fstab, hosts, inittab, motd, profile, shells, services, lilo.conf.

The first two files in the list, /etc/passwd and /etc/shadow, define the authorized users of the system. The passwd file holds most of the information about each user except the encrypted password, which is stored in /etc/shadow (assuming shadow passwords are enabled on the system). Unlike most other files here, these two should not be edited by hand; use programs such as adduser to add and modify users, which update these files for you.
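To make the split between the two files concrete, here is a minimal Python sketch that parses one line of /etc/passwd. It assumes the standard seven colon-separated fields (name, password placeholder, UID, GID, comment, home directory, shell); the sample line and the names `parse_passwd_line` and `uses_shadow` are made up for illustration. With shadow passwords enabled, the password field only holds an 'x'. For real lookups, Python's standard `pwd` module already does this.

```python
from typing import NamedTuple

class PasswdEntry(NamedTuple):
    name: str
    password: str   # 'x' when the encrypted password lives in /etc/shadow
    uid: int
    gid: int
    gecos: str      # free-form comment, usually the user's full name
    home: str
    shell: str

def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one /etc/passwd line into its seven colon-separated fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name, password, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)

def uses_shadow(entry: PasswdEntry) -> bool:
    """True if the encrypted password is kept in /etc/shadow instead."""
    return entry.password == "x"

# Hypothetical sample entry
entry = parse_passwd_line("alice:x:500:500:Alice Smith:/home/alice:/bin/bash")
print(entry.home, entry.shell, uses_shadow(entry))   # /home/alice /bin/bash True
```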

The next file, /etc/fstab, is the file system table. It lists the file systems that the system can mount, one per line, with each line describing a particular device. A sample line looks like this:
/dev/hda1 / ext2 defaults 1 1
The first field is the device name. In Linux every device appears as a file in the /dev directory; /dev/hda1 is the first partition (the number 1) on the primary master IDE disk (hda). The second field is the mount point in the Linux file system; '/' is the root, so this line describes how the root partition is mounted. The third field is the type of file system to expect there (ext2 here). The fourth field holds the mount options; 'defaults' selects the default options, but this field can also force the mount to happen in a particular way, read-only (ro) for example. The last two numbers tell dump whether to back up the file system and tell fsck in what order to check file systems at boot (1 for the root file system).
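A minimal Python sketch of how the six whitespace-separated fields of an fstab line can be read. The field order follows fstab(5); the `FstabEntry` type and `parse_fstab_line` function are hypothetical helpers, and comment handling is simplified (only whole-line comments are recognized).

```python
from typing import List, NamedTuple, Optional

class FstabEntry(NamedTuple):
    device: str          # e.g. /dev/hda1
    mount_point: str     # e.g. /
    fs_type: str         # e.g. ext2, swap, proc
    options: List[str]   # e.g. ["defaults"] or ["ro", "noauto"]
    dump: int            # 1 = back up with dump, 0 = don't
    fsck_pass: int       # order in which fsck checks it at boot; 0 = never

def parse_fstab_line(line: str) -> Optional[FstabEntry]:
    """Parse one /etc/fstab line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"too few fields in fstab line: {line!r}")
    # The dump and fsck fields may be omitted; they then default to 0.
    fields += ["0"] * (6 - len(fields))
    device, mount_point, fs_type, options, dump, fsck_pass = fields[:6]
    return FstabEntry(device, mount_point, fs_type, options.split(","),
                      int(dump), int(fsck_pass))

root = parse_fstab_line("/dev/hda1 / ext2 defaults 1 1")
print(root.mount_point, root.fs_type, root.options, root.fsck_pass)
```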

The file also contains two special entries, one for swap and one for /proc. These are not ordinary file systems and are best left unchanged. Most fstab files also have entries for mounting the CD-ROM and the floppy. You can add entries with Red Hat's File System Manager or by editing the file manually. Typically you will want to change the mount properties of the CD-ROM and the floppy, and add entries for mounting your Windows partitions and DOS-formatted floppies.

The file /etc/hosts contains a list of IP addresses and the corresponding host names (with aliases). This file can be used as the first step in hostname resolution. If you are on a private intranet without a name server, this file can be your source of resolution.
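Resolution through /etc/hosts is essentially a table lookup: each line holds an IP address followed by a host name and any aliases, and '#' starts a comment. A minimal Python sketch, assuming that format; the sample addresses and names are invented, and `load_hosts` and `resolve` are hypothetical helpers.

```python
from typing import Dict, Optional

def load_hosts(text: str) -> Dict[str, str]:
    """Build a name -> IP address map from /etc/hosts-style text."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Host names are case-insensitive; keep the first occurrence.
            table.setdefault(name.lower(), address)
    return table

def resolve(table: Dict[str, str], hostname: str) -> Optional[str]:
    return table.get(hostname.lower())

SAMPLE_HOSTS = """\
127.0.0.1     localhost.localdomain localhost
192.168.1.10  server.example.lan server   # file server
192.168.1.20  printer.example.lan printer
"""

hosts = load_hosts(SAMPLE_HOSTS)
print(resolve(hosts, "server"))    # 192.168.1.10 (found through the alias)
print(resolve(hosts, "unknown"))   # None: not listed
```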

The file /etc/motd has the Message Of The Day. This could be anything that the administrator wants the users to take note of. The contents of this file are generally displayed at login.

The file /etc/profile is the system-wide equivalent of the .profile file in a user's home directory. It is the default initialization file for login shells such as bash. It is mostly used to set variables like $PATH and $PS1 (your prompt). It is not the place for personal customization, because it is read by every user's login shell and by some scripts as well.

The /etc/shells file lists the "approved" shells for users; it prevents users from accidentally changing their login shell to something unusable. The /etc/services file, on the other hand, maps network service names to port numbers. Each line gives a service name, its port number and protocol (tcp or udp), optionally followed by aliases.
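Each non-comment line of /etc/services has the form "name port/protocol [aliases...]". A minimal Python sketch that builds a lookup table from such text; the sample lines are illustrative and `load_services` is a hypothetical helper. Python's standard `socket.getservbyname()` performs the same kind of lookup against the real file.

```python
from typing import Dict, Tuple

def load_services(text: str) -> Dict[Tuple[str, str], int]:
    """Map (service name or alias, protocol) -> port number."""
    table: Dict[Tuple[str, str], int] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        name, port_proto, *aliases = line.split()
        port, proto = port_proto.split("/")
        for n in [name, *aliases]:
            table[(n, proto)] = int(port)
    return table

SAMPLE_SERVICES = """\
ftp      21/tcp
smtp     25/tcp   mail
domain   53/tcp                  # name-domain server
domain   53/udp
"""

services = load_services(SAMPLE_SERVICES)
print(services[("mail", "tcp")])     # 25: 'mail' is an alias of smtp
print(services[("domain", "udp")])   # 53
```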

The file /etc/lilo.conf is the configuration file for LILO, which was discussed in the chapter on installing Red Hat Linux.

/etc also has subdirectories that hold many other configuration files and scripts. The /etc/rc.d/ subdirectory contains the system's startup and shutdown scripts. The scripts themselves are stored in /etc/rc.d/init.d/, and each runlevel has a directory of symbolic links to them.

Many other files in /etc control various aspects of your Red Hat system. Feel free to browse through them; since many of them follow similar formats, you can usually at least guess their purpose, even if you don't understand exactly what they do.

/home

This directory holds the home directories of all users of the system except root, whose home directory is /root. It may also hold home directories for services such as httpd and psql (Red Hat 7.1 moved httpd's to /var/).

/mnt

By convention, removable file systems are mounted here: CD-ROMs, floppies, and Zip or Jaz disks are mounted on subdirectories such as /mnt/cdrom and /mnt/floppy. This is merely a convention, and a file system can be mounted on any directory, but following it makes system administration easier and keeps '/' clean.

/tmp and /var

These directories hold temporary and frequently changing files. /tmp is the dumping ground for temporary files; old files there can generally be removed safely. By convention, users create subdirectories here for their own files, and it is also a common place to unpack, compile and build software before installing it.

/var holds variable data and is more structured than /tmp, with places for log files (/var/log), the mail spool (/var/spool/mail) and so on.

/usr

This is where most of the programs for the users of the system are stored. /usr/bin and /usr/sbin hold the majority of the executables on the system; as with /sbin, the programs in /usr/sbin generally need root access. The X Window System has its own directory tree, /usr/X11R6. /usr/opt is similar to /opt, where third-party tools and applications are installed. /usr/local holds locally installed programs along with their libraries and man pages. /usr/dict contains the system dictionary; its words file is an interesting file to look at. The dictionary for ispell, the Red Hat spell checker, is in /usr/lib/ispell.

1.2 RPM

One of the most powerful and innovative tools in Red Hat Linux, and one of the reasons for its popularity, is the Red Hat Package Manager (RPM). It distributes precompiled binaries in packages that play a role similar to Windows installers. From the user's point of view, RPM can install, upgrade, uninstall, query and verify software packages.

A software package built with RPM is an archive of files plus some associated information, such as a name, a version and a description. Here are a few of its advantages over the traditional tar.gz method:

  • Precompiled - most tar.gz distributions contain source code that must be compiled before it can be used.
  • Upgrading - the binaries can be upgraded without losing your customized configuration files.
  • Uninstallation - all the files that came with the installation can be easily and cleanly removed.
  • Verification - the installation can be checked for correctness after installation.
  • Querying - the package itself is a source of information, so you can learn about the programs before installing them.
  • Ease - this is perhaps the most important advantage of all. Installation no longer requires any specialized knowledge.

The basic RPM modes are listed below; further options are described in the man and info pages.

Install rpm -i
Upgrade rpm -U
Uninstall rpm -e (erase)
Query rpm -q
Verify rpm -V

Installing using RPM

The general syntax is
rpm -i [options] [packages]

Here packages is the path (absolute or relative to the current directory) to one or more ".rpm" files. Some of the more important options are:

-v Prints what RPM is doing (verbose)
-h Prints hashes "#" as the package is being installed
--test Tests the package without actually installing anything. Useful for catching conflicts.
--nodeps Installs a package without performing any dependency checks. This is a powerful and dangerous option: the installed program may not work if its dependencies are missing.
--force Forces the installation of the package regardless of conflicts, such as files already owned by another package.

Upgrading using RPM

The general syntax is
rpm -U [options] [packages]

An upgrade combines two operations, uninstall and install. RPM first checks whether an older version of the package is installed; if so, it removes it and installs the newer one, and if not, it simply installs the package. An added advantage is that the upgrade automatically saves your configuration files, so the new installation does not need to be reconfigured. Note, however, that problems can arise if the format of a configuration file changed between the two versions; the package's release notes should mention this.

Uninstalling packages

The general syntax is
rpm -e [options] [package]

Here package is the name of the package, not the name of the .rpm file. For example, the DOS emulator package is named "dosemu", while the file it was installed from might be "dosemu-0.64.1-1.i386.rpm". Use the name dosemu to uninstall it.
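RPM file names conventionally follow the pattern name-version-release.architecture.rpm, which is why the package name (dosemu) differs from the file name. A minimal Python sketch that splits such a file name, assuming that convention; `split_rpm_filename` is a hypothetical helper. Because package names may themselves contain hyphens while versions and releases may not, it splits from the right.

```python
from typing import NamedTuple

class RpmFileName(NamedTuple):
    name: str
    version: str
    release: str
    arch: str

def split_rpm_filename(filename: str) -> RpmFileName:
    """Split 'name-version-release.arch.rpm' into its parts."""
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an .rpm file name: {filename!r}")
    stem = filename[:-len(".rpm")]        # dosemu-0.64.1-1.i386
    nvr, arch = stem.rsplit(".", 1)       # 'dosemu-0.64.1-1', 'i386'
    # Split off the last two '-'-separated fields: version and release.
    name, version, release = nvr.rsplit("-", 2)
    return RpmFileName(name, version, release, arch)

parts = split_rpm_filename("dosemu-0.64.1-1.i386.rpm")
print(parts.name)   # dosemu: the name to give to rpm -e
```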

A common error when uninstalling is a dependency error: the package being removed provides files that another package requires. The --nodeps option makes RPM ignore such errors. The --test option goes through the motions of the uninstall without actually removing anything. Since an uninstall normally produces no output, using the -vv option is advised.

Querying packages using RPM

The general syntax is
rpm -q [package]

Here again, package is the package name, not the ".rpm" file name. A plain query prints the package's full name, including version and release (for example, dosemu-0.64.1-1). To be really useful, the query option is usually combined with other options:

-l lists the files in the package.
-s shows the state of each file in the package.
-d lists the documentation files in the package.
-c lists the configuration files in the package.
-i displays information about the package.
-a queries all installed packages.
-f file queries the package that owns file.
-p file queries an uninstalled package file; on its own it prints the name of the package in that file.

Any of the five options -l, -s, -d, -c and -i can be combined with -p to query an .rpm file instead of an installed package; for example, rpm -qlp file.rpm lists the files contained in file.rpm.

Verifying Packages

The general syntax is
rpm -V [package]

Verifying is an easy way to detect problems with an installation. RPM compares information about the installed files with the information recorded in its database when the package was installed. For each file that differs, it prints an 8-character string in which each failed test is shown by a single character and each passed test by a '.', followed by the file name. The characters for failed tests are as follows:

5 MD5 Sum
S File Size
L Symlink
T Mtime
D Device
U User
G Group
M Mode (permissions and file type)
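Reading such a string is a matter of looking up each character in the table above. A minimal Python sketch, assuming output lines like "S.5....T c /etc/hosts", in which the result string comes first, an optional attribute marker (c marks a configuration file) may follow, and the file name comes last; the sample line is invented and the helper names are hypothetical. Newer RPM versions print additional codes, which this sketch rejects.

```python
from typing import List, Tuple

# Failure codes from the table above.
FAILURE_CODES = {
    "5": "MD5 sum",
    "S": "file size",
    "L": "symlink",
    "T": "mtime",
    "D": "device",
    "U": "user",
    "G": "group",
    "M": "mode",
}

def failed_tests(result: str) -> List[str]:
    """Names of the failed tests in a result string such as 'S.5....T'."""
    failures = []
    for code in result:
        if code == ".":          # '.' means this test passed
            continue
        if code not in FAILURE_CODES:
            raise ValueError(f"unknown verification code {code!r}")
        failures.append(FAILURE_CODES[code])
    return failures

def parse_verify_line(line: str) -> Tuple[List[str], str]:
    """Split 'S.5....T c /etc/hosts' into (failed tests, file name).

    File names containing spaces are not handled by this sketch.
    """
    fields = line.split()
    return failed_tests(fields[0]), fields[-1]

problems, path = parse_verify_line("S.5....T c /etc/hosts")
print(path, problems)   # /etc/hosts ['file size', 'MD5 sum', 'mtime']
```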

In addition, you can use the -f option to verify the package that a particular file came from:

rpm -Vf [filename]

verifies the package that installed filename.

1.3 The Boot Process

Now that you know the basics of the Linux system and RPM, it is time to get a little more hardware oriented. This section looks at how a Linux machine boots and shuts down and how that process is configured. It also covers system crashes and what to do when your system won't boot.

By now you will have installed LILO or some other means of loading and running the Linux kernel. On Intel-based machines, the following happens. The PC reads the first sector of the boot drive, called the boot record, and looks there for code to load and execute; which drive it searches can be changed in the system BIOS setup. Programs like LILO install themselves in this boot sector; when the system executes them, they take input from the user and boot one of one or more operating systems.
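As a concrete illustration of "the first sector": a PC boot sector is 512 bytes long, and the BIOS treats it as bootable only if its last two bytes (offsets 510 and 511) are 0x55 0xAA, the boot signature. A minimal Python sketch that checks for this signature; the function names are hypothetical, and the example builds a sector in memory rather than reading a live device such as /dev/hda, which would need root access.

```python
BOOT_SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a bootable sector

def has_boot_signature(sector: bytes) -> bool:
    """True if a 512-byte sector ends with the PC boot signature."""
    return len(sector) == BOOT_SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE

def read_boot_sector(path: str) -> bytes:
    """Read the first sector of a disk or disk image file."""
    with open(path, "rb") as disk:
        return disk.read(BOOT_SECTOR_SIZE)

# A fake sector built in memory: zeros followed by the signature.
fake = bytes(510) + BOOT_SIGNATURE
print(has_boot_signature(fake))         # True
print(has_boot_signature(bytes(512)))   # False: no signature
```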

For Linux, LILO loads and runs the kernel, whose location was specified in /etc/lilo.conf. Once the kernel is loaded, it starts the init program. init is the first process to run on the system (its process ID is 1) and is therefore known as the parent of all processes: every other process descends from it. Note that "process" here has a different meaning from the one in the title of this section. To Linux, a process is a running program that the kernel treats as one logical unit. A process has a number of attributes while it runs, and every process except init has a parent process.

The init and the inittab file

Linux's init is compatible with the init of System V UNIX. Although init runs as the last step of kernel booting, it is the first program to initialize and configure the system you use. It parses /etc/inittab and runs the scripts in /etc/rc.d/ for the default runlevel. Each of these scripts starts (or stops) a Linux service, or daemon.

Run levels

Open the /etc/inittab file and look at its entries. When you reach the list of runlevels on your system, note what each of them does. A runlevel is essentially a selection of scripts to run. Say scripts A, B and C run in single-user mode; in full multiuser mode, scripts D, E and F might run in addition to enable networking. Similarly, runlevel 5 additionally runs the scripts that start the X server, allowing graphical logins. The default runlevel is set in a line that looks like
id:3:initdefault:
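by changing the number to the desired level. Do not set it to 0 or 6: these runlevels halt and reboot the system, so it would shut down or reboot as soon as it came up. The system will not stop you from making such a change. If it happens, you can usually recover without reinstalling by booting into single-user mode (for example, by typing linux single at the LILO boot: prompt, if your Linux image is labelled linux) and correcting the line.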
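Every inittab entry has four colon-separated fields, id:runlevels:action:process, and the default runlevel is the runlevels field of the initdefault entry. A minimal Python sketch that finds it, assuming that layout; the sample text is an abbreviated example modeled on a Red Hat inittab, and `default_runlevel` is a hypothetical helper.

```python
from typing import Optional

def default_runlevel(inittab: str) -> Optional[int]:
    """Return the runlevel from the 'initdefault' entry, or None if absent."""
    for line in inittab.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":", 3)      # id:runlevels:action:process
        if len(fields) == 4 and fields[2] == "initdefault":
            return int(fields[1])
    return None

SAMPLE_INITTAB = """\
# Default runlevel.
id:3:initdefault:
si::sysinit:/etc/rc.d/rc.sysinit
l3:3:wait:/etc/rc.d/rc 3
ca::ctrlaltdel:/sbin/shutdown -t3 -r now
"""

level = default_runlevel(SAMPLE_INITTAB)
if level in (0, 6):
    print("warning: the system would halt or reboot right after booting")
print(level)   # 3
```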

The sysinit script

The first script named in /etc/inittab is rc.sysinit, which performs system initialization. Its tasks, many of which you can follow in the console output, include checking file systems for errors and mounting them, clearing the table of mounted file systems (/etc/mtab), finding module dependencies, deleting a number of files in /etc that don't need to be there, setting the system clock, turning on swap and initializing the serial ports.

The rc.local script

This script is run after the other startup scripts, at the end of the boot sequence. It is the place to add your own startup commands and can be tweaked to suit your system's requirements.

After rc.sysinit, init runs the scripts in the rcX.d directory, where X is the required runlevel (0 through 6). These scripts are merely symbolic links to the actual scripts in the /etc/init.d directory, so you can choose which scripts run in each runlevel by adding and deleting these links.

Do a long listing (ls -l) of the directory for, say, runlevel 3: /etc/rc3.d/. Notice that each entry is a link to the actual script in /etc/init.d/, and that each link name starts with a letter (S or K) followed by a number. The letter determines whether the service is started (S) or stopped, that is, killed (K); the number decides the order in which the scripts run. When entering a runlevel, the S scripts are executed in ascending order of their numbers. This order matters because it makes no sense (nor is it possible) to start, say, sendmail before the network has been started. The K scripts are also run in ascending numeric order, but their numbers are chosen so that services are stopped in roughly the reverse of the order in which they were started; sendmail, for example, is stopped before the network.
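The naming scheme can be imitated in a few lines. A minimal Python sketch, assuming link names of the form S<number><service> or K<number><service>; it separates start links from kill links and orders the start links by their number, as described above. The link names are invented for the example, and `split_links` is a hypothetical helper; the real work is done by Red Hat's /etc/rc.d/rc script, not by anything like this.

```python
import re
from typing import List, Tuple

LINK_NAME = re.compile(r"^([SK])(\d+)(.+)$")

def split_links(names: List[str]) -> Tuple[List[str], List[str]]:
    """Separate rcN.d link names into (start links, kill links).

    Start links are sorted by their number, so that, for example,
    the network is started before sendmail.
    """
    start: List[Tuple[int, str]] = []
    kill: List[str] = []
    for name in names:
        match = LINK_NAME.match(name)
        if not match:
            continue                      # ignore anything not named S.. or K..
        kind, number, _service = match.groups()
        if kind == "S":
            start.append((int(number), name))
        else:
            kill.append(name)
    start.sort()
    return [name for _, name in start], kill

rc3 = ["S80sendmail", "K20nfs", "S10network", "S55sshd", "README"]
start_order, kill_links = split_links(rc3)
print(start_order)   # ['S10network', 'S55sshd', 'S80sendmail']
print(kill_links)    # ['K20nfs']
```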

Say you are in runlevel 3 and want to switch to runlevel 5. As root, type the following on a console:
init 5
init then runs the K scripts in /etc/rc5.d/, stopping services that should not run in runlevel 5, followed by the S scripts in /etc/rc5.d/, leaving the system in runlevel 5.

Finally, when all this is done, init starts getty on the terminals, and getty in turn runs the login program. Once a user is authenticated, login starts the user's shell, and the system is ready for use.

Shutting down the system

Using the same command, you can reboot the system by switching to runlevel 6
init 6
or halt it by switching to runlevel 0
init 0

A more proper way to do this is to use the shutdown command. Its most important options are -h, which halts the system, and -r, which reboots it. shutdown also requires a time argument saying when to shut down; use the word "now" to shut down immediately. The -t option sets the delay, in seconds, between the warning signal and the kill signal that init sends to processes.
shutdown -h now
Shuts down the system immediately.

Another method is the three-finger salute (Ctrl+Alt+Del). On a standard Red Hat system, /etc/inittab configures it to start a reboot immediately, equivalent to
shutdown -t3 -r now

System Crashes

Never switch off the system without shutting it down first. Linux keeps recently written data in memory buffers and writes it to disk later, so cutting the power can lose data and leave the file system in an inconsistent state. Here is a list of do's and don'ts to avoid problems.

  • Don't use Linux as root. Create an ordinary account and use root only for maintenance
  • Do make a backup after a clean install and setup
  • Do create the emergency boot and rescue floppies
  • Do use shutdown instead of just turning off the machine when you are done
  • Do consider using a UPS
  • Don't disable e2fsck in the rc.sysinit script
  • Do use the sync program to flush buffered data to disk and avoid data loss and corruption
  • Do use the file system tools to check your system regularly

OK, so you have been the model user and followed the instructions to the letter. But the power supply turned out to be a problem, or one of your own users wreaked havoc on the machine. The result could be one of the following:

Your Red Hat refuses to boot at all
Or it boots and asks you for the root password and drops you into a maintenance shell.

If it is the second case, try the following approach. The fact that the kernel booted means there is some hope of rescue; there is probably a problem with a file system. Once you are in the shell, read the preceding messages; the system has most likely told you how to proceed. The fix could be as simple as running e2fsck on a partition, possibly losing some data, or as hard as locating a backup superblock and using it to restore the file system. Whatever the problem, proceed logically and, if possible, write down every step you take. This will help if you later need to ask someone for help.

If your system does not boot at all, you can try to rescue it. We assume that you followed our advice and made boot and rescue disks. Boot the system from the boot disk, type "rescue" at the boot: prompt, follow the prompts and change diskettes when asked. You'll end up at a '#' prompt, with a minimal set of programs in /bin. The idea is to get you to a point where you can at least check your existing partitions and possibly mount your drives.

Mount your partitions on a directory and try to diagnose the problems with the various tools described later.

To be really effective in any rescue, be sure to at least read the man pages for the following commands and files.

  • badblocks
  • debugfs
  • dump
  • dumpe2fs
  • fsck and e2fsck
  • fstab
  • init and inittab
  • hdparm
  • halt

We hope your system never brings you to this point, but if it does, it is better to be prepared and ready for action than to worry in vain later. Reinstallation should not be an option unless things are beyond repair; in true Linux style, don't look at reinstalling as an alternative to fixing the problem.

1.4 File systems, disks and other devices

One thing that makes the Linux system elegant and uniform is that all computer peripherals are represented as files that can be accessed through the file system. This gives great uniformity of operation along with flexibility in system administration.

Linux, like UNIX, recognizes two types of devices: those that are accessed serially, called character devices (such as a tape drive), and those that can be accessed randomly, called block devices (like a hard disk). Each device supported by Linux is represented by a device file in the /dev/ directory. When you read or write a device file, the data comes from or goes to the device it represents. This way no special programs (and no special application programming techniques, such as catching interrupts or polling a serial port) are needed to access devices; for example, to send a file to the printer, one could just say

cat somefile > /dev/lp1

and the contents of the file are printed (the file must, of course, be in a form that the printer understands). However, since it is not a good idea to have several people cat their files to the printer at the same time, one usually uses a special program to send the files to be printed (usually lpr). This program makes sure that only one file is being printed at a time, and will automatically send files to the printer as soon as it finishes with the previous file. Something similar is needed for most devices. In fact, one seldom needs to worry about device files at all.
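The same idea works from any program, not only from the shell: a device file is opened, read and written with the ordinary file API. A minimal Python sketch; instead of a printer it uses /dev/null and /dev/zero, which exist on practically every Linux system, so it can be run safely. The helper names are hypothetical.

```python
def write_to_device(path: str, data: bytes) -> int:
    """Write bytes to a device file exactly as to a regular file."""
    with open(path, "wb") as device:
        return device.write(data)

def read_from_device(path: str, count: int) -> bytes:
    """Read up to count bytes from a device file."""
    with open(path, "rb") as device:
        return device.read(count)

# /dev/null discards whatever is written to it.
print(write_to_device("/dev/null", b"hello, printer\n"))   # 15
# /dev/zero returns as many zero bytes as you ask for.
print(read_from_device("/dev/zero", 4))                    # b'\x00\x00\x00\x00'
```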

Since devices show up as files in the file system (in the /dev directory), it is easy to see which device files exist, using ls or another suitable command. In the output of ls -l, the first column shows the file type and permissions, and its first character gives the type: '-' for an ordinary file, 'd' for a directory, 'c' for a character device and 'b' for a block device.
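A minimal Python sketch of how that first character can be derived, using the standard `os` and `stat` modules; `file_type_char` is a hypothetical helper. It also reports 'l' for symbolic links, which ls shows as well, and returns '?' for other types such as FIFOs and sockets. `stat.filemode()` produces the whole ls-style permission string if you need it.

```python
import os
import stat

def file_type_char(path: str) -> str:
    """Return the type character that ls -l shows for path."""
    mode = os.lstat(path).st_mode      # lstat: don't follow symbolic links
    if stat.S_ISDIR(mode):
        return "d"
    if stat.S_ISCHR(mode):
        return "c"
    if stat.S_ISBLK(mode):
        return "b"
    if stat.S_ISLNK(mode):
        return "l"
    if stat.S_ISREG(mode):
        return "-"
    return "?"                          # FIFOs, sockets: not covered here

for path in ("/dev/null", "/dev"):
    print(file_type_char(path), path)   # c /dev/null, then d /dev
```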

Note that usually all device files exist even though the device itself might not be installed. So just because you have a file /dev/sda, it doesn't mean that you really have a SCSI hard disk. Having all the device files makes the installation programs simpler and makes it easier to add new hardware (there is no need to find out the correct parameters for the new device and create its device files).

The Hard Disk

This subsection introduces terminology related to hard disks. If you already know the terms and concepts, you can skip this subsection.

A hard disk consists of one or more circular platters, of which either or both surfaces are coated with a magnetic substance used for recording the data. For each surface, there is a read-write head that examines or alters the recorded data. The platters rotate on a common axis; a typical rotation speed is 3600 rotations per minute, although high-performance hard disks have higher speeds. The heads move along the radius of the platters; this movement combined with the rotation of the platters allows the head to access all parts of the surfaces.

The processor (CPU) and the actual disk communicate through a disk controller. This relieves the rest of the computer from knowing how to use the drive, since the controllers for different types of disks can be made to use the same interface towards the rest of the computer. Therefore, the computer can say just "hey disk, gimme what I want", instead of a long and complex series of electric signals to move the head to the proper location and waiting for the correct position to come under the head and doing all the other unpleasant stuff necessary. (In reality, the interface to the controller is still complex, but much less so than it would otherwise be.) The controller can also do some other stuff, such as caching, or automatic bad sector replacement.

The above is usually all one needs to understand about the hardware. There is also a bunch of other stuff, such as the motor that rotates the platters and moves the heads, and the electronics that control the operation of the mechanical parts, but that is mostly not relevant for understanding the working principle of a hard disk.

The surfaces are usually divided into concentric rings, called tracks, and these in turn are divided into sectors. This division is used to specify locations on the hard disk and to allocate disk space to files. To find a given place on the hard disk, one might say "surface 3, track 5, sector 7". Usually the number of sectors is the same for all tracks, but some hard disks put more sectors in outer tracks (all sectors are of the same physical size, so more of them fit in the longer outer tracks). Typically, a sector will hold 512 bytes of data. The disk itself can't handle smaller amounts of data than one sector.

Each surface is divided into tracks (and sectors) in the same way. This means that when the head for one surface is on a track, the heads for the other surfaces are also on the corresponding tracks. All the corresponding tracks taken together are called a cylinder. It takes time to move the heads from one track (cylinder) to another, so by placing the data that is often accessed together (say, a file) so that it is within one cylinder, it is not necessary to move the heads to read all of it. This improves performance. It is not always possible to place files like this; files that are stored in several places on the disk are called fragmented.
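Giving every sector a sequential number makes the cylinder idea concrete: all the tracks of one cylinder get consecutive numbers before the next cylinder starts. A worked Python sketch of the usual cylinder/head/sector conversion, assuming heads and cylinders are numbered from 0 and sectors from 1 within a track (the usual PC convention); the 8-head, 35-sector geometry is only an example, and the helper names are hypothetical.

```python
from typing import NamedTuple

class Geometry(NamedTuple):
    heads: int                # number of surfaces
    sectors_per_track: int

SECTOR_SIZE = 512             # bytes, the typical sector size

def chs_to_lba(geom: Geometry, cylinder: int, head: int, sector: int) -> int:
    """Sequential sector number for (cylinder, head, sector); sector is 1-based."""
    if not (0 <= head < geom.heads and 1 <= sector <= geom.sectors_per_track):
        raise ValueError("address outside the disk geometry")
    return (cylinder * geom.heads + head) * geom.sectors_per_track + (sector - 1)

def capacity_bytes(geom: Geometry, cylinders: int) -> int:
    return cylinders * geom.heads * geom.sectors_per_track * SECTOR_SIZE

disk = Geometry(heads=8, sectors_per_track=35)
# "surface 3, track 5, sector 7" from the text:
print(chs_to_lba(disk, cylinder=5, head=3, sector=7))   # 1511
print(capacity_bytes(disk, cylinders=2048))             # 293601280, i.e. 280 MiB
```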

The number of surfaces (or heads, which is the same thing), cylinders, and sectors vary a lot; the specification of the number of each is called the geometry of a hard disk. The geometry is usually stored in a special, battery-powered memory location called the CMOS RAM, from where the operating system can fetch it during bootup or driver initialization.

Unfortunately, the BIOS [2] has a design limitation that makes it impossible to specify more than 1024 tracks in the CMOS RAM, which is too few for a large hard disk. To overcome this, the hard disk controller lies about the geometry and translates the addresses given by the computer into something that fits reality. For example, a hard disk might have 8 heads, 2048 tracks, and 35 sectors per track. [3] Its controller could lie to the computer and claim that it has 16 heads, 1024 tracks, and 35 sectors per track, thus not exceeding the limit on tracks, and translate each address the computer gives it by roughly halving the head number and doubling the track number (more precisely, real track = 2 × track + head div 8, and real head = head mod 8). The math can be more complicated in reality, because the numbers are not as nice as here (but again, the details are not relevant for understanding the principle). This translation distorts the operating system's view of how the disk is organized, making it impractical to use the all-data-on-one-cylinder trick to boost performance.
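The translation in this example can be checked numerically. A minimal Python sketch, assuming the controller maps addresses so that the claimed and the real geometry number the sectors in the same sequential order (one simple way to do it; as the text notes, real controllers may use more complicated math). The 8-head, 2048-track, 35-sector disk is the text's example; the helper names are hypothetical.

```python
from typing import NamedTuple, Tuple

class Geometry(NamedTuple):
    heads: int
    tracks: int               # cylinders
    sectors: int              # sectors per track, numbered from 1

REAL = Geometry(heads=8, tracks=2048, sectors=35)       # what the disk has
CLAIMED = Geometry(heads=16, tracks=1024, sectors=35)   # what the BIOS is told

def to_lba(g: Geometry, track: int, head: int, sector: int) -> int:
    return (track * g.heads + head) * g.sectors + (sector - 1)

def from_lba(g: Geometry, lba: int) -> Tuple[int, int, int]:
    track, rest = divmod(lba, g.heads * g.sectors)
    head, sector0 = divmod(rest, g.sectors)
    return track, head, sector0 + 1

def translate(track: int, head: int, sector: int) -> Tuple[int, int, int]:
    """Map an address in the claimed geometry to the real one."""
    return from_lba(REAL, to_lba(CLAIMED, track, head, sector))

# Both geometries must describe the same number of sectors.
assert REAL.heads * REAL.tracks * REAL.sectors == CLAIMED.heads * CLAIMED.tracks * CLAIMED.sectors

print(translate(0, 8, 1))       # (1, 0, 1): head 8 of track 0 is really track 1, head 0
print(translate(1023, 15, 35))  # (2047, 7, 35): the very last sector
```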

The translation is only a problem for IDE disks. SCSI disks use a sequential sector number (i.e., the controller translates a sequential sector number to a head, cylinder, and sector triplet), and a completely different method for the CPU to talk with the controller, so they are insulated from the problem. Note, however, that the computer might not know the real geometry of an SCSI disk either.

Since Linux often will not know the real geometry of a disk, its filesystems don't even try to keep files within a single cylinder. Instead, it tries to assign sequentially numbered sectors to files, which almost always gives similar performance. The issue is further complicated by on-controller caches, and automatic prefetches done by the controller.

Each hard disk is represented by a separate device file. There can (usually) be only two or four IDE hard disks. These are known as /dev/hda, /dev/hdb, /dev/hdc, and /dev/hdd, respectively. SCSI hard disks are known as /dev/sda, /dev/sdb, and so on. Similar naming conventions exist for other hard disk types. Note that the device files for the hard disks give access to the entire disk, with no regard to partitions (which will be discussed below), and it's easy to mess up the partitions or the data in them if you aren't careful. The disks' device files are usually used only to get access to the master boot record (which will also be discussed below).

Each partition and extended partition has its own device file. The naming convention for these files is that a partition's number is appended after the name of the whole disk, with the convention that 1-4 are primary partitions (regardless of how many primary partitions there are) and numbers from 5 upward are logical partitions (regardless of which primary partition holds them). For example, /dev/hda1 is the first primary partition on the first IDE hard disk, and /dev/sdb7 is the third logical partition on the second SCSI hard disk.
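The naming rule can be expressed directly in code. A minimal Python sketch that covers only the /dev/hdX and /dev/sdX names discussed here and treats numbers 1 to 4 as primary partitions and higher numbers as logical ones; `describe_partition` is a hypothetical helper.

```python
import re

DEVICE = re.compile(r"^/dev/(hd|sd)([a-z])(\d*)$")

def describe_partition(path: str) -> str:
    """Describe a device name such as /dev/hda1 or /dev/sdb7 in words."""
    match = DEVICE.match(path)
    if not match:
        raise ValueError(f"unsupported device name: {path!r}")
    kind, letter, number = match.groups()
    bus = "IDE" if kind == "hd" else "SCSI"
    disk_index = ord(letter) - ord("a") + 1     # a -> 1st disk, b -> 2nd, ...
    disk = f"disk {disk_index} ({bus})"
    if not number:
        return f"whole {disk}"
    n = int(number)
    if n == 0:
        raise ValueError("partition numbers start at 1")
    if n <= 4:
        return f"primary partition {n} on {disk}"
    return f"logical partition {n - 4} on {disk}"

for name in ("/dev/hda", "/dev/hda1", "/dev/sdb7"):
    print(name, "->", describe_partition(name))
```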

Floppies

A floppy disk consists of a flexible membrane covered on one or both sides with a magnetic substance similar to that of a hard disk. The floppy disk itself doesn't have a read-write head; that is part of the drive. A floppy corresponds to one platter in a hard disk, but is removable and one drive can be used to access different floppies, whereas the hard disk is one indivisible unit.

Like a hard disk, a floppy is divided into tracks and sectors (and the two corresponding tracks on either side of a floppy form a cylinder), but there are many fewer of them than on a hard disk.

A floppy drive can usually use several different types of disks; for example, a 3.5 inch drive can use both 720 kB and 1.44 MB disks. Since the drive has to operate a bit differently for each type and the operating system must know how big the disk is, there are many device files for floppy drives, one per combination of drive and disk type. For example, /dev/fd0H1440 is the first floppy drive (fd0), which must be a 3.5 inch drive, using a 3.5 inch, high density disk (H) of size 1440 kB (1440), i.e., a normal 3.5 inch HD floppy. For more information on the naming conventions for the floppy devices, see XXX (device list).
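The sizes in these device names follow directly from the disk geometry: cylinders times sides times sectors per track times 512 bytes per sector. The sketch below assumes the standard 3.5 inch geometries (80 cylinders, 2 sides, and 18 sectors per track for high density or 9 for double density); floppy_capacity is a hypothetical helper, not part of any Linux tool.

```python
SECTOR_BYTES = 512  # bytes per floppy sector

def floppy_capacity(cylinders, heads, sectors_per_track):
    """Return (total sectors, size in kB with 1 kB = 1024 bytes)."""
    sectors = cylinders * heads * sectors_per_track
    return sectors, sectors * SECTOR_BYTES // 1024

print(floppy_capacity(80, 2, 18))  # high density: (2880, 1440)
print(floppy_capacity(80, 2, 9))   # double density: (1440, 720)
```

The 2880 sectors of a high density disk are also why copying a whole HD floppy with dd reports 2880 records of 512 bytes.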

The names for floppy drives are complex, however, and Linux therefore has a special floppy device type that automatically detects the type of the disk in the drive. It works by trying to read the first sector of a newly inserted floppy using different floppy types until it finds the correct one. This naturally requires that the floppy is formatted first. The automatic devices are called /dev/fd0, /dev/fd1, and so on.

The parameters the automatic device uses to access a disk can also be set using the program setfdprm. This can be useful if you need to use disks that do not follow any usual floppy sizes, e.g., if they have an unusual number of sectors, or if autodetection for some reason fails and the proper device file is missing.

Linux can handle many nonstandard floppy disk formats in addition to all the standard ones. Some of these require using special formatting programs. We'll skip these disk types for now, but in the meantime you can examine the /etc/fdprm file, which specifies the settings that setfdprm recognizes.

The operating system must know when a disk has been changed in a floppy drive, for example, in order to avoid using cached data from the previous disk. Unfortunately, the signal line that is used for this is sometimes broken, and worse, this won't always be noticeable when using the drive from within MS-DOS. If you are experiencing weird problems using floppies, this might be the reason. The only way to correct it is to repair the floppy drive.

CD-ROMs

A CD-ROM drive uses an optically read, plastic coated disk. The information is recorded on the surface of the disk [1] in small "holes" aligned along a spiral from the center to the edge. The drive directs a laser beam along the spiral to read the disk. When the laser hits a hole, the laser is reflected in one way; when it hits smooth surface, it is reflected in another way. This makes it easy to code bits, and therefore information. The rest is easy, mere mechanics.

CD-ROM drives are slow compared to hard disks. Whereas a typical hard disk will have an average seek time of less than 15 milliseconds, a fast CD-ROM drive can take tenths of a second for a seek. The actual data transfer rate is fairly high, at hundreds of kilobytes per second. The slowness means that CD-ROM drives are not as pleasant to use as hard disks, although it is still possible to use them in place of one (some Linux distributions provide "live" filesystems on CD-ROMs, making it unnecessary to copy the files to the hard disk, which makes installation easier and saves a lot of hard disk space). For installing new software, CD-ROMs are very good, since maximum speed is not essential during installation.

There are several ways to arrange data on a CD-ROM. The most popular one is specified by the international standard ISO 9660. This standard specifies a very minimal filesystem, which is even more crude than the one MS-DOS uses. On the other hand, it is so minimal that every operating system should be able to map it to its native system.
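To give an idea of how minimal ISO 9660 is, the following sketch recognizes it from a disk image held in memory. It relies on the standard layout as we understand it: 2048-byte logical sectors, a volume descriptor set starting at sector 16, each descriptor beginning with a type byte (1 for the primary descriptor, 255 for the set terminator) followed by the identifier CD001, and the 32-byte volume identifier at offset 40 of the primary descriptor. The function name find_volume_id is hypothetical, and Rock Ridge data is not examined.

```python
SECTOR = 2048            # ISO 9660 logical sector size in bytes
FIRST_DESCRIPTOR = 16    # sectors 0-15 are the reserved "system area"

def find_volume_id(image):
    """Return the volume identifier if `image` (bytes) looks like ISO 9660, else None."""
    sector = FIRST_DESCRIPTOR
    while True:
        vd = image[sector * SECTOR:(sector + 1) * SECTOR]
        # Every volume descriptor carries the identifier "CD001" in bytes 1-5.
        if len(vd) < SECTOR or vd[1:6] != b"CD001":
            return None
        if vd[0] == 1:    # type 1: primary volume descriptor
            # bytes 40-71: volume identifier, padded with spaces
            return vd[40:72].decode("ascii", "replace").rstrip()
        if vd[0] == 255:  # type 255: end of the descriptor set
            return None
        sector += 1       # e.g. a boot record; keep looking
```

On a real system one would pass it the beginning of an image file, for example the result of open(path, "rb").read(64 * 1024).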

For normal UNIX use, the ISO 9660 filesystem is not usable, so an extension to the standard has been developed, called the Rock Ridge extension. Rock Ridge allows longer filenames, symbolic links, and a lot of other goodies, making a CD-ROM look more or less like any contemporary UNIX filesystem. Even better, a Rock Ridge filesystem is still a valid ISO 9660 filesystem, making it usable by non-UNIX systems as well. Linux supports both ISO 9660 and the Rock Ridge extensions; the extensions are recognized and used automatically.

The filesystem is only half the battle, however. Most CD-ROMs contain data that requires a special program to access, and most of these programs do not run under Linux (except, possibly, under dosemu, the Linux MS-DOS emulator).

A CD-ROM drive is accessed via the corresponding device file. There are several ways to connect a CD-ROM drive to the computer: via SCSI, via a sound card, or via EIDE. The hardware hacking needed to do this is outside the scope of this book, but the type of connection decides the device file.

Tapes

A tape drive uses a tape, similar to cassettes used for music. A tape is serial in nature, which means that in order to get to any given part of it, you first have to go through all the parts in between. A disk can be accessed randomly, i.e., you can jump directly to any place on the disk. The serial access of tapes makes them slow.

On the other hand, tapes are relatively cheap to make, since they do not need to be fast. They can also easily be made quite long, and can therefore contain a large amount of data. This makes tapes very suitable for things like archiving and backups, which do not require high speeds, but benefit from low costs and large storage capacities.

Formatting

Formatting is the process of writing marks on the magnetic media that are used to mark tracks and sectors. Before a disk is formatted, its magnetic surface is a complete mess of magnetic signals. When it is formatted, some order is brought into the chaos by essentially drawing lines where the tracks go, and where they are divided into sectors. The actual details are not quite exactly like this, but that is irrelevant. What is important is that a disk cannot be used unless it has been formatted.

The terminology is a bit confusing here: in MS-DOS, the word formatting also covers the process of creating a filesystem (which will be discussed below), and the two processes are often combined, especially for floppies. When the distinction needs to be made, the real formatting is called low-level formatting, while making the filesystem is called high-level formatting. In UNIX circles, the two are called formatting and making a filesystem.

Floppies are formatted with fdformat; the floppy device file to use is given as the parameter. For IDE and some SCSI disks the formatting is actually done at the factory and doesn't need to be repeated, so most people rarely need to worry about it. In fact, formatting a hard disk can make it work less well, for example because a disk might need to be formatted in some very special way for automatic bad sector replacement to work. The filesystem itself is then created with the mkfs command.

Filesystems

What are filesystems?

A filesystem is the set of methods and data structures that an operating system uses to keep track of files on a disk or partition; that is, the way the files are organized on the disk. The word is also used to refer to a partition or disk that is used to store the files, or to the type of the filesystem. Thus, one might say "I have two filesystems" meaning one has two partitions on which one stores files, or that one is using the "extended filesystem", meaning the type of the filesystem.

The difference between a disk or partition and the filesystem it contains is important. A few programs (including, reasonably enough, programs that create filesystems) operate directly on the raw sectors of a disk or partition; if there is an existing file system there it will be destroyed or seriously corrupted. Most programs operate on a filesystem, and therefore won't work on a partition that doesn't contain one (or that contains one of the wrong type).

Before a partition or disk can be used as a filesystem, it needs to be initialized, and the bookkeeping data structures need to be written to the disk. This process is called making a filesystem.

Most UNIX filesystem types have a similar general structure, although the exact details vary quite a bit. The central concepts are superblock, inode, data block, directory block, and indirection block. The superblock contains information about the filesystem as a whole, such as its size (the exact information here depends on the filesystem). An inode contains all information about a file except its name. The name is stored in the directory: a directory entry consists of a filename and the number of the inode that represents the file. The inode contains the numbers of several data blocks, which are used to store the data in the file. There is space only for a few data block numbers in the inode, however, and if more are needed, more space for pointers to the data blocks is allocated dynamically. These dynamically allocated blocks are indirect blocks; the name indicates that in order to find the data block, one has to find its number in the indirect block first.
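The indirection scheme can be made concrete with the ext2 layout, used here as an assumption since the text above speaks only of "a few" block numbers: an ext2 inode holds 12 direct block numbers, followed by one single-, one double- and one triple-indirect block number, and block numbers are 4 bytes wide. The hypothetical function locate below works out which inode slot, and which entries in the indirect blocks, lead to a given block of a file.

```python
DIRECT_SLOTS = 12  # ext2: i_block[0..11] hold direct block numbers

def locate(block_index, block_size=1024):
    """Return (level, path) for a file's block number in an ext2-style inode.

    level 0 means a direct block, 1/2/3 single/double/triple indirect.
    path[0] is the slot in the inode; any further entries are the index
    to follow inside each indirect block, from the top level downwards.
    """
    per_block = block_size // 4  # 4-byte block numbers per indirect block
    if block_index < DIRECT_SLOTS:
        return 0, [block_index]
    i = block_index - DIRECT_SLOTS
    span = per_block               # data blocks reachable through this level
    for level in (1, 2, 3):
        if i < span:
            digits = []
            for _ in range(level):  # write i in base per_block, one digit per level
                digits.append(i % per_block)
                i //= per_block
            # inode slots 12, 13, 14 hold the single, double, triple indirect blocks
            return level, [DIRECT_SLOTS + level - 1] + digits[::-1]
        i -= span
        span *= per_block
    raise ValueError("block number beyond the triple-indirect range")
```

With 1 kB blocks, for example, block 12 of a file (its thirteenth block) is already reached through the single-indirect block, since the 12 direct slots only cover the first 12 kB.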

UNIX filesystems usually allow one to create a hole in a file (this is done with lseek; check the manual page), which means that the filesystem just pretends that at a particular place in the file there are just zero bytes, but no actual disk sectors are reserved for that place in the file (this means that the file will use a bit less disk space). This happens especially often for small binaries, Linux shared libraries, some databases, and a few other special cases. (Holes are implemented by storing a special value as the address of the data block in the indirect block or inode. This special address means that no data block is allocated for that part of the file, ergo, there is a hole in the file.)
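The following sketch creates a file with a hole, using Python's os.lseek, a thin wrapper around the lseek system call mentioned above. make_sparse is a hypothetical helper. Whether the hole actually saves space depends on the filesystem; on one that supports holes, the allocated size printed at the end is much smaller than the apparent size.

```python
import os
import tempfile

def make_sparse(path, hole_bytes, tail=b"end"):
    """Create `path` as `hole_bytes` of hole followed by `tail` (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.lseek(fd, hole_bytes, os.SEEK_SET)  # move past the end without writing
        os.write(fd, tail)                     # the skipped range becomes a hole
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sparse.dat")
    make_sparse(path, 1024 * 1024)
    st = os.stat(path)
    print("apparent size:", st.st_size)        # the hole is counted here
    print("allocated:", st.st_blocks * 512)    # st_blocks is in 512-byte units on Linux
    with open(path, "rb") as f:
        assert f.read(4096) == bytes(4096)     # the hole reads back as zero bytes
```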

Holes are moderately useful. On the author's system, a simple measurement showed a potential for about 4 MB of savings through holes of about 200 MB total used disk space. That system, however, contains relatively few programs and no database files.

Creating a filesystem

Filesystems are created, i.e., initialized, with the mkfs command. There is actually a separate program for each filesystem type. mkfs is just a front end that runs the appropriate program depending on the desired filesystem type. The type is selected with the -t fstype option. The programs called by mkfs have slightly different command line interfaces. See the manual pages for more information.
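The front-end idea can be sketched as follows. The naming scheme, where mkfs -t fstype hands the work to a program called mkfs.fstype (for example mkfs.ext2 or mkfs.msdos), matches the common util-linux mkfs, but the function names here are hypothetical, and the exact search path the real front end uses is not modeled (shutil.which only looks at PATH).

```python
import shutil

def mkfs_command(device, fstype, options=()):
    """Argument list a mkfs-style front end would run for `mkfs -t fstype`
    (hypothetical helper): the work is handed to a program named mkfs.<fstype>."""
    return [f"mkfs.{fstype}", *options, device]

def helper_available(fstype):
    """True if a mkfs.<fstype> program is found on PATH."""
    return shutil.which(f"mkfs.{fstype}") is not None

# Builds the command only; nothing is run, so no device is touched.
print(mkfs_command("/dev/fd0H1440", "msdos"))  # ['mkfs.msdos', '/dev/fd0H1440']
```

The sketch stops at building the command line on purpose: actually running it against a device would destroy whatever that device holds.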

Mounting and unmounting

Before one can use a filesystem, it has to be mounted. The operating system then does various bookkeeping things to make sure that everything works. Since all files in UNIX are in a single directory tree, the mount operation will make it look like the contents of the new filesystem are the contents of an existing subdirectory in some already mounted filesystem.

The mount command takes two arguments. The first one is the device file corresponding to the disk or partition containing the filesystem. The second one is the directory below which it will be mounted. For example:

mount /dev/hda2 /home

After this command, and a similar one that mounts another partition on /usr, the contents of the two filesystems look just like the contents of the /home and /usr directories, respectively. One would then say that "/dev/hda2 is mounted on /home", and similarly for /usr. To look at either filesystem, one would look at the contents of the directory on which it has been mounted, just as if it were any other directory. Note the difference between the device file, /dev/hda2, and the mounted-on directory, /home. The device file gives access to the raw contents of the disk, the mounted-on directory gives access to the files on the disk. The mounted-on directory is called the mount point.
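Because every mounted filesystem appears as a subtree, finding which filesystem holds a given file amounts to finding the longest mount point that is a prefix of its path. The sketch below does this from text in the format of /proc/mounts (device, mount point, type, options, and two numbers per line), which is how the Linux kernel lists current mounts; the function names are hypothetical.

```python
import os
import re

_OCTAL = re.compile(r"\\([0-7]{3})")  # the kernel escapes e.g. spaces as \040

def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mount_point, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fstype = fields[:3]
        mount_point = _OCTAL.sub(lambda m: chr(int(m.group(1), 8)), mount_point)
        entries.append((device, mount_point, fstype))
    return entries

def mount_point_of(path, entries):
    """Return the entry for the filesystem holding `path`: the one whose
    mount point is the longest prefix of the path."""
    path = os.path.normpath(path)
    best = None
    for entry in entries:
        mp = entry[1]
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            # ">=" prefers later entries: a newer mount on the same
            # directory hides the older one.
            if best is None or len(mp) >= len(best[1]):
                best = entry
    return best
```

On a running system, mount_point_of(path, parse_mounts(open("/proc/mounts").read())) gives the entry for the filesystem that holds path.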

Linux supports many filesystem types. mount tries to guess the type of the filesystem. You can also use the -t fstype option to specify the type directly; this is sometimes necessary, since the heuristics mount uses do not always work. For example, to mount an MS-DOS floppy, you could use the following command:

mount -t msdos /dev/fd0 /floppy

The mounted-on directory need not be empty, although it must exist. Any files in it, however, will be inaccessible by name while the filesystem is mounted. (Any files that have already been opened will still be accessible. Files that have hard links from other directories can be accessed using those names.) There is no harm done with this, and it can even be useful. For instance, some people like to have /tmp and /var/tmp synonymous, and make /tmp be a symbolic link to /var/tmp. When the system is booted, before the /var filesystem is mounted, a /var/tmp directory residing on the root filesystem is used instead. When /var is mounted, it will make the /var/tmp directory on the root filesystem inaccessible. If /var/tmp didn't exist on the root filesystem, it would be impossible to use temporary files before mounting /var.

If you don't intend to write anything to the filesystem, use the -r switch for mount to do a read-only mount. This will make the kernel stop any attempts at writing to the filesystem, and will also stop the kernel from updating file access times in the inodes. Read-only mounts are necessary for unwritable media, e.g., CD-ROMs.

When a filesystem no longer needs to be mounted, it can be unmounted with umount. [2] umount takes one argument: either the device file or the mount point. For example, to unmount the directories of the previous example, one could use the commands

umount /dev/hda2
umount /usr

See the man page for further instructions on how to use the command. It is imperative that you always unmount a mounted floppy. Don't just pop the floppy out of the drive! Because of disk caching, the data is not necessarily written to the floppy until you unmount it, so removing the floppy from the drive too early might cause the contents to become garbled. If you only read from the floppy, this is not very likely, but if you write, even accidentally, the result may be catastrophic.

Disks without filesystems

Not all disks or partitions are used as filesystems. A swap partition, for example, will not have a filesystem on it. Many floppies are used in a tape-drive emulating fashion, so that a tar or other file is written directly on the raw disk, without a filesystem. A Linux boot floppy need not contain a filesystem either; it can hold just the raw kernel.

Avoiding a filesystem has the advantage of making more of the disk usable, since a filesystem always has some bookkeeping overhead. It also makes the disks more easily compatible with other systems: for example, the tar file format is the same on all systems, while filesystems are different on most systems. You will quickly get used to disks without filesystems if you need them.
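Writing a tar archive straight onto a raw disk can be imitated without any hardware. In the sketch below an in-memory byte string of 1440 kB stands in for a raw 3.5 inch floppy (an assumption made so the example can run anywhere), and Python's tarfile module produces the same tar format that the tar program would write to the device. tar_to_raw and list_raw are hypothetical helpers.

```python
import io
import tarfile

FLOPPY_BYTES = 1440 * 1024  # raw capacity of a 3.5 inch HD floppy

def tar_to_raw(files, size=FLOPPY_BYTES):
    """Return a raw 'disk' of `size` bytes holding a tar archive of `files`
    ({name: bytes}) starting at byte 0, with no filesystem at all."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    archive = buf.getvalue()
    if len(archive) > size:
        raise ValueError("archive does not fit on the disk")
    return archive + bytes(size - len(archive))  # the rest of the disk stays zeroed

def list_raw(image):
    """Read the archive back from the raw image: {name: contents}."""
    with tarfile.open(fileobj=io.BytesIO(image), mode="r:") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}
```

Nothing on the "disk" besides the archive itself is needed to read the files back, which is exactly the bookkeeping a filesystem would otherwise add.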

One reason to use raw disks is to make image copies of them. For instance, if the disk contains a partially damaged filesystem, it is a good idea to make an exact copy of it before trying to fix it, since then you can start again if your fixing breaks things even more. One way to do this is to use dd:

$ dd if=/dev/fd0H1440 of=floppy-image
2880+0 records in
2880+0 records out
$ dd if=floppy-image of=/dev/fd0H1440
2880+0 records in
2880+0 records out
$

The first dd makes an exact image of the floppy to the file floppy-image, the second one writes the image to the floppy. (The user has presumably switched the floppy before the second command. Otherwise the command pair is of doubtful usefulness.)
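Each dd line above has the form full+partial: the number of complete blocks and the number of short (partial) blocks transferred. dd's default block size is 512 bytes, so 2880 full blocks are exactly 1440 kB, the whole floppy. The sketch below imitates that block-by-block copy between ordinary files; it is not a replacement for dd, and the function name dd is our own.

```python
import os
import tempfile

def dd(src_path, dst_path, bs=512):
    """Copy src to dst in bs-byte reads; return (full, partial) record counts."""
    full = partial = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(bs)
            if not block:          # end of input
                break
            if len(block) == bs:
                full += 1
            else:
                partial += 1       # a short final block
            dst.write(block)
    return full, partial

with tempfile.TemporaryDirectory() as d:
    image = os.path.join(d, "floppy-image")
    with open(image, "wb") as f:
        f.write(bytes(1440 * 1024))  # stand-in for the contents of a 1.44 MB floppy
    full, partial = dd(image, os.path.join(d, "copy"))
    print(f"{full}+{partial} records in")  # prints "2880+0 records in"
```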

1.5 The root account

Linux differentiates between different users. What they can do to each other and the system is regulated. File permissions are arranged so that normal users can't delete or modify files in directories like /bin and /usr/bin. Most users protect their own files with the appropriate permissions so that other users can't access or modify them. (One wouldn't want anybody to be able to read one's love letters.) Each user is given an account that includes a user name and home directory. In addition, there are special, system defined accounts which have special privileges. The most important of these is the root account, which is used by the system administrator. By convention, the system administrator is the user, root.

There are no restrictions on root. He or she can read, modify, or delete any file on the system, change permissions and ownerships on any file, and run special programs like those which partition a hard drive or create file systems. The basic idea is that a person who cares for the system logs in as root to perform tasks that cannot be executed as a normal user. Because root can do anything, it is easy to make mistakes that have catastrophic consequences.

If a normal user tries inadvertently to delete all of the files in /etc, the system will not permit him or her to do so. However, if root tries to do the same thing, the system doesn't complain at all. It is very easy to trash a Linux system when using root. The best ways to prevent accidents are:

Sit on your hands before you press Enter for any command that is non-reversible. If you're about to clean out a directory, re-read the entire command to make sure that it is correct.
Use a different prompt for the root account. root's .bashrc or .login file should set the shell prompt to something different from the standard user prompt. Many people reserve the character "#" in prompts for root and use the prompt character "$" for everyone else.
Log in as root only when absolutely necessary. When you have finished your work as root, log out. The less you use the root account, the less likely you are to damage the system. You are less likely to confuse the privileges of root with those of a normal user.
Picture the root account as a special, magic hat that gives you lots of power, with which you can, by waving your hands, destroy entire cities. It is a good idea to be a bit careful about what you do with your hands. Because it is easy to wave your hands in a destructive manner, it is not a good idea to wear the magic hat when it is not needed, despite the wonderful feeling. Even if you're the only user on your system, it's important to understand the aspects of user management under Linux. You should at least have an account for yourself (other than root) to do most of your work.

1.6 Important Red Hat Administration tools

File System Tools

The following is a list of some of the important file tools that you may need to take care of your system. Be sure to read the man pages and any associated HOWTOs etc. as required.

fsck and e2fsck

fsck is the front end for filesystem-specific checking programs such as e2fsck. It can be used to check and repair a number of file system types; check the man pages for more details. e2fsck checks the ext2 file system, which Red Hat Linux uses by default, and it has a plethora of options for repairing a corrupted file system. For the safety of the partition, and to avoid conflicts with other programs trying to access the file system, unmount the partition first and run the checks on its device file, which is generally located in the /dev/ directory.

badblocks

This command searches a device for physical bad blocks and offers a number of test modes. It reports the bad blocks it finds; e2fsck (with the -c or -l option) or mke2fs (with -c) can then record them in the file system so that no data is written to those blocks, which prevents data loss. Beware of the write-mode test (-w): it overwrites the whole device, destroying all data on it.

dump and restore

The dump command backs up a file system; it examines the files on it and determines which ones need to be backed up. The command can also do remote backups. restore is the companion command for getting files back from a dump, and it also works across networks.
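dump supports incremental backups through dump levels: a level 0 dump copies everything, and a level N dump copies the files changed since the most recent dump at a lower level. The sketch below models only that selection rule, using plain modification times and a list of earlier dumps; the names are hypothetical, and the real dump works on inodes and keeps its own record of earlier dumps.

```python
def files_to_dump(level, history, mtimes):
    """Select the files a dump at `level` would copy (simplified model).

    history: list of (level, time) pairs for earlier dumps of this filesystem.
    mtimes:  {path: last modification time}.
    """
    # Only earlier dumps at a strictly lower level matter.
    lower = [t for lvl, t in history if lvl < level]
    since = max(lower, default=None)
    if since is None:
        return sorted(mtimes)  # no lower-level dump yet: copy everything
    return sorted(path for path, t in mtimes.items() if t > since)
```

Repeating level 1 dumps after a level 0 dump therefore always copies everything changed since that full backup, while a level 2 dump taken after a level 1 dump copies only what changed since the level 1 dump.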

tune2fs

If you just want to tweak the system's performance, you can use this command to adjust the file system's tunable parameters. This, however, applies only to ext2 file systems. Again, don't run the command while the partition is mounted.

mke2fs

This is similar to the format command of DOS: it creates a new ext2 file system. You may need it when you want to create a new file system on an existing or a new disk.

debugfs

debugfs is an ext2 file system debugger with 34 built-in commands. Use it only on unmounted devices, and read more about the command before you attempt to use it.

dumpe2fs

This is another useful command; it dumps information about your file system. You'll get the inode count, block count, block size, and the last mount and write times. Running dumpe2fs on a 450 MB partition generates a 26,000 character report. An interesting part of the report is the mount count and the maximum mount count: the first shows how many times the file system has been mounted since e2fsck last checked it, and the second after how many mounts e2fsck will force the next check, which matters for proper maintenance.
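dumpe2fs gets this information from the ext2 superblock, which starts 1024 bytes into the partition. The sketch below decodes a handful of its fields from raw bytes, assuming the standard little-endian ext2 superblock layout: inode and block counts at offsets 0 and 4, the block size stored as a left shift of 1024 at offset 24, the mount count and maximum mount count at offsets 52 and 54, and the magic number 0xEF53 at offset 56. The function names are hypothetical. Only the mount-count rule is modeled; ext2 can also force a check after a time interval, which the sketch ignores.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the ext2 superblock starts 1024 bytes into the partition
EXT2_MAGIC = 0xEF53

def read_ext2_superblock(raw):
    """Decode a few superblock fields from the first 2048 bytes of a partition."""
    sb = raw[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    inodes, blocks = struct.unpack_from("<II", sb, 0)
    (log_block_size,) = struct.unpack_from("<I", sb, 24)
    # unsigned mount count, signed maximum mount count, unsigned magic
    mnt_count, max_mnt_count, magic = struct.unpack_from("<HhH", sb, 52)
    if magic != EXT2_MAGIC:
        raise ValueError("no ext2 superblock found")
    return {
        "inode_count": inodes,
        "block_count": blocks,
        "block_size": 1024 << log_block_size,
        "mount_count": mnt_count,
        "max_mount_count": max_mnt_count,
    }

def check_due_by_mounts(sb):
    """True if the mount count alone forces the next e2fsck run.
    A maximum of 0 or -1 disables this rule."""
    limit = sb["max_mount_count"]
    return limit > 0 and sb["mount_count"] >= limit
```

Reading the first 2048 bytes of an image file, or of an unmounted partition's device file as root, is enough input.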

Some other tools are also useful for managing filesystems. df shows the free disk space on one or more filesystems; du shows how much disk space a directory and all its files contain. These can be used to hunt down disk space wasters.
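du's main job can be sketched in a few lines: walk the directory tree and add up the space actually allocated to each file, counting a file with several hard links only once. The sketch below does that with os.walk and os.lstat, and uses shutil.disk_usage for the df side (total, used and free space of the filesystem holding a path). It approximates du -s rather than reproducing it; for instance, symbolic links to directories are not counted, and disk_usage_of_tree is a hypothetical name.

```python
import os
import shutil
import tempfile

def disk_usage_of_tree(top):
    """Approximate `du -s` in bytes: allocated space under `top`,
    counting each inode (and so each hard-linked file) only once."""
    seen = set()
    total = 0
    for root, dirs, files in os.walk(top):
        for path in [root] + [os.path.join(root, name) for name in files]:
            st = os.lstat(path)
            key = (st.st_dev, st.st_ino)  # identifies the inode, not the name
            if key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512   # st_blocks is in 512-byte units on Linux
    return total

with tempfile.TemporaryDirectory() as d:
    data = os.path.join(d, "data.bin")
    with open(data, "wb") as f:
        f.write(os.urandom(64 * 1024))
    os.link(data, os.path.join(d, "same-data.bin"))  # second name, same inode
    print(disk_usage_of_tree(d), "bytes used under", d)
    print(shutil.disk_usage(d))  # like df: total, used, free for the whole filesystem
```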

sync forces all unwritten blocks in the buffer cache to be written to disk. It is seldom necessary to do this by hand; the daemon process update does this automatically. It can be useful in catastrophes, for example if update or its helper process bdflush dies, or if you must turn off the power now and can't wait for update to run.