Table of Contents
This chapter gives an introduction to some general UNIX and GNU/Linux concepts. It is important to read this chapter thoroughly if you do not have any UNIX or GNU/Linux experience. Many concepts covered in this chapter are used in this book and in GNU/Linux.
One of UNIX's traditional strengths is multitasking. Multitasking means that multiple programs can be run at the same time. You may wonder why this is important, because most people use only one application at a time. Multitasking is a bare necessity for UNIX-like systems. Even if you have not started any applications, there are programs that run in the background. For instance, some programs provide network services, while others show a login prompt and wait until a user logs in on a (virtual) terminal. Programs that are running in the background are often called daemon processes[2].
After a program is loaded from a storage medium, an instance of the program is started. This instance is called a process. A process has its own protected memory, named the process address space. The process address space has two important areas: the text area and the data area. The text area is the actual program code; it is used to tell the system what to do. The data area is used to store constant and runtime data. The operating system gives every process time to execute. On single processor systems processes are not really running simultaneously. In reality a smart scheduler in the kernel divides CPU time among processes, giving the illusion that processes run simultaneously. This process is called time-sharing. On systems with more than one CPU or CPU cores, more than one process can run simultaneously, but time-sharing is still used to divide the available CPU time among processes.
New processes are created by duplicating a running process with the fork system call. Figure 4.1, “Forking of a process” shows a fork() call in action schematically. The parent process issues a fork() call. The kernel will respond to this call by duplicating the process, and naming one process the parent, and the other process the child.
Forking can be used by a program to create two processes that can run simultaneously on multiprocessor machines. However, this is often not ideal, because both processes will have their own process address space. The initial duplication of the process memory takes relatively much time, and it is difficult to share data between two processes. This problem is solved by a concept named multithreading. Multithreading means that multiple instances of the text area can run at the same time, sharing the data area. These instances, named threads, can be executed in parallel on multiple CPUs.
Operating systems store data in filesystems. A filesystem is basically a tree-like structure of directories that hold files, like the operating system, user programs and user data. Most filesystems can also store various metadata about files and directories, for instance access and modification times. In GNU/Linux there is only one filesystem hierarchy, this means GNU/Linux does not have drive letters (e.g. A:, C:, D:) for different filesystems, like DOS and Windows. The filesystem looks like a tree, with a root directory (which has no parent directory), branches, and leaves (directories with no subdirectories). The root directory is alway denoted with a slash (“/”). Directories are separated by the same character.
Figure 4.2, “The filesystem structure” shows the structure of
a filesystem. You can see that the root directory
/binhomehomejoejackmemo.txt/home/jack/home/jack/memo.txt
Each directory has two special entries,
“...jack/home/joe../joe
You might wonder how it is possible to access other devices
or partitions than the hard disk partition which holds the
root filesystem. Linux uses the same approach as UNIX for
accessing other filesystems. Linux allows the system
administrator to connect a device to any directory in the
filesystem structure. This process is named
mounting. For example, one could mount
the CD-ROM drive to the /cdrom
The Filesystem Hierarchy Standard Group has attempted to create a standard that describes which directories should be available on a GNU/Linux system. Currently, most major distributions use the Filesystem Hierarchy Standard (FHS) as a guideline. This section describes some mandatory directories on GNU/Linux systems.
Please note that GNU/Linux does not have a separate directory
for each application (like Windows). Instead, files are ordered by
function and type. For example, the binaries for most
common user programs are stored in
/usr/bin/usr/lib
/bin: essential
user binaries that should still be available in case the
/usr
/dev: device files. These are special files used to access certain devices.
/etc: the
/etc
/home: contains home directories for individual users.
/lib: essential system libraries (like glibc), and kernel modules.
/root: home directory for the root user.
/sbin: essential binaries that are used for system administration.
/tmp: a world-writable directory for temporary files.
/usr/bin: stores the majority of the user binaries.
/usr/lib: libraries that are not essential for the system to boot.
/usr/sbin: nonessential system administration binaries.
/var: variable data files, like logs.
In GNU/Linux virtually everything is represented as a file, including
devices. Every GNU/Linux system has a directory with special files, named
/dev/dev/dev/zero
$ file /dev/zero
/dev/zero: character special (1/5)
The file command can be used to determine the type of a file. This particular file is recognized as a device file that has 1 as the major device number, and 5 as the minor device number.
If you have installed the kernel source package, you can find
a comprehensive list of all major devices with their minor and
major numbers in
/usr/src/linux/Documentation/devices.txt
The Linux kernel handles two types of devices: character and block devices. Character devices can be read byte by byte, block devices can not. Block devices are read per block (for example 4096 bytes at a time). Whether a device is a character or block device is determined by the nature of the device. For example, most storage media are block devices, and most input devices are character devices. Block devices have one distinctive advantage, namely that they can be cached. This means that commonly read or written blocks are stored in a special area of the system memory, named the cache. Memory is much faster than most storage media, so much performance can be gained by performing common block read and write operations in memory. Of course, eventually changes have to be written to the storage media to reflect the changes that were made in the cache.
There are two kinds of block devices that we are going to look into in detail, because understanding the naming of these devices is crucial for partitioning a hard disk and mounting. Almost all modern computers with an x86 architecture use ATA hard disks and CD-ROMs. Under Linux these devices are named in the following manner:
/dev/hda - master device on the first ATA channel
/dev/hdb - slave device on the first ATA channel
/dev/hdc - master device on the second ATA channel
/dev/hdd - slave device on the second ATA channel
On most computers with a single hard disk, the hard disk is the master
device on the first ATA channel (/dev/hda/dev/hda1/dev/hda
SCSI hard disks and CD-ROM drives follow an other naming convention. SCSI is not commonly used in most low-end machines, but USB drives and Serial ATA (SATA) drives are also represented as SCSI disks. The following device notation is used for SCSI drives:
/dev/sda - First SCSI disk
/dev/sdb - Second SCSI disk
/dev/sdc - Third SCSI disk
/dev/scd0 - First CD-ROM
/dev/scd1 - Second CD-ROM
/dev/scd2 - Third CD-ROM
Partitions names are constructed in the same way as ATA disks;
/dev/sda1
If you use the software RAID implementation of the Linux kernel, RAID
volumes are available as /dev/mdn
[2] The word daemon should not to be confused with the word demon, the word daemon refers to supernatural beings in Greek mythology.