
Tuesday, March 27, 2007

The Linux® Kernel Primer: A Top-Down Approach for x86 and PowerPC Architectures Reading Note Chapter 1

The Linux® Kernel Primer: A Top-Down Approach for x86 and PowerPC Architectures

Chapter 1. Overview

1.7. What Is an Operating System?

1.    The operating system is what turns your hardware into a usable computer. It is in charge of managing the resources provided by your system's particular hardware components and of providing a base for application programs to be developed on and executed.

2.    In Linux, we have kernel space and user space as two distinct portions of the operating system.

3.    A user interacts with the operating system by way of user space, where he develops and/or uses application programs. User space does not access the kernel (and hence, the hardware resources) directly but by way of system calls, the outermost layer of procedures defined by the kernel.

4.    Kernel space is where the hardware-management functionality takes place. Within the kernel, the system call procedures call on other procedures that are not available to user space to manipulate finer grain functionality.

5.    Linux also sports dynamically loadable device drivers, breaking one of the main drawbacks inherent in monolithic operating systems.

1.8. Kernel Organization

1.    Linux supports numerous architectures; this means that it can run on many types of processors, including alpha, arm, i386, ia64, ppc, ppc64, and s390x. The Linux source code is packaged to include support for all these architectures.

2.    Most of the source code is written in C and is hardware independent. A portion of the code is heavily hardware dependent and is written in a mix of C and assembly for the particular architecture.

3.    The architecture-dependent portions of the code are generally involved with system initialization and bootstrapping, exception vector handling, address translation, and device I/O.

1.9. Overview of the Linux Kernel

1.    User Interface

1.    Users communicate with the system by way of programs.

2.    A user first logs in to the system through a terminal or a virtual terminal. In Linux, a program, called mingetty for virtual terminals or agetty for serial terminals, monitors the inactive terminal waiting for users to notify that they want to log in.

2.    User Identification

1.    A user logs in with a unique account name. However, he is also associated with a unique user ID (UID). The kernel uses this UID to validate the user's permissions with respect to file accesses.

2.    When a user logs in, he is granted access to his home directory, which is where he can create, modify, and destroy files.
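
The link between a UID, an account name, and a home directory can be inspected from user space. The following is a minimal Python sketch, not something from the book; the function name current_account is made up for illustration, and it assumes a Unix-like system where the pwd module is available.

```python
import os
import pwd


def current_account():
    """Return the real UID of this process and, if known, its account details."""
    uid = os.getuid()  # real user ID of the calling process
    try:
        entry = pwd.getpwuid(uid)  # look the UID up in the account database
    except KeyError:
        # Some minimal environments run under a UID that has no account entry.
        return {"uid": uid, "name": None, "home": None}
    return {"uid": uid, "name": entry.pw_name, "home": entry.pw_dir}
```

Calling current_account() returns a dictionary with the keys uid, name, and home for whoever runs the script.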

3.    Files and Filesystems
A filesystem provides a method for the storage and organization of data. Linux supports the concept of the file as a device-independent sequence of bytes. By means of this abstraction, a user can access a file regardless of what device (for example, hard disk, tape drive, disk drive) stores it.

1.    Directories, Files, and Pathnames

1.    Every file in a tree has a pathname that indicates its name and location.

2.    relative pathname

3.    absolute pathname

4.    The current working directory is the directory from which the process was called and is identified by a . (pronounced "dot").

5.    As an aside, the parent directory is the directory that contains the working directory and is identified by a .. (pronounced "dot dot").
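
To make the difference between relative and absolute pathnames concrete, here is a small Python sketch. The helper to_absolute and the example directories are assumptions made for illustration; the resolution is purely textual and does not touch the disk.

```python
import posixpath


def to_absolute(pathname, cwd):
    """Resolve a pathname against cwd, collapsing '.' and '..' components."""
    if posixpath.isabs(pathname):
        # An absolute pathname already starts at the root of the tree.
        return posixpath.normpath(pathname)
    # A relative pathname is interpreted starting from the working directory.
    return posixpath.normpath(posixpath.join(cwd, pathname))
```

For example, with /home/alice as the working directory, "./notes.txt" resolves to /home/alice/notes.txt and "../bob/file" resolves to /home/bob/file.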

2.    Filesystem Mounting

1.    In Linux, as in all UNIX-like systems, a filesystem is only accessible if it has been mounted.

2.    A filesystem is mounted with the mount system call and is unmounted with the umount system call.

3.    A filesystem is mounted on a mount point, which is a directory used as the root access to the mounted filesystem. A directory mount point should be empty. Any files originally located in the directory used as a mount point are inaccessible after the filesystem is mounted and remain so until the filesystem is unmounted.

4.    The /etc/mtab file holds the table of mounted filesystems while /etc/fstab holds the filesystem table, which is a table listing all the system's filesystems and their attributes. /etc/mtab lists the device of the mounted filesystem and associates it with its mount point and any options with which it was mounted.
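
As a rough illustration of the table layout, the sketch below parses mtab-style lines into device, mount point, type, and options. The function name parse_mount_table and the sample text are made up for this note; the sample is not taken from a real system.

```python
def parse_mount_table(text):
    """Parse mtab-style lines: device, mount point, type, options, dump, pass."""
    mounts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed lines
        device, mount_point, fs_type, options = fields[:4]
        mounts.append({
            "device": device,
            "mount_point": mount_point,
            "type": fs_type,
            "options": options.split(","),
        })
    return mounts


# Made-up sample in the same layout as /etc/mtab (hypothetical devices).
SAMPLE = """\
/dev/hda1 / ext3 rw 0 0
proc /proc proc rw,nosuid 0 0
"""
```

On a real system the same function could be fed the contents of /etc/mtab, assuming that file exists and is readable.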

3.    File Protection and Access Rights

1.    Files have access permissions to provide some degree of privacy and security.

2.    Access rights or permissions are stored as they apply to three distinct categories of users: the user himself, a designated group, and everyone else. The three types of users can be granted varying access rights as applied to the three types of access to a file: read, write, and execute.
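
The three categories and three access types map onto nine permission bits. A short Python sketch of that mapping follows; access_rights is a name chosen here for illustration, and stat.filemode comes from the Python standard library.

```python
import stat


def access_rights(mode):
    """Split the nine permission bits into user, group, and other triples."""
    text = stat.filemode(mode)  # for example '-rw-r-----'
    return {"user": text[1:4], "group": text[4:7], "other": text[7:10]}
```

A regular file with mode 0o640 yields rw- for the user, r-- for the group, and --- for everyone else.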

1.9.3.4. File Modes

In addition to access rights, a file has three additional modes: sticky, suid, and sgid.

1.         sticky

a)         A file with the sticky bit enabled has a "t" in the last character of the mode field (for example, -rwx-----t). Back in the day when disk accesses were slower than they are today, when memory was not as large, and when demand-based methodologies hadn't been conceived, an executable file could have the sticky bit enabled to ensure that the kernel would keep it in memory regardless of its state of execution. When applied to a program that was heavily used, this could increase performance by reducing the amount of time spent accessing the file's information from disk.

b)        When the sticky bit is enabled on a directory, it prevents the removal or renaming of files by users who have write permission in that directory (with the exception of root and the owner of the file).

2.         suid

a)         An executable with the suid bit set has an "s" where the "x" character goes for the user-permission bits (for example, -rws------). When a user executes an executable file, the process is associated with the user who called it. If an executable has the suid bit set, the process inherits the UID of the file owner and thus access to its set of access rights. This introduces the concepts of the real user ID as opposed to the effective user ID. As we soon see when we look at processes in the "Processes" section, a process' real UID corresponds to that of the user that started the process. The effective UID is often the same as the real UID unless the setuid bit was set in the file. In that case, the effective UID holds the UID of the file owner.

b)        suid has been exploited by hackers who call executable files owned by root with the suid bit set and redirect the program operations to execute instructions that they would otherwise not be allowed to execute with root permissions.

3.         sgid

a)         An executable with the sgid bit set has an "s" where the "x" character goes for the group permission bits (for example, -rwxrws---). The sgid bit acts just like the suid bit but as applied to the group. A process also has a real group ID and an effective group ID that holds the GID of the user and the GID of the file group, respectively.
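
The three extra mode bits and the real/effective UID pair can be checked with the standard library. This is a sketch for these notes, not code from the book; special_modes and real_and_effective_uid are illustrative names.

```python
import os
import stat


def special_modes(mode):
    """Name the special mode bits that are set in a file mode."""
    names = []
    if mode & stat.S_ISUID:
        names.append("suid")
    if mode & stat.S_ISGID:
        names.append("sgid")
    if mode & stat.S_ISVTX:
        names.append("sticky")
    return names


def real_and_effective_uid():
    """Return (real UID, effective UID) of the calling process."""
    return os.getuid(), os.geteuid()
```

For an ordinary script the two UIDs are normally equal; they differ when the process was started from an executable with the suid bit set.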

 

1.9.3.5. File Metadata

1.         File metadata is all the information about a file that does not include its content.

2.         For example, metadata includes the type of file, the size of the file, the UID of the file owner, the access rights, and so on. As we soon see, some file types (devices, pipes, and sockets) contain no data, only metadata.

3.         All file metadata, with the exception of the filename, is stored in an inode or index node. An inode is a block of information, and every file has its own inode.

4.         A file descriptor is an internal kernel data structure that manages the file data. File descriptors are obtained when a process accesses a file.
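
Much of the metadata kept in an inode is visible through the stat system call. The sketch below reads a few fields for a temporary file; the function names and the five-byte payload are assumptions made for illustration.

```python
import os
import tempfile


def file_metadata(path):
    """Collect some of the inode metadata that stat() exposes for a path."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,      # inode number
        "size": info.st_size,      # size in bytes
        "owner_uid": info.st_uid,  # UID of the file owner
        "mode": oct(info.st_mode), # file type and access rights
        "links": info.st_nlink,    # number of names pointing at the inode
    }


def metadata_demo():
    """Create a five-byte temporary file and return its metadata."""
    with tempfile.NamedTemporaryFile() as handle:
        handle.write(b"hello")
        handle.flush()
        return file_metadata(handle.name)
```

Note that the filename is passed in by the caller and is not part of what stat() returns, which matches the point that the name lives outside the inode.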

 

1.9.3.6. Types of Files

1.         Regular File

a)         A regular file is identified by a dash in the first character of the mode field (for example, -rw-rw-rw-).

b)        A regular file can contain ASCII data or, if it is an executable file, binary data.

c)         The kernel does not care what type of data is stored in a file and thus makes no distinctions between them. User programs, however, might care.

d)        Regular files have their data stored in zero or more data blocks.

2.         Directory

a)         A directory file is identified by a "d" in the first character of the mode field (for example, drwx------).

b)        A directory is a file that holds associations between filenames and the file inodes. A directory consists of a table of entries, each pertaining to a file that it contains.

c)         ls -ai lists all the contents of a directory and the ID of each entry's associated inode.

3.         Block Devices

a)         A block device is identified by a "b" in the first character of the mode field (for example, brw-------).

b)        These files represent a hardware device on which I/O is performed in discretely sized blocks in powers of 2.

c)         Block devices include disk and tape drives and are accessed through the /dev directory in the filesystem.

d)        Disk accesses can be time consuming; therefore, data transfer for block devices is performed by the kernel's buffer cache, which is a method of storing data temporarily to reduce the number of costly disk accesses. At certain intervals, the kernel looks at the data in the buffer cache that has been updated and synchronizes it with the disk. This provides great increases in performance; however, a computer crash can result in loss of the buffered data if it had not yet been written to disk. Synchronization with the disk drive can be forced with a call to the sync, fsync, or fdatasync system calls, which take care of writing buffered data to disk.

e)         A block device does not use any data blocks because it stores no data. Only an inode is required to hold its information.
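
The forced synchronization described above can be requested from a program with fsync. This is a minimal sketch under the assumption that a small write to a temporary directory is representative; write_durably and the file name journal.txt are invented for the example.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data, then ask the kernel to flush it from its caches to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while data:
            written = os.write(fd, data)  # data first lands in kernel buffers
            data = data[written:]
        os.fsync(fd)  # returns once the kernel reports the data as written out
    finally:
        os.close(fd)


def durable_write_demo():
    """Write a record durably and read it back."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "journal.txt")
        write_durably(path, b"important record\n")
        with open(path, "rb") as handle:
            return handle.read()
```

Without the fsync call the write would still succeed, but the data could sit in the cache for a while before reaching the disk.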

4.         Character Devices

a)         A character device is identified by a "c" in the first character of the mode field (for example, crw-------).

b)        These files represent a hardware device that is not block structured and on which I/O occurs in streams of bytes and is transferred directly between the device driver and the requesting process.

c)         These devices include terminals and serial devices and are accessed through the /dev directory in the filesystem. Pseudo devices or device drivers that do not represent hardware but instead perform some unrelated kernel side function can also be character devices. These devices are also known as raw devices because of the fact that there is no intermediary cache to hold the data.

d)        Similar to a block device, a character device does not use any data blocks because it stores no data. Only an inode is required to hold its information.

5.         Link

a)         A link device is identified by an "l" in the first character of the mode field (for example, lrw-------).

b)        A link is a pointer to a file. This type of file allows there to be multiple references to a particular file while only one copy of the file and its data actually exists in the filesystem.

c)         There are two types of links: hard links and symbolic, or soft, links. Both are created through a call to ln. A hard link has limitations that are absent in the symbolic link. These include being limited to linking files within the same filesystem, being unable to link to directories, and being unable to link to non-existing files.

d)        A link reflects the permissions of the file to which it points.
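
The difference between the two link types shows up in the inode numbers. The following Python sketch creates one of each in a temporary directory; the function name link_demo and the file names are assumptions made for illustration.

```python
import os
import tempfile


def link_demo():
    """Create a hard link and a symbolic link to one file and compare them."""
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "target.txt")
        hard = os.path.join(directory, "hard.txt")
        soft = os.path.join(directory, "soft.txt")
        with open(target, "w") as handle:
            handle.write("shared data")
        os.link(target, hard)     # hard link: a second name for the same inode
        os.symlink(target, soft)  # symbolic link: a separate file holding a path
        return {
            "same_inode": os.stat(target).st_ino == os.stat(hard).st_ino,
            "link_count": os.stat(target).st_nlink,
            "soft_is_link": os.path.islink(soft),
            "soft_points_to": os.readlink(soft),
            "target": target,
        }
```

The hard link shares the target's inode and raises its link count to 2, while the symbolic link only stores the target's pathname.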

6.         Named Pipes

a)         A pipe file is identified by a "p" in the first character of the mode field (for example, prw-------).

b)        A pipe is a file that facilitates communication between programs by acting as data pipes; data is written into them by one program and read by another.

c)         The pipe essentially buffers its input data from the first process. Named pipes are also known as FIFOs because they relay the information to the reading program in a first in, first out basis.

d)        Much like the device files, no data blocks are used by pipe files, only the inode.

7.         Sockets

a)         A socket is identified by an "s" in the first character of the mode field (for example, srw-------).

b)        Sockets are special files that also facilitate communication between two processes. One difference between pipes and sockets is that sockets can facilitate communication between processes on different computers connected by a network.

c)         Socket files are also not associated with any data blocks.
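
The first character of the mode field corresponds to a file-type test on the mode bits. Here is a hedged Python sketch that classifies a path into the seven types listed above; file_type and file_type_demo are illustrative names, and the demo assumes a Unix-like system where named pipes and symbolic links can be created in a temporary directory.

```python
import os
import stat
import tempfile


def file_type(path):
    """Classify a path into one of the file types described above."""
    mode = os.lstat(path).st_mode  # lstat examines a link itself, not its target
    checks = [
        (stat.S_ISREG, "regular file"),
        (stat.S_ISDIR, "directory"),
        (stat.S_ISBLK, "block device"),
        (stat.S_ISCHR, "character device"),
        (stat.S_ISLNK, "link"),
        (stat.S_ISFIFO, "named pipe"),
        (stat.S_ISSOCK, "socket"),
    ]
    for predicate, name in checks:
        if predicate(mode):
            return name
    return "unknown"


def file_type_demo():
    """Create a few kinds of files in a temporary directory and classify them."""
    with tempfile.TemporaryDirectory() as directory:
        regular = os.path.join(directory, "file")
        fifo = os.path.join(directory, "fifo")
        link = os.path.join(directory, "link")
        open(regular, "w").close()
        os.mkfifo(fifo)
        os.symlink(regular, link)
        return [file_type(p) for p in (directory, regular, fifo, link)]
```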

 

1.9.3.7. Types of Filesystems

1.         Linux filesystems support an interface that allows various filesystem types to coexist.

2.         A filesystem type is determined by the way the block data is broken down and manipulated in the physical device and by the type of physical device.

3.         Some examples of types of filesystems include network mounted, such as NFS, and disk based, such as ext3, which is one of the Linux default filesystems. Some special filesystems, such as /proc, provide access to kernel data and address space.

1.9.3.8. File Control

1.         When a file is accessed in Linux, control passes through a number of stages.

2.         First, the program that wants to access the file makes a system call, such as open(), read(), or write().

3.         Control then passes to the kernel that executes the system call. There is a high-level abstraction of a filesystem called VFS, which determines what type of specific filesystem (for example, ext2, minix, and msdos) the file exists upon, and control is then passed to the appropriate filesystem driver.

4.         The filesystem driver handles the management of the file upon a given logical device. A hard drive could have msdos and ext2 partitions. The filesystem driver knows how to interpret the data stored on the device and keeps track of all the metadata associated with a file. Thus, the filesystem driver stores the actual file data and incidental information such as the timestamp, group and user modes, and file permissions (read/write/execute).

5.         The filesystem driver then calls a lower-level device driver that handles the actual reading of the data off of the device. This lower-level driver knows about blocks, sectors, and all the hardware information that is necessary to take a chunk of data and store it on the device. The lower-level driver passes the information up to the filesystem driver, which interprets and formats the raw data and passes the information to the VFS, which finally transfers the data back to the originating program.
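
From the program's side, the whole chain above is hidden behind a handful of system calls. The sketch below copies a file using thin Python wrappers around open, read, write, and close; the function names and the 10000-byte payload are assumptions made for illustration.

```python
import os
import tempfile


def copy_through_syscalls(source, destination, chunk_size=4096):
    """Copy a file with open/read/write/close and return the bytes written."""
    src = os.open(source, os.O_RDONLY)
    try:
        dst = os.open(destination, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            total = 0
            while True:
                chunk = os.read(src, chunk_size)  # kernel fills in the data
                if not chunk:
                    break  # an empty read means end of file
                while chunk:
                    written = os.write(dst, chunk)
                    chunk = chunk[written:]
                    total += written
            return total
        finally:
            os.close(dst)
    finally:
        os.close(src)


def copy_demo():
    """Copy a 10000-byte file and check that the copy matches."""
    with tempfile.TemporaryDirectory() as directory:
        source = os.path.join(directory, "source.bin")
        destination = os.path.join(directory, "copy.bin")
        payload = b"x" * 10000
        with open(source, "wb") as handle:
            handle.write(payload)
        total = copy_through_syscalls(source, destination)
        with open(destination, "rb") as handle:
            return total, handle.read() == payload
```

The program never needs to know which filesystem or device driver sits underneath; the VFS selects them on its behalf.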

1.9.4. Processes

1.         A process is a program that is in execution. A single program can be executed multiple times, so there might be more than one process associated with a particular program.

2.         The process model makes the execution of multiple tasks possible by defining execution contexts. In Linux, each process operates as though it were the only process. The operating system then manages these contexts by assigning the processor to work on one or the other according to a predefined set of rules. The scheduler defines and executes these rules. The scheduler tracks the length of time the process has run and switches it off to ensure that no one process hogs the CPU.

3.         The execution context consists of all the parts associated with the program such as its data (and the memory address space it can access), its registers, its stack and stack pointer, and the program counter value. Except for the data and the memory addressing, the rest of the components of a process are transparent to the programmer. However, the operating system needs to manage the stack, stack pointer, program counter, and machine registers. In a multiprocess system, the operating system must also be responsible for the context switch between processes and the management of system resources that processes contend for.

1.9.4.1. Process Creation and Control

1.         A process is created from another process with a call to the fork() system call. When a process calls fork(), we say that the process spawned a new process, or that it forked. The new process is considered the child process and the original process is considered the parent process.

2.         All processes have a parent, with the exception of the init process. All processes are spawned from the first process, init, which comes about during the bootstrapping phase. This is discussed further in the next section.

3.         When a child process is created, the parent process might want to know when it is finished. The wait() system call is used to pause the parent process until its child has exited.

4.         A process can also replace itself with another process.

5.         This is done, for example, by the mingetty program previously described. When a user requests access to the system, mingetty requests his username and then replaces itself with a process executing login, to which it passes the username as a parameter. This replacement is done with a call to one of the exec() system calls.
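
The fork, exec, and wait sequence can be sketched in a few lines of Python, whose os module wraps the same system calls. The helper names are made up for this note, and the exit status 7 is an arbitrary value chosen so the result is easy to recognize.

```python
import os
import sys


def spawn_and_wait(argv):
    """Fork, exec argv in the child, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    _, status = os.waitpid(pid, 0)  # parent pauses until the child has exited
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return None  # the child was terminated by a signal


def spawn_demo():
    """Run a child Python interpreter that exits with status 7."""
    return spawn_and_wait([sys.executable, "-c", "raise SystemExit(7)"])
```

After a successful exec the child keeps its PID but runs a different program, which is the same replacement that mingetty performs when it hands over to login.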

1.9.4.2. Process Ids

1.         Every process has a unique identifier known as the process ID (PID).

2.         A PID is a non-negative integer. Process IDs are handed out in incrementing sequential order as processes are created. When the maximum PID value is hit, the values wrap and PIDs are handed out starting at the lowest available number greater than 1.

3.         There are two special processes: process 0 and process 1. Process 0 is responsible for system initialization and for spawning process 1, which is also known as the init process. All processes in a running Linux system are descendants of process 1. After process 0 has spawned init, process 0 itself becomes the idle task. Chapter 8, "Booting the Kernel," discusses this in the section "The Beginning: start_kernel()."

4.         Two system calls are used to identify processes. The getpid() system call retrieves the PID of the current process, and the getppid() system call retrieves the PID of the process' parent.
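
A quick way to see getpid() and getppid() agree with fork() is to let a child report both values back over a pipe. This is an illustrative Python sketch; child_view_of_pids is a made-up name.

```python
import os


def child_view_of_pids():
    """Fork a child that reports its own PID and its parent's PID over a pipe."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)
        try:
            message = "%d %d" % (os.getpid(), os.getppid())
            os.write(write_end, message.encode())
        finally:
            os._exit(0)
    os.close(write_end)
    data = b""
    while True:
        chunk = os.read(read_end, 64)
        if not chunk:
            break  # the child closed its end of the pipe
        data += chunk
    os.close(read_end)
    os.waitpid(pid, 0)
    child_pid, parent_pid = map(int, data.split())
    return {"forked_pid": pid, "child_getpid": child_pid, "child_getppid": parent_pid}
```

The PID that fork() returns to the parent matches what the child sees from getpid(), and the child's getppid() matches the parent's own PID.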

1.9.4.3. Process Groups

1.         A process can be a member of a process group by sharing the same group ID.

2.         A process group facilitates associating a set of processes. This is something you might want to do, for example, if you want to ensure that otherwise unrelated processes receive a kill signal at the same time.

3.         The process whose PID is identical to the group ID is considered the group leader.

4.         Process group IDs can be manipulated by calling the getpgid() and setpgid() system calls, which retrieve and set the process group ID of the indicated process, respectively.
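
The group-leader rule can be observed by having a child move itself into a new process group. The sketch below is an assumption-laden illustration (the function name is invented) using Python's wrappers for setpgid() and getpgid().

```python
import os


def make_child_group_leader():
    """Fork a child that moves itself into a new process group and reports it."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)
        try:
            os.setpgid(0, 0)  # new group whose ID equals the caller's PID
            os.write(write_end, str(os.getpgid(0)).encode())
        finally:
            os._exit(0)
    os.close(write_end)
    data = os.read(read_end, 32)
    os.close(read_end)
    os.waitpid(pid, 0)
    return {
        "child_pid": pid,
        "child_pgid": int(data),
        "parent_pgid": os.getpgid(0),
    }
```

Because the child's group ID equals its own PID after the call, the child is the leader of the new group.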

1.9.4.4. Process States

1.         Processes can be in different states depending on the scheduler and the availability of the system resources for which the process contends.

2.         A process might be in a runnable state if it is currently being executed or is in a run queue, which is a structure that holds references to processes that are in line to be executed. A process can be sleeping if it is waiting for a resource or has yielded to another process, dead if it has been killed, and defunct or zombie if it has exited before its parent was able to call wait() on it.
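
The zombie state can be observed briefly between a child's exit and the parent's wait(). The following sketch is Linux-specific and assumes /proc is mounted; it reads the one-letter state field from /proc/<pid>/stat. The function names are invented for this note.

```python
import os
import time


def read_state(pid):
    """Return the one-letter state field from /proc/<pid>/stat (Linux-specific)."""
    with open("/proc/%d/stat" % pid) as handle:
        text = handle.read()
    # The command name sits in parentheses and may contain spaces;
    # the state letter is the first field after the closing parenthesis.
    return text.rsplit(")", 1)[1].split()[0]


def observe_zombie(timeout=2.0):
    """Fork a child that exits at once and report its state before reaping it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child exits immediately
    deadline = time.monotonic() + timeout
    state = read_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = read_state(pid)
    os.waitpid(pid, 0)  # reap the child so the zombie entry disappears
    return state
```

A process that reads its own state sees R, since it is running while it performs the read.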

1.9.4.5. Process Descriptor

1.         Each process has a process descriptor that contains all the information describing it. The process descriptor contains such information as the process state, the PID, the command used to start it, and so on. This information can be displayed with a call to ps (process status).

1.9.4.6. Process Priority

1.         In single-processor computers, we can have only one process executing at a time. Processes are assigned priorities as they contend with each other for execution time.

2.         This priority is dynamically altered by the kernel based on how much a process has run and what its priority has been until that moment. A process is allotted a timeslice to execute after which it is swapped out for another process by the scheduler, as we describe next.

3.         Higher priority processes are executed first and more often.

4.         The user can set a process priority with a call to nice(). This call refers to the niceness of a process toward another, meaning how much the process is willing to yield. A high priority has a negative value, whereas a low priority has a positive value. The higher the value we pass nice, the more we are willing to yield to another process.
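
The effect of nice() is easy to check in a child process, so the calling process keeps its own priority. This is a minimal sketch; niceness_after is a made-up name, and it assumes an unprivileged user who only raises the niceness value.

```python
import os


def niceness_after(increment):
    """Fork a child, add increment to its niceness, and return the new value."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)
        try:
            # os.nice adds the increment and returns the resulting niceness.
            os.write(write_end, str(os.nice(increment)).encode())
        finally:
            os._exit(0)
    os.close(write_end)
    data = os.read(read_end, 32)
    os.close(read_end)
    os.waitpid(pid, 0)
    return int(data)
```

A child started from a process with niceness 0 reports 5 after niceness_after(5); the value is capped at 19.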

1.9.5. System Calls

1.         System calls are the main mechanism by which user programs communicate with the kernel.

2.         System calls are generally wrapped inside library calls that manage the setup of the registers and data that each system call needs before executing. The user programs then link in the library with the appropriate routines to make the kernel request.

3.         System calls generally apply to specific subsystems. This means that a user space program can interact with any particular kernel subsystem by means of these system calls. For example, files have file-handling system calls, and processes have process-specific system calls. Throughout this book, we identify the system calls associated with particular kernel subsystems. For example, when we talk about filesystems, we look at the read(), write(), open(), and close() system calls. This provides you with a view of how filesystems are implemented and managed within the kernel.
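
The library-wrapper layer can be called directly to see the convention it follows. The sketch below loads the C library already linked into the Python interpreter through ctypes; it assumes a Linux system where ctypes.CDLL(None) exposes the libc symbols, and the function names are invented for illustration.

```python
import ctypes
import errno
import os

# Symbols of the C library that the running interpreter is linked against.
libc = ctypes.CDLL(None, use_errno=True)


def getpid_via_libc():
    """Call the C library's getpid() wrapper, which issues the system call."""
    return libc.getpid()


def close_invalid_descriptor():
    """Call close() on a bad descriptor and return (result, errno)."""
    result = libc.close(-1)  # the wrapper returns -1 on failure
    return result, ctypes.get_errno()
```

The wrapper returns the same PID as os.getpid(), and a failing call reports -1 together with an error number such as EBADF.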

1.9.6. Linux Scheduler

1.         The Linux scheduler handles the task of moving control from one process to another. With the inclusion of kernel pre-emption in Linux 2.6, any process, including the kernel, can be interrupted at nearly any time and control passed to a new process.

2.         The scheduler handles both of these tasks: On one hand, it swaps the current process with a new process; on the other hand, it keeps track of processes' usage of the CPU and indicates that they be swapped if they have run too long.

3.         A quick summary is that the scheduler determines priority based on past performance (how much CPU the process has used before) and on the criticality of the process (interrupts are more critical than the log system).

4.         The Linux scheduler also manages how processes execute on multiprocessor machines (SMP). There are some interesting features for load balancing across multiple CPUs as well as the ability to tie processes to a specific CPU. That being said, the basic scheduling functionality operates identically across CPUs.
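
Tying a process to a specific CPU is available to user programs through the affinity system calls. The following is a Linux-specific sketch using Python's wrappers; pin_child_to_one_cpu is a made-up name, and the pinning is done in a child so the caller's own affinity is left alone.

```python
import os


def pin_child_to_one_cpu():
    """Fork a child, restrict it to one CPU, and return (allowed, pinned) lists."""
    allowed = sorted(os.sched_getaffinity(0))  # CPUs this process may run on
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_end)
        try:
            os.sched_setaffinity(0, {allowed[0]})  # tie the child to one CPU
            pinned = sorted(os.sched_getaffinity(0))
            os.write(write_end, " ".join(map(str, pinned)).encode())
        finally:
            os._exit(0)
    os.close(write_end)
    data = os.read(read_end, 256)
    os.close(read_end)
    os.waitpid(pid, 0)
    return allowed, [int(cpu) for cpu in data.split()]
```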

1.9.7. Linux Device Drivers

1.         Device drivers are how the kernel interfaces with hard disks, memory, sound cards, Ethernet cards, and many other input and output devices.

2.         The Linux kernel usually includes a number of these drivers in a default installation; Linux wouldn't be of much use if you couldn't enter any data via your keyboard. Device drivers are encapsulated in a module. Although Linux is a monolithic kernel, it achieves a high degree of modularization by allowing each device driver to be dynamically loaded. Thus, a default kernel can be kept relatively small and slowly extended based upon the actual configuration of the system on which Linux runs.

3.         In the 2.6 Linux kernel, device drivers have two major ways of displaying their status to a user of the system: the /proc and /sys filesystems. In a nutshell, /proc is usually used to debug and monitor devices and /sys is used to change settings. For example, if you have an RF tuner on an embedded Linux device, the default tuner frequency could be visible, and possibly changeable, under the devices entry in sysfs.

4.         In Chapters 5, "Input/Output," and 10, "Adding Your Code to the Kernel," we closely look at device drivers for both character and block devices. More specifically, we tour the /dev/random device driver and see how it gathers entropy information from other devices on the Linux system.
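
From user space, a device driver such as the random device is reached like any other file. The sketch below checks that /dev/random is a character device and reads a few bytes; reading from /dev/urandom instead of /dev/random is a choice made here so the example does not block, and the function names are illustrative.

```python
import os
import stat


def is_character_device(path):
    """Report whether the path names a character device file."""
    return stat.S_ISCHR(os.stat(path).st_mode)


def read_random_bytes(count, device="/dev/urandom"):
    """Read count bytes from the random device through ordinary file I/O."""
    with open(device, "rb") as handle:
        return handle.read(count)
```

The program only sees open and read calls; the driver behind the device file is what gathers and hands out the entropy.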

 

1.10. Portability and Architecture Dependence

1.       The Linux kernel is crafted in such a way as to minimize how much of its code is directly dependent on the underlying hardware. When interaction with the hardware is required, appropriate libraries are brought in at compile time to execute that particular function on a given architecture.

2.       Depending on the target architecture, a different layer of software is brought in to interface with the hardware. Above this layer, the kernel code is oblivious to the underlying hardware.

3.    For this reason, the Linux kernel is said to be portable across different architectures. Limitations arise when drivers have not been ported, either because the hardware they are bound to is not available for a certain architecture or because there has not been enough demand for a port. To create a device driver, the programmer must have a register-level specification for a given piece of hardware. Not all manufacturers are willing to furnish this document because of the proprietary nature of their hardware. This, too, indirectly limits the portability of Linux across architectures.
