Introduction to Unix Processes

As a system engineer, server administrator, sysadmin, system developer, or ops engineer, you will spend most of your time working on Unix systems. On Unix we interact with the operating system through commands, and every command, when executed, runs as a process or a group of processes.

In this article I introduce the basic knowledge and techniques for working with processes on Unix. The code examples are written in Ruby (you will find that Ruby is very simple). All of my code runs in a Unix environment (Linux is a Unix-like system; if you have not used it yet, don't hesitate to try it on your computer).

Although I have tried hard, the article may still contain errors, and I am very grateful for any comments.

I. Some general knowledge

All programs in Unix are essentially processes: the terminal you run, apache, nginx, vim, or whatever command you type in the terminal. The process is the basic unit of Unix. It is a running instance of a program you write. In other words, every line of your code is executed inside a process.

Unix provides the ps tool to list the processes running on the system.

Here I run ps -e -opid,ppid,user,rss,command to show the pid, ppid, user, rss, and command properties of each process. (Notes: (1) ps has many options; for the details, see man ps. (2) The result returned is only part of the processes on my machine.) The information I want to display includes:

  1. PID – the process ID,
  2. PPID – the parent process ID (the PID of the process's parent),
  3. USER – the name of the Unix user who started the process,
  4. RSS (Resident Set Size) – roughly, the amount of physical memory the process is using,
  5. COMMAND – the command used to start the process.

Notice that the last line of the output shows COMMAND: ps -e -opid,ppid,user,rss,command, which is the very command we ran. That shows that each command is a process.

The ps command also shows us that each process has a process ID and belongs to a certain parent process. A process ID is unique among the processes that exist at the same moment, i.e. two processes running at the same time must have different PIDs. In addition, the ID of a process cannot change while the process is running.

1. How does the operating system number Process IDs?

Process IDs are assigned in ascending order, starting at 0 and increasing until a maximum value is reached. The maximum process ID is configurable and depends on the system.

On Linux you can view and change the maximum process ID through the file /proc/sys/kernel/pid_max.

When the process ID reaches the maximum value, the operating system (OS) wraps around and resumes numbering from a specific value (some documents say that this value is 300 on Linux and 100 on Mac OS; I don't know how to test this safely).

UNIX provides the getpid system call, which returns the process ID of the current process. You could write a small C program that retrieves the process ID with getpid. However, this article focuses on Ruby.
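As a minimal sketch, assuming a Linux machine where /proc is mounted, the current maximum can be read from Ruby like any other file. The helper name read_pid_max is my own choice for illustration; on systems without this file it simply returns nil.

```ruby
PID_MAX_PATH = "/proc/sys/kernel/pid_max" # Linux-specific path

# Returns the configured maximum PID as an Integer, or nil when the
# file is not available (for example on Mac OS).
def read_pid_max(path = PID_MAX_PATH)
  return nil unless File.readable?(path)

  File.read(path).to_i
end

pid_max = read_pid_max
if pid_max
  puts "Maximum process ID on this system: #{pid_max}"
else
  puts "#{PID_MAX_PATH} is not available on this system"
end
```

Changing the value means writing a new number to the same file, which requires root privileges.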

In Ruby, to get the process ID of the current process, you use Process.pid.

To print the value we call puts, a function that prints a String to the screen. Ruby code can be embedded in a String with the #{} syntax: the code inside #{} is evaluated first, and its result is placed into the String.

(Ruby files have the .rb extension. To run a Ruby file, use the ruby <file_name> command. There is no need to compile; it's very simple.)
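A minimal example looks like this. The label text in the string is only my choice for illustration:

```ruby
# Process.pid returns the ID of the process that is running this script.
puts "Process ID: #{Process.pid}"
```

Run it twice and you will most likely see two different numbers, because each run is a new process.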

2. Is there any process that has no Parent Process?

I mentioned above that each process belongs to a certain parent process. If you think about it carefully, do you see something wrong? If every process has a parent, where does the first one come from? This relates to the boot process of UNIX. When UNIX starts, it starts process number 0 (PID = 0), which belongs to the UNIX kernel. Process 0 creates a child process, process 1. On most systems process 1 is called the init process, and the other processes are created from it.

Let's go back to the ps example from the beginning of section I. You may have noticed that the PPID in the first line is 0. That line is the first process of the OS.

So processes in Unix are actually organized as a tree. Each node of the tree represents a process. The root is process 0, and the children of a node are the child processes of the process corresponding to that node.
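To see one branch of that tree, here is a rough sketch that walks from the current process up to the root. It assumes a Linux system, where /proc/<pid>/status contains a PPid: line; the helper name ancestor_chain is hypothetical. On other systems, or inside a container that hides its parents, the chain just stops early.

```ruby
# Returns the list of PIDs from the given process up towards the root,
# read from /proc (Linux only). Stops when the parent PID is 0 or when
# the status file cannot be read.
def ancestor_chain(pid = Process.pid)
  chain = []
  while pid > 0
    chain << pid
    status_path = "/proc/#{pid}/status"
    break unless File.readable?(status_path)

    ppid_line = File.foreach(status_path).find { |line| line.start_with?("PPid:") }
    break unless ppid_line

    pid = ppid_line.split[1].to_i
  end
  chain
end

puts ancestor_chain.join(" -> ")
```

The first number printed is the script itself, and each following number is the parent of the one before it.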

In Ruby, to retrieve the parent process id of a process, we use Process.ppid
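For example (the labels are only illustrative):

```ruby
# Process.ppid returns the ID of the parent of the current process,
# typically the shell that launched the script.
puts "My process ID:        #{Process.pid}"
puts "My parent process ID: #{Process.ppid}"
```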

It's also obvious. Did I forget anything?

The remaining question is how a process can create a child process. Don't worry, I'll talk more about this in section II.

3. Process Resources

In addition, our ps output shows that each process has a different RSS. RSS is the memory that the process uses: different processes, different memory. In other words, each process has its own separate address space. Remember this design: processes are independent of each other. If a process dies, it does not affect other processes.

In addition to memory, the operating system gives processes another kind of resource: file descriptors. Remember that on UNIX, everything is a file. That means a device is treated as a file, a socket is treated as a file, a pipe is a file, and a file is a file too! For simplicity, we will use Resource for this general notion of a file, and file for an ordinary file.

Whenever you open a Resource in a process, that resource is assigned a file descriptor. File descriptors are not shared between unrelated processes. A resource lives with the process it belongs to: when the process dies, the resources associated with it are closed.

Each process has 3 default file descriptors that you should be familiar with: stdin (0), stdout (1), and stderr (2). File descriptors are numbered in increasing order starting from 0. Each process is limited in the number of file descriptors it is allowed to use.
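A small sketch makes this visible from Ruby. IO#fileno gives the descriptor number behind an IO object, and Process.getrlimit reports the limit on open descriptors. The helper name descriptor_report and the use of /dev/null as the resource to open are my own choices for illustration.

```ruby
# Collects the descriptor numbers of the three standard streams, the number
# given to one newly opened resource, and the per-process descriptor limits.
def descriptor_report
  soft_limit, hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)
  report = {
    stdin: STDIN.fileno,
    stdout: STDOUT.fileno,
    stderr: STDERR.fileno,
    soft_limit: soft_limit,
    hard_limit: hard_limit
  }
  # Opening any resource hands the process a new descriptor number.
  File.open("/dev/null") { |file| report[:dev_null] = file.fileno }
  report
end

descriptor_report.each { |name, value| puts "#{name}: #{value}" }
```

The exact number given to /dev/null depends on what the Ruby interpreter itself has already opened, so I would not rely on a particular value.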

II. Forking

In section I.2 we talked about parent and child processes and raised the question of how a process can create other processes.

UNIX provides a great tool to do that. You must have guessed it: fork. Personally, I think fork is probably one of the best functions in UNIX. Why? Because a child process created with fork has 2 characteristics:

  • The child process gets a copy of all the memory of the parent process.
  • The child process inherits the resources of the parent process.

This means that if the parent process defines a variable a and assigns a value to it, the child process can use that variable too.

Hmm… but wouldn't that lead to a situation where two processes change the same variable, so that processes are not memory-independent after all?

Here is how it works: when you fork a new process, the memory of the child process and the parent process is still independent, and the operating system uses the copy-on-write (COW) mechanism to achieve that. As long as neither process changes a value, the child and the parent keep sharing the same physical memory; only when one of them writes to a piece of memory does the operating system give it a private copy. A child process that only reads therefore needs very little extra memory. In other words, UNIX gives us a tool to run many copies of a program without multiplying the resources they need.

This is especially useful when you need to load libraries. The parent process is responsible for loading the libraries. After loading, it forks child processes and manages them. Thanks to the COW mechanism, the child processes do not need to spend time loading the libraries again and can still use them.

In addition, the fact that the parent process shares its resources with its child processes leads to an interesting technique: pre-forking, which is particularly effective in server programming.
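Here is a small sketch of that behaviour using Ruby's fork. The child can read the variable a that the parent defined, but when it changes a, the parent's value stays the same. The pipe is only there so that the child can report its value back; the helper name fork_memory_demo is hypothetical.

```ruby
# Forks a child that changes its copy of a variable.
# Returns [value_in_child, value_in_parent].
def fork_memory_demo
  a = 1                       # defined in the parent before forking
  reader, writer = IO.pipe    # used only to send the child's result back

  child_pid = fork do
    reader.close
    a += 41                   # changes only the child's copy of a
    writer.puts(a)
    writer.close
  end

  writer.close
  value_in_child = reader.gets.to_i
  reader.close
  Process.wait(child_pid)     # collect the child's exit status

  [value_in_child, a]
end

value_in_child, value_in_parent = fork_memory_demo
puts "a in the child:  #{value_in_child}"   # prints 42
puts "a in the parent: #{value_in_parent}"  # prints 1
```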
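The idea can be sketched as follows, with Ruby's standard json library standing in for a heavy library. The parent loads it once, and each forked worker uses it without a require of its own. The helper name run_preloaded_workers and the shape of the message are assumptions made for the example.

```ruby
require "json" # loaded once, in the parent, before any fork

# Forks worker_count children; each one uses the already loaded JSON
# library and exits. Returns the lines written by the workers.
def run_preloaded_workers(worker_count = 2)
  reader, writer = IO.pipe
  pids = Array.new(worker_count) do |index|
    fork do
      reader.close
      # No require here: the library came along with the parent's memory.
      writer.write(JSON.generate({ "worker" => index, "pid" => Process.pid }) + "\n")
      writer.close
    end
  end

  writer.close
  lines = reader.readlines(chomp: true) # returns once every worker has closed its end
  reader.close
  pids.each { |pid| Process.wait(pid) }
  lines
end

run_preloaded_workers.each { |line| puts line }
```

The order of the lines is not fixed, because the workers run independently of each other.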

This technique is described as follows:

  • The main process initializes a listening socket.
  • The main process forks a number of child processes. Note that these child processes also listen on the socket that the main process created. Dispatching incoming connections to the child processes is done by the kernel, which makes it very fast.
  • Each child process accepts connections from the shared socket and handles them separately.
  • The main process controls the child processes (it provides a command to shut down all child processes, creates a new child process when one crashes, …).
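The steps above can be sketched in Ruby as follows. This is only a toy, not how any of the real servers implement it: the loopback address, the number of workers, and the helper name run_prefork_demo are assumptions for the demo, each worker simply replies with its own PID, and the main process also plays the client so that the script is self-contained.

```ruby
require "socket"

# Runs a tiny pre-forking server on the loopback interface, sends it a few
# requests, stops the workers, and returns [worker_pids, answering_pids].
def run_prefork_demo(worker_count: 3, requests: 6)
  # Step 1: the main process creates the listening socket
  # (port 0 lets the OS pick a free port).
  server = TCPServer.new("127.0.0.1", 0)
  port = server.addr[1]

  # Step 2: fork the workers. Each child inherits the listening socket.
  worker_pids = Array.new(worker_count) do
    fork do
      loop do
        # Step 3: every worker waits in accept on the shared socket; the
        # kernel decides which one receives each incoming connection.
        client = server.accept
        client.puts(Process.pid)
        client.close
      end
    end
  end

  # For the demo only, the main process also acts as the client.
  answering_pids = Array.new(requests) do
    TCPSocket.open("127.0.0.1", port) { |socket| socket.gets.to_i }
  end

  # Step 4: the main process controls its workers; here it just stops them.
  worker_pids.each { |pid| Process.kill("TERM", pid) }
  worker_pids.each { |pid| Process.wait(pid) }
  server.close

  [worker_pids, answering_pids]
end

worker_pids, answering_pids = run_prefork_demo
puts "workers:     #{worker_pids.join(', ')}"
puts "answered by: #{answering_pids.join(', ')}"
```

Which worker answers which request is up to the kernel, so the second line can differ between runs. A real main process would also watch for workers that exit unexpectedly and fork replacements, which this sketch leaves out.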

Pre-forking techniques are used a lot, e.g. in apache (httpd), nginx, celery, postgresql, rabbitmq, …

Processes in Unix are a very interesting field, especially in system programming and server programming. This article only covers some basic knowledge and techniques for working with processes. Many topics are not covered, such as:

  • Interaction between processes (IPC)
  • Controlling processes
  • Orphan, daemon, and zombie processes, …

Hopefully in the future I will be able to write about these topics in more detail.


The slides I presented at Framgia about UNIX processes

ITZone via kiennt