An Introduction to Unix Processes

Wednesday, 24/08/2016

As a system engineer, server administrator, sysadmin or ops engineer, you will spend most of your time working on Unix systems. On Unix we interact with the operating system through commands, and every command runs as a process or a group of processes.

In this article I introduce the basic knowledge and techniques for working with processes on Unix. The code examples are written in Ruby (you will find Ruby very simple). All of my code was run in a Unix environment (Linux is a Unix-like system; if you have not used it yet, don't hesitate to try it on your computer).

Although I have tried hard, the article may still contain errors, and I am very grateful for any comments.

I. Some general knowledge

Every running program on Unix is a process: the terminal you use, apache, nginx, vim, or any command you type in the terminal. The process is the basic unit of Unix. It is a running instance of a program. In other words, every line of your code is executed inside a process.

Unix provides the ps tool to list the processes running on the system:

 $> ps -e -opid,ppid,user,rss,command
  PID  PPID USER      RSS COMMAND
    1     0 root      152 init [2]
 1695     1 root      428 /usr/sbin/sshd
 1863     1 root       48 /sbin/getty 38400 tty1
 1864     1 root       48 /sbin/getty 38400 tty2
 1865     1 root       48 /sbin/getty 38400 tty3
 1866     1 root       48 /sbin/getty 38400 tty4
 1867     1 root       48 /sbin/getty 38400 tty5
 1868     1 root       48 /sbin/getty 38400 tty6
24477  1695 root     2888 sshd: vagrant [priv]
24479 24477 vagrant  1996 sshd: vagrant@pts/0
24480 24479 vagrant  2328 -bash
24591 24480 vagrant  1060 ps -e -opid,ppid,user,rss,command

Here I run ps and ask it to show the pid, ppid, user, rss and command properties of each process. (Notes: (1) ps has many options; run man ps for the details. (2) The output above is only part of the process list on my machine.) The columns displayed are:

PID – process ID,
PPID – parent process ID (the ID of the process that started this one),
USER – the Unix user who started the process,
RSS (Resident Set Size) – roughly, the amount of physical memory the process is using (ps reports it in kilobytes),
COMMAND – the command used to start the process.

Notice that the COMMAND on the last line of the output is ps -e -opid,ppid,user,rss,command, which is exactly the command we just ran. This shows that every command runs as a process.

The ps output also shows that each process has a process ID and belongs to a parent process. A process ID is unique among running processes, i.e. two different processes running at the same time always have different PIDs. A process's ID also never changes while the process is running.
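The same observation can be made from Ruby. The following is only a small sketch: the helper name run_child_command is my own invention for illustration, and it assumes the ruby executable that runs the script can be started again as a separate command. It runs a one-line command and lets that command report its own PID and PPID.

```ruby
require "rbconfig"

# Hypothetical helper: run a tiny Ruby one-liner as a separate command and
# return the [pid, ppid] pair that the command reports about itself.
def run_child_command
  script = "print Process.pid, ' ', Process.ppid"
  # The array form of IO.popen starts the command directly, without a shell.
  output = IO.popen([RbConfig.ruby, "-e", script]) { |io| io.read }
  output.split.map { |field| Integer(field) }
end

child_pid, child_ppid = run_child_command
puts "my pid:       #{Process.pid}"
puts "child's pid:  #{child_pid}"
puts "child's ppid: #{child_ppid}"
```

The command gets a PID of its own, and its PPID is the PID of the script that started it.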

1. How does the operating system number Process IDs?

Process IDs are assigned in ascending order, starting at 0 and increasing until a maximum value is reached. The maximum process ID is configurable and depends on the system.

On Linux you can view and change the maximum process ID through the file /proc/sys/kernel/pid_max (changing it requires root privileges):

 # read the current maximum process ID
$> cat /proc/sys/kernel/pid_max
32768

# set the maximum process ID
$> echo 40000 > /proc/sys/kernel/pid_max


When the process ID reaches the maximum value, the operating system (OS) wraps around and starts numbering again from a specific value (some documents say that this value is 300 on Linux and 100 on Mac OS; I don't know how to test this safely).

UNIX provides the getpid system call, which returns the process ID of the current process. You could write a simple C program that retrieves the process ID with getpid. However, this article focuses on Ruby.

In Ruby, you get the process ID of the current process with Process.pid.
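The limit can also be read from a program. This is a minimal sketch that assumes a Linux system exposing /proc/sys/kernel/pid_max; the constant PID_MAX_PATH and the method pid_max are names I made up for the example. On other systems the method simply returns nil.

```ruby
# Linux-specific location of the setting (an assumption of this sketch).
PID_MAX_PATH = "/proc/sys/kernel/pid_max"

# Return the configured limit as an Integer, or nil when the file
# does not exist or cannot be read (for example on Mac OS).
def pid_max(path = PID_MAX_PATH)
  return nil unless File.readable?(path)
  Integer(File.read(path).strip)
end

limit = pid_max
if limit
  puts "PIDs on this machine stay below #{limit}"
  puts "this process has PID #{Process.pid}"
else
  puts "#{PID_MAX_PATH} is not available on this system"
end
```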

 # file test_pid.rb
puts "Process pid: #{Process.pid}"


The code above calls puts, a function that prints a String to the screen. Ruby supports string interpolation through the #{} syntax: the Ruby code inside #{} is evaluated and its result is inserted into the String.

 $> irb
irb(main):001:0> puts "Example for String manipulate: 1 + 2 = #{1 + 2}"
Example for String manipulate: 1 + 2 = 3
=> nil

(Ruby files have the extension .rb. To run a Ruby file, use the ruby <file_name> command. There is no compile step, so it is very simple.)

2. Does every process have a parent process?

I mentioned above that each process belongs to a parent process. If you think about it carefully, doesn't something seem wrong? Where does the very first process come from? The answer lies in the UNIX boot sequence. When UNIX starts, it creates process number 0 (PID = 0), which belongs to the kernel itself. Process 0 then creates a child process, process 1. On most systems process 1 is called the init process, and the other processes are created from it.

Go back to the ps example at the beginning of section I. You may have noticed that the PPID on the first line is 0: that line is init, and its parent, process 0, is the first process of the OS.

So the processes in Unix are organized as a tree. Each node in the tree represents a process, the root is process 0, and the children of a node are the child processes of the corresponding process.

In Ruby, to retrieve the parent process ID of the current process, we use Process.ppid:

 # file test_ppid.rb
puts "Process id #{Process.pid}, parent process id #{Process.ppid}"
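Process.ppid only goes one level up. To see the tree described above, we can keep asking for the parent of the parent. The sketch below is an illustration under an assumption: it relies on the Linux /proc file system, where /proc/<pid>/status contains a "PPid:" line. The method names parent_pid_of and ancestry are hypothetical helpers, not part of Ruby. Where /proc is missing, the chain contains only the current process.

```ruby
# Hypothetical helper: return the parent pid of `pid` by reading
# /proc/<pid>/status (Linux only), or nil when that file is unavailable.
def parent_pid_of(pid)
  status = File.read("/proc/#{pid}/status")
  line = status.lines.find { |l| l.start_with?("PPid:") }
  line && Integer(line.split[1])
rescue SystemCallError
  nil
end

# Walk from `pid` up towards the root of the process tree.
# The walk stops when the reported parent is 0 or cannot be read.
def ancestry(pid = Process.pid)
  chain = [pid]
  while (parent = parent_pid_of(chain.last)) && parent > 0
    chain << parent
  end
  chain
end

# Current pid first, then its parent, then the parent's parent, and so on.
puts ancestry.join(" -> ")
```

On a typical Linux machine the chain goes through the shell and ends at process 1.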


That is simple enough. But have I forgotten anything?

The open question is: how can a process create a child process? Don't worry, I will talk about this in section II.

3. Process Resource

Our ps output also shows that each process has a different RSS value. RSS is the memory the process is using. Different processes use different memory; in other words, each process has its own separate address space. Remember this design: processes are independent of each other. If one process dies, the memory of the other processes is not affected.

Besides memory, the operating system gives each process other resources in the form of file descriptors. Remember that on UNIX everything is a file: a device is treated as a file, a socket is a file, a pipe is a file, and a file is a file too! For simplicity, from here on I use "resource" for this general notion of a file, and "file" for an ordinary file.

Whenever you open a resource in a process, the resource is assigned a file descriptor. File descriptors are not shared between unrelated processes. A resource lives and dies with the process that opened it: when the process exits, the resources associated with it are closed.

Each process has three default file descriptors, which you are probably familiar with: stdin, stdout and stderr. File descriptors are numbered upward from 0, so these three are 0, 1 and 2, and a newly opened resource receives the lowest number that is not in use. Each process is limited in the number of file descriptors it is allowed to use.
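These numbers are easy to look at from Ruby. This is a small sketch; the method names standard_descriptors and descriptor_of_new_file are my own, and the example opens File::NULL (/dev/null on Unix) only because that file always exists. Process.getrlimit with Process::RLIMIT_NOFILE returns the soft and hard limit on open descriptors.

```ruby
# The three standard streams of the process and their descriptor numbers.
def standard_descriptors
  { stdin: STDIN.fileno, stdout: STDOUT.fileno, stderr: STDERR.fileno }
end

# Open a resource, return the descriptor number it was given,
# and close it again when the block ends.
def descriptor_of_new_file(path = File::NULL)
  File.open(path) { |file| file.fileno }
end

p standard_descriptors
puts "a newly opened file got descriptor #{descriptor_of_new_file}"

soft_limit, hard_limit = Process.getrlimit(Process::RLIMIT_NOFILE)
puts "descriptor limit: soft=#{soft_limit}, hard=#{hard_limit}"
```

The exact number given to the new file depends on what the Ruby interpreter already has open, but it is always greater than 2.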

II. Forking

In section I.2 we talked about parent and child processes, and raised the question of how a process can create other processes.

UNIX provides a great tool for that. As you may have guessed, it is fork. Personally, I think fork is one of the best functions UNIX has. Why? Because a child process created with fork has two characteristics:

The child process gets a copy of all the memory of the parent process.
The child process inherits the resources of the parent process.

This means that if the parent process defined a variable a and assigned it a value before forking, the child process can use that variable too.

Hmm … but wouldn't that lead to two processes changing the same variable, so that processes are no longer independent in memory?

Here is how it works. When you fork a new process, the memory of the child process and the memory of the parent process are still independent, but the operating system implements this with a copy-on-write (COW) mechanism. That is, as long as neither process modifies a region of memory, the child and the parent keep sharing it; a private copy is made only when one of them writes to it. A child process that only reads therefore needs very little additional memory. In other words, UNIX gives us a tool to run many copies of a program with modest resources.
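Both characteristics can be seen in a few lines of Ruby. This is a minimal sketch, and the method name value_seen_by_child is hypothetical. The child reads a value that the parent defined before the fork (copied memory) and sends its answer back through a pipe that the parent opened before the fork (inherited resource).

```ruby
# Fork a child that reads a value defined by the parent before the fork and
# reports it back through a pipe it inherited from the parent.
def value_seen_by_child(value)
  reader, writer = IO.pipe
  pid = fork do
    reader.close                     # the child only writes
    writer.write("child #{Process.pid} sees #{value.inspect}")
    writer.close
    exit!(0)                         # leave at once, skipping at_exit handlers
  end
  writer.close                       # the parent only reads
  message = reader.read              # returns when the child closes its end
  reader.close
  Process.wait(pid)                  # collect the child's exit status
  message
end

a = "hello from the parent"
puts value_seen_by_child(a)
```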
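The independence is easy to check. In this sketch (sizes_after_fork is a name I made up) the child appends to an array it received from the parent, and the parent then looks at its own array again.

```ruby
# Fork a child that modifies its copy of `items`. Returns
# [size of the array in the child, size of the array in the parent].
def sizes_after_fork(items)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    items << "added by the child"    # changes only the child's copy
    writer.write(items.size.to_s)
    writer.close
    exit!(0)
  end
  writer.close
  child_size = Integer(reader.read)
  reader.close
  Process.wait(pid)
  [child_size, items.size]
end

child_size, parent_size = sizes_after_fork(["a", "b"])
puts "child sees #{child_size} items"         # 3
puts "parent still has #{parent_size} items"  # 2
```

The write happens in the child's private copy of the memory, so the parent's array keeps its two items.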

This is especially useful when you need to load libraries. The parent process takes care of loading all the libraries. Once they are loaded, it forks the child processes and supervises them. Thanks to the COW mechanism, the child processes do not need to spend time loading the libraries again, yet they can still use them.

The fact that the parent process shares its resources with its child processes also leads to an interesting technique that is particularly effective in server programming: pre-forking.

This technique is described as follows:

The main process initializes a listening socket.
The main process forks a number of child processes. Note that these child processes also listen on the socket that the main process created. Dispatching incoming connections to the child processes is done by the kernel, which makes it very fast.
Each child process accepts connections from the shared socket and handles them independently.
The main process controls the child processes (it sends the command to shut them all down, creates a new child process when one crashes, …).
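The steps above can be sketched in Ruby as follows. This is a toy illustration rather than the way any particular server is implemented: the method names start_workers, stop_workers and prefork_demo are hypothetical, the server listens only on 127.0.0.1 on a port chosen by the OS, and the "control" step is reduced to shutting the workers down (restarting crashed workers is left out).

```ruby
require "socket"

# Step 2: fork `count` workers. Each one inherits the listening socket.
def start_workers(server, count)
  Array.new(count) do
    fork do
      trap("TERM") { exit!(0) }      # leave quietly when the main process asks
      loop do
        client = server.accept       # step 3: the kernel picks one waiting worker
        client.puts "handled by worker #{Process.pid}"
        client.close
      end
    end
  end
end

# Step 4 (simplified): tell every worker to stop and wait for all of them.
def stop_workers(pids)
  pids.each { |pid| Process.kill("TERM", pid) }
  pids.each { |pid| Process.wait(pid) }
end

# Start a pre-forking server, send it a few requests, return the replies.
def prefork_demo(worker_count, request_count)
  server = TCPServer.new("127.0.0.1", 0)   # step 1; port 0 = any free port
  port = server.addr[1]
  workers = start_workers(server, worker_count)
  Array.new(request_count) do
    TCPSocket.open("127.0.0.1", port) { |socket| socket.gets.chomp }
  end
ensure
  stop_workers(workers) if workers
  server.close if server
end

puts prefork_demo(3, 5)
```

Each reply names the worker that handled the connection. Which worker gets which connection is decided by the kernel, so the distribution of PIDs varies from run to run.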

Pre-forking is used a lot, e.g. in apache (httpd), nginx, celery, postgresql, rabbitmq, …

Processes are a very interesting area of Unix, especially for system programming and server programming. This article only covers some basic knowledge and techniques. Many topics are not covered, such as:

Interaction between processes (IPC)
Controlling processes
Orphan, daemon and zombie processes, …

Hopefully I will be able to write about these topics in more detail in the future.

Update

The slide I presented at Framgia company about UNIX Process

ITZone via kiennt
