NodeJS – Understanding Asynchronous Event-Driven Nonblocking I/O

Friday, 18/11/2016

This is a very long article that I am reposting from http://sotatek.com/nodejs-hieu-asynchronous-event-drivent-nonblocking-io/ . I assure you it is an extremely high-quality article that helps you learn or review operating system knowledge and understand NodeJS better.

It is often said that NodeJS is a JavaScript runtime built on Chrome's V8 JavaScript engine, using an event-driven, non-blocking I/O model. NodeJS has been around since 2009 and is popular enough that many people are familiar with it and with the event-driven, non-blocking I/O, and asynchronous concepts it helped popularize. This article goes over these concepts once more, the long way round, in more depth than a few bulleted answers on Stack Overflow.

As in a certain joke about philosophy (I apologize to the author for not remembering which book I read it in), students wander into a philosophy class expecting to reach some higher realm, such as understanding the meaning of everything.

And a ragged guy steps onto the podium and begins to babble about the meaning of "meaning". Before starting to answer any question, we should question the meaning of the question itself, just to make things sound impressive.

Now, before deciding between blocking and non-blocking I/O, perhaps we should first review the concept of I/O itself.

I/O

I/O is the communication process (taking data in, sending data out) between an information system and the external environment. For a CPU, even exchanging data with something outside the chip, such as reading from or writing to memory (RAM), is an I/O task. In computer architecture, the combination of CPU and main memory (RAM) is considered the brain of the computer, and every data transfer to or from the CPU/memory pair, for example reading and writing data on a hard drive, is considered an I/O task.

Because the components of the architecture depend on data from one another and run at different speeds, when one component cannot keep up, the others sit idle because they have no data to work on. The slow component becomes a bottleneck that drags down the performance of the entire system.

Depending on which component of a modern computer architecture limits it, the execution speed of a process is classified as:

CPU Bound: execution speed is limited by CPU processing speed
Memory Bound: execution speed is limited by the available capacity and access speed of memory
Cache Bound: execution speed is limited by the size and speed of the available cache
I/O Bound: execution speed is limited by the speed of I/O tasks

I/O Bound < Memory Bound < Cache Bound < CPU Bound

[Figure: io-cost]

Because I/O is usually very slow compared to the rest, the bottleneck often occurs there. People usually weigh I/O bound against CPU bound, trying to turn I/O-bound processes into CPU-bound ones to get the most out of the hardware.

Files in Unix

You have probably heard that "On a UNIX system, everything is a file" (actually there is a follow-up: "If something is not a file, it is a process", or in Linus Torvalds' words, "Everything is a file descriptor or a process"). Any I/O device is considered, and can be treated as, a file in the Unix filesystem. This creates an abstraction over I/O devices that hides their true nature: the kernel only deals with files and does not need to know how to behave differently for each device.

The open, read, and write actions on a file are the I/O operations. On open() or creat(), the kernel creates a file description structure stored in the file table and returns an integer, the file descriptor, that refers to the corresponding file description in the table.

Each process stores a list of file descriptors, the files that the process has open. Every operation on a file is done through a system call that takes the file descriptor as an argument, i.e. the process in user mode asks the kernel to work on a file it owns. This ensures that only the kernel can directly affect the system, in a safe way; a program can only ask the kernel to perform the action.
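To make the "integer handle plus system call" idea concrete, here is a minimal sketch in Python, whose os module exposes the raw descriptor-based calls. The helper name read_via_descriptor is made up for this illustration, and a temporary file stands in for any regular file.

```python
import os
import tempfile

def read_via_descriptor(payload):
    """Write payload to a temp file, then read it back through a raw descriptor."""
    # mkstemp returns an already-open descriptor (a small integer) and a path.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)             # write(2) takes the descriptor, not a name
        os.lseek(fd, 0, os.SEEK_SET)      # rewind so the next read starts at offset 0
        data = os.read(fd, len(payload))  # read(2) through the same descriptor
        return fd, data
    finally:
        os.close(fd)                      # release the descriptor
        os.unlink(path)                   # remove the temporary file

fd, data = read_via_descriptor(b"hello")
print(isinstance(fd, int), data)
```

The program never touches the file directly; every step goes through the kernel with the integer descriptor as the only handle.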

In Unix, files can be split into two groups: Fast and Slow files. Do not let the names mislead you into thinking they refer to the actual read/write speed; what matters here is predictability. Files that can satisfy a read/write request within a predictable amount of time are Fast, while files that may take an unbounded amount of time to respond to a call are Slow.

For example, when reading from a hard drive, the kernel knows the data is there and needs only a finite time to retrieve it and return it to the user, so reading a regular file is Fast. When reading from a Pipe or a Socket, the kernel cannot know when data will arrive, which is why a Pipe or Socket is Slow.

File Type	Category
Block Device	Fast
Pipe	Slow
Socket	Slow
Regular	Fast
Directory	Fast
Character Device	Varies

When a process needs data from a file, it calls the read() system call on an open file from user mode. The kernel receives the call, operates on the file in kernel mode, and then returns the data that was read (a block of bytes) to user mode for the process to handle.

If it is a fast file, the kernel fetches the data from where it already knows it to be and returns it to the user, or returns an error if something goes wrong (e.g. the location in the file cannot be determined).

With a slow file, the kernel returns whatever data is available on the file, even without reaching EOF (End Of File). If there is no data on the file, the kernel simply blocks the process (in the default mode), puts it to sleep, and wakes it up when data is ready.

This is perfectly reasonable, because when a programmer reads a file, they normally expect data that is essential for the rest of the program. This is what people usually call Blocking I/O.

In practice, there are times when we want our process to do something other than lie asleep, even if the I/O operation cannot complete immediately. By using the fcntl() interface to set the O_NONBLOCK flag on the file descriptor, we can switch the file's I/O model to Nonblocking I/O.

Then, if an operation cannot be performed immediately, the call returns with the error EAGAIN ("try again"). The O_NONBLOCK flag has no effect on fast files such as regular files and directories.
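As a rough sketch of the above, the following Python snippet sets O_NONBLOCK on the read end of a pipe (a slow file) and reads it while it is empty. The helper names set_nonblocking and try_read are invented for this example; Python reports EAGAIN by raising BlockingIOError instead of returning -1.

```python
import errno
import fcntl
import os

def set_nonblocking(fd):
    """Add O_NONBLOCK to the descriptor's existing status flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def try_read(fd, size=4096):
    """Return the bytes read, or None if the read would have blocked."""
    try:
        return os.read(fd, size)
    except BlockingIOError as exc:
        # Python raises BlockingIOError for EAGAIN / EWOULDBLOCK.
        assert exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
        return None

read_end, write_end = os.pipe()
set_nonblocking(read_end)
first = try_read(read_end)      # pipe is empty: the read would block, so None
os.write(write_end, b"ping")
second = try_read(read_end)     # data is available now
os.close(read_end)
os.close(write_end)
print(first, second)
```

Without the flag, the first read would put the process to sleep until someone wrote to the pipe.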

Blocking I/O vs Nonblocking I/O

Blocking I/O
A request is made to perform an I/O operation, and the result is returned once it completes. The calling process/thread is blocked until the result comes back or an exception occurs.

Nonblocking I/O
A request is made to perform an I/O operation and returns immediately (timeout = 0). If the operation is not ready yet, try again later. This is equivalent to checking whether the I/O operation is available right now: if it is, perform it and return; otherwise return a notice to try again later.

Synchronous vs Asynchronous

Synchronous:
Simply put: things take place in order. An action starts only when the previous action has ended.

Asynchronous:
There is no fixed order: actions can take place at the same time, or at least, although the actions start in order, they do not end in order. An action can start (and even end) before the previous action has completed.

After invoking action A, we do not wait for the result but move on to start action B. Action A will complete at some point in the future, at which time we may or may not come back to look at its result. If we do care about the result of A, we need an event, an Asynchronous Notification, to announce that A has completed.

Since the moment at which action A completes cannot be determined in advance, abandoning the work in progress to go and handle the result of A changes the processing flow of the program in an unpredictable way. The flow of the program is no longer sequential but depends on the events that occur. Such a model is called Event-Driven.

Asynchronous – Event-Driven

Asynchronous and Event-Driven are nothing new. In fact they have existed in computer science since the early days. The interrupt mechanism is a signal that tells the system that an event has occurred.

When an interrupt occurs, the system must pause the running program and give priority to another program called the Interrupt Service Routine, and then return to continue the interrupted program.

There are two types of interrupts: hardware interrupts and software interrupts. A software interrupt is raised by an instruction in machine language (and assembly language, of course). A hardware interrupt can only be raised by electronic components acting on the system.

An example of a hardware interrupt is a network card receiving data from outside. The network card sends an electrical signal to the CPU's interrupt pin, and the interrupt flag (INT) in a register is set. The CPU pauses and checks the priority level (of course it does not give in to just anyone who interrupts; what it is doing may be more important), checks what the interrupt signal is, and from that signal obtains the address of the interrupt service routine (which belongs to the device driver).

The driver then decides to write the data to a file in the Unix Virtual File System, in this case a socket file. How an operation on a device actually works (reading data from the keyboard, writing to the screen) is decided by the device and the kernel. From any user's point of view, a device is just a readable, writable file, no different from a regular file.

A typical example of a slow file is the socket.

After a socket is created (represented by a file descriptor) and listens on a port, a connection request sent from a client is accepted with accept() to establish the connection (another file is created, cloned from the original socket file and combined with the client's port and address; each connection to the server has its own file descriptor).

The recv() system call is then called to receive messages from the client; from the kernel's point of view this is a read from the socket file. In nonblocking mode the call returns -1 if no message from the client has arrived yet (and with a TCP socket, it returns 0 if the connection has been closed by the other side). In the default synchronous mode, the process sleeps until the read is ready to proceed. During that time, if another client sends a request to the server, your server cannot serve it immediately; it does not even know the request exists.
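The three outcomes of recv() can be observed with a small Python sketch. It uses socket.socketpair(), a connected pair of Unix stream sockets, as a stand-in for a real TCP client and server (an assumption made to keep the example self-contained); recv_outcome is a hypothetical helper. Python raises BlockingIOError where C returns -1 with EAGAIN, and returns an empty bytes object where C returns 0.

```python
import socket

def recv_outcome(sock):
    """Classify one recv() attempt on a nonblocking socket."""
    try:
        data = sock.recv(1024)
    except BlockingIOError:
        return "would block"       # C recv() would return -1 with errno EAGAIN
    if data == b"":
        return "peer closed"       # C recv() would return 0
    return "got " + data.decode()

server_side, client_side = socket.socketpair()
server_side.setblocking(False)

outcomes = [recv_outcome(server_side)]      # nothing has been sent yet
client_side.sendall(b"hi")
outcomes.append(recv_outcome(server_side))  # a message is waiting
client_side.close()
outcomes.append(recv_outcome(server_side))  # the peer has gone away
server_side.close()
print(outcomes)
```

With setblocking(True) instead, the first call would simply sleep, which is exactly the situation in which a second client would go unnoticed.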

Asynchronous in NodeJS

Going back to NodeJS: behind the scenes, what delivers Asynchronous Event-Driven Non-Blocking I/O (a title that sounds as long as "I'm Daenerys I Targaryen, Lady Regnant of the Seven Kingdoms, Protector of the Realm, Khaleesi of the Great Grass Sea, the Breaker of Chains, the Mother of Dragons and the Queen of Meereen") is libuv, a multi-platform library that supports asynchronous I/O. What sits in the deps/uv folder of the NodeJS GitHub repo is libuv's repo.

These are libuv's strategies for each type of I/O that can be performed asynchronously:

Thread Pool

The traditional way is to use multithreading. When an I/O operation is invoked, the main thread moves on and continues executing other commands, while handling of the operation is handed to a worker thread or a child process (please don't; a process is very expensive in system resources). When the operation finishes in the worker thread, the worker thread notifies the main thread.

The problems here are:

Spawning a thread costs resources to create it and memory for its own stack (on Linux the default is commonly 2 MB or more per thread)
Thread-safety issues such as deadlocks (from threads sharing resources), race conditions, mutexes, …
Resources are spent on thread context switching, and the kernel's scheduler has more work to do
The worker thread is still I/O bound

The cost of creating threads can be reduced by using a Thread Pool. In the classic model, taking sockets as the example, older web servers often created one process/thread for each incoming request. That way, the main thread can keep listening and accept() new connection requests while the worker threads simultaneously wait in recv() on the connected clients.

Of course, there are more effective ways to use a Thread Pool. Looking at libuv's model, it only uses the thread pool for regular file I/O (naturally, since regular files are fast files and have no nonblocking mode), DNS lookups, and user code. For network I/O such as TCP, UDP, TTY, Pipe, … it uses nonblocking calls combined with a mechanism that notifies it when the channel is ready for an I/O operation.
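The thread pool idea for regular file reads can be sketched with Python's standard ThreadPoolExecutor. This is only an illustration of the pattern, not libuv's implementation: the function names and the pool size of 4 are choices made for this example, and the completion callback here runs on the worker thread rather than being handed back to a main event loop as libuv does.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def blocking_read(path):
    """Plain blocking read of a regular file; runs on a worker thread."""
    with open(path, "rb") as handle:
        return handle.read()

def read_files_in_pool(paths, max_workers=4):
    """Submit every read to the pool and collect results via callbacks."""
    results = {}

    def on_done(path, future):
        # Called once the read has finished: the "notification" step.
        results[path] = future.result()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for path in paths:
            future = pool.submit(blocking_read, path)
            future.add_done_callback(lambda f, p=path: on_done(p, f))
        # The submitting thread is free to do other work here.
    # Leaving the with-block waits for all workers to finish.
    return results

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, "f%d.txt" % i)
        with open(path, "wb") as handle:
            handle.write(("file %d" % i).encode())
        paths.append(path)
    contents = read_files_in_pool(paths)
    print(sorted(contents.values()))
```

The threads are created once and reused for every read, which is what keeps the per-operation cost down compared to one thread per request.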

These notification mechanisms are provided by each operating system. When notified that the channel is ready, the operation is invoked again and retrieves the data (if data is still available on the channel).

With blocking I/O, when a process performs an I/O operation that is not ready and is put to sleep, it registers itself in a queue on the file called the wait queue. When the file becomes ready for reading or writing (this is usually decided by the file's driver), all processes waiting in the wait queue for the corresponding event are woken up.

Nonblocking I/O is different: repeatedly trying to read/write data on a file is called polling. Without a mechanism that notifies you when an operation on the file is ready, your program has to keep polling the file in an infinite loop until it succeeds.

Going back to the socket example: each connected client produces a connection file, and the program needs to monitor all of these files to learn when there is an incoming request, and it must also monitor the original socket file for new clients. The problem is monitoring read/write readiness on a large number of different slow files.

Unix provides an I/O Multiplexing mechanism that allows monitoring multiple file descriptors at once to see whether an I/O operation can be performed on any of them without blocking. A monitoring call may block the calling process until some file becomes available.

I/O Multiplexing

Three system calls can perform I/O Multiplexing: select(), poll(), and epoll. All three require support from the device driver, provided through the poll method (note that this is a file operation, different from the poll system call for I/O multiplexing just mentioned). This method basically subscribes to one or more of the file's wait queues to listen for changes on the file, and returns a bit mask representing all the operations that can be performed on the file immediately without blocking.

Select System Call

Here is the signature of select:

int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval *timeout);

readfds and writefds have the data type fd_set (file descriptor set), holding the list of file descriptors whose readability we care about (readfds) and the file descriptors whose writability we care about (writefds). The fd_set type is represented as a bit mask: file descriptors are stored as bit fields in an array of integers. Unix provides four macros for working with an fd_set:

FD_ZERO(): Clear an fd_set so that it is empty
FD_SET(): Add a file descriptor to an fd_set
FD_CLR(): Remove a file descriptor from an fd_set
FD_ISSET(): Check whether a file descriptor is in an fd_set
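Python does not expose fd_set directly, so the following is only a toy model of the bit mask idea, with one bit per file descriptor stored in a single integer. The class FdSet and its method names are invented for this sketch; the real macros operate on a fixed-size array of integers.

```python
class FdSet:
    """Toy model of fd_set: one bit per file descriptor, stored in an int."""

    def __init__(self):
        self.bits = 0

    def zero(self):            # plays the role of FD_ZERO
        self.bits = 0

    def set(self, fd):         # plays the role of FD_SET
        self.bits |= 1 << fd

    def clr(self, fd):         # plays the role of FD_CLR
        self.bits &= ~(1 << fd)

    def isset(self, fd):       # plays the role of FD_ISSET
        return bool(self.bits & (1 << fd))

readfds = FdSet()
readfds.zero()
readfds.set(3)
readfds.set(5)
readfds.clr(3)
print(readfds.isset(3), readfds.isset(5), bin(readfds.bits))
# prints: False True 0b100000
```

Because the position of the bit is the descriptor number itself, the size of the mask depends on the largest descriptor, not on how many descriptors are being watched.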

Before calling select, we use FD_ZERO to create empty fd_sets, add the files of interest to each fd_set with FD_SET, and pass them as the corresponding parameters of the call. If we only care about one kind of event, for example readability, we only need to create one fd_set (for example, readable_set), pass it as readfds, and pass NULL for writefds and errorfds.

As we saw above, each process has a list of the file descriptors it has open, numbered starting from 0. Normally, every process has three file descriptors available, 0, 1, and 2, pointing to stdin, stdout, and stderr respectively.

Each file descriptor points to a file description in the file table, where the kernel keeps track of all files opened by all processes on the system. The nfds argument of select() must be set to the highest file descriptor number the process wants to check, plus one.

When select is called, it polls all files with file descriptors from 0 to nfds - 1 in the process. For each file polled, it checks with FD_ISSET whether the file is in any of the fd_sets. If the file is in readfds but the polling result says it is not readable, FD_CLR is used to remove the file from readfds. The same goes for writefds and errorfds.

If timeout is set to 0, select simply checks the readiness of all the files of interest and returns immediately. If timeout is greater than 0, select waits until at least one event of interest becomes available on some file descriptor or the time runs out. If the timeout argument is NULL, select blocks indefinitely until a descriptor becomes ready (or a signal interrupts the call).

select returns an integer: -1 if an error occurs, 0 if the timeout expires, and otherwise the total number of ready file descriptors across the fd_sets. If a file descriptor is of interest for both reading and writing, i.e. it is in both readfds and writefds, and is ready for both, it is counted twice.

Implicitly, as a side effect, select turns the input fd_sets into ready sets: after the call, readfds and writefds are no longer the lists of file descriptors whose readability or writability we want to know, but subsets of the original lists containing only the file descriptors that are ready to be read or written without blocking.

With this result, provided we saved the list of file descriptors of interest beforehand, and since readfds and writefds are now ready sets, we only need to loop over all the original file descriptors and use FD_ISSET to check whether each one is in readfds (or writefds); if it is, start reading (writing) data.

Since the input fd_sets are modified by the select call, before the next call we must re-initialize them with FD_ZERO and FD_SET.

An example of select found on the net:

https://github.com/yangrujing/select-poll-epoll/blob/master/select/server.c
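For a much smaller illustration than a full server, here is a sketch using Python's select.select(), which wraps the same system call. Two pipes stand in for client connections, and ready_to_read is a helper name made up for this example. Python rebuilds the fd_sets from the lists on every call, so the FD_ZERO/FD_SET re-initialization step is hidden, but the program still has to keep its own interest list.

```python
import os
import select

def ready_to_read(read_fds, timeout=0.0):
    """Return the subset of read_fds that can be read without blocking."""
    # The four arguments mirror readfds, writefds, errorfds and timeout.
    readable, _, _ = select.select(read_fds, [], [], timeout)
    return readable

r1, w1 = os.pipe()
r2, w2 = os.pipe()
watched = [r1, r2]                 # our own copy of the interest list

before = ready_to_read(watched)    # nothing written yet: empty ready set
os.write(w2, b"x")
after = ready_to_read(watched)     # only r2 has data to read

for fd in (r1, w1, r2, w2):
    os.close(fd)
print(before, after == [r2])
```

A timeout of 0 makes the call return immediately; passing None instead would block until one of the watched descriptors became readable.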

Poll System Call

poll provides similar functionality and is often recommended as a replacement for select. The main difference between the two is how the list of file descriptors is managed. With select, we pass three fd_sets, bit masks holding the file descriptors of interest. With the poll system call, we use a single array to hold the files we want to track, each wrapped in a struct called pollfd.

int poll(struct pollfd fds[], nfds_t nfds, int timeout);

The nfds parameter is the total number of elements in the array; its data type, nfds_t, is an unsigned integer.
fds is an array of pollfd structures, defined as follows:

struct pollfd {
    int fd;         /* file descriptor */
    short events;   /* requested events */
    short revents;  /* returned events */
};

This is the most important structure in poll and what sets it apart from select. Before calling poll, we need to initialize the fds array. In each pollfd element, fd holds the file descriptor of the file of interest, and events holds the bit mask of the events we care about on that file.

Each time poll is called, it runs a loop from 0 to nfds - 1. In each iteration it polls the file fds[i].fd and combines the result (the operations that can be performed on the file immediately without blocking) with fds[i].events to produce fds[i].revents, the bit mask of available operations among those we are interested in on the file.

The timeout parameter, as with select, determines the blocking behavior of the poll system call:

With timeout equal to 0, poll checks the availability of all files in fds once and returns the result immediately.
With a nonzero timeout, the process calling poll is blocked until at least one file descriptor in fds is ready for one of its requested events, or until the timeout expires. Timeout = -1 means wait indefinitely.

Given the fds list, after each call the poll system call returns an error or the total number of file descriptors that are available for some operation. Even if a file is available for several operations, it is counted only once in the returned result.

poll also implicitly modifies its input argument holding the list of file descriptors of interest, but unlike select, poll only changes revents in each pollfd structure. The input list of files and their events of interest is therefore preserved, so we can reuse fds across calls to the poll system call.

After receiving the result from the poll system call, to learn the current status of each file of interest, all we need to do is loop through the fds list, check revents, and perform the actions corresponding to the available operations.

Example of poll for a socket server:

https://github.com/yangrujing/select-poll-epoll/blob/master/poll/server.c

Remember how with select we had to keep an array of the file descriptors we were interested in and then use FD_ISSET to check whether each file was ready for some operation? With poll, we no longer need such an array; fds already plays that role.
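A compact way to see the events/revents split is Python's select.poll() object, which wraps the poll system call. In this sketch a pipe stands in for a socket and poll_once is a made-up helper; register() corresponds to filling in one pollfd entry, and the list returned by poll() contains (fd, revents) pairs only for descriptors that reported something.

```python
import os
import select

def poll_once(poller, timeout_ms=0):
    """Return {fd: revents} for every registered fd that reported an event."""
    return dict(poller.poll(timeout_ms))

r, w = os.pipe()
poller = select.poll()
poller.register(r, select.POLLIN)    # like fds[0] = {fd: r, events: POLLIN}
poller.register(w, select.POLLOUT)   # like fds[1] = {fd: w, events: POLLOUT}

first = poll_once(poller)            # empty pipe: only the write end is ready
os.write(w, b"x")
second = poll_once(poller)           # now the read end reports POLLIN as well

os.close(r)
os.close(w)
print(sorted(first) == [w], bool(second[r] & select.POLLIN))
```

The registrations are made once and reused for both calls, which mirrors how the fds array survives between calls to poll.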

Comparing the performance of select and poll

The advantages of poll over select are obvious:

In select, because the fd_sets are value-result arguments, serving as both input and output, they are modified and must be reset before every call; poll's pollfd array avoids this.

Moreover, select checks every open file of the process whose file descriptor is smaller than nfds, so if we do not care about all of them, the work is wasteful, especially when the files of interest are sparsely distributed. Suppose we only need to know the readability of two files with file descriptors 1 and 1000: the performance of select will be a disaster compared to poll, which only checks the files we care about.

Another weakness of select compared to poll is FD_SETSIZE, which limits the number of file descriptors that can be tracked. By default this limit is 1024 on Linux, and sadly changing it requires recompiling the program.

Libuv's choice

Neither select nor poll is the method used by libuv. Libuv's architecture uses epoll, kqueue, and /dev/poll on Unix-like operating systems.

epoll is a Linux system call, kqueue is the equivalent on systems derived from BSD (a Unix variant), including Mac OS X, and finally /dev/poll is for the Solaris family. All of them are wrapped by the uv__io_t interface. Windows uses IOCP (which I do not know much about).

These methods (probably) share a common mechanism for monitoring I/O event notifications on many different file descriptors. Let's take epoll as the representative and see how it differs from the select and poll system calls.

Weak points of select and poll

The common problems of select and poll when monitoring a large number of file descriptors are:

On every call, select and poll have to poll all the file descriptors being monitored.
On every call, the process must pass a data structure (the fd_sets, or fds, the array of pollfd) from user mode through the system call into the kernel; the kernel polls the file descriptors and modifies this data structure with the results before sending it back to user mode (even worse, with select we need to re-initialize the data structure before each call). Copying a pile of data back and forth between user mode and kernel mode on each system call also consumes system resources. In fact, kernel mode and user mode have their own stack and heap, even though both belong to the same process.
After every call, to find out which file descriptors are available, our program must walk through and check each element in the list of file descriptors of interest.

These weaknesses of select and poll come from a limitation in their design:

Typically, a program monitors continuously, repeating the same list of file descriptors, yet the kernel does not remember this list between consecutive calls.

From the kernel's point of view, the file list exists only for the duration of one select/poll system call. In your program, i.e. in user mode, the list may well be stored somewhere (a list of integers for select, or a pollfd list for poll) for reuse, but the kernel keeps nothing.

Epoll's solution

On the other hand, epoll is not like that. Starting with epoll is stepping into a long-term relationship.

In fact, epoll is a subsystem made up of 3 different system calls. The heart of the epoll subsystem is the epoll instance, represented by a file description and, being a file, referred to by one or more integer file descriptors.

The epoll instance, in a long-lived and lasting way, keeps a list of the file descriptors of interest (again; this phrase does seem to come up a lot), and it also keeps a list of ready file descriptors (remember? These are the file descriptors that are ready for reading or writing).

Existing as a file, the epoll instance goes away only on an error or when close() is called; it lives across kernel system calls, can be shared between threads/processes, and, guess what else? The epoll file descriptor is itself pollable, meaning it has its own wait queue, ready to wake up all the processes lined up waiting in it. It can even be monitored by a poll or select system call, or by another epoll.

OK, to be honest I am not very excited about nesting epoll inside epoll, but the ability to share an epoll instance as a file between processes is quite useful.

As we know (or, for those who don't yet), there is a rather interesting multiprocessing trick on Unix called pre-fork. A Unix process can spawn a child process by copying its entire context through the fork() call.

Normally resources are independent between processes, but when a child process has just been spawned, it inherits all of its parent's resources and only clones data into its own memory once it starts changing values to differ from the parent process. This mechanism has a very evocative name: copy-on-write.

The files that the parent process owns are also inherited by the child process. Taking advantage of this, we can first create an epoll that watches a listening socket and then fork() a number of child processes, all of which share one epoll instance. Web servers use pre-fork to get automatic load balancing at the kernel level through the process scheduler.
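The inheritance of open files across fork() can be seen with something much simpler than a pre-fork server. The sketch below, in Python, uses a pipe instead of an epoll instance and a listening socket (a deliberate simplification), and the function name is invented for the example: the parent opens the pipe, forks, and the child writes through the descriptor it inherited.

```python
import os

def child_writes_to_inherited_pipe(message):
    """Fork a child that writes into a pipe opened by the parent."""
    read_end, write_end = os.pipe()       # opened before fork()
    pid = os.fork()
    if pid == 0:
        # Child: both descriptors were inherited from the parent.
        try:
            os.close(read_end)
            os.write(write_end, message)
        finally:
            os._exit(0)                   # exit at once, skipping parent cleanup code
    # Parent: close its copy of the write end so read() can reach EOF.
    os.close(write_end)
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:                     # EOF: the child's write end is closed
            break
        chunks.append(chunk)
    os.close(read_end)
    os.waitpid(pid, 0)                    # reap the child
    return b"".join(chunks)

print(child_writes_to_inherited_pipe(b"hello from child"))
```

In a pre-fork server the inherited descriptors would be the listening socket and the epoll instance, and every child would wait on them instead of writing once and exiting.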

The epoll API is provided through the following 3 system calls:

epoll_create(): Creates an epoll instance and returns a file descriptor pointing to it, or returns -1 if creation fails.
epoll_ctl(): Manages the list of file descriptors to monitor (adding, removing, and changing the events of interest).
epoll_wait(): Returns some of the items currently in the ready list.

With the old mechanism, all monitored files are polled every time select or poll is called. With epoll, each individual file is polled when it is added to or modified in the list through the epoll_ctl call:

 int epoll_ctl(int epfd, int op, int fd, struct epoll_event *ev);

This call simply returns 0 on success or -1 on failure. Failure can occur, for example, when adding a file descriptor that is already in the list, or when changing the events of interest on a file that is not in the list. epoll uses a Red-Black Tree structure to store the file list, with O(log n) complexity for add, remove, and modify operations.

The epfd argument is the file descriptor of the epoll instance created by epoll_create. The fd argument is the file descriptor of the file to monitor. The op argument specifies the action of the epoll_ctl call and can take the values EPOLL_CTL_ADD, EPOLL_CTL_MOD, and EPOLL_CTL_DEL, corresponding to adding, modifying, and deleting an element of the list. The last argument is a structure specific to the epoll subsystem:

struct epoll_event {
    uint32_t events;    /* epoll events (bit mask) */
    epoll_data_t data;  /* User data */
};

Here data holds information that helps us find the file we care about, for example its file descriptor, and events is the bit mask marking the events we care about on that file. A list of epoll_event structures is also the key result we get back from calling epoll_wait().
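The two failure cases of epoll_ctl can be reproduced with Python's select.epoll object (Linux only), where register(), modify(), and unregister() correspond to EPOLL_CTL_ADD, EPOLL_CTL_MOD, and EPOLL_CTL_DEL. This is a sketch: ctl_errors is a hypothetical helper, a pipe stands in for a socket, and Python raises OSError where the C call returns -1 and sets errno.

```python
import errno
import os
import select

def ctl_errors():
    """Exercise add / modify / delete and collect the errno of each failure."""
    failures = []
    r, w = os.pipe()
    ep = select.epoll()                      # epoll_create
    try:
        ep.register(r, select.EPOLLIN)       # EPOLL_CTL_ADD
        try:
            ep.register(r, select.EPOLLIN)   # adding the same fd a second time
        except OSError as exc:
            failures.append(exc.errno)
        ep.modify(r, select.EPOLLIN | select.EPOLLPRI)  # EPOLL_CTL_MOD
        ep.unregister(r)                     # EPOLL_CTL_DEL
        try:
            ep.modify(r, select.EPOLLIN)     # fd is no longer in the list
        except OSError as exc:
            failures.append(exc.errno)
    finally:
        ep.close()
        os.close(r)
        os.close(w)
    return failures

print(ctl_errors() == [errno.EEXIST, errno.ENOENT])
```

The first failure is EEXIST (the descriptor is already in the interest list) and the second is ENOENT (it is not in the list), matching the two examples in the text.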

Epoll_wait System Call

 int epoll_wait(int epfd, struct epoll_event *evlist, int maxevents, int timeout);

As with poll, epoll simply returns the current readiness of the file descriptors in the list if timeout is 0. With a nonzero timeout, epoll sleeps and only returns once at least one file descriptor is ready for an I/O operation or, for a positive timeout, once the timeout expires; -1 means wait indefinitely.

Here is the magic: unlike poll and select, as mentioned, polling a file is done by the epoll_ctl call, which only needs to be called once, when setting up the epoll instance.

Whenever any file descriptor changes state, it triggers its own wait queue and wakes up the epoll instance.

The epoll instance checks which file just woke it up and what the ready operations on that file are; if an operation is one we are interested in, it adds an epoll_event to the list of ready events it keeps.

Its job done, the epoll instance goes back to sleep. When epoll_wait is called (and this looks like the most frequently called one, called again and again on every iteration of an infinite loop), user mode does not pass much data to the kernel; it simply asks the kernel to return a number of epoll_event entries (<= maxevents) dequeued from the list of ready events that the epoll instance holds (which means events already reported to user mode are removed from the ready list).

If several changes happen on a file between calls to epoll_wait, the epoll instance finds that the file is already in the ready list, so it takes that epoll_event and updates its events bit mask.

Building on the fact that the frequency of events and the size of the ready list are usually only a tiny fraction of the total number of file descriptors being monitored, epoll updates the ready list only when an event occurs, rather than polling and then updating on every call (every iteration of an infinite loop, or in plain words, very-very-many times), and epoll returns the ready list as a list of epoll_event entries, from which we can retrieve the file that changed state through epoll_event.data.

The evlist parameter of the epoll_wait system call is a list containing only the ready files; all we need to do is walk through its elements and perform the corresponding read or write, without wading through all the files of interest as in the 2 traditional approaches.

Example of epoll:

 https://github.com/yangrujing/select-poll-epoll/blob/master/epoll/server.c

With this approach, epoll is notably more efficient than select and poll at monitoring thousands or even tens of thousands of files, especially when events are sparse relative to the total number of files monitored.
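The "only ready files come back" behaviour can be sketched with Python's select.epoll (Linux only), whose poll() method wraps epoll_wait. Five pipes stand in for five client connections, and drain_ready is a helper name chosen for this example; the value 64 for max_events is arbitrary.

```python
import os
import select

def drain_ready(ep, max_events=64):
    """One epoll_wait pass: read from every fd reported as readable."""
    received = {}
    # timeout=0 returns immediately; only ready fds appear in the result.
    for fd, events in ep.poll(0, max_events):
        if events & select.EPOLLIN:
            received[fd] = os.read(fd, 4096)
    return received

pipes = [os.pipe() for _ in range(5)]        # five monitored channels
ep = select.epoll()
for r, _ in pipes:
    ep.register(r, select.EPOLLIN)           # registered once, up front

os.write(pipes[3][1], b"only this one")      # a single, sparse event
result = drain_ready(ep)

ep.close()
for r, w in pipes:
    os.close(r)
    os.close(w)
print(result == {pipes[3][0]: b"only this one"})
```

The loop in drain_ready iterates over one entry even though five descriptors are registered, which is the whole point when the interest list is large and events are rare.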

In practice, the C10K problem, the issue that arises when a web server has 10000 concurrent connections, has always been a headache, and epoll combined with one process/thread managing many connections is an effective way to deal with it. Let me recommend the article on the C10K problem to anyone who has not read it; it is guaranteed tasty, useful, and interesting:

 http://www.kegel.com/c10k.html

Two more concepts need mentioning here, and if you have looked at the article above you have already met them: Level-Triggered and Edge-Triggered.

Level-Triggered vs Edge-Triggered

Level-Triggered and Edge-Triggered are terms borrowed from, well, I am not sure where, probably electronics, because the picture of a signal waveform describes them best.

Put simply, in the Level-Triggered model we care about the state of an object: as long as the object remains in that state, we keep caring.

With Edge-Triggered, what we care about is the change in the object's state, and that change happens only once at each edge of the state: a transition from low to high, or from high to low. It is a notification of an event we are interested in.

In short:

Level-Triggered : cares about the state
Edge-Triggered : cares about the event

The Edge-Triggered model rests on the assumption that once we know the object's state has changed and have dealt with it, further notifications about that same state are unnecessary.

Level-Triggered is like a responsible dad who drops the housework to sit and play with his kid when the kid is bored (the crying tells dad about the kid's gloomy state) and stays until the crying stops. The Edge-Triggered daddy tosses the kid an iPad when the crying starts, then leaves it at that and goes back to washing the dishes and mopping the floor.

Enough family matters; back to our I/O multiplexing story. It is easy to see that on every call, select and poll check all the file descriptors and return every one that qualifies (the ready files). If a file stays ready for reading from the start of the year to the end, then on every call select and poll check it and report once more that it is ready.
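A small sketch of this level-triggered behaviour, using Python's select.select as a thin wrapper over the select system call. The pipe and the variable names are assumptions made for the illustration: one byte is written and deliberately left unread, and select keeps reporting the file as ready until the byte is consumed.

```python
import os
import select

r, w = os.pipe()
os.write(w, b"x")            # the pipe is now readable and stays readable

reports = []
for attempt in range(3):
    # We deliberately do NOT read the byte between calls.
    readable, _, _ = select.select([r], [], [], 0)
    reports.append(readable == [r])

print(reports)               # [True, True, True]

os.read(r, 1)                # drain the pipe: the "level" drops
readable_after, _, _ = select.select([r], [], [], 0)
print(readable_after)        # []

os.close(r)
os.close(w)
```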

select and poll are two responsible dads, although for our current socket server example we need someone a little less responsible: epoll.

Looking at how epoll works, an epoll_event is added to the ready list only when an event occurs (the epoll instance is notified that a monitored file has changed its readiness state).

After a call to epoll_wait, some epoll_event entries are removed from the ready list because the program now knows about them. If a file in the list changes state only once a year, it never gets a second look from the program.

epoll has been the recommended edge-triggered poll since the Linux 2.6 kernel, but it also supports level-triggered mode, which is in fact its default. To enable edge-triggered mode, set the EPOLLET flag in the events bit mask of the corresponding file descriptor.

In level-triggered mode, as long as the operation on a file is still ready, the file keeps reappearing in the ready list. There is nothing complicated about how this is done: after a call to epoll_wait dequeues epoll_event entries from the ready list, for every file that does not have the EPOLLET flag set the epoll instance checks whether the file is still ready and, if so, adds it back to the ready list.
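The difference between the two modes can be observed with a short experiment, sketched here with Python's select.epoll (Linux only). count_reports is a hypothetical helper written for this illustration: it writes one byte to a pipe, never reads it, and counts how many consecutive epoll_wait calls report the read end as ready.

```python
import os
import select

def count_reports(flags, rounds=3):
    """Write once to a pipe, never read it, and count how many
    consecutive epoll_wait calls report the read end as ready."""
    r, w = os.pipe()
    ep = select.epoll()
    ep.register(r, flags)
    os.write(w, b"x")
    hits = 0
    for _ in range(rounds):
        if ep.poll(timeout=0):   # non-empty list means the fd was reported
            hits += 1
    ep.close()
    os.close(r)
    os.close(w)
    return hits

level = count_reports(select.EPOLLIN)                    # default mode
edge = count_reports(select.EPOLLIN | select.EPOLLET)    # edge-triggered

print("level-triggered reports:", level)   # 3
print("edge-triggered reports:", edge)     # 1
```

This is why an edge-triggered program usually reads from a non-blocking descriptor until nothing is left: the notification will not be repeated for data that is already sitting there.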

Signal-Driven I/O and Asynchronous I/O

And that is everything hidden beneath the Asynchronous Event-Driven Non-Blocking I/O slogan of NodeJS. Besides the I/O models above, Unix supports two more.

The first is signal-driven I/O, a non-blocking mechanism that uses a signal to notify the process when an operation is ready. A signal is the operating system's software interrupt mentioned in an earlier section.

Remember the kill -9 pid command often used on Unix? kill is the command that sends a signal, and the poor process with that pid has just received signal number 9, SIGKILL: killed without asking, no refusal, no exceptions. I am not sure how signal-driven I/O works in detail; I only see that it is rarely used, and its performance is said to be worse than that of epoll.
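For the curious, signal-driven I/O on Linux can be sketched roughly as follows. This is only an illustration under stated assumptions, not something taken from NodeJS or libuv: the socket pair, the on_sigio handler and the received list are made up for the example. The real pieces are the O_ASYNC file status flag, the F_SETOWN fcntl command and the SIGIO signal.

```python
import fcntl
import os
import signal
import socket
import time

receiver, sender = socket.socketpair()
received = []

def on_sigio(signum, frame):
    # The signal only says "readable now"; we still do the read ourselves.
    try:
        received.append(receiver.recv(1024))
    except BlockingIOError:
        pass

# Install the handler first: the default action for SIGIO ends the process.
signal.signal(signal.SIGIO, on_sigio)

fd = receiver.fileno()
receiver.setblocking(False)
fcntl.fcntl(fd, fcntl.F_SETOWN, os.getpid())        # deliver SIGIO to this process
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_ASYNC)  # enable signal-driven I/O

sender.send(b"ping")

# The process is free to do other work; here it just idles for up to 2 seconds.
deadline = time.monotonic() + 2
while not received and time.monotonic() < deadline:
    time.sleep(0.01)

result = list(received)
signal.signal(signal.SIGIO, signal.SIG_IGN)  # stop reacting before closing
receiver.close()
sender.close()
print(result)
```

As with epoll, the notification is about readiness: the handler still has to call recv to move the data.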

The last mechanism is called asynchronous I/O, provided through the aio_* family of calls. With asynchronous I/O we call the kernel only once. Whether the operation would block, or does not block but simply runs slowly (reading a regular file from a painfully slow old HDD, for example), whenever the kernel cannot return the data immediately we can leave it alone and go do other work. Once the operation completes, the kernel notifies us and hands the data back to user mode. I do not know how this is implemented or how well it performs, only that it is not widely used either.

Synchronous I/O vs Asynchronous I/O

In the operating system context, synchronous I/O is the mode in which a process must wait for the I/O operation to complete before it can execute the next instruction, whereas an asynchronous I/O operation requires no such waiting. By this definition, only the asynchronous I/O model qualifies as asynchronous I/O. The other methods merely avoid blocking the process until the I/O operation is ready, not until it is complete: the process still has to perform the actual read or write itself.
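The "ready" versus "complete" distinction can be illustrated with a rough sketch. The first half uses epoll (Linux only) in the readiness style. The second half only imitates the completion style: a one-thread pool is a hypothetical stand-in for the kernel and is not the aio_* interface. The pipe and the variable names are assumptions made for the example.

```python
import os
import select
from concurrent.futures import ThreadPoolExecutor

r, w = os.pipe()

# Readiness model (select/poll/epoll): the kernel only says "you may read now".
os.write(w, b"payload")
ep = select.epoll()
ep.register(r, select.EPOLLIN)
events = ep.poll(timeout=1)          # notification: fd r is ready
readiness_data = os.read(r, 1024)    # the process itself performs the read
ep.close()

# Completion model, imitated: hand over the whole operation, get the data back.
os.write(w, b"payload")
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(os.read, r, 1024)   # stand-in for an asynchronous read
    # ... the caller is free to do other work here ...
    completion_data = future.result()        # the data arrives with the result

os.close(r)
os.close(w)
print(readiness_data, completion_data)
```

In the first half the notification and the read are two separate steps taken by the process; in the second half the caller only sees the finished result.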

Conclusion

With the many concepts covered in this article, I hope that anyone still confused about event-driven, blocking I/O, non-blocking I/O, synchronous I/O and asynchronous I/O now has a clearer picture.

However, do not rigidly judge these concepts only by the explanations in this article, or by explanations you come across anywhere else, because words, after all, change their meaning over time and with context. Just as libuv advertises itself as a multi-platform support library with a focus on asynchronous I/O, even though it does not use a single call that starts with aio_.

Once again, I disclaim personal responsibility for the opinions expressed in this article.
