Before looking at threads and processes in Ruby, we need to understand two concepts: concurrency and parallelism.
Concurrency vs Parallelism
Concurrency
Concurrency means that a single process switches between tasks, using the time one task spends waiting (its idle time) to make progress on another.
For example, cooking instant noodles involves two tasks: boiling water, and opening the noodle and seasoning packets into the bowl. Instead of standing idle until the water boils, we can prepare the noodles in the meantime, which makes much better use of the time.
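The noodle example can be sketched in Ruby as follows. This is only an illustration: the event names and sleep durations are made up, and it uses the Thread class that is covered in the Thread section below.

```ruby
# Hypothetical timings: sleep stands in for "waiting", not for real work.
events = Queue.new # thread-safe queue that records the order of events

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

kettle = Thread.new do
  sleep 0.3 # waiting for the water to boil
  events << "water boiled"
end

sleep 0.1 # opening the packets while the water heats up
events << "noodles prepared"

kettle.join
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

order = Array.new(events.size) { events.pop }
puts order.inspect
puts format("elapsed: %.2f s", elapsed)
```

The noodles are ready before the water boils, and the total time is close to the longer of the two waits rather than their sum.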
Parallelism
Parallelism means performing several tasks at the same instant, which is possible when there are multiple processes and multiple CPU cores to run them.
For example, take the same noodle-cooking job, but with two people working together: one prepares the noodles while the other boils the water. Because both are working at the same time, the job finishes sooner.
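As a rough sketch of the "two cooks" idea, the snippet below gives each task to its own child process using Process.fork (covered in the Process section below). The task names are illustrative, fork is not available on Windows, and whether the two children really run at the same instant depends on how many CPU cores the machine has.

```ruby
reader, writer = IO.pipe
tasks = ["boil water", "prepare noodles"] # hypothetical task names

pids = tasks.map do |task|
  Process.fork do
    reader.close
    # Each child reports its own process id through the shared pipe.
    writer.write("#{task} done by pid #{Process.pid}\n")
  end
end

writer.close          # the parent only reads
Process.waitall       # wait for both children to finish
lines = reader.read.lines.map(&:chomp)
reader.close

puts lines
```

Each line carries a different pid, showing that the two tasks were handled by two separate processes.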
Thread vs Process
Thread
Ruby provides the Thread class for creating and managing threads. Does using more threads mean the code runs faster? Let's try the example below.
Without threads:

```ruby
require 'benchmark'

# Naive recursive Fibonacci, used here as a CPU-bound workload
def fib(n)
  return n if [0,1].include?(n)
  fib(n-1) + fib(n-2)
end

puts Benchmark.measure { 4.times { fib(32) } }

# Columns: user CPU time | system CPU time | total CPU time | (real time)
#   2.580000   0.000000   2.580000 (  2.583519)
```
With threads:

```ruby
require 'benchmark'

# Naive recursive Fibonacci, used here as a CPU-bound workload
def fib(n)
  return n if [0,1].include?(n)
  fib(n-1) + fib(n-2)
end

puts Benchmark.measure {
  threads = []
  4.times do
    # Each thread stores its result in a thread-local variable
    threads << Thread.new { Thread.current[:output] = fib(32) }
  end
  # Wait for all threads to finish
  threads.each { |thread| thread.join }
}

# Columns: user CPU time | system CPU time | total CPU time | (real time)
#   2.710000   0.020000   2.730000 (  2.726402)
```
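The execution times are almost the same. This is because threads in Ruby run concurrently rather than in parallel: in the standard interpreter (MRI/CRuby), the Global VM Lock (GVL) lets only one thread execute Ruby code at a time. A thread can report five statuses (via Thread#status):
- sleep: the thread is sleeping or waiting for I/O (for example after Thread.stop or sleep)
- run: the thread is being executed
- aborting: the thread is in the process of being terminated (for example while it is being killed)
- false: the thread terminated normally (including via Thread.exit)
- nil: the thread terminated because of an exception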
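Most of these statuses can be observed directly with Thread#status. The sketch below is a small demonstration, not part of the benchmarks; the variable names are arbitrary, and the transient aborting status is left out because it is only visible while a thread is being torn down.

```ruby
# A thread blocked in sleep (Thread.stop or blocking I/O behave the same way)
sleeper = Thread.new { sleep }
sleep 0.01 until sleeper.status == "sleep" # wait until it has actually blocked

# A thread that ends normally, here via Thread.exit
finished = Thread.new { Thread.exit }
finished.join

# A thread that dies because of an unhandled exception
failed = Thread.new do
  Thread.current.report_on_exception = false # keep the demo output quiet
  raise "boom"
end
begin
  failed.join # join re-raises the thread's exception in the caller
rescue RuntimeError
  # expected for this demo
end

statuses = {
  current:  Thread.current.status, # the thread running this line
  sleeper:  sleeper.status,
  finished: finished.status,
  failed:   failed.status
}
statuses.each { |name, status| puts "#{name}: #{status.inspect}" }

sleeper.kill # clean up the sleeping thread
```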
When several threads exist at the same time, Ruby executes only one of them at any moment. It switches to another thread when the running thread blocks (for example on sleep or I/O, so its status is no longer run) or when its time slice is used up.
So threads are really effective only when the program spends time waiting on I/O, such as requests to another server, database queries, reading data from disk, … or simply calls sleep. For example:
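The effect of switching away from a blocked thread can be seen in a minimal sketch like the one below. The fake_request helper is hypothetical: it uses sleep as a stand-in for network or disk latency.

```ruby
# Hypothetical helper: sleep stands in for waiting on a server or a disk.
def fake_request(id)
  sleep 0.2
  "response #{id}"
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Start four threads, then collect each block's return value.
# Thread#value waits for the thread to finish before returning.
responses = 4.times.map { |id| Thread.new { fake_request(id) } }.map(&:value)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts responses.inspect
puts format("elapsed: %.2f s", elapsed)
```

Because the four waits overlap, the elapsed time should be close to a single 0.2 s wait rather than the 0.8 s a sequential loop would need.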
Without threads:

```ruby
require 'benchmark'

def fib(n)
  return n if [0,1].include?(n)
  fib(n-1) + fib(n-2)
end

# sleep(1) simulates one second of waiting (for example on I/O) per task
puts Benchmark.measure { 4.times { fib(32); sleep(1) } }

# Columns: user CPU time | system CPU time | total CPU time | (real time)
#   2.650000   0.000000   2.650000 (  6.657430)
```
With threads:

```ruby
require 'benchmark'

def fib(n)
  return n if [0,1].include?(n)
  fib(n-1) + fib(n-2)
end

puts Benchmark.measure {
  threads = []
  4.times do
    # sleep(1) simulates one second of waiting (for example on I/O) per task
    threads << Thread.new { Thread.current[:output] = fib(32); sleep(1) }
  end
  threads.each { |thread| thread.join }
}

# Columns: user CPU time | system CPU time | total CPU time | (real time)
#   2.730000   0.010000   2.740000 (  3.738425)
```

The four one-second sleeps now overlap, so the real time drops from about 6.66 s to about 3.74 s while the CPU time stays roughly the same.
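The benchmarks store each result in Thread.current[:output] but never read it back. As a small sketch (with smaller inputs so it runs quickly), the results can be collected either through the thread-local variable or through Thread#value, which returns the value of the thread's block.

```ruby
def fib(n)
  return n if n < 2
  fib(n - 1) + fib(n - 2)
end

threads = 4.times.map do |i|
  Thread.new { Thread.current[:output] = fib(20 + i) }
end
threads.each(&:join)

# Option 1: read the thread-local variable set inside each thread
outputs = threads.map { |thread| thread[:output] }

# Option 2: Thread#value joins the thread and returns the block's last value
values = threads.map(&:value)

puts outputs.inspect
```

Thread#value is usually the shorter choice when the block's last expression is already the result.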
Process
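Separate processes can run in parallel: each process has its own interpreter, so the operating system can schedule them on different CPU cores. Running CPU-bound work in several processes at once therefore makes the program finish sooner. For example: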
Without extra processes:

```ruby
require 'benchmark'

def fib(n)
  return n if [0,1].include?(n)
  fib(n-1) + fib(n-2)
end

puts Benchmark.measure { 16.times { fib(31) } }

# Columns: user CPU time | system CPU time | total CPU time | (real time)
#   6.400000   0.000000   6.400000 (  6.402474)
```
With 8 processes:

```ruby
require 'benchmark'

def fib(n)
  return n if [0,1].include?(n)
  fib(n-1) + fib(n-2)
end

puts Benchmark.measure {
  # Pipe used by the child processes to send results back to the parent
  read_stream, write_stream = IO.pipe
  8.times do
    Process.fork do
      2.times { write_stream.puts fib(31) }
    end
  end
  Process.waitall          # wait for every child to exit
  write_stream.close
  results = read_stream.read
  read_stream.close
}

# Columns: user CPU time | system CPU time | total CPU time | (real time)
# The total column also includes the CPU time of the child processes.
#   0.010000   0.000000  13.280000 (  1.805481)
```
With multiple processes the code finishes much sooner. But does adding more processes always make it faster? Let's try another example.
Splitting the work across 16 processes:

```ruby
require 'benchmark'

def fib(n)
  return n if [0,1].include?(n)
  fib(n-1) + fib(n-2)
end

puts Benchmark.measure {
  read_stream, write_stream = IO.pipe
  16.times do
    # One fib(31) call per child process this time
    Process.fork do
      write_stream.puts fib(31)
    end
  end
  Process.waitall
  write_stream.close
  results = read_stream.read
  read_stream.close
}

# Columns: user CPU time | system CPU time | total CPU time | (real time)
#   0.040000   0.080000  13.910000 (  1.858237)
```
The result is almost the same as with 8 processes. So more processes do not necessarily make the code run faster; the speedup also depends on the number of CPU cores.
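One way to act on this is to size the number of worker processes from the core count. The sketch below is an assumption-laden illustration, not the benchmark above: it uses Etc.nprocessors from the standard library, a small made-up list of inputs, and a simple "n result" line format to send results back through the pipe.

```ruby
require 'etc'

def fib(n)
  return n if n < 2
  fib(n - 1) + fib(n - 2)
end

inputs = (15..22).to_a # hypothetical workload, kept small so it runs quickly

# Never start more workers than there are cores or inputs.
worker_count = [Etc.nprocessors, inputs.size].min
slice_size = (inputs.size.to_f / worker_count).ceil
slices = inputs.each_slice(slice_size).to_a # at most worker_count slices

read_stream, write_stream = IO.pipe
slices.each do |slice|
  Process.fork do
    read_stream.close
    # One short write per result, so lines from different children stay intact
    slice.each { |n| write_stream.write("#{n} #{fib(n)}\n") }
  end
end

write_stream.close # the parent only reads
output = read_stream.read
read_stream.close
Process.waitall

# Rebuild an { input => result } hash from the "n result" lines
results = output.lines.map { |line| line.split.map(&:to_i) }.to_h
puts results.sort.inspect
```

Here the parent reads the pipe before calling Process.waitall, which avoids blocking the children if they ever produce more output than the pipe buffer can hold.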