The Ruby interpreter runs your script in a single process by design. This means that on a modern 8-core CPU, your script will use only 1/8th of the available processing power at best. In this article, we’ll take a closer look at multi-threaded and multi-process Ruby.
Using multiple threads
Matz's Ruby Interpreter (MRI) uses a GIL (Global Interpreter Lock), so it lets only one thread run Ruby code at a time. For CPU-bound tasks, using multiple threads therefore yields no benefit (in Ruby).
First, let’s take this example that calculates a Fibonacci number. It’s a purely CPU-bound task.
```ruby
def fib(n)
  return n if [0, 1].include?(n)
  fib(n - 1) + fib(n - 2)
end
```
```ruby
require 'benchmark'
Benchmark.measure { 10.times { fib(35) } }
```
```
(CPU time | system CPU time | user and system CPU times | real time)
 38.243695   0.647830  38.891525 ( 41.074481)
 36.667084   0.550266  37.217350 ( 38.464907)
 38.844508   0.711785  39.556293 ( 42.610056)
=> AVG: 40.72s
```
Let’s run it now using 10 threads.
```ruby
Benchmark.measure do
  threads = []
  10.times do
    threads << Thread.new { Thread.current[:output] = fib(35) }
  end
  threads.each { |thread| thread.join }
end
```
In an ideal world, we’d hope for a tenfold performance increase. But:
```
 38.623686   0.611559  39.235245 ( 40.751415)
 38.077194   0.579472  38.656666 ( 39.956344)
 38.445872   0.603536  39.049408 ( 40.273643)
=> AVG: 40.33s
```
So, what’s the benefit of using threads then?
The answer is: none, if you’re trying to solve a CPU-bound problem.
But if you’re trying to solve an IO-bound problem, threads will speed things up a lot, because a thread that is waiting on IO lets other threads run.
Example: Performing HTTP requests with multiple threads
Imagine a scenario where we have a method that checks whether it can reach some websites and prints the HTTP status code of each response.
```ruby
require 'benchmark'
require 'net/http'

def check(servers)
  servers.each do |server|
    response = Net::HTTP.get_response(server, '/')
    puts server, response.code
  end
end

SERVERS = Array.new(100, "www.google.com")

puts Benchmark.measure { check(SERVERS) }
```
```
ruby thread.rb
  0.078843   0.046223   0.125066 ( 27.263542)
```
Now, let’s rewrite the code so that each request runs in its own thread.
```ruby
def check(servers)
  threads = []
  servers.each do |server|
    threads << Thread.new {
      response = Net::HTTP.get_response(server, '/')
      puts server, response.code
    }
  end

  threads.each { |thread| thread.join }
end
```
```
  0.094302   0.038597   0.132899 (  1.383422)
```
That is a huge improvement over our single-threaded implementation. On the plus side, running multiple threads does not multiply memory usage the way running multiple processes does.
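The same effect can be reproduced without network access. The following is a minimal sketch, not a benchmark: it assumes that a short sleep is a fair stand-in for a blocking call, and the method names slow_io, run_sequential and run_threaded are made up for illustration.

```ruby
require 'benchmark'

# Hypothetical stand-in for a blocking call such as an HTTP request.
def slow_io(id)
  sleep 0.1
  id
end

# Runs the calls one after another.
def run_sequential(count)
  (1..count).map { |id| slow_io(id) }
end

# Runs each call in its own thread. Thread#value joins the thread
# and returns the result of its block.
def run_threaded(count)
  (1..count).map { |id| Thread.new { slow_io(id) } }.map(&:value)
end

puts "sequential: #{Benchmark.realtime { run_sequential(10) }.round(2)}s"
puts "threaded:   #{Benchmark.realtime { run_threaded(10) }.round(2)}s"
```

Because a sleeping thread releases the interpreter lock, the threaded run should take roughly as long as one call rather than ten.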
Benefits
- Speedup for blocking operations
- Variables can be shared and modified (beware of race conditions and deadlocks)
- Very little extra memory used
Cons
- Much harder to debug
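Sharing variables between threads is where most of the debugging pain comes from. As a rough sketch of the usual remedy, the example below guards a shared counter with a Mutex from Ruby's core library. The method name count_with_mutex and the numbers are made up for illustration.

```ruby
# Increments a shared counter from several threads.
# The Mutex makes each read-modify-write step atomic.
def count_with_mutex(thread_count, increments)
  counter = 0
  lock = Mutex.new
  threads = Array.new(thread_count) do
    Thread.new do
      increments.times do
        lock.synchronize { counter += 1 }
      end
    end
  end
  threads.each(&:join)
  counter
end

puts count_with_mutex(10, 1_000) # prints 10000
```

Holding the lock only around the shared update keeps the critical section small, which also lowers the chance of deadlocks.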
Using multiple processes
Recall our Fibonacci implementation from the threads section.
```ruby
def fib(n)
  return n if [0, 1].include?(n)
  fib(n - 1) + fib(n - 2)
end
```
```ruby
Benchmark.measure { 10.times { fib(35) } }
```
```
(CPU time | system CPU time | user and system CPU times | real time)
 38.243695   0.647830  38.891525 ( 41.074481)
 36.667084   0.550266  37.217350 ( 38.464907)
 38.844508   0.711785  39.556293 ( 42.610056)
=> AVG: 40.72s
```
We’ll now try to run this with multiple processes instead of threads.
The rewritten benchmark looks like this:
```ruby
Benchmark.measure {
  read_stream, write_stream = IO.pipe
  10.times do
    Process.fork do
      write_stream.puts fib(35)
    end
  end
  Process.waitall
  write_stream.close
  results = read_stream.read
  read_stream.close
}
```
Now, let’s see the benchmark.
```
  0.001240   0.005190  63.827237 ( 17.158324)
  0.001579   0.007635  65.032995 ( 19.821757)
  0.001433   0.006900  64.022068 ( 18.152649)
=> AVG: 18.38s
```
Compared to about 40 seconds, this run takes about 18 seconds, which is a great improvement.
Note the memory usage, though: this implementation uses about 10 times as much memory. That is the tradeoff.
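The fork example reads the children's output into results but never uses it. Below is a minimal sketch of turning that output into integers. It assumes a platform where fork is available (so not Windows), the method name parallel_fib is made up for illustration, and the inputs are smaller so that it runs quickly. Unlike the benchmark code, it reads from the pipe before waiting for the children, so a child would not block on a full pipe if the output were large.

```ruby
def fib(n)
  return n if [0, 1].include?(n)
  fib(n - 1) + fib(n - 2)
end

# Forks one child per input and returns the results as integers.
# Assumes a platform that supports fork.
def parallel_fib(inputs)
  read_stream, write_stream = IO.pipe
  inputs.each do |n|
    Process.fork do
      read_stream.close          # the child only writes
      write_stream.puts fib(n)
    end
  end
  write_stream.close             # the parent only reads
  output = read_stream.read      # returns once every child has closed its write end
  read_stream.close
  Process.waitall                # reap the finished children
  output.split.map(&:to_i)
end

p parallel_fib([20, 20, 20])
```

Results arrive in the order the children finish, not in input order, so a real implementation would likely tag each line with its input or use one pipe per child.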
Benefits
- Speedup through multiple CPUs
- Speedup for blocking operations
- Variables are protected from change (each process works on its own copy)
- Child processes are killed when your main process is killed through Ctrl+c or kill -2
Cons
- Memory usage will be higher.
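The point about variables being protected from change can be illustrated with a small sketch. It assumes a platform that supports fork, and both method names are made up for illustration. A forked child works on its own copy of the parent's memory, while a thread works on the same objects as the code that started it.

```ruby
# Returns the parent's view of an array after a child process
# appended to its own copy.
def value_after_child_modifies
  data = [1, 2, 3]
  pid = Process.fork do
    data << 4                  # changes the child's copy only
  end
  Process.wait(pid)
  data
end

# Returns the array after a thread appended to the shared object.
def value_after_thread_modifies
  data = [1, 2, 3]
  Thread.new { data << 4 }.join
  data
end

p value_after_child_modifies   # prints [1, 2, 3]
p value_after_thread_modifies  # prints [1, 2, 3, 4]
```

This isolation is also why the fork-based benchmark needs a pipe to get results back to the parent.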
Summary
It’s best described in this excellent article from Eqbal Quran.
In conclusion, there is no be-all and end-all answer as to which approach is best. We have to understand the workload and choose the solution that fits our problem.