Ruby Performance – Profiling to Optimize Ruby Performance – Part 4

Tram Ho

In the previous parts, we learned techniques for optimizing Ruby performance. But what if those techniques don’t work? The answer is profiling.

Profiling is the only way to answer the question: what makes the code slow? In this article, we will explore how profiling can pinpoint the things that slow a program down.

This article has two main parts: first, measuring memory and CPU usage; second, interpreting those measurements to show where the code is slow.

The tool used for profiling is ruby-prof. It measures the execution time of the program and breaks it down by the individual functions the program calls.

After you take measurements with ruby-prof, you can visualize them with the printers built into ruby-prof or with KCachegrind.

Also, we need one rule for CPU profiling. The first and only rule of a successful CPU profile is: disable the garbage collector.

Measuring with ruby-prof

ruby-prof is a Ruby gem that includes an API and a command-line tool. In Rails, we can profile a request by adding ruby-prof to the middleware stack.

1. ruby-prof API

For example, we can pass a block to ruby-prof:

ruby-prof has several printer types for outputting the results; above we used FlatPrinter, which shows the total execution time of each function. With it you can easily see which functions consume the most CPU time.

When run, we get a profile result like the following:

# ruby_prof_example_api1_profile.txt

Thread ID: 70237499659060

Fiber ID: 70237507374720

Total: 0.001111

Sort by: self_time

 %self      total     self     wait    child    calls  name
 59.98      0.001    0.001    0.000    0.000        2  Regexp#match
 28.96      0.001    0.000    0.000    0.001        1  Class::Date#parse
  4.72      0.001    0.000    0.000    0.001        1  Global#[No method]
  3.81      0.000    0.000    0.000    0.000        1  String#gsub!
  1.22      0.000    0.000    0.000    0.000        1  MatchData#begin
  0.67      0.000    0.000    0.000    0.000        1  String#[]=
  0.38      0.000    0.000    0.000    0.000        1  Fixnum#div
  0.25      0.000    0.000    0.000    0.000        1  MatchData#end

The code above spends about 60% of its time matching dates against regular expressions. Most of the remaining time is spent in Date#parse, presumably initializing the Date object.

Now, let’s learn how to use the command line.

2. The ruby-prof command line tool

We have the program:
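The listing is missing from this copy; a minimal stand-in consistent with the profile below (which is dominated by require and Date parsing) might be:

```ruby
# A hypothetical one-file program to profile from the command line:
# it loads the date library and parses a single date string.
require 'date'

Date.parse('2014-05-23')
```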

We run the ruby-prof command to profile the program:

The -p option specifies the output format of the profile, and -m hides all functions that take less than a given percentage of the total time. In this case, we don’t care about functions that account for less than 1%.

We have the result:

Thread ID: 69883126035220

Fiber ID: 69883132260680

Total: 0.002094

Sort by: self_time

 %self      total     self     wait    child    calls  name
 17.37      0.001    0.000    0.000    0.001        3  Kernel#gem_original_require
 16.97      0.000    0.000    0.000    0.000        2  Regexp#match
 10.85      0.001    0.000    0.000    0.000        1  Class::Date#parse
  2.36      0.000    0.000    0.000    0.000      113  Module#method_added
  1.83      0.000    0.000    0.000    0.000        6  IO#set_encoding
  1.57      0.000    0.000    0.000    0.000        1  String#gsub!
  1.23      0.000    0.000    0.000    0.000        6  MonitorMixin#mon_enter
  1.16      0.002    0.000    0.000    0.002        2  Global#[No method]
  1.12      0.000    0.000    0.000    0.000        3  Array#each
  1.07      0.000    0.000    0.000    0.000       49  BasicObject#singleton_method_added
  1.03      0.000    0.000    0.000    0.000        6  MonitorMixin#mon_exit

With the above profile, we can track which calls take the most time across the whole program. In this case it is Kernel#gem_original_require, so we can also see the cost of the require 'date' call. Functions related to class and instance initialization appear as well. Not surprisingly, for a simple program like this, initialization takes quite a lot of time compared to the code we actually care about.

A profile created with the ruby-prof command-line tool is helpful if you care about optimizing program startup and gem loading time. In other cases it adds a lot of noise to the results, and you should prefer the ruby-prof API.

3. Rails Middleware

The ruby-prof API works best when you want to profile part of a Rails application, such as a controller action or the time to render a template. But if you want to look at middleware, routing, or controller instantiation, you need to install ruby-prof as a middleware.

To do this, add the ruby-prof gem to the Gemfile: gem 'ruby-prof'

Add ruby-prof Rack Adapter to middleware, in config/application.rb
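The configuration line is missing from this copy; based on the Rack adapter that ships with ruby-prof, it would look something like this (the output path is an assumption):

```ruby
# config/application.rb
config.middleware.use Rack::RubyProf, :path => 'tmp/profile'
```

ruby-prof will then save a report for each profiled request under that path.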

The use call appends the profiler to the bottom of the middleware stack, so the profiler covers your Rails application code but bypasses the rest of the middleware. If you want to profile the middleware too, you need to insert the profiler before Rack::Runtime.

With the Rack::RubyProf middleware loaded, the report will be saved to the path you configured above.

Don’t forget to disable GC before Rack::RubyProf runs. The best way to do this is to write a small middleware and prepend it to the profiler in the stack.
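As a sketch, a hypothetical middleware that disables GC around each request might look like this (the class name and insertion point are assumptions):

```ruby
# Middleware that turns off GC for the duration of a request
class GCDisabler
  def initialize(app)
    @app = app
  end

  def call(env)
    GC.disable
    @app.call(env)
  ensure
    GC.enable
  end
end
```

It would then be inserted above the profiler, for example with config.middleware.insert_before(Rack::RubyProf, GCDisabler) in config/application.rb.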

Visualize with Ruby-Prof Printer

Ruby-prof can generate many kinds of easy-to-read reports. The three most common are flat, call graph, and call stack, each with a different purpose. We will start with a more complex program and apply each type of printer to it:

The above program generates a random CSV with a list of people and their birth dates, parses it, and finds the youngest person in the list. We will print the three reports described above, limiting the output to functions that take more than 3% of the time.

Flat report: Find functions that run slowly

The flat profile displays the total time each function runs. For example, with the above program, we have:

Thread ID: 70137029546800

Fiber ID: 70137039955760

Total: 1.882948

Sort by: self_time

 %self      total     self     wait    child    calls  name
 27.33      1.354    0.515    0.000    0.839   150000  Object#parse_col
 22.31      0.806    0.420    0.000    0.386    50000  Class::Date#parse
  8.59      0.162    0.162    0.000    0.000   100000  Regexp#match
  5.11      1.707    0.096    0.000    1.611    50000  Object#parse_row
  4.79      1.797    0.090    0.000    1.707    50002  Array#map!
  4.66      0.088    0.088    0.000    0.000    50001  String#split
  4.46      0.084    0.084    0.000    0.000    50000  String#gsub!


%self: the percentage of total time spent in this function.

total: the total time spent in this function, including time spent in the functions it calls.

self: the time spent in the function itself, excluding the functions it calls.

wait: the time spent waiting for other threads. It is always 0 in a single-threaded application.

child: the time spent in functions called from this function.

calls: the total number of times the function was called.

Flat reports are organized by time. So the functions at the top are the most time-consuming functions.

Note that the self time in the report is the total across all calls to that function, regardless of where it is called from. For example, String#split is called once directly from parse_data and 50,000 times in the parse_row loop; the report shows String#split with 50,001 calls in total.

Please note that self time does not include time spent in nested function calls. For example, Object#parse_col’s self time includes neither the regular expression matching nor the Date parsing. Yet even though parse_col itself is just an if-else expression, it still accounts for a considerable amount of time.

We know that the function at the top is the slowest, so it should be optimized first. But that is not always straightforward. You cannot simply optimize the function with the largest self time, for two reasons:

  • The profiler can report a large self time for a function even when there is little in it to speed up. For example, Object#parse_col does almost nothing itself; its self time comes entirely from evaluating the if-else expression.
  • The slowest function may be a library function, or be simple enough that you cannot optimize it. For example, Array#map! is the third slowest function, but it belongs to the standard library and we cannot fix it directly.

Both of these cases come down to the number of function calls rather than the code inside the function. Therefore, the most effective optimization is to reduce the number of calls, and for that you need to know where and how often each function is called. The flat report does not show this, so you need another report: the graph report.

Graph report: Understand which parts of the code are slow

Graph reports are organized by total time, that is, a function’s own run time plus the time spent in the functions it calls.

Thread ID: 70137029546800

Fiber ID: 70137039955760

Total Time: 1.8829480049999994

Sort by: total_time

 %total   %self      total     self     wait    child            calls  name
------------------------------------------------------------------------------------
 100.00%   0.01%     1.883    0.000    0.000    1.883                1  Global#[No method]
                     1.777    0.000    0.000    1.777              1/1    Object#parse_data
                     0.106    0.000    0.000    0.106              1/1    Object#find_youngest
------------------------------------------------------------------------------------
                     0.000    0.000    0.000    0.000      50000/50002    Object#parse_row
                     0.031    0.031    0.000    0.000          1/50002    Object#find_youngest
                     1.767    0.060    0.000    1.707          1/50002    Object#parse_data
  95.46%   4.79%     1.797    0.090    0.000    1.707            50002  Array#map!
                     1.707    0.096    0.000    1.611      50000/50000    Object#parse_row
                     1.354    0.515    0.000    0.839    150000/150000    Object#parse_col
------------------------------------------------------------------------------------
                     1.777    0.000    0.000    1.777              1/1    Global#[No method]
  94.39%   0.00%     1.777    0.000    0.000    1.777                1  Object#parse_data
                     1.767    0.060    0.000    1.707          1/50002    Array#map!
                     0.010    0.010    0.000    0.000          1/50001    String#split
------------------------------------------------------------------------------------
                     1.707    0.096    0.000    1.611      50000/50000    Array#map!
  90.67%   5.11%     1.707    0.096    0.000    1.611            50000  Object#parse_row
                     0.077    0.077    0.000    0.000      50000/50001    String#split
                     0.000    0.000    0.000    0.000      50000/50002    Array#map!
------------------------------------------------------------------------------------
                     1.354    0.515    0.000    0.839    150000/150000    Array#map!
  71.88%  27.33%     1.354    0.515    0.000    0.839           150000  Object#parse_col
                     0.806    0.420    0.000    0.386      50000/50000    Class::Date#parse
                     0.033    0.033    0.000    0.000      50000/50000    String#to_i
------------------------------------------------------------------------------------
                     0.806    0.420    0.000    0.386      50000/50000    Object#parse_col
  42.79%  22.31%     0.806    0.420    0.000    0.386            50000  Class::Date#parse
                     0.162    0.162    0.000    0.000    100000/100000    Regexp#match
                     0.084    0.084    0.000    0.000      50000/50000    String#gsub!
                     0.046    0.046    0.000    0.000      50000/50000    String#[]=
                     0.039    0.039    0.000    0.000      50000/50000    MatchData#begin
                     0.031    0.031    0.000    0.000      50000/50000    Fixnum#div
                     0.024    0.024    0.000    0.000      50000/50000    MatchData#end
------------------------------------------------------------------------------------
                     0.162    0.162    0.000    0.000    100000/100000    Class::Date#parse
   8.59%   8.59%     0.162    0.162    0.000    0.000           100000  Regexp#match
------------------------------------------------------------------------------------
                     0.106    0.000    0.000    0.106              1/1    Global#[No method]
   5.61%   0.00%     0.106    0.000    0.000    0.106                1  Object#find_youngest
                     0.075    0.000    0.000    0.075              1/1    Enumerable#max
                     0.031    0.031    0.000    0.000          1/50002    Array#map!
------------------------------------------------------------------------------------
                     0.010    0.010    0.000    0.000          1/50001    Object#parse_data
                     0.077    0.077    0.000    0.000      50000/50001    Object#parse_row
   4.66%   4.66%     0.088    0.088    0.000    0.000            50001  String#split
------------------------------------------------------------------------------------
                     0.084    0.084    0.000    0.000      50000/50000    Class::Date#parse
   4.46%   4.46%     0.084    0.084    0.000    0.000            50000  String#gsub!
------------------------------------------------------------------------------------
                     0.075    0.000    0.000    0.075              1/1    Object#find_youngest
   3.98%   0.00%     0.075    0.000    0.000    0.075                1  Enumerable#max
                     0.075    0.035    0.000    0.040              1/1    Array#each
------------------------------------------------------------------------------------
                     0.075    0.035    0.000    0.040              1/1    Enumerable#max
   3.98%   1.87%     0.075    0.035    0.000    0.040                1  Array#each
                     0.040    0.040    0.000    0.000      49999/49999    Date#<=>

The meaning of the columns is similar to the flat report, but each section of the graph centers on one function: its callers are listed directly above it and its callees directly below.

The second section, for Array#map!, requires more effort to read. The function’s own line sits in the middle, with its three callers above and its two callees below.

The calls column shows that Array#map! was called 50,002 times: 50,000 times from Object#parse_row, once from Object#find_youngest, and once from Object#parse_data.

The time columns mean something slightly different on caller and callee lines: each one shows the total, self, and child time that Array#map! consumed for the calls coming from (or going to) that particular function.

The report gives us two kinds of useful information. First, we get a good overview of where the program spends its time: for example, almost 90% of it goes to processing the CSV data. Second, we can trace the profile from top to bottom to figure out what to optimize. For example, following the total time we see that Array#map! calls Object#parse_row, which in turn calls String#split, and String#split has quite a lot of self time. So one way to optimize is to reduce the number of String#split calls or replace them with other functions.


Source : Viblo