I subscribe to the Chef blog in my RSS reader, The Old Reader, so that I can stay up to date with recent releases and changes happening in the community and in the various software projects that make up, “Chef.” Recently, I saw the release announcement for chef-client 12.6.0. I scrutinize these especially closely, as they’re an easy way to follow along with major upcoming changes that I might encounter.
One particular feature that stuck out in the release notes was a new option,
New –profile-ruby option
You like to go fast and so do we. The ‘chef’ gem now has new optional development support for profiling Ruby. When invoked, this option will dump a profiling graph that can be used to determine and resolve performance bottlenecks with chef-client runs.
It turns out the new feature, added by Lamont Granquist in this initial commit, uses the ruby-prof gem to generate a report that shows where time was spent during the Chef run. After creating a pull request to add a configuration option in Test-Kitchen for the profiling feature, I took a look at where most time was spent in a Chef run converging the elasticsearch cookbook.
Using my patched version of test-kitchen, I enabled the profiling option for the chef_zero provisioner:
provisioner: name: chef_zero profile_ruby: true
Then, I ran a fresh
kitchen test, and saw the following message in the output:
Chef Client finished, 23/56 resources updated in 01 minutes 20 seconds [2016-01-02T17:50:33+00:00] WARN: Ruby execution profile dumped to /tmp/kitchen/cache/graph_profile.out Finished converging <package-ubuntu-1404> (1m34.58s).
And here’s what I found at the top of
/tmp/kitchen/cache/graph_profile.out on that instance:
Measure Mode: wall_time Thread ID: 13693940 Fiber ID: 24968380 Total Time: 80.0750961303711 Sort by: total_time
Given that the entire run took slightly more than 1.5 minutes, 80 seconds seems like a very reasonable number for the total timing. Below that header, there’s also specific details showing how much time was spent in each method. I had to pull some of the explanation from the ruby-prof gem to really understand what I was seeing:
Dashed lines divide the report into entries, with one entry per method. Entries
are sorted by total time which is the time spent in the method plus its
Each entry has a primary line demarked by values in the %total and %self
columns. The primary line represents the method being profiled. Lines above it
are the methods that called this method (parents) while the lines below it are
the methods it called (children).
In my case, here’s what the logs showed:
64.619 0.000 0.000 64.619 1/1 Chef::Provider::RemoteFile::Content#grab_file_from_uri 80.70% 0.00% 64.619 0.000 0.000 64.619 1 Chef::Provider::RemoteFile::HTTP#fetch 64.473 0.000 0.000 64.472 1/1 Chef::HTTP#streaming_request 0.145 0.000 0.000 0.145 1/1 Chef::Provider::RemoteFile::HTTP#update_cache_control_data 0.001 0.000 0.000 0.001 1/1 Chef::Provider::RemoteFile::HTTP#headers 0.000 0.000 0.000 0.000 1/1 Chef::Provider::RemoteFile::HTTP#http_client_opts 0.000 0.000 0.000 0.000 1/30899 Class#new 0.000 0.000 0.000 0.000 1/15 Tempfile#close 0.000 0.000 0.000 0.000 1/1 Chef::HTTP#last_response
In short, 80% of my chef run was spent in
Chef::Provider::RemoteFile::HTTP#fetch, almost 65 seconds worth. This makes sense, since the cookbook’s longest task is downloading the package file to install it. While this may seem trivial, it’s hugely useful to see where the time is being spent, especially given that the only other way to do this currently is timestamps.
Ruby-prof also supports fancy HTML visualizations of time spent in the call stack, though these are not supported in the chef-client (yet!):
Kudos to Lamont on this cool feature!