Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 3 additions & 18 deletions doc/jit/yjit.md
Original file line number Diff line number Diff line change
Expand Up @@ -525,23 +525,8 @@ PERF=record ruby --yjit-perf=codegen -Iharness-perf benchmarks/lobsters/benchmar

# Aggregate results
perf script > /tmp/perf.txt
../ruby/misc/jit_perf.py /tmp/perf.txt
../ruby/misc/jit_perf.rb /tmp/perf.txt
```

#### Building perf with Python support

The above instructions work fine for most people, but you could also use
a handy `perf script -s` interface if you build perf from source.

```bash
# Build perf from source for Python support
sudo apt-get install libpython3-dev python3-pip flex libtraceevent-dev \
libelf-dev libunwind-dev libaudit-dev libslang2-dev libdw-dev
git clone --depth=1 https://github.com/torvalds/linux
cd linux/tools/perf
make
make install

# Aggregate results
perf script -s ../ruby/misc/jit_perf.py
```
This aggregation script reads the text output from `perf script`, so it does not require `perf`
to be built with scripting support.
56 changes: 55 additions & 1 deletion doc/jit/zjit.md
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,8 @@ ZJIT options:
--zjit-stats[=quiet]
Enable collecting ZJIT statistics (=quiet to suppress output).
--zjit-disable Disable ZJIT for lazily enabling it with RubyVM::ZJIT.enable.
--zjit-perf Dump ISEQ symbols into /tmp/perf-{}.map for Linux perf.
--zjit-perf[=iseq|hir]
Dump symbols for Linux perf /tmp/perf-{}.map (default: iseq).
--zjit-log-compiled-iseqs=path
Log compiled ISEQs to the file. The file will be truncated.
--zjit-trace-exits[=counter]
Expand All @@ -156,6 +157,59 @@ ZJIT options:
$
```

### Profiling with Linux perf

`--zjit-perf` allows you to profile JIT-ed methods along with other native functions using Linux perf.
When you run Ruby with `perf record`, perf looks up `/tmp/perf-{pid}.map` to resolve symbols in JIT code,
and this option lets ZJIT write ISEQ symbols into that file.

#### Call graph

Here's an example way to use this option with [Firefox Profiler](https://profiler.firefox.com)
(See also: [Profiling with Linux perf](https://profiler.firefox.com/docs/#/./guide-perf-profiling)):

```bash
# Compile the interpreter with frame pointers enabled
./configure --enable-zjit --prefix=$HOME/.rubies/ruby-zjit --disable-install-doc cflags=-fno-omit-frame-pointer
make -j && make install

# [Optional] Allow running perf without sudo
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

# Profile Ruby with --zjit-perf
cd ../ruby-bench
PERF="record --call-graph fp" ruby --zjit-perf -Iharness-perf benchmarks/liquid-render/benchmark.rb

# View results on Firefox Profiler https://profiler.firefox.com.
# Create /tmp/test.perf as below and upload it using "Load a profile from file".
perf script --fields +pid > /tmp/test.perf
```

#### ZJIT HIR

You can also profile the number of cycles consumed by code generated from each kind of HIR instruction.

```bash
# Install perf
apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`

# [Optional] Allow running perf without sudo
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

# Profile Ruby with --zjit-perf=hir
cd ../ruby-bench
PERF=record ruby --zjit-perf=hir -Iharness-perf benchmarks/lobsters/benchmark.rb

# Aggregate results
perf script > /tmp/perf.txt
../ruby/misc/jit_perf.rb /tmp/perf.txt
```

This aggregation script reads the text output from `perf script`, so it does not require `perf`
to be built with scripting support.

### Source level documentation

You can generate and open the source level documentation in your browser using:
Expand Down
116 changes: 0 additions & 116 deletions misc/jit_perf.py

This file was deleted.

166 changes: 166 additions & 0 deletions misc/jit_perf.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,166 @@
#!/usr/bin/env ruby

class JITPerf
INTERPRETER_SYMBOLS = [
# rb_* entry points
"rb_call0",
"rb_funcallv_scope",
"rb_yield",

# VM helpers without rb_vm_/vm_ prefixes
"callable_method_entry_or_negative",
"invoke_block_from_c_bh",
"setup_parameters_complex",
].freeze

def initialize
@total_cycles = 0
@category_cycles = Hash.new(0)
@detailed_category_cycles = Hash.new { |hash, category| hash[category] = Hash.new(0) }
@categories = {}
end

def read(path)
File.foreach(path).with_index(1) do |line, lineno|
next if line.strip.empty?

process_event(parse_line(line))
rescue ArgumentError => error
abort "#{path}:#{lineno}: #{error.message}"
end
rescue SystemCallError => error
abort "#{path}: #{error.message}"
end

def print_report
return if @total_cycles == 0

puts "Aggregated Event Data:"
puts format("%-20s %-50s %20s %15s", "[dso]", "[symbol or category]", "[top-most cycle ratio]", "[num cycles]")

most_common(@category_cycles).each do |category, cycles|
ratio = cycles.to_f / @total_cycles * 100
dsos = @detailed_category_cycles[category].each_key.map(&:first).uniq
dso_display = dsos.length == 1 ? dsos.first : "Multiple DSOs"
puts format("%-20s %-50s %20.2f%% %15d", dso_display, truncate_symbol(category), ratio, cycles)
end

most_common(@category_cycles).each do |category, _cycles|
next unless @categories.key?(category)

symbols = @detailed_category_cycles[category]
category_total = symbols.values.sum
category_ratio = category_total.to_f / @total_cycles * 100

puts
puts format("Category: %s (%.2f%%)", category, category_ratio)
puts format("%-20s %-50s %20s %15s", "[dso]", "[symbol]", "[top-most cycle ratio]", "[num cycles]")

most_common(symbols).each do |(dso, symbol), cycles|
symbol_ratio = cycles.to_f / category_total * 100
puts format("%-20s %-50s %20.2f%% %15d", dso, truncate_symbol(symbol), symbol_ratio, cycles)
end
end
end

private

def parse_line(line)
# Example:
# ruby 78207 3482.848465: 1212775 cpu_core/cycles:P/: 5c0333f682e1 [JIT] getlocal_WC_0+0x0 (/tmp/perf-78207.map)
#
# Split into command, pid, timestamp, period, event, ip, and the remaining
# "symbol (dso)" text. The final field is kept intact because symbols can
# contain spaces.
fields = line.split(nil, 7)
raise ArgumentError, "unexpected perf script line: #{line.chomp}" if fields.length < 7

begin
period = Integer(fields[3])
rescue ArgumentError, TypeError
raise ArgumentError, "unexpected sample period in perf script line: #{line.chomp}"
end

# Parse the trailing "symbol (dso)" text from the right, then drop the
# instruction offset after "+" from the symbol name.
dso_start = fields[6].rindex(" (")
raise ArgumentError, "missing dso in perf script line: #{line.chomp}" unless dso_start

dso_with_suffix = fields[6][(dso_start + 2)..-1]
dso_end = dso_with_suffix.index(")")
raise ArgumentError, "missing dso terminator in perf script line: #{line.chomp}" unless dso_end

symbol = fields[6][0...dso_start].split("+", 2).first
dso = dso_with_suffix[0...dso_end]

[dso, symbol, period]
end

def process_event(event)
full_dso, symbol, cycles = event
dso = File.basename(full_dso || "Unknown_dso")
symbol ||= "[unknown]"

@total_cycles += cycles

category = categorize_symbol(dso, symbol)
@category_cycles[category] += cycles
@detailed_category_cycles[category][[dso, symbol]] += cycles

@categories[category] = true if category.start_with?("[") && category.end_with?("]")
end

def truncate_symbol(symbol, max_length = 50)
symbol.length <= max_length ? symbol : "#{symbol[0...(max_length - 3)]}..."
end

def categorize_symbol(dso, symbol)
if dso == "sqlite3_native.so"
"[sqlite3]"
elsif symbol.include?("SHA256")
"[sha256]"
elsif symbol.start_with?("[JIT] gen_send")
"[JIT send]"
elsif symbol.start_with?("[JIT]") || symbol.start_with?("ZJIT: ") || dso.start_with?("perf-")
"[JIT code]"
elsif symbol.include?("::") || symbol.start_with?("_ZN4yjit") || symbol.start_with?("_ZN4zjit")
"[JIT compile]"
elsif symbol.start_with?("rb_vm_") || symbol.start_with?("vm_") || INTERPRETER_SYMBOLS.include?(symbol)
"[interpreter]"
elsif symbol.start_with?("rb_hash_") || symbol.start_with?("hash_")
"[rb_hash_*]"
elsif symbol.start_with?("rb_ary_") || symbol.start_with?("ary_")
"[rb_ary_*]"
elsif symbol.start_with?("rb_str_") || symbol.start_with?("str_")
"[rb_str_*]"
elsif symbol.start_with?("rb_sym") || symbol.start_with?("sym_")
"[rb_sym_*]"
elsif symbol.start_with?("rb_st_") || symbol.start_with?("st_")
"[rb_st_*]"
elsif symbol.start_with?("rb_ivar_") || symbol.include?("shape")
"[ivars]"
elsif symbol.include?("match") || symbol.start_with?("rb_reg") || symbol.start_with?("onig")
"[regexp]"
elsif symbol.include?("alloc") || symbol.include?("free") || symbol.include?("gc")
"[GC]"
elsif symbol.include?("pthread") && symbol.include?("lock")
"[pthread lock]"
else
symbol
end
end

def most_common(counter)
counter.each.with_index
.sort_by { |((_key, cycles), index)| [-cycles, index] }
.map(&:first)
end
end

if ARGV.length != 1
abort "Usage: #{File.basename($PROGRAM_NAME)} <perf-script-output>"
end

jit_perf = JITPerf.new
jit_perf.read(ARGV[0])
jit_perf.print_report
Loading