Tutorial
This wiki provides an overview of general use of the Olympia RISC-V Performance Model, from trace generation through example performance debugging.
As described in the README.md of the main Olympia page, Olympia is a performance model for the RISC-V community: an example of an out-of-order RISC-V CPU performance model based on the Sparta Modeling Framework.
The pipeline design is very rudimentary, with simple Fetch, Decode, Rename, Dispatch, and Execution blocks in the main pipeline. The memory system consists of a simple in-order load/store pipeline coupled to a simple bus interface unit communicating with a very simple memory subsystem. The design layout is very similar to the Sparta Core Example.
This tutorial starts with trace generation using Dromajo, specifically a trace of Dhrystone (included in the repository). After the trace is generated, the tutorial focuses on running that trace and looking for performance bottlenecks using tools like reporting, Argos pipeline viewing, and time series analysis.
Assumptions made:
- The reader of this tutorial has successfully built the Olympia model using the directions found in the main README.md.
- The traces, report definition files, etc. have not changed since this tutorial was composed.
Generating a trace for Olympia involves instrumenting a functional model like Spike or Dromajo (or any functional simulator that can run RISC-V software) with the STF library's writer API. Included in Olympia is a patch for Dromajo as well as documentation to build, run, and trace Dhrystone on Dromajo.
Traces are instruction streams: the path an application took while running on a RISC-V core. STF traces are binary files and can only be viewed using the STF library's reader API or with the STF tools (like stf_dump and stf_imem) found in the STF tracing tools repository.
For example, this is the command to view the instruction stream of the provided Dhrystone trace:
% stf_dump traces/dhry_riscv.zstf | less
VERSION 1.5
GENERATOR Dromajo
GEN_VERSION 1.1.0
GEN_COMMENT Trace from Dromajo
INST_IEM RISCV
PID 00000000:00000000:00000000
INST16 1 00000000000101ba 00006722 c.ldsp x14,8(x2)
MEM READ 0000 0000003fffa90cb8 0000000000000000
INST32 2 00000000000101bc 4f805d63 bge x0,x24,0x00000000000106b6
INST32 3 00000000000101c0 000247b7 lui x15,0x24
INST32 4 00000000000101c4 b4078793 addi x15,x15,-1216 # 0x0000000000023b40
INST16 5 00000000000101c8 00006398 c.ld x14,0(x15)
MEM READ 0000 0000000000023b40 0000000000000000
INST32 6 00000000000101ca 000244b7 lui x9,0x24
INST16 7 00000000000101ce 00004905 c.li x18,1
...
Similarly, this is the command to view a sorted instruction memory dump of the provided Dhrystone trace:
% stf_imem -S traces/dhry_riscv.zstf | less
Traces can be extended to include registers and their values per instruction, PTE entries, escape records with speculative paths, exception information, etc. This is beyond the scope of this tutorial, however.
Running a trace (expected extension zstf or stf) on Olympia is as simple as providing the trace to the simulator:
% ./olympia trace_file.zstf
Olympia does, however, support simple JSON input files as well. This is handy if a performance architect is interested in a simple what-if analysis, like load-to-use latency:
[
    {
        "mnemonic": "lw",
        "rs1": 4,
        "rs2": 3,
        "rd": 5,
        "vaddr": "0xdeadbeef"
    },
    {
        "mnemonic": "add",
        "rs1": 5,
        "rs2": 2,
        "rd": 1
    }
]
Running this JSON file on olympia with "infinite caches" gives a general idea of the latency from load issue time to the add execution time.
% ./olympia -p top.cpu.core0.lsu.params.dl1_always_hit true load_add_dependency.json
More on analyzing such an example appears later in the tutorial.
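For what-if studies with several variations, the JSON input can be generated instead of written by hand. The following is a minimal sketch, assuming only the fields shown in the example above (mnemonic, rs1, rs2, rd, vaddr); the function name and its arguments are hypothetical and not part of Olympia.

```python
import json


def make_load_use_workload(load_rd=5, vaddr="0xdeadbeef"):
    """Return a load followed by an add that consumes the load's result."""
    return [
        {"mnemonic": "lw", "rs1": 4, "rs2": 3, "rd": load_rd, "vaddr": vaddr},
        # The add reads the load's destination register (rs1 == load rd),
        # which is what creates the load-to-use dependency.
        {"mnemonic": "add", "rs1": load_rd, "rs2": 2, "rd": 1},
    ]


# Emit the workload as JSON text; json.dumps guarantees the separating
# commas and quoting that are easy to get wrong by hand.
print(json.dumps(make_load_use_workload(), indent=4))
```

Redirecting the printed text to a file such as load_add_dependency.json produces an input equivalent to the hand-written one.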
Run the provided trace of Dhrystone on the simulator, specifically from the build directory where the olympia binary resides:
% ./olympia ../traces/dhry_riscv.zstf --auto-summary on
This will run the default configuration of olympia on 2.3 million instructions of the Dhrystone trace in roughly 6 seconds.
Each unit in Olympia has parameters that it uses at startup/runtime to change/manipulate behavior. A comprehensive list of parameters can be viewed using the following command line options:
% ./olympia --no-run <parameter option>
--show-parameters # Dump to the console the parameters found in the tree
--write-final-config <config name>.yaml # Dump the final parameters to a YAML file
--write-final-config-verbose <config name>.yaml # Dump the final parameters to a YAML file with descriptions
--no-run is handy to prevent the simulator from complaining that no workload was provided.
Parameters can be changed on the command line using the -p option or via a configuration YAML file allowing for a list of parameters:
# Set the Dispatch Queue Depth
% ./olympia -p top.cpu.core0.dispatch.params.dispatch_queue_depth 12 traces/dhry_riscv.zstf
% cat > dispatch_params.yaml
top:
  cpu.core0.dispatch.params.dispatch_queue_depth: 12
  cpu.core0.dispatch.params.num_to_dispatch: 3
<ctrl-D>
% ./olympia -c dispatch_params.yaml traces/dhry_riscv.zstf
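The two ways of setting parameters carry the same information: each -p option names a full path starting at top, while the YAML file nests the same paths under a top: key. The sketch below renders one dictionary of parameters in both forms. It is an illustration only; the helper names are hypothetical, and the YAML layout is assumed to be exactly the one shown above.

```python
def _as_text(value):
    """Render a Python value the way it would be typed on the command line."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def render_params_yaml(params):
    """Render {'cpu.core0...': value} as a YAML file rooted at 'top:'."""
    lines = ["top:"]
    for path, value in params.items():
        lines.append(f"  {path}: {_as_text(value)}")
    return "\n".join(lines) + "\n"


def params_to_cli(params):
    """Render the same parameters as repeated '-p <path> <value>' options."""
    args = []
    for path, value in params.items():
        args += ["-p", f"top.{path}", _as_text(value)]
    return args


dispatch_params = {
    "cpu.core0.dispatch.params.dispatch_queue_depth": 12,
    "cpu.core0.dispatch.params.num_to_dispatch": 3,
}
print(render_params_yaml(dispatch_params), end="")
print(" ".join(params_to_cli(dispatch_params)))
```

Keeping the parameters in one place makes it easier to switch between a quick command-line experiment and a saved configuration file.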
Architecture files are another way to group parameters that together represent an architecture configuration. In Olympia, three made-up architectures are provided:
% ls arches/*.yaml
big_core.yaml medium_core.yaml small_core.yaml
Each architecture builds on top of the previous one:
% head -8 arches/big_core.yaml
#
# Set up the pipeline for a 8-wide machine
#
# Build on top of a medium core
include: medium_core.yaml
The include statement allows big_core to build on top of medium_core, etc. This allows changes in medium_core, for example, to be automatically included in big_core.
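To see which files an architecture ultimately builds on, the include lines can be followed by hand or with a small script. The sketch below is not how Sparta resolves includes internally; it assumes each file has at most one top-level include: <file> line, as in the big_core.yaml excerpt above, and the function name is hypothetical.

```python
def include_chain(name, read_text):
    """Follow 'include:' lines starting at `name`.

    `read_text` is any callable mapping a file name to its contents, so the
    function can be exercised without touching the file system.
    """
    chain = [name]
    while True:
        included = None
        for line in read_text(chain[-1]).splitlines():
            stripped = line.strip()
            # Comment lines start with '#', so they never match here.
            if stripped.startswith("include:"):
                included = stripped.split(":", 1)[1].strip()
                break
        if included is None:
            return chain
        if included in chain:
            raise ValueError(f"include cycle detected at {included}")
        chain.append(included)


# Stand-in file contents; only the big_core.yaml text is taken from above,
# the other two entries are assumptions for the sake of the example.
example_files = {
    "big_core.yaml": "# Build on top of a medium core\ninclude: medium_core.yaml\n",
    "medium_core.yaml": "include: small_core.yaml\n",
    "small_core.yaml": "top:\n",
}
print(" -> ".join(include_chain("big_core.yaml", example_files.__getitem__)))
```

Against a real checkout, `read_text` could be a function that reads the named file from the arches directory.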
To run an architecture, supply its name to the --arch <arch_name> command line option:
% ./olympia --arch medium_core traces/dhry_riscv.zstf
olympia automatically looks in the arches directory for named architectures.
By default, olympia runs the small_core architecture.
One of the most powerful features of the Sparta Modeling Framework is the ability to generate precise reports in a multitude of formats. Reports are the first insight into how an application (trace) is performing on a given modeled architecture.
Reports in Sparta are provided in two forms: definitions (.def) and contents (.yaml). Depending on how the modeler wants to collect a report, one of the two formats will be provided.
Report definitions (.def) allow the modeler to control how and when a report is collected. For example, collecting a report starting from the very first instruction of a trace will include statistic/counter values that reflect cold cache/branch prediction effects. Starting a report at a distance into the trace will avoid those cold cache effects and provide a more "steady state" report.
Report content files (.yaml) indicate to the reporting mechanism what to collect from the simulator. A content file can include all of the statistics/counters or a small subset.
Report definition files include report content files, but either file can be provided to the --report command line option.
Start with the simplest report, the auto-summary report. This is a report of every stat regardless of visibility.
# Run 1 million instructions for brevity
% ./olympia --auto-summary on --workload traces/dhry_riscv.zstf -i 1M
Next, generate a report that constrains the statistics/counters to only those that are not hidden using a content report file (.yaml):
% cat reports/core_stats.yaml
#
# Auto populate the report with stats that are not marked hidden
# https://sparcians.github.io/map/classsparta_1_1InstrumentationNode.html#a855b6ecdd93e412052ae264032002ce1
#
content:
  autopopulate:
    attributes: "!=vis:hidden"
% ./olympia --report "top" reports/core_stats.yaml 1 text --workload traces/dhry_riscv.zstf -i1M
The command line option --report broken down:
- top means start collection at the given node: top. Try this: top.cpu.core0.dispatch
- reports/core_stats.yaml is the content file. Try this: reports/dhry_report.yaml
- 1 means send the report to stdout. Try this: 2 for stderr or my_report.out to save it in a file
- text means save the report in text format. Try this: html for html output, but save it to my_report.html.
Another example to try:
# Supply the --report option as many times as needed
% ./olympia --report "top.cpu.core0.dispatch" reports/core_stats.yaml dispatch_stats.text text \
--report "top.cpu.core0.rob" reports/core_stats.yaml rob_stats.text text \
--report "top.cpu.core0" reports/core_stats.yaml core0_stats.text text \
--workload traces/dhry_riscv.zstf -i1M
The examples above generate reports directly from a content file. A definition file adds more control over how reports are collected.
Included in Olympia is a core report definition file, core_report.def, in the reports directory. In this definition file, there are two reports expected to be generated:
- A report for the entire workload
- A report started after a certain number of instructions have elapsed
The definition file:
% cat reports/core_report.def
content:
  # Report 1: Start from time/inst == 0 and collect everything
  report:
    pattern: top
    def_file: reports/core_stats.yaml
    dest_file: %OUT_BASE%.%OUT_FORMAT%
    format: %OUT_FORMAT%
  # Report 2: Start from inst == INST_START
  report:
    pattern: top
    def_file: reports/core_stats.yaml
    dest_file: %OUT_BASE%_delayed.%OUT_FORMAT%
    format: %OUT_FORMAT%
    trigger:
      start: cpu.core0.rob.stats.total_number_retired >= %INST_START%
The report keyword starts a new report at the given pattern, using the content file core_stats.yaml.
Each report has a destination file (dest_file) and a format (format), but their values are placeholders. Anything enclosed in % characters is a keyword expected to be replaced at simulation runtime using the command line option --report-yaml-replacements <placeholder_name> <value> [<placeholder_name> <value>]. This is handy when a performance architect is running an experiment on many workloads.
Finally, the second report differs from the first in that it is triggered to start statistics/counter collection at the given start condition. In this case, the report begins when the total number of retired instructions equals or exceeds the value substituted for INST_START. For more information on triggers, see the README on triggers in the Sparta Modeling Framework repository.
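The effect of placeholder replacement can be previewed outside the simulator. The sketch below only mimics the substitution for illustration; the simulator performs the real replacement, and raising an error on a missing name is a choice made for this example, not documented Sparta behavior.

```python
import re


def expand_placeholders(text, replacements):
    """Replace every %NAME% in `text` with replacements['NAME']."""
    def lookup(match):
        name = match.group(1)
        if name not in replacements:
            # Choice made for this sketch: fail loudly rather than leave
            # an unexpanded placeholder in the output.
            raise KeyError(f"no replacement given for %{name}%")
        return replacements[name]

    return re.sub(r"%([A-Z_]+)%", lookup, text)


values = {"OUT_BASE": "dhry_1M_insts", "OUT_FORMAT": "text", "INST_START": "100k"}
for line in (
    "dest_file: %OUT_BASE%_delayed.%OUT_FORMAT%",
    "start: cpu.core0.rob.stats.total_number_retired >= %INST_START%",
):
    print(expand_placeholders(line, values))
```

Running a definition file through such a preview is a quick way to check destination file names before launching a long simulation.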
Here's an example of how to use a report definition file:
% ./olympia --report reports/core_report.def \
--report-yaml-replacements OUT_BASE dhry_1M_insts OUT_FORMAT text INST_START 100k \
--workload traces/dhry_riscv.zstf -i 1M
This command generates the reports dhry_1M_insts.text and dhry_1M_insts_delayed.text. Diff these files to see the difference in stats, particularly in the retired instruction count (the delayed report has 100K fewer):
% diff -y dhry_1M_insts.text dhry_1M_insts_delayed.text | grep total_number_retired
total_number_retired = 1000000 | total_number_retired = 900000
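The same comparison can be scripted when many reports need checking. This is a minimal sketch, assuming the text report contains lines of the form name = value as in the diff output above; the full layout of a report may differ, and the function name is hypothetical.

```python
import re


def stat_values(report_text, name):
    """Return every integer value reported for `name`, in file order."""
    pattern = re.compile(
        rf"^[ \t]*{re.escape(name)}[ \t]*=[ \t]*(\d+)[ \t]*$", re.MULTILINE
    )
    return [int(value) for value in pattern.findall(report_text)]


# Stand-in report contents using the values from the diff above.
full_report = "total_number_retired = 1000000\n"
delayed_report = "total_number_retired = 900000\n"

full = stat_values(full_report, "total_number_retired")[0]
delayed = stat_values(delayed_report, "total_number_retired")[0]
print(f"instructions skipped by the delayed report: {full - delayed}")
```

With real reports, the two strings would come from reading dhry_1M_insts.text and dhry_1M_insts_delayed.text.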
Command line options broken down:
- Option --report reports/core_report.def: Use the definition file for report generation. Note the lack of other options as compared to the previous example.
- Option --report-yaml-replacements OUT_BASE dhry_1M_insts OUT_FORMAT text INST_START 100k:
  - Replace OUT_BASE in the definition file with dhry_1M_insts
  - Replace OUT_FORMAT in the definition file with text
  - Replace INST_START in the definition file with 100k
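Because the replacements are plain name/value pairs, the same definition file can be reused across many workloads by deriving OUT_BASE from each trace name. The sketch below only assembles the command lines, using the options shown above; the helper name and the second trace path are hypothetical.

```python
import shlex
from pathlib import Path


def report_command(trace, inst_start="100k", out_format="text", limit="1M"):
    """Build the olympia command line for one trace as an argument list."""
    out_base = Path(trace).stem  # e.g. traces/dhry_riscv.zstf -> dhry_riscv
    return [
        "./olympia",
        "--report", "reports/core_report.def",
        "--report-yaml-replacements",
        "OUT_BASE", out_base,
        "OUT_FORMAT", out_format,
        "INST_START", inst_start,
        "--workload", trace,
        "-i", limit,
    ]


# The second trace name is made up purely to show the loop.
for trace in ("traces/dhry_riscv.zstf", "traces/another_workload.zstf"):
    print(shlex.join(report_command(trace)))
```

Each printed line can be pasted into a shell, or the argument list can be handed to a process launcher to run the experiments in sequence.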
TBD
TBD