Tutorial

How to use the Olympia RISC-V Performance Model

This wiki provides an overview of general use of the Olympia RISC-V Performance Model, from trace generation through example performance debugging.

What Olympia is

As described in the README.md on the main Olympia page, Olympia is a performance model for the RISC-V community. It serves as an example of an out-of-order RISC-V CPU performance model built on the Sparta Modeling Framework.

The pipeline design is very rudimentary, with simple Fetch, Decode, Rename, Dispatch, and Execution blocks in the main pipeline. The memory system consists of a simple in-order load/store pipeline coupled to a simple bus interface unit that communicates with a very simple memory subsystem. The design layout is very similar to the Sparta Core Example.

Tutorial Flow

This tutorial starts with trace generation using Dromajo, specifically a trace of Dhrystone (included in the repository). After the trace is generated, the tutorial focuses on running that trace and looking for performance bottlenecks using tools like reporting, Argos pipeline viewing, and time series analysis.

Assumptions made:

The reader of this tutorial has successfully built the Olympia model using the directions found in the main README.md.
The traces, report definition files, etc. have not changed since this tutorial was composed.

Generating a Trace

Generating a trace for Olympia involves instrumenting a functional model like Spike or Dromajo (or any functional simulator that can run RISC-V software) with the STF library's writer API. Included in Olympia is a patch for Dromajo as well as documentation to build, run, and trace Dhrystone on Dromajo.

Traces are instruction streams -- the path an application took running on a RISC-V core. STF traces are binary files and can only be viewed using the STF library's reader API or with the STF tools (like stf_dump and stf_imem) found in the STF tracing tools repository.

For example, this is the command to view the instruction stream of the provided Dhrystone trace:

% stf_dump traces/dhry_riscv.zstf | less
VERSION             1.5
GENERATOR           Dromajo
GEN_VERSION         1.1.0
GEN_COMMENT         Trace from Dromajo
INST_IEM            RISCV
PID                 00000000:00000000:00000000
INST16    1         00000000000101ba                              00006722             c.ldsp      x14,8(x2)
                                     MEM READ          0000       0000003fffa90cb8     0000000000000000
INST32    2         00000000000101bc                              4f805d63             bge         x0,x24,0x00000000000106b6
INST32    3         00000000000101c0                              000247b7             lui         x15,0x24
INST32    4         00000000000101c4                              b4078793             addi        x15,x15,-1216 # 0x0000000000023b40
INST16    5         00000000000101c8                              00006398             c.ld        x14,0(x15)
                                     MEM READ          0000       0000000000023b40     0000000000000000
INST32    6         00000000000101ca                              000244b7             lui         x9,0x24
INST16    7         00000000000101ce                              00004905             c.li        x18,1
...
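
Because stf_dump emits plain text, its output can be post-processed with ordinary scripting. The following is a minimal sketch, not part of Olympia or the STF tools, that tallies mnemonics from a dump. The column layout in the regular expression is an assumption inferred from the sample listing above, and count_mnemonics is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed layout of an instruction line, inferred from the sample dump:
#   INST16|INST32  <index>  <pc>  <opcode>  <mnemonic>  <operands...>
INST_LINE = re.compile(
    r"^INST(?:16|32)\s+(\d+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+(\S+)"
)

def count_mnemonics(dump_text):
    """Tally mnemonics over the instruction lines of an stf_dump listing."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = INST_LINE.match(line)
        if match:  # header and MEM lines do not match and are skipped
            counts[match.group(4)] += 1
    return counts

# A few lines copied from the listing above.
sample = """\
INST16    1         00000000000101ba                              00006722             c.ldsp      x14,8(x2)
                                     MEM READ          0000       0000003fffa90cb8     0000000000000000
INST32    2         00000000000101bc                              4f805d63             bge         x0,x24,0x00000000000106b6
INST32    3         00000000000101c0                              000247b7             lui         x15,0x24
INST32    6         00000000000101ca                              000244b7             lui         x9,0x24
"""

counts = count_mnemonics(sample)
print(counts.most_common())
```

To apply this to a whole trace, the text piped from stf_dump could be read from standard input in place of the sample string.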

This is the command to view a sorted instruction memory dump of the provided Dhrystone trace:

% stf_imem -S traces/dhry_riscv.zstf | less

Traces can be extended to include registers and their values per instruction, PTE entries, escape records with speculative paths, exception information, etc. This is beyond the scope of this tutorial, however.

Running a Generated Trace

Running a trace (expected extension zstf or stf) on Olympia is as simple as providing the trace to the simulator:

% ./olympia trace_file.zstf

Olympia does, however, support simple JSON input files as well. This is handy if a performance architect is interested in a simple what-if analysis, like load-to-use latency:

[
    {
        "mnemonic": "lw",
        "rs1": 4,
        "rs2": 3,
        "rd":  5,
        "vaddr" : "0xdeadbeef"
    },
    {
        "mnemonic": "add",
        "rs1": 5,
        "rs2": 2,
        "rd":  1
    }
]

Running this JSON file on olympia with "infinite caches" gives a general idea of the latency from load issue time to the add execution time.

% ./olympia -p top.cpu.core0.lsu.params.dl1_always_hit true load_add_dependency.json

More on analyzing such an example later in the tutorial.
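
For what-if experiments with more than a couple of instructions, writing the JSON by hand gets tedious. Here is a minimal sketch that generates the load/add pair shown above with Python's json module. The field names and values are copied from the example in this section, load_use_workload is a hypothetical helper, and repeating the pair is only an illustration of how such a file might be scaled up.

```python
import json

def load_use_workload(num_pairs=1, vaddr="0xdeadbeef"):
    """Build lw/add pairs in which each add consumes the register the lw writes."""
    insts = []
    for _ in range(num_pairs):
        # Load into x5 (fields as in the hand-written example above).
        insts.append({"mnemonic": "lw", "rs1": 4, "rs2": 3, "rd": 5, "vaddr": vaddr})
        # Dependent add reads x5 as rs1.
        insts.append({"mnemonic": "add", "rs1": 5, "rs2": 2, "rd": 1})
    return insts

workload = load_use_workload()
text = json.dumps(workload, indent=4)
print(text)
```

Writing text to load_add_dependency.json produces a file equivalent to the one above. Generating the file with json.dumps also guarantees that the separators between objects are valid JSON.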

Run Dhrystone on the Simulator

Run the provided trace of Dhrystone on the simulator, specifically from the build directory where the olympia binary resides:

% ./olympia ../traces/dhry_riscv.zstf --auto-summary on

This will run the default configuration of olympia on 2.3 million instructions of Dhrystone trace in roughly 6 seconds.

Tweaking Parameters

Each unit in Olympia has parameters that it uses at startup/runtime to change/manipulate behavior. A comprehensive list of parameters can be viewed using the following command line options:

% ./olympia --no-run <parameter option>
  --show-parameters     # Dump to the console the parameters found in the tree
  --write-final-config <config name>.yaml          # Dump the final parameters to a YAML file
  --write-final-config-verbose  <config name>.yaml # Dump the final parameters to a YAML file with descriptions

--no-run is handy to prevent the simulator from complaining that no workload was provided.

Parameters can be changed on the command line using the -p option or via a YAML configuration file, which can hold a list of parameters:

# Set the Dispatch Queue Depth
% ./olympia -p top.cpu.core0.dispatch.params.dispatch_queue_depth 12 traces/dhry_riscv.zstf

% cat > dispatch_params.yaml
top:
    cpu.core0.dispatch.params.dispatch_queue_depth: 12
    cpu.core0.dispatch.params.num_to_dispatch: 3
<ctrl-D>
% ./olympia -c dispatch_params.yaml traces/dhry_riscv.zstf
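
A common next step is sweeping one parameter over several values. The sketch below only builds the command lines, using the -p form shown above. The list of depths is hypothetical and chosen purely for illustration, and sweep_commands is not an Olympia utility.

```python
import shlex

def sweep_commands(param, values, trace, binary="./olympia"):
    """Return one argv list per value, using the '-p <param> <value>' form."""
    return [[binary, "-p", param, str(value), trace] for value in values]

cmds = sweep_commands(
    "top.cpu.core0.dispatch.params.dispatch_queue_depth",
    [8, 12, 16],  # hypothetical depths, for illustration only
    "traces/dhry_riscv.zstf",
)
for argv in cmds:
    print(shlex.join(argv))
```

Each argv list could be handed to subprocess.run to launch the simulator. Keeping the commands as lists avoids shell quoting problems.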

Running Architectures

Architectures are another way to group parameters, with each group representing an architecture configuration. In Olympia, three made-up architectures are provided:

% ls arches/*.yaml
big_core.yaml  medium_core.yaml  small_core.yaml

Each architecture builds on top of the previous one:

% head -8 arches/big_core.yaml 
#
# Set up the pipeline for a 8-wide machine
#

# Build on top of a medium core
include: medium_core.yaml

The include statement allows big_core to build on top of medium_core, etc. This allows changes, for example, in medium_core to be automatically included in big_core.
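
The layering can be pictured as a chain of dictionary updates: the included file is applied first, then the including file overrides it. The sketch below is a conceptual model only, not Sparta's YAML loader. The parameter values are hypothetical, and the assumption that medium_core includes small_core is taken from the statement that each architecture builds on the previous one.

```python
def resolve(arch_name, arches):
    """Flatten an architecture: apply its include chain first, then its own overrides."""
    arch = arches[arch_name]
    params = {}
    parent = arch.get("include")
    if parent is not None:
        params.update(resolve(parent, arches))  # inherited settings
    params.update(arch.get("params", {}))       # local settings win
    return params

# Hypothetical values; only the include relationships mirror the tutorial.
arches = {
    "small_core": {"params": {"num_to_dispatch": 2, "dispatch_queue_depth": 8}},
    "medium_core": {"include": "small_core", "params": {"num_to_dispatch": 3}},
    "big_core": {"include": "medium_core", "params": {"dispatch_queue_depth": 16}},
}

big = resolve("big_core", arches)
print(big)
```

In this model a change to num_to_dispatch in medium_core shows up in big_core automatically, unless big_core overrides the same parameter.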

To run an architecture, supply the name of the architecture to the --arch <arch_name> command line option:

% ./olympia --arch medium_core traces/dhry_riscv.zstf

olympia automatically searches the arches directory for named architectures.

By default, olympia runs the small_core architecture.

Report Generation

One of the most powerful features of the Sparta Modeling Framework is the ability to generate precise reports in a multitude of formats. Reports are the first insight into how an application (trace) is performing on a given modeled architecture.

Reports in Sparta are provided in two forms: definitions (.def) and contents (.yaml). Depending on how the modeler wants to collect a report, one of the two formats will be provided.

Report definitions (.def) allow the modeler to control how and when a report is collected. For example, collecting a report starting from the very first instruction of a trace will include statistic/counter values that reflect cold cache/branch prediction effects. Starting a report at a distance into the trace will avoid those cold cache effects and provide a more "steady state" report.

Report content files (.yaml) indicate to the reporting mechanism what to collect from the simulator. A content file can include all of the statistics/counters or a small subset.

Report definition files include report content files, but either kind of file can be provided to the --report command line option.

Start with the simplest report, the auto-summary report. This is a report of every stat regardless of visibility.

# Run 1 million instructions for brevity
% ./olympia --auto-summary on --workload traces/dhry_riscv.zstf -i 1M

Next, generate a report that constrains the statistics/counters to only those that are not hidden using a content report file (.yaml):

% cat reports/core_stats.yaml 
#
# Auto populate the report with stats that are not marked hidden
# https://sparcians.github.io/map/classsparta_1_1InstrumentationNode.html#a855b6ecdd93e412052ae264032002ce1
#
content:
    autopopulate:
         attributes: "!=vis:hidden"
% ./olympia --report "top" reports/core_stats.yaml 1 text  --workload traces/dhry_riscv.zstf -i1M

The command line option --report broken down:

top means start collection at the given node: top.
Try this: top.cpu.core0.dispatch
reports/core_stats.yaml is the content file.
Try this: reports/dhry_report.yaml
1 means send the report to stdout.
Try this: 2 for stderr or my_report.out to save it in a file
text means save the report in text format.
Try this: html for html output, but save it to my_report.html.

Another example to try:

# Supply the --report option as many times as needed
% ./olympia --report "top.cpu.core0.dispatch" reports/core_stats.yaml dispatch_stats.text text  \
            --report "top.cpu.core0.rob"      reports/core_stats.yaml rob_stats.text      text  \
            --report "top.cpu.core0"          reports/core_stats.yaml core0_stats.text    text  \
            --workload traces/dhry_riscv.zstf -i1M
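
When many nodes are of interest, the repeated --report groups can be generated rather than typed. This is a minimal sketch that reproduces the argument groups of the command above. The destination naming rule (last path component plus _stats) is a hypothetical convention that happens to match the file names used here, and report_args is not part of Olympia.

```python
def report_args(nodes, content_file, fmt="text"):
    """Emit one '--report <node> <content> <dest> <format>' group per tree node."""
    args = []
    for node in nodes:
        # Hypothetical naming convention: last component of the node path + "_stats".
        dest = f"{node.split('.')[-1]}_stats.{fmt}"
        args += ["--report", node, content_file, dest, fmt]
    return args

args = report_args(
    ["top.cpu.core0.dispatch", "top.cpu.core0.rob", "top.cpu.core0"],
    "reports/core_stats.yaml",
)
print(" ".join(args))
```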

The examples above generate reports directly from a content file. A definition file adds more control over how and when a report is generated.

Included in Olympia is a core report definition file core_report.def in the reports directory. In this definition file, there are two reports expected to be generated:

A report for the entire workload
A report started after a certain number of instructions have elapsed

The definition file:

% cat reports/core_report.def
content:
  # Report 1: Start from time/inst == 0 and collect everything
  report:
    pattern:   top
    def_file:  reports/core_stats.yaml
    dest_file: %OUT_BASE%.%OUT_FORMAT%
    format:    %OUT_FORMAT%
  # Report 2: Start from inst == INST_START
  report:
    pattern:   top
    def_file:  reports/core_stats.yaml
    dest_file: %OUT_BASE%_delayed.%OUT_FORMAT%
    format:    %OUT_FORMAT%
    trigger:
      start:   cpu.core0.rob.stats.total_number_retired >= %INST_START%

The report keyword starts a new report at the given pattern, using the content file core_stats.yaml.

Each report has a destination file (dest_file) and a format (format), but their values are placeholder strings. Anything enclosed in % characters is a keyword that is expected to be replaced at simulation runtime using the command line option --report-yaml-replacements <placeholder_name> <value> [<placeholder_name> <value>]. This is handy when a performance architect is running an experiment on many workloads.
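
The effect of the replacements can be modeled as a simple token substitution. The sketch below is an illustration of the behavior described here, not Sparta's implementation. apply_replacements is a hypothetical helper, and leaving unknown placeholders untouched is a choice made for this sketch, not a statement about how Sparta handles them.

```python
import re

def apply_replacements(text, replacements):
    """Replace each %NAME% token with its value; unknown tokens are left as-is."""
    def substitute(match):
        name = match.group(1)
        return replacements.get(name, match.group(0))
    return re.sub(r"%([A-Z_]+)%", substitute, text)

line = "dest_file: %OUT_BASE%_delayed.%OUT_FORMAT%"
result = apply_replacements(
    line, {"OUT_BASE": "dhry_1M_insts", "OUT_FORMAT": "text"}
)
print(result)
```

With these values the delayed report's destination becomes dhry_1M_insts_delayed.text.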

Finally, the second report differs from the first in that it is triggered to start statistics/counter collection at the given start condition. In this case, the report begins when the total number of retired instructions is equal to or exceeds the value substituted for INST_START. For more information on triggers, see this README in the Sparta Modeling Framework repository.
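
To make the trigger semantics concrete, here is a toy model of a counter-based start trigger. It assumes that a triggered report counts only what happens after the trigger fires and that instructions retire one at a time. A real core may retire several instructions per cycle, so the actual start point can overshoot the threshold slightly. retired_in_report is a hypothetical name.

```python
def retired_in_report(total_insts, inst_start=0):
    """Retirements counted by a report that starts once retired >= inst_start."""
    retired = 0
    # Counter value when collection began; None means the trigger has not fired.
    baseline = None if inst_start > 0 else 0
    for _ in range(total_insts):
        retired += 1  # assumption: one retirement per step
        if baseline is None and retired >= inst_start:
            baseline = retired  # trigger fires; the report starts here
    return 0 if baseline is None else retired - baseline

full = retired_in_report(1_000_000)
delayed = retired_in_report(1_000_000, inst_start=100_000)
print(full, delayed)
```

If the workload ends before the threshold is reached, the trigger never fires and this model reports zero retirements.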

Here's an example of how to use a report definition file:

% ./olympia --report reports/core_report.def \
            --report-yaml-replacements OUT_BASE dhry_1M_insts OUT_FORMAT text INST_START 100k \
            --workload traces/dhry_riscv.zstf -i 1M

This command will generate reports dhry_1M_insts.text and dhry_1M_insts_delayed.text. Diff these files to notice the difference in stats, particularly in instruction count (it will be 100K fewer):

%  diff -y dhry_1M_insts.text dhry_1M_insts_delayed.text | grep total_number_retired
          total_number_retired = 1000000		      |	          total_number_retired = 900000
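
For larger reports, a small script can list every statistic that differs between two runs. The sketch below assumes the text report contains lines of the form name = value, as in the diff output above. It keeps only integer-valued lines, and when a name appears more than once the last occurrence wins. parse_stats is a hypothetical helper, not a Sparta tool.

```python
def parse_stats(report_text):
    """Collect 'name = value' lines with integer values into a dict."""
    stats = {}
    for line in report_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value.strip())
    return stats

# Stand-ins for the contents of the two report files.
full = parse_stats("          total_number_retired = 1000000\n")
delayed = parse_stats("          total_number_retired = 900000\n")

diffs = {
    name: full[name] - delayed[name]
    for name in full
    if name in delayed and full[name] != delayed[name]
}
print(diffs)
```

In practice the two strings would be read from dhry_1M_insts.text and dhry_1M_insts_delayed.text.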

The command line options broken down:

Option --report reports/core_report.def: Use the definition file for report generation. Note the lack of other options as compared to the previous example.
Option --report-yaml-replacements OUT_BASE dhry_1M_insts OUT_FORMAT text INST_START 100k:
Replace OUT_BASE in the definition file with dhry_1M_insts
Replace OUT_FORMAT in the definition file with text
Replace INST_START in the definition file with 100k

Time Series Report Generation

TBD

Pipeout Generation

TBD
