
Commit ac864c5

committed
Support bert in mlcommons cpp implementation, add pytorch backend
1 parent 7301fcd commit ac864c5

13 files changed

Lines changed: 523 additions & 32 deletions

script/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 # MLCommons Automation Scripts
 
-*Last updated: 2026-04-20 21:26:30*
+*Last updated: 2026-04-21 04:12:49*
 
 This directory contains automation scripts for MLPerf benchmarks, AI/ML workflows, and development operations.
 
```

Lines changed: 100 additions & 0 deletions
```diff
@@ -0,0 +1,100 @@
+# README for app-mlperf-inference-mlcommons-cpp
+This README is automatically generated. Create and add custom content in info.md. Please follow the [script execution document](https://docs.mlcommons.org/mlcflow/targets/script/execution-flow/) to understand more about the MLC script execution.
+
+`mlcflow` stores all local data under `$HOME/MLC` by default. So, if there is space constraint on the home directory and you have more space on say `/mnt/$USER`, you can do
+```
+mkdir /mnt/$USER/MLC
+ln -s /mnt/$USER/MLC $HOME/MLC
+```
+You can also use the `ENV` variable `MLC_REPOS` to control this location but this will need a set after every system reboot.
+
+## Setup
+
+If you are not on a Python development environment please refer to the [official docs](https://docs.mlcommons.org/mlcflow/install/) for the installation.
+
+```bash
+python3 -m venv mlcflow
+. mlcflow/bin/activate
+pip install mlcflow
+```
+
+- Using a virtual environment is recommended (per `pip` best practices), but you may skip it or use `--break-system-packages` if needed.
+
+### Pull mlperf-automations
+
+Once `mlcflow` is installed:
+
+```bash
+mlc pull repo mlcommons@mlperf-automations --pat=<Your Private Access Token>
+```
+- `--pat` or `--ssh` is only needed if the repo is PRIVATE
+- If `--pat` is avoided, you'll be asked to enter the password where you can enter your Private Access Token
+- `--ssh` option can be used instead of `--pat=<>` option if you prefer to use SSH for accessing the github repository.
+## Run Commands
+
+```bash
+mlcr app,mlcommons,mlperf,inference,cpp
+```
+
+### Script Inputs
+
+| Name | Description | Choices | Default |
+|------|-------------|---------|------|
+| `--count` | | | `` |
+| `--max_batchsize` | | | `` |
+| `--mlperf_conf` | | | `` |
+| `--mode` | | | `` |
+| `--output_dir` | | | `` |
+| `--performance_sample_count` | | | `` |
+| `--scenario` | | | `` |
+| `--user_conf` | | | `` |
+### Generic Script Inputs
+
+| Name | Description | Choices | Default |
+|------|-------------|---------|------|
+| `--input` | Input to the script passed using the env key `MLC_INPUT` | | `` |
+| `--output` | Output from the script passed using the env key `MLC_OUTPUT` | | `` |
+| `--outdirname` | The directory to store the script output | | `cache directory ($HOME/MLC/repos/local/cache/<>) if the script is cacheable or else the current directory` |
+| `--outbasename` | The output file/folder name | | `` |
+| `--search_folder_path` | The folder path where executables of a given script need to be searched. Search is done recursively upto 4 levels. | | `` |
+| `--name` | | | `` |
+| `--extra_cache_tags` | Extra cache tags to be added to the cached entry when the script results are saved | | `` |
+| `--skip_compile` | Skip compilation | | `False` |
+| `--skip_run` | Skip run | | `False` |
+| `--skip_sudo` | Skip SUDO detection | | `False` |
+| `--accept_license` | Accept the required license requirement to run the script | | `False` |
+| `--skip_system_deps` | Skip installing any system dependencies | | `False` |
+| `--git_ssh` | Use SSH for git repos | | `False` |
+| `--gh_token` | Github Token | | `` |
+| `--hf_token` | Huggingface Token | | `` |
+| `--verify_ssl` | Verify SSL | | `False` |
+## Variations
+
+### Batch-size
+
+- `batch-size.#` _(# can be substituted dynamically)_
+
+### Device
+
+- `cpu` (default)
+- `cuda`
+
+### Framework
+
+- `onnxruntime` (default)
+- `pytorch`
+- `tf`
+- `tflite`
+- `tvm-onnx`
+
+### Loadgen-scenario
+
+- `multistream`
+- `offline` (default)
+- `server`
+- `singlestream`
+
+### Model
+
+- `resnet50` (default)
+- `retinanet`
```
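The run command in this README can be narrowed using the variations it lists. The sketch below is minimal and assumes the usual MLC conventions: a variation is selected by appending `_<name>` to the comma-separated tags, and script inputs are passed as `--name=value`. `build_mlcr_command` is a hypothetical helper. It only assembles the argument list and does not invoke `mlcr`.

```python
from typing import Dict, List, Optional


def build_mlcr_command(variations: List[str],
                       inputs: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble an `mlcr` argument list for app-mlperf-inference-mlcommons-cpp.

    Assumption (MLC convention, not stated in this README): variations are
    appended to the base tags as `_<name>`, inputs are passed as `--key=value`.
    """
    base_tags = ["app", "mlcommons", "mlperf", "inference", "cpp"]
    tags = base_tags + ["_" + v for v in variations]
    cmd = ["mlcr", ",".join(tags)]
    for key, value in (inputs or {}).items():
        cmd.append(f"--{key}={value}")
    return cmd


if __name__ == "__main__":
    # PyTorch framework, CPU device, offline scenario, batch size 8
    print(" ".join(build_mlcr_command(
        ["pytorch", "cpu", "offline", "batch-size.8"], {"count": "100"})))
```

Passing the returned list to `subprocess.run` would execute it. The sketch stops short of that so it can be checked without an MLC installation.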

script/app-mlperf-inference-mlcommons-cpp/customize.py

Lines changed: 57 additions & 5 deletions
```diff
@@ -44,6 +44,11 @@ def preprocess(i):
     script_path = i['run_script_input']['path']
     if env['MLC_MODEL'] == "retinanet":
         env['MLC_DATASET_LIST'] = env['MLC_DATASET_ANNOTATIONS_FILE_PATH']
+    elif 'bert' in env['MLC_MODEL']:
+        env['MLC_DATASET_SQUAD_TOKENIZED_ROOT'] = env.get(
+            'MLC_DATASET_SQUAD_TOKENIZED_ROOT', '')
+        env['MLC_DATASET_MAX_SEQ_LENGTH'] = env.get(
+            'MLC_DATASET_SQUAD_TOKENIZED_MAX_SEQ_LENGTH', '384')
     env['MLC_SOURCE_FOLDER_PATH'] = os.path.join(script_path, "src")
 
     for file in os.listdir(env['MLC_SOURCE_FOLDER_PATH']):
```

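The new `elif` branch in `preprocess` selects the BERT settings with a substring test. Any model name containing `bert`, such as `bert-99`, therefore takes this path. The sequence length falls back to the string `'384'` when `MLC_DATASET_SQUAD_TOKENIZED_MAX_SEQ_LENGTH` is unset. The value is read from a `..._TOKENIZED_...` key but written to `MLC_DATASET_MAX_SEQ_LENGTH`.

Below is a minimal, dependency-free sketch of just this branch. A plain dict stands in for the MLC `env`, and `resolve_dataset_env` is a hypothetical name.

```python
from typing import Dict


def resolve_dataset_env(env: Dict[str, str]) -> Dict[str, str]:
    """Mirror the dataset-related branch of preprocess() on a plain dict."""
    model = env["MLC_MODEL"]
    if model == "retinanet":
        env["MLC_DATASET_LIST"] = env["MLC_DATASET_ANNOTATIONS_FILE_PATH"]
    elif "bert" in model:
        # Empty string when no tokenized SQuAD root was provided
        env["MLC_DATASET_SQUAD_TOKENIZED_ROOT"] = env.get(
            "MLC_DATASET_SQUAD_TOKENIZED_ROOT", "")
        # Read from the ..._TOKENIZED_... key, written under a shorter name;
        # the value stays a string, as in the committed code
        env["MLC_DATASET_MAX_SEQ_LENGTH"] = env.get(
            "MLC_DATASET_SQUAD_TOKENIZED_MAX_SEQ_LENGTH", "384")
    return env
```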
```diff
@@ -66,26 +71,51 @@ def preprocess(i):
 
     if '+ CXXFLAGS' not in env:
         env['+ CXXFLAGS'] = []
-    env['+ CXXFLAGS'].append("-std=c++14")
+    if env.get('MLC_MLPERF_BACKEND', '') == 'pytorch':
+        env['+ CXXFLAGS'].append("-std=c++17")
+    else:
+        env['+ CXXFLAGS'].append("-std=c++14")
 
     # add preprocessor flag like "#define MLC_MODEL_RESNET50"
-    env['+ CXXFLAGS'].append('-DMLC_MODEL_' + env['MLC_MODEL'].upper())
+    env['+ CXXFLAGS'].append('-DMLC_MODEL_' + env['MLC_MODEL'].upper().replace('-', '_').replace('.', '_'))
     # add preprocessor flag like "#define MLC_MLPERF_BACKEND_ONNXRUNTIME"
     env['+ CXXFLAGS'].append('-DMLC_MLPERF_BACKEND_' +
                              env['MLC_MLPERF_BACKEND'].upper())
     # add preprocessor flag like "#define MLC_MLPERF_DEVICE_CPU"
     env['+ CXXFLAGS'].append('-DMLC_MLPERF_DEVICE_' +
                              env['MLC_MLPERF_DEVICE'].upper())
 
+    # For PyTorch backend, detect LibTorch include/lib paths from pip torch
+    if env.get('MLC_MLPERF_BACKEND', '') == 'pytorch':
+        import torch as _torch
+        torch_path = os.path.dirname(_torch.__file__)
+        torch_inc = os.path.join(torch_path, 'include')
+        torch_inc_csrc = os.path.join(torch_path, 'include', 'torch', 'csrc', 'api', 'include')
+        torch_lib = os.path.join(torch_path, 'lib')
+        env['+CPLUS_INCLUDE_PATH'].append(torch_inc)
+        env['+CPLUS_INCLUDE_PATH'].append(torch_inc_csrc)
+        env['+C_INCLUDE_PATH'].append(torch_inc)
+        env['+C_INCLUDE_PATH'].append(torch_inc_csrc)
+        env['+LD_LIBRARY_PATH'].append(torch_lib)
+        env['+DYLD_FALLBACK_LIBRARY_PATH'].append(torch_lib)
+        if not _torch.compiled_with_cxx11_abi():
+            env['+ CXXFLAGS'].append('-D_GLIBCXX_USE_CXX11_ABI=0')
+
     if '+ LDCXXFLAGS' not in env:
         env['+ LDCXXFLAGS'] = []
 
     env['+ LDCXXFLAGS'] += [
         "-lmlperf_loadgen",
         "-lpthread"
     ]
+
+    # For PyTorch, link against torch, torch_cpu, and c10
+    if env.get('MLC_MLPERF_BACKEND', '') == 'pytorch':
+        env['+ LDCXXFLAGS'] += ['-ltorch', '-ltorch_cpu', '-lc10']
+        if env.get('MLC_MLPERF_DEVICE', '') == 'gpu':
+            env['+ LDCXXFLAGS'] += ['-ltorch_cuda', '-lc10_cuda']
     # e.g. -lonnxruntime
-    if 'MLC_MLPERF_BACKEND_LIB_NAMESPEC' in env:
+    elif 'MLC_MLPERF_BACKEND_LIB_NAMESPEC' in env:
         env['+ LDCXXFLAGS'].append('-l' +
                                    env['MLC_MLPERF_BACKEND_LIB_NAMESPEC'])
     # e.g. -lcudart
```

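Taken together, the `customize.py` hunk makes the compiler and linker flags depend on the backend and device. The sketch below restates that logic as a pure function, so the resulting flag lists can be inspected without running MLC. It follows the committed code closely:

- **C++ standard:** the PyTorch backend switches to `-std=c++17`, which recent LibTorch releases require.
- **Model macro:** `-` and `.` are rewritten to `_`, because neither character is allowed in a C/C++ macro identifier. For example, `bert-99` must become `MLC_MODEL_BERT_99`.
- **CUDA libraries:** these are added only when `MLC_MLPERF_DEVICE` equals `gpu`.

The following parts are assumptions, not part of the commit:

- **Naming:** `build_flags`, `model_macro` and their parameters are hypothetical names.
- **LibTorch location:** the root is passed in as `torch_root` instead of importing `torch`.
- **ABI check:** `cxx11_abi` stands in for `torch.compiled_with_cxx11_abi()`.
- **Scope:** the sketch stops at the `# e.g. -lcudart` step, whose code lies outside the hunk.

```python
import os
from typing import Dict, List, Tuple


def model_macro(model: str) -> str:
    """Turn a model name into a valid macro flag, e.g. bert-99 -> -DMLC_MODEL_BERT_99."""
    return "-DMLC_MODEL_" + model.upper().replace("-", "_").replace(".", "_")


def build_flags(env: Dict[str, str], torch_root: str = "",
                cxx11_abi: bool = True) -> Tuple[List[str], List[str], Dict[str, List[str]]]:
    """Return (CXXFLAGS, LDCXXFLAGS, search paths) as the hunk computes them."""
    backend = env.get("MLC_MLPERF_BACKEND", "")
    device = env.get("MLC_MLPERF_DEVICE", "")

    cxxflags = ["-std=c++17" if backend == "pytorch" else "-std=c++14"]
    cxxflags.append(model_macro(env["MLC_MODEL"]))
    cxxflags.append("-DMLC_MLPERF_BACKEND_" + backend.upper())
    cxxflags.append("-DMLC_MLPERF_DEVICE_" + device.upper())

    # The committed code adds "include" to both CPLUS_INCLUDE_PATH and
    # C_INCLUDE_PATH, and "lib" to LD_LIBRARY_PATH and DYLD_FALLBACK_LIBRARY_PATH
    paths: Dict[str, List[str]] = {"include": [], "lib": []}
    if backend == "pytorch":
        paths["include"] = [
            os.path.join(torch_root, "include"),
            os.path.join(torch_root, "include", "torch", "csrc", "api", "include"),
        ]
        paths["lib"] = [os.path.join(torch_root, "lib")]
        if not cxx11_abi:
            # Match the libstdc++ ABI that the torch binaries were built with
            cxxflags.append("-D_GLIBCXX_USE_CXX11_ABI=0")

    ldflags = ["-lmlperf_loadgen", "-lpthread"]
    if backend == "pytorch":
        ldflags += ["-ltorch", "-ltorch_cpu", "-lc10"]
        if device == "gpu":
            ldflags += ["-ltorch_cuda", "-lc10_cuda"]
    elif "MLC_MLPERF_BACKEND_LIB_NAMESPEC" in env:
        ldflags.append("-l" + env["MLC_MLPERF_BACKEND_LIB_NAMESPEC"])
    return cxxflags, ldflags, paths
```

Because of the new `if`/`elif` chain, the PyTorch backend never appends `MLC_MLPERF_BACKEND_LIB_NAMESPEC`, even when it is set.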
```diff
@@ -96,9 +126,31 @@ def preprocess(i):
     env['MLC_LINKER_LANG'] = 'CXX'
     env['MLC_RUN_DIR'] = os.getcwd()
 
+
+    # For PyTorch backend, convert .pth weights to TorchScript .pt if needed
+    if env.get('MLC_MLPERF_BACKEND', '') == 'pytorch':
+        model_path = env.get('MLC_ML_MODEL_FILE_WITH_PATH', '')
+        if model_path.endswith('.pth'):
+            torchscript_path = model_path.replace('.pth', '_torchscript.pt')
+            if not os.path.exists(torchscript_path):
+                import torch
+                import torchvision.models as models
+                logger.info(f"Converting {model_path} to TorchScript at {torchscript_path}")
+                model = models.resnet50()
+                model.load_state_dict(torch.load(model_path, map_location='cpu', weights_only=False))
+                model.eval()
+                traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
+                traced.save(torchscript_path)
+                logger.info("TorchScript conversion done")
+            env['MLC_ML_MODEL_FILE_WITH_PATH'] = torchscript_path
+
     if 'MLC_MLPERF_USER_CONF' not in env:
-        env['MLC_MLPERF_USER_CONF'] = os.path.join(
-            env['MLC_MLPERF_INFERENCE_CLASSIFICATION_AND_DETECTION_PATH'], "user.conf")
+        if 'bert' in env['MLC_MODEL']:
+            env['MLC_MLPERF_USER_CONF'] = os.path.join(
+                env.get('MLC_MLPERF_INFERENCE_BERT_PATH', ''), "user.conf")
+        else:
+            env['MLC_MLPERF_USER_CONF'] = os.path.join(
+                env['MLC_MLPERF_INFERENCE_CLASSIFICATION_AND_DETECTION_PATH'], "user.conf")
 
     return {'return': 0}
 
```

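The conversion step derives the TorchScript path with `model_path.replace('.pth', '_torchscript.pt')`. That call rewrites every occurrence of `.pth` in the path, not just the extension. For example, a directory such as `/data/v1.pth.d/` would be rewritten too. The conversion also always instantiates `torchvision.models.resnet50()`, so it only suits ResNet-50 checkpoints.

Below is a minimal alternative sketch that uses `os.path.splitext`, which touches only the final suffix. A second helper mirrors the new `user.conf` selection. When `MLC_MLPERF_INFERENCE_BERT_PATH` is unset, the BERT branch resolves to a bare relative `user.conf`. Both function names are hypothetical.

```python
import os
from typing import Dict


def torchscript_path_for(model_path: str) -> str:
    """Map weights.pth -> weights_torchscript.pt, changing only the extension."""
    root, ext = os.path.splitext(model_path)
    if ext != ".pth":
        return model_path  # not a .pth checkpoint: use it unchanged
    return root + "_torchscript.pt"


def user_conf_path(env: Dict[str, str]) -> str:
    """Pick user.conf from the BERT or the classification/detection reference tree."""
    if "MLC_MLPERF_USER_CONF" in env:
        return env["MLC_MLPERF_USER_CONF"]
    if "bert" in env["MLC_MODEL"]:
        # Falls back to a relative "user.conf" if the BERT path is unset
        return os.path.join(env.get("MLC_MLPERF_INFERENCE_BERT_PATH", ""), "user.conf")
    return os.path.join(
        env["MLC_MLPERF_INFERENCE_CLASSIFICATION_AND_DETECTION_PATH"], "user.conf")
```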
script/app-mlperf-inference-mlcommons-cpp/inc/backend.h

Lines changed: 9 additions & 8 deletions
```diff
@@ -28,9 +28,9 @@
  * the location in memory of this batch, and passes this to RunInference implemented by
  * derived classes (e.g. OnnxRuntimeBackend).
  */
-class Backend {
+class MlcBackend {
 public:
-  Backend(std::shared_ptr<Model> &model, std::shared_ptr<Device> &device,
+  MlcBackend(std::shared_ptr<Model> &model, std::shared_ptr<MlcDevice> &device,
           size_t performance_sample_count, size_t batch_size)
     : model(model), device(device)
     , performance_sample_count(performance_sample_count), batch_size(batch_size)
@@ -60,7 +60,7 @@ class Backend {
       std::cerr << "warning: performance sample count = 0" << std::endl;
   }
 
-  virtual ~Backend() {
+  virtual ~MlcBackend() {
     for (size_t i = 0; i < num_inputs; i++) {
       for (size_t j = 0; j < num_memory; j++) {
         device->Free(j, sample_memory[i][j]);
@@ -175,14 +175,15 @@ class Backend {
     size_t memory_index = device->GetMemoryIndex(concurrency_index);
     // might use batch_memory
     std::unique_lock<std::mutex> batch_memory_lock{batch_memory_mutex[memory_index], std::defer_lock};
+    if (!contiguous)
+      batch_memory_lock.lock();
     for (size_t i = 0; i < num_inputs; i++) {
       // if input is contiguous, use input directly as batch address
       // otherwise, gather a batch to batch_memory
       if (contiguous) {
         batch_data[i] = GetMemoryAddress(i, memory_index, node->index_in_memory);
       } else {
         // copy data if not contiguous
-        batch_memory_lock.lock();
         for (size_t k = 0; k < batch.size(); k++) {
           const mlperf::QuerySample &sample = batch[k];
           void *sample_address = GetMemoryAddress(i, memory_index, sample_map[sample.index].index_in_memory);
```

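The lock change in `backend.h` matters once a model has more than one input. `std::unique_lock::lock()` throws `std::system_error` (`resource_deadlock_would_occur`) if the lock already owns its mutex. The old code called `lock()` inside the per-input loop, so a non-contiguous batch would throw on the second input. This assumes no `unlock()` between iterations, and none appears in the hunk. BERT, with three inputs, exercises that path. The commit instead acquires the lock once, before the loop, and only when a copy is needed.

Below is a Python analogue of that gather step. Python's `threading.Lock` is also non-reentrant: re-acquiring it deadlocks rather than throws. `gather_batch` and the list-of-samples memory layout are illustrative assumptions, not the C++ data structures.

```python
import threading
from contextlib import nullcontext
from typing import List, Sequence


def gather_batch(sample_memory: List[List[bytes]], batch: Sequence[int],
                 contiguous: bool, lock: threading.Lock) -> List[bytes]:
    """Return one batch buffer per model input.

    sample_memory[i][k] holds sample k of input i (hypothetical layout).
    """
    # Take the staging-buffer lock once for the whole batch, and only when
    # samples must be copied; taking it again per input would deadlock.
    guard = nullcontext() if contiguous else lock
    batch_data: List[bytes] = []
    with guard:
        for per_input in sample_memory:
            if contiguous:
                # Samples are already adjacent: the C++ code just points at
                # batch[0]; here we slice the same run of samples
                start = batch[0]
                batch_data.append(b"".join(per_input[start:start + len(batch)]))
            else:
                # Copy the requested samples, in batch order, into one buffer
                batch_data.append(b"".join(per_input[k] for k in batch))
    return batch_data
```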
```diff
@@ -232,7 +233,7 @@ class Backend {
 
 protected:
   std::shared_ptr<Model> model;
-  std::shared_ptr<Device> device;
+  std::shared_ptr<MlcDevice> device;
   size_t performance_sample_count;
   size_t batch_size;
   size_t num_memory;
@@ -275,12 +276,12 @@ class Backend {
   Trie batches;
 };
 
-class DummyBackend : public Backend {
+class DummyBackend : public MlcBackend {
 public:
   DummyBackend(
-      std::shared_ptr<Model> &model, std::shared_ptr<Device> &device,
+      std::shared_ptr<Model> &model, std::shared_ptr<MlcDevice> &device,
       size_t performance_sample_count, size_t batch_size)
-    : Backend(model, device, performance_sample_count, batch_size) {}
+    : MlcBackend(model, device, performance_sample_count, batch_size) {}
 
   void RunInference(
       size_t concurrency_index,
```


script/app-mlperf-inference-mlcommons-cpp/inc/device.h

Lines changed: 2 additions & 2 deletions
```diff
@@ -19,7 +19,7 @@
  *
  * The Alloc, Free, Read, Write, Copy operations are for the corresponding device memory.
  */
-class Device {
+class MlcDevice {
 public:
   virtual size_t NumConcurrency() const = 0;
   virtual size_t NumMemory() const = 0;
@@ -33,7 +33,7 @@ class Device {
   virtual void SetConcurrencyIndex(size_t concurrency_index) {}
 };
 
-class CPUDevice : public Device {
+class CPUDevice : public MlcDevice {
   size_t NumConcurrency() const override {
     return 2;//std::thread::hardware_concurrency();
   }
```


script/app-mlperf-inference-mlcommons-cpp/inc/gpu_device.h

Lines changed: 1 addition & 1 deletion
```diff
@@ -10,7 +10,7 @@
 
 #define CHECK_CUDA_SUCCESS(x) if ((x) != cudaSuccess) std::cerr << "encountered CUDA error" << std::endl;
 
-class GPUDevice : public Device {
+class GPUDevice : public MlcDevice {
   size_t NumConcurrency() const override {
     return NumMemory();
   }
```


script/app-mlperf-inference-mlcommons-cpp/inc/model.h

Lines changed: 30 additions & 0 deletions
```diff
@@ -121,4 +121,34 @@ class Retinanet : public Model {
   float score_threshold;
 };
 
+
+class BertLarge : public Model {
+public:
+  BertLarge(std::string model_path, size_t max_seq_length) :
+    Model(
+      model_path,
+      3, {"input_ids", "input_mask", "segment_ids"},
+      {max_seq_length * sizeof(int64_t), max_seq_length * sizeof(int64_t), max_seq_length * sizeof(int64_t)},
+      {{max_seq_length}, {max_seq_length}, {max_seq_length}},
+      2, {"output_start_logits", "output_end_logits"},
+      {max_seq_length * sizeof(float), max_seq_length * sizeof(float)},
+      {{max_seq_length}, {max_seq_length}}),
+    max_seq_length(max_seq_length) {}
+
+  void PostProcess(
+      mlperf::QuerySampleIndex index,
+      const std::vector<void *> &raw,
+      const std::vector<std::vector<size_t>> &raw_shapes,
+      std::vector<uint8_t> &response_buffer) override {
+    // Concatenate start_logits and end_logits into response
+    size_t logits_bytes = max_seq_length * sizeof(float);
+    response_buffer.resize(2 * logits_bytes);
+    std::memcpy(response_buffer.data(), raw.at(0), logits_bytes);
+    std::memcpy(response_buffer.data() + logits_bytes, raw.at(1), logits_bytes);
+  }
+
+private:
+  size_t max_seq_length;
+};
+
 #endif // MODEL_H_
```
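`BertLarge` gives each of its three `int64_t` inputs `max_seq_length` elements. It returns the two `float` logit vectors back to back in the response buffer, start logits first. With the `384` fallback set in `customize.py`, each sample uses:

- **Input:** 3 × 384 × 8 = 9216 bytes
- **Response:** 2 × 384 × 4 = 3072 bytes

The minimal Python sketch below shows how a consumer of the LoadGen response could split it back into the two vectors. An accuracy script is one such consumer, though the commit does not show one. The sketch assumes 4-byte native-endian `float`, as produced by the `memcpy` calls in `PostProcess`. The helper names are hypothetical.

```python
from array import array
from typing import List, Tuple


def bert_buffer_sizes(max_seq_length: int = 384) -> Tuple[int, int]:
    """Per-sample bytes: three int64 input tensors, two float32 output tensors."""
    input_bytes = 3 * max_seq_length * 8
    output_bytes = 2 * max_seq_length * 4
    return input_bytes, output_bytes


def split_logits(response: bytes, max_seq_length: int) -> Tuple[List[float], List[float]]:
    """Split a BertLarge response buffer into (start_logits, end_logits)."""
    # "f" is a C float in native byte order, matching the raw memcpy'd data
    logits = array("f")
    logits.frombytes(response)
    if len(logits) != 2 * max_seq_length:
        raise ValueError(f"expected {2 * max_seq_length} floats, got {len(logits)}")
    return list(logits[:max_seq_length]), list(logits[max_seq_length:])
```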

script/app-mlperf-inference-mlcommons-cpp/inc/onnxruntime_backend.h

Lines changed: 3 additions & 3 deletions
```diff
@@ -11,13 +11,13 @@
 
 #include "backend.h"
 
-class OnnxRuntimeBackend : public Backend {
+class OnnxRuntimeBackend : public MlcBackend {
 public:
   OnnxRuntimeBackend(
-      std::shared_ptr<Model> &model, std::shared_ptr<Device> &device,
+      std::shared_ptr<Model> &model, std::shared_ptr<MlcDevice> &device,
       size_t performance_sample_count, size_t batch_size,
       bool use_cuda)
-    : Backend(model, device, performance_sample_count, batch_size)
+    : MlcBackend(model, device, performance_sample_count, batch_size)
     , env(ORT_LOGGING_LEVEL_WARNING, "env") {
     for (size_t i = 0; i < device->NumMemory(); i++) {
       memory_infos.emplace_back(
```
