
Commit 2816702

pmlogsynth: reframe all specs as standalone GitHub project
Remove all references to PCP's qa/ tree. The tool is now specified as an independent Python project distributed via PyPI, with PCP as an installed runtime dependency rather than a host repository.

Changes across all three specs:

- Project layout: qa/pmlogsynth/ -> standalone pmlogsynth/ package with pyproject.toml, pip install, and console_scripts entry point
- Dependencies: explicit section covering Python 3.8+, python3-pcp, PyYAML, and optional anthropic[ai] extras group (Phase 2)
- Tests: PCP qa/NNNNN framework replaced with pytest, split into Tier 1 (no PCP required) and Tier 2 (integration, auto-skipped if pmlogcheck not on PATH)
- Phase 2: client.py path updated; anthropic declared as optional extras; missing-package error message includes pip install hint
- Phase 3: concurrent.futures noted as stdlib (no extra dep); pytest integration test example added; fixture path updated
- Hardware profiles: bundled as package data in pmlogsynth/profiles/
- User profiles: ~/.pcp/pmlogsynth/profiles/ retained (PCP convention)

https://claude.ai/code/session_01WCeV6wLiaXgrw7s3v4oyQH
1 parent 4a6c345 commit 2816702

3 files changed

Lines changed: 289 additions & 105 deletions


pmlogsynth-phase1-spec.md

Lines changed: 155 additions & 64 deletions
````diff
@@ -35,7 +35,19 @@ toolchain.
 
 ---
 
-## 3. Goals
+## 3. Project Overview
+
+`pmlogsynth` is a standalone Python project distributed via PyPI and developed independently
+of PCP. It depends on PCP being installed on the host (for `libpcp_import` and the `pcp`
+Python bindings), but is otherwise self-contained.
+
+The project is intended for eventual contribution back to PCP, but operates as its own
+repository to allow faster iteration, independent releases, and contributions from users
+who are not PCP committers.
+
+---
+
+## 4. Goals
 
 - Produce archives indistinguishable in format from `pmlogger` output
 - Support real PCP metric namespaces (kernel, disk, network, memory) with correct units
````
````diff
@@ -64,7 +76,35 @@ The following are explicitly deferred and will not be addressed in this phase:
 
 ---
 
-## 4. Architecture
+## 5. Dependencies
+
+### Runtime
+
+| Dependency | How to install | Notes |
+|---|---|---|
+| **Python 3.8+** | System package manager | `python3` |
+| **PCP** | See [PCP installation docs](https://pcp.io/docs/guide.html) | Provides `libpcp_import.so` and the `pcp` Python bindings |
+| **`python3-pcp`** | System package manager | RPM: `python3-pcp`; Deb: `python3-pcp`; provides `pcp.pmi`, `pcp.pmapi`, and the `cpmi` C extension |
+| **PyYAML** | `pip install pyyaml` | Profile parsing |
+
+### Optional (Phase 2 — natural language generation)
+
+| Dependency | How to install | Notes |
+|---|---|---|
+| **`anthropic>=0.20.0`** | `pip install anthropic` | Anthropic Python SDK; only needed for `--prompt` |
+
+### Not required
+
+- No C compiler (pure Python after PCP is installed)
+- No `numpy` — Gaussian noise uses `random.gauss` from stdlib
+- No running `pmcd`
+- No root access
+- No database, message queue, or web service
+- Phase 3 parallel `--jobs` uses `concurrent.futures` from stdlib
+
+---
+
+## 6. Architecture
 
 ```
 profile.yaml
````
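The Dependencies section keeps `anthropic` optional, and the commit message says the missing-package error should include a `pip install` hint. A minimal sketch of how that could look, assuming a hypothetical `import_optional()` helper and `MissingDependencyError` class (neither name comes from the spec):

```python
import importlib
from types import ModuleType


class MissingDependencyError(RuntimeError):
    """Raised when a feature needs a package that is not installed."""


def import_optional(module_name: str, feature: str, install_hint: str) -> ModuleType:
    """Import `module_name` lazily, or fail with an actionable install hint.

    Hypothetical helper: the spec only requires that the error for a missing
    optional package includes a `pip install` hint.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        raise MissingDependencyError(
            f"{feature} requires the '{module_name}' package; "
            f"install it with: {install_hint}"
        ) from exc


if __name__ == "__main__":
    # Only the --prompt path (Phase 2) would call this, so core archive
    # generation keeps working when the SDK is absent.
    try:
        sdk = import_optional("anthropic", "--prompt", "pip install anthropic")
        print("anthropic SDK found:", sdk.__name__)
    except MissingDependencyError as err:
        print(err)
```

Deferring the import to the code path that needs it means users who never pass `--prompt` never need the SDK installed.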
````diff
@@ -79,29 +119,30 @@ profile.yaml
 ValueSampler
 │ (applies Gaussian noise, accumulates counters, coerces types)
 
-libpcp_import (via pmi.pmiLogImport Python bindings)
+libpcp_import (via pcp.pmi.pmiLogImport Python bindings)
 
 
 output.{0,index,meta}
 ```
 
 ### Implementation Language
 
-Python 3. Depends only on the `pcp` Python module (included in any PCP installation).
-No third-party dependencies are required.
+Python 3. Depends only on the `pcp` Python package (installed alongside any PCP
+installation that includes Python bindings) and PyYAML. No other third-party dependencies
+are required for core archive generation.
 
 ---
 
-## 5. Hardware Profile Library
+## 7. Hardware Profile Library
 
-### 5.1 Concept
+### 7.1 Concept
 
 A hardware profile is a named YAML document that describes the physical or virtual host
 being simulated: CPU count, RAM, disk devices, and network interfaces. Profiles decouple
 the "what hardware" question from the "what workload" question, making profiles reusable
 across many workload scenarios.
 
-### 5.2 Bundled Profiles
+### 7.2 Bundled Profiles
 
 `pmlogsynth` ships with a small set of generic reference host profiles. These are loosely
 inspired by common cloud instance tiers but are not tied to any vendor — they serve as
````
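The `ValueSampler` stage in the architecture diagram applies Gaussian noise, accumulates counters and coerces types. The sketch below shows one way to combine those three steps using only the stdlib `random` module. The class shape is hypothetical, and several behaviours are assumptions not stated in the spec: `noise` is a relative standard deviation, negative draws are clamped to zero, counters are kept as integers and instantaneous values are returned as floats.

```python
import random
from typing import Dict, Optional


class ValueSampler:
    """Hypothetical sketch of the ValueSampler stage.

    Assumptions (not from the spec): `noise` is a relative standard deviation
    (0.05 means 5 %), negative draws are clamped to zero, counters are kept
    as ints and instantaneous values are returned as floats.
    """

    def __init__(self, noise: float, seed: Optional[int] = None) -> None:
        self.noise = noise
        self.rng = random.Random(seed)      # private, seedable generator
        self.totals: Dict[str, int] = {}    # running total per counter metric

    def jitter(self, value: float) -> float:
        """Gaussian noise via stdlib random.gauss (no numpy), clamped at 0."""
        if self.noise <= 0 or value <= 0:
            return max(0.0, value)
        return max(0.0, self.rng.gauss(value, value * self.noise))

    def instant(self, value: float) -> float:
        """Instantaneous metric: one noisy reading, coerced to float."""
        return float(self.jitter(value))

    def counter(self, name: str, increment: float) -> int:
        """Counter metric: add a noisy, non-negative per-interval increment
        and return the cumulative total as an int, so it never decreases."""
        step = int(round(self.jitter(increment)))
        self.totals[name] = self.totals.get(name, 0) + step
        return self.totals[name]


if __name__ == "__main__":
    sampler = ValueSampler(noise=0.05, seed=42)
    for _ in range(3):
        # roughly 5000 write operations per interval, plus an instantaneous gauge
        print(sampler.counter("disk.all.write", 5000), sampler.instant(12.5))
```

Using a private, seedable `random.Random` instance rather than the module-level generator is one way to make repeated runs over the same profile produce identical archives.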
````diff
@@ -117,9 +158,10 @@ reasonable, recognisable starting points.
 | `memory-optimized` | 4 | 64 GB | 1× NVMe | 1× 10 GbE | High RAM, modest CPU |
 | `storage-optimized` | 4 | 16 GB | 4× HDD | 1× 10 GbE | High disk capacity |
 
-Bundled profiles live in `qa/pmlogsynth/profiles/` as individual YAML files.
+Bundled profiles are packaged inside the `pmlogsynth/profiles/` directory and installed
+as package data alongside the Python source.
 
-### 5.3 User-Defined Profiles
+### 7.3 User-Defined Profiles
 
 Users may define their own profiles — or override bundled ones — by placing YAML files in:
 
````
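Sections 7.2 and 7.3 imply a two-level lookup in which a profile in the user directory shadows a bundled profile of the same name. A small stdlib-only sketch of that search order follows; the function name and the idea of locating the bundled directory next to the module are assumptions, not part of the spec.

```python
import tempfile
from pathlib import Path
from typing import Sequence


def resolve_profile(name: str, search_dirs: Sequence[Path]) -> Path:
    """Return the first `<name>.yaml` found, searching `search_dirs` in order.

    Listing the user directory before the bundled one lets a user profile
    override a packaged profile of the same name.
    """
    for directory in search_dirs:
        candidate = Path(directory).expanduser() / f"{name}.yaml"
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(d) for d in search_dirs)
    raise FileNotFoundError(f"hardware profile {name!r} not found in: {searched}")


# Hypothetical wiring inside the package (e.g. in pmlogsynth/profile.py):
#   USER_DIR = Path("~/.pcp/pmlogsynth/profiles")
#   BUNDLED_DIR = Path(__file__).resolve().parent / "profiles"
#   resolve_profile("generic-small", [USER_DIR, BUNDLED_DIR])

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as user, tempfile.TemporaryDirectory() as bundled:
        (Path(bundled) / "generic-small.yaml").write_text("# bundled copy\n")
        print(resolve_profile("generic-small", [Path(user), Path(bundled)]))
        (Path(user) / "generic-small.yaml").write_text("# user override\n")
        print(resolve_profile("generic-small", [Path(user), Path(bundled)]))
```

For the bundled YAML files to be installed at all, the build backend has to include them as package data; with setuptools, the `[tool.setuptools.package-data]` table in `pyproject.toml` is one way to do that.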
````diff
@@ -153,21 +195,21 @@ interfaces:
 speed_mbps: 25000
 ```
 
-### 5.4 Build-Time Validation
+### 7.4 Profile Validation in CI
 
-The build system runs a schema validation pass over all bundled profiles in
-`qa/pmlogsynth/profiles/`. Any malformed profile fails the build. Content review
+The CI pipeline runs a schema validation pass over all bundled profiles in
+`pmlogsynth/profiles/`. Any malformed profile fails the test run. Content review
 of contributed profiles remains a human responsibility.
 
 ---
 
-## 6. Profile Format
+## 8. Profile Format
 
 A profile is a YAML file that describes the simulated host and a timeline of workload
 **phases**. Each phase has a duration and a set of **stressors** that drive one or more
 metric domains.
 
-### 6.1 Full Example
+### 8.1 Full Example
 
 ```yaml
 # cpu-memory-spike.yaml
````
````diff
@@ -240,14 +282,14 @@ phases:
 tx_mbps: 2.0
 ```
 
-### 6.2 Phase Transitions
+### 8.2 Phase Transitions
 
 | Value | Behaviour |
 |-------|-----------|
 | `instant` (default) | Values jump immediately at the phase boundary |
 | `linear` | Values interpolate linearly over the full phase duration from prior phase end values |
 
-### 6.3 Repeating Phases
+### 8.3 Repeating Phases
 
 A phase may include a `repeat` key to express recurring patterns without copy-pasting.
 The timeline sequencer expands repeats before writing begins.
````
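The two transition modes in Section 8.2 reduce to a per-sample function of elapsed time within a phase. A sketch, assuming a hypothetical `stressor_value()` helper that receives the previous phase's end value; for the first phase a caller could simply pass the target as `prev_end`, so `linear` degrades to a constant.

```python
def stressor_value(prev_end: float, target: float, transition: str,
                   elapsed: float, duration: float) -> float:
    """Value of one stressor `elapsed` seconds into a phase.

    instant: jump straight to the phase's target at the boundary.
    linear:  ramp from the previous phase's end value to the target
             over the whole phase duration.
    """
    if transition == "instant":
        return target
    if transition == "linear":
        if duration <= 0:
            return target
        frac = min(max(elapsed / duration, 0.0), 1.0)  # clamp to [0, 1]
        return prev_end + (target - prev_end) * frac
    raise ValueError(f"unknown transition: {transition!r}")


if __name__ == "__main__":
    # A 600 s linear ramp of user_ratio from 0.2 to 0.8, sampled every 150 s.
    for t in range(0, 601, 150):
        print(t, round(stressor_value(0.2, 0.8, "linear", t, 600), 3))
```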
````diff
@@ -275,7 +317,7 @@ When `repeat: daily` is used, the sequencer inserts the baseline phase between e
 repetition to fill the 24-hour period. `meta.duration` must accommodate the full
 expanded timeline; the validator will reject profiles where this does not hold.
 
-### 6.4 Noise
+### 8.4 Noise
 
 A `noise:` key at domain level overrides `meta.noise` for that domain only:
 
````
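Per Section 8.4, a domain-level `noise:` key overrides `meta.noise` for that domain only. A sketch of that resolution rule over already-parsed YAML dicts; the helper name, the phase dict shape in the demo, and the `0.0` fallback when neither key is set are all assumptions.

```python
from typing import Any, Dict


def effective_noise(meta: Dict[str, Any], domain_cfg: Dict[str, Any],
                    default: float = 0.0) -> float:
    """Noise level for one domain: its own `noise` key wins, else `meta.noise`.

    A membership test (not truthiness) is used so that `noise: 0` on a domain
    really disables noise there even when `meta.noise` is non-zero.
    """
    if "noise" in domain_cfg:
        return float(domain_cfg["noise"])
    return float(meta.get("noise", default))


if __name__ == "__main__":
    meta = {"noise": 0.05}
    phase = {"cpu": {"user_ratio": 0.6}, "disk": {"noise": 0.15, "write_mbps": 5.0}}
    for domain, cfg in phase.items():
        print(domain, effective_noise(meta, cfg))
```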
````diff
@@ -288,13 +330,13 @@ A `noise:` key at domain level overrides `meta.noise` for that domain only:
 write_mbps: 5.0
 ```
 
-### 6.5 Instance Domains
+### 8.5 Instance Domains
 
 Disk and NIC instances are derived from the host configuration and remain **fixed**
 for the lifetime of the archive. Instance names match the device names in the host
 profile (e.g. `nvme0n1`, `eth0`).
 
-### 6.6 Constraints Enforced at Validation
+### 8.6 Constraints Enforced at Validation
 
 - `user_ratio + sys_ratio + iowait_ratio ≤ 1.0` (remainder is steal/other)
 - Sum of phase durations == `meta.duration` (when no `repeat` key is present)
````
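The two constraints listed in Section 8.6 can be checked on the parsed profile before any sampling starts. A sketch under stated assumptions: phase durations are already normalised to seconds, CPU stressors sit under a `cpu` key inside each phase, and `check_constraints()` is a hypothetical name. It collects every violation rather than stopping at the first, which gives a more useful report in CI.

```python
from typing import Any, Dict, List

CPU_RATIO_KEYS = ("user_ratio", "sys_ratio", "iowait_ratio")


def check_constraints(profile: Dict[str, Any]) -> List[str]:
    """Return human-readable constraint violations; an empty list means valid.

    Assumes durations are already normalised to seconds and that each phase
    keeps its CPU stressors under a `cpu` key (both illustrative).
    """
    errors: List[str] = []
    phases = profile.get("phases", [])

    for i, phase in enumerate(phases):
        cpu = phase.get("cpu", {})
        total = sum(float(cpu.get(key, 0.0)) for key in CPU_RATIO_KEYS)
        if total > 1.0 + 1e-9:  # small tolerance for float rounding
            errors.append(f"phase {i}: user+sys+iowait ratios sum to {total:.3f} (> 1.0)")

    # The duration identity only applies when no phase uses `repeat`.
    if not any("repeat" in phase for phase in phases):
        expected = profile.get("meta", {}).get("duration")
        actual = sum(phase.get("duration", 0) for phase in phases)
        if expected is not None and actual != expected:
            errors.append(f"phase durations sum to {actual}s, meta.duration is {expected}s")

    return errors


if __name__ == "__main__":
    bad = {"meta": {"duration": 900},
           "phases": [{"duration": 300, "cpu": {"user_ratio": 0.7, "sys_ratio": 0.4}}]}
    for problem in check_constraints(bad):
        print(problem)
```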
````diff
@@ -307,12 +349,12 @@ profile (e.g. `nvme0n1`, `eth0`).
 
 ---
 
-## 7. Metric Domains and Consistency Model
+## 9. Metric Domains and Consistency Model
 
 Each domain is a self-contained `MetricModel` subclass that accepts high-level stressor
 values and derives all related PCP metrics, enforcing internal constraints at every sample.
 
-### 7.1 CPU Domain
+### 9.1 CPU Domain
 
 **PCP metrics:** `kernel.all.cpu.*`, `kernel.percpu.cpu.*`
 
````
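Section 9 describes each domain as a `MetricModel` subclass that turns stressors into metric values while enforcing its constraints at every sample, and the CPU domain accumulates counters so rate-converting tools such as `pmval` recover the intended utilisation. A hypothetical sketch of that shape follows. It assumes `kernel.all.cpu.*` counters are milliseconds of CPU time summed across CPUs (as on Linux hosts) and models only the three ratio-driven states; how the remainder (described in 8.6 as steal/other) is split is left out.

```python
from abc import ABC, abstractmethod
from typing import Dict


class MetricModel(ABC):
    """Hypothetical base class: stressor values in, PCP metric values out."""

    @abstractmethod
    def sample(self, stressors: Dict[str, float], interval_s: float) -> Dict[str, float]:
        """Return {metric name: value} for one sampling interval."""


class CpuModel(MetricModel):
    """Accumulates ratio stressors into cumulative kernel.all.cpu.* counters.

    Assumes millisecond counters summed over all CPUs. Only the three
    ratio-driven states are modelled; the remainder is not split here.
    """

    RATIO_TO_METRIC = {
        "user_ratio": "kernel.all.cpu.user",
        "sys_ratio": "kernel.all.cpu.sys",
        "iowait_ratio": "kernel.all.cpu.wait.total",
    }

    def __init__(self, ncpu: int) -> None:
        self.ncpu = ncpu
        self.totals: Dict[str, float] = {m: 0.0 for m in self.RATIO_TO_METRIC.values()}

    def sample(self, stressors: Dict[str, float], interval_s: float) -> Dict[str, float]:
        ratios = {k: float(stressors.get(k, 0.0)) for k in self.RATIO_TO_METRIC}
        if sum(ratios.values()) > 1.0 + 1e-9:
            # Enforce the domain constraint before touching any counter.
            raise ValueError("user + sys + iowait ratios exceed 1.0")
        budget_ms = self.ncpu * interval_s * 1000.0  # CPU time available this interval
        for key, metric in self.RATIO_TO_METRIC.items():
            self.totals[metric] += ratios[key] * budget_ms
        return dict(self.totals)


if __name__ == "__main__":
    cpu = CpuModel(ncpu=4)
    first = cpu.sample({"user_ratio": 0.6, "sys_ratio": 0.1}, 60)
    second = cpu.sample({"user_ratio": 0.6, "sys_ratio": 0.1}, 60)
    # What a rate-converting replay does: counter delta / available CPU time.
    delta = second["kernel.all.cpu.user"] - first["kernel.all.cpu.user"]
    print(delta / (4 * 60 * 1000.0))
```

Dividing the counter delta between two samples by `ncpu * interval * 1000` gives back `user_ratio`, which is the same conversion a rate-based replay performs.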
````diff
@@ -329,7 +371,7 @@ interval.
 across samples so that rate-based tools (`pmval`, `pmrep`) produce correct results when
 replaying the archive.
 
-### 7.2 Memory Domain
+### 9.2 Memory Domain
 
 **PCP metrics:** `mem.util.*`
 
````
````diff
@@ -341,7 +383,7 @@ replaying the archive.
 
 **Constraint enforced:** `used + free == physmem`. `available ≈ free + cached`.
 
-### 7.3 Disk Domain
+### 9.3 Disk Domain
 
 **PCP metrics:** `disk.all.*`, `disk.dev.*`
 
````
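For the memory constraint (`used + free == physmem`, `available ≈ free + cached`), one robust approach is to sample only `used` and derive `free` by subtraction, so noise applied to `used` can never break the identity. A sketch with two hypothetical ratio stressors (the spec's actual memory stressor names are not shown in this diff); values are in kilobytes, which is how PCP's `mem.util.*` metrics are usually reported, and `cached` is treated as a subset of `used`.

```python
from typing import Dict


def derive_memory(physmem_kb: int, used_ratio: float, cache_ratio: float) -> Dict[str, int]:
    """Derive mem.util.* values (KB) that satisfy the memory constraints.

    used_ratio and cache_ratio are hypothetical stressors (fractions of
    physical memory); page cache is treated as part of `used`.
    """
    used_ratio = min(max(used_ratio, 0.0), 1.0)
    cache_ratio = min(max(cache_ratio, 0.0), used_ratio)  # cache cannot exceed used
    used = int(round(physmem_kb * used_ratio))
    free = physmem_kb - used              # by construction: used + free == physmem
    cached = int(round(physmem_kb * cache_ratio))
    return {
        "mem.util.used": used,
        "mem.util.free": free,
        "mem.util.cached": cached,
        "mem.util.available": free + cached,  # available ≈ free + cached
    }


if __name__ == "__main__":
    for name, value in derive_memory(16 * 1024 * 1024, 0.72, 0.30).items():
        print(f"{name:20s} {value}")
```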
````diff
@@ -353,7 +395,7 @@ replaying the archive.
 
 **Metric type:** counter (cumulative bytes and ops).
 
-### 7.4 Network Domain
+### 9.4 Network Domain
 
 **PCP metrics:** `network.interface.*`
 
````
````diff
@@ -365,7 +407,7 @@ replaying the archive.
 Packet counts are estimated from byte totals assuming a 1400-byte mean packet size
 (configurable via a top-level `meta.mean_packet_bytes` key).
 
-### 7.5 Load Average Domain
+### 9.5 Load Average Domain
 
 **PCP metrics:** `kernel.all.load`
 
````
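The packet estimate in Section 9.4 is a single division once a byte count is known. A sketch, assuming it is applied to each interval's byte delta before the result is accumulated into the packet counters; the helper name is hypothetical, and rounding up is this sketch's choice because the spec does not say how to round.

```python
from typing import Any, Dict

DEFAULT_MEAN_PACKET_BYTES = 1400


def estimate_packets(byte_delta: int, meta: Dict[str, Any]) -> int:
    """Packets sent or received in one interval, from that interval's bytes.

    Uses meta.mean_packet_bytes when present, else the 1400-byte default,
    and rounds up so any non-zero traffic yields at least one packet.
    """
    mean = int(meta.get("mean_packet_bytes", DEFAULT_MEAN_PACKET_BYTES))
    if mean <= 0:
        raise ValueError("meta.mean_packet_bytes must be positive")
    byte_delta = int(byte_delta)
    if byte_delta <= 0:
        return 0
    return (byte_delta + mean - 1) // mean  # integer ceiling division


if __name__ == "__main__":
    # 2.0 Mbit/s for 60 s, reading "mbps" as megabits per second (assumption).
    interval_bytes = int(2.0 * 1_000_000 / 8 * 60)
    print(estimate_packets(interval_bytes, {}))
    print(estimate_packets(interval_bytes, {"mean_packet_bytes": 9000}))
```

At `tx_mbps: 2.0` over a 60 s interval, reading `mbps` as megabits per second, the interval carries 15,000,000 bytes, which this sketch turns into 10,715 packets at the default 1400-byte mean.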
````diff
@@ -375,7 +417,7 @@ UNIX load average decay constants.
 
 ---
 
-## 8. CLI Interface
+## 10. CLI Interface
 
 ```
 pmlogsynth [OPTIONS] PROFILE
````
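The load average domain is described as using the classic UNIX decay constants. The standard recurrence for each window τ of 60, 300 or 900 seconds is `load = load * exp(-dt/τ) + n * (1 - exp(-dt/τ))`, where `n` is the number of runnable tasks. A sketch, assuming `n` is supplied by the caller; how pmlogsynth derives it from the CPU stressors is not shown in this diff, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable

# Time constants (seconds) behind the 1-, 5- and 15-minute load averages.
WINDOWS = {1: 60.0, 5: 300.0, 15: 900.0}


def update_load(load: Dict[int, float], runnable: float, dt: float) -> Dict[int, float]:
    """One exponential-decay step: load' = load*e^(-dt/tau) + n*(1 - e^(-dt/tau))."""
    updated = {}
    for minutes, tau in WINDOWS.items():
        decay = math.exp(-dt / tau)
        updated[minutes] = load[minutes] * decay + runnable * (1.0 - decay)
    return updated


def simulate(runnable_per_sample: Iterable[float], dt: float) -> Dict[int, float]:
    """Feed one runnable-task count per archive sample, starting from idle."""
    load = {minutes: 0.0 for minutes in WINDOWS}
    for n in runnable_per_sample:
        load = update_load(load, n, dt)
    return load


if __name__ == "__main__":
    # Ten minutes of 4 runnable tasks, one archive sample per minute.
    final = simulate([4.0] * 10, dt=60.0)
    print({m: round(v, 3) for m, v in final.items()})
```

When the run-queue length is constant across an interval, this update is the exact solution of the underlying exponential decay, so the result does not depend on the archive's sampling interval: twelve 5-second steps and one 60-second step give the same values.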
````diff
@@ -415,7 +457,7 @@ pmlogsynth --list-metrics
 
 ---
 
-## 9. Output
+## 11. Output
 
 `pmlogsynth` produces a standard PCP v3 archive:
 
````
````diff
@@ -436,57 +478,106 @@ pcp -a ./out atop
 
 ---
 
-## 10. File Layout
+## 12. Project Layout
 
 ```
-qa/pmlogsynth/
-├── pmlogsynth          # CLI entry point
-├── profile.py          # YAML loader and validator
-├── timeline.py         # Phase sequencer, transition interpolation,
-│                       # repeat expansion
-├── sampler.py          # Gaussian noise, counter accumulation,
-│                       # type coercion
-├── writer.py           # libpcp_import wrapper (pmi.pmiLogImport)
-├── profiles/           # Bundled hardware profiles
-│   ├── generic-small.yaml
-│   ├── generic-medium.yaml
-│   ├── generic-large.yaml
-│   ├── generic-xlarge.yaml
-│   ├── compute-optimized.yaml
-│   ├── memory-optimized.yaml
-│   └── storage-optimized.yaml
-└── domains/
-    ├── cpu.py
-    ├── memory.py
-    ├── disk.py
-    ├── network.py
-    └── load.py
-
-qa/
-└── NNNNN               # QA test: uses pmlogsynth to generate a fixture,
-                        # verifies with pmlogcheck + pmval
+pmlogsynth/                 # repository root
+├── pyproject.toml          # package metadata, dependencies, entry point
+├── README.md
+├── requirements.txt        # pinned dev dependencies
+├── pmlogsynth/             # installable Python package
+│   ├── __init__.py
+│   ├── __main__.py         # enables: python -m pmlogsynth
+│   ├── cli.py              # argument parsing, entry point
+│   ├── profile.py          # YAML loader and validator
+│   ├── timeline.py         # phase sequencer, transition interpolation,
+│   │                       # repeat expansion
+│   ├── sampler.py          # Gaussian noise, counter accumulation,
+│   │                       # type coercion
+│   ├── writer.py           # libpcp_import wrapper (pcp.pmi.pmiLogImport)
+│   ├── profiles/           # bundled hardware profiles (package data)
+│   │   ├── generic-small.yaml
+│   │   ├── generic-medium.yaml
+│   │   ├── generic-large.yaml
+│   │   ├── generic-xlarge.yaml
+│   │   ├── compute-optimized.yaml
+│   │   ├── memory-optimized.yaml
+│   │   └── storage-optimized.yaml
+│   └── domains/
+│       ├── cpu.py
+│       ├── memory.py
+│       ├── disk.py
+│       ├── network.py
+│       └── load.py
+└── tests/
+    ├── test_profile.py     # profile loading and validation
+    ├── test_timeline.py    # phase sequencing and repeat expansion
+    ├── test_sampler.py     # noise and counter accumulation
+    ├── test_domains.py     # per-domain metric consistency checks
+    └── test_writer.py      # archive generation (requires PCP installed)
 ```
 
 **User profile directory:** `~/.pcp/pmlogsynth/profiles/`
 
+### Installation
+
+```bash
+pip install pmlogsynth
+
+# Or from source:
+git clone https://github.com/<org>/pmlogsynth
+cd pmlogsynth
+pip install -e .
+```
+
+`pyproject.toml` declares the entry point:
+
+```toml
+[project.scripts]
+pmlogsynth = "pmlogsynth.cli:main"
+```
+
 ---
 
-## 11. QA Test Requirements
+## 13. Test Requirements
+
+Tests are written with `pytest` and live in `tests/`. They are split into two tiers:
+
+### Tier 1 — unit tests (no PCP required)
 
-A QA test must be included that:
+Test profile loading, validation, timeline sequencing, phase transitions, repeat
+expansion, noise application, and counter accumulation without writing any archive.
+All domain consistency constraints are verified at the value-computation level.
+These tests run anywhere Python 3.8+ is available.
 
-1. Generates an archive from a known profile
-2. Runs `pmlogcheck` against the output and asserts it passes
-3. Runs `pmval` against one metric per domain and asserts the values are within
+### Tier 2 — integration tests (PCP must be installed)
+
+Generate a real archive from a known profile, then verify it with PCP tooling:
+
+1. Run `pmlogsynth` against a fixed reference profile
+2. Run `pmlogcheck` against the output and assert it passes
+3. Run `pmval` against one metric per domain and assert values are within
 the expected range (stressor value ± noise tolerance)
-4. Validates that the archive start and end timestamps match `--start` and
-`meta.duration`
+4. Assert the archive start and end timestamps match `--start` and `meta.duration`
 
-The test must not require a running `pmcd` or root access.
+Tier 2 tests are skipped automatically (via a pytest fixture) if `pmlogcheck` is not
+found on `PATH`. This allows the test suite to run in environments without PCP installed,
+with only Tier 1 executing.
+
+```bash
+# Run all tests
+pytest
+
+# Run only unit tests (no PCP needed)
+pytest -m "not integration"
+
+# Run with verbose output
+pytest -v
+```
 
 ---
 
-## 12. Future Enhancements
+## 14. Future Enhancements
 
 The following items are explicitly deferred from Phase 1:
 
````
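Section 13 says Tier 2 tests are skipped via a pytest fixture when `pmlogcheck` is missing, and its commands use `pytest -m "not integration"`, which relies on an `integration` marker being registered. A possible `tests/conftest.py` covering both follows; the helper and fixture names are hypothetical, while `shutil.which`, `config.addinivalue_line` and `pytest.skip` are standard APIs.

```python
# tests/conftest.py (sketch of the Tier 1 / Tier 2 split)
import shutil
from typing import Optional

import pytest


def pmlogcheck_path() -> Optional[str]:
    """Location of pmlogcheck, or None when PCP tools are not installed."""
    return shutil.which("pmlogcheck")


def pytest_configure(config):
    # Register the marker so `pytest -m "not integration"` selects Tier 1
    # without "unknown marker" warnings.
    config.addinivalue_line(
        "markers", "integration: Tier 2 tests that need PCP tools on PATH"
    )


@pytest.fixture
def pcp_tools() -> str:
    """Skip the requesting test unless pmlogcheck is on PATH."""
    path = pmlogcheck_path()
    if path is None:
        pytest.skip("pmlogcheck not found on PATH; skipping PCP integration test")
    return path
```

A Tier 2 test then opts in by carrying `@pytest.mark.integration` and requesting the `pcp_tools` fixture, so `-m "not integration"` deselects it and a host without PCP skips it rather than failing.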