@@ -35,7 +35,19 @@ toolchain.
3535
3636---
3737
38- ## 3. Goals
38+ ## 3. Project Overview
39+
40+ `pmlogsynth` is a standalone Python project distributed via PyPI and developed independently
41+ of PCP. It depends on PCP being installed on the host (for `libpcp_import` and the `pcp`
42+ Python bindings), but is otherwise self-contained.
43+
44+ The project is intended for eventual contribution back to PCP, but operates as its own
45+ repository to allow faster iteration, independent releases, and contributions from users
46+ who are not PCP committers.
47+
48+ ---
49+
50+ ## 4. Goals
3951
4052- Produce archives indistinguishable in format from `pmlogger` output
4153- Support real PCP metric namespaces (kernel, disk, network, memory) with correct units
@@ -64,7 +76,35 @@ The following are explicitly deferred and will not be addressed in this phase:
6476
6577---
6678
67- ## 4. Architecture
79+ ## 5. Dependencies
80+
81+ ### Runtime
82+
83+ | Dependency | How to install | Notes |
84+ |---|---|---|
85+ | **Python 3.8+** | System package manager | `python3` |
86+ | **PCP** | See [PCP installation docs](https://pcp.io/docs/guide.html) | Provides `libpcp_import.so` and the `pcp` Python bindings |
87+ | **`python3-pcp`** | System package manager | RPM: `python3-pcp`; Deb: `python3-pcp`; provides `pcp.pmi`, `pcp.pmapi`, and the `cpmi` C extension |
88+ | **PyYAML** | `pip install pyyaml` | Profile parsing |
89+
90+ ### Optional (Phase 2 — natural language generation)
91+
92+ | Dependency | How to install | Notes |
93+ |---|---|---|
94+ | **`anthropic>=0.20.0`** | `pip install anthropic` | Anthropic Python SDK; only needed for `--prompt` |
95+
96+ ### Not required
97+
98+ - No C compiler (pure Python after PCP is installed)
99+ - No `numpy`: Gaussian noise uses `random.gauss` from stdlib
100+ - No running `pmcd`
101+ - No root access
102+ - No database, message queue, or web service
103+ - No third-party concurrency library: the Phase 3 parallel `--jobs` option uses `concurrent.futures` from stdlib
104+
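The dependency tables above could be checked at start-up with a small stdlib-only probe. The following is a minimal sketch, not the project's actual code: the helper names `missing_modules` and `dependency_report` are hypothetical, and the mapping of dependencies to top-level module names (`pcp`, `yaml`, `anthropic`) is an assumption drawn from the tables.

```python
import importlib.util

# Top-level module names assumed from the dependency tables above.
REQUIRED_MODULES = ("pcp", "yaml")   # python3-pcp bindings, PyYAML
OPTIONAL_MODULES = ("anthropic",)    # only needed for --prompt


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def dependency_report():
    """Hypothetical helper: list absent required and optional modules."""
    return {
        "required": missing_modules(REQUIRED_MODULES),
        "optional": missing_modules(OPTIONAL_MODULES),
    }


if __name__ == "__main__":
    report = dependency_report()
    for kind, names in report.items():
        for name in names:
            print(f"missing {kind} module: {name}")
```

`find_spec` only locates a module without importing it, so the probe stays cheap and has no side effects on hosts where PCP is absent.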
105+ ---
106+
107+ ## 6. Architecture
68108
69109```
70110profile.yaml
@@ -79,29 +119,30 @@ profile.yaml
79119 ValueSampler
80120 │ (applies Gaussian noise, accumulates counters, coerces types)
81121 ▼
82- libpcp_import (via pmi.pmiLogImport Python bindings)
122+ libpcp_import (via pcp.pmi.pmiLogImport Python bindings)
83123 │
84124 ▼
85125output.{0,index,meta}
86126```
87127
88128### Implementation Language
89129
90- Python 3. Depends only on the `pcp` Python module (included in any PCP installation).
91- No third-party dependencies are required.
130+ Python 3. Depends only on the `pcp` Python package (installed alongside any PCP
131+ installation that includes Python bindings) and PyYAML. No other third-party dependencies
132+ are required for core archive generation.
92133
93134---
94135
95- ## 5. Hardware Profile Library
136+ ## 7. Hardware Profile Library
96137
97- ### 5.1 Concept
138+ ### 7.1 Concept
98139
99140A hardware profile is a named YAML document that describes the physical or virtual host
100141being simulated: CPU count, RAM, disk devices, and network interfaces. Profiles decouple
101142the "what hardware" question from the "what workload" question, making profiles reusable
102143across many workload scenarios.
103144
104- ### 5.2 Bundled Profiles
145+ ### 7.2 Bundled Profiles
105146
106147`pmlogsynth` ships with a small set of generic reference host profiles. These are loosely
107148inspired by common cloud instance tiers but are not tied to any vendor — they serve as
@@ -117,9 +158,10 @@ reasonable, recognisable starting points.
117158| `memory-optimized` | 4 | 64 GB | 1× NVMe | 1× 10 GbE | High RAM, modest CPU |
118159| `storage-optimized` | 4 | 16 GB | 4× HDD | 1× 10 GbE | High disk capacity |
119160
120- Bundled profiles live in `qa/pmlogsynth/profiles/` as individual YAML files.
161+ Bundled profiles are packaged inside the `pmlogsynth/profiles/` directory and installed
162+ as package data alongside the Python source.
121163
122- ### 5.3 User-Defined Profiles
164+ ### 7.3 User-Defined Profiles
123165
124166Users may define their own profiles — or override bundled ones — by placing YAML files in:
125167
@@ -153,21 +195,21 @@ interfaces:
153195 speed_mbps: 25000
154196```
155197
156- ### 5.4 Build-Time Validation
198+ ### 7.4 Profile Validation in CI
157199
158- The build system runs a schema validation pass over all bundled profiles in
159- `qa/pmlogsynth/profiles/`. Any malformed profile fails the build. Content review
200+ The CI pipeline runs a schema validation pass over all bundled profiles in
201+ `pmlogsynth/profiles/`. Any malformed profile fails the test run. Content review
160202of contributed profiles remains a human responsibility.
161203
162204---
163205
164- ## 6. Profile Format
206+ ## 8. Profile Format
165207
166208A profile is a YAML file that describes the simulated host and a timeline of workload
167209**phases**. Each phase has a duration and a set of **stressors** that drive one or more
168210metric domains.
169211
170- ### 6.1 Full Example
212+ ### 8.1 Full Example
171213
172214```yaml
173215# cpu-memory-spike.yaml
@@ -240,14 +282,14 @@ phases:
240282 tx_mbps: 2.0
241283```
242284
243- ### 6.2 Phase Transitions
285+ ### 8.2 Phase Transitions
244286
245287| Value | Behaviour |
246288|-------|-----------|
247289| `instant` (default) | Values jump immediately at the phase boundary |
248290| `linear` | Values interpolate linearly over the full phase duration from prior phase end values |
249291
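The two transition modes in the table can be illustrated with a short sketch. This is an assumption-laden illustration rather than the code in `timeline.py`: the function name `transition_value` and its parameters are hypothetical, and time is assumed to be measured in seconds from the start of the phase.

```python
def transition_value(prev_end, target, elapsed, duration, mode="instant"):
    """Return the stressor value `elapsed` seconds into a phase.

    prev_end -- value at the end of the prior phase
    target   -- value configured for the current phase
    """
    if mode == "instant":
        # The value jumps to the new target at the phase boundary.
        return target
    if mode == "linear":
        if duration <= 0:
            raise ValueError("duration must be positive for a linear transition")
        # Interpolate across the full phase duration, clamped to [0, 1].
        fraction = min(max(elapsed / duration, 0.0), 1.0)
        return prev_end + (target - prev_end) * fraction
    raise ValueError(f"unknown transition mode: {mode!r}")
```

With `mode="linear"`, the configured target is only reached at the very end of the phase, which follows from the table's wording that interpolation spans the full phase duration.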
250- ### 6.3 Repeating Phases
292+ ### 8.3 Repeating Phases
251293
252294A phase may include a `repeat` key to express recurring patterns without copy-pasting.
253295The timeline sequencer expands repeats before writing begins.
@@ -275,7 +317,7 @@ When `repeat: daily` is used, the sequencer inserts the baseline phase between e
275317repetition to fill the 24-hour period. `meta.duration` must accommodate the full
276318expanded timeline; the validator will reject profiles where this does not hold.
277319
278- ### 6.4 Noise
320+ ### 8.4 Noise
279321
280322A `noise:` key at domain level overrides `meta.noise` for that domain only:
281323
@@ -288,13 +330,13 @@ A `noise:` key at domain level overrides `meta.noise` for that domain only:
288330 write_mbps: 5.0
289331```
290332
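The override rule can be sketched as a two-level lookup followed by a `random.gauss` draw. This is a minimal sketch under stated assumptions: the helper names are hypothetical, `noise` is assumed to be a relative standard deviation (a fraction of the nominal value), and clamping at zero is an illustrative choice, none of which the spec confirms.

```python
import random


def resolve_noise(meta, domain):
    """Domain-level `noise` wins; otherwise fall back to `meta.noise`."""
    return domain.get("noise", meta.get("noise", 0.0))


def apply_noise(value, noise, rng=random):
    """Perturb `value` with Gaussian noise (assumed relative std deviation)."""
    if noise <= 0:
        return value
    # Clamp at zero so rates and utilisations never go negative (assumption).
    return max(0.0, rng.gauss(value, abs(value) * noise))


if __name__ == "__main__":
    meta = {"noise": 0.05}
    disk = {"noise": 0.20, "write_mbps": 5.0}
    rng = random.Random(42)  # seeded for reproducible output
    print(apply_noise(disk["write_mbps"], resolve_noise(meta, disk), rng))
```

Passing a seeded `random.Random` instance keeps generated archives reproducible, which may be useful for test fixtures.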
291- ### 6.5 Instance Domains
333+ ### 8.5 Instance Domains
292334
293335Disk and NIC instances are derived from the host configuration and remain **fixed**
294336for the lifetime of the archive. Instance names match the device names in the host
295337profile (e.g. `nvme0n1`, `eth0`).
296338
297- ### 6.6 Constraints Enforced at Validation
339+ ### 8.6 Constraints Enforced at Validation
298340
299341- `user_ratio + sys_ratio + iowait_ratio ≤ 1.0` (remainder is steal/other)
300342- Sum of phase durations == `meta.duration` (when no `repeat` key is present)
@@ -307,12 +349,12 @@ profile (e.g. `nvme0n1`, `eth0`).
307349
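Two of the constraints listed in this section, the CPU ratio bound and the duration sum, can be sketched as a validator that collects violations. This is a hedged illustration only: the profile layout (a `phases` list whose entries carry `duration` and an optional `cpu` mapping) and the function name `validate_profile` are assumptions, not the schema in `profile.py`.

```python
CPU_RATIO_KEYS = ("user_ratio", "sys_ratio", "iowait_ratio")


def validate_profile(profile):
    """Return a list of constraint violations (empty when the profile is valid)."""
    errors = []
    phases = profile.get("phases", [])

    # Constraint: user_ratio + sys_ratio + iowait_ratio <= 1.0 in every phase.
    for index, phase in enumerate(phases):
        cpu = phase.get("cpu", {})
        total = sum(cpu.get(key, 0.0) for key in CPU_RATIO_KEYS)
        if total > 1.0 + 1e-9:  # small tolerance for float rounding
            errors.append(f"phase {index}: CPU ratios sum to {total:.3f} (> 1.0)")

    # Constraint: phase durations sum to meta.duration when no repeat key is present.
    if not any("repeat" in phase for phase in phases):
        declared = profile.get("meta", {}).get("duration")
        actual = sum(phase.get("duration", 0) for phase in phases)
        if declared != actual:
            errors.append(f"phase durations sum to {actual}, meta.duration is {declared}")

    return errors
```

Returning every violation at once, rather than raising on the first, lets a profile author fix a file in a single pass.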
308350---
309351
310- ## 7. Metric Domains and Consistency Model
352+ ## 9. Metric Domains and Consistency Model
311353
312354Each domain is a self-contained `MetricModel` subclass that accepts high-level stressor
313355values and derives all related PCP metrics, enforcing internal constraints at every sample.
314356
315- ### 7.1 CPU Domain
357+ ### 9.1 CPU Domain
316358
317359**PCP metrics:** `kernel.all.cpu.*`, `kernel.percpu.cpu.*`
318360
@@ -329,7 +371,7 @@ interval.
329371across samples so that rate-based tools (`pmval`, `pmrep`) produce correct results when
330372replaying the archive.
331373
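Counter accumulation across samples can be sketched as a running total that only ever grows. This is a minimal sketch, assuming utilisation is expressed as a fraction of total CPU capacity and that the counter is kept in milliseconds of CPU time; the class name `CpuCounter` is hypothetical.

```python
class CpuCounter:
    """Accumulate per-interval CPU utilisation into a cumulative counter."""

    def __init__(self, ncpu):
        self.ncpu = ncpu
        self.total_ms = 0

    def advance(self, utilisation, interval_s):
        """Add one sample interval and return the new cumulative value."""
        if utilisation < 0:
            raise ValueError("utilisation must be non-negative")
        # CPU time consumed in this interval, summed over all CPUs.
        delta_ms = int(round(utilisation * self.ncpu * interval_s * 1000))
        self.total_ms += delta_ms
        return self.total_ms


if __name__ == "__main__":
    counter = CpuCounter(ncpu=4)
    for util in (0.50, 0.25, 0.75):
        print(counter.advance(util, interval_s=60))
```

Because each sample stores the running total, a rate-based tool that divides the difference between consecutive samples by the interval recovers the utilisation of that interval.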
332- ### 7.2 Memory Domain
374+ ### 9.2 Memory Domain
333375
334376**PCP metrics:** `mem.util.*`
335377
@@ -341,7 +383,7 @@ replaying the archive.
341383
342384**Constraint enforced:** `used + free == physmem`. `available ≈ free + cached`.
343385
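One way to guarantee the `used + free == physmem` constraint is to derive `free` by subtraction instead of computing it independently. The sketch below assumes hypothetical stressor names (`used_ratio`, `cache_ratio`), kilobyte units, and that cached pages count as part of `used`; none of these are confirmed by the text above.

```python
def derive_memory(physmem_kb, used_ratio, cache_ratio):
    """Derive a consistent set of memory values from high-level ratios."""
    used = int(physmem_kb * used_ratio)
    # Subtraction makes used + free == physmem hold exactly, with no rounding gap.
    free = physmem_kb - used
    # Assumption: page cache is counted inside `used`, so cap it there.
    cached = min(int(physmem_kb * cache_ratio), used)
    # Matches the stated approximation: available is roughly free + cached.
    available = free + cached
    return {
        "physmem": physmem_kb,
        "used": used,
        "free": free,
        "cached": cached,
        "available": available,
    }
```

Computing `used` and `free` separately from two ratios would allow integer truncation to break the equality by a kilobyte or so, which is why the sketch derives one from the other.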
344- ### 7.3 Disk Domain
386+ ### 9.3 Disk Domain
345387
346388**PCP metrics:** `disk.all.*`, `disk.dev.*`
347389
@@ -353,7 +395,7 @@ replaying the archive.
353395
354396**Metric type:** counter (cumulative bytes and ops).
355397
356- ### 7.4 Network Domain
398+ ### 9.4 Network Domain
357399
358400**PCP metrics:** `network.interface.*`
359401
@@ -365,7 +407,7 @@ replaying the archive.
365407Packet counts are estimated from byte totals assuming a 1400-byte mean packet size
366408(configurable via a top-level `meta.mean_packet_bytes` key).
367409
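The packet estimate described above amounts to a division of the byte total by the mean packet size. A minimal sketch, with a hypothetical function name and integer (floor) division as an assumed rounding choice:

```python
DEFAULT_MEAN_PACKET_BYTES = 1400  # default stated above; overridable via meta.mean_packet_bytes


def estimate_packets(byte_total, mean_packet_bytes=DEFAULT_MEAN_PACKET_BYTES):
    """Estimate a cumulative packet count from a cumulative byte count."""
    if mean_packet_bytes <= 0:
        raise ValueError("mean_packet_bytes must be positive")
    # Floor division keeps the result an integer, as a counter should be.
    return byte_total // mean_packet_bytes
```

Since floor division is monotonic in `byte_total`, applying it to the cumulative byte counter yields a packet counter that never decreases.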
368- ### 7.5 Load Average Domain
410+ ### 9.5 Load Average Domain
369411
370412**PCP metrics:** `kernel.all.load`
371413
@@ -375,7 +417,7 @@ UNIX load average decay constants.
375417
376418---
377419
378- ## 8. CLI Interface
420+ ## 10. CLI Interface
379421
380422```
381423pmlogsynth [OPTIONS] PROFILE
@@ -415,7 +457,7 @@ pmlogsynth --list-metrics
415457
416458---
417459
418- ## 9. Output
460+ ## 11. Output
419461
420462`pmlogsynth` produces a standard PCP v3 archive:
421463
@@ -436,57 +478,106 @@ pcp -a ./out atop
436478
437479---
438480
439- ## 10. File Layout
481+ ## 12. Project Layout
440482
441483```
442- qa/pmlogsynth/
443- ├── pmlogsynth # CLI entry point
444- ├── profile.py # YAML loader and validator
445- ├── timeline.py # Phase sequencer, transition interpolation,
446- │ # repeat expansion
447- ├── sampler.py # Gaussian noise, counter accumulation,
448- │ # type coercion
449- ├── writer.py # libpcp_import wrapper (pmi.pmiLogImport)
450- ├── profiles/ # Bundled hardware profiles
451- │ ├── generic-small.yaml
452- │ ├── generic-medium.yaml
453- │ ├── generic-large.yaml
454- │ ├── generic-xlarge.yaml
455- │ ├── compute-optimized.yaml
456- │ ├── memory-optimized.yaml
457- │ └── storage-optimized.yaml
458- └── domains/
459- ├── cpu.py
460- ├── memory.py
461- ├── disk.py
462- ├── network.py
463- └── load.py
464-
465- qa/
466- └── NNNNN # QA test: uses pmlogsynth to generate a fixture,
467- # verifies with pmlogcheck + pmval
484+ pmlogsynth/ # repository root
485+ ├── pyproject.toml # package metadata, dependencies, entry point
486+ ├── README.md
487+ ├── requirements.txt # pinned dev dependencies
488+ ├── pmlogsynth/ # installable Python package
489+ │ ├── __init__.py
490+ │ ├── __main__.py # enables: python -m pmlogsynth
491+ │ ├── cli.py # argument parsing, entry point
492+ │ ├── profile.py # YAML loader and validator
493+ │ ├── timeline.py # phase sequencer, transition interpolation,
494+ │ │ # repeat expansion
495+ │ ├── sampler.py # Gaussian noise, counter accumulation,
496+ │ │ # type coercion
497+ │ ├── writer.py # libpcp_import wrapper (pcp.pmi.pmiLogImport)
498+ │ ├── profiles/ # bundled hardware profiles (package data)
499+ │ │ ├── generic-small.yaml
500+ │ │ ├── generic-medium.yaml
501+ │ │ ├── generic-large.yaml
502+ │ │ ├── generic-xlarge.yaml
503+ │ │ ├── compute-optimized.yaml
504+ │ │ ├── memory-optimized.yaml
505+ │ │ └── storage-optimized.yaml
506+ │ └── domains/
507+ │ ├── cpu.py
508+ │ ├── memory.py
509+ │ ├── disk.py
510+ │ ├── network.py
511+ │ └── load.py
512+ └── tests/
513+ ├── test_profile.py # profile loading and validation
514+ ├── test_timeline.py # phase sequencing and repeat expansion
515+ ├── test_sampler.py # noise and counter accumulation
516+ ├── test_domains.py # per-domain metric consistency checks
517+ └── test_writer.py # archive generation (requires PCP installed)
468518```
469519
470520**User profile directory:** `~/.pcp/pmlogsynth/profiles/`
471521
522+ ### Installation
523+
524+ ``` bash
525+ pip install pmlogsynth
526+
527+ # Or from source:
528+ git clone https://github.com/<org>/pmlogsynth
529+ cd pmlogsynth
530+ pip install -e .
531+ ```
532+
533+ `pyproject.toml` declares the entry point:
534+
535+ ```toml
536+ [project.scripts]
537+ pmlogsynth = "pmlogsynth.cli:main"
538+ ```
539+
472540---
473541
474- ## 11. QA Test Requirements
542+ ## 13. Test Requirements
543+
544+ Tests are written with `pytest` and live in `tests/`. They are split into two tiers:
545+
546+ ### Tier 1 — unit tests (no PCP required)
475547
476- A QA test must be included that:
548+ Test profile loading, validation, timeline sequencing, phase transitions, repeat
549+ expansion, noise application, and counter accumulation without writing any archive.
550+ All domain consistency constraints are verified at the value-computation level.
551+ These tests run anywhere Python 3.8+ is available.
477552
478- 1. Generates an archive from a known profile
479- 2. Runs `pmlogcheck` against the output and asserts it passes
480- 3. Runs `pmval` against one metric per domain and asserts the values are within
553+ ### Tier 2 — integration tests (PCP must be installed)
554+
555+ Generate a real archive from a known profile, then verify it with PCP tooling:
556+
557+ 1. Run `pmlogsynth` against a fixed reference profile
558+ 2. Run `pmlogcheck` against the output and assert it passes
559+ 3. Run `pmval` against one metric per domain and assert values are within
481560 the expected range (stressor value ± noise tolerance)
482- 4. Validates that the archive start and end timestamps match `--start` and
483- `meta.duration`
561+ 4. Assert the archive start and end timestamps match `--start` and `meta.duration`
484562
485- The test must not require a running ` pmcd ` or root access.
563+ Tier 2 tests are skipped automatically (via a pytest fixture) if `pmlogcheck` is not
564+ found on `PATH`. This allows the test suite to run in environments without PCP installed,
565+ with only Tier 1 executing.
566+
567+ ``` bash
568+ # Run all tests
569+ pytest
570+
571+ # Run only unit tests (no PCP needed)
572+ pytest -m " not integration"
573+
574+ # Run with verbose output
575+ pytest -v
576+ ```
486577
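The automatic Tier 2 skip could be wired up in `tests/conftest.py` roughly as follows. The text above mentions a pytest fixture; this sketch swaps in a collection hook instead, as one possible way to apply the skip to every test carrying the `integration` marker used in the commands above. The helper name `missing_pcp_tools` is hypothetical, and the file location is an assumption.

```python
import shutil

# Tools whose presence on PATH gates the Tier 2 tests (per the text above).
REQUIRED_TOOLS = ("pmlogcheck",)


def missing_pcp_tools(which=shutil.which):
    """Return the required PCP tools that are not found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def pytest_collection_modifyitems(config, items):
    """pytest hook: skip tests marked `integration` when PCP is absent."""
    import pytest  # imported lazily so the helper above stays stdlib-only

    missing = missing_pcp_tools()
    if not missing:
        return
    skip = pytest.mark.skip(reason="PCP tools not on PATH: " + ", ".join(missing))
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)
```

For `pytest -m "not integration"` to run without warnings, the `integration` marker would also need to be registered in the pytest configuration, for example under `[tool.pytest.ini_options]` in `pyproject.toml`.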
487578---
488579
489- ## 12. Future Enhancements
580+ ## 14. Future Enhancements
490581
491582The following items are explicitly deferred from Phase 1:
492583