Skip to content

qiskit-community/qcode-discovery

Repository files navigation

qcode-discovery

Discovering bivariate bicycle (BB) and perturbed bivariate bicycle (PBB) quantum LDPC codes via LLM-guided evolutionary search.

Five evolution campaigns employing six LLMs from three families discover 465 distinct codes (97 CSS, 368 non-CSS) with CSS encoding dimensions up to k = 54 (prior best: k = 16). MILP distance computation reveals a sharp rate-distance tradeoff and that BP-OSD overestimates distance by up to 12x for high-rate codes. Total cost: ~US$400 over ~140 hours.

Paper: "Evolutionary Discovery of Bivariate Bicycle Codes with LLM-Guided Search" — see paper/paper.tex.

Headline Results

Best CSS BB Codes (MILP-verified distances)

Code Lattice A polynomial B polynomial d (MILP) FOM Notes
[[288,24,12]] (12,12) x⁶ + y + y² y³ + x² + x⁴ 12 (exact) 12.0 Direct sum of 2 gross codes
[[288,50,8]] (18,8) 1+y⁵+x+xy⁵ 1+y+x⁵+x⁵y 8 (exact) 11.1 Cross-factored, weight-8
[[360,20,≤14]] (30,6) 1+x+x³y⁴+x⁶y² 1+y²+x³y²+x⁶y⁴ ≤14 ≤10.9 Mixed-monomial
[[360,16,≤14]] (15,12) y² + y⁴ + x³ y⁶ + x² + x⁴ ≤14 ≤8.7 x/y-swap
[[288,16,12]] (12,12) x³ + y + y² y³ + x + x² 12 (exact) 8.0 All shifts ≤ 3 hops
[[144,8,12]] (12,6) 1+xy²+xy³ 1+x²y³+x³y² 12 (exact) 8.0 Mixed-monomial

Best Non-CSS PBB Codes (MILP-verified)

Code Lattice d (MILP) FOM Notes
[[360,12,≤24]] (30,6) ≤24 ≤19.2 Highest trusted PBB FOM, LER < p at all tested rates
[[360,12,≤20]] (30,6) ≤20 ≤13.3 Additional simulated high-distance PBB code
[[144,12,12]] (12,6) 12 (exact) 12.0 Non-CSS gross code (matches CSS FOM)
[[108,8,10]] (9,6) 10 (exact) 7.4
[[72,12,6]] (6,6) 6 (exact) 6.0 Highest-FOM n=72 PBB representative

FOM = kd²/n. Comparison baseline: Bravyi et al. 2024 (arXiv:2308.07915). "Exact" = MILP proven optimal (MIP gap = 0).

Key Findings

  • A = B distance trap: All BB codes with A = B have d = 2 exactly (Theorem 1). BP-OSD fails to detect this even at 1.5M trials.
  • [[288,24,12]] decomposition: Tanner graph analysis reveals this is a direct sum of two [[144,12,12]] gross codes — no error-correction advantage over two independent copies.
  • BP-OSD overestimation: Up to 12x for high-rate codes. The [[360,40,2]] yielded d_BP ≤ 24 at 150k trials vs d_MILP = 2.
  • Rate-distance tradeoff: Indecomposable d = 12 CSS codes limited to k ≤ 16; k > 24 implies d ≤ 8. Non-CSS PBB codes occupy the same envelope.
  • Univariate/HGP codes: The dominant high-k family (k = 40 at n = 360) has d ≤ 4 universally. k = 8ℓ/3 proven via CRT decomposition.

Evolution Campaigns

Campaign Type Models Iters Pop. Codes Time Cost
1 CSS Gemini 3 Flash 100 100 9 5h $15
2 CSS Ensemble (3 models) 251 100 0* 9.5h $25
3 CSS Ensemble (3 models) 500 1,000 145 21h $50
4 CSS Ensemble (3 models) 300 750 45 92h $47
5 PBB Ensemble (3 models) 500 200 368 11h $100

*All Campaign 2 codes subsumed by Campaign 3. "Ensemble" = Claude Opus 4.6 + GPT-5.2/5.3 + Gemini 3 Pro/3.1 Pro with equal selection weight. Campaigns 1-3 ran on Apple M4 Max; Campaigns 4-5 on a 64-core server.

How It Works

What Is a BB Code?

A bivariate bicycle code is defined by two polynomials over F_2[x,y]/(x^ℓ - 1, y^m - 1). Code parameters: n = 2·ℓ·m physical qubits, k logical qubits (exact via GF(2) rank), d code distance (expensive — see Distance Estimation). CSS commutativity is automatic, so the search is purely combinatorial.

A perturbed bivariate bicycle (PBB) code augments the construction with two additional polynomials (C, D) creating mixed X/Z stabilizers. When C = D = 0, the code reduces to a CSS BB code.

Architecture

                    ┌─────────────────────────┐
                    │    Evolutionary Loop     │
                    │  (OpenEvolve + LLM       │
                    │   ensemble: 3 models)    │
                    │                          │
                    │  Evolves the strategy    │
                    │  in generate_candidates  │
                    └───────────┬──────────────┘
                                │ proposes polynomial tuples
                                ▼
┌──────────────────────────────────────────────────────────────┐
│                    Evaluation Cascade                         │
│                                                              │
│  Stage 1: k-only screening (~2s)  — GF(2) rank on 2 lattices│
│  Stage 2: BP-OSD distance (~30-60s) — on 8 lattices          │
│  Stage 3: MILP exact distance (Campaigns 4-5, in-loop)       │
│                                                              │
│  Post-hoc: MILP verification on all discovered codes         │
│            150k multi-decoder BP-OSD protocol                │
│            3-tier adaptive pipeline (non-CSS)                │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
                    ┌──────────────────────┐
                    │  Results & Tracking  │
                    │  - JSON persistence  │
                    │  - JSONL run logs    │
                    │  - W&B dashboards    │
                    └──────────────────────┘

Known Benchmark Codes (Bravyi et al. 2024)

Code m A polynomial B polynomial FOM
[[72,12,6]] 6 6 x³ + y + y² y³ + x + x² 6.0
[[90,8,10]] 15 3 x⁹ + y + y² 1 + x² + x⁷ 8.9
[[144,12,12]] 12 6 x³ + y + y² y³ + x + x² 12.0
[[288,12,18]] 12 12 x³ + y² + y⁷ y³ + x + x² 13.5
[[360,12,≤24]] 30 6 x⁹ + y + y² y³ + x²⁵ + x²⁶ 19.2

Source: Bravyi et al. 2024 (arXiv:2308.07915)

Setup

Requires Python ≥ 3.10. Uses uv for dependency management.

# Core dependencies only
uv sync

# With dev tools (pytest) — recommended
uv sync --group dev

# Everything (dev + evolution + tracking)
uv sync --group dev --group evolve --group tracking

Important: uv sync removes packages from groups not listed in the command. Always include all groups you need in a single call.

Dependencies

Group Packages Purpose
core qldpc, ldpc, numpy, scipy, galois, sympy BB/PBB code construction, decoding, algebra
dev pytest, python-igraph Test suite, graph isomorphism
evolve openevolve, litellm[proxy] Evolutionary search + LLM proxy
tracking wandb Experiment tracking dashboards

Usage

Run the evaluation pipeline

# Evaluate all 18 target lattices (quick mode — k only, ~30s)
uv run python main.py --quick

# Evaluate specific lattices with distance estimation
uv run python main.py --lattices 12,6 6,6

# Verbose output
uv run python main.py --lattices 6,6 --quick -v

Run tests

# All tests (~1s)
uv run python -m pytest tests/ -v

# Skip slow tests (distance estimation)
uv run python -m pytest tests/ -v -m "not slow"

# Single test
uv run python -m pytest tests/test_known_codes.py::TestEvaluator::test_gross_code_quick -v

Use as a library

from evaluation.evaluator import evaluate_candidate

# Evaluate the gross code
result = evaluate_candidate(
    12, 6,
    [(3, 0), (0, 1), (0, 2)],  # A = x^3 + y + y^2
    [(0, 3), (1, 0), (2, 0)],  # B = y^3 + x + x^2
)
print(f"[[{result['n']},{result['k']},{result['d']}]] FOM={result['fom']:.1f}")
# [[144,12,12]] FOM=12.0
from evaluation.pbb_code import build_pbb_code
# Build a non-CSS PBB code (4 polynomials)
code = build_pbb_code(12, 6, A_terms, B_terms, C_terms, D_terms)
from evaluation.clifford_equivalence import is_lc_equivalent_css, verify_lc_bruteforce

# Algebraic check: can this PBB code be made CSS by per-qubit Cliffords?
result = is_lc_equivalent_css(A_terms, B_terms, C_terms, D_terms)
print(result)  # {'equivalent': False, 'conditions': {...}}

# Brute-force verification (tries all 16 uniform Clifford assignments)
result = verify_lc_bruteforce(12, 6, A_terms, B_terms, C_terms, D_terms)
print(result)  # {'equivalent': False, 'algebraic_match': True, ...}

Project Structure

qcode-discovery/
├── main.py                                # CLI entry point for manual evaluation
├── pyproject.toml                         # uv project config + dependency groups
├── evaluation/                            # Core evaluation pipeline
│   ├── bb_code.py                         #   CSS BB code construction (qldpc.BBCode)
│   ├── pbb_code.py                        #   PBB non-CSS code construction
│   ├── distance.py                        #   BP-OSD distance estimation (CSS)
│   ├── distance_milp.py                   #   MILP exact distance (CSS + symplectic)
│   ├── distance_bposd_noncss.py           #   BP-OSD for non-CSS (achievable-syndrome sampling)
│   ├── evaluator.py                       #   Multi-stage evaluation cascade
│   ├── clifford_equivalence.py            #   LC equivalence to CSS (6 public functions, see below)
│   ├── mirror_code.py                     #   Mirror code construction (Khesin & Lu)
│   ├── results.py                         #   JSON persistence & Pareto front
│   ├── tracking.py                        #   Run logging and metrics (JSONL)
│   └── threshold.py                       #   Threshold simulation stub (not implemented)
├── evolve/                                # LLM-guided evolutionary search
│   ├── seed_solution.py                   #   CSS seed (Campaigns 1-3)
│   ├── seed_solution_ansatz.py            #   Mixed-monomial seed (Campaign 4)
│   ├── seed_solution_noncss.py            #   PBB 4-tuple seed (Campaign 5)
│   ├── openevolve_evaluator.py            #   CSS evaluator adapter
│   ├── openevolve_evaluator_noncss.py     #   Non-CSS evaluator (3-tier pipeline)
│   ├── _noncss_distance_worker.py        #   Multiprocessing worker for non-CSS distance
│   ├── run_evolution.py                   #   Entry point (--noncss for Campaign 5)
│   ├── config.yaml                        #   Config: Campaigns 1-3
│   ├── config_ansatz.yaml                 #   Config: Campaign 4
│   ├── config_ansatz_server.yaml          #   Config: Campaign 4 (64-core server)
│   ├── config_noncss.yaml                 #   Config: Campaign 5
│   ├── prompt_context.md                  #   Domain knowledge: Campaigns 1-3
│   ├── prompt_context_ansatz.md           #   Domain knowledge: Campaign 4
│   └── prompt_context_noncss.md           #   Domain knowledge: Campaign 5
├── scripts/                               # Verification & publication scripts
│   ├── verify_campaign7.py                #   Campaign 5 initial BLISS dedup + verification
│   ├── verify_campaign7d.py               #   Campaign 5 variant verification
│   ├── verify_deep_milp.py                #   Deep MILP (up to 14400s/logical, 60 workers)
│   ├── verify_deep_milp_remote_n360.py    #   Remote MILP for n=360 codes
│   ├── verify_deep_milp_remote_standalone.py #  Standalone remote MILP worker
│   ├── verify_from_scratch.py             #   Full re-verification from raw data
│   ├── verify_publication.py              #   Merge all verification passes → publication
│   ├── verify_top_codes.py                #   Top codes verification
│   ├── verify_top_codes_deep.py           #   Extended MILP on top codes
│   ├── extract_witnesses.py               #   Extract witness (low-weight) operators
│   └── test_fast_weight_check.py          #   Fast weight-check utility
├── tests/                                 # Test suite + analysis scripts
│   ├── test_known_codes.py                #   pytest regression suite (40 tests)
│   ├── test_bposd_noncss.py              #   Non-CSS BP-OSD tests
│   ├── test_pbb_code.py                   #   PBB code construction tests
│   ├── test_clifford_equivalence.py       #   Clifford equivalence tests
│   ├── test_mirror_code.py                #   Mirror code tests
│   ├── test_ansatz.py                     #   Campaign 4 mixed-monomial tests
│   ├── test_cm_verification.py            #   Constant-monomial verification tests
│   ├── test_k_formula_verification.py     #   k = 8ℓ/3 formula verification
│   ├── test_parallel_k_screening.py       #   Parallel k-screening tests
│   ├── ilp_catalog.py                     #   MILP catalog of all 97 CSS codes
│   ├── soak_test.py                       #   150k multi-decoder protocol
│   ├── threshold_simulation.py            #   CSS code capacity simulation
│   ├── threshold_simulation_noncss.py     #   Non-CSS code capacity simulation (X-only noise)
│   ├── threshold_simulation_independent_xz.py # PBB depolarizing channel, iid X/Z decoder prior (n ≤ 144)
│   ├── threshold_simulation_360_12_20_depol.py # PBB [[360,12,≤20]] depolarizing channel
│   ├── threshold_simulation_360_12_24_depol.py # PBB [[360,12,≤24]] depolarizing channel
│   ├── threshold_simulation_360_12_24.py  #   PBB [[360,12,≤24]] X-only channel
│   ├── threshold_simulation_missing.py    #   Gap-filling simulations (extra CSS + [[360,12,≤20]])
│   ├── verify_decomposition_288_24_12.py  #   [[288,24,12]] direct-sum proof
│   ├── verify_bravyi_codes.py            #   Bravyi baseline MILP validation
│   ├── verify_ensemble_codes.py          #   Two-round ensemble verification (60k/150k)
│   ├── verify_campaign4_milp.py          #   Campaign 4 MILP on all codes
│   ├── verify_ga_distances.py            #   Exhaustive GA distance verification
│   ├── verify_novel_incumbents.py        #   Novel incumbent verification
│   ├── reverify_top_incumbents.py        #   Extended MILP on C4 top 39 codes
│   ├── check_all_equivalences.py         #   BLISS Tanner-graph deduplication
│   ├── check_code_equivalence.py         #   Pairwise code equivalence checking
│   ├── paper_equivalence_check.py        #   Paper-specific equivalence checks
│   ├── ablation_study.py                 #   Original 3-arm ablation
│   ├── ablation_k_only.py               #   8-arm k-only ablation
│   ├── ablation_extended.py             #   Extended ablation with distance
│   ├── ablation_ga_generators.py        #   GA-on-generators ablation
│   ├── milp_ga_codes.py                 #   MILP on best GA codes
│   ├── milp_optimality_audit.py         #   d=12/14 per-logical optimality audit
│   ├── bp_osd_intensive_search.py       #   300k-trial BP-OSD search on top codes
│   ├── bp_osd_intensive_search_batch2.py #  Batch 2 of the same search
│   ├── per_batch_distributions.py       #   Per-batch decoder distribution analysis
│   ├── extended_verification.py         #   1.5M trial extended verification
│   ├── enumerate_constant_monomial.py   #   Factored-product enumeration
│   ├── generate_pbb_catalog_rows.py     #   Generate LaTeX PBB catalog tables
│   ├── benchmark_noncss.py              #   Non-CSS comparative benchmark
│   ├── lattice_survey.py                #   Lattice-level code survey
│   ├── pbb_survey.py                    #   PBB code survey at small lattices
│   └── pbb_survey_milp_66.py            #   PBB MILP survey at (6,6)
├── results/                               # Verified results and evolution artifacts
│   ├── ilp_catalog.json                   #   97 CSS codes with MILP distances
│   ├── campaign7_publication_merged.jsonl  #   368 non-CSS PBB codes (publication-quality)
│   ├── threshold_simulation*.json         #   Code capacity simulation data
│   ├── ablation_*.json                    #   Ablation study results
│   ├── evolution_gemini3flash_seed42/     #   Campaign 1 checkpoints
│   ├── evolution_ensemble1_pop100/        #   Campaign 2 checkpoints
│   ├── evolution_ensemble2_pop1000/       #   Campaign 3 checkpoints
│   ├── evolution_ansatz_campaign4/        #   Campaign 4 checkpoints
│   └── evolution/                         #   Campaign 5 runs
│       └── campaign7/                    #     Published Campaign 5 checkpoints
├── paper/                                 # Paper source and figures
│   ├── paper.tex                          #   Main paper
│   ├── supplemental.tex                   #   Supplemental material
│   ├── css_catalog_tables.tex             #   Auto-generated CSS catalog
│   ├── pbb_catalog_tables.tex             #   Auto-generated PBB catalog
│   └── figures/                           #   Figure generation scripts + PDFs
└── plans/                                 # Research direction plans

See results/README.md for details on each data file. See tests/README.md for documentation on all analysis scripts.

Distance Estimation

Methods

Method Code Speed Accuracy Paper Section
BP-OSD (OSD_0) evaluation/distance.py Fast Loose (up to 12x overestimate) Sec V.A
BP-OSD (OSD-CS order 10) evaluation/distance.py Medium Tighter Sec V.A
MILP exact (CSS) evaluation/distance_milp.py Minutes-hours Exact or upper bound Sec V.B
MILP symplectic (non-CSS) evaluation/distance_milp.py ~4x slower than CSS Exact or upper bound Sec V.B
BP-OSD non-CSS evaluation/distance_bposd_noncss.py Fast Requires achievable-syndrome sampling Sec V.C
Hash-exact enumeration evaluation/distance_bposd_noncss.py Fast for d ≤ 6 Exact Sec V.C
Multi-decoder soak test tests/soak_test.py 30-120 min/code Publication-quality Sec V.A

Key insight: BP-OSD is unreliable for high-rate codes

MILP ground truth reveals that BP-OSD overestimates distance by 3-12x for codes with k/n > 0.1. The trust filter in the evolution cascade (d/√n ≤ 1.3 trusted, ≥ 2.0 discarded) mitigates this but is imperfect. For non-CSS codes, standard random syndrome sampling fails entirely — achievable-syndrome sampling (computing the reachable syndrome subspace via GF(2) null-space projection) is required.

Known Limitations

  • BP-OSD overestimation: Up to 12x for high-rate codes (k/n > 0.1). A single BP-OSD run is unreliable — estimates can range from 6 to 18 across independent batches on the same code. All publication distance claims use 150,000+ trials (3 decoders × 10 batches × 5,000 trials) via tests/soak_test.py.
  • MILP incumbents: Some large codes have milp_incumbent status (feasible solution found, but not proven optimal). This gives a valid upper bound on d, not exact d. The milp_details["exact"] flag is the authoritative indicator of proven optimality. See tests/milp_optimality_audit.py for per-logical audit.
  • LC equivalence coverage: evaluation/clifford_equivalence.py exhaustively tests all 36 uniform per-block Clifford assignments (6×6 over {I, S, H, HS, SH, HSH}). Non-uniform {I,S}, {H,HS}, and Hadamard patterns are handled by polynomial-time checks. Non-uniform {SH,HSH} patterns on block 1 and cross-family non-uniform Clifford assignments remain unchecked.
  • qldpc get_logical_ops() bug: The library's logical operator computation fails on some non-CSS codes (singular matrix errors). The codebase uses its own GF(2)-based get_symplectic_logicals() at evaluation/pbb_code.py instead.
  • _poly_to_matrix bug class: evaluation/pbb_code.py must use sympy.Poly(expr, x, y), not a bare sympy expression — qldpc silently produces wrong circulant matrices for multi-term polynomials otherwise.
  • PBB C/D convention: evaluation/pbb_code.py and the catalog in results/campaign7_publication_merged.jsonl follow paper Eq. 2: the block-1 Z-part is [C \| D]. An earlier ptb_code.py (now renamed) used [D \| C]; legacy JSON/JSONL data was migrated by scripts/migrate_cd_convention.py. PBB polynomial tuples written by hand should match paper Table V (e.g. for the [[144,12,12]] PBB at (12,6): C = y + x³y, D = y³ + x³y³). Passing legacy-convention values to build_pbb_code raises ValueError: Commutativity violated.

LC Equivalence to CSS (evaluation/clifford_equivalence.py)

This module determines whether a non-CSS PBB code can be transformed into a CSS code via per-qubit single-qubit Clifford operations. It underpins the paper's claim that 357 of 368 discovered PBB codes are CSS-inequivalent within the tested LC families (Sec. III.D, App. F): 10 are Hadamard-equivalent to CSS via non-uniform per-qubit H (caught by is_equivalently_css), 1 is LC-equivalent via uniform S on both blocks (caught by is_lc_equivalent_css_group), and the remaining 357 fail every tested Clifford reduction.

Functions (in escalating generality)

Function Line Scope Description
is_equivalently_css(code) 41 Hadamard only 2-coloring of constraint graph via Union-Find with parity. For mirror codes (no Y support).
is_lc_equivalent_css(A, B, C, D) 220 Generator-level, uniform S Tests 4 algebraic conditions: C=D=0 / C=0,D=B / D=0,C=A / C=A,D=B. Frozenset comparison.
is_lc_equivalent_css_group(ell, m, A, B, C, D) 278 Group-level, uniform S Rank condition: rowspan([C+s1*A | D+s2*B]) ⊆ rowspan([B^T | A^T]) for (s1,s2) in {0,1}². Strictly more general than generator-level.
verify_lc_bruteforce(ell, m, A, B, C, D) 406 Uniform, 6 gates Tests all 36 assignments of {I,S,H,HS,SH,HSH} × {I,S,H,HS,SH,HSH} on 2 blocks. Checks both generator-level and group-level CSS.
check_lc_pattern_feasibility(ell, m, A, B, C, D) 538 Per-qubit {I,S} and {H,HS} Column-wise feasibility: can per-qubit s_j cancel the Z-part? For circulant matrices, only uniform patterns are feasible.
verify_uniform_reduction_exact(ell, m, A, B, C, D) 620 Per-qubit {I,S}, non-uniform Full GF(2) affine system solver. Determines if any (possibly non-uniform) S-pattern makes the group CSS within the covered Clifford family.

The typical verification pipeline runs is_lc_equivalent_css_group (fast algebraic check) on all codes, then verify_lc_bruteforce and verify_uniform_reduction_exact for independent confirmation.

Reproducing the 11 / 357 CSS-equivalence accounting

The paper's §III.D / Appendix F accounting on the 368-code PBB catalog (10 Hadamard-CSS + 1 uniform-S-CSS + 357 CSS-inequivalent) is reproduced end-to-end in ~2 min by:

PYTHONPATH=. uv run python scripts/run_lc_analysis.py

This loads results/campaign7_dedup.jsonl (368 records), runs is_equivalently_css (Hadamard 2-coloring) + is_lc_equivalent_css_group (uniform per-block S/H) + verify_uniform_reduction_exact ({I,S}/{H,HS} GF(2)) on each, writes results/campaign7_lc_analysis.jsonl, and prints the 10 / 1 / 11 / 357 totals. The lone uniform-S code's polynomials (the [[36,4,6]] at (6,3) with $C = y + x^3 y$, $D = xy + x^4 y$) are visible in the per-record output.

Ablation Study

Eight arms compared on the same 8 lattices using deterministic Σ_k = Σ max{k} per lattice:

Arm Budget Σ_k Best FOM
Random (10³/lattice) 8,000 172
Random (10⁴/lattice) 80,000 299 ± 26
Seed 21,962 276
GA (exponent tuples) 80,000 703 ± 142 4.0 (d≤2 at standard lattices)
GA-G (generators) 14,000 1,037 ± 22 d≤2 everywhere
Campaign 1 (Flash) 213,216 704 12.0 (indecomposable: 8.0)

GA-G exceeds LLM on Σ_k but exclusively through degenerate d ≤ 2 codes. The LLM's advantage is semantic understanding: discovering algebraic families with d ≥ 6 and FOM up to 12.0.

Scripts: tests/ablation_k_only.py, tests/ablation_ga_generators.py, tests/milp_ga_codes.py.

Structural Families

Four structural families emerged, each with a distinct rate-distance profile:

  1. Univariate (HGP of cyclic codes): A = f(y), B = g(x). Highest k (up to 40 at n=360) but d ≤ 4 universally. k = 8ℓ/3 proven via CRT.
  2. x/y-swap: A = x^a + y^b + y^c, B = y^d + x^e + x^f. Only trinomial family with d ≥ 6. Best: [[288,16,12]].
  3. Mixed-monomial (Campaign 4): 4-6 term polynomials with x^a y^b terms. Access new (k,d) combinations. Best: [[288,50,8]] (FOM = 11.1).
  4. Non-CSS PBB (Campaign 5): 4-tuples (A, B, C, D). Match CSS FOM = 12.0 with non-CSS stabilizers. Best trusted upper bound: [[360,12,≤24]] (FOM ≤ 19.2); [[360,12,≤20]] is an additional simulated high-distance code.

Evolutionary Search with OpenEvolve

The evolutionary loop uses OpenEvolve to evolve the generate_candidates function via LLM-guided mutations.

# CSS evolution (Campaigns 1-3)
uv run python evolve/run_evolution.py \
    --models anthropic/claude-opus-4-6 openai/gpt-5.2 gemini/gemini-3-pro \
    --iterations 500 --run-name ensemble_v2

# Campaign 4 (mixed-monomial) — paper run used config_ansatz_server.yaml (pop=750)
uv run python evolve/run_evolution.py \
    --config evolve/config_ansatz_server.yaml \
    --seed evolve/seed_solution_ansatz.py \
    --iterations 300

# Campaign 5 (non-CSS PBB)
uv run python evolve/run_evolution.py --noncss --iterations 200

A LiteLLM-compatible proxy must be running. See evolve/config.yaml for full configuration options.


Paper-to-Repository Mapping

This section maps every section, figure, table, and claim in the paper and supplemental material to the corresponding code, scripts, and data files in this repository.

Campaign Naming Convention

The repository uses internal numbering that differs from the paper:

Paper Repo Name Run Directory Config
Campaign 1 Campaign 1 (Flash) results/evolution_gemini3flash_seed42/ evolve/config.yaml
Campaign 2 Ensemble #1 (pop=100) results/evolution_ensemble1_pop100/ evolve/config.yaml
Campaign 3 Ensemble #2 (pop=1000) results/evolution_ensemble2_pop1000/ evolve/config.yaml
Campaign 4 Ansatz results/evolution_ansatz_campaign4/ evolve/config_ansatz_server.yaml
Campaign 5 Campaign 7 results/evolution/campaign7/ evolve/config_noncss.yaml

Paper Sections → Code

Paper Section Key Files
Sec III.A BB codes evaluation/bb_code.py
Sec III.B PBB codes evaluation/pbb_code.py, evaluation/clifford_equivalence.py
Sec III.C Figure of merit evaluation/evaluator.py (FOM = kd²/n computed in evaluate_candidate())
Sec III.D A=B distance trap (Theorem 1) Algebraic proof in paper Sec III.D
Sec IV.A Evolutionary framework evolve/run_evolution.py, evolve/config*.yaml, evolve/prompt_context*.md
Sec IV.B Evaluation cascade evolve/openevolve_evaluator.py, evolve/openevolve_evaluator_noncss.py, evaluation/evaluator.py
Sec IV.C Campaign configs evolve/seed_solution.py (C1-3), evolve/seed_solution_ansatz.py (C4), evolve/seed_solution_noncss.py (C5)
Sec V.A BP-OSD limits evaluation/distance.py, tests/soak_test.py
Sec V.B MILP exact distance evaluation/distance_milp.py, tests/ilp_catalog.py, tests/milp_optimality_audit.py
Sec V.C Non-CSS pipeline evaluation/distance_bposd_noncss.py (achievable-syndrome sampling), evaluation/distance_milp.py (symplectic MILP)
Sec V.D Trust boundaries evaluation/evaluator.py (DISTANCE_TRUST_RATIO, DISTANCE_UNTRUST_RATIO)
Sec VI.A Structural families tests/check_all_equivalences.py (BLISS dedup), tests/verify_decomposition_288_24_12.py (direct-sum proof), tests/test_k_formula_verification.py (k = 8ℓ/3)
Sec VI.B Rate-distance tradeoff paper/figures/plot_pareto_frontier.py
Sec VI.C Comparison with prior art tests/verify_bravyi_codes.py, results/ilp_catalog.json
Sec VI.D Code capacity simulations tests/threshold_simulation.py, tests/threshold_simulation_noncss.py, tests/threshold_simulation_independent_xz.py (depolarizing channel + iid X/Z decoder, n ≤ 144), tests/threshold_simulation_360_12_20_depol.py, tests/threshold_simulation_360_12_24_depol.py, tests/threshold_simulation_360_12_24.py, tests/threshold_simulation_missing.py
Sec VI.E BP-OSD overestimation tests/bp_osd_intensive_search.py, tests/bp_osd_intensive_search_batch2.py, tests/per_batch_distributions.py, tests/extended_verification.py
Sec VI.F Ablation tests/ablation_k_only.py, tests/ablation_ga_generators.py, tests/ablation_extended.py, tests/milp_ga_codes.py, tests/verify_ga_distances.py
App C Representative LLM mutations results/evolution_gemini3flash_seed42/ (Campaign 1 checkpoint diffs)
App F LC-CSS equivalence accounting evaluation/clifford_equivalence.py (is_lc_equivalent_css, verify_lc_bruteforce, check_lc_pattern_feasibility)
SM Sec I CSS catalog paper/css_catalog_tables.texresults/ilp_catalog.json
SM Sec II PBB catalog paper/pbb_catalog_tables.texresults/campaign7_publication_merged.jsonl, tests/generate_pbb_catalog_rows.py
SM Sec III Per-decoder comparison results/soak_test_publication.json, results/ensemble_verification_150k.json
SM Sec IV BP-OSD per-batch analysis tests/per_batch_distributions.py, results/per_batch_distributions.json
SM Sec V Distance tightening results/ensemble_verification_60k.jsonresults/ensemble_verification_150k.json
SM Sec VI Univariate families results/ilp_catalog.json (UV-pattern filter), tests/test_k_formula_verification.py
SM Sec VII Full threshold data results/threshold_simulation*.json (all 6 simulation files)
SM Sec VIII Utility stratification results/ilp_catalog.json, results/campaign7_publication_merged.jsonl
SM Sec IX Novelty assessment tests/check_all_equivalences.py, tests/paper_equivalence_check.py
SM Sec X Campaign details Evolution run directories (see Campaign Naming Convention above)
SM Sec XI Ablation methodology tests/ablation_k_only.py, tests/ablation_ga_generators.py, tests/verify_ga_distances.py, tests/milp_ga_codes.py
SM Sec XII MILP formulations evaluation/distance_milp.py (CSS Hamming weight; symplectic uses the per-qubit-OR encoding $w_j \geq x_j$, $w_j \geq z_j$, $w_j \leq x_j + z_j$ -- the convex hull of binary OR for ${0,1}$ variables, not the McCormick envelope)

Figures → Generation Scripts

All figure scripts are in paper/figures/. To regenerate:

cd paper/figures
uv run python <script>.py
Figure Script Data Sources
Fig 1: Evolution pipeline fig_pipeline.py (schematic, no data)
Fig 2: LER curves fig_ler_curves.py results/threshold_simulation.json, results/threshold_simulation_noncss.json, results/threshold_simulation_milp.json, results/threshold_simulation_depolarizing.json, results/threshold_simulation_360_12_20_depol.json, results/threshold_simulation_360_12_24_depol.json
Fig 3: Pareto frontier (k vs d, FOM vs n) plot_pareto_frontier.py results/ilp_catalog.json, results/campaign7_publication_merged.jsonl
Fig 4: Distance tightening fig_tightening.py results/soak_test_publication.json, results/ensemble_verification_150k.json
SM Fig: Fitness trajectories (C1-C3) fig_fitness_trajectory.py results/evolution_gemini3flash_seed42/, results/evolution_ensemble1_pop100/, results/evolution_ensemble2_pop1000/
SM Fig: Per-batch distributions tests/per_batch_distributions.py results/per_batch_distributions.json

Tables → Data Files

Main Paper

Table Content Data Source
Table I: Campaign summary 5 campaigns Campaign run directories
Table II: Best PBB codes per lattice 7 non-CSS codes results/campaign7_publication_merged.jsonl
Table III: Best CSS codes (MILP) 12 CSS codes with d_BP vs d_MILP results/ilp_catalog.json + results/soak_test_publication.json
Table IV: Comparison with prior art k-ratio at n=144,288,360 results/ilp_catalog.json
Table V: Practically relevant codes d ≥ 8, FOM ≥ 6 results/ilp_catalog.json + results/campaign7_publication_merged.jsonl
Table VI: CSS threshold data 8 CSS codes at p=1,3,5,8% results/threshold_simulation.json, results/threshold_simulation_milp.json
Table VII: Non-CSS threshold data 5 PBB + CSS baseline results/threshold_simulation_noncss.json, results/threshold_simulation_depolarizing.json, results/threshold_simulation_360_12_20_depol.json and results/threshold_simulation_360_12_24_depol.json (depol rows for the (30,6) lattice), results/threshold_simulation_missing.json ([[360,12,≤20]] X-only row), results/threshold_simulation_360_12_24.json ([[360,12,≤24]] X-only row)
Table VIII: Ablation Σ_k summary 8 arms results/ablation_k_only.json
Table IX: Per-lattice max k All arms × 8 lattices results/ablation_k_only.json, results/ablation_study.json

Supplemental Material

Table Content Data Source
Tables cat144/cat288/cat360: CSS catalog 225 representations → 97 distinct paper/css_catalog_tables.texresults/ilp_catalog.json
Tables pbb_cat*: PBB catalog 368 non-CSS codes paper/pbb_catalog_tables.texresults/campaign7_publication_merged.jsonl
Tab: Decoder rates: Per-decoder success by k/n Decoder comparison results/soak_test_publication.json, results/ensemble_verification_150k.json
Tab: Per-batch stats: 5 codes × 3 decoders Per-batch distributions results/per_batch_distributions.json
Tab: Extended verification: 1.5M trials 4 codes at 10x depth results/extended_verification_1500k.json, results/gross_verification_1500k.json
Tab: Tightening examples: 60k→150k Distance tightening results/soak_test_publication.json vs results/ensemble_verification_60k.json
Tab: Univariate families: HGP codes All UV codes with BP vs MILP results/ilp_catalog.json (UV-pattern filter)
Full threshold tables: CSS + non-CSS 12 error rates × all codes results/threshold_simulation*.json

Key Claims → Verification Scripts

Paper Claim Script Result File
Theorem 1: A=B ⟹ d=2 Paper Sec III.D Algebraic proof
[[288,24,12]] = 2 × gross code tests/verify_decomposition_288_24_12.py Tanner graph connectivity: 2 components
k = 8ℓ/3 (univariate) tests/test_k_formula_verification.py Verified on 1,680 parameter combinations
LC-CSS accounting for PBB codes evaluation/clifford_equivalence.py, tests/test_clifford_equivalence.py Algebraic + brute-force + feasibility checks on all 368 codes
BP-OSD 12x overestimate tests/bp_osd_intensive_search.py, tests/bp_osd_intensive_search_batch2.py results/bp_osd_attack.json
97 distinct CSS codes tests/check_all_equivalences.py BLISS Tanner-graph deduplication
368 distinct PBB codes scripts/verify_campaign7.py, scripts/verify_publication.py results/campaign7_dedup.jsonl (BLISS-deduplicated), results/campaign7_publication_merged.jsonl (publication-quality MILP+BP-OSD verification)
MILP validation on Bravyi baselines tests/verify_bravyi_codes.py results/bravyi_verified.json
MILP optimality audit (d=12, d=14) tests/milp_optimality_audit.py results/milp_optimality_audit.json
GA-G Σ_k=1037 but d≤2 tests/ablation_ga_generators.py results/ablation_ga_generators.json
GA MILP verification (FOM ≤ 4.0) tests/milp_ga_codes.py results/milp_ga_codes.json
Deep MILP: 33 corrections (22%) scripts/verify_deep_milp.py results/campaign7_deep_milp.jsonl
150k multi-decoder protocol tests/soak_test.py results/soak_test_publication.json
1.5M extended verification tests/extended_verification.py results/extended_verification_1500k.json
Code capacity: LER < p tests/threshold_simulation.py, tests/threshold_simulation_noncss.py results/threshold_simulation*.json

Campaign 4 (Mixed-Monomial) Verification

Script Purpose Output
tests/verify_campaign4_milp.py MILP on all C4 codes results/campaign4_milp_verified.jsonl
tests/reverify_top_incumbents.py Extended MILP on top 39 results/campaign4_reverified.jsonl
tests/bp_osd_intensive_search.py 300k BP-OSD on top 6 results/bp_osd_attack.json
tests/enumerate_constant_monomial.py Factored-product enumeration results/constant_monomial_enumeration.json

Campaign 5 (PBB Non-CSS) Verification

Script Purpose Output
scripts/verify_campaign7.py Initial BLISS dedup + verification results/campaign7_dedup.jsonl
scripts/verify_deep_milp.py Deep MILP (up to 14400s/logical, 60 workers) results/campaign7_deep_milp.jsonl
scripts/verify_publication.py Publication MILP+BP-OSD verification pass results/campaign7_publication.jsonl (merged into results/campaign7_publication_merged.jsonl with deep MILP evidence)
scripts/merge_deep_milp_results.py Merge deep MILP evidence into publication catalog results/campaign7_publication_merged.jsonl
tests/generate_pbb_catalog_rows.py Generate LaTeX catalog paper/pbb_catalog_tables.tex

Result Files Reference

File Content Paper Reference
results/ilp_catalog.json 97 CSS codes with MILP distances Tables III, IV, V, SM catalogs
results/campaign7_publication_merged.jsonl 368 PBB codes (publication-quality) Table II, SM PBB catalog
results/campaign7_dedup.jsonl BLISS-deduplicated PBB codes Sec VI.A
results/campaign7_deep_milp.jsonl Deep MILP on 149 PBB rows Sec V.C
results/soak_test_publication.json 9 C1 codes at 150k Table III, SM decoders
results/ensemble_verification_150k.json 145 ensemble codes at 150k Fig 4, SM tightening
results/ensemble_verification_60k.json 145 codes at 60k SM tightening
results/bravyi_verified.json 3 Bravyi baselines at 150k Sec V.B validation
results/extended_verification_1500k.json 3 codes at 1.5M SM extended verification
results/gross_verification_1500k.json Gross code at 1.5M SM extended verification
results/per_batch_distributions.json Per-batch decoder data SM per-batch analysis
results/milp_optimality_audit.json d=12/14 optimality audit Sec V.B
results/threshold_simulation.json CSS X-only simulation Table VI, SM full threshold
results/threshold_simulation_milp.json C4 codes simulation Table VI (C4 rows)
results/threshold_simulation_noncss.json PBB X-only simulation Table VII, SM non-CSS
results/threshold_simulation_depolarizing.json PBB depolarizing Table VII, SM depol
results/threshold_simulation_missing.json Gap-filling simulations Table VI
results/ablation_k_only.json 8-arm ablation Table VIII, IX
results/ablation_study.json Original 3-arm ablation App B
results/ablation_extended.json Equal-budget random + GA baselines (5 seeds × 8 lattices) SM ablation
results/ablation_ga_generators.json GA-G ablation SM ablation
results/milp_ga_codes.json MILP on 10 best GA codes (FOM ≤ 4.0) Sec VI.F, SM ablation
results/bp_osd_attack.json 300k BP-OSD attacks Sec VI.E
results/campaign4_milp_verified.jsonl C4 MILP results Table III (C4 rows)
results/campaign4_reverified.jsonl C4 extended MILP SM campaigns
results/bp_osd_attack_batch2.json Batch 2 BP-OSD attacks Sec VI.E
results/constant_monomial_enumeration.json Factored-product enumeration Sec VI.A (mixed-monomial)
results/threshold_simulation_360_12_20_depol.json [[360,12,≤20]] depolarizing sim (regen via tests/threshold_simulation_360_12_20_depol.py) Table VII
results/threshold_simulation_360_12_24_depol.json [[360,12,≤24]] depolarizing sim (regen via tests/threshold_simulation_360_12_24_depol.py) Table VII
results/threshold_simulation_360_12_24.json [[360,12,≤24]] X-only sim (regen via tests/threshold_simulation_360_12_24.py) Table VII
results/campaign7_report.md C5 summary report
results/benchmark_noncss.json Non-CSS benchmark SM utility

Tracking & Observability

Every main.py run is logged to results/runs/<run_id>/. W&B integration is built into the evolutionary search via --wandb.

Citation

If you use this software or its results, please cite the associated paper:

J. Cruz-Benito, A. W. Cross, D. Kremer, and I. Faro, "Evolutionary Discovery of Bivariate Bicycle Codes with LLM-Guided Search," arXiv:2606.02418 [quant-ph], 2026. https://arxiv.org/abs/2606.02418

BibTeX:

@misc{cruzbenito2026qcodediscovery,
  title         = {Evolutionary Discovery of Bivariate Bicycle Codes with LLM-Guided Search},
  author        = {Cruz-Benito, Juan and Cross, Andrew W. and Kremer, David and Faro, Ismael},
  year          = {2026},
  eprint        = {2606.02418},
  archivePrefix = {arXiv},
  primaryClass  = {quant-ph},
  doi           = {10.48550/arXiv.2606.02418},
  url           = {https://arxiv.org/abs/2606.02418}
}

A machine-readable CITATION.cff is also provided.

License

Apache License 2.0 — see LICENSE for details.

About

Discovering bivariate bicycle (BB) and perturbed bivariate bicycle (PBB) quantum LDPC codes via LLM-guided evolutionary search.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors