Skip to content
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 59 additions & 21 deletions crates/arco-cli/src/inspect.rs
Original file line number Diff line number Diff line change
Expand Up @@ -190,11 +190,16 @@ pub fn build_inspect_payload(entrypoint: &Path, program: &SemanticProgram) -> In
let set_records = build_set_records(program, filtered_set_names);

// Build a lookup from set name → size for bindings
let set_sizes: BTreeMap<&str, usize> = program
let mut set_sizes: BTreeMap<&str, usize> = program
.set_registry
.iter()
.map(|(name, resolved)| (name.as_str(), resolved.values.len()))
.map(|(name, resolved)| (name.as_str(), resolved_set_cardinality(resolved)))
.collect();
for (alias, canonical) in &program.set_aliases {
if let Some(size) = set_sizes.get(canonical.as_str()).copied() {
set_sizes.entry(alias.as_str()).or_insert(size);
}
}
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing alias names as additional keys in set_sizes can produce ambiguous results if an alias collides with a real set name (e.g., a set named i and an index alias i). In that case, or_insert silently keeps the original value and the alias mapping is ignored, which can make inspect sizing inconsistent/hard to reason about. Prefer keeping set_sizes keyed only by canonical set names and performing canonicalization at lookup time (e.g., a small helper that resolves domain via program.set_aliases and then reads from set_sizes).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pesap this seems to be concerning, but that is a user problem. Probably we want to throw error messages when that happens

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot based on this feedback, answer to this comment with a proposed test that we can integrate in arco that fails in that situation. Do not make the code, just write here the test that you suggest to see this problem

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a proposed test that would expose the alias-vs-set-name collision in set_sizes:

#[test]
fn inspect_alias_does_not_shadow_real_set_with_same_name() {
    // Build a minimal program where:
    //   - a real set called "i" exists with, say, 3 elements
    //   - a *different* canonical set (e.g. "nodes") has an alias that is also named "i"
    //     (i.e. `program.set_aliases["i"] = "nodes"` where "nodes" has 5 elements)
    //
    // Because `or_insert` keeps the first value, the result depends on insertion
    // order: whichever key "i" wins, the other mapping is silently lost.
    //
    // The test asserts that the size reported for a variable/parameter whose
    // declared domain is the real set "i" is exactly 3, not 5.
    //
    // Without the fix (canonical lookup at read time rather than at insert time),
    // the test fails whenever the alias "i" → "nodes" is inserted first and the
    // real set "i" is overwritten, or whenever a variable whose domain is "nodes"
    // gets the wrong size because the "i" entry in set_sizes already points at
    // the real set.

    let input_kdl = r#"
        sets {
            i 1 2 3
            nodes {
                alias "i"   // collision: alias name == existing set name
                values a b c d e
            }
        }
        variables {
            x domain="i"     // should have size 3
        }
    "#;

    let output = run_inspect_json(input_kdl);
    let variable = find_variable(&output, "x");
    assert_eq!(variable["binding_size"], 3,
        "size of domain 'i' (the real set, 3 elements) must not be confused \
         with alias 'i' → 'nodes' (5 elements)");
}

The key assertion is that x, whose domain is the real set i (3 elements), reports a binding size of 3 and is not silently overridden by the alias i → nodes (5 elements). This test will fail with the current or_insert-into-set_sizes approach whenever insertion order makes the alias entry win.


// Build variable records
let variable_records = build_variable_records(program, &set_sizes);
Expand Down Expand Up @@ -262,7 +267,7 @@ fn build_set_records(program: &SemanticProgram, filtered_set_names: &[&str]) ->
id,
name: name.clone(),
alias,
size: resolved.values.len(),
size: resolved_set_cardinality(resolved),
dtype,
subset_of,
});
Expand All @@ -272,23 +277,17 @@ fn build_set_records(program: &SemanticProgram, filtered_set_names: &[&str]) ->
}

fn find_set_alias(program: &SemanticProgram, set_name: &str) -> Option<String> {
// Check variable families for index aliases that map to this set
for family in &program.variable_families {
for (index, domain) in &family.index_domains {
if domain == set_name && index != set_name {
return Some(index.clone());
}
}
}
// Check constraint generation bindings
for constraint in &program.active_constraints {
for binding in &constraint.generation_bindings {
if binding.domain == set_name && binding.variable != set_name {
return Some(binding.variable.clone());
}
}
}
None
program
.set_aliases
.iter()
.find_map(|(alias, canonical)| (canonical == set_name).then(|| alias.clone()))
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_set_alias returns only a single alias for a canonical set name, but set_aliases can naturally contain multiple aliases pointing at the same canonical set. Returning an arbitrary single alias can make the inspect output misleading. Consider either (a) changing the record shape to expose multiple aliases (e.g., aliases: Vec<String>), or (b) returning None (or a deterministic/explicit choice) when multiple aliases exist so the output doesn’t imply there is only one.

Suggested change
program
.set_aliases
.iter()
.find_map(|(alias, canonical)| (canonical == set_name).then(|| alias.clone()))
let mut aliases = program
.set_aliases
.iter()
.filter_map(|(alias, canonical)| (canonical == set_name).then(|| alias.clone()));
let alias = aliases.next()?;
if aliases.next().is_some() {
None
} else {
Some(alias)
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot based on the feedback here (#203 (comment)), answer this comment with the code of a proposed test with a case that fails because of the problem you are pointing out. Do not write code just put the code in a comment answering this

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a proposed test that would fail because find_set_alias silently returns only the alphabetically-first alias when multiple aliases exist, and the SetRecord.alias field hides the others:

#[test]
fn inspect_set_record_exposes_all_aliases_not_just_one() {
    // "lines" is the canonical set; it has two aliases: "i" and "j".
    // With the current single-Option<String> field, only one alias is
    // surfaced in the JSON output — the other is silently discarded.
    //
    // The test asserts that the set record for "lines" contains BOTH
    // aliases in its output.  With the current code it will fail because
    // `find_set_alias` returns (alphabetically) "i" and drops "j" entirely.
    let output = run_arco_inspect_json(concat!(
        "sets {\n",
        "    lines {\n",
        "        alias \"i\"\n",
        "        alias \"j\"\n",
        "        values a b c\n",
        "    }\n",
        "}\n",
    ));

    // Locate the "lines" set in the output
    let sets = output["sets"].as_array().expect("sets array");
    let lines_set = sets
        .iter()
        .find(|s| s["name"] == "lines")
        .expect("lines set not found");

    // Both aliases must appear; with the current code only "i" is present
    let aliases = lines_set["aliases"]
        .as_array()
        .expect("aliases should be an array, not a single optional string");

    let alias_values: Vec<&str> = aliases
        .iter()
        .map(|a| a.as_str().unwrap())
        .collect();

    assert!(alias_values.contains(&"i"), "alias 'i' missing");
    assert!(alias_values.contains(&"j"), "alias 'j' missing");
}

The test fails in two ways today:

  1. SetRecord.alias is Option<String>, not a Vec<String>, so the JSON key is "alias" (singular), not "aliases" (plural) — the .as_array() call panics.
  2. Even if the field were changed to Vec<String>, the current find_set_alias implementation would only populate it with "i" (the alphabetically-first alias in the BTreeMap), dropping "j".

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pesap opinions?

}

fn resolved_set_cardinality(resolved: &arco_kdl::semantic::ResolvedSet) -> usize {
resolved
.tuple_rows
.as_ref()
.map_or(resolved.values.len(), Vec::len)
}

fn infer_set_dtype(resolved: &arco_kdl::semantic::ResolvedSet) -> String {
Expand Down Expand Up @@ -716,6 +715,7 @@ fn build_constraint_records(
build_constraint_record(
id,
constraint,
program,
variable_targets,
parameter_targets,
set_sizes,
Expand All @@ -727,12 +727,13 @@ fn build_constraint_records(
fn build_constraint_record(
id: usize,
constraint: &ResolvedConstraint,
program: &SemanticProgram,
variable_targets: &BTreeSet<String>,
parameter_targets: &BTreeSet<String>,
set_sizes: &BTreeMap<&str, usize>,
) -> ConstraintRecord {
let scope = build_constraint_scope(constraint, set_sizes);
let instances = scope.iter().map(|s| s.size).product::<usize>().max(1);
let instances = estimate_constraint_instances(program, constraint, set_sizes);
Comment thread
SebastianManriqueM marked this conversation as resolved.

let symbol_to_set: BTreeMap<&str, &str> = constraint
.generation_bindings
Expand Down Expand Up @@ -814,6 +815,43 @@ fn build_constraint_scope(
.collect()
}

fn estimate_constraint_instances(
program: &SemanticProgram,
constraint: &ResolvedConstraint,
set_sizes: &BTreeMap<&str, usize>,
) -> usize {
let mut instances = 1usize;
let mut seen_tuple_domains = BTreeSet::new();

for binding in &constraint.generation_bindings {
let canonical_domain = program
.set_aliases
.get(binding.domain.as_str())
.map_or(binding.domain.as_str(), String::as_str);

if let Some(resolved_set) = program.set_registry.get(canonical_domain) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anyway to make this block easier to read?, is probably a rust thing but it is supercomplicated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified in commit 390c41a. The three nested ifs are now a single boolean:

let should_count = resolved_set.tuple_rows.is_none()
    || seen_tuple_domains.insert(canonical_domain);
if should_count {
    instances = instances.saturating_mul(resolved_set_cardinality(resolved_set));
}

Plain English: count the set's cardinality unless it is a tuple-domain that has already been counted.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pesap this looks better to me

// Non-tuple sets are always counted; tuple-domain sets are counted only
// once per canonical domain to avoid Cartesian overcounting when
// multiple bindings share the same tuple domain.
let should_count = resolved_set.tuple_rows.is_none()
|| seen_tuple_domains.insert(canonical_domain);
if should_count {
instances = instances.saturating_mul(resolved_set_cardinality(resolved_set));
}
continue;
}

let size = set_sizes
.get(canonical_domain)
.copied()
.or_else(|| set_sizes.get(binding.domain.as_str()).copied())
.unwrap_or(0);
instances = instances.saturating_mul(size);
}

instances.max(1)
}

fn build_term_refs(
expr: &Expr,
variable_targets: &BTreeSet<String>,
Expand Down
50 changes: 50 additions & 0 deletions crates/arco-cli/tests/example_cli_commands.rs
Original file line number Diff line number Diff line change
Expand Up @@ -198,6 +198,28 @@ fn run_compact_nodal_allocation_tracer_bullet_succeeds() {

let inspect_payload: Value =
serde_json::from_slice(&inspect_output.stdout).expect("valid inspect json");

let inspect_sets = inspect_payload["set"].as_array().expect("set array");
let feasible_links = inspect_sets
.iter()
.find(|record| record["name"] == "feasible_links")
.expect("feasible_links set record");
assert_eq!(
feasible_links["size"],
Value::from(4),
"feasible_links should report tuple-row cardinality"
);

let priority_links = inspect_sets
.iter()
.find(|record| record["name"] == "priority_links")
.expect("priority_links set record");
assert_eq!(
priority_links["size"],
Value::from(2),
"priority_links should report filtered tuple-row cardinality"
);

let variables = inspect_payload["variable"]
.as_array()
.expect("variable array");
Expand All @@ -213,8 +235,36 @@ fn run_compact_nodal_allocation_tracer_bullet_succeeds() {
);
for binding in sets {
assert_eq!(binding["name"], "feasible_links");
assert_eq!(
binding["size"],
Value::from(4),
"tuple-domain binding should use tuple-row cardinality"
);
}

let constraints = inspect_payload["constraint"]
.as_array()
.expect("constraint array");
let dispatch_capacity = constraints
.iter()
.find(|record| record["name"] == "dispatch_capacity")
.expect("dispatch_capacity constraint record");
assert_eq!(
dispatch_capacity["instances"],
Value::from(4),
"tuple-domain constraint instances should track tuple rows, not Cartesian powers"
);

let priority_floor = constraints
.iter()
.find(|record| record["name"] == "priority_floor")
.expect("priority_floor constraint record");
assert_eq!(
priority_floor["instances"],
Value::from(2),
"filtered tuple-domain constraint instances should track tuple rows"
);

let run_output = run_cli(&["run", model, "--compact"]);
assert!(
run_output.status.success(),
Expand Down
4 changes: 2 additions & 2 deletions docs/appendix-a-ergonomic-profile.md
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ data branch_data from="data/branches.csv" {
> index i { in tech }
> index g { in generators }
> index b { in buses }
> where { feasible > 0 }
> filter { feasible > 0 }
> }
> }
>
Expand All @@ -92,7 +92,7 @@ data branch_data from="data/branches.csv" {
> index i { in tech }
> index g { in generators }
> index b { in buses }
> where { area == "south" }
> filter { area == "south" }
> }
>
> model nodal_allocation {
Expand Down
4 changes: 2 additions & 2 deletions examples/nodal-allocation/input.kdl
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ data "links" source="data/links.csv" {
index "i" { in "tech" }
index "g" { in "generators" }
index "b" { in "buses" }
where { feasible > 0 }
filter { feasible > 0 }
}

param "capacity_mw" {
Expand All @@ -36,7 +36,7 @@ set "priority_links" {
index "i" { in "tech" }
index "g" { in "generators" }
index "b" { in "buses" }
where { area == "south" }
filter { area == "south" }
}

model "NodalAllocation" {
Expand Down
Loading