108 changes: 108 additions & 0 deletions .jules/runs/run-1/decision.md
@@ -0,0 +1,108 @@
# Decision

## Option A (Recommended)
Add `if !args.is_object() { return args; }` or similar protection to `scan_arg_object` in `crates/tokmd-core/src/ffi/parse.rs`. The root issue is `unwrap_or(args)`: `args.get("scan")` returns `None` when `args` is not an object, so `scan_arg_object` hands `args` back unchanged, and an array or string gets passed along. However, `parse.rs` can assume the `args` it receives are objects, because `run_json_inner` in `mod.rs` rejects anything else: `if !args.is_object() { return Err(...) }`.

Wait, in `run_json_inner`, `args` is validated to be an object:
```rust
let args: Value =
    serde_json::from_str(args_json).map_err(|err| TokmdError::invalid_json(err.to_string()))?;
if !args.is_object() {
    return Err(TokmdError::invalid_json(
        "Top-level JSON value must be an object",
    ));
}
```
If `args` is an object that contains a `"scan"` key, then `args.get("scan")` returns `Some(scan_value)`. If `scan` is not an object, `scan_arg_object` will return `scan_value` anyway. Then `parse_string_array` will be called on `scan_value`. Wait, `parse_scan_settings` uses `scan_arg_object`:
```rust
pub(super) fn scan_arg_object(args: &Value) -> &Value {
    args.get("scan").unwrap_or(args)
}
```
If `args` has `"scan": "string"`, then `scan_arg_object` returns `"string"`. Then `obj.get("paths")` on `"string"` returns `None`, so `parse_string_array` uses the default! `get` on a string value always returns `None`, so the parser silently ignores the invalid `"scan"` value and uses defaults for all fields!

Let's test this:

The tests confirm that passing `"scan": "not an object"` or `"lang": "not an object"` silently falls back to empty defaults, because `Value::get(field)` on a string returns `None`, the same as if the fields were missing! This is a gap in the trust boundary, as we silently ignore a malformed settings block instead of validating the FFI input strictly.
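
A minimal sketch of that failure mode, written against a trimmed stand-in for `serde_json::Value` so it compiles without dependencies. The `Value` enum, the `object` helper, and `paths_fall_back_to_default` are assumptions made for illustration; only `scan_arg_object` mirrors the code in `parse.rs`.

```rust
use std::collections::BTreeMap;

// Assumption: a trimmed stand-in for `serde_json::Value`, just enough for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

impl Value {
    // Same rule as `serde_json::Value::get` with a string key:
    // `None` unless `self` is an object that contains `key`.
    fn get(&self, key: &str) -> Option<&Value> {
        match self {
            Value::Object(map) => map.get(key),
            _ => None,
        }
    }
}

// Hypothetical helper for building object values in the sketch.
fn object(entries: &[(&str, Value)]) -> Value {
    Value::Object(
        entries
            .iter()
            .map(|(key, value)| (key.to_string(), value.clone()))
            .collect(),
    )
}

// The pre-fix selection logic, as it appears in `parse.rs`.
fn scan_arg_object(args: &Value) -> &Value {
    args.get("scan").unwrap_or(args)
}

// True when looking up `paths` on the selected value finds nothing,
// which is the case where the parser would fall back to its default.
fn paths_fall_back_to_default(args: &Value) -> bool {
    scan_arg_object(args).get("paths").is_none()
}
```

With these definitions, `paths_fall_back_to_default` is true for `{"scan": "not an object", "paths": ["src"]}` and false for both `{"paths": ["src"]}` and `{"scan": {"paths": ["src"]}}`: the malformed block hides the top-level `paths` entirely.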

Let's look at `scan_arg_object` and `parse_*_settings`:

```rust
pub(super) fn scan_arg_object(args: &Value) -> &Value {
    let scan_val = args.get("scan").unwrap_or(args);
    // If it's explicitly "scan": "something_else", we just return it, and it acts
    // like an empty object because `get` on a non-object always returns `None`.
    scan_val
}
```

Wait! It's even worse. If `"scan"` is provided but it's not an object, `args.get("scan").unwrap_or(args)` returns the `"scan"` value (a string, array, etc). Then `scan_val.get("paths")` returns `None`. So it silently ignores `"scan"` being a string/array, AND it ignores the rest of `args` because we are now calling `get` on a string/array instead of `args`!

Actually, the stated contract is: "Missing keys use sensible defaults. Invalid values return errors (no silent fallback to defaults)". But `{"scan": "not an object"}` is an invalid value that does NOT return an error; it silently falls back to defaults!

Let's check `parse_lang_settings`:
```rust
pub(super) fn parse_lang_settings(args: &Value) -> Result<LangSettings, TokmdError> {
    let obj = args.get("lang").unwrap_or(args);
```
Same thing here.

The same `unwrap_or(args)` pattern appears in `parse_module_settings`, `parse_export_settings`, `parse_analyze_settings`, `parse_cockpit_settings`, and `parse_diff_settings`.

Wait, in `parse_diff_settings`:
```rust
pub(super) fn parse_diff_settings(args: &Value) -> Result<DiffSettings, TokmdError> {
    let obj = args.get("diff").unwrap_or(args);

    let from = parse_required_string(obj, "from")?;
    let to = parse_required_string(obj, "to")?;
```
If `"diff": "string"`, then `obj` is a string, so `obj.get("from")` returns `None` and `parse_required_string` returns an error because the required field is missing. Any optional properties, however, would silently fall back to their defaults.

For `scan`, `paths` is optional (defaults to `vec![".".to_string()]`).
For `lang`, `top` is optional (defaults to `0`).
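
To make the required-versus-optional difference concrete, here is a rough sketch. Everything in it is a simplified assumption: `Value` is a trimmed stand-in for `serde_json::Value`, `select_block` generalizes the `unwrap_or(args)` pattern, and the two parsers only imitate the behavior described above (string errors instead of `TokmdError`), not the real signatures in `parse.rs`.

```rust
use std::collections::BTreeMap;

// Assumption: a trimmed stand-in for `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Number(u64),
    Object(BTreeMap<String, Value>),
}

impl Value {
    // `None` unless `self` is an object that contains `key`.
    fn get(&self, key: &str) -> Option<&Value> {
        match self {
            Value::Object(map) => map.get(key),
            _ => None,
        }
    }
}

// Hypothetical helper for building object values in the sketch.
fn object(entries: &[(&str, Value)]) -> Value {
    Value::Object(
        entries
            .iter()
            .map(|(key, value)| (key.to_string(), value.clone()))
            .collect(),
    )
}

// Pre-fix block selection, generalized over the block name.
fn select_block<'a>(args: &'a Value, block: &str) -> &'a Value {
    args.get(block).unwrap_or(args)
}

// Simplified optional-field parser: missing -> default, wrong type -> error.
fn parse_usize(obj: &Value, field: &str, default: usize) -> Result<usize, String> {
    match obj.get(field) {
        None => Ok(default),
        Some(Value::Number(n)) => Ok(*n as usize),
        Some(_) => Err(format!("{} must be a non-negative integer", field)),
    }
}

// Simplified required-field parser: anything but a string is an error.
fn parse_required_string(obj: &Value, field: &str) -> Result<String, String> {
    match obj.get(field) {
        Some(Value::String(s)) => Ok(s.clone()),
        _ => Err(format!("{} is required and must be a string", field)),
    }
}
```

Under these assumptions, `{"lang": "rust", "top": 10}` parses `top` as `0` (the default) rather than `10`, while `{"diff": "string"}` at least fails loudly because `from` cannot be found. The optional fields are the ones that hide the problem.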

Wait, how should we fix this?
Option A: Create a helper function in `parse.rs` like `get_arg_object<'a>(args: &'a Value, field: &str) -> Result<&'a Value, TokmdError>` which returns `args` if `field` is missing, but returns `Err` if `field` is present but not an object.
And use it in `settings_parse.rs`.

Let's check `parse.rs`:
```rust
pub(super) fn scan_arg_object(args: &Value) -> &Value {
    args.get("scan").unwrap_or(args)
}
```

We can change it to return `Result<&Value, TokmdError>` and validate it!
Wait, if it returns `Result`, we need to change callers.

```rust
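pub(super) fn get_settings_object<'a>(args: &'a Value, field: &str) -> Result<&'a Value, TokmdError> {
    match args.get(field) {
        // Missing or null block: fall back to the flat, top-level arguments.
        None | Some(Value::Null) => Ok(args),
        // Reuse the matched value instead of looking it up a second time.
        Some(v) if v.is_object() => Ok(v),
        // Present but not an object: reject instead of silently using defaults.
        Some(_) => Err(TokmdError::invalid_field(field, "an object")),
    }
}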
```

Wait, `scan_arg_object(args: &Value) -> &Value` does not return a `Result`. Let's change `scan_arg_object` to return `Result<&Value, TokmdError>` or just replace it with `get_settings_object(args, "scan")?`.

Let's do this for `scan`, `lang`, `module`, `export`, `analyze`, `cockpit`, `diff`.

Let's try to update `parse.rs` and `settings_parse.rs`.
This fixes the boundary gap where `{"scan": []}` or `{"lang": "rust"}` would silently act like an empty object `{}` and drop all actual arguments inside `args` because `get()` on non-objects returns `None`.

I will:
1. Update `crates/tokmd-core/src/ffi/parse.rs` to replace `scan_arg_object` with `get_settings_object(args: &Value, field: &str) -> Result<&Value, TokmdError>`.
2. Add a test in `parse.rs` for `get_settings_object`.
3. Update `crates/tokmd-core/src/ffi/settings_parse.rs` to use `get_settings_object(args, "scan")?`, `get_settings_object(args, "lang")?`, etc.
4. Update `tests/ffi_bug_test.rs` to ensure the correct `invalid_settings` error is returned.
5. Review other files where `scan_arg_object` is used.

Are there other usages of `scan_arg_object`?

Perfect. Option A solves the gap correctly by hardening the trust boundary for FFI inputs.

## Option B
Keep the current behavior and document that `scan: string` drops all configuration. This is unacceptable for a security/trust boundary. FFI inputs are inherently untrusted and must be strictly validated.

Decision: Proceed with Option A.
16 changes: 16 additions & 0 deletions .jules/runs/run-1/envelope.json
@@ -0,0 +1,16 @@
{
  "prompt_id": "sentinel_boundaries",
  "persona": "Sentinel",
  "style": "Builder",
  "primary_shard": "interfaces",
  "allowed_paths": [
    "crates/tokmd-config/**",
    "crates/tokmd-core/**",
    "crates/tokmd/**",
    "docs/reference-cli.md",
    "docs/tutorial.md",
    "crates/tokmd/tests/**"
  ],
  "gate_profile": "security-boundary",
  "allowed_outcomes": ["patch", "learning_pr"]
}
62 changes: 62 additions & 0 deletions .jules/runs/run-1/pr_body.md
@@ -0,0 +1,62 @@
## 💡 Summary
Hardened the JSON FFI boundary against malformed settings blocks: a nested block such as `scan` or `lang` that is present but is not a JSON object is now rejected. Replaced silent default fallbacks with strict error responses for invalid nested configuration objects (e.g. `{"scan": "not an object"}`).

## 🎯 Why
The previous parsing logic for sub-objects like `scan`, `lang`, `module`, and `export` used `args.get("scan").unwrap_or(args)`. If `scan` was passed as a non-object (e.g., a string or array), that value was selected as the settings block, and every later `Value::get` lookup on it returned `None`. The block therefore behaved like an empty object instead of producing an `invalid_settings` error, and any settings supplied at the top level were ignored as well. This broke the contract of strict configuration parsing across the FFI trust boundary and allowed bindings to bypass validation checks silently.

## 🔎 Evidence
Passing a malformed `scan` value silently fell back to defaults instead of returning a parsing error:
```rust
let result = tokmd_core::ffi::run_json(
    "lang",
    r#"{"scan": "not an object", "paths": ["src"]}"#
);
// Previously resulted in success ("ok": true), ignoring "paths".
```

## 🧭 Options considered
### Option A (recommended)
- what it is: Replace `scan_arg_object` with a robust `get_settings_object` that strictly verifies the field is either missing, null, or a valid JSON object, returning a standard `TokmdError::invalid_field` otherwise.
- why it fits this repo and shard: It natively integrates into the existing `parse.rs` and `settings_parse.rs` workflow in `crates/tokmd-core`, preventing a trust-boundary leakage.
- trade-offs:
- Structure: Centralizes validation into one helper without changing the interface types.
- Velocity: Small change with high leverage.
- Governance: Ensures strict parsing contract is upheld for future API versions.

### Option B
- what it is: Treat strings/arrays as an empty configuration block and document the behavior.
- when to choose it instead: If the JSON input format isn't considered a trust boundary.
- trade-offs: Violates the core system tenet of rejecting malformed input quickly to avoid unpredictable execution state.

## ✅ Decision
Proceeded with Option A to strictly harden the JSON entrypoint boundary.

## 🧱 Changes made (SRP)
- `crates/tokmd-core/src/ffi/parse.rs`: Introduced `get_settings_object` to perform strict object validation.
- `crates/tokmd-core/src/ffi/settings_parse.rs`: Replaced all usages of `unwrap_or(args)` with the new strict parser.
- `crates/tokmd-core/tests/ffi_contract.rs`: Added E2E validation tests for invalid `scan` and `lang` object types.

## 🧪 Verification receipts
```text
cargo test -p tokmd-core --test ffi_contract (Passed)
cargo test -p tokmd-core (Passed)
cargo fmt -- --check (Passed)
cargo clippy -- -D warnings (Passed)
```

## 🧭 Telemetry
- Change shape: Hardening
- Blast radius: FFI API, IO boundary. No backward compatibility breaks for inputs whose settings blocks are objects, null, or absent.
- Risk class: Low, only rejects malformed settings blocks that were previously accepted and silently replaced by defaults.
- Rollback: Revert the PR.
- Gates run: `cargo test`, `cargo fmt -- --check`, `cargo clippy -- -D warnings`

## 🗂️ .jules artifacts
- `.jules/runs/run-1/envelope.json`
- `.jules/runs/run-1/decision.md`
- `.jules/runs/run-1/receipts.jsonl`
- `.jules/runs/run-1/result.json`
- `.jules/runs/run-1/pr_body.md`

## 🔜 Follow-ups
None.
4 changes: 4 additions & 0 deletions .jules/runs/run-1/receipts.jsonl
@@ -0,0 +1,4 @@
{"command": "cargo test -p tokmd-core --test ffi_contract", "outcome": "ok"}
{"command": "cargo test -p tokmd-core", "outcome": "ok"}
{"command": "cargo fmt -- --check", "outcome": "ok"}
{"command": "cargo clippy -- -D warnings", "outcome": "ok"}
8 changes: 8 additions & 0 deletions .jules/runs/run-1/result.json
@@ -0,0 +1,8 @@
{
  "outcome": "patch",
  "files_modified": [
    "crates/tokmd-core/src/ffi/parse.rs",
    "crates/tokmd-core/src/ffi/settings_parse.rs",
    "crates/tokmd-core/tests/ffi_contract.rs"
  ]
}
28 changes: 21 additions & 7 deletions crates/tokmd-core/src/ffi/parse.rs
@@ -8,8 +8,15 @@ use serde_json::Value;
 use crate::error::TokmdError;
 use crate::settings::{ChildIncludeMode, ChildrenMode, ConfigMode, ExportFormat, RedactMode};

-pub(super) fn scan_arg_object(args: &Value) -> &Value {
-    args.get("scan").unwrap_or(args)
+pub(super) fn get_settings_object<'a>(
+    args: &'a Value,
+    field: &str,
+) -> Result<&'a Value, TokmdError> {
+    match args.get(field) {
+        None | Some(Value::Null) => Ok(args),
+        Some(v) if v.is_object() => Ok(v),
+        Some(_) => Err(TokmdError::invalid_field(field, "an object")),
+    }
 }
Comment on lines +11 to 20

medium

If args is not a JSON object (e.g., if it is a string, array, or null), args.get(field) will return None, which currently causes get_settings_object to return Ok(args) (a non-object). To prevent silent fallback to defaults and ensure robust defensive programming, we should explicitly verify that args is a JSON object before attempting to retrieve the field.

pub(super) fn get_settings_object<'a>(
    args: &'a Value,
    field: &str,
) -> Result<&'a Value, TokmdError> {
    if !args.is_object() {
        return Err(TokmdError::invalid_json("Expected a JSON object"));
    }
    match args.get(field) {
        None | Some(Value::Null) => Ok(args),
        Some(v) if v.is_object() => Ok(v),
        Some(_) => Err(TokmdError::invalid_field(field, "an object")),
    }
}
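
A rough, dependency-free sketch of how the guarded helper suggested above would behave. The `Value` enum is a trimmed stand-in for `serde_json::Value`, and `SettingsError` and `object` are hypothetical substitutes for `TokmdError` and `json!`, so this illustrates the control flow only, not the crate's actual types. The decision notes indicate that `run_json_inner` already rejects non-object top-level values, so the extra guard would act as defense in depth for other callers.

```rust
use std::collections::BTreeMap;

// Assumption: a trimmed stand-in for `serde_json::Value`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    String(String),
    Object(BTreeMap<String, Value>),
}

impl Value {
    // `None` unless `self` is an object that contains `key`.
    fn get(&self, key: &str) -> Option<&Value> {
        match self {
            Value::Object(map) => map.get(key),
            _ => None,
        }
    }

    fn is_object(&self) -> bool {
        matches!(self, Value::Object(_))
    }
}

// Hypothetical stand-in for `TokmdError`.
#[derive(Debug, PartialEq)]
enum SettingsError {
    NotAnObject,
    InvalidField(String),
}

// Hypothetical helper for building object values in the sketch.
fn object(entries: &[(&str, Value)]) -> Value {
    Value::Object(
        entries
            .iter()
            .map(|(key, value)| (key.to_string(), value.clone()))
            .collect(),
    )
}

fn get_settings_object<'a>(args: &'a Value, field: &str) -> Result<&'a Value, SettingsError> {
    // Guard from the review suggestion: a non-object `args` is rejected up front.
    if !args.is_object() {
        return Err(SettingsError::NotAnObject);
    }
    match args.get(field) {
        // Missing or null block: use the flat, top-level arguments.
        None | Some(Value::Null) => Ok(args),
        // Well-formed nested block.
        Some(v) if v.is_object() => Ok(v),
        // Present but not an object.
        Some(_) => Err(SettingsError::InvalidField(field.to_string())),
    }
}
```

Without the guard, the first case (a string passed as `args`) would take the `None` arm and return `Ok(args)`, which is the silent fallback the comment describes.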


/// Parse a boolean field strictly: missing/null -> default, non-bool -> error.
@@ -262,22 +269,29 @@ mod tests {
     use crate::error::ErrorCode;
     use serde_json::json;

-    // ---- scan_arg_object --------------------------------------------------
+    // ---- get_settings_object ----------------------------------------------

     #[test]
-    fn scan_arg_object_returns_nested_when_present() {
+    fn get_settings_object_returns_nested_when_present() {
         let args = json!({"scan": {"root": "."}, "other": 1});
-        let inner = scan_arg_object(&args);
+        let inner = get_settings_object(&args, "scan").unwrap();
         assert_eq!(inner, &json!({"root": "."}));
     }

     #[test]
-    fn scan_arg_object_returns_args_when_missing() {
+    fn get_settings_object_returns_args_when_missing() {
         let args = json!({"root": "."});
-        let inner = scan_arg_object(&args);
+        let inner = get_settings_object(&args, "scan").unwrap();
         assert_eq!(inner, &args);
     }

+    #[test]
+    fn get_settings_object_errors_when_not_object() {
+        let args = json!({"scan": "not-an-object"});
+        let err = get_settings_object(&args, "scan").unwrap_err();
+        assert_eq!(err.code, ErrorCode::InvalidSettings);
+    }
+
     // ---- parse_bool -------------------------------------------------------

     #[test]
22 changes: 11 additions & 11 deletions crates/tokmd-core/src/ffi/settings_parse.rs
@@ -6,11 +6,11 @@
 use serde_json::Value;

 use super::parse::{
-    parse_analyze_preset, parse_bool, parse_child_include_mode, parse_children_mode,
-    parse_config_mode, parse_effort_layer, parse_effort_model, parse_export_format,
-    parse_import_granularity, parse_optional_bool, parse_optional_redact_mode,
+    get_settings_object, parse_analyze_preset, parse_bool, parse_child_include_mode,
+    parse_children_mode, parse_config_mode, parse_effort_layer, parse_effort_model,
+    parse_export_format, parse_import_granularity, parse_optional_bool, parse_optional_redact_mode,
     parse_optional_string, parse_optional_u64, parse_optional_usize, parse_redact_mode,
-    parse_required_string, parse_string, parse_string_array, parse_usize, scan_arg_object,
+    parse_required_string, parse_string, parse_string_array, parse_usize,
 };
 use crate::error::TokmdError;
 use crate::settings::{
@@ -19,7 +19,7 @@ use crate::settings::{
 };

 pub(super) fn parse_scan_settings(args: &Value) -> Result<ScanSettings, TokmdError> {
-    let obj = scan_arg_object(args);
+    let obj = get_settings_object(args, "scan")?;
     if obj.get("paths").is_some_and(Value::is_null) {
         return Err(TokmdError::invalid_field("paths", "an array of strings"));
     }
@@ -40,7 +40,7 @@ pub(super) fn parse_scan_settings(args: &Value) -> Result<ScanSettings, TokmdErr
 }

 pub(super) fn parse_lang_settings(args: &Value) -> Result<LangSettings, TokmdError> {
-    let obj = args.get("lang").unwrap_or(args);
+    let obj = get_settings_object(args, "lang")?;

     Ok(LangSettings {
         top: parse_usize(obj, "top", 0)?,
@@ -51,7 +51,7 @@ pub(super) fn parse_lang_settings(args: &Value) -> Result<LangSettings, TokmdErr
 }

 pub(super) fn parse_module_settings(args: &Value) -> Result<ModuleSettings, TokmdError> {
-    let obj = args.get("module").unwrap_or(args);
+    let obj = get_settings_object(args, "module")?;

     Ok(ModuleSettings {
         top: parse_usize(obj, "top", 0)?,
@@ -67,7 +67,7 @@ pub(super) fn parse_module_settings(args: &Value) -> Result<ModuleSettings, Tokm
 }

 pub(super) fn parse_export_settings(args: &Value) -> Result<ExportSettings, TokmdError> {
-    let obj = args.get("export").unwrap_or(args);
+    let obj = get_settings_object(args, "export")?;

     Ok(ExportSettings {
         format: parse_export_format(obj, ExportFormat::Jsonl)?,
@@ -88,7 +88,7 @@ pub(super) fn parse_export_settings(args: &Value) -> Result<ExportSettings, Tokm

 #[allow(dead_code)]
 pub(super) fn parse_analyze_settings(args: &Value) -> Result<AnalyzeSettings, TokmdError> {
-    let obj = args.get("analyze").unwrap_or(args);
+    let obj = get_settings_object(args, "analyze")?;

     let effort_base_ref = parse_optional_string(obj, "effort_base_ref")?;
     let effort_head_ref = parse_optional_string(obj, "effort_head_ref")?;
@@ -133,7 +133,7 @@ pub(super) fn parse_analyze_settings(args: &Value) -> Result<AnalyzeSettings, To
 pub(super) fn parse_cockpit_settings(
     args: &Value,
 ) -> Result<crate::settings::CockpitSettings, TokmdError> {
-    let obj = args.get("cockpit").unwrap_or(args);
+    let obj = get_settings_object(args, "cockpit")?;

     Ok(crate::settings::CockpitSettings {
         base: parse_string(obj, "base", "main")?,
@@ -144,7 +144,7 @@ pub(super) fn parse_cockpit_settings(
 }

 pub(super) fn parse_diff_settings(args: &Value) -> Result<DiffSettings, TokmdError> {
-    let obj = args.get("diff").unwrap_or(args);
+    let obj = get_settings_object(args, "diff")?;

     let from = parse_required_string(obj, "from")?;
     let to = parse_required_string(obj, "to")?;
19 changes: 19 additions & 0 deletions crates/tokmd-core/tests/ffi_contract.rs
@@ -183,3 +183,22 @@ fn error_envelope_never_has_data_field() {
         "error envelope must not have 'data'"
     );
 }
+use serde_json::Value;

medium

Imports should be placed at the top of the file rather than inline or in the middle of the file, to adhere to standard Rust style guidelines (idiomatic Rust / Rust Style Guide).


+#[test]
+fn test_scan_object_validation() {
+    let result =
+        tokmd_core::ffi::run_json("lang", r#"{"scan": "not an object", "paths": ["src"]}"#);
+    println!("{}", result);
+    let v: Value = serde_json::from_str(&result).unwrap();
+    assert_eq!(v["ok"], false);
+    assert_eq!(v["error"]["code"], "invalid_settings");
+}
+
+#[test]
+fn test_lang_object_validation() {
+    let result = tokmd_core::ffi::run_json("lang", r#"{"lang": "not an object", "top": 10}"#);
+    let v: Value = serde_json::from_str(&result).unwrap();
+    assert_eq!(v["ok"], false);
+    assert_eq!(v["error"]["code"], "invalid_settings");
+}