-
Notifications
You must be signed in to change notification settings - Fork 2
sentinel: harden FFI strict parsing for trust boundary 🛡️ #2672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
EffortlessSteven
wants to merge
1
commit into
main
Choose a base branch
from
sentinel-boundaries-3245254242784963356
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,108 @@ | ||
| # Decision | ||
|
|
||
| ## Option A (Recommended) | ||
| Add `if !args.is_object() { return args; }` or similar protection to `scan_arg_object` in `crates/tokmd-core/src/ffi/parse.rs`, or simply ensure that the returned value is still useful, but `unwrap_or` on `get` can be problematic if `args` is not an object. The root issue is `unwrap_or(args)`. `args.get("scan")` returns `None` if `args` is not an object, which then returns `args`. If `args` is an array or string, it gets passed along. However, `parse.rs` assumes the `args` passed to it are objects because of `run_json_inner` in `mod.rs`: `if !args.is_object() { return Err(...) }`. | ||
|
|
||
| Wait, in `run_json_inner`, `args` is validated to be an object: | ||
| ```rust | ||
| let args: Value = | ||
| serde_json::from_str(args_json).map_err(|err| TokmdError::invalid_json(err.to_string()))?; | ||
| if !args.is_object() { | ||
| return Err(TokmdError::invalid_json( | ||
| "Top-level JSON value must be an object", | ||
| )); | ||
| } | ||
| ``` | ||
| If `args` is an object, then `args.get("scan")` returns `Some(scan_value)`. If `scan` is not an object, `scan_arg_object` will return `scan_value`. Then `parse_string_array` will be called on `scan_value`. Wait, `parse_scan_settings` uses `scan_arg_object`: | ||
| ```rust | ||
| pub(super) fn scan_arg_object(args: &Value) -> &Value { | ||
| args.get("scan").unwrap_or(args) | ||
| } | ||
| ``` | ||
| If `args` has `"scan": "string"`, then `scan_arg_object` returns `"string"`. Then `obj.get("paths")` on `"string"` returns `None`, so `parse_string_array` uses the default! Wait, `get` on a string value returns `None`. It silently ignores the invalid `"scan"` object and uses defaults for all fields! | ||
|
|
||
| Let's test this: | ||
|
|
||
| The tests confirm that passing `"scan": "not an object"` or `"lang": "not an object"` silently falls back to empty defaults, because `Value::get(field)` on a string returns `None`, the same as if the fields were missing! This is a gap in the trust boundary, as we silently ignore a malformed settings block instead of validating the FFI input strictly. | ||
|
|
||
| Let's look at `scan_arg_object` and `parse_*_settings`: | ||
|
|
||
| ```rust | ||
| pub(super) fn scan_arg_object(args: &Value) -> &Value { | ||
| let scan_val = args.get("scan").unwrap_or(args); | ||
| // If it's explicitly "scan": "something_else", we just return it and it acts like an empty object since it's not an object. | ||
| } | ||
| ``` | ||
|
|
||
| Wait! It's even worse. If `"scan"` is provided but it's not an object, `args.get("scan").unwrap_or(args)` returns the `"scan"` value (a string, array, etc). Then `scan_val.get("paths")` returns `None`. So it silently ignores `"scan"` being a string/array, AND it ignores the rest of `args` because we are now calling `get` on a string/array instead of `args`! | ||
|
|
||
| Actually, it says: "Missing keys use sensible defaults. Invalid values return errors (no silent fallback to defaults)". But `{"scan": "not an object"}` is an invalid value that does NOT return an error, it silently falls back to defaults! | ||
|
|
||
| Let's check `parse_lang_settings`: | ||
| ```rust | ||
| pub(super) fn parse_lang_settings(args: &Value) -> Result<LangSettings, TokmdError> { | ||
| let obj = args.get("lang").unwrap_or(args); | ||
| ``` | ||
| Same thing here. | ||
|
|
||
| And `parse_module_settings`, `parse_export_settings`, `parse_analyze_settings`, `parse_cockpit_settings`, `parse_diff_settings`. | ||
|
|
||
| Wait, in `parse_diff_settings`: | ||
| ```rust | ||
| pub(super) fn parse_diff_settings(args: &Value) -> Result<DiffSettings, TokmdError> { | ||
| let obj = args.get("diff").unwrap_or(args); | ||
|
|
||
| let from = parse_required_string(obj, "from")?; | ||
| let to = parse_required_string(obj, "to")?; | ||
| ``` | ||
| If `"diff": "string"`, then `obj` is a string, `parse_required_string` will return `None`, but since it's required it will return an error because it expects a string. However, if there are optional properties, they would fall back to default. | ||
|
|
||
| For `scan`, `paths` is optional (defaults to `vec![".".to_string()]`). | ||
| For `lang`, `top` is optional (defaults to `0`). | ||
|
|
||
| Wait, how should we fix this? | ||
| Option A: Create a helper function in `parse.rs` like `get_arg_object<'a>(args: &'a Value, field: &str) -> Result<&'a Value, TokmdError>` which returns `args` if `field` is missing, but returns `Err` if `field` is present but not an object. | ||
| And use it in `settings_parse.rs`. | ||
|
|
||
| Let's check `parse.rs`: | ||
| ```rust | ||
| pub(super) fn scan_arg_object(args: &Value) -> &Value { | ||
| args.get("scan").unwrap_or(args) | ||
| } | ||
| ``` | ||
|
|
||
| We can change it to return `Result<&Value, TokmdError>` and validate it! | ||
| Wait, if it returns `Result`, we need to change callers. | ||
|
|
||
| ```rust | ||
| pub(super) fn get_settings_object<'a>(args: &'a Value, field: &str) -> Result<&'a Value, TokmdError> { | ||
| match args.get(field) { | ||
| None | Some(Value::Null) => Ok(args), | ||
| Some(Value::Object(_)) => Ok(args.get(field).unwrap()), | ||
| Some(_) => Err(TokmdError::invalid_field(field, "an object")), | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| Wait, `scan_arg_object(args: &Value) -> &Value` does not return a `Result`. Let's change `scan_arg_object` to return `Result<&Value, TokmdError>` or just replace it with `get_settings_object(args, "scan")?`. | ||
|
|
||
| Let's do this for `scan`, `lang`, `module`, `export`, `analyze`, `cockpit`, `diff`. | ||
|
|
||
| Let's try to update `parse.rs` and `settings_parse.rs`. | ||
| This fixes the boundary gap where `{"scan": []}` or `{"lang": "rust"}` would silently act like an empty object `{}` and drop all actual arguments inside `args` because `get()` on non-objects returns `None`. | ||
|
|
||
| I will: | ||
| 1. Update `crates/tokmd-core/src/ffi/parse.rs` to replace `scan_arg_object` with `get_settings_object(args: &Value, field: &str) -> Result<&Value, TokmdError>`. | ||
| 2. Add a test in `parse.rs` for `get_settings_object`. | ||
| 3. Update `crates/tokmd-core/src/ffi/settings_parse.rs` to use `get_settings_object(args, "scan")?`, `get_settings_object(args, "lang")?`, etc. | ||
| 4. Update `tests/ffi_bug_test.rs` to ensure the correct `invalid_settings` error is returned. | ||
| 5. Review other files where `scan_arg_object` is used. | ||
|
|
||
| Are there other usages of `scan_arg_object`? | ||
|
|
||
| Perfect. Option A solves the gap correctly by hardening the trust boundary for FFI inputs. | ||
|
|
||
| ## Option B | ||
| Keep the current behavior and document that `scan: string` drops all configuration. This is unacceptable for a security/trust boundary. FFI inputs are inherently untrusted and must be strictly validated. | ||
|
|
||
| Decision: Proceed with Option A. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| { | ||
| "prompt_id": "sentinel_boundaries", | ||
| "persona": "Sentinel", | ||
| "style": "Builder", | ||
| "primary_shard": "interfaces", | ||
| "allowed_paths": [ | ||
| "crates/tokmd-config/**", | ||
| "crates/tokmd-core/**", | ||
| "crates/tokmd/**", | ||
| "docs/reference-cli.md", | ||
| "docs/tutorial.md", | ||
| "crates/tokmd/tests/**" | ||
| ], | ||
| "gate_profile": "security-boundary", | ||
| "allowed_outcomes": ["patch", "learning_pr"] | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| ## 💡 Summary | ||
| Hardened the JSON FFI boundary against setting block evasion by rejecting invalid top-level component types. Replaced silent default fallbacks with strict error responses for invalid nested configuration objects (e.g. `{"scan": "not an object"}`). | ||
|
|
||
| ## 🎯 Why | ||
| The previous parsing logic for sub-objects like `scan`, `lang`, `module`, and `export` used `args.get("scan").unwrap_or(args)`. If `scan` was passed as a non-object (e.g., a string or array), `Value::get` on a non-object returns `None`, silently acting like an empty object instead of returning an `invalid_settings` error. This broke the contract of strict configuration parsing across the FFI trust boundary and allowed bindings to bypass validation checks silently. | ||
|
|
||
| ## 🔎 Evidence | ||
| Passing a malformed scan object returned a silent fallback to defaults instead of a parsing error: | ||
| ```bash | ||
| let result = tokmd_core::ffi::run_json( | ||
| "lang", | ||
| r#"{"scan": "not an object", "paths": ["src"]}"# | ||
| ); | ||
| // Previously resulted in success ("ok": true), ignoring "paths". | ||
| ``` | ||
|
|
||
| ## 🧭 Options considered | ||
| ### Option A (recommended) | ||
| - what it is: Replace `scan_arg_object` with a robust `get_settings_object` that strictly verifies the field is either missing, null, or a valid JSON object, returning a standard `TokmdError::invalid_field` otherwise. | ||
| - why it fits this repo and shard: It natively integrates into the existing `parse.rs` and `settings_parse.rs` workflow in `crates/tokmd-core`, preventing a trust-boundary leakage. | ||
| - trade-offs: | ||
| - Structure: Centralizes validation into one helper without changing the interface types. | ||
| - Velocity: Small change with high leverage. | ||
| - Governance: Ensures strict parsing contract is upheld for future API versions. | ||
|
|
||
| ### Option B | ||
| - what it is: Treat strings/arrays as an empty configuration block and document the behavior. | ||
| - when to choose it instead: If the JSON input format isn't considered a trust boundary. | ||
| - trade-offs: Violates the core system tenet of rejecting malformed input quickly to avoid unpredictable execution state. | ||
|
|
||
| ## ✅ Decision | ||
| Proceeded with Option A to strictly harden the JSON entrypoint boundary. | ||
|
|
||
| ## 🧱 Changes made (SRP) | ||
| - `crates/tokmd-core/src/ffi/parse.rs`: Introduced `get_settings_object` to perform strict object validation. | ||
| - `crates/tokmd-core/src/ffi/settings_parse.rs`: Replaced all usages of `unwrap_or(args)` with the new strict parser. | ||
| - `crates/tokmd-core/tests/ffi_contract.rs`: Added E2E validation tests for invalid `scan` and `lang` object types. | ||
|
|
||
| ## 🧪 Verification receipts | ||
| ```text | ||
| cargo test -p tokmd-core --test ffi_contract (Passed) | ||
| cargo test -p tokmd-core (Passed) | ||
| cargo fmt -- --check (Passed) | ||
| cargo clippy -- -D warnings (Passed) | ||
| ``` | ||
|
|
||
| ## 🧭 Telemetry | ||
| - Change shape: Hardening | ||
| - Blast radius: FFI API, IO boundary. No backward compatibility breaks for well-formed JSON. | ||
| - Risk class: Low, only rejects previously invalid JSON that would have failed silently. | ||
| - Rollback: Revert the PR. | ||
| - Gates run: `cargo test`, `cargo fmt -- --check`, `cargo clippy -- -D warnings` | ||
|
|
||
| ## 🗂️ .jules artifacts | ||
| - `.jules/runs/run-1/envelope.json` | ||
| - `.jules/runs/run-1/decision.md` | ||
| - `.jules/runs/run-1/receipts.jsonl` | ||
| - `.jules/runs/run-1/result.json` | ||
| - `.jules/runs/run-1/pr_body.md` | ||
|
|
||
| ## 🔜 Follow-ups | ||
| None. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| {"command": "cargo test -p tokmd-core --test ffi_contract", "outcome": "ok"} | ||
| {"command": "cargo test -p tokmd-core", "outcome": "ok"} | ||
| {"command": "cargo fmt -- --check", "outcome": "ok"} | ||
| {"command": "cargo clippy -- -D warnings", "outcome": "ok"} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| { | ||
| "outcome": "patch", | ||
| "files_modified": [ | ||
| "crates/tokmd-core/src/ffi/parse.rs", | ||
| "crates/tokmd-core/src/ffi/settings_parse.rs", | ||
| "crates/tokmd-core/tests/ffi_contract.rs" | ||
| ] | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -183,3 +183,22 @@ fn error_envelope_never_has_data_field() { | |
| "error envelope must not have 'data'" | ||
| ); | ||
| } | ||
| use serde_json::Value; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
||
| #[test] | ||
| fn test_scan_object_validation() { | ||
| let result = | ||
| tokmd_core::ffi::run_json("lang", r#"{"scan": "not an object", "paths": ["src"]}"#); | ||
| println!("{}", result); | ||
| let v: Value = serde_json::from_str(&result).unwrap(); | ||
| assert_eq!(v["ok"], false); | ||
| assert_eq!(v["error"]["code"], "invalid_settings"); | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_lang_object_validation() { | ||
| let result = tokmd_core::ffi::run_json("lang", r#"{"lang": "not an object", "top": 10}"#); | ||
| let v: Value = serde_json::from_str(&result).unwrap(); | ||
| assert_eq!(v["ok"], false); | ||
| assert_eq!(v["error"]["code"], "invalid_settings"); | ||
| } | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
argsis not a JSON object (e.g., if it is a string, array, or null),args.get(field)will returnNone, which currently causesget_settings_objectto returnOk(args)(a non-object). To prevent silent fallback to defaults and ensure robust defensive programming, we should explicitly verify thatargsis a JSON object before attempting to retrieve the field.