-
Notifications
You must be signed in to change notification settings - Fork 2
diff/patch: refactor #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
diff/patch: refactor #6
Conversation
…patch(); expose node_id and trace in Python; adjust CLI and tests
…l PatchTrace; remove JSON roundtrip and ID reconciliation; preserve scalar node IDs on in-place updates
…tore network-based CLI tests; remove dead code
…ead of expect/panic; update tests, CLI, and Python bindings
…kMLValue::Null; add node_id to Null; handle Null in to_json, traversal, and parsing; resolve conflicts
…cting Null directly; handle Null in trace traversal
| let new_obj = serde_json::json!({ | ||
| "id": "P:999", | ||
| "name": "Added Person", | ||
| "objecttype": "https://w3id.org/linkml/examples/personinfo/Person" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't seen this as part of a linkml schema, is this new in this package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really, the personinfo toy schema is already there (and i think i took it from the python linkml). here i test patch tracing by adding another person to a list of persons and then seeing if making a diff then applying it gives me the node IDs of the nodes that changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry i meant the objecttype key being resolved by a url - presumably you are matching the uri to the class definition somewhere? catching up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, that's a slot that is designates_type: true, with range=uri. that was already possible in the python / pydantic classes too ...
src/runtime/src/lib.rs
Outdated
| // Stable node identifiers assigned to every LinkMLValue node | ||
| pub type NodeId = u64; | ||
|
|
||
| static NEXT_NODE_ID: AtomicU64 = AtomicU64::new(1); | ||
|
|
||
| fn new_node_id() -> NodeId { | ||
| NEXT_NODE_ID.fetch_add(1, Ordering::Relaxed) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this stable? it seems like it would be order dependent. how do these work? Is each node a whole object definition, or is a node each item/field/etc. within a definition?
there is some notion of identity in the linkml spec - https://linkml.io/linkml-model/latest/docs/specification/02instances/#identity-conditions - so it would probably be good to document how some parallel notion of identity relates to the notion of identity in linkml.
i can't quite tell how they are used here from the description in your original comment, could you help me understand how these are used when merging? are we talking about linkml schemas, or the data described by a schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey,
this has not really something to do with the identities in linkml. it is an internal node id for tracing mutations: when i mutate a linkmlvalue (patch) i want to get feedback on which "nodes" in my value tree changed. this is done by, as result of the patch algo, give me a list of updated/deleted/changed node IDs.
which also means it doesn't have to be stable.
the end goal is to do provenance tracking: if i apply a patch, and another patch, and another patch, i can tell exactly what attribute came from what patch, just like "git blame" would do for code.
i can also put the git blame functionality in this package but i tought it belonged in a separate package. however, i can't get around adding some patch tracking here or i have to duplicate the whole diff/patch logic.
greetings,
Frank
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will add this as rustdoc, because indeed its confusing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, thanks for pointing out the document to me, i can do a better job of aligning with it, for instance by not assigning new node IDs when replacing an object/value/... by a value that is identical according to the docu. will try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and another question (sorry for the spam) should i rename my linkmlvalue to linkmlinstance so that it aligns with the docu on instances ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am still catching up on what this is doing - if this is an instance of a linkml class, as in "a person named frank" rather than a class definition, "the definition of a person" then yes i think instance would be the canonical term.
so this is a diff/patch for instances, not class definitions? that's pretty cool! so the ID here is monotonically increasing per instance? or is that global? i don't quite understand the algorithm of the diff here but would like to. the thing i am not quite understanding is that say we have version 1 of the instance, and then each node in its tree is given an integer ID here. say that's in the range 1-10, and initially have some lastName property "a". then we have some patch with a small number of updated nodes, say that's in the range 11-15, and that sets the lastName property to "b". how are they used to match updated nodes? like how is the integer node id used to determine updates to the lastName property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok will rename (i guess in a separate PR, so we have to get this one thru first)
the goal is indeed to have a diff/patch (but also a "git blame" like functionality) for instances. the node ID has no meaning. the only thing you can do is see if it remained the same after applying a patch then you know the patch didn't impact that node. and the "patch" function will give you a list of node IDs it added and deleted.
the diff algorithm on linkml instances is "almost" but not quite a simple yaml/json diff. there are some differences, for instance if the identifier of an inlined object changed we consider this as a new object not just as a changed identifier, also we treat missing differently from explicit null so we can apply "sparse" patches without loosing data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i now made the "patch" function compliant with the instances doc you sent me: it will not carry out no-op patches, and it will determine whether a patch is a no-op based on the equality as laid out in the spec by default.
i still kept a flag for my use case where i want to treat explicit null differently from a missing value, but PatchOptions::default is now compliant (save for bugs)
…or single, list, and mapping inlined cases
…ity coverage; revert CLI offline toggle and run tests with network
…h API takes treat_missing_as_null; CLI flag; Python equals/patch args; tests for equality and no-op semantics; preserve scalar node_ids
…} with defaults; make no-op skipping configurable; wire CLI --ignore-noop/--treat-missing-as-null; update Python patch signature; update tests
|
argh sorry i ment to merge into my repo main not this one |
|
i rolled back the push to main ... now i need to get this PR to reopen. |
Uh oh!
There was an error while loading. Please reload this page.