-
-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updated schema #486
base: main
Are you sure you want to change the base?
updated schema #486
Conversation
529ba54
to
d4e1d5c
Compare
How do you specify that a property is "nullable" (for example a property that is allowed to be a date-time or null)? |
@LHolten |
b5e8433
to
b6c96a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @zkat !
Sorry this is a super erratic and incomplete review of things, but I figured I should share what I had so far before coming back to it! Lots of the comments are likely just me not being super familiar with schemas or missing something, but hopefully a few are interesting?
My plan for tomorrow is to write some schemas for some of my KDL files, bending this syntax if needed to get something that feels clean and minimal, then I'll hopefully have an idea of where things can be tweaked!
The only other specific thing I can think is related to #492 somewhat — I personally think arbitrary links around documents might be a little wacky, and all of my files where I've been "inheriting" properties have looked like:
residues {
types {
Monosaccharide {
functional-group "Hydroxyl" at="Reducing End"
functional-group "Hydroxyl" at="Nonreducing End"
functional-group "Hydroxyl" at="6-Position"
functional-group "Acetyl" at="Secondary Amide"
}
AminoAcid {
functional-group "Amino" at="N-Terminal"
functional-group "Carboxyl" at="C-Terminal"
}
}
Monosaccharide "g" "N-Acetylglucosamine" {
composition "C8H15NO6"
}
Monosaccharide "m" "N-Acetylmuramic Acid" {
composition "C11H19NO8"
functional-group "Carboxyl" at="Lactyl Ether"
}
Monosaccharide "x" "Unknown Monosaccharide" {
composition null
}
AminoAcid "A" "Alanine" {
composition "C3H7NO2"
}
AminoAcid "B" "Diaminobutyric Acid" {
composition "C4H10N2O2"
functional-group "Amino" at="Sidechain"
}
AminoAcid "C" "Cysteine" {
composition "C3H7NO2S"
}
AminoAcid "D" "Aspartic Acid" {
composition "C4H7NO4"
functional-group "Carboxyl" at="Sidechain"
}
AminoAcid "E" "Glutamic Acid" {
composition "C5H9NO4"
functional-group "Carboxyl" at="Sidechain"
}
AminoAcid "F" "Phenylalanine" {
composition "C9H11NO2"
}
AminoAcid "G" "Glycine" {
composition "C2H5NO2"
}
AminoAcid "H" "Histidine" {
composition "C6H9N3O2"
}
AminoAcid "I" "Isoleucine" {
composition "C6H13NO2"
}
AminoAcid "J" "Diaminopimelic Acid" {
composition "C7H14N2O4"
functional-group "Amino" at="Sidechain"
functional-group "Carboxyl" at="Sidechain"
}
// SNIP ...
Where I have some node "hard-coded" to store some templates that are used in the rest of the document, so the "variables" are all in one place and you know where to scroll to to find that information. Now, I can see how that might be tricky when you're writing something like a schema for the schema, and you need a lot more linking? If that's the case, then maybe giving relative or absolute node "paths" in something like KQL could be good? I'll have to think a bit more on it, but I think tagging one node, then linking to that tag from anywhere could be a bit confusing and tricky to maintain / keep the ID's always in sync.
Sorry, I've rambled far far too much here, but as a concluding thought, a bit like the unicode quotes "go big or go home" discussion, I'd lean on the side of keeping things more general, minimal, and future-proof personally — being a bit cautious of feature-creep. Things like rel=self|documentation
or author ORCIDs being in the specification feel like things that be over-specifying? Though that's just my gut reaction!
I'll have much more practical and hopefully helpful stuff to share after I try writing a couple of schemas myself! Sorry in the meantime and hopefully there is a nice regex for finding the useful needle in this haystack (like that insane semver one) 😅
@@ -0,0 +1,172 @@ | |||
@kdl:schema "https://github.com/kdl-org/kdl/blob/main/schema/kdl-schema.kdl" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where will this @kdl:schema
be used? If it doesn't have to be a reachable URL, then do you still think that it's valuable to have as a top-level key like this?
Do you envision KDL documents adding this to their top-level? Does that mean users are expected to handle this node when parsing with kdl-rs
? Or should KDL parsers remove this node from the AST before passing it on to user code? If so, then I think that would require a change to the KDL spec?
I'm not opposed to this, though I'd personally probably keep them separate and give any sort of validator two files (the data and the schema it follows) — unless I'm misunderstanding though, I think this might be better as a field in metadata
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to JSONSchema's $schema
, which points to a valid (relative) URL.
This is not required, e.g. I can edit my angular.json
and get LSP support because the Angular VSCode plugin interprets any file called angular.json
as an Angular configuration document.
But, for schemas that don't come with editor plugins or that don't have a filename convention, it is very handy that you can add that $schema
property and the JSON plugin will kick in and provide LSP support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I can definitely see that as nice to have for editor / tooling support! Would implementations like kdl-rs
strip this out of the AST returned to users? Essentially just because I agree it would be something nice to add to get LSP support, but I wouldn't want that addition to affect any result of parsing!
Is this the sort of thing that could / should be a standard-formatted comment? Like how sometimes people seem to set vi / vim settings in comments with vi:
or whatever at the top of a file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users would have to ignore this field, if they want it to be present. Just like folks are free to ignore $schema
when using JSONSchema.
I thought about using a comment for this, but that seems... somehow worse? I don't know. I feel like it should just be data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair... I'll sleep on it — I'd personally find it a little wacky to have to describe in my schema for a KDL format the node that will be used to refer back to that schema. So, if I wanted to do what @bgotink mentioned, and link my current KDL file to a schema for editor support, I'd need to make sure that that schema includes the @kdl:schema
node in it as well? If it's even a URL linked to a file I can edit?
But, before I whinge any more, I'll read up properly on $schema
in JSON and do some proper thinking!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this so you no longer need to remember to include this in your own schemas: it is now a default item for all schemas, and can be disabled by doing node @kdl:schema { undefine }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if I wanted to do what @bgotink mentioned, and link my current KDL file to a schema for editor support, I'd need to make sure that that schema includes the @KDL:schema node in it as well?
There are multiple versions of JSONSchema. The $schema
in the schema is used to define the version of JSONSchema in which it is written, so it serves an extra purpose on top of enabling your editor to provide support while writing schemas.
For example, data file "my-data.kdl" has @kdl:schema "./my-schema.kdl"
, schema file my-schema.kdl
would then have @kdl:schema "https://kdl.dev/schema/v0.2.kdl";
(fictitious URL)
I thought about using a comment for this, but that seems... somehow worse? I don't know. I feel like it should just be data.
Agreed, both because it feels bad and because it's actually useful data. Just like the version of JSONSchema, this could be used to differentiate between different versions of configuration file formats. If this were a comment, you'd have to have a comment to a versioned URL ánd a version indicator in the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've done some pondering and perhaps come to the conclusion that the first draft of this was best, but with some caveats. If I'm understanding correctly, it sounds like there's two, slightly distinct, reasons to want something like this:
- It's a way to link files to a schema that can be used to enrich editor / LSP-type support
- It can also be used as a sort of idiomatic / canonical version field, if a format's schema has changed over time
But one of my problems remains if the @kdl:schema
must be an uncommented node: say I'm working on a certain type of KDL config file — it's the type of thing that's complex enough that I'd like my LSP to know the schema and validate my edits. The issue, however, is that this config file is read by some code that looks like this:
#[derive(Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
struct KdlConfig {
bonds: BondData,
modifications: ModData,
// Etc...
}
Where the important part is the #[serde(deny_unknown_fields)]
— which is think is a nice thing to have if you want to enforce that config files are free of unrecognized sections, something that could help a lot in catching user typos. Now I'm in a bit of a bind because, once again, I have to switch between 1) having @kdl:schema
uncommented, so LSP support works but the config can't actually be loaded, or 2) commenting out that node, losing LSP support, but actually being able to test the config with the program it's meant for.
Obviously, however, this is a non-problem if the program already uses @kdl:schema
as a standard convention for specifying a version:
#[derive(Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
struct KdlConfig {
#[serde(rename = "@kdl:schema")
version: Url
bonds: BondData,
modifications: ModData,
// Etc...
}
Then leaving @kdl:schema
uncommented 100% of the time is perfectly fine!
To address both of these cases, we could:
- Make it clear that
@kdl:schema
is the idiomatic way to version KDL formats, and set a good example by having thekdl-schema.kdl
define the optional@kdl:schema
node - Make sure it's clear to all LSP implementations that schema information can come in one of two forms: either an uncommented
@kdl:schema
node, I suppose anywhere in the file, or a/- @kdl:schema
comment at the top of the file for when you want editor support, but the KDL format you're writing disallows it. - Maybe revert the "it is now a default item for all schemas" change? I think I'd prefer to reduce the amount of magic, and if a format wants to use
@kdl:schema
to keep track of versions, then I think it should say so explicitly.
In summary then, I think rolling this back to how it started (schemas must explicitly allow @kdl:schema
) and making it clear that language servers should recognize both @kdl:schema
and /- @kdl:schema
(maybe allow other commented forms?) would be my ideal solution.
That way even if someone's program is explicitly disallowing unknown fields (to save users from typos) and doesn't care to version files, I can still get editor support.
Let me know what you think about demystifying the @kdl:schema
node and "downgrading" it to something that's idiomatic convention / recognized by tooling.
SIDENOTE: Serde support for KDL will need to make decently heavy use of #[serde(rename = "...")]
, and quick-xml
, for example, already uses the "@..." prefix for attributes. I was thinking perhaps we could use "$..." for properties (since they felt kind shell-variable-ish), but we'll likely need at least one other special prefix reserved. If we used @
for that, like #[serde(rename="@arguments")]
or whatever, then reading @kdl:schema
would become a bit more awkward (though not impossible, I don't think?). Might be a bit bike-sheddy, but we should perhaps think about either removing the @
, or rule it out as a potential serde
prefix!
Sorry for the flip-flop on this, but I think this is a solution I'd be 100% happy with <3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking through it, the main difference here with JSONSchema is that it allows other properties by default. That is, "$schema": "foo"
won't fail validation.
It would be a weaker validation, but we could always just default to allowing all children/props/args, and replace allow-others
with disallow-others
. Basically, make things more permissive by default, but let people constrain stuff. We can also add an allow-ksl
to specifically allow KSL-related keywords.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have more time to reply to other comments soon, but for this one, I do like KSL being much more stringent (disallowing others by default), but I would be okay with something like allow-ksl
if you think it would be useful — though maybe that only makes sense if there are several KSL keywords eventually?
Unfortunately, I don't think that would solve the issue of an application not using KSL, just using serde
and me wanting to add that LSP-type support (assuming I can't change the serde
derive myself). Because in that case there is no schema validation being done (via KSL, anyways). For what it's worth, knus
/ knuffle
by default disallow unknown nodes, so I couldn't get schema-enhanced editor support in any of my current projects unless I could have /-@ksl:schema
commented.
But I'm happy with what you originally had + letting the LSP pick up the node in comment form as well? If the user wants to use @ksl:schema
as a real node / version, then I do think they should be required to specify that in their schema — like you originally had it :)
That's my personal preference, anyways — keeping things very explicit and minimizing special cases :)
I can't seem to resolve review comments I've opened, so feel free to resolve and hide any of those threads that feel resolved! |
d3dea5d
to
b03e651
Compare
schema/ksl-schema.kdl
Outdated
} | ||
prop path title="KPath @ksl:reference to another node." { | ||
type string | ||
format kpath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this means we have to finalize the KPath 2.0 spec before we can finalize this spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. And possibly make a small modification to the v2 spec itself to add kpath
as a "reserved" string format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean as a type annotation? Like these? https://github.com/kdl-org/kdl/blob/main/draft-marchan-kdl2.md#reserved-type-annotations-for-strings
Something like (kpath)"<whatever>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the formats correspond to type annotations, but they do not specify the annotation that needs to be present. That is, format kpath
will validate "foo"
as if it were (kpath)"foo"
, but the annotation is not required. To do that, use the annotation
node.
ecc0c99
to
c98454b
Compare
Looks like lots of exciting updates — I'll aim to take a look this evening! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still didn't quite have time to pour over every aspect in detail, but as before I hope some of the comments are helpful!
Feel free to resolve threads liberally to keep things manageable :)
// TODO: add a feature that will let us specify that `args` | ||
// MUST be after any existing `arg` nodes in the current | ||
// scope. i.e. you can't do `node x { args; arg }` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a child node like after
or before
with a reference?
So in args
it could be:
node args {
after super.arg
}
For example!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current plan is to port XSD's xsd:group
(including xsd:sequence
), and seeing how far it gets me with this kind of thing
schema/ksl-schema.kdl
Outdated
} | ||
} | ||
// TODO: establish equality expectations. | ||
node const about=""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe if we rename enum
, then can get rid of this?
Maybe if enum
were has-value
(hmm, perhaps that name too can be improved...), it could be:
has-value 2
has-value self documentation
has-value allow-others=#true u8 f64 date // Etc...
Overall, I think having, where possible, one right way to right things would be a good property and keep things simple!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably unify these, yeah. I don't like has-value
, though. Maybe literally just value
.
schema/ksl-schema.kdl
Outdated
} | ||
} | ||
} | ||
node about-mixin id=about-mixin { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've maybe shifted my opinion a bit vs the past — in line with, ideally, "one right way", maybe we should be a lot more hesitant to allow things as both properties and children.
Also something you may have already thought about, but a massive part of the KDL validation I've been doing is maybe something I saw a comment referring to "cross-validation" about? In a shortened version of the example I already shared in this thread:
All of the children in the That would be a super helpful validation to have for schema where there is some sort of definitions section! |
Finally (for today), is the plan to have a working validator implementation before stabilizing the schema? It would be very very helpful to actually test writing some real schemas, and I'm sure would be invaluable for making sure the |
Cross validation is in the todo list |
Yes I'm planning on starting an implementation soon |
This is very much still WIP while I work out some kinks, but if anyone wants to take a look out of curiosity and ask questions, you're more than welcome to.