updated schema #486

zkat · 2025-01-17T09:59:07Z

This is very much still WIP while I work out some kinks, but if anyone wants to take a look out of curiosity and ask questions, you're more than welcome to.

LHolten · 2025-01-19T11:12:15Z

How do you specify that a property is "nullable" (for example a property that is allowed to be a date-time or null)?

zkat · 2025-01-19T14:46:01Z

@LHolten type string #null

TheLostLambda

Hi @zkat !

Sorry this is a super erratic and incomplete review of things, but I figured I should share what I had so far before coming back to it! Lots of the comments are likely just me not being super familiar with schemas or missing something, but hopefully a few are interesting?

My plan for tomorrow is to write some schemas for some of my KDL files, bending this syntax if needed to get something that feels clean and minimal, then I'll hopefully have an idea of where things can be tweaked!

The only other specific thing I can think is related to #492 somewhat — I personally think arbitrary links around documents might be a little wacky, and all of my files where I've been "inheriting" properties have looked like:

residues {
    types {
        Monosaccharide {
            functional-group "Hydroxyl" at="Reducing End"
            functional-group "Hydroxyl" at="Nonreducing End"
            functional-group "Hydroxyl" at="6-Position"
            functional-group "Acetyl" at="Secondary Amide"
        }

        AminoAcid {
            functional-group "Amino" at="N-Terminal"
            functional-group "Carboxyl" at="C-Terminal"
        }
    }

    Monosaccharide "g" "N-Acetylglucosamine" {
        composition "C8H15NO6"
    }
    Monosaccharide "m" "N-Acetylmuramic Acid" {
        composition "C11H19NO8"
        functional-group "Carboxyl" at="Lactyl Ether"
    }
    Monosaccharide "x" "Unknown Monosaccharide" {
        composition null
    }

    AminoAcid "A" "Alanine" {
        composition "C3H7NO2"
    }
    AminoAcid "B" "Diaminobutyric Acid" {
        composition "C4H10N2O2"
        functional-group "Amino" at="Sidechain"
    }
    AminoAcid "C" "Cysteine" {
        composition "C3H7NO2S"
    }
    AminoAcid "D" "Aspartic Acid" {
        composition "C4H7NO4"
        functional-group "Carboxyl" at="Sidechain"
    }
    AminoAcid "E" "Glutamic Acid" {
        composition "C5H9NO4"
        functional-group "Carboxyl" at="Sidechain"
    }
    AminoAcid "F" "Phenylalanine" {
        composition "C9H11NO2"
    }
    AminoAcid "G" "Glycine" {
        composition "C2H5NO2"
    }
    AminoAcid "H" "Histidine" {
        composition "C6H9N3O2"
    }
    AminoAcid "I" "Isoleucine" {
        composition "C6H13NO2"
    }
    AminoAcid "J" "Diaminopimelic Acid" {
        composition "C7H14N2O4"
        functional-group "Amino" at="Sidechain"
        functional-group "Carboxyl" at="Sidechain"
    }
// SNIP ...

Where I have some node "hard-coded" to store some templates that are used in the rest of the document, so the "variables" are all in one place and you know where to scroll to to find that information. Now, I can see how that might be tricky when you're writing something like a schema for the schema, and you need a lot more linking? If that's the case, then maybe giving relative or absolute node "paths" in something like KQL could be good? I'll have to think a bit more on it, but I think tagging one node, then linking to that tag from anywhere could be a bit confusing and tricky to maintain / keep the ID's always in sync.

Sorry, I've rambled far far too much here, but as a concluding thought, a bit like the unicode quotes "go big or go home" discussion, I'd lean on the side of keeping things more general, minimal, and future-proof personally — being a bit cautious of feature-creep. Things like rel=self|documentation or author ORCIDs being in the specification feel like things that be over-specifying? Though that's just my gut reaction!

I'll have much more practical and hopefully helpful stuff to share after I try writing a couple of schemas myself! Sorry in the meantime and hopefully there is a nice regex for finding the useful needle in this haystack (like that insane semver one) 😅

TheLostLambda · 2025-01-20T19:03:29Z

schema/cargo.kdl

@@ -0,0 +1,172 @@
+@kdl:schema "https://github.com/kdl-org/kdl/blob/main/schema/kdl-schema.kdl"


Where will this @kdl:schema be used? If it doesn't have to be a reachable URL, then do you still think that it's valuable to have as a top-level key like this?

Do you envision KDL documents adding this to their top-level? Does that mean users are expected to handle this node when parsing with kdl-rs? Or should KDL parsers remove this node from the AST before passing it on to user code? If so, then I think that would require a change to the KDL spec?

I'm not opposed to this, though I'd personally probably keep them separate and give any sort of validator two files (the data and the schema it follows) — unless I'm misunderstanding though, I think this might be better as a field in metadata.

This is similar to JSONSchema's $schema, which points to a valid (relative) URL.

This is not required, e.g. I can edit my angular.json and get LSP support because the Angular VSCode plugin interprets any file called angular.json as an Angular configuration document.
But, for schemas that don't come with editor plugins or that don't have a filename convention, it is very handy that you can add that $schema property and the JSON plugin will kick in and provide LSP support.

Okay, I can definitely see that as nice to have for editor / tooling support! Would implementations like kdl-rs strip this out of the AST returned to users? Essentially just because I agree it would be something nice to add to get LSP support, but I wouldn't want that addition to affect any result of parsing!

Is this the sort of thing that could / should be a standard-formatted comment? Like how sometimes people seem to set vi / vim settings in comments with vi: or whatever at the top of a file?

Users would have to ignore this field, if they want it to be present. Just like folks are free to ignore $schema when using JSONSchema.

I thought about using a comment for this, but that seems... somehow worse? I don't know. I feel like it should just be data.

That's fair... I'll sleep on it — I'd personally find it a little wacky to have to describe in my schema for a KDL format the node that will be used to refer back to that schema. So, if I wanted to do what @bgotink mentioned, and link my current KDL file to a schema for editor support, I'd need to make sure that that schema includes the @kdl:schema node in it as well? If it's even a URL linked to a file I can edit?

But, before I whinge any more, I'll read up properly on $schema in JSON and do some proper thinking!

I've changed this so you no longer need to remember to include this in your own schemas: it is now a default item for all schemas, and can be disabled by doing node @kdl:schema { undefine }.

So, if I wanted to do what @bgotink mentioned, and link my current KDL file to a schema for editor support, I'd need to make sure that that schema includes the @KDL:schema node in it as well?

There are multiple versions of JSONSchema. The $schema in the schema is used to define the version of JSONSchema in which it is written, so it serves an extra purpose on top of enabling your editor to provide support while writing schemas.

For example, data file "my-data.kdl" has @kdl:schema "./my-schema.kdl", schema file my-schema.kdl would then have @kdl:schema "https://kdl.dev/schema/v0.2.kdl"; (fictitious URL)

I thought about using a comment for this, but that seems... somehow worse? I don't know. I feel like it should just be data.

Agreed, both because it feels bad and because it's actually useful data. Just like the version of JSONSchema, this could be used to differentiate between different versions of configuration file formats. If this were a comment, you'd have to have a comment to a versioned URL ánd a version indicator in the data.

Okay, I've done some pondering and perhaps come to the conclusion that the first draft of this was best, but with some caveats. If I'm understanding correctly, it sounds like there's two, slightly distinct, reasons to want something like this:

It's a way to link files to a schema that can be used to enrich editor / LSP-type support

It can also be used as a sort of idiomatic / canonical version field, if a format's schema has changed over time

But one of my problems remains if the @kdl:schema must be an uncommented node: say I'm working on a certain type of KDL config file — it's the type of thing that's complex enough that I'd like my LSP to know the schema and validate my edits. The issue, however, is that this config file is read by some code that looks like this:

#[derive(Serialize, Deserialize)] #[serde(deny_unknown_fields)] struct KdlConfig { bonds: BondData, modifications: ModData, // Etc... }

Where the important part is the #[serde(deny_unknown_fields)] — which is think is a nice thing to have if you want to enforce that config files are free of unrecognized sections, something that could help a lot in catching user typos. Now I'm in a bit of a bind because, once again, I have to switch between 1) having @kdl:schema uncommented, so LSP support works but the config can't actually be loaded, or 2) commenting out that node, losing LSP support, but actually being able to test the config with the program it's meant for.

Obviously, however, this is a non-problem if the program already uses @kdl:schema as a standard convention for specifying a version:

#[derive(Serialize, Deserialize)] #[serde(deny_unknown_fields)] struct KdlConfig { #[serde(rename = "@kdl:schema") version: Url bonds: BondData, modifications: ModData, // Etc... }

Then leaving @kdl:schema uncommented 100% of the time is perfectly fine!

To address both of these cases, we could:

Make it clear that @kdl:schema is the idiomatic way to version KDL formats, and set a good example by having the kdl-schema.kdl define the optional @kdl:schema node

Make sure it's clear to all LSP implementations that schema information can come in one of two forms: either an uncommented @kdl:schema node, I suppose anywhere in the file, or a /- @kdl:schema comment at the top of the file for when you want editor support, but the KDL format you're writing disallows it.

Maybe revert the "it is now a default item for all schemas" change? I think I'd prefer to reduce the amount of magic, and if a format wants to use @kdl:schema to keep track of versions, then I think it should say so explicitly.

In summary then, I think rolling this back to how it started (schemas must explicitly allow @kdl:schema) and making it clear that language servers should recognize both @kdl:schema and /- @kdl:schema (maybe allow other commented forms?) would be my ideal solution.

That way even if someone's program is explicitly disallowing unknown fields (to save users from typos) and doesn't care to version files, I can still get editor support.

Let me know what you think about demystifying the @kdl:schema node and "downgrading" it to something that's idiomatic convention / recognized by tooling.

SIDENOTE: Serde support for KDL will need to make decently heavy use of #[serde(rename = "...")], and quick-xml, for example, already uses the "@..." prefix for attributes. I was thinking perhaps we could use "$..." for properties (since they felt kind shell-variable-ish), but we'll likely need at least one other special prefix reserved. If we used @ for that, like #[serde(rename="@arguments")] or whatever, then reading @kdl:schema would become a bit more awkward (though not impossible, I don't think?). Might be a bit bike-sheddy, but we should perhaps think about either removing the @, or rule it out as a potential serde prefix!

Sorry for the flip-flop on this, but I think this is a solution I'd be 100% happy with <3

Thinking through it, the main difference here with JSONSchema is that it allows other properties by default. That is, "$schema": "foo" won't fail validation.

It would be a weaker validation, but we could always just default to allowing all children/props/args, and replace allow-others with disallow-others. Basically, make things more permissive by default, but let people constrain stuff. We can also add an allow-ksl to specifically allow KSL-related keywords.

I'll have more time to reply to other comments soon, but for this one, I do like KSL being much more stringent (disallowing others by default), but I would be okay with something like allow-ksl if you think it would be useful — though maybe that only makes sense if there are several KSL keywords eventually?

Unfortunately, I don't think that would solve the issue of an application not using KSL, just using serde and me wanting to add that LSP-type support (assuming I can't change the serde derive myself). Because in that case there is no schema validation being done (via KSL, anyways). For what it's worth, knus / knuffle by default disallow unknown nodes, so I couldn't get schema-enhanced editor support in any of my current projects unless I could have /-@ksl:schema commented.

But I'm happy with what you originally had + letting the LSP pick up the node in comment form as well? If the user wants to use @ksl:schema as a real node / version, then I do think they should be required to specify that in their schema — like you originally had it :)

That's my personal preference, anyways — keeping things very explicit and minimizing special cases :)

schema/cargo.kdl

schema/kdl-schema.kdl

TheLostLambda · 2025-01-20T23:18:25Z

I can't seem to resolve review comments I've opened, so feel free to resolve and hide any of those threads that feel resolved!

bgotink · 2025-01-21T21:16:01Z

schema/ksl-schema.kdl

+                                    }
+                                    prop path title="KPath @ksl:reference to another node." {
+                                        type string
+                                        format kpath


I assume this means we have to finalize the KPath 2.0 spec before we can finalize this spec?

yep. And possibly make a small modification to the v2 spec itself to add kpath as a "reserved" string format.

Do you mean as a type annotation? Like these? https://github.com/kdl-org/kdl/blob/main/draft-marchan-kdl2.md#reserved-type-annotations-for-strings

Something like (kpath)"<whatever>?

the formats correspond to type annotations, but they do not specify the annotation that needs to be present. That is, format kpath will validate "foo" as if it were (kpath)"foo", but the annotation is not required. To do that, use the annotation node.

TheLostLambda · 2025-01-24T08:50:23Z

Looks like lots of exciting updates — I'll aim to take a look this evening!

TheLostLambda

I still didn't quite have time to pour over every aspect in detail, but as before I hope some of the comments are helpful!

Feel free to resolve threads liberally to keep things manageable :)

schema/ksl-schema.kdl

schema/cargo.kdl

schema/ksl-schema.kdl

TheLostLambda · 2025-01-24T21:36:58Z

schema/ksl-schema.kdl

+                            // TODO: add a feature that will let us specify that `args`
+                            // MUST be after any existing `arg` nodes in the current
+                            // scope. i.e. you can't do `node x { args; arg }`


Could be a child node like after or before with a reference?

So in args it could be:

node args { after super.arg }

For example!

My current plan is to port XSD's xsd:group (including xsd:sequence), and seeing how far it gets me with this kind of thing

TheLostLambda · 2025-01-24T21:44:57Z

schema/ksl-schema.kdl

+                }
+            }
+            // TODO: establish equality expectations.
+            node const about="""


Maybe if we rename enum, then can get rid of this?

Maybe if enum were has-value (hmm, perhaps that name too can be improved...), it could be:

has-value 2 has-value self documentation has-value allow-others=#true u8 f64 date // Etc...

Overall, I think having, where possible, one right way to right things would be a good property and keep things simple!

We can probably unify these, yeah. I don't like has-value, though. Maybe literally just value.

schema/ksl-schema.kdl

TheLostLambda · 2025-01-24T21:52:07Z

schema/ksl-schema.kdl

+            }
+        }
+    }
+    node about-mixin id=about-mixin  {


I've maybe shifted my opinion a bit vs the past — in line with, ideally, "one right way", maybe we should be a lot more hesitant to allow things as both properties and children.

TheLostLambda · 2025-01-24T22:01:46Z

Also something you may have already thought about, but a massive part of the KDL validation I've been doing is maybe something I saw a comment referring to "cross-validation" about?

In a shortened version of the example I already shared in this thread:

residues {
    types {
        Monosaccharide {
            // SNIP ...
        }

        AminoAcid {
            // SNIP ...
        }
    }

    Monosaccharide "g" "N-Acetylglucosamine" {
        composition "C8H15NO6"
    }

// SNIP ...

    AminoAcid "A" "Alanine" {
        composition "C3H7NO2"
    }

// SNIP ...

All of the children in the residues section have to have node-names that match one of the types in residue.types!

That would be a super helpful validation to have for schema where there is some sort of definitions section!

TheLostLambda · 2025-01-24T22:06:28Z

Finally (for today), is the plan to have a working validator implementation before stabilizing the schema? It would be very very helpful to actually test writing some real schemas, and I'm sure would be invaluable for making sure the ksl-schema.kdl file is valid before it's stabilized!

zkat · 2025-01-24T22:17:33Z

Also something you may have already thought about, but a massive part of the KDL validation I've been doing is maybe something I saw a comment referring to "cross-validation" about?

In a shortened version of the example I already shared in this thread:
residues {

    types {

        Monosaccharide {

            // SNIP ...

        }



        AminoAcid {

            // SNIP ...

        }

    }



    Monosaccharide "g" "N-Acetylglucosamine" {

        composition "C8H15NO6"

    }



// SNIP ...



    AminoAcid "A" "Alanine" {

        composition "C3H7NO2"

    }



// SNIP ...
All of the children in the residues section have to have node-names that match one of the types in residue.types!

That would be a super helpful validation to have for schema where there is some sort of definitions section!

Cross validation is in the todo list

zkat · 2025-01-24T22:18:01Z

Finally (for today), is the plan to have a working validator implementation before stabilizing the schema? It would be very very helpful to actually test writing some real schemas, and I'm sure would be invaluable for making sure the ksl-schema.kdl file is valid before it's stabilized!

Yes I'm planning on starting an implementation soon

zkat force-pushed the zkat/schema-v2 branch 4 times, most recently from 529ba54 to d4e1d5c Compare January 19, 2025 07:44

zkat force-pushed the zkat/schema-v2 branch 3 times, most recently from b5e8433 to b6c96a0 Compare January 20, 2025 10:27

TheLostLambda reviewed Jan 20, 2025

View reviewed changes

zkat force-pushed the zkat/schema-v2 branch 2 times, most recently from d3dea5d to b03e651 Compare January 21, 2025 08:23

zkat mentioned this pull request Jan 21, 2025

KDL 2.0.0 support onyx-lang/onyx#184

Open

bgotink reviewed Jan 21, 2025

View reviewed changes

zkat force-pushed the zkat/schema-v2 branch 4 times, most recently from ecc0c99 to c98454b Compare January 24, 2025 03:28

TheLostLambda reviewed Jan 24, 2025

View reviewed changes

zkat force-pushed the zkat/schema-v2 branch from c98454b to 7244331 Compare January 26, 2025 22:54

zkat linked an issue Feb 18, 2025 that may be closed by this pull request

New KDL Schema #504

Open

remexre mentioned this pull request Feb 18, 2025

Clarify intended semantics of isize, usize, and regex types #505

Open

zkat force-pushed the zkat/schema-v2 branch from 7244331 to f90aa2e Compare February 23, 2025 20:22

new schema format/spec

2d60b75

zkat force-pushed the zkat/schema-v2 branch from f90aa2e to 2d60b75 Compare March 30, 2025 21:29

		@@ -0,0 +1,172 @@
		@kdl:schema "https://github.com/kdl-org/kdl/blob/main/schema/kdl-schema.kdl"

Uh oh!

updated schema #486

Are you sure you want to change the base?

updated schema #486

Uh oh!

Conversation

zkat commented Jan 17, 2025

Uh oh!

LHolten commented Jan 19, 2025

Uh oh!

zkat commented Jan 19, 2025

Uh oh!

TheLostLambda left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

TheLostLambda commented Jan 20, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TheLostLambda commented Jan 24, 2025

Uh oh!

TheLostLambda left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

TheLostLambda commented Jan 24, 2025

Uh oh!

TheLostLambda commented Jan 24, 2025

Uh oh!

zkat commented Jan 24, 2025

Uh oh!