Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated schema #486

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

updated schema #486

wants to merge 1 commit into from

Conversation

zkat
Copy link
Member

@zkat zkat commented Jan 17, 2025

This is very much still WIP while I work out some kinks, but if anyone wants to take a look out of curiosity and ask questions, you're more than welcome to.

@zkat zkat force-pushed the zkat/schema-v2 branch 4 times, most recently from 529ba54 to d4e1d5c Compare January 19, 2025 07:44
@LHolten
Copy link

LHolten commented Jan 19, 2025

How do you specify that a property is "nullable" (for example a property that is allowed to be a date-time or null)?

@zkat
Copy link
Member Author

zkat commented Jan 19, 2025

@LHolten type string #null

@zkat zkat force-pushed the zkat/schema-v2 branch 3 times, most recently from b5e8433 to b6c96a0 Compare January 20, 2025 10:27
Copy link

@TheLostLambda TheLostLambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zkat !

Sorry this is a super erratic and incomplete review of things, but I figured I should share what I had so far before coming back to it! Lots of the comments are likely just me not being super familiar with schemas or missing something, but hopefully a few are interesting?

My plan for tomorrow is to write some schemas for some of my KDL files, bending this syntax if needed to get something that feels clean and minimal, then I'll hopefully have an idea of where things can be tweaked!

The only other specific thing I can think is related to #492 somewhat — I personally think arbitrary links around documents might be a little wacky, and all of my files where I've been "inheriting" properties have looked like:

residues {
    types {
        Monosaccharide {
            functional-group "Hydroxyl" at="Reducing End"
            functional-group "Hydroxyl" at="Nonreducing End"
            functional-group "Hydroxyl" at="6-Position"
            functional-group "Acetyl" at="Secondary Amide"
        }

        AminoAcid {
            functional-group "Amino" at="N-Terminal"
            functional-group "Carboxyl" at="C-Terminal"
        }
    }

    Monosaccharide "g" "N-Acetylglucosamine" {
        composition "C8H15NO6"
    }
    Monosaccharide "m" "N-Acetylmuramic Acid" {
        composition "C11H19NO8"
        functional-group "Carboxyl" at="Lactyl Ether"
    }
    Monosaccharide "x" "Unknown Monosaccharide" {
        composition null
    }

    AminoAcid "A" "Alanine" {
        composition "C3H7NO2"
    }
    AminoAcid "B" "Diaminobutyric Acid" {
        composition "C4H10N2O2"
        functional-group "Amino" at="Sidechain"
    }
    AminoAcid "C" "Cysteine" {
        composition "C3H7NO2S"
    }
    AminoAcid "D" "Aspartic Acid" {
        composition "C4H7NO4"
        functional-group "Carboxyl" at="Sidechain"
    }
    AminoAcid "E" "Glutamic Acid" {
        composition "C5H9NO4"
        functional-group "Carboxyl" at="Sidechain"
    }
    AminoAcid "F" "Phenylalanine" {
        composition "C9H11NO2"
    }
    AminoAcid "G" "Glycine" {
        composition "C2H5NO2"
    }
    AminoAcid "H" "Histidine" {
        composition "C6H9N3O2"
    }
    AminoAcid "I" "Isoleucine" {
        composition "C6H13NO2"
    }
    AminoAcid "J" "Diaminopimelic Acid" {
        composition "C7H14N2O4"
        functional-group "Amino" at="Sidechain"
        functional-group "Carboxyl" at="Sidechain"
    }
// SNIP ...

Where I have some node "hard-coded" to store some templates that are used in the rest of the document, so the "variables" are all in one place and you know where to scroll to to find that information. Now, I can see how that might be tricky when you're writing something like a schema for the schema, and you need a lot more linking? If that's the case, then maybe giving relative or absolute node "paths" in something like KQL could be good? I'll have to think a bit more on it, but I think tagging one node, then linking to that tag from anywhere could be a bit confusing and tricky to maintain / keep the ID's always in sync.

Sorry, I've rambled far far too much here, but as a concluding thought, a bit like the unicode quotes "go big or go home" discussion, I'd lean on the side of keeping things more general, minimal, and future-proof personally — being a bit cautious of feature-creep. Things like rel=self|documentation or author ORCIDs being in the specification feel like things that be over-specifying? Though that's just my gut reaction!

I'll have much more practical and hopefully helpful stuff to share after I try writing a couple of schemas myself! Sorry in the meantime and hopefully there is a nice regex for finding the useful needle in this haystack (like that insane semver one) 😅

@@ -0,0 +1,172 @@
@kdl:schema "https://github.com/kdl-org/kdl/blob/main/schema/kdl-schema.kdl"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where will this @kdl:schema be used? If it doesn't have to be a reachable URL, then do you still think that it's valuable to have as a top-level key like this?

Do you envision KDL documents adding this to their top-level? Does that mean users are expected to handle this node when parsing with kdl-rs? Or should KDL parsers remove this node from the AST before passing it on to user code? If so, then I think that would require a change to the KDL spec?

I'm not opposed to this, though I'd personally probably keep them separate and give any sort of validator two files (the data and the schema it follows) — unless I'm misunderstanding though, I think this might be better as a field in metadata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to JSONSchema's $schema, which points to a valid (relative) URL.

This is not required, e.g. I can edit my angular.json and get LSP support because the Angular VSCode plugin interprets any file called angular.json as an Angular configuration document.
But, for schemas that don't come with editor plugins or that don't have a filename convention, it is very handy that you can add that $schema property and the JSON plugin will kick in and provide LSP support.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can definitely see that as nice to have for editor / tooling support! Would implementations like kdl-rs strip this out of the AST returned to users? Essentially just because I agree it would be something nice to add to get LSP support, but I wouldn't want that addition to affect any result of parsing!

Is this the sort of thing that could / should be a standard-formatted comment? Like how sometimes people seem to set vi / vim settings in comments with vi: or whatever at the top of a file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users would have to ignore this field, if they want it to be present. Just like folks are free to ignore $schema when using JSONSchema.

I thought about using a comment for this, but that seems... somehow worse? I don't know. I feel like it should just be data.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair... I'll sleep on it — I'd personally find it a little wacky to have to describe in my schema for a KDL format the node that will be used to refer back to that schema. So, if I wanted to do what @bgotink mentioned, and link my current KDL file to a schema for editor support, I'd need to make sure that that schema includes the @kdl:schema node in it as well? If it's even a URL linked to a file I can edit?

But, before I whinge any more, I'll read up properly on $schema in JSON and do some proper thinking!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this so you no longer need to remember to include this in your own schemas: it is now a default item for all schemas, and can be disabled by doing node @kdl:schema { undefine }.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I wanted to do what @bgotink mentioned, and link my current KDL file to a schema for editor support, I'd need to make sure that that schema includes the @KDL:schema node in it as well?

There are multiple versions of JSONSchema. The $schema in the schema is used to define the version of JSONSchema in which it is written, so it serves an extra purpose on top of enabling your editor to provide support while writing schemas.

For example, data file "my-data.kdl" has @kdl:schema "./my-schema.kdl", schema file my-schema.kdl would then have @kdl:schema "https://kdl.dev/schema/v0.2.kdl"; (fictitious URL)

I thought about using a comment for this, but that seems... somehow worse? I don't know. I feel like it should just be data.

Agreed, both because it feels bad and because it's actually useful data. Just like the version of JSONSchema, this could be used to differentiate between different versions of configuration file formats. If this were a comment, you'd have to have a comment to a versioned URL ánd a version indicator in the data.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've done some pondering and perhaps come to the conclusion that the first draft of this was best, but with some caveats. If I'm understanding correctly, it sounds like there's two, slightly distinct, reasons to want something like this:

  1. It's a way to link files to a schema that can be used to enrich editor / LSP-type support
  2. It can also be used as a sort of idiomatic / canonical version field, if a format's schema has changed over time

But one of my problems remains if the @kdl:schema must be an uncommented node: say I'm working on a certain type of KDL config file — it's the type of thing that's complex enough that I'd like my LSP to know the schema and validate my edits. The issue, however, is that this config file is read by some code that looks like this:

#[derive(Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
struct KdlConfig {
    bonds: BondData,
    modifications: ModData,
    // Etc...
}

Where the important part is the #[serde(deny_unknown_fields)] — which is think is a nice thing to have if you want to enforce that config files are free of unrecognized sections, something that could help a lot in catching user typos. Now I'm in a bit of a bind because, once again, I have to switch between 1) having @kdl:schema uncommented, so LSP support works but the config can't actually be loaded, or 2) commenting out that node, losing LSP support, but actually being able to test the config with the program it's meant for.

Obviously, however, this is a non-problem if the program already uses @kdl:schema as a standard convention for specifying a version:

#[derive(Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
struct KdlConfig {
    #[serde(rename = "@kdl:schema")
    version: Url
    bonds: BondData,
    modifications: ModData,
    // Etc...
}

Then leaving @kdl:schema uncommented 100% of the time is perfectly fine!

To address both of these cases, we could:

  1. Make it clear that @kdl:schema is the idiomatic way to version KDL formats, and set a good example by having the kdl-schema.kdl define the optional @kdl:schema node
  2. Make sure it's clear to all LSP implementations that schema information can come in one of two forms: either an uncommented @kdl:schema node, I suppose anywhere in the file, or a /- @kdl:schema comment at the top of the file for when you want editor support, but the KDL format you're writing disallows it.
  3. Maybe revert the "it is now a default item for all schemas" change? I think I'd prefer to reduce the amount of magic, and if a format wants to use @kdl:schema to keep track of versions, then I think it should say so explicitly.

In summary then, I think rolling this back to how it started (schemas must explicitly allow @kdl:schema) and making it clear that language servers should recognize both @kdl:schema and /- @kdl:schema (maybe allow other commented forms?) would be my ideal solution.

That way even if someone's program is explicitly disallowing unknown fields (to save users from typos) and doesn't care to version files, I can still get editor support.

Let me know what you think about demystifying the @kdl:schema node and "downgrading" it to something that's idiomatic convention / recognized by tooling.

SIDENOTE: Serde support for KDL will need to make decently heavy use of #[serde(rename = "...")], and quick-xml, for example, already uses the "@..." prefix for attributes. I was thinking perhaps we could use "$..." for properties (since they felt kind shell-variable-ish), but we'll likely need at least one other special prefix reserved. If we used @ for that, like #[serde(rename="@arguments")] or whatever, then reading @kdl:schema would become a bit more awkward (though not impossible, I don't think?). Might be a bit bike-sheddy, but we should perhaps think about either removing the @, or rule it out as a potential serde prefix!

Sorry for the flip-flop on this, but I think this is a solution I'd be 100% happy with <3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking through it, the main difference here with JSONSchema is that it allows other properties by default. That is, "$schema": "foo" won't fail validation.

It would be a weaker validation, but we could always just default to allowing all children/props/args, and replace allow-others with disallow-others. Basically, make things more permissive by default, but let people constrain stuff. We can also add an allow-ksl to specifically allow KSL-related keywords.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have more time to reply to other comments soon, but for this one, I do like KSL being much more stringent (disallowing others by default), but I would be okay with something like allow-ksl if you think it would be useful — though maybe that only makes sense if there are several KSL keywords eventually?

Unfortunately, I don't think that would solve the issue of an application not using KSL, just using serde and me wanting to add that LSP-type support (assuming I can't change the serde derive myself). Because in that case there is no schema validation being done (via KSL, anyways). For what it's worth, knus / knuffle by default disallow unknown nodes, so I couldn't get schema-enhanced editor support in any of my current projects unless I could have /-@ksl:schema commented.

But I'm happy with what you originally had + letting the LSP pick up the node in comment form as well? If the user wants to use @ksl:schema as a real node / version, then I do think they should be required to specify that in their schema — like you originally had it :)

That's my personal preference, anyways — keeping things very explicit and minimizing special cases :)

schema/cargo.kdl Show resolved Hide resolved
schema/cargo.kdl Show resolved Hide resolved
schema/cargo.kdl Show resolved Hide resolved
schema/cargo.kdl Show resolved Hide resolved
schema/cargo.kdl Show resolved Hide resolved
schema/cargo.kdl Show resolved Hide resolved
schema/kdl-schema.kdl Outdated Show resolved Hide resolved
@TheLostLambda
Copy link

I can't seem to resolve review comments I've opened, so feel free to resolve and hide any of those threads that feel resolved!

@zkat zkat force-pushed the zkat/schema-v2 branch 2 times, most recently from d3dea5d to b03e651 Compare January 21, 2025 08:23
}
prop path title="KPath @ksl:reference to another node." {
type string
format kpath
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this means we have to finalize the KPath 2.0 spec before we can finalize this spec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. And possibly make a small modification to the v2 spec itself to add kpath as a "reserved" string format.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean as a type annotation? Like these? https://github.com/kdl-org/kdl/blob/main/draft-marchan-kdl2.md#reserved-type-annotations-for-strings

Something like (kpath)"<whatever>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the formats correspond to type annotations, but they do not specify the annotation that needs to be present. That is, format kpath will validate "foo" as if it were (kpath)"foo", but the annotation is not required. To do that, use the annotation node.

@zkat zkat force-pushed the zkat/schema-v2 branch 4 times, most recently from ecc0c99 to c98454b Compare January 24, 2025 03:28
@TheLostLambda
Copy link

Looks like lots of exciting updates — I'll aim to take a look this evening!

Copy link

@TheLostLambda TheLostLambda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still didn't quite have time to pour over every aspect in detail, but as before I hope some of the comments are helpful!

Feel free to resolve threads liberally to keep things manageable :)

schema/ksl-schema.kdl Show resolved Hide resolved
schema/ksl-schema.kdl Show resolved Hide resolved
schema/ksl-schema.kdl Outdated Show resolved Hide resolved
schema/cargo.kdl Show resolved Hide resolved
schema/ksl-schema.kdl Outdated Show resolved Hide resolved
Comment on lines +582 to +550
// TODO: add a feature that will let us specify that `args`
// MUST be after any existing `arg` nodes in the current
// scope. i.e. you can't do `node x { args; arg }`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a child node like after or before with a reference?

So in args it could be:

node args {
    after super.arg
}

For example!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current plan is to port XSD's xsd:group (including xsd:sequence), and seeing how far it gets me with this kind of thing

}
}
// TODO: establish equality expectations.
node const about="""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if we rename enum, then can get rid of this?

Maybe if enum were has-value (hmm, perhaps that name too can be improved...), it could be:

has-value 2
has-value self documentation
has-value allow-others=#true u8 f64 date // Etc...

Overall, I think having, where possible, one right way to right things would be a good property and keep things simple!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably unify these, yeah. I don't like has-value, though. Maybe literally just value.

schema/ksl-schema.kdl Show resolved Hide resolved
schema/ksl-schema.kdl Outdated Show resolved Hide resolved
}
}
}
node about-mixin id=about-mixin {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've maybe shifted my opinion a bit vs the past — in line with, ideally, "one right way", maybe we should be a lot more hesitant to allow things as both properties and children.

@TheLostLambda
Copy link

Also something you may have already thought about, but a massive part of the KDL validation I've been doing is maybe something I saw a comment referring to "cross-validation" about?

In a shortened version of the example I already shared in this thread:

residues {
    types {
        Monosaccharide {
            // SNIP ...
        }

        AminoAcid {
            // SNIP ...
        }
    }

    Monosaccharide "g" "N-Acetylglucosamine" {
        composition "C8H15NO6"
    }

// SNIP ...

    AminoAcid "A" "Alanine" {
        composition "C3H7NO2"
    }

// SNIP ...

All of the children in the residues section have to have node-names that match one of the types in residue.types!

That would be a super helpful validation to have for schema where there is some sort of definitions section!

@TheLostLambda
Copy link

Finally (for today), is the plan to have a working validator implementation before stabilizing the schema? It would be very very helpful to actually test writing some real schemas, and I'm sure would be invaluable for making sure the ksl-schema.kdl file is valid before it's stabilized!

@zkat
Copy link
Member Author

zkat commented Jan 24, 2025

Also something you may have already thought about, but a massive part of the KDL validation I've been doing is maybe something I saw a comment referring to "cross-validation" about?

In a shortened version of the example I already shared in this thread:


residues {

    types {

        Monosaccharide {

            // SNIP ...

        }



        AminoAcid {

            // SNIP ...

        }

    }



    Monosaccharide "g" "N-Acetylglucosamine" {

        composition "C8H15NO6"

    }



// SNIP ...



    AminoAcid "A" "Alanine" {

        composition "C3H7NO2"

    }



// SNIP ...

All of the children in the residues section have to have node-names that match one of the types in residue.types!

That would be a super helpful validation to have for schema where there is some sort of definitions section!

Cross validation is in the todo list

@zkat
Copy link
Member Author

zkat commented Jan 24, 2025

Finally (for today), is the plan to have a working validator implementation before stabilizing the schema? It would be very very helpful to actually test writing some real schemas, and I'm sure would be invaluable for making sure the ksl-schema.kdl file is valid before it's stabilized!

Yes I'm planning on starting an implementation soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants