Skip to content

Introduce source generator directives #3583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Mee-Tree
Copy link

This PR aims to add support for source generators in scala-cli, addressing #610 and providing an alternative approach to #3033.

Source generators can be configured using "named" directives. This allows for multiple generators to be configured independently and provides a clean way to specify inputs, outputs, and other parameters.

The approach can be further improved to include standard predefined generators (ex: scalapb, smithy4s), which would require only minimal configuration.

Features

  • Named source generator configurations using the new [name] syntax
  • Arbitrary command execution with cached results
  • Support for both relative and absolute paths
  • Input and output location defaults to the current directory if not specified

Example

//> using sourceGenerator.[hello].input     in
//> using sourceGenerator.[hello].output    out
//> using sourceGenerator.[hello].glob      *.txt
//> using sourceGenerator.[hello].command   python ${.}/hello.py
//> using sourceGenerator.[hello].unmanaged hello.py

A full example showcasing all possibilities can be found here.

Known Issues

  • Execution order for multiple generators is undefined
  • Incremental compilation is not working
  • No tests
  • Metals triggers the generator multiple times on save
  • Metals doesn't work with the scalapb generator in the full example (likely due to the generator's source code being in the same project)

Please let me know if you have feedback about this implementation or any of the known issues.

Copy link
Contributor

@Gedochao Gedochao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the contribution!
There's one problem with this implementation: the named directive keys syntax.
As per SIP-46, directives have to be clean, simple pairings of key strings and values. No DSL within either keys or values is allowed.
If we were to use syntax such as this, it could never leave experimental, so it's definitely a no-go.

Perhaps the individual source generators could be configured separately, in a dedicated directory or input.

You may refer to the previous attempts at this:

#3033 in particular is in unfinished status, it may be a good idea to try to finish it, instead of a completely fresh approach.

@Gedochao
Copy link
Contributor

As for the listed known issues:

Execution order for multiple generators is undefined

I think this can be addressed in a subsequent PR, for as long as the feature is experimental.

Incremental compilation is not working

...same as execution order, although this is perhaps more important.

No tests

tests will be necessary before we merge anything, even in experimental status.

Metals triggers the generator multiple times on save
Metals doesn't work with the scalapb generator in the full example (likely due to the generator's source code being in the same project)

Tagging @tgodzik and @kasiaMarek for potential help with this

@Gedochao
Copy link
Contributor

Other than that, please let me and/or @tgodzik know if you need help with this!

@Mee-Tree
Copy link
Author

Hey, thanks for the contribution! There's one problem with this implementation: the named directive keys syntax. As per SIP-46, directives have to be clean, simple pairings of key strings and values. No DSL within either keys or values is allowed. If we were to use syntax such as this, it could never leave experimental, so it's definitely a no-go.

Perhaps the individual source generators could be configured separately, in a dedicated directory or input.

You may refer to the previous attempts at this:

#3033 in particular is in unfinished status, it may be a good idea to try to finish it, instead of a completely fresh approach.

Thank you for the feedback!

I've already seen both of these PRs (the first one is referenced in the description). The approach in #3033 of having directives in the generator itself has a significant limitation - it restricts generators to be written in Scala.

Given the constraints of directives syntax this currently leaves us with two options:

  1. Follow the original solution proposed in Support code generators #610:

    //> using sourceGenerator key1|key2|key3|key4|...

    However, this approach has its drawbacks:

    • Readability decreases as the number of keys grows
    • Parameter order becomes critical and error-prone
    • Support for default values couldn't be properly implemented
  2. Only support a single generator configuration, while this may seem limiting, it could be practical because:

    • Most projects only need one generator
    • Multiple generators can be supported through a nested script.

What do you think about these? Would you prefer either of them?

I'll try to think of something else in the meantime.

@Mee-Tree
Copy link
Author

As for the listed known issues:

Execution order for multiple generators is undefined

I think this can be addressed in a subsequent PR, for as long as the feature is experimental.

Incremental compilation is not working

...same as execution order, although this is perhaps more important.

No tests

tests will be necessary before we merge anything, even in experimental status.

I wasn't really planning on leaving these issues unaddressed. I plan to work on all of them in this PR once we decide on the approach.

@Gedochao
Copy link
Contributor

The approach in #3033 of having directives in the generator itself has a significant limitation - it restricts generators to be written in Scala.

I wonder if we could have a .scala configuration file with configuration directives per generator, where one of the directives would specify how the generator is to be launched... so the generator wouldn't have to be written in Scala, it would just need a Scala definition to wrap it? That wouldn't be so different from what you proposed initially, but wouldn't require DSL in the directive keys. Scala CLI could then accept a list of directories with generator definitions. Ofc, the directories would have to be excluded from the main build... just theorising.

Only support a single generator configuration, while this may seem limiting, it could be practical

For an initial implementation, I believe that supporting just a single generator would actually be sufficient (although we'd have to still leave the door at least half-open for expanding it in the future).

@tgodzik
Copy link
Member

tgodzik commented Mar 25, 2025

Should we use special extensions for source generators? Something like script.sourcegen.scala and then //> using sourcegen script when we want to use it?

We could define everything we need there and wrap the command needed (just invoke whatever we need). It would need to be a separate scope.

@Mee-Tree
Copy link
Author

Should we use special extensions for source generators? Something like script.sourcegen.scala and then //> using sourcegen script when we want to use it?

We could define everything we need there and wrap the command needed (just invoke whatever we need). It would need to be a separate scope.

This sounds good! The only problem I see so far is that, as this would essentially be a Scala file, it would allow any Scala code, not only directives.

We can handle this in several ways:

  1. Only read the directives.

    Should it fail if it has something else? Or should it just ignore everything besides directives?
    The latter option might be a poor design decision, since if code isn't meant to be handled, it probably shouldn't be there in the first place (except for comments).

  2. Treat it as a regular Scala file.

    I can't think of what kind of code would be appropriate to put there or how should we handle it.
    For example, what should happen if it has a main method?

What are your thoughts on these?

@Mee-Tree
Copy link
Author

I also think we should add an out-of-box support for popular generators like ScalaPB.

For example, with this line in project.scala:

//> using sourcegen scalapb

This directive would:

  • Look for a configuration file named scalapb.sourcegen.scala
  • If the file is not found, use a predefined configuration
  • If the file exists, use it to override specific fields of configuration

@Gedochao
Copy link
Contributor

@Mee-Tree

Should we use special extensions for source generators? Something like script.sourcegen.scala and then //> using sourcegen script when we want to use it?
We could define everything we need there and wrap the command needed (just invoke whatever we need). It would need to be a separate scope.

This sounds good! The only problem I see so far is that, as this would essentially be a Scala file, it would allow any Scala code, not only directives.

We can handle this in several ways:

  1. Only read the directives.
    Should it fail if it has something else? Or should it just ignore everything besides directives?
    The latter option might be a poor design decision, since if code isn't meant to be handled, it probably shouldn't be there in the first place (except for comments).
  2. Treat it as a regular Scala file.
    I can't think of what kind of code would be appropriate to put there or how should we handle it.
    For example, what should happen if it has a main method?

What are your thoughts on these?

Hm... 2 options I see here.

  • script.sourcegen.scala (or script.sourcegen.sc?) is a script which itself runs the source generator, in which case any Scala code in there is used to configure and run the generator (and it should be otherwise excluded from the built Scala CLI project, and rather processed separately)
  • script.sourcegen.scala is the place to put directives configuring the generator, but is otherwise a standard .scala file in the Scala CLI project (similar to how we treat project.scala - it's where all directives should (optionally) go, but it can contain any code the user wants there for whatever reason)

I think @tgodzik meant the former (and it being in a separate scope means that it has to be built and run separately from main/test scopes).

As for how to achieve a separate scope, I wouldn't create a separate scope as per Scala CLI internals (you could, but it will be complex to implement, and I don't think it necessary). I would rather exclude the script from the Scala CLI app (as per the exclude directive or --exclude option) and run it as a separate build.

I also think we should add an out-of-box support for popular generators like ScalaPB.

Agreed.

@tgodzik
Copy link
Member

tgodzik commented Mar 31, 2025

I think @tgodzik meant the former (and it being in a separate scope means that it has to be built and run separately from main/test scopes).

Yep! That's what I meant.

@Mee-Tree
Copy link
Author

Hm... 2 options I see here.

  • script.sourcegen.scala (or script.sourcegen.sc?) is a script which itself runs the source generator, in which case any Scala code in there is used to configure and run the generator (and it should be otherwise excluded from the built Scala CLI project, and rather processed separately)
  • script.sourcegen.scala is the place to put directives configuring the generator, but is otherwise a standard .scala file in the Scala CLI project (similar to how we treat project.scala - it's where all directives should (optionally) go, but it can contain any code the user wants there for whatever reason)

I think @tgodzik meant the former (and it being in a separate scope means that it has to be built and run separately from main/test scopes).

If I understood you correctly, that means going back to only supporting generators written in Scala, which I believe should not be the go-to solution as discussed in the previous comments.

The latter option is what I've been referring to in #3583 (comment) and seems to me like a good idea to try.

@tgodzik
Copy link
Member

tgodzik commented Mar 31, 2025

If I understood you correctly, that means going back to only supporting generators written in Scala,

I think if we provide a utility trait or interface SourceGenerator, it would work for both really.

trait SourceGenerator {

  final def main(args: Array[String]) = {
      if (command.nonEmpty). System.process(command :: inputs)
      else invoke() 
   }

  def command: Seq[String] = Nil

  def invoke(): Unit = {

   }
  
  def inputs: List[String] 
  ...
}

Something along this lines, what do you think?

@Mee-Tree
Copy link
Author

Mee-Tree commented Apr 4, 2025

Something along this lines, what do you think?

I'm a bit confused, I thought the idea was to rely on bloop to execute the command and track its inputs/outputs.

@tgodzik
Copy link
Member

tgodzik commented Apr 5, 2025

That is the idea, sure. I was thinking of compiling the script first and then providing it to Bloop as a jar to run. This would basically be just a main class once you implement it.

Not 100% sure how doable it is. If it's much harder to do, we might need to reconsider.

@Mee-Tree
Copy link
Author

That is the idea, sure. I was thinking of compiling the script first and then providing it to Bloop as a jar to run. This would basically be just a main class once you implement it.

But this way we would lose the ability to track input files, since they would be hardcoded inside the generator itself and couldn't be passed to Bloop.

@tgodzik
Copy link
Member

tgodzik commented Apr 10, 2025

That is the idea, sure. I was thinking of compiling the script first and then providing it to Bloop as a jar to run. This would basically be just a main class once you implement it.

But this way we would lose the ability to track input files, since they would be hardcoded inside the generator itself and couldn't be passed to Bloop.

Might make sense to have some source gen directives then in that case.

trait SourceGenerator {
  // this would be invoked by Scala CLI so we do what we want here
  final def main(args: Array[String]) = {
      invoke(args.take(args.length - 1).map(Paths.get), Paths.get(args.last)) 
   }
  // should it be outputs?
  def invoke(input:Path, output: Path): Unit =
}

The initially proposed would look like:

// In protobuf.sourcegen.scala
//> using sourceGenerator.input     file.proto
//> using sourceGenerator.output  Generated.scala
//> using sourceGenerator.glob      *.txt

object ProtobugGenerator extends SourceGenerator {
  def invoke(input:Path, output: Path): Unit ={
     System.process("proto" :: input :: output)
   }
}

The main problem we want to avoid is having any kind of DSL in using directives. Having them as separate files would allow us for flexibility and we could even do.

// In protobuf.sourcegen.scala
//> using sourceGenerator.input     file.proto
//> using sourceGenerator.output  Generated.scala
//> using sourceGenerator.glob      *.txt
//> using dep org.example:my-generator:2.3.4

One generator should defined by file and then //> using source.generator ../protobuf.sourcegen.scala

Opinions? @Gedochao ?

@Mee-Tree
Copy link
Author

Are there any pros/cons of defining the command as a class instead of a directive?

// In protobuf.sourcegen.scala
//> using sourceGenerator.input    file.proto
//> using sourceGenerator.output   Generated.scala
//> using sourceGenerator.glob     *.txt
//> using sourceGenerator.command  proto ${input} ${output}

@tgodzik
Copy link
Member

tgodzik commented Apr 10, 2025

//> using sourceGenerator.command proto ${input} ${output}

This becomes a DSL, which we want to avoid (and I think we can't actually do formally). There is no other command currently that does it.

It's also not possible to write new source generators easily.

But if we can detect the main class from the source generator and use that in the soruce generators section of Bloop.

@Mee-Tree
Copy link
Author

This becomes a DSL, which we want to avoid (and I think we can't actually do formally). There is no other command currently that does it.

Isn't ${.} also a DSL? This doesn't seem that different.

We can also use the current format (without scalacenter/bloop#2646) that just appends output and inputs to the command.

It's also not possible to write new source generators easily.

Can you please elaborate on this? Seems like it's the other way around as we don't have to write a Scala wrapper around the command.

@tgodzik
Copy link
Member

tgodzik commented Apr 10, 2025

Isn't ${.} also a DSL? This doesn't seem that different.

We couldn't actually make some features work without this, I wouldn't want to extend that to even more DSL if we can make it work without

Can you please elaborate on this? Seems like it's the other way around as we don't have to write a Scala wrapper around the command.

For existing ones it makes it a bit tougher, but we can release a bunch even ourselves together with Scala CLI. But any new ones will be quite easy to do even using just the scalameta parser etc.

@Mee-Tree
Copy link
Author

For existing ones it makes it a bit tougher, but we can release a bunch even ourselves together with Scala CLI. But any new ones will be quite easy to do even using just the scalameta parser etc.

I don't really understand what would be the difference between the existing and new ones.

The way I see it is that we either have the command by itself or the the same command but wrapped inside a Scala class.
In the case when generator itself is written in Scala we can just do //> using sourgen.command scala-cli run gen.scala.

Please correct me if I'm wrong.

@dos65
Copy link
Contributor

dos65 commented Apr 10, 2025

I see that design discussion is definitely hard.

I'm wondering if we can reduce this task only to build-in generators. There are not a lot of generators that people use in sbt.
I think the most popular are scalapb and sbt-buildinfo. I feel like having such generators might cover 90% of needs in them.

Can we do only scalapb support for a start? Using existing bloop support. With smth like that:

// enables scalapb generator 
//> using sourceGenerator.scalapb
// all other settings are optional with default values
//> using sourceGenerator.scalapb.version 0.11.17
//> using sourceGenerator.scalapb.input proto-dir
...

This would allow not solve this design issue.

@Gedochao @tgodzik wdyt?

@Gedochao
Copy link
Contributor

Can we do only scalapb support for a start? Using existing bloop support. With smth like that:

By all means, doing a PoC with support for just one built-in source generator is quite alright, just to get us started.

I believe the discussion came out of the desire to find a cure-all, or at least a solution that'd be easily extendable in the future. We don't want to introduce a syntax that we'd get rid of afterwards, even under experimental

// enables scalapb generator 
//> using sourceGenerator.scalapb
// all other settings are optional with default values
//> using sourceGenerator.scalapb.version 0.11.17
//> using sourceGenerator.scalapb.input proto-dir

If we were to treat hardcoded source generators separately, I think this kind of syntax is alright.
This doesn't close the door for custom solutions, like:

// the directives specific to the custom (non-hardcoded) generator would go into the wrapper
//> using sourceGenerator.custom path/to/scala/wrapper

@tgodzik second opinion?

@Gedochao
Copy link
Contributor

Isn't ${.} also a DSL? This doesn't seem that different.

That's... grey zone, I suppose. We are not supposed to introduce any further DSL into standard feature directives, and the one you mention perhaps shouldn't even be there.
If we were to introduce any DSL, it would effectively lock the feature as experimental indefinitely, as per past SIP requirements.

@tgodzik
Copy link
Member

tgodzik commented Apr 11, 2025

Right, let's do that. Hardcoded source generators will allow us to do some custom ones later anyway.

For custom ones we could actually just have them released separately and add some utilities for that. Having the full source for them inside the current scala cli project, would mean we would need to add another scope and make sure tooling works properly.

TLDR;

Let's just do built-in ones for now.

@lbialy
Copy link

lbialy commented Apr 11, 2025

sorry to crash into the discussion so late but I have an impression that the topic of multiple generators for the same source isn't really handled? (for example .graphql -> Typescript types AND scala sources or openapi.yaml -> client AND server sources). Or am I mistaken and it's just a matter of creating a separate file with sourcegen directives in proposed approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants