Introduce source generator directives #3583

Mee-Tree · 2025-03-21T10:40:43Z

This PR aims to add support for source generators in scala-cli, addressing #610 and providing an alternative approach to #3033.

Source generators can be configured using "named" directives. This allows for multiple generators to be configured independently and provides a clean way to specify inputs, outputs, and other parameters.

The approach could later be extended with standard predefined generators (e.g. scalapb, smithy4s), which would require only minimal configuration.

Features

Named source generator configurations using the new [name] syntax
Arbitrary command execution with cached results
Support for both relative and absolute paths
Input and output location defaults to the current directory if not specified

Example

//> using sourceGenerator.[hello].input     in
//> using sourceGenerator.[hello].output    out
//> using sourceGenerator.[hello].glob      *.txt
//> using sourceGenerator.[hello].command   python ${.}/hello.py
//> using sourceGenerator.[hello].unmanaged hello.py
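To make the "named" syntax concrete, here is a minimal sketch of how such lines could be grouped per generator name. It is only an illustration: the `NamedDirectives` object and its regex are hypothetical and are not how Scala CLI actually parses directives.

```scala
// Hypothetical sketch: groups `sourceGenerator.[name].key value` lines by generator name.
// Neither the object nor the regex comes from Scala CLI; both are assumptions for illustration.
object NamedDirectives {
  private val Directive =
    """//>\s*using\s+sourceGenerator\.\[([^\]]+)\]\.(\w+)\s+(.*)""".r

  /** Returns generator name -> (key -> value) for every line that matches the pattern. */
  def parse(lines: Seq[String]): Map[String, Map[String, String]] =
    lines
      .collect { case Directive(name, key, value) => (name, key -> value.trim) }
      .groupMap(_._1)(_._2)
      .map { case (name, pairs) => name -> pairs.toMap }
}
```

Each generator name ends up with its own independent key/value map, which is what lets several generators coexist in one file.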

A full example showcasing all possibilities can be found here.

Known Issues

Execution order for multiple generators is undefined
Incremental compilation is not working
No tests
Metals triggers the generator multiple times on save
Metals doesn't work with the scalapb generator in the full example (likely due to the generator's source code being in the same project)

Please let me know if you have feedback about this implementation or any of the known issues.

Gedochao

Hey, thanks for the contribution!
There's one problem with this implementation: the named directive keys syntax.
As per SIP-46, directives have to be clean, simple pairings of key strings and values. No DSL within either keys or values is allowed.
If we were to use syntax such as this, it could never leave experimental, so it's definitely a no-go.

Perhaps the individual source generators could be configured separately, in a dedicated directory or input.

You may refer to the previous attempts at this:

#3033
#3035

#3033 in particular is unfinished, so it may be a good idea to try to finish it instead of taking a completely fresh approach.

Gedochao · 2025-03-24T10:47:56Z

As for the listed known issues:

> Execution order for multiple generators is undefined

I think this can be addressed in a subsequent PR, for as long as the feature is experimental.

> Incremental compilation is not working

...same as execution order, although this is perhaps more important.

> No tests

Tests will be necessary before we merge anything, even in experimental status.

> Metals triggers the generator multiple times on save
> Metals doesn't work with the scalapb generator in the full example (likely due to the generator's source code being in the same project)

Tagging @tgodzik and @kasiaMarek for potential help with this.

Gedochao · 2025-03-24T10:48:46Z

Other than that, please let me and/or @tgodzik know if you need help with this!

Mee-Tree · 2025-03-24T12:00:52Z

> Hey, thanks for the contribution! There's one problem with this implementation: the named directive keys syntax. As per SIP-46, directives have to be clean, simple pairings of key strings and values. No DSL within either keys or values is allowed. If we were to use syntax such as this, it could never leave experimental, so it's definitely a no-go.
>
> Perhaps the individual source generators could be configured separately, in a dedicated directory or input.
>
> You may refer to the previous attempts at this:
>
> Add supports for using Source Generator using Directives #3033
>
> [OLD PoC] Source generators #3035
>
> #3033 in particular is unfinished, so it may be a good idea to try to finish it instead of taking a completely fresh approach.

Thank you for the feedback!

I've already seen both of these PRs (the first one is referenced in the description). The approach in #3033 of having directives in the generator itself has a significant limitation - it restricts generators to be written in Scala.

Given the constraints of the directive syntax, this currently leaves us with two options:

1. Follow the original solution proposed in Support code generators #610:
```
//> using sourceGenerator key1|key2|key3|key4|...
```
   However, this approach has its drawbacks:
   - Readability decreases as the number of keys grows
   - Parameter order becomes critical and error-prone
   - Support for default values couldn't be properly implemented
2. Only support a single generator configuration. While this may seem limiting, it could be practical because:
   - Most projects only need one generator
   - Multiple generators can be supported through a nested script
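
To illustrate the drawbacks of the first option, here is a rough sketch of what parsing a positional, pipe-delimited value might look like. The field order (input, output, glob, command) and the names `GeneratorConfig` and `PipeDirective` are assumptions made up for this example; #610 does not prescribe them.

```scala
// Hypothetical sketch of the positional, pipe-delimited format.
// The field order (input, output, glob, command) is an assumption for illustration only.
final case class GeneratorConfig(input: String, output: String, glob: String, command: String)

object PipeDirective {
  /** Splits the directive value on '|' and maps the fields by position. */
  def parse(value: String): Either[String, GeneratorConfig] =
    value.split('|').map(_.trim).toList match {
      case input :: output :: glob :: command :: Nil =>
        Right(GeneratorConfig(input, output, glob, command))
      case fields =>
        Left(s"expected 4 fields, got ${fields.length}")
    }
}
```

Because meaning comes only from position, swapping two fields still parses successfully but silently configures something else, and a missing field cannot fall back to a default without shifting every field after it.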

What do you think about these? Would you prefer either of them?

I'll try to think of something else in the meantime.

Mee-Tree · 2025-03-24T12:10:40Z

> As for the listed known issues:
>
> > Execution order for multiple generators is undefined
>
> I think this can be addressed in a subsequent PR, for as long as the feature is experimental.
>
> > Incremental compilation is not working
>
> ...same as execution order, although this is perhaps more important.
>
> > No tests
>
> tests will be necessary before we merge anything, even in experimental status.

I wasn't really planning on leaving these issues unaddressed. I plan to work on all of them in this PR once we decide on the approach.

Gedochao · 2025-03-24T12:17:04Z

> The approach in #3033 of having directives in the generator itself has a significant limitation - it restricts generators to be written in Scala.

I wonder if we could have a .scala configuration file with configuration directives per generator, where one of the directives would specify how the generator is to be launched... so the generator wouldn't have to be written in Scala, it would just need a Scala definition to wrap it? That wouldn't be so different from what you proposed initially, but wouldn't require DSL in the directive keys. Scala CLI could then accept a list of directories with generator definitions. Of course, the directories would have to be excluded from the main build... just theorising.

> Only support a single generator configuration, while this may seem limiting, it could be practical

For an initial implementation, I believe that supporting just a single generator would actually be sufficient (although we'd have to still leave the door at least half-open for expanding it in the future).

tgodzik · 2025-03-25T18:29:43Z

Should we use special extensions for source generators? Something like script.sourcegen.scala and then //> using sourcegen script when we want to use it?

We could define everything we need there and wrap the command needed (just invoke whatever we need). It would need to be a separate scope.

Mee-Tree · 2025-03-29T00:56:10Z

> Should we use special extensions for source generators? Something like script.sourcegen.scala and then //> using sourcegen script when we want to use it?
>
> We could define everything we need there and wrap the command needed (just invoke whatever we need). It would need to be a separate scope.

This sounds good! The only problem I see so far is that, as this would essentially be a Scala file, it would allow any Scala code, not only directives.

We can handle this in several ways:

1. Only read the directives.
   Should it fail if it has something else? Or should it just ignore everything besides directives?
   The latter option might be a poor design decision, since if code isn't meant to be handled, it probably shouldn't be there in the first place (except for comments).
2. Treat it as a regular Scala file.
   I can't think of what kind of code would be appropriate to put there or how we should handle it.
   For example, what should happen if it has a main method?

What are your thoughts on these?

Mee-Tree · 2025-03-29T01:22:05Z

I also think we should add out-of-the-box support for popular generators like ScalaPB.

For example, with this line in project.scala:

//> using sourcegen scalapb

This directive would:

Look for a configuration file named scalapb.sourcegen.scala
If the file is not found, use a predefined configuration
If the file exists, use it to override specific fields of configuration
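
The lookup described above could be sketched roughly as follows. Everything here is hypothetical: the `SourcegenConfig` fields, the placeholder default values, and the `SourcegenResolver` object are assumptions for illustration, not an existing Scala CLI API.

```scala
// Hypothetical sketch: a predefined configuration that a user file may partially override.
final case class SourcegenConfig(input: String, output: String, version: String)

// Every field is optional, so the file only has to mention what it changes.
final case class SourcegenOverrides(
    input: Option[String] = None,
    output: Option[String] = None,
    version: Option[String] = None
)

object SourcegenResolver {
  // Placeholder defaults for illustration; not real Scala CLI or ScalaPB values.
  val scalapbDefaults: SourcegenConfig =
    SourcegenConfig(input = "proto", output = "generated", version = "0.0.0")

  /** `fromFile` is None when no scalapb.sourcegen.scala file was found. */
  def resolve(defaults: SourcegenConfig, fromFile: Option[SourcegenOverrides]): SourcegenConfig =
    fromFile.fold(defaults) { o =>
      defaults.copy(
        input = o.input.getOrElse(defaults.input),
        output = o.output.getOrElse(defaults.output),
        version = o.version.getOrElse(defaults.version)
      )
    }
}
```

With this shape, a missing file yields the predefined configuration unchanged, while a file that sets only `input` leaves `output` and `version` at their defaults.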

Gedochao · 2025-03-31T07:25:55Z

@Mee-Tree

> > Should we use special extensions for source generators? Something like script.sourcegen.scala and then //> using sourcegen script when we want to use it?
> > We could define everything we need there and wrap the command needed (just invoke whatever we need). It would need to be a separate scope.
>
> This sounds good! The only problem I see so far is that, as this would essentially be a Scala file, it would allow any Scala code, not only directives.
>
> We can handle this in several ways:
>
> 1. Only read the directives.
>    Should it fail if it has something else? Or should it just ignore everything besides directives?
>    The latter option might be a poor design decision, since if code isn't meant to be handled, it probably shouldn't be there in the first place (except for comments).
>
> 2. Treat it as a regular Scala file.
>    I can't think of what kind of code would be appropriate to put there or how we should handle it.
>    For example, what should happen if it has a main method?
>
> What are your thoughts on these?

Hm... 2 options I see here.

1. script.sourcegen.scala (or script.sourcegen.sc?) is a script which itself runs the source generator, in which case any Scala code in there is used to configure and run the generator (and it should be otherwise excluded from the built Scala CLI project, and rather processed separately)
2. script.sourcegen.scala is the place to put directives configuring the generator, but is otherwise a standard .scala file in the Scala CLI project (similar to how we treat project.scala - it's where all directives should (optionally) go, but it can contain any code the user wants there for whatever reason)

I think @tgodzik meant the former (and it being in a separate scope means that it has to be built and run separately from main/test scopes).

As for how to achieve a separate scope, I wouldn't create a separate scope as per Scala CLI internals (you could, but it would be complex to implement, and I don't think it's necessary). I would rather exclude the script from the Scala CLI app (as per the exclude directive or --exclude option) and run it as a separate build.

> I also think we should add an out-of-box support for popular generators like ScalaPB.

Agreed.

tgodzik · 2025-03-31T08:03:06Z

> I think @tgodzik meant the former (and it being in a separate scope means that it has to be built and run separately from main/test scopes).

Yep! That's what I meant.

Mee-Tree · 2025-03-31T08:30:22Z

> Hm... 2 options I see here.
>
> 1. script.sourcegen.scala (or script.sourcegen.sc?) is a script which itself runs the source generator, in which case any Scala code in there is used to configure and run the generator (and it should be otherwise excluded from the built Scala CLI project, and rather processed separately)
>
> 2. script.sourcegen.scala is the place to put directives configuring the generator, but is otherwise a standard .scala file in the Scala CLI project (similar to how we treat project.scala - it's where all directives should (optionally) go, but it can contain any code the user wants there for whatever reason)
>
> I think @tgodzik meant the former (and it being in a separate scope means that it has to be built and run separately from main/test scopes).

If I understood you correctly, that means going back to only supporting generators written in Scala, which I believe should not be the go-to solution as discussed in the previous comments.

The latter option is what I've been referring to in #3583 (comment) and seems to me like a good idea to try.

tgodzik · 2025-03-31T08:58:32Z

> If I understood you correctly, that means going back to only supporting generators written in Scala,

I think if we provide a utility trait or interface SourceGenerator, it would work for both really.

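import scala.sys.process._

trait SourceGenerator {

  // entry point: run the external command if one is defined, otherwise the Scala implementation
  final def main(args: Array[String]): Unit =
    if (command.nonEmpty) Process(command ++ inputs).!
    else invoke()

  // external command to run; empty means invoke() is used instead
  def command: Seq[String] = Nil

  // Scala implementation of the generator, a no-op by default
  def invoke(): Unit = ()

  // inputs appended to the command
  def inputs: List[String]
  ...
}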

Something along these lines, what do you think?

Mee-Tree · 2025-04-04T09:54:42Z

> Something along these lines, what do you think?

I'm a bit confused, I thought the idea was to rely on bloop to execute the command and track its inputs/outputs.

tgodzik · 2025-04-05T07:15:24Z

That is the idea, sure. I was thinking of compiling the script first and then providing it to Bloop as a jar to run. This would basically be just a main class once you implement it.

Not 100% sure how doable it is. If it's much harder to do, we might need to reconsider.

Mee-Tree · 2025-04-10T13:11:41Z

> That is the idea, sure. I was thinking of compiling the script first and then providing it to Bloop as a jar to run. This would basically be just a main class once you implement it.

But this way we would lose the ability to track input files, since they would be hardcoded inside the generator itself and couldn't be passed to Bloop.

tgodzik · 2025-04-10T13:58:08Z

> > That is the idea, sure. I was thinking of compiling the script first and then providing it to Bloop as a jar to run. This would basically be just a main class once you implement it.
>
> But this way we would lose the ability to track input files, since they would be hardcoded inside the generator itself and couldn't be passed to Bloop.

Might make sense to have some source gen directives then in that case.

import java.nio.file.{Path, Paths}

trait SourceGenerator {
  // this would be invoked by Scala CLI so we do what we want here:
  // every argument but the last is an input, the last one is the output
  final def main(args: Array[String]): Unit =
    invoke(args.init.toSeq.map(Paths.get(_)), Paths.get(args.last))

  // should it be outputs?
  def invoke(inputs: Seq[Path], output: Path): Unit
}

The initially proposed example would look like:

// In protobuf.sourcegen.scala
//> using sourceGenerator.input     file.proto
//> using sourceGenerator.output    Generated.scala
//> using sourceGenerator.glob      *.txt

import java.nio.file.Path
import scala.sys.process._

object ProtobufGenerator extends SourceGenerator {
  def invoke(inputs: Seq[Path], output: Path): Unit = {
    // run the external `proto` command on the inputs, followed by the output path
    Process("proto" +: (inputs.map(_.toString) :+ output.toString)).!
  }
}

The main problem we want to avoid is having any kind of DSL in using directives. Having them in separate files would give us flexibility, and we could even do:

// In protobuf.sourcegen.scala
//> using sourceGenerator.input     file.proto
//> using sourceGenerator.output  Generated.scala
//> using sourceGenerator.glob      *.txt
//> using dep org.example:my-generator:2.3.4

Each generator should be defined in its own file and then referenced with //> using source.generator ../protobuf.sourcegen.scala

Opinions? @Gedochao ?

Mee-Tree · 2025-04-10T14:44:56Z

Are there any pros/cons of defining the command as a class instead of a directive?

// In protobuf.sourcegen.scala
//> using sourceGenerator.input    file.proto
//> using sourceGenerator.output   Generated.scala
//> using sourceGenerator.glob     *.txt
//> using sourceGenerator.command  proto ${input} ${output}

tgodzik · 2025-04-10T15:31:00Z

//> using sourceGenerator.command proto ${input} ${output}

This becomes a DSL, which we want to avoid (and I think we can't actually do formally). There is no other command currently that does it.

It's also not possible to write new source generators easily.

But perhaps we can detect the main class from the source generator and use that in the source generators section of Bloop.

Mee-Tree · 2025-04-10T16:17:22Z

> This becomes a DSL, which we want to avoid (and I think we can't actually do formally). There is no other command currently that does it.

Isn't ${.} also a DSL? This doesn't seem that different.

We can also use the current format (without scalacenter/bloop#2646) that just appends output and inputs to the command.
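
As a rough sketch of the difference between the two command shapes being discussed: the "append" format adds arguments after a fixed command, while the template format substitutes placeholders. The argument order in `appended` (output first, then inputs) simply follows the wording above and is an assumption, as are the `CommandShapes` object and its method names; this is not Bloop's actual implementation.

```scala
// Hypothetical sketch comparing the two ways of building the generator command line.
object CommandShapes {
  // "Append" format: the command stays fixed and the paths are added at the end.
  // The order (output first, then inputs) is an assumption for illustration.
  def appended(command: Seq[String], output: String, inputs: Seq[String]): Seq[String] =
    command ++ (output +: inputs)

  // Template format: `${input}` and `${output}` tokens are replaced in place.
  def templated(template: Seq[String], output: String, inputs: Seq[String]): Seq[String] =
    template.flatMap {
      case "${input}"  => inputs
      case "${output}" => Seq(output)
      case other       => Seq(other)
    }
}
```

The append format needs no placeholder syntax in the directive value, at the cost of forcing every generator to accept its arguments in one fixed order.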

> It's also not possible to write new source generators easily.

Can you please elaborate on this? Seems like it's the other way around as we don't have to write a Scala wrapper around the command.

tgodzik · 2025-04-10T16:20:46Z

> Isn't ${.} also a DSL? This doesn't seem that different.

We couldn't actually make some features work without this. I wouldn't want to extend that to even more DSL if we can make it work without it.

> Can you please elaborate on this? Seems like it's the other way around as we don't have to write a Scala wrapper around the command.

For existing ones it makes it a bit tougher, but we can release a bunch even ourselves together with Scala CLI. But any new ones will be quite easy to do even using just the scalameta parser etc.

Mee-Tree · 2025-04-10T17:07:49Z

> For existing ones it makes it a bit tougher, but we can release a bunch even ourselves together with Scala CLI. But any new ones will be quite easy to do even using just the scalameta parser etc.

I don't really understand what would be the difference between the existing and new ones.

The way I see it, we either have the command by itself or the same command wrapped inside a Scala class.
In the case where the generator itself is written in Scala, we can just do //> using sourcegen.command scala-cli run gen.scala.

Please correct me if I'm wrong.

dos65 · 2025-04-10T20:40:17Z

I see that design discussion is definitely hard.

I'm wondering if we can reduce this task to built-in generators only. There are not a lot of generators that people use in sbt.
I think the most popular are scalapb and sbt-buildinfo. I feel like having these built in might cover 90% of the need for generators.

Can we do only scalapb support for a start, using the existing Bloop support? With something like this:

// enables scalapb generator 
//> using sourceGenerator.scalapb
// all other settings are optional with default values
//> using sourceGenerator.scalapb.version 0.11.17
//> using sourceGenerator.scalapb.input proto-dir
...

This would allow us to avoid having to solve this design issue for now.

@Gedochao @tgodzik wdyt?

Gedochao · 2025-04-11T06:46:18Z

> Can we do only scalapb support for a start? Using existing bloop support. With smth like that:

By all means, doing a PoC with support for just one built-in source generator is quite alright, just to get us started.

I believe the discussion came out of the desire to find a cure-all, or at least a solution that'd be easily extendable in the future. We don't want to introduce a syntax that we'd get rid of afterwards, even under experimental status.

// enables scalapb generator 
//> using sourceGenerator.scalapb
// all other settings are optional with default values
//> using sourceGenerator.scalapb.version 0.11.17
//> using sourceGenerator.scalapb.input proto-dir

If we were to treat hardcoded source generators separately, I think this kind of syntax is alright.
This doesn't close the door for custom solutions, like:

// the directives specific to the custom (non-hardcoded) generator would go into the wrapper
//> using sourceGenerator.custom path/to/scala/wrapper
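
A minimal sketch of how these two kinds of keys could coexist: keys under a built-in generator name collect that generator's settings, while `sourceGenerator.custom` just points at a wrapper. The `GeneratorRef` types and the `GeneratorDirectives` object are hypothetical names for this example, and the directives are assumed to arrive already split into plain key/value pairs.

```scala
// Hypothetical sketch: resolving built-in and custom generator directives side by side.
sealed trait GeneratorRef
final case class BuiltIn(name: String, settings: Map[String, String]) extends GeneratorRef
final case class Custom(wrapperPath: String) extends GeneratorRef

object GeneratorDirectives {
  private val Prefix = "sourceGenerator."

  /** Takes (key, value) pairs; the value is empty for a bare enabling key. */
  def resolve(directives: Seq[(String, String)]): Seq[GeneratorRef] = {
    val relevant = directives.collect {
      case (key, value) if key.startsWith(Prefix) => (key.stripPrefix(Prefix), value)
    }
    // `custom` carries a path to a wrapper instead of per-setting keys.
    val custom = relevant.collect { case ("custom", path) => Custom(path) }
    // Everything else is grouped by the generator name before the first dot.
    val builtIn = relevant
      .filterNot(_._1 == "custom")
      .groupBy(_._1.takeWhile(_ != '.'))
      .toSeq
      .sortBy(_._1)
      .map { case (name, entries) =>
        val settings = entries.collect {
          case (key, value) if key.indexOf('.') >= 0 => key.dropWhile(_ != '.').drop(1) -> value
        }.toMap
        BuiltIn(name, settings)
      }
    builtIn ++ custom
  }
}
```

Both forms stay plain key/value pairs, so neither requires any DSL inside the directive key or value.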

@tgodzik second opinion?

Gedochao · 2025-04-11T06:48:54Z

> Isn't ${.} also a DSL? This doesn't seem that different.

That's... a grey zone, I suppose. We are not supposed to introduce any further DSL into standard feature directives, and the one you mention perhaps shouldn't even be there.
If we were to introduce any DSL, it would effectively lock the feature as experimental indefinitely, as per past SIP requirements.

tgodzik · 2025-04-11T10:11:43Z

Right, let's do that. Hardcoded source generators will allow us to do some custom ones later anyway.

For custom ones we could actually just have them released separately and add some utilities for that. Having the full source for them inside the current Scala CLI project would mean we would need to add another scope and make sure tooling works properly.

TL;DR:

Let's just do built-in ones for now.

lbialy · 2025-04-11T11:02:05Z

Sorry to crash into the discussion so late, but I have the impression that the topic of multiple generators for the same source isn't really handled (for example .graphql -> TypeScript types AND Scala sources, or openapi.yaml -> client AND server sources). Or am I mistaken, and it's just a matter of creating a separate file with sourcegen directives in the proposed approach?

Introduce source generator directives

9d26fc5

Mee-Tree marked this pull request as draft March 21, 2025 10:40

Mee-Tree mentioned this pull request Mar 23, 2025

feat: Add command template to source generators scalacenter/bloop#2646

Draft

Gedochao reviewed Mar 24, 2025
