Improve current semanticdb processing #6585

lefou · 2026-01-14T10:35:48Z

lefou
Jan 14, 2026
Maintainer

This is related to the state of JavaModule/ScalaModule in Mill 1.0 / 1.1.

I'd like to split the problem into separate issues and propose a clear solution for each.

Issue 1: Bad compilation performance

We currently compile too much, since the semanticDbData task duplicates compilation work already done in the compile task.
By always compiling with the semanticDB generator enabled, we could optimize the compilation for the BSP use case and would also ensure synchronized results, but we would potentially leak unwanted semanticDB data downstream.

Issue 2: Decide when we need semanticDB data

explicit: the user enabled semanticDB in the module via scalacOptions - the compile result will contain the semanticDB data files and we should not apply any extra processing
implicit: e.g. user uses Metals as the IDE - the compile task should not contain any semanticDB data files, as these are considered unwanted results (e.g. they should not appear downstream on classpaths or in jars)
no: we don't need semanticDB data at all - we should not generate it
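
The three cases above could be modeled roughly as follows. This is only a sketch: SemanticDbNeed, isExplicit and classify are hypothetical names, not Mill API, and the flag check is an assumption (Scala 3 enabling SemanticDB with -Xsemanticdb or the older -Ysemanticdb, Scala 2 through semanticdb-scalac plugin options starting with -P:semanticdb:).

```scala
// Hypothetical model of the three cases; none of these names exist in Mill.
sealed trait SemanticDbNeed

object SemanticDbNeed {
  case object Explicit extends SemanticDbNeed  // requested by the module's scalacOptions
  case object Implicit extends SemanticDbNeed  // wanted by tooling only (e.g. Metals)
  case object NotNeeded extends SemanticDbNeed // nobody needs it

  // Assumption: these options indicate that the user asked for SemanticDB output.
  def isExplicit(scalacOptions: Seq[String]): Boolean =
    scalacOptions.exists { o =>
      o == "-Xsemanticdb" || o == "-Ysemanticdb" || o.startsWith("-P:semanticdb:")
    }

  // Explicit configuration wins over tooling demand; otherwise nothing is needed.
  def classify(scalacOptions: Seq[String], toolingWantsIt: Boolean): SemanticDbNeed =
    if (isExplicit(scalacOptions)) Explicit
    else if (toolingWantsIt) Implicit
    else NotNeeded
}
```

Explicit configuration takes precedence here, so a module that opts in via scalacOptions is never treated as the "implicit" case.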

Proposal for Issue 1:

Disclaimer: the proposed task names are not final but chosen to make the concept clear

Create a new persistent compileWithMaybeSemanticDb task which does the actual compilation and includes semanticDB data if we need it for some reason.
Let the compile task use the result of compileWithMaybeSemanticDb but filter out semanticDB data, iff it was not explicitly requested by the module configuration, e.g. via scalacOptions.
Let the semanticDbData task use the result of compileWithMaybeSemanticDb and filter out any non-semanticDB data.
We keep the current concept of well-separated tasks with well-defined results. All downstream users, especially the BSP client and the mill-scalafix plugin, stay as they are but perform better.
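
The filtering idea can be illustrated with a small sketch in plain Scala. It is a simplification, not the Mill implementation: a compile result is modeled as a list of relative file paths, and SemanticDbSplit with its methods are hypothetical names. It relies on SemanticDB files being written as *.semanticdb files under META-INF/semanticdb/.

```scala
// Hypothetical, simplified model: a compile result is just a list of relative paths.
object SemanticDbSplit {
  // SemanticDB payloads are *.semanticdb files below META-INF/semanticdb/.
  def isSemanticDb(relPath: String): Boolean =
    relPath.startsWith("META-INF/semanticdb/") && relPath.endsWith(".semanticdb")

  // View for `compile`: drop SemanticDB files unless the module explicitly requested them.
  def forCompile(all: Seq[String], explicitlyRequested: Boolean): Seq[String] =
    if (explicitlyRequested) all else all.filterNot(isSemanticDb)

  // View for `semanticDbData`: keep only the SemanticDB files.
  def forSemanticDbData(all: Seq[String]): Seq[String] =
    all.filter(isSemanticDb)
}
```

Both views are derived from the same single compilation output, which is what removes the duplicate compilation while keeping the two task results separate.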

Ideas for Issue 2:

Maybe too simple, but we could always generate semanticDB data in compileWithMaybeSemanticDb and simply not use it downstream if nobody is interested in it. This has an overhead of up to 20 percent when nobody ends up needing it. (It may also conflict with other compiler plugins and fail a compilation that would otherwise succeed, but that's unlikely.)
Make a smart decision about whether semanticDB data is needed. Either use of the semanticDbData task in the project or any BSP use should permanently enable it. This must be a bullet-proof, well-documented design, and users need a way to disable it (opt-out or opt-in).

To detect BSP, we should just use the fact that a Mill-generated .bsp/mill-bsp.json file is present, since this won't require any extra bookkeeping. Users who don't want to use BSP can also safely remove that file. We could also write an extra file next to this location, so we can check its age, for example.
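
A minimal sketch of such a check, using only java.nio from the JDK. BspDetection and bspConfigPresent are hypothetical names, and the check only tests for the file's presence; it does not verify that Mill generated it.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper, not Mill API.
object BspDetection {
  // True if <workspace>/.bsp/mill-bsp.json exists as a regular file.
  def bspConfigPresent(workspace: Path): Boolean =
    Files.isRegularFile(workspace.resolve(".bsp").resolve("mill-bsp.json"))
}
```

Because the check is stateless, deleting the file is enough to turn the detection off again.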

Once the semanticDbData task is used/planned, we may record that fact in a dedicated module-specific persistent task semanticDbDataGenerationWanted. The problem is that the semanticDbData task is downstream of the compileWithMaybeSemanticDb task, so we currently can't know whether the initial value is correct (in practice, we would always have to guess "enabled"). It would be really cool to have some way to detect early what the user is going to run, e.g. by exposing the current execution plan via the TaskCtx. That way, an early persistent task semanticDbDataGenerationWanted could decide based on its persistent state and on whether the semanticDbData task is requested. Then we could conservatively default to disabled, unless there is more evidence.
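
The decision logic of such a task might look like the sketch below. It assumes that the planned task names are available as plain strings, which Mill does not currently expose via TaskCtx according to the text above; GenerationWanted and decide are hypothetical names.

```scala
// Hypothetical decision logic, not Mill API.
object GenerationWanted {
  // previouslyWanted: the persisted state from earlier runs (None on the first run).
  // plannedTasks: names of the tasks in the current execution plan (assumed to be available).
  def decide(previouslyWanted: Option[Boolean], plannedTasks: Set[String]): Boolean = {
    val requestedNow = plannedTasks.exists { t =>
      t == "semanticDbData" || t.endsWith(".semanticDbData")
    }
    // Default to disabled; once enabled, the persisted state keeps it enabled.
    previouslyWanted.getOrElse(false) || requestedNow
  }
}
```

The returned value would be persisted and fed back in as previouslyWanted on the next run, so a single request for semanticDbData enables generation for later compilations as well.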

Originally posted by @lefou in #5841 (comment)

HollandDM · 2026-02-09T04:49:05Z

HollandDM
Feb 9, 2026

Issue 2: Decide when we need semanticDB data

I wonder: when the mill-bsp server uses a separate out dir, does the compile result get shared with Mill or not?
If it is not shared, then we can enable semanticDB when the user has explicitly set the MILL_NO_SEPARATE_BUILD... env variable to 1

1 reply

lefou Feb 9, 2026
Maintainer Author

Using distinct out dirs for BSP vs. CLI means nothing gets shared, except external caches like Coursier.

SemanticDB is only implicitly needed for Metals, but BSP is also used for IntelliJ IDEA. I'm pretty sure IDEA users aren't interested in burning extra CPU for unused SemanticDB data.

So, does Mill need to generate SemanticDB when users use:

CLI only => no
IntelliJ IDEA with GenIdea => no
IntelliJ IDEA with BSP => no
Metals => yes (in BSP out)
mill-scalafix plugin use (mill __.fix) => yes on-demand (in normal out)
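
The list above can be read as a small decision table. The following sketch encodes it; Usage and semanticDbOut are hypothetical names, and the strings "BSP out" and "normal out" are just labels taken from the list, not real paths.

```scala
// Hypothetical encoding of the list above, not Mill API.
sealed trait Usage

object Usage {
  case object CliOnly extends Usage
  case object IdeaWithGenIdea extends Usage
  case object IdeaWithBsp extends Usage
  case object Metals extends Usage
  case object Scalafix extends Usage // e.g. `mill __.fix`, generated on demand

  // Which out dir SemanticDB would be generated in, or None if it is not needed.
  def semanticDbOut(usage: Usage): Option[String] = usage match {
    case CliOnly | IdeaWithGenIdea | IdeaWithBsp => None
    case Metals                                  => Some("BSP out")
    case Scalafix                                => Some("normal out")
  }
}
```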
