Skip to content

Conversation

@simonmar
Copy link
Collaborator

@simonmar simonmar commented May 6, 2025

Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

  • this captures a lot more xrefs (e.g. local variables)
  • it has more information (distinguishes functions/classes/constructors etc.)
  • it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
  • it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

  • we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

  • extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 6, 2025
@netlify
Copy link

netlify bot commented May 6, 2025

Deploy Preview for fb-oss-glean canceled.

Name Link
🔨 Latest commit 9c72cae
🔍 Latest deploy log https://app.netlify.com/projects/fb-oss-glean/deploys/6832132ec8befb0008563d57

@pepeiborra
Copy link
Contributor

Something's broken in glass

@facebook-github-bot
Copy link
Contributor

@jjuliamolin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jjuliamolin
Copy link
Contributor

Thanks Simon this is great :) I'll have a look at this one, will need to fix some internal dependencies

@simonmar
Copy link
Collaborator Author

simonmar commented May 12, 2025

@jjuliamolin you probably want to re-import, I just pushed a new version of this branch with:

  • rebased
  • added Glass Symbol ID handling
  • fixed Glass search by name + case insensitive search
  • added a Glass regression test
  • removed the old indexer
  • some bug fixes

Note: stacked on #514

@pepeiborra
Copy link
Contributor

It's going to be quite tricky to merge this as it requires changes to internal TARGETS files. Can you fix the CI errors @simonmar and let us know when to reimport?

@facebook-github-bot
Copy link
Contributor

@jjuliamolin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jjuliamolin
Copy link
Contributor

I'll import a temporary one to check the internal build targets, will re-import again if there's CI signals to fix here

@simonmar
Copy link
Collaborator Author

simonmar commented May 14, 2025

Alright, it looks like the failures are:

  • GHC 9.2.8: ordering issues in the snapshot output. This should be fixable
  • GHC 9.4.7: failure in glass-snapshot-flow, no idea what might be causing this yet. (EDIT: nevermind, I just pushed a fix, hopefully)
  • GHC 8.10.7 and earlier: build failures in the indexer due to differences in the GHC API. I might just disable the indexer for these.

@simonmar simonmar force-pushed the new-hs-indexer branch 4 times, most recently from 3d106fe to 8f49a6b Compare May 14, 2025 12:35
@simonmar
Copy link
Collaborator Author

@jjuliamolin @pepeiborra OK, CI is green. I'm sure this is going to be non-trivial to get working with the internal build system so no worries if it takes a while! I'm not in a rush to get it landed. I'll stop modifying this PR so that you have something stable to work with and create separate PRs for future changes, although note that I'll have to stack new PRs on top of this one.

@facebook-github-bot
Copy link
Contributor

@jjuliamolin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jjuliamolin
Copy link
Contributor

jjuliamolin commented May 23, 2025

Sorry that this is taking longer than anticipated.. We're close to landing it internally, but I found some backward incompatibilities from the generated schema thrift files:

Module_key in hs.thrift
1: hs.PackageId -> hs.ModuleName
2: hs.ModuleName -> hs.UnitName

SourceModule_key in hs.thrift
1: hs.ModuleName -> src.File
2: src.File -> hs.Module

SearchByName_key search_hs.thrift
2: code_hs.Entity -> hs.Name

Can you take a look @simonmar, maybe we need evolve there, in the angle schemas

Also I noticed there's a bug somewhere that creates build errors for us when Entity = Pred , instead of {p = P }, from code.hs if you want to land faster it might be worth just changing this..

@simonmar
Copy link
Collaborator Author

@jjuliamolin

I found some backward incompatibilities from the generated schema thrift files:

you can ignore errors from the Thrift linter about backward-incompatibilities in the Thrift files generated from the schema, they aren't relevant to Glean because Glean has its own backwards-compatibility handling for schemas. As long as the schema passes the validation check, we're all good.

Also I noticed there's a bug somewhere that creates build errors for us when Entity = Pred , instead of {p = P }, from code.hs if you want to land faster it might be worth just changing this..

could you give more details? What fails and what is the error?

@jjuliamolin
Copy link
Contributor

jjuliamolin commented May 23, 2025

Ah alright great, wasn't sure what the consequences could be..

types_cpp_splits=1 is misconfigured: it can not be greater than the number of objects, which is 0.

it's coming from the types_cpp_splits in targets file generated by
glean/schema/gen/Glean/Schema/Gen/Thrift.hs

haven't had a closer look, but guess we'll just need to change how we count predicates there

@simonmar
Copy link
Collaborator Author

types_cpp_splits=1 is misconfigured: it can not be greater than the number of objects, which is 0.

it's coming from the types_cpp_splits in targets file generated by glean/schema/gen/Glean/Schema/Gen/Thrift.hs

haven't had a closer look, but guess we'll just need to change how we count predicates there

OK, so something weird with compiling the generated Thrift file to C++, which wouldn't show up in the OSS build because we don't do that.

If this is due to type Entity = hs.Name then I want to change that anyway, we should make it into a union type to allow for future extensions. I'll update the PR shortly.

simonmar added 6 commits May 24, 2025 17:13
Avoids some cases where the Glean inferred type can diverge from the
type expected at the call site.
Redesigned the schema and rewrote the indexer. Compared with the
previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an
entity and corresponds fairly closely to GHC's Name, including
OccName. The main difference is we don't store Uniques, instead we
distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one
for the codemarkup layer.

Not done yet:

* Removing all the old stuff

* more Glass integration, e.g. Symbol ID

* we can extract types from the .hie file too, and provide type hovers
  in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls
  should be possible but it's not straightforward using .hie. I'm still
  thinking about how best to do that.
Seems to be some issue with record field assignments, I couldn't repro
it with a standalone test but it showed up when I indexed all of
Glean. The hie file has a reference but no definition for the name.
@simonmar simonmar changed the title New Haskell Indexer [new-hs-indexer #4] New Haskell Indexer May 24, 2025
@simonmar
Copy link
Collaborator Author

@jjuliamolin I fixed that issue, and also rebased the whole stack (and added a couple more commits at the bottom). All the PRs are numbered - e.g. this one is [new-hs-indexer #4] - so you can see which order to import/land them.

@facebook-github-bot
Copy link
Contributor

@jjuliamolin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit to facebook/fb303 that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/CacheLib that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/fboss that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/mvfst that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/openr that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebookexperimental/moxygen that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebookexperimental/edencommon that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/fbthrift that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/watchman that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/folly that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/proxygen that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/wangle that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebookincubator/katran that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebookincubator/fizz that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebookincubator/hsthrift that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/sapling that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebookexperimental/rust-shed that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
facebook-github-bot pushed a commit to facebook/hhvm that referenced this pull request May 30, 2025
Summary:
Redesigned the schema and rewrote the indexer. Compared with the previous indexer:

* this captures a lot more xrefs (e.g. local variables)
* it has more information (distinguishes functions/classes/constructors etc.)
* it is much simpler and probably faster, because it doesn't go via hiedb, it reads .hie files directly.
* it is probably more correct, I fixed a lot of things.

The schema is carefully designed so that a Name uniquely identifies an entity and corresponds fairly closely to GHC's Name, including OccName. The main difference is we don't store Uniques, instead we distinguish local Names by including their ByteSpan.

There are a couple of snapshot tests, one for the plain indexer and one for the codemarkup layer, and a Glass regression test.

Not done yet:

* we can extract types from the .hie file too, and provide type hovers in Glass. That wouldn't be too hard.

* extracting more structure so that we can reconstruct data/class decls should be possible but it's not straightforward using .hie. I'm still thinking about how best to do that.

X-link: facebookincubator/Glean#511

Reviewed By: rubmary

Differential Revision: D74400980

Pulled By: jjuliamolin

fbshipit-source-id: 6cb183b96ef1c7030c8b7278434f84a4d72ceb28
@jjuliamolin
Copy link
Contributor

Sorry broke the open source regression tests, had to split them into fb/oss.. maybe you can re-generate them @simonmar, or anyone else. I just don't have the access to the actual open source checkout yet so can't generate with the same result

@simonmar
Copy link
Collaborator Author

@jjuliamolin thanks for landing this! I fixed up the tests in this PR: #530 and also rebased the rest of the stack. I'm curious how the output differs when you run the tests internally, maybe that's something we can fix?

@jjuliamolin
Copy link
Contributor

@simonmar we'll pass the hie files to the binary (not using the --with-ghc option), so one difference that I didn't really managed to get around was the filepath. Also "unit" is different, seems to be based on the build targets, but haven't really looked into this one

@jjuliamolin
Copy link
Contributor

from hsDeclarationLocation.out

...
"file": { "key": "fbcode/glean/lang/haskell/tests/code/A.hs" },
"name": {
"key": {
"mod": {
"key": {
"name": { "key": "A" },
"unit": { "key": "fbcode-glean-lang-haskell-tests-code-library" }
..
library is a build target in tests/code

@simonmar
Copy link
Collaborator Author

simonmar commented Jun 2, 2025

@jjuliamolin yeah I see. OK having separate output for the internal & OSS tests is probably the right thing to do then. Great that you managed to get the tests working internally.

facebook-github-bot pushed a commit that referenced this pull request Jun 13, 2025
Summary:
It's useful to be able to know whether a ref is in an import or export,
e.g. for dead code detection, so let's tag refs with an enum to say what
kind of ref it is.

I changed field names so that the schema change shouldn't trigger a
validation failure (I hope).

Note: stacked on #511

Pull Request resolved: #518

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Rollback Plan:

Reviewed By: donsbot

Differential Revision: D75066884

Pulled By: bochko

fbshipit-source-id: b58f19b3bc9d4261b38640e30833916b654adc14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants