add codesearch tool #6458
@sirdarckcat FYI: this is something we will be able to use reliably in prod, and maintain, extend, and fix as necessary.
We have many tools and packages. A documentation page describing which tools could be used for agentic games may help others sharpen their focus.
For now it's just this one. None of the existing tools were developed for agents.
Here is how it can be wired into the aflow infra. It builds and caches the index as part of the workflow, and then uses the index to answer agent queries:
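A minimal Go sketch of the build-once, query-many pattern described above, for illustration only. Every name here (`Index`, `IndexCache`, `Answer`, the `commit/config` cache key) is hypothetical and is not the aflow or pkg/codesearch API; the builder callback stands in for running the clang indexing tool.

```go
package main

import (
	"fmt"
	"sync"
)

// Index is a hypothetical stand-in for the code index built for one kernel build.
type Index struct {
	Key string
	// Entities maps an entity name (function, struct, ...) to its source text.
	Entities map[string]string
}

// IndexCache builds each index at most once and reuses it for later queries.
type IndexCache struct {
	mu     sync.Mutex
	build  func(key string) (*Index, error)
	cached map[string]*Index
	builds int // how many times build actually ran
}

func NewIndexCache(build func(key string) (*Index, error)) *IndexCache {
	return &IndexCache{build: build, cached: make(map[string]*Index)}
}

// Get returns the index for key, building it on first use. The lock is held
// during the build, which serializes builds; that is acceptable for a sketch.
func (c *IndexCache) Get(key string) (*Index, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if idx, ok := c.cached[key]; ok {
		return idx, nil
	}
	idx, err := c.build(key)
	if err != nil {
		return nil, err
	}
	c.builds++
	c.cached[key] = idx
	return idx, nil
}

// Answer serves a single agent query from an already-built index.
func Answer(idx *Index, entity string) (string, error) {
	src, ok := idx.Entities[entity]
	if !ok {
		return "", fmt.Errorf("entity %q not found in index %q", entity, idx.Key)
	}
	return src, nil
}

func main() {
	cache := NewIndexCache(func(key string) (*Index, error) {
		// A real builder would run the clang indexing tool over the kernel build.
		return &Index{Key: key, Entities: map[string]string{
			"foo_init": "int foo_init(void) { return 0; }",
		}}, nil
	})
	for _, q := range []string{"foo_init", "bar_exit"} {
		idx, err := cache.Get("commit-abc123/defconfig")
		if err != nil {
			fmt.Println("build failed:", err)
			return
		}
		src, err := Answer(idx, q)
		fmt.Printf("%s -> %q, err=%v\n", q, src, err)
	}
	fmt.Println("index builds:", cache.builds)
}
```

Keying the cache by kernel commit plus build config (an assumption here) means the expensive indexing step reruns only when the tree or build changes, while every agent query is a cheap lookup.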
sirdarckcat left a comment:
Super cool! This should help resolve the use case for code navigation.
For more complex queries (such as AST and control-flow queries), do you want to build on top of this as needed? Note that this may turn into a lot of work if it turns out to be necessary.
Yes.
We already know how to do control-flow and data-flow analysis for interface inference:
Cool, OK! I think these are the hardest questions we will need to answer (and if we don't support them, then we need at least a query into git grep):
What would be the heuristic to match locks with the fields they protect? If we also give wrong answers (both false positives and false negatives), it may confuse the LLM.
We would only give this tool to the LLM when it is already debugging a potential lock issue, for example. That said, that feels like a problem for later: as long as we have enough information to answer these questions, whether we do it in a function or let the LLM do it manually is an implementation detail we can decide based on experimentation.
If you use the latest LLVM repo, you can just use infer_alloc::inferPossibleType: https://github.com/llvm/llvm-project/blob/29e7b4f9a72576a2901407834b988ec37f931d28/clang/include/clang/AST/InferAlloc.h#L25
Edit: as-is, it finds sizeof expressions, but it doesn't automatically figure out the casts; instead, it expects the caller to keep track of the innermost cast and pass it in.
More generally, beware of the issues arising from how kmalloc is designed (macro -> macro -> always_inline -> out-of-line slab function). This is also apparent from the patch I built for the kernel heap partitioning prototype: https://lore.kernel.org/all/20250825154505.1558444-1-elver@google.com/
I suspect that if you build a special-purpose, kernel-only tool, that can be solved, and the right call expression can be fed into infer_alloc::inferPossibleType.
Let tools verify that all source file names, line numbers, etc. are valid and present. If there are any bogus entries, it's better to detect them early than to crash or error out much later when the info is used.
Add a clang tool that is used for code indexing (tools/clang/codesearch/).
It follows the conventions and build procedure of the declextract tool.

Add pkg/codesearch package that aggregates the info exposed by the clang tools
and allows doing simple queries:
- show source code of an entity (function, struct, etc.)
- show entity comment
- show all entities defined in a source file

Add tools/syz-codesearch wrapper tool that allows creating an index for a kernel build
and then running code queries on it.
add skeleton for code searching tool