GSoC 2026 – Interest in Project #36 - Agentic GraphRAG #34671
Replies: 4 comments 2 replies
Hi Naitik, thanks for reaching out and for expressing interest in the two topics. My request: check whether you can have a discussion with @ishaanv1709 about project 36 for mutual collaboration and submission. You can additionally focus on project 37, keeping it aligned with the approach for project 36. We can all get on a common call once you confirm you are aligned on this approach and have spoken with @ishaanv1709. FYI: @ishaanv1709, @14pankaj
While going through the graph construction strategy, I realized the right approach depends heavily on a design choice I wanted to clarify:
Microsoft GraphRAG also suggests two strategies: one uses an LLM, the other uses traditional NLP methods. The problem with a standard LLM is that extraction can take too long, which is a poor fit for edge devices. The alternative is traditional NLP methods, or we could use small language models with parallel processing. @ishaanv1709, I would love your thoughts on this too.
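To make the trade-off concrete, here is a minimal sketch of the "traditional NLP" alternative: building graph edges from entity co-occurrence within sentences, with no LLM in the loop. The entity pattern and function name are illustrative; a real pipeline would use a proper NER model (spaCy or a small language model) rather than a capitalization heuristic.

```python
import re
from itertools import combinations
from collections import Counter

def extract_cooccurrence_edges(sentences, entity_pattern=r"\b[A-Z][a-zA-Z]+\b"):
    """Naive NLP-style graph construction: treat capitalized tokens as
    candidate entities and link entities that co-occur in a sentence.
    Edge weight = number of sentences in which the pair co-occurs."""
    edge_counts = Counter()
    for sent in sentences:
        # Sort so each pair is counted under a single canonical key.
        entities = sorted(set(re.findall(entity_pattern, sent)))
        for a, b in combinations(entities, 2):
            edge_counts[(a, b)] += 1
    return edge_counts

edges = extract_cooccurrence_edges([
    "Kyiv reported shelling near Kharkiv.",
    "Kharkiv and Kyiv remain contested.",
])
print(edges[("Kharkiv", "Kyiv")])  # 2
```

This kind of extraction runs in milliseconds on CPU, which is the latency argument for edge devices; the cost is much noisier entities and untyped relations compared to LLM extraction.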
@naitik-2006: The GraphRAG feature is intended to be offered as a base capability usable across multiple use cases and industry segments, over a corpus of complex documents (in different formats) and, in future, in combination with multimodal data too. We have our own plans on that front. To ensure this evolving requirement does not complicate the phrasing of the GSoC problem statement, we suggested implementing it over a standard dataset so there is no ambiguity. So my request is to focus on VIINA-like datasets, but build it such that we can make the necessary modifications for complex multimodal data later.
Hi @bharagha, I have already submitted my proposal for projects 36 and 37. While exploring the edge deployment side further, I came across two things that seem directly relevant:
I would be curious whether this direction aligns with what you have in mind.
Hi @14pankaj @bharagha
I hope this message finds you well. My name is Naitik Agrawal; I am a third-year B.Tech + M.Tech student in Mathematics and Computing at IIT BHU (CPI: 9.51), and I am writing to express my strong interest in contributing to the Agentic GraphRAG project (Project 36).
I have been following the discussion on this project, including the mentor feedback shared in the community thread, and wanted to share how I am incorporating those insights into my approach.
I bring directly relevant experience to this work. This past summer, I built an Agentic GraphRAG-powered coding assistant for the Summer of Bitcoin program, designing a scalable hybrid retrieval pipeline that combines semantic embeddings, graph re-ranking, and a CLI-based developer tool with conversation summarization and progressive context tracking.
Before that, during Inter-IIT Tech Meet 13.0, I built a two-stage retrieval pipeline with an interleaved reasoning framework for cross-domain QA, closely mirroring the multi-hop demands of the VIINA benchmark. I have also co-authored two papers accepted at CVPR 2025 (Main Conference + Best Paper at the AI Storytelling Workshop).
From studying the edge-ai-libraries repository and the mentor feedback, here is my refined technical approach:
Architecture: I plan to replace the linear chain in app/chain.py with a stateful LangGraph agent that routes queries between a VectorSearchTool and a GraphQueryTool (Text-to-Cypher over Neo4j, aligned with the edge deployment preference). A Reflection Agent sits in the generation loop to verify outputs against the retrieved context, with latency and reflection effectiveness as first-class evaluation signals.
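To show the routing idea without depending on the actual edge-ai-libraries code, here is a minimal plain-Python sketch; the cue list, tool classes, and function names are all hypothetical stand-ins for the LangGraph nodes described above.

```python
# Illustrative routing sketch: relational/multi-hop questions go to the
# graph tool, everything else to vector search. Cues are a placeholder
# for a real classifier (e.g. an LLM router node in LangGraph).
MULTI_HOP_CUES = ("related to", "connected", "between", "caused", "path from")

def route_query(query: str) -> str:
    """Return 'graph' for relational/multi-hop questions, else 'vector'."""
    q = query.lower()
    return "graph" if any(cue in q for cue in MULTI_HOP_CUES) else "vector"

class VectorSearchTool:  # hypothetical stub
    def run(self, query: str) -> str:
        return f"[vector hits for: {query}]"

class GraphQueryTool:  # hypothetical stub (would emit Cypher against Neo4j)
    def run(self, query: str) -> str:
        return f"[graph traversal for: {query}]"

def answer(query: str) -> str:
    tool = GraphQueryTool() if route_query(query) == "graph" else VectorSearchTool()
    return tool.run(query)
```

In the real agent this decision would be a conditional edge in the LangGraph state machine rather than a keyword check, but the control flow is the same shape.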
Launch Mode Flexibility: Taking the mentor's suggestion on board, we can implement a --mode flag (simple-rag | graph-rag) at the CLI/Docker entrypoint level, allowing the application to launch in either mode without code changes. This keeps the solution accessible for users who do not yet have a Neo4j instance available, and keeps the upgrade path clean. If no mode is provided, the agent will choose the best option for the given query.
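A minimal sketch of such an entrypoint using Python's argparse; the exact flag names and the "auto" default are my assumptions for illustration, not the repository's actual CLI.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI entrypoint sketch: --mode is optional; when omitted ('auto'),
    the agent picks a retrieval strategy per query."""
    parser = argparse.ArgumentParser(description="RAG launcher (sketch)")
    parser.add_argument(
        "--mode",
        choices=["simple-rag", "graph-rag", "auto"],
        default="auto",
        help="retrieval mode; 'auto' lets the agent decide per query",
    )
    return parser

args = build_parser().parse_args(["--mode", "graph-rag"])
print(args.mode)  # graph-rag
```

The same flag maps naturally to a Docker environment variable (e.g. passing it through the container's CMD), so both launch paths stay in sync.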
Evaluation Framework: Aligned with the mentor's guidance, my evaluation plan will cover all three dimensions: (a) GraphRAG-specific retrieval metrics (precision/recall on entity and relation extraction, multi-hop accuracy on VIINA), (b) generation quality metrics (faithfulness, answer relevance, context utilization), and (c) agentic metrics (reflection-agent effectiveness, tool-selection accuracy, reasoning-chain quality).
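For dimension (a), the core computation is set-based precision/recall over extracted entities or relation triples against a gold annotation. A small sketch, with illustrative data (the city names are placeholders, not actual VIINA annotations):

```python
def set_precision_recall(predicted, gold):
    """Micro precision/recall over extracted entities (or relation triples,
    if the elements are (head, relation, tail) tuples)."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)  # true positives: items in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

p, r = set_precision_recall(
    predicted={"Kyiv", "Kharkiv", "Odesa"},
    gold={"Kyiv", "Kharkiv", "Lviv"},
)
print(p, r)  # both 2/3
```

The same function works for relation triples unchanged, which keeps the entity-level and relation-level metrics comparable.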
This ensures the benchmark is meaningful both for research and for edge deployment scenarios.
I am also quite interested in Project 37. I see the feedback loop (thumbs up/down) as a natural extension, feeding into prompt tuning or lightweight LoRA fine-tuning, with knowledge-graph updates for persistent corrections. I would be happy to discuss whether scoping both projects into a unified proposal makes sense, or whether a primary + stretch-goal structure is preferred.
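A rough sketch of how that feedback loop could be structured; the class and field names are entirely my own invention, and the real design would persist records and validate corrections before touching the graph.

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackStore:
    """Sketch of a thumbs-up/down loop: negative feedback carrying a
    correction is queued as a candidate knowledge-graph update; positive
    records could later seed prompt tuning or LoRA fine-tuning data."""
    records: list = field(default_factory=list)

    def log(self, query, answer, thumbs_up, correction=None):
        self.records.append(
            {"query": query, "answer": answer, "up": thumbs_up, "correction": correction}
        )

    def pending_kg_updates(self):
        # Only downvotes that include an explicit correction are actionable.
        return [r for r in self.records if not r["up"] and r["correction"]]

store = FeedbackStore()
store.log("Who controls X?", "A controls X", thumbs_up=False, correction="B controls X")
store.log("Summarize Y", "Summary of Y", thumbs_up=True)
print(len(store.pending_kg_updates()))  # 1
```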
I would be grateful for any warm-up task or pre-contribution issue you could point me to; I am eager to engage with the codebase before the proposal deadline and demonstrate fit in practice.
Thank you sincerely for your time and consideration.
Best regards,
Naitik Agrawal
naitikagrawal838@gmail.com