GSoC 2026 Proposal – Auto-Labeling Data Factory & Edge Training Integration #35100
MapleEagles started this conversation in Google Summer of Code
About Me
Name: Jing Jing
University: Michigan State University (PhD student, Construction Engineering / Data-driven Systems)
Timezone: EST (UTC-5)
Short Bio:
I am a PhD student focusing on AI-driven data pipelines and decision systems. My research interests lie in transforming unstructured data (e.g., images, logs) into structured representations for downstream reasoning and control.
Programming Experience:
Relevant Technical Experience:
About the Project
Project Choice
Auto-Labeling "Data Factory" & Edge Training Integration
Why I Chose This Project
This project aligns strongly with my research interest in data-centric AI systems. I am particularly interested in building pipelines that transform raw data into structured datasets that can support scalable model training and deployment.
Compared to traditional annotation workflows, this project introduces a more scalable approach using zero-shot models as teachers, which I find both technically interesting and practically impactful.
Proposed Solution (Abstract)
I propose to design a teacher–student data pipeline that leverages zero-shot models to generate pseudo-labels and uses quality-aware filtering to produce reliable training datasets.
The system will include:
Additionally, I propose an iterative refinement loop, where student model performance is used to improve pseudo-label quality over time.
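As a rough illustration of the quality-aware filtering and refinement idea, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `PseudoLabel` record, the `filter_pseudo_labels` and `refine_threshold` helpers, and the `evaluate_student` callback are hypothetical names, not part of OpenVINO or OTX. It also assumes the teacher emits one confidence score per prediction, and it uses a simple threshold search as one possible form of the refinement loop.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PseudoLabel:
    """One teacher prediction (hypothetical record layout)."""
    image_id: str
    label: str
    score: float  # teacher confidence, assumed to lie in [0, 1]


def filter_pseudo_labels(
    candidates: Iterable[PseudoLabel], min_score: float
) -> List[PseudoLabel]:
    """Keep only predictions whose confidence reaches the threshold."""
    return [c for c in candidates if c.score >= min_score]


def refine_threshold(
    candidates: Sequence[PseudoLabel],
    evaluate_student: Callable[[List[PseudoLabel]], float],
    thresholds: Sequence[float],
) -> Tuple[Optional[float], float]:
    """Pick the threshold whose filtered dataset gives the best student metric.

    `evaluate_student` is a placeholder for "train a student on these labels
    and return a validation metric"; higher is assumed to be better.
    """
    best_threshold: Optional[float] = None
    best_metric = float("-inf")
    for threshold in thresholds:
        kept = filter_pseudo_labels(candidates, threshold)
        if not kept:
            continue  # an empty dataset cannot train a student
        metric = evaluate_student(kept)
        if metric > best_metric:
            best_threshold, best_metric = threshold, metric
    return best_threshold, best_metric
```

In a real pipeline the callback would train and validate a student model, so each threshold trial is expensive; the sketch only shows how student feedback could steer the filtering step.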
Time Commitment
I plan to dedicate up to 30 hours/week during the GSoC period.
Timeline (High-Level)
General Questions
How do I know OpenVINO?
I found OpenVINO while looking through the GSoC project lists, and this project aligns with my thesis direction. It can also address a problem in my field: in civil engineering, data is limited, and I would like to contribute to solving that.
I became familiar with OpenVINO through its role in optimizing AI models for edge deployment. I have explored its use in accelerating inference and supporting lightweight deployment scenarios.
What do I know about OpenVINO?
OpenVINO provides tools for optimizing and deploying deep learning models efficiently on edge hardware. It supports model conversion, inference optimization, and integration with training pipelines such as OTX.
Contributions to OpenVINO
I am currently exploring the repository and plan to contribute via the prerequisite task.
Professional Development
This project aligns closely with my research direction in AI-driven data systems. It will help me deepen my understanding of:
Other Summer Plans
My primary focus for the summer is GSoC. I do not have conflicting commitments.
Why Should You Pick Me?
I bring a strong combination of:
I am particularly interested in not just building the pipeline, but improving data quality and understanding how it impacts model performance.
Prerequisites
I am currently working on the prerequisite task and will update this thread with my pull request.