Project ideas for 2026
Short description: Build an agent application with a graphical user interface. Based on user instructions, it should be able to automatically operate your computer screen or the UI of a specific application and accomplish complex logical goals. In this pipeline, at least one model must be deployed locally using OpenVINO. During this project, you will get free access to the AIPC cloud. You can refer to the following projects to ramp up.
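The core control flow of such a GUI agent can be sketched as an observe-plan-act loop. The sketch below is purely illustrative: the stubbed plan_next_action stands in for a locally deployed OpenVINO LLM/VLM, and the toy observe/execute callbacks stand in for screen capture and input injection.

```python
# Minimal observe-plan-act agent loop (hypothetical sketch; a real agent
# would replace the stubs with an OpenVINO-served model and GUI automation).

def plan_next_action(instruction, screen_state, history):
    # Stub "model": the real pipeline would prompt a local LLM/VLM with the
    # instruction plus a screen observation and parse its chosen action.
    if "open" in instruction and "opened" not in screen_state:
        return {"action": "click", "target": "app_icon"}
    return {"action": "done"}

def run_agent(instruction, execute, observe, max_steps=10):
    history = []
    for _ in range(max_steps):
        state = observe()                      # observe the screen
        step = plan_next_action(instruction, state, history)  # plan
        history.append(step)
        if step["action"] == "done":
            break
        execute(step)                          # act on the UI
    return history

# Toy environment: the "screen" flips to opened after one click.
screen = {"state": "closed"}
def observe():
    return screen["state"]
def execute(step):
    if step["action"] == "click":
        screen["state"] = "opened"

trace = run_agent("open the browser", execute, observe)
```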
Expected outcomes: a desktop application that provides a native GUI Agent based on local models
Skills required/preferred: Python, OpenVINO, Prompt engineering, Agentic workflow
Mentors: Ethan Yang, Zhuo Wu
Size of project: 350 hours
Difficulty: Hard
Short description: Deep Search, as one of the core functions of a personal AI assistant, significantly enhances the user experience by providing information extraction capabilities for various file types (such as Word, PowerPoint, PDF, images, and videos) and supporting multi-dimensional information queries. The localized personal knowledge base not only improves the accuracy and relevance of answers but also protects data security and provides personalized search results based on the user's private data. This project aims to develop a desktop AI localized personal knowledge base search assistant for AI PCs. By building a multimodal personal database and using Retrieval Augmented Generation (RAG) technology, this project leverages this private multimodal data to enhance local large language models (LLMs). Users can interact with the OpenVINO instant messaging AI assistant, ask questions, and perform fuzzy searches using multimodal data.
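The retrieval step at the heart of such a RAG pipeline can be illustrated with a deliberately tiny sketch. Here a bag-of-words overlap score stands in for the embedding similarity a real implementation would compute with an OpenVINO-served embedding model and a vector store; the knowledge-base strings are made up.

```python
# Toy retrieval step for a local RAG pipeline (illustrative only).
import string

STOPWORDS = {"what", "was", "the", "a", "of", "for", "to", "from", "in"}

def tokenize(text):
    # Lowercase, strip punctuation, drop stopwords.
    words = text.lower().translate(str.maketrans("", "", string.punctuation)).split()
    return {w for w in words if w not in STOPWORDS}

def retrieve(query, chunks, top_k=2):
    # Score each chunk by token overlap with the query, keep the best top_k.
    q = tokenize(query)
    scored = [(len(q & tokenize(c)) / (len(q) or 1), c) for c in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]

knowledge_base = [
    "Quarterly report: revenue grew 12 percent in Q3.",
    "Vacation photos from the trip to Norway.",
    "Meeting notes: revenue targets for next quarter.",
]
hits = retrieve("What was the revenue growth?", knowledge_base)
```

The retrieved chunks would then be prepended to the local LLM's prompt so answers are grounded in the user's private data.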
Expected outcomes:
- A standalone desktop application capable of building a personal knowledge base from multimodal data (Word documents, images, videos) in specified directories, and supporting information retrieval and summarization via API/application.
- Localized deployment using OpenVINO, building a localized multimodal personal knowledge base using local multimodal LLMs and Retrieval Augmented Generation (RAG) technology.
- Deployment on Intel AIPC, with flexible switching between GPU/NPU hardware based on task load.
- The application will have a user interface allowing users to interact with the local LLM, perform fuzzy searches using multimodal information, and generate valuable output.
Skills required/preferred: Python or C++, OpenVINO, OpenCV, Ollama, llama.cpp, LLMs, RAG, OCR, UI
Mentors: Hongbo Zhao, Kunda Xu
Size of project: 350 hours
Difficulty: Hard
Short description: Tracking the objects in a video stream is an important use case. It combines an object detection model with a tracking algorithm that analyzes a whole sequence of images. The current state-of-the-art algorithm is ByteTrack.
The goal of the project is to implement the ByteTrack algorithm as a MediaPipe graph that delegates inference execution to the OpenVINO inference calculator. This graph could then be deployed for serving in the OpenVINO Model Server. A sample application adopting the KServe API would send a stream of images and receive information about the tracked objects in the stream.
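For orientation, ByteTrack's central idea is a two-stage association: tracks are first matched to high-confidence detections by IoU, and the remaining tracks then get a second chance against low-confidence detections. The toy sketch below uses greedy matching instead of the Hungarian algorithm and omits Kalman prediction, so it is only an illustration of the idea, not the full algorithm.

```python
# Two-stage IoU association in the spirit of ByteTrack (toy version).

def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, score_thr=0.6, iou_thr=0.3):
    high = [d for d in detections if d["score"] >= score_thr]
    low = [d for d in detections if d["score"] < score_thr]
    matches, unmatched = {}, set(tracks)
    for pool in (high, low):   # stage 1: high-score dets, stage 2: low-score
        for det in pool:
            best = max(unmatched, key=lambda t: iou(tracks[t], det["box"]),
                       default=None)
            if best is not None and iou(tracks[best], det["box"]) >= iou_thr:
                matches[best] = det
                unmatched.discard(best)
    return matches, unmatched

tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [{"box": (1, 1, 11, 11), "score": 0.9},
              {"box": (21, 21, 31, 31), "score": 0.4}]  # low-score, stage 2
matches, lost = associate(tracks, detections)
```

The low-score detection keeps track 2 alive, which is exactly the behavior that makes ByteTrack robust to occlusion.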
Expected outcomes: MediaPipe graphs with the calculator implementation for ByteTrack algorithm with YOLO models.
Skills required/preferred: C++ (for writing the calculator), Python (for writing the client), MediaPipe
Mentors: Adrian Tobiszewski, Dariusz Trawinski
Size of project: 175 hours
Difficulty: Medium
Short description: OpenVINO GenAI is a library of popular Generative AI pipelines, optimized execution methods, and samples built on top of the high-performance OpenVINO Runtime, focused on efficient deployment and easy integration. Currently, OpenVINO GenAI provides a text-to-video generation pipeline based on the LTX model - a diffusion-based video generator that creates videos from a text prompt via iterative denoising in latent space. This project extends the LTX pipeline with image-to-video (I2V) generation, enabling users to create short videos conditioned on an input image combined with a text prompt, running on Intel CPU and GPU. Adding image conditioning provides a strong visual anchor, improving control over composition and style. The project output includes C++ and Python API updates, runnable samples, validation tool updates (OpenVINO GenAI WWB and LLM Benchmarking), and basic tests to validate functionality.
Expected outcomes: Pull request implementing image-to-video support in the OpenVINO GenAI API, including: 1. Pipeline Architecture: extension of the Text2VideoPipeline class to support image-to-video execution paths with minimal memory overhead. 2. API Parity: full C++ and Python API support for image conditioning inputs. 3. Infrastructure: updates to OpenVINO GenAI benchmarking tools to measure I2V throughput and latency. 4. Reproducibility: a comprehensive test suite ensuring output consistency between Python and C++ implementations.
Skills required/preferred: C++, Python, good understanding of Stable diffusion architectures, experience with Hugging Face and Diffusers libraries, experience with PyTorch (OpenVINO is a plus), Git.
Mentors: Anna Likholat, Stanislav Gonorovskii
Size of project: 350 hours
Difficulty: Medium
Short description:
The goal of this project is to design and implement a set of optimizations in the OpenVINO runtime focused on improving inference performance of quantized neural network models on ARM-based devices. The work will target commonly used quantization schemes and model types, with an emphasis on reducing inference latency, increasing throughput, improving compilation time, and minimizing memory footprint. Special attention will be given to efficiently leveraging ARM-specific features such as NEON and ARM Compute Library integrations.
Detailed instructions are available here: https://github.com/alvoron/gsoc-2026-openvino
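As background on the arithmetic involved, here is a minimal sketch of symmetric int8 quantization in plain Python. It is illustrative only: OpenVINO models are quantized with dedicated tools such as NNCF, and this project's work is in the optimized kernels that consume such int8 data on ARM.

```python
# Symmetric per-tensor int8 quantization (toy sketch).

def quantize_int8(values):
    # One scale for the whole tensor, mapping the max magnitude to 127.
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

The speedups targeted by the project come from executing the int8 arithmetic with NEON instructions and ARM Compute Library kernels rather than scalar float math.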
Expected outcomes:
- Improved adoption of quantized models in OpenVINO on ARM platforms
- Reduced inference latency and increased throughput for quantized workloads
- Faster model compilation and initialization times
- Lower memory consumption for deploying quantized models on resource-constrained ARM devices
Skills required/preferred: C++; a Mac device with an ARM chip is a must-have
Mentors: Aleksandr Voron, Vladislav Golubev
Size of project: 350 hours
Difficulty: Medium
Short description: OpenVINO is a critical toolkit for optimizing and deploying AI models on Intel hardware, but developing high-quality OpenVINO-related code (e.g., model inference, quantization, deployment tuning) requires deep domain expertise. This project aims to train a specialized coder model for the OpenVINO ecosystem using Supervised Fine-Tuning (SFT), GRPO, and Retrieval Augmented Generation (RAG) technologies. The model will be optimized for OpenVINO-specific scenarios: generating executable OpenVINO code, debugging deployment issues, providing performance optimization suggestions, and designing solutions for common development challenges. By integrating RAG with a curated OpenVINO knowledge base, the model will retrieve accurate domain knowledge in real time, while SFT/GRPO will refine its ability to produce high-relevance, correct OpenVINO code. The final model will be deployed locally via OpenVINO Runtime to ensure low latency on Intel CPU/GPU/NPU.
Expected outcomes:
- A specialized OpenVINO-domain coder model trained with SFT/GRPO/RAG, capable of generating accurate, executable OpenVINO code (e.g., inference pipeline construction, quantized model optimization, OpenVINO Runtime integration), debugging code errors, and providing targeted performance tuning suggestions for OpenVINO deployments.
- A localized deployment pipeline for the coder model based on OpenVINO Runtime, optimized for Intel AIPC/PC hardware (CPU/GPU/NPU) to achieve low latency (<100ms per code generation request) and high throughput
Skills required/preferred: Python, OpenVINO, LLMs, SFT/GRPO/RAG, experience with PyTorch
Size of project: 350 hours
Difficulty: Hard
Short description: Clawdbot is a personal AI assistant designed to run on your own infrastructure while interacting through the chat surfaces you already use (e.g., WhatsApp/Telegram/Slack/Discord/Signal/iMessage/Teams/WebChat), with the Gateway acting as the long-lived local control plane. This project proposes an entirely local model stack for Clawdbot: replace hosted LLM dependencies with on-device inference (a local LLM server and local embeddings), add a local RAG knowledge base exposed via Clawdbot “skills”, and ship a hardened configuration for real-world messaging inputs with sandboxed tool execution. The result is a privacy-preserving, low-latency Clawdbot deployment in which sensitive prompts, tool calls, and retrievals remain local.
Expected outcomes:
- Clawdbot running with a local LLM provider (no hosted inference required), configured via models.providers with a local baseUrl, plus an optional failover strategy.
- Local RAG capability delivered as Clawdbot skills, with a clear precedence/override model and reproducible ingestion.
- Hardened operational profile for real messaging surfaces:
- secure DM posture;
- sandboxed risky contexts/tools;
- prompt-injection regression tests aligned with Clawdbot’s own warning that local/smaller models can increase risk.
- Deployment kit: reference configs, scripts, and a runbook matching Clawdbot’s gateway architecture and operational constraints.
Skills required/preferred:
- Node.js/TypeScript (Clawdbot runtime + gateway integration)
- LLM serving (OpenAI-compatible endpoints; optionally LM Studio/vLLM/LiteLLM patterns)
- RAG engineering (chunking, embeddings, vector DB, evaluation)
- Security engineering for agentic systems (sandboxing, tool policy, prompt injection mitigation)
- DevOps: system services, containerization, observability
Size of project: 350 hours
Difficulty: Hard
Short description: AI PCs incorporate multiple devices/inference engines for different machine-learning applications. Based on performance, latency, or power consumption requirements, an application may choose either the NPU, GPU, or CPU for inference tasks. Usually, an application uses a single engine/device for the entire lifetime of the process/inference, and the machine learning model is compiled only for one device. However, it is important for the application to be able to switch between inference devices at runtime based on user preference, application behavior, and load/stress on the device currently in use. Through this project, we want to build a face-detection application that continuously runs on the AI PC while switching between inference devices at runtime based on user preference or an evaluation of the stress on the current engine. Inference should not end/pause while switching devices and must not lead to BSODs, system hangs, or device crashes that cause other applications to fail.
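A toy version of the device-selection policy such an application needs might look like the following. Device names and the load metric are illustrative; a real application would query actual CPU/GPU/NPU utilization, and could delegate the final choice to OpenVINO's AUTO plugin.

```python
# Toy device-selection policy (illustrative; loads are 0.0-1.0 utilization).

def pick_device(loads, preference=("NPU", "GPU", "CPU"), busy_threshold=0.85):
    # Walk devices in preference order, skipping overloaded ones;
    # if everything is busy, fall back to the least-loaded device.
    for dev in preference:
        if loads.get(dev, 1.0) < busy_threshold:
            return dev
    return min(loads, key=loads.get)

choice = pick_device({"NPU": 0.95, "GPU": 0.40, "CPU": 0.20})    # NPU busy
fallback = pick_device({"NPU": 0.90, "GPU": 0.92, "CPU": 0.95})  # all busy
```

With OpenVINO itself, compiling the model for the "AUTO" virtual device lets the runtime perform this kind of selection internally, which is what the project would demonstrate.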
Expected outcomes:
- Implement low latency Face-Detection application to run on multiple devices/engines within AI PCs
- Utilize OpenVINO AUTO feature to demonstrate runtime switching between devices
- Create a GUI that prompts the user to change the device at runtime based on user preference
- Analyze the device load and recommend that the user switch to the most appropriate device to continue inference
Skills required/preferred: Python or C++, Basic ML knowledge
Mentors: Shivam Basia, Aishwarye Omer
Size of project: 175 hours
Difficulty: Easy
Short description: OpenHands is a popular component that provides a local GUI for AI coding agents. It supports integration with serving solutions compatible with the OpenAI API.
The goal of this project is to integrate OpenHands with OpenVINO Model Server. It would include instructions for deploying the serving with a set of models and configuring OpenHands to delegate tasks to the serving endpoints.
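Since OVMS exposes OpenAI-compatible endpoints, the integration boils down to pointing OpenHands at a local chat-completions URL. Below is a sketch of the request such a client would send; the port and model name are placeholders.

```python
# Build an OpenAI-style chat-completions request for a local OVMS endpoint
# (URL and model name are illustrative placeholders).
import json

def chat_request(model, messages, stream=True):
    return {
        "url": "http://localhost:8000/v3/chat/completions",
        "body": json.dumps({"model": model, "messages": messages,
                            "stream": stream}),
    }

req = chat_request("meta-llama/Llama-3.1-8B-Instruct",
                   [{"role": "user", "content": "Write a unit test."}])
```

Configuring OpenHands then amounts to setting its OpenAI-compatible base URL to the local server instead of a hosted provider.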
Expected outcomes: A recipe for deploying OpenHands with an instance of OpenVINO Model Server, a report on the usability experience, and a gap analysis.
Skills required/preferred: Python, LLMs
Mentors: Michal Kulakowski, Milosz Zeglarski
Size of project: 90 hours
Difficulty: Easy to medium
Short description: Profiling AI model performance is a tedious and time-consuming task. Intel's VTune provides a great level of detail at the high level, while Intel's GTPin tool provides kernel- and instruction-level details. This project focuses on developing a GTPin plugin that correlates instruction-level metrics with GPU stats to identify hotspots in kernels and provide guidance on improving kernel-level performance.
Expected outcomes: Leverage OpenVINO and Intel tools, together with a custom-developed plugin, to automatically identify bottlenecks in the GPU kernels of LLM, VLM, and VLA models
Skills required/preferred: Strong in C/C++, GPU programming, GPU kernel, exposure to GPU profiling tools, AI SW execution pipeline, compiler experience is a plus
Mentors: Selvakumar Panneer, Pramit Biswas
Size of project: 175 hours
Difficulty: Hard
Short description: This project involves developing comprehensive Jupyter notebooks that showcase how to deploy and optimize trending AI models using OpenVINO toolkit. The contributor will research and identify the most popular and emerging models in computer vision, natural language processing, and multimodal AI, then create step-by-step tutorials demonstrating model conversion, optimization, and inference with OpenVINO. Each notebook will include practical examples, performance benchmarking, and comparison between different hardware targets (CPU, GPU, NPU). The notebooks will serve as educational resources for the OpenVINO community, helping developers quickly adopt new model architectures and understand optimization techniques. The notebooks will be merged to https://github.com/openvinotoolkit/openvino_notebooks.
Expected outcomes:
- At least one comprehensive Jupyter notebook covering trending models
- Each notebook includes model conversion from popular frameworks (PyTorch, TensorFlow, ONNX) to OpenVINO IR format.
- Each notebook includes a rebuilt model pipeline based on OpenVINO runtime.
- Performance optimization examples including quantization, model compression, and hardware-specific optimizations.
- Interactive visualizations and demos showcasing model capabilities.
- Benchmarking results across different Intel hardware (CPU, integrated GPU, discrete GPU, NPU where applicable)
- Documentation and best practices guide for model deployment patterns
Skills required/preferred:
- Python programming and Jupyter notebook development
- Experience with popular ML frameworks (PyTorch, TensorFlow, Hugging Face)
- Basic understanding of computer vision and/or NLP concepts
- Familiarity with model optimization techniques (quantization, pruning)
- OpenVINO toolkit experience (preferred but not required)
- Technical writing and documentation skills
- Git/GitHub workflow knowledge
Mentors: Aleksandr Mokrov, Ethan Yang
Size of project: 175 hours
Difficulty: Medium
Short description: Optimum-Intel provides high-performance inference and export pipelines for Hugging Face PyTorch models using the OpenVINO backend. Adding support for new models currently requires significant manual effort: understanding the model architecture, writing model-specific patching, tests, and documentation. This project proposes the development of an autonomous agent system that automatically generates high-quality code to add support for new Hugging Face models to the Optimum-Intel repository. The system will analyze the model config, generate model patching, create appropriate tests, and generate a tiny variation of the model with reduced parameters based on the model config. The project aims to speed up the development process, automate the most repetitive parts, and reduce manual effort when adding model support to Optimum-Intel.
Expected outcomes:
- Multi-Agent system that analyzes a given model by Hugging Face ID, generates the code for the model support, tests, documentation, and (optionally) runs local validation and generates a tiny model based on the model config with reduced parameters for use in tests.
- The result of the agent system workflow is a code patch and a folder containing a tiny model.
Skills required/preferred: OpenVINO, Python, PyTest, LangGraph or similar agent orchestration framework, Docker (optionally).
Mentors: Anastasiia Pnevskaia , Roman Kazantsev
Size of project: 350 hours
Difficulty: Hard
Short description: This project integrates n8n (no-code workflow automation platform) with OpenVINO Model Server to enable visual creation of AI-powered workflows that leverage Intel GPU and NPU hardware acceleration. The focus is on building a concrete "Smart Document Processing Pipeline" as the primary demonstration: a workflow that monitors local folders for incoming documents (PDFs, images, scanned forms), uses OpenVINO Model Server with document understanding models running on NPU for efficient text/table extraction, routes data through an LLM on GPU for classification and entity extraction, and stores results locally with notifications. The system showcases OpenVINO AUTO plugin for intelligent device switching—document analysis on NPU (power efficient), heavy LLM processing on GPU (performance), with CPU fallback when devices are busy. Additional workflow templates may include customer service chatbots with multimodal inputs, or automated content generation pipelines.
Expected outcomes:
- A production-ready custom n8n node that connects to OpenVINO Model Server endpoints with support for GPU/NPU device selection
- Podman Compose deployment package bundling n8n, OpenVINO Model Server, and supporting services
- Implementation of the "Smart Document Processing Pipeline" demonstrating: folder monitoring, document understanding on NPU, LLM processing on GPU, local storage, and notifications
- OpenVINO AUTO plugin integration for intelligent runtime device switching based on system load and performance requirements
- At least 2-3 additional pre-built workflow templates (e.g., conversational AI with RAG, or multimodal customer service)
- Comprehensive documentation including tutorials, workflow design patterns, and installation guide for AI PCs
Skills required/preferred: Node.js/TypeScript, Python, OpenVINO, Podman, REST/gRPC APIs, basic ML/AI knowledge
Mentors: Praveen Kundurthy, Max Domeika
Size of project: 350 hours
Difficulty: Medium
Short description: AIPlayerInsight is a post-match analysis system that turns ordinary game footage into rich player telemetry and tactical insights. Using computer vision and local AI, the platform automatically tracks players and the ball, stores motion data in a time-series database, and lets coaches ask natural language questions like "Show me every time #11 hits > 20 mph in the fourth quarter." The goal is to make elite-level analytics accessible without expensive proprietary tracking hardware, while giving students hands-on experience building a full-stack, AI-powered sports product.
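As an example of the derived metrics involved, instantaneous speed can be computed from tracked (t, x, y) samples and then filtered by a threshold, which is the backbone of a query like "every time #11 hits > 20 mph". Positions are in meters and time in seconds; the track below is made up.

```python
# Player speed from tracked positions, plus a threshold query (toy data).
import math

MPS_TO_MPH = 2.23694

def speeds(samples):
    # Instantaneous speed between consecutive (t, x, y) samples, in mph.
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        out.append((t1, d / (t1 - t0) * MPS_TO_MPH))
    return out

def bursts_over(samples, mph):
    # Timestamps at which the player exceeded the given speed.
    return [t for t, v in speeds(samples) if v > mph]

track = [(0.0, 0.0, 0.0), (1.0, 5.0, 0.0), (2.0, 15.0, 0.0), (3.0, 16.0, 0.0)]
fast = bursts_over(track, 20.0)
```

In the full system these samples would come from the detection/tracking pipeline and live in the time-series database, with the natural-language layer translating questions into queries like bursts_over.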
What students will learn:
- Building end-to-end AI systems from raw video to structured data.
- Designing data models for high-frequency spatial data.
- Optimizing inference for real hardware constraints.
- Shipping a user-facing product with real analytics value.
Expected outcomes:
- 10x faster analysis by reducing time from whistle to data availability.
- Surface hidden performance metrics like player fatigue (speed decay), off-ball separation, and defensive coverage patterns.
- Lower cost by using standard HD/4K cameras instead of proprietary tracking hardware.
- A working pipeline that ingests video, runs detection and tracking, and produces structured spatiotemporal data.
- A queryable analytics layer that answers natural language questions and returns charts or visual overlays.
- A polished demo that students can showcase as real-world AI + data engineering experience.
Skills required/preferred:
Required:
- Python for backend pipelines and data processing.
- OpenVINO for model optimization on Intel hardware.
- YOLOv11 and TrackNet for detection and ball tracking.
- StrongSORT for multi-object tracking and ID persistence.
Preferred:
- React for the tactical dashboard.
- TimescaleDB/PostgreSQL for spatiotemporal schema design.
- Local LLM serving (Ollama, Hugging Face) and Text-to-SQL prompt engineering.
- Intel Arc GPU AV1/HEVC video handling.
Mentors: Ben Odom
Size of project: 350 hours
Difficulty: Hard (ambitious but very rewarding for students who want real-world AI and systems experience)
Short description: Currently, OpenVINO GenAI's GGUF reader manually reconstructs model architectures by parsing metadata and building the OpenVINO model layer-by-layer. This project will augment or replace the current static graph generation approach in OpenVINO GenAI with an alternative mechanism that traverses the GGML computation graph and dynamically translates it into an OpenVINO model, leveraging llama.cpp APIs and the existing OpenVINO backend implementation in llama.cpp. The student will utilize the GgmlOvDecoder and ov::frontend::ggml::FrontEnd logic, which translates GGML computation graphs directly into OpenVINO IR, allowing for dynamic graph conversion rather than static architecture reconstruction. This new GGUF reader will act as a generic reader, automatically supporting the wider range of architectures supported by GGML without requiring a manual C++ implementation for every new topology. The project involves integrating the translation logic with GenAI's read_model pipeline and ensuring feature parity with the existing backend execution (including quantization support and NPU specificities).
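The translation step can be pictured as a walk over the GGML graph that maps each node to an OpenVINO op. The sketch below is a toy stand-in for what GgmlOvDecoder and the GGML frontend do; the op-name mapping and node format are invented for illustration.

```python
# Toy GGML-to-OpenVINO graph translation (illustrative; real translation
# happens in C++ via GgmlOvDecoder / ov::frontend::ggml::FrontEnd).

GGML_TO_OV = {"mul_mat": "MatMul", "add": "Add", "soft_max": "Softmax"}

def translate(ggml_nodes):
    # Walk the computation graph in order, mapping each op; collect any
    # ops with no translation so they can be reported as unsupported.
    translated, unsupported = [], []
    for node in ggml_nodes:
        ov_op = GGML_TO_OV.get(node["op"])
        if ov_op is None:
            unsupported.append(node["op"])
        else:
            translated.append({"op": ov_op, "inputs": node["inputs"]})
    return translated, unsupported

graph = [
    {"op": "mul_mat", "inputs": ["w", "x"]},
    {"op": "add", "inputs": ["mm", "b"]},
    {"op": "rope", "inputs": ["h"]},   # no mapping in this toy table
]
ov_graph, missing = translate(graph)
```

The generic-reader property comes from exactly this structure: supporting a new architecture only requires its constituent GGML ops to have translations, not a hand-written C++ model builder.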
Expected outcomes:
- GGUF-Reader-v2: A functional GGUFReaderV2 class in OpenVINO GenAI that uses the llama.cpp libraries and the graph translation logic of the OpenVINO backend in llama.cpp to load GGUF files.
- Generate GGML Computation Graph: Update OpenVINO GenAI GGUF reader to generate GGML computation graph by using llama.cpp APIs and relevant source code components.
- Model Translation: Utilize the existing GgmlOvDecoder and ov::frontend::ggml::FrontEnd implementations to translate the GGML computation graph into an OpenVINO model.
- Integration: Seamless integration into the read_model API, allowing users to load GGUF models via the new GGUF-Reader-v2 mechanism.
- Test Suite: A set of regression tests comparing the output of the new GGUF-Reader-v2 against the existing reader and the original llama.cpp backend to ensure accuracy.
- Documentation: comprehensive documentation on the architecture of the new GGUF-Reader-v2 and instructions for adding support for new GGML operations.
Skills required/preferred: C++, llama.cpp (GGUF/GGML), OpenVINO
Mentors: Mustafa Cavus, Ravi Panchumarthy
Size of project: 350 hours
Difficulty: Hard
16. Optimizing OpenVINO GPU Performance across ExecuTorch and LiteRT frameworks through Vulkan Backend Comparative Analysis
Short description: The goal of this project is to benchmark and analyze Vulkan backend performance on Intel GPUs across multiple AI inference frameworks (ExecuTorch and LiteRT), identify performance gaps compared to the OpenVINO GPU backend, and provide actionable optimization recommendations to enhance OpenVINO's GPU performance. Currently, the Vulkan backends of ExecuTorch and LiteRT focus only on Android, with no benchmarking on Intel AI PCs. This project will establish comprehensive performance baselines for the Vulkan backend on Intel hardware, conduct comparative analysis against OpenVINO GPU execution, and deliver specific optimization strategies for improving the OpenVINO GPU backend based on insights from Vulkan's performance across computer vision and LLM workloads.
Expected outcomes:
- Ability to execute models using Vulkan backend on Intel AI PCs.
- Establish baseline performance of Vulkan backend on various workloads.
- Perform comparative performance analysis of Vulkan backend and OpenVINO backend on Intel integrated and discrete GPUs.
- Identify any performance gaps between Vulkan backend and OpenVINO backend.
- Propose and implement solutions to bridge the gap between Vulkan and OpenVINO backends.
- Validate the solutions with proper test cases.
- Enable fallback to Vulkan GPU backend when OpenVINO GPU backend has any missing operators.
Skills required/preferred: Python, C++, PyTorch, TensorFlow. Good to have: OpenVINO, ExecuTorch, LiteRT, Vulkan SDK
Mentors: Surya Siddharth Pemmaraju, Anisha Dattatraya Kulkarni
Size of project: 350 hours
Difficulty: Medium to hard
Short description: Llama.cpp is an open-source C++ project that lets you run large language models locally on your own machine. Llama-server is a server application built on top of llama.cpp that exposes the model through an HTTP API, making it easy to use from other programs or services. The OpenVINO backend in llama.cpp allows llama.cpp to run models using Intel's OpenVINO runtime, which provides optimized inference on Intel hardware (CPUs, GPUs, NPUs). While the OpenVINO backend enables efficient inference on Intel CPUs and accelerators, not all llama.cpp-server configurations are fully supported or function correctly when OpenVINO is enabled. These gaps limit usability and deployment flexibility of llama-server with the OpenVINO backend. This project aims to systematically examine the available configuration options of llama.cpp-server, identify incompatibilities or failures when using the OpenVINO backend, and implement robust solutions to close those gaps.
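The configuration-coverage analysis could be driven by a small harness that enumerates option combinations and records which ones fail. Below is a sketch: the option names merely resemble llama-server-style settings, and the checker is a placeholder for actually launching llama-server with the OpenVINO backend and running a probe request.

```python
# Configuration-coverage harness sketch (options and checker are placeholders).
import itertools

OPTIONS = {
    "n_parallel": [1, 4],
    "cont_batching": [False, True],
    "flash_attn": [False, True],
}

def coverage(check):
    # Run the checker on every combination of option values and
    # record pass/fail keyed by the (sorted) configuration.
    results = {}
    keys = sorted(OPTIONS)
    for combo in itertools.product(*(OPTIONS[k] for k in keys)):
        cfg = dict(zip(keys, combo))
        results[tuple(sorted(cfg.items()))] = check(cfg)
    return results

# Placeholder checker: pretend flash_attn is unsupported with the backend.
report = coverage(lambda cfg: not cfg["flash_attn"])
failing = [dict(k) for k, ok in report.items() if not ok]
```

A real harness would replace the lambda with a subprocess launch plus a functional request, and the failing set would feed directly into the "supported / unsupported / conditionally supported" documentation deliverable.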
Expected outcomes:
- Perform a configuration coverage analysis to document all supported llama.cpp-server runtime and build-time options.
- Classify configurations by functionality (e.g., model loading, batching, KV cache, threading, streaming, parallelism).
- Test each configuration against the OpenVINO backend and identify configurations that fail at runtime or produce incorrect results.
- Determine whether issues originate from backend feature gaps, model graph constraints, memory layout or tensor shape assumptions, KV cache or scheduling logic, API mismatches between llama.cpp and OpenVINO.
- Propose and implement fixes for unsupported or broken configurations. Ensure solutions are consistent with llama.cpp design principles and OpenVINO best practices.
- Validate fixes with functional and performance tests. Clearly document supported, unsupported, and conditionally supported configurations.
Skills required/preferred: C++. Good to have: OpenVINO Toolkit, Llama.cpp, LLM architectures and inference pipelines
Mentors: Mustafa Cavus, Zijun Yu
Size of project: 350 hours
Difficulty: Medium to hard
Short description: Modern Intel AI PCs open the possibility of running highly capable agents fully on device. This extends the accessibility of agents to end users for whom connectivity, privacy, cost, or latency may be a concern. This project's goal is to provide an easy-to-use Toolkit that helps application developers bootstrap agents that run locally on an Intel AI PC, taking full advantage of the platform's compute devices (CPU, GPU, NPU). The Toolkit provides APIs for agent creation and management, tool use, model query/management, model serving, local data context management, and agent session management. The Toolkit must be integrated with at least one popular editor (e.g. VSCode) and support C/C++ and Python.
Expected outcomes:
- Provides ability to query/use existing agentic frameworks where OpenVINO backend is integrated (e.g. LangChain, LlamaIndex, etc)
- Use OpenVINO libraries or OpenVINO-integrated libraries/frameworks – OpenVINO model server, OpenVINO GenAI, ORT/WinML OpenVINO EP, etc.
- Sample agent(s) created with the Toolkit must run on an AI PC, e.g. an agent for local data query and summarization with a tool-use extension.
- Sample agent(s) created with Toolkit to interact with off-device agents through A2A, showcasing agent app extensions and integration of cloud edge experiences
Skills required/preferred: Python, Bash/PowerShell scripting, familiarity with REST APIs, familiarity with Agentic systems
Mentors: Freddy Chiu, Ravi Panchumarthy
Size of project: 350 hours
Difficulty: Medium to hard
Short description: OpenVINO Model Server (OVMS) is a high-performance inference serving solution widely used for deploying optimized deep learning models across CPU, GPU, and NPU. OVMS already provides powerful backend capabilities: HuggingFace model pulling, OpenAI-compatible APIs with streaming, runtime config management, and model add/remove/list. However, bare-metal installation requires multiple manual steps. This project adds a one-command bootstrap installer that automates installation across Ubuntu, RHEL, and Windows, and a Python CLI wrapper built on top of the existing OVMS binary and its CLI flags that unifies model interaction into “ovms run <model>”: downloading the model, starting the server, and opening an interactive terminal chat session. Additionally, it should provide ovms models subcommands for discovering and managing models, and an ovms init wizard for first-time users.
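The "ovms run" flow hinges on waiting for the server's readiness endpoint before opening the chat REPL. Here is a sketch with the HTTP probe injected so it can be stubbed; a real CLI would issue GET requests against the /v2/health/ready endpoint of the spawned OVMS process.

```python
# Readiness polling for a spawned server (probe injected for testability).
import time

def wait_ready(probe, timeout_s=30.0, interval_s=0.5):
    # Poll the readiness endpoint until it answers or the timeout expires.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe("/v2/health/ready"):
            return True
        time.sleep(interval_s)
    return False

# Stub probe: the "server" becomes ready on the third poll.
calls = {"n": 0}
def probe(path):
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_ready(probe, timeout_s=5.0, interval_s=0.0)
```

Once wait_ready returns True, the CLI would start streaming the chat REPL against /v3/chat/completions.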
Expected outcomes:
- One-Command Installer (install-ovms.sh / install-ovms.ps1): Bootstrap script that detects OS/architecture, downloads the correct OVMS package, extracts it, installs dependencies, configures environment variables persistently, and verifies with a health check. Supports Ubuntu, RHEL, and Windows.
- ovms run <model>: Single command that spawns the existing OVMS binary, polls readiness via /v2/health/ready, and starts a streaming chat REPL against /v3/chat/completions with slash commands (/help, /set system, /clear, /exit).
- ovms models list|pull|search|info|rm: Formatted model management CLI built on top of existing --list_models and --pull CLI flags, adding HuggingFace Hub search for model discovery.
- ovms init: Interactive first-run wizard for task selection, model choice, and device detection.
- Tests & documentation: Unit/integration tests for Linux and Windows, updated quick-start guide, installer packaged for hosting, CLI packaged as pip install ovms-cli.
Skills required/preferred: Python, Bash/PowerShell scripting, familiarity with REST APIs and SSE streaming, C++ (for understanding OVMS internals), familiarity with LLM serving concepts, HuggingFace ecosystem.
Mentors: Freddy Chiu, Ravi Panchumarthy
Size of project: 350 hours
Difficulty: Medium to hard
Short description: In complex robotics tasks like humanoid mobile manipulation, real-time collision detection is the primary computational bottleneck for Model Predictive Control (MPC). The OCS2 (Optimal Control for Switched Systems) library currently relies on scalar computations that do not fully leverage modern CPU architectures. In this task, we will implement AVX2 SIMD (Single Instruction, Multiple Data) acceleration for the collision detection pipeline within OCS2. By vectorizing distance queries and primitive-based checks (e.g., sphere-to-capsule or box-to-box), we aim to achieve a significant speedup in the control loop. This optimization will allow humanoid platforms to compute safer, more fluid trajectories in cluttered environments with lower latency.
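The primitive to be vectorized can be made concrete: sphere-to-capsule distance reduces to point-to-segment distance minus the two radii. It is shown here as scalar Python for clarity; the project's AVX2 version would evaluate many such pairs per instruction in C++.

```python
# Sphere-to-capsule distance: point-to-segment distance minus both radii.
import math

def sphere_capsule_distance(c, r_s, a, b, r_c):
    # Project the sphere center c onto segment ab, clamping t to [0, 1].
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    denom = sum(v * v for v in ab) or 1.0
    t = max(0.0, min(1.0, sum(x * y for x, y in zip(ac, ab)) / denom))
    closest = [a[i] + t * ab[i] for i in range(3)]
    d = math.dist(c, closest)
    return d - r_s - r_c   # negative means the shapes penetrate

gap = sphere_capsule_distance((0, 2, 0), 0.5, (-1, 0, 0), (1, 0, 0), 0.5)
touch = sphere_capsule_distance((0, 0.8, 0), 0.5, (-1, 0, 0), (1, 0, 0), 0.5)
```

The AVX2 work would lay out many (sphere, capsule) pairs in struct-of-arrays form so the projection, clamp, and distance steps above each become a handful of 8-wide vector instructions.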
Expected outcomes: A high-performance collision detection module integrated into OCS2 using AVX2 intrinsics, showing a measurable reduction in MPC solve times for humanoid manipulation.
Skills required/preferred: Advanced C++, familiarity with AVX2 SIMD intrinsics, understanding of collision detection algorithms (e.g., GJK, SAT), and basic knowledge of Model Predictive Control (MPC).
Size of project: 350 hours
Difficulty: Medium to Hard
Short description: The goal of this project is to optimize Vision-Language-Action (VLA) models, which are designed for robotic control, e.g. GR00T, PI0.5, etc, to achieve high-performance, real-time inference on Intel integrated Graphics Processing Units (iGPUs) . This involves a deep technical dive into the full inference pipeline, with a focus on Intel's OpenVINO toolkit and the oneAPI Deep Neural Network Library (oneDNN), even the hardware-specific features. The outcome is clearly to speedup the deployment of embodied intelligence systems on Intel Architecture (IA).
Expected outcomes: By optimizing a specific VLA model on Intel iGPU, identify the bottlenecks of AI enablement on IA, implement optimized kernels, and summarize the methodology so the optimizations are portable.
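For illustration, optimization work of this kind usually starts from reproducible latency measurement. A minimal, framework-agnostic harness along these lines can compare runs before and after kernel work; the stubbed inference callable and the percentile choices are illustrative assumptions, not part of OpenVINO or oneDNN:

```python
import statistics
import time

def benchmark(infer, warmup=5, iters=50):
    """Time a zero-argument inference callable; report latency stats in ms."""
    for _ in range(warmup):                     # warm caches/JIT before measuring
        infer()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    mean = statistics.mean(samples)
    return {
        "mean_ms": mean,
        "p50_ms": samples[len(samples) // 2],
        "p95_ms": samples[int(len(samples) * 0.95)],
        "fps": 1000.0 / mean,
    }

# Stub standing in for a real call, e.g. a compiled OpenVINO model invocation.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
print({k: round(v, 3) for k, v in stats.items()})
```

Comparing p50 against p95 across backends also surfaces jitter, which matters as much as mean latency for real-time control loops.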
Skills required/preferred: Understanding of graphics pipelines (GPU programming is a plus), DL basics, robotics
Size of project: 350 hours
Difficulty: Medium to Hard
Short description: Efficient edge deployment on Intel architecture requires deep insight into hardware utilization to correlate edge AI workload behavior with system performance metrics (CPU/GPU/NPU/IPUs, memory, power, etc.). This project will create a web-based dashboard using React that visualizes real-time and post-hoc performance telemetry from Intel hardware during AI workload tests. The solution will allow practitioners, researchers, and developers to intuitively explore performance data, spot hardware bottlenecks, and make informed optimization decisions, all with a modern UI grounded in live graphs, heatmaps, and interactive timelines. A dedicated UI will:
- Showcase Intel HW capabilities for OEMs/ODMs/ISVs during events & presentations
- Boost accessibility of performance data for non-expert users and domain specialists
- Facilitate comparisons across runs, models, and configuration settings
This aligns with the Open Edge Platform ecosystem’s goal of simplifying workload optimization and performance insights across Intel architectures.
Expected outcomes:
- A survey of existing visualization techniques for HW utilization
- Design & implement a responsive UI using React to present performance metrics from Intel CPUs/GPUs/NPUs/IPUs during Edge AI workload testing
- Integrate real-time data visualization for key hardware performance counters (utilization %, clock speed, memory throughput, temperature, power draw)
Skills required/preferred:
- Required: React, TypeScript, REST/Socket APIs, UI design fundamentals, keen eye for design and aesthetics
- Preferred: Familiarity with performance counters and telemetry, basic knowledge of benchmarking workflows, UI data visualization libraries (e.g., D3, Recharts)
- Bonus: Understanding of hardware profiling metrics
Mentors: Wiktoria Kalisz, Pawel Zak, Jacek Jankowski
Size of project: 350 hours
Difficulty: Medium
Short description: Intel SceneScape, an Open Edge Platform component, makes writing applications based on sensor data faster, easier, and better by reaching beyond vision-based AI to realize spatial awareness through contextualization of multimodal sensor data in a common reference frame. It provides a collection of microservices, tools, and supporting containers to quickly move from single-sensor analytics to a multimodal aggregated scene view. The challenge is to implement fluent object tracking and analytics without lost tracks in ultra-high-density scenes.
This aligns with the Open Edge Platform ecosystem’s goal of enabling workload optimization and performance insights across Intel architectures.
Expected outcomes:
- Implement a new tracking micro-service for tracking 1000+ objects in one scene without lost tracks in a multi-camera environment
- Integrate the new micro-service with the SceneScape solution via GitHub CI/CD
- Enable tests and perform a full validation cycle to prove the quality of the delivered micro-service and the full SceneScape solution
Skills required/preferred:
- Required: REST/Socket APIs, C/C++ & Python
- Preferred: Familiarity with performance counters and telemetry, basic knowledge of benchmarking workflows
- Bonus: Understanding of hardware profiling metrics
Mentors: Tomasz Dorau, Lukasz Talarczyk
Size of project: 350 hours
Difficulty: Medium
Short description: Deploying Intel Edge AI Suites (Metro AI, Manufacturing AI, Retail AI, Robotics AI) requires users to manually configure OS images with correct drivers, dependencies, and optimizations—a multi-step process involving kernel setup, GPU/NPU driver installation, and OpenVINO configuration. This project creates a web interface for OS Image Composer that dramatically simplifies this workflow through AI-powered natural language interaction. Users describe their target workload (e.g., "smart intersection with sensor fusion" or "defect detection for manufacturing") and receive a production-ready OS image template with all required packages pre-configured. The interface features conversational template refinement using Template-Enriched RAG, visual YAML editing with real-time validation, and build progress monitoring. By abstracting the complexity of edge image configuration, this project accelerates developers' path from Edge AI Suite selection to deployment-ready custom OS images (image templates).
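As a sketch of the real-time validation idea, a template check can start as a simple structural pass before any build is attempted. All field names and distro values below are invented for illustration and are not the actual OS Image Composer schema:

```python
REQUIRED_FIELDS = {"name", "base_distro", "packages"}       # hypothetical schema
KNOWN_DISTROS = {"emt", "ubuntu-22.04", "debian-12"}        # illustrative values

def validate_template(tpl):
    """Return a list of human-readable validation errors (empty list = valid)."""
    errors = []
    missing = REQUIRED_FIELDS - tpl.keys()
    if missing:
        errors.append(f"missing required fields: {sorted(missing)}")
    if tpl.get("base_distro") not in KNOWN_DISTROS:
        errors.append(f"unknown base_distro: {tpl.get('base_distro')!r}")
    if not isinstance(tpl.get("packages", []), list):
        errors.append("packages must be a list")
    return errors

good = {"name": "edge-gw", "base_distro": "emt", "packages": ["openvino"]}
print(validate_template(good))                  # → []
print(validate_template({"name": "x"}))         # two errors reported
```

In the web UI, a check like this would run on every YAML edit, with the error strings surfaced inline next to the offending fields.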
Expected outcomes:
- Web application enabling visual creation of edge OS images for OEP-supported distributions
- AI Chat interface for natural language template generation aligned with edge use cases (IoT, AI/ML inference, industrial automation)
- Visual template editor with validation against OEP best practices
- Build dashboard with progress monitoring and log viewing
- Template library organized by OEP deployment scenarios (edge gateway, inference node, minimal sensor, etc.)
- Session persistence for iterative template refinement
- Stretch goal: Docker-based deployment packaging
Skills required/preferred:
- Required: Go programming, React or Vue.js, REST API design, HTML/CSS, Git
- Preferred: WebSocket, YAML/JSON validation, Docker, familiarity with edge computing concepts, understanding of RAG systems
Mentors: Alpesh Rodage, Mats Agerstam
Size of project: 350 hours
Difficulty: Medium
Short description: Developers building Edge AI solutions often prototype on Edge Microvisor Toolkit (EMT) then need to deploy to production environments running enterprise Linux distributions (RHEL, SUSE, CentOS, Fedora). Today, recreating an EMT environment on a different RPM-based OS requires manually tracking packages, exporting Docker containers, and reconfiguring system settings, a tedious and error-prone process. This project creates a CLI tool that captures a complete EMT system setup and generates portable migration scripts for target distributions. The tool analyzes installed packages, Docker images, and configurations, then produces re-runnable scripts with cross-distro package mapping and recovery logic. By automating the "works on EMT, deploy anywhere" workflow, this project enables ISVs and solution builders to develop Edge AI Suites on EMT and confidently deploy to their customers' diverse infrastructure.
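To illustrate the cross-distro mapping idea, a sketch like the following shows how captured packages could be turned into a re-runnable script with fallback logic. The map contents, package names, and the failed.txt recovery convention are all hypothetical:

```python
# Hypothetical cross-distro name map; a real tool would load this from data files.
PACKAGE_MAP = {
    "openssh-server": {"rhel": "openssh-server", "suse": "openssh"},
    "libdrm": {"rhel": "libdrm", "suse": "libdrm_amdgpu1"},
}
MANAGERS = {"rhel": "dnf", "centos": "dnf", "fedora": "dnf", "suse": "zypper"}

def generate_install_script(packages, target):
    """Emit a re-runnable install script with per-package failure capture."""
    lines = ["#!/bin/sh", "set -u"]             # no 'set -e': we record failures
    mgr = MANAGERS[target]
    for pkg in packages:
        mapped = PACKAGE_MAP.get(pkg, {}).get(target, pkg)  # fall back to same name
        lines.append(f"{mgr} install -y {mapped} || echo {mapped} >> failed.txt")
    return "\n".join(lines)

print(generate_install_script(["openssh-server", "libdrm", "curl"], "suse"))
```

Recording failures instead of aborting is what makes the generated script re-runnable: a second pass can retry only the packages listed in failed.txt.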
Expected outcomes:
- CLI tool that captures EMT system state (packages, Docker images, configurations)
- Cross-distribution package mapping for RHEL, SUSE, CentOS, Fedora
- Generated shell scripts for reproducible deployment on target systems
- Docker image export/import automation
- Incremental update and validation capabilities
- Error recovery with resume-from-failure support
Skills required/preferred: Go programming, RPM package management, shell scripting, Docker, Linux system administration
Mentors: Yock Gen Mah, Lishan Liu
Size of project: 350 hours
Difficulty: Medium
Short description: When Edge AI deployments fail in the field, developers face a challenging debugging process—manually sifting through kernel logs, hardware diagnostics, and system journals to identify root causes. This project creates a command-line diagnostic agent for Edge Microvisor Toolkit (EMT) that brings AI-assisted troubleshooting directly to the edge device. The agent collects system, kernel, and hardware logs locally and uses on-device LLMs to provide log summarization, natural language Q&A, and root cause analysis—all without sending sensitive data off the device. By integrating with a knowledge base of known issues and solutions, the tool can match symptoms to documented fixes and suggest actionable remediation steps. The extensible architecture allows teams to add custom diagnostic checks and connect to issue trackers like GitHub Issues, building institutional knowledge over time.
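Before any LLM is involved, the knowledge-base matching step can be sketched as plain pattern matching over collected logs. The symptom patterns and remediation strings below are invented examples, not an actual EMT knowledge base:

```python
import re

# Tiny illustrative knowledge base: symptom pattern -> remediation hint.
KNOWN_ISSUES = [
    (r"i915.*GPU hang", "Reset the GPU or update the i915 driver/firmware."),
    (r"Out of memory: Killed process", "Increase swap or cap the workload's memory."),
    (r"thermal.*throttl", "Check cooling; sustained throttling degrades inference."),
]

def match_known_issues(log_lines):
    """Return (log line, remediation) pairs for lines matching the knowledge base."""
    hits = []
    for line in log_lines:
        for pattern, fix in KNOWN_ISSUES:
            if re.search(pattern, line, re.IGNORECASE):
                hits.append((line, fix))
    return hits

logs = [
    "kernel: i915 0000:00:02.0: GPU hang on rcs0",
    "systemd[1]: Started daily cleanup.",
]
for line, fix in match_known_issues(logs):
    print(f"{line}\n  -> {fix}")
```

The on-device LLM would then sit on top of this: summarizing unmatched lines and proposing new knowledge-base entries for the issue tracker.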
Expected outcomes:
- CLI diagnostic agent running natively on EMT
- Automated log collection and analysis (system, kernel, hardware)
- On-device LLM-powered summarization, Q&A, and root cause recommendations
- Knowledge base integration for matching issues to known solutions
- Plugin architecture for custom diagnostic checks and swappable AI models
- Issue tracker integration (e.g., GitHub Issues) for knowledge capture
Skills required/preferred: Python, Linux system debugging, LLM experience (Llama, DistilBERT), LangChain framework
Mentors: Lishan Liu, Andy Peng
Size of project: 350 hours
Difficulty: Medium
Short description: Our prompt framework (under development, soon to be open sourced) currently excels at allowing users to detect objects instantly using Zero-Shot foundation models. However, while these models are powerful, they can be computationally expensive for large-scale, 24/7 deployment on resource-constrained edge hardware. Conversely, efficient supervised models (like YOLO or MobileNet) are perfect for the edge but require thousands of labeled images and weeks of manual effort to train.
This project will introduce a "Cold Start" Data Factory capability to the framework. This new feature allows users to leverage the powerful zero-shot capabilities of our prompt framework not just for live inference, but as a "Teacher" to automatically generate high-quality labeled datasets for efficient "Student" models.
By adding this capability, our Prompt framework will become a critical upstream tool that feeds into mature training ecosystems. The contributor will implement a workflow where users can:
- Define a task using visual/text prompts on a few frames.
- Run an automated "batch inference" job over video streams/files to generate robust pseudo-labels (auto-annotation).
- Export these labeled datasets into industry-standard formats (YOLO, COCO, CVAT) for external use.
- Integrate directly with OTX to fine-tune lightweight models on the generated data.
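The export step above can be sketched without Datumaro as a plain COCO-style serializer; the input tuple format here is an assumption for illustration, and the project itself would use Datumaro's native exporters:

```python
import json

def to_coco(frames, categories):
    """Serialize per-frame pseudo-labels into a COCO-format dictionary.

    frames: list of (filename, [(label, x, y, w, h), ...]) tuples.
    """
    cat_ids = {name: i + 1 for i, name in enumerate(categories)}
    images, annotations, ann_id = [], [], 1
    for img_id, (filename, boxes) in enumerate(frames, start=1):
        images.append({"id": img_id, "file_name": filename})
        for label, x, y, w, h in boxes:
            annotations.append({
                "id": ann_id, "image_id": img_id,
                "category_id": cat_ids[label],
                "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0,
            })
            ann_id += 1
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }

dataset = to_coco([("frame_0001.jpg", [("person", 10, 20, 50, 80)])], ["person"])
print(json.dumps(dataset, indent=2)[:120])
```

A buffering StreamWriter in the pipeline would accumulate such frame tuples during batch inference and flush them through an exporter when the user stops recording.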
Expected outcomes:
- Datumaro Integration: Integration of the Datumaro library into the application backend dependencies.
- DatasetWriter Component: Implementation of a new StreamWriter in the pipeline that buffers frames and model predictions, converting them into Datumaro objects.
- Format Support: Capabilities to serialize the buffer into standard formats (YOLO, COCO, etc.) using Datumaro’s native exporters.
- UI Integration: A "Record/Export" control in the frontend to start/stop data capture and select the desired output format.
Skills required/preferred:
- Python and backend application development
- Familiarity with deep learning workflows and model training using PyTorch
- Experience with dataset formats such as YOLO and COCO
- Understanding of video processing and batch inference pipelines
- Experience with or interest in Datumaro, OTX, or similar ML tooling ecosystems
- (Nice to have) Frontend–backend integration experience (for UI control wiring)
Mentors: Mikhail Pryakhin, Eugene Liu
Size of project: 350 hours
Difficulty: Hard
Short description:
Most unsupervised anomaly detection models (like PatchCore or Padim) are trained only on "normal" data. However, determining the anomaly threshold typically requires a labeled validation set containing both normal and anomalous images. This project aims to implement and benchmark a suite of State-of-the-Art (SOTA) Synthetic Anomaly Generation techniques within the anomalib framework. The goal is to create realistic, out-of-distribution samples that allow the model to automatically calibrate its decision threshold during the validation split, ensuring high performance in real-world deployments without requiring manual labeling.
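The core calibration idea can be sketched in a few lines: once synthetic anomalies provide a stand-in positive class, the threshold can be chosen to maximize F1 over the validation scores. This toy version assumes scalar anomaly scores and is not Anomalib's actual API:

```python
def calibrate_threshold(normal_scores, synthetic_scores):
    """Pick the anomaly-score threshold maximizing F1 (synthetic = positive class)."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(normal_scores + synthetic_scores)):
        tp = sum(s >= t for s in synthetic_scores)
        fp = sum(s >= t for s in normal_scores)
        fn = len(synthetic_scores) - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Normal images score low, synthetic anomalies high; some overlap is realistic.
t, f1 = calibrate_threshold([0.1, 0.2, 0.25, 0.3], [0.28, 0.6, 0.7, 0.9])
print(t, round(f1, 2))  # → 0.28 0.89
```

The quality of the calibrated threshold then depends entirely on how closely the synthetic anomalies resemble real defects, which is exactly what the benchmarking part of the project measures.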
Expected outcomes:
- Literature Survey:
Perform a comprehensive survey of synthetic anomaly generation methods, with a categorization such as (exact grouping left to the student):
- Heuristic/Procedural: Perlin noise (current Anomalib default), Fractal noise, etc.
- Feature-based: DRAEM (Discriminative Reconstruction), CutPaste, and MemSeg.
- Generative: Diffusion-based (AnomalyDiffusion) and GAN-based synthesis.
- Report: The student should deliver a detailed report on this study.
- Benchmarking Performance and Efficiency: Compare generation speed across PyTorch and OpenVINO. The solution should focus on resource economy and ideally run on CPU.
- Implementation:
- The best method from the study should be integrated into Anomalib. For example: an "on-the-fly" augmentation pipeline where synthetic anomalies are generated during the val_dataloader loop, avoiding the need for temporary disk storage.
- Focus on methods that capture the "local" nature of defects (scratches, holes, stains) rather than global style shifts.
Skills required/preferred:
- Deep Learning Frameworks: Proficiency in PyTorch, PyTorch Lightning, generative models (GANs/Diffusion).
- Deployment: Basic understanding of OpenVINO for inference optimization.
- Computer Vision: Experience with image processing (OpenCV, Albumentations).
- Research Experience: Ability to read, implement, and benchmark academic papers.
Mentors: Ashwin Vaidya, Rajesh Gangireddy
Size of project: 350 hours
Difficulty: Medium
Short description: Industrial anomaly detection often requires monitoring an object from multiple angles (e.g., top, side, and bottom views) simultaneously. This project aims to extend our application based on Anomalib to support multi-camera streams. The student will develop the infrastructure to either:
- Orchestrate parallel pipelines: Training and running inference on individual models per camera view. or
- Multimodal/Multi-view fusion: Stacking or concatenating images from different cameras and processing them with a single, unified anomaly detection model.
The project involves extending the React-based frontend, the FastAPI backend, and contributing new pipeline logic to the Anomalib library.
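The second option, channel-wise stacking, can be sketched with plain nested lists; a real implementation would use torch tensors and widen the model's first convolution to accept the extra channels:

```python
def stack_views(views):
    """Fuse multi-camera inputs by concatenating along the channel axis.

    Each view is a C x H x W nested list (e.g. 3 x H x W RGB); three cameras
    would yield a 9-channel input for a model with a widened first conv layer.
    """
    h, w = len(views[0][0]), len(views[0][0][0])
    for v in views:                             # all cameras must match in H, W
        assert len(v[0]) == h and len(v[0][0]) == w, "views must share H and W"
    stacked = []
    for v in views:
        stacked.extend(v)                       # C_total = sum of each view's C
    return stacked

top = [[[0.1] * 4 for _ in range(4)] for _ in range(3)]   # a 3x4x4 camera view
side = [[[0.5] * 4 for _ in range(4)] for _ in range(3)]
fused = stack_views([top, side])
print(len(fused), len(fused[0]), len(fused[0][0]))  # → 6 4 4
```

The alternative, orchestrating one model per view, avoids the input-layer change but requires result aggregation logic instead, which is the trade-off the student would evaluate.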
Useful Resources:
- https://github.com/open-edge-platform/anomalib/tree/main/src/anomalib/pipelines/tiled_ensemble
- https://anomalib.readthedocs.io/en/latest/markdown/guides/reference/pipelines/index.html
Expected outcomes:
- UI Extensions: A multi-pane dashboard in Anomalib's application frontend, using React, TanStack Query, etc., to manage configuration, live view, and results for all cameras.
- Backend Support: Update FastAPI endpoints to handle multi-camera registration, state management, and asynchronous processing of multiple streams.
- Anomalib Multi-Camera Pipeline: A new pipeline in Anomalib (similar to tiled_ensemble) that can aggregate results from multiple views or process a stacked input tensor.
Skills required/preferred:
- Frontend Development: Proficiency in React and TanStack Query for state management and UI components.
- Backend Development: Strong experience with FastAPI and asynchronous Python.
- Deep Learning: Understanding of CNNs and Transformers (e.g., how to modify input layers for stacked images or use ensemble methods).
- Anomalib Architecture: Familiarity with Anomalib's pipeline structure and the tiled_ensemble logic.
Mentors: Mark S Redeman, Ashwin Vaidya
Size of project: 350 hours
Difficulty: Medium
Short description: Industrial setups often involve monitoring an object from multiple angles (e.g., top, side, and bottom views) simultaneously. This project aims to extend our application based on OpenVINO Training Extensions (OTX) to support multi-camera streams. The student will develop the infrastructure to either:
- Orchestrate parallel pipelines: Training and running inference on individual models per camera view. Or
- Multimodal/Multi-view fusion: Stacking or concatenating images from different cameras and processing them with a single, unified anomaly detection model.
The project involves extending the React-based frontend, the FastAPI backend, and contributing new pipeline logic to the OpenVINO Training Extension library.
Expected outcomes:
- UI Extensions: A multi-pane dashboard in OTX’s application, using React, TanStack Query, etc., to manage configuration, live view, and results for all cameras.
- Backend Support: Update FastAPI endpoints to handle multi-camera registration, state management, and asynchronous processing of multiple streams.
- Multi-Camera Pipeline: A new pipeline that can aggregate results from multiple views or process a stacked input tensor.
Skills required/preferred:
- Frontend Development: Proficiency in React and TanStack Query for state management and UI components.
- Backend Development: Strong experience with FastAPI and asynchronous Python.
- Deep Learning: Understanding of CNNs and Transformers (e.g., how to modify input layers for stacked images or use ensemble methods).
Mentors: Leonardo Lai, Kirill Prokofiev
Size of project: 350 hours
Difficulty: Medium
Short description: The goal of the project is to add a compatibility layer that allows users to utilize policies from leRobot through our new Robotics Framework interface. Policies wrapped with that layer should work like original Robotics Framework policies in most use cases (training, validation, inference) while remaining fully compatible with the original leRobot framework: leRobot users should be able to load and use them alongside the original leRobot models.
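The compatibility layer is essentially an adapter. A shape-only sketch follows; every class and method name here is invented, and neither framework's real interface is shown, only the delegation pattern:

```python
class LeRobotPolicyAdapter:
    """Wrap a leRobot-style policy behind a Robotics-Framework-style API.

    All names here are hypothetical; the point is the pattern: translate
    observation formats, delegate, and keep the original object recoverable.
    """

    def __init__(self, lerobot_policy):
        self._policy = lerobot_policy

    def predict(self, observation):
        # Translate the framework's observation dict into leRobot-style keys;
        # real code would also remap tensors, devices, and normalization.
        translated = {"observation.state": observation["state"]}
        return self._policy.select_action(translated)

    def unwrap(self):
        # Round-trip guarantee: leRobot users can get the original object back.
        return self._policy


class FakeLeRobotPolicy:                  # stand-in for an actual leRobot policy
    def select_action(self, obs):
        return [x * 2 for x in obs["observation.state"]]


adapter = LeRobotPolicyAdapter(FakeLeRobotPolicy())
print(adapter.predict({"state": [1.0, -0.5]}))  # → [2.0, -1.0]
```

The unwrap path is what keeps checkpoints bidirectionally compatible: exported state always belongs to the inner leRobot policy, never to the wrapper.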
Expected outcomes:
- A policy from leRobot can be trained via Robotics framework library
- Evaluation and benchmarking in Robotics Framework work for policies from leRobot
- A leRobot policy can be exported to a checkpoint compatible with the Robotics framework’s inference module.
Skills required/preferred:
- Python & Deep Learning: Strong experience with PyTorch and deep learning for computer vision and robotics.
- Frameworks: Familiarity with leRobot framework.
Mentors: Vladislav Sovrasov, Samet Akcay
Size of project: 350 hours
Difficulty: Medium
Short description: This project aims to bring intelligent features to our Robotics Framework data collection process. Imitation learning assumes the training dataset consists of multiple episodes, each demonstrating a successful solution to the task being solved. The user collecting the dataset needs guidance to prevent training on poorly recorded episodes. The goal is to provide that guidance, plus extra metrics that help decide whether there is enough data to train on.
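As an illustration of the kind of guidance intended, simple heuristics over episode length and action smoothness can already flag suspect recordings. The thresholds and the jerk metric below are arbitrary choices made for this sketch:

```python
import statistics

def episode_quality_report(episodes):
    """Flag suspicious teleoperation episodes with simple heuristics.

    episodes: list of 1-D action trajectories (list of floats per episode).
    Flags episodes that are unusually short or unusually jerky (mean |delta|).
    """
    lengths = [len(ep) for ep in episodes]
    jerks = [
        statistics.mean(abs(b - a) for a, b in zip(ep, ep[1:])) if len(ep) > 1 else 0.0
        for ep in episodes
    ]
    mean_len = statistics.mean(lengths)
    jerk_cut = statistics.mean(jerks) + 2 * statistics.pstdev(jerks)
    flagged = [
        i for i, (n, j) in enumerate(zip(lengths, jerks))
        if n < 0.5 * mean_len or j > jerk_cut
    ]
    return {"mean_length": mean_len, "flagged_episodes": flagged}

smooth = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
short = [0.0, 0.9]                               # likely an aborted recording
print(episode_quality_report([smooth, smooth, short]))
```

Real trajectories are multi-dimensional and the project would extend this with task-success signals, but the report shape (per-dataset statistics plus a flagged-episode list) is the deliverable sketched here.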
Expected outcomes:
- An implemented approach that allows the Robotics Framework to detect common mistakes arising during data capture via teleoperation and to filter out defective dataset samples.
- Extended dataset statistics that directly or indirectly describe dataset quality and diversity.
Skills required/preferred:
- Python & Deep Learning: Strong experience with PyTorch and deep learning for computer vision and robotics.
- Frameworks: Familiarity with leRobot or other imitation learning frameworks.
Mentors: Vladislav Sovrasov, Arend Jan Kramer, Samet Akcay
Size of project: 350 hours
Difficulty: Hard
Short description: Many real-world deployment scenarios require object detection with stable identities over time, commonly referred to as Multi-Object Tracking (MOT). Typical use cases include counting items on conveyor belts, monitoring moving assets, measuring dwell time within specific zones, and enabling reliable downstream logic that depends on consistent object IDs rather than frame-by-frame detections. This project introduces Multi-Object Tracking (MOT) capabilities into the OTX library with the following goals:
- Seamlessly integrate with OTX’s design philosophy and align with existing object detection workflows, including APIs, configuration files, entities, and inference patterns.
- Support OTX detection models (e.g., DETR-based, YOLO-based, and other OTX-supported detectors), treating tracking as a post-processing step applied to per-frame detection outputs.
- Deliver a working proof-of-concept, including a comparison of well-established tracking algorithms in realistic scenarios.
As part of the project, the student will implement and integrate a baseline MOT solution (such as ByteTrack or OC-SORT) and explore more advanced, SAM-2-inspired tracking concepts, including:
- Decoupled detector–tracker design: the detector processes each frame independently, while the tracker is responsible for maintaining consistent object identities across frames.
- Memory-based tracking mechanisms: maintaining a memory bank of object appearance features from previous frames to improve identity stability under occlusions and appearance variations.
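The decoupled detector-tracker design above can be illustrated with a minimal greedy IoU association step, a toy stand-in for ByteTrack/OC-SORT without confidence gating, track lifecycle, or appearance memory:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

class GreedyIoUTracker:
    """Keep track_ids stable by greedily matching detections to the last frame."""

    def __init__(self, iou_thresh=0.3):
        self.iou_thresh = iou_thresh
        self.tracks = {}                       # track_id -> last seen box
        self.next_id = 0

    def update(self, boxes):
        assigned, results = set(), []
        for box in boxes:
            best_id, best_iou = None, self.iou_thresh
            for tid, prev in self.tracks.items():
                score = iou(box, prev)
                if tid not in assigned and score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:                # no match: spawn a new identity
                best_id, self.next_id = self.next_id, self.next_id + 1
            assigned.add(best_id)
            self.tracks[best_id] = box
            results.append({"track_id": best_id, "bbox": box})
        return results

tracker = GreedyIoUTracker()
print(tracker.update([(0, 0, 10, 10)]))        # new object: track_id 0
print(tracker.update([(1, 1, 11, 11)]))        # same object moved: still 0
```

The detector stays completely unaware of identities: it emits fresh boxes per frame, and only the tracker's state carries track_id continuity, which is exactly the separation the project asks for.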
Expected outcomes:
- OTX Tracking API: a supported tracking component integrated into OTX’s detection workflows, producing per-frame results with consistent track_id values.
- Unified output format: a stable schema containing frame_id, track_id, bbox, score, label, plus optional track metadata (age, velocity, state).
- Configurable tracking behavior: expose key knobs (association thresholds, track lifecycle, confidence gating, optional appearance/memory settings) consistent with OTX config patterns.
- PoC and comparison report: evaluate multiple tracking baselines (e.g., ByteTrack / OC-SORT / BoT-SORT / DeepSORT-like) and a memory-bank variant inspired by SAM-2 concepts; document pros/cons and recommended defaults for OTX.
- Inference integration: enable running tracking on videos / image sequences via OTX CLI in a way consistent with existing detection inference commands.
- Tracking evaluation support: provide tooling to evaluate tracking quality on MOT-style annotations (e.g., IDF1, ID switches, MOTA where applicable) and document how to use it in OTX.
- Documentation: user docs covering how to run tracking, tune parameters, and understand limitations.
Skills required/preferred:
- Python: Intermediate to advanced Python programming skills. Comfortable working with scientific Python libraries (NumPy, pandas) and reading/understanding existing codebases.
- Computer Vision fundamentals: Understanding of object detection concepts (bounding boxes, IoU, confidence scores). Familiarity with tracking basics is helpful.
- PyTorch basics: Experience with PyTorch tensors, basic operations, and working with model outputs.
- Software engineering practices: Ability to write clean, documented code and basic unit tests. Willingness to learn from code reviews.
- Bonus: Prior exposure to tracking algorithms (ByteTrack, DeepSORT, SORT) or video processing.
- Bonus: Experience with deep learning frameworks or production ML systems.
Mentors: Kirill Prokofiev, Leonardo Lai
Size of project: 350 hours
Difficulty: Medium
Short description: Smart cities are evolving into cognitive cities powered by agentic AI, where critical infrastructure can sense, learn, detect issues, and make autonomous decisions. This enables early failure detection and predictive maintenance, reducing maintenance costs and preventing service disruptions.
Intel Metro AI Suite release 26.0 provides an agentic Predictive Maintenance Pipeline for cognitive cities critical infrastructure (e.g., bridges, utilities, solar panels), using LLMs and VLMs optimized with OpenVINO to support applications across cognitive city use cases. This project aims to leverage this pipeline to build a sample application and validate it on Intel Core Ultra using open datasets.
Expected outcomes: Agentic AI-driven application for Predictive Maintenance for Cognitive Cities Critical Infrastructure
Skills required/preferred: Required: Agentic AI models, familiarity with development on Intel HW and integrated AI accelerators, familiarity with building AI applications using open-source SW components
Preferred: Familiarity with Intel OpenVINO, and AI models optimization
Mentors: Hassnaa Moustafa, Gopi Krishna, Anand Bodas
Size of project: 175 hours
Difficulty: Medium
Short description: Digital twins for smart buildings are essential to cognitive cities, creating real time virtual replicas for buildings that enable continuous monitoring, predictive maintenance, energy optimization, and data sharing across city systems. This helps buildings operate more efficiently, detect issues earlier, and respond autonomously to changing conditions, making cognitive cities more resilient.
Intel Metro AI Suite release 26.0 provides a modular pipeline that enables live data streaming from buildings, scene analytics, and situational awareness using AI models optimized with OpenVINO.
This project aims to leverage this pipeline to build a sample application that addresses critical scenarios in smart buildings by using diverse AI models, and to validate it on Intel Core Ultra with open datasets.
Expected outcomes: Digital Twin Sample Application for Smart Buildings with Visualization and Dashboard
Skills required/preferred: Required: Media streaming, AI models, familiarity with development on Intel HW and integrated AI accelerators, familiarity with building AI applications using open-source SW components
Preferred: Familiarity with Intel OpenVINO, Intel SceneScape, and AI models optimization
Mentors: Hassnaa Moustafa, Rob Watts
Size of project: 175 hours
Difficulty: Medium
Short description: A simple chat-question-and-answer currently exists that relies on semantic similarity for information retrieval. This project aims to extend that application into an Agentic Retrieval-Augmented Generation (RAG) system using a Graph Vector Database, enabling relational reasoning in addition to semantic similarity.
Public datasets (for example, GraphRAG benchmarking datasets and VIINA) will be used. The solution should employ popular agentic frameworks and a GraphDB of the participant’s choice, aligned with mentor guidance. The approach should support extensibility to multimodal use cases. Models used must not exceed 13B parameters.
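The relational-reasoning idea can be sketched without any vector database: expand the query's entities over a toy graph, then retrieve documents reachable through those hops. The graph, documents, and hop logic below are illustrative only:

```python
# Toy knowledge graph (entity -> related entities) and entity-tagged documents.
GRAPH = {
    "Kyiv": ["Ukraine"],
    "Ukraine": ["Kyiv", "VIINA"],
    "VIINA": ["Ukraine"],
}
DOCS = {
    "d1": {"text": "Event report near Kyiv.", "entities": ["Kyiv"]},
    "d2": {"text": "VIINA dataset methodology.", "entities": ["VIINA"]},
    "d3": {"text": "Unrelated cooking notes.", "entities": []},
}

def graph_rag_retrieve(seed_entities, hops=1):
    """Expand seeds over the graph, then return docs touching any reached entity.

    Plain similarity search would only surface docs mentioning the seeds;
    the graph hops add relationally connected documents as well.
    """
    frontier, reachable = set(seed_entities), set(seed_entities)
    for _ in range(hops):
        frontier = {n for e in frontier for n in GRAPH.get(e, [])} - reachable
        reachable |= frontier
    return sorted(
        doc_id for doc_id, doc in DOCS.items()
        if reachable & set(doc["entities"])
    )

print(graph_rag_retrieve(["Kyiv"], hops=2))  # → ['d1', 'd2']
```

In the full project, an agent would combine these graph-expanded hits with the existing semantic-similarity results before passing context to the LLM.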
Expected outcomes:
- Extension of the existing ChatQnA application to an Agentic GraphRAG solution
- Updated UI to support agentic workflows
- Expanded documentation covering implementation details and benchmark results (including accuracy)
- Deployable solution using Docker and Helm charts
Skills required/preferred:
- Deep Learning and Generative AI experience
- Familiarity with popular open-source GenAI frameworks
- Strong programming skills
- Docker and Kubernetes experience
- Experience on Intel platforms (Core Ultra preferred)
- UI development experience
Mentors: Pankaj Singh, Raghavendra Bhat
Size of project: 175 hours
Difficulty: Medium
Short description: This project builds on the Agentic GraphRAG topic by introducing user feedback–driven fine-tuning using mechanisms such as Thumbs Up / Thumbs Down on generated responses.
Participants are free to choose one or more strategies, including:
- Prompt tuning
- Model fine-tuning
- Knowledge graph updates
- Embedding alignment
All implementations must be based on the existing Agentic GraphRAG architecture.
Expected outcomes:
- Integrated fine-tuning capability within Agentic GraphRAG
- UI enhancements focused on usability and user experience
Skills required/preferred:
- Deep Learning and Generative AI experience
- Familiarity with popular open-source GenAI frameworks
- Strong programming skills
- Docker and Kubernetes experience
- Experience on Intel platforms (Core Ultra preferred)
- UI development experience
Mentors: Pankaj Singh, Krishna Murti
Size of project: 350 hours
Difficulty: Hard
Short description: Video summarization accuracy is closely tied to compute and memory requirements. Higher accuracy typically demands more resources. This project aims to build a tuning tool that accepts a target accuracy for a given video and estimates the required compute and memory resources with over 80% confidence.
Participants may propose multiple approaches, analyze their pros and cons, and choose one in collaboration with mentors. The project will extend the existing video-search-and-summarization sample application and required microservices.
Expected outcomes:
- A tuning tool integrated into the OEP repository (edge-ai-libraries/tools)
- Complete documentation and deployment assets
- A user-friendly UI
Skills required/preferred:
- Deep Learning and GenAI experience
- Open-source GenAI framework familiarity
- Strong coding skills
- Docker and Kubernetes experience
- Intel platform experience (Core Ultra preferred)
- UI development experience
Mentors: Krishna Murti, Vinod B
Size of project: 350 hours
Difficulty: Medium
Short description: This project proposes a new sample application leveraging multiple Open Edge Platform (OEP) libraries and components (such as DL Streamer, audio transcription, and VLM serving) to enable automated redaction, censorship, or editing out of portions of video or audio content.
- Input: Video or audio (with defined size limits).
- Processing:
- The pipeline processes the video/audio and creates a transcript.
- Undesired parts of the transcript are identified. We define undesired content as:
- Violent
- Abusive
- Confidential
- These parts need to be identified and deleted, and the corresponding segments/chunks in the video/audio should be removed accordingly. LLM-based reasoning should be used here to identify the undesired parts and proceed further.
- LLM fine-tuning on such undesired content might be required if no open-source models are available or sufficient.
- An easier version would be to manually identify the undesired content from the audio/video transcript.
- An even more interesting version could use visual understanding of violence (in the case of videos) to decide on censoring, instead of relying on audio transcripts to make the decision.
- Output: Video or audio with the censored, redacted, or edited-out parts removed.
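The editing-out step reduces to interval arithmetic once the undesired spans are time-stamped: compute the complement of the padded, merged redaction spans, then re-join only the surviving segments (e.g., with ffmpeg). The padding value here is an arbitrary assumption:

```python
def keep_segments(duration, redact_spans, pad=0.2):
    """Given redaction spans (start, end) in seconds, return the segments to keep.

    pad widens each redacted span slightly so word boundaries are fully removed.
    These keep-segments could then be cut and re-joined, e.g. with ffmpeg.
    """
    # Merge padded, overlapping redaction spans first.
    spans = sorted((max(0.0, s - pad), min(duration, e + pad)) for s, e in redact_spans)
    merged = []
    for s, e in spans:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    # Complement of the merged spans = audio/video that survives redaction.
    keep, cursor = [], 0.0
    for s, e in merged:
        if s > cursor:
            keep.append((cursor, s))
        cursor = e
    if cursor < duration:
        keep.append((cursor, duration))
    return keep

print(keep_segments(60.0, [(10.0, 12.0), (11.5, 15.0)]))  # → [(0.0, 9.8), (15.2, 60.0)]
```

Whether the spans come from LLM reasoning over the transcript or from visual violence detection, both detection paths can feed the same cutting logic, which keeps the pipeline modular.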
Expected outcomes: A new sample application which allows users to provide video or audio as an input and automatically edits out audio or video content based on the transcript text or visual understanding based on defined criteria.
Skills required/preferred:
- Audio/Video Codec Understanding
- Basic understanding of popular
- Experience working with open-source Vision Models and Audio models
- UI or CLI Development Skills
- Understanding of open-source LLMs and experience with deploying and getting inferences
- Fine-Tuning of LLMs
- Understanding of build tools and deployment of end-to-end applications, especially LLM-based applications
- Understanding of microservices architecture: tradeoffs of microservices vs. monoliths, and some experience with building/deploying them
Mentors: Krishna Murti, Pankaj Singh, Raghavendra Bhat
Size of project: 350 hours
Difficulty: Medium to Hard
Short description: Develop a containerized, web-based orchestration and management interface that automatically discovers, introspects, and parses sample applications across the open-edge-platform/edge-ai-suites and open-edge-platform/edge-ai-libraries repositories. The platform will provide a unified control plane for inspecting application metadata, validating prerequisites, and building or deploying workloads using Docker Compose or single-node k3s (Kubernetes).
As the number of sample applications grows across domains such as manufacturing, retail, and metro, the lack of a centralized entry point introduces significant friction, requiring users to manually navigate scattered documentation to evaluate compatibility, resource requirements, and deployment steps. This tool addresses that gap by offering a single, coherent interface for application discovery and deployment lifecycle management.
In addition, the system will instrument deployments to capture and report system-level resource utilization metrics, including XPU, memory, and network I/O, providing comparative insights into pre- and post-deployment resource consumption to aid performance analysis and capacity planning.
Expected outcomes:
- Support for end-to-end build and deployment workflows using Docker Compose and single-node k3s, with centralized metadata inspection.
- A user-friendly UI
Skills required/preferred:
- Strong coding skills
- Docker and Kubernetes experience
- Intel platform experience (Core Ultra preferred)
- UI development experience (React JS)
- Open-source GenAI framework familiarity
Mentors: Vinod B, Sathyendran Vellaisamy
Size of project: 350 hours
Difficulty: Medium
Short description: Store theft significantly impacts retailers' profitability, requiring advanced video analytics solutions to identify suspicious behavior and shoplifting activities. This project aims to build comprehensive video pipelines capable of detecting theft-related actions and gestures from staff and customers using surveillance cameras, staff body cameras, and specialist cameras across the store. The system must detect specific theft activities including consumption of food prior to checkout and suspicious behaviors such as concealing items, leaving the store without paying, employees removing security tags outside the checkout area, and staff leaving through emergency exits. The system will use Vision Language Models (VLMs) for scene understanding and contextual analysis, enabling contextual video retrieval based on specific queries (example: "Man with a white shirt between 2pm and 4pm" or "customer concealing item in coat pocket on Monday"). The entire pipeline must operate on edge computing infrastructure with real-time storage and replay capabilities.
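The contextual retrieval idea above can be sketched with a toy ranking function. Here token overlap between the query and stored frame captions stands in for the embedding similarity a real VLM/CLIP-based index would compute; the caption store and timestamps are hypothetical.

```python
def retrieve_clips(query, captions):
    """Rank stored (timestamp, caption) pairs against a free-text query.

    Token overlap is a placeholder for VLM/CLIP embedding similarity;
    a production system would embed both query and frames and search
    a vector index instead.
    """
    q = set(query.lower().split())
    scored = []
    for ts, caption in captions:
        c = set(caption.lower().split())
        overlap = len(q & c) / max(len(q), 1)  # fraction of query tokens matched
        if overlap > 0:
            scored.append((overlap, ts, caption))
    scored.sort(reverse=True)  # best match first
    return [(ts, cap) for _score, ts, cap in scored]
```

Replay then reduces to seeking the stored stream to the returned timestamps.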
Expected outcomes:
- Multi-camera video analytics pipeline for real-time theft detection
- VLM integration for scene captioning and contextual video retrieval
- Real-time detection of suspicious behaviors (concealing items, unpaid exits, tag removal)
- Video storage system with replay capabilities and contextual search
- Edge computing deployment with performance optimization
- Comprehensive metrics including pipeline latency, VLM latency, throughput, and power consumption
- System monitoring for memory, CPU, GPU, and NPU utilization
Skills required/preferred:
- Computer Vision and Deep Learning experience
- Vision Language Models (VLMs) and multimodal AI
- Video analytics and real-time processing
- Edge computing and optimization techniques
- Strong programming skills (Python)
- Understanding of surveillance systems and video codecs
- Docker and containerization experience
- Intel platform experience (Core Ultra preferred)
Mentors: Jitendra Kumar Saini, Sachin Sharma, Avinash Reddy Palleti
Size of project: 350 hours
Difficulty: Hard
Short description: Interactive kiosks in retail environments operate in noisy, high-traffic conditions where robust audio processing is essential for effective customer assistance. This project focuses on building an optimized voice pipeline that reduces word error rates and improves speech interaction accuracy for self-service kiosks. The system must handle speech recognition, sentiment analysis, and speaker diarization while processing customer queries against a knowledge base. Additionally, the pipeline should integrate camera-based audience measurement and support multiple languages to serve diverse customer populations. The entire solution must operate efficiently on edge computing infrastructure with comprehensive performance monitoring.
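Since the project is evaluated partly on word error rate, a reference implementation of the metric is useful. WER is the word-level Levenshtein distance between reference and hypothesis transcripts, normalized by reference length; the sketch below is the standard dynamic program.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with the standard Levenshtein dynamic program over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                     # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                     # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, "turn off the light" against the reference "turn on the lights" has two substitutions over four reference words, giving a WER of 0.5.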
Expected outcomes:
- Robust speech recognition pipeline optimized for noisy retail environments
- Integrated sentiment analysis and speaker diarization capabilities
- Multi-language support for diverse customer interactions
- Customer query processing system with knowledge base integration
- Camera integration for audience measurement and context awareness (people counting, queue management, engagement tracking, occupancy detection)
- Performance optimization for edge deployment with low latency
- Comprehensive metrics including word error rate, pipeline latency, and throughput
- System monitoring for memory, CPU, GPU, and NPU utilization
Skills required/preferred:
- Speech Recognition and Audio Processing experience
- Natural Language Processing and sentiment analysis
- Multi-language speech systems development
- Edge computing and real-time processing
- Strong programming skills (Python, audio libraries)
- Experience with speech recognition frameworks (Whisper, Wav2Vec, etc.)
- Camera integration and computer vision basics
- Docker and containerization experience
- Intel platform experience (Core Ultra preferred)
Mentors: Jitendra Kumar Saini, Sachin Sharma, Avinash Reddy Palleti
Size of project: 350 hours
Difficulty: Medium
Short description:
Improve the Edge Manageability Framework by adding an agentic AI component. The agentic AI would be responsible for taking a natural-language query such as “onboard node and deploy wordpress on it” and generating a series of steps that carry out those actions. It would evaluate the result of each step and take corrective action if a step fails. The agentic interface should be presented visually, such as in a dashboard.
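The plan-execute-evaluate loop described above can be sketched as follows. This is illustrative only: `steps` stands in for the output of an LLM planner and `execute` for a client calling EMF APIs; both names are hypothetical, and a real agent would re-plan rather than simply stop on failure.

```python
def run_plan(steps, execute, max_retries=2):
    """Execute planned steps in order, retrying each failed step up to
    max_retries times before giving up.

    `steps` is a list of step names (placeholder for an LLM-generated plan)
    and `execute` is a callable returning True on success (placeholder for
    an EMF API client). Returns (step, succeeded) pairs for the dashboard.
    """
    results = []
    for step in steps:
        ok = False
        for _attempt in range(1 + max_retries):
            if execute(step):
                ok = True
                break
        results.append((step, ok))
        if not ok:
            break  # stop the workflow; a real agent would re-plan here
    return results
```

The returned per-step status list maps naturally onto dashboard widgets showing workflow progress.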
Expected outcomes:
- New dashboard pages and widgets, either integrated into EMF or presented side-by-side in a native UI provided by the agentic AI toolset
- Agentic AI backend
- Demonstrate three multi-step workflows, for example:
- “Onboard an edge node and deploy an application”
- “Look for broken applications and re-deploy them”
- “Migrate applications from one region to another”
- Documentation and demo walkthrough
Skills required/preferred: Agentic AI, API integration, Go Programming, Optional Javascript/Typescript and UI design.
Mentors: Scott Baker, Russel Callen
Size of project: 350 hours
Difficulty: Hard
Short description:
Create a tool that evaluates logs and metrics from edge nodes and identifies anomalies, suggesting the root cause of issues. For example, it could identify a runaway workload consuming excessive resources or diagnose network connectivity problems.
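A simple statistical baseline for the anomaly-detection part is z-score flagging over a metric time series. This is only a starting point under stated assumptions (a single numeric metric, roughly stationary baseline); a real tool would combine it with log analysis and learned models over the OpenTelemetry data.

```python
import statistics

def flag_anomalies(samples, threshold=3.0):
    """Return indices of metric samples whose z-score exceeds the threshold.

    A deliberately simple baseline: assumes one numeric metric with a
    roughly stationary baseline. A production diagnostic tool would layer
    log correlation and learned models on top of signals like this.
    """
    if len(samples) < 2:
        return []
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    if stdev == 0:
        return []  # perfectly flat series: nothing to flag
    return [i for i, v in enumerate(samples)
            if abs(v - mean) / stdev > threshold]
```

For instance, a CPU-utilization series that sits near 10% and then spikes to 100% would have only the spike flagged, which could then be correlated with logs from the same window.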
Expected outcomes:
- AI tool for edge node / log file analysis
- Retrieval of logs from an OpenTelemetry-compatible endpoint
- Demonstrate three diagnostic results, such as:
- Excessive resource consumption / Resource exhaustion
- Firewall misconfiguration / Network connectivity issue
- Hardware device fault / Driver failure
- Tests and documentation
Skills required/preferred: Go, distributed systems, security practices, AI analysis
Mentors: Scott Baker, Russel Callen
Size of project: 350 hours
Difficulty: Hard
Short description:
This project focuses on developing a software pipeline that captures the real world through a smartphone camera and seamlessly imports it, at true scale, into 3D simulation software (Gazebo / Genesis) to train a virtual robot to navigate the resulting virtual environment.
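The import step can be illustrated with a minimal Gazebo SDF world that places a reconstructed mesh at true (metre) scale. The model name and mesh path below are hypothetical; the reconstruction pipeline would export the mesh and generate a file along these lines.

```xml
<?xml version="1.0"?>
<sdf version="1.9">
  <world name="captured_room">
    <model name="scanned_environment">
      <static>true</static>
      <link name="mesh_link">
        <visual name="visual">
          <geometry>
            <!-- hypothetical path to the mesh exported by the
                 reconstruction step; scale 1 1 1 preserves true scale -->
            <mesh>
              <uri>model://captured_room/meshes/room.dae</uri>
              <scale>1 1 1</scale>
            </mesh>
          </geometry>
        </visual>
        <collision name="collision">
          <geometry>
            <mesh>
              <uri>model://captured_room/meshes/room.dae</uri>
            </mesh>
          </geometry>
        </collision>
      </link>
    </model>
  </world>
</sdf>
```

With the environment loaded statically like this, a robot model (AMR or humanoid) can be spawned into the same world for navigation training.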
Expected outcomes: Showcase a virtual robot (AMR, humanoid) navigating to an instructed destination within a 3D virtual environment captured from the real world.
Skills required/preferred: Python programming, exposure to 3D visual representations (URDF, 3DGS, etc.), 3D reconstruction techniques, hands-on experience with Gazebo simulation; exposure to COLMAP/Blender/Unreal3D is a plus.
Mentors: Mrutunjayya Mrutunjayya, Selvakumar Panneer
Size of project: 175 hours
Difficulty: Medium to Hard
© Copyright 2018-2024, OpenVINO team