Description
Feluda is a configurable engine for analyzing multi-lingual and multi-modal content. It allows researchers, fact-checkers and journalists to explore and analyze large quantities of multimedia content. Feluda has a component called operators, which are built to process data in various modalities (text, audio, video, images, hybrid) and various languages. Each operator is a Python package, and we maintain them in a monorepo with a multi-package system. As Feluda continues to grow, ensuring consistency and robustness across these packages is crucial for long-term maintainability and ease of contribution.
The goal of the project is to improve the internal structure of Feluda packages, enhance documentation, and optimize performance, laying the groundwork for a stable v1.0.0 release.
Goals
- Standardize interfaces and functions across all Feluda Python packages.
- Write comprehensive documentation for all packages.
- Achieve close to 100% test coverage with unit and integration tests.
- Enhance package build robustness and integration capabilities with other applications.
- Create practical tutorials (Jupyter notebooks) and use cases for fact-checkers and researchers.
Expected Outcome
- Consistent APIs across all packages with standardized functions
- Basic documentation hosted on Read the Docs covering package functionality, input/output formats, and detailed use cases.
- Improved error handling with clear, actionable messages for the user
- A collection of tutorials and notebooks demonstrating real-world applications, such as extracting text from newspaper images and clustering large amounts of video to detect social thematic labels
- Expanded test coverage for better reliability
Acceptance Criteria
- Write tests to validate compliance for current Feluda operators
- Write documentation and Python notebooks to demonstrate use of current operators
- Participate in weekly update calls and demonstrate progress
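As a rough illustration of what a compliance test for an operator might look like, here is a minimal sketch. It assumes the `init()` and `run()` functions proposed under Implementation Details; the helper `compliance_problems`, the docstring check, and the two toy operator classes are hypothetical and not part of Feluda.

```python
import inspect

# Assumed from the proposed common interface; adjust if the interface changes.
REQUIRED_FUNCTIONS = ("init", "run")


def compliance_problems(operator) -> list[str]:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    for name in REQUIRED_FUNCTIONS:
        member = getattr(operator, name, None)
        if member is None:
            problems.append(f"missing {name}()")
        elif not callable(member):
            problems.append(f"{name} is not callable")
        elif not inspect.getdoc(member):
            # Hypothetical extra rule: each required function is documented.
            problems.append(f"{name}() has no docstring")
    return problems


class GoodOperator:
    """Toy operator for illustration only."""

    def init(self):
        """Load any resources the operator needs."""

    def run(self, data):
        """Process one item and return the result."""
        return data


class BadOperator:
    """Toy operator that violates the assumed rules."""

    def init(self):
        pass
```

A check like this could run once per operator package in a test suite, so that a new operator fails early with a message naming exactly what is missing.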
Implementation Details
- Standardization:
- Define a common interface (API) for all packages with `init()` and `run()` functions.
- Implement proper error handling and warnings.
- Ensure Feluda raises exceptions when working with incompatible operators.
- Documentation:
- Create a Read the Docs template covering each package's purpose, function signatures, expected inputs/outputs, and common use cases.
- Develop example notebooks (Google Colab/Marimo) showcasing practical applications for researchers and fact-checkers.
- Robustness:
- Optimize model loading processes to reduce memory footprint and improve runtime efficiency.
- Testing:
- Expand test coverage to include integration tests across packages.
- Implement CI GitHub Action pipelines to run tests and other safety checks.
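The Standardization items above could be sketched roughly as follows. This is only one possible shape, assuming an abstract base class is used for the common interface; the names `Operator`, `OperatorError`, `IncompatibleOperatorError`, `validate_operator` and `WordCountOperator` are hypothetical and do not come from the Feluda codebase.

```python
from abc import ABC, abstractmethod
from typing import Any


class OperatorError(Exception):
    """Hypothetical base exception for operator failures."""


class IncompatibleOperatorError(OperatorError):
    """Hypothetical error for objects that do not follow the common interface."""


class Operator(ABC):
    """Hypothetical common interface: every operator exposes init() and run()."""

    @abstractmethod
    def init(self, **config: Any) -> None:
        """Prepare the operator (for example, load a model)."""

    @abstractmethod
    def run(self, data: Any) -> Any:
        """Process one input and return the result."""


def validate_operator(candidate: Any) -> None:
    """Raise a clear, actionable error if init() or run() is missing."""
    missing = [
        name for name in ("init", "run")
        if not callable(getattr(candidate, name, None))
    ]
    if missing:
        raise IncompatibleOperatorError(
            f"{type(candidate).__name__} is missing required function(s): "
            f"{', '.join(missing)}. Every operator must define init() and run()."
        )


class WordCountOperator(Operator):
    """Toy operator for illustration only; not a real Feluda operator."""

    def init(self, **config: Any) -> None:
        self.lowercase = config.get("lowercase", True)

    def run(self, data: Any) -> int:
        if not isinstance(data, str):
            raise OperatorError(
                f"WordCountOperator.run() expects str, got {type(data).__name__}"
            )
        text = data.lower() if self.lowercase else data
        return len(text.split())
```

With this shape, a caller would invoke `validate_operator(op)` before using an operator, so that an incompatible object fails with a message naming the missing functions rather than with an `AttributeError` deep inside a pipeline.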
Product Name
Feluda
Organisation Name
Tattle
Domain
Open Source Library
Tech Skills Needed
Python, Object-Oriented Programming, Machine Learning, Performance Improvement, Docker, Testing, Technical Writing.
Mentor
Denny George (@dennyabrain), Aatman Vaidya (@aatmanvaidya)
Category
Data Science, Machine Learning