[DMP 2025]: Standardizing and Enhancing the Feluda Python Packages #516

@aatmanvaidya

Description

Feluda is a configurable engine for analyzing multi-lingual and multi-modal content. It allows researchers, fact-checkers and journalists to explore and analyze large quantities of multimedia content. Feluda has a component called operators, which are built to process data in various modalities (text, audio, video, images, hybrid) and various languages. Each operator is a Python package, and we maintain them in a monorepo with a multi-package system. As Feluda continues to grow, ensuring consistency and robustness across these packages is crucial for long-term maintainability and ease of contribution.
The goal of the project is to improve the internal structure of Feluda packages, enhance documentation, and optimize performance, laying the groundwork for a stable v1.0.0 release.

Goals

  • Standardize interfaces and functions across all Feluda Python packages.
  • Write comprehensive documentation for all packages.
  • Achieve close to 100% test coverage with unit and integration tests.
  • Enhance package build robustness and integration capabilities with other applications.
  • Create practical tutorials (Jupyter notebooks) and use cases for fact-checkers and researchers.

Expected Outcome

  • Consistent APIs across all packages with standardized functions
  • Basic documentation hosted on Read the Docs covering package functionality, input/output formats, and detailed use cases.
  • Improved error handling with clear, actionable messages for the user
  • A collection of tutorials and notebooks demonstrating real-world applications, such as extracting text from newspaper images or clustering large amounts of video to detect social thematic labels
  • Expanded test coverage for better reliability

Acceptance Criteria

  • Write tests to validate compliance for current Feluda operators
  • Write documentation and Python notebooks to demonstrate use of current operators
  • Participate in weekly update calls and demonstrate progress
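
A compliance test of the kind the first criterion asks for might look roughly like the sketch below. It assumes the `init()` and `run()` names given under Implementation Details; the `missing_functions` helper and the two stand-in operators are hypothetical and are not part of Feluda.

```python
import types

# Function names taken from the issue text; the real interface may differ.
REQUIRED_FUNCTIONS = ("init", "run")


def missing_functions(operator: object) -> list[str]:
    """Return the required names that `operator` does not expose as callables."""
    return [
        name
        for name in REQUIRED_FUNCTIONS
        if not callable(getattr(operator, name, None))
    ]


# Hypothetical stand-ins for real operator modules.
good_operator = types.SimpleNamespace(init=lambda: None, run=lambda data: data)
bad_operator = types.SimpleNamespace(run=lambda data: data)  # no init()


def test_compliant_operator_has_nothing_missing():
    assert missing_functions(good_operator) == []


def test_non_compliant_operator_reports_missing_init():
    assert missing_functions(bad_operator) == ["init"]
```

With pytest, the same check could be parametrized over every operator package in the monorepo so that a new operator is validated automatically.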

Implementation Details

  1. Standardization:
    • Define a common interface (API) for all packages with init() and run() functions.
    • Implement proper error handling and warnings
    • Ensure Feluda raises exceptions when working with incompatible operators
  2. Documentation:
    • Create a Read the Docs template covering each package's purpose, function signatures, expected inputs/outputs, and common use cases.
    • Develop example notebooks (Google Colab/Marimo) showcasing practical applications for researchers and fact-checkers.
  3. Robustness:
    • Optimize model loading processes to reduce memory footprint and improve runtime efficiency.
  4. Testing:
    • Expand test coverage to include integration tests across packages.
    • Implement CI GitHub Action pipelines to run tests and other safety checks.
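
The standardization and robustness items above can be sketched as follows. This is a minimal illustration under stated assumptions, not Feluda's actual API: `Operator`, `IncompatibleOperatorError`, `WordCountOperator` and `register` are hypothetical names, and the "model" is a trivial stand-in for an expensive load that is deferred until `init()` is called.

```python
from typing import Any, Protocol, runtime_checkable


class IncompatibleOperatorError(TypeError):
    """Hypothetical exception for objects that break the operator interface."""


@runtime_checkable
class Operator(Protocol):
    """Hypothetical common interface: every operator exposes init() and run()."""

    def init(self) -> None: ...

    def run(self, data: Any) -> Any: ...


class WordCountOperator:
    """Toy operator for illustration only; its 'model' is loaded on demand."""

    def __init__(self) -> None:
        self._model = None  # nothing heavy is loaded at construction time

    def init(self) -> None:
        # Stand-in for an expensive model load; performed at most once.
        if self._model is None:
            self._model = str.split

    def run(self, data: Any) -> Any:
        if self._model is None:
            raise RuntimeError("WordCountOperator.run() called before init()")
        if not isinstance(data, str):
            raise TypeError(f"expected str input, got {type(data).__name__}")
        return len(self._model(data))


def register(operator: object) -> Operator:
    """Accept an operator only if it provides both init() and run()."""
    if not isinstance(operator, Operator):
        raise IncompatibleOperatorError(
            f"{type(operator).__name__} must define init() and run()"
        )
    return operator


op = register(WordCountOperator())
op.init()
word_count = op.run("fact checking at scale")
```

A structural `Protocol` is used here so that existing operators would not need to inherit from a shared base class; an abstract base class would be an equally reasonable choice if stricter enforcement is preferred.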

Product Name

Feluda

Organisation Name

Tattle

Domain

Open Source Library

Tech Skills Needed

Python, Object-Oriented Programming, Machine Learning, Performance Improvement, Docker, Testing, Technical Writing.

Mentor

Denny George (@dennyabrain), Aatman Vaidya (@aatmanvaidya)

Category

Data Science, Machine Learning
