
Fix VoyageAI Text Embedder Issue (#2832) #2833


Open: wants to merge 4 commits into main


devin-ai-integration[bot]
Contributor

Fix VoyageAI Text Embedder Issue (#2832)

Issue

Users were encountering an error when trying to use the VoyageAI embedder with CrewAI:

No module named 'chromadb.utils.embedding_functions.voyageai_embedding_function'

Root Cause

The VoyageAI provider was listed in the EmbeddingConfigurator class, which imports its embedding function from chromadb.utils.embedding_functions.voyageai_embedding_function. That module is missing from the installed ChromaDB package, so the import fails.
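To confirm this root cause in a given environment, you can ask Python whether the module is locatable without running the import that fails. The sketch below uses only the standard library. `module_available` is a hypothetical helper name, not part of CrewAI or ChromaDB.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located by the import system.

    find_spec imports parent packages first, so a missing parent
    (e.g. chromadb not installed at all) surfaces as ModuleNotFoundError.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    name = "chromadb.utils.embedding_functions.voyageai_embedding_function"
    print(name, "available:", module_available(name))
```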

Solution

  1. Created a VoyageAI embedding function implementation in the CrewAI repository
  2. Added the necessary directory structure and imports
  3. Added tests to verify the VoyageAI embedder configuration works correctly

Testing

The implementation has been tested with the following configuration:

embedder={
    "provider": "voyageai",
    "config": {
        "model": "voyage-3",
        "api_key": "your-api-key",
    },
}

The implementation works with VoyageAI embedding models, including:

  • voyage-3
  • voyage-3.5
  • voyage-3.5-lite
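A caller could sanity-check an embedder configuration before handing it to CrewAI. The helper below is hypothetical and is not part of CrewAI or this PR. It assumes the `provider` / `config.model` / `config.api_key` layout of the test configuration and treats the three model names listed above as known. Because VoyageAI offers other models too, an unknown name triggers a warning rather than an error.

```python
import warnings
from typing import Any, Dict, Tuple

# Model names listed in this PR; VoyageAI offers others, so an unknown
# name only triggers a warning rather than an error.
KNOWN_VOYAGE_MODELS = frozenset({"voyage-3", "voyage-3.5", "voyage-3.5-lite"})


def validate_voyageai_embedder(embedder: Dict[str, Any]) -> Tuple[str, str]:
    """Check an embedder dict shaped like the test configuration.

    Returns (model, api_key) or raises ValueError naming the first problem.
    """
    provider = embedder.get("provider")
    if provider != "voyageai":
        raise ValueError(f"expected provider 'voyageai', got {provider!r}")
    config = embedder.get("config")
    if not isinstance(config, dict):
        raise ValueError("'config' must be a dict")
    model = config.get("model")
    api_key = config.get("api_key")
    if not isinstance(model, str) or not model.strip():
        raise ValueError("'config.model' must be a non-empty string")
    if not isinstance(api_key, str) or not api_key.strip():
        raise ValueError("'config.api_key' must be a non-empty string")
    if model not in KNOWN_VOYAGE_MODELS:
        warnings.warn(
            f"unrecognized VoyageAI model {model!r}; passing it through",
            stacklevel=2,
        )
    return model, api_key
```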

How to Test

  1. Install the voyageai package: pip install voyageai
  2. Configure a knowledge base with the VoyageAI embedder as shown above
  3. Run the tests: uv run pytest tests/utilities/test_embedding_configurator.py -v

Link to Devin run: https://app.devin.ai/sessions/67ab732085a54ecb893bd5081f4c178a
Requested by: Joe Moura

Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@joaomdmoura
Collaborator

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment for PR #2833: VoyageAI Embedding Function Implementation

Summary

PR #2833 adds a VoyageAI embedding function for use with ChromaDB. The implementation is well structured and follows good practices, but the review identified improvements in four areas: error handling, type hinting, testing, and security.

File-by-File Analysis

1. Module Structure (__init__.py files)

  • Positive Aspects:
    • The inclusion of __all__ enhances clarity regarding public APIs.
    • Well-organized module structure improves maintainability and reduces the likelihood of import issues.

2. voyageai_embedding_function.py

  • Identified Issues:
    1. Import Redundancy:

      try:
          import voyageai
      except ImportError:
          raise ValueError(...)
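      • Recommendation: Move the repeated import check into one helper that returns the module:
      def _ensure_voyageai_installed():
          try:
              import voyageai
          except ImportError as e:
              raise ValueError(
                  "The voyageai python package is not installed. "
                  "Please install it with `pip install voyageai`."
              ) from e
          return voyageai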
    2. Error Handling:

      • The current generic exception handling obscures actual issues.
      • Recommendation: Apply more specific error handling:
      except voyageai.error.VoyageError as e:  # base class of the client's API errors
          logger.error("VoyageAI API error: %s", e)
          raise
    3. Type Hints Improvement:

      • Current type hints lack specificity.
      • Recommendation: Enhance type hints for clarity:
      def __call__(self, input: Union[str, List[str]]) -> List[List[float]]:
    4. Input Validation:

      • It is crucial to ensure input is validated.
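      • Recommendation: Normalize a single string to a list, then check every item:
      if isinstance(input, str):
          input = [input]
      if not isinstance(input, list) or not all(isinstance(t, str) for t in input):
          raise ValueError("Input must be a string or a list of strings.")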

3. test_embedding_configurator.py

  • Identified Issues:
    1. Test Coverage:
      • Lack of tests for edge cases and error handling.
      • Recommendation: Extend test cases for better coverage:
      def test_invalid_configuration(self):
          config = {}  # no api_key or other settings
          configurator = EmbeddingConfigurator()
          # Only the call under test sits inside pytest.raises.
          with pytest.raises(ValueError):
              configurator._configure_voyageai(config, "voyage-3")

General Recommendations

  1. Documentation: Include docstrings for all classes and methods to facilitate understanding for new developers.

  2. Configuration Validation: Validate the API key and model name when the embedder is configured, so that a missing key or misspelled model fails early with a clear error.

  3. Performance Optimization: Consider sending large inputs in batches, so that big document sets stay within the API's per-request limits and need fewer round trips.

  4. Logging Enhancement: Introduce detailed logging mechanisms for better monitoring and debugging capabilities.
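The batching recommendation can be sketched as a small wrapper that splits inputs into fixed-size chunks and concatenates the results in order. The function name and the default `batch_size` of 128 are assumptions for illustration; check VoyageAI's documentation for the real per-request limits of the model in use. `embed_batch` is any callable that embeds one batch.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 128,  # assumed default, not a documented VoyageAI limit
) -> List[List[float]]:
    """Embed `texts` in chunks of at most `batch_size`, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    out: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        # Each call embeds one contiguous slice; results are appended in order.
        out.extend(embed_batch(list(texts[start:start + batch_size])))
    return out
```

With the VoyageAI client, `embed_batch` could be `lambda batch: client.embed(batch, model="voyage-3").embeddings`.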

Security Considerations

  1. API Key Handling: Store and manage API keys securely, for example by reading them from environment variables instead of hard-coding them in the embedder config.

  2. Input Sanitization: Strengthen input validation so that malformed data (non-string items, empty input) is rejected before it is sent to the API.
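Here is a minimal sketch of the API-key recommendation. It prefers an explicit `api_key` from the embedder config and otherwise falls back to the `VOYAGE_API_KEY` environment variable, which the `voyageai` client also reads by default. The helper name and its `env` parameter, which allows injection in tests, are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_voyage_api_key(
    config: Mapping[str, object], env: Optional[Mapping[str, str]] = None
) -> str:
    """Prefer an explicit config['api_key'], else fall back to VOYAGE_API_KEY."""
    env = os.environ if env is None else env
    key = config.get("api_key") or env.get("VOYAGE_API_KEY")
    if not isinstance(key, str) or not key:
        raise ValueError("No VoyageAI API key: set config['api_key'] or VOYAGE_API_KEY")
    return key
```

This keeps the literal key out of source files such as the test configuration, while still allowing an explicit override.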

Conclusion

The VoyageAI embedding function implementation is a solid foundation, but addressing the identified issues could lead to a more robust, maintainable, and secure application. Implementing the recommendations provided will enhance the overall quality and reliability of the codebase.
