feat: add native Groq model integration for high-speed evaluations#2556

Open
Jayachander123 wants to merge 5 commits into confident-ai:main from Jayachander123:add-groq-integration

Conversation


@Jayachander123 Jayachander123 commented Mar 17, 2026

Description

This PR introduces native support for Groq (GroqModel) to the DeepEval framework, allowing users to leverage Groq's ultra-fast LPU inference engine for high-speed LLM evaluations.

Motivation

As evaluation datasets grow larger, metric calculation speed becomes a bottleneck. Integrating Groq natively allows developers to run DeepEval metrics significantly faster using models like llama3-8b-8192.

Changes Made

  • Core Model (deepeval/models/llms/groq_model.py): Implemented GroqModel inheriting from DeepEvalBaseLLM. Added support for both synchronous (generate) and asynchronous (a_generate) execution, as well as JSON mode structured outputs.
  • Lazy Loading: Utilized DeepEval's require_dependency for the groq SDK. This ensures groq remains an optional dependency and does not bloat the environment for non-Groq users.
  • Configuration (deepeval/config/settings.py): Added GROQ_API_KEY, GROQ_MODEL_NAME, and USE_GROQ_MODEL to the central Pydantic Settings class for seamless .env management.
  • Security & Type Safety: Handled Pydantic SecretStr unwrapping in the model loaders to ensure secure key management without causing isinstance type errors in the underlying Groq client.
  • Documentation (docs/): Added comprehensive .mdx documentation for Groq under the integrations tab, including setup instructions and Python/ENV usage examples.
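The sync/async split and the SecretStr unwrapping described above can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual code: `SecretStr`, `unwrap_key`, and `GroqModelSketch` are minimal stubs mimicking the pydantic and DeepEval pieces, and the real `generate` would of course call Groq's chat completions API rather than echo the prompt.

```python
import asyncio


class SecretStr:
    """Minimal stand-in for pydantic.SecretStr."""

    def __init__(self, value: str):
        self._value = value

    def get_secret_value(self) -> str:
        return self._value


def unwrap_key(key):
    # Unwrap a SecretStr into a plain str before handing it to the
    # client, so isinstance-based str checks inside the SDK don't fail.
    return key.get_secret_value() if isinstance(key, SecretStr) else key


class GroqModelSketch:
    def __init__(self, api_key, model_name="llama3-8b-8192"):
        self.api_key = unwrap_key(api_key)
        self.model_name = model_name

    def generate(self, prompt: str) -> str:
        # The real method would call Groq's chat completions endpoint.
        return f"[{self.model_name}] {prompt}"

    async def a_generate(self, prompt: str) -> str:
        # The async path mirrors the sync one (via AsyncGroq in the PR).
        return self.generate(prompt)


model = GroqModelSketch(SecretStr("gsk-demo"))
print(model.api_key)                        # plain str, not SecretStr
print(asyncio.run(model.a_generate("hi")))
```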

Testing

Added comprehensive unit tests in tests/test_core/test_models/test_groq_model.py:

  • Verified that explicit API keys passed to the constructor override .env settings.
  • Verified that if no key is provided, the model correctly falls back to Settings.GROQ_API_KEY and successfully unwraps the SecretStr before passing it to the client.
  • All tests pass locally.
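The key-resolution behavior under test can be sketched as below. `Settings` and `resolve_api_key` are simplified stand-ins for the Pydantic settings class and the constructor logic, just to show the precedence being verified:

```python
class Settings:
    # Stand-in for the .env-backed Pydantic settings class.
    GROQ_API_KEY = "env-key"


def resolve_api_key(explicit_key=None, settings=Settings):
    # An explicit constructor argument wins; otherwise fall back
    # to the centrally managed Settings value.
    return explicit_key if explicit_key is not None else settings.GROQ_API_KEY


# Explicit key overrides the .env-backed Settings value:
assert resolve_api_key("ctor-key") == "ctor-key"
# No key provided -> fall back to Settings.GROQ_API_KEY:
assert resolve_api_key() == "env-key"
```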

Checklist

  • Code is formatted with black and linted with ruff
  • New unit tests added and passing
  • Lazy loading implemented (no new required dependencies added to pyproject.toml)
  • Documentation added following existing integration formats


vercel Bot commented Mar 17, 2026

@Jayachander123 is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

@Jayachander123
Author

Hi team! Just a quick heads-up on the failing CI checks: the new code itself passes locally, but the pipeline is hitting some standard fork-related restrictions:

  1. Core & Confident Tests (test): These are failing due to missing OPENAI_API_KEY and CONFIDENT_API_KEY secrets in the GitHub Actions environment, which is the expected security behavior for PRs originating from a fork. My local tests for GroqModel pass successfully.
  2. Lint (lint): The black formatting step failed because it flagged 102 existing files across the repository that need reformatting. I already ran black and ruff locally on the 4 specific files I modified, so my additions are fully compliant with your style guide.
  3. Vercel: This is pending because Vercel requires a repository maintainer to manually authorize preview deployments from outside contributors.

Let me know if you need me to adjust anything, or if you are able to trigger the test workflows on your end with the secrets enabled!

@A-Vamshi
Collaborator

Hey @Jayachander123, thanks for raising this PR! It looks mostly good, just added some comments above, could you please resolve them? Let me know if you have any doubts, thank you! :)

@Jayachander123
Author

> Hey @Jayachander123, thanks for raising this PR! It looks mostly good, just added some comments above, could you please resolve them? Let me know if you have any doubts, thank you! :)

Hi @A-Vamshi! Thank you so much for taking the time to review this. I would love to resolve your comments, but I actually don't see any inline code comments on my end in the PR diff or the conversation timeline.

Please let me know what you'd like me to change and I'll get right on it!

Collaborator

@A-Vamshi A-Vamshi left a comment


Hey @Jayachander123, here's the comments, hope they're visible now :)

Comment thread deepeval/models/llms/groq_model.py Outdated
default_groq_model = "llama3-8b-8192"

# Use a standard string for the retry decorator if ProviderSlug.GROQ doesn't exist yet
retry_groq = create_retry_decorator("Groq")
Collaborator


Can we add the groq as enum inside the PS (ProviderSlug) itself? Here's where you can add it btw: https://github.com/confident-ai/deepeval/blob/main/deepeval/constants.py#L27

Author


Done! I've added GROQ = "groq" to the ProviderSlug enum in deepeval/constants.py and updated the retry decorator to use it.
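The change amounts to one new member on a string enum. Sketched here with a minimal `ProviderSlug` stub (the real enum in deepeval/constants.py has more members):

```python
from enum import Enum


class ProviderSlug(str, Enum):
    # Existing members elided; illustrative subset only.
    OPENAI = "openai"
    GROQ = "groq"  # new entry for the Groq integration


# A str-backed enum compares and serializes as a plain string,
# which is what the retry decorator can key off.
print(ProviderSlug.GROQ.value)
```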

Comment thread deepeval/models/llms/groq_model.py Outdated
return self._client

def load_async_model(self) -> "AsyncGroq":
"""Initializes and caches the asynchronous Groq client."""
Collaborator


It seems like most of the code here is repeated but for sync and async logic, could we combine the logic into one method load_model and separate the logic using self.async_mode? Please look at the existing examples to see how we can do that: https://github.com/confident-ai/deepeval/blob/main/deepeval/models/llms/grok_model.py#L256

Author


Done! I've removed load_async_model and combined both into a single load_model(self, async_mode: bool = False) method, following the Grok model example.
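The merged-loader pattern looks roughly like this. `Groq` and `AsyncGroq` are stubbed so the branching and caching are runnable without the SDK; the real method follows the Grok model example linked above:

```python
class Groq:
    def __init__(self, api_key):
        self.api_key = api_key


class AsyncGroq:
    def __init__(self, api_key):
        self.api_key = api_key


class ModelSketch:
    def __init__(self, api_key):
        self.api_key = api_key
        self._client = None
        self._async_client = None

    def load_model(self, async_mode: bool = False):
        # One entry point; branch on async_mode and cache each
        # client independently so repeated calls reuse the instance.
        if async_mode:
            if self._async_client is None:
                self._async_client = AsyncGroq(api_key=self.api_key)
            return self._async_client
        if self._client is None:
            self._client = Groq(api_key=self.api_key)
        return self._client


m = ModelSketch("gsk-demo")
print(type(m.load_model()).__name__)                  # Groq
print(type(m.load_model(async_mode=True)).__name__)   # AsyncGroq
print(m.load_model() is m.load_model())               # True (cached)
```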

# Generation Methods
# -------------------------------------------------------------------------
@retry_groq
def generate(
Collaborator


For the generate and a_generate method, if you look at the other model examples, we have a way to support images inside them as well, an example for this is here: https://github.com/confident-ai/deepeval/blob/main/deepeval/models/llms/openai_model.py#L154

I've looked through the official docs to see how to pass images, you can see them here: https://console.groq.com/docs/vision#how-to-pass-images-from-urls-as-input

It looks like the API is mostly similar to openai so the above shared example should help you implement this, don't worry about supporting images if it's too hard, you can safely ignore this comment :)

Author


Thank you for sharing the Groq vision docs! I will safely ignore this one for now just to keep this initial integration focused and stable, but we can definitely add multimodal support in a future PR.

Comment thread deepeval/models/llms/groq_model.py Outdated
Comment on lines +202 to +215
def supports_log_probs(self) -> Union[bool, None]:
return False

def supports_temperature(self) -> Union[bool, None]:
return True

def supports_multimodal(self) -> Union[bool, None]:
return False

def supports_structured_outputs(self) -> Union[bool, None]:
return True

def supports_json_mode(self) -> Union[bool, None]:
return True
Collaborator


For these methods, we just need to add the constants inside the self.model_data and return them here, I'm sharing some examples that would help you understand how to write this here:

Author


Done! I added GROQ_MODELS_DATA to deepeval/models/llms/constants.py using make_model_data (with verified Groq API pricing). I then updated `__init__` to load this into self.model_data and replaced the hardcoded booleans here with dynamic lookups.
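The data-driven lookup pattern can be sketched like this. The dict shape and field names below are illustrative, not make_model_data's actual schema:

```python
# Illustrative per-model capability table (schema is hypothetical).
GROQ_MODELS_DATA = {
    "llama3-8b-8192": {
        "supports_json_mode": True,
        "supports_structured_outputs": True,
        "supports_multimodal": False,
        "supports_log_probs": False,
    },
}


class ModelSketch:
    def __init__(self, model_name="llama3-8b-8192"):
        # Load this model's capability row once, then have each
        # supports_* method read from it instead of hardcoding.
        self.model_data = GROQ_MODELS_DATA[model_name]

    def supports_json_mode(self):
        return self.model_data["supports_json_mode"]

    def supports_multimodal(self):
        return self.model_data["supports_multimodal"]


m = ModelSketch()
print(m.supports_json_mode())   # True
print(m.supports_multimodal())  # False
```

Keeping capabilities in one table means adding a new Groq model is a single dict entry rather than a new set of overridden methods.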

@Jayachander123 Jayachander123 requested a review from A-Vamshi March 19, 2026 20:12
Collaborator

@A-Vamshi A-Vamshi left a comment


Hey @Jayachander123, everything looks good, would you be open to writing docs for this as well? If you're willing to write them, please follow the existing pattern used in the existing models docs in the integrations tab, thank you :) Let me know if you're not too keen on the documentation side, we can get this PR merged right away!

@Jayachander123
Author

> Hey @Jayachander123, everything looks good, would you be open to writing docs for this as well? If you're willing to write them, please follow the existing pattern used in the existing models docs in the integrations tab, thank you :) Let me know if you're not too keen on the documentation side, we can get this PR merged right away!

Hey @A-Vamshi! I'm really glad the code looks good. I went ahead and wrote the documentation as you suggested.

I added groq.mdx following the exact format of the existing Gemini docs in the integrations tab, I've pushed the commit to this PR and updated the description.

Let me know if the docs look good or if you need any formatting tweaks before we merge! Thanks again for all the help.

@Jayachander123 Jayachander123 requested a review from A-Vamshi March 23, 2026 23:43
@penguine-ip
Contributor

Hey @Jayachander123 we're missing a few things, specifically, add calculate_cost, use require_costs, add generation_kwargs, remove the os.environ fallback, add cost Settings fields, from what i can see

@Jayachander123
Author

> Hey @Jayachander123 we're missing a few things, specifically, add calculate_cost, use require_costs, add generation_kwargs, remove the os.environ fallback, add cost Settings fields, from what i can see

Hey @penguine-ip, thanks for catching those! I've just pushed a commit that addresses all your points to align with the latest model standards:

  • Added cost Settings fields: Added GROQ_COST_PER_INPUT_TOKEN and GROQ_COST_PER_OUTPUT_TOKEN to deepeval/config/settings.py.
  • Removed os.environ fallback: Updated `__init__` so API keys and costs now rely purely on the central Settings class.
  • Added generation_kwargs: Added this to `__init__` and unpacked it into chat_args in both generate and a_generate.
  • Added calculate_cost & require_costs: Implemented the calculate_cost method using the require_costs utility for strict pricing validation, and updated the generation methods to return the actual calculated cost instead of 0.0.
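The cost calculation itself is simple per-token arithmetic. A minimal sketch, with placeholder rates standing in for the GROQ_COST_PER_INPUT_TOKEN / GROQ_COST_PER_OUTPUT_TOKEN Settings fields (actual pricing comes from Groq's published rates):

```python
def calculate_cost(input_tokens: int, output_tokens: int,
                   cost_per_input: float, cost_per_output: float) -> float:
    # Total cost = tokens consumed on each side times its per-token rate.
    return input_tokens * cost_per_input + output_tokens * cost_per_output


# Placeholder rates: $0.05 / 1M input tokens, $0.08 / 1M output tokens.
cost = calculate_cost(1_000, 500, 0.05 / 1e6, 0.08 / 1e6)
print(f"{cost:.8f}")  # 0.00009000
```

With require_costs validating that both rates are set, the generation methods can return this value instead of a hardcoded 0.0.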
