[DOC] Add notebook to show using query with id filter #4521

Open
wants to merge 1 commit into
base: main

@jairad26 jairad26 commented May 12, 2025

Description of changes

This PR adds a Python notebook showing how to use the ID filter in query to narrow the search space. The notebook walks through an e-commerce example in which categories and products are stored in a SQLite database and only the text data is stored in Chroma. It uses ordinary SQL queries to find the subset of IDs to search semantically, then passes that subset to Chroma.
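
The flow described above could be sketched roughly as follows. This is not the notebook's code: the schema, the sample rows, and the helper names `candidate_ids` and `search_within` are hypothetical, and the sketch assumes that `collection.query` accepts an `ids` argument, as the PR description implies. The collection is passed in as a parameter, so the SQL half runs with only the standard library.

```python
import sqlite3


def candidate_ids(conn, category, max_price):
    """Return product IDs (as strings) in `category` priced at or below `max_price`."""
    rows = conn.execute(
        """
        SELECT p.id
        FROM products AS p
        JOIN categories AS c ON c.id = p.category_id
        WHERE c.name = ? AND p.price <= ?
        ORDER BY p.id
        """,
        (category, max_price),
    ).fetchall()
    return [str(row[0]) for row in rows]


def search_within(collection, ids, text, n_results=5):
    """Run a semantic search restricted to `ids`.

    Assumption: `collection.query` accepts an `ids` filter, per the PR description.
    """
    if not ids:
        return None  # the SQL filter matched nothing, so skip the vector query
    return collection.query(
        query_texts=[text],
        n_results=min(n_results, len(ids)),  # never ask for more hits than candidates
        ids=ids,
    )


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        category_id INTEGER REFERENCES categories(id),
        name TEXT,
        price REAL
    );
    INSERT INTO categories VALUES (1, 'Kitchen'), (2, 'Bedding');
    INSERT INTO products VALUES
        (1, 1, 'Air fryer', 89.0),
        (2, 1, 'Stand mixer', 329.0),
        (3, 2, 'Memory foam mattress', 499.0);
    """
)

# Structured filter first: only Kitchen products at or under 100.
ids = candidate_ids(conn, "Kitchen", 100.0)
```

With a real Chroma collection whose document IDs match the SQLite primary keys, `search_within(collection, ids, "crispy fries with little oil")` would then rank only the rows that survived the SQL filter.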

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@jairad26 jairad26 marked this pull request as ready for review May 12, 2025 06:14
" min_price, max_price = price_ranges[category_id]\n",
" \n",
" for i, product_name in enumerate(product_names[category_id]):\n",
" description = descriptions[category_id][i] if i < len(descriptions[category_id]) else \"Product description not available.\"\n",

[CriticalError]

There's a mismatch between products and their descriptions in the demo output. For example, in the scenario 2 results, the CloudRest Memory Foam Mattress shows a description about denim jeans, and the CrispWave Air Fryer shows a description about t-shirts. This happens because the setup logic assigns descriptions from one category to products from another.

In the setup_databases function, when fetching the description for each product, ensure that it is looked up by the same category and product index as the product name.
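
The reviewer's own suggested code is not reproduced on this page. One way to keep names and descriptions aligned might look like the sketch below. The helper `pair_products`, the category keys, and the description strings are hypothetical; only the two product names and the fallback string come from the PR.

```python
FALLBACK = "Product description not available."


def pair_products(product_names, descriptions):
    """Yield (category_id, name, description), pairing by category and index."""
    for category_id, names in product_names.items():
        # Look up descriptions for this category only, never another one.
        category_descriptions = descriptions.get(category_id, [])
        for i, name in enumerate(names):
            if i < len(category_descriptions):
                description = category_descriptions[i]
            else:
                description = FALLBACK
            yield category_id, name, description


# Hypothetical sample data keyed by category.
product_names = {
    "bedding": ["CloudRest Memory Foam Mattress"],
    "kitchen": ["CrispWave Air Fryer", "Stand Mixer"],
}
descriptions = {
    "bedding": ["Pressure-relieving memory foam mattress."],
    "kitchen": ["Compact air fryer for crisp results with little oil."],
}

rows = list(pair_products(product_names, descriptions))
```

Because both dictionaries share the same category keys and each list is indexed in the same loop, a product can only receive a description from its own category, and a missing entry falls back to the placeholder instead of borrowing text from elsewhere.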

propel-code-bot bot commented May 12, 2025

Add Example Notebook: Using Query with ID Filter in Chroma

This PR introduces a new Python Jupyter notebook that demonstrates how to use Chroma's query functionality with an ID filter to reduce search space, using a legal cases dataset as an illustrative example. The notebook walks through creating and populating SQLite tables, ingesting text data into Chroma, and querying using filtered IDs derived from structured SQL queries, thereby showcasing best practices for combining structured and semantic search.

Key Changes:
• Added 'query_with_id_filter.ipynb' under examples/basic_functionality/
• Notebook builds a synthetic legal cases dataset with jurisdictions and case types in SQLite and indexed text in Chroma.
• End-to-end demonstration of fetching candidate document IDs via SQL, then using those IDs to semantically search with Chroma's query() API.
• Notebook code covers sample-scenario generation, database setup logic, and semantic search queries.

Affected Areas:
• examples/basic_functionality/query_with_id_filter.ipynb

This summary was automatically generated by @propel-code-bot

@jairad26 jairad26 force-pushed the jai/query-allow-ids-notebook branch from 234a249 to 5f8d886 Compare May 13, 2025 18:09
@jairad26 jairad26 force-pushed the jai/query-allow-ids-notebook branch from 5f8d886 to e5da83c Compare May 19, 2025 22:00