Releases: Dataherald/dataherald
v1.0.3
Release Notes for Version 1.0.3
What's New
1. New features
- Added Redshift support c8e55a2
- Added multi-schema support for db connections; currently supported for Postgres, BigQuery, Snowflake, and Databricks d4d6f4e
2. Improvements and fixes
- Fixed `uri` validation for db connections f40ac0e
- Fixed the fallback and confidence score 30f5226
- Fixed the observation code blocks a11d1f1
- Fixed refresh endpoint error handling 15b6d46
- Fixed malformed SQL queries in intermediate steps 828c64d
- The sql-generation endpoint now raises an error when it receives invalid SQL fbd96ea
New Contributors
- @toliver38 #449
- @zhanpengjie #456
- @akshayrakate #475
v1.0.2
Release Notes for Version 1.0.2
What's New
1. New features
- Adds Astra vector store support 6f39892
- Adds MS SQL Server support 078c17d
- Adds Streaming endpoint to show intermediate steps 1205d8a
- Adds support to Pinecone serverless 7906f03
- Adds intermediate steps in the SQL Generation response 3dbd483
- Adds a LangSmith metadata param (`langsmith_metadata`) to easily filter cf88a1b
- Stores the db dialect when a db connection is created 809ac31
2. Improvements and fixes
- Adds logs when a request fails 09f65c6
- Adds descriptions to the new agent faf07de
- Fixes malformed LLM output 4190b4d
- Documents error codes e94c788
- Fixes the running query forever issue cfb1d5b
- Fixes the error parsing handler 8751410
- Added ClickHouse HyperLogLog support to improve scanning 61a92c9
- Fixes SQL generation 5160e8d
- Fixes the background scanner process to run in parallel 88ee8fa
- Fixes error handling for golden SQL additions 8efb00f
3. Migration Script
- Purpose: To facilitate a smooth transition from version 1.0.1 to version 1.0.2, we've introduced a migration script.
- Data Modifications: The script performs the following actions:
- Decrypts the db connection `uri` column.
- Executes a regex to retrieve the db dialect.
- Stores the `dialect` column in the `database_connections` mongo collection.
To run the migration script, use the following command:
docker-compose exec app python3 -m dataherald.scripts.populate_dialect_db_connection
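The dialect-extraction step the script performs can be pictured with a small helper. This is a minimal sketch: `extract_dialect` is a hypothetical name, and the actual script's regex may differ.

```python
from urllib.parse import urlparse

def extract_dialect(connection_uri: str) -> str:
    """Return the db dialect from a decrypted connection URI.

    Hypothetical helper: reduces a driver-qualified scheme such as
    "postgresql+psycopg2" to its base dialect "postgresql".
    """
    scheme = urlparse(connection_uri).scheme
    return scheme.split("+", 1)[0]

print(extract_dialect("postgresql+psycopg2://user:pass@host:5432/db"))  # postgresql
```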
v1.0.1
Release Notes for Version 1.0.1
What's New
1. New features
- Added clickhouse support d494fed
- MariaDB/MySQL support officially added and documented. 7b86ad3
- Added a refresh endpoint (`POST /api/v1/table-descriptions/refresh`) to get the table names from a specified database and store them in the `table-description` Mongo collection. This improves response time when querying the table-description list endpoint (`GET /api/v1/table-descriptions`). 28b8130
- Implemented error codes for better error handling. Errors now respond with a 400 HTTP status code. 2c70f16
2. Changes and fixes
- Reduced SSH fields in requests by utilizing the `connection_uri` field. 64ceb6e
- Updated LLM with the latest models. dd440f2
- Expanded functionality to allow SSH connections on different ports. 1a5a2be
- Improved performance for the scanning endpoint (`POST /api/v1/table-descriptions/sync-schemas`). 435884e
3. Migration Script
- You don't need to update the data if you're already using the stable 1.0.0 version; you can simply pull these changes.
New Contributors
- @rajeshmohapatra-ayla #407
- @AmazingAbhi #403
- @moltar #396 #395
- @nalz #392
- @rwatts3 6f81890
v1.0.0
Release Notes for Version 1.0.0
What's New
1. New Resources, Attributes, and Endpoints
- Finetuning: One of our exciting new features is automatically finetuning GPT-family models on your golden question/SQL pairs.
  - `POST /api/v1/finetuning`: By calling this endpoint you can create a finetuning job on your golden question/SQL pairs. The only required parameter is the `db_connection_id`, and you can optionally specify which golden question/SQL pairs to use for the finetuning process.
  - `GET /api/v1/finetuning/{finetuning_id}`: Retrieve the status of the finetuning process; once the status is SUCCEEDED you can use the model for SQL generation.
  - `POST /api/v1/finetuning/{finetuning_id}/cancel`: Cancel the finetuning for whatever reason.
  - `GET /api/v1/finetuning`: List all of the finetuned models for a given `db_connection_id`.
  - `DELETE /api/v1/finetuning/{finetuning_id}`: Delete a given finetuned model from the finetunings collection.
- Metadata: All resources now include a `metadata` attribute, allowing you to store additional information for internal purposes. Soon, GET list endpoints will support filtering based on metadata fields.
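The finetuning lifecycle above (create a job, poll its status, use the model once it reaches SUCCEEDED) can be sketched as a small polling loop. This is a sketch only: `get_status` is an injected callable standing in for a real `GET /api/v1/finetuning/{finetuning_id}` request, and the terminal statuses beyond SUCCEEDED are assumptions.

```python
import time

# Assumed terminal set; the source only documents SUCCEEDED explicitly.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELLED"}

def wait_for_finetuning(get_status, finetuning_id, poll_seconds=5.0):
    """Poll the finetuning status until it reaches a terminal state.

    get_status: callable taking a finetuning id and returning its status
    string, standing in for a GET /api/v1/finetuning/{finetuning_id} call.
    """
    while True:
        status = get_status(finetuning_id)
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
```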
2. Resource and Endpoint Changes
- Renaming `questions` to `prompts`: The entity has been renamed to `Prompt`, and the collection is now called `prompts`. You can use the following endpoints to interact with this resource:
  - `GET /api/v1/prompts`: List all existing prompts.
  - `POST /api/v1/prompts`: Create a new prompt.
  - `GET /api/v1/prompts/{prompt_id}`: Retrieve a specific prompt.
  - `PUT /api/v1/prompts/{prompt_id}`: Update the metadata for a prompt.
- Splitting `responses` into `sql_generations` and `nl_generations`: The previous `responses` resource has been divided into `sql_generations` and `nl_generations`. You can work with them as follows:
  - `POST /api/v1/prompts/{prompt_id}/sql-generations`: Create a sql-generation from an existing prompt.
  - `POST /api/v1/prompts/sql-generations`: Create a new prompt and a sql-generation.
  - `GET /api/v1/prompts/sql-generations`: List sql-generations.
  - `GET /api/v1/sql-generations/{sql_generation_id}`: Retrieve a specific sql-generation.
  - `PUT /api/v1/sql-generations/{sql_generation_id}`: Update the metadata for a sql-generation.
  - `GET /api/v1/sql-generations/{sql_generation_id}/execute`: Execute the created SQL and retrieve the result.
  - `GET /api/v1/sql-generations/{sql_generation_id}/csv-file`: Execute the created SQL and generate a CSV file from the result.
  - `POST /api/v1/sql-generations/{sql_generation_id}/nl-generations`: Create an nl-generation from an existing sql-generation.
  - `POST /api/v1/prompts/{prompt_id}/sql-generations/nl-generations`: Create a sql-generation and an nl-generation from an existing prompt.
  - `POST /api/v1/prompts/sql-generations/nl-generations`: Create a prompt, a sql-generation, and an nl-generation.
  - `GET /api/v1/nl-generations`: List all nl-generations.
  - `GET /api/v1/nl-generations/{nl_generation_id}`: Retrieve a specific nl-generation.
  - `PUT /api/v1/nl-generations/{nl_generation_id}`: Update the metadata for an nl-generation.
- Renaming `golden_records` to `golden_sqls`: We've updated the name for all endpoints, entities, and collections.
3. Migration Script
- Purpose: To facilitate a smooth transition from version 0.0.5 to version 1.0.0, we've introduced a migration script.
- Data Modifications: The script performs the following actions:
- Renames the `golden_records` collection to `golden_sqls`.
- Replaces all related data types from `ObjectId` to strings.
- Updates table descriptions by changing "SYNCHRONIZED" status to "SCANNED" and "NOT_SYNCHRONIZED" to "NOT_SCANNED".
- Utilizes the existing `questions` collection to create the `prompts` collection.
- Converts the `responses` collection into the `sql_generations` and `nl_generations` collections.
To run the migration script, use the following command:
docker-compose exec app python3 -m dataherald.scripts.migrate_v006_to_v100
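The status rename and ObjectId-to-string conversion can be pictured as a per-document transform over `table_descriptions`. This is a sketch only: `migrate_table_description` is a hypothetical name, and the real script also handles the other collections.

```python
STATUS_MAP = {"SYNCHRONIZED": "SCANNED", "NOT_SYNCHRONIZED": "NOT_SCANNED"}

def migrate_table_description(doc: dict) -> dict:
    """Rename statuses and stringify ObjectId-like ids on one document."""
    out = dict(doc)
    if out.get("status") in STATUS_MAP:
        out["status"] = STATUS_MAP[out["status"]]
    # ObjectId values become plain strings (str() on an ObjectId yields its hex id)
    for key in ("_id", "db_connection_id"):
        if key in out:
            out[key] = str(out[key])
    return out
```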
We hope that these changes enhance your experience with our platform. If you have any questions or encounter any issues, please don't hesitate to reach out to our support team.
v0.0.6
What's Changed
1. Changes in POST /api/v1/responses endpoint:
If the sql_query body parameter is not set, the response is regenerated. This process generates new values for sql_query, sql_result, and response.
2. Introducing the generate_csv flag:
The generate_csv flag is a parameter that allows the generation of a CSV file populated with the sql_query_result rows. This parameter can be set in both POST /api/v1/responses and POST /api/v1/questions endpoints.
- If the file is created, the response will include the field `csv_file_path`. For example: `"csv_file_path": "s3://k2-core/c6ddccfc-f355-4477-a2e7-e43f77e31bbb.csv"`
- Additionally, if the `generate_csv` flag is set to `True`, the `sql_query_result` will return `NULL` when it contains more than 50 rows.
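The row-limit behaviour described above can be sketched as follows. `build_query_response` is a hypothetical helper: the real service uploads the CSV to S3 (elided here), and the example path is illustrative.

```python
import csv
import io

ROW_LIMIT = 50  # documented cutoff: larger results return NULL when a CSV is made

def build_query_response(rows, generate_csv: bool):
    """Return (sql_query_result, csv_file_path) following the generate_csv rules."""
    csv_file_path = None
    if generate_csv:
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        # the real service uploads buf.getvalue() to S3; this path is illustrative
        csv_file_path = "s3://example-bucket/result.csv"
        if len(rows) > ROW_LIMIT:
            return None, csv_file_path  # sql_query_result is NULL for big results
    return rows, csv_file_path
```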
3. Configure S3 Credentials:
- You have the flexibility to set your S3 credentials to store the CSV files within the `POST /api/v1/database-connections` endpoint as follows:

"file_storage": {
  "name": "string",
  "access_key_id": "string",
  "secret_access_key": "string",
  "region": "string",
  "bucket": "string"
}

- If S3 credentials are not specified within the `db_connection`, the system will use the S3 credentials from your environment variables, as set in your `.env` file.
These changes will improve the consistency and maintainability of your application's data structures and APIs. If you encounter any issues during the upgrade process, please don't hesitate to reach out to our support team.
v0.0.5
What's Changed
1. Endpoint Update
- Affected Endpoints: The changes impact two API endpoints:
  - `POST /api/v1/database-connections`: This endpoint is used to create a database connection.
  - `PUT /api/v1/database-connections/{db_connection_id}`: This endpoint is used to update a database connection.
- Change Description: The `llm_credentials` object in these endpoints has been replaced with the `llm_api_key` field, which only accepts string values. In other words, the `llm_credentials` field has been removed in favor of a simpler `llm_api_key` string field, a more straightforward approach to managing API keys within the system.
2. Migration Script
- Purpose: A migration script has been introduced to assist users in smoothly transitioning their data from version 0.0.4 to version 0.0.5.
- Data Modification: This script operates on the `database_connections` collection and replaces the `llm_credentials` field with the `llm_api_key` field, but only when `llm_credentials` is populated; existing credentials are transferred to the new field.
To run the migration script, use the following command:
docker-compose exec app python3 -m dataherald.scripts.migrate_v004_to_v005
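The field replacement the script performs can be sketched per document. This assumes `llm_credentials` stored the key under an `api_key` entry; the real script's internal shape may differ.

```python
def migrate_llm_credentials(doc: dict) -> dict:
    """Replace llm_credentials with a plain-string llm_api_key, if populated."""
    out = dict(doc)
    credentials = out.pop("llm_credentials", None)
    if credentials:
        # assumed shape: {"api_key": "..."}; unpopulated docs are left unchanged
        out["llm_api_key"] = credentials.get("api_key")
    return out
```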
These changes will improve the consistency and maintainability of your application's data structures and APIs. If you encounter any issues during the upgrade process, please don't hesitate to reach out to our support team.
v0.0.4
What's Changed f57fde5
1. Endpoint Renaming
We have streamlined our API endpoints for better consistency and clarity:
Renamed Endpoints:
- POST /api/v1/nl-query-responses is now POST /api/v1/responses.
- POST /api/v1/question is now POST /api/v1/questions.
2. Endpoint Removal
In this version, we have removed the following endpoint:
- PATCH /api/v1/nl-query-responses/{query_id}.
Note: Responses resources are now immutable, so you can only create new responses and not update existing ones.
3. MongoDB Collection and Field Renaming
To improve consistency and readability, we have renamed MongoDB collection and field names:
Collection Name Changes:
- nl_questions collection has been renamed to questions.
- nl_query_responses collection has been renamed to responses.
Field Name Changes (within the responses collection):
- nl_question_id has been renamed to question_id.
- nl_response has been renamed to response.
4. Use of ObjectId for Foreign Keys
To enhance data integrity and relationships, we have transitioned to using ObjectId types for foreign keys, providing stronger data typing.
5. Migration Script
We've created a migration script to help you smoothly transition your data from version 0.0.3 to version 0.0.4. This script updates collection names, field names, and foreign keys data type to ObjectId. To run the migration script, use the following command:
docker-compose exec app python3 -m dataherald.scripts.migrate_v003_to_v004
Upgrade Instructions:
To upgrade to Version 0.0.4, follow these steps:
- Ensure you have Docker Compose installed.
- Pull the latest version of the application.
- Run the provided migration script as shown above.
These changes will improve the consistency and maintainability of your application's data structures and APIs. If you encounter any issues during the upgrade process, please don't hesitate to reach out to our support team.
v0.0.3
What's Changed
1. Validate Database Connection Requests 5937b35
- When a database connection is created or updated, it now attempts to establish a connection.
- If the connection is successfully established, it is stored and a `200` response is returned.
- In case of failure, a `400` error response is generated.
2. Add LLM Credentials to Database Connection Endpoints 2d9e873
- With the latest update, when creating or updating a database connection, you have the option to set LLM credentials, allowing you to use different keys for different connections.
3. SSH Connection Update a66f7d8
- We have discontinued the use of the `private_key_path` field for SSH connections.
- Instead, we now utilize the `path_to_credentials_file` field to specify the path to the SSH private key file.
4. Enhanced Table Scanning with Background Tasks fdc3bb7
- We have implemented background tasks for asynchronous table scanning.
- The endpoint name has been updated from `/api/v1/table-descriptions/scan` to `/api/v1/table-descriptions/sync-schemas`.
- This enhancement ensures that even if the process is slow, potentially taking several minutes, the HTTP response remains fast and responsive.
5. Returns Scanned Tables and Not Scanned Tables 9e2d119
- The `/api/v1/table-descriptions` endpoint makes a db connection to retrieve all the table names and checks which tables have been scanned to generate a response.
- The status can be:
  - `NOT_SYNCHRONIZED` if the table has not been scanned
  - `SYNCHRONIZING` while the sync schema process is running
  - `DEPRECATED` if there is a row in our `table-descriptions` collection that is no longer in the database, probably because the table/view was deleted or renamed
  - `SYNCHRONIZED` when we have scanned the table
  - `FAILED` if anything failed during the sync schema process; the `error_message` field stores the error
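The status rules above can be expressed as a small decision function. This is a minimal sketch with hypothetical names (`resolve_status`, the `sync_in_progress` flag); the real resolution logic lives in the scanner.

```python
from typing import Optional

def resolve_status(table_in_db: bool, record: Optional[dict]) -> str:
    """Map a table's presence and its stored description record to a status."""
    if record is None:
        return "NOT_SYNCHRONIZED"   # never scanned
    if not table_in_db:
        return "DEPRECATED"         # row exists but the table/view is gone
    if record.get("error_message"):
        return "FAILED"
    if record.get("sync_in_progress"):  # hypothetical flag for a running sync
        return "SYNCHRONIZING"
    return "SYNCHRONIZED"
```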
6. Migration Script from v0.0.2 to v0.0.3 9e2d119
- This script facilitates the transition from version v0.0.2 to v0.0.3 by performing the following essential task:
In the table_descriptions collection, it updates the status field to the value SYNCHRONIZED.
To execute the script, simply run the following command:
docker-compose exec app python3 -m dataherald.scripts.migrate_v002_to_v003
v0.0.2
What's Changed
1. RESTful Endpoint Names and Swagger Grouping
We have made significant changes to our endpoint naming conventions, following RESTful principles. Additionally, we have organized the endpoints into logical sections within our Swagger documentation for easier navigation and understanding.
2. MongoDB Collection Name Changes
We have updated the names of several MongoDB collections. Here are the collection name changes:
- nl_query_response ➡️ nl_query_responses
- nl_question ➡️ nl_questions
- database_connection ➡️ database_connections
- table_schema_detail ➡️ table_descriptions
3. Migration to db_connection_id for MongoDB Collections
Previously, we used a db_alias field to relate MongoDB collections. In this release, we have transitioned to using a new field called db_connection_id to establish relationships between collections.
4. Renamed Core Methods for Code Clarity
To improve the clarity of our codebase, we have renamed several core methods.
5. Migration Script from v0.0.1 to v0.0.2
We understand the importance of a smooth transition between versions. This script performs the following actions:
- Adds the db_connection_id relation for all MongoDB collections.
- Renames all MongoDB collection names to align with the new naming conventions.
- Deletes the Vector store data (Pinecone or Chroma) and utilizes the golden_records collection to upload the data seamlessly.
To execute the script just run this command
docker-compose exec app python3 -m dataherald.scripts.migrate_v001_to_v002
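The db_connection_id step above can be pictured as a per-document transform. This is a sketch: `add_db_connection_id` is a hypothetical helper, and `alias_to_id` would be built by mapping each `db_alias` to its connection's id from the `database_connections` collection.

```python
def add_db_connection_id(doc: dict, alias_to_id: dict) -> dict:
    """Swap the legacy db_alias field for the new db_connection_id relation."""
    out = dict(doc)
    alias = out.pop("db_alias", None)
    if alias is not None:
        out["db_connection_id"] = alias_to_id[alias]
    return out
```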