Wrap all create/update connectors with generic tools #49
Conversation
Always game for manual testing, but checking: do we not have anything automated in place that will catch issues?
Unfortunately, there is no such thing :( We probably should add some.
TODO:
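For reference, a minimal sketch of the kind of automated check discussed here, assuming pytest with pytest-asyncio; `your_module` is a placeholder for wherever the generic tools live, and the create tool's signature is assumed to mirror the update tool shown in the diff below:

# Minimal sketch of an automated check for the unsupported-type fallback,
# assuming pytest + pytest-asyncio. The module path and the create tool's
# exact signature are assumptions, not part of this PR.
from unittest.mock import MagicMock

import pytest

from your_module import create_destination_connector  # placeholder import path


@pytest.mark.asyncio
async def test_unsupported_destination_type_returns_error():
    # The unsupported-type branch never touches the context, so a mock is enough.
    result = await create_destination_connector(
        ctx=MagicMock(),
        name="test-destination",
        destination_type="not-a-real-type",  # deliberately outside the Literal
        type_specific_config={},
    )
    assert "Unsupported destination type" in result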
"databricks_volumes": create_databricks_volumes_destination, | ||
"mongodb": create_mongodb_destination, | ||
"neo4j": create_neo4j_destination, | ||
"pinecone": create_pinecone_destination, | ||
"s3": create_s3_destination, | ||
"weaviate": create_weaviate_destination, | ||
} | ||
|
||
if destination_type in destination_functions: | ||
destination_function = destination_functions[destination_type] | ||
return await destination_function(ctx=ctx, name=name, **type_specific_config) | ||
|
||
return ( | ||
f"Unsupported destination type: {destination_type}. " | ||
f"Please use a supported destination type {list(destination_functions.keys())}." | ||
) | ||
|
||
|
||
async def update_destination_connector( | ||
ctx: Context, | ||
destination_id: str, | ||
destination_type: Literal[ | ||
"astradb", | ||
"databricks_delta_table", | ||
"databricks_volumes", | ||
"mongodb", | ||
"neo4j", | ||
"pinecone", | ||
"s3", | ||
"weaviate", | ||
], | ||
type_specific_config: dict[str, Any], | ||
) -> str: | ||
"""Update a destination connector based on type. | ||
|
||
Args: | ||
ctx: Context object with the request and lifespan context | ||
destination_id: ID of the destination connector to update | ||
destination_type: The type of destination being updated | ||
|
||
type_specific_config: | ||
astradb: | ||
collection_name: (Optional[str]): The AstraDB collection name | ||
keyspace: (Optional[str]): The AstraDB keyspace | ||
batch_size: (Optional[int]) The batch size for inserting documents | ||
databricks_delta_table: | ||
catalog: (Optional[str]): Name of the catalog in Databricks Unity Catalog | ||
database: (Optional[str]): The database in Unity Catalog | ||
http_path: (Optional[str]): The cluster’s or SQL warehouse’s HTTP Path value | ||
server_hostname: (Optional[str]): The Databricks cluster’s or SQL warehouse’s | ||
Server Hostname value | ||
table_name: (Optional[str]): The name of the table in the schema | ||
volume: (Optional[str]): Name of the volume associated with the schema. | ||
schema: (Optional[str]) Name of the schema associated with the volume | ||
volume_path: (Optional[str]) Any target folder path within the volume, starting | ||
from the root of the volume. | ||
databricks_volumes: | ||
catalog: (Optional[str]): Name of the catalog in Databricks | ||
host: (Optional[str]): The Databricks host URL | ||
volume: (Optional[str]): Name of the volume associated with the schema | ||
schema: (Optional[str]) Name of the schema associated with the volume. The default | ||
value is "default". | ||
volume_path: (Optional[str]) Any target folder path within the volume, | ||
starting from the root of the volume. | ||
mongodb: | ||
database: (Optional[str]): The name of the MongoDB database | ||
collection: (Optional[str]): The name of the MongoDB collection | ||
neo4j: | ||
database: (Optional[str]): The Neo4j database, e.g. "neo4j" | ||
uri: (Optional[str]): The Neo4j URI | ||
e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io | ||
batch_size: (Optional[int]) The batch size for the connector | ||
pinecone: | ||
index_name: (Optional[str]): The Pinecone index name | ||
namespace: (Optional[str]) The pinecone namespace, a folder inside the | ||
pinecone index | ||
batch_size: (Optional[int]) The batch size | ||
s3: | ||
remote_url: (Optional[str]): The S3 URI to the bucket or folder | ||
weaviate: | ||
cluster_url: (Optional[str]): URL of the Weaviate cluster | ||
collection: (Optional[str]): Name of the collection in the Weaviate cluster | ||
|
||
Returns: | ||
String containing the updated destination connector information | ||
""" | ||
update_functions = { | ||
"astradb": update_astradb_destination, | ||
"databricks_delta_table": update_databricks_delta_table_destination, | ||
"databricks_volumes": update_databricks_volumes_destination, | ||
"mongodb": update_mongodb_destination, | ||
"neo4j": update_neo4j_destination, | ||
"pinecone": update_pinecone_destination, | ||
"s3": update_s3_destination, | ||
"weaviate": update_weaviate_destination, | ||
} | ||
|
||
if destination_type in update_functions: | ||
update_function = update_functions[destination_type] | ||
return await update_function(ctx=ctx, destination_id=destination_id, **type_specific_config) |
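For illustration, a sketch of how the generic update tool might be invoked from an async caller; the `ctx` object, the connector ID, and the config values are placeholders, not taken from this PR:

# Hypothetical call; `ctx` is assumed to be the MCP Context the server passes in.
result = await update_destination_connector(
    ctx=ctx,
    destination_id="<destination-connector-id>",  # placeholder ID
    destination_type="s3",
    # Only the keys documented above for the chosen type are forwarded as kwargs.
    type_specific_config={"remote_url": "s3://my-bucket/output/"},
)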
What's the long-term vision? Are we going to keep extending this across all destinations? This doesn't feel great as far as scaling out, but no need to block here.
To be honest, I don't know if we would scale this out drastically. I think the crucial thing to add would be the ability to select a subset of tools. Then we could split them across different dimensions, e.g., an s3_connector that can perform CRUD on S3 sources and destinations.
Or, for example, expose only tools that have credentials provided.
Yeah, agreed, feels like table stakes. Maybe stub a ticket just as a reminder/placeholder?
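For reference, a rough sketch of the "expose only tools with credentials" idea floated above; every name here (CONNECTOR_ENV_VARS, connectors_with_credentials, the env var names) is hypothetical and not part of this PR:

# Hypothetical mapping from connector type to the environment variables it
# needs; only connectors whose credentials are present would be registered.
import os

CONNECTOR_ENV_VARS = {
    "s3": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "pinecone": ["PINECONE_API_KEY"],
    "weaviate": ["WEAVIATE_API_KEY"],
}


def connectors_with_credentials() -> list[str]:
    """Return connector types whose required environment variables are all set."""
    return [
        connector
        for connector, env_vars in CONNECTOR_ENV_VARS.items()
        if all(os.environ.get(var) for var in env_vars)
    ]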
This PR wraps all create/update connector tools into generic tools. It reduces the number of tokens in the context by ~5k.
I have verified this by running all notebooks and by exercising invalid usage in debug mode, and it LGTM. I encourage reviewers to do extra testing ;)
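To illustrate the consolidation, a hedged sketch assuming the server registers tools via the MCP Python SDK's FastMCP decorator; the registration code is not part of this diff, the server name is a placeholder, and the type list is abbreviated:

# Instead of registering a separate create_* and update_* tool per connector
# (each adding its own schema to the model context), a single generic tool is
# registered per operation; destination_type selects the concrete connector.
from typing import Any, Literal

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("uns-mcp")  # placeholder server name


@mcp.tool()
async def create_destination_connector(
    name: str,
    destination_type: Literal["astradb", "s3", "weaviate"],  # abbreviated list
    type_specific_config: dict[str, Any],
) -> str:
    """Generic entry point that dispatches to the per-type create helpers."""
    ...  # dispatch as in the diff above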