Wrap all create/update connectors with generic tools #49

Merged

Conversation

@plutasnyy (Collaborator) commented Apr 3, 2025

This PR wraps the create/update connector tools into generic tools. It reduces the number of tokens in the context by roughly 5k:

# Total tokens: 16665 - main
# Total tokens: 11816 - pluto/reduce_prompt_length_4_unify_source_and_destination

I have verified this by running all notebooks and by exercising invalid usage in debug mode, and it LGTM. I encourage reviewers to do extra testing ;)
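
For anyone who wants to reproduce a comparison like the one above, here is a minimal token-counting sketch using tiktoken; the encoding choice and the JSON serialization of the tool definitions are assumptions, not necessarily how the numbers here were produced:

# Hypothetical token-counting sketch; cl100k_base and JSON serialization are assumed.
import json

import tiktoken


def count_tool_tokens(tool_schemas: list[dict]) -> int:
    encoding = tiktoken.get_encoding("cl100k_base")
    # Serialize all tool definitions roughly the way they would appear in the prompt context.
    return len(encoding.encode(json.dumps(tool_schemas)))


# Toy usage: compare the schema payload before and after wrapping the tools.
print(count_tool_tokens([{"name": "create_destination_connector", "description": "..."}]))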


@plutasnyy plutasnyy marked this pull request as ready for review April 3, 2025 16:07
@plutasnyy plutasnyy changed the title Squeeze connectors into one tool Wrap all create/update connectors with generic tools Apr 3, 2025
@plutasnyy plutasnyy requested a review from MKhalusova April 3, 2025 16:14
@ryannikolaidis commented:

> I have verified this by running all notebooks and by exercising invalid usage in debug mode, and it LGTM. I encourage reviewers to do extra testing ;)

Always game for manual testing, but checking: do we not have anything automated in place that will catch issues?

@plutasnyy (Collaborator, Author) commented:

Unfortunately, there is no such thing :( We should probably add some.

@plutasnyy plutasnyy self-assigned this Apr 7, 2025
@plutasnyy plutasnyy requested a review from ryannikolaidis April 7, 2025 12:52
@plutasnyy (Collaborator, Author) commented:

TODO:
The release job is triggered AFTER the merge to main, so comparing the version on the branch against the main branch always shows them as equal, i.e. no change is detected.
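
As a possible follow-up, a minimal sketch of a pre-merge check that compares the branch version against main; it assumes the version lives under [project] in pyproject.toml and that origin/main is fetched locally, and it is only an illustration, not the existing release job:

# Hypothetical pre-merge version check; the pyproject.toml layout is assumed.
import subprocess
import tomllib


def branch_version() -> str:
    with open("pyproject.toml", "rb") as f:
        return tomllib.load(f)["project"]["version"]


def main_version() -> str:
    content = subprocess.run(
        ["git", "show", "origin/main:pyproject.toml"],
        capture_output=True, text=True, check=True,
    ).stdout
    return tomllib.loads(content)["project"]["version"]


if branch_version() == main_version():
    raise SystemExit(f"Version {branch_version()} was not bumped relative to main")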

Comment on lines +40 to +208
"databricks_volumes": create_databricks_volumes_destination,
"mongodb": create_mongodb_destination,
"neo4j": create_neo4j_destination,
"pinecone": create_pinecone_destination,
"s3": create_s3_destination,
"weaviate": create_weaviate_destination,
}

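    # Dispatch to the matching create helper; unsupported types fall through to the error below.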
    if destination_type in destination_functions:
        destination_function = destination_functions[destination_type]
        return await destination_function(ctx=ctx, name=name, **type_specific_config)

    return (
        f"Unsupported destination type: {destination_type}. "
        f"Please use a supported destination type {list(destination_functions.keys())}."
    )


async def update_destination_connector(
    ctx: Context,
    destination_id: str,
    destination_type: Literal[
        "astradb",
        "databricks_delta_table",
        "databricks_volumes",
        "mongodb",
        "neo4j",
        "pinecone",
        "s3",
        "weaviate",
    ],
    type_specific_config: dict[str, Any],
) -> str:
    """Update a destination connector based on type.

    Args:
        ctx: Context object with the request and lifespan context
        destination_id: ID of the destination connector to update
        destination_type: The type of destination being updated

        type_specific_config:
            astradb:
                collection_name: (Optional[str]): The AstraDB collection name
                keyspace: (Optional[str]): The AstraDB keyspace
                batch_size: (Optional[int]) The batch size for inserting documents
            databricks_delta_table:
                catalog: (Optional[str]): Name of the catalog in Databricks Unity Catalog
                database: (Optional[str]): The database in Unity Catalog
                http_path: (Optional[str]): The cluster’s or SQL warehouse’s HTTP Path value
                server_hostname: (Optional[str]): The Databricks cluster’s or SQL warehouse’s
                    Server Hostname value
                table_name: (Optional[str]): The name of the table in the schema
                volume: (Optional[str]): Name of the volume associated with the schema.
                schema: (Optional[str]) Name of the schema associated with the volume
                volume_path: (Optional[str]) Any target folder path within the volume, starting
                    from the root of the volume.
            databricks_volumes:
                catalog: (Optional[str]): Name of the catalog in Databricks
                host: (Optional[str]): The Databricks host URL
                volume: (Optional[str]): Name of the volume associated with the schema
                schema: (Optional[str]) Name of the schema associated with the volume. The default
                    value is "default".
                volume_path: (Optional[str]) Any target folder path within the volume,
                    starting from the root of the volume.
            mongodb:
                database: (Optional[str]): The name of the MongoDB database
                collection: (Optional[str]): The name of the MongoDB collection
            neo4j:
                database: (Optional[str]): The Neo4j database, e.g. "neo4j"
                uri: (Optional[str]): The Neo4j URI
                    e.g. neo4j+s://<neo4j_instance_id>.databases.neo4j.io
                batch_size: (Optional[int]) The batch size for the connector
            pinecone:
                index_name: (Optional[str]): The Pinecone index name
                namespace: (Optional[str]) The pinecone namespace, a folder inside the
                    pinecone index
                batch_size: (Optional[int]) The batch size
            s3:
                remote_url: (Optional[str]): The S3 URI to the bucket or folder
            weaviate:
                cluster_url: (Optional[str]): URL of the Weaviate cluster
                collection: (Optional[str]): Name of the collection in the Weaviate cluster

    Returns:
        String containing the updated destination connector information
    """
    update_functions = {
        "astradb": update_astradb_destination,
        "databricks_delta_table": update_databricks_delta_table_destination,
        "databricks_volumes": update_databricks_volumes_destination,
        "mongodb": update_mongodb_destination,
        "neo4j": update_neo4j_destination,
        "pinecone": update_pinecone_destination,
        "s3": update_s3_destination,
        "weaviate": update_weaviate_destination,
    }

    if destination_type in update_functions:
        update_function = update_functions[destination_type]
        return await update_function(ctx=ctx, destination_id=destination_id, **type_specific_config)


What's the long-term vision? Are we going to keep extending this across all destinations? This doesn't feel great as far as scaling out, but no need to block here.

@plutasnyy (Collaborator, Author) commented:

To be honest, I don't know if we would scale drastically with anything; I think the crucial thing to add would be the ability to select a subset of tools. Then we could split them across different dimensions, e.g., an s3_connector that can perform CRUD on S3 sources and destinations, as sketched below.
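
A rough sketch of what that per-connector split could look like; the source-side helpers named below are assumptions for illustration (only the destination create/update helpers exist in this PR), and the imports of those helpers and of Context are omitted:

# Hypothetical per-connector tool covering several operations and both directions.
from typing import Any, Literal


async def s3_connector(
    ctx: Context,
    operation: Literal["create", "update"],
    kind: Literal["source", "destination"],
    config: dict[str, Any],
) -> str:
    handlers = {
        ("create", "destination"): create_s3_destination,
        ("update", "destination"): update_s3_destination,
        # ("create", "source"): create_s3_source,  # assumed helper, not part of this PR
        # ("update", "source"): update_s3_source,  # assumed helper, not part of this PR
    }
    handler = handlers.get((operation, kind))
    if handler is None:
        return f"Unsupported combination: {operation} {kind}"
    # For updates, config would also carry the connector id.
    return await handler(ctx=ctx, **config)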

@plutasnyy (Collaborator, Author) commented:

Or, for example, expose only the tools that have credentials provided.
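
A minimal sketch of that idea, with the environment variable names and the notion of a registration step being assumptions for illustration:

# Hypothetical credential check used to decide which connector tools to expose.
import os

REQUIRED_CREDS = {
    "s3": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "mongodb": ["MONGODB_URI"],
    "pinecone": ["PINECONE_API_KEY"],
}


def available_connectors() -> list[str]:
    # Only connectors whose required environment variables are all set.
    return [
        name
        for name, env_vars in REQUIRED_CREDS.items()
        if all(os.environ.get(var) for var in env_vars)
    ]


# e.g. register create/update tools only for available_connectors()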

@ryannikolaidis commented:

> Unfortunately, there is no such thing :( We should probably add some.

Yea, agree, feels like table stakes. Maybe stub a ticket just as a reminder/placeholder?
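
For reference, a minimal sketch of what such an automated check could look like, assuming the generic create tool mirrors the update variant shown in the diff (its exact signature is not visible in the excerpt) and passing None for ctx, which the unsupported-type branch never touches:

# Hypothetical test; the import path and the tool's signature are assumed.
import asyncio

# from <connectors_module> import create_destination_connector  # module path not shown in this PR


def test_unsupported_destination_type():
    result = asyncio.run(
        create_destination_connector(
            ctx=None,
            destination_type="not_a_real_type",
            name="test",
            type_specific_config={},
        )
    )
    assert "Unsupported destination type" in result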

@plutasnyy plutasnyy merged commit dd77693 into main Apr 8, 2025
2 checks passed
@plutasnyy plutasnyy deleted the pluto/reduce_prompt_length_4_unify_source_and_destination branch April 8, 2025 10:54