feat(alias core): add data source management #110

Merged
xieyxclack merged 19 commits into agentscope-ai:main from StCarmen:main on Feb 2, 2026

@SSSuperDan
Contributor

📝 PR Type

  • Add new sample
  • Update existing sample
  • Add new test cases
  • Fix test failures
  • Documentation/Configuration update

📚 Description

This PR introduces data source management functionality to alias, which enables users to configure data sources flexibly. This component automatically connects to, inspects, and profiles various data sources (e.g., CSV, JSON, relational databases) for use in agents.
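
As a rough illustration of what "inspecting and profiling" a CSV source can mean, here is a minimal standard-library sketch. The names `profile_csv` and `infer_type` are hypothetical and are not taken from this PR; the actual component is described as LLM-driven and covers more source types.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "empty"
    # Try the narrowest type first, then fall back to wider ones.
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def profile_csv(text):
    """Return column names, row count, and a guessed type per column.

    Hypothetical helper for illustration only; not the PR's API.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    return {
        "columns": columns,
        "row_count": len(rows),
        "types": {c: infer_type([r.get(c) for r in rows]) for c in columns},
    }


sample = "id,price,name\n1,2.5,a\n2,3,b\n"
profile = profile_csv(sample)
print(profile)
```

A profile like this is the kind of compact summary an agent could receive in place of the raw file.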


✅ Checklist

Please complete the following checks before submitting the PR:

  • All sample code has been formatted with pre-commit run --all-files
  • Test coverage has not decreased (if applicable)
  • Related documentation in agentscope-samples has been updated (e.g., README.md)

@SSSuperDan SSSuperDan requested a review from a team January 22, 2026 11:20
@cla-assistant

cla-assistant bot commented Jan 22, 2026

CLA assistant check
All committers have signed the CLA.

@cla-assistant

cla-assistant bot commented Jan 22, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ SSSuperDan
✅ StCarmen
❌ LittleHealth


LittleHealth does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@gemini-code-assist

Summary of Changes

Hello @SSSuperDan, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the alias core by introducing a robust data source management system. This system empowers agents to seamlessly interact with a diverse range of data formats, from local files to relational databases, by automatically profiling their structure and dynamically providing context-specific handling guidelines. The changes streamline data ingestion and processing workflows, making agents more versatile and efficient in data-intensive tasks.

Highlights

  • New Data Source Management System: Introduced a comprehensive data source management system that allows agents to flexibly configure, connect to, inspect, and profile various data sources, including CSV, Excel, Image, JSON, Text files, and Relational Databases.
  • LLM-Driven Data Profiling: Implemented LLM-driven data profiling for different source types, enabling automatic extraction of schema, metadata, and descriptive summaries, including specialized handling for irregular Excel files.
  • Dynamic Data Skill Loading: Developed a DataSkillManager to dynamically load relevant data handling skills and guidelines for agents based on the identified data source types, enhancing the agent's contextual awareness.
  • Refactored DataScienceAgent: The DataScienceAgent has been refactored to integrate with the new data source manager, allowing it to receive data source configurations and dynamic skills directly.
  • Enhanced CLI for Data Input: Updated the command-line interface (CLI) to support --datasource for specifying data source endpoints (files, URLs, DB connection strings) and --dataconfig for providing a JSON configuration file for data sources.
  • Improved Report Generation: Modified the report generation mechanism to produce both Markdown and HTML versions of detailed reports, saving them to the workspace for easier access and review.
  • Dependency Updates and Nginx Configuration: Added polars for efficient data handling, updated pydantic and mcp dependencies, and configured Nginx with longer proxy timeouts to support potentially long-running operations.
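
To make the CLI highlight above concrete, here is a minimal `argparse` sketch. Only the flag names `--datasource` and `--dataconfig` come from the summary; the program name, the choice to let `--datasource` accept several values, the `classify_endpoint` helper, and the list of database schemes are all assumptions made for illustration and may differ from the merged code.

```python
import argparse
from urllib.parse import urlparse

# Assumed set of database URL schemes; not taken from the PR.
DB_SCHEMES = {"postgresql", "mysql", "sqlite"}


def classify_endpoint(endpoint):
    """Label an endpoint as 'database', 'url', or 'file' (hypothetical helper)."""
    scheme = urlparse(endpoint).scheme.lower()
    # Strip an optional driver suffix such as "postgresql+psycopg2".
    base = scheme.split("+")[0]
    if base in DB_SCHEMES:
        return "database"
    if scheme in {"http", "https"}:
        return "url"
    return "file"


def build_parser():
    """Build a parser exposing the two flags named in the summary."""
    parser = argparse.ArgumentParser(prog="alias-sketch")
    parser.add_argument(
        "--datasource",
        nargs="*",
        default=[],
        help="data source endpoints: files, URLs, or DB connection strings",
    )
    parser.add_argument(
        "--dataconfig",
        default=None,
        help="path to a JSON configuration file for data sources",
    )
    return parser


args = build_parser().parse_args(
    ["--datasource", "data/sales.csv", "https://example.com/d.json", "sqlite:///tmp/x.db"]
)
for endpoint in args.datasource:
    print(endpoint, "->", classify_endpoint(endpoint))
```

Classifying by URL scheme keeps the sketch short; a real implementation would likely also inspect file extensions to tell CSV, Excel, JSON, and image inputs apart.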
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive data source management feature, which is a significant and well-architected enhancement. The core of this change is the DataSourceManager, which capably handles various data sources by profiling them, preparing them for agent use, and loading relevant skills. The command-line interface has also been improved to support these new capabilities. While the overall structure is solid, I've identified several areas for improvement in the new data profiling logic, mainly concerning error handling and robustness. Addressing these points will help prevent silent failures and make the system more resilient.
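
The inline review comments are not reproduced here, so the following is only a generic sketch of the kind of change the review points toward: recording and logging each profiling failure instead of swallowing it. `profile_all` and its `profile_fn` parameter are hypothetical names, not part of the PR.

```python
import logging

logger = logging.getLogger("profiling_sketch")


def profile_all(sources, profile_fn):
    """Profile each source, keeping failures visible instead of silent.

    Returns (profiles, failures): successful results keyed by source name,
    and an error description for every source that raised.
    """
    profiles, failures = {}, {}
    for name in sources:
        try:
            profiles[name] = profile_fn(name)
        except Exception as exc:  # one bad source should not stop the rest
            logger.warning("Profiling failed for %s: %s", name, exc)
            failures[name] = f"{type(exc).__name__}: {exc}"
    return profiles, failures


def fake_profile(name):
    """Stand-in profiler that fails for one source."""
    if name == "bad.csv":
        raise ValueError("unreadable header")
    return {"columns": ["id"]}


profiles, failures = profile_all(["good.csv", "bad.csv"], fake_profile)
print(profiles)
print(failures)
```

Returning the failures alongside the results lets the caller decide whether to abort, retry, or tell the agent that a source could not be profiled.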

Member

@xieyxclack xieyxclack left a comment

Please refer to the inline comments.

@xieyxclack xieyxclack merged commit 679431b into agentscope-ai:main Feb 2, 2026
2 checks passed