A modern LangChain application using LangGraph agents for natural language access to the PubChem database. PubChemAgent supports multiple AI providers (OpenAI, Google Gemini, and Anthropic Claude) with configuration-based setup for easy customization.
- Natural Language Interface: Ask questions about chemical compounds in plain English
- Configuration-Based Setup: Easy configuration with config.toml files
- Multiple AI Providers: Support for OpenAI, Google Gemini, and Anthropic Claude
- Comprehensive Database: Access to millions of chemical compounds from PubChem
- Multiple Search Methods: Search by name, CID, SMILES, InChI, molecular formula, and more
- Rich Chemical Properties: Get molecular weight, XLogP, TPSA, and many other properties
- Structural Information: Retrieve SMILES strings, InChI identifiers, and molecular formulas
- Advanced Search: Perform substructure and similarity searches
- Identifier Conversion: Convert between different chemical identifier formats
- Multiple Interfaces: CLI, web interface, and programmatic API
# Clone the repository
git clone https://github.com/yourusername/PubChemAgent.git
cd PubChemAgent
# Install the package
pip install -e .
# Or install with optional dependencies
pip install -e ".[gemini,claude]"
PubChemAgent uses a config.toml file for configuration. Create a configuration file in one of these locations:
- Current directory: ./config.toml
- User config directory: ~/.pubchem_agent/config.toml
- User home directory: ~/config.toml
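The lookup can be pictured as a first-match search over these paths. This is a minimal sketch, assuming the locations are checked in the order listed and that an explicit path (such as the --config flag or the config_path argument) takes precedence; find_config is a hypothetical helper, not PubChemAgent's actual implementation.

```python
from pathlib import Path
from typing import List, Optional


def candidate_paths(cwd: Path, home: Path) -> List[Path]:
    """Assumed search order, mirroring the list of locations above."""
    return [
        cwd / "config.toml",
        home / ".pubchem_agent" / "config.toml",
        home / "config.toml",
    ]


def find_config(explicit: Optional[str] = None,
                cwd: Optional[Path] = None,
                home: Optional[Path] = None) -> Optional[Path]:
    """Return the config file to use, or None if no candidate exists."""
    if explicit is not None:
        # An explicitly requested file always wins (assumption); the caller
        # is expected to report an error if it does not exist.
        return Path(explicit)
    if cwd is None:
        cwd = Path.cwd()
    if home is None:
        home = Path.home()
    for path in candidate_paths(cwd, home):
        if path.is_file():
            return path
    return None
```

Passing cwd and home explicitly keeps the search easy to test without touching the real home directory.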
# Create a sample config file
pubchem-agent --create-config
This creates a config.toml file with the following structure:
[general]
default_provider = "openai"
temperature = 0.1
streaming = true
timeout = 30
[openai]
api_key = "your_openai_api_key_here"
model = "gpt-3.5-turbo"
base_url = "https://api.openai.com/v1"
temperature = 0.1
max_tokens = 1000
streaming = true
[gemini]
api_key = "your_gemini_api_key_here"
model = "gemini-pro"
temperature = 0.1
max_tokens = 1000
streaming = true
[claude]
api_key = "your_anthropic_api_key_here"
model = "claude-3-haiku-20240307"
temperature = 0.1
max_tokens = 1000
streaming = true
[pubchem]
base_url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
timeout = 10
max_retries = 3
[web]
port = 8501
host = "localhost"
page_title = "PubChemAgent"
page_icon = "🧪"
[logging]
level = "INFO"
file = ""
format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
PubChemAgent supports two methods for providing API keys:
- Configuration File (Recommended): Set API keys in config.toml
- Environment Variables (Fallback): Set environment variables for any keys not provided in the config file
Update the api_key values in your config.toml file:
[openai]
api_key = "sk-your-actual-openai-key"
[gemini]
api_key = "your-actual-gemini-key"
[claude]
api_key = "your-actual-anthropic-key"
If API keys are not set in the config file (or are set to placeholder values), PubChemAgent automatically falls back to reading them from environment variables:
# Set environment variables
export OPENAI_API_KEY="sk-your-actual-openai-key"
export GEMINI_API_KEY="your-actual-gemini-key"
export ANTHROPIC_API_KEY="your-actual-anthropic-key"
The system uses the following priority order for API keys:
- Config file values (if not empty or placeholder)
- Environment variables (fallback)
- Empty/placeholder values (will show as unavailable)
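The priority order above amounts to a small fallback chain. The sketch below assumes a placeholder is any empty value or a sample value of the form "your_..._here" (the README does not define the exact rule), and uses the environment variable names listed earlier; is_placeholder and resolve_api_key are hypothetical helpers.

```python
import os
from typing import Mapping, Optional

# Environment variable names taken from this README, keyed by config section.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
}


def is_placeholder(value: Optional[str]) -> bool:
    """Assumed rule: empty values and sample values like 'your_..._here' count as unset."""
    if value is None or not value.strip():
        return True
    v = value.strip().lower()
    return v.startswith("your_") and v.endswith("_here")


def resolve_api_key(provider: str,
                    config: Mapping[str, Mapping[str, str]],
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key for provider, or None if it should be shown as unavailable."""
    env = os.environ if env is None else env
    # 1. Config file value, unless empty or a placeholder.
    config_value = config.get(provider, {}).get("api_key")
    if not is_placeholder(config_value):
        return config_value.strip()
    # 2. Environment variable fallback.
    env_value = env.get(ENV_VARS.get(provider, ""))
    if not is_placeholder(env_value):
        return env_value.strip()
    # 3. Nothing usable: the provider is reported as unavailable.
    return None
```

Injecting env instead of always reading os.environ keeps the fallback logic testable.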
- OpenAI: Get your API key from OpenAI Platform
- Google Gemini: Get your API key from Google AI Studio
- Anthropic Claude: Get your API key from Anthropic Console
This hybrid approach lets you keep keys in config files during development and supply them through environment variables in production deployments.
PubChemAgent features a modern, visually enhanced CLI built on the Rich library, providing:
- Formatted output with colors and styling
- Structured tables for configuration and provider status
- Progress indicators for long-running queries
- Interactive prompts for a smoother user experience
- Organized panels for responses and error messages
# Interactive mode (uses default provider from config)
pubchem-agent
# Single query
pubchem-agent -q "What is the molecular weight of aspirin?"
# Use specific provider
pubchem-agent --provider gemini -q "Find information about caffeine"
# Use specific model
pubchem-agent --provider openai --model gpt-4 -q "Convert this SMILES to InChI: CC(=O)OC1=CC=CC=C1C(=O)O"
# Use custom config file
pubchem-agent --config my_config.toml
# Show examples
pubchem-agent --examples
# Show help
pubchem-agent --help
# Start the Streamlit web interface
streamlit run streamlit_app.py
Then open your browser to http://localhost:8501 to use the web interface.
from pubchem_agent import create_agent
# Use default configuration
agent = create_agent()
# Use specific provider
agent = create_agent(provider="gemini")
# Use specific model
agent = create_agent(provider="openai", model="gpt-4")
# Use custom config file
agent = create_agent(config_path="my_config.toml")
# Override configuration parameters
agent = create_agent(provider="claude", temperature=0.5)
# Query the agent
response = agent.query("What is the molecular weight of caffeine?")
print(response)
OpenAI models:
- gpt-3.5-turbo: Fast and economical
- gpt-4: Advanced reasoning capabilities
- gpt-4-turbo: Newer GPT-4 variant with enhanced capabilities
Google Gemini models:
- gemini-pro: Recommended for most tasks
- gemini-1.5-pro: Advanced, with a larger context window
Anthropic Claude models:
- claude-3-haiku-20240307: Fast responses
- claude-3-sonnet-20240229: Balanced performance
- claude-3-opus-20240229: Most capable
- "What is the molecular weight of aspirin?"
- "Find information about caffeine"
- "Convert this SMILES to InChI: CC(=O)OC1=CC=CC=C1C(=O)O"
- "What are the synonyms for compound with CID 2244?"
- "Get the structure of ibuprofen"
- "What is the TPSA of morphine?"
- "Find compounds similar to benzene"
- "What is the molecular formula of vitamin C?"
- "Get detailed properties for acetaminophen"
- "Find the InChI for paracetamol"
The agent has access to the following PubChem tools:
- search_compounds: Search for compounds by name, CID, SMILES, InChI, or formula
- get_compound_properties: Get basic molecular properties
- get_compound_synonyms: Find alternative names and synonyms
- get_compound_structure: Get structural information (SMILES, InChI, formula)
- get_compound_properties_detailed: Get detailed molecular descriptors
- convert_identifier: Convert between different chemical identifier formats
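These tools presumably wrap PubChem's PUG REST API, whose base URL appears in the [pubchem] config section. The sketch below builds lookups using PUG REST's compound/{namespace}/{identifier}/property/{names}/JSON pattern; the helper names are hypothetical and this is not the project's tools.py.

```python
import json
from typing import Any, Dict, List, Sequence
from urllib.parse import quote
from urllib.request import urlopen

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(namespace: str, identifier: str, properties: Sequence[str],
                 base_url: str = PUG_BASE) -> str:
    """Build e.g. .../compound/name/aspirin/property/MolecularWeight,XLogP/JSON."""
    # Percent-encode the identifier so SMILES characters such as '(' '=' '/'
    # stay inside a single path segment.
    ident = quote(identifier, safe="")
    return f"{base_url}/compound/{namespace}/{ident}/property/{','.join(properties)}/JSON"


def synonyms_url(cid: int, base_url: str = PUG_BASE) -> str:
    """Build the synonyms endpoint for a compound ID (CID)."""
    return f"{base_url}/compound/cid/{cid}/synonyms/JSON"


def fetch_properties(namespace: str, identifier: str, properties: Sequence[str],
                     timeout: float = 10.0) -> List[Dict[str, Any]]:
    """Call PUG REST and return the records under PropertyTable.Properties."""
    with urlopen(property_url(namespace, identifier, properties), timeout=timeout) as resp:
        data = json.load(resp)
    return data["PropertyTable"]["Properties"]
```

For instance, fetch_properties("smiles", "CC(=O)OC1=CC=CC=C1C(=O)O", ["InChI"]) corresponds to the SMILES-to-InChI example query (it needs network access). A fuller implementation may prefer HTTP POST for long or URL-unfriendly identifiers.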
General settings ([general]):
- default_provider: Default AI provider to use ("openai", "gemini", or "claude")
- temperature: Global temperature setting (0.0-2.0)
- streaming: Enable streaming responses
- timeout: API request timeout in seconds
Each provider section ([openai], [gemini], [claude]) supports:
- api_key: API key for the provider
- model: Model to use
- temperature: Temperature for this provider (overrides the global value)
- max_tokens: Maximum tokens for responses
- streaming: Enable streaming for this provider
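The override rules can be read as a three-layer merge: [general], then the provider section, then explicit arguments such as create_agent(provider="claude", temperature=0.5). This is a minimal sketch, assuming only temperature, streaming, and timeout are inherited from [general] and that None means "not given"; effective_settings is a hypothetical helper.

```python
from typing import Any, Dict, Mapping, Optional

# Keys in [general] that a provider section may override (assumption).
INHERITED_KEYS = ("temperature", "streaming", "timeout")


def effective_settings(config: Mapping[str, Mapping[str, Any]],
                       provider: Optional[str] = None,
                       **overrides: Any) -> Dict[str, Any]:
    """Merge [general] < [provider] < explicit keyword overrides."""
    general = config.get("general", {})
    provider = provider or general.get("default_provider", "openai")
    # Start from the global defaults.
    merged: Dict[str, Any] = {k: general[k] for k in INHERITED_KEYS if k in general}
    # The provider section wins over [general].
    merged.update(config.get(provider, {}))
    # Explicit arguments win over both; None means "not given".
    merged.update({k: v for k, v in overrides.items() if v is not None})
    merged["provider"] = provider
    temperature = merged.get("temperature")
    if temperature is not None and not 0.0 <= float(temperature) <= 2.0:
        raise ValueError(f"temperature {temperature} is outside 0.0-2.0")
    return merged
```

In practice the mapping would come from the standard library's tomllib.load (Python 3.11+), called on the config file opened in binary mode.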
PubChem settings ([pubchem]):
- base_url: PubChem API base URL
- timeout: Request timeout for PubChem API calls, in seconds
- max_retries: Maximum number of retries for failed requests
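The timeout and max_retries settings suggest a retry loop around each PubChem request. This is a minimal sketch, assuming exponential backoff between attempts (the README does not state a delay strategy) and treating max_retries as the number of extra attempts after the first; with_retries is a hypothetical helper.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T],
                 max_retries: int = 3,
                 base_delay: float = 0.5,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying up to max_retries extra times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return call()
        except retry_on:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error
            # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

OSError is a sensible default here because urllib's URLError and socket timeouts both derive from it; the sleep parameter exists so tests can record delays instead of waiting.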
Web interface settings ([web]):
- port: Port for the Streamlit web interface
- host: Host for the web interface
- page_title: Title for the web interface
- page_icon: Icon for the web interface
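One way the host and port settings could be applied is through Streamlit's own --server.port and --server.address command-line options. The sketch builds that command from the [web] table; streamlit_command and launch are hypothetical, and whether PubChemAgent wires its settings this way is an assumption. Inside the app, page_title and page_icon would typically be passed to st.set_page_config.

```python
import subprocess
from typing import Any, List, Mapping


def streamlit_command(web: Mapping[str, Any], script: str = "streamlit_app.py") -> List[str]:
    """Build a `streamlit run` command from a [web] config table."""
    port = int(web.get("port", 8501))
    host = str(web.get("host", "localhost"))
    # Streamlit accepts its config options (server.port, server.address) as CLI flags.
    return ["streamlit", "run", script,
            "--server.port", str(port),
            "--server.address", host]


def launch(web: Mapping[str, Any]) -> int:
    """Run Streamlit in the foreground and return its exit code."""
    return subprocess.run(streamlit_command(web), check=False).returncode
```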
# Run all tests
pytest tests/
# Run with coverage
pytest tests/ --cov=pubchem_agent
# Run specific test
pytest tests/test_agent.py::test_basic_functionality
PubChemAgent/
├── pubchem_agent/
│   ├── __init__.py
│   ├── agent.py        # Main agent implementation
│   ├── tools.py        # PubChem tools
│   ├── config.py       # Configuration management
│   └── cli.py          # Command line interface
├── tests/
│   ├── __init__.py
│   └── test_agent.py   # Agent tests
├── config.toml         # Sample configuration
├── streamlit_app.py    # Web interface
├── example.py          # Usage examples
├── pyproject.toml      # Package configuration
└── README.md
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run tests and ensure they pass
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for the agent framework
- LangGraph for state management
- PubChem for chemical data
- Streamlit for the web interface
- OpenAI, Google, and Anthropic for AI model APIs
For support, please:
- Check the documentation above
- Review example usage in example.py
- Open an issue on GitHub
- Use the --help flag for CLI options
- Initial release with configuration-based setup
- Support for OpenAI, Google Gemini, and Anthropic Claude
- CLI, web interface, and programmatic API
- Comprehensive PubChem database access
- Multi-provider support with easy switching