A Python project that uses LLMs to analyze LLMs, built using an LLM. 90% of this code was generated using Cline and Claude 3.5 Sonnet.
Currently it does one thing: it tells you which code libraries and APIs LLMs use when writing code.
This project solves the challenge of creating enough samples to get good coverage, and of scaling the analysis to cover different models and domains.
See DESIGN DETAILS for more on the design.
- Generate product ideas using LLM models
- Convert product ideas into detailed requirements
- Generate code implementations based on requirements
- Analyze and track frameworks used in generated code
- Comprehensive dependency tracking and analysis
- Flexible workflow with ability to start from any step
- Custom working directory naming
- Verified with idea counts up to 100
- Verified against Llama, Anthropic, Qwen, OpenAI, and DeepSeek models
- Python 3.10 or higher
- OpenRouter API key
- Clone the repository:
git clone https://github.com/byjlw/llm-analysis.git
cd llm-analysis
- Create and activate a virtual environment:
python3 -m venv .venv
source .venv/bin/activate # On Windows, use: .venv\Scripts\activate
- Install the package:
pip install -e .
Run the code dependency analysis with default settings:
llm-analysis coding-dependencies --api-key your-api-key-here
Customize the run with command-line options:
llm-analysis coding-dependencies \
--config path/to/config.json \
--model meta-llama/llama-3.3-70b-instruct \
--output-dir custom/output/path \
--log-level DEBUG \
--start-step 2 \
--working-dir my-project
Analyze multiple models at once using this script. Update the script with the models you want to use.
./coding_dependencies_job.sh <api_key> [num_ideas] [start_step working_dir]
Parameters:
- api_key: (Required) Your OpenRouter API key
- num_ideas: (Optional) Number of ideas to generate (defaults to 15)
- start_step: (Optional) Which pipeline stage to start from (1-4)
- working_dir: (Required if start_step is provided) Name of the working directory
Examples:
# Basic usage - creates timestamped directory
./coding_dependencies_job.sh sk_or_...
# Generate 20 ideas - creates timestamped directory
./coding_dependencies_job.sh sk_or_... 20
# Start from step 2 using existing directory
./coding_dependencies_job.sh sk_or_... 15 2 my_analysis
The script (see the Python sketch after this list):
- Runs analysis using multiple models
- Directory structure:
  - Creates a parent working directory (timestamped if not specified)
  - Each model gets its own subdirectory within the working directory
- When starting from a later step (2-4):
  - Uses the specified working directory
  - Skips models that don't have existing subdirectories
- Continues to the next model if one fails
- Includes a delay between runs to prevent rate limiting
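The overall flow can be pictured with a short Python sketch. This is an illustrative assumption of how the batch runs, not the script's actual contents; the model list, delay length, and per-model directory naming are placeholders.

# Illustrative sketch only -- the real coding_dependencies_job.sh may differ.
import subprocess
import time

MODELS = [  # placeholder list; edit the script with the models you want to use
    "qwen/qwen-2.5-coder-32b-instruct",
    "meta-llama/llama-3.3-70b-instruct",
    "openai/gpt-4o-2024-11-20",
    "deepseek/deepseek-chat",
]

def run_all(api_key: str, working_dir: str, num_ideas: int = 15) -> None:
    for model in MODELS:
        # Each model writes into its own subdirectory of the parent working directory.
        model_dir = f"{working_dir}/{model.replace('/', '_')}"
        result = subprocess.run([
            "llm-analysis", "coding-dependencies",
            "--api-key", api_key,
            "--model", model,
            "--num-ideas", str(num_ideas),
            "--working-dir", model_dir,
        ])
        if result.returncode != 0:
            print(f"{model} failed; continuing with the next model")
        time.sleep(60)  # delay between runs to prevent rate limiting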
Example output structure:
output/
└── 03-15-24-14-30-45/ # Parent directory (timestamp or user-specified)
├── qwen_qwen-2.5-coder-32b-instruct/
│ ├── ideas.json
│ ├── requirements/
│ ├── code/
│ └── dependencies.json
├── meta-llama_llama-3.3-70b-instruct/
├── openai_gpt-4o-2024-11-20/
└── deepseek_deepseek-chat/
The code dependency analysis follows a 4-step process:
1. Ideas Generation (--start-step 1 or --start-step ideas)
2. Requirements Analysis (--start-step 2 or --start-step requirements)
3. Code Generation (--start-step 3 or --start-step code)
4. Dependencies Collection (--start-step 4 or --start-step dependencies)
You can start from any step using the --start-step argument. The tool assumes that any files required from previous steps are already present in the working directory.
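For example, to resume an existing run at the code generation step:
llm-analysis coding-dependencies --start-step code --working-dir my-project --api-key your-api-key-here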
By default, the tool creates a timestamped directory for each run. You can specify a custom directory name using the --working-dir argument:
llm-analysis coding-dependencies --working-dir my-project
All available command line arguments (example below):
- --api-key: OpenRouter API key (can also be set via the OPENROUTER_API_KEY environment variable)
- --output-dir: Output directory (default: 'output')
- --working-dir: Working directory name (defaults to timestamp)
- --num-ideas: Number of ideas to generate (default: 15)
- --model: Model to use for generation (default: 'meta-llama/llama-3.3-70b-instruct')
- --log-level: Logging level (choices: DEBUG, INFO, WARNING, ERROR, CRITICAL; default: INFO)
- --start-step: Step to start from (1/ideas, 2/requirements, 3/code, 4/dependencies)
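For example, the API key can be supplied through the environment instead of the flag:
export OPENROUTER_API_KEY=sk_or_...
llm-analysis coding-dependencies --num-ideas 20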
The tool supports configuration through a JSON file. It looks for configuration files in the following order:
1. Custom config file path specified via the command line (--config path/to/config.json)
2. config.json in the current working directory
3. Default config at src/config/default_config.json
To create a custom config file:
# Copy the default config to your current directory
cp src/config/default_config.json config.json
# Edit config.json with your settings
Available configuration options:
{
"openrouter": {
"api_key": "", // Your OpenRouter API key
"default_model": "openai/gpt-4o-2024-11-20", // Default LLM model to use
"timeout": 120, // API request timeout in seconds
"max_retries": 3 // Maximum number of API request retries
},
"output": {
"base_dir": "output", // Base directory for output files
"ideas_filename": "ideas.json", // Filename for generated ideas
"dependencies_filename": "dependencies.json" // Filename for dependency analysis
},
"prompts": {
"ideas": "prompts/1-spawn_ideas.txt", // Prompt file for idea generation
"requirements": "prompts/2-idea-to-requirements.txt", // Prompt for requirements generation
"code": "prompts/3-write-code.txt", // Prompt for code generation
"dependencies": "prompts/4-collect-dependencies.txt", // Prompt for dependency collection
"error_format": "prompts/e1-wrong_format.txt" // Prompt for format error handling
},
"logging": {
"level": "DEBUG", // Logging level
"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s" // Log message format
}
}
Configuration precedence (highest to lowest; see the sketch after this list):
- Command line arguments
- Environment variables (for API key)
- Custom config file specified via --config
- config.json in current working directory
- Default config file (src/config/default_config.json)
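A minimal Python sketch of how this precedence could be resolved. It is illustrative only and may not match the tool's actual implementation; the merge shown is shallow (per top-level key), and the file names follow the documented lookup order.

# Illustrative only -- not the tool's actual config loader.
import json
import os

def load_config(cli_config_path=None):
    """Return settings, with earlier sources taking precedence over later ones."""
    candidates = [
        cli_config_path,                     # --config path/to/config.json
        "config.json",                       # config.json in the current working directory
        "src/config/default_config.json",    # packaged default config
    ]
    config = {}
    for path in candidates:
        if path and os.path.isfile(path):
            with open(path) as f:
                for key, value in json.load(f).items():
                    config.setdefault(key, value)  # keep values from higher-precedence files
    # The OPENROUTER_API_KEY environment variable overrides any file-based key.
    api_key = os.environ.get("OPENROUTER_API_KEY")
    if api_key:
        config.setdefault("openrouter", {})["api_key"] = api_key
    # Command line flags (e.g. --model, --api-key) would override these values on top.
    return config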
The tool creates a working directory (timestamped or custom-named) for each run under the output directory:
output/
└── working-dir/
├── ideas.json
├── requirements/
│ └── requirements_*.txt
├── code/
│ └── generated_files
└── dependencies.json
The tool uses an LLM to analyze code files and identify the frameworks used. Key features:
- All code files are processed as .txt files for consistency
- LLM analyzes the code to identify frameworks and libraries
- Tracks usage frequency of each framework
- Framework detection is language-agnostic and based on LLM analysis
Example dependencies.json (a full run's output is available in docs/example_output):
{
"frameworks": [
{
"name": "Ruby on Rails",
"count": 2
},
{
"name": "PostgreSQL",
"count": 1
},
{
"name": "Vue.js",
"count": 1
}
]
}
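A quick way to inspect these results from Python (the path assumes a working directory named my-project):

# Print frameworks sorted by how often they were used.
import json

with open("output/my-project/dependencies.json") as f:
    data = json.load(f)

for fw in sorted(data["frameworks"], key=lambda x: x["count"], reverse=True):
    print(f"{fw['count']:4d}  {fw['name']}")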
The dependency analysis process (see the Python sketch after this list):
- Code files are read as text
- Each file is analyzed by the LLM to extract a list of frameworks
- Results are aggregated and framework counts are updated
- Final results are saved in dependencies.json
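A minimal Python sketch of the aggregation step. It is illustrative, not the project's actual code, and assumes the per-file LLM analysis has already returned a list of framework names for each file.

# Illustrative sketch of aggregating per-file framework lists into dependencies.json.
import json
from collections import Counter
from pathlib import Path

def aggregate(frameworks_per_file, working_dir):
    counts = Counter()
    for frameworks in frameworks_per_file:  # e.g. [["Ruby on Rails", "PostgreSQL"], ["Vue.js"]]
        counts.update(frameworks)
    result = {
        "frameworks": [
            {"name": name, "count": count}
            for name, count in counts.most_common()
        ]
    }
    (Path(working_dir) / "dependencies.json").write_text(json.dumps(result, indent=2))
    return result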
- Install development dependencies:
pip install -r requirements.txt
- Run tests:
pytest
- Run linting:
flake8 src tests
black src tests
mypy src tests
- Guidelines for AI Coding Agents: ensure the coding agent uses the AI Rules and Guidelines in every request.