Using CLI Environments

Although CLI environments are primarily designed for software and website development, they also work very well for creating evidence reviews (ERs).

They offer several advantages for our use case:

Automatic loading of skills
Automatic loading of settings/permissions
Automatic loading of MCP servers
Proper handling of @agents, avoiding session bias and context bloat
Multiple, parallel sessions

To protect against malicious MCP code, prompt injection, and potential errors in the CLI itself, we have two main security measures in place:

Sandboxing of all MCPs in Docker containers
Sandboxing of the CLI itself using Docker SBX

Claude Code, OpenCode, Codex

We tested three environments for ER creation and auditing, as well as model evaluation. The repository provides the configuration for all of them.

Claude Code - Opus
OpenAI Codex - GPT
OpenCode - Grok & Gemini (or any other model that can use MCP servers)

Installing AI4L

Check out this project

git clone https://github.com/forever-healthy/AI4L

Or download and expand AI4L-main.zip if you don't want to use git.

Tip

On macOS, press CMD-SHIFT-. in Finder to show hidden files such as .claude, .opencode, .codex, or .mcp.json.

Project Structure

prompts/ - prompt files used by the agents and skills, including the main AI4L.md prompt
creation/ - created ERs & QA audits, tmp files & trash
docs/ - documentation files, including this one
examples/ - example ERs and audits created with the CLI environments

Configuration Files

CLAUDE.md - project global instructions, also read by OpenCode & Codex
.claude/ - related to Claude Code, including agents, skills, and settings
.mcp.json - configuration for the local MCP servers (used by Claude Code)
.opencode/ - related to OpenCode, including agents, skills, and settings
AGENTS.md - project global instructions for Codex
.codex/ - related to OpenAI Codex, including agents, skills, and settings

Docker Sandbox for the AI CLI Environments

For security reasons, we run our CLIs only inside a dedicated Docker Sandbox.

Docker SBX:

Limits the damage from prompt injection by malicious websites or MCP servers
Protects the local environment from Claude Code/Codex/OpenCode errors
Provides the containers for the MCP servers, no need to install Docker Desktop separately

Docker SBX is installed with Homebrew and requires a one-time login with a Docker account:

brew install docker/tap/sbx
sbx login

Then start the CLI of your choice from the AI4L directory:

cd .../AI4L
sbx run claude
sbx run opencode
sbx run codex

Important

We strongly recommend running the CLIs only within the sandbox.

MCP Servers

We use three local MCP servers whose tools significantly improve the quality of ER creation and auditing.

The MCPs are configured in:

Claude Code: .mcp.json
OpenCode: .opencode/opencode.json
Codex: .codex/config.toml

They are automatically loaded at startup and then run in isolated Docker containers to protect against malicious code.

There is no need to install Docker Desktop separately, as the Docker SBX will handle the containers for the MCP servers.

Local URL Fetch

mcp-server-fetch

Requests from your local machine are less likely to be blocked than requests from cloud data centers
Configured to ignore robots.txt
Configured with a Chrome user agent to avoid blocks from sites that disallow bots or use aggressive bot detection
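The effect of these two settings can be illustrated with Python's standard library. The sketch below is hypothetical and is not the server's code: it shows that a robots.txt disallowing all crawlers blocks a client that honors it, that a client configured to skip the check proceeds anyway, and how a request carrying a Chrome user-agent header is built. The user-agent string is an example value; in mcp-server-fetch itself, these behaviors are set through its --ignore-robots-txt and --user-agent arguments.

```python
import urllib.request
from urllib.robotparser import RobotFileParser

# Example Chrome user-agent string (assumption: the project's configured value may differ).
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def allowed(url: str, robots_txt: str, user_agent: str, ignore_robots: bool) -> bool:
    """Return True if a fetch of url should proceed under the given policy."""
    if ignore_robots:
        return True  # robots.txt is never consulted
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request that identifies itself as Chrome."""
    return urllib.request.Request(url, headers={"User-Agent": CHROME_UA})
```

With a robots.txt of "User-agent: *" followed by "Disallow: /", a compliant client is refused every path on the site, which is exactly the situation the ignore setting is meant to avoid for literature lookups.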

Clinical Trials MCP Server

cyanheads/clinicaltrialsgov-mcp-server

Local requests to clinicaltrials.gov are less likely to be blocked

PubMed MCP Server

cyanheads/pubmed-mcp-server

Local requests to PubMed are less likely to be blocked
To increase rate limits, create a free NCBI account and generate a free API key
Save NCBI_API_KEY and NCBI_ADMIN_EMAIL in .env so the server uses your API key
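Before starting a session, you can check that both variables are present in .env. This hypothetical helper is not part of the repository: it reads .env with a deliberately minimal KEY=VALUE parser and reports which variables are missing. The rate figures follow NCBI's E-utilities policy (3 requests per second without an API key, 10 with one); whether pubmed-mcp-server throttles to exactly these values is an assumption.

```python
from pathlib import Path

# NCBI E-utilities request limits (requests per second).
RATE_WITHOUT_KEY = 3
RATE_WITH_KEY = 10

REQUIRED = ("NCBI_API_KEY", "NCBI_ADMIN_EMAIL")


def read_env(path: Path) -> dict[str, str]:
    """Minimal .env reader: KEY=VALUE lines; blank lines and # comments are skipped."""
    env = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")  # drop surrounding quotes
    return env


def check_ncbi(env: dict[str, str]) -> tuple[int, list[str]]:
    """Return (expected requests/second, names of missing variables)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    rate = RATE_WITH_KEY if env.get("NCBI_API_KEY") else RATE_WITHOUT_KEY
    return rate, missing
```

A real .env loader handles more cases (escapes, multi-line values, export prefixes); the minimal parser is only meant for a quick sanity check.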

Disabling the built-in Anthropic MCP Servers

Claude Code includes built-in MCP servers for PubMed and Clinical Trials. However, they require credentials and send their requests from Anthropic's data centers, which target websites often block. We recommend disabling them and using the local versions instead.

/mcp > `claude.ai Clinical Trials` & `claude.ai Clinical PubMed` > `disabled`

First Startup

First startup might take a while:

On first startup, sbx needs to build the container for the CLI and the containers for individual MCP servers.
Some MCPs might be displayed as failed on the first run
You can check the status of MCP servers with /mcp
Usually, containers are fully instantiated after a short while
On subsequent startups, the cached MCP containers launch and connect instantly

Our Model / CLI of Choice

We are currently using Opus / Claude Code for all creation/audit/fixing agents and Sonnet for workflow processing.

Sonnet is more cost-effective and faster for processing tasks that don't require the advanced capabilities of Opus.

Your first ER Creation & Audit

You can use this simple prompt to create your first ER and iteratively audit it with the /er skill:

Claude Code / OpenCode

/er full Tadalafil

Codex

$er full Tadalafil

It will create an ER for Tadalafil (Cialis), audit it, fix the findings, and audit it again, repeating the audit → fix cycle until an audit shows a 100% pass rate or the configured maximum number of audits is reached.
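The full workflow is essentially a bounded loop. The sketch below models it with hypothetical create, audit, and fix callables (the real steps are carried out by the agents, not by Python code); the audit is assumed to return a pass rate between 0 and 1, and max_audits stands in for the skill's [max_audits] setting.

```python
from typing import Callable


def full_run(
    create: Callable[[str], str],    # hypothetical: creates the ER, returns its filename
    audit: Callable[[str], float],   # hypothetical: audits the ER, returns pass rate 0..1
    fix: Callable[[str], None],      # hypothetical: fixes the findings of the last audit
    intervention: str,
    max_audits: int,
) -> tuple[str, list[float]]:
    """Create an ER, then repeat audit -> fix until 100% pass or max_audits audits."""
    er = create(intervention)
    history = []
    for _ in range(max_audits):
        pass_rate = audit(er)
        history.append(pass_rate)
        if pass_rate >= 1.0:
            break  # clean audit: nothing left to fix
        fix(er)  # in this sketch, a fix follows every failing audit, including the last
    return er, history
```

Capping the loop at max_audits matters because an audit may never reach 100% (for example, if two findings contradict each other); the returned history shows whether the pass rate was still improving when the cap was hit.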

Using the /er Skill ($er for Codex)

We have included our /er skill, which allows for easy creation and auditing of evidence reviews. The skill is automatically loaded by the CLIs on startup.

We have implemented the following /er commands (not case sensitive) in the skill:

/er create [ <intervention> [ for|to|as|: <goal> ] ] - Create an evidence review for <intervention> to achieve <goal>. The default goal is "for Health & Longevity". Results are saved in [creation_dir] with filenames constructed as defined in AI4L.md.
/er audit [ <filename> | <intervention> | all ] - Review a specific ER, all unaudited ERs, or the latest ER if nothing is specified. Can also be used to create audits for all ERs generated with other environments/models and saved in [creation_dir].
/er fix [ <filename> | <intervention> ] - Review a specific ER, or the latest ER if nothing is specified, and fix the findings.
/er iterate [ <filename> | <intervention> ] - Loop (audit → fix) until an audit shows 100% pass rate (up to [max_audits])
/er full [ <intervention> ] - Create ER for <intervention> → Loop (audit → fix)
/er compare [ <intervention> ] - Compare all ERs for <intervention>, or for the most recently worked-on intervention if none is specified. This allows comparison of ERs created by different models for the same intervention and is especially helpful in combination with "/er audit all" to understand the models' relative performance.
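The command syntax above can be read as a small grammar. The hypothetical parser below is only an illustration (the skill itself is interpreted by the model, not by code like this); it shows how a command line splits into command, target, and goal, including the case-insensitive command names and the "Health & Longevity" default goal for create.

```python
import re
from dataclasses import dataclass
from typing import Optional

COMMANDS = {"create", "audit", "fix", "iterate", "full", "compare"}
DEFAULT_GOAL = "Health & Longevity"
# Goal separators from the syntax: the words "for", "to", "as", or a colon.
GOAL_SEP = re.compile(r"\s+(?:for|to|as)\s+|\s*:\s*", re.IGNORECASE)


@dataclass
class ErCommand:
    command: str
    target: Optional[str]  # intervention, filename, or "all"; None means "use the latest"
    goal: Optional[str] = None


def parse_er(line: str) -> ErCommand:
    """Parse '/er <command> [args]' (or '$er ...' as used in Codex)."""
    parts = line.strip().split(maxsplit=2)
    if len(parts) < 2 or parts[0] not in ("/er", "$er"):
        raise ValueError(f"not an /er command: {line!r}")
    command = parts[1].lower()  # command names are not case sensitive
    if command not in COMMANDS:
        raise ValueError(f"unknown /er command: {parts[1]!r}")
    rest = parts[2].strip() if len(parts) > 2 else ""
    if command != "create" or not rest:
        return ErCommand(command, rest or None)
    # Split at the first goal separator only; fall back to the default goal.
    pieces = GOAL_SEP.split(rest, maxsplit=1)
    intervention = pieces[0].strip()
    goal = pieces[1].strip() if len(pieces) == 2 and pieces[1].strip() else DEFAULT_GOAL
    return ErCommand(command, intervention, goal)
```

Splitting at the first separator keeps the sketch simple, but it also means an intervention whose name contains " for ", " to ", " as ", or ":" would be cut early.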

Helpful Tools

Zed as a lightweight editor
MacDown for visually editing .md files
Markdown Reader Chrome Extension as an easy-to-use markdown viewer

Lessons Learned

Check out our Lessons Learned document for insights and best practices.
