This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
# Activate the virtual environment first (system Python is externally managed)
source .venv/bin/activate
# Preview what the next changelog would look like (unreleased commits)
git cliff --unreleased
# Regenerate CHANGELOG.md and docs/guide/changelog.md manually
git cliff -o CHANGELOG.md
{ printf -- "---\ntitle: Changelog\n---\n\n"; awk '/^## \[/{found=1} found' CHANGELOG.md; } > docs/guide/changelog.md
# Install for development
pip install -e ".[dev]"
# Run all tests with coverage
pytest
# Run a single test file
pytest tests/test_search.py
# Run a single test
pytest tests/test_search.py::test_name
# Run the CLI
python -m mosaic search "transformer attention"
mosaic search "transformer attention" --max 5 --source arxiv
mosaic get 10.1234/example.doi
mosaic config --show

MOSAIC is a CLI tool (`mosaic` entry point → `mosaic/cli.py`) that fans out paper searches across multiple scientific sources, deduplicates results, caches them locally, and can download PDFs.
- `cli.py` loads config, instantiates enabled sources via `source_registry.py:build_sources()`, and calls `search_all()`.
- `search.py:search_all()` iterates sources sequentially, merges duplicates by `Paper.uid` (preferring richer data), and applies `SearchFilters` as a post-processing safety net.
- Results are saved to a SQLite cache (`db.py:Cache`) and optionally downloaded via `downloader.py`.
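
The merge step in the flow above can be sketched as follows. This is a minimal illustration, not the code in `search.py` or `services.py`: the `Paper` class here is a simplified stand-in with an explicit `uid` field, `merge_by_uid` is a hypothetical name, and "preferring richer data" is assumed to mean filling in fields that the first-seen record lacks.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Paper:
    # Hypothetical stand-in: the real Paper in models.py has more fields
    # and derives uid itself.
    uid: str
    title: str
    abstract: Optional[str] = None
    pdf_url: Optional[str] = None


def merge_by_uid(batches: list[list[Paper]]) -> list[Paper]:
    """Merge per-source result lists, keeping one Paper per uid."""
    merged: dict[str, Paper] = {}
    for batch in batches:  # sources are visited one after another
        for paper in batch:
            kept = merged.get(paper.uid)
            if kept is None:
                merged[paper.uid] = paper
                continue
            # Assumed meaning of "richer": fill fields the kept record lacks.
            for f in fields(Paper):
                old, new = getattr(kept, f.name), getattr(paper, f.name)
                if old in (None, "") and new not in (None, ""):
                    setattr(kept, f.name, new)
    return list(merged.values())


arxiv_hits = [Paper(uid="10.1234/example.doi", title="Example")]
openalex_hits = [
    Paper(uid="10.1234/example.doi", title="Example",
          pdf_url="https://example.org/p.pdf"),
    Paper(uid="arxiv:0000.00000", title="Other"),
]
results = merge_by_uid([arxiv_hits, openalex_hits])
```

Because a plain `dict` preserves insertion order, results come back in first-seen order across sources.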
- `models.py`: `Paper` dataclass (central data model) and `SearchFilters` (year/author/journal filtering). `Paper.uid` is the deduplication key: prefers DOI > arxiv_id > pii > title slug.
- `sources/base.py`: `BaseSource` ABC with `search()` and `available()`. All sources in `sources/` implement this interface.
- `sources/`: 21 sources: `arxiv`, `semantic_scholar`, `sciencedirect` (API key or browser session), `sciencedirect_browser`, `springer_browser` (Playwright, shorthand `sp`), `springer_api` (free API key, shorthand `springer`), `doaj`, `europepmc`, `openalex`, `base_search`, `core` (free API key), `nasa_ads` (free API token), `ieee` (free API key, shorthand `ieee`), `zenodo` (no auth required), `crossref` (no auth required), `dblp` (no auth required), `hal` (no auth required), `pubmed` (no auth required, API key optional), `pmc` (PubMed Central, always OA + direct PDF, API key optional; same NCBI key as `pubmed`), `biorxiv` (bioRxiv + medRxiv, shorthand `rxiv`; searches both servers via website search, fetches metadata from `api.biorxiv.org`; always OA), `pedro` (physiotherapy evidence, shorthand `pedro`; requires `acknowledge_fair_use` config flag), `scopus` (Elsevier, shorthand `scopus`; API key or browser session). `unpaywall.py` is a helper (not a search source) used by the downloader.
- `source_registry.py`: Source factory registry, shorthand maps (`SRC_MAP`, `SHORTHAND_TO_CFG_KEY`), and `build_sources(cfg)`, which instantiates all enabled sources from config.
- `services.py`: Shared business logic: `build_filters()` (construct `SearchFilters` from user input), `filter_papers()` (OA/PDF/sort post-processing), `merge_papers()` (deduplication by `Paper.uid`).
- `workflows.py`: Multi-step orchestration: `download_papers()`, `push_to_zotero()`, `push_to_obsidian()`. Used by both CLI and web UI.
- `parsing.py`: Shared parsing utilities: `parse_year()`, `normalise_doi()`, `strip_html()`, `parse_authors_name_key()`, `parse_authors_given_family()`, `split_authors()`, `extract_first()`.
- `errors.py`: Custom exception hierarchy (`MosaicError` → `SourceError`, `DownloadError`, `ConfigError`) and central logging setup.
- `similar.py`: `find_similar(identifier, max_results, *, oa_email, ss_api_key)` fans out to OpenAlex `related_works` (always) and Semantic Scholar recommendations (when an API key is configured), deduplicates by `Paper.uid`, and returns `(seed_title, papers)`. Used by the `mosaic similar` CLI command.
- `bulk.py`: `read_dois(path)` extracts DOIs from `.bib` (regex) or `.csv` (`DictReader`) files. Used by `mosaic get --from`.
- `zotero.py`: `ZoteroClient` class supporting both the local API (`http://localhost:23119`) and the web API (`https://api.zotero.org`). Auto-detects mode from config (`zotero.api_key`). Key methods: `is_reachable()`, `discover_user_id()`, `ensure_collection(name)`, `add_papers(papers)`, `attach_pdf(item_key, path)`. PDF attachment is local-only (`linked_file`); web mode is metadata-only in v1.
- `gui_launcher.py`: Entry point for the standalone desktop app (PyInstaller). Opens the web UI in a Chromium `--app` window.
- `db.py`: SQLite with two tables: `papers` (upsert on uid, updates pdf_url/abstract/is_open_access) and `downloads` (tracks local file paths and status).
- `config.py`: Reads/writes `~/.config/mosaic/config.toml`; deep-merges user config over defaults. DB lives at `~/.local/share/mosaic/cache.db`, downloads at `~/mosaic-papers/`. Zotero config lives under the `[zotero]` section (`api_key`, `user_id`).
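
The `Paper.uid` priority order (DOI > arxiv_id > pii > title slug) might look roughly like the sketch below. The class name `PaperSketch`, the `doi:`/`arxiv:`/`pii:`/`title:` prefixes, the lowercasing, and the slug rule are all assumptions made for illustration; check `models.py` for the actual behaviour.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperSketch:
    """Hypothetical, reduced version of the Paper dataclass."""
    title: str
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None
    pii: Optional[str] = None

    @property
    def uid(self) -> str:
        # Strongest identifier wins: DOI, then arXiv id, then PII.
        if self.doi:
            return f"doi:{self.doi.lower()}"
        if self.arxiv_id:
            return f"arxiv:{self.arxiv_id}"
        if self.pii:
            return f"pii:{self.pii}"
        # Last resort: a slug built from the title (assumed slug rule).
        slug = re.sub(r"[^a-z0-9]+", "-", self.title.lower()).strip("-")
        return f"title:{slug}"


with_doi = PaperSketch("Example", doi="10.1234/Example.DOI", arxiv_id="0000.00000")
title_only = PaperSketch("Transformer Attention!")
```

Two records for the same paper only merge when they resolve to the same key, so a source that returns no DOI for a paper that has one elsewhere will fall through to a weaker identifier.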
To add a new source:

- Create `mosaic/sources/myname.py` with a class extending `BaseSource`.
- Set the `name` class attribute and implement `search()` returning `list[Paper]`.
- Export it from `mosaic/sources/__init__.py`.
- Wire it into `source_registry.py` (factory function + `_SOURCE_REGISTRY` + shorthand maps).
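
The steps above can be sketched end to end in one file. Everything here is a stand-in so the example runs on its own: `Paper` and `BaseSource` are reduced copies, the `search(query, max_results)` signature is assumed, and the shape of `_SOURCE_REGISTRY` and of the `cfg` dictionary is hypothetical. Mirror an existing source for the real signatures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:  # stand-in for mosaic.models.Paper
    title: str
    doi: Optional[str] = None


class BaseSource(ABC):  # stand-in for mosaic/sources/base.py
    name: str = ""

    @abstractmethod
    def search(self, query: str, max_results: int = 10) -> list[Paper]:
        ...

    def available(self) -> bool:
        return True


class MyNameSource(BaseSource):
    name = "myname"

    def search(self, query: str, max_results: int = 10) -> list[Paper]:
        # A real source would call its HTTP API here; canned records keep
        # this sketch offline.
        records = [{"title": f"{query} result {i}", "doi": None} for i in range(3)]
        return [Paper(title=r["title"], doi=r["doi"]) for r in records[:max_results]]


# Hypothetical registry layout: config key -> factory taking the config.
_SOURCE_REGISTRY = {"myname": lambda cfg: MyNameSource()}


def build_sources(cfg: dict) -> list[BaseSource]:
    """Instantiate every source whose (assumed) 'enabled' flag is set."""
    return [
        factory(cfg)
        for key, factory in _SOURCE_REGISTRY.items()
        if cfg.get("sources", {}).get(key, {}).get("enabled", False)
    ]


sources = build_sources({"sources": {"myname": {"enabled": True}}})
hits = sources[0].search("transformer attention", max_results=2)
```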
Tests use `unittest.mock` to patch `httpx` calls, so no real network requests are made. `conftest.py` provides `tmp_cache` (in-memory SQLite) and `paper` fixtures, and a `make_response()` helper for building mock `httpx` responses. Coverage JSON is written to `docs/public/` after each test run.
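
A network-free test in this style might look like the sketch below. It is an illustration under assumptions: this `make_response()` is a hypothetical stand-in for the `conftest.py` helper (whose real signature may differ), `fetch_titles()` and the `api.example.org` URL are invented for the example, and the mock client is passed in directly rather than patched, so the snippet runs without `httpx` installed.

```python
from unittest.mock import MagicMock


def make_response(json_data, status_code=200):
    """Hypothetical stand-in for the conftest.py helper of the same name."""
    response = MagicMock()
    response.status_code = status_code
    response.json.return_value = json_data
    response.raise_for_status.return_value = None
    return response


def fetch_titles(client, query):
    """Toy source call: `client` only needs a .get() method, like an httpx client."""
    response = client.get("https://api.example.org/search", params={"q": query})
    response.raise_for_status()
    return [item["title"] for item in response.json()["results"]]


def test_fetch_titles_uses_mocked_client():
    client = MagicMock()
    client.get.return_value = make_response({"results": [{"title": "Attention"}]})

    assert fetch_titles(client, "transformer attention") == ["Attention"]
    # The mock records the call, so the request itself can be checked too.
    client.get.assert_called_once_with(
        "https://api.example.org/search", params={"q": "transformer attention"}
    )


test_fetch_titles_uses_mocked_client()
```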
VitePress site in `docs/`. Build with `npm run docs:build` from the `docs/` directory.
- Never run `git commit`. Generate commit messages with `/semantic-commit` and let the user paste them in their own terminal (GPG signing requires a TTY that Claude Code does not have).