111 changes: 101 additions & 10 deletions README.md
@@ -10,11 +10,13 @@ The main bibtex file ([cdl.bib](https://raw.githubusercontent.com/ContextLab/CDL
- [Using the bibtex checker tools](#using-the-bibtex-checker-tools)
- [Installation](#installation)
- [Overview](#overview)
- [bibcheck.py - Format Verification](#bibcheckpy---format-verification)
- [bibverify.py - Accuracy Verification](#bibverifypy---accuracy-verification)
- [Suggested workflow](#suggested-workflow)
- [Additional information and usage instructions](#additional-information-and-usage-instructions)
- [`bibcheck verify`](#verify)
- [`bibcheck compare`](#compare)
- [`bibcheck commit`](#commit)
- [Using the bibtex file as a common bibliography for all *local* LaTeX files](#using-the-bibtex-file-as-a-common-bibliography-for-all-local-latex-files)
- [General Unix/Linux Setup (Command Line Compilation)](#general-unixlinux-setup-command-line-compilation)
- [MacOS Setup with TeXShop and TeX Live](#macos-setup-with-texshop-and-tex-live)
@@ -35,10 +37,27 @@ You may find the included bibtex file and/or readme file useful for any of the f
- Instructions for adding this repository as a sub-module to Overleaf projects, so that you can share a common bibtex file across your Overleaf projects

## Using the bibtex checker tools

This repository includes two complementary verification tools:

1. **bibcheck.py** - Verifies formatting and consistency
   - Checks key naming conventions
   - Validates author/editor name formatting
   - Ensures proper capitalization
   - Verifies page number formatting
   - Removes duplicate entries

2. **bibverify.py** - Verifies accuracy against external sources
   - Cross-references entries with the CrossRef database (170M+ records)
   - Validates volume, issue/number, and page fields
   - Detects common errors (e.g., a DOI in the pages field)
   - Uses conservative matching to reduce false positives
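
As an illustration of the "DOI in pages field" check, here is a minimal sketch. It is not taken from bibverify.py: the function name and the regular expression are assumptions, chosen only to show the idea that a `pages` value starting with a DOI prefix (`10.` followed by a registrant code and a slash) is unlikely to be a page range.

```python
import re

# Assumed pattern (not from bibverify.py): an optional "doi:" or doi.org
# prefix, then "10.", a 4-9 digit registrant code, "/", and a suffix.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)?10\.\d{4,9}/\S+$",
    re.IGNORECASE,
)


def pages_field_looks_like_doi(pages: str) -> bool:
    """Return True when a BibTeX `pages` value looks like a DOI, not a page range."""
    return bool(DOI_PATTERN.match(pages.strip()))


print(pages_field_looks_like_doi("10.1038/s41586-020-2649-2"))  # True
print(pages_field_looks_like_doi("357--362"))                   # False
```

A flagged value would normally be moved to the `doi` field by hand after checking the entry.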

You may find these tools useful for:
- Verifying the integrity and accuracy of a .bib file
- Autocorrecting a .bib file (use with caution!)
- Automatically generating change logs and commit messages
- Finding and fixing metadata errors

### Installation
The bibtex checker tools have only been tested on macOS, but they will probably work without modification on other Unix systems, and with minor modifications on Windows systems.
@@ -51,7 +70,9 @@ pip install -r requirements.txt

### Overview

#### bibcheck.py - Format Verification

The format verification tool has three main functions: `verify`, `compare`, and `commit`:
```bash
Usage: bibcheck.py [OPTIONS] COMMAND [ARGS]...

@@ -68,25 +89,95 @@ Commands:
verify
```

#### bibverify.py - Accuracy Verification

The accuracy verification tool checks entries against the CrossRef database:
```bash
Usage: python bibverify.py [OPTIONS] COMMAND [ARGS]...

Commands:
verify Verify bibliographic entries against CrossRef database
info Show information about the verification tool
```

**Key Features:**
- **Fast:** Verifies 6,151 entries in ~6 minutes using parallel processing (10 workers)
- **Conservative:** Requires strong similarity in title, authors, AND journal before reporting issues
- **Accurate:** Reduces false positives by rejecting uncertain matches
- **Focused:** Only checks volume, issue, and pages metadata (not formatting)

**Basic Usage:**
```bash
# Verify entire bibliography with 10 parallel workers
python bibverify.py verify cdl.bib --workers 10

# Get detailed output
python bibverify.py verify cdl.bib --verbose --workers 10

# Save report to file
python bibverify.py verify cdl.bib --workers 10 > verification_report.txt 2>&1
```

**How it Works:**
1. Queries the CrossRef API by DOI (if present) or by title/authors
2. **Conservative Matching:** Requires ALL of:
   - Title similarity ≥ 85%
   - Author similarity ≥ 70%
   - Journal similarity ≥ 60%
   - Year difference ≤ 1 year
3. Only reports discrepancies when confident it's the same paper
4. Checks for volume/number mismatches, incorrect pages, and common errors
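
The matching rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in bibverify.py: the helper names, the dict field layout, and the use of `difflib.SequenceMatcher` as the similarity measure are all hypothetical. Only the four thresholds come from the list above.

```python
from difflib import SequenceMatcher

# Thresholds from the list above; everything else here is illustrative.
TITLE_MIN, AUTHOR_MIN, JOURNAL_MIN, MAX_YEAR_GAP = 0.85, 0.70, 0.60, 1


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_confident_match(entry: dict, candidate: dict) -> bool:
    """Accept a CrossRef candidate only if ALL four criteria hold."""
    return (
        similarity(entry["title"], candidate["title"]) >= TITLE_MIN
        and similarity(entry["author"], candidate["author"]) >= AUTHOR_MIN
        and similarity(entry["journal"], candidate["journal"]) >= JOURNAL_MIN
        and abs(int(entry["year"]) - int(candidate["year"])) <= MAX_YEAR_GAP
    )
```

Requiring every criterion at once, rather than a weighted average, means a single weak field (for example a different journal) is enough to reject the candidate, which is what keeps the tool from proposing corrections based on the wrong paper.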

**Example Output:**
```
============================================================
VERIFICATION SUMMARY
============================================================
✓ Verified: 3,988 (65%)
✗ Errors: 724 (12%)
⚠ Warnings: 1,434 (23%)

Common errors found:
- Volume/issue number mismatches
- Page range errors or off-by-one issues
- DOI placed in pages field instead of doi field
- Year discrepancies (preprint vs published versions)
```

**Performance:** With 10 workers, verifies ~17 entries/second. Full bibliography verification takes approximately 6 minutes.
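
The two figures are consistent with each other, as a quick back-of-the-envelope check shows. The function name below is hypothetical; the inputs are the entry count and throughput quoted above.

```python
def estimated_minutes(entries: int, entries_per_second: float) -> float:
    """Estimate total run time in minutes from bibliography size and throughput."""
    return entries / entries_per_second / 60


# 6,151 entries at ~17 entries/second
print(f"{estimated_minutes(6151, 17):.1f} min")  # 6.0 min
```

Actual run time will vary with network latency and CrossRef API responsiveness, so treat this as a rough planning estimate.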

**Note:** Roughly 23% of entries may not be found in CrossRef (for example arXiv preprints, technical reports, and very new or very old publications). The tool rejects uncertain matches rather than suggesting possibly incorrect corrections.

### Suggested workflow

After making changes to `cdl.bib` (manually, using
[bibdesk](https://bibdesk.sourceforge.io/), etc.), please follow the suggested
workflow below in order to safely update the shared lab resource:

1. **(Optional) Verify accuracy against CrossRef:**
```bash
python bibverify.py verify cdl.bib --workers 10 > verification_report.txt 2>&1
# Review verification_report.txt and fix any genuine errors found
```

2. Verify the formatting/integrity of the modified cdl.bib file (fix any reported problems until this passes):
```bash
python bibcheck.py verify --verbose
```

3. Generate a change log and commit your changes:
```bash
python bibcheck.py commit --verbose
```

4. Push your changes to your fork:
```bash
git push
```

5. Create a pull request for pulling your changes into the ContextLab fork

**Note:** The bibverify step is optional but recommended for catching metadata errors. It's especially useful when adding new entries or updating existing ones.

## Additional information and usage instructions
