Merged
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,12 @@ and this project adheres to [Conventional Commits](https://www.conventionalcommi

## [Unreleased]

## [0.6.1]

### Changed

- **Bulk download for `download --all`**: when downloading all 54 titles, the downloader now fetches a single `xml_uscAll@{releasePoint}.zip` instead of making 54 individual HTTP requests. Falls back to per-title downloads if the bulk zip is unavailable. No CLI changes — same `--all` flag, same output.

## [0.6.0]

### Changed
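The bulk download described in the changelog entry above boils down to fetching one URL instead of 54. A hypothetical sketch of the URL shape — the base URL and the helper name `bulkZipUrl` are assumptions for illustration; the real implementation is `buildAllTitlesUrl` in `packages/usc/src/downloader.ts` and may differ:

```typescript
// Hypothetical sketch of the bulk zip URL described in the changelog.
// OLRC_BASE and bulkZipUrl are illustrative names, not this PR's API.
const OLRC_BASE = "https://uscode.house.gov/download/releasepoints"; // assumed base

function bulkZipUrl(releasePoint: string): string {
  // One archive containing all 54 titles for the given release point.
  return `${OLRC_BASE}/${releasePoint}/xml_uscAll@${releasePoint}.zip`;
}
```

One request for the whole corpus, versus 54 round-trips with the per-title URLs.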
9 changes: 9 additions & 0 deletions packages/cli/CHANGELOG.md
@@ -1,5 +1,14 @@
# law2md

## 0.6.1

### Patch Changes

- Enhance downloader for when all titles are downloaded
- Updated dependencies
- @law2md/core@0.6.1
- @law2md/usc@0.6.1

## 0.6.0

### Minor Changes
2 changes: 1 addition & 1 deletion packages/cli/package.json
@@ -1,6 +1,6 @@
{
"name": "law2md",
"version": "0.6.0",
"version": "0.6.1",
"description": "Convert U.S. legislative XML (USLM) to structured Markdown for AI/RAG ingestion",
"type": "module",
"main": "./dist/index.js",
6 changes: 6 additions & 0 deletions packages/core/CHANGELOG.md
@@ -1,5 +1,11 @@
# @law2md/core

## 0.6.1

### Patch Changes

- Enhance downloader for when all titles are downloaded
Copilot AI Mar 3, 2026
This changelog entry attributes the bulk USC downloader enhancement to @law2md/core, but the downloader implementation lives in @law2md/usc. To avoid misleading consumers, adjust this entry to reflect actual core changes in 0.6.1 (e.g., dependency bump / no functional changes) and leave the downloader note to the usc/cli changelogs.

Suggested change
- Enhance downloader for when all titles are downloaded
- Internal maintenance and dependency updates; no functional changes in core


## 0.6.0

### Minor Changes
2 changes: 1 addition & 1 deletion packages/core/package.json
@@ -1,6 +1,6 @@
{
"name": "@law2md/core",
"version": "0.6.0",
"version": "0.6.1",
"description": "Core XML parsing, AST, and Markdown rendering for law2md",
"type": "module",
"main": "./dist/index.js",
8 changes: 8 additions & 0 deletions packages/usc/CHANGELOG.md
@@ -1,5 +1,13 @@
# @law2md/usc

## 0.6.1

### Patch Changes

- Enhance downloader for when all titles are downloaded
- Updated dependencies
- @law2md/core@0.6.1

## 0.6.0

### Minor Changes
2 changes: 1 addition & 1 deletion packages/usc/package.json
@@ -1,6 +1,6 @@
{
"name": "@law2md/usc",
"version": "0.6.0",
"version": "0.6.1",
"description": "U.S. Code-specific element handlers and downloader for law2md",
"type": "module",
"main": "./dist/index.js",
31 changes: 31 additions & 0 deletions packages/usc/src/downloader.test.ts
@@ -3,6 +3,7 @@ import {
buildDownloadUrl,
buildAllTitlesUrl,
releasePointToPath,
isAllTitles,
CURRENT_RELEASE_POINT,
USC_TITLE_NUMBERS,
} from "./downloader.js";
@@ -63,6 +64,36 @@ describe("buildAllTitlesUrl", () => {
});
});

describe("isAllTitles", () => {
it("returns true for the full set 1-54 in order", () => {
expect(isAllTitles(Array.from({ length: 54 }, (_, i) => i + 1))).toBe(true);
});

it("returns true for the full set shuffled", () => {
const shuffled = Array.from({ length: 54 }, (_, i) => i + 1);
shuffled.reverse();
expect(isAllTitles(shuffled)).toBe(true);
});

it("returns false for a subset", () => {
expect(isAllTitles([1, 2, 3])).toBe(false);
});

it("returns false for 54 elements with duplicates (missing titles)", () => {
const withDupes = Array.from({ length: 54 }, (_, i) => (i < 53 ? i + 1 : 1));
expect(isAllTitles(withDupes)).toBe(false);
});

it("returns true for full set with extra duplicates", () => {
const withExtras = [...Array.from({ length: 54 }, (_, i) => i + 1), 1, 27, 54];
expect(isAllTitles(withExtras)).toBe(true);
});

it("returns false for an empty array", () => {
expect(isAllTitles([])).toBe(false);
});
});

describe("constants", () => {
it("has 54 USC title numbers", () => {
expect(USC_TITLE_NUMBERS).toHaveLength(54);
146 changes: 146 additions & 0 deletions packages/usc/src/downloader.ts
@@ -34,6 +34,20 @@
/** Valid USC title numbers (1-54) */
export const USC_TITLE_NUMBERS = Array.from({ length: 54 }, (_, i) => i + 1);

// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------

/**
* Check whether a list of title numbers covers all 54 USC titles.
*
* Handles arbitrary ordering and duplicates.
*/
export function isAllTitles(titles: number[]): boolean {
const unique = new Set(titles);
return unique.size === 54 && USC_TITLE_NUMBERS.every((n) => unique.has(n));
}

// ---------------------------------------------------------------------------
// Public API
// ---------------------------------------------------------------------------
@@ -78,13 +92,27 @@

/**
* Download USC title XML files from OLRC.
*
* When all 54 titles are requested, uses the bulk `uscAll` zip for a single
* HTTP round-trip instead of 54 individual requests. Falls back to per-title
* downloads if the bulk download fails.
*/
export async function downloadTitles(options: DownloadOptions): Promise<DownloadResult> {
const releasePoint = options.releasePoint ?? CURRENT_RELEASE_POINT;
const titles = options.titles ?? USC_TITLE_NUMBERS;

await mkdir(options.outputDir, { recursive: true });

// Use bulk zip when all 54 titles are requested
if (options.titles === undefined || isAllTitles(titles)) {
try {
const files = await downloadAndExtractAllTitles(releasePoint, options.outputDir);
return { releasePoint, files, errors: [] };
} catch {
// Fall back to per-title downloads
}
Comment on lines +106 to +113
Copilot AI Mar 3, 2026

The bulk-download attempt swallows all errors and falls back silently, but it can leave a partially downloaded uscAll.zip and/or partially extracted XML files in outputDir. Consider narrowing the fallback to expected “bulk unavailable” failures (e.g., 404/410), and ensure temp artifacts are cleaned up on bulk failure (e.g., cleanup in a finally inside the bulk download/extract helper, or delete uscAll.zip + partial outputs before continuing).

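One way to narrow the fallback as the comment above suggests is to classify the failure before deciding to fall back. This is an illustrative sketch, not code from this PR; `HttpError` and `shouldFallBackToPerTitle` are hypothetical names:

```typescript
// Hypothetical error classification for the bulk-download fallback.
// HttpError and shouldFallBackToPerTitle are not part of this PR.
class HttpError extends Error {
  constructor(public readonly status: number, url: string) {
    super(`HTTP ${status} for ${url}`);
  }
}

// Fall back to per-title downloads only when the bulk zip is plainly
// unavailable; unexpected failures (network, disk, corrupt zip) should
// be rethrown so partial state is not silently papered over.
function shouldFallBackToPerTitle(err: unknown): boolean {
  return err instanceof HttpError && (err.status === 404 || err.status === 410);
}
```

With a check like this, the `catch` around the bulk attempt can rethrow anything that is not a 404/410 instead of swallowing it.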
}
Comment on lines +106 to +114
Copilot AI Mar 3, 2026

New bulk-download behavior (single uscAll zip, extraction of all titles, and fallback when unavailable) isn’t covered by tests here. Consider adding tests that mock fetch to (1) return a small in-memory zip with a few uscNN.xml entries to verify extraction/mapping, and (2) simulate a 404 for the bulk URL to verify the per-title fallback path is used.

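The fallback path the comment above wants covered can be exercised without the network by injecting a fetch-like function. A minimal sketch of the idea — `Fetcher`, `downloadAllOrFallback`, and `perTitle` are illustrative names, not this PR's actual API:

```typescript
// Minimal sketch of testing the 404 fallback with an injected fetcher.
// Names and shapes here are illustrative only.
type Fetcher = (url: string) => Promise<{ ok: boolean; status: number }>;

async function downloadAllOrFallback(
  fetchFn: Fetcher,
  perTitle: (n: number) => Promise<void>,
): Promise<"bulk" | "per-title"> {
  const res = await fetchFn("xml_uscAll.zip");
  if (res.ok) return "bulk";
  // Bulk zip unavailable: download each of the 54 titles individually.
  for (let n = 1; n <= 54; n++) await perTitle(n);
  return "per-title";
}
```

A test can then stub the fetcher to return a 404 and assert that the per-title path ran 54 times; the in-memory-zip case from the comment would follow the same injection pattern.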

const files: DownloadedFile[] = [];
const errors: DownloadError[] = [];

@@ -265,3 +293,121 @@
});
});
}

// ---------------------------------------------------------------------------
// Bulk download (all titles in one zip)
// ---------------------------------------------------------------------------

/** Regex matching USC XML filenames like usc01.xml, usc54.xml */
const USC_XML_RE = /^(?:.*\/)?usc(\d{2})\.xml$/;

/**
* Extract all `usc{NN}.xml` files from a bulk zip archive.
*
* Returns an array of `{ titleNumber, filePath }` for each extracted file.
*/
function extractAllXmlFromZip(
zipPath: string,
outputDir: string,
): Promise<{ titleNumber: number; filePath: string }[]> {
return new Promise((resolve, reject) => {
yauzlOpen(zipPath, { lazyEntries: true }, (err, zipFile) => {
if (err) {
reject(new Error(`Failed to open zip: ${err.message}`));
return;
}
if (!zipFile) {
reject(new Error("Failed to open zip: no zipFile returned"));
return;
}

const extracted: { titleNumber: number; filePath: string }[] = [];
let pending = 0;
let ended = false;

const maybeResolve = (): void => {
if (ended && pending === 0) {
resolve(extracted);
}
};

zipFile.on("entry", (entry: Entry) => {
const match = USC_XML_RE.exec(entry.fileName);
if (match) {
const titleNum = parseInt(match[1]!, 10);
const outPath = join(outputDir, `usc${match[1]!}.xml`);

Check failure on lines 337-338 (GitHub Actions / Node 20 and Node 22): Forbidden non-null assertion

pending++;

extractEntry(zipFile, entry, outPath)
.then(() => {
extracted.push({ titleNumber: titleNum, filePath: outPath });
pending--;
// Continue reading entries after extraction completes
zipFile.readEntry();
maybeResolve();
})
.catch((extractErr) => {
zipFile.close();
reject(extractErr);
});
} else {
zipFile.readEntry();
}
});

zipFile.on("end", () => {
ended = true;
maybeResolve();
});

zipFile.on("error", (zipErr: Error) => {
reject(new Error(`Zip error: ${zipErr.message}`));
});
Comment on lines +358 to +365
Copilot AI Mar 3, 2026

extractAllXmlFromZip never closes the zipFile on the successful path (and also doesn’t close it in the zipFile.on("error") handler). This can leak file descriptors, especially when downloading all titles repeatedly. Ensure zipFile.close() is called when finishing (after end/pending resolves) and when handling zip-level errors.


zipFile.readEntry();
});
});
}

/**
* Download the bulk all-titles zip and extract every `usc{NN}.xml` file.
*/
async function downloadAndExtractAllTitles(
releasePoint: string,
outputDir: string,
): Promise<DownloadedFile[]> {
const url = buildAllTitlesUrl(releasePoint);
const zipPath = join(outputDir, "uscAll.zip");

// Download the zip file
const response = await fetch(url);
if (!response.ok) {
throw new Error(`HTTP ${response.status}: ${response.statusText} for ${url}`);
}

if (!response.body) {
throw new Error(`No response body for ${url}`);
}

// Write zip to disk
const fileStream = createWriteStream(zipPath);
await pipeline(Readable.fromWeb(response.body as never), fileStream);

// Extract all XML files from zip
const extracted = await extractAllXmlFromZip(zipPath, outputDir);

// Clean up zip file
await unlink(zipPath);

// Stat each extracted file and build results
const files: DownloadedFile[] = [];
for (const { titleNumber, filePath } of extracted) {
const fileStat = await stat(filePath);
Comment on lines +396 to +405
Copilot AI Mar 3, 2026

downloadAndExtractAllTitles treats any extraction result as success. If the bulk zip layout changes (or the regex fails), extractAllXmlFromZip can return an empty/partial list and downloadTitles will return { errors: [] }, silently skipping titles. Add a completeness check (e.g., ensure the extracted title numbers cover 1–54 and/or that 54 files were extracted) and throw if incomplete so the caller can fall back to per-title downloads.

files.push({ titleNumber, filePath, size: fileStat.size });
}

// Sort by title number for consistent ordering
files.sort((a, b) => a.titleNumber - b.titleNumber);

return files;
}
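The completeness check suggested in the review comment above could reuse this PR's own `isAllTitles`. A sketch — `isAllTitles` is re-declared here so the snippet is self-contained, and `assertComplete` is a hypothetical name:

```typescript
// Sketch of the suggested completeness check. isAllTitles mirrors the
// helper added in this PR; assertComplete is illustrative only.
function isAllTitles(titles: number[]): boolean {
  const unique = new Set(titles);
  const all = Array.from({ length: 54 }, (_, i) => i + 1);
  return unique.size === 54 && all.every((n) => unique.has(n));
}

function assertComplete(extracted: { titleNumber: number }[]): void {
  const nums = extracted.map((f) => f.titleNumber);
  if (!isAllTitles(nums)) {
    // Throwing here lets downloadTitles fall back to per-title downloads
    // instead of returning an empty errors array for a partial extraction.
    throw new Error(`Bulk extraction incomplete: got ${nums.length} of 54 titles`);
  }
}
```

Calling this after `extractAllXmlFromZip` would turn a regex or zip-layout mismatch into a visible failure rather than silently skipped titles.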
1 change: 1 addition & 0 deletions packages/usc/src/index.ts
@@ -8,6 +8,7 @@ export {
buildDownloadUrl,
buildAllTitlesUrl,
releasePointToPath,
isAllTitles,
CURRENT_RELEASE_POINT,
USC_TITLE_NUMBERS,
} from "./downloader.js";