feat: add concat command to concatenate multiple WARC files#188

Open
NGTmeaty wants to merge 4 commits into master from concat
Conversation

@NGTmeaty
Collaborator

No description provided.

@NGTmeaty NGTmeaty requested review from Copilot and willmhowes March 31, 2026 01:33
@NGTmeaty NGTmeaty self-assigned this Mar 31, 2026
@NGTmeaty NGTmeaty added the enhancement New feature or request label Mar 31, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds a new warc concat subcommand to combine multiple WARC files into a single output file, with optional deletion of the original inputs.

Changes:

  • Register a new concat Cobra subcommand in the warc CLI.
  • Implement byte-level concatenation of multiple input files into a single output file.
  • Add a safety check to refuse concatenation of gowarc zstd dictionary-frame files.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| cmd/warc/main.go | Registers the new concat subcommand on the root CLI. |
| cmd/warc/concat/concat.go | Implements concatenation, logging, dictionary-frame detection, and optional input deletion. |
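
One way the dictionary-frame detection could work is sketched below. It assumes that a gowarc zstd file with an embedded dictionary begins with a zstd skippable frame, which this thread does not confirm. The Zstandard format reserves the little-endian magic numbers 0x184D2A50 through 0x184D2A5F for skippable frames; the function name `hasSkippableFramePrefix` is hypothetical.

```go
package main

import "encoding/binary"

// Magic-number range reserved for skippable frames in the Zstandard format.
const (
	skippableMagicMin = 0x184D2A50
	skippableMagicMax = 0x184D2A5F
)

// hasSkippableFramePrefix reports whether head starts with a zstd skippable
// frame magic number. Hypothetical helper: it assumes the dictionary, if any,
// lives in a skippable frame at the very start of the file.
func hasSkippableFramePrefix(head []byte) bool {
	if len(head) < 4 {
		return false // too short to hold a magic number
	}
	magic := binary.LittleEndian.Uint32(head[:4])
	return magic >= skippableMagicMin && magic <= skippableMagicMax
}
```

A caller would read the first four bytes of each input and refuse to concatenate when the check returns true, since each such file may carry its own dictionary and a plain byte copy would not merge them.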

Comment thread cmd/warc/concat/concat.go Outdated
Comment thread cmd/warc/concat/concat.go
Comment thread cmd/warc/concat/concat.go
Comment thread cmd/warc/concat/concat.go
Comment thread cmd/warc/concat/concat.go
@yzqzss
Collaborator

yzqzss commented Mar 31, 2026

IIRC, WARC Spec allows you to directly concat multiple warc files without rewriting the warc records? (cat *.warc > one.warc)
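
For gzip-compressed WARCs, the `cat` approach holds up because a sequence of gzip members is itself a valid gzip stream. The sketch below illustrates that property with Go's standard library only; `gzipMember` and `readAllMembers` are hypothetical names, and the payloads are placeholders rather than real WARC records.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// gzipMember compresses payload into one standalone gzip member.
func gzipMember(payload string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(payload)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readAllMembers decompresses every gzip member found in data.
// gzip.Reader reads concatenated members by default (multistream mode).
func readAllMembers(data []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return "", err
	}
	defer zr.Close()

	out, err := io.ReadAll(zr)
	return string(out), err
}
```

Appending the output of two `gzipMember` calls and passing the result to `readAllMembers` yields both payloads back to back, which is the same effect as running `cat` over two `.warc.gz` files.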

Comment thread cmd/warc/concat/concat.go
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


Comment thread cmd/warc/concat/concat.go Outdated
Comment thread cmd/warc/concat/concat.go Outdated
@willmhowes
Collaborator

> IIRC, WARC Spec allows you to directly concat multiple warc files without rewriting the warc records? (cat *.warc > one.warc)

@yzqzss Yes! We just wanted a convenient way to do WARC concatenation within our existing toolset.

Collaborator Author

@NGTmeaty NGTmeaty left a comment

Changes look good to me!

Labels

enhancement New feature or request

4 participants