
File type validation on upload through web browser #624

Draft: alitheg wants to merge 1 commit into main from file_type_detection

@alitheg (Contributor) commented Feb 3, 2026

For issue #553. If this is a good approach, we will probably want to run it for all other upload types, and it needs the other supported file types from Dave's investigation added. But I thought I'd raise a quick PR first to validate the approach.

@alitheg alitheg requested a review from sbp February 3, 2026 15:04
@alitheg alitheg marked this pull request as draft February 3, 2026 15:04
@alitheg (Contributor, Author) commented Feb 3, 2026

I'm ignoring the build failure for now. It happens because the test uploaded a .txt file, but since the list of supported file types will need to change anyway, there's no point adding just that right now.

@sbp (Contributor) commented Feb 3, 2026

We could certainly maintain an allow list of suffixes, though even after an exhaustive comparison against already published files, we are likely to get frequent requests to add new suffixes to it.
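One practical wrinkle with a suffix allow list is compound suffixes such as .tar.gz, .sqlite.bz2 and .pack.gz, which a plain os.path.splitext() check would miss. Below is a minimal sketch that matches the longest allowed suffix, case-insensitively. The set of suffixes is a hypothetical, illustrative subset taken from the survey further down, not a proposed final list.

```python
from typing import Optional

# Hypothetical allow list; entries are illustrative, not a proposed final list
ALLOWED_SUFFIXES = frozenset({
    ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz", ".gz", ".zip", ".jar", ".whl",
    ".asc", ".sha256", ".sha512",
})


def matching_suffix(filename: str, allowed=ALLOWED_SUFFIXES) -> Optional[str]:
    """Return the longest allowed suffix that filename ends with, or None."""
    name = filename.lower()  # .SHA512 and .sha512 are both seen on the mirror
    # Try longer suffixes first so ".tar.gz" wins over ".gz"
    for suffix in sorted(allowed, key=len, reverse=True):
        # Require at least one character before the suffix (".zip" alone is rejected)
        if name.endswith(suffix) and len(name) > len(suffix):
            return suffix
    return None
```

For example, `matching_suffix("apache-foo-1.0-src.tar.gz")` gives `".tar.gz"`, while an unknown suffix gives `None`. The caller would then decide whether that means rejecting the file or just flagging it.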

I doubt that puremagic can always be relied on for gating content, and I wondered when I added it whether we should be raising warnings rather than blocking with errors. Its database is relatively small, plenty of files don't have magic numbers that it can detect, and there are also overlaps between some magic numbers. ASVS 5.2.2 is pretty clear that at L2 we must validate the content of all files, but I think it would be useful to analyse what confidence levels we have for each type in puremagic. We can raise errors for types that we are confident about, and warn about the rest.

I did think about using magika in conjunction with puremagic, but it's even more heuristic, though it does cover more types. It claims a 99% success rate, which would still mean roughly ten false positives for every thousand files uploaded, and for some projects a thousand files is just one or two releases. It's hard to balance usability with security when the accuracy rate is so low.
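To make the 99% figure concrete, here is a small worked calculation. If each file were misclassified independently with probability 1%, a release of k files would contain at least one misclassification with probability 1 - 0.99^k. Independence is a simplifying assumption; real errors probably cluster by file type.

```python
def expected_errors(n_files: int, accuracy: float = 0.99) -> float:
    """Expected number of misclassified files out of n_files."""
    return n_files * (1.0 - accuracy)


def p_any_error(n_files: int, accuracy: float = 0.99) -> float:
    """Probability that at least one of n_files is misclassified (independent errors assumed)."""
    return 1.0 - accuracy ** n_files


for n in (10, 100, 1000):
    print(f"{n:>5} files: expected {expected_errors(n):.1f} errors, "
          f"P(at least one) = {p_any_error(n):.3f}")
```

For a 100-file release, this gives about a 63% chance that at least one file is misclassified.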

@alitheg (Contributor, Author) commented Feb 3, 2026

Here's a survey of puremagic's results for each of the example files.

puremagic detection results - issue #553

Run against real files from downloads.apache.org using the paths from dave2wave's
suffix survey. Only the first 64 KB of each file is fetched (HTTP Range request);
detection uses puremagic.magic_file(), the same call site as atr/detection.py.
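The survey script itself isn't included in this thread. As a rough, hypothetical sketch of just the sampling step, the first 64 KB of a file can be fetched with an HTTP Range header using only the standard library; the function names are illustrative. A server that ignores Range answers 200 with the whole body, so the read is capped in either case.

```python
import urllib.request

SAMPLE_BYTES = 64 * 1024  # the survey only looked at the first 64 KB of each file


def build_range_request(url: str, nbytes: int = SAMPLE_BYTES) -> urllib.request.Request:
    """Build a GET request for the first nbytes of url (HTTP byte ranges are inclusive)."""
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{nbytes - 1}"})


def fetch_head(url: str, nbytes: int = SAMPLE_BYTES, timeout: float = 30.0) -> bytes:
    """Fetch at most nbytes from the start of url.

    A server that honours Range replies 206 Partial Content; one that ignores it
    replies 200 with the full file, so the read is capped in both cases.
    """
    with urllib.request.urlopen(build_range_request(url, nbytes), timeout=timeout) as resp:
        return resp.read(nbytes)
```

The returned bytes could then be written to a temporary file and passed to puremagic.magic_file(). Because detection only ever sees a prefix of the file, truncation artefacts like the .slingosgifeature result below are possible.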

.tmp was dropped from the survey: only 2 instances ever existed on the mirror
(both under zzz/) and both have since been removed.

84 file types tested.

| OK | MISMATCH | NOT DETECTED |
| --- | --- | --- |
| 35 | 11 | 38 |

| Extension | Status | Detected by puremagic | Expected | Notes |
| --- | --- | --- | --- | --- |
| .2 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .4 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .5 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .512 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .6 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .66 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .7 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .7z | OK | application/x-7z-compressed | application/x-7z-compressed | |
| .adoc | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .apk | MISMATCH | application/x-gzip | application/zip | This particular APK is a gzip stream, not a standard ZIP-based APK |
| .asc | OK | application/pgp-signature | application/pgp-signature | |
| .bin | MISMATCH | application/zip (+20 subtypes) | application/octet-stream | The langdetect .bin is actually a ZIP; expected set assumed opaque binary |
| .changes | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .crate | OK | application/x-gzip | application/x-gzip | Rust crates are .tar.gz |
| .css | NOT DETECTED | - | text/css | Plain text, no magic bytes |
| .deb | OK | application/vnd.debian.binary-package, application/x-archive | application/vnd.debian.binary-package, application/x-archive | |
| .dmg | NOT DETECTED | - | application/x-apple-diskimage | puremagic does not recognise this format |
| .exe | OK | application/octet-stream, application/vnd.microsoft.portable-executable | application/vnd.microsoft.portable-executable | |
| .far | OK | application/java-archive, application/zip (+20 subtypes) | application/java-archive, application/zip | |
| .gem | MISMATCH | application/x-tar | application/x-gzip | Ruby gems are plain tar, not gzipped tar - expected set was wrong |
| .gif | OK | image/gif | image/gif | |
| .gpg | NOT DETECTED | - | application/pgp-encrypted | Binary PGP; puremagic does not recognise the signature |
| .html | OK | text/html | text/html | |
| .ico | MISMATCH | image/png | image/x-icon | The favicon.ico on downloads.apache.org is actually a PNG file |
| .img | NOT DETECTED | - | application/octet-stream | Raw binary, no magic bytes |
| .index | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .jar | OK | application/java-archive, application/zip (+20 subtypes) | application/java-archive, application/zip | |
| .json | OK | application/json | application/json | |
| .KEYS | MISMATCH | audio/x-ms-asx | text/plain | False positive - file is plain-text PGP public keys; puremagic is tripped up by content |
| .list | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .mar | MISMATCH | application/zip (+20 subtypes) | application/octet-stream | The sling .mar is a ZIP; expected set assumed opaque binary |
| .md | MISMATCH | application/xml, text/html | text/plain | This particular README.md opens with XML/HTML-like content |
| .MD5 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .md5 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .mds | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .msi | OK | application/x-ole-storage (+ other OLE types) | application/x-ole-storage | |
| .nar | OK | application/java-archive, application/zip (+20 subtypes) | application/java-archive, application/zip | |
| .nupkg | OK | application/java-archive, application/zip (+20 subtypes) | application/java-archive, application/zip | |
| .old | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .pack.gz | OK | application/x-gzip | application/x-gzip | |
| .patch | MISMATCH | text/x-patch | text/x-diff | Correct detection; just needs text/x-patch added to the expected set |
| .pdf | OK | application/pdf | application/pdf | |
| .pem | MISMATCH | application/x-pem-file | text/plain | puremagic correctly identifies PEM; the expected set should be updated, not the detection |
| .pl | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .PL | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .pm | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .png | OK | image/png | image/png | |
| .pom | OK | application/xml | application/xml | |
| .prov | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .ps1 | OK | text/plain | text/plain | |
| .py | OK | text/x-python | text/x-python | |
| .rar | MISMATCH | application/zip (+20 subtypes) | application/x-rar-compressed | The jackrabbit .rar is actually a ZIP, not a real RAR archive |
| .readme | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .repo | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .repositories | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .rpm | OK | application/x-rpm | application/x-rpm | |
| .sh | NOT DETECTED | - | application/x-sh | Plain text, no magic bytes |
| .sh1 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .sha | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .sha1 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .sha256 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .SHA256 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .SHA512 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .sha512 | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .sha512sum | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .sig | NOT DETECTED | - | application/pgp-signature | Binary PGP; puremagic does not recognise the signature |
| .slingosgifeature | MISMATCH | application/json | application/zip | Only 64 KB fetched; file likely starts with a JSON manifest before the ZIP payload |
| .snupkg | OK | application/java-archive, application/zip (+20 subtypes) | application/java-archive, application/zip | |
| .sqlite.bz2 | OK | application/x-bzip2 | application/x-bzip2 | |
| .taco | OK | application/java-archive, application/zip (+20 subtypes) | application/java-archive, application/zip | |
| .tar.bz2 | OK | application/x-bzip2 | application/x-bzip2 | |
| .tar.gz | OK | application/x-gzip | application/x-gzip | |
| .tar.xz | OK | application/x-xz | application/x-xz | |
| .temp | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .tgz | OK | application/x-gzip | application/x-gzip | |
| .txt | OK | text/plain | text/plain | |
| .vsix | OK | application/java-archive, application/zip (+20 subtypes) | application/java-archive, application/zip | |
| .war | OK | application/java-archive, application/zip (+20 subtypes) | application/java-archive, application/zip | |
| .whl | OK | application/java-archive, application/zip (+20 subtypes) | application/java-archive, application/zip | |
| .x | NOT DETECTED | - | text/plain | Plain text, no magic bytes |
| .xml | OK | application/xml | application/xml | |
| .xsd | OK | application/xml | application/xml | |
| .yaml | OK | application/x-yaml | application/x-yaml | |
| .zip | OK | application/java-archive, application/zip (+20 subtypes) | application/java-archive, application/zip | |
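The thread doesn't spell out how each row's status was assigned. One plausible rule that is consistent with every row above is sketched below; it is an assumption, not the survey's actual code:

- NOT DETECTED: puremagic returned nothing.
- OK: any detected MIME type is in the expected set (so .exe passes even though octet-stream is also reported).
- MISMATCH: anything else.

```python
from typing import Iterable


def classify(detected: Iterable[str], expected: Iterable[str]) -> str:
    """Assign a survey status by comparing detected MIME types with the expected set."""
    detected_set = {m.lower() for m in detected}
    expected_set = {m.lower() for m in expected}
    if not detected_set:
        return "NOT DETECTED"
    # Any overlap counts as a pass, even if other candidates were also reported
    if detected_set & expected_set:
        return "OK"
    return "MISMATCH"
```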

Summary of issues

NOT DETECTED (38) - all expected

Plain-text files have no magic bytes, so every type that is
really just text comes back empty. This covers all checksum variants (.sha1,
.sha256, .sha512, .md5, .mds, .sha, .SHA256, .SHA512, .sha512sum,
.sh1, .512, .MD5), all plain-text code / config / doc extensions (.css,
.sh, .pl, .PL, .pm, .adoc, .readme, .repo, .list,
.changes, .index, .prov, .old, .repositories, .x, .temp) and the numeric
changelog suffixes (.2 through .66). .KEYS and .pem are also plain text, but
puremagic did return a type for them, so they are addressed under MISMATCH.

.dmg, .gpg and .sig are also not detected, but for a different reason:
puremagic simply has no signatures for those formats in its database. .img is
raw binary with no magic bytes at all.

For all of these the only viable strategy is extension-based classification (or, for
checksums, a content-pattern check on the text).
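For the checksum files, a content-pattern check could look like the sketch below. It assumes only two line layouts:

- GNU coreutils style: a hex digest, whitespace, an optional `*` binary-mode marker, then the filename.
- A bare digest on its own line.

Other layouts on the mirror would need extra patterns, so treat this as a starting point only. Examples include BSD-style `SHA512 (file) = <digest>` lines and digests split into groups or across several lines.

```python
import re

# Hex digest lengths for MD5, SHA-1, SHA-256 and SHA-512
DIGEST_HEX_LENGTHS = {32, 40, 64, 128}

# "<digest>" alone, or "<digest><whitespace>[*]<filename>" (coreutils style)
_LINE = re.compile(r"^([0-9A-Fa-f]+)(?:\s+\*?\S.*)?$")


def looks_like_checksum_file(text: str) -> bool:
    """Return True if every non-blank line is a recognisable hex digest line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return False
    for line in lines:
        m = _LINE.match(line)
        # Reject lines that don't match or whose digest has an unexpected length
        if m is None or len(m.group(1)) not in DIGEST_HEX_LENGTHS:
            return False
    return True
```

A stricter variant could also require the digest length to match the suffix (64 hex characters for .sha256, 128 for .sha512).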

MISMATCH (11) - broken down by action needed

The expected type (from the ChatGPT table in the issue) is wrong; detection is fine:

  • .gem - Ruby gems are plain .tar, not .tar.gz. Expected set should be
    application/x-tar.
  • .patch - puremagic returns text/x-patch; expected was text/x-diff.
  • .pem - puremagic correctly returns application/x-pem-file; expected set had
    text/plain.

The example file on the mirror is not what the extension implies:

  • .apk - this specific file is a gzip stream, not a ZIP-based APK. A different
    APK from a different project might detect correctly.
  • .bin - the langdetect .bin is actually a ZIP. .bin is intentionally
    opaque; puremagic is not wrong, the file just happens to be a ZIP.
  • .ico - favicon.ico on downloads.apache.org is a PNG. A real ICO file would
    detect as image/x-icon.
  • .mar - the sling .mar is a ZIP. Expected assumed opaque binary.
  • .rar - the jackrabbit .rar is actually a ZIP, not a RAR archive.

Likely a 64 KB truncation artefact:

  • .slingosgifeature - detected as application/json. The file probably opens
    with a JSON manifest; the ZIP payload starts later. A full download would likely
    detect the ZIP signature.

puremagic false positives:

  • .KEYS - detected as audio/x-ms-asx. The file is plain-text PGP public keys.
    puremagic is matching something in the key material against the ASX signature
    bytes. Needs extension-based fallback.
  • .md - detected as application/xml, text/html. This particular README.md
    opens with XML / HTML-like markup. A different .md file would likely return
    empty (same as other plain-text types).

@alitheg (Contributor, Author) commented Feb 3, 2026

Here's the same for magika:

magika detection results - issue #553

Same files and 64 KB Range-request downloads as the puremagic run. Detection uses
Magika.identify_path(). Magika is an ML-based classifier (Google), so it returns a
single best-guess result rather than a set of candidates.

84 file types tested.

| OK | MISMATCH |
| --- | --- |
| 54 | 30 |

| Extension | Status | Magika MIME | Magika label / description | Expected | Notes |
| --- | --- | --- | --- | --- | --- |
| .2 | OK | text/plain | txt - Generic text document | text/plain | |
| .4 | OK | text/plain | txt - Generic text document | text/plain | |
| .5 | MISMATCH | text/x-c | c - C source | text/plain | File is svn_version.h.dist; detection is correct, expected set too narrow |
| .512 | OK | text/plain | txt - Generic text document | text/plain | |
| .6 | OK | text/plain | txt - Generic text document | text/plain | |
| .66 | OK | text/plain | txt - Generic text document | text/plain | |
| .7 | OK | text/plain | txt - Generic text document | text/plain | |
| .7z | OK | application/x-7z-compressed | sevenzip - 7-zip archive data | application/x-7z-compressed | |
| .adoc | OK | text/plain | txt - Generic text document | text/plain | |
| .apk | MISMATCH | application/gzip | gzip - gzip compressed data | application/zip | This APK is a gzip stream; same file, same result as puremagic |
| .asc | MISMATCH | application/x-pem-file | pem - PEM certificate | application/pgp-signature | PGP ASCII-armored and PEM share the -----BEGIN wrapper; magika confuses them |
| .bin | MISMATCH | application/zip | zip - Zip archive data | application/octet-stream | The langdetect .bin is actually a ZIP; same as puremagic |
| .changes | MISMATCH | application/x-pem-file | pem - PEM certificate | text/plain | False positive; a Debian changelog classified as PEM |
| .crate | MISMATCH | application/gzip | gzip - gzip compressed data | application/x-gzip | application/gzip is the IANA-registered form of application/x-gzip; equivalent |
| .css | OK | text/css | css - CSS source | text/css | |
| .deb | OK | application/vnd.debian.binary-package | deb - Debian binary package | application/vnd.debian.binary-package | |
| .dmg | MISMATCH | application/zlib | zlibstream - zlib compressed data | application/x-apple-diskimage | Detected the compression layer, not the container format |
| .exe | MISMATCH | application/x-dosexec | pebin - PE Windows executable | application/vnd.microsoft.portable-executable | application/x-dosexec is the common MIME for PE; detection is correct |
| .far | OK | application/java-archive | jar - Java archive data (JAR) | application/java-archive | |
| .gem | MISMATCH | application/x-tar | tar - POSIX tar archive | application/x-gzip | Gems are plain tar; expected set is wrong (same as puremagic) |
| .gif | OK | image/gif | gif - GIF image data | image/gif | |
| .gpg | MISMATCH | application/octet-stream | unknown - Unknown binary data | application/pgp-encrypted | magika does not recognise binary PGP |
| .html | MISMATCH | text/x-ruby | erb - Embedded Ruby source | text/html | False positive; jena/HEADER.html misclassified as ERB |
| .ico | MISMATCH | image/png | png - PNG image | image/x-icon | The favicon.ico is actually a PNG; same file, same result as puremagic |
| .img | OK | application/octet-stream | unknown - Unknown binary data | application/octet-stream | |
| .index | OK | text/plain | txt - Generic text document | text/plain | |
| .jar | OK | application/java-archive | jar - Java archive data (JAR) | application/java-archive | |
| .json | OK | application/json | json - JSON document | application/json | |
| .KEYS | MISMATCH | application/x-pem-file | pem - PEM certificate | text/plain | PGP public key blocks share -----BEGIN with PEM; same confusion as .asc |
| .list | OK | text/plain | txt - Generic text document | text/plain | |
| .mar | MISMATCH | application/java-archive | jar - Java archive data (JAR) | application/octet-stream | The sling .mar is actually a ZIP/JAR; same as puremagic |
| .md | OK | text/plain | txt - Generic text document | text/plain | |
| .md5 | OK | text/plain | txt - Generic text document | text/plain | |
| .MD5 | OK | text/plain | txt - Generic text document | text/plain | |
| .mds | OK | text/plain | sum - Checksum file | text/plain | |
| .msi | OK | application/x-msi | msi - Microsoft Installer file | application/x-msi | |
| .nar | OK | application/java-archive | jar - Java archive data (JAR) | application/java-archive | |
| .nupkg | MISMATCH | application/octet-stream | nupkg - NuGet Package | application/zip | Label is correct but no registered MIME in the model; falls back to octet-stream |
| .old | MISMATCH | message/rfc822 | eml - RFC 822 mail | text/plain | cassandra/KEYS.old is PGP public keys; ML model reads bulk key blocks as email |
| .pack.gz | MISMATCH | application/gzip | gzip - gzip compressed data | application/x-gzip | Same gzip MIME equivalence as .crate / .tar.gz / .tgz |
| .patch | OK | text/plain | diff - Diff file | text/plain | |
| .pdf | OK | application/pdf | pdf - PDF document | application/pdf | |
| .pem | MISMATCH | application/x-pem-file | pem - PEM certificate | text/plain | Correct detection; expected set should be updated (same as puremagic) |
| .pl | MISMATCH | text/x-perl | perl - Perl source | text/plain | Correct detection; expected set too narrow |
| .PL | MISMATCH | text/x-perl | perl - Perl source | text/plain | Correct detection; expected set too narrow |
| .pm | MISMATCH | text/x-perl | perl - Perl source | text/plain | Correct detection; expected set too narrow |
| .png | OK | image/png | png - PNG image | image/png | |
| .pom | OK | text/xml | xml - XML document | application/xml, text/xml | |
| .prov | MISMATCH | application/x-pem-file | pem - PEM certificate | text/plain | Helm .prov files contain PGP signed-message blocks; same -----BEGIN confusion |
| .ps1 | MISMATCH | application/x-powershell | powershell - Powershell source | text/plain | Correct detection; expected set too narrow |
| .py | OK | text/x-python | python - Python source | text/x-python | |
| .rar | MISMATCH | application/java-archive | jar - Java archive data (JAR) | application/x-rar-compressed | The jackrabbit .rar is actually a ZIP; same as puremagic |
| .readme | OK | text/plain | txt - Generic text document | text/plain | |
| .repo | OK | text/plain | ini - INI configuration file | text/plain | |
| .repositories | OK | text/plain | ini - INI configuration file | text/plain | |
| .rpm | OK | application/x-rpm | rpm - RedHat Package Manager archive (RPM) | application/x-rpm | |
| .sh | OK | text/x-shellscript | shell - Shell script | text/x-shellscript | |
| .sh1 | OK | text/plain | txt - Generic text document | text/plain | |
| .sha | OK | text/plain | txt - Generic text document | text/plain | |
| .sha1 | OK | text/plain | txt - Generic text document | text/plain | |
| .sha256 | OK | text/plain | txt - Generic text document | text/plain | |
| .SHA256 | OK | text/plain | txt - Generic text document | text/plain | |
| .SHA512 | OK | text/plain | txt - Generic text document | text/plain | |
| .sha512 | OK | text/plain | txt - Generic text document | text/plain | |
| .sha512sum | OK | text/plain | txt - Generic text document | text/plain | |
| .sig | MISMATCH | application/octet-stream | unknown - Unknown binary data | application/pgp-signature | magika does not recognise binary PGP signatures |
| .slingosgifeature | MISMATCH | application/json | json - JSON document | application/zip | 64 KB truncation artefact; same as puremagic |
| .snupkg | MISMATCH | application/octet-stream | nupkg - NuGet Package | application/zip | Same as .nupkg: label correct, MIME falls back to octet-stream |
| .sqlite.bz2 | OK | application/x-bzip2 | bzip - bzip2 compressed data | application/x-bzip2 | |
| .taco | OK | application/java-archive | jar - Java archive data (JAR) | application/java-archive | |
| .tar.bz2 | OK | application/x-bzip2 | bzip - bzip2 compressed data | application/x-bzip2 | |
| .tar.gz | MISMATCH | application/gzip | gzip - gzip compressed data | application/x-gzip | Same gzip MIME equivalence as .crate / .pack.gz / .tgz |
| .tar.xz | OK | application/x-xz | xz - XZ compressed data | application/x-xz | |
| .temp | OK | text/plain | txt - Generic text document | text/plain | |
| .tgz | MISMATCH | application/gzip | gzip - gzip compressed data | application/x-gzip | Same gzip MIME equivalence as .crate / .pack.gz / .tar.gz |
| .txt | MISMATCH | text/x-rst | rst - ReStructuredText document | text/plain | The pig README.txt is written in RST; detection is correct |
| .vsix | OK | application/zip | zip - Zip archive data | application/zip | |
| .war | OK | application/java-archive | jar - Java archive data (JAR) | application/java-archive | |
| .whl | OK | application/zip | zip - Zip archive data | application/zip | |
| .x | OK | text/plain | txt - Generic text document | text/plain | |
| .xml | OK | text/xml | xml - XML document | application/xml, text/xml | |
| .xsd | OK | text/xml | xml - XML document | application/xml, text/xml | |
| .yaml | OK | application/x-yaml | yaml - YAML source | application/x-yaml | |
| .zip | OK | application/java-archive | jar - Java archive data (JAR) | application/java-archive | |

Summary of mismatches

Expected set too narrow or uses legacy MIME - detection is correct (13)

These are not magika failures; the expected results (from the ChatGPT table in the issue)
need updating.

  • .5 - file is a C header template (svn_version.h.dist); text/x-c is right.
  • .crate, .pack.gz, .tar.gz, .tgz - magika returns application/gzip,
    which is the IANA-registered form. The expected set has the legacy
    application/x-gzip. They are the same format.
  • .exe - application/x-dosexec is the widely-used MIME for PE executables.
    Add it to the expected set alongside application/vnd.microsoft.portable-executable.
  • .gem - gems are plain .tar, not .tar.gz. Expected set should be
    application/x-tar (same conclusion as puremagic).
  • .pem - application/x-pem-file is the correct type. Expected set had
    text/plain (same conclusion as puremagic).
  • .pl, .PL, .pm - correctly identified as text/x-perl.
  • .ps1 - correctly identified as application/x-powershell.
  • .txt - the specific pig README.txt is written in reStructuredText; text/x-rst
    is accurate.
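Several of these are aliases rather than real disagreements. Comparing results after mapping each MIME type to a canonical form would take them off the mismatch list. Below is a sketch that assumes a small, hand-maintained alias table; its entries are only the pairs that appear in this survey. application/gzip is chosen as the canonical form because RFC 6713 registered it, while application/x-gzip remains in wide use.

```python
# Hypothetical alias table: maps variant MIME types to one canonical spelling
MIME_ALIASES = {
    "application/x-gzip": "application/gzip",
    "application/x-dosexec": "application/vnd.microsoft.portable-executable",
    "text/xml": "application/xml",
}


def canonical_mime(mime: str) -> str:
    """Normalise case, drop parameters such as '; charset=utf-8', then apply aliases."""
    base = mime.split(";", 1)[0].strip().lower()
    return MIME_ALIASES.get(base, base)


def same_type(a: str, b: str) -> bool:
    """True if two MIME strings name the same format after normalisation."""
    return canonical_mime(a) == canonical_mime(b)
```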

The example file on the mirror is not what the extension implies (5)

These are the same five files that tripped up puremagic, for the same reason.

  • .apk - this file is a gzip stream, not a ZIP-based APK.
  • .bin - the langdetect .bin is actually a ZIP.
  • .ico - favicon.ico is a PNG.
  • .mar - the sling .mar is a ZIP.
  • .rar - the jackrabbit .rar is a ZIP.

64 KB truncation artefact (1)

  • .slingosgifeature - opens with a JSON manifest; the ZIP payload starts later in
    the file. Same result as puremagic.

magika does not recognise the format (5)

  • .gpg, .sig - binary PGP. magika has no model for this; returns
    application/octet-stream.
  • .dmg - magika detects the zlib compression layer but not the Apple Disk Image
    container format on top of it.
  • .nupkg, .snupkg - the label (nupkg) is correct, but there is no registered
    MIME type in the model so it falls back to application/octet-stream.

magika false positives / PGP-PEM confusion (6)

PGP ASCII-armored blocks (-----BEGIN PGP …-----) and PEM blocks
(-----BEGIN CERTIFICATE----- etc.) share the same -----BEGIN … ----- wrapper.
The ML model conflates them.

  • .asc - PGP signature classified as PEM.
  • .KEYS - PGP public-key block classified as PEM.
  • .prov - Helm provenance files contain PGP signed-message blocks; classified as PEM.
  • .changes - Debian changelog classified as PEM. Likely a spurious match on some
    content pattern in this particular file.
  • .old - cassandra/KEYS.old is PGP public keys; classified as message/rfc822.
    Bulk PGP key material can resemble email headers to the model.
  • .html - jena/HEADER.html classified as ERB (Embedded Ruby). No obvious reason;
    likely a low-confidence ML misfire on the file content.
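The armor labels are fixed strings, so the PGP/PEM confusion could probably be resolved with a deterministic check on the first armor header line rather than a model. The sketch below assumes the armor header sits on its own line, possibly after introductory text (KEYS files usually start with prose). A cleartext-signed message begins with -----BEGIN PGP SIGNED MESSAGE-----, which the same check catches.

```python
import re
from typing import Optional

# An armor header line: -----BEGIN <LABEL>-----, e.g. "PGP SIGNATURE" or "CERTIFICATE"
_ARMOR = re.compile(r"^-----BEGIN ([A-Z0-9 ,/]+)-----[ \t\r]*$", re.MULTILINE)


def armor_kind(text: str) -> Optional[str]:
    """Return 'pgp' or 'pem' for the first armor header found, or None if there is none."""
    m = _ARMOR.search(text)
    if m is None:
        return None
    # Every OpenPGP armor label starts with "PGP "; anything else is treated as PEM
    return "pgp" if m.group(1).startswith("PGP ") else "pem"
```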

@alitheg (Contributor, Author) commented Feb 3, 2026

It feels to me like magika doesn't add much, since all its real wins are on plain-text files. Could we just validate that anything puremagic doesn't recognise is at least valid UTF-8? Maybe that's still not sufficient for the upload page, where we don't necessarily know all the types we'll get. What do you think @sbp?
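The UTF-8 fallback could be as simple as a strict decode. One wrinkle: if only a prefix of the file is inspected (as in the 64 KB survey), the cut can land in the middle of a multi-byte character. An incremental decoder that is not told the input is final avoids a false rejection in that case. Rejecting NUL bytes is an extra assumption in this sketch: NUL is valid UTF-8, but it is a strong hint of binary content.

```python
import codecs


def is_probably_utf8_text(data: bytes, truncated: bool = False) -> bool:
    """Return True if data decodes as strict UTF-8 and contains no NUL bytes."""
    if b"\x00" in data:
        return False  # assumption: treat NUL as a binary marker
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        # final=False tolerates an incomplete multi-byte sequence at the very end
        decoder.decode(data, final=not truncated)
    except UnicodeDecodeError:
        return False
    return True
```

Passing this check only shows the bytes are text. It says nothing about whether the text is what the suffix claims.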

@alitheg alitheg force-pushed the file_type_detection branch from aa3364a to 61f166f February 4, 2026 10:52
@sbp (Contributor) commented Feb 4, 2026

I was thinking more about looking at the implementation to figure out the confidence we can have in the result for each type. For example, you ran puremagic on one real-world example XML file and it passed. Was that an unusual result, or does puremagic always detect XML files correctly? We can't know that from checking one file. In fact, we can't know it from checking lots of files. For example, we could feed it a million files that don't have an XML declaration, <?xml version="1.0"?>, at the top. They all seem to be detected correctly. Then we use it in production to test XML files and a user uploads one with an XML declaration, and it breaks. The only way to know how much confidence we can have in results is to do a detailed analysis of the algorithm that it uses.

Obviously that would be pretty complicated, and there are a lot of file types, so I was suggesting that we only do this for the most common types that we find on the server. Does it always detect ZIP correctly? Always? 100%? Then we can use it to detect ZIP formats. What about TGZ? 100%? Then we can use it. The suggestion about magika was that we use it when puremagic fails but, again, only when we need it (on the common types) and only when it turns out to be very, very close to 100% accurate. It's impossible to audit magika, though, which is why I was dismissive of it in my earlier comment.

@alitheg (Contributor, Author) commented Feb 4, 2026

Makes sense. So you think we should just work through the most common types from Dave's survey and investigate those to see whether this is achievable?

I did notice there's a note in ASVS saying level 1 only requires verification of files used for specific sensitive purposes, but level 2 requires them all.

@dave2wave (Member) commented

> Makes sense. So you think we should just work through the largest from Dave's survey and investigate those to see if this is achievable or not?

You can use the find file and then access the files via https://dlcdn.apache.org to test as many as needed.

@alitheg (Contributor, Author) commented Feb 9, 2026

Most (but not all) archives look relatively good. The file types below were detected correctly by puremagic 100% of the time. The numbers still aren't high, but I can increase the search volume (files were discovered from the first 3 directory levels on dlcdn.apache.org, capped at 800 per file type for test purposes):

.tar.gz (797 files tested)
.zip (800 files tested)
.jar (800 files tested)
.tgz (463 files tested)
.whl (479 files tested)
.tar.bz2 (111 files tested)
.rpm (28 files tested)
.deb (29 files tested)
.war (13 files tested)
.nar (245 files tested)
.pdf (26 files tested)
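One way to quantify "the numbers still aren't high" is an upper confidence bound on the failure rate. With zero failures in n independent, randomly sampled files, the one-sided 95% upper bound is 1 - 0.05^(1/n), which is roughly 3/n (the "rule of three"). This only describes files like the ones sampled; as discussed above, it says nothing about inputs unlike anything in the sample.

```python
def zero_failure_upper_bound(n: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the per-file failure rate after n trials with no failures.

    Solves (1 - p) ** n = 1 - confidence for p, the exact binomial bound when zero
    failures are observed.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


samples = {".zip": 800, ".tar.gz": 797, ".tgz": 463, ".tar.bz2": 111, ".rpm": 28, ".war": 13}
for suffix, n in samples.items():
    print(f"{suffix:8} n={n:4}  failure rate < {zero_failure_upper_bound(n):.2%} (95%)")
```

The bound is about 0.4% for .zip but about 20% for .war, which is why the smaller samples say little on their own.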

@dave2wave (Member) commented

@alitheg Are you doing further testing? If not, then it looks like puremagic is the better fit?

@alitheg (Contributor, Author) commented Feb 10, 2026

I'm happy to do more, or not. What this shows is that puremagic seems good enough for tar.gz, zip, jar, tgz, whl and probably tar.bz2. It's likely as good for every entry in that list, but my crawl didn't find enough files of the other types to test on.

For other file types, we don't have enough data to say we can safely block files that fail puremagic validation, so we could, for example, raise an error for these types and only a warning for the others.
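A sketch of that split, under two assumptions. First, STRICT_SUFFIXES is a hypothetical set populated from the types above (.tar.bz2 is included on the "probably" basis stated). Second, the puremagic comparison has already been reduced to a single pass/fail flag. The names and the result shape are illustrative, not existing ATR code.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


# Hypothetical: suffixes where sampling found puremagic reliable enough to block on
STRICT_SUFFIXES = (".tar.gz", ".zip", ".jar", ".tgz", ".whl", ".tar.bz2")


@dataclass(frozen=True)
class CheckResult:
    filename: str
    severity: Severity
    message: str


def gate(filename: str, content_matches_suffix: bool) -> CheckResult:
    """Turn a content-vs-suffix check into an error or a warning depending on the suffix."""
    if content_matches_suffix:
        return CheckResult(filename, Severity.OK, "content matches suffix")
    # str.endswith accepts a tuple, so this checks every strict suffix at once
    if filename.lower().endswith(STRICT_SUFFIXES):
        return CheckResult(filename, Severity.ERROR, "content does not match suffix; upload blocked")
    return CheckResult(filename, Severity.WARNING, "content could not be confirmed for this suffix")
```

Moving a type from warning to error would then only mean adding its suffix to the tuple once enough evidence exists.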

@dave2wave (Member) commented

This processing fits into the new quarantine as a first step. It's important to proceed as follows.

  1. Check / validate file type with puremagic
  2. If it is not an archive, then it must be a "decorator" file. We probably need to discuss the rules for tagging or rejecting these files.
  3. If it is an archive, then do the archive analysis and expand the archive.
  4. The artifact as it leaves quarantine should be properly typed and tagged.
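The four steps could be arranged as a small pipeline. Everything in this sketch is hypothetical: the outcome type, the function name, and the callables passed in. It only shows the control flow. Detection, archive analysis and the decorator-file rules are injected as functions, since the decorator rules are still to be discussed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class QuarantineOutcome:
    filename: str
    detected_type: Optional[str]  # MIME type from detection, or None if undetected
    kind: str                     # "archive" or "decorator"
    problems: List[str] = field(default_factory=list)

    @property
    def accepted(self) -> bool:
        # A file leaves quarantine only if its stage reported no problems
        return not self.problems


def process_upload(
    filename: str,
    data: bytes,
    detect_type: Callable[[str, bytes], Optional[str]],
    is_archive_type: Callable[[str], bool],
    analyse_archive: Callable[[str, bytes], List[str]],
    check_decorator: Callable[[str, bytes], List[str]],
) -> QuarantineOutcome:
    # 1. Check / validate the file type (e.g. puremagic inside detect_type)
    detected = detect_type(filename, data)

    if detected is not None and is_archive_type(detected):
        # 3. Archives: analysis and expansion are delegated to analyse_archive
        problems = analyse_archive(filename, data)
        kind = "archive"
    else:
        # 2. Anything else is treated as a "decorator" file (signatures, checksums, KEYS)
        problems = check_decorator(filename, data)
        kind = "decorator"

    # 4. The outcome records the detected type and kind as the file leaves quarantine
    return QuarantineOutcome(filename, detected, kind, list(problems))
```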

@alitheg alitheg force-pushed the main branch 2 times, most recently from 133ab83 to 929a8c3 February 16, 2026 14:27