chore(deps): bump the cookiecutter group across 1 directory with 2 updates by dependabot[bot] · Pull Request #8822 · aws/aws-sam-cli

dependabot · 2026-03-18T08:15:43Z

⚠️ Dependabot is rebasing this PR ⚠️

Rebasing might not happen immediately, so don't worry if this takes some time.

Note: if you make any changes to this PR yourself, they will take precedence over the rebase.

Bumps the cookiecutter group with 2 updates in the / directory: chardet and binaryornot.

Updates chardet from 5.2.0 to 7.4.3

Release notes

7.4.3

Patch release: fixes a crash when input contains null bytes inside a <meta charset> declaration.

Bug Fixes

Fixed ValueError: embedded null character crash when input contained a <meta charset> declaration with a null byte in the encoding name (e.g. b'<meta charset="\x00utf-8">'). codecs.lookup() raises ValueError on embedded nulls, and lookup_encoding() was only catching LookupError. Also added defensive ValueError catches in _validate_bytes() and _to_utf8() for completeness. (#369, thanks @DRMacIver for the report)
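The failure mode described above can be illustrated with the standard library alone. The sketch below is not chardet's code: `safe_lookup` is a hypothetical helper, introduced here only to show what catching both exception types around `codecs.lookup()` looks like.

```python
import codecs


def safe_lookup(label):
    """Return the canonical codec name for label, or None if it cannot be resolved.

    Hypothetical helper for illustration; not part of chardet's API.
    """
    try:
        return codecs.lookup(label).name
    except (LookupError, ValueError):
        # LookupError: the encoding name is unknown.
        # ValueError: the name is malformed, e.g. it contains an embedded null.
        return None


print(safe_lookup("utf-8"))        # a valid label
print(safe_lookup("no-such-enc"))  # an unknown label
print(safe_lookup("\x00utf-8"))    # a label with an embedded null byte
```

Catching only `LookupError` would let the third call propagate an exception, which matches the crash the release note describes.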

Full Changelog: chardet/chardet@7.4.2...7.4.3

7.4.2

Patch release: fixes a crash on short inputs and closes a bunch of WHATWG/IANA alias gaps.

Bug Fixes

Fixed RuntimeError: pipeline must always return at least one result on ~2% of all possible two-byte inputs (e.g. b"\xf9\x92"). Multi-byte encodings like CP932 and Johab could score above the structural confidence threshold on very short inputs, but then statistical scoring would return nothing, leaving an empty result list instead of falling through to the fallback. (#367, #368, thanks @jasonwbarnett)

Improvements

Added ~90 encoding aliases from the WHATWG Encoding Standard and IANA Character Sets registry so that <meta charset> labels like x-cp1252, x-sjis, dos-874, csUTF8, and the cswindows* family all resolve correctly through the markup detection stage. Every alias was driven by a failing spec-compliance test; none were added speculatively. (#366)

Added a spec-compliance test suite covering Python decode round-trips for all 86 registry encodings, WHATWG label resolution, IANA preferred MIME names, and Unicode/RFC conformance (BOM sniffing, UTF-8 boundary cases, UTF-16 surrogate pairs). This is the test suite that would have caught the 7.4.1 BOM bug before release. (#366)

Full Changelog: chardet/chardet@7.4.1...7.4.2

7.4.1

Bug Fixes

BOM-prefixed UTF-16/32 input now returns utf-16/utf-32 instead of utf-16-le/utf-16-be/utf-32-le/utf-32-be. The endian-specific codecs don't strip the BOM on decode, so callers were getting a stray U+FEFF at the start of their text. BOM-less detection is unchanged. (#364, #365)
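The difference between the generic and endian-specific codecs can be seen without chardet. This is a minimal standard-library sketch of the decoding behavior the note refers to; the sample text is arbitrary.

```python
import codecs

text = "hi"
# UTF-16 little-endian bytes with a leading BOM (FF FE).
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

with_generic_codec = data.decode("utf-16")    # BOM is consumed to pick the byte order
with_endian_codec = data.decode("utf-16-le")  # BOM is decoded as a U+FEFF character

print(repr(with_generic_codec))
print(repr(with_endian_codec))
```

A caller that decodes with the reported encoding name therefore gets clean text only when a BOM-prefixed input is reported as `utf-16` rather than `utf-16-le`.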

Full Changelog: chardet/chardet@7.4.0...7.4.1

chardet 7.4.0 brings accuracy up to 99.3% (from 98.6% in 7.3.0) and significantly faster cold start thanks to a new dense model format.

What's New

Performance:

New dense zlib-compressed model format (v2) drops cold start (import + first detect) from ~75ms to ~13ms with mypyc

Accuracy (98.6% → 99.3%):

Eliminated train/test data overlap via content fingerprinting

Added MADLAD-400 and Wikipedia as supplemental training sources

Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training and weighted by per-bigram IDF

Encoding-aware substitution filtering (substitutions only apply for characters the target encoding can't represent)

Increased training samples from 15K to 25K per language/encoding pair

Bug fixes:

Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS (these were previously sharing their base encoding's byte-range analyzer, missing extended ranges)

Metrics

| | chardet 7.4.0 (mypyc) | chardet 6.0.0 | charset-normalizer 3.4.6 |

... (truncated)

Changelog

Sourced from chardet's changelog.

7.4.3 (2026-04-13)

Bug Fixes:

Fixed ValueError: embedded null character crash when input contained a <meta charset> declaration with a null byte in the encoding name (e.g. b'<meta charset="\x00utf-8">'). codecs.lookup() raises ValueError on embedded nulls, and lookup_encoding() was only catching LookupError. Also added defensive ValueError catches in _validate_bytes() and _to_utf8() for completeness. ([Dan Blanchard](https://github.com/dan-blanchard) via Claude, [#369](https://github.com/chardet/chardet/issues/369))

7.4.2 (2026-04-12)

Bug Fixes:

Fixed RuntimeError: pipeline must always return at least one result on ~2% of all possible two-byte inputs (e.g. b"\xf9\x92"). Multi-byte encodings like CP932 and Johab could score above the structural confidence threshold on very short inputs, but then statistical scoring would return nothing, leaving the pipeline with an empty result list instead of falling through to the no_match_encoding fallback. ([Jason Barnett](https://github.com/jasonwbarnett) via Claude, [#367](https://github.com/chardet/chardet/issues/367), [#368](https://github.com/chardet/chardet/pull/368))

Improvements:

Added ~90 encoding aliases from the WHATWG Encoding Standard and IANA Character Sets registry so that <meta charset> labels like x-cp1252, x-sjis, dos-874, csUTF8, and the cswindows* family all resolve correctly through the markup detection stage. Every alias was driven by a failing spec-compliance test. ([Dan Blanchard](https://github.com/dan-blanchard) via Claude, [#366](https://github.com/chardet/chardet/pull/366))

Added a spec-compliance test suite covering Python decode round-trips for all 86 registry encodings, WHATWG web-platform label resolution, IANA preferred MIME names, and Unicode/RFC conformance (BOM sniffing, UTF-8 boundary cases, UTF-16 surrogate pairs). This is the test suite that would have caught the 7.4.1 BOM bug before release. ([Dan Blanchard](https://github.com/dan-blanchard) via Claude, [#366](https://github.com/chardet/chardet/pull/366))

7.4.1 (2026-04-07)

... (truncated)

Commits

8f404a5 docs: set 7.4.3 release date to 2026-04-13
7a6667f docs: fix changelog attribution for #369
a1fc986 docs: changelog for 7.4.3
0af01d6 Fix ValueError crash on null bytes in charset declarations (#369)
08e4ebc ci: parallelize riscv64 builds across 5 RISE runners
2f6e1e9 ci: use python3 -m pip on riscv64 runner
204623d ci: invoke cibuildwheel manually on riscv64 runner
78c1d20 ci: use native runners for aarch64/riscv64 instead of QEMU
3cc0960 docs: changelog for 7.4.2
9079efc Fix RuntimeError on ~2% of two-byte inputs (#368)
Additional commits viewable in compare view

Updates binaryornot from 0.4.4 to 0.6.0

Release notes

Sourced from binaryornot's releases.

BinaryOrNot 0.6.0: Three Layers of Detection

BinaryOrNot identifies binary files three ways: by extension, by file signature, and by content analysis. Pass it any file path and it tells you binary or text, accurately, across PNGs, PDFs, executables, archives, fonts, CJK-encoded text, and hundreds of other formats.

    uv pip install --upgrade binaryornot

What's new

131 file types recognized by name. is_binary() checks the filename extension against a curated list of binary types (images, audio, video, archives, executables, fonts, documents, databases, 3D models, CAD files, scientific data formats, game ROMs) before reading any bytes. A .png or .mp4 is classified instantly with zero file I/O. The extension list ships as binary_extensions.csv and is easy to inspect or extend. (#648)

If you need pure content-based classification, pass check_extensions=False:

    from binaryornot.check import is_binary

    # Extension says binary, but let's check the actual bytes
    is_binary("mystery_file.pyc", check_extensions=False)

55 binary format signatures. The detector checks file headers against known magic bytes for PNG, JPEG, PDF, ZIP, ELF, Mach-O, WebAssembly, SQLite, Parquet, Arrow IPC, and 45 more formats. Files that match a known signature are classified as binary immediately, before the statistical model runs. The signature table ships as binary_formats.csv. (#647)
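Header matching of this kind can be sketched in a few lines. The example below is an illustration, not BinaryOrNot's implementation: `SIGNATURES` and `match_signature` are hypothetical names, and the table holds only four widely documented magic numbers rather than the contents of binary_formats.csv.

```python
# A few well-known magic numbers, used here purely as sample data.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"%PDF-": "PDF",
    b"PK\x03\x04": "ZIP",
    b"\x7fELF": "ELF",
}


def match_signature(header):
    """Return the format name whose magic bytes prefix header, else None."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


print(match_signature(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
print(match_signature(b"plain text, no signature"))
```

A match lets a detector answer "binary" from the header alone, so any statistical step only has to handle inputs without a known signature.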

Type annotations on the public API. is_binary(), is_binary_string(), and get_starting_chunk() all have inline type annotations. Editors and type checkers know that is_binary() accepts str, bytes, or pathlib.Path and returns bool. Credit to @smheidrich for the initial type stubs proposal (#627) and @AlJohri for requesting pathlib.Path support (#628). (#643)

What's better

Completely retrained decision tree on 4x more data. The detector reads 512 bytes per file instead of 128, and the decision tree was rebuilt from scratch on those larger samples. A new feature, has_magic_signature, gives the tree a second path to the right answer when statistical features are ambiguous. Byte ratios and entropy calculations reflect actual file content rather than header artifacts. (#647)
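To make "byte ratios and entropy" concrete, here is a rough sketch of two such features computed over a chunk of bytes. The feature names and the choice of exactly these two features are assumptions made for illustration; the release notes do not list the actual feature set.

```python
import math
from collections import Counter


def chunk_features(chunk):
    """Compute two illustrative features from a bytes chunk (hypothetical names)."""
    if not chunk:
        return {"null_ratio": 0.0, "entropy": 0.0}
    counts = Counter(chunk)
    total = len(chunk)
    # Shannon entropy in bits per byte: 0 for a constant chunk, 8 for uniform bytes.
    entropy = sum(-(n / total) * math.log2(n / total) for n in counts.values())
    return {"null_ratio": counts.get(0, 0) / total, "entropy": entropy}


# Only the first 512 bytes are considered, mirroring the chunk size mentioned above.
sample = (b"hello world\n" * 100)[:512]
print(chunk_features(sample))
```

A larger chunk gives such statistics more body content to work with, so a short header has less influence on the ratios.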

Python 3.10+ compatibility. BinaryOrNot installs on Python 3.10 through 3.14, supporting Cookiecutter, cookieplone, and other tools that run on older interpreters. Thanks @wesleybl for raising this. (#645)

Test fixtures ship in the sdist. .pyc and .DS_Store test fixtures are force-included in the source distribution so tests pass when run from the sdist. (#646)

What's fixed

PNGs with ambiguous headers are correctly classified. A 512x512 grayscale+alpha PNG has an IHDR chunk with enough null bytes that the first 128 bytes accidentally decode as UTF-16. Extension checking, signature matching, and the retrained tree each independently prevent this misclassification. Closes #642. (#647)

What's changed

is_binary() has a new keyword argument. check_extensions (default True) controls whether the extension check runs. Existing code that calls is_binary(path) gets the extension check automatically. Code that passes check_extensions=False gets the previous content-only behavior.

Contributors

@audreyfeldroy (Audrey M. Roy Greenfeld) designed and built this release: the extension detection system, file signature matching, decision tree retraining, type annotations, Python 3.10 compatibility, and sdist fixes.

Thanks to @smheidrich for the type stubs proposal, @AlJohri for requesting pathlib.Path support, and @wesleybl for raising Python 3.10 compatibility.

BinaryOrNot 0.5.0: Zero Dependencies, 128 Bytes, One Trained Classifier

This is the biggest release in BinaryOrNot's history. I rebuilt the detection engine from the ground up. The original used byte ratio heuristics with chardet as a second opinion for ambiguous files. I replaced all of that with a trained decision tree operating on 23 features, covering 49 binary formats and 37 text encodings, with zero external dependencies. It's backed by 211 tests and a training pipeline you can re-run yourself. If you've ever had BinaryOrNot misidentify a UTF-16 file, choke on a CJK-encoded document, or crash because chardet changed its API, this release is for you.

BinaryOrNot now has zero dependencies. The chardet library (2.1 MB installed) is gone, replaced by a decision tree that reads 128 bytes of a file and classifies it as binary or text using 23 features computed from those bytes alone. The API is unchanged: is_binary("file.png") still returns True.

... (truncated)

Commits

9c979cd Release 0.6.0
fba3730 Merge pull request #648 from binaryornot/check-file-extensions
c375996 Document why bytes filenames matter
17540bc Handle bytes filenames in extension check
bc069ff Classify 131 file types by extension before reading them
a1ff8d0 Merge pull request #647 from binaryornot/fix-png-misclassification
7c1864b Cover 55 binary formats from plists to network captures
2a31e62 Apply ruff formatting to slice expression
5f6351a Teach the decision tree to use file signatures as evidence
feb38d8 Give the decision tree 4x more context per file
Additional commits viewable in compare view

Note
Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

roger-zhangg · 2026-05-12T22:00:57Z

@dependabot recreate

…dates

Bumps the cookiecutter group with 2 updates in the / directory: [chardet](https://github.com/chardet/chardet) and [binaryornot](https://github.com/binaryornot/binaryornot).

Updates `chardet` from 5.2.0 to 7.4.3
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.4.3)

Updates `binaryornot` from 0.4.4 to 0.6.0
- [Release notes](https://github.com/binaryornot/binaryornot/releases)
- [Commits](binaryornot/binaryornot@0.4.4...v0.6.0)

---
updated-dependencies:
- dependency-name: binaryornot
  dependency-version: 0.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: cookiecutter
- dependency-name: chardet
  dependency-version: 7.2.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: cookiecutter
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot · 2026-05-12T22:22:08Z

This pull request was built based on a group rule. Closing it will not ignore any of these versions in future pull requests.

To ignore these dependencies, configure ignore rules in dependabot.yml

binaryornot 0.6.0 ships a new binaryornot.data subpackage containing binary_extensions.csv, binary_formats.csv and encodings.csv, loaded via importlib.resources.files() at import time. PyInstaller can't discover that resource lookup statically, so the bundled binary fails on `sam init` with `No module named 'binaryornot.data'`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
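One way to catch this class of packaging failure early is a small smoke check that asks the import system whether a dotted module name can be located. The sketch below is an assumption about how such a check could look, not something taken from this PR: `missing_modules` is a hypothetical helper, and the names passed to it are stand-ins.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of dotted module names that cannot be located."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is itself absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# "json" stands in for a module that is present; the second name is made up.
print(missing_modules(["json", "no_such_pkg_xyz.data"]))
```

Run inside a bundled build with a name such as `binaryornot.data`, a non-empty result would flag the missing subpackage before a user hits it through `sam init`.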


dependabot Bot added dependencies Pull requests that update a dependency file python Pull requests that update Python code labels Mar 18, 2026

dependabot Bot requested a review from a team as a code owner March 18, 2026 08:15

github-actions Bot added the pr/internal label Mar 18, 2026

seshubaws previously approved these changes Mar 26, 2026


dependabot Bot changed the title ~~chore(deps): bump the cookiecutter group with 2 updates~~ chore(deps): bump the cookiecutter group across 1 directory with 2 updates May 12, 2026

dependabot Bot dismissed seshubaws’s stale review via 42b128c May 12, 2026 22:02

dependabot Bot force-pushed the dependabot/pip/develop/cookiecutter-d701786c2d branch from 8e01744 to 42b128c Compare May 12, 2026 22:02

Update reproducible requirements

aec904d

roger-zhangg closed this May 12, 2026

roger-zhangg reopened this May 12, 2026

roger-zhangg and others added 2 commits May 12, 2026 15:24

Update pyproject.toml

b101a30

roger-zhangg approved these changes May 12, 2026


roger-zhangg enabled auto-merge May 12, 2026 22:47

Merge branch 'develop' into dependabot/pip/develop/cookiecutter-d7017…

4821405

…86c2d

vicheey approved these changes May 12, 2026


auto-merge was automatically disabled May 12, 2026 23:03
Merge commits are not allowed on this repository

roger-zhangg merged commit 9f13e3d into develop May 12, 2026
55 checks passed

dependabot Bot deleted the dependabot/pip/develop/cookiecutter-d701786c2d branch May 12, 2026 23:03


roger-zhangg merged 5 commits into develop from dependabot/pip/develop/cookiecutter-d701786c2d


4 participants
