
fix(deps): update dependency datasets to v3.6.0 #448

Open
konflux-internal-p02[bot] wants to merge 1 commit into main from konflux/mintmaker/main/datasets-3.x

@konflux-internal-p02 konflux-internal-p02 Bot commented Feb 11, 2026

This PR contains the following updates:

Package | Change | Age | Confidence
datasets | ==3.1.0 -> ==3.6.0 | age | confidence
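
The change above moves an exact pin from 3.1.0 to 3.6.0, which stays inside the 3.x line that the branch name (datasets-3.x) refers to. A minimal sketch of how a reviewer might sanity-check such a bump is shown below. The helper names `parse_pin` and `is_same_major_upgrade` are hypothetical, and the parser only handles plain `name==X.Y.Z` pins, not the full range of version specifiers.

```python
def parse_pin(requirement: str) -> tuple[str, tuple[int, ...]]:
    """Split an exact pin like 'datasets==3.6.0' into (name, version tuple)."""
    name, _, version = requirement.partition("==")
    return name.strip(), tuple(int(part) for part in version.strip().split("."))


def is_same_major_upgrade(old: str, new: str) -> bool:
    """True when both pins name the same package, share a major version, and new > old."""
    old_name, old_version = parse_pin(old)
    new_name, new_version = parse_pin(new)
    return (
        old_name == new_name
        and old_version[0] == new_version[0]
        and new_version > old_version
    )


print(is_same_major_upgrade("datasets==3.1.0", "datasets==3.6.0"))  # True
```

A same-major bump can still contain breaking changes, as the v3.4.0 notes below show, so a check like this is only a first filter.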

Release Notes

huggingface/datasets (datasets)

v3.6.0

Compare Source

Dataset Features

Other improvements and bug fixes

New Contributors

Full Changelog: huggingface/datasets@3.5.1...3.6.0

v3.5.1

Compare Source

Bug fixes

  • support pyarrow 20 by @​lhoestq in #​7540
    • Fix pyarrow error TypeError: ArrayExtensionArray.to_pylist() got an unexpected keyword argument 'maps_as_pydicts'
  • Write pdf in map by @​lhoestq in #​7487
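
The TypeError quoted above is the kind of failure that appears when a caller starts passing a keyword that an overriding method does not accept. The sketch below reproduces that pattern with plain Python stand-ins. It is not the actual patch from the linked PR and does not use pyarrow; `BaseArray`, `convert`, `OldExtensionArray`, and `FixedExtensionArray` are all hypothetical names chosen for illustration.

```python
class BaseArray:
    """Stand-in for a library base class whose newer release added a keyword."""

    def __init__(self, values):
        self.values = list(values)

    def to_pylist(self, maps_as_pydicts=None):
        return list(self.values)


def convert(array):
    """Stand-in for a newer caller that always forwards the keyword."""
    return array.to_pylist(maps_as_pydicts=None)


class OldExtensionArray(BaseArray):
    # Override written against the older signature: the keyword is not accepted.
    def to_pylist(self):
        return list(self.values)


class FixedExtensionArray(BaseArray):
    # Accepting the keyword lets both older and newer callers use the override.
    def to_pylist(self, maps_as_pydicts=None):
        return list(self.values)


try:
    convert(OldExtensionArray([1, 2]))
except TypeError as error:
    print("old override fails:", error)

print(convert(FixedExtensionArray([1, 2])))  # [1, 2]
```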

Other improvements

New Contributors

Full Changelog: huggingface/datasets@3.5.0...3.5.1

v3.5.0

Compare Source

Datasets Features

>>> from datasets import load_dataset, Pdf
>>> repo = "path/to/pdf/folder"  # or username/dataset_name on Hugging Face
>>> dataset = load_dataset(repo, split="train")
>>> dataset[0]["pdf"]
<pdfplumber.pdf.PDF at 0x1075bc320>
>>> dataset[0]["pdf"].pages[0].extract_text()
...

What's Changed

New Contributors

Full Changelog: huggingface/datasets@3.4.1...3.5.0

v3.4.1

Compare Source

Bug Fixes

Full Changelog: huggingface/datasets@3.4.0...3.4.1

v3.4.0

Compare Source

Dataset Features

  • Faster folder based builder + parquet support + allow repeated media + use torchvideo by @​lhoestq in #​7424

    • /!\ Breaking change: decord has been replaced with torchvision for reading videos, since decord is no longer maintained and is not available for recent Python versions; see the video dataset loading documentation for more details. The Video type is still marked as experimental in this version
    from datasets import load_dataset, Video
    
    dataset = load_dataset("path/to/video/folder", split="train")
    dataset[0]["video"]  # <torchvision.io.video_reader.VideoReader at 0x1652284c0>
    • faster streaming for image/audio/video folder from Hugging Face
    • support for metadata.parquet in addition to metadata.csv or metadata.jsonl for the metadata of the image/audio/video files
  • Add IterableDataset.decode with multithreading by @​lhoestq in #​7450

    • even faster streaming for image/audio/video folder from Hugging Face if you enable multithreading to decode image/audio/video data:
    dataset = dataset.decode(num_threads=num_threads)
  • Add with_split to DatasetDict.map by @​jp1924 in #​7368
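
The decode(num_threads=...) bullet above speeds up streaming by decoding media in several threads. The sketch below illustrates that general idea with the standard library only. It is not how datasets implements it: `decode_example` and `decode_stream` are hypothetical helpers, and base64 decoding stands in for real image/audio/video decoding.

```python
import base64
from concurrent.futures import ThreadPoolExecutor


def decode_example(example: dict) -> dict:
    """Hypothetical stand-in for media decoding: base64 text -> raw bytes."""
    return {**example, "data": base64.b64decode(example["data"])}


def decode_stream(examples, num_threads: int):
    """Yield decoded examples in input order, decoding up to num_threads at once.

    Note: Executor.map submits every input up front, so this sketch reads the
    whole input iterable before the first result is yielded.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        yield from pool.map(decode_example, examples)


encoded = [
    {"id": i, "data": base64.b64encode(f"frame-{i}".encode()).decode()}
    for i in range(4)
]
decoded = list(decode_stream(encoded, num_threads=2))
print([example["data"] for example in decoded])
```

Threads help here because decoding is typically dominated by I/O or by native code that releases the GIL; for pure-Python CPU work the gain would be small.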

General improvements and bug fixes

New Contributors

Full Changelog: huggingface/datasets@3.3.2...3.4.0

v3.3.2

Compare Source

Bug fixes

Other general improvements

New Contributors

Full Changelog: huggingface/datasets@3.3.1...3.3.2

v3.3.1

Compare Source

Bug fixes

Full Changelog: huggingface/datasets@3.3.0...3.3.1

v3.3.0

Compare Source

Dataset Features

  • Support async functions in map() by @​lhoestq in #​7384

    • Especially useful to download content like images or call inference APIs
    prompt = "Answer the following question: {question}. You should think step by step."
    # query_model is a user-defined async function (not part of datasets)
    async def ask_llm(example):
        return await query_model(prompt.format(question=example["question"]))
    ds = ds.map(ask_llm)
  • Add repeat method to datasets by @​alex-hh in #​7198

    ds = ds.repeat(10)
  • Support faster processing using pandas or polars functions in IterableDataset.map() by @​lhoestq in #​7370

    • Add support for "pandas" and "polars" formats in IterableDatasets
    • This enables optimized data processing using pandas or polars functions with zero-copy, e.g.
    import polars as pl
    from datasets import load_dataset
    ds = load_dataset("ServiceNow-AI/R1-Distill-SFT", "v0", split="train", streaming=True)
    ds = ds.with_format("polars")
    # Extract the contents of \boxed{...} from the "solution" column
    expr = pl.col("solution").str.extract("boxed\\{(.*)\\}").alias("value_solution")
    ds = ds.map(lambda df: df.with_columns(expr), batched=True)
  • Apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets by @​alex-hh in #​7207

    • IterableDatasets with "numpy" format are now much faster
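
The async map() bullet above is useful because awaiting many slow calls concurrently is much faster than awaiting them one by one. The sketch below shows that idea with asyncio alone. It is not the datasets implementation: `query_model` is a hypothetical stand-in that sleeps instead of calling an inference API, and `map_concurrently` and its `max_concurrency` parameter are made up for illustration.

```python
import asyncio


async def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an inference API call; sleeps instead of doing I/O."""
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"


async def ask_llm(example: dict) -> dict:
    answer = await query_model(f"Answer the following question: {example['question']}")
    return {**example, "answer": answer}


async def map_concurrently(function, examples, max_concurrency: int = 8):
    """Apply an async function to every example, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(example):
        async with semaphore:
            return await function(example)

    # gather() returns results in the order of its arguments.
    return await asyncio.gather(*(run_one(example) for example in examples))


examples = [{"question": "What is 2 + 2?"}, {"question": "Name a prime number."}]
results = asyncio.run(map_concurrently(ask_llm, examples))
print(results[0]["answer"])
```

The semaphore caps the number of in-flight calls, which matters when the remote API enforces rate limits.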

What's Changed

New Contributors

Full Changelog: huggingface/datasets@3.2.0...3.3.0

v3.2.0

Compare Source

Dataset Features

  • Faster parquet streaming + filters with predicate pushdown by @​lhoestq in #​7309
    • Up to +100% streaming speed
    • Fast filtering via predicate pushdown (skip files/row groups based on predicate instead of downloading the full data), e.g.
      from datasets import load_dataset
      filters = [('date', '>=', '2023')]
      ds = load_dataset("HuggingFaceFW/fineweb-2", "fra_Latn", streaming=True, filters=filters)
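
Predicate pushdown, as described above, means skipping whole files or row groups whose statistics show that no row can match the filter. The sketch below illustrates the principle in plain Python. It is a conceptual model only, not the Parquet reader used by datasets: `RowGroup` and `scan` are hypothetical, and dates are compared as strings, as in the filter above.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical row group: min/max statistics plus the rows they summarize."""
    min_date: str
    max_date: str
    rows: list


def scan(row_groups, min_date: str):
    """Return rows with date >= min_date and the number of row groups actually read."""
    matches, groups_read = [], 0
    for group in row_groups:
        if group.max_date < min_date:
            continue  # statistics prove no row can match, so skip without reading
        groups_read += 1
        matches.extend(row for row in group.rows if row["date"] >= min_date)
    return matches, groups_read


row_groups = [
    RowGroup("2021-01-01", "2021-12-31", [{"date": "2021-06-01"}]),
    RowGroup("2022-07-01", "2023-03-01", [{"date": "2022-08-15"}, {"date": "2023-02-01"}]),
    RowGroup("2023-04-01", "2023-09-30", [{"date": "2023-05-20"}]),
]
matches, groups_read = scan(row_groups, "2023")
print(len(matches), groups_read)  # 2 2
```

Only two of the three row groups are read here; in a streaming setting the skipped group would never be downloaded.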

Other improvements and bug fixes

New Contributors

Full Changelog: huggingface/datasets@3.1.0...3.2.0


Configuration

📅 Schedule: (UTC)

  • Branch creation
    • At any time (no schedule defined)
  • Automerge
    • At any time (no schedule defined)

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about these updates again.


  • If you want to rebase/retry this PR, check this box

To execute skipped test pipelines write comment /ok-to-test.


Documentation

Find out how to configure dependency updates in MintMaker documentation or see all available configuration options in Renovate documentation.

konflux-internal-p02 Bot changed the title from "chore(deps): update dependency datasets to v3.6.0" to "chore(deps): update dependency datasets to v3.6.0 - autoclosed" Feb 18, 2026
konflux-internal-p02 Bot deleted the konflux/mintmaker/main/datasets-3.x branch February 18, 2026 16:59
konflux-internal-p02 Bot changed the title from "chore(deps): update dependency datasets to v3.6.0 - autoclosed" to "chore(deps): update dependency datasets to v3.6.0" Feb 21, 2026
konflux-internal-p02 Bot force-pushed the konflux/mintmaker/main/datasets-3.x branch 2 times, most recently from 44e8182 to 35b7046 February 21, 2026 01:37
konflux-internal-p02 Bot force-pushed the konflux/mintmaker/main/datasets-3.x branch from 35b7046 to df99799 March 11, 2026 17:48
konflux-internal-p02 Bot force-pushed the konflux/mintmaker/main/datasets-3.x branch from df99799 to 945883f March 26, 2026 17:50
konflux-internal-p02 Bot changed the title from "chore(deps): update dependency datasets to v3.6.0" to "chore(deps): update dependency datasets to v3.6.0 - autoclosed" Apr 3, 2026
konflux-internal-p02 Bot changed the title from "chore(deps): update dependency datasets to v3.6.0 - autoclosed" to "chore(deps): update dependency datasets to v3.6.0" Apr 3, 2026
konflux-internal-p02 Bot reopened this Apr 3, 2026
konflux-internal-p02 Bot force-pushed the konflux/mintmaker/main/datasets-3.x branch 2 times, most recently from 945883f to 24ec3c4 April 3, 2026 06:31
konflux-internal-p02 Bot force-pushed the konflux/mintmaker/main/datasets-3.x branch from 24ec3c4 to f375557 April 13, 2026 23:08
konflux-internal-p02 Bot force-pushed the konflux/mintmaker/main/datasets-3.x branch from f375557 to db33838 May 8, 2026 16:15
konflux-internal-p02 Bot changed the title from "chore(deps): update dependency datasets to v3.6.0" to "fix(deps): update dependency datasets to v3.6.0" Jun 25, 2026
Signed-off-by: konflux-internal-p02 <170854209+konflux-internal-p02[bot]@users.noreply.github.com>
konflux-internal-p02 Bot force-pushed the konflux/mintmaker/main/datasets-3.x branch from db33838 to a547658 June 29, 2026 09:49