Releases: IBM/unitxt
Unitxt 1.23.0
Main changes
- Revised the tool calling tasks and metrics introduced in 1.22.4 (non backward compatible change; existing catalog datasets have been updated)
- Fixed support for running HF models with HFAutoModelInferenceEngine (multi-GPU and tokenization issues)
- Added to_yaml() to create a YAML representation of a card, which can be used for running custom datasets in Granite.build
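As an illustration of the new to_yaml() shorthand, a card's YAML representation might look roughly like the sketch below. The field names follow Unitxt's TaskCard artifacts, but the loader path, task, and templates here are hypothetical placeholders, not values taken from this release:

```yaml
# Hypothetical sketch of a card serialized via to_yaml();
# values are placeholders for illustration only.
__type__: task_card
loader:
  __type__: load_hf
  path: my-org/my-dataset
task: tasks.qa.open
templates: templates.qa.open.all
```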
What's Changed
- Fix batching support for HF Dataset in HFAutoModelInferenceEngine by @elronbandel in #1771
- Fix litellm inference without task_data by @elronbandel in #1772
- Added to_yaml shorthand function to artifact by @yoavkatz in #1768
- Simplify tool calling base types by @elronbandel in #1773
- Added tool calling to wml chat by @pawelknes in #1782
- Reverting to datasets=351 can solve problems in test catalog preparation by @dafnapension in #1784
- Update ibm wml engine #1775 by @MikolajCharchut in #1781
- Fix HF AutoModel tokenization issue with chat template + issue with multi GPU by @OfirArviv in #1779
- Performance to report accurate times based on end-to-end time() diffs, rather than accumulate cProfile numbers over methods whose names seem relevant by @dafnapension in #1783
- Add support to mix args and textual query in load_dataset by @elronbandel in #1778
- Add installation of spacy as a binary dependency for examples regression tests by @elronbandel in #1787
- Improvements to tool calling - NON BACKWARD COMPATIBLE CHANGES by @Narayanan-V-Eswar in #1770
- Added example for standalone metric evaluation by @yoavkatz in #1769
- Update version to 1.23.0 by @elronbandel in #1789
New Contributors
- @Narayanan-V-Eswar made their first contribution in #1770
Full Changelog: 1.22.4...1.23.0
Unitxt 1.22.4
What's Changed
- Add comprehensive support for tool calling + Berkeley Tool Calling Benchmark by @elronbandel in #1762
- Add tool calling support + Berkeley Tool Calling Benchmark (simple-v3) by @elronbandel in #1764
- Remove the rename from test to train by @BenjSz in #1759
- trying to fix PERFORMANCE: use a github repository as a replacement for the gone HF 'lmsys/arena-hard-browser' by @dafnapension in #1757
- Update version to 1.22.4 by @elronbandel in #1766
Full Changelog: 1.22.3...1.22.4
Unitxt 1.22.3
What's Changed
- Small fixes and touch ups in docs by @elronbandel in #1738
- Fix some docs styling by @elronbandel in #1739
- Remove more red lines from make docs-server, and rehab lost sons by @dafnapension in #1742
- Add instance ID to WML credentials by @pawelknes in #1741
- remove unnecessary llm-as-judge from scigen + max pred in benchmark by @ShirApp in #1746
- Support Byom for RITS inference engine. by @eladven in #1744
- Add GG and deepseek to rits in CrossInferenceEngine by @martinscooper in #1750
- Torr const max prediction by @ShirApp in #1752
- trying networkx 3.2.1 to remove TypeError: entry_points() got an unexpected keyword argument 'group' by @dafnapension in #1754
- Comment out SQL tests until fixed by @elronbandel in #1756
- fix typo in azure_openai_host variable name by @algadhib in #1753
- Fix tablebench data analysis broken split by @elronbandel in #1749
- XSTest (and compliance criteria) by @bnayahu in #1751
- CrossProvider fails if model name doesn't exist in map by @martinscooper in #1747
- Fix relative imports in evaluate cli by @elronbandel in #1758
- Update version to 1.22.3 by @elronbandel in #1761
Full Changelog: 1.22.2...1.22.3
Unitxt 1.22.2
What's Changed
- Fix errors with peft inference engine implementation. by @eladven in #1721
- Allow overriding of CI method in GlobalMetric by @lga-zurich in #1722
- Mark issues stale after 30 days of no interaction. Closed them 15 days later. by @elronbandel in #1727
- Updated models on Replicate by @bnayahu in #1729
- Added example of multi-choice QA by @yoavkatz in #1634
- Add general formatter for chat api, with chat template based on the m… by @eladven in #1728
- Airbench by @bnayahu in #1730
- New Unitxt home page by @elronbandel in #1731
- Add catalog back to website by @elronbandel in #1734
- Disable litellm cache by @martinscooper in #1732
- Add a CLI for end-to-end evaluation by @perlitz in #1708
- Fix some tests by @elronbandel in #1735
- modify some doc-strings, thereby eliminating some red lines in make doc-server by @dafnapension in #1736
- Update version to 1.22.2 by @elronbandel in #1737
Full Changelog: 1.22.1...1.22.2
Unitxt 1.22.1
What's Changed
- Benjams/add watson x by @BenjSz in #1719
- remove metadata_fields in clapnq by @BenjSz in #1718
- Add full data_files support in HFLoader + tests by @elronbandel in #1724
Full Changelog: 1.22.0...1.22.1
Unitxt 1.22.0
Main changes
Catalog changes
- Update exact multiple choice template by @OfirArviv in #1698
- Vision templates by @alfassy in #1700
- Vision bench update by @alfassy in #1704
- update wml llmajj from llama-3-70b to llama-3-1-70b by @OfirArviv in #1703
- Fix Table Bench by @elronbandel in #1709
- Text2sql metrics fixes by @oktie in #1702
- Updates to the provoq card and related artifacts by @bnayahu in #1705
- Fix unitxt assistant context size control and update docs snapshot by @elronbandel in #1707
- Fix llm judge artifacts by @martinscooper in #1695
CI/CD
- Add timeout to github actions by @elronbandel in #1716
- Fix helm test by @elronbandel in #1697
- Add retry policy for huggingface assets downloads by @elronbandel in #1711
Other
- Add unitxt version to inference cache keys by @elronbandel in #1714
- Some touch ups to benchmarks by @elronbandel in #1715
Full Changelog: 1.21.0...1.22.0
Unitxt 1.21.0
What's Changed
- add 'show more' button for imports from unitxt modules by @dafnapension in #1651
- Update head qa dataset by @elronbandel in #1658
- Update few slow datasets by @elronbandel in #1663
- MLCommons AILuminate card and related artifacts by @bnayahu in #1662
- Granite guardian: add raw prompt to the result by @martinscooper in #1671
- Add positional bias summary to the response by @martinscooper in #1640
- Return float instead float32 in granite guardian metric by @martinscooper in #1669
- add qa template exact output by @OfirArviv in #1674
- LLM Judge: add prompts to the result by default by @martinscooper in #1670
- Safety eval updates by @bnayahu in #1668
- Add inference engine caching by @eladven in #1645
- BugFix: Handle cases where all sample scores are the same (yields nan) by @elronbandel in #1660
- CrossInferenceProvider: add more models by @martinscooper in #1676
- Implement get_engine_id where missing by @martinscooper in #1679
- Revisit base dependencies (specifically remove ipadic and absl-py) by @elronbandel in #1681
- Fix LoadHF.load_dataset() when mem-caching is off by @yhwang in #1683
- HFPipelineInferenceEngine - add loaded tokenizer to pipeline by @eladven in #1677
- Add default cache folder to .gitignore by @martinscooper in #1687
- Fix a bug in loading without trust remote code by @elronbandel in #1684
- Add sacrebleu[ja] to test dependencies by @elronbandel in #1685
- Let evaluator name to be a string by @martinscooper in #1665
- Fix: AzureOpenAIInferenceEngine fails if api_version is not set by @martinscooper in #1680
- Fix some bugs in inference engine tests by @elronbandel in #1682
- Improved output message when using inference cache by @yoavkatz in #1686
- Changed API of Key Value Extraction task to use Dict and not List[Tuple] (NON BACKWARD COMPATIBLE CHANGE) by @yoavkatz in #1675
- Support for asynchronous requests for watsonx.ai chat by @pawelknes in #1666
- add tags information - url by @BenjSz in #1691
- Fixes to GraniteGuardian metric, safety evals cleanups by @bnayahu in #1690
- Add docstring to LLMJudge classes by @martinscooper in #1652
- Remove src.lock by @elronbandel in #1692
- Text2sql metrics update and optional caching by @oktie in #1672
- Llm judge use cross provider by @martinscooper in #1673
- Improve LLM as Judge consistency by @martinscooper in #1688
- Update version to 1.21.0 by @elronbandel in #1693
Full Changelog: 1.20.0...1.21.0
Unitxt 1.20.0
What's Changed
- Fix unnecessary attempts in LoadCSV by @elronbandel in #1630
- Fix LLM as Judge direct criteria typo by @martinscooper in #1631
- Fix of typo in usage of attributes inside IntersectCorrespondingFields by @pklpriv in #1637
- Added MILU and Indic BoolQ Support by @murthyrudra in #1639
- Vision bench by @alfassy in #1641
- Add Granite Guardian evaluation on HF example by @martinscooper in #1638
- present catalog entries as pieces of python code by @dafnapension in #1643
- Example for evaluating system message leakage by @elronbandel in #1609
- Benjams/add hotpotqa + change type of metadata field to dict (non backward compatible) by @BenjSz in #1633
- removed the leftout break_point by @dafnapension in #1646
- Added Indic ARC Challenge Support by @murthyrudra in #1654
- Minor bug fix affecting Text2SQL execution accuracy by @oktie in #1657
- WMLInferenceEngineChat fixes by @pawelknes in #1656
- Update version to 1.20.0 by @elronbandel in #1659
New Contributors
- @murthyrudra made their first contribution in #1639
Full Changelog: 1.19.0...1.20.0
Unitxt 1.19.0
What's Changed
- Add RagBench datasets by @elronbandel in #1580
- Fix prompts table benchmark by @ShirApp in #1581
- Fix attempt to missing arrow dataset by @elronbandel in #1582
- Wml comp by @alfassy in #1578
- Key value extraction improvements by @yoavkatz in #1573
- fix: minor bug when only space id is provided for WML inference by @tsinggggg in #1583
- Try fixing csv loader by @elronbandel in #1586
- Fix failing tests by @elronbandel in #1589
- Fix tests by @elronbandel in #1590
- Fix metrics formatting and style by @elronbandel in #1591
- Fix bird dataset by @perlitz in #1593
- Use Lazy Loaders by @dafnapension in #1536
- Fix loading without limit by @elronbandel in #1594
- [Breaking change] Add support for all Granite Guardian risks by @martinscooper in #1576
- Added api call example by @yoavkatz in #1587
- Make MultipleSourceLoader lazy and fix its use of fusion by @elronbandel in #1602
- Prioritize using default templates from card over task by @elronbandel in #1596
- Use faster model for examples by @elronbandel in #1607
- Add clear and minimal settings documentation by @elronbandel in #1606
- Fix some tests by @elronbandel in #1610
- Add download and etag timeout settings to workflow configurations by @elronbandel in #1613
- Allow read timeout error in preparation tests by @elronbandel in #1615
- Fix Ollama inference engine by @eladven in #1611
- Add verify as an option to LoadFromAPI by @perlitz in #1608
- Added example of custom metric by @yoavkatz in #1616
- Granite guardian minor changes by @martinscooper in #1605
- add ragbench faithfulness cards by @lilacheden in #1598
- Update tables benchmark name to torr by @elronbandel in #1617
- Add CoT to LLM as judge assessments by @martinscooper in #1612
- Simplify preparation tests with better error handling by @elronbandel in #1618
- Text2sql execution accuracy metric updates by @oktie in #1604
- Fix Azure OpenAI based LLM judges by @martinscooper in #1619
- Add correctness_based_on_ground_truth criteria by @martinscooper in #1623
- Enable offline mode for huggingface by using local pre-downloaded metrics, datasets and models by @elronbandel in #1603
- Add provider specific args and allow using unrecognized model names by @elronbandel in #1621
- Start implementing assessment for unitxt assistant by @eladven in #1625
- small changes to profiler by @dafnapension in #1627
- Return MultiStream in lazy loaders to avoid copying by @elronbandel in #1628
New Contributors
- @tsinggggg made their first contribution in #1583
Full Changelog: 1.18.0...1.19.0
Unitxt 1.18.0 - Faster Loading
The main improvements in this version focus on caching strategies, dataset loading, and speed optimizations.
Hugging Face Datasets Caching Policy
We have completely revised our caching policy and how we handle Hugging Face datasets in order to improve performance.
- Hugging Face datasets are now cached by default.
  This means that the LoadHF loader will cache the downloaded datasets in the HF cache directory (typically ~/.cache/huggingface/datasets).
  To disable this caching mechanism, use:

  ```python
  unitxt.settings.disable_hf_datasets_cache = True
  ```
- All Hugging Face datasets are first downloaded and then processed.
  This means the entire dataset is downloaded, which is faster for most datasets. However, if you want to process a huge dataset and the HF dataset supports streaming, you can load it in streaming mode:

  ```python
  LoadHF(name="my-dataset", streaming=True)
  ```

  To enable streaming mode by default for all Hugging Face datasets, use:

  ```python
  unitxt.settings.stream_hf_datasets_by_default = True
  ```

While the new defaults (full download & caching) may make the initial dataset load slower, subsequent loads will be significantly faster.
Unitxt Datasets Caching Policy
- By default, when loading datasets with unitxt.load_dataset, the dataset is prepared from scratch each time you call the function.
  This ensures that any changes made to the card definition are reflected in the output.
- This process may take a few seconds, and for large datasets, repeated loading can accumulate overhead.
- If you are using fixed datasets from the catalog, you can enable caching for Unitxt datasets.
  The datasets are cached in the Hugging Face cache (typically ~/.cache/huggingface/datasets):

  ```python
  from unitxt import load_dataset

  ds = load_dataset(card="my_card", use_cache=True)
  ```
Faster Unitxt Dataset Preparation
To improve dataset loading speed, we have optimized how Unitxt datasets are prepared.
Background:
Unitxt datasets are converted to Hugging Face datasets because they store data on disk while keeping only the necessary parts in memory (via PyArrow). This enables efficient handling of large datasets without excessive memory usage.
Previously, unitxt.load_dataset used built-in Hugging Face methods for dataset preparation, which included unnecessary type handling and verification, slowing down the process.
Key improvements:
- We now create the Hugging Face dataset directly, reducing preparation time by almost 50%.
- With this optimization, Unitxt datasets are now faster than ever!
What's Changed
- End of year summary blog post by @elronbandel in #1530
- Updated documentation and examples of LLM-as-Judge by @tejaswini in #1532
- Eval assist documentation by @tejaswini in #1537
- Update notification banner styles and add 2024 summary blog link by @elronbandel in #1538
- Add more granite llm as judge artifacts by @martinscooper in #1516
- Fix Australian legal qa dataset by @elronbandel in #1542
- Set use 1 shot for wikitq in tables_benchmark by @yifanmai in #1541
- Bugfix: indexed row major serialization fails with None cell values by @yifanmai in #1540
- Solve issue of expired token in Unitxt Assistant by @eladven in #1543
- Add Replicate inference support by @elronbandel in #1544
- add a filter to wikitq by @ShirApp in #1547
- Add text2sql tasks by @perlitz in #1414
- Add deduplicate operator by @elronbandel in #1549
- Fix the authentication problem by @eladven in #1550
- Attach assistant answers to their origins with url link by @elronbandel in #1528
- Add mtrag benchmark by @elronbandel in #1548
- Update end of year summary blog by @elronbandel in #1552
- Add data classification policy to CrossProviderInferenceEngine initialization based on selected model by @elronbandel in #1539
- Fix recently broken rag metrics by @elronbandel in #1554
- Renamed criterias in LLM-as-a-Judge metrics to criteria - Breaking change by @tejaswini in #1545
- Finqa hash to top by @elronbandel in #1555
- Refactor safety metric to be faster and updated by @elronbandel in #1484
- Improve assistant by @elronbandel in #1556
- Feature/add global mmlu cards by @eliyahabba in #1561
- Add quality dataset by @eliyahabba in #1563
- Add CollateInstanceByField operator to group data by specific field by @sarathsgvr in #1546
- Fix prompts table benchmark by @ShirApp in #1565
- Create new IntersectCorrespondingFields operator by @pklpriv in #1531
- Add granite documents format by @elronbandel in #1566
- Revisit huggingface cache policy - BREAKING CHANGE by @elronbandel in #1564
- Add global mmlu lite sensitivity cards by @eliyahabba in #1568
- Add schema-linking by @KyleErwin in #1533
- fix the printout of empty strings in the yaml cards of the catalog by @dafnapension in #1567
- Use repr instead of to_json for unitxt dataset caching by @elronbandel in #1570
- Added key value extraction evaluation and example with images by @yoavkatz in #1529
New Contributors
- @tejaswini made their first contribution in #1532
- @KyleErwin made their first contribution in #1533
Full Changelog: 1.17.0...1.18.0