07 Aug 08:44

elronbandel

076649a

Unitxt 1.26.6 Latest

Latest

What's Changed

Update pearsonr tests by @elronbandel in #1890
return source_to_recipe to performance evaluation, once 403 is fixed by bnayahu by @dafnapension in #1891
remove a card whose preprocess_steps do not match the contents of the loaded dataset by @dafnapension in #1893
fix an ineffective setting of max size of loader_cache by @dafnapension in #1892
Fix compatibility with datasets 4.0 by @elronbandel in #1861
Improve speed in mmlu global by @elronbandel in #1895
Remove the need for datasets<4.0.0 by @elronbandel in #1897
Refresh README by @elronbandel in #1898
Update Readme by @elronbandel in #1899
Update README by @elronbandel in #1900
Update README by @elronbandel in #1901
Fix docs and example of how to use benchmark by @elronbandel in #1903
Refine condition for avoiding the Benchmark wrapper by @bnayahu in #1904
Complete transition to datasets 4.0.0 in preparation tests by @dafnapension in #1902
Make sacrebleu faster and more efficient by @elronbandel in #1906
Implements LogProbEngine on CrossInference and adds more granite guardian models by @martinscooper in #1905
Remove IBM GenAI support and moved legacy GenAI metrics to use CrossProviderInferenceEngine by @yoavkatz in #1508
GPT on rits and minor llm judge criteria changes by @martinscooper in #1909
The special installation of networkx can be removed as well by @dafnapension in #1908
Update version to 1.26.6 by @elronbandel in #1911

Full Changelog: 1.26.5...1.26.6

Contributors

martinscooper, bnayahu, and 3 other contributors

Assets 2

31 Jul 14:10

elronbandel

1.26.5

8eb8974

Unitxt 1.26.5

What's Changed

For load_dataset, use_cache default value is taken from settings by @eladven in #1880
Support watsonx.ai on-prem credentials by @pratapkishorevarma in #1883
extend condition to also filter by field exists or not by @dafnapension in #1879
fix performance test by @dafnapension in #1884
Add support for inline-defined templates in the UI by @Chemafiz in #1886
Mitigate HTTP 403 errors in pandas by @bnayahu in #1888
Biggen benchmark and pearson correlation metric by @martinscooper in #1887
Update version to 1.26.5 by @elronbandel in #1889

New Contributors

@pratapkishorevarma made their first contribution in #1883
@Chemafiz made their first contribution in #1886

Full Changelog: 1.26.4...1.26.5

Contributors

eladven, martinscooper, and 5 other contributors

Assets 2

22 Jul 14:35

elronbandel

1.26.4

83063f9

Unitxt 1.26.4

What's Changed

Add more Judgebench benchmarks by @martinscooper in #1869
Make sqlite3 not an optional dependency by @elronbandel in #1871
Removed legacy topicality, idk, and groundness metrics that worked only on BAM by @yoavkatz in #1875
Bench and models by @martinscooper in #1872
Handle a case in ToolCallPostProcessor where prediction is an empty list of tools by @yoavkatz in #1874
Update version to 1.26.4 by @elronbandel in #1876

Full Changelog: 1.26.3...1.26.4

Contributors

martinscooper, elronbandel, and yoavkatz

Assets 2

16 Jul 17:47

elronbandel

1.26.3

728fcc8

Unitxt 1.26.3

What's Changed

LLM Judge: Improve context/prediction fields parsing by @martinscooper in #1856
Fixed bug in tool inference by @yoavkatz in #1868
Added a new MetricBasedNer that allows calculating entity similary using any Unitxt metric. by @yoavkatz in #1860
Update version to 1.26.3 by @elronbandel in #1870

Full Changelog: 1.26.2...1.26.3

Contributors

martinscooper, elronbandel, and yoavkatz

Assets 2

16 Jul 09:44

elronbandel

1.26.2

68aa406

Unitxt 1.26.2

What's Changed

Add tot dataset by @elronbandel in #1865
Add tokenizer_name to base huggingface inference engines by @elronbandel in #1862
Add hf to cross provider inference engine by @yoavkatz in #1866
Update version to 1.26.2 by @elronbandel in #1867

Full Changelog: 1.26.1...1.26.2

Contributors

elronbandel and yoavkatz

Assets 2

10 Jul 17:27

elronbandel

1.26.1

b6cc840

Unitxt 1.26.1

Lock datasets dependency to <4.0.0

The latest datasets v4.0.0 release removes support for loading datasets with trust_remote_code=True. This change breaks compatibility with many datasets currently in the Unitxt catalog, as several datasets require this feature to load properly.

This patch restricts the datasets version to below 4.0.0 until we can find or develop replacements for affected datasets.

Assets 2

09 Jul 14:27

elronbandel

1.26.0

9561615

Unitxt 1.26.0 - Multi Threading

Main changes:

Made Unitxt Thread-Safe so it can run in multi-threaded environments.
Added an option to set sampling seed for demos (in context example). This is done by demos_sampling_seed. It allows running the same dataset with different demo examples.
Improved printouts of instance scores with to_markdown() and summary in Unitxt. For example :

results = evaluate(predictions=predictions, data=dataset)
print(results.instance_scores.summary)

All changes:

Add to_markdown() to InstanceScores to pretty print output by @yoavkatz in #1846
Improved InstanceScores summary to be readible and in decent width by @yoavkatz in #1847
Improve multi turn tool calling example by @elronbandel in #1848
Add metrics documentation including range, directionality and references by @elronbandel in #1850
Fix sacrebleu documentation by @elronbandel in #1851
Add F1 score documentation to F1Fast metric class by @elronbandel in #1852
Add more llmjudge benchmarks by @martinscooper in #1804
Fix llama scout name and url on rits by @martinscooper in #1857
Add demos_sampling_seed to recipe api by @elronbandel in #1858
Add comprehensive multi threading support and tests by @elronbandel in #1853
Update BlueBench to match the original implementation by @bnayahu in #1855

Full Changelog: 1.25.0...1.26.0

Contributors

martinscooper, bnayahu, and 2 other contributors

Assets 2

25 Jun 18:51

elronbandel

1.25.0

c5acd23

Unitxt 1.25.0 - Improved Error Messages

Main changes

Error message simplied and improved. Now each failue produces a short stack trace, following by the context the error occured, a link to help documention, and then the detailed error message

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 🦄 Unitxt Error Context                                                                                              │
│ -------------------------------------------------------------------------------------------------------------------- │
│  - Python: 3.10.17                                                                                                   │
│  - Unitxt: 1.25.0                                                                                                    │
│  - Stage: Metric Processing                                                                                          │
│  - Stream: all_data>>                                                                                                │
│  - Object: KeyValueExtraction (https://www.unitxt.ai/en/latest/unitxt.metrics.html#unitxt.metrics.KeyValueExtraction)│
│  - Help: https://www.unitxt.ai/en/latest/docs/adding_metric.html                                                     │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Each reference is expected to be of type 'Dict[str, str]' in metrics.key_value_extraction.accuracy metric. Received reference of type <class 'str'>: Austin

Added Granite Thinking support including example.
Added a flag in the format to determine whether the to place the template instructions once in the system turn, or in the user turns (for each demo and for the final input). This is important because some models delete their default system prompt, when their recieve an external system prompt.
Added option to get generated text in meta data when calling infer_log_prob() . In the past only seperated tokens were returned.
See example code.
Added support for multi turn dialog metrics. See tool calling example.

What's Changed

Add Multi Turn Metrics Support by @elronbandel in #1579
add a test for faithfulness with an external client and fetch artifact by @matanor in #1824
Fix rits model names and the judges that use them by @martinscooper in #1825
Add option to store template instruction in user role and not system role and added granite thinking example by @yoavkatz in #1667
Bluebench fixes by @bnayahu in #1828
Fix huggingface auto model log probs evaluation by @elronbandel in #1829
Add support for tool calling in HFAutoModelInferenceEngine by @elronbandel in #1827
Changed artifiact.to_yaml() to use standard dict to yaml API + added example to create a yaml representation of a data card by @yoavkatz in #1831
Fix CLI issues by @bnayahu in #1832
Allow changing default ollama api_base by @martinscooper in #1830
fix bug when WML does not return any content or tool call by @yoavkatz in #1835
Arena hard fix by @bnayahu in #1836
Add full generated text when running infer_log_prob() with meta data enabled. by @yoavkatz in #1834
Improved parsing of MT bench style scores by @yoavkatz in #1839
Use os.path.join to create infer cache dir path by @martinscooper in #1840
Add multi turn tool calling task and support multiple tools per call by @elronbandel in #1811
Results summarization utility for the CLI by @bnayahu in #1842
Improved error messages by @elronbandel in #1838
Improve Text2SQL Metrics: Refactoring, New Execution Metric, and Bug Fixes by @oktie in #1841
Update coverage exclusions by @elronbandel in #1843
Use full artifact representation as the cache key for the dataset by @elronbandel in #1644
Update version to 1.25.0 by @elronbandel in #1844

Full Changelog: 1.24.0...1.25.0

Contributors

oktie, martinscooper, and 4 other contributors

Assets 2

03 Jun 15:52

elronbandel

1.24.0

5d576f6

Unitxt 1.24.0

What's Changed

External client for wml infer engine by @matanor in #1817
Improved JoinStream error messages by @yoavkatz in #1819
Added param to control of confidence interval calculation in evaluate api by @yoavkatz in #1815
extend code coverage some by @dafnapension in #1814
Make api_key_env_var optional in LoadFromAPI by @martinscooper in #1799
Fix Issue with multi byte token decoding by @elronbandel in #1821
Fix ruff format pre-commit by @elronbandel in #1822
Test eval utils with external client by @matanor in #1820
Improved and Optimized JaccardIndex , Spearman, StringContainment metrics and added MSE and RMSE metrics by @elronbandel in #1816
Update version to 1.24.0 by @elronbandel in #1823

Full Changelog: 1.23.1...1.24.0

Contributors

martinscooper, elronbandel, and 3 other contributors

Assets 2

29 May 14:23

elronbandel

1.23.1

f5e47f7

Unitxt 1.23.1

What's Changed

Add more metrics for schema linking by @kurhula in #1788
Fixed argument_value_precision by @yoavkatz in #1794
FIx granite guardian agentic metric and align it with unitxt built in tool calling types by @elronbandel in #1786
Allow running benchmarks and recipes in cli by @elronbandel in #1785
Add ToRR Benchmark Readme file by @csrajmohan in #1793
Add tool calling correctness metric by @elronbandel in #1796
Remove IBM branding from opensource doc by @yoavkatz in #1802
Add LoadJsonFile loader and tests by @elronbandel in #1801
LLM judge judgebench benchmarks by @martinscooper in #1800
Added granite tool calling system prompt by @Narayanan-V-Eswar in #1798
Documenation updates by @yoavkatz in #1790
Cards for the Real MM RAG datasets by @assaftibm in #1795
Add more judges by @martinscooper in #1808
Fixed problematic load of json with a single dictionary line. by @yoavkatz in #1806
Add more cross provider models by @martinscooper in #1807
Fix model name by @martinscooper in #1809
watsonx.ai mistral small support by @LukaszCmielowski in #1810
Fix: number of batches calculation is incorrect by @martinscooper in #1805
Fix example dependencies installation by @elronbandel in #1812
Update version to 1.23.1 by @elronbandel in #1818

New Contributors

@kurhula made their first contribution in #1788
@LukaszCmielowski made their first contribution in #1810

Full Changelog: 1.23.0...1.23.1

Contributors

csrajmohan, kurhula, and 6 other contributors

Assets 2

Releases: IBM/unitxt

Unitxt 1.26.6

What's Changed

Contributors

Uh oh!

Unitxt 1.26.5

What's Changed

New Contributors

Contributors

Uh oh!

Unitxt 1.26.4

What's Changed

Contributors

Uh oh!

Unitxt 1.26.3

What's Changed

Contributors

Uh oh!

Unitxt 1.26.2

What's Changed

Contributors

Uh oh!

Unitxt 1.26.1

Lock datasets dependency to <4.0.0

Uh oh!

Unitxt 1.26.0 - Multi Threading

Contributors

Uh oh!

Unitxt 1.25.0 - Improved Error Messages

Main changes

What's Changed

Contributors

Uh oh!

Unitxt 1.24.0

What's Changed

Contributors

Uh oh!

Unitxt 1.23.1

What's Changed

New Contributors

Contributors

Uh oh!