Unitxt 1.9.0
What's Changed
The most important changes are:
- Addition of LLM as a Judge metrics and tasks, both for evaluating LLMs as judges and for using them to evaluate other tasks. Read more in the LLM as a Judge Tutorial.
 - Addition of RAG response generation tasks and datasets, as part of an effort to add comprehensive RAG evaluation to unitxt.
 - Renaming FormTask to Task for simplicity
 - Major improvements to documentation and tutorials
 
Breaking Changes 🚨
- Ensure consistent evaluation of CI across implementations [Might change previous results] by @dafnapension in #844
 - Fix default format so it will be the same as formats.empty in catalog. Impacts runs that did not specify a format by @yoavkatz in #848
 - LoadJson operator moved from unitxt.processors to unitxt.struct_data_operators
 - Fixed YesNoTemplate and Diverse LabelSampler, to support binary task typing. YesNoTemplate now expects the class field to contain a string and not a list of strings with one element by @yoavkatz in #836
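
For data prepared before this release, the YesNoTemplate change means a class field stored as a one-element list has to become a plain string. The following is a minimal sketch of such a conversion in plain Python. It does not use any unitxt API: the helper name `normalize_class_field`, the default field name `"class"`, and the dict-shaped instance are all assumptions made for illustration.

```python
from typing import Any, Dict


def normalize_class_field(instance: Dict[str, Any], field: str = "class") -> Dict[str, Any]:
    """Return a copy of `instance` whose `field` holds a plain string.

    Hypothetical helper: accepts either the new format (a string) or the
    old format (a list containing exactly one string).
    """
    value = instance[field]
    if isinstance(value, str):
        # Already in the new format; return an unchanged copy.
        return dict(instance)
    if isinstance(value, list) and len(value) == 1 and isinstance(value[0], str):
        # Old format: unwrap the single string.
        return {**instance, field: value[0]}
    raise ValueError(
        f"Expected a string or a one-element list of strings in '{field}', got {value!r}"
    )


old_style = {"text": "Is this a question?", "class": ["yes"]}
new_style = normalize_class_field(old_style)
```

Anything other than a string or a one-element list of strings raises an error instead of being converted silently, so ambiguous multi-label rows surface early.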
 
Bug Fixes
- Change processor type for to_list_by_comma_from_references by @antonpibm in #815
 - Handle empty text in Literal Eval by @antonpibm in #819
 - Fix clash between dir names and artifact names in catalog website by @elronbandel in #825
 - Fix a mistake in NER typing by @yoavkatz in #832
 - Fix catalog reference by @elronbandel in #838
 - Fix default format by @yoavkatz in #848
 - Fixed YesNoTemplate and Diverse LabelSampler, to support binary task typing. by @yoavkatz in #836
 
New Features
- Support prediction regex match by setting the operator as a postproce… by @antonpibm in #792
 - Add sample score output in test card by @yoavkatz in #803
 - Support for loading dictionaries by @pawelknes in #784
 - Add ability to fuse, split, MultiStreamScoreMean, and merge all by @dafnapension in #767
 - Changed default log verbosity to "info" instead of "debug" by @yoavkatz in #822
 - Skip artifact prepare and verify in catalog consistency tests by @elronbandel in #839
 - Add separation between eager streams and regular streams by @elronbandel in #846
 - Add precision and recall scores to f1_binary, max_f1_binary by @lilacheden in #824
 - Rename task by @elronbandel in #850
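
As background for the precision and recall scores added to f1_binary, here is a minimal sketch of how the three quantities relate under their standard definitions. It is an illustration only, not the unitxt implementation: the function name `binary_precision_recall_f1`, the 0/1 label encoding, and the choice to return 0.0 when a denominator is zero are all assumptions.

```python
from typing import Dict, Sequence


def binary_precision_recall_f1(
    references: Sequence[int], predictions: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute precision, recall and F1 for the positive class (hypothetical helper)."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")

    pairs = list(zip(references, predictions))
    tp = sum(1 for ref, pred in pairs if pred == positive and ref == positive)
    fp = sum(1 for ref, pred in pairs if pred == positive and ref != positive)
    fn = sum(1 for ref, pred in pairs if pred != positive and ref == positive)

    # Assumption: report 0.0 when a score is undefined (zero denominator).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = binary_precision_recall_f1(
    references=[1, 1, 1, 1, 0],
    predictions=[1, 1, 0, 0, 0],
)
```

In this example every positive prediction is correct but half of the positive references are missed, which is the kind of imbalance that the separate precision and recall scores expose and a single F1 value hides.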
 
New Assets
- Add basic format for llama3 models by @arielge in #812
 - Adding literal eval processor by @antonpibm in #813
 - Add RAG (response generation part) tasks and datasets by @perlitz in #811
 - Add 5 legalbench tasks (the 5 existing in HELM) by @perlitz in #827
 - Add financebench by @perlitz in #828
 - Add billsum dataset by @perlitz in #830
 - Add tldr dataset by @perlitz in #831
 - Add Attaq500 by @naamaz in #835
 - Add llm as judge mt-bench dataset and metrics by @OfirArviv in #791
 
Documentation
- Documentation review by @yoavkatz in #805
 - Added documentation for global and huggingface metrics by @yoavkatz in #807
 - Touch up docs by @elronbandel in #809
 - Remove the contents from main menu by @elronbandel in #810
 - Add tags docs by @elronbandel in #814
 - Reviewing Unitxt tutorials by @michal-jacovi in #817
 - Fix the link to the operators tutorial by @elronbandel in #821
 - More documentation changes in metrics by @yoavkatz in #820
 - Update adding_task.rst by @michal-jacovi in #823
 - Fix missing mandatory new line in the beginning of code block in documentation by @elronbandel in #829
 - Add description, homepage, and citation obtained from HF with datasets.load_dataset_builder by @dafnapension in #818
 - Updated documentation by @yoavkatz in #849
 
New Contributors
- @antonpibm made their first contribution in #792
 - @michal-jacovi made their first contribution in #817
 
Full Changelog: 1.8.1...1.9.0