Releases: argilla-io/distilabel
1.0.0
What's Changed
- Add `Step` abstract class and new `Pipeline` by @gabrielmbmb in #338
- Add runtime parameters validation by @gabrielmbmb in #345
- Pipeline local execution by @gabrielmbmb in #346
- Add `Task` (minimal implementation) by @alvarobartt in #347
- Refactor `_BatchManager` to have list of batches per step by @gabrielmbmb in #353
- Refactor getting parameters from `Step.process` method by @gabrielmbmb in #355
- Add `LLM`, `OpenAILLM`, `TransformersLLM`, and `LlamaCppLLM` by @alvarobartt in #354
- Fix `Task` and `TextGeneration` by @alvarobartt in #356
- Add `combine_dicts` function and `CombineColumns` class by @alvarobartt in #358
- Add `PushToHub` step and fix `typing` by @alvarobartt in #357
- Add serialization for the new components by @plaguss in #349
- Fix `OpenAILLM.api_key` due to `SecretStr` and `StepInput` wrong imports by @alvarobartt in #359
- Add `GlobalStep`, fix `_BatchManager`, and add `logging` by @alvarobartt in #362
- Migrate vllm to the new API by @plaguss in #361
- Update `_BatchManager` to work with `GlobalStep`s and `input_batch_size` per step by @gabrielmbmb in #366
- Clean up outdated / unused files by @alvarobartt in #369
- Add `input_mappings` and `output_mappings` attributes by @gabrielmbmb in #367
- Move batching from `Task` to `LLM`, fix `vLLM.generate` and add `DISTILABEL_LOG_LEVEL` by @alvarobartt in #371
- Improve runtime parameter definition by @gabrielmbmb in #372
- Add `AsyncOpenAI` and update `OpenAILLM` accordingly by @alvarobartt in #381
- Update serde by @gabrielmbmb in #382
- Add `MistralLLM` and add `generation_kwargs` as `RuntimeParameter`s by @alvarobartt in #383
- Move `steps` out of `pipeline` by @gabrielmbmb in #384
- Add tests and docstring for `Task` and subclasses by @alvarobartt in #385
- Add `step` decorator by @gabrielmbmb in #387
- Add `input` propagation through `Task.process` by @alvarobartt in #399
- Improve `Pipeline` error handling by @gabrielmbmb in #400
- Fix `combine_dicts` and `StepInput` import in `PushToHub` by @alvarobartt in #401
- Improve `GlobalStep` error handling by @gabrielmbmb in #402
- Changed " by italics in EvolInstruct tutorial where one "" was missing by @ignacioct in #398
- Add `get_last_hidden_states` method and update `TransformersLLM` by @gabrielmbmb in #414
- docs: correct small typos in tutorial by @sdiazlor in #419
- docs: readme positioning by @davidberenstein1957 in #386
- Add `num_generations` and `group_generations` parameters to `Task` by @gabrielmbmb in #416
- Add `Argilla` and `PromptCompletionToArgilla` by @alvarobartt in #420
- Add `EvolInstruct` and `EvolInstructGenerator` tasks by @alvarobartt in #407
- Wrap optional `LLM` dependencies under `load` by @alvarobartt in #428
- Add `ComplexityScorer` task by @gabrielmbmb in #421
- Implement caching mechanism for the pipelines by @plaguss in #370
- Add method to Pipeline to handle keyboard interruptions via ctrl+c by @plaguss in #406
- Add `GenerateEmbeddings` task by @gabrielmbmb in #427
- Add `api_key` within `LLM.load` and add `llm_kwargs` as `RuntimeParameter` by @alvarobartt in #432
- Add `GeneratorStep.process` validation in `DAG` and smaller fixes by @alvarobartt in #435
- Add `EvolComplexity` task by @davidberenstein1957 in #415
- Add `QualityScorer` Task by @ignacioct in #425
- Add `CudaDevicePlacementMixin` class by @gabrielmbmb in #436
- Return `distiset` from `Pipeline.run` by @plaguss in #417
- Update README.md by @strickvl in #451
- Add `InferenceEndpointsLLM` by @alvarobartt in #439
- Fix `Distiset` after `PushToHub` and smaller fixes by @alvarobartt in #452
- Fix `Step.process_applying_mappings` by @alvarobartt in #453
- Add `AnyscaleLLM` by @davidberenstein1957 in #447
- Add general function to obtain schema for parquet writer by @plaguss in #454
- Add `TogetherLLM` by @davidberenstein1957 in #449
- Fix `LLM` subclasses based on `OpenAILLM` by @alvarobartt in #455
- Improve batching and caching by @gabrielmbmb in #457
- Add `EvolQuality` task by @davidberenstein1957 in #429
- Add `VertexAILLM` by @davidberenstein1957 in #445
- Add `use_cache` to `BasePipeline` by @plaguss in #463
- Add `AnthropicLLM` by @sdiazlor in #444
- Add `multiprocess` dependency by @gabrielmbmb in #467
- Add `UltraFeedback` by @alvarobartt in #464
- Add `OllamaLLM` by @davidberenstein1957 in #405
- Add `RuntimeParametersMixin` and `LLM` runtime parameters by @gabrielmbmb in #466
- Add `LiteLLM` by @davidberenstein1957 in #441
- Add CLI by @gabrielmbmb in #471
- Set `_batch_manager` to `None` after run by @gabrielmbmb in #473
- Add create_distiset function by @plaguss in #480
- Add `overload` to `step` decorator by @gabrielmbmb in #474
- Move Enum to Dict[str, str] to avoid serialization errors during caching by @plaguss in #482
- Include a dataset card and the `pipeline.yaml` on `Distiset.push_to_hub` by @plaguss in #479
- Add `PairRM` task for ranking responses by @plaguss in #450
- Update `_WriteBuffer` to write several parquet files by @gabrielmbmb in #483
- Extend `Argilla` integration `TextGeneration`, `Preference`, and more by @alvarobartt in #472
- Add `DeitaFiltering` step by @gabrielmbmb in #481
- Add `InstructionBacktranslation` by @alvarobartt in #486
- Fix huggingface_hub TextGenerationError import by @Wauplin in #485
- Improve azure openai support by @BramVanroy in #461
- Add `SelfInstruct` task by @ignacioct in #456
- Use `QueueHandler` for `Pipeline` logging by @gabrielmbmb in #489
- Improve `_stop` and `logging` by @gabrielmbmb in #491
- Fix creating empty `Dataset` in `create_distiset` function by @gabrielmbmb in #492
- Add imports from `__init__` modules by @gabrielmbmb in #493
- `batch_size` and `input_batch_size` runtime parameters by @gabrielmbmb in #495
- Update serialization method of _BatchManager to write each step on its own file by @plaguss in #496
- Fix `asyncio` in `AsyncLLM` to use the running event loop if any by @alvarobartt in #501
- Added authentication header to allow private/gated dataset use by @bjoernpl in https://github.com/argilla-io/distila...
0.6.0
What's Changed
- Fix typo in docstring of to_argilla metrics_ to metric_ by @burtenshaw in #334
- Implement a JSON responding OpenAI LLM as JSONOpenAILLM by @burtenshaw in #331
- Add examples for the deita paper tasks by @plaguss in #329
- Add checkpoint strategy to automatically push to hub by @plaguss in #321
- docs: update tutorials avoid argilla installation error by @sdiazlor in #337
- Fix `CustomDataset.load_from_disk` with `str`/`Path` objects by @plaguss in #341
- Clarify number of generations produced when using LLMPool in docs by @davanstrien in #339
- Refactor _build_dataset piece for speed by @plaguss in #344
- Fix documentation and type variables in `CustomDataset` checkpoint methods by @plaguss in #342
- US Spelling and other typo correction on Distilabel tutorials by @ignacioct in #324
- docs: add a tutorial for evolinstruct by @sdiazlor in #327
- Fix Openai api error with OpenAI-compatible providers by @jphme in #351
- Add fix for labels not returned by openai api by @plaguss in #364
- Refactor model availability check in is_serverless_endpoint_available by @davanstrien in #363
New Contributors
- @burtenshaw made their first contribution in #334
- @jphme made their first contribution in #351
Full Changelog: 0.5.0...0.6.0
0.5.0
What's Changed
- fix: Correct import error by @plaguss in #279
- fix: Filter examples for which len generations != len ratings by @plaguss in #284
- feat: Add sentence transformers support for the to argilla method by @davidberenstein1957 in #262
- feat: Add text descriptives support to the to argilla methods by @davidberenstein1957 in #271
- feat: Add `to_argilla` method to `EvolInstructTask` generated datasets by @plaguss in #291
- docs: Shorten titles tutorials and update core example by @davidberenstein1957 in #289
- feat: Add new serialization strategy by @plaguss in #288
- feat: Review `OllamaLLM` and `TogetherInferenceLLM` by @alvarobartt in #305
- refactor: Remove Metadata for Ratings by @ignacioct in #303
- docs: Add missing VertexAI information within `README.md` and `docs/index.md` by @alvarobartt in #308
- feat: Add functionality to push tasks to the HuggingFace hub and download them automatically. by @plaguss in #297
- feat: Add `ComplexityScorer` and `QualityScorer` tasks from Deita by @plaguss in #302
- fix: Fix logging visualization of labeller pipelines by @plaguss in #310
- feat: Add `Improving Text Embeddings with LLMs` tutorial by @alvarobartt in #313
- feat: Add `EvolComplexity` and `EvolQuality` by @davidberenstein1957 in #299
- feat: Add `validate_prompts` method to LLMs to help validating the prompts by @plaguss in #314
- fix: typo in clean an existing preference dataset by @sdiazlor in #312
- feat: Add new column for sft fine tuning with `prepare_dataset` by @plaguss in #309
- docs: Custom Task Documentation by @ignacioct in #275
- refactor: Align the `LLM` subclasses args by @alvarobartt in #315
- feat: Include rationale of the model responses on `prepare_dataset` if available by @plaguss in #317
- feat: Add embedding tutorial to docs by @ignacioct in #319
- feat: Add `MistralAILLM` by @plaguss in #293
- feat: Use `ollama` Python client within `OllamaLLM` by @sdiazlor in #307
Full Changelog: 0.4.0...0.5.0
0.4.0
What's Changed
- docs: Notus end2end example for preference and instruction generation by @ignacioct in #145
- docs: binders anchors by @ignacioct in #235
- feat: Add support for dedicated and serverless inference endpoints via inference API by @philschmid in #238
- docs: Update links to arxiv landing pages rather than PDFs by @davanstrien in #249
- feat: add ETA to progress bar and fix not showing the progress bar if irrelevant by @ignacioct in #253
- feat: Add Evol instruct task by @plaguss in #237
- docs: rename `enable_checkpoints` to `checkpoint_strategy` by @davidberenstein1957 in #257
- feat: Fixing progress bar and ETA by @ignacioct in #260
- fix: resolved error with self instruct to argilla method by @plaguss in #265
- chore: Add extra check in llmpool to ensure all the tasks share the same parent class by @plaguss in #266
- fix: fix for Notus tutorial after bug in record unwrap by @ignacioct in #267
- feat: add customizable criteria for query generation in SelfInstructTask by @ignacioct in #269
- docs: add a tutorial on "clean a DPO/preference dataset with distilabel" by @sdiazlor in #270
- feat: Add new functionality to binarize preference datasets directly from distilabel by @plaguss in #264
- feat: add support `ollama` api by @davidberenstein1957 in #250
New Contributors
- @philschmid made their first contribution in #238
- @davanstrien made their first contribution in #249
- @sdiazlor made their first contribution in #270
Full Changelog: 0.3.0...0.4.0
0.3.0
What's Changed
- Add `VertexAILLM` & `VertexAIEndpointLLM` classes by @gabrielmbmb in #204
- Add draft with social cards by @plaguss in #197
- Relax `LLMPool` check to match parent `Task` instead by @plaguss in #210
- Align `README.md` with `docs/` and minor fixes / improvements by @alvarobartt in #214
- Add `TogetherInferenceLLM` by @alvarobartt in #215
- Add checking valid `inputs` before calling `_generate` by @gabrielmbmb in #216
- Add `TogetherInferenceLLM` tests by @alvarobartt in #217
- Add Vertex AI `LLM`s documentation by @gabrielmbmb in #222
- Documentation review by @alvarobartt in #223
- Rename `for_text_quality` to `for_overall_quality` method in `UltraFeedbackTask` by @alvarobartt in #224
- Add Anyscale endpoints by @plaguss in #213
- Feature dataset checkpoint strategy by @plaguss in #194
- Fix `rating` parsing in `RatingToArgillaMixin.to_argilla_record` by @alvarobartt in #227
- Add badges to readme by @plaguss in #226
- Fix badges by @dvsrepo in #228
- Update `LICENSE` and add `LICENSE_HEADER` by @davidberenstein1957 in #221
Full Changelog: 0.2.1...0.3.0
0.2.1
What's Changed
- Fix `PrometheusTask` could not be imported by @gabrielmbmb in #190
- Fix `LLM.return_futures` by @gabrielmbmb in #192
- Remove learn section from docs until developed by @plaguss in #188
- Add markdown to fields by default by @plaguss in #189
- Fix `PrometheusTask` and `UltraCMTask` could not be chained with `TextGenerationTask` by @gabrielmbmb in #195
- Add missing `use_markdown` for every field by @plaguss in #196
- Add `to_argilla_{dataset,record}` for `CritiqueTask` by @gabrielmbmb in #198
- Update `generate_prompt` in `Task` subclasses to always return `Prompt` by @alvarobartt in #199
- Add `CritiqueTask` documentation by @alvarobartt in #200
- Fix `UltraCMTask` scoring range and align `argilla` imports by @alvarobartt in #201
Full Changelog: 0.2.0...0.2.1
0.2.0
What's Changed
- adds accelerate example by @edbeeching in #141
- Add a dry-run when calling `Pipeline.generate` by @alvarobartt in #146
- Add Notus format in `Prompt.format_as` and update `examples/*.py` by @alvarobartt in #147
- Add `ProcessLLM` class by @gabrielmbmb in #151
- Adds `CritiqueTask`, `UltraCMTask` and more by @alvarobartt in #152
- docs: add `llama.cpp` to extras by @davidberenstein1957 in #154
- Fix `_build_dataset` as `processed_labels` were ignored by @plaguss in #158
- Add `to_argilla_{dataset,record}` methods in `TextGenerationTask` by @alvarobartt in #159
- Fix `UltraFeedbackTask.to_argilla_dataset` ratings values by @alvarobartt in #160
- Align `typing` and `typing_extensions` with supported Python versions by @alvarobartt in #161
- Add `LLMPool` class by @gabrielmbmb in #156
- Add missing `CritiqueTask` and `UltraCMTask` in `__init__` and move `argilla_utils` to `utils.argilla` by @alvarobartt in #162
- Add `test` workflow by @gabrielmbmb in #163
- Update `LLM` to return `Future[List[List[LLMOutput]]]` by @gabrielmbmb in #164
- Add `PrometheusTask` by @alvarobartt in #165
- Randomise generations order by @gabrielmbmb in #167
- Add custom `to_argilla_{dataset,record}` to `SelfInstructTask` by @alvarobartt in #169
- Fix `shuffle_before_labelling` and progress bar in `Pipeline.generate` by @alvarobartt in #170
- Replace `multiprocessing` with `multiprocess` by @gabrielmbmb in #171
- Refactor and improve docs by @plaguss in #134
- Fix `SelfInstructTask.{parse_output,to_argilla_record}` methods and `_build_dataset` by @alvarobartt in #172
- Fix `results` didn't have same order as `futures` by @gabrielmbmb in #173
- Remove unnecessary plugin by @plaguss in #174
- Add `{generation,labelling}_model` column as metadata in Argilla by @alvarobartt in #175
- Fix exporting model name to Argilla with `LLMPool` by @gabrielmbmb in #177
- Update docs to include info about `ProcessLLM` and `LLMPool` by @gabrielmbmb in #176
New Contributors
- @edbeeching made their first contribution in #141
- @davidberenstein1957 made their first contribution in #154
Full Changelog: 0.1.1...0.2.0
0.1.1
What's Changed
- Template for Documentation Issue created by @ignacioct in #128
- self.thread_pool_executor can be None, protecting it for print by @ignacioct in #129
- Use `do_sample` in `transformers` example by @dvsrepo in #138
- Fix `llama-cpp` and `hf-inference-endpoints` extras in `pyproject.toml` by @plaguss in #139
- Fix `llama_cpp_python` dependency check by @plaguss in #140
New Contributors
- @ignacioct made their first contribution in #128
- @plaguss made their first contribution in #139
Full Changelog: 0.1.0...0.1.1
0.1.0
Stable Release - v0.1.0
0.1.0rc2
distilabel 0.1.0rc2