All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fixed PostgreSQL database not being updated after begin_nested because of missing commit (#3567).
- Added PATCH /api/v1/fields/{field_id} endpoint to update the field title and markdown settings (#3421).
- Added PATCH /api/v1/datasets/{dataset_id} endpoint to update dataset name and guidelines (#3402).
- Added PATCH /api/v1/questions/{question_id} endpoint to update question title, description and some settings (depending on the type of question) (#3477).
- Added DELETE /api/v1/records/{record_id} endpoint to remove a record given its ID (#3337).
- Added pull method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) to pull all the records from it and return it as a local copy as a FeedbackDataset (#3465).
- Added delete method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) (#3512).
- Added delete_records method in RemoteFeedbackDataset, and delete method in RemoteFeedbackRecord to delete records from Argilla (#3526).
- Improved efficiency of weak labeling when dataset contains vectors (#3444).
- Added ArgillaDatasetMixin to detach the Argilla-related functionality from the FeedbackDataset (#3427)
- Moved FeedbackDataset-related pydantic.BaseModel schemas to argilla.client.feedback.schemas instead, to be better structured and more scalable and maintainable (#3427)
- Update CLI to use database async connection (#3450).
- Limit rating questions values to the positive range [1, 10] (#3451).
- Updated POST /api/users endpoint to be able to provide a list of workspace names to which the user should be linked (#3462).
- Updated Python client User.create method to be able to provide a list of workspace names to which the user should be linked (#3462).
- Updated GET /api/v1/me/datasets/{dataset_id}/records endpoint to allow getting records matching one of the response statuses provided via query param (#3359).
- Updated POST /api/v1/me/datasets/{dataset_id}/records endpoint to allow searching records matching one of the response statuses provided via query param (#3359).
- Updated SearchEngine.search method to allow searching records matching one of the response statuses provided (#3359).
- After calling FeedbackDataset.push_to_argilla, the methods FeedbackDataset.add_records and FeedbackRecord.set_suggestions will automatically call Argilla with no need of calling push_to_argilla explicitly (#3465).
- Now calling FeedbackDataset.push_to_huggingface dumps the responses as a List[Dict[str, Any]] instead of Sequence to make it more readable via 🤗 datasets (#3539).
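
The rating range restriction listed above (values limited to [1, 10]) can be illustrated with a small standalone check. This is only a sketch of the rule as the entry describes it, not Argilla's actual validation code; the function name `validate_rating_values` and the constants are hypothetical.

```python
from typing import Iterable, List

# Bounds taken from the changelog entry; the names are illustrative only.
RATING_MIN, RATING_MAX = 1, 10


def validate_rating_values(values: Iterable[int]) -> List[int]:
    """Return the values as a list, or raise ValueError if any is out of range."""
    checked = list(values)
    out_of_range = [v for v in checked if not RATING_MIN <= v <= RATING_MAX]
    if out_of_range:
        raise ValueError(
            f"rating values must be in [{RATING_MIN}, {RATING_MAX}], got {out_of_range}"
        )
    return checked
```

Under this sketch, a rating question defined with the values 0 to 5 would be rejected because 0 falls outside the allowed range.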
- Fixed issue with bool values and default from Jinja2 while generating the HuggingFace DatasetCard from argilla_template.md (#3499).
- Fixed DatasetConfig.from_yaml which was failing when calling FeedbackDataset.from_huggingface as the UUIDs cannot be deserialized automatically by PyYAML, so UUIDs are neither dumped nor loaded anymore (#3502).
- Fixed an issue that didn't allow the Argilla server to work behind a proxy (#3543).
- TextClassificationSettings and TokenClassificationSettings labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495).
- Fixed PUT /api/v1/datasets/{dataset_id}/publish to check whether at least one field and question has required=True (#3511).
- Fixed FeedbackDataset.from_huggingface as suggestions were being lost when there were no responses (#3539).
- Fixed QuestionSchema and FieldSchema not validating name attribute (#3550).
- After calling FeedbackDataset.push_to_argilla, calling push_to_argilla again won't do anything since the dataset is already pushed to Argilla (#3465).
- After calling FeedbackDataset.push_to_argilla, calling fetch_records won't do anything since the records are lazily fetched from Argilla (#3465).
- After calling FeedbackDataset.push_to_argilla, the Argilla ID is no longer stored in the attribute/property argilla_id but in id instead (#3465).
- Fixed ModuleNotFoundError caused because the argilla.utils.telemetry module used in the ArgillaTrainer was importing an optional dependency not installed by default (#3471).
- Fixed ImportError caused because the argilla.client.feedback.config module was importing pyyaml optional dependency not installed by default (#3471).
- The suggestion_type_enum ENUM data type created in PostgreSQL didn't have any value (#3445).
- Fix database migration for PostgreSQL (See #3438)
- Added GET /api/v1/users/{user_id}/workspaces endpoint to list the workspaces to which a user belongs (#3308 and #3343).
- Added HuggingFaceDatasetMixin for internal usage, to detach the FeedbackDataset integrations from the class itself, and use Mixins instead (#3326).
- Added GET /api/v1/records/{record_id}/suggestions API endpoint to get the list of suggestions for the responses associated to a record (#3304).
- Added POST /api/v1/records/{record_id}/suggestions API endpoint to create a suggestion for a response associated to a record (#3304).
- Added support for RankingQuestionStrategy, RankingQuestionUnification and the .for_text_classification method for the TrainingTaskMapping (#3364)
- Added PUT /api/v1/records/{record_id}/suggestions API endpoint to create or update a suggestion for a response associated to a record (#3304 & #3391).
- Added suggestions attribute to FeedbackRecord, and allow adding and retrieving suggestions from the Python client (#3370)
- Added allowed_for_roles Python decorator to check whether the current user has the required role to access the decorated function/method for User and Workspace (#3383)
- Added API and Python Client support for workspace deletion (Closes #3260)
- Added GET /api/v1/me/workspaces endpoint to list the workspaces of the current active user (#3390)
- Updated output payload for GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records, POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to include the suggestions of the records based on the value of the include query parameter (#3304).
- Updated POST /api/v1/datasets/{dataset_id}/records input payload to add suggestions (#3304).
- The POST /api/datasets/:dataset-id/:task/bulk endpoints don't create the dataset if it does not exist (Closes #3244)
- Added Telemetry support for ArgillaTrainer (closes #3325)
- User.workspaces is no longer an attribute but a property, and is calling list_user_workspaces to list all the workspace names for a given user ID (#3334)
- Renamed FeedbackDatasetConfig to DatasetConfig and export/import from YAML as default instead of JSON (just used internally on push_to_huggingface and from_huggingface methods of FeedbackDataset) (#3326).
- The protected metadata fields support other than textual info - existing datasets must be reindexed. See docs for more detail (Closes #3332).
- Updated Dockerfile parent image from python:3.9.16-slim to python:3.10.12-slim (#3425).
- Updated quickstart.Dockerfile parent image from elasticsearch:8.5.3 to argilla/argilla-server:${ARGILLA_VERSION} (#3425).
- Removed support for non-prefixed environment variables. All valid env vars start with ARGILLA_ (See #3392).
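
As a rough illustration of the prefix rule above, the following sketch keeps only variables whose names start with ARGILLA_ and ignores everything else. It is not the server's settings loader; the helper name `read_prefixed_settings` and the choice to lowercase the stripped key are assumptions made for the example.

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = "ARGILLA_"  # prefix named in the changelog entry


def read_prefixed_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect ARGILLA_-prefixed variables, keyed by the lowercased remainder of the name."""
    env = os.environ if environ is None else environ
    return {
        name[len(PREFIX):].lower(): value
        for name, value in env.items()
        if name.startswith(PREFIX)
    }
```

With this sketch, a variable such as HOME_PATH would be ignored, while ARGILLA_HOME_PATH would be picked up under the key home_path.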
- Fixed GET /api/v1/me/datasets/{dataset_id}/records endpoint always returning the responses for the records even if responses was not provided via the include query parameter (#3304).
- Values for protected metadata fields are not truncated (Closes #3331).
- Big number ids are properly rendered in UI (Closes #3265)
- Fixed ArgillaDatasetCard to include the values/labels for all the existing questions (#3366)
- Integer support for record id in text classification, token classification and text2text datasets.
- Using rg.init with default argilla user skips setting the default workspace if not available. (Closes #3340)
- Resolved wrong import structure for ArgillaTrainer and TrainingTaskMapping (Closes #3345)
- Pin pydantic dependency to version < 2 (Closes #3348)
- Added RankingQuestionSettings class allowing to create ranking questions in the API using POST /api/v1/datasets/{dataset_id}/questions endpoint (#3232)
- Added RankingQuestion in the Python client to create ranking questions (#3275).
- Added Ranking component in feedback task question form (#3177 & #3246).
- Added FeedbackDataset.prepare_for_training method for generating a framework-specific dataset with the responses provided for RatingQuestion, LabelQuestion and MultiLabelQuestion (#3151).
- Added ArgillaSpaCyTransformersTrainer class for supporting the training with spacy-transformers (#3256).
- Added instructions for how to run the Argilla frontend in the developer docs (#3314).
- All docker related files have been moved into the docker folder (#3053).
- release.Dockerfile has been renamed to Dockerfile (#3133).
- Updated rg.load function to raise a ValueError with an explanatory message for the cases in which the user tries to use the function to load a FeedbackDataset (#3289).
- Updated ArgillaSpaCyTrainer to allow re-using tok2vec (#3256).
- Check available workspaces on Argilla on rg.set_workspace (Closes #3262)
- Replaced np.float alias by float to avoid AttributeError when using find_label_errors function with numpy>=1.24.0 (#3214).
- Fixed format_as("datasets") when no responses or optional responses in FeedbackRecord, to set their value to what 🤗 Datasets expects instead of just None (#3224).
- Fixed push_to_huggingface() when generate_card=True (default behaviour), as we were passing a sample record to the ArgillaDatasetCard class, and UUIDs introduced in 1.10.0 (#3192), are not JSON-serializable (#3231).
- Fixed from_argilla and push_to_argilla to ensure consistency on both field and question re-construction, and to ensure UUIDs are properly serialized as str, respectively (#3234).
- Refactored usage of import argilla as rg to clarify package navigation (#3279).
- Fixed URLs in Weak Supervision with Sentence Transformers tutorial #3243.
- Fixed library buttons' formatting on Tutorials page (#3255).
- Modified styling of error code outputs in notebooks (#3270).
- Added ElasticSearch and OpenSearch versions (#3280).
- Removed template notebook from table of contents (#3271).
- Fixed tutorials with pip install argilla to not use older versions of the package (#3282).
- Added metadata attribute to the Record of the FeedbackDataset (#3194)
- New users update command to update the role for an existing user (#3188)
- New Workspace class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180)
- Added User class to let users manage their Argilla users via the Python client (#3169).
- Added an option to display tqdm progress bar to FeedbackDataset.push_to_argilla when looping over the records to upload (#3233).
- The role system now supports three different roles: owner, admin and annotator (#3104)
- admin role is scoped to workspace-level operations (#3115)
- The owner user is created among the default pool of users in the quickstart, and the default user in the server now has the owner role (#3248), reverting (#3188).
- As of Python 3.7 end-of-life (EOL) on 2023-06-27, Argilla will no longer support Python 3.7 (#3188). More information at https://peps.python.org/pep-0537/
- Added search component for feedback datasets (#3138)
- Added markdown support for feedback dataset guidelines (#3153)
- Added Train button for feedback datasets (#3170)
- Updated SearchEngine and POST /api/v1/me/datasets/{dataset_id}/records/search to return the total number of records matching the search query (#3166)
- Replaced Enum for string value in URLs for client API calls (Closes #3149)
- Resolve breaking issue with ArgillaSpanMarkerTrainer for Named Entity Recognition with span_marker v1.1.x onwards.
- Move ArgillaDatasetCard import under @requires_version decorator, so that the ImportError on huggingface_hub is handled properly (#3174)
- Allow flow FeedbackDataset.from_argilla -> FeedbackDataset.push_to_argilla under different dataset names and/or workspaces (#3192)
- Added boolean use_markdown property to TextFieldSettings model.
- Added boolean use_markdown property to TextQuestionSettings model.
- Added new status draft for the Response model.
- Added LabelSelectionQuestionSettings class allowing to create label selection (single-choice) questions in the API (#3005)
- Added MultiLabelSelectionQuestionSettings class allowing to create multi-label selection (multi-choice) questions in the API (#3010).
- Added POST /api/v1/me/datasets/{dataset_id}/records/search endpoint (#3068).
- Added new components in feedback task Question form: MultiLabel (#3064) and SingleLabel (#3016).
- Added docstrings to the pydantic.BaseModels defined at argilla/client/feedback/schemas.py (#3137)
- Added the information about executing tests in the developer documentation (#3143).
- Updated GET /api/v1/me/datasets/:dataset_id/metrics output payload to include the count of responses with draft status.
- Added LabelSelectionQuestionSettings class allowing to create label selection (single-choice) questions in the API.
- Added MultiLabelSelectionQuestionSettings class allowing to create multi-label selection (multi-choice) questions in the API.
- Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
- Updated alembic setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044)
- Improved DatasetCard generation on FeedbackDataset.push_to_huggingface when generate_card=True, following the official HuggingFace Hub template, but suited to FeedbackDatasets from Argilla (#3110)
- Disallow fields and questions in FeedbackDataset with the same name (#3126).
- Fixed broken links in the documentation and updated the development branch name from development to develop (#3145).
- /api/v1/datasets new endpoint to list and create datasets (#2615).
- /api/v1/datasets/{dataset_id} new endpoint to get and delete datasets (#2615).
- /api/v1/datasets/{dataset_id}/publish new endpoint to publish a dataset (#2615).
- /api/v1/datasets/{dataset_id}/questions new endpoint to list and create dataset questions (#2615)
- /api/v1/datasets/{dataset_id}/fields new endpoint to list and create dataset fields (#2615)
- /api/v1/datasets/{dataset_id}/questions/{question_id} new endpoint to delete a dataset question (#2615)
- /api/v1/datasets/{dataset_id}/fields/{field_id} new endpoint to delete a dataset field (#2615)
- /api/v1/workspaces/{workspace_id} new endpoint to get workspaces by id (#2615)
- /api/v1/responses/{response_id} new endpoint to update and delete a response (#2615)
- /api/v1/datasets/{dataset_id}/records new endpoint to create and list dataset records (#2615)
- /api/v1/me/datasets new endpoint to list user visible datasets (#2615)
- /api/v1/me/dataset/{dataset_id}/records new endpoint to list dataset records with user responses (#2615)
- /api/v1/me/datasets/{dataset_id}/metrics new endpoint to get the dataset user metrics (#2615)
- /api/v1/me/records/{record_id}/responses new endpoint to create record user responses (#2615)
- showing new feedback task datasets in datasets list (#2719)
- new page for feedback task (#2680)
- show feedback task metrics (#2822)
- user can delete dataset in dataset settings page (#2792)
- Support for FeedbackDataset in Python client (parent PR #2615, and nested PRs: #2949, #2827, #2943, #2945, #2962, and #3003)
- Integration with the HuggingFace Hub (#2949)
- Added ArgillaPeftTrainer for text and token classification #2854
- Added predict_proba() method to ArgillaSetFitTrainer
- Added ArgillaAutoTrainTrainer for Text Classification #2664
- New database revisions command showing database revisions info
- Avoid rendering html for invalid html strings in Text2text (#2911)
- The database migrate command accepts a --revision param to provide specific revision id
- tokens_length metrics function returns empty data (#3045)
- token_length metrics function returns empty data (#3045)
- mention_length metrics function returns empty data (#3045)
- entity_density metrics function returns empty data (#3045)
- Using Argilla with Python 3.7 runtime is deprecated and support will be removed from version 1.11.0 (#2902)
- tokens_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- token_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- mention_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- entity_density metrics function has been deprecated and will be removed in 1.10.0 (#3045)
- Removed mention density, tokens_length and chars_length metrics from token classification metrics storage (#3045)
- Removed token char_start, char_end, tag, and score metrics from token classification metrics storage (#3045)
- Removed tags-related metrics from token classification metrics storage (#3045)
- add max_retries and num_threads parameters to rg.log to run data logging requests concurrently with backoff retry policy. See #2458 and #2533
- rg.load accepts include_vectors and include_metrics when loading data. Closes #2398
- Added settings param to prepare_for_training (#2689)
- Added prepare_for_training for openai (#2658)
- Added ArgillaOpenAITrainer (#2659)
- Added ArgillaSpanMarkerTrainer for Named Entity Recognition (#2693)
- Added ArgillaTrainer CLI support. Closes (#2809)
- fix image alignment on token classification
- Argilla quickstart image dependencies are externalized into quickstart.requirements.txt. See #2666
- bulk endpoints will upsert data when record id is present. Closes #2535
- moved from click to typer CLI support. Closes (#2815)
- Argilla server docker image is built with PostgreSQL support. Closes #2686
- The rg.log computes all batches and raises an error for all failed batches.
- The default batch size for rg.log is now 100.
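
The rg.log entries above (default batch size of 100, max_retries with a backoff policy, and a single error raised after all batches have been attempted) can be pictured with the sketch below. It is a simplified, sequential illustration and not the client's implementation: the function `log_in_batches`, the `send_batch` callable and the `base_delay` parameter are hypothetical, and the concurrency implied by num_threads is left out.

```python
import time
from typing import Callable, List, Sequence, Tuple


def log_in_batches(
    records: Sequence[dict],
    send_batch: Callable[[Sequence[dict]], None],  # hypothetical sender, e.g. an HTTP call
    batch_size: int = 100,
    max_retries: int = 3,
    base_delay: float = 0.0,
) -> int:
    """Send records in batches, retrying each batch with exponential backoff.

    Every batch is attempted; failures are collected and reported together at the end.
    """
    failures: List[Tuple[int, Exception]] = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        for attempt in range(max_retries + 1):
            try:
                send_batch(batch)
                break
            except Exception as exc:
                if attempt == max_retries:
                    # Retries exhausted: remember the failure and move on.
                    failures.append((start, exc))
                else:
                    # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
                    time.sleep(base_delay * 2 ** attempt)
    if failures:
        offsets = [offset for offset, _ in failures]
        raise RuntimeError(f"{len(failures)} batch(es) failed, starting at offsets {offsets}")
    return len(records)
```

Collecting failures instead of stopping at the first one means a single bad batch does not prevent the remaining records from being sent, which matches the behaviour the entry describes.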
- argilla.training bugfixes and unification (#2665)
- Resolved several small bugs in the ArgillaTrainer.
- The rg.log_async function is deprecated and will be removed in the next minor release.
- ARGILLA_HOME_PATH new environment variable (#2564).
- ARGILLA_DATABASE_URL new environment variable (#2564).
- Basic support for user roles with admin and annotator (#2564).
- id, first_name, last_name, role, inserted_at and updated_at new user fields (#2564).
- /api/users new endpoint to list and create users (#2564).
- /api/users/{user_id} new endpoint to delete users (#2564).
- /api/workspaces new endpoint to list and create workspaces (#2564).
- /api/workspaces/{workspace_id}/users new endpoint to list workspace users (#2564).
- /api/workspaces/{workspace_id}/users/{user_id} new endpoint to create and delete workspace users (#2564).
- argilla.tasks.users.migrate new task to migrate users from old YAML file to database (#2564).
- argilla.tasks.users.create new task to create a user (#2564).
- argilla.tasks.users.create_default new task to create a user with default credentials (#2564).
- argilla.tasks.database.migrate new task to execute database migrations (#2564).
- release.Dockerfile and quickstart.Dockerfile now create a default argilladata volume to persist data (#2564).
- Add user settings page. Closes #2496
- Added Argilla.training module with support for spacy, setfit, and transformers. Closes #2504
- Now the prepare_for_training method is working when multi_label=True. Closes #2606
- ARGILLA_USERS_DB_FILE environment variable is now only used to migrate users from YAML file to database (#2564).
- full_name user field is now deprecated and first_name and last_name should be used instead (#2564).
- password user field now requires a minimum of 8 and a maximum of 100 characters in size (#2564).
- quickstart.Dockerfile image default users from team and argilla to admin and annotator including new passwords and API keys (#2564).
- Datasets to be managed only by users with admin role (#2564).
- The list of rules is now accessible while metrics are computed. Closes #2117
- Style updates for weak labeling and adding feedback toast when delete rules. See #2626 and #2648
- email user field (#2564).
- disabled user field (#2564).
- Support for private workspaces (#2564).
- ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY and ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD environment variables. Use python -m argilla.tasks.users.create_default instead (#2564).
- The old headers for API Key and workspace from python client
- The default value for old API Key constant. Closes #2251
1.5.1 - 2023-03-30
- Copying datasets between workspaces with proper owner/workspace info. Closes #2562
- Copy dataset with empty workspace to the default user workspace 905d4de
- Using elasticsearch config to request backend version. Closes #2311
- Remove sorting by score in labels. Closes #2622
- Update field name in metadata for image url. See #2609
- Improvements in tutorial doc cards. Closes #2216
1.5.0 - 2023-03-21
- Add the fields to retrieve when loading the data from argilla. rg.load takes too long because of the vector field, even when users don't need it. Closes #2398
- Add new page and components for dataset settings. Closes #2442
- Add ability to show image in records (for TokenClassification and TextClassification) if an URL is passed in metadata with the key _image_url
- Non-searchable fields support in metadata. #2570
- Add record ID references to the prepare for training methods. Closes #2483
- Add tutorial on Image Classification. #2420
- Add Train button, visible for "admin" role, with code snippets from a selection of libraries. Closes #2591
- Labels are now centralized in a specific vuex ORM called GlobalLabel Model, see argilla-io/argilla#2210. This model is the same for TokenClassification and TextClassification (so both task have labels with color_id and shortcuts parameters in the vuex ORM)
- The shortcuts improvement for labels #2339 have been moved to the vuex ORM in dataset settings feature #2444
- Update "Define a labeling schema" section in docs.
- The record inputs are sorted alphabetically in UI by default. #2581
- The record inputs are fully visible when pagination size is one and the height of collapsed area size is bigger for laptop screen. #2587
- Allow URL to be clickable in Jupyter notebook again. Closes #2527
- Removing some data scan deprecated endpoints used by old clients. This change will break compatibility with client <v1.3.0
- Stop using old scan deprecated endpoints in python client. This logic will break client compatibility with server version <1.3.0
- Remove the previous way to add labels through the dataset page. Now labels can be added only through dataset settings page.