Argilla is a FOSS data annotation tool.
Argilla in Docker
- By default it seems to want to use Elasticsearch
- It does not ship with any credentials for Elasticsearch
- Adding an environment variable to disable security on Elasticsearch seems to work
- The default credentials are `argilla` and `1234`
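The environment variable mentioned above can be set like this; a minimal sketch of a `docker-compose.yml` fragment, assuming the official Elasticsearch image (the image tag is an example, and `xpack.security.enabled=false` disables authentication entirely, so use it for local development only):

```
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.2
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false   # disables auth: dev/local use only
```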
Argilla User Management
Mainly done via the CLI as documented here.
Log in to Argilla
The user API key is available on the user settings page.
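With that key you can authenticate the CLI against your server, along the lines of the following (the URL is an assumed local default and the exact flags vary between Argilla versions, so check `argilla login --help`):

```shell
argilla login --api-url http://localhost:6900 --api-key "<your-api-key>"
```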
Create a User
There is currently no way for users to set their own passwords or API keys, so choose a secure password up front or generate one.
Setting `--workspace` during creation saves running a second command later. This argument can be passed multiple times to add the user to multiple workspaces.
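A sketch of the creation command, assuming the Argilla 1.x CLI flag names (verify with `argilla users create --help`); the username and workspace names are examples:

```shell
argilla users create \
  --username alice \
  --password "$(openssl rand -base64 18)" \
  --role annotator \
  --workspace team-a \
  --workspace team-b
```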
Changing a user’s role
This can only be done on the server, via the `server database users` command.
Adding Users to a Workspace
Can be done from the client CLI:
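Something like the following, assuming the Argilla 1.x `workspaces` subcommand syntax (check `argilla workspaces --help` for your version; names are examples):

```shell
argilla workspaces --name team-a add-user alice
```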
Datasets
Create Dataset
Define your dataset in Python, run `push_to_argilla`, and use the resulting `RemoteDataset` to manipulate data entries:
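A minimal sketch using the Argilla 1.x `FeedbackDataset` API; the server URL, API key, dataset name, and schema are all assumptions:

```python
import argilla as rg

rg.init(api_url="http://localhost:6900", api_key="owner.apikey")  # assumed local server

# Define the dataset schema locally
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[rg.MultiLabelQuestion(name="label", labels=["news", "sport", "other"])],
)

# Push it to the server and keep the returned RemoteDataset handle
remote = dataset.push_to_argilla(name="my-dataset", workspace="team-a")

# The remote handle is what we use to manipulate entries
remote.add_records([rg.FeedbackRecord(fields={"text": "Example document"})])
```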
Add data to Dataset
We can send embeddings alongside our text, which enables Argilla to “find similar” examples and so on.
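A sketch assuming a recent Argilla 1.x release with vector support and a sentence-transformers model for the embeddings (the model, names, and dimensions are examples):

```python
import argilla as rg
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, 384-dim embeddings

dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[rg.MultiLabelQuestion(name="label", labels=["news", "sport"])],
    vectors_settings=[rg.VectorSettings(name="sentence", dimensions=384)],
)
remote = dataset.push_to_argilla(name="with-vectors", workspace="team-a")

texts = ["first example", "second example"]
remote.add_records(
    [
        rg.FeedbackRecord(
            fields={"text": t},
            # the embedding is what powers "find similar" in the UI
            vectors={"sentence": encoder.encode(t).tolist()},
        )
        for t in texts
    ]
)
```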
Moving Responses Around
If you need to move responses from one copy of a dataset to another, you can, but it is tricky: the documented `dataset.update_records()` method does not seem to handle upserts on responses. I was able to achieve the use case I wanted by taking a backup and then deleting and re-creating the records in the combined dataset.
Let’s assume we want to copy all of `test` user’s annotations from their personal dataset `source_ds` into a group dataset called `target_ds`. This code assumes that both datasets contain a common set of records whose text and metadata are the same.
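A sketch of that backup/delete/re-create approach. The method and attribute names are from the Argilla 1.x client, so verify against your version, and back up both datasets before running anything destructive:

```python
import argilla as rg

source = rg.FeedbackDataset.from_argilla("source_ds", workspace="test")
target = rg.FeedbackDataset.from_argilla("target_ds", workspace="group")

# Index the source user's submitted responses by record text
responses_by_text = {}
for rec in source.records:
    submitted = [r.dict() for r in rec.responses if r.status == "submitted"]
    if submitted:
        responses_by_text[rec.fields["text"]] = submitted

# Find target records with a matching text, then delete and re-create them
# with the extra responses attached (update_records does not upsert responses)
to_replace = [rec for rec in target.records if rec.fields["text"] in responses_by_text]
replacements = [
    rg.FeedbackRecord(
        fields=rec.fields,
        metadata=rec.metadata,
        responses=[r.dict() for r in rec.responses] + responses_by_text[rec.fields["text"]],
    )
    for rec in to_replace
]
target.delete_records(to_replace)
target.add_records(replacements)
```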
Evaluation
Merging Answers for Evaluation
Related to Moving Responses Around above, we may want to take a set of responses from single datasets in separate workspaces and merge them together in order to make use of Argilla’s built-in Krippendorff’s alpha functionality.
Then we create a local dataset with the same properties as the remote datasets that we’ve been iterating through. We add the records and responses that we collected above and compute the agreement metric for the question we care about:
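A sketch of the whole flow, assuming Argilla 1.x; the `AgreementMetric` import path, workspace names, and schema are assumptions, so check them against your installed version:

```python
import argilla as rg
from argilla.client.feedback.metrics import AgreementMetric  # path as of Argilla 1.x

# Collect each user's submitted responses from their own workspace (assumed names)
merged = {}
for workspace in ["alice", "bob"]:
    remote = rg.FeedbackDataset.from_argilla("source_ds", workspace=workspace)
    for rec in remote.records:
        entry = merged.setdefault(rec.fields["text"], {"fields": rec.fields, "responses": []})
        entry["responses"].extend(r.dict() for r in rec.responses if r.status == "submitted")

# A local dataset with the same schema as the remotes
local = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[rg.MultiLabelQuestion(name="label", labels=["a", "b", "c"])],
)
local.add_records(
    [rg.FeedbackRecord(fields=e["fields"], responses=e["responses"]) for e in merged.values()]
)

# Krippendorff's alpha for the question we care about
print(AgreementMetric(dataset=local, question_name="label").compute("alpha"))
```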
Per-Label Krippendorff’s Alpha for Multi-Label Data
Argilla offers the ability to calculate an overall Krippendorff’s alpha score for the full dataset but does not currently support per-label calculations, which are useful if you need to understand which labels annotators find hardest to get their heads around.
When calculating Krippendorff’s alpha for multi-label data, we treat each label as a binary yes/no problem. We build a matrix `m` of `NUM_USERS x NUM_DATAPOINTS` and iterate, assigning position `m[user][dpoint]` a `1` if the user tagged the given question with the label and a `0` if they didn’t.
Note the difference between the user not applying the label and the user skipping the question: in the former case Argilla stores a response with an empty array as its value.
Refer to the above code snippets for getting a local dataset with all responses in one place. Then we can use something like the following:
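A self-contained sketch of the per-label calculation in pure Python. The `responses` structure, a `{user: {record_id: labels-or-None}}` mapping with `None` meaning the user skipped the record, is an assumed shape that you would build from the local dataset; the user names, record ids, and labels are examples:

```python
from collections import Counter

def binary_matrix(responses, users, record_ids, label):
    """m[user][record] = 1 if the user applied `label`, 0 if they submitted
    without it (an empty array counts as 0), None if they skipped the record."""
    return [
        [
            None if responses[u].get(r) is None else int(label in responses[u][r])
            for r in record_ids
        ]
        for u in users
    ]

def krippendorff_alpha_nominal(matrix):
    """Nominal Krippendorff's alpha via the coincidence-matrix formulation."""
    coincidences = Counter()
    for unit in zip(*matrix):  # one column per record
        vals = [v for v in unit if v is not None]
        if len(vals) < 2:
            continue  # units rated by fewer than two coders carry no information
        for i, vi in enumerate(vals):
            for j, vj in enumerate(vals):
                if i != j:
                    coincidences[(vi, vj)] += 1 / (len(vals) - 1)
    totals = Counter()
    for (vi, _), c in coincidences.items():
        totals[vi] += c
    n = sum(totals.values())
    if n <= 1 or len(totals) < 2:
        return 1.0  # no variation observed
    observed = sum(c for (vi, vj), c in coincidences.items() if vi != vj)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b) / (n - 1)
    return 1.0 - observed / expected

# Per-label alpha over all labels (example data)
users = ["alice", "bob"]
record_ids = [0, 1, 2]
responses = {
    "alice": {0: ["spam"], 1: [], 2: ["spam"]},  # record 1 submitted with no labels
    "bob": {0: ["spam"], 1: ["spam"], 2: None},  # bob skipped record 2
}
for label in ["spam"]:
    m = binary_matrix(responses, users, record_ids, label)
    print(label, krippendorff_alpha_nominal(m))
```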
Storing Data Locally
The easiest way to export a dataset is to go via `datasets` and create a pandas DataFrame. From there we can format as required:
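For example, assuming the Argilla 1.x client (`pull()` fetches a local snapshot of the remote dataset first; dataset and file names are examples):

```python
import argilla as rg

remote = rg.FeedbackDataset.from_argilla("my-dataset", workspace="team-a")
local = remote.pull()  # snapshot the remote dataset locally

# Go via the Hugging Face `datasets` format, then to pandas
df = local.format_as("datasets").to_pandas()

df.to_csv("my-dataset.csv", index=False)  # or .to_json(), .to_parquet(), ...
```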
Prepare Dataset for Training
Multi-Label Dataset with records that have no label
If you have a multi-label problem where none of the labels apply to some records and you try to use `prepare_for_training`, you will get a `KeyError` because the code tries to look up the question’s key in the `values` property of a row that has no corresponding response.
For example:
```
│ 547 │ │ │ responses = [resp for resp in rec.responses if resp.status == "submitted"] │
│ 548 │ │ │ # get responses with a value that is most frequent │
│ 549 │ │ │ for resp in responses: │
│ ❱ 550 │ │ │ │ if isinstance(resp.values[question].value, list): │
│ 551 │ │ │ │ │ for value in resp.values[question].value: │
│ 552 │ │ │ │ │ │ counter.update([value]) │
│ 553 │ │ │ │ else: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'label'
```
The workaround I used was implementing a `NotRelevant` tag for the multi-label question so that all rows always have a response. Then, if we want, we can remove the not-relevant rows in post-processing. However, it may be helpful to have a `NotRelevant` output on a multi-class classifier anyway.
I haven’t yet figured out whether there is an easy way to remove records with no label from the dataset before training, but that would probably be a good option too. In the meantime, we can use some code to find records that don’t have a corresponding label:
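A sketch of that check, duck-typed over record objects so it works on local or remote records; the question name `label` is an assumption:

```python
def records_without_label(records, question="label"):
    """Return the records with no submitted, non-empty value for `question`."""
    missing = []
    for rec in records:
        has_label = any(
            resp.status == "submitted"
            and question in resp.values
            and resp.values[question].value  # an empty list is falsy
            for resp in rec.responses
        )
        if not has_label:
            missing.append(rec)
    return missing
```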
Multi-label prep where some labels have no records
The `prepare_for_training` method currently breaks label indexing by silently removing labels that have zero examples from the `binarized_label` object that is prepared. The issue is documented here. The current workaround is to manually manipulate the `MultiLabelQuestion.labels` property and remove unused labels:
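A sketch of the workaround on a pulled local copy (method names from the Argilla 1.x client; the dataset and question names are assumptions):

```python
from collections import Counter

import argilla as rg

local = rg.FeedbackDataset.from_argilla("my-dataset", workspace="team-a").pull()

# Count how often each label appears in submitted responses
used = Counter()
for rec in local.records:
    for resp in rec.responses:
        if resp.status == "submitted" and "label" in resp.values:
            used.update(resp.values["label"].value or [])

# Drop labels with zero examples so the binarized label indexing stays consistent
question = local.question_by_name("label")
question.labels = [label for label in question.labels if label in used]

# ...then call prepare_for_training as usual
```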
Rules and ElasticSearch
It is possible to define Elasticsearch queries that automatically apply labels to documents via rules in the platform.
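A rule’s query is ordinary Elasticsearch query-string syntax, for example (the field names here are assumptions):

```
text: (refund OR "money back") AND metadata.source: "email"
```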
Active Learning
Argilla provides active learning via the small-text library.
Few-Shot Learning
Assuming you have a small number of annotated examples to start with, you can use SetFit and Python to train a model.
Once we have a trained model we can trivially use it to provide a best estimate at a label for a dataset:
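A sketch using the pre-1.0 SetFit API (`SetFitTrainer`; newer releases use `Trainer` and `TrainingArguments` instead). The base model, example data, and labels are assumptions:

```python
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

# A handful of annotated examples (example data)
train_ds = Dataset.from_dict({
    "text": ["great product", "terrible support", "works fine", "broke on day one"],
    "label": [1, 0, 1, 0],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(model=model, train_dataset=train_ds, num_iterations=20)
trainer.train()

# Best-guess labels for unlabelled text
preds = model(["the item arrived damaged"])
print(preds)
```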