
naturalistic-ai

This repository centralizes shared code for the Naturalistic Uses of AI project.

Setup

Using Python >= 3.8, install the dependencies:

pip install -r requirements.txt
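
The scripts target Python >= 3.8; a quick way to confirm your interpreter before installing (a minimal sketch, not part of the repo):

# Confirm the interpreter meets the minimum version stated above.
import sys
if sys.version_info < (3, 8):
    raise SystemExit(f"Python >= 3.8 required, found {sys.version.split()[0]}")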

Load Datasets

# Log in first if the dataset requires authentication:
huggingface-cli login

python src/scripts/load_datasets.py --dataset wildchat_v1 --save data/wildchat_v1_sample.csv --sample 100
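
To confirm the download worked, you can inspect the saved sample, e.g. with pandas (a minimal sketch; the exact columns depend on the dataset and are not documented here):

# Sanity-check the exported sample (column names vary by dataset).
import pandas as pd

df = pd.read_csv("data/wildchat_v1_sample.csv")
print(len(df))              # expect 100 rows for --sample 100
print(df.columns.tolist())  # see which fields were exported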

Run GPT Annotations

  1. Set your OpenAI API key:
echo "OPENAI_API_KEY='<your_key>'" > data/.env
  2. Create a JSON file with a list of input examples, like data/tests/sample.json (a sketch of a plausible shape follows step 3).

  3. Run GPT annotation on the examples using this script:

bash run_gpt.sh
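
For step 2, the snippet below illustrates one plausible input shape; the field names (id, conversation, role, content) are assumptions, so treat data/tests/sample.json as the authoritative format:

# Hypothetical example writer; field names are guesses, not the repo's spec.
import json

examples = [
    {
        "id": "ex-0001",
        "conversation": [
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "Paris."},
        ],
    },
]

with open("data/my_examples.json", "w") as f:
    json.dump(examples, f, indent=2)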

The run_gpt.sh script supports various arguments:

  • input_format: How conversation logs are passed in the prompt ("free": as free text; "json": as structured JSON)
  • level_id: Prompt level ID [str]
  • prompt_id: Prompt system ID [str]
  • model_id: Name of the model (choices=["gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-3.5-turbo", "o3-mini"]) [str]
  • input: Path to the input JSON file [str]
  • save: Path for the output JSONL file [str]

The current run_gpt.sh includes sample commands for running 1) Media format (prompt level), 2) Answer form (response level), and 3) Topic (turn level) on the test JSON file data/static/test_cedric.json using o3-mini.

Run Simple Annotations

The simple annotations cover text length, word count, character count, token count, and language detection via Lingua. Running the simple annotation pipeline takes two steps:

  1. Prepare a JSON file of the examples to annotate, in the same format as for the GPT annotations above.

  2. Run the script with the following arguments:

python run_simple_annotations.py \
  --input path/to/input.json \
  --save path/to/output/annotations_folder

Note that --save expects a folder, not a file: one output file is created per feature.

Each output file is a JSON file keyed by the same IDs as your input file. For the language annotation, the value is a list of all languages found in the text via the Lingua library; for specific details and choices on language identification, see the documentation of the language prediction function in the source. For all other annotations, the value is a string containing the annotation measure.
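
For reference, a minimal sketch of multi-language detection with the Lingua Python package (lingua-language-detector); the pipeline's actual configuration, such as which languages are loaded, lives in the repo's language prediction function and may differ:

# Lingua sketch; the repo's real settings may differ from these defaults.
from lingua import LanguageDetectorBuilder

detector = LanguageDetectorBuilder.from_all_languages().build()

text = "Bonjour! Could you review this paragraph for me?"
# detect_multiple_languages_of returns one DetectionResult per detected span.
results = detector.detect_multiple_languages_of(text)
print(sorted({r.language.name for r in results}))  # e.g. ['ENGLISH', 'FRENCH']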

Run Evaluations

  1. Run evaluation on the GPT annotations using this script:
python -u src/scripts/evaluator/form_checker.py \
    --input_dir $INPUT_DIRECTORY \
    --save $SAVE_CSV_FILE;
  • input_dir: Input directory containing the GPT annotations (JSONL files) to evaluate
  • save: Name of the CSV file in which to save the evaluation results
  2. The output is saved as a CSV file in the following format:
level_id,prompt_id,total_entries,invalid_Invalid JSON list,invalid_Item is not a dictionary,invalid_Missing keys,invalid_Confidence out of range,invalid_Invalid option,conf_[0.0-0.2),conf_[0.2-0.4),conf_[0.4-0.6),conf_[0.6-0.8),conf_[0.8-1.0],avg_preds_per_row,unique_labels,label_entropy,label_gini
prompt,interaction_features,597,0,0,1,0,0,0.96,1.11,0.32,2.71,94.9,1.05,6 / 6,1.0856,0.3693
prompt,media_format,597,1,0,2,0,4,0.0,0.09,0.27,1.01,98.63,1.85,11 / 11,1.9362,0.651
prompt,multi_turn_relationship,597,0,0,0,0,0,0.0,0.33,0.16,2.14,97.36,1.02,5 / 5,2.2077,0.7654
prompt,topic,597,0,0,1,0,12,0.13,0.13,1.14,14.16,84.45,1.35,36 / 39,4.4549,0.9395
response,answer_form,597,0,0,0,0,0,0.0,0.16,0.0,1.63,98.21,1.03,5 / 6,0.8211,0.247
response,interaction_features,597,0,0,0,0,0,0.2,0.2,0.2,2.85,96.54,0.82,8 / 9,1.3246,0.4023
response,media_format,597,0,0,0,0,1,0.0,0.0,0.08,1.45,98.47,2.08,10 / 11,1.942,0.6761
turn,sensitive_use_flags,597,0,0,0,0,1,0.0,0.0,0.0,4.11,95.89,0.9,19 / 24,1.0779,0.2639
turn,topic,597,0,0,0,0,5,0.17,0.09,1.14,27.3,71.3,1.93,35 / 37,3.9428,0.8918
  • level_id: Prompt level ID
  • prompt_id: Prompt system ID
  • total_entries: Total number of entries used in evaluation (here, 597 entries from 120 rows)
  • Errors are categorized in the following groups:
    • Invalid JSON: Model response not parseable as JSON
    • Invalid dict: Parseable but not in dictionary format
    • Missing keys: Parseable but the "labels" and/or "confidence" keys are missing
    • Confidence range: Parseable but confidence score is out of range (not in 0-1)
    • Invalid option: Parseable but the option generated is not in the taxonomy's options
  • conf_{bucket}: Share of annotations whose confidence falls in each bucket (from 0.0 to 1.0)
  • avg_preds_per_row: Average number of predicted labels per annotation (relevant for multi-label prompts)
  • unique_labels: Number of distinct labels predicted / total number of labels in the taxonomy
  • label_entropy, label_gini: Measures of the diversity of label predictions across responses (higher means more diverse); see the sketch after this list
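
As a rough illustration of the error categories and the two diversity measures, here is a self-contained sketch; the field names ("labels", "confidence") follow the README, but form_checker.py's exact logic (e.g. log base or normalization) may differ:

# Sketch of the five error categories and the diversity metrics.
# Assumption: form_checker.py's actual implementation may differ.
import json
import math
from collections import Counter

def classify_entry(raw, valid_options):
    """Return None if the model response is valid, else its error category."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return "Invalid JSON list"
    if not isinstance(items, list):
        return "Invalid JSON list"
    for item in items:
        if not isinstance(item, dict):
            return "Item is not a dictionary"
        if "labels" not in item or "confidence" not in item:
            return "Missing keys"
        if not 0.0 <= item["confidence"] <= 1.0:
            return "Confidence out of range"
        labels = item["labels"] if isinstance(item["labels"], list) else [item["labels"]]
        if any(label not in valid_options for label in labels):
            return "Invalid option"
    return None

def entropy_and_gini(all_labels):
    """Diversity of predicted labels: Shannon entropy (natural log) and Gini impurity."""
    counts = Counter(all_labels)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    gini = 1.0 - sum(p * p for p in probs)
    return entropy, gini

print(classify_entry('[{"labels": ["Code"], "confidence": 0.9}]', {"Code", "Text"}))  # None
print(entropy_and_gini(["Code", "Code", "Text"]))  # (~0.64, ~0.44)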
