Commit c9d8804

Merge pull request #152 from alphagov/remove-guardrail-evaluations
Remove Guardrail Evaluation Code
2 parents 6f40d7a + 66a80eb commit c9d8804

6 files changed

Lines changed: 0 additions & 397 deletions

docs/guardrails.md

Lines changed: 0 additions & 39 deletions
@@ -38,45 +38,6 @@ The file also contains the prompts we use to run the guardrails. Copy/paste thes
 
 You can also use the playground to ask the reasoning behind any response it gives.
 
-## Evaluation
-
-There is a rake task `guardrails:evaluate_guardrails` that will read a CSV and call the LLM with the content of the `input` column, compare the result to the `output` column and output metrics like this
-
-```
-{:count=>116,
-:percent_correct=>66.38,
-:exact_match_count=>77,
-:failure_count=>39,
-:average_latency=>0.6342312241422718,
-:max_latency=>1.213642000220716,
-:average_prompt_token_count=>1011,
-:max_prompt_token_count=>2582,
-:failures=> [
-{:input=>“This is a false statement that will fail guardrails”,
-:expected=>"True | \"1\"",
-:actual=>"True | \"1, 4\""},
-]
-}
-```
-This is intended to be run whenever we change the prompts and examples so we know if we are improving the performance.
-
-This rake task takes 3 arguments
-* `guardrail_type` (required) - the guardrail type you want to run the evaluation on. This must be `answer_guardrails` or `question_routing_guardrails`
-* `dataset_path` (required) - the path of the dataset you want to run the evaluation on
-* `output_path` (optional) - a path to write the JSON to. The rake task will output to stdout if an `output_path` is not provided
-
-Here is an example of how to run the rake task and write the output to a file:
-
-```
-rake guardrails:evaluate_guardrails["answer_guardrails", "example/dataset.csv", "path_to_write_to"]
-```
-
-And here it an example that outputs to stdout:
-
-```
-rake guardrails:evaluate_guardrails["answer_guardrails", "example/dataset.csv"]
-```
-
 ## Printing prompts
 
 The `guardrails:print_prompts` rake task outputs the combined system and user prompt for the answer or question routing guardrails. It takes one argument
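
The sample figures in the removed Evaluation section are internally consistent: 77 exact matches out of 116 rows is 66.38%, which leaves 39 failures. The deleted `lib/guardrails/evaluation.rb` is not shown in this view, so the following is only a minimal Ruby sketch of how such summary figures could be derived. The `summarise` helper and the row shape are hypothetical and are not taken from the removed code.

```ruby
# Hypothetical helper (not the deleted Guardrails::Evaluation code):
# summarises rows that each carry an :expected and an :actual value.
def summarise(rows)
  count = rows.size
  failures = rows.reject { |row| row[:expected] == row[:actual] }
  exact_match_count = count - failures.size

  {
    count: count,
    # Guard against an empty dataset before dividing.
    percent_correct: count.zero? ? 0.0 : (exact_match_count.to_f / count * 100).round(2),
    exact_match_count: exact_match_count,
    failure_count: failures.size,
    failures: failures,
  }
end

# Rebuild the proportions from the sample output: 77 matches, 39 mismatches.
rows =
  Array.new(77) { { input: "ok", expected: "False | None", actual: "False | None" } } +
  Array.new(39) { { input: "bad", expected: "True | \"1\"", actual: "True | \"1, 4\"" } }

summary = summarise(rows)
puts summary.slice(:count, :percent_correct, :exact_match_count, :failure_count)
```

Comparing `expected` and `actual` as whole strings mirrors the `exact_match_count` naming in the sample output; whether the removed code also scored partial matches cannot be determined from this diff.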

lib/guardrails/evaluation.rb

Lines changed: 0 additions & 164 deletions
This file was deleted.

lib/tasks/guardrails.rake

Lines changed: 0 additions & 55 deletions
@@ -1,59 +1,4 @@
 namespace "guardrails" do
-  desc "Output guardrail evaluation using Guardrails::MultipleChecker"
-  task :evaluate_guardrails, %i[guardrail_type dataset_path output_path llm_provider] => :environment do |_, args|
-    guardrail_type = args[:guardrail_type].to_sym
-    valid_guardrail_types = %i[answer_guardrails question_routing_guardrails]
-    if guardrail_type.blank? || valid_guardrail_types.exclude?(guardrail_type)
-      abort("Invalid guardrail type. Valid guardrail types are #{valid_guardrail_types.to_sentence}")
-    end
-
-    dataset_path = args[:dataset_path]
-    if dataset_path.blank?
-      abort("No dataset path provided")
-    end
-
-    dataset_absolute_path = Pathname.new(Dir.pwd).join(args[:dataset_path])
-    unless File.exist?(dataset_absolute_path)
-      abort("No file found at #{dataset_absolute_path}")
-    end
-
-    output_path = args[:output_path]
-    llm_provider = (args[:llm_provider] || :openai).to_sym
-    valid_providers = %i[openai claude]
-    if valid_providers.exclude?(llm_provider)
-      abort("Invalid LLM provider. Valid providers are #{valid_providers.to_sentence}")
-    end
-
-    true_eval = ->(v) { v != "False | None" }
-
-    prompt_token_counts = []
-
-    results = Guardrails::Evaluation.call(dataset_absolute_path, true_eval:) do |input|
-      result = Guardrails::MultipleChecker.call(input, guardrail_type, llm_provider)
-      prompt_token_counts << result.llm_prompt_tokens
-      result.llm_guardrail_result
-    rescue Guardrails::MultipleChecker::ResponseError => e
-      prompt_token_counts << e.llm_prompt_tokens
-      "ERR: #{e.llm_response}"
-    end
-
-    average_prompt_token_count = prompt_token_counts.sum / prompt_token_counts.size
-
-    results.merge!(
-      average_prompt_token_count:,
-      max_prompt_token_count: prompt_token_counts.max,
-    )
-
-    if output_path.nil?
-      pp results
-    else
-      pp results.slice(:model, :count, :percent_correct, :precision, :recall, :average_latency)
-      File.write(output_path, JSON.pretty_generate(results))
-
-      puts "Full results have been saved to: #{output_path}"
-    end
-  end
-
   desc "Print prompts for a guardrail type"
   task :print_prompts, %i[guardrail_type llm_provider] => :environment do |_, args|
     guardrail_type = args[:guardrail_type].to_sym
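
One detail of the removed task is worth noting: `args[:guardrail_type].to_sym` runs before the `blank?` check, so omitting the argument would raise `NoMethodError` on `nil` rather than reach the `abort` message. Below is a minimal sketch of a nil-safe variant in plain Ruby. `parse_guardrail_type` is a hypothetical helper, not part of this repository. As assumptions for illustration, it raises `ArgumentError` instead of calling `abort` so it can be exercised outside rake, and it joins the valid types with `" and "` in place of ActiveSupport's `to_sentence`.

```ruby
VALID_GUARDRAIL_TYPES = %i[answer_guardrails question_routing_guardrails].freeze

# Hypothetical helper: converts a raw rake argument into a validated symbol.
# `raw` may be nil when the argument is omitted on the command line.
def parse_guardrail_type(raw)
  # nil.to_s is "", so this conversion never raises NoMethodError.
  type = raw.to_s.strip.to_sym
  return type if VALID_GUARDRAIL_TYPES.include?(type)

  raise ArgumentError,
        "Invalid guardrail type. Valid guardrail types are #{VALID_GUARDRAIL_TYPES.join(' and ')}"
end

puts parse_guardrail_type("answer_guardrails").inspect

begin
  parse_guardrail_type(nil)
rescue ArgumentError => e
  puts e.message
end
```

Inside a rake task the same check could end with `abort(e.message)` in the `rescue` branch, which would keep the original exit behaviour.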

spec/lib/guardrails/evaluation_spec.rb

Lines changed: 0 additions & 46 deletions
This file was deleted.
