alphagov
diff --git a/docs/guardrails.md b/docs/guardrails.md
Lines changed: 0 additions & 39 deletions
diff --git a/lib/guardrails/evaluation.rb b/lib/guardrails/evaluation.rb
Lines changed: 0 additions & 164 deletions
diff --git a/lib/tasks/guardrails.rake b/lib/tasks/guardrails.rake
Lines changed: 0 additions & 55 deletions
diff --git a/spec/lib/guardrails/evaluation_spec.rb b/spec/lib/guardrails/evaluation_spec.rb
Lines changed: 0 additions & 46 deletions
@@ -38,45 +38,6 @@ The file also contains the prompts we use to run the guardrails. Copy/paste thes
 
 You can also use the playground to ask the reasoning behind any response it gives.
 
-## Evaluation
-
-There is a rake task, `guardrails:evaluate_guardrails`, that reads a CSV, calls the LLM with the content of each row's `input` column, compares the result to the `output` column, and outputs metrics like this:
-
-```
-{:count=>116,
- :percent_correct=>66.38,
- :exact_match_count=>77,
- :failure_count=>39,
- :average_latency=>0.6342312241422718,
- :max_latency=>1.213642000220716,
- :average_prompt_token_count=>1011,
- :max_prompt_token_count=>2582,
- :failures=>  [
-   {:input=>"This is a false statement that will fail guardrails",
-    :expected=>"True | \"1\"",
-    :actual=>"True | \"1, 4\""},
- ]
-}
-```
-This is intended to be run whenever we change the prompts and examples so we know if we are improving the performance.
-
-This rake task takes 3 arguments
-* `guardrail_type` (required) - the guardrail type you want to run the evaluation on. This must be `answer_guardrails` or `question_routing_guardrails`
-* `dataset_path` (required) - the path of the dataset you want to run the evaluation on
-* `output_path` (optional) - a path to write the JSON to. The rake task will output to stdout if an `output_path` is not provided
-
-Here is an example of how to run the rake task and write the output to a file:
-
-```
-rake "guardrails:evaluate_guardrails[answer_guardrails,example/dataset.csv,path_to_write_to]"
-```
-
-And here is an example that outputs to stdout:
-
-```
-rake "guardrails:evaluate_guardrails[answer_guardrails,example/dataset.csv]"
-```
-
 ## Printing prompts
 
 The `guardrails:print_prompts` rake task outputs the combined system and user prompt for the answer or question routing guardrails. It takes one argument
 
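For readers who want to see how metrics like those in the removed Evaluation section could be derived, here is a minimal Ruby sketch. It is not the removed `Guardrails::Evaluation` class, whose source is not shown in this diff. `evaluate_dataset` is a hypothetical helper, and the block passed to it stands in for the LLM call. The only details taken from the docs are the `input` and `output` CSV columns and the metric names.

```ruby
require "csv"

# Hypothetical helper for illustration only; not Guardrails::Evaluation.
# Parses CSV text with `input` and `output` columns, yields each input to
# the caller's block, and compares the block's return value to `output`.
def evaluate_dataset(csv_text)
  rows = CSV.parse(csv_text, headers: true)
  latencies = []
  failures = []

  rows.each do |row|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    actual = yield(row["input"])
    latencies << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    next if actual == row["output"]

    failures << { input: row["input"], expected: row["output"], actual: actual }
  end

  count = rows.size
  exact_match_count = count - failures.size

  {
    count: count,
    # Guard against an empty dataset so we never divide by zero.
    percent_correct: count.zero? ? 0.0 : (exact_match_count * 100.0 / count).round(2),
    exact_match_count: exact_match_count,
    failure_count: failures.size,
    average_latency: count.zero? ? 0.0 : latencies.sum / count,
    max_latency: latencies.max,
    failures: failures,
  }
end

# Made-up two-row dataset; the second row expects True | "1".
dataset = <<~DATA
  input,output
  first question,False | None
  second question,"True | ""1"""
DATA

# Stub standing in for the LLM: it always answers "False | None".
results = evaluate_dataset(dataset) { |_input| "False | None" }
```

With this stub, one of the two rows matches, so `percent_correct` is 50.0. The same formula reproduces the sample figure in the removed docs: 77 exact matches out of 116 rows rounds to 66.38.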
@@ -1,59 +1,4 @@
 namespace "guardrails" do
-  desc "Output guardrail evaluation using Guardrails::MultipleChecker"
-  task :evaluate_guardrails, %i[guardrail_type dataset_path output_path llm_provider] => :environment do |_, args|
-    guardrail_type = args[:guardrail_type]&.to_sym
-    valid_guardrail_types = %i[answer_guardrails question_routing_guardrails]
-    if guardrail_type.blank? || valid_guardrail_types.exclude?(guardrail_type)
-      abort("Invalid guardrail type. Valid guardrail types are #{valid_guardrail_types.to_sentence}")
-    end
-
-    dataset_path = args[:dataset_path]
-    if dataset_path.blank?
-      abort("No dataset path provided")
-    end
-
-    dataset_absolute_path = Pathname.new(Dir.pwd).join(args[:dataset_path])
-    unless File.exist?(dataset_absolute_path)
-      abort("No file found at #{dataset_absolute_path}")
-    end
-
-    output_path = args[:output_path]
-    llm_provider = (args[:llm_provider] || :openai).to_sym
-    valid_providers = %i[openai claude]
-    if valid_providers.exclude?(llm_provider)
-      abort("Invalid LLM provider. Valid providers are #{valid_providers.to_sentence}")
-    end
-
-    true_eval = ->(v) { v != "False | None" }
-
-    prompt_token_counts = []
-
-    results = Guardrails::Evaluation.call(dataset_absolute_path, true_eval:) do |input|
-      result = Guardrails::MultipleChecker.call(input, guardrail_type, llm_provider)
-      prompt_token_counts << result.llm_prompt_tokens
-      result.llm_guardrail_result
-    rescue Guardrails::MultipleChecker::ResponseError => e
-      prompt_token_counts << e.llm_prompt_tokens
-      "ERR: #{e.llm_response}"
-    end
-
-    average_prompt_token_count = prompt_token_counts.sum / prompt_token_counts.size
-
-    results.merge!(
-      average_prompt_token_count:,
-      max_prompt_token_count: prompt_token_counts.max,
-    )
-
-    if output_path.nil?
-      pp results
-    else
-      pp results.slice(:model, :count, :percent_correct, :precision, :recall, :average_latency)
-      File.write(output_path, JSON.pretty_generate(results))
-
-      puts "Full results have been saved to: #{output_path}"
-    end
-  end
-
   desc "Print prompts for a guardrail type"
   task :print_prompts, %i[guardrail_type llm_provider] => :environment do |_, args|
     guardrail_type = args[:guardrail_type].to_sym