Commit f24a1e0

Author: Will Kukkamalla (committed)

    Merge branch 'add-json-schema' of github.com:wkukka1/ai-autograding-feedback into add-json-schema

2 parents (1fa1146 + 64df9d5), commit f24a1e0
15 files changed: +172 −76 lines

README.md

Lines changed: 30 additions & 13 deletions

@@ -29,9 +29,8 @@ For the image scope, the program takes up to two files, depending on the prompt
 | Argument | Description | Required |
 |----------------------|-------------------------------------------------------------------|----------|
 | `--submission_type` | Type of submission (from `arg_options.FileType`) ||
-| `--prompt` | The name of a preddefined prompt file (from `arg_options.Prompt`) |**|
+| `--prompt` | Pre-defined prompt name or file path to custom prompt file |**|
 | `--prompt_text` | Additional string text prompt that can be fed to model. |** |
-| `--prompt_custom` | The name of prompt file uploaded to be used by model. |** |
 | `--scope` | Processing scope (`image` or `code` or `text`) ||
 | `--submission` | Submission file path ||
 | `--question` | Specific question to evaluate ||

@@ -41,11 +40,11 @@ For the image scope, the program takes up to two files, depending on the prompt
 | `--test_output` | File path for the file containing the results from tests ||
 | `--submission_image` | File path for the submission image file ||
 | `--solution_image` | File path for the solution image file ||
-| `--system_prompt` | File path for the system instructions prompt ||
+| `--system_prompt` | Pre-defined system prompt name or file path to custom system prompt ||
 | `--llama_mode` | How to invoke deepSeek-v3 (choices in `arg_options.LlamaMode`) ||
 | `--output_template` | Output template file (from `arg_options.OutputTemplate`) ||
 | `--json_schema` | File path to json file for schema for structured output ||
-** One of either prompt, prompt_custom, or prompt_text must be selected.
+** One of either `--prompt` or `--prompt_text` must be selected.
 
 ## Scope
 The program supports three scopes: code, text, or image. Depending on which is selected, the program supports different models and prompts tailored for each option.

@@ -67,8 +66,15 @@ The user can also explicitly specify the submission type using the `--submission
 Currently, jupyter notebook, pdf, and python assignments are supported.
 
 ## Prompts
-The user can use this argument to specify which predefined prompt they wish the model to use.
-To view the predefined prompts, navigate to the ai_feedback/data/prompts/user folder. Each prompt is stored as a markdown file that can contain template placeholders with the following structure:
+The `--prompt` argument accepts either pre-defined prompt names or custom file paths:
+
+### Pre-defined Prompts
+To use pre-defined prompts, specify the prompt name (without extension). Pre-defined prompts are stored as markdown (.md) files in the `ai_feedback/data/prompts/user/` directory.
+
+### Custom Prompt Files
+To use custom prompt files, specify the file path to your custom prompt. The file should be a markdown (.md) file.
+
+Prompt files can contain template placeholders with the following structure:
 
 ```markdown
 Consider this question:

@@ -86,7 +92,7 @@ Prompt Naming Conventions:
 - Prompts to be used when --scope image is selected are prefixed with image_{}.md
 - Prompts to be used when --scope text is selected are prefixed with text_{}.md
 
-If the --scope argument is provided and its value does not match the prefix of the selected --prompt, an error message will be displayed.
+Scope validation (prefix matching) only applies to pre-defined prompts. Custom prompt files can be used with any scope.
 
 All prompts are treated as templates that can contain special placeholder blocks; the following template placeholders are automatically replaced:
 - `{context}` - Question context

@@ -123,8 +129,16 @@ All prompts are treated as templates that can contain special placeholder blocks
 ## Prompt_text
 Additionally, the user can pass in a string through the --prompt_text argument. This will either be concatenated to the prompt if --prompt is used or fed in as the only prompt if --prompt is not used.
 
-## Prompt_custom
-The user can pass in their own custom prompt file and use the --prompt_custom argument to flag that the model should use the custom prompt. This can be used instead of choosing one of the predefined prompts.
+## System Prompts
+The `--system_prompt` argument accepts either pre-defined system prompt names or custom file paths:
+
+### Pre-defined System Prompts
+To use pre-defined system prompts, specify the system prompt name (without extension). Pre-defined system prompts are stored as markdown (.md) files in the `ai_feedback/data/prompts/system/` directory.
+
+### Custom System Prompt Files
+To use custom system prompt files, specify the file path to your custom system prompt. The file should be a markdown (.md) file.
+
+System prompts define the AI model's behavior, tone, and approach to providing feedback. They are used to set the context and personality of the AI assistant.
 
 ## Models
 The models used can be seen under the ai_feedback/models folder.

@@ -304,11 +318,17 @@ python3 -m ai_feedback --prompt code_table --scope code \
 --model deepSeek-v3 --llama_mode cli
 ```
 
+
 #### Get annotations for cnn_example test using openAI model
 ```bash
 python -m ai_feedback --prompt code_annotations --scope code --submission test_submissions/cnn_example/cnn_submission --solution test_submissions/cnn_example/cnn_solution.py --model openai --json_schema ai_feedback/data/schema/code_annotation_schema.json
 ```
 
+#### Evaluate using custom prompt file path
+```bash
+python -m ai_feedback --prompt ai_feedback/data/prompts/user/code_overall.md --scope code --submission test_submissions/csc108/correct_submission/correct_submission.py --solution test_submissions/csc108/solution.py --model codellama:latest
+```
+
 #### Using Ollama
 In order to run this project on Bigmouth:
 1. SSH into teach.cs

@@ -346,8 +366,6 @@ Files:
 - python_tester_llm_pdf.py: Runs LLM on any pdf assignment (solution file and submission file) uploaded to the autotester. Creates general feedback about whether the student's written responses match the instructor's feedback. Displayed in test outputs and overall comments.
 - custom_tester_llm_code.sh: Runs LLM on assignments (solution file, submission file, test output file) uploaded to the custom autotester. Currently, supports jupyter notebook files uploaded. Can specify prompt and model used in the script. Displays in overall comments and in test outputs. Can optionally uncomment the annotations section to display annotations; however, the annotations will display on the .txt version of the file uploaded by the student, not the .ipynb file.
 
-<<<<<<< Updated upstream
-
 #### Python AutoTester Usage
 ##### Code Scope
 1. Ensure the student has submitted a submission file (_submission suffixed).

@@ -412,7 +430,7 @@ Also pip install other packages that the submission or solution file uses.
 - Student uploads: test1_submission.ipynb, test1_submission.txt
 
 NOTE: if the LLM Test Group appears to be blank/does not turn green, try increasing the timeout.
-=======
+
 #### Custom Tester
 - custom_tester_llm_code.sh: Runs LLM on any assignment (solution file, submission file, test output file) uploaded to the autotester. Can specify prompt and model used in the script. Displays in overall comments and in test outputs.

@@ -435,4 +453,3 @@ To run the test suite:
 ```console
 $ pytest
 ```
->>>>>>> Stashed changes
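The name-or-path resolution the README describes for `--prompt` can be sketched as follows. This is a minimal illustration, not the project's actual code: `PREDEFINED` and `resolve_prompt_source` are hypothetical names standing in for `arg_options.Prompt` and the real loader.

```python
import os

# Hypothetical stand-in for the values of arg_options.Prompt
PREDEFINED = ["code_overall", "code_annotations"]

def resolve_prompt_source(prompt_arg: str, base_dir: str) -> str:
    """Return the path to read: a packaged .md file for pre-defined
    names, otherwise the argument itself treated as a file path."""
    if prompt_arg in PREDEFINED:
        # Pre-defined name: look it up in the packaged prompts directory
        return os.path.join(base_dir, "data/prompts/user", f"{prompt_arg}.md")
    # Anything else is treated as a user-supplied file path
    return prompt_arg

# A pre-defined name maps into the packaged directory...
assert resolve_prompt_source("code_overall", "/pkg") == "/pkg/data/prompts/user/code_overall.md"
# ...while any other value is passed through as a path
assert resolve_prompt_source("./my_prompt.md", "/pkg") == "./my_prompt.md"
```

The same fallback applies to `--system_prompt`, with `data/prompts/system/` as the packaged directory.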

ai_feedback/__main__.py

Lines changed: 74 additions & 33 deletions

@@ -7,7 +7,7 @@
 
 from . import code_processing, image_processing, text_processing
 from .helpers import arg_options
-from .helpers.constants import HELP_MESSAGES, TEST_OUTPUTS_DIRECTORY
+from .helpers.constants import HELP_MESSAGES
 
 
 def detect_submission_type(filename: str) -> str:

@@ -52,26 +52,77 @@ def load_markdown_template(template: str) -> str:
         sys.exit(1)
 
 
-def load_markdown_prompt(prompt_name: str) -> dict:
-    """Loads a markdown prompt file.
+def _load_content_with_fallback(
+    content_arg: str, predefined_values: list[str], predefined_subdir: str, content_type: str
+) -> str:
+    """Generic function to load content by trying pre-defined names first, then treating as file path.
 
     Args:
-        prompt_name (str): Name of the prompt file (without extension)
+        content_arg (str): Either a pre-defined name or a file path
+        predefined_values (list[str]): List of valid pre-defined names
+        predefined_subdir (str): Subdirectory for pre-defined files (e.g., "user", "system")
+        content_type (str): Type of content for error messages (e.g., "prompt", "system prompt")
 
     Returns:
-        dict: Dictionary containing prompt_content
+        str: The content
 
     Raises:
-        SystemExit: If the prompt file is not found
+        SystemExit: If the content cannot be loaded
     """
-    try:
-        prompt_file = os.path.join(os.path.dirname(__file__), f"data/prompts/user/{prompt_name}.md")
-        with open(prompt_file, "r") as file:
-            prompt_content = file.read()
-        return {"prompt_content": prompt_content}
-    except FileNotFoundError:
-        print(f"Error: Prompt file '{prompt_name}.md' not found in user subfolder.")
-        sys.exit(1)
+    # First, check if it's a pre-defined name
+    if content_arg in predefined_values:
+        try:
+            file_path = os.path.join(os.path.dirname(__file__), f"data/prompts/{predefined_subdir}/{content_arg}.md")
+            with open(file_path, "r", encoding='utf-8') as file:
+                return file.read()
+        except FileNotFoundError:
+            print(
+                f"Error: Pre-defined {content_type} file '{content_arg}.md' not found in {predefined_subdir} subfolder."
+            )
+            sys.exit(1)
+    else:
+        # Treat as a file path
+        try:
+            with open(content_arg, "r", encoding='utf-8') as file:
+                return file.read()
+        except FileNotFoundError:
+            print(f"Error: {content_type.title()} file '{content_arg}' not found.")
+            sys.exit(1)
+        except Exception as e:
+            print(f"Error reading {content_type} file '{content_arg}': {e}")
+            sys.exit(1)
+
+
+def load_prompt_content(prompt_arg: str) -> str:
+    """Loads prompt content by trying pre-defined names first, then treating as file path.
+
+    Args:
+        prompt_arg (str): Either a pre-defined prompt name or a file path
+
+    Returns:
+        str: The prompt content
+
+    Raises:
+        SystemExit: If the prompt cannot be loaded
+    """
+    return _load_content_with_fallback(prompt_arg, arg_options.get_enum_values(arg_options.Prompt), "user", "prompt")
+
+
+def load_system_prompt_content(system_prompt_arg: str) -> str:
+    """Loads system prompt content by trying pre-defined names first, then treating as file path.
+
+    Args:
+        system_prompt_arg (str): Either a pre-defined system prompt name or a file path
+
+    Returns:
+        str: The system prompt content
+
+    Raises:
+        SystemExit: If the system prompt cannot be loaded
+    """
+    return _load_content_with_fallback(
+        system_prompt_arg, arg_options.get_enum_values(arg_options.SystemPrompt), "system", "system prompt"
+    )
 
 
 def main() -> int:

@@ -97,12 +148,10 @@ def main() -> int:
     parser.add_argument(
         "--prompt",
         type=str,
-        choices=arg_options.get_enum_values(arg_options.Prompt),
         required=False,
         help=HELP_MESSAGES["prompt"],
     )
     parser.add_argument("--prompt_text", type=str, required=False, help=HELP_MESSAGES["prompt_text"])
-    parser.add_argument("--prompt_custom", type=str, required=False, help=HELP_MESSAGES["prompt_custom"])
     parser.add_argument(
         "--scope",
         type=str,

@@ -147,7 +196,6 @@ def main() -> int:
         "--system_prompt",
         type=str,
         required=False,
-        choices=arg_options.get_enum_values(arg_options.SystemPrompt),
         help=HELP_MESSAGES["system_prompt"],
         default="student_test_feedback",
     )

@@ -175,18 +223,12 @@ def main() -> int:
 
     prompt_content = ""
 
-    system_prompt_path = os.path.join(
-        os.path.dirname(os.path.abspath(__file__)), f"data/prompts/system/{args.system_prompt}.md"
-    )
-    with open(system_prompt_path, encoding='utf-8') as file:
-        system_instructions = file.read()
+    system_instructions = load_system_prompt_content(args.system_prompt)
 
-    if args.prompt_custom:
-        prompt_filename = os.path.join("./", args.prompt_custom)
-        with open(prompt_filename, encoding='utf-8') as prompt_file:
-            prompt_content += prompt_file.read()
-    else:
-        if args.prompt:
+    if args.prompt:
+        # Only validate scope for pre-defined prompts (not for arbitrary file paths)
+        predefined_prompts = arg_options.get_enum_values(arg_options.Prompt)
+        if args.prompt in predefined_prompts:
             if not args.prompt.startswith("image") and args.scope == "image":
                 print("Error: The prompt must start with 'image'. Please re-run the command with a valid prompt.")
                 sys.exit(1)

@@ -197,14 +239,13 @@ def main() -> int:
                 print("Error: The prompt must start with 'text'. Please re-run the command with a valid prompt.")
                 sys.exit(1)
 
-            prompt = load_markdown_prompt(args.prompt)
-            prompt_content += prompt["prompt_content"]
+        prompt_content += load_prompt_content(args.prompt)
 
-        if args.prompt_text:
-            prompt_content += args.prompt_text
+    if args.prompt_text:
+        prompt_content += args.prompt_text
 
     if args.scope == "image":
-        prompt["prompt_content"] = prompt_content
+        prompt = {"prompt_content": prompt_content}
         request, response = image_processing.process_image(args, prompt, system_instructions)
     elif args.scope == "text":
         request, response = text_processing.process_text(args, prompt_content, system_instructions)
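The scope check that `main()` now applies only to pre-defined prompts reduces to a prefix match against the selected scope. A standalone sketch of that rule, assuming the `image_`/`code_`/`text_` naming convention from the README; `validate_prompt_scope` and the sample prompt names are hypothetical, not part of the codebase:

```python
def validate_prompt_scope(prompt: str, scope: str, predefined: set[str]) -> bool:
    """Return True if the prompt may be used with the given scope.
    Custom file paths (anything not in `predefined`) always pass."""
    if prompt not in predefined:
        return True  # scope validation only applies to pre-defined prompts
    # Pre-defined prompts must carry the matching prefix, e.g. image_*.md
    return prompt.startswith(scope)

predefined = {"code_overall", "image_analyze", "text_summary"}  # hypothetical names
assert validate_prompt_scope("code_overall", "code", predefined)
assert not validate_prompt_scope("code_overall", "image", predefined)
# A custom file path is accepted under any scope
assert validate_prompt_scope("./prompts/mine.md", "image", predefined)
```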

ai_feedback/helpers/constants.py

Lines changed: 2 additions & 3 deletions

@@ -1,9 +1,8 @@
 TEST_OUTPUTS_DIRECTORY = "test_responses_md"
 HELP_MESSAGES = {
     "submission_type": "The format of the submission file (e.g., Jupyter notebook, Python script).",
-    "prompt": "The specific prompt to use for evaluating the assignment.",
+    "prompt": "Pre-defined prompt name (from ai_feedback/data/prompts/user/) or file path to custom prompt file.",
     "prompt_text": "Additional messages to concatenate to the prompt.",
-    "prompt_custom": "The path to a prompt to use.",
     "scope": "The section of the assignment the model should analyze (e.g., code or image).",
     "submission": "The file path for the submission file.",
     "solution": "The file path for the solution file.",

@@ -15,6 +14,6 @@
     "test_output": "The output of tests from evaluating the assignment.",
     "submission_image": "The file path for the image file.",
     "solution_image": "The file path to the solution image.",
-    "system_prompt": "The specific system instructions to send to the AI Model.",
     "json_schema": "file path to a json file that contains the schema for ai output",
+    "system_prompt": "Pre-defined system prompt name (from ai_feedback/data/prompts/system/) or file path to custom system prompt file.",
 }

ai_feedback/helpers/template_utils.py

Lines changed: 9 additions & 16 deletions

@@ -104,28 +104,21 @@ def gather_file_contents(assignment_files: List[Optional[Path]]) -> str:
             # Handle PDF files separately
             if filename.lower().endswith('.pdf'):
                 text_content = extract_pdf_text(file_path)
-                file_contents += f"=== (unknown) ===\n"
                 lines = text_content.split('\n')
-                for i, line in enumerate(lines, start=1):
-                    stripped_line = line.rstrip()
-                    if stripped_line.strip():
-                        file_contents += f"(Line {i}) {stripped_line}\n"
-                    else:
-                        file_contents += f"(Line {i}) \n"
-                file_contents += "\n"
             else:
                 # Handle regular text files
                 with open(file_path, "r", encoding="utf-8") as file:
                     lines = file.readlines()
 
-                file_contents += f"=== (unknown) ===\n"
-                for i, line in enumerate(lines, start=1):
-                    stripped_line = line.rstrip("\n")
-                    if stripped_line.strip():
-                        file_contents += f"(Line {i}) {stripped_line}\n"
-                    else:
-                        file_contents += f"(Line {i}) {line}"
-                file_contents += "\n"
+            # Common processing for both file types
+            file_contents += f"=== (unknown) ===\n"
+            for i, line in enumerate(lines, start=1):
+                stripped_line = line.rstrip('\n').rstrip()
+                if stripped_line.strip():
+                    file_contents += f"(Line {i}) {stripped_line}\n"
+                else:
+                    file_contents += f"(Line {i}) \n"
+            file_contents += "\n"
 
         except Exception as e:
             print(f"Error reading file (unknown): {e}")
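The unified loop above renders every file the same way regardless of source. Its effect on a small input can be sketched like this; `render_numbered` is a hypothetical helper name used purely for illustration of the shared formatting logic:

```python
def render_numbered(text: str) -> str:
    """Prefix each line with '(Line i)', trimming trailing whitespace,
    mirroring the refactored common loop in gather_file_contents."""
    out = ""
    for i, line in enumerate(text.split('\n'), start=1):
        stripped = line.rstrip('\n').rstrip()
        if stripped.strip():
            out += f"(Line {i}) {stripped}\n"
        else:
            # Blank lines still get a number so the model can cite them
            out += f"(Line {i}) \n"
    out += "\n"
    return out

assert render_numbered("def f():\n    return 1\n") == (
    "(Line 1) def f():\n(Line 2)     return 1\n(Line 3) \n\n"
)
```

Leading indentation is preserved (only trailing whitespace is stripped), so the line-numbered view stays faithful to Python source.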

promptfoo/promptfoo_test_runner.py

Lines changed: 0 additions & 2 deletions

@@ -38,8 +38,6 @@ def call_api(prompt: str, context: dict, metadata: dict) -> dict:
         options['prompt'],
         "--llama_mode",
         "server",
-        '--submission_type',
-        submission_type,
         "--output_template",
         "response_and_prompt",
     ]

promptfoo/tests/codellama_tests/codellama_code_tests.yaml

Lines changed: 0 additions & 1 deletion

@@ -5,7 +5,6 @@ defaultTest:
   vars:
     model: codellama:latest
     scope: code
-    submission_type: python
 
 scenarios:
   - config:

promptfoo/tests/deepseek_r1_tests/deepseek_r1_code_tests.yaml

Lines changed: 0 additions & 1 deletion

@@ -5,7 +5,6 @@ defaultTest:
   vars:
     model: deepSeek-R1:70B
     scope: code
-    submission_type: python
 
 scenarios:
   - config:

promptfoo/tests/deepseek_r1_tests/deepseek_r1_text_tests.yaml

Lines changed: 0 additions & 1 deletion

@@ -5,7 +5,6 @@ defaultTest:
   vars:
     model: deepSeek-R1:70B
     scope: text
-    submission_type: pdf
 
 scenarios:
   - config:

promptfoo/tests/deepseek_v3_tests/deepseek_v3_code_tests.yaml

Lines changed: 0 additions & 1 deletion

@@ -5,7 +5,6 @@ defaultTest:
   vars:
     model: deepSeek-v3
    scope: code
-    submission_type: python
 
 scenarios:
   - config:

promptfoo/tests/deepseek_v3_tests/deepseek_v3_text_tests.yaml

Lines changed: 0 additions & 1 deletion

@@ -5,7 +5,6 @@ defaultTest:
   vars:
     model: deepSeek-v3
     scope: text
-    submission_type: pdf
 
 scenarios:
   - config:
