
# Grading harness for Python exercises

`harness.py` is essentially a shim between Python's `unittest` and the grading tests written by the task author. It transparently takes care of some of the more common errors that can occur during grading, but it also requires the grading tests to be written slightly differently.

## General approach

See this grading test suite for a simple example and review the following sections for more details.

## Boilerplate

To write a new grading test suite, add the boilerplate needed to use the test harness:

```python
#!/usr/bin/env python3

# Boilerplate necessary to set up ACCESS test
import sys
try: from universal.harness import *
except: sys.path.append("../../universal/"); from harness import *

# Grading test suite starts here
```

What happens in detail is not important for writing tests, but in short: this code attempts to import the harness from two different locations:

- A local module `universal.harness`. This is used when the test suite is executed on ACCESS or via `access-cli`, both of which copy the global `universal` folder into the Docker container before grading.
- A relatively referenced module `../../universal/`. This is used when you execute a test suite locally on your machine via the `grade_command`. It requires a directory structure where courses contain assignments and assignments contain tasks (see the sketch below); if yours differs, adjust the relative import as needed.
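
Concretely, the relative import assumes a layout roughly like the following (the `course` and `assignment` directory names are illustrative):

```
course/
├── universal/
│   └── harness.py
└── assignment/
    └── task/                 <- run grade_command from here
        ├── config.toml
        ├── grading/
        │   └── tests.py      <- the grading test suite
        └── task/
            └── script.py     <- the student's submission
```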

## Imports

Next, we typically want to import the student's submission or parts of it. But rather than importing the student's submission directly...

```python
import task.script as implementation
```

we use the `grading_import` function:

```python
implementation = grading_import("task", "script")
```

Or, for example, to import a class from the implementation, instead of doing

```python
from task.script import TheClass
```

we do:

```python
TheClass = grading_import("task.script", "TheClass")
```

The `grading_import` function ensures that if the student's submission suffers from any kind of problem during the import, all tests are skipped and an appropriate solution hint is generated.
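
What `grading_import` does internally is not important for writing tests, but a minimal sketch of the idea, with hypothetical internals (the real implementation in `universal/harness.py` may differ), could look like this:

```python
import importlib
import unittest

def grading_import(module, name):
    # Hypothetical sketch, not the real harness code.
    try:
        try:
            # A submodule, e.g. grading_import("task", "script")
            return importlib.import_module(f"{module}.{name}")
        except ModuleNotFoundError:
            # An attribute of a module, e.g. grading_import("task.script", "TheClass")
            return getattr(importlib.import_module(module), name)
    except Exception as exc:
        # Any problem in the student's code leads to skipped tests
        # and a solution hint rather than a crashed test run.
        raise unittest.SkipTest(f"could not import submission: {exc}")
```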

## Test Cases

Make sure you inherit from `AccessTestCase` rather than `unittest.TestCase`. Then write tests as usual (remember that Python test method names must start with `test`).

```python
class GradingTests(AccessTestCase):

    def test_x_is_42(self):
        self.hint("x is not 42")
        self.assertEqual(implementation.x, 42)
```

**Important:** before any action that could fail or error the test, you must call `self.hint(...)` to provide a solution hint. Whenever a test fails or errors, the most recently supplied hint is used to explain the failure.

## Test Runner

Do not use Python `unittest`'s auto-discovery; instead, use `TestRunner` to execute the grading test suite explicitly. `AccessTestSuite` always takes two parameters: the maximum number of points this task should award, and a list of `AccessTestCase` classes:

```python
TestRunner().run(AccessTestSuite(2, [GradingTests]))
```

So to run your grading test suite locally, execute

```
python -m grading.tests
```

Correspondingly, the `grade_command` in `config.toml` should always be

```toml
grade_command = "python -m grading.tests"
```

Note that an `AccessTestSuite` can contain multiple `AccessTestCase` classes, and the order of the hints provided corresponds to the order of the test classes in the suite.
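
For example, a suite awarding at most 4 points across two test case classes (the class names here are illustrative) could look like this; hints from `BasicTests` then take priority over hints from `AdvancedTests`:

```python
TestRunner().run(AccessTestSuite(4, [BasicTests, AdvancedTests]))
```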

Finally, use `access-cli` to validate the task as explained in the main README.

## Weights and points

By default, each unit test has the same weight. For example, if there are 3 unit tests and the task awards a maximum of 6 points, each test contributes 2 points. If you wish to change the weight of a test, add a `@weight(x)` annotation, where `x` is the weight given to that unit test. Typical use cases (illustrated in the sketch after this list) are:

- If a test will pass even for the unmodified template, it should not award any points, so `@weight(0)` can be used. This is useful for basic validation that warns the student if they messed with the template.
- If you want to give students an additional challenge that does not need to be completed to achieve full points, you can also set a weight of 0.
- If a test is more important, it can have a weight higher than 1.
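
As a sketch, assuming the `weight` decorator is provided by the harness import and that the submission defines a hypothetical function `f`:

```python
class GradingTests(AccessTestCase):

    @weight(0)
    def test_template_intact(self):
        # Passes even for the unmodified template, so it awards no points
        self.hint("Do not remove the definition of f from the template")
        self.assertTrue(hasattr(implementation, "f"))

    @weight(2)
    def test_f_result(self):
        # Counts twice as much as a test with the default weight of 1
        self.hint("f(2) should return 4")
        self.assertEqual(implementation.f(2), 4)
```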

When all tests have been executed (via `TestRunner`), the harness calculates the number of points to award based on the `max_points` defined in `config.toml`, i.e. `(weights_awarded / weights_achievable) * max_points`.
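
As a hypothetical worked example:

```python
# Three tests with weights 0, 1 and 2, and max_points = 6 in config.toml.
# A student who passes only the weight-2 test is awarded:
points = (2 / (0 + 1 + 2)) * 6  # = 4.0
```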

## Order of failure hints

Since the order of failure hints is important (see the course README.md), the harness implicitly controls how hints are sorted based on test method and test case order. Within an `AccessTestCase`, the order of test method definitions determines hint priority (earlier tests have higher priority). Within an `AccessTestSuite`, earlier test cases have higher priority.

Finally, any hint caused by a test error rather than a test failure receives lower overall priority.

## Tests should FAIL, not ERROR

As much as possible, we want to write grading tests that result in a test failure rather than an error. If your grading test errors for a given student solution, augment it with additional checks so that it fails instead. Here is an example of a test that could error:

```python
    def test_x_is_positive(self):
        self.hint("x is not positive")
        self.assertGreater(implementation.x, 0)
```

If the student wrote faulty code where `implementation.x` is, for example, a string, `assertGreater` will error, because `'>' not supported between instances of 'str' and 'int'`. So instead of failing, this test case would error. Thus, the test suite should check that `x` is a number before checking its value (note that this requires `import numbers` at the top of the test suite):

```python
    def test_x_is_positive(self):
        self.hint("x is not a number")
        self.assertIsInstance(implementation.x, numbers.Number)
        self.hint("x is not positive")
        self.assertGreater(implementation.x, 0)
```

or more broadly:

```python
    def test_x_is_positive(self):
        self.hint("x is not a positive number")
        try:
            self.assertGreater(implementation.x, 0)
        except:
            self.fail()
```

It often makes sense to implement such checks in a separate (non-test) method and to call that method in every test case, as sketched below. You may similarly be able to use `setUp()`, but if you do, make sure you call `super().setUp()` inside it.
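
A sketch of that pattern, with an illustrative helper name (again assuming `numbers` is imported at the top of the test suite):

```python
class GradingTests(AccessTestCase):

    def require_number(self, value):
        # Shared precondition check, called at the start of every test
        self.hint("x is not a number")
        self.assertIsInstance(value, numbers.Number)

    def test_x_is_positive(self):
        self.require_number(implementation.x)
        self.hint("x is not positive")
        self.assertGreater(implementation.x, 0)

    def test_x_is_even(self):
        self.require_number(implementation.x)
        self.hint("x is not even")
        self.assertEqual(implementation.x % 2, 0)
```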

## Input function

The harness replaces `builtins.input` with an implementation that always fails. If a student uses `input()`, they will receive a hint that this is forbidden.
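
A minimal sketch of how such a replacement could look; the details here are hypothetical, and the real harness produces a proper solution hint rather than a plain exception:

```python
import builtins

def _forbidden_input(*args, **kwargs):
    # Any call to input() during grading fails immediately
    raise RuntimeError("input() is forbidden in graded submissions")

builtins.input = _forbidden_input
```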