Test Framework
Testing is a major part of any software project, for many reasons:
- It allows us to verify that a given feature is working as expected when that feature is added.
- When we add a new feature, having existing tests allows us to verify that the new feature doesn't break existing behavior.
- Tests also serve as a form of documentation when done well. If you think about it, when we make a new change and get all tests to pass, what we are really doing is documenting the current behavior of the system with all of our tests. In the future, if and when that behavior is modified, that test will fail, notifying us that this behavior has changed. We can then decide if this failure was intentional/expected (and change the test to match the new behavior), or if we have caught a bug (and we need to go back and change the implementation of the new feature).
Our test framework contains two broad parts:
- Command Line Interfaces (CLIs) which are tools that allow you to treat Runtime as a "black box" and provide it inputs by pretending to be Dawn, Shepherd, or Arduino devices connecting / disconnecting from Runtime.
- Automated Testing, which allows us to accomplish the three points mentioned above
At an extremely high level, the way we test Runtime is by "black-boxing" it, i.e. we try to abstract Runtime itself away as just a "function" (a black box) that receives input and generates some output as a result of those inputs. The inputs and outputs that we receive can be visualized in the following diagram:
TODO: add a diagram here showing the inputs and outputs of Runtime
This means that in order to test Runtime, we need three things:
- Some way to "spawn" Runtime locally on a computer, without any external stimuli (ex. Shepherd connecting, Dawn connecting, devices connecting...)
- A set of functions that a test can call to send commands to this "spawned" Runtime, mimicking external stimuli interacting with Runtime
- Some way to examine the state of Runtime at any given time and capture the outputs generated by Runtime throughout the test so that we can actually verify that Runtime is behaving correctly for the given set of inputs.
The first point is accomplished by writing functions in the process clients that start and stop their respective processes. For example, the `net_handler_client` has two functions, `net_handler_start` and `net_handler_stop`, which spawn and kill `net_handler` running locally on the machine.
The second point is accomplished by writing additional functions in the process clients for the tests to call to either send Runtime input or retrieve/examine its output. For example, `shm_client` has the `print_shm` function, which prints out the state of shared memory at that time for use in verifying the state of the system.
The third point is accomplished by the aforementioned `print_shm` function, which examines the state of Runtime (virtually all of Runtime's static internal state is contained within shared memory), and by special functions in `net_handler_client` that "dump" received TCP and UDP data from Runtime to the screen. So, when using `net_handler_client`, the output you see on the screen is exactly what the student would see in the Dawn console.
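To make these three points concrete, a test might combine the client functions roughly as sketched below. This is a hypothetical skeleton, not an actual test: the function names come from this page, but their signatures and any additional setup required (such as initializing shared memory before calling `print_shm`) are assumptions.

```c
// Hypothetical test skeleton; signatures below are assumed, and a real test
// also needs shared memory set up (via the shm client) before print_shm works.
void net_handler_start(void);  // from net_handler_client (signature assumed)
void net_handler_stop(void);   // from net_handler_client (signature assumed)
void print_shm(void);          // from shm_client (signature assumed)

int main(void) {
    // 1. Spawn Runtime locally (here, just net_handler)
    net_handler_start();

    // 2. Provide inputs by pretending to be Dawn / Shepherd / devices
    //    (done through other client functions not shown in this sketch)

    // 3. Examine Runtime's state and captured output to verify behavior
    print_shm();

    // Finally, tear Runtime back down before the test exits
    net_handler_stop();
    return 0;
}
```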
Test Cases (`tc`s) are named like so: `tc_<Github issue number>_<test number>`. The Github issue number is the number of the Github issue that defined that behavior, not the issue that called for the tests. In other words, suppose there is an issue that calls for a certain new feature to be added to Runtime. It's common for a separate issue to be made for writing tests that verify the behavior introduced by that new feature. The `tc` name should reference the issue in which the new feature is described, not the issue that calls for the tests. This way, if and when that test fails in the future, the person whose changes are causing the failure can go back to that Github issue and read the exact definition of the feature that the failing test was written to verify. The "exception" to this rule is our first set of tests, which all reference Github issue 68. That issue contains a link to a master document describing the behavior of Runtime at the time of version 1.0, so all of the tests in that first set were written to verify Runtime version 1.0 (and to verify that this test framework was functional).
To explain the usefulness of this convention, let's go through an example of a possible workflow:
- A new feature that implements some new behavior is completed. The issue number that defines this behavior is issue 100, and the person who implemented that behavior has written up a good description of the new behavior, including edge-case handling, in that issue.
- Tests are written to verify that the new feature is working, numbered `tc_100_1`, `tc_100_2`, `tc_100_3`, and `tc_100_4`. Once those tests pass, they are merged into `master` and become part of the Runtime codebase.
- Some time later (weeks, months, etc.), a new feature that has some effects on the behavior defined in issue 100 is implemented by a different person.
- Tests are written to verify that the new feature is working. However, when the test suite is run, we see that `tc_100_1`, `tc_100_2`, `tc_100_3`, and `tc_100_4` all fail. It is easy for the person writing the tests to determine what previous behavior those failing tests were verifying, and to decide whether the failures are expected (because the newly implemented behavior intentionally changed past behavior) or whether the failing tests are catching a bug in the new implementation! This is the whole point: without the convention and the ease of finding the definition of past behavior, it is very hard to figure out what a test was actually supposed to be testing, what feature or behavior it was verifying, and whether that behavior needs to change with the new feature.
If a new feature overrides the behavior of an old one, the test case should retain the same name, but a comment should be added to the old Github issue, referencing the new Github issue, saying that the behavior has changed and that the test case was modified to reflect that change. Then, go ahead and modify the test so that it passes and now verifies the new behavior.
This section does a deep dive into some of the more obtuse and quirky parts of the test framework that we felt needed some explanation.
Recall that the goal of the test framework is to be able to "black-box" Runtime: spawn Runtime by starting up its processes/components, give it some inputs, examine its outputs, verify that they're correct, and then terminate Runtime—all from within a test program (not manually done). The process clients and process CLIs help us do exactly that.
For each process in Runtime (and shared memory), there is a corresponding client for that process. The client is not another process! Rather, it is an interface for a program to use to interact with the corresponding Runtime process. For example, the `net_handler_client.h` file defines the functions that a program can use to interact with `net_handler`. These functions include: sending various types of messages to Runtime via TCP or UDP connections, starting and stopping `net_handler`, and viewing the device data coming back from Runtime on UDP. Notice that this includes starting and stopping `net_handler`. To understand how `net_handler_client` (and the other clients as well) start and stop their respective Runtime processes, first read the wiki page on processes for an overview of UNIX processes and process management.
Starting the corresponding process is done by forking the program and then calling `execlp` on the actual executable for the corresponding Runtime process to spawn it. Stopping the Runtime process is done by sending the spawned process `SIGINT` with the `kill` function, and then waiting for it to terminate using the `waitpid` function.
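As a rough illustration of this pattern, here is a minimal sketch of how a client's start/stop pair might be implemented. The executable path, error handling, and pid bookkeeping are assumptions; the real `net_handler_client` is more involved.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical sketch of the start/stop pattern described above; the real
// net_handler_client keeps more state and error handling than shown here.
static pid_t net_handler_pid = -1;

void net_handler_start(void) {
    net_handler_pid = fork();
    if (net_handler_pid < 0) {
        perror("fork");
        exit(1);
    }
    if (net_handler_pid == 0) {
        // Child: replace this process image with the net_handler executable
        // (path is an assumption)
        execlp("./net_handler", "net_handler", (char *) NULL);
        perror("execlp");  // only reached if execlp fails
        exit(1);
    }
    // Parent: net_handler is now running in the background
}

void net_handler_stop(void) {
    if (net_handler_pid <= 0) {
        return;
    }
    kill(net_handler_pid, SIGINT);      // ask net_handler to shut down cleanly
    waitpid(net_handler_pid, NULL, 0);  // reap the terminated child
    net_handler_pid = -1;
}
```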
For each process in Runtime (and shared memory), there is also a corresponding CLI for that process. The CLI is a process that uses the corresponding client to allow a user to interact with the corresponding Runtime process via the command line. This is really useful for getting to know Runtime, or just for poking around the system manually. The CLIs also provide a way to test the test framework and the clients manually, without having to write tests to check whether the clients are working.
All of the interfaces provided by all of the clients COMBINED, plus the functions provided in `test.h`, make up the functions that are available for use in automated tests in `tests/integration`. This combined interface should give the test the ability to start and stop any Runtime process, to retrieve the output and examine the state of Runtime, and to provide inputs to the system. The diagram below shows the relationship between an automated test, the `net_handler_cli`, `net_handler_client`, and `net_handler` itself (this diagram applies in general, sort of, to all Runtime client/CLI/process trios; more caveats later):
TODO: add a diagram here kinda similar to the one Vincent had about `dev_handler_client` when planning.
Below is a diagram showing the typical usage of a client (here, `net_handler_client` is used as an example) and the lifetime of the spawned `net_handler` process:
TODO: add a diagram here showing the above
The goal of this section is to describe how we get `executor` to be able to find the sample student code used for testing in the `tests/student_code` folder in our library.
Every Python file (`<something>.py`) is a Python module. You're probably familiar with the `import` keyword in Python; it specifies that the current module should import some other module on your system. Let's illustrate with an example:
Suppose I have a directory called `foo`, and in that directory I have a file `bar.py` and another file `baz.py`. Since they're in the same directory, if I write `import bar` in `baz.py`, then `baz` will have imported `bar.py` as a module for use in `baz.py`. Now, in `baz.py`, I can call functions defined in `bar.py`!
In `executor`, we normally execute `studentcode.py`, which is located in the `executor` directory. In order for `executor` to run student code, it must import it as a module. (More on this in the executor wiki.) So, we essentially do an `import studentcode` and voila, we can access all of the student code functions in `executor`. The problem is that when you try to import a module that is outside the directory that you're in, Python doesn't know where to begin to look for that module! (This is actually smart; you don't want Python to go searching your entire file system for a module, which would take forever.) This is the problem that we run into when trying to import modules in `tests/student_code` from `executor`.
The way to tell Python where to look for modules is by using the `PYTHONPATH` environment variable. In other words, if we're in the `executor` folder and we add `../tests/student_code` to `PYTHONPATH`, all of a sudden `executor` will know to search in that folder for modules, and will find the sample student code for the tests.
There are two places where we would like the ability to add that folder to the `PYTHONPATH` environment variable:
- In automated tests, to supply `executor` with student code to run tests on
- In the CLI, to try and manually recreate tests (or for experimenting with Runtime without touching `studentcode.py`)
In the first case, the modification of the `PYTHONPATH` environment variable is done by the `test.sh` shell script that we use to run automated tests.
In the second case, the modification of the `PYTHONPATH` environment variable is done by the `start_executor` function in `executor_client.c`, with all of the logic that uses `getenv` and `setenv` before calling `execlp` on `executor` to start up the actual Runtime `executor` process.
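As a hedged sketch (the exact paths, buffer sizes, and error handling in `executor_client.c` may differ), the `PYTHONPATH` manipulation described above might look roughly like this in the forked child before the `execlp` call:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// Sketch of appending the student code folder to PYTHONPATH before exec'ing
// executor. Intended to run in the forked child process; paths are assumptions.
void exec_executor_with_student_code(void) {
    const char *extra = "../tests/student_code";
    const char *current = getenv("PYTHONPATH");

    if (current == NULL || current[0] == '\0') {
        // No existing PYTHONPATH: point it directly at the student code folder
        setenv("PYTHONPATH", extra, 1);
    } else {
        // Otherwise, append the student code folder to the existing PYTHONPATH
        char new_path[4096];
        snprintf(new_path, sizeof(new_path), "%s:%s", current, extra);
        setenv("PYTHONPATH", new_path, 1);
    }

    // Replace this child process with the real executor process, which
    // inherits the modified environment and can now import the test modules
    execlp("./executor", "executor", (char *) NULL);
    perror("execlp");  // only reached if execlp fails
    exit(1);
}
```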
The `shm_client` is unlike the other clients in that it does not spawn a process in its `start` function that will run in the background until terminated in its `stop` function. This is because shared memory is initialized by the `shm_start` program, which terminates normally by itself (without being interrupted by `SIGINT`), and stopped by the `shm_stop` program, which terminates normally by itself as well.
Therefore, the `shm_client` `start` function calls `execlp` on the `shm_start` executable in the child process after the call to `fork`, which runs the `shm_start` program. The parent process then calls `waitpid` to wait for the child (`shm_start`) to finish before returning. Similarly, the `shm_client` `stop` function calls `execlp` on the `shm_stop` executable in the child process after the call to `fork`, and the parent process waits for the child (`shm_stop`) to finish before returning.
Crucially, in the `start` function, the parent calls `shm_init` after waiting for the child to finish. This allows the program using the `shm_client` to view the contents of shared memory by calling the `print_shm` function (useful in both the CLI and in tests). Without it, the `print_shm` function would generate a segmentation fault because the semaphores and shared memory blocks would not have been initialized yet. The following is a timeline of events for `shm_client`:
TODO: add a diagram here explaining the above
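Until that diagram is added, here is a minimal sketch of the sequence just described. The executable path and the exact signature of `shm_init` are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// From Runtime's shared memory wrapper; exact signature assumed here.
void shm_init(void);

// Sketch of the shm_client start sequence: run shm_start to completion,
// then attach this process to the newly created shared memory.
void start_shm(void) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        // Child: run shm_start, which creates the shared memory blocks and
        // semaphores and then exits on its own (path is an assumption)
        execlp("./shm_start", "shm_start", (char *) NULL);
        perror("execlp");  // only reached if execlp fails
        exit(1);
    }
    // Parent: wait for shm_start to finish creating shared memory...
    waitpid(pid, NULL, 0);
    // ...then attach to it so that calls like print_shm work without faulting
    shm_init();
}
```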
The test output redirection logic is in `test.c`; it is what allows us to record what the test is outputting for comparison after the test has finished, while also letting us see the output of the test in real time in the terminal. (A diagram of this logic, by @levincent06, can be found in `docs/Test-Output-Redirect-Logic.png`.)
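The diagram is the authoritative description of that logic. As a rough, simplified illustration of the general technique (not necessarily how `test.c` structures it; the log file name is hypothetical), output can be "tee'd" to both the terminal and a file using `pipe`, `fork`, and `dup2`:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

// One possible way to both display and record a test's output, tee-style.
int main(void) {
    int pipefd[2];
    if (pipe(pipefd) != 0) {
        perror("pipe");
        exit(1);
    }

    int saved_stdout = dup(STDOUT_FILENO);  // keep a handle on the real terminal
    int logfd = open("test_output.log", O_WRONLY | O_CREAT | O_TRUNC, 0666);

    pid_t pid = fork();
    if (pid == 0) {
        // Child: read everything written to the pipe and copy it to both
        // the terminal (real-time view) and the log file (for later comparison)
        close(pipefd[1]);
        char buf[256];
        ssize_t n;
        while ((n = read(pipefd[0], buf, sizeof(buf))) > 0) {
            write(saved_stdout, buf, n);
            write(logfd, buf, n);
        }
        exit(0);
    }

    // Parent (the test itself): route stdout into the pipe
    close(pipefd[0]);
    dup2(pipefd[1], STDOUT_FILENO);
    close(pipefd[1]);

    printf("some test output\n");  // shows up in the terminal AND in the log
    fflush(stdout);

    // Restore stdout; this closes the last write end of the pipe in the parent,
    // so the child sees EOF, finishes copying, and exits
    dup2(saved_stdout, STDOUT_FILENO);
    waitpid(pid, NULL, 0);
    close(logfd);
    close(saved_stdout);
    return 0;
}
```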