
Test Framework


The Importance of Testing

Testing is a major part of any software project, for many reasons:

  1. It allows us to verify that a given feature works as expected when it is first implemented.
  2. When we add a new feature, having existing tests allows us to verify that the new feature doesn't break existing behavior.
  3. Tests also serve as a form of documentation when done well. When we make a change and get all tests to pass, what we are really doing is documenting the current behavior of the system with our tests. If and when that behavior is modified in the future, the corresponding test will fail, notifying us that the behavior has changed. We can then decide whether the failure was intentional/expected (and change the test to match the new behavior), or whether we have caught a bug (and need to go back and fix the implementation of the new feature).

Runtime Test Framework Overview

Our test framework contains two broad parts:

  1. Command Line Interfaces (CLIs): tools that allow you to treat Runtime as a "black box" and provide it with inputs by pretending to be Dawn, Shepherd, or Arduino devices connecting to / disconnecting from Runtime.
  2. Automated Testing, which accomplishes the three points mentioned above.

At an extremely high level, we test Runtime by "black-boxing" it, i.e. we abstract Runtime away as a "function" (a black box) that receives inputs and generates outputs in response. These inputs and outputs can be visualized in the following diagram:

TODO: add a diagram here showing the inputs and outputs of Runtime

This means that in order to test Runtime, we need three things:

  1. Some way to "spawn" Runtime locally on a computer, without any external stimuli (e.g. Shepherd connecting, Dawn connecting, devices connecting...)
  2. A set of functions that a test can call to send commands to this "spawned" Runtime, mimicking external stimuli interacting with Runtime
  3. Some way to examine the state of Runtime at any given time and capture the outputs generated by Runtime throughout the test, so that we can verify that Runtime behaves correctly for the given set of inputs.

The first point is accomplished by writing functions in the process clients that start and stop their respective processes. For example, the net_handler_client has two functions, net_handler_start and net_handler_stop, which spawn and kill net_handler running locally on the machine.

The second point is accomplished by writing additional functions in the process clients for the tests to call to either send Runtime input or retrieve/examine its output. For example, shm_client has the print_shm function, which prints out the state of shared memory at that time for use in verifying the state of the system.

The third point is accomplished by the aforementioned print_shm function, which examines the state of Runtime (virtually all of Runtime's static internal state is contained within shared memory), and by special functions in net_handler_client that "dump" the TCP and UDP data received from Runtime to the screen. So, when using net_handler_client, the output you see on the screen is exactly what a student would see in the Dawn console.
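
Putting the three points together, an automated test is just a program that calls these client functions in sequence. The sketch below illustrates the general shape of such a test; the header names and function signatures (no arguments, void return) are assumptions made for illustration and may differ from the real client interfaces.

```c
// Minimal sketch of an integration test built from the client functions.
// Header names and signatures are assumptions, not the real interfaces.
#include "net_handler_client.h"   // assumed to declare net_handler_start/stop
#include "shm_client.h"           // assumed to declare print_shm

int main() {
    // 1. Spawn the Runtime process(es) under test locally.
    net_handler_start();

    // 2. Provide inputs by pretending to be Dawn / Shepherd / devices
    //    (the message-sending helpers live in the clients).

    // 3. Examine Runtime's state and captured outputs.
    print_shm();

    // 4. Tear Runtime back down.
    net_handler_stop();
    return 0;
}
```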

Test Naming Convention

Test Cases (tcs) are named like so: tc_<Github issue number>_<test number>. The Github issue number is the number of the Github issue that defined that behavior, not the issue that called for the tests. In other words, suppose there is an issue that calls for a certain new feature to be added to Runtime. It's common for a separate issue to be made for writing tests to make sure the new behavior introduced by that feature is implemented correctly. The tc name should reference the issue in which the new feature is described, not the issue that calls for the tests. That way, if and when the test fails in the future, the person whose changes are causing the failure can go back to that Github issue and read the exact definition of the feature that the failing test was written to verify. The "exception" to this rule is our first set of tests, which all reference Github issue 68. That issue contains a link to a master document describing the behavior of Runtime as of version 1.0, so all of the tests in that first set were written to verify Runtime version 1.0 (and to verify that this test framework was functional).

To explain the usefulness of this convention, let's go through an example of a possible workflow:

  1. A new feature that implements some new behavior is completed. The issue number that defines this behavior is issue 100, and the person who implemented that behavior has written up a good description of the new behavior, including edge-case handling, in that issue.
  2. Tests are written to verify that the new feature is working, numbered tc_100_1, tc_100_2, tc_100_3, and tc_100_4. Once those tests pass, they are merged into master and become part of the Runtime codebase.
  3. Some time later (weeks, months, etc.), a new feature that has some effects on the behavior defined in issue 100 is implemented by a different person.
  4. Tests are written to verify that the new feature is working. However, when the test suite is run, we see that tc_100_1, tc_100_2, tc_100_3, and tc_100_4 all fail. It is now easy for the person writing the tests to determine what previous behavior those failing tests were verifying, and to decide whether the tests are failing as expected (because the newly implemented behavior changed past behavior) or whether they have caught a bug in the new implementation! This is the whole point of the convention: without it, and without an easy way to find the definition of past behavior, it is really hard to figure out what a failing test was actually supposed to be testing, what feature/behavior it was verifying, and whether that behavior needs to change with the new feature.

If a new feature overrides the behavior of an old feature, the test case should retain the same name, but a comment should be added to the old Github issue, referencing the new Github issue and noting that the behavior has changed and that the test case was modified to reflect that change. Then, go ahead and modify the test so that it passes and now verifies the new behavior.

Details

This section does a deep dive into some of the more obscure and quirky parts of the test framework that we felt needed some explanation.

Client / CLI / Test Relationship

Recall that the goal of the test framework is to be able to "black-box" Runtime: spawn Runtime by starting up its processes/components, give it some inputs, examine its outputs, verify that they're correct, and then terminate Runtime—all from within a test program (not manually done). The process clients and process CLIs help us do exactly that.

For each process in Runtime (and shared memory), there is a corresponding client for that process. The client is not another process! Rather, it is an interface for a program to use to interact with the corresponding Runtime process. For example, the net_handler_client.h file defines the functions that a program can use to interact with net_handler. These functions include: sending various types of messages to Runtime via TCP or UDP connections, starting and stopping net_handler, and viewing the device data coming back from Runtime on UDP. Notice that this includes starting and stopping net_handler. To understand how net_handler_client (and the other clients as well) start and stop their respective Runtime processes, first read the wiki page on processes for an overview of UNIX processes and management.

Starting the corresponding process is done by forking, and then calling execlp in the child on the executable for the corresponding Runtime process, which spawns that process. Stopping the Runtime process is done by sending the spawned process SIGINT with the kill function, and then waiting for it to terminate using the waitpid function.
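
As a concrete illustration, the start/stop pattern described above looks roughly like the sketch below. The executable path, variable names, and function names here are illustrative assumptions, not the actual client code.

```c
// Rough sketch of the client start/stop pattern (fork + execlp to start,
// SIGINT + waitpid to stop). Names and paths are illustrative only.
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t net_handler_pid = -1;

void example_start() {
    if ((net_handler_pid = fork()) == 0) {
        // Child: replace this process image with the net_handler executable.
        execlp("./net_handler", "net_handler", (char*) NULL);
        _exit(1);  // only reached if execlp fails
    }
    // Parent: net_handler now runs in the background; remember its pid.
}

void example_stop() {
    kill(net_handler_pid, SIGINT);       // ask net_handler to clean up and exit
    waitpid(net_handler_pid, NULL, 0);   // reap the child once it terminates
}
```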

For each process in Runtime (and shared memory), there is also a corresponding CLI for that process. The CLI is a process that uses the corresponding client to let a user interact with the corresponding Runtime process from the command line. This is really useful for getting to know Runtime, or just for poking around the system manually. The CLIs also provide a way to exercise the test framework and clients by hand, without having to write tests just to check that the clients themselves are working.

All of the interfaces provided by all of the clients COMBINED, plus the functions provided in test.h, make up the functions available for use in automated tests in tests/integration. This combined interface should give a test the ability to start and stop any Runtime process, to retrieve the output and examine the state of Runtime, and to provide inputs to the system. The diagram below shows the relationship between an automated test, the net_handler_cli, net_handler_client, and net_handler itself (this diagram applies in general, with some caveats discussed later, to all Runtime client/CLI/process trios):

TODO: add a diagram here kinda similar to the one Vincent had about dev_handler_client when planning.

Below is a diagram showing the typical usage of a client (here, net_handler_client is used as an example) and the lifetime of the spawned net_handler process:

TODO: add a diagram here showing the above

Modifying the PYTHONPATH for the Executor

The goal of this section is to describe how we enable executor to find the sample student code used for testing, which lives in the tests/student_code folder of our repository.

Every Python file (<something>.py) is a Python module. You're probably familiar with the import keyword in Python; it specifies that the current module should load some other module on your system for use. Let's illustrate with an example:

Suppose I have a directory called foo containing a file bar.py and another file baz.py. Since they're in the same directory, if I write import bar in baz.py, then baz will have imported bar.py as a module. Now, in baz.py, I can call functions defined in bar.py!

In executor, we normally execute studentcode.py, which is located in the executor directory. In order for executor to run student code, it must import it as a module. (More on this in the executor wiki.) So, we essentially do an import studentcode and voila, we can access all of the student code functions in executor. The problem is that when you try to import a module that lives outside the directories Python knows to search, Python doesn't know where to begin looking for it! (This is actually smart: you don't want Python searching your entire file system for a module, which would take forever.) This is exactly the problem we run into when trying to import modules in tests/student_code from executor.

The way to tell Python where to look for modules is by using the PYTHONPATH environment variable. In other words, if we're in the executor folder and we add ../tests/student_code to PYTHONPATH, all of a sudden executor will know to search in that folder for modules, and will find the sample student code for the tests.

There are two places where we would like the ability to add that folder to the PYTHONPATH environment variable:

  1. In automated tests, to supply executor with student code to run tests on
  2. In the CLI, to try and manually recreate tests (or for experimenting with Runtime without touching studentcode.py)

In the first case, the modification of the PYTHONPATH environment variable is done by the test.sh shell script that we use to run automated tests.

In the second case, the modification of the PYTHONPATH environment variable is done by the start_executor function in executor_client.c, which uses getenv and setenv to extend PYTHONPATH before calling execlp on the executor executable to start the actual Runtime executor process.
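
The gist of that logic is sketched below. The exact paths, buffer sizes, and names are assumptions made for illustration; the real logic lives in start_executor in executor_client.c.

```c
// Hedged sketch: append the test student-code folder to PYTHONPATH,
// then exec the executor binary, which inherits the new environment.
// Paths and names are illustrative assumptions.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void example_start_executor() {
    char new_path[512];
    char* old_path = getenv("PYTHONPATH");  // may be NULL if unset

    if (old_path != NULL) {
        snprintf(new_path, sizeof(new_path), "%s:../tests/student_code", old_path);
    } else {
        snprintf(new_path, sizeof(new_path), "../tests/student_code");
    }
    setenv("PYTHONPATH", new_path, 1);  // 1 = overwrite any existing value

    if (fork() == 0) {
        // Child: becomes the executor process, with the modified PYTHONPATH.
        execlp("./executor", "executor", (char*) NULL);
        _exit(1);  // only reached if execlp fails
    }
    // Parent: executor now runs in the background.
}
```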

shm_client Quirks

The shm_client is unlike the other clients in that its start function does not spawn a process that runs in the background until terminated by its stop function. This is because shared memory is initialized by the shm_start program, which terminates normally on its own (without being interrupted by SIGINT), and torn down by the shm_stop program, which also terminates normally on its own.

Therefore, the shm_client start function calls execlp on the shm_start executable in the child process after the call to fork, which runs the shm_start program. The parent process then calls waitpid to wait for the child (shm_start) to finish before returning. Similarly, the shm_client stop function calls execlp on the shm_stop executable in the child process after the call to fork, and the parent process waits for the child (shm_stop) to finish before returning.

Crucially, in the start function, the parent calls shm_init after waiting for the child to finish. This allows the program using the shm_client to view the contents of shared memory by calling the print_shm function (useful in both the CLI and in tests). Without it, the print_shm function would generate a segmentation fault, because the semaphores and shared memory blocks created by shm_start would not yet have been mapped into the client's own process. The following is a timeline of events for shm_client:

TODO: add a diagram here explaining the above
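
For concreteness, the start sequence described above looks roughly like this sketch (the header name and executable path are assumptions for illustration):

```c
// Sketch of the shm_client start sequence: run shm_start to completion,
// then map shared memory into this process so that print_shm works.
// Header name and executable path are illustrative assumptions.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include "shm_wrapper.h"   // assumed to declare shm_init (and print_shm)

void example_shm_client_start() {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: shm_start creates the shared memory blocks and
        // semaphores, then exits normally on its own.
        execlp("./shm_start", "shm_start", (char*) NULL);
        _exit(1);  // only reached if execlp fails
    }
    // Parent: wait for shm_start to finish creating shared memory...
    waitpid(pid, NULL, 0);

    // ...then attach to it from this process so that print_shm can
    // read the state of the system without segfaulting.
    shm_init();
}
```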

dev_handler_client Quirks; Test Devices

@levincent06

net_handler_client Output Suppression

Output Capture When Running Tests

This logic lives in test.c. It is what allows us to record everything a test outputs (for comparison after the test has finished) while also letting us see the test's output in real time in the terminal. (The diagram illustrating this logic can be found in docs/Test-Output-Redirect-Logic.png.)

[Diagram: Test-Output-Redirect-Logic]
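
As a rough illustration of the general technique (not necessarily the exact logic in test.c), here is a minimal tee-style sketch: the test's stdout is redirected into a pipe, and a helper process copies everything it reads to both the terminal and a log file for later comparison. The log file name and buffer size are assumptions.

```c
// Tee-style output capture sketch: live output to the terminal AND a
// saved copy on disk. Illustrative only; not the actual test.c logic.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main() {
    int fds[2];
    pipe(fds);  // fds[0] = read end, fds[1] = write end

    if (fork() == 0) {
        // Child: read everything the test writes and "tee" it to both
        // the real terminal and a log file for later comparison.
        close(fds[1]);
        FILE* log = fopen("test_output.log", "w");  // assumed log file name
        char buf[256];
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
            fwrite(buf, 1, n, stdout);  // live output to the terminal
            fwrite(buf, 1, n, log);     // saved copy for comparison later
        }
        fclose(log);
        exit(0);
    }

    // Parent (the test): point stdout at the pipe so all subsequent
    // printf output flows through the child above.
    close(fds[0]);
    dup2(fds[1], STDOUT_FILENO);

    printf("hello from the test\n");  // goes to both terminal and log
    fflush(stdout);
    return 0;
}
```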