Automated testing in RKWard

Automated testing can be used to verify that RKWard the application, and particularly RKWard’s plugins, behave as expected. “As expected”, here, means that the result of each single test is still the same as it was in earlier runs of the test. To run all currently existing tests, enter

make plugintests

in the top-level source directory, after compiling and installing.

Every single plugin in RKWard should have at least one automated test that checks the basic functionality. Complex plugins should have several tests, and where new problems are discovered, a test should usually be added after fixing the problem, to make sure it does not recur.

What automated testing can / cannot do for you

Automated testing does not mean that RKWard magically knows what the “correct” behavior of a plugin should be. All the tests can do is compare the result of running a test to the result of earlier runs. In detail, the testing framework checks whether the R code generated by a plugin, its output, and any R messages match the standard set by earlier runs.

You can also add some more checks manually, e.g. to make sure that data created by a plugin is correct.

In general, automated tests should help to detect the following problems:

Writing automated tests

Creating a test suite (i.e. a set of tests) consists of the following steps, to be dealt with in turn:

Load the framework

As a first step, load the library “rkwardtests”. This is included in the official RKWard distribution, and contains a set of functions to run tests and to facilitate writing tests.

library(rkwardtests)

Defining the test suite

In the tests/ subdirectory,

Of course you will replace “MYTESTSUITE” with a more meaningful name.

tests/MYTESTSUITE.R will contain code like this:

## definition of the test suite
suite <- new ("RKTestSuite", id="import_export_plugins",
# list here libraries which are needed by all (or at least most) tests
# in this suite
libraries = c ("R2HTML", "datasets"),
# initCalls are run *before* any tests. Use this to set up the environment
initCalls = list (
    function () {
        # prepare some different files for loading
        library ("datasets")
        women <- datasets::women

        save (women, file="women.RData")
        write.csv (women, file="women.csv")

        suppressWarnings (rm ("women"))
    }
## the tests
), tests = list (
    new ("RKTest", id="load_r_object", call=function () {
        rk.call.plugin ("rkward::load_r_object", file.selection="women.RData", other_env.state="0", submit.mode="submit")

        stopifnot (all.equal (.GlobalEnv$women, datasets::women))
    }),
    new ("RKTest", id="one_plus_one", call=function () {
        stopifnot (1 + 1 == 2)
    }),
    new ("RKTest", id="import_csv_overwrite", call=function () {
        assign ("women", datasets::women, envir=globalenv ())
        rk.sync.global ()

        # this one is expected to fail, as it would overwrite the existing "women" in globalenv()
        rk.call.plugin ("rkward::import_csv", file.selection="women.csv", name.selection="women", submit.mode="submit")
    }, expect_error=TRUE)
    # like initCalls: run after all tests to clean up. Empty in this case.
), postCalls = list ()
)

As you can see, the “RKTestSuite” is mostly just a list of functions, and in fact the functions are called in exactly the order in which they are defined here. The testing framework adds all the logic to test output/commands/messages for correctness, and to clean up after each test.

Before the actual test definitions, there is only one line libraries = … and one line initCalls = …. Set libraries to a character vector listing the names of R packages that are needed for this suite as a whole (i.e. for most or at least many of the tests in this suite). These libraries will be loaded before any test from the suite is run. Please do not include libraries that are only used by one or two tests in your suite. List those in the tests themselves, instead (see below). This way, if a test requires an exotic library that is not installed on most systems, this test can simply be skipped when the library is not available.

The initCalls (you can put several functions in the list) can be used to set up further details of the testing environment, e.g. to create sample data for the tests to use. In this case the files women.RData and women.csv are created, which we will attempt to load in the tests.

Note that in the example there is a call to library ("datasets") in the initCalls. If you need to reference objects from a library in the initCalls, you need to list the library in libraries and also load it manually (usually, the libraries are assumed to be needed in the tests only).

Writing a test

The next section is more interesting. Here, we define the actual tests.

Each test must be given an id, which should ideally give you some idea, what the test is all about (use comments, liberally, in addition to that). The essence of the test, however, is the parameter call. Once again, this is a function without parameters, as shown in the example. Let’s take a closer look at the load_r_object-test:

new ("RKTest", id="load_r_object", call=function () {
    rk.call.plugin ("rkward::load_r_object", file.selection="women.RData", other_env.state="0", submit.mode="submit")

    stopifnot (all.equal (.GlobalEnv$women, datasets::women))
})

Calling plugins

The first command in this test is a call to rk.call.plugin. What this function does is call the plugin “rkward::load_r_object”, set the property file.selection to “women.RData” and the property other_env.state to “0”, and try to submit the plugin with these settings (submit.mode="submit"). That’s all.

Don’t worry, if you’re testing a more complex plugin. The library “rkwardtests” contains a little helper function rktest.replaceRunAgainLink(). Run this, then call the plugin in the usual way, and make all the settings. When you submit the plugin, at the bottom of the output you are shown the corresponding rk.call.plugin(…)-call with all needed parameters. You can simply copy and paste that into your test.

Note: In the test runs, the rerun-link is not printed for the plugins, and a few other tweaks are applied as well, to make the output less variable. In general you do not need to worry about this, though.

Checking results

The load_r_object-test has another line after calling the plugin:

stopifnot (all.equal (.GlobalEnv$women, datasets::women))

This simply checks that the imported object really matches the one that we exported in the initCalls. Should there be a mismatch, stopifnot() throws an error, and hence the test is marked as having failed.
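To illustrate this kind of check outside of the test framework, here is a minimal base R sketch. The temporary file and the names original, modified and result are made up for this illustration. It also shows that all.equal() returns a description of the differences rather than FALSE on a mismatch. stopifnot() treats that as an error, too, but wrapping the call in isTRUE() makes the intent explicit.

```r
# Save a data set to a temporary file and load it back,
# roughly what an export/import round trip does
original <- datasets::women
women <- original
tmp <- tempfile(fileext = ".RData")
save(women, file = tmp)
rm(women)
load(tmp)

# Passes silently: the reloaded object matches the original
stopifnot(isTRUE(all.equal(women, original)))

# A deliberately modified copy makes the same check raise an error
modified <- original
modified$height[1] <- 0
result <- tryCatch({
    stopifnot(isTRUE(all.equal(modified, original)))
    "passed"
}, error = function(e) "failed")
```

In a real test you would not catch the error. Letting stopifnot() stop the test function is what marks the test as failed.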

Note: Since the tests are not run in the .GlobalEnv, directly, if you want to access an object in .GlobalEnv, you should always specifically write .GlobalEnv$OBJECTNAME (most of the time the correct object will be picked, anyway, but it’s best to err on the safe side).

Note: You do not have to add any test with stopifnot() or the like. For most purposes a call to rk.call.plugin() is entirely enough for one test. Remember that the test framework will automatically check whether the R code generated by this plugin, the output, and any R messages match the standard set from previous runs.

Libraries

If your test / the plugin relies on an R add-on package, add libraries = c("library_name", "further_library_name") in the call to new ("RKTest", […]). E.g.:

new ("RKTest", id="import_spss", call=function () {
    rk.call.plugin ("rkward::import_spss", [...])

    [...]
}, libraries=c("foreign"))

This is needed even though the plugin, too, will make sure the library is loaded. The reason is that for the test to always yield the same messages, the loading has to happen *before* the test is run. Otherwise the message output may differ, depending on whether or not the library was already loaded before running the test.

Note: You do not need to list the libraries already listed for the suite. Both lists of libraries will be combined automatically.
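Whether a package is installed can be checked without attaching it. The following is a rough base R sketch of such a check. missing_packages() is a hypothetical helper written for this illustration. It is not part of rkwardtests, and the framework's own mechanism for skipping tests may work differently.

```r
# Hypothetical helper (not part of rkwardtests):
# return the names of those packages that cannot be loaded
missing_packages <- function(pkgs) {
    available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!available]
}

# Packages shipped with every R installation are never reported as missing
base_missing <- missing_packages(c("stats", "utils"))

# A made-up package name is reported back
exotic_missing <- missing_packages("noSuchPackageForThisExample")
```

Running something like this before committing a test is a quick way to see which packages need to be named in libraries.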

Expected errors

In some cases, you may want to create a test that is expected to throw an error. For instance, you may want to make sure that a plugin cannot be run with a certain combination of settings. In this case, set up the test as usual, but add expect_error=TRUE in the call to new ("RKTest", […]). See the third test in the example. When you do this, please always add a comment in the test definition as to why this is expected to fail.

Please do not use this for known bugs. Just let those tests fail. Use expect_error=TRUE, when the error is the correct behavior, only.
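The idea behind expect_error can be sketched in a few lines of base R. This is only an illustration of the logic under the assumption that a test passes exactly when an error occurs if and only if one is expected. run_expecting_error() is a hypothetical function, not the code used by rkwardtests.

```r
# Hypothetical illustration (not the rkwardtests implementation):
# a test passes if it errors exactly when an error is expected
run_expecting_error <- function(call, expect_error = FALSE) {
    outcome <- try(call(), silent = TRUE)
    had_error <- inherits(outcome, "try-error")
    had_error == expect_error
}

failing_call <- function() stop("refusing to overwrite existing object")
working_call <- function() 1 + 1

pass_expected_error <- run_expecting_error(failing_call, expect_error = TRUE)
fail_missing_error  <- run_expecting_error(working_call, expect_error = TRUE)
fail_unexpected     <- run_expecting_error(failing_call)
pass_normal         <- run_expecting_error(working_call)
```

Note that a test marked with expect_error=TRUE fails when no error occurs, which is why the flag is unsuitable for papering over known bugs.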

Fuzzy output

In some cases, the output is non-deterministic to some degree. For instance, the exact results may be subject to randomness.

To make sure this is not seen as an error, add fuzzy_output=TRUE in the call to new ("RKTest", […]). When you do this, please always add a comment in the test definition telling in what way the output is expected to vary.

The test framework will still check for a comparable length of the output, but will not let the test fail on small differences.
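As a rough illustration of the difference between an exact and a fuzzy comparison, consider the following base R sketch. compare_output() and its tolerance of 10 percent are assumptions made up for this example. The actual criteria used by the test framework are not documented here and may differ.

```r
# Hypothetical helper (not part of rkwardtests): compare output to a standard,
# either exactly or only by approximate total length
compare_output <- function(actual, standard, fuzzy = FALSE, tolerance = 0.1) {
    if (!fuzzy) return(identical(actual, standard))
    len_actual <- sum(nchar(actual))
    len_standard <- sum(nchar(standard))
    abs(len_actual - len_standard) <= tolerance * max(len_standard, 1)
}

# Two runs of a randomized procedure: same layout, slightly different numbers
standard <- c("mean: 0.512", "sd: 0.981")
actual   <- c("mean: 0.498", "sd: 1.003")

exact_match <- compare_output(actual, standard)
fuzzy_match <- compare_output(actual, standard, fuzzy = TRUE)
```

Here the exact comparison fails while the fuzzy one passes, because only the digits differ and not the amount of output.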

The first test run

The easiest way to run the tests in your newly created test suite is to:

  1. cd to the tests/ directory (e.g. using Run->CD to script directory)
  2. run
rktest.makeplugintests("MYTESTSUITE.R")

Oh dear! All your tests are failing? Don’t despair, read the next section.

Setting the test standards

If you try running your tests/testsuite, now, all your tests will be seen as failing. Don’t worry, though. This is to be expected. The test framework simply does not know which results to expect, yet, and so the actual results can’t match its expectations.

Find out where the test results have been stored by running rktest.getTempDir(). Take a look at the files in the subdirectory MYTESTSUITE in a file browser. For each test you will find the files

(If any of the above was empty, it will have been deleted, so you may not find each file for each test).

Take a close look at each one in turn and make sure they actually have the content that you expect the test to produce. If everything is correct, you can switch back to the R Console and enter

rktest.setSuiteStandards(suite)

This function is actually quite simple. All it does is to copy the test result-files to the test root (the current directory). If you want to adjust only a few select standards, later, you can also copy the files, manually.

Now run the test again. This time your tests should pass.

IMPORTANT: The step of setting the test standards is the one that requires the most attention of all steps. By setting the standards you really tell the test system that the result of the test run was correct. Please make extra sure this really is the case. Check and double-check everything (especially, but not only, when modifying existing tests).

Adding the test suite to the fully automated test run

So far your test-suite is not included in make plugintests. Including it is quite simple. Just add it to the testsuites-vector in tests/all_tests.R.

Adding the tests to SVN

The final step is to add your tests/testsuite to SVN. You will want to include the following files (and no others):

Be sure to review the output of svn diff carefully.

Of course, as usual, try to come up with a good description of what you did and why in the SVN commit message (or in your message to rkward-devel, if you do not have SVN access).

Further considerations

How much to put in a single testsuite

How many, and which tests should be bundled in a single testsuite? There really isn’t a standard for this, yet. Probably it makes sense to have one testsuite for one .pluginmap-file. However, if you are more familiar with a certain sub-set of tests in a .pluginmap, it’s perfectly OK to create a testsuite for only those.

How much to put in a single test

Generally each test will have exactly one call to rk.call.plugin(). That is the smallest test unit that really makes sense, and it seems a good idea to split the individual tests into pieces as small as possible, so it is easy to find out quickly just what went wrong in a failed test.

Of course one plugin can have more than one test, and in fact it would be a good thing, if most plugins did have several tests, covering different combinations of settings.

However: No test should depend on the outcome of another test. Each test should be valid on its own. (In fact it isn’t even easily possible to use the results of one test in another test). Therefore, if you want to test a sequence of actions, you will have to put that sequence into a single test.

Common pitfalls

Variable scopes

Consider a test like this:

new ("RKTest", id="t_test", call=function () {
   x <- c (1, 2, 3, 4)
   y <- c (2, 3, 4, 5)

   rk.call.plugin ("rkward::t_test_two_vars", x.available="x", y.available="y", submit.mode="submit")
})

This will not work as expected for two reasons. First, objects x and y are created in the environment of the test-function. The plugin code generated from rk.call.plugin(), however, will be run in .GlobalEnv. Secondly, RKWard usually does not need to care about object modifications until a command has finished, so at the time of rk.call.plugin(), RKWard will not know that x or y exist. You would need to rewrite the test as follows:

new ("RKTest", id="t_test", call=function () {
   .GlobalEnv$x <- c (1, 2, 3, 4)
   .GlobalEnv$y <- c (2, 3, 4, 5)
   rk.sync.global ()

   rk.call.plugin ("rkward::t_test_two_vars", x.available="x", y.available="y", submit.mode="submit")
})

Also, consider defining objects .GlobalEnv$x and .GlobalEnv$y in the initCalls of your testsuite. Perhaps they can be used in more than just one single test.
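The scoping part of this pitfall can be reproduced in plain R, without RKWard. In the following small sketch the names test_body, local_x and global_x are made up for the illustration. An object assigned inside a function stays in that function's environment, while an object assigned via .GlobalEnv$ is visible globally.

```r
# Stand-in for the function passed as "call" to a test
test_body <- function() {
    local_x <- c(1, 2, 3, 4)              # exists only inside this function
    .GlobalEnv$global_x <- c(1, 2, 3, 4)  # explicitly created in .GlobalEnv
    invisible(NULL)
}
test_body()

# Look only in .GlobalEnv itself, without searching enclosing environments
local_visible  <- exists("local_x", envir = globalenv(), inherits = FALSE)
global_visible <- exists("global_x", envir = globalenv(), inherits = FALSE)
```

Only global_x can be found in .GlobalEnv afterwards. Inside RKWard, the additional call to rk.sync.global() is still needed so that the application notices the new objects before the plugin is called.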

Running automated tests

You should run the tests early, and often. In particular, it is useful to run the tests before and after an upgrade of R. Also, of course, before a new release all tests should ideally pass on a variety of different systems.

Please run make plugintests often, and report any failures to rkward-devel.