Introduction
The unit test is the lowest common denominator. We can use them, but they are cumbersome and they only peek once into the fabric of our work. A better solution for many tasks, namely Property Based Testing exists for Erlang. This post is an example of how to use the statem
type of test, mainly because there are so few of these out there. The outset is this: We will randomly generate operations on a priority queue and then validate that these operations are correct according to a simpler model.
The setup was that Michael Truog had a nice library of (min) priority queues which he had written (see okeuday/pqueue) He was mostly interested in how they benchmarked, but before benchmarking, you need to have some kind of knowledge that your PrioQ is correct indeed. Otherwise, you may just return the value 42
on each output and claim that this is the "right" thing to do. So the system under test (the SUT) is the various priority queue implementations by Michael. In order to test them, we need a model of a priority queue though.
The model of a SUT is some code we write which can reflect the observations of the SUT. In the case of a priority queue, there is a very simple model we can choose. We can simply base it upon a naive priority queue. The idea is then, that our PrioQ is easy to write but it may not be particularly fast. If an error shows up, it means that the SUT and our model will disagree (hopefully) and we can figure out if the problem is with our Model or the SUT.
We are going to use PropEr. The choice could equally well have been Erlang QuickCheck though, with minor changes. Both tools reflect the same interface for this kind of work, even though their real working might be slightly different in practice.
Note: All code is in Michaels repository linked above. If you want to digest the code yourself, go read it there - after having read this blog post. I've pulled out the important parts, but eh real repo has a bit more code in it.
Building the model
The model in this case is fairly simple. When constructing the model you need to have an idea of what you are going to test for. The goal is to get the model as small as possible while still being able to handle everything you may need. On the other end of the spectrum, you build up a model that is a complete specification - which can in some cases actually mean your model will be larger than the SUT. But in this case the model is way smaller and simpler than the queue it tests.
For this particular example though, we are very lucky. Essentially our example is a complex implementation (implementations) of a simple idea. So our model goes for reflecting the simple idea instead.
Let us write a priority queue to serve as the model. Our representation is [{priority(), [element()]}]
that is a list of tuple-pairs where the firstelement is a priority and the second element is the list of values we are storing under that priority. We require that the list is sorted by the priorities. This makes lookup a bit faster.
Insertion is awfully simple. Just skip until we hit the right spot and inject the value:
%% A listq is a sorted list of priorities
listq_insert({P, V}, []) -> [{P, [V]}];
listq_insert({P, V}, [{P1, _} | _] = LQ) when P < P1 ->
[{P, [V]} | LQ];
listq_insert({P, V}, [{P1, Vs} | Next]) when P == P1 ->
[{P, Vs ++ [V]} | Next];
listq_insert({P, V}, [{P1, Vs} | Next]) when P > P1 ->
[{P1, Vs} | listq_insert({P, V}, Next)].
Note that if we get a hit on the priority, we just add the next value in the back. This is slow, but we don't care since we only need the model to carry out tests. Now, converting a prioq to a list is easy. So is taking the length of a priority queue in the model:
listq_to_list(L) ->
lists:concat(
[ Vals || {_Prio, Vals} <- L]).
listq_length(L) ->
lists:sum(
[ length(Vs) || {_Prio, Vs} <- L]).
There! Michael have two ways to remove elements from a priority queue. The first is the common solution where you remove the next element. Note that if the queue is empty then the result is an empty queue. Also note, we don't return the removed element. This semantics seem a little odd to me, but on the other hand, it is better to make the model reflect the expected behavior here:
listq_rem([]) -> [];
listq_rem([{_P, [_V]} | Next]) -> Next;
listq_rem([{P, [_V1 | Vs]} | Next]) -> [{P, Vs} | Next].
Michael also have a variant where we remove an element of a given priority. We can simply dig through the list until we find the desired priority and then remove an element from that one. If there are no more elements left of that priority, we kill the pair:
listq_rem([], _P) -> [];
listq_rem([{P, [_]} | Next], P) -> Next;
listq_rem([{P, [_ | Vs]} | Next], P) -> [{P, Vs} | Next];
listq_rem([{P1, Vs} | Next], P) -> [{P1, Vs} | listq_rem(Next, P)].
Finally, we need to be able to peek into the queue. There are three variants. We can peek for the minimal element. We can peek for the element with a given priority, or we can peek for an element but also obtain its priority in question:
listq_peek([]) -> empty;
listq_peek([{_P, [V | _]} | _]) -> {value, V}.
listq_prio_peek([{P, [V | _]} | _], P) -> {value, V};
listq_prio_peek([{_P1, _} | Next], P) -> listq_prio_peek(Next, P);
listq_prio_peek([], _P) -> empty.
listq_ppeek([]) -> empty;
listq_ppeek([{P, [V | _]} | _]) -> {value, V, P}.
Making a gen_server to drive the SUT.
Strictly speaking, this example does not need a properstatem
test. You could just build up the priority queue by using valid operations and then make tests. But to illustrate the use of statem
we have built a simple driver for a priority queue. This code is in okeuday/pqueue/src/queuesrv.erl and it is not that interesting. We just have a separate process which keeps track of the internal state of the priority queue for us. This means, that we can only observe the queue_srv
by means of what we prod it to do and what it answers with.
You can have a look at the code, but it is not that interesting. It merely reflects what operations are allowed on a priority queue.
Introducing the StateM behaviour
We need a bit of work in order to support the proper_statem
behaviour:
-module(pqueue_proper).
-include_lib("proper/include/proper.hrl").
-behaviour(proper_statem).
-export([command/1, initial_state/0, next_state/3, postcondition/3,
precondition/2]).
The next part is somewhat wrong, from a type perspective. I should probably change it. The state is tracked far too broadly here, but the code will still function properly.
-type value() :: integer().
-record(state, { in_queue :: [{value(), term()}] }).
-define(SERVER, queue_srv).
Now we are at the point where we need to handle the given callbacks from proper_statem
. The first one we will be attacking is the command/1
which states what commands are eligible for firing. But before doing that, we need some helpers:
priority() -> integer(-20, 20).
value() -> integer().
This is a generator of priorities and one of values. Michael decided that priorities should be in the range -20 to 20, so we reflect that in our generator. The values are always integers, but they have no bound on the other hand.
priority(InQ) -> elements([P || {P, _} <- InQ]).
Given a queue, we extract all the priorities from that queue. And then we use elements
to select one at random. This is a generator of existing priorites that are already in the queue. Then there is the initial state of the system, which is the empty priority queue:
initial_state() -> #state { in_queue = [] }.
Commands
Now we are ready to define the command function. It is keyed by the current state, so we can take that into consideration as well. It may be that the current state limits what commands are eligible for firing.
command(#state { in_queue = InQ }) ->
oneof([{call, ?SERVER, in, [value()]},
{call, ?SERVER, in, [value(), priority()]},
{call, ?SERVER, is_empty, []},
{call, ?SERVER, is_queue, []},
{call, ?SERVER, len, []},
{call, ?SERVER, out, []}] ++
[{call, ?SERVER, out, [priority(InQ)]} || InQ =/= []] ++
[{call, ?SERVER, pout, []},
{call, ?SERVER, to_list, []}]).
We generate calls to ?SERVER
which is our queuesrv
genserver construction. We use the already given generators to generate random values and priorities where applicable. Note the trick [priority(InQ) || InQ =/= []]
which is a degenerate list comprehension. The list is []
if InQ is empty. Otherwise it uses the priority/1
function from above to pick a random priority to remove among those already in the queue.
Updating our model
Now, assume we have run an operation on the SUT with the command/1
callback. Then our model needs to be updated with the same value as well. Otherwise we could not check for correctness. The function next_state(State, Ret, Call)
reflects this change. The three parameters taken are the current State of the model, the return value of the operation we invoked, and Call describes the command we executed.
Note: There is one very important thing with the next_state/3
function. It is used twice internally in PropEr/QuickCheck. First it is used in an abstract mode where all values are symbolic in nature. That is you can not rely on a return value being a "real" value. Rather you must only transfer values around without discriminating on them. Secondly, the function is used concretely when you are running the test cases. The notion used is symbolic and dynamic respectively. Just keep this in mind when you write your own tests. Since you may be executing this with symbolic values, it is limited how you can discriminate values as they may be symbolic in nature.
Here is the first clause:
next_state(#state { in_queue = InQ } = S, _V, {call, _, out, []}) ->
S#state { in_queue = listq_rem(InQ) };
This means we had a call to out/0
on the queue and we should remove an element from the queue. We can then call our own listq_rem/1
function to track this operation in our own model state. Note also that some of the clauses are not going to update the state since the calls generated does not alter the state. Also note that we sometimes look inside the call to figure out what values were generated as input so we can transition our model state accordingly:
next_state(#state { in_queue = InQ } = S, _V, {call, _, out, [Prio]}) ->
S#state { in_queue = listq_rem(InQ, Prio) };
next_state(#state { in_queue = InQ } = S, _V, {call, _, pout, _}) ->
S#state { in_queue = listq_rem(InQ) };
next_state(S, _V, {call, _, to_list, _}) -> S;
next_state(S, _V, {call, _, is_queue, _}) -> S;
next_state(S, _V, {call, _, is_empty, _}) -> S;
next_state(S, _V, {call, _, len, _}) -> S;
next_state(#state { in_queue = InQ } = S, _V,
{call, _, in, [Value, Prio]}) ->
S#state { in_queue = listq_insert({Prio, Value}, InQ) };
next_state(#state { in_queue = InQ } = S, _V,
{call, _, in, [Value]}) ->
S#state { in_queue = listq_insert({0, Value}, InQ) }.
Preconditions
This example does not use preconditions:
precondition(_S, _Call) -> true
Normally you use preconditions to limit what calls can be done in what state. That is, if the precondition fails, then the given transition is not allowed. One can use this as a constraining measure in some tests. But for this example, everything can always fire, so there is no reason to limit the calls in any way.
Postconditions
The postcondition is where we check that the model and SUT agrees on the observations. This is not executed symbolically, but is entirely dynamic execution. Here is the first clause:
postcondition(#state { in_queue = InQ }, {call, _, out, [Prio]}, R) ->
R == listq_prio_peek(InQ, Prio);
it states: If we have a current state, InQ
and a call to out
with a specific priority, Prio
and the SUT returned the value R. Then peeking in InQ
for the first element at that given priority should be the same element. After this test, the state transition with next_state/3
will happen. Equally:
postcondition(#state { in_queue = InQ }, {call, _, pout, _}, R) ->
R == listq_ppeek(InQ);
postcondition(#state { in_queue = InQ }, {call, _, out, _}, R) ->
R == listq_peek(InQ);
handles the remaining checks for removing elements from the priority queue. Converting to a list or taking the length can also be done by calling our model variants and looking for equality:
postcondition(S, {call, _, to_list, _}, R) ->
R == listq_to_list(S#state.in_queue);
postcondition(S, {call, _, len, _}, L) ->
L == listq_length(S#state.in_queue);
Finally, one can ask if we have a queue, which should always be true
- and we could ask if the queue is currently empty, which we can determine by discriminating on our model state. In this case they should also agree. Inserting elements into a queue always succeeds in addition:
postcondition(_S, {call, _, is_queue, _}, true) -> true;
postcondition(S, {call, _, is_empty, _}, Res) ->
Res == (S#state.in_queue == []);
postcondition(_S, {call, _, in, _}, _) ->
true;
All other outcomes are errors:
postcondition(_, _, _) ->
false.
Running the test
Now, running this property test is the following property:
correct(M) ->
?FORALL(Cmds, commands(?MODULE),
?TRAPEXIT(
begin
?SERVER:start_link(M),
{History,State,Result} = run_commands(?MODULE, Cmds),
?SERVER:stop(),
?WHENFAIL(
io:format("History: ~w\nState: ~w\nResult: ~w\n",
[History,State,Result]),
aggregate(
command_names(Cmds),
Result =:= ok))
end)).
Where M
describes the module we wish to test. Michael wrote several and we can test them all with the same code! Then we start a queue_srv
with M
as the module and then we run a series of commands. The server is then stopped. Then, upon failure, we output necessary stuff to figure out what went wrong. Also, we aggregate how often the commands are hti so we are sure we have a decent coverage.
The test uncovered some errors straight away. Here is one such which was much much bigger but got minimized down by PropEr:
[{set,{var,1},{call,queue_srv,in,[-18]}},
{set,{var,5},{call,queue_srv,in,[9]}},
{set,{var,6},{call,queue_srv,in,[-10,-4]}},
{set,{var,18},{call,queue_srv,in,[-29]}},
{set,{var,22},{call,queue_srv,in,[11]}},
{set,{var,26},{call,queue_srv,len,[]}}]
So, if you insert [-18, 9, -4 at priority -10, -29, 11] and then ask for the length of the priority queue, something failed (the pqueue code crashed).
The point here is that normal unit tests won't usually come up with examples this complex. But by starting out asking: "What can be done here?" by means of generators and then carrying out random operations uncovers nasty errors quickly. Michaels code now routinely survives 1500 PropEr test runs on all his priority queue modules. Given that it took 1-2 hours writing the test, it seems like it was worth it.
View comments