First of all, it must be clear that there is no magic recipe on how to optimize our specifications. On the contrary, although there are some guidelines that we may try to follow when possible, it is not always the case that they work, or that they are applicable. For example, conditional rules are generally expensive from a computational point of view, as are membership axioms, but in some cases we may be interested in using proving tools for which using them could be a better alternative. Similarly, in Section 4.4.8 we saw that using the memo attribute was a big win in the case of the Fibonacci function, but it is not always applicable; for some specifications, the consumption of memory can become so big that we may be getting a slower specification. There is always a tradeoff between the speedup obtained using memoization and the amount of memory and the cost of handling it. We illustrate all these and other concerns in this section.
Profiling is switched on by the command set profile on. When profiling is switched on, a count of the number of executions of each statement (equation, membership, and rule) is kept. For unconditional statements, the profile information is just the number of rewrites using that statement. For conditional ones there is also the number of matches, since not every match leads to a rewrite, due to condition failure. Moreover, when searching there can be multiple rewrites for each match, since the condition may be solved in multiple ways. There is a table that for each condition fragment gives:
Special rewrites such as built-in rewrites and memoized rewrites are also tracked, but these are associated with symbols rather than with statements. For conciseness, symbols with no special rewrites, and statements that are not matched are omitted. There are some limitations: metalevel rewrites are not displayed, due to the ephemeral nature of metamodules. In addition, condition fragments associated with a match or search command are not tracked (though any rewrites initiated by such a fragment are). If you turn profiling on or off in the debugger you may get inconsistent results.
The profile information is associated with each module and is usually cleared at the start of any command that can do rewrites, except continue. This behavior can be changed with the set clear profile on / off command.
Let us first consider the Fibonacci function described in Section 4.4.8.
fmod FIBONACCI is
protecting NAT .
op fibo : Nat -> Nat .
var N : Nat .
eq fibo(0) = 0 .
eq fibo(1) = 1 .
eq fibo(s s N) = fibo(N) + fibo(s N) .
endfm
Notice in the following reductions that the times given when the profile is active are slightly higher.
Maude> red fibo(30) . reduce in FIBONACCI : fibo(30) . rewrites: 4038805 in 3920ms cpu (3960ms real) (1030201 rews/sec) result NzNat: 832040
Maude> set profile on .
Maude> red fibo(30) . reduce in FIBONACCI : fibo(30) . rewrites: 4038805 in 4170ms cpu (4194ms real) (968453 rews/sec) result NzNat: 832040
After doing the reduction with the profiler activated, we can request the collected information by means of the command show profile. In this example, since the module has no memberships or rules and there are no conditional axioms, the profiler gives the number of times each of the equations has been applied and also the number of times built-in functions are called.
Maude> show profile . op _+_ : [Nat] [Nat] -> [Nat] . built-in eq rewrites: 1346268 (33.3333%) eq fibo(0) = 0 . rewrites: 514229 (12.7322%) eq fibo(1) = 1 . rewrites: 832040 (20.6011%) eq fibo(s_^2(N)) = fibo(N) + fibo(s N) . rewrites: 1346268 (33.3333%)
In this very simple example we observe that only the three equations in
the FIBONACCI module plus the predefined addition operation on natural
numbers have been used. We can also observe how the equations are applied
a number of times relatively similar, with percentages 12, 20, and 33,
respectively. More interesting is the number of times each of them is
applied, which goes to 1346268 for the third equation. Taking into account
that we reduced fibo(30), it means that the calculations have been
repeated many times. As we saw in Section 4.4.8, this is a good place
to use the memo attribute: calculations on small arguments are
repeated many times and a small amount of memory is needed for storing the
result of such calculations.
After adding the memo attribute to the fibo operator, we get the following results from the profiler:
Maude> set profile on . Maude> red fibo(30) . reduce in FIBONACCI : fibo(30) . rewrites: 88 in 1ms cpu (1ms real) (88000 rews/sec) result NzNat: 832040 Maude> show profile . op _+_ : [Nat] [Nat] -> [Nat] . built-in eq rewrites: 29 (32.9545%) op fibo : [Nat] -> [Nat] . memo rewrites: 28 (31.8182%) eq fibo(0) = 0 . rewrites: 1 (1.13636%) eq fibo(1) = 1 . rewrites: 1 (1.13636%) eq fibo(s_^2(N)) = fibo(N) + fibo(s N) . rewrites: 29 (32.9545%)
As we already saw in Section 4.4.8, the number of rewrites and the time consumed in the computation have decreased dramatically. We may observe that now each of the values in the Fibonacci sequence is calculated only once.
Let us consider now another example, namely, the parameterized
module SORTED-LIST presented in Section 6.3.6,
which defines a sort SortedList{X} of sorted lists as a subsort of
the sort List{TOSET}{X} of lists. In this case we deal with a
parameterized module which imports several other modules, and which has
membership axioms and equations, some of which are conditional.
First of all, notice that by default the profiler provides information on a particular computation. In this example, it is not the same sorting a list in reverse order as an already sorted list, and is not the same using insertion sort, mergesort, or quicksort for sorting. To have a better insight about our specification, and thus gaining chances of improving it, we should consider several reductions, dealing with different cases, the different sorting algorithms in our case.
To be able to run examples on big lists, with numbers initially sorted in
different ways, let us consider the following module
NAT-LIST-GENERATOR, which imports the module
SORTED-LIST{NatAsToset} defining sorted lists of natural numbers,
and specifies functions nats-upto, that builds lists from zero to
the specified value, and random-nats, which generates a list of
the specified number of random numbers.
fmod NAT-LIST-GENERATOR is
protecting SORTED-LIST{NatAsToset} .
protecting RANDOM .
vars N M : Nat .
op nats-upto : Nat -> NeSortedList{NatAsToset} .
eq nats-upto(s N) = nats-upto(N) ++ s N : [] .
eq nats-upto(0) = 0 : [] .
op random-nats : Nat -> List{TOSET}{NatAsToset} .
op random-nats : Nat Nat -> List{TOSET}{NatAsToset} .
eq random-nats(N) = random-nats(0, N) .
ceq random-nats(N, M)
= random(N) : random-nats(s N, M)
if N <= M .
eq random-nats(N, M) = [] [owise] .
endfm
We execute each one of the insertion-sort, mergesort, and quicksort algorithms on three lists, namely, a sorted list, a list in reverse order, and a random one, each of them with 1000 elements.
Maude> red insertion-sort(nats-upto(1000)) .
rewrites: 2012009 in 1032ms cpu (1079ms real) (1948029 rews/sec)
result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...
Maude> red insertion-sort(reverse(nats-upto(1000))) .
rewrites: 5519515 in 3634ms cpu (3694ms real) (1518667 rews/sec)
result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...
Maude> red insertion-sort(random-nats(1000)) .
rewrites: 1535372 in 1286ms cpu (1397ms real) (1193166 rews/sec)
result NeSortedList{NatAsToset}: 23772 : 1738648 : 2016694 : ...
Maude> red mergesort(nats-upto(1000)) .
rewrites: 2082358 in 1079ms cpu (1134ms real) (1928402 rews/sec)
result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...
Maude> red mergesort(reverse(nats-upto(1000))) .
rewrites: 2578581 in 1210ms cpu (1221ms real) (2129622 rews/sec)
result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...
Maude> red mergesort(random-nats(1000)) .
rewrites: 88005 in 71ms cpu (77ms real) (1222478 rews/sec)
result NeSortedList{NatAsToset}: 23772 : 1738648 : 2016694 : ...
Maude> red quicksort(nats-upto(1000)) .
rewrites: 6519514 in 5065ms cpu (5344ms real) (1287111 rews/sec)
result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...
Maude> red quicksort(reverse(nats-upto(1000))) .
rewrites: 6528518 in 4096ms cpu (4130ms real) (1593729 rews/sec)
result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...
Maude> red quicksort(random-nats(1000)) .
rewrites: 97858 in 75ms cpu (84ms real) (1287791 rews/sec)
result NeSortedList{NatAsToset}: 23772 : 1738648 : 2016694 : ...
Instead of considering the profiling information separately, we use the
set clear profile off command, so that the profiling information gets
accumulated.
Maude> set clear profile off .
Maude> set profile on .
Maude> red insertion-sort(nats-upto(1000)) .
rewrites: 2012009 in 1169ms cpu (1351ms real) (1719927 rews/sec)
result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...
Maude> red insertion-sort(reverse(nats-upto(1000))) .
...
As mentioned above, with the profiler active, the times taken by the
reductions are slightly higher. The number of rewrites and cpu time for
each of the cases is presented in the tables displayed in
Figure 13.1 (page
), where we
also include these values for all the different executions discussed in
the rest of the section.13.1The information shown by the profiler with the show profile
command is three pages long; we just comment the most interesting pieces.
_+_, _quo_, _<=_, _>_, and
random are used an important number of times, in particular _<=_
and _>_, which are applied, respectively, 7599823
(28.2%) and 1770593 times (6.6%). Notice that _<=_ is
only used in the conditions of one of the membership axioms and in some of the
conditions of the equations for insert-list, merge,
leq-elems, gr-elems, and random-nats;
the operator _>_ is used in the conditions of the equations for
insert-list, mergesort,
merge, leq-elems, and gr-elems.
op _+_ : [Nat] [Nat] -> [Nat] . built-in eq rewrites: 29961 (0.111124%) op _quo_ : [Nat] [Nat] -> [Nat] . built-in eq rewrites: 3000 (0.0111269%) op _<=_ : [Nat] [Nat] -> [Bool] . built-in eq rewrites: 7599823 (28.1874%) op _>_ : [Nat] [Nat] -> [Bool] . built-in eq rewrites: 1770593 (6.56706%) op random : [Nat] -> [Nat] . built-in eq rewrites: 3003 (0.011138%)
mb [] : SortedList{NatAsToset} .
rewrites: 22040 (0.0817455%)
mb N : [] : NeSortedList{NatAsToset} .
rewrites: 17505 (0.0649254%)
cmb N : NEOL:NeSortedList{NatAsToset} : NeSortedList{NatAsToset}
if N <= head(NEOL:NeSortedList{NatAsToset}) = true .
lhs matches: 4795972 rewrites: 4790561 (17.768%)
Fragment Initial tries Resolve tries Successes Failures
1 4795972 0 4790561 5411
We see that the condition has been checked 4795972 times, out of which only 5411 failed.
insert-list function. From the information in the profile we see
that these conditional equations have been attempted almost the same
number of times, 756888 and 754895. We also see that both have been
applied almost the same number of times, because the one that was
attempted first almost always failed the evaluation of its condition, and
then the second equation was applied.
eq insertion-sort([]) = [] .
rewrites: 3 (1.11269e-05%)
eq insertion-sort(N : L:List{TOSET}{NatAsToset}) = insert-list(
insertion-sort(L:List{TOSET}{NatAsToset}), N) .
rewrites: 3003 (0.011138%)
eq insert-list([], M) = M : [] .
rewrites: 1010 (0.00374605%)
ceq insert-list(N : OL:SortedList{NatAsToset}, M) = M : N :
OL:SortedList{NatAsToset} if M <= N = true .
lhs matches: 756888 rewrites: 1993 (0.00739196%)
Fragment Initial tries Resolve tries Successes Failures
1 756888 0 1993 754895
ceq insert-list(N : OL:SortedList{NatAsToset}, M) = N :
insert-list(OL:SortedList{NatAsToset}, M) if M > N = true .
lhs matches: 754895 rewrites: 754895 (2.79988%)
Fragment Initial tries Resolve tries Successes Failures
1 754895 0 754895 0
eq mergesort(N : []) = N : [] .
rewrites: 3003 (0.011138%)
ceq mergesort(L:List{TOSET}{NatAsToset}) = merge(mergesort(take
length(L:List{TOSET}{NatAsToset}) quo 2 from L:List{TOSET}{
NatAsToset}),mergesort(throw length(L:List{TOSET}{
NatAsToset}) quo 2 from L:List{TOSET}{NatAsToset})) if
length(L:List{TOSET}{NatAsToset}) > 1 = true .
lhs matches: 3000 rewrites: 3000 (0.0111269%)
Fragment Initial tries Resolve tries Successes Failures
1 3000 0 3000 0
eq merge([], OL:SortedList{NatAsToset}) = OL:SortedList{
NatAsToset} .
rewrites: 3000 (0.0111269%)
ceq merge(N : OL:SortedList{NatAsToset}, M : OL':SortedList{
NatAsToset}) = N :merge(OL:SortedList{NatAsToset}, M :
OL':SortedList{NatAsToset}) if N <= M = true .
lhs matches: 18705 rewrites: 18705 (0.0693761%)
Fragment Initial tries Resolve tries Successes Failures
1 18705 0 18705 0
leq-elems and gr-elems operations are attempted the same
number of times, and for each of these operations the condition (say, N <= M)
of one of the equations fails in around half of the cases, being then used the other
equation (with condition, say, N > M.
eq quicksort([]) = [] .
rewrites: 3006 (0.0111491%)
eq quicksort(N : L:List{TOSET}{NatAsToset}) = quicksort(
leq-elems(L:List{TOSET}{NatAsToset}, N)) ++ N : quicksort(
gr-elems(L:List{TOSET}{NatAsToset}, N)) .
rewrites: 3003 (0.011138%)
eq leq-elems([], M) = [] .
rewrites: 3003 (0.011138%)
ceq leq-elems(N : L:List{TOSET}{NatAsToset}, M) = N :
leq-elems(L:List{TOSET}{NatAsToset}, M) if N <= M = true .
lhs matches: 1012626 rewrites: 506277 (1.87776%)
Fragment Initial tries Resolve tries Successes Failures
1 1012626 0 506277 506349
ceq leq-elems(N : L:List{TOSET}{NatAsToset}, M) = leq-elems(
L:List{TOSET}{NatAsToset}, M) if N > M = true .
lhs matches: 506349 rewrites: 506349 (1.87803%)
Fragment Initial tries Resolve tries Successes Failures
1 506349 0 506349 0
eq gr-elems([], M) = [] .
rewrites: 3003 (0.011138%)
ceq gr-elems(N : L:List{TOSET}{NatAsToset}, M) = gr-elems(
L:List{TOSET}{NatAsToset}, M) if N <= M = true .
lhs matches: 1012626 rewrites: 506277 (1.87776%)
Fragment Initial tries Resolve tries Successes Failures
1 1012626 0 506277 506349
ceq gr-elems(N : L:List{TOSET}{NatAsToset}, M) = N : gr-elems(
L:List{TOSET}{NatAsToset}, M) if N > M = true .
lhs matches: 506349 rewrites: 506349 (1.87803%)
Fragment Initial tries Resolve tries Successes Failures
1 506349 0 506349 0
head and _++_ operations.
eq head(E:Nat : L:List{TOSET}{NatAsToset}) = E:Nat .
rewrites: 4795972 (17.7881%)
eq [] ++ L:List{TOSET}{NatAsToset} = L:List{TOSET}{
NatAsToset} .
rewrites: 12006 (0.0445298%)
eq (E:Nat : L:List{TOSET}{NatAsToset}) ++ L':List{TOSET}{
NatAsToset} =E:Nat : (L:List{TOSET}{NatAsToset} ++ L':List{
Toset}{NatAsToset}) .
rewrites: 5010777 (18.5848%)
The equation for the head function is used in the
evaluation of the condition of the membership axiom. The concatenation
operator is used in the quicksort, nats-upto, and
reverse functions.
Taking all the information provided by the profiler into account, we may think of doing different types of modifications to the original specification.
Let us, in any case, add the memo attribute, e.g., to the head operator; the result for one of the reductions above is the following:
Maude> red insertion-sort(random-nats(1000)) .
rewrites: 1535372 in 18500ms cpu (19221ms real) (82992 rews/sec)
result NeSortedList{NatAsToset}: 23772 : 1738648 : 2016694 : ...
That is, it goes from 1286 milliseconds cpu time to 18500. See Figure 13.1 for the rest of the values.
insertion-sort as
one single unconditional equation:
eq insert-list(N : OL, M)
= if M <= N
then M : N : OL
else N : insert-list(OL, M)
fi .
The result for the same reduction is the following:
Maude> reduce insertion-sort(random-nats(1000)) .
rewrites: 1536365 in 638ms cpu (704ms real) (2404692 rews/sec)
result NeSortedList{NatAsToset}: 23772 : 1738648 : 2016694 : ...
Although the number of rewrites increases slightly--from 1535372 to 1536365--, the amount of cpu time has dropped to a half--from 1286 ms. to 638 ms.
eq mergesort(L)
= merge(mergesort(take (length(L) quo 2) from L),
mergesort(throw (length(L) quo 2) from L))
[owise] .
Although this clearly improves the equation, the win is not very significant. To test it, we run the mergesort algorithm on a randomly generated list.
Maude> red mergesort(random-nats(1000)) .
rewrites: 87005 in 68ms cpu (110ms real) (1261124 rews/sec)
result NeSortedList{NatAsToset}: 237728 : 17386481 : ...
The number of rewrites goes from 88005 to 87005 and the cpu time goes from 71 ms. to 68 ms.
op quicksort : List{TOSET}{X} -> SortedList{X} .
op leq-gr-elems : List{TOSET}{X} X$Elt -> LGResult .
op leq-gr-elems : List{TOSET}{X} List{TOSET}{X} List{TOSET}{X}
X$Elt -> LGResult .
sort LGResult .
op {_,_} : List{TOSET}{X} List{TOSET}{X} -> LGResult .
eq quicksort([]) = [] .
ceq quicksort(N : L)
= quicksort(L') ++ (N : quicksort(L''))
if {L', L''} := leq-gr-elems(L, N).
eq leq-gr-elems(L, M) = leq-gr-elems(L, [], [], M) .
eq leq-gr-elems([], L, L', M) = {L, L'} .
eq leq-gr-elems(N : L, L', L'', M)
= if N <= M
then leq-gr-elems(L, N : L', L'', M)
else leq-gr-elems(L, L', N : L'', M)
fi .
The execution of the quicksort function on a list of randomly generated numbers takes now 69612 rewrites (against 97858) and does the reduction in 31 milliseconds (around 75 before).
Maude> red quicksort(random-nats(1000)) .
reduce in SORTED-LIST-TEST : quicksort(random-nats(1000)) .
rewrites: 69612 in 31ms cpu (72ms real) (2175714 rews/sec)
result NeSortedList{NatAsToset}: 237728 : 17386481 : ...
There is still room for improvement in this specification. For instance, some operations on lists can be made more efficient by means of tail-recursive definitions with accumulator arguments, in the style of the definitions shown in Section 7.12.1.