next up previous contents
Next: Performance note Up: Debugging approaches Previous: The debugger   Contents


The profiler

Tuning up of specifications is something that may be useful in many practical situations. We illustrate the use of the profiling facilities available in Maude to understand better the execution of our specifications, helping us in this way to make them more efficient. We will discuss the use of the profiler on two examples, namely, the specification of the Fibonacci function already discussed in Section 4.4.8 and the specification of sorted lists presented in Section 6.3.6.

First of all, it must be clear that there is no magic recipe on how to optimize our specifications. On the contrary, although there are some guidelines that we may try to follow when possible, it is not always the case that they work, or that they are applicable. For example, conditional rules are generally expensive from a computational point of view, as are membership axioms, but in some cases we may be interested in using proving tools for which using them could be a better alternative. Similarly, in Section 4.4.8 we saw that using the memo attribute was a big win in the case of the Fibonacci function, but it is not always applicable; for some specifications, the consumption of memory can become so big that we may be getting a slower specification. There is always a tradeoff between the speedup obtained using memoization and the amount of memory and the cost of handling it. We illustrate all these and other concerns in this section.

Profiling is switched on by the command set profile on. When profiling is switched on, a count of the number of executions of each statement (equation, membership, and rule) is kept. For unconditional statements, the profile information is just the number of rewrites using that statement. For conditional ones there is also the number of matches, since not every match leads to a rewrite, due to condition failure. Moreover, when searching there can be multiple rewrites for each match, since the condition may be solved in multiple ways. There is a table that for each condition fragment gives:

  1. the number of times the fragment was initially tested,
  2. the number of times the fragment was tested due to backtracking,
  3. the number of times the fragment succeeded, and
  4. the number of times the fragment failed.
Normally, (1) + (2) = (3) + (4).

Special rewrites such as built-in rewrites and memoized rewrites are also tracked, but these are associated with symbols rather than with statements. For conciseness, symbols with no special rewrites, and statements that are not matched are omitted. There are some limitations: metalevel rewrites are not displayed, due to the ephemeral nature of metamodules. In addition, condition fragments associated with a match or search command are not tracked (though any rewrites initiated by such a fragment are). If you turn profiling on or off in the debugger you may get inconsistent results.

The profile information is associated with each module and is usually cleared at the start of any command that can do rewrites, except continue. This behavior can be changed with the set clear profile on / off command.

Let us first consider the Fibonacci function described in Section 4.4.8.

  fmod FIBONACCI is
    protecting NAT .
    op fibo : Nat -> Nat .

    var N : Nat .
    eq fibo(0) = 0 .
    eq fibo(1) = 1 .
    eq fibo(s s N) = fibo(N) + fibo(s N) .
  endfm

Notice in the following reductions that the times given when the profile is active are slightly higher.

  Maude> red fibo(30) .
  reduce in FIBONACCI : fibo(30) .
  rewrites: 4038805 in 3920ms cpu (3960ms real) (1030201 rews/sec)
  result NzNat: 832040

  Maude> set profile on .

  Maude> red fibo(30) .
  reduce in FIBONACCI : fibo(30) .
  rewrites: 4038805 in 4170ms cpu (4194ms real) (968453 rews/sec)
  result NzNat: 832040

After doing the reduction with the profiler activated, we can request the collected information by means of the command show profile. In this example, since the module has no memberships or rules and there are no conditional axioms, the profiler gives the number of times each of the equations has been applied and also the number of times built-in functions are called.

  Maude> show profile .

  op _+_ : [Nat] [Nat] -> [Nat] .
  built-in eq rewrites: 1346268 (33.3333%)

  eq fibo(0) = 0 .
  rewrites: 514229 (12.7322%)

  eq fibo(1) = 1 .
  rewrites: 832040 (20.6011%)

  eq fibo(s_^2(N)) = fibo(N) + fibo(s N) .
  rewrites: 1346268 (33.3333%)

In this very simple example we observe that only the three equations in the FIBONACCI module plus the predefined addition operation on natural numbers have been used. We can also observe how the equations are applied a number of times relatively similar, with percentages 12, 20, and 33, respectively. More interesting is the number of times each of them is applied, which goes to 1346268 for the third equation. Taking into account that we reduced fibo(30), it means that the calculations have been repeated many times. As we saw in Section 4.4.8, this is a good place to use the memo attribute: calculations on small arguments are repeated many times and a small amount of memory is needed for storing the result of such calculations.

After adding the memo attribute to the fibo operator, we get the following results from the profiler:

  Maude> set profile on .

  Maude> red fibo(30) .
  reduce in FIBONACCI : fibo(30) .
  rewrites: 88 in 1ms cpu (1ms real) (88000 rews/sec)
  result NzNat: 832040

  Maude> show profile .
  op _+_ : [Nat] [Nat] -> [Nat] .
  built-in eq rewrites: 29 (32.9545%)

  op fibo : [Nat] -> [Nat] .
  memo rewrites: 28 (31.8182%)

  eq fibo(0) = 0 .
  rewrites: 1 (1.13636%)

  eq fibo(1) = 1 .
  rewrites: 1 (1.13636%)

  eq fibo(s_^2(N)) = fibo(N) + fibo(s N) .
  rewrites: 29 (32.9545%)

As we already saw in Section 4.4.8, the number of rewrites and the time consumed in the computation have decreased dramatically. We may observe that now each of the values in the Fibonacci sequence is calculated only once.

Let us consider now another example, namely, the parameterized module SORTED-LIST presented in Section 6.3.6, which defines a sort SortedList{X} of sorted lists as a subsort of the sort List{TOSET}{X} of lists. In this case we deal with a parameterized module which imports several other modules, and which has membership axioms and equations, some of which are conditional.

First of all, notice that by default the profiler provides information on a particular computation. In this example, it is not the same sorting a list in reverse order as an already sorted list, and is not the same using insertion sort, mergesort, or quicksort for sorting. To have a better insight about our specification, and thus gaining chances of improving it, we should consider several reductions, dealing with different cases, the different sorting algorithms in our case.

To be able to run examples on big lists, with numbers initially sorted in different ways, let us consider the following module NAT-LIST-GENERATOR, which imports the module SORTED-LIST{NatAsToset} defining sorted lists of natural numbers, and specifies functions nats-upto, that builds lists from zero to the specified value, and random-nats, which generates a list of the specified number of random numbers.

  fmod NAT-LIST-GENERATOR is
    protecting SORTED-LIST{NatAsToset} .
    protecting RANDOM .

    vars N M : Nat .

    op nats-upto : Nat -> NeSortedList{NatAsToset} .
    eq nats-upto(s N) = nats-upto(N) ++ s N : [] .
    eq nats-upto(0) = 0 : [] .

    op random-nats : Nat -> List{TOSET}{NatAsToset} .
    op random-nats : Nat Nat -> List{TOSET}{NatAsToset} .
    eq random-nats(N) = random-nats(0, N) .
    ceq random-nats(N, M) 
      = random(N) : random-nats(s N, M) 
      if N <= M .
    eq random-nats(N, M) = [] [owise] .
  endfm

We execute each one of the insertion-sort, mergesort, and quicksort algorithms on three lists, namely, a sorted list, a list in reverse order, and a random one, each of them with 1000 elements.

  Maude> red insertion-sort(nats-upto(1000)) .
  rewrites: 2012009 in 1032ms cpu (1079ms real) (1948029 rews/sec)
  result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...

  Maude> red insertion-sort(reverse(nats-upto(1000))) .
  rewrites: 5519515 in 3634ms cpu (3694ms real) (1518667 rews/sec)
  result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...

  Maude> red insertion-sort(random-nats(1000)) .
  rewrites: 1535372 in 1286ms cpu (1397ms real) (1193166 rews/sec)
  result NeSortedList{NatAsToset}: 23772 : 1738648 : 2016694 : ...

  Maude> red mergesort(nats-upto(1000)) .
  rewrites: 2082358 in 1079ms cpu (1134ms real) (1928402 rews/sec)
  result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...

  Maude> red mergesort(reverse(nats-upto(1000))) .
  rewrites: 2578581 in 1210ms cpu (1221ms real) (2129622 rews/sec)
  result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...

  Maude> red mergesort(random-nats(1000)) .
  rewrites: 88005 in 71ms cpu (77ms real) (1222478 rews/sec)
  result NeSortedList{NatAsToset}: 23772 : 1738648 : 2016694 : ...

  Maude> red quicksort(nats-upto(1000)) .
  rewrites: 6519514 in 5065ms cpu (5344ms real) (1287111 rews/sec)
  result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...

  Maude> red quicksort(reverse(nats-upto(1000))) .
  rewrites: 6528518 in 4096ms cpu (4130ms real) (1593729 rews/sec)
  result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...

  Maude> red quicksort(random-nats(1000)) .
  rewrites: 97858 in 75ms cpu (84ms real) (1287791 rews/sec)
  result NeSortedList{NatAsToset}: 23772 : 1738648 : 2016694 : ...

Instead of considering the profiling information separately, we use the set clear profile off command, so that the profiling information gets accumulated.

  Maude> set clear profile off .

  Maude> set profile on .

  Maude> red insertion-sort(nats-upto(1000)) .
  rewrites: 2012009 in 1169ms cpu (1351ms real) (1719927 rews/sec)
  result NeSortedList{NatAsToset}: 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : ...

  Maude> red insertion-sort(reverse(nats-upto(1000))) .
  ...

As mentioned above, with the profiler active, the times taken by the reductions are slightly higher. The number of rewrites and cpu time for each of the cases is presented in the tables displayed in Figure 13.1 (page [*]), where we also include these values for all the different executions discussed in the rest of the section.13.1The information shown by the profiler with the show profile command is three pages long; we just comment the most interesting pieces.

  1. Predefined operators _+_, _quo_, _<=_, _>_, and random are used an important number of times, in particular _<=_ and _>_, which are applied, respectively, 7599823 (28.2%) and 1770593 times (6.6%). Notice that _<=_ is only used in the conditions of one of the membership axioms and in some of the conditions of the equations for insert-list, merge, leq-elems, gr-elems, and random-nats; the operator _>_ is used in the conditions of the equations for insert-list, mergesort, merge, leq-elems, and gr-elems.

      op _+_ : [Nat] [Nat] -> [Nat] .
      built-in eq rewrites: 29961 (0.111124%)
    
      op _quo_ : [Nat] [Nat] -> [Nat] .
      built-in eq rewrites: 3000 (0.0111269%)
    
      op _<=_ : [Nat] [Nat] -> [Bool] .
      built-in eq rewrites: 7599823 (28.1874%)
    
      op _>_ : [Nat] [Nat] -> [Bool] .
      built-in eq rewrites: 1770593 (6.56706%)
    
      op random : [Nat] -> [Nat] .
      built-in eq rewrites: 3003 (0.011138%)
    

  2. From the numbers for the membership axioms we may conclude that they are applied a considerable number of times, in particular the conditional one--4790561 rewrites (17.8%)--. Note that for conditional membership axioms, as for all conditional axioms, the system gives information on the number of matches, that is, the number of times that the conditions are reduced. It also provides the number of times each one of the fragments of the condition is reduced. In the case of the membership axioms in this specification, there is only one fragment. The part of the output corresponding to the membership axioms is the following:

      mb [] : SortedList{NatAsToset} .
      rewrites: 22040 (0.0817455%)
    
      mb N : [] : NeSortedList{NatAsToset} .
      rewrites: 17505 (0.0649254%)
    
      cmb N : NEOL:NeSortedList{NatAsToset} : NeSortedList{NatAsToset} 
        if N <= head(NEOL:NeSortedList{NatAsToset}) = true .
      lhs matches: 4795972    rewrites: 4790561 (17.768%)
      Fragment   Initial tries   Resolve tries   Successes    Failures
      1          4795972         0               4790561      5411
    

    We see that the condition has been checked 4795972 times, out of which only 5411 failed.

  3. From the equations specifying the insertion-sort algorithm, the ones used more times are the two conditional ones for the insert-list function. From the information in the profile we see that these conditional equations have been attempted almost the same number of times, 756888 and 754895. We also see that both have been applied almost the same number of times, because the one that was attempted first almost always failed the evaluation of its condition, and then the second equation was applied.

      eq insertion-sort([]) = [] .
      rewrites: 3 (1.11269e-05%)
    
      eq insertion-sort(N : L:List{TOSET}{NatAsToset}) = insert-list(
        insertion-sort(L:List{TOSET}{NatAsToset}), N) .
      rewrites: 3003 (0.011138%)
    
      eq insert-list([], M) = M : [] .
      rewrites: 1010 (0.00374605%)
    
      ceq insert-list(N : OL:SortedList{NatAsToset}, M) = M : N : 
        OL:SortedList{NatAsToset} if M <= N = true .
      lhs matches: 756888     rewrites: 1993 (0.00739196%)
      Fragment   Initial tries   Resolve tries   Successes    Failures
      1          756888          0               1993         754895
    
      ceq insert-list(N : OL:SortedList{NatAsToset}, M) = N : 
        insert-list(OL:SortedList{NatAsToset}, M) if M > N = true .
      lhs matches: 754895     rewrites: 754895 (2.79988%)
      Fragment   Initial tries   Resolve tries   Successes    Failures
      1          754895          0               754895       0
    

  4. The information on the equations for mergesort presents a similar pattern, the main difference being that there is only one conditional equation, since merge is declared as commutative. In this case the number of rewrites of all the equations is relatively small, much smaller than the total of rewrites in the computation. This makes us think that the weight of the computation of the memberships and generation of the lists to sort is much higher that the sorting itself.

      eq mergesort(N : []) = N : [] .
      rewrites: 3003 (0.011138%)
    
      ceq mergesort(L:List{TOSET}{NatAsToset}) = merge(mergesort(take 
        length(L:List{TOSET}{NatAsToset}) quo 2 from L:List{TOSET}{
        NatAsToset}),mergesort(throw length(L:List{TOSET}{
        NatAsToset}) quo 2 from L:List{TOSET}{NatAsToset})) if 
        length(L:List{TOSET}{NatAsToset}) > 1 = true .
      lhs matches: 3000       rewrites: 3000 (0.0111269%)
      Fragment   Initial tries   Resolve tries   Successes    Failures
      1          3000            0               3000         0
    
      eq merge([], OL:SortedList{NatAsToset}) = OL:SortedList{
        NatAsToset} .
      rewrites: 3000 (0.0111269%)
    
      ceq merge(N : OL:SortedList{NatAsToset}, M : OL':SortedList{
        NatAsToset}) = N :merge(OL:SortedList{NatAsToset}, M : 
        OL':SortedList{NatAsToset}) if N <= M = true .
      lhs matches: 18705      rewrites: 18705 (0.0693761%)
      Fragment   Initial tries   Resolve tries   Successes    Failures
      1          18705           0               18705        0
    

  5. The information for the quicksort algorithm follows a similar pattern as well. However, in this case it is interesting to notice that the equations for the leq-elems and gr-elems operations are attempted the same number of times, and for each of these operations the condition (say, N <= M) of one of the equations fails in around half of the cases, being then used the other equation (with condition, say, N > M.

      eq quicksort([]) = [] .
      rewrites: 3006 (0.0111491%)
    
      eq quicksort(N : L:List{TOSET}{NatAsToset}) = quicksort(
        leq-elems(L:List{TOSET}{NatAsToset}, N)) ++ N : quicksort(
        gr-elems(L:List{TOSET}{NatAsToset}, N)) .
      rewrites: 3003 (0.011138%)
    
      eq leq-elems([], M) = [] .
      rewrites: 3003 (0.011138%)
    
      ceq leq-elems(N : L:List{TOSET}{NatAsToset}, M) = N : 
        leq-elems(L:List{TOSET}{NatAsToset}, M) if N <= M = true .
      lhs matches: 1012626    rewrites: 506277 (1.87776%)
      Fragment   Initial tries   Resolve tries   Successes    Failures
      1          1012626         0               506277       506349
    
      ceq leq-elems(N : L:List{TOSET}{NatAsToset}, M) = leq-elems(
        L:List{TOSET}{NatAsToset}, M) if N > M = true .
      lhs matches: 506349     rewrites: 506349 (1.87803%)
      Fragment   Initial tries   Resolve tries   Successes    Failures
      1          506349          0               506349       0
    
      eq gr-elems([], M) = [] .
      rewrites: 3003 (0.011138%)
    
      ceq gr-elems(N : L:List{TOSET}{NatAsToset}, M) = gr-elems(
        L:List{TOSET}{NatAsToset}, M) if N <= M = true .
      lhs matches: 1012626    rewrites: 506277 (1.87776%)
      Fragment   Initial tries   Resolve tries   Successes    Failures
      1          1012626         0               506277       506349
    
      ceq gr-elems(N : L:List{TOSET}{NatAsToset}, M) = N : gr-elems(
        L:List{TOSET}{NatAsToset}, M) if N > M = true .
      lhs matches: 506349     rewrites: 506349 (1.87803%)
      Fragment   Initial tries   Resolve tries   Successes    Failures
      1          506349          0               506349       0
    

  6. From the rest of the equations applied we may highlight those for the head and _++_ operations.

      eq head(E:Nat : L:List{TOSET}{NatAsToset}) = E:Nat .
      rewrites: 4795972 (17.7881%)
    
      eq [] ++ L:List{TOSET}{NatAsToset} = L:List{TOSET}{
        NatAsToset} .
      rewrites: 12006 (0.0445298%)
    
      eq (E:Nat : L:List{TOSET}{NatAsToset}) ++ L':List{TOSET}{
        NatAsToset} =E:Nat : (L:List{TOSET}{NatAsToset} ++ L':List{
        Toset}{NatAsToset}) .
      rewrites: 5010777 (18.5848%)
    

    The equation for the head function is used in the evaluation of the condition of the membership axiom. The concatenation operator is used in the quicksort, nats-upto, and reverse functions.

Taking all the information provided by the profiler into account, we may think of doing different types of modifications to the original specification.

There is still room for improvement in this specification. For instance, some operations on lists can be made more efficient by means of tail-recursive definitions with accumulator arguments, in the style of the definitions shown in Section 7.12.1.


next up previous contents
Next: Performance note Up: Debugging approaches Previous: The debugger   Contents
The Maude Team