Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules

                 Reinhold P. Weicker
                 Siemens AG, E STE 35
                 Postfach 3240
                 D-8520 Erlangen
                 Germany (West)




The Dhrystone benchmark program [1] has become a popular benchmark  for
CPU/compiler  performance  measurement,  in  particular  in the area of
minicomputers, workstations, PCs and microprocessors. It apparently
satisfies a need for an easy-to-use integer benchmark; it gives a first
performance indication which  is  more  meaningful  than  MIPS  numbers
which,  in  their  literal  meaning  (million instructions per second),
cannot be used across different instruction sets (e.g. RISC vs.  CISC).
With the increasing use of the benchmark, it seems necessary to
reconsider the benchmark and to check whether it can still fulfill this
function. Version 2 of Dhrystone is the result of such a
re-evaluation; it has been made for two reasons:

o Dhrystone has been published in Ada [1], and versions in Ada, Pascal
  and C have been distributed by Reinhold Weicker via floppy disk.
  However, the version that was used most often for benchmarking has
  been the version made by Rick Richardson by another translation from
  the Ada version into the C programming language; this has been the
  version distributed via the UNIX network Usenet [2].

  There is an obvious need for a common C version of Dhrystone, since C
  is  at  present  the most popular system programming language for the
  class of systems (microcomputers, minicomputers, workstations)  where
  Dhrystone  is  used  most.  There should be, as far as possible, only
  one C version of Dhrystone such that results can be compared  without
  restrictions.  In  the  past,  the  C  versions  distributed  by Rick
  Richardson (Version 1.1) and by Reinhold Weicker  had  small  (though
  not significant) differences.

  Together with the new C version, the Ada  and  Pascal  versions  have
  been updated as well.

o As far as it is possible without changes to the Dhrystone statistics,
  optimizing  compilers  should  be prevented from removing significant
  statements.  It has turned out in the past that optimizing  compilers
  suppressed  code  generation  for  too many statements (by "dead code
  removal" or "dead variable elimination"). This has led to the
  danger  that  benchmarking results obtained by a naive application of
  Dhrystone - without inspection of the code that was generated - could
  become meaningless.

The overall policy for version 2 has been that the distribution of
statements,  operand types and operand locality described in [1] should
remain  unchanged  as  much  as  possible.   (Very  few  changes   were
necessary;  their  impact  should  be  negligible.)  Also, the order of
statements should  remain  unchanged.  Although  I  am  aware  of  some
critical  remarks on the benchmark - I agree with several of them - and
know some suggestions for improvement, I  didn't  want  to  change  the
benchmark  into  something  different  from  what  has  become known as
"Dhrystone"; the confusion generated by such a  change  would  probably
outweigh the benefits. If I were to write a new benchmark program, I
wouldn't give it the name "Dhrystone" since this  denotes  the  program
published in [1].  However, I do recognize the need for a larger number
of representative programs that can be used as benchmarks; users should
always be encouraged to use more than just one benchmark.

The  new  versions  (version  2.1  for  C,  Pascal  and  Ada)  will  be
distributed  as  widely as possible.  (Version 2.1 differs from version
2.0 distributed via the UNIX Network Usenet in March 1988 only in a few
corrections  for  minor  deficiencies  found  by users of version 2.0.)
Readers who want to use the benchmark for their  own  measurements  can
obtain  a copy in machine-readable form on floppy disk (MS-DOS or XENIX
format) from the author.


In general, version 2 follows - in the parts that are  significant  for
performance  measurement,  i.e.   within  the  measurement  loop  - the
published (Ada) version and  the  C  versions  previously  distributed.
Where  the  versions  distributed  by  Rick Richardson [2] and Reinhold
Weicker have been different, it  follows  the  version  distributed  by
Reinhold  Weicker.  (However,  the  differences have been so small that
their impact on execution time in all likelihood has been  negligible.)
The  initialization  and  UNIX  instrumentation  part  - which had been
omitted in [1] - follows mostly  the  ideas  of  Rick  Richardson  [2].
However,  any changes in the initialization part and in the printing of
the result have no impact on performance  measurement  since  they  are
outside the measurement loop. As a concession to older compilers,
names have been made unique within the first 8  characters  for  the  C
version.

The original publication of Dhrystone did not  contain  any  statements
for  time  measurement  since  they  are  necessarily system-dependent.
However, it turned out that it is not enough just to enclose the main
procedure of Dhrystone in a loop and to measure the execution time.  If
the variables that are computed are not  used  somehow,  there  is  the
danger  that  the  compiler  considers  them  as  "dead  variables" and
suppresses code generation for a part of the statements.  Therefore  in
version  2  all  variables  of  "main"  are  printed  at the end of the
program. This also permits a plausibility check for correct
execution of the benchmark.
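As an illustration, a minimal C sketch of this arrangement follows. The
structure main_locals, the functions loop_body and run_and_report, and
the values are hypothetical simplifications, not taken from the
benchmark source; the point is only that every variable computed in the
loop is printed after it.

```c
#include <stdio.h>

/* Hypothetical stand-in for some of the local variables of "main". */
struct main_locals {
    int int_1_loc;
    int int_2_loc;
    int int_3_loc;
};

/* Hypothetical, much simplified loop body; the real benchmark calls
   its Proc_n and Func_n subprograms here. */
static void loop_body(struct main_locals *l)
{
    l->int_1_loc = 2;
    l->int_2_loc = 3;
    l->int_3_loc = 5 * l->int_1_loc - l->int_2_loc;
}

/* Runs the measurement loop, then prints every computed variable.
   The printout gives the values a use after the loop, so they are not
   "dead variables", and it allows a plausibility check. It does not by
   itself stop a compiler from moving loop-invariant code out of the
   loop. */
struct main_locals run_and_report(long number_of_runs)
{
    struct main_locals l = {0, 0, 0};
    long run_index;

    for (run_index = 1; run_index <= number_of_runs; ++run_index)
        loop_body(&l);

    printf("int_1_loc: %d (expected 2 in this sketch)\n", l.int_1_loc);
    printf("int_2_loc: %d (expected 3 in this sketch)\n", l.int_2_loc);
    printf("int_3_loc: %d (expected 7 in this sketch)\n", l.int_3_loc);
    return l;
}
```

The caveat in the last comment is why version 2 also adds code in
branches that are not executed.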

At several places in the benchmark, code has been added,  but  only  in
branches  that  are  not  executed.  The  intention  is that optimizing
compilers should be prevented from moving code out of  the  measurement
loop,  or  from  removing code altogether. Statements that are executed
have been changed in very few places only.  In these  cases,  only  the
role  of  some operands has been changed, and it was made sure that the
numbers  defining  the  "Dhrystone   distribution"   (distribution   of
statements, operand types and locality) still hold as much as possible.
Except for sophisticated  optimizing  compilers,  execution  times  for
version 2.1 should be the same as for previous versions.

Because of the self-imposed limitation that the order and  distribution
of the executed statements should not be changed, there are still cases
where optimizing compilers may not generate code for  some  statements.
To   a   certain  degree,  this  is  unavoidable  for  small  synthetic
benchmarks. Users of the benchmark are advised to check code listings
to verify that code is generated for all statements of Dhrystone.

Contrary to the suggestion in the published paper and  its  realization
in  the  versions  previously  distributed, no attempt has been made to
subtract the time for the measurement loop overhead. (This  calculation
has  proven  difficult  to implement in a correct way, and its omission
makes the program simpler.) However, since the loop check is  now  part
of  the benchmark, this does have an impact - though a very minor one -
on the  distribution  statistics  which  have  been  updated  for  this
version.
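Without an overhead correction, the computation of the results is
straightforward. The sketch below, with the hypothetical names
time_runs and dhry_rates, times the loop with the standard C function
clock() (processor time, used here as an approximation of user time)
and derives microseconds per run and Dhrystones per second. It is an
illustration under these assumptions, not the instrumentation of the
distributed sources.

```c
#include <time.h>

struct dhry_result {
    double microseconds_per_run;
    double dhrystones_per_second;
};

/* Times number_of_runs calls of body with clock(). The loop itself,
   including its check, is part of the measured time. */
double time_runs(void (*body)(void), long number_of_runs)
{
    clock_t begin = clock();
    long run_index;

    for (run_index = 1; run_index <= number_of_runs; ++run_index)
        body();
    return (double) (clock() - begin) / CLOCKS_PER_SEC;
}

/* Derives the reported figures from the raw measurement. Nothing is
   subtracted for loop overhead. Precondition: user_seconds > 0 and
   large compared with the clock resolution. */
struct dhry_result dhry_rates(long number_of_runs, double user_seconds)
{
    struct dhry_result r;

    r.microseconds_per_run = user_seconds * 1.0e6 / (double) number_of_runs;
    r.dhrystones_per_second = (double) number_of_runs / user_seconds;
    return r;
}
```

For example, 1000000 runs measured at 2.0 seconds give 2.0
microseconds per run and 500000 Dhrystones per second.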


In this section, all changes are described that affect the  measurement
loop and that are not just renamings of variables. All remarks refer to
the C version; the other language versions have been updated similarly.

In addition to adding the measurement loop and the printout statements,
changes have been made at the following places:

o In procedure "main", three statements have been  added  in  the  non-
  executed "then" part of the statement
    if (Enum_Loc == Func_1 (Ch_Index, 'C'))
  they are
    strcpy (Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
    Int_2_Loc = Run_Index;
    Int_Glob = Run_Index;
  The string assignment prevents movement of the preceding assignment
  to Str_2_Loc (5th statement of "main") out of the measurement loop.
  (This probably will not happen for the C version, but it did happen
  with another language and compiler.) The assignment to Int_2_Loc
  prevents value propagation for Int_2_Loc, and the assignment to
  Int_Glob makes the value of Int_Glob possibly dependent on the
  value of Run_Index.

o In the three arithmetic computations at the end  of  the  measurement
  loop in "main", the role of some variables has been exchanged to
  prevent the division from just cancelling out the  multiplication  as
  it  was in [1].  A very smart compiler might have recognized this and
  suppressed code generation for the division.

o For Proc_2, no code has been changed, but the value of the actual
  parameter has changed due to changes in "main".

o In Proc_4, the second assignment has been changed from
    Bool_Loc = Bool_Loc | Bool_Glob;
  to
    Bool_Glob = Bool_Loc | Bool_Glob;
  It now assigns a value to  a  global  variable  instead  of  a  local
  variable (Bool_Loc); Bool_Loc would be a "dead variable" which is not
  used afterwards.

o In Func_1, the statement
    Ch_1_Glob = Ch_1_Loc;
  was added in the non-executed "else" part of the "if"  statement,  to
  prevent  the  suppression  of  code  generation for the assignment to
  Ch_1_Loc.

o In Func_2, the second character comparison statement has been changed
  to
    if (Ch_Loc == 'R')
  ('R' instead of 'X') because a comparison with 'X' is implied in  the
  preceding "if" statement.

  Also in Func_2, the statement
    Int_Glob = Int_Loc;
  has been added in the non-executed part of the last  "if"  statement,
  in order to prevent Int_Loc from becoming a dead variable.

o In Func_3, a non-executed "else" part has  been  added  to  the  "if"
  statement.   While  the  program  would not be incorrect without this
  "else" part, it is considered bad programming practice if a  function
  can be left without a return value.

  To compensate for this change, the (non-executed) "else" part in  the
  "if" statement of Proc_3 was removed.

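The two changes to "main" described above can be illustrated with a
small C sketch. It is a hypothetical miniature, not the benchmark
source. In guarded_loop, the parameter branch_index (the iteration on
which the branch is taken; 0 means never, as in the benchmark) stands in
for a condition the compiler cannot evaluate, such as the call of
Func_1 in another compilation unit. final_arith is assumed to follow the
three arithmetic statements of version 2.1, and cancelling shows, in the
abstract rather than as the exact version 1 text, the kind of
multiplication followed by division that a compiler might fold.

```c
#include <string.h>

static int Int_Glob;

/* Miniature of the guarded branch in "main" (hypothetical). */
int guarded_loop(long number_of_runs, long branch_index, char out[31])
{
    char Str_2_Loc[31] = "";
    int Int_2_Loc = 0;
    long Run_Index;

    for (Run_Index = 1; Run_Index <= number_of_runs; ++Run_Index) {
        Int_2_Loc = 3;
        strcpy(Str_2_Loc, "DHRYSTONE PROGRAM, 2'ND STRING");
        if (Run_Index == branch_index) {
            /* Not executed in the benchmark. Because it may overwrite
               Str_2_Loc, the copy above is not loop-invariant; because
               it may change Int_2_Loc, the constant 3 cannot simply be
               propagated past this point. */
            strcpy(Str_2_Loc, "DHRYSTONE PROGRAM, 3'RD STRING");
            Int_2_Loc = (int) Run_Index;
            Int_Glob = (int) Run_Index;
        }
    }
    strcpy(out, Str_2_Loc);   /* use the result after the loop */
    return Int_2_Loc;
}

/* Cancelling pattern: since signed overflow is undefined in C, a
   compiler may assume x * y does not overflow and reduce the result
   of (x * y) / y to x. */
int cancelling(int x, int y)
{
    int prod = x * y;
    return prod / y;
}

/* Version 2 style: the division is by Int_3_Loc, so it does not undo
   the multiplication by Int_1_Loc. */
void final_arith(int *Int_1_Loc, int *Int_2_Loc, int Int_3_Loc)
{
    *Int_2_Loc = *Int_2_Loc * *Int_1_Loc;
    *Int_1_Loc = *Int_2_Loc / Int_3_Loc;
    *Int_2_Loc = 7 * (*Int_2_Loc - Int_3_Loc) - *Int_1_Loc;
}
```

With the values Int_1_Loc = 3, Int_2_Loc = 3 and Int_3_Loc = 7
(assumed here for illustration), final_arith leaves Int_1_Loc = 1 and
Int_2_Loc = 13.
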
The distribution statistics have been changed only by the  addition  of
the  measurement  loop  iteration (1 additional statement, 4 additional
local integer operands) and  by  the  change  in  Proc_4  (one  operand
changed  from  local  to  global).  The  distribution statistics in the
comment headers have been updated accordingly.
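The procedure-level changes in the list above can be sketched in C as
follows. This is a simplified rendering: Proc_4, Func_1 and Func_3 are
assumed to follow the structure of the version 2.1 C text, types are
simplified (Boolean as int, characters as char), and Func_2_Tail is a
hypothetical fragment that receives Ch_Loc and Int_Loc as parameters
instead of computing them in the loop that precedes this code in
Func_2.

```c
#include <string.h>

typedef int Boolean;   /* simplified */
typedef enum { Ident_1, Ident_2, Ident_3, Ident_4, Ident_5 } Enumeration;

static char Ch_1_Glob = 'A';
static char Ch_2_Glob;
static Boolean Bool_Glob;
static int Int_Glob;

/* Proc_4: the second assignment now stores into the global Bool_Glob,
   so Bool_Loc is no longer a dead variable. */
void Proc_4(void)
{
    Boolean Bool_Loc;

    Bool_Loc = Ch_1_Glob == 'A';
    Bool_Glob = Bool_Loc | Bool_Glob;
    Ch_2_Glob = 'B';
}

/* Func_1: the "else" part (not executed in the benchmark) uses
   Ch_1_Loc, so the assignment to Ch_1_Loc cannot simply be dropped. */
Enumeration Func_1(char Ch_1_Par_Val, char Ch_2_Par_Val)
{
    char Ch_1_Loc, Ch_2_Loc;

    Ch_1_Loc = Ch_1_Par_Val;
    Ch_2_Loc = Ch_1_Loc;
    if (Ch_2_Loc != Ch_2_Par_Val)
        return Ident_1;
    else {
        Ch_1_Glob = Ch_1_Loc;
        return Ident_2;
    }
}

/* Hypothetical tail of Func_2. */
Boolean Func_2_Tail(char Ch_Loc, int Int_Loc,
                    const char *Str_1_Par_Ref, const char *Str_2_Par_Ref)
{
    if (Ch_Loc >= 'W' && Ch_Loc < 'Z')
        Int_Loc = 7;
    if (Ch_Loc == 'R')        /* 'R': 'X' already lies in the range above */
        return 1;
    if (strcmp(Str_1_Par_Ref, Str_2_Par_Ref) > 0) {
        Int_Loc += 7;
        Int_Glob = Int_Loc;   /* gives Int_Loc a use */
        return 1;
    }
    return 0;
}

/* Func_3: the added "else" part returns a value on every path. */
Boolean Func_3(Enumeration Enum_Par_Val)
{
    Enumeration Enum_Loc = Enum_Par_Val;

    if (Enum_Loc == Ident_3)
        return 1;
    else
        return 0;
}
```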


The string operations (string assignment and  string  comparison)  have
not  been  changed,  to  keep  the program consistent with the original
version.

There has been some  concern  that  the  string  operations  are  over-
represented  in  the  program,  and that execution time is dominated by
these operations. This was true in particular when optimizing
compilers removed too much code in the main part of the program; this
should have been mitigated in version 2.

It should be noted that this is a language-dependent issue:   Dhrystone
was  first published in Ada, and with Ada or Pascal semantics, the time
spent in the string operations is,  at  least  in  all  implementations
known  to  me, considerably smaller.  In Ada and Pascal, assignment and
comparison of strings are operators defined in the  language,  and  the
upper bounds of the strings occurring in Dhrystone are part of the type
information known at compilation time.   The  compilers  can  therefore
generate efficient inline code. In C, string assignment and comparisons
are not part  of  the  language,  so  the  string  operations  must  be
expressed  in  terms  of the C library functions "strcpy" and "strcmp".
(ANSI  C  allows  an  implementation  to  use  inline  code  for  these
functions.)   In addition to the overhead caused by additional function
calls, these functions are defined for  null-terminated  strings  where
the  length  of  the  strings  is  not  known  at compilation time; the
function has to check every byte for  the  termination  condition  (the
null byte).
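The difference can be made concrete within C itself. In the sketch
below, the type Str_30_Fixed and both function names are hypothetical.
Wrapping a 31-byte array in a struct lets plain assignment copy a length
known at compilation time, much as in Pascal or Ada, whereas
copy_nul_terminated must test every byte for the terminating null, which
is what strcpy has to do in principle.

```c
#include <string.h>

/* Fixed-length string: the size is part of the type, so the compiler
   can emit a fixed-size copy with no per-byte termination test. */
typedef struct { char c[31]; } Str_30_Fixed;

void assign_fixed(Str_30_Fixed *dst, const Str_30_Fixed *src)
{
    *dst = *src;   /* copies all 31 bytes */
}

/* Null-terminated copy: every byte is checked for '\0'. Returns the
   number of characters copied before the null byte. */
size_t copy_nul_terminated(char *dst, const char *src)
{
    size_t n = 0;

    while ((dst[n] = src[n]) != '\0')
        ++n;
    return n;
}
```

Optimized library versions of strcpy typically handle several bytes
per step, but they still have to locate the null byte.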

Obviously, a C library which includes efficiently  coded  "strcpy"  and
"strcmp"  functions  helps to obtain good Dhrystone results. However, I
don't think that this is unfair since string functions do  occur  quite
frequently  in real programs (editors, command interpreters, etc.).  If
the string functions are implemented efficiently, this helps real
programs as well as benchmark programs.

I admit that the string comparison in Dhrystone terminates later (after
scanning  20 characters) than most string comparisons in real programs.
For consistency with  the  original  benchmark,  I  didn't  change  the
program despite this weakness.
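The figure of 20 characters can be checked directly, assuming that the
strings compared are "DHRYSTONE PROGRAM, 1'ST STRING" and
"DHRYSTONE PROGRAM, 2'ND STRING", as in the distributed sources: they
agree in their first 19 characters and differ in the 20th. The
hypothetical function compare_counting below compares byte by byte in
the style of strcmp and also reports how many positions it examined.

```c
#include <stddef.h>

/* Compares s1 and s2 like strcmp (only the sign of the result is
   meaningful) and stores in *scanned the number of positions examined. */
int compare_counting(const char *s1, const char *s2, size_t *scanned)
{
    size_t i = 0;

    /* Each step tests for equality and for the end of the string. */
    while (s1[i] == s2[i] && s1[i] != '\0')
        ++i;
    *scanned = i + 1;   /* positions 0 .. i were examined */
    return (unsigned char) s1[i] - (unsigned char) s2[i];
}
```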


When Dhrystone is used, the following "ground rules" apply:

o Separate compilation (Ada and C versions)

  As  mentioned  in  [1],  Dhrystone  was  written  to  reflect  actual
  programming  practice  in  systems  programming.   The  division into
  several compilation units (5 in the Ada version, 2 in the C  version)
  is  intended, as is the distribution of inter-module and intra-module
  subprogram  calls.   Although  on  many  systems  there  will  be  no
  difference  in  execution  time  to  a  Dhrystone  version  where all
  compilation units are merged into one file, the rule is that separate
  compilation  should  be used.  The intention is that real programming
  practice, where programs consist of  several  independently  compiled
  units, should be reflected. This also implies that the compiler,
  while compiling one  unit,  has  no  information  about  the  use  of
  variables, register allocation etc. occurring in other compilation
  units.  Although in real life  compilation  units  will  probably  be
  larger,  the  intention is that these effects of separate compilation
  are modeled in Dhrystone.

  A few  language  systems  have  post-linkage  optimization  available
  (e.g.,  final  register allocation is performed after linkage).  This
  is a borderline case: Post-linkage optimization  involves  additional
  program  preparation time (although not as much as compilation in one
  unit) which may prevent its general use in practical programming.   I
  think that since it defeats the intentions given above, it should not
  be used for Dhrystone.

  Unfortunately, ISO/ANSI Pascal does not contain language features for
  separate  compilation.   Although  most  commercial  Pascal compilers
  provide separate compilation in  some  way,  we  cannot  use  it  for
  Dhrystone  since such a version would not be portable.  Therefore, no
  attempt has been made  to  provide  a  Pascal  version  with  several
  compilation units.

o No procedure merging

  Although  Dhrystone  contains  some  very  short   procedures   where
  execution  would  benefit  from  procedure  merging  (inlining, macro
  expansion of procedures), procedure merging is not to be  used.   The
  reason is that the percentage of procedure and function calls is part
  of the "Dhrystone distribution" of statements contained in [1].  This
  restriction  does  not hold for the string functions of the C version
  since ANSI C allows an implementation to use inline  code  for  these
  functions.



o Other optimizations are allowed, but they should be indicated

  It is  often  hard  to  draw  an  exact  line  between  "normal  code
  generation"  and  "optimization" in compilers: Some compilers perform
  operations by default that are invoked in other compilers  only  when
  optimization is explicitly requested. Also, we cannot prevent people
  from trying to obtain benchmark results that look as good as
  possible. Therefore, optimizations performed by compilers - other
  than those listed above - are not forbidden when Dhrystone  execution
  times  are measured.  Dhrystone is not intended to be non-optimizable
  but is intended to be similarly optimizable as normal programs.   For
  example,  there  are  several  places  in Dhrystone where performance
  benefits from optimizations like  common  subexpression  elimination,
  value propagation etc., but normal programs usually also benefit from
  these optimizations.  Therefore, no effort was made  to  artificially
  prevent  such  optimizations.   However,  measurement  reports should
  indicate which compiler  optimization  levels  have  been  used,  and
  reporting  results with different levels of compiler optimization for
  the same hardware is encouraged.

o Default results are those without "register" declarations (C version)

  When Dhrystone results are quoted without  additional  qualification,
  they  should  be  understood  as  results obtained without use of the
  "register" attribute. Good compilers should be able to make good  use
  of  registers  even  without  explicit register declarations ([3], p.
  193).
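One way to make the default explicit in a build is a macro that expands
to nothing unless requested, so that the "register" attribute is used
only when the benchmark is compiled with, for example, -DREG=register.
The sketch below assumes that convention (the distributed C sources use
a similar REG macro, but the details here should be checked against the
actual files); sum_to is a hypothetical function.

```c
/* Without -DREG=register, REG expands to nothing and no register
   declarations are used: this corresponds to the default results. */
#ifndef REG
#define REG
const int Reg = 0;
#else
const int Reg = 1;
#endif

/* Hypothetical example of locals declared with REG. */
int sum_to(int n)
{
    REG int i;
    REG int sum = 0;

    for (i = 1; i <= n; ++i)
        sum += i;
    return sum;
}
```

A results report can then print the value of Reg, so that readers see
which variant was measured.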

Of  course,  for  experimental  purposes,  post-linkage   optimization,
procedure  merging  and/or  compilation  in  one  unit  can  be done to
determine their effects.  However,  Dhrystone  numbers  obtained  under
these   conditions  should  be  explicitly  marked  as  such;  "normal"
Dhrystone results should be understood as  results  obtained  following
the ground rules listed above.

In any case, for serious performance evaluation, users are  advised  to
ask  for  code listings and to check them carefully.  In this way, when
results for different systems  are  compared,  the  reader  can  get  a
feeling for how much of the performance difference is due to compiler optimization
and how much is due to hardware speed.


The C version 2.1 of Dhrystone has been developed in cooperation with
Rick Richardson (Tinton Falls, NJ); it incorporates many ideas from the
"Version 1.1" distributed previously  by  him  over  the  UNIX  network
Usenet.  Through  his  activity with Usenet, Rick Richardson has made a
very valuable contribution to the dissemination of  the  benchmark.   I
also  thank  Chaim  Benedelac  (National  Semiconductor),  David Ditzel
(SUN), Earl Killian and John  Mashey  (MIPS),  Alan  Smith  and  Rafael
Saavedra-Barrera  (UC  at  Berkeley)  for  their  help with comments on
earlier versions of the benchmark.


[1]
   Reinhold P. Weicker:  Dhrystone:  A  Synthetic  Systems  Programming
   Benchmark.
   Communications of the ACM 27, 10 (Oct. 1984), 1013-1030

[2]
   Rick Richardson: Dhrystone 1.1 Benchmark Summary (and Program Text)
   Informal Distribution via "Usenet", Last Version Known to me:  Sept.
   21, 1987

[3]
   Brian W.  Kernighan  and  Dennis  M.  Ritchie:   The  C  Programming
   Language.
   Prentice-Hall, Englewood Cliffs (NJ) 1978