ARM Neon reference tests
========================
This package contains extensive tests for the ARM/Neon instructions.

It works by building a program which uses all of them, and then
executing it on an actual target or a simulator.

It can be used to validate the simulator against an actual HW target,
or to validate C compilers in presence of Neon intrinsics calls.

The supplied Makefile enables to build with both ARM RVCT compiler and
GNU GCC (for the ARM target), and supports execution with ARM RVDEBUG
on an ARM simulator and with QEMU.

For convenience, the ARM ELF binary file (as compiled with RVCT) is
supplied (compute_ref.axf), as well as expected output (ref-rvct.txt).

A second file containing expected output is also supplied:
ref-rvct-neon.txt, which contains only the results of the Neon
instrinsics tests. It is aimed at being used to check GCC's results,
since this compiler does not support the integer & dsp builtins whose
results are also present in ref-rvct.txt.

Typical usage when used to debug QEmu:
$ make all # to build the test program with ARM rvct and execute with QEmu
$ make check # to compare the results with the expected output


Known issues:
-------------
Some tests currently fail to build with GCC/ARM:
- missing include files: dspfns.h, armdsp.h

As GCC/ARM provides no support for the
Neon_Cumulative_Saturation/fpsrc register, auxiliary accessor
functions have been implemented in stm-arm-neon-ref.h.

Engineering:
------------
In order to cover all the Neon instructions extensively, these tests
make intensive use of the C-preprocessor, to save maintenance efforts.

Most tests (the more regular ones) share a common basic structure. In
general, variable names are suffixed by their type name, so as to
differentiate variables with the same purpose but of differente types.
Hence vector1_int8x8, vector1_int16x4 etc...

For instance in ref_vmul.c the layout of the code is as follows:

- declare input and output vectors (named 'vector1', 'vector2' and
  'vector_res') of each possible type (s/u, 8/16/32/64 bits).

- clean the result buffers.

- initialize input vectors 'vector1' and 'vector2'.

- call each variant of the intrinsic and store the result in a buffer
  named 'buffer', whose contents is printed after execution.

One can then compare the actual result with the expected one.