STREAM benchmark
Edit the Makefile:
CC = icc
CFLAGS = -O3 -axSSE4.2 -openmp
# export OMP_NUM_THREADS=16
# ./stream_c.exe
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 16
Number of Threads counted = 16
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 1475 microseconds.
(= 1475 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           62479.2     0.002572     0.002561     0.002590
Scale:          73359.1     0.002190     0.002181     0.002208
Add:            75783.6     0.003184     0.003167     0.003201
Triad:          77142.5     0.003142     0.003111     0.003182
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
References: http://www.cs.virginia.edu/stream/, http://www.cs.virginia.edu/stream/ref.html