Commit 4602dfa

[tput] empty commit with the current V100/T4 status for the paper
This is the status of the current upstream/master.

NB! The EvtsPerSec[MatrixElems] in CUDA varies wildly. In double I see it go from 5.9E8 to 6.7E8 today, and yesterday it was 7.2E8 with the same code. The EvtsPerSec[MECalcOnly], however, is much more stable, around 1.37E9 in all those cases.

I will therefore keep the numbers in the paper from previous measurements and vCHEP for EvtsPerSec[MatrixElems] (namely 7.25E8 /double and 1.59E9 /float) and add the recent EvtsPerSec[MECalcOnly] values as 1.37E9 /double and 3.28E9 /float.

On itscrd70.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla V100S-PCIE-32GB]:
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 6.743408e+08 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 1.367534e+09 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 0.743706 sec
2,588,597,917 cycles # 2.648 GHz
3,526,137,396 instructions # 1.36 insn per cycle
1.048524808 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = FLOAT (NaN/abnormal=2, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 1.516160e+09 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 3.280236e+09 ) sec^-1
MeanMatrixElemValue = ( 1.371686e-02 +- 3.270219e-06 ) GeV^0
TOTAL : 0.690254 sec
2,368,009,568 cycles # 2.649 GHz
3,367,981,007 instructions # 1.42 insn per cycle
0.978942760 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 48
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.301297e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 7.192022 sec
19,241,362,073 cycles # 2.673 GHz
48,583,081,581 instructions # 2.52 insn per cycle
7.203110555 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 614) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[2] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 2.532683e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.831134 sec
12,924,690,078 cycles # 2.671 GHz
29,940,147,028 instructions # 2.32 insn per cycle
4.842102414 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3274) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.593254e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.662869 sec
9,247,838,169 cycles # 2.520 GHz
16,560,392,033 instructions # 1.79 insn per cycle
3.673581024 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2746) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.892414e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.603951 sec
9,129,054,132 cycles # 2.528 GHz
16,497,282,072 instructions # 1.81 insn per cycle
3.614598246 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2572) (512y: 95) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.738379e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.020666 sec
8,891,618,256 cycles # 2.208 GHz
13,361,398,930 instructions # 1.50 insn per cycle
4.035327755 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1127) (512y: 205) (512z: 2045)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.211023e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371707e-02 +- 3.270376e-06 ) GeV^0
TOTAL : 7.119093 sec
19,060,830,463 cycles # 2.675 GHz
47,728,069,981 instructions # 2.50 insn per cycle
7.129763876 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 578) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = VECTOR[4] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.507245e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270375e-06 ) GeV^0
TOTAL : 3.351029 sec
8,971,452,327 cycles # 2.672 GHz
19,719,600,560 instructions # 2.20 insn per cycle
3.362076705 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3719) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.160704e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.695110 sec
6,906,959,097 cycles # 2.556 GHz
12,504,366,929 instructions # 1.81 insn per cycle
2.706037106 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 3077) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.869343e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.621328 sec
6,734,710,370 cycles # 2.562 GHz
12,522,685,250 instructions # 1.86 insn per cycle
2.631936677 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2917) (512y: 81) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[16] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 7.434054e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270340e-06 ) GeV^0
TOTAL : 2.750946 sec
6,437,749,555 cycles # 2.334 GHz
10,930,642,414 instructions # 1.70 insn per cycle
2.761224316 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1559) (512y: 179) (512z: 2157)
=========================================================================

On lxplus770.cern.ch [CPU: Intel(R) Xeon(R) Silver 4216 CPU] [GPU: NVIDIA Tesla T4]:
=========================================================================
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 3.891814e+07 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 4.009506e+07 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 1.144038 sec
672,779,391 cycles:u # 0.552 GHz
1,362,529,910 instructions:u # 2.03 insn per cycle
1.285736848 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 120
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CUDA [nvcc 11.0.221]
FP precision = FLOAT (NaN/abnormal=2, zero=0)
EvtsPerSec[MatrixElems] (3) = ( 6.377864e+08 ) sec^-1
EvtsPerSec[MECalcOnly] (3a) = ( 8.217691e+08 ) sec^-1
MeanMatrixElemValue = ( 1.371686e-02 +- 3.270219e-06 ) GeV^0
TOTAL : 0.825550 sec
386,914,589 cycles:u # 0.437 GHz
828,824,446 instructions:u # 2.14 insn per cycle
0.963103649 seconds time elapsed
==PROF== Profiling "sigmaKin": launch__registers_per_thread 64
==PROF== Profiling "sigmaKin": sm__sass_average_branch_targets_threads_uniform.pct 100%
=========================================================================
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.275975e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 7.515231 sec
19,209,549,764 cycles:u # 2.561 GHz
48,627,460,774 instructions:u # 2.53 insn per cycle
7.553956776 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 614) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[2] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 2.542904e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.980303 sec
12,899,873,466 cycles:u # 2.584 GHz
29,995,111,387 instructions:u # 2.33 insn per cycle
5.033503232 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3274) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.508129e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.899591 sec
9,295,935,243 cycles:u # 2.391 GHz
16,619,353,171 instructions:u # 1.79 insn per cycle
3.969700585 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2746) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[4] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.775861e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 3.870736 sec
9,103,619,309 cycles:u # 2.376 GHz
16,556,440,384 instructions:u # 1.82 insn per cycle
3.911828159 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2572) (512y: 95) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = DOUBLE (NaN/abnormal=0, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 3.381439e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270315e-06 ) GeV^0
TOTAL : 4.330337 sec
9,151,794,863 cycles:u # 2.106 GHz
13,418,541,159 instructions:u # 1.47 insn per cycle
4.463700520 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1127) (512y: 205) (512z: 2045)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = SCALAR ('none': ~vector[1], no SIMD)
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 1.169022e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371707e-02 +- 3.270376e-06 ) GeV^0
TOTAL : 7.431409 sec
19,242,935,493 cycles:u # 2.591 GHz
47,777,821,032 instructions:u # 2.48 insn per cycle
7.485035275 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 578) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=6, zero=0)
Internal loops fptype_sv = VECTOR[4] ('sse4': SSE4.2, 128bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 4.425426e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371706e-02 +- 3.270375e-06 ) GeV^0
TOTAL : 3.526400 sec
8,942,606,781 cycles:u # 2.588 GHz
19,782,130,775 instructions:u # 2.21 insn per cycle
3.631748518 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 3719) (avx2: 0) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('avx2': AVX2, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.068817e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.782846 sec
6,900,906,289 cycles:u # 2.473 GHz
12,568,910,963 instructions:u # 1.82 insn per cycle
2.833799464 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 3077) (512y: 0) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[8] ('512y': AVX512, 256bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 8.514326e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270339e-06 ) GeV^0
TOTAL : 2.790172 sec
6,805,090,790 cycles:u # 2.475 GHz
12,587,787,277 instructions:u # 1.85 insn per cycle
2.943286999 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 2917) (512y: 81) (512z: 0)
-------------------------------------------------------------------------
Process = EPOCH1_EEMUMU_CPP [gcc (GCC) 9.2.0]
FP precision = FLOAT (NaN/abnormal=5, zero=0)
Internal loops fptype_sv = VECTOR[16] ('512z': AVX512, 512bit) [cxtype_ref=YES]
OMP threads / `nproc --all` = 1 / 4
EvtsPerSec[MECalcOnly] (3a) = ( 6.798061e+06 ) sec^-1
MeanMatrixElemValue = ( 1.371705e-02 +- 3.270340e-06 ) GeV^0
TOTAL : 2.938570 sec
6,641,119,578 cycles:u # 2.249 GHz
10,993,670,890 instructions:u # 1.66 insn per cycle
2.979389891 seconds time elapsed
=Symbols in CPPProcess.o= (~sse4: 0) (avx2: 1559) (512y: 179) (512z: 2157)
=========================================================================
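
To make the SIMD comparison in the log above easier to read, here is a minimal Python sketch that pulls the EvtsPerSec values out of log text and computes speedups relative to the scalar build. It is not part of the benchmark scripts: the names `parse_throughputs`, `speedups` and `double_cpp` are hypothetical, and the regular expression only assumes the `EvtsPerSec[...] (n) = ( value ) sec^-1` layout seen in this log. The throughputs in `double_cpp` are copied from the itscrd70 DOUBLE C++ runs.

```python
import re

# Matches entries such as "EvtsPerSec[MECalcOnly] (3a) = ( 1.301297e+06 ) sec^-1".
# Assumption: the log keeps this exact "name (tag) = ( value )" layout.
THROUGHPUT_RE = re.compile(
    r"EvtsPerSec\[(\w+)\]\s*\(\w+\)\s*=\s*\(\s*([0-9.eE+-]+)\s*\)"
)


def parse_throughputs(log_text):
    """Return (metric_name, events_per_second) pairs in order of appearance."""
    return [(name, float(value)) for name, value in THROUGHPUT_RE.findall(log_text)]


def speedups(baseline, measurements):
    """Divide every measurement by the baseline throughput."""
    return {label: value / baseline for label, value in measurements.items()}


# EvtsPerSec[MECalcOnly] for the DOUBLE C++ builds on itscrd70, copied from the log.
double_cpp = {
    "none": 1.301297e+06,
    "sse4": 2.532683e+06,
    "avx2": 4.593254e+06,
    "512y": 4.892414e+06,
    "512z": 3.738379e+06,
}

if __name__ == "__main__":
    for label, ratio in speedups(double_cpp["none"], double_cpp).items():
        print(f"{label}: {ratio:.2f}x over scalar")
```

The same `speedups` helper can be fed the FLOAT or lxplus770 numbers, or the output of `parse_throughputs` on a saved log, to compare builds without retyping values.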
1 parent 795db2a commit 4602dfa

0 file changed

+0
-0
lines changed
