Double precision ...

G.L.I.S.
Joined: 27 Dec 08
Posts: 2
Credit: 33600816
RAC: 25179
Topic 230228

Maybe I'm discovering hot water here, but at least some GPUs do succeed at double-precision calculation (except that performance drops dramatically)...

mikey
Joined: 22 Jan 05
Posts: 12043
Credit: 1834323219
RAC: 37243


G.L.I.S. wrote:

Maybe I'm discovering hot water here, but at least some GPUs do succeed at double-precision calculation (except that performance drops dramatically)...

This is normal with the GPU tasks here at Einstein; the CPU does a lot of the finishing and preparation of the task before it is sent back to the server. Nvidia has reduced the 64-bit (double) precision ratio of its GPUs dramatically over the years, and while AMD GPUs are better, they too have recently dropped their double-precision ratios. It seems games don't need it, and games are by far the bigger driver of GPU sales.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6550
Credit: 288418937
RAC: 71506


Broadly you'd think that for a given GPU the ratio of single-precision speed to double-precision speed would be around 2:1, as it is on, say, a Fermi (or 3:1 on a Kepler). But most Nvidia consumer (i.e. gaming) GPUs are much worse, up to 32:1. Games just don't need a 64-bit solution; they don't even need a 33rd bit. Also, such implementations don't usually adhere to the IEEE standard for FP64, e.g. in the handling of rounding and some fiddly exceptions. That is woeful for scientific aims.
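As a rough illustration of the precision point (a toy CPU program, not E@H code): accumulate many small terms and single precision drifts visibly, while double precision stays close to the true value.

```c
/* Illustration only (not E@H code): single-precision rounding drifts when
 * accumulating many small terms, while double precision stays close to the
 * true value of 1,000,000. */
#include <stdio.h>

int main(void)
{
    const int n = 10000000;      /* ten million terms        */
    float  sum_f = 0.0f;         /* FP32: 24-bit significand */
    double sum_d = 0.0;          /* FP64: 53-bit significand */

    for (int i = 0; i < n; i++) {
        sum_f += 0.1f;
        sum_d += 0.1;
    }

    printf("float  sum: %.6f\n", sum_f);   /* lands far from 1,000,000            */
    printf("double sum: %.6f\n", sum_d);   /* off only in the last printed digits */
    return 0;
}
```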

Hence at E@H we search for candidates that are then given closer, conforming scrutiny after the GPU is finished. Some of that extra attention happens on the user's CPU, and some is possibly done on the Atlas cluster at the AEI after the work-unit results are returned. The reason this works at all as a search strategy is that the high parallelism of GPUs lets us complete the examination of some subset of the parameter space in a reasonably finite time. (The technical detail on such issues is mind boggling.)

Cheers, Mike

(edit) A related issue is how finely or coarsely the parameter space under study is divided. An E@H work-unit covers a relatively coarse-grained 'slice' of parameter space compared to the finer division that could be used, because we want to finish a given inquiry of the parameter space in a reasonable time. So a real signal present in the detector data will give a lower (probabilistic) match statistic in a user's results than in, say, a follow-up examination of an interesting candidate by the Atlas cluster. This is called a hierarchical search. If we didn't have GPUs and their speed then some searches would not be practicable at all!
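To make the hierarchical idea concrete, here is a toy sketch (plain C with a made-up detection statistic, nothing like the real E@H pipeline): a coarse single-precision pass over a parameter grid keeps only the strongest candidates, and only those are re-scored in double precision on a much finer local grid.

```c
/* Toy hierarchical search: coarse FP32 scan -> follow up the TOP_N
 * candidates on a 100x finer grid in FP64.  The "statistic" is a
 * made-up stand-in that peaks near f = 123.456. */
#include <stdio.h>

#define COARSE_POINTS 1000
#define TOP_N         10
#define REFINE_STEPS  100

static double stat_of(double f)            /* hypothetical detection statistic */
{
    double d = f - 123.456;
    return 1.0 / (1.0 + 1e4 * d * d);
}

int main(void)
{
    double best_f[TOP_N] = {0}, best_s[TOP_N] = {0};

    /* Stage 1: coarse scan in single precision (the GPU-like pass). */
    for (int i = 0; i < COARSE_POINTS; i++) {
        float f = 100.0f + 0.05f * (float)i;        /* coarse grid, step 0.05   */
        float s = (float)stat_of((double)f);
        int worst = 0;                              /* replace weakest of TOP_N */
        for (int k = 1; k < TOP_N; k++)
            if (best_s[k] < best_s[worst]) worst = k;
        if (s > best_s[worst]) { best_s[worst] = s; best_f[worst] = f; }
    }

    /* Stage 2: follow up each candidate in double precision on a 100x finer grid. */
    for (int k = 0; k < TOP_N; k++) {
        double f0 = best_f[k], s_best = 0.0, f_best = f0;
        for (int j = -REFINE_STEPS; j <= REFINE_STEPS; j++) {
            double f = f0 + 0.0005 * j;
            double s = stat_of(f);
            if (s > s_best) { s_best = s; f_best = f; }
        }
        printf("candidate %2d: coarse f=%.3f  refined f=%.4f  stat=%.4f\n",
               k, f0, f_best, s_best);
    }
    return 0;
}
```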

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0


I have been running LATeah GPU tasks on an Intel i5 CPU with its integrated GPU. Slow, but no errors or invalid tasks.

Tullio

Scrooge McDuck
Joined: 2 May 07
Posts: 869
Credit: 16295642
RAC: 5807


tullio wrote:
I have been running LATeah GPU tasks on an Intel i5 CPU with its integrated GPU. Slow, but no errors or invalid tasks.

If I remember Bernd's last explanations correctly, the FGRPB1G search (LATeah on GPU) does not use FP64 calculations in the main part of the analysis, but only in the final follow-up of the TOP10 candidates, which is done on the CPU for Intel iGPUs. With BRP7, the main part of the analysis requires FP64, which is why there is no science app for iGPUs.
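For anyone curious whether their own GPU or iGPU even advertises FP64 to applications, a small host-side OpenCL query can show it (just a sketch, not part of any E@H app): a device that reports an empty CL_DEVICE_DOUBLE_FP_CONFIG cannot run kernels that use 'double'.

```c
/* Sketch: list OpenCL GPUs and whether they advertise FP64.
 * Build with e.g.:  gcc fp64check.c -lOpenCL */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    clGetPlatformIDs(8, platforms, &nplat);

    for (cl_uint p = 0; p < nplat; p++) {
        cl_device_id devices[8];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                           8, devices, &ndev) != CL_SUCCESS)
            continue;

        for (cl_uint d = 0; d < ndev; d++) {
            char name[256] = "";
            cl_device_fp_config fp64 = 0;   /* empty bitfield => no FP64 */
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof name, name, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_DOUBLE_FP_CONFIG,
                            sizeof fp64, &fp64, NULL);
            printf("%-40s FP64: %s\n", name, fp64 ? "yes" : "no");
        }
    }
    return 0;
}
```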

I can only underline what Mike Hewson explained in detail. FP64 floating point does not by itself mean reliable arithmetic in the scientific sense, i.e. arithmetic that returns exactly the same result regardless of GPU type (rounding errors, exceptions, etc.). That is what the FPU of a CPU does: it complies with IEEE 754.
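One concrete (toy) example of why nominally identical FP64 code can disagree across hardware, not taken from any E@H app: whether a*b + c is evaluated as a fused multiply-add with one rounding, or as a separately rounded multiply and add, changes the last bits of the result. Both are legitimate FP64 operations, but devices and compilers contract expressions differently.

```c
/* Illustration only: fused multiply-add (one rounding) vs. separate
 * multiply and add (two roundings).  Build with e.g.:  gcc fma_demo.c -lm */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double a = 1.0 + 1e-8;
    double b = 1.0 - 1e-8;
    double c = -1.0;

    volatile double prod = a * b;       /* product rounded to FP64 first... */
    double separate = prod + c;         /* ...then the sum is rounded again */
    double fused    = fma(a, b, c);     /* a*b + c with a single rounding   */

    printf("separate mul+add: %.17e\n", separate);
    printf("fused fma       : %.17e\n", fused);
    printf("difference      : %.17e\n", fused - separate);
    return 0;
}
```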
