[firedrake] PDESoft 2014 slides

Florian Rathgeber f.rathgeber10 at imperial.ac.uk
Sat Jul 12 22:51:24 BST 2014


On 12/07/14 08:06, David Ham wrote:
> 
> Those look like interesting results.
> 
> Do we have any idea why we are slow on CUDA on the RHS? 

The reason, as far as I can tell, is that the kernel uses too many
resources: 57 registers and 28.047K of shared memory. We therefore get a
theoretical occupancy of 6.25%, i.e. only 1/16 of the SMX units on the
680 can be used. That is up to 64 DP FMAs at half the clock speed of a
Xeon core...
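
For reference, a rough sketch of how a 6.25% figure can fall out of those
resource numbers. Theoretical occupancy is commonly defined as resident
warps per SMX divided by the maximum resident warps per SMX. The block
size of 128 threads is an assumption (it is not stated in this thread;
it is chosen because it reproduces 6.25%), the hardware limits are the
compute capability 3.0 values, "28.047K" is read as KiB, and register
allocation granularity is ignored, so treat this as illustrative only:

```python
def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          max_warps=64, max_blocks=16, regs_per_sm=65536,
                          smem_per_sm=48 * 1024, warp_size=32):
    """Fraction of the maximum resident warps one SMX can hold.

    Defaults are compute capability 3.0 limits (assumed for the GTX 680).
    Register allocation granularity is ignored for simplicity.
    """
    # Warps needed per block, rounded up
    warps_per_block = -(-threads_per_block // warp_size)
    # Resident blocks allowed by each resource limit
    by_warps = max_warps // warps_per_block
    by_regs = regs_per_sm // (regs_per_thread * warps_per_block * warp_size)
    by_smem = int(smem_per_sm // smem_per_block) if smem_per_block else max_blocks
    blocks = min(max_blocks, by_warps, by_regs, by_smem)
    return blocks * warps_per_block / max_warps


# Hypothetical launch configuration: 128 threads per block (assumption)
print(theoretical_occupancy(128, 57, 28.047 * 1024))  # 0.0625
```

Under these assumptions shared memory is the limiting resource: only one
block fits per SMX, whereas registers alone would allow eight.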

> Do we have any indication of actual speed compared with peak flops or
> bandwidth?

I haven't been able to figure out how to drive the Nvidia profiler to
record the required metrics, but we should be able to get those somehow.

Florian

> Regards,
> 
> David
> 
> 
> 
> On Friday, July 11, 2014, Rathgeber, Florian
> <f.rathgeber10 at imperial.ac.uk> wrote:
> 
>     I have now added performance results for advection assembly (matrix +
>     RHS). We can still claim (performance) portability to some degree across
>     sequential, OpenMP and CUDA.
> 
>     On 10/07/14 11:23, David Ham wrote:
>     > I'm concerned that there are no performance results at all. Do we not
>     > even have CPU results?
>     >
>     > On Wednesday, July 9, 2014, Rathgeber, Florian
>     > <f.rathgeber10 at imperial.ac.uk> wrote:
>     >
>     >     Draft slides for my 15min PDESoft talk on PyOP2 next week are at
>     >     http://kynan.github.io/pdesoft2014
>     >
>     >     Any comments and suggestions much appreciated.
>     >
>     >     Florian
