[firedrake] PDESoft 2014 slides

Sun Jul 13 11:45:20 BST 2014

Hi Florian,
Are the lines labelled right on slide 12?
12 cores are presumably faster than one.
Some absolute measure of how fast would help a lot, i.e. flop/s or GB/s.  I don't see why that requires the CUDA profiler.  If you can figure out the flops and bytes moved on the CPU, shouldn't they be the same on the GPU?
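
As a rough sketch of what such an absolute measure could look like: given a flop count, a byte count and a wall-clock time for one kernel run, the achieved rates follow by simple division. The function names and every number below are hypothetical and only illustrate the arithmetic; none of them are measurements from the slides.

```python
def achieved_rates(flops, bytes_moved, seconds):
    """Return (GFLOP/s, GB/s) for a single timed kernel run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return flops / seconds / 1e9, bytes_moved / seconds / 1e9


def fraction_of_peak(achieved, peak):
    """Ratio of an achieved rate to the hardware peak (same units)."""
    return achieved / peak


# Hypothetical counts and timing, for illustration only.
gflops, gbs = achieved_rates(flops=2.0e9, bytes_moved=8.0e9, seconds=0.5)
print(f"{gflops:.1f} GFLOP/s, {gbs:.1f} GB/s")
```

Since the flop and byte counts are properties of the kernel rather than of the device, the same counts could be divided by the CPU and GPU timings to put both on one scale.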

The point of the left log-log graph is presumably to talk about startup overhead. I'm not sure why you are leading with this.

Paul

----- Reply message -----
From: "David Ham" <David.Ham at imperial.ac.uk>
To: "firedrake" <firedrake at imperial.ac.uk>
Subject: [firedrake] PDESoft 2014 slides
Date: Sun, Jul 13, 2014 07:25

On Saturday, July 12, 2014, Rathgeber, Florian <f.rathgeber10 at imperial.ac.uk> wrote:
On 12/07/14 08:06, David Ham wrote:
>
> Those look like interesting results.
>
> Do we have any idea why we are slow on CUDA on the RHS?

The reason is that afaict the kernel uses too many resources: 57
registers and 28.047K of shared memory. We therefore get a theoretical
occupancy of 6.25% i.e. only 1/16 SMX units on the 680 can be used. That
is up to 64 DP FMAs at half the clock speed of a Xeon core...
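
A minimal sketch of how a theoretical occupancy figure like this can be derived from per-kernel resource usage. The per-SMX limits are those of compute capability 3.0 (the GTX 680); the block size of 128 threads and the register allocation granularity are assumptions, since the message does not state them, and shared-memory allocation granularity is ignored. This is one configuration consistent with the quoted 6.25%, not necessarily the one actually used.

```python
import math

# Per-SMX limits for compute capability 3.0 (e.g. GTX 680).
MAX_WARPS_PER_SM = 64
MAX_BLOCKS_PER_SM = 16
REGISTERS_PER_SM = 65536
SHARED_MEM_PER_SM = 48 * 1024  # bytes
WARP_SIZE = 32
REG_ALLOC_UNIT = 256  # assumed: registers allocated per warp in units of 256


def theoretical_occupancy(regs_per_thread, smem_per_block, threads_per_block):
    """Fraction of the SMX's warp slots that can be resident at once."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)

    # Blocks that fit given the register file.
    regs_per_warp = math.ceil(regs_per_thread * WARP_SIZE / REG_ALLOC_UNIT) * REG_ALLOC_UNIT
    blocks_by_regs = (REGISTERS_PER_SM // regs_per_warp) // warps_per_block

    # Blocks that fit given shared memory (no limit if the kernel uses none).
    if smem_per_block > 0:
        blocks_by_smem = SHARED_MEM_PER_SM // smem_per_block
    else:
        blocks_by_smem = MAX_BLOCKS_PER_SM

    # Blocks that fit given the warp-slot limit.
    blocks_by_warps = MAX_WARPS_PER_SM // warps_per_block

    blocks = min(blocks_by_regs, blocks_by_smem, blocks_by_warps, MAX_BLOCKS_PER_SM)
    return blocks * warps_per_block / MAX_WARPS_PER_SM


# 57 registers/thread, 28.047K shared memory/block, assumed 128 threads/block.
smem_bytes = int(28.047 * 1024)
print(theoretical_occupancy(57, smem_bytes, 128))
```

Under these assumptions shared memory is the binding limit: only one block fits in the 48K available per SMX, so 4 of 64 warp slots are occupied.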

OK, that's a good analysis. Make sure you give yourself time to present it. It'll make the audience realise you really know what you are doing.

> Do we have any indication of actual speed compared with peak flops or
> bandwidth?

I haven't been able to figure out how to drive the Nvidia profiler to
record the required metrics, but we should be able to get those somehow.

Florian

> Regards,
>
> David
>
>
>
> On Friday, July 11, 2014, Rathgeber, Florian
> <f.rathgeber10 at imperial.ac.uk> wrote:
>
>     I have now added performance results for advection assembly (matrix +
>     RHS). We can still claim (performance) portability to some degree across
>     sequential, OpenMP and CUDA.
>
>     On 10/07/14 11:23, David Ham wrote:
>     > I'm concerned that there are no performance results at all. Do we not
>     > even have CPU results?
>     >
>     > On Wednesday, July 9, 2014, Rathgeber, Florian
>     > <f.rathgeber10 at imperial.ac.uk> wrote:
>     >
>     >     Draft slides for my 15min PDESoft talk on PyOP2 next week are at
>     >     http://kynan.github.io/pdesoft2014
>     >
>     >     Any comments and suggestions much appreciated.
>     >
>     >     Florian

--
Dr David Ham
Departments of Mathematics and Computing
Imperial College London

http://www.imperial.ac.uk/people/david.ham
