[firedrake] PDESoft 2014 slides
Florian Rathgeber
f.rathgeber10 at imperial.ac.uk
Sat Jul 12 22:51:24 BST 2014
On 12/07/14 08:06, David Ham wrote:
>
> Those look like interesting results.
>
> Do we have any idea why we are slow on CUDA on the RHS?
The reason, as far as I can tell, is that the kernel uses too many
resources: 57 registers and 28.047K of shared memory. We therefore get a
theoretical occupancy of 6.25%, i.e. only 1/16 SMX units on the 680 can be
used. That is up to 64 DP FMAs at half the clock speed of a Xeon core...
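
For reference, a rough sketch of how the 6.25% figure can be reproduced.
The per-SM limits are the published ones for compute capability 3.0; the
block size of 128 threads and the 48K shared memory configuration are
assumptions on my part (not stated above), and the function name is made
up for illustration. Treat it as a back-of-the-envelope check, not as what
the occupancy calculator does internally.

```python
import math

# Per-SM limits for compute capability 3.0 (Kepler GK104, e.g. GTX 680).
MAX_WARPS_PER_SM = 64
MAX_BLOCKS_PER_SM = 16
REGISTERS_PER_SM = 65536
SHARED_MEM_PER_SM = 48 * 1024  # ASSUMPTION: 48K shared / 16K L1 split
WARP_SIZE = 32
REG_ALLOC_UNIT = 256           # registers are allocated per warp, in units of 256
SMEM_ALLOC_UNIT = 256          # shared memory is allocated in 256-byte units


def round_up(value, unit):
    """Round value up to the next multiple of unit."""
    return int(math.ceil(value / unit)) * unit


def theoretical_occupancy(regs_per_thread, smem_per_block_bytes, threads_per_block):
    """Fraction of an SM's warp slots that resident blocks can fill."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)

    # Limit imposed by the register file.
    regs_per_warp = round_up(regs_per_thread * WARP_SIZE, REG_ALLOC_UNIT)
    blocks_by_regs = (REGISTERS_PER_SM // regs_per_warp) // warps_per_block

    # Limit imposed by shared memory.
    smem_per_block = round_up(smem_per_block_bytes, SMEM_ALLOC_UNIT)
    if smem_per_block:
        blocks_by_smem = SHARED_MEM_PER_SM // smem_per_block
    else:
        blocks_by_smem = MAX_BLOCKS_PER_SM

    # Limit imposed by the number of warp slots.
    blocks_by_warps = MAX_WARPS_PER_SM // warps_per_block

    blocks = min(blocks_by_regs, blocks_by_smem, blocks_by_warps, MAX_BLOCKS_PER_SM)
    return blocks * warps_per_block / MAX_WARPS_PER_SM


# ASSUMPTION: 128 threads per block; 57 registers and 28.047K are from above.
print(theoretical_occupancy(57, 28.047 * 1024, 128))  # -> 0.0625
```

Under these assumptions shared memory is the limiting resource: only one
block fits per SM, so 4 of 64 warp slots are filled. Registers alone would
still allow 8 blocks.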
> Do we have any indication of actual speed compared with peak flops or
> bandwidth?
I haven't been able to figure out how to drive the Nvidia profiler to
record the required metrics, but we should be able to get those somehow.
Florian
> Regards,
>
> David
>
>
>
> On Friday, July 11, 2014, Rathgeber, Florian
> <f.rathgeber10 at imperial.ac.uk> wrote:
>
> I have now added performance results for advection assembly (matrix +
> RHS). We can still claim (performance) portability to some degree across
> sequential, OpenMP and CUDA.
>
> On 10/07/14 11:23, David Ham wrote:
> > I'm concerned that there are no performance results at all. Do we not
> > even have CPU results?
> >
> > On Wednesday, July 9, 2014, Rathgeber, Florian
> > <f.rathgeber10 at imperial.ac.uk> wrote:
> >
> > Draft slides for my 15min PDESoft talk on PyOP2 next week are at
> > http://kynan.github.io/pdesoft2014
> >
> > Any comments and suggestions much appreciated.
> >
> > Florian