[firedrake] cached kernels

David Ham David.Ham at imperial.ac.uk
Thu Nov 12 11:42:48 GMT 2015


Is a bandwidth-3, 64-row matrix enough to make LAPACK fast? That's not a
hell of a lot of non-zero entries...
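
For scale, here's a quick count (assuming "bandwidth 3" means three sub- and
three super-diagonals; if it means a total bandwidth of 3, the numbers are
even smaller):

#include <stdio.h>

int main(void) {
  const int n = 64, kl = 3, ku = 3;
  long nnz = 0;
  for (int j = 0; j < n; ++j)        /* count the entries inside the band */
    for (int i = 0; i < n; ++i)
      if (i - j <= kl && j - i <= ku)
        ++nnz;
  printf("%ld non-zeros, %ld bytes\n", nnz, nnz * (long)sizeof(double));
  return 0;
}

That's about 440 non-zeros, roughly 3.5 KB: the whole matrix fits
comfortably in L1 cache.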

On Thu, 12 Nov 2015 at 11:39 Eike Mueller <E.Mueller at bath.ac.uk> wrote:

> Hi Lawrence,
>
> >> I have effectively no idea what's going on.  Does the LU solve take
> >> this long on this much data if you just call it from C?
>
> I just carried out exactly that experiment (see the C code in
> firedrake-multigridpaper/code/test_lusolve). Basically, in each vertical
> column I do an LU solve with exactly the same matrix size and bandwidth as
> in the firedrake code. I use the same horizontal grid size as for the
> firedrake code and also the compiler/linker flags that are printed out at
> the end of the run by PETSc, so I'm confident that the C code does exactly
> the same as the code autogenerated by firedrake (the only difference is
> that I initialise the matrix with random values, but make it diagonally
> dominant to avoid excessive pivoting).
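>
> In outline, the per-column solve looks like this (a schematic sketch, not
> the actual test code: I'm using LAPACK's dgbsv for brevity, assuming
> kl = ku = 3 for "bandwidth 3", and omitting the timing):
>
> #include <stdio.h>
> #include <stdlib.h>
>
> /* LAPACK banded solve: factorise and solve A x = b in one call. */
> extern void dgbsv_(const int *n, const int *kl, const int *ku,
>                    const int *nrhs, double *ab, const int *ldab,
>                    int *ipiv, double *b, const int *ldb, int *info);
>
> int main(void) {
>   const int n = 64, kl = 3, ku = 3, nrhs = 1; /* lowest-order case */
>   const int ldab = 2 * kl + ku + 1;  /* extra kl rows for the LU fill-in */
>   const int ncolumns = 81920;        /* one banded solve per column */
>   double *ab = malloc((size_t)ldab * n * sizeof(double));
>   double *b  = malloc((size_t)n * sizeof(double));
>   int *ipiv  = malloc((size_t)n * sizeof(int));
>   for (int col = 0; col < ncolumns; ++col) {
>     for (int j = 0; j < n; ++j) {
>       /* random band entries, then a large diagonal so the matrix is
>          diagonally dominant and pivoting stays trivial */
>       for (int i = kl; i < ldab; ++i)
>         ab[j * ldab + i] = (double)rand() / RAND_MAX;
>       ab[j * ldab + kl + ku] = 2.0 * (kl + ku + 1); /* diagonal slot */
>       b[j] = 1.0;
>     }
>     int info;
>     dgbsv_(&n, &kl, &ku, &nrhs, ab, &ldab, ipiv, b, &n, &info);
>     if (info != 0) { fprintf(stderr, "dgbsv: info=%d\n", info); return 1; }
>   }
>   free(ab); free(b); free(ipiv);
>   return 0;
> }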
>
> Interestingly, I can reproduce the problem in the pure C code, so there
> must be an issue with the LAPACK/BLAS on ARCHER, or the matrices are
> simply too small to get good performance (for example because the indirect
> addressing in the horizontal prevents efficient reuse of the higher-level
> caches). More specifically, I get the following bandwidths (calculated by
> working out the data volume that is streamed through for every LU solve
> and dividing this by the measured runtime):
>
> *** lowest order ***
> (81920 vertical columns, matrix size = 64x64, matrix bandwidth = 3,
> 24 cores on ARCHER)
> Measured memory bandwidth = 0.530 GB/s (per core), 12.722 GB/s (per node)
>
> *** higher order ***
> (5120 vertical columns, matrix size = 384x384, matrix bandwidth = 55,
> 24 cores on ARCHER)
> Measured memory bandwidth = 3.601 GB/s (per core), 86.434 GB/s (per node)
>
> So the higher order case is running at the memory bandwidth peak, but the
> lowest order case is far away from it.
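>
> For concreteness, the estimate is just volume/time; one way to count the
> volume (band streamed in once, LU factors written back once, plus the
> right-hand side; the runtime below is a placeholder for the measured
> value):
>
> #include <stdio.h>
>
> int main(void) {
>   const long ncolumns = 81920;        /* lowest order */
>   const long n = 64, kl = 3, ku = 3;
>   const long ldab = 2 * kl + ku + 1;  /* banded-storage rows */
>   const long bytes_per_column =
>       2 * ldab * n * sizeof(double)   /* band read + LU written back */
>       + 2 * n * sizeof(double);       /* rhs read + solution written */
>   const double runtime = 1.0;         /* seconds, measured (placeholder) */
>   printf("%.3f GB/s\n", ncolumns * bytes_per_column / runtime / 1e9);
>   return 0;
> }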
>
> Either way, that implies that the problem lies on the BLAS/LAPACK side,
> not with hidden firedrake overheads.
>
> Probably the way forward is to follow this up with ARCHER support; I can
> now give them a well-defined test case which reproduces the problem.
>
> Any other ideas?
>
> Thanks,
>
> Eike