[firedrake] cached kernels

Sun Nov 8 10:06:31 GMT 2015

Hi Lawrence (copied to firedrake, since overheads from loading libraries might be a general concern),

I tried it on ARCHER, and caching the kernels makes no difference. The LU solve performs poorly at lowest order, yet an individual call actually takes longer (~0.01s) than the operator application (~0.001s), so I would have expected any fixed overhead to be relatively smaller for the LU solve. The reported bandwidth (BW) is excellent for the operator application but very poor for the LU solve. At higher order both BWs are good; the data volume is larger there, but one LU solve call still takes ~0.01s. Maybe the overhead that shows up at lowest order is simply hidden in that case.

Could there be an overhead from loading the LAPACK library, which the LU solve requires? The operator application does not need LAPACK, so this would be consistent with the observations. Are all libraries linked statically, or are some linked dynamically? I thought I shouldn't see any overhead, since PyOP2 loads the libraries in the warmup run, but maybe that does not cover anything that is dynamically linked? At higher order, where there is more work, this overhead would then be hidden.
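
One way to test this would be to time the first call separately from the later ones, after the warmup run. The following is only a minimal sketch: FakeSolve and time_calls are hypothetical stand-ins, not PyOP2 or LAPACK code, and the sleep times are invented to mimic a one-off load cost.

```python
import time


def time_calls(fn, n_calls=5):
    """Return (time of first call, mean time of the remaining calls) in seconds."""
    timings = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings[0], sum(timings[1:]) / (len(timings) - 1)


class FakeSolve(object):
    """Hypothetical stand-in for one LU solve call with a one-off setup cost."""

    def __init__(self):
        self.loaded = False

    def __call__(self):
        if not self.loaded:
            time.sleep(0.05)   # pretend a library is loaded on first use
            self.loaded = True
        time.sleep(0.001)      # pretend this is the actual solve


first, steady = time_calls(FakeSolve())
print("first call: %.4f s, later calls: %.4f s" % (first, steady))
```

If the first timed call of the real LU solve were much slower than the rest, that would point to a one-off cost that the warmup run does not absorb. If all calls take about the same time, the overhead is per call and library loading is unlikely to be the cause.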

Looking at the total time per iteration, the matrix-free time (which is dominated by the LU solve) is definitely worse than the PETSc time.

Thanks,

Eike


> On 5 Nov 2015, at 19:44, Eike Mueller <E.Mueller at bath.ac.uk> wrote:
> 
> Hi Lawrence,
> 
> I added some code for caching the kernels in the BandedMatrix class; see the “cache_kernels” branch, which I pushed to origin. Basically, I wrapped every op2.Kernel(…) call in BandedMatrix in the class method _cached_kernel(). This method checks whether the kernel has already been added to a dictionary and, if so, retrieves it. However, even if I *don’t* do this and simply construct the kernels with
> 
> kernel = op2.Kernel(…)
> 
> and then print out their id with
> 
> print hex(id(kernel))
> 
> I get the same number in all calls. This looks as if the kernel has already been cached, but maybe it is simply because id() can return the same value for objects with non-overlapping lifetimes? However, if I ensure that the kernel object is not destroyed (for example by appending it to a list of kernel objects), I still observe the same behaviour: all entries in the list have the same id.
> 
> Thanks,
> 
> Eike
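
The id() question in the quoted message can be checked in plain Python. This is a minimal sketch under stated assumptions: FakeKernel and cached_kernel are hypothetical stand-ins, not the PyOP2 class or the _cached_kernel() method from the branch. Objects that are alive at the same time always have distinct ids, so identical ids in a list of kept objects mean that the list holds the same object several times.

```python
class FakeKernel(object):
    """Hypothetical stand-in for op2.Kernel; not the PyOP2 class."""

    def __init__(self, code, name):
        self.code = code
        self.name = name


_kernel_cache = {}


def cached_kernel(code, name):
    """Return the stored kernel for (code, name), building it only once."""
    key = (code, name)
    if key not in _kernel_cache:
        _kernel_cache[key] = FakeKernel(code, name)
    return _kernel_cache[key]


# Uncached objects kept alive in a list: every entry has its own id.
kept = [FakeKernel("void f() {}", "f") for _ in range(3)]
distinct_ids = len(set(id(k) for k in kept))

# Cached construction: repeated requests return the very same object.
cached = [cached_kernel("void f() {}", "f") for _ in range(3)]
cached_ids = len(set(id(k) for k in cached))

print("uncached: %d distinct ids, cached: %d distinct id" % (distinct_ids, cached_ids))
```

Seeing a single id for kernels that are kept in a list would therefore be consistent with op2.Kernel(…) already returning a cached object for identical arguments, although this sketch does not confirm that for PyOP2 itself.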