[firedrake] Projection on DG0 space fails on large core counts

Lawrence Mitchell lawrence.mitchell at imperial.ac.uk
Wed Jun 3 12:27:22 BST 2015



On 30/05/15 19:22, Eike Mueller wrote:
> Dear firedrakers,
> 
> I now re-ran my code on up to 1536 cores on ARCHER, but I get a
> problem when I try to project an expression onto a DG0 function
> space on an extruded grid.
> 
> The full (very large) log is here:
> https://gist.github.com/eikehmueller/83a5fc139e1fedb5306c
> but as far as I can tell the following crashes:
> 
> r_p.project(expression, solver_parameters={'ksp_type': 'cg', 'pc_type': 'jacobi'})
>
>  and here is the relevant part of the trace that I attempted to
> reconstruct:
> 
> File "/work/n02/n02/eike/git_workspace/firedrake/firedrake/function.py", line 157, in project
>   return projection.project(b, self, *args, **kwargs)
> File "/work/n02/n02/eike/git_workspace/firedrake/firedrake/projection.py", line 94, in project
>   […]
>   solving_utils.check_snes_convergence(self.snes)
> File "/work/n02/n02/eike/git_workspace/firedrake/firedrake/variational_solver.py", line 163, in solve
> File "/work/n02/n02/eike/git_workspace/PyOP2/pyop2/profiling.py", line 199, in wrapper
>   %s""" % (snes.getIterationNumber(), msg))
> File "/work/n02/n02/eike/git_workspace/firedrake/firedrake/solving_utils.py", line 62, in check_snes_convergence
>   return f(*args, **kwargs)
> RuntimeError: Nonlinear solve failed to converge after 1 nonlinear iterations.
> 
> It does work fine on smaller processor numbers. Maybe the PETSc
> integers overflow again: the number of cells is 5242880 x 64 =
> 335544320 ~ 2^{28}, which is not too far from 2^{32}. I thought
> I would check in case you've seen something similar before. I thought I
> had managed to run problems of this size in the past (i.e. earlier
> this year).
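
As a quick sanity check of the arithmetic quoted above, the following sketch compares the cell count with the largest signed 32-bit integer, which is the limit that matters if PETSc was built with its default 32-bit indices (an assumption about this particular build). It only looks at the raw cell count; other quantities, such as global offsets or nonzero counts, may be larger and are not checked here.

```python
import math

# Largest value representable in a signed 32-bit integer. This is the
# relevant limit only if PETSc uses 32-bit indices (assumed here).
INT32_MAX = 2**31 - 1

columns = 5242880  # horizontal cell count, taken from the message above
layers = 64        # number of vertical layers, taken from the message above
cells = columns * layers

print("cells       =", cells)
print("log2(cells) =", math.log2(cells))
print("fits in signed 32-bit:", cells <= INT32_MAX)
```

On its own, the cell count stays below the signed 32-bit limit, so an overflow would have to come from a derived quantity rather than from the number of cells itself.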

So the other potentially useful piece of information is that this
solver failed to converge:

"Inner linear solve failed to converge after 0 iterations with reason:
DIVERGED_NANORINF"

This means that the initial residual of the projection solve had a
norm that was either NaN or Inf.  In other words,
assemble(expression*DG0_test_function*dx) contained a NaN or Inf.
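
One way to narrow this down might be to count the non-finite entries of the assembled right-hand side on each rank. The sketch below is plain Python with a made-up list standing in for the local data; the helper name count_nonfinite is hypothetical, and reading the local values from Firedrake (for example via the .dat.data_ro attribute of the assembled function) is an assumption that is not exercised here.

```python
import math


def count_nonfinite(values):
    """Return how many entries of an iterable of floats are NaN or +/-Inf."""
    return sum(1 for v in values if not math.isfinite(v))


# Stand-in for the locally owned entries of the assembled right-hand side.
# In Firedrake these would presumably be read from the result of
# assemble(expression*DG0_test_function*dx); that step is assumed, not shown.
local_rhs = [0.5, 1.25, float("nan"), 3.0, float("inf")]

bad = count_nonfinite(local_rhs)
if bad:
    print("found %d non-finite entries on this rank" % bad)
```

If only a few ranks report non-finite entries, that would point at the expression being evaluated badly on part of the mesh rather than at the solver.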

Does this help?  Otherwise I'm pretty stumped.

Lawrence


