[firedrake] Installing Firedrake on an HPC machine

Lawrence Mitchell lawrence.mitchell at imperial.ac.uk
Thu Aug 6 10:18:57 BST 2015


Hi Justin,

On 06/08/15 06:16, Justin Chang wrote:
> Hi everyone,
> 
> I have installed firedrake on my university's HPC machine, and
> whenever I attempt to run any Firedrake program, I get this error:
> 
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process.  Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption.  The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
>   Local host:          compute-0-0 (PID 28214)
>   MPI_COMM_WORLD rank: 0
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.

I recently made some changes to PyOP2 to make it more robust when
OpenMPI does not allow forking.  We need to fork in order to invoke
compilers when JIT-compiling code.  To make this safe, we now attempt
to fork a single helper process /before/ MPI is initialized (which is
safe, because OpenMPI doesn't see it); this helper process then does
all subsequent forks.  Naturally, this will fail if MPI is already
initialized by the time we come to fork.
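
To make the idea concrete, here is a minimal sketch of that prefork
pattern using only the standard library.  It is not PyOP2's actual
implementation: the names recv_line, serve, start_fork_server and
run_in_helper are made up for illustration, and the
one-command-per-line protocol over a socket pair is an assumption.

```python
import os
import socket
import subprocess


def recv_line(sock):
    """Read from sock up to a newline; return None if the peer has closed."""
    data = b""
    while not data.endswith(b"\n"):
        byte = sock.recv(1)
        if not byte:
            return None
        data += byte
    return data.decode().strip()


def serve(sock):
    """Helper-process loop: run each command received, reply with its exit status."""
    while True:
        command = recv_line(sock)
        if command is None:
            break  # parent closed its end, so we are done
        status = subprocess.call(command, shell=True)
        sock.sendall(("%d\n" % status).encode())


def start_fork_server():
    """Fork the helper process.  Must be called before MPI is initialised."""
    parent_end, helper_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Helper process: it never touches MPI, so it may fork freely.
        parent_end.close()
        try:
            serve(helper_end)
        finally:
            os._exit(0)
    helper_end.close()
    return parent_end, pid


def run_in_helper(sock, command):
    """Ask the helper to run command; the calling process itself never forks."""
    sock.sendall((command + "\n").encode())
    reply = recv_line(sock)
    if reply is None:
        raise RuntimeError("fork helper exited unexpectedly")
    return int(reply)


if __name__ == "__main__":
    sock, pid = start_fork_server()
    # MPI would be initialised here, e.g. "from mpi4py import MPI".
    print(run_in_helper(sock, "true"))
    sock.close()
    os.waitpid(pid, 0)
```

The point of the design is that the only fork in the MPI process
happens in start_fork_server; every later compiler invocation is a
message to the helper rather than a fork.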

So possibly the programs you're running initialise MPI before PyOP2
gets the chance to fork?
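
One quick way to look for this is to check, at the very top of a
script (before importing firedrake), whether mpi4py's MPI module has
already been loaded.  The snippet below is only a heuristic sketch:
the function name mpi4py_already_loaded is made up, and it relies on
the fact that mpi4py by default initialises MPI when mpi4py.MPI is
first imported.  It will not notice MPI being initialised by some
other route.

```python
import sys


def mpi4py_already_loaded(modules=None):
    """Return True if mpi4py's MPI module has already been imported.

    Heuristic only: by default mpi4py initialises MPI when mpi4py.MPI
    is first imported, so finding it in sys.modules suggests MPI is
    already up.  MPI initialised by another route is not detected.
    """
    if modules is None:
        modules = sys.modules
    return "mpi4py.MPI" in modules


if mpi4py_already_loaded():
    sys.stderr.write("mpi4py.MPI was imported before this point; "
                     "forking from here on may trigger the warning\n")
```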

Let's check some things.

Let's first try something that doesn't invoke fork at all:

cat > no-fork.py << EOF
from mpi4py import MPI
print MPI.COMM_WORLD.size
EOF
mpiexec -n 2 python no-fork.py

Now something that does call fork, but /before/ initialising MPI:

cat > fork-before.py << EOF
import os
def my_fork():
    ret = os.fork()
    if ret == 0:
        print 'child exiting'
        os._exit(0)
    else:
        pass
my_fork()
from mpi4py import MPI
print MPI.COMM_WORLD.size
EOF
mpiexec -n 2 python fork-before.py

I hope this one works!

Now fork afterwards (which I expect to produce the warning above):

cat > fork-after.py << EOF
import os
def my_fork():
    ret = os.fork()
    if ret == 0:
        print 'child exiting'
        os._exit(0)
    else:
        pass
from mpi4py import MPI
print MPI.COMM_WORLD.size
my_fork()
EOF
mpiexec -n 2 python fork-after.py

Now something more like how PyOP2/Firedrake does things:

cat > closer-test.py << EOF
import os
import socket

def child(sock):
    val = sock.recv(1)
    import mpi4py.rc
    mpi4py.rc.initialize = False
    from mpi4py import MPI
    print 'In child', MPI.Is_initialized()
    os._exit(0)

def parent(sock):
    from mpi4py import MPI
    print 'In parent', MPI.Is_initialized()
    sock.send("1")

a, b = socket.socketpair()
ret = os.fork()

if ret == 0:
    a.close()
    child(b)
else:
    b.close()
    parent(a)
EOF
mpiexec -n 2 python closer-test.py

Now let's try it using PyOP2's own prefork helper:

cat > fork-pyop2.py << EOF
from pyop2_utils import enable_mpi_prefork
enable_mpi_prefork()
from mpi4py import MPI
print MPI.COMM_WORLD.size
EOF
mpiexec -n 2 python fork-pyop2.py

This should work, because it's effectively just doing what
fork-before.py does.

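Now let's just run pyop2 on its own (the script must not be called
pyop2.py, or "from pyop2 import op2" would pick up the script itself
instead of the package):

cat > run-pyop2.py << EOF
from pyop2 import op2
op2.init()
EOF
mpiexec -n 2 python run-pyop2.py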

And then firedrake:

cat > import-firedrake.py << EOF
from firedrake import *
EOF
mpiexec -n 2 python import-firedrake.py

And finally a short test in firedrake:

cat > firedrake-test.py << EOF
from firedrake import *
mesh = UnitSquareMesh(3, 3)
print assemble(Constant(1)*dx(domain=mesh))
EOF
mpiexec -n 2 python firedrake-test.py


Hopefully these tests will allow us to better see where things are
going wrong.
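
If it's easier, the runs can be scripted so that the output shows at
a glance which of them trigger the warning.  This is just a sketch:
run_case and FORK_WARNING_MARKER are names I've made up, and it
assumes that looking for the string "mpi_warn_on_fork" (which appears
in the message you quoted) is a good enough way to spot the warning.

```python
import subprocess
import sys

# Text taken from the Open MPI message quoted above.
FORK_WARNING_MARKER = "mpi_warn_on_fork"


def run_case(command):
    """Run command; return (exit code, whether the fork warning was printed)."""
    proc = subprocess.Popen(command,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    output, _ = proc.communicate()
    return proc.returncode, FORK_WARNING_MARKER in output


def main(scripts):
    """Run each script under mpiexec and print a one-line summary for it."""
    for script in scripts:
        code, warned = run_case(["mpiexec", "-n", "2", "python", script])
        print("%-24s exit=%d fork warning=%s" % (script, code, warned))


if __name__ == "__main__":
    # Script names are given on the command line.
    main(sys.argv[1:])
```

Save it under any name you like and pass the test scripts as
arguments, in the order given above; the first script reporting a
fork warning tells us where to look.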

Cheers,

Lawrence


