MPI problem

schmuker
Posts: 11
Joined: Wed Aug 22, 2007 8:18 am
Location: Brighton, UK

Post by schmuker »

Hi all,

I'm trying to get NEURON working with MPI on a dual 4-core Intel machine (Ubuntu 7.04). I'm using today's trunk from svn. It compiles nicely, and src/parallel/test0.hoc runs without errors, but test1.hoc crashes:

Code:

:~/sims$ mpirun -np 2 nrniv -mpi src/parallel/test1.hoc
numprocs=2
NEURON -- VERSION 6.1.929 (1844) 2007-09-01
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2007
See http://www.neuron.yale.edu/credits.html

        0
oc: Resource temporarily unavailable
0 neuron/nrn/x86_64/bin/nrniv: errno set during call of ParallelContext[0].submit
0  in src/neuron/nrn/src/parallel/test1.hoc near line 16
0  }
  ^
oc: Resource temporarily unavailable
0 neuron/nrn/x86_64/bin/nrniv: errno set during call of ParallelContext[0].submit
0  in src/neuron/nrn/src/parallel/test1.hoc near line 16
1 unpack size=100 upkpos=12 type[0]=99625888   datatype=-1808921696  type[1]=1  count=1
0  ^
nrniv: bbsmpipack.c:42: unpack: Assertion `type[0] == datatype' failed.
oc: Resource temporarily unavailable
0 neuron/nrn/x86_64/bin/nrniv: errno set during call of ParallelContext[0].submit
0  in src/neuron/nrn/src/parallel/test1.hoc near line 16
0  ^
oc: Resource temporarily unavailable
0 neuron/nrn/x86_64/bin/nrniv: errno set during call of ParallelContext[0].submit
0  in src/neuron/nrn/src/parallel/test1.hoc near line 16
0  ^
oc: Resource temporarily unavailable
No more errno warnings during this execution
0 neuron/nrn/x86_64/bin/nrniv: errno set during call of ParallelContext[0].submit
0  in src/neuron/nrn/src/parallel/test1.hoc near line 16
0  ^
errno set 10 times on last execution
0 unpack size=100 upkpos=12 type[0]=99625888   datatype=99625888  type[1]=1  count=1
nrniv: bbsmpipack.c:42: unpack: Assertion `type[0] == datatype' failed.
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code.  This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 32573 failed on node n0 (127.0.0.1) due to signal 6.
-----------------------------------------------------------------------------
The test also fails when run without mpirun:

Code:

:~/sims$ nrniv -mpi src/parallel/test1.hoc
numprocs=1
NEURON -- VERSION 6.1.929 (1844) 2007-09-01
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2007
See http://www.neuron.yale.edu/credits.html

        0
0 unpack size=100 upkpos=12 type[0]=1969515424   datatype=1969515424  type[1]=1  count=1
nrniv: bbsmpipack.c:42: unpack: Assertion `type[0] == datatype' failed.
Aborted (core dumped)
Anyone got an idea what might be going wrong here?

Thanks in advance!
hines
Site Admin
Posts: 1600
Joined: Wed May 18, 2005 3:32 pm

Post by hines »

Can you tell me the configure line you used to build NEURON?
Also, I assume you are using mpich2. Can you tell me the output of
mpich2version
schmuker

Post by schmuker »

The configure line I used is

Code:

./configure --with-iv=~/sims/neuron/iv --with-paranrn --with-mpi --with-nrnpython --prefix=~/sims/neuron/nrn
I'm not using mpich2 but openmpi (version 1.1). Which MPI library is recommended for NEURON?
hines

Post by hines »

I use mpich2

Code:

cd $HOME
curl -O http://www-unix.mcs.anl.gov/mpi/mpich2/downloads/mpich2-1.0.5p4.tar.gz
tar xzf mpich2-1.0.5p4.tar.gz
cd mpich2-1.0.5p4
./configure --prefix=$HOME/mpich2 --with-device=ch3:nemesis
make
make install
export PATH=$HOME/mpich2/bin:$PATH
However, I will install openmpi on my machine and fix any problems with NEURON that result.

By the way, you do not need the --with-mpi configure option if you use --with-paranrn.
hines

Post by hines »

The latest svn changeset fixes the problems NEURON has with openmpi.
http://www.neuron.yale.edu/cgi-bin/trac ... geset/1846
I built using openmpi-1.2.3 with
./configure --prefix=$HOME/openmpi
and to build and run NEURON I needed
export LD_LIBRARY_PATH=$HOME/openmpi/lib
schmuker

Post by schmuker »

Thanks for the prompt reply! However, I still get the error message, although sometimes it runs without complaining. The first few times I tried it, it always worked; now it crashes almost every time.

Code:

mpiexec -np 2 nrniv -mpi ~/sims/srcs/neuron6/nrn/src/parallel/test1.hoc
numprocs=2
NEURON -- VERSION 6.1.931 ()
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2007
See http://www.neuron.yale.edu/credits.html

        0
1 unpack size=100 upkpos=12 type[0]=-1215706784   datatype=-1215620768  type[1]=1  count=1
nrniv: bbsmpipack.c:42: unpack: Assertion `type[0] == (int)datatype' failed.
[endlich:23946] *** Process received signal ***
[endlich:23946] Signal: Aborted (6)
[endlich:23946] Signal code:  (-6)
[endlich:23946] [ 0] [0xb7fcc440]
[endlich:23946] [ 1] /lib/libc.so.6(abort+0x101) [0xb7561801]
[endlich:23946] [ 2] /lib/libc.so.6(__assert_fail+0xfb) [0xb75597bb]
[endlich:23946] [ 3] /home/micha/sims/bin/neuron6/nrn/i686/lib/libnrnmpi.so.0 [0xb7c379ce]
[endlich:23946] [ 4] /home/micha/sims/bin/neuron6/nrn/i686/lib/libnrnmpi.so.0(nrnmpi_upkint+0x35) [0xb7c37b95]
[endlich:23946] [ 5] /home/micha/sims/bin/neuron6/nrn/i686/lib/libnrniv.so.0(_ZN9BBSClient6upkintEv+0x20) [0xb7dbed10]
[endlich:23946] [ 6] /home/micha/sims/bin/neuron6/nrn/i686/lib/libnrniv.so.0(_ZN7BBSImpl7executeEi+0x4f) [0xb7dbba0f]
[endlich:23946] [ 7] /home/micha/sims/bin/neuron6/nrn/i686/lib/libnrniv.so.0(_ZN7BBSImpl6workerEv+0x68) [0xb7dbb118]
[endlich:23946] [ 8] /home/micha/sims/bin/neuron6/nrn/i686/lib/libnrniv.so.0(_ZN3BBS6workerEv+0x14) [0xb7dbb0a4]
[endlich:23946] [ 9] /home/micha/sims/bin/neuron6/nrn/i686/lib/libnrniv.so.0 [0xb7dba5dd]
[endlich:23946] [10] /home/micha/sims/bin/neuron6/nrn/i686/lib/libnrnoc.so.0(call_ob_proc+0x23f) [0xb7fb54ef]
[endlich:23946] [11] /home/micha/sims/bin/neuron6/nrn/i686/lib/libnrnoc.so.0(hoc_object_component+0x42f) [0xb7fb6bbf]
[endlich:23946] [12] /home/micha/sims/bin/neuron6/nrn/i686/lib/libnrnoc.so.0(hoc_execute+0x59) [0xb7fadf09]
[endlich:23946] [13] /home/micha/sims/bin/neuron6/nrn/i686/lib/liboc.so.0 [0xb7f715dc]
[endlich:23946] [14] /home/micha/sims/bin/neuron6/nrn/i686/lib/liboc.so.0(hoc_main1+0xd5) [0xb7f71ae5]
[endlich:23946] [15] /home/micha/sims/bin/neuron6/nrn/i686/lib/libivoc.so.0(_ZN2Oc3runEiPPc+0x2c) [0xb7c8277c]
[endlich:23946] [16] nrniv(ivocmain+0x270) [0x8057e90]
[endlich:23946] [17] nrniv(main+0x6b) [0x80579eb]
[endlich:23946] [18] /lib/libc.so.6(__libc_start_main+0xdc) [0xb754cf9c]
[endlich:23946] [19] nrniv(__gxx_personality_v0+0x2f1) [0x8057761]
[endlich:23946] *** End of error message ***
mpiexec noticed that job rank 0 with PID 23945 on node endlich exited on signal 15 (Terminated).
1 additional process aborted (not shown)
Cheers,

Michael
hines

Post by hines »

type[0]=-1215706784 datatype=-1215620768
Hmm. I guess it's back to the drawing board. I tested on an x86_64 and the casts must be inappropriate for an i686.
I'll have to do some experimenting and may end up with a fairly substantial algorithm change. It may take a few days; if you need to get going sooner, I recommend installing mpich2.
schmuker

Post by schmuker »

OK, no problem. I'll use mpich2 for the time being. Let me know if I can help testing!

Cheers,

Michael
hines

Post by hines »

The current svn repository version
http://www.neuron.yale.edu/cgi-bin/trac ... geset/1848
should fix the assertion error problem in a (hopefully now) machine-independent fashion.
schmuker

Post by schmuker »

Just checked it on an i686 with openmpi version 1.2.3: revision 1848 passed all tests in src/parallel without complaining! Great, thanks a lot!

Cheers,

Michael