nrn --with-mpi

nizar
Posts: 22
Joined: Sat Oct 08, 2005 11:13 am
Location: localhost

Post by nizar » Sat Oct 28, 2006 5:38 am

Hi,

I'm trying to install nrn-5.9 with MPI:

Code:

./configure --prefix=`pwd` --without-x --without-nrnjava --with-nrniv --with-mpi

The system is:
x86_64-pc-linux-gnu-4.1.1
glibc-2.4
libtool-1.5.22
kernel-2.6.17

mpich2version:

Code:

Version:           1.0.3
Device:            ch3:sock
Configure Options: --prefix=/usr --enable-sharedlibs=gcc --enable-fast --enable-g=none --with-thread-package=pthreads --enable-rlog=no --enable-slog2=no --enable-cxx --enable-mpe --enable-threads --includedir=/usr/include --libdir=/usr/lib64 --mandir=/usr/share/man --with-docdir=/usr/share/doc/mpich2-1.0.3 --with-htmldir=/usr/share/doc/mpich2-1.0.3/html --sysconfdir=/etc/mpich2 --datadir=/usr/share/mpich2

configure fails with "Cannot compile MPI program".

The full configure output:

Code:

./configure --prefix=`pwd` --without-x --without-nrnjava --with-nrniv --with-mpi
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for style of include used by make... GNU
checking for g++... g++
checking for C++ compiler default output file name... a.out
checking whether the C++ compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking dependency style of g++... gcc3
Not trying to build rpms for your system (use --enable-rpm-rules to override)
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E
checking for gawk... (cached) gawk
checking for a BSD-compatible install... /usr/bin/install -c
checking for flex... flex
checking for yywrap in -lfl... yes
checking lex output file root... lex.yy
checking whether yytext is a pointer... yes
checking for bison... bison -y
checking whether ln -s works... yes
checking for a sed that does not truncate output... /bin/sed
checking for egrep... grep -E
checking for ld used by gcc... /usr/x86_64-pc-linux-gnu/bin/ld
checking if the linker (/usr/x86_64-pc-linux-gnu/bin/ld) is GNU ld... yes
checking for /usr/x86_64-pc-linux-gnu/bin/ld option to reload object files... -r
checking for BSD-compatible nm... /usr/bin/nm -B
checking how to recognise dependent libraries... pass_all
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking dlfcn.h usability... yes
checking dlfcn.h presence... yes
checking for dlfcn.h... yes
checking how to run the C++ preprocessor... g++ -E
checking for g77... no
checking for f77... no
checking for xlf... no
checking for frt... no
checking for pgf77... no
checking for fort77... no
checking for fl32... no
checking for af77... no
checking for f90... no
checking for xlf90... no
checking for pgf90... no
checking for epcf90... no
checking for f95... no
checking for fort... no
checking for xlf95... no
checking for ifc... no
checking for efc... no
checking for pgf95... no
checking for lf95... no
checking for gfortran... gfortran
checking whether we are using the GNU Fortran 77 compiler... yes
checking whether gfortran accepts -g... yes
checking the maximum length of command line arguments... 32768
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for objdir... .libs
checking for ar... ar
checking for ranlib... ranlib
checking for strip... strip
checking if gcc static flag  works... yes
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC
checking if gcc PIC flag -fPIC works... yes
checking if gcc supports -c -o file.o... yes
checking whether the gcc linker (/usr/x86_64-pc-linux-gnu/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking for shl_load... no
checking for shl_load in -ldld... no
checking for dlopen... no
checking for dlopen in -ldl... yes
checking whether a program can dlopen itself... yes
checking whether a statically linked program can dlopen itself... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... no
configure: creating libtool
appending configuration tag "CXX" to libtool
checking for ld used by g++... /usr/x86_64-pc-linux-gnu/bin/ld -m elf_x86_64
checking if the linker (/usr/x86_64-pc-linux-gnu/bin/ld -m elf_x86_64) is GNU ld... yes
checking whether the g++ linker (/usr/x86_64-pc-linux-gnu/bin/ld -m elf_x86_64) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC
checking if g++ PIC flag -fPIC works... yes
checking if g++ supports -c -o file.o... yes
checking whether the g++ linker (/usr/x86_64-pc-linux-gnu/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking for shl_load... (cached) no
checking for shl_load in -ldld... (cached) no
checking for dlopen... (cached) no
checking for dlopen in -ldl... (cached) yes
checking whether a program can dlopen itself... (cached) yes
checking whether a statically linked program can dlopen itself... (cached) yes
appending configuration tag "F77" to libtool
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... no
checking for gfortran option to produce PIC... -fPIC
checking if gfortran PIC flag -fPIC works... yes
checking if gfortran supports -c -o file.o... yes
checking whether the gfortran linker (/usr/x86_64-pc-linux-gnu/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking for mpicc... mpicc
checking for MPI_Init... no
checking for MPI_Init in -lmpi... no
checking for MPI_Init in -lmpich... no
configure: error: Cannot compile MPI program

Why is the configure script unable to find MPI_Init?

mpd runs on all nodes, mpdringtest passes, and /usr/share/mpich2/cpi runs with no problem (mpiexec -n 5000 /usr/share/mpich2/cpi).

Thanks in advance.

hines
Site Admin
Posts: 1577
Joined: Wed May 18, 2005 3:32 pm

Post by hines » Sun Oct 29, 2006 10:01 am

I don't see anything obviously wrong with your environment. On my machine I am using

Code:

$ mpich2version
Version:           1.0.4p1
Device:            ch3:sock
Configure Options: '--prefix=/home/hines/mpich2'
CC:  gcc
CXX: c++
F77: g77
F90:

The only way to diagnose the

Code:

checking for MPI_Init... no

problem further is to examine config.log. If the cause is not clear from that, please send the file to me so I can take a look at it.
michael dot hines at yale dot edu
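
One possible way to pull the relevant part out of config.log is sketched below. It assumes the usual autoconf log layout, in which each test begins with a `configure:LINE: checking for ...` line and ends with a `configure:LINE: result: ...` line. The function name `show_check` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print every config.log section that belongs to a
# given check, from its "checking for ..." line to its "result:" line.
# Assumes the usual autoconf log layout; adjust the patterns if yours differs.
show_check() {
    check_name=$1
    log_file=${2:-config.log}
    awk -v pat="checking for ${check_name}" '
        index($0, pat)           { printing = 1 }   # start of a matching check
        printing                 { print }          # command, errors, result
        printing && /: result: / { printing = 0 }   # end of this check
    ' "$log_file"
}
```

Calling `show_check MPI_Init config.log` should then show the compiler command that configure ran and whatever error the compiler or linker printed, which is usually the quickest hint at why the check answered "no".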

Post by nizar » Sun Oct 29, 2006 10:34 am

Thanks, I compiled mpich2:

Code:

Version:           1.0.4p1
Device:            ch3:sock
Configure Options: '--prefix=/opt/mpich2' '--with-mpe' '--enable-f90' '--enable-cxx' '--enable-romio' '--enable-mpe' '--enable-sharedlibs=gcc'
CC:  gcc
CXX: g++
F77: gfortran
F90: gfortran

Now NEURON compiles fine with --with-mpi.

Now we are trying to run NEURON in parallel. We run it with

Code:

mpiexec -n x nrniv "code.hoc"

We have tried code that uses both the pnm and the pc options. In both cases NEURON runs n separate processes, each with nhost = 1 (confirmed by printout). We would like to have one job with nhost = n. How should this be done?

mpd is running, mpdringtest passes, and MPI C code compiles and runs with no problem.
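
A small check along these lines can make the symptom easy to reproduce. It is only a sketch: the helper name `write_nhost_check` and the file name `nhost_check.hoc` are invented here, and the hoc body is modelled on NEURON's ParallelContext usage (`pc.id`, `pc.nhost`, `pc.runworker()`, `pc.done()`).

```shell
#!/bin/sh
# Hypothetical helper: write a tiny hoc script in which every process
# prints its own rank and the number of hosts it can see.
write_nhost_check() {
    hoc_file=${1:-nhost_check.hoc}
    cat > "$hoc_file" <<'EOF'
objref pc
pc = new ParallelContext()
printf("rank %d of %d\n", pc.id, pc.nhost)
pc.runworker()
pc.done()
quit()
EOF
}

# Example launch (not run here):
#   write_nhost_check nhost_check.hoc
#   mpiexec -n 4 nrniv nhost_check.hoc
```

If MPI was initialized, the four processes should report ranks 0 to 3 "of 4". Four lines all reading "rank 0 of 1" would match the behaviour described above, where each process runs as an independent serial NEURON.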

Post by hines » Sun Oct 29, 2006 10:52 am

Instead of

Code:

mpiexec -n x nrniv "code.hoc"

launch with

Code:

mpiexec -n x nrniv -mpi "code.hoc"

This is a very recent change, and I apologise that it has not made it into the documentation. When I started out with mpich1, it was possible to examine the arguments when launching with mpirun and decide whether or not to call MPI_Init. Unfortunately, this did not work with LAM or any of the supercomputer implementations of MPI. On those machines one never wanted to run in serial mode, so I added the configure option (environment variable) "always_call_mpi_init=yes", and that was sufficient in many cases. When I upgraded to mpich2 I discovered that they had changed the launch process and no longer use a modified argument list. But I did not want to have two NEURON installations, one for serial and one for parallel. Hence, if you are using the installation only for parallel runs, use the configure option. If you need both, leave it off, and when you want to run in parallel use the -mpi option at launch time.
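
The two cases can be summarized in a small sketch. The function `launch_cmd` and its mode names are invented for illustration, and it only prints the command rather than running it. Whether a given nrniv build actually recognizes -mpi depends on the NEURON version.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the launch command for a build style.
#   always : NEURON was configured with always_call_mpi_init=yes
#   flag   : NEURON decides at launch time and needs the -mpi argument
launch_cmd() {
    mode=$1
    nproc=$2
    hoc_file=$3
    case "$mode" in
        always) echo "mpiexec -n $nproc nrniv $hoc_file" ;;
        flag)   echo "mpiexec -n $nproc nrniv -mpi $hoc_file" ;;
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, `launch_cmd flag 4 code.hoc` prints the -mpi form, while `launch_cmd always 4 code.hoc` prints the plain form that is enough when MPI_Init is always called.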

Post by hines » Sun Oct 29, 2006 10:58 am

By the way, I still don't know why your original mpich2 installation failed at NEURON configure. Was there any indication of the reason in the config.log file?

Also, there is no reason to type --without-nrnjava or --with-nrniv, since those are the defaults. If you want a combined serial/parallel version that can run either way, and you are able to install the InterViews part on your machine, you can leave off the --without-x option; when running in parallel, all GUI is turned off. Lastly, if you think you might ever want to use gap junctions in parallel, replace the --with-mpi option with --with-paranrn.
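
As a rough summary of these choices, the sketch below assembles the configure arguments from three yes/no answers. The function name `configure_flags` and its argument order are invented for this example; the flags themselves are the ones named in this thread, and --prefix is left out on purpose.

```shell
#!/bin/sh
# Hypothetical helper: pick NEURON configure arguments from three answers.
#   $1 parallel_only : yes if this installation will never run serially
#   $2 have_x        : yes if InterViews/X can be installed on the machine
#   $3 gap_junctions : yes if gap junctions in parallel might be needed
configure_flags() {
    parallel_only=$1
    have_x=$2
    gap_junctions=$3
    flags=""
    if [ "$have_x" != yes ]; then
        flags="--without-x"
    fi
    if [ "$gap_junctions" = yes ]; then
        flags="$flags --with-paranrn"      # used in place of --with-mpi
    else
        flags="$flags --with-mpi"
    fi
    if [ "$parallel_only" = yes ]; then
        flags="$flags always_call_mpi_init=yes"
    fi
    echo $flags    # unquoted on purpose: drops a leading space
}
```

Usage would look like ``./configure --prefix=`pwd` $(configure_flags yes no yes)``.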

Post by nizar » Sun Oct 29, 2006 1:17 pm

With

Code:

mpiexec -n x nrniv -mpi "code.hoc"

I get this:
nrniv: can't open -mpi

I already compiled NEURON with --with-mpi. Is -mpi relevant only for NEURON compiled without --with-mpi?

Post by nizar » Sun Oct 29, 2006 1:27 pm

I don't have the config.log any more. I will try to reproduce the situation on other machines and post the config.log.

I started adding the explicit configure flags --without-nrnjava and --with-nrniv after a number of failures of the configure script.

I don't have X on these machines, hence the --without-x.

Assuming that I want to run only in parallel, without X, on these machines (Beowulf cluster: 28 machines, dual Opteron 280 dual-core), is the best option:

Code:

./configure --prefix=`pwd` --without-x  --with-mpi --with-paranrn

?

Many thanks.

Post by hines » Sun Oct 29, 2006 9:52 pm

Code:

nrniv: can't open -mpi

That's strange. The addition was made two months ago at changeset 1497 and should be in the current standard distribution, which links to nrn-5.9.rel-9.tar.gz. If your version predates that, you must use the configure option "always_call_mpi_init=yes".

nizar wrote: Assuming that I want to run only parallel without X on these machines (beowulf cluster: 28 machines, dual opteron 280 dual-core), the best is:

Code:

./configure --prefix=`pwd` --without-x --with-mpi --with-paranrn

?

The best is

Code:

./configure --prefix=`pwd` --without-x --with-paranrn always_call_mpi_init=yes

Post by nizar » Mon Oct 30, 2006 12:50 am

Code:

nizar@ocean ~ $ nrngui
NEURON -- Release 5.9.9 (1529) 2006-09-11
by John W. Moore, Michael Hines, and Ted Carnevale
Duke and Yale University -- Copyright 1984-2006

compiled with

Code:

./configure --prefix=`pwd` --without-x --with-paranrn

I still get
nrniv: can't open -mpi

mpdlistjobs shows nrniv on the different nodes, for example:

Code:

jobid    = 8@ocean_48314
jobalias =
username = nizar
host     = brain17
pid      = 12460
sid      = 12455
rank     = 8
pgm      = nrniv

jobid    = 8@ocean_48314
jobalias =
username = nizar
host     = brain16
pid      = 11733
sid      = 11728
rank     = 10
pgm      = nrniv

Post by hines » Mon Oct 30, 2006 8:25 am

Code:

nrniv: can't open -mpi

Use the "always_call_mpi_init=yes" configure option or upgrade to the alpha version http://www.neuron.yale.edu/ftp/neuron/v ... 796.tar.gz

Abject apologies.
The -mpi option was added to the splitcell branch, which was merged into the main trunk after the Release version had been copied from the main trunk.

Post by nizar » Mon Oct 30, 2006 2:37 pm

5.9.9 with always_call_mpi_init=yes: no problem with -mpi, but I get the following:

Code:

[cli_44]: aborting job:
Fatal error in MPI_Allgather: Other MPI error, error stack:
MPI_Allgather(953)........................: MPI_Allgather(sbuf=0x2b8044ad2f18, scount=1, MPI_INT, rbuf=0x60bb40, rcount=1, MPI_INT, MPI_COMM_WORLD) failed
MPIR_Allgather(533).......................:
MPIC_Sendrecv(161)........................:
MPIC_Wait(324)............................:
MPIDI_CH3_Progress_wait(217)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(415):
MPIDU_Socki_handle_read(670)..............: connection failure (set=0,sock=2,errno=104:Connection reset by peer)
[cli_47]: aborting job:
Fatal error in MPI_Allgather: Other MPI error, error stack:
MPI_Allgather(953)........................: MPI_Allgather(sbuf=0x2b20e07a1f18, scount=1, MPI_INT, rbuf=0x60bb40, rcount=1, MPI_INT, MPI_COMM_WORLD) failed
MPIR_Allgather(533).......................:
MPIC_Sendrecv(161)........................:
MPIC_Wait(324)............................:
MPIDI_CH3_Progress_wait(217)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(415):
MPIDU_Socki_handle_read(670)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
rank 47 in job 1  ocean_49547   caused collective abort of all ranks
  exit status of rank 47: return code 1
rank 44 in job 1  ocean_49547   caused collective abort of all ranks
  exit status of rank 44: return code 1
rank 0 in job 1  ocean_49547   caused collective abort of all ranks
  exit status of rank 0: return code 255

Is this still related to NEURON, or is it only an MPI issue?

mpdringtest 5000

Code:

time for 5000 loops = 17.7262170315 seconds

Is there any simple MPI NEURON code for testing that you can send me, to be sure the problem is not in the code I'm testing with?

Thanks.

Post by hines » Mon Oct 30, 2006 3:33 pm

nizar wrote: 5.9.9 with always_call_mpi_init=yes: no problem with -mpi, but I get the following:

The -mpi argument is not implemented unless you are using a 6.0 alpha version.

Are you sure that is all the stdout/stderr information? There should have been a first line printed that gives the number of hosts used. If you used a -mpi arg, then any machine that got to the point of execution where it was being handled as a filename should have printed the normal error message. When an error occurs on any machine, MPI_Abort is called, which should stop all the other processes, and perhaps you are seeing the results of that. Anyway, the hello-world-level test program is
nrn/src/parallel/test0.hoc
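
To look at the complete output rather than only the error lines, one option is to capture stdout and stderr together and then list which ranks reported the abort. The sketch below assumes the MPICH2 message format `rank N in job ... caused collective abort of all ranks` seen in the log posted earlier; the helper name `aborted_ranks` and the file name `run.log` are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: list the ranks that an MPICH2 mpiexec log reports
# as having caused a collective abort, lowest rank first.
aborted_ranks() {
    log_file=$1
    sed -n 's/^rank \([0-9][0-9]*\) in job .* caused collective abort.*/\1/p' \
        "$log_file" | sort -n -u
}

# Capturing the complete output first (not run here):
#   mpiexec -n 50 nrniv code.hoc > run.log 2>&1
#   head -n 5 run.log        # the first lines, printed before any failure
#   aborted_ranks run.log
```

The ranks listed are only the ones mpiexec blamed for the abort; the first lines of the captured log are where the host count and any ordinary NEURON error message would be expected to show up.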

Post by nizar » Mon Oct 30, 2006 4:04 pm

hines wrote: The -mpi argument is not implemented unless you are using a 6.0 alpha version.

Right.

hines wrote: Are you sure that is all the stdout/stderr information. There should have been a first line printed that gives the number of hosts used. If you used a -mpi arg then any machine that got to the point of execution where it was being handled as a filename should have printed the normal error message. When an error occurs on any machine, an MPI_Abort is called which should stop all the other processes and perhaps you are seeing the results of that. Anyway, the hello world level test program is
nrn/src/parallel/test0.hoc

With

Code:

mpirun -n 50 nrniv "code.hoc"

I see many 'nhost is 50' lines and
Created 6400 connections to targets on host 0
SetupTime: 0.069999933
nrnmpi_use=1 active=1
....

I posted only the errors, sorry if I wasn't clear.

Code:

mpirun -n 10 nrniv "test0.hoc"

gives no errors:

Code:

nrnmpi_init(): numprocs=10 myid=0
hello from id 0 on ocean

        0
hello from id 1 on brain3

hello from id 5 on brain13

hello from id 4 on brain3

hello from id 9 on brain12

hello from id 6 on brain13

hello from id 3 on brain3

hello from id 2 on brain3

hello from id 8 on brain13

hello from id 7 on brain13

bbs_msg_cnt_=9 bbs_poll_cnt_=6667 bbs_poll_=93
        0

There are also no errors for
mpirun -n 50 nrniv "testN.hoc", N = 1..7
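
Running the whole series can be scripted roughly as follows. This is a sketch only: it assumes test1.hoc through test7.hoc are in the current directory and that a failed run shows up as a non-zero exit status, which may not hold for every kind of hoc error. The function name `run_tests` is invented, and the launcher is a parameter so it can be swapped.

```shell
#!/bin/sh
# Hypothetical helper: run test1.hoc .. test7.hoc in turn with the given
# launcher command and stop at the first one that exits non-zero.
run_tests() {
    launcher=${1:-"mpirun -n 50 nrniv"}
    for n in 1 2 3 4 5 6 7; do
        echo "== test$n.hoc"
        # $launcher is left unquoted so that it splits into command + args
        $launcher "test$n.hoc" || {
            echo "test$n.hoc failed" >&2
            return 1
        }
    done
}
```

Calling `run_tests` with no argument uses the `mpirun -n 50 nrniv` command from this post; `run_tests "mpirun -n 10 nrniv"` would use ten processes instead.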

Post by hines » Mon Oct 30, 2006 4:49 pm

Good. Looks like you are on your way.

Post by nizar » Mon Oct 30, 2006 4:57 pm

hines wrote: Good. Looks like you are on your way.

Wow...
I'll run more tests and let you know.

Many thanks.
