
Neuron and Myrinet

Posted: Sun May 28, 2006 6:39 am
by Thomas Wennekers
Hi all

Did somebody manage to compile Neuron with MPI for Myrinet and/or LAM-MPI?

The following code snippet from src/nrnmpi/nrnmpi.c seems to restrict usage to MPICH and ethernet networks only:

#if !ALWAYS_CALL_MPI_INIT
/* this is not good. depends on mpirun adding at least one
arg that starts with -p4 but that probably is dependent
on mpich and the use of the ch_p4 device. We are trying to
work around the problem that MPI_Init may change the working
directory and so when not invoked under mpirun we would like to
NOT call MPI_Init.
*/
{
    int i, b;
    b = 0;
    /* look for any argument that starts with "-p4" */
    for (i=0; i < *pargc; ++i) {
        if (strncmp("-p4", (*pargv)[i], 3) == 0) {
            b = 1;
            break;
        }
    }
    if (!b) {
        nrnmpi_use = 0;
        return;
    }
}
#endif


Does somebody know a workaround?

Thanks
Thomas


PS: Just disabling the above code is not a workaround; it produces the errors below on my laptop (used for testing) running Kubuntu and LAM-MPI, with or without iv [doNotify() somehow links to hoc_notify_iv() in src/oc/hoc_init.c, so I thought it could be iv]:

mpirun -np 2 nrniv test0.hoc
NEURON -- Version 5.8 2005-10-7 13:46:29 Main (85)
by John W. Moore, Michael Hines, and Ted Carnevale
Duke and Yale University -- Copyright 1984-2005

hello from id 1 on beatrix

nrnmpi_init(): numprocs=2 myid=0
NEURON -- Version 5.8 2005-10-7 13:46:29 Main (85)
by John W. Moore, Michael Hines, and Ted Carnevale
Duke and Yale University -- Copyright 1984-2005

hello from id 0 on beatrix

0
oc: Resource temporarily unavailable
0 nrniv: errno set during call of doNotify
0 in test0.hoc near line 9
0 for i=1, 1000000 doNotify()
^
oc: Resource temporarily unavailable
0 nrniv: errno set during call of doNotify
0 in test0.hoc near line 9
0 ^
oc: Resource temporarily unavailable
0 nrniv: errno set during call of doNotify
0 in test0.hoc near line 9
0 ^
oc: Resource temporarily unavailable
0 nrniv: errno set during call of doNotify
0 in test0.hoc near line 9
0 ^
oc: Resource temporarily unavailable
No more errno warnings during this execution
0 nrniv: errno set during call of doNotify
0 in test0.hoc near line 9
0 ^
errno set 6666 times on last execution
bbs_msg_cnt_=1 bbs_poll_cnt_=6667 bbs_poll_=93
0


The same problem appears on a Beowulf cluster with MPICH and Myrinet.

The Neuron version is 5.8, but the newest version on the Neuron website (5.9.??) contains the same code fragment. I haven't compiled and tested it yet, but I am fairly sure the same problem would occur.

Again, thanks for hints!

Best
Thomas

Posted: Sun May 28, 2006 5:27 pm
by hines
The problem is not the #if !ALWAYS_CALL_MPI_INIT code section. In fact, everything is working in your example except for the benign but annoying error messages. They may just go away if you upgrade to 5.9. If not, let me know. Usually the problem traces to failing to reset errno=0 after a call to an MPI function.
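
To illustrate the errno point, here is a minimal sketch in plain C, not NEURON code. noisy_poll() is a hypothetical stand-in for a library call (for example an MPI polling routine) that succeeds but leaves errno set to EAGAIN, which Linux reports as "Resource temporarily unavailable". quiet_poll() and self_check() are also made-up names; the wrapper simply clears errno after a successful call so that a later errno check sees nothing stale.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a library call that succeeds
   but leaves errno set as a side effect. */
static int noisy_poll(void) {
    errno = EAGAIN;   /* "Resource temporarily unavailable" on Linux */
    return 0;         /* the call itself reports success */
}

/* Wrapper: after a successful call, reset errno so that a later
   "was errno set?" check does not pick up the stale value. */
static int quiet_poll(void) {
    int rc = noisy_poll();
    if (rc == 0) {
        errno = 0;
    }
    return rc;
}

/* Small usage demonstration of the difference between the two. */
static void self_check(void) {
    errno = 0;
    noisy_poll();
    assert(errno == EAGAIN);   /* stale errno would trigger a warning */

    errno = 0;
    quiet_poll();
    assert(errno == 0);        /* nothing left for the checker to report */
}
```

Whether the actual fix in 5.9 looks like this is an assumption; the sketch only shows why a successful call can still trigger an "errno set" message.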

5.9 compilation fails

Posted: Tue May 30, 2006 10:46 am
by Thomas Wennekers
Hi

Thanks for the info.

Compilation of version 5.9 fails with

-----------------------------
.....
mpicxx -g -O2 -o .libs/ivoc nrnmain.o ivocmain.o classreg.o datapath.o ocjump.o symdir.o ../oc/nocable.o ../oc/modlreg.o ../oc/.libs/libocxt.so ../oc/.libs/liboc.so -L/usr/X11R6/lib64 -lX11 ./.libs/libivoc.so ../nrnmpi/.libs/libnrnmpi.so ../memacs/.libs/libmemacs.so ../mesch/.libs/libmeschach.so ../gnu/.libs/libneuron_gnu.so /usr/local/Cluster-Apps/nrn/nrn-5.9/mx-iv-mpi-gcc/iv/x86_64/lib/libIVhines.so /usr/lib64/libstdc++.so -lreadline -lncurses -ldl -lm -Wl,--rpath -Wl,/usr/local/Cluster-Apps/nrn/nrn-5.9/mx-iv-mpi-gcc//x86_64/lib -Wl,--rpath -Wl,/usr/local/Cluster-Apps/nrn/nrn-5.9/mx-iv-mpi-gcc/iv/x86_64/lib
./.libs/libivoc.so: undefined reference to `ListImpl_best_new_count(long, unsigned int, unsigned int)'
collect2: ld returned 1 exit status
make[3]: *** [ivoc] Error 1
make[3]: Leaving directory `/home/thomas/soft/nrn-5.9/src/ivoc'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/thomas/soft/nrn-5.9/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/thomas/soft/nrn-5.9'
make: *** [all] Error 2
----------------------------------

Configuration complains that varargs is deprecated, but that's unlikely to be the reason.

The above error is the first one during compilation (besides several size mismatches in variables).

full logs of configuration and make are here: www.pion.ac.uk/~thomas/open

Best wishes
Thomas

Posted: Tue May 30, 2006 11:00 am
by hines
Reinstall interviews (get it from the v5.9 directory or the alpha directory or the link in the install page of the web site). Note that if you are running with mpi then all gui is turned off.
In that case interviews is unnecessary and you can configure with the --without-x option.

--without-iv worked

Posted: Tue May 30, 2006 12:43 pm
by Thomas Wennekers
Hi

Configuration without interviews worked. I still had to switch off the above code snippet in nrnmpi.c by hand. It looks like everything compiled properly (apart from a few warnings). Simple example programs ran without errors. The spurious error messages I had with version 5.8 have vanished.

Thanks!

Thomas

Posted: Tue May 30, 2006 12:58 pm
by hines
You can avoid changing the .h file by hand by using the configuration argument
always_call_mpi_init=yes
(There is no -- before the name since it is just an environment variable.)
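
A rough sketch of how such a compile-time switch bypasses the "-p4" check, assuming the configuration argument ends up as a preprocessor definition of ALWAYS_CALL_MPI_INIT. The function name want_mpi_init() is hypothetical and this is not the actual nrnmpi.c code:

```c
#include <assert.h>
#include <string.h>

/* Assumed default when configure was not told otherwise. */
#ifndef ALWAYS_CALL_MPI_INIT
#define ALWAYS_CALL_MPI_INIT 0
#endif

/* Hypothetical helper: returns 1 if MPI_Init should be called. */
static int want_mpi_init(int argc, char **argv) {
    assert(argc == 0 || argv != NULL);
#if ALWAYS_CALL_MPI_INIT
    /* Switch on: skip the heuristic entirely. */
    (void)argc;
    (void)argv;
    return 1;
#else
    /* Switch off: only say yes if some argument starts with "-p4". */
    int i;
    for (i = 0; i < argc; ++i) {
        if (strncmp("-p4", argv[i], 3) == 0) {
            return 1;
        }
    }
    return 0;
#endif
}
```

Compiled with -DALWAYS_CALL_MPI_INIT=1, the function returns 1 for any argument list; with the default of 0 it depends on an argument that begins with "-p4".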

Neuron & Intel & Myrinet

Posted: Tue May 30, 2006 2:13 pm
by Thomas Wennekers
Hi again

Just for info:

Successfully compiled Neuron version 5.9 with Intel compilers for Myrinet, but without interviews.

Lots of warnings, basically related to missing prototypes.

One bug: In oc/ocbbs.cpp I had to replace "nrnmpi_nhost" by "nrnmpi_numprocs". I am not entirely sure whether that killed the bug or just made it more subtle. Nonetheless, compilation ran through afterwards and simple tests succeeded (src/parallel/test0.hoc and the example from the ParallelNetworkManager manpage).

static double nhost(void* v) {
#if defined(HAVE_STL)
    OcBBS* bbs = (OcBBS*)v;
    return double(bbs->nhost());
#else
    // return nrnmpi_nhost;
    return nrnmpi_numprocs;
#endif
}

config and make logs are here: www.pion.ac.uk/~thomas/open

Regards,
Thomas

Posted: Tue May 30, 2006 2:48 pm
by hines
We are going to have to straighten that out. That is a piece of code that is never supposed to be compiled because HAVE_STL is required nowadays.
Let's deal with this by email. Please send your src/parallel/bbsconf.h file to michael.hines@yale.edu.