Nrn timeout error

shyam_u2
Posts: 77
Joined: Sun Feb 20, 2011 7:15 pm

Nrn timeout error

Post by shyam_u2 »

I am getting this timeout error when I run my model.

nrn_timeout t=3
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 28500 on
node tombo11103 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

May I know what causes this?
hines
Site Admin
Posts: 1687
Joined: Wed May 18, 2005 3:32 pm

Re: Nrn timeout error

Post by hines »

It means that during a run, 20 seconds of wall time passed without t increasing.
This is present to avoid potentially wasting thousands of cpu hours on a supercomputer until the time limit is reached.
It could happen if there is a bug that causes an MPI collective to wait forever.
But sometimes it means you are stopping the sim and taking a long time to write data.
I don't know which it is in your case.
You can set your own timeout with pc.timeout(x), where pc is a ParallelContext instance and x is the timeout in seconds. If x is 0, the timeout is off.
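The timeout rule above (abort when wall time passes without the simulation time t increasing) can be pictured as a simple watchdog. The following is a hypothetical, pure-Python toy model, not NEURON's implementation: the class names `TimeoutWatchdog` and `FakeClock`, the `check` method and the injectable clock are all invented for illustration. It assumes `check` is called periodically with the current t. In the real case the outcome is the nrn_timeout message and the MPI_ABORT shown in the log at the top of this thread.

```python
import time
from typing import Callable, Optional


class TimeoutWatchdog:
    """Hypothetical watchdog: flags a run whose simulation time stops advancing.

    Not NEURON's implementation; a toy model of the behaviour described above.
    """

    def __init__(self, timeout_s: float = 20.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s          # 0 (or less) disables the check
        self._clock = clock                 # injectable wall clock, in seconds
        self._last_t: Optional[float] = None
        self._last_progress = clock()       # wall time when t last increased

    def check(self, t: float) -> bool:
        """Record the current simulation time t; return True on timeout."""
        now = self._clock()
        if self._last_t is None or t > self._last_t:
            # Simulation time advanced, so restart the wall-clock interval.
            self._last_t = t
            self._last_progress = now
            return False
        if self.timeout_s <= 0:
            return False                    # watchdog switched off
        return now - self._last_progress > self.timeout_s


class FakeClock:
    """Manually advanced clock so the example does not have to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    wd = TimeoutWatchdog(timeout_s=20.0, clock=clock)
    print(wd.check(0.0))     # False: first observation
    clock.now = 5.0
    print(wd.check(0.025))   # False: t advanced
    clock.now = 26.0
    print(wd.check(0.025))   # True: 21 s of wall time with t stuck
    wd.timeout_s = 0
    print(wd.check(0.025))   # False: timeout disabled
```

Both a hung MPI collective and a long data write look the same to such a check: t stops increasing while wall time keeps running, which is why the message alone cannot tell the two cases apart.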
shyam_u2
Posts: 77
Joined: Sun Feb 20, 2011 7:15 pm

Re: Nrn timeout error

Post by shyam_u2 »

Let me explain the context of this error.
I am using pc.source_var and pc.target_var to make gap junctions work in an MPI environment.
I added lines of code incrementally, compiling and running the model after each addition. Until I insert the statement pc.setup_transfer(), everything is fine (the model completes execution). But when I insert pc.setup_transfer(), the run hangs for some time and finally fails with the nrn_timeout error.
Do you have any idea what's going on here?
You wrote: "pc.timeout(x) and if x is 0 the timeout is off. (pc must be a ParallelContext instance)"
But NEURON says timeout is not a public member of ParallelContext.
hines
Site Admin
Posts: 1687
Joined: Wed May 18, 2005 3:32 pm

Re: Nrn timeout error

Post by hines »

Your problem with setup_transfer may be due to a bug that has already been fixed.
The last change to that area of the code was 6 months ago:
ww.neuron.yale.edu/hg/neuron/nrn/rev/b60b3450eff6

Also, ParallelContext.timeout was introduced into the main trunk of the repository 10 months ago:
http://www.neuron.yale.edu/hg/neuron/nr ... c98139370e
So I think it makes sense for you to build either from the repository sources or from the tar.gz file at
http://www.neuron.yale.edu/ftp/neuron/versions/alpha/

If you continue to have problems with the timeout in pc.setup_transfer, you can send me a zip file with all the hoc and mod files needed to reproduce the problem, and I can do some diagnosis.
shyam_u2
Posts: 77
Joined: Sun Feb 20, 2011 7:15 pm

Re: Nrn timeout error

Post by shyam_u2 »

Everything I mentioned above happens only when I declare vgap as a RANGE variable in gap.mod (the gap junction mechanism from section 10.1.2 of The NEURON Book). When I change vgap to a POINTER variable, as it is in the book, I get a segmentation fault at stdinit.

What causes this segmentation fault?
Any idea would be greatly appreciated.


Thanks.
hines
Site Admin
Posts: 1687
Joined: Wed May 18, 2005 3:32 pm

Re: Nrn timeout error

Post by hines »

A mod file that declares vgap to be a POINTER is inconsistent with gap junctions that work in a parallel program. A vgap POINTER can only "watch" a variable, v, in the same address space; it cannot watch a variable on another machine.
To implement parallel gap junctions in which vgap and v can be on different machines, you need a pair of ParallelContext.source_var and target_var calls, each made on the machine where the corresponding variable exists, and target_var requires vgap to be a RANGE variable, not a POINTER.
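To see why a POINTER cannot cross machines while a source_var/target_var pair can, here is a hypothetical pure-Python toy model (not NEURON code). Each `Rank` is a separate dict standing in for one process's private memory. `source_var` registers which segment voltage is published under an integer id (sgid), `target_var` registers which gap junction should receive a copy in its RANGE variable vgap, `setup_transfer` (in this toy only) checks that every target id has a source, and `transfer` copies values, standing in for message passing. The method names only echo the ParallelContext calls; their signatures, the ohmic current i = g * (v - vgap), and the units (conductance in microsiemens, voltage in mV, current in nA) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Rank:
    """Toy stand-in for one MPI process and its private address space."""
    voltages: Dict[str, float] = field(default_factory=dict)           # segment -> v (mV)
    gaps: Dict[str, Dict[str, float]] = field(default_factory=dict)    # gap -> its RANGE vars


class ToyTransfer:
    """Hypothetical model of the source_var / target_var / setup_transfer idea."""

    def __init__(self, ranks: List[Rank]) -> None:
        self.ranks = ranks
        self.sources: Dict[int, Tuple[int, str]] = {}   # sgid -> (rank, segment)
        self.targets: List[Tuple[int, str, int]] = []   # (rank, gap, sgid)

    def source_var(self, rank: int, segment: str, sgid: int) -> None:
        """Declare that `segment` on `rank` publishes its v under id sgid."""
        if sgid in self.sources:
            raise ValueError(f"sgid {sgid} registered twice")
        self.sources[sgid] = (rank, segment)

    def target_var(self, rank: int, gap: str, sgid: int) -> None:
        """Declare that `gap` on `rank` wants a copy of source sgid in vgap."""
        self.targets.append((rank, gap, sgid))

    def setup_transfer(self) -> None:
        """In this toy, just check that every target has a matching source."""
        missing = sorted({sgid for _, _, sgid in self.targets if sgid not in self.sources})
        if missing:
            raise ValueError(f"no source_var for sgid(s) {missing}")

    def transfer(self) -> None:
        """Copy each source v into the target's RANGE variable vgap.

        The copy stands in for message passing between processes; a
        POINTER would instead need direct access to the other rank's memory.
        """
        for rank, gap, sgid in self.targets:
            src_rank, segment = self.sources[sgid]
            self.ranks[rank].gaps[gap]["vgap"] = self.ranks[src_rank].voltages[segment]


def gap_current(g_uS: float, v: float, vgap: float) -> float:
    """Ohmic gap junction current in nA (uS * mV): i = g * (v - vgap)."""
    return g_uS * (v - vgap)


if __name__ == "__main__":
    ranks = [Rank(voltages={"a": -65.0}, gaps={"gap_a": {"vgap": 0.0}}),
             Rank(voltages={"b": -55.0}, gaps={"gap_b": {"vgap": 0.0}})]
    xfer = ToyTransfer(ranks)
    xfer.source_var(0, "a", sgid=1)
    xfer.source_var(1, "b", sgid=2)
    xfer.target_var(0, "gap_a", sgid=2)   # gap on rank 0 needs v of b (rank 1)
    xfer.target_var(1, "gap_b", sgid=1)   # gap on rank 1 needs v of a (rank 0)
    xfer.setup_transfer()
    xfer.transfer()
    vgap_a = ranks[0].gaps["gap_a"]["vgap"]
    print(vgap_a, gap_current(0.001, ranks[0].voltages["a"], vgap_a))
```

Note the symmetry: each side of the junction is one source_var on the machine that owns the voltage and one target_var on the machine that owns the gap junction instance, which matches the pairing described above.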

Returning to the timeout problem: if it still occurs with the latest version of NEURON, send me your code and instructions on how to reproduce the problem, and I can try to diagnose what is going wrong. Send the zip file to michael dot hines at yale dot edu.
shyam_u2
Posts: 77
Joined: Sun Feb 20, 2011 7:15 pm

Re: Nrn timeout error

Post by shyam_u2 »

OK. I will run it with the recent NEURON version. Thank you, Hines.
shyam_u2
Posts: 77
Joined: Sun Feb 20, 2011 7:15 pm

Re: Nrn timeout error

Post by shyam_u2 »

I installed the recent alpha version and it works fine. Thank you for your help.