I have been trying to put together a minimal working example of gap junctions over MPI before moving on to a more complex system. Actually, I already got the more complex system working with the MPI-aware syntax (i.e. pc.source_var, pc.target_var) on a single process, but it throws a segfault when I increase the number of processes, so I have gone back to a simpler model to help pin down where the problem is. However, I am still running into trouble, and I am not even sure whether it is the same trouble I was having before :)
Here is the MWE code I have written (NB: at this stage it is a "rectified" gap junction, where current only flows one way, for testing purposes; usually you will want a bidirectional gap junction, in which case you need a second gap junction connection going from the "post-synaptic" cell back to the "pre-synaptic" cell):
Code:
"""
A minimum working example of a NEURON gap junction over MPI
Author: Tom Close
Date: 8/1/2013
Email: tclose@oist.jp
"""
import os
import argparse
import numpy as np
# This is a hack I use on our cluster, to get MPI initialised=True. There is probably something
# wrong with our setup but I can't be bothered trying to work out what it is at this point. All
# suggestions welcome :)
try:
    from mpi4py import MPI #@UnresolvedImport @UnusedImport
except ImportError:
    print "mpi4py was not found, MPI will remain disabled if MPI initialized==False on startup"
from neuron import h, load_mechanisms
# Not sure this is necessary, or whether I can just use h.finitialize instead of h.stdinit
h.load_file('stdrun.hoc')
# The GID used for the gap junction connection. NB: this number is completely independent of the
# GIDs used for NEURON cell spike exchange.
GID_FOR_VAR = 0
# Arguments to the script
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument('--plot', action='store_true', help="Plot the data instead of saving it")
parser.add_argument('--output_dir', type=str, default=os.getcwd(),
                    help="The directory to save the output files into")
parser.add_argument('--gap_mechanism_dir', type=str, default=os.getcwd(),
                    help="The directory to load the gap mechanism from")
args = parser.parse_args()
# Load gap mechanism from another directory if required
if args.gap_mechanism_dir != os.getcwd():
    load_mechanisms(args.gap_mechanism_dir)
# Get the parallel context and related parameters
pc = h.ParallelContext()
num_processes = int(pc.nhost())
mpi_rank = int(pc.id())
print "On process {} of {}".format(mpi_rank+1, num_processes)
print "Creating test network..."
# The pre-synaptic cell is created on the first node and the post-synaptic cell on the last node
# (NB: which will obviously be the same node if there is only one process)
if mpi_rank == 0:
    print "Creating pre-synaptic cell on process {}".format(mpi_rank)
    # Create the pre-synaptic cell
    pre_cell = h.Section()
    pre_cell.insert('pas')
    # Connect the voltage of the pre-synaptic cell to the gap junction on the post-synaptic cell
    pc.source_var(pre_cell(0.5)._ref_v, GID_FOR_VAR)
    # Stimulate the first cell to make it obvious whether the gap junction is working
    stim = h.IClamp(pre_cell(0.5))
    stim.delay = 50
    stim.amp = 10
    stim.dur = 100
    # Record voltage of the pre-synaptic cell
    pre_v = h.Vector()
    pre_v.record(pre_cell(0.5)._ref_v)
if mpi_rank == (num_processes - 1):
    print "Creating post-synaptic cell on process {}".format(mpi_rank)
    # Create the post-synaptic cell
    post_cell = h.Section()
    post_cell.insert('pas')
    # Insert gap junction
    gap_junction = h.gap(0.5, sec=post_cell)
    gap_junction.g = 1.0
    # Connect gap junction to pre-synaptic cell
    pc.target_var(gap_junction._ref_vgap, GID_FOR_VAR)
    # Record voltage of the post-synaptic cell
    post_v = h.Vector()
    post_v.record(post_cell(0.5)._ref_v)
# Finalise construction of parallel context
pc.setup_transfer()
# Record time
rec_t = h.Vector()
rec_t.record(h._ref_t)
print "Finished network construction on process {}".format(mpi_rank)
# Run simulation
print "Setting maxstep on process {}".format(mpi_rank)
pc.set_maxstep(10)
print "Finitialise on process {}".format(mpi_rank)
#h.finitialize(-60)
h.stdinit()
print "Solving on process {}".format(mpi_rank)
pc.psolve(100)
print "Running worker on process {}".format(mpi_rank)
pc.runworker()
print "Completing parallel context on process {}".format(mpi_rank)
pc.done()
print "Finished run on process {}".format(mpi_rank)
# Convert recorded data into Numpy arrays
t_array = np.array(rec_t)
if mpi_rank == 0:
    pre_v_array = np.array(pre_v)
if mpi_rank == (num_processes - 1):
    post_v_array = np.array(post_v)
# Either plot the recorded values or save them to file
if args.plot and num_processes == 1:
    print "Plotting..."
    import matplotlib.pyplot as plt
    if mpi_rank == 0:
        pre_fig = plt.figure()
        plt.plot(t_array, pre_v_array)
        plt.title("Pre-synaptic cell voltage")
        plt.xlabel("Time (ms)")
        plt.ylabel("Voltage (mV)")
    if mpi_rank == (num_processes - 1):
        post_fig = plt.figure()
        plt.plot(t_array, post_v_array)
        plt.title("Post-synaptic cell voltage")
        plt.xlabel("Time (ms)")
        plt.ylabel("Voltage (mV)")
    plt.show()
else:
    # Save data
    print "Saving data..."
    if mpi_rank == 0:
        np.savetxt(os.path.join(args.output_dir, "pre_v.dat"),
                   np.transpose(np.vstack((t_array, pre_v_array))))
    if mpi_rank == (num_processes - 1):
        np.savetxt(os.path.join(args.output_dir, "post_v.dat"),
                   np.transpose(np.vstack((t_array, post_v_array))))
    print "Done."
print "Done."
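For the bidirectional case mentioned above, each direction of current flow needs its own transfer GID and its own gap junction point process on the receiving side. The actual NEURON calls are as in the script; the sketch below (plain Python, no NEURON required; the GID names and helper function are illustrative, not part of the MWE) just shows which rank would register which source_var/target_var pairs before setup_transfer():

```python
# Illustrative sketch of the GID bookkeeping for a bidirectional gap junction.
# Each direction gets its own transfer GID; every rank registers a source_var
# for the voltage it owns and a target_var for the remote voltage it needs.
PRE_TO_POST_GID = 0  # carries the pre-cell voltage to the post-cell junction
POST_TO_PRE_GID = 1  # carries the post-cell voltage to a pre-cell junction

def transfer_calls(rank, num_processes):
    """Return the (call, gid) pairs a given rank would make before setup_transfer()."""
    calls = []
    if rank == 0:  # this rank owns the pre-synaptic cell
        calls.append(('source_var', PRE_TO_POST_GID))  # pc.source_var(pre_cell(0.5)._ref_v, ...)
        calls.append(('target_var', POST_TO_PRE_GID))  # pc.target_var(pre_gap._ref_vgap, ...)
    if rank == num_processes - 1:  # this rank owns the post-synaptic cell
        calls.append(('source_var', POST_TO_PRE_GID))  # pc.source_var(post_cell(0.5)._ref_v, ...)
        calls.append(('target_var', PRE_TO_POST_GID))  # pc.target_var(gap_junction._ref_vgap, ...)
    return calls

print(transfer_calls(0, 2))  # pre-cell rank
print(transfer_calls(1, 2))  # post-cell rank
```

With a single process both branches fire on rank 0, which is why the one-process case should still work with the same wiring.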
Code:
: gap.mod
: This is a conductance-based gap junction model rather
: than a resistance-based one, because Traub occasionally
: likes to set g=0, which of course is infinite resistance.
NEURON {
    POINT_PROCESS gap
    RANGE g, i
    POINTER vgap
    ELECTRODE_CURRENT i
}

PARAMETER { g = 1e-10 (1/megohm) }

ASSIGNED {
    v (millivolt)
    vgap (millivolt)
    i (nanoamp)
}

BREAKPOINT { i = (vgap - v) * g }
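With g in (1/megohm) (i.e. microsiemens) and voltages in millivolts, the BREAKPOINT line yields a current in nanoamps (mV x uS = nA). A quick sanity check of that equation in plain Python (the function name is mine, just for illustration):

```python
def gap_current(vgap_mV, v_mV, g_uS):
    # Same equation as the BREAKPOINT block: i = (vgap - v) * g.
    # mV * uS = nA, so the result is the current in nanoamps injected
    # into the section containing the point process.
    return (vgap_mV - v_mV) * g_uS

# A 10 mV difference across a 1 uS junction drives 10 nA into the post cell.
print(gap_current(-50.0, -60.0, 1.0))  # -> 10.0
```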
Code:
MPI_Initialized==true, enabling MPI functionality.
numprocs=1
NEURON -- Release 7.2 (562:42a47463b504) 2011-12-21
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2008
See http://www.neuron.yale.edu/credits.html
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: tombo20501 (PID 23294)
MPI_COMM_WORLD rank: 0
If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
Additional mechanisms from files
adexp.mod alphaisyn.mod alphasyn.mod expisyn.mod gap.mod gsfa_grr.mod hh_traub.mod netstim2.mod new_gap.mod refrac.mod reset.mod stdwa_guetig.mod stdwa_softlimits.mod stdwa_songabbott.mod stdwa_symm.mod tmgsyn.mod tmisyn.mod vecstim.mod
On process 1 of 1
Creating test network...
Creating pre cell on process 0
Creating post cell on process 0
Finished network construction on process 0
Setting maxstep on process 0
Finitialise on process 0
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 23294 on node tombo20501 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I get another error, which is probably unrelated, when I run the script with -np 2:
Code:
MPI_Initialized==true, enabling MPI functionality.
NEURON -- Release 7.2 (562:42a47463b504) 2011-12-21
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2008
See http://www.neuron.yale.edu/credits.html
MPI_Initialized==true, enabling MPI functionality.
numprocs=2
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: tombo20502 (PID 8277)
MPI_COMM_WORLD rank: 0
If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
Additional mechanisms from files
adexp.mod alphaisyn.mod alphasyn.mod expisyn.mod gap.mod gsfa_grr.mod hh_traub.mod netstim2.mod new_gap.mod refrac.mod reset.mod stdwa_guetig.mod stdwa_softlimits.mod stdwa_songabbott.mod stdwa_symm.mod tmgsyn.mod tmisyn.mod vecstim.mod
On process 2 of 2
Creating test network...
Creating post cell on process 1
On process 1 of 2
Creating test network...
Creating pre cell on process 0
Finished network construction on process 1
Setting maxstep on process 1
Finished network construction on process 0
Setting maxstep on process 0
Finitialise on process 0
Finitialise on process 1
python: netpar.cpp:469: void nrn_spike_exchange_init(): Assertion `usable_mindelay_ >= dt && (usable_mindelay_ * dt1_) < 255' failed.
python: netpar.cpp:469: void nrn_spike_exchange_init(): Assertion `usable_mindelay_ >= dt && (usable_mindelay_ * dt1_) < 255' failed.
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 8278 on node tombo20502 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Thanks,
Tom