
NEURON, Python, ParallelNetManager, and BeforestepCallback

Posted: Thu Jan 07, 2016 3:56 pm
by Grado
I am working on a Python extension of the Hahn & McIntyre model http://senselab.med.yale.edu/ModelDb/sh ... del=127388 for the purpose of performing data analysis at each timestep of the model. The model uses ParallelNetManager for parallelization, and I am attempting to use the BeforeStepCallback method (as described here by Hines: viewtopic.php?f=2&t=3389#p14347) to call a Python function at each timestep.

To start with, I created a Python wrapper for the H&M model. This code runs the model on any number of processors, and successfully collects and saves all the spikes at the end of the run. (For clarification, the 'py_parBGLaunch.hoc' file is the same as parBGLaunch.hoc from ModelDB, except that it no longer asks for a launch file. parBGLaunch.hoc completely sets up the model but does not run it; the pBGLaunch.000 file is responsible for initializing and running the model.)

Code: Select all

from mpi4py import MPI
from neuron import h, load_mechanisms
comm = MPI.COMM_WORLD

load_mechanisms('H&M/') #Loads mechanisms (Including BSCallback)
h.load_file('H&M/py_parBGLaunch.hoc') #Loads model
h.getOutput('H&M/pNets/dat',0,1) #Loads parameter set

h.tstop = 1000
h.pnm.pinit()
h.pnm.prun()
h.pnm.gatherspikes()
h.saveSpikes()
h.pnm.pc.runworker()
h.pnm.pc.done()
I then added an empty beforestep() function, as shown below, which also runs on any number of processors without issue.

Code: Select all

from mpi4py import MPI
from neuron import h, load_mechanisms
comm = MPI.COMM_WORLD

def beforestep():
  pass

load_mechanisms('H&M/') #Loads mechanisms (Including BSCallback)
h.load_file('H&M/py_parBGLaunch.hoc') #Loads model
h.getOutput('H&M/pNets/dat',0,1) #Loads parameter set

bs_sec = h.Section()
bscallback = h.beforestep_callback(bs_sec(0.5))
bscallback.set_callback(beforestep)

h.tstop = 1000
h.pnm.pinit()
h.pnm.prun()
h.pnm.gatherspikes()
h.saveSpikes()
h.pnm.pc.runworker()
h.pnm.pc.done()
Enough of the preamble. The problem I am having arises in the beforestep() code. Below is a brief pseudo-code outline of what I need to do in beforestep(). First, the master collects the spikes from all the workers and performs a quick data analysis on them, producing an action to be taken; the workers initialize an empty variable, action. Then, after a barrier syncing all processors, the action is broadcast from the master to all workers, and every processor executes it.

Code: Select all

def beforestep():
  if comm.Get_rank() == 0:
    h.pnm.gatherspikes()
    action = [some analysis code]
  else:
    action = None
  comm.barrier()
  action = comm.bcast(action, root=0)  # bcast returns the broadcast object on every rank
  [execute action]
Obviously, I did not attempt to implement this all at once. I started by performing the data analysis on the spikes on the master only, as shown below. This code runs without error on any number of processors (although when run on more than one processor, not all the spikes are available to the master).

Code: Select all

def beforestep():
  if comm.Get_rank() == 0:
    action = [some analysis code]
I then attempted to use h.pnm.gatherspikes() to get all of the spikes from the workers to the master. This is where the problem first arises. The pseudo-code below runs on one processor, but stalls at the first timestep for any n>1 processors.

Code: Select all

def beforestep():
  if comm.Get_rank() == 0:
    h.pnm.gatherspikes()
    action = [some analysis code]
I believe the issue arises because of the ParallelNetManager. In py_parBGLaunch.hoc, near the end, pnm.pc.runworker is called. From what I understand, this sends the workers into an infinite loop, asking for tasks from the master, while the master returns immediately. Then, in the Python code, I call h.pnm.pinit() and h.pnm.prun(), initializing and running the model. When the code enters beforestep() and calls h.pnm.gatherspikes(), the master asks for spikes from the workers, but the workers are stuck in infinite loops awaiting tasks from the master! How do I avoid this problem?

Interestingly, h.pnm.gatherspikes() does not cause any problem after the model has finished running.
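
For reference, here is my rough mental picture of the control flow after pc.runworker() (a sketch only, based on my reading of the ParallelContext documentation, not code from the model):

Code: Select all

# Sketch of the master/worker split created by pc.runworker()
from neuron import h
pc = h.ParallelContext()

# ...setup code here runs on every rank...

pc.runworker()   # ranks > 0 drop into the worker loop here and wait for bulletin-board tasks
# ...code here runs only on rank 0 (the master)...
pc.done()        # tells the workers to exit their loop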

Re: NEURON, Python, ParallelNetManager, and BeforestepCallback

Posted: Sat Jan 09, 2016 12:13 pm
by hines
The implementation of ParallelNetManager.gatherspikes() makes use of the bulletin board, and that cannot be used within a parallel network simulation (the only exception is with ParallelContext.subworlds, and in that context each bulletin board worker is doing an independent simulation of an independent network). That is why gatherspikes is not succeeding at each time step. (Also, gatherspikes transfers all spikes starting from time 0, so I don't think it makes sense in this context anyway.)

Assuming you have your spikes for a step in a list on each rank, all ranks should participate in an MPI_Gather to rank 0.
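
For example, a minimal mpi4py sketch of that idea (the last_sent counter here is just illustrative bookkeeping, not part of ParallelNetManager, and it assumes every rank actually reaches the call on each step):

Code: Select all

from mpi4py import MPI
from neuron import h
import numpy as np

comm = MPI.COMM_WORLD
last_sent = 0  # per-rank count of spikes already transferred (illustrative only)

def gather_new_spikes():
  global last_sent
  ids = np.asarray(h.pnm.idvec, dtype='int')
  times = np.asarray(h.pnm.spikevec, dtype='f')
  new_ids, new_times = ids[last_sent:], times[last_sent:]  # spikes recorded since the last gather
  last_sent = len(ids)
  # every rank must make this call; rank 0 receives a list with one (ids, times) tuple per rank
  return comm.gather((new_ids, new_times), root=0)  # returns None on ranks other than 0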

Re: NEURON, Python, ParallelNetManager, and BeforestepCallback

Posted: Tue Jan 12, 2016 5:03 pm
by Grado
I tried using comm.gather, comm.gatherv, comm.allgather, and h.pnm.pc.allgather (I'm not sure where the MPI_Gather you referenced lives specifically). With each of these attempts, it hung on the gather step. I may be coding it wrong, but I never get errors indicating that I did something wrong; it just hangs. Below is an example of the code as I am running it:

Code: Select all

idvec = np.asarray(h.pnm.idvec,dtype='int')
m = idvec.shape[0]
n = comm.Get_size()
recvBuff = np.zeros(m*n,dtype='int')
comm.gather(idvec,recvBuff,root=0)
#comm.gatherv(idvec,recvBuff,root=0)
#comm.allgather(idvec,recvBuff)
#h.pnm.pc.allgather(idvec,recvBuff)
I did find a possible workaround. Instead of using the BeforeStepCallback, I changed the way I am integrating. Because I am only trying to calculate my state at predefined intervals (greater than the step size), instead of using h.pnm.prun() I am using h.pnm.pcontinue() in a while loop. In this manner, I can put h.pnm.gatherspikes() in the beforestep() function without hanging. However, with this code something else interesting happens: the idvec (and spikevec) both become much longer than they should. Over a 20 ms simulation there should be around 700 spikes at the end. When I run the code below with one processor, that is exactly what I get. But when I run it with 2 or more processors, there are around 5000 at the end! I can only imagine that by calling gatherspikes() during a simulation, I am counting spikes more than once. If there's an easy way to prevent this, that could solve my problem.

Code: Select all

def beforestep():
  h.pnm.gatherspikes()
  if comm.Get_rank() == 0:
    print len(h.pnm.idvec)

while h.pnm.pc.t(0) < h.tstop:
  beforestep()
  h.pnm.pcontinue(h.pnm.pc.t(0) + 1)
Additionally, I have tried using pack, post, and take under the above framework (using pcontinue instead of prun). The issue I seem to be having with this method is twofold. First, I believe I need the comm.barrier() to ensure that all messages have been posted before the master attempts to unpack; however, with this line, it hangs. Second, if I comment out comm.barrier() and then run the code below, it tells me that no messages have been posted.

Code: Select all

if comm.Get_rank() != 0:
  idvec = np.array(h.pnm.idvec,dtype='int')
  h.pnm.pc.pack(idvec)
  h.pnm.pc.post(comm.Get_rank())
comm.barrier()
if comm.Get_rank() == 0:
  for i in range(1,comm.Get_size()):
    print h.pnm.pc.look(i)      

Thanks for your help.

Re: NEURON, Python, ParallelNetManager, and BeforestepCallback

Posted: Wed Jan 13, 2016 12:15 pm
by hines
Does your program hang if you use comm.barrier() instead of trying to gather spikes? Remember that an MPI collective will hang if not all ranks participate by calling comm.barrier().
Also, as I mentioned earlier, the idvec and spikevec are getting larger and larger as spikes get added to them. It would seem that you want to transfer only the spikes that occurred since the last time you gathered.

It is not surprising that a single process run does not hang since all ranks by definition participate in the collective call.

You cannot use the bulletin board (pc.context, pack, post, take, etc.) unless you move to a master/worker style with pc.runworker(). This is generally inconsistent with a network simulation where spikes are exchanged among ranks during a run.

comm.gather is a reasonable collective to use, but be sure to understand what its arguments and return value should be, and that any rank that calls it will wait until all ranks have called it. I "guess" that it returns a list of objects (one from each rank).
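
For what it's worth, a minimal standalone mpi4py sketch of that usage (not tied to the model) is:

Code: Select all

# Run with e.g.: mpiexec -n 4 python gather_demo.py
from mpi4py import MPI
comm = MPI.COMM_WORLD

local = ['data from rank %d' % comm.Get_rank()]  # placeholder per-rank data
gathered = comm.gather(local, root=0)            # every rank must call this
# On rank 0, gathered is a list with one entry per rank; on the other ranks it is None.
if comm.Get_rank() == 0:
  print(gathered)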

Re: NEURON, Python, ParallelNetManager, and BeforestepCallback

Posted: Wed Jan 13, 2016 3:37 pm
by Grado
Yes, the program hangs if I call comm.barrier(). I understand that this is happening because not all ranks are participating in the call to comm.barrier(). I believe this is the case because the Hahn & McIntyre model uses the master/worker style of pc.runworker(). In parBGLaunch.hoc (http://senselab.med.yale.edu/ModelDb/sh ... Launch.hoc), near the end, pnm.pc.runworker() is called. Any function that requires all processes to complete before moving on (comm.barrier(), comm.gather(), etc.) will hang.

What I think would solve my problem is a way to kick the workers OUT of their loops for the duration of my beforestep() function, and then call pnm.pc.runworker() again at the end, thus allowing ranks n>=1 to participate in the code. As it stands, it appears that ranks 1 and greater do not participate in any of this code at all.


I am still curious about pnm.gatherspikes(), however. I agree with you that it is only necessary to transfer spikes that occurred since the last gather. But with the master/worker style of parallelization that this model uses, gather is not working. As a temporary measure, I have implemented a workaround using pnm.gatherspikes() like this:

Code: Select all

def beforestep():
  vecLen = int(len(h.pnm.idvec))      # master's spike count before the gather
  h.pnm.gatherspikes()                # appends the workers' spikes to the master's vectors
  if comm.Get_rank() == 0:
    idvec = np.asarray(h.pnm.idvec,dtype='int')
    spkvec = np.asarray(h.pnm.spikevec,dtype='f')
    h.pnm.idvec.resize(vecLen)        # discard the appended copies so they are not
    h.pnm.spikevec.resize(vecLen)     # counted again on the next gather

    [analysis code]
This is obviously not the most efficient approach (especially as the spike vectors get longer), but it solves my problem in the near term. Previously, I had issues because of the native behavior of gatherspikes(): it appends all of the workers' spikes to the master's vectors, so by calling gatherspikes() multiple times I was counting spikes multiple times and getting duplicate entries. To get around this, I simply delete the entries that gatherspikes() appends once I am done using them. Once again, not the most efficient, but it works.

I would much rather get comm.gather() or pc.pack/post/take working, and only gather spikes that occurred since the last transfer. However, pc.runworker() seems to be preventing this. Any ideas?

Thanks again for your help.

Re: NEURON, Python, ParallelNetManager, and BeforestepCallback

Posted: Thu Jan 14, 2016 12:52 pm
by hines
As far as I can see, this model from ModelDB does not work in parallel (at least on my Mac Air) due to the simultaneous use of MPI and the bulletin board. I need to look into this in more detail, as the legacy netparmpi.hoc file encourages this usage. I'm returning from a conference and will be able to diagnose more effectively this weekend.

Re: NEURON, Python, ParallelNetManager, and BeforestepCallback

Posted: Sun Jan 17, 2016 9:08 am
by hines
The basic problem is that your fragment

Code: Select all

bs_sec = h.Section()
bscallback = h.beforestep_callback(bs_sec(0.5))
bscallback.set_callback(beforestep)
is executed only by the master (rank 0), since that code comes after pc.runworker() is called. Hence only rank 0 participates in any MPI collective that appears inside the 'beforestep' function, and so that collective never returns.
You can put that fragment after the definition of beforestep and before you load py_parBGLaunch.hoc.

I cannot recommend controlling a network simulation using the master/worker bulletin board style along with MPI collectives. The master/worker pattern does not scale well to large nhost, and it introduces needless complexity when managing any MPI collective call. Also, if you need to keep Python fragments such as the one above subsequent to the pc.runworker() separation into master/worker, it is necessary to put them into a def before the separation and then, after the separation, invoke them with:
pc.context(python_callable, arg1, ...) # on all the workers
python_callable(arg1, ...) # on the master

Here is a test.py file.

Code: Select all

from neuron import h
pc = h.ParallelContext()
rank = int(pc.id())
nhost = int(pc.nhost())

def beforestep():
  print pc.t(0), rank
  pc.barrier()

def foo():
  global bs_sec, bscallback
  print 'hello', rank 
  bs_sec = h.Section()
  bscallback = h.beforestep_callback(bs_sec(0.5))
  bscallback.set_callback(beforestep)

h.load_file('py_parBGLaunch.hoc')
h.getOutput('pNets.dat', 0, 1)

pc.context(foo)
foo()

h.tstop = 0.1
h.pnm.pinit()
h.pnm.prun()
h.pnm.gatherspikes()
h.saveSpikes('spikes000.txt')
h.quit()