I am fairly new to parallel processing, and I am running into an odd problem when trying to access and re-use a previously created ParallelContext. When I first use a ParallelContext in my hoc code to run a parameter search (i.e. simulate many iterations of my models) with bulletin-board-style parallelization (the master/worker paradigm) on the Neuroscience Gateway (NSG), all of the output comes out fine and the parameter search takes much less time. However, given the large number of files these parameter searches generate, I've found it would be best to analyze all of the model traces on NSG itself (using the eFEL module in Python), so that I can delete the model files once the analysis is done and only download summary vectors of select measurements, instead of downloading every trace generated in the parameter search. Also, because analysis of the dataset takes quite a long time in serial, I have decided to run it in parallel as well, using the same bulletin-board style of parallelization.
The problem is that, while the simulations run in parallel without issue, the analysis of the traces gets stuck and the jobs I submit to NSG fail to finish. Specifically, the behaviour depends on the number of available processors. In serial (1 processor), the analysis completes without issue. With 2 processors, the 1st processor is assigned the 1st trace and gets stuck, while the 2nd processor analyzes the rest of the traces without issue (essentially in serial, since only one processor remains once the 1st is stuck). With 3 processors, the first two processors are assigned the first two traces and both get stuck, while the 3rd processor analyzes the rest of the traces without issue (again, essentially in serial). As mentioned, these jobs fail to finish, since I never receive any output for traces assigned to any of the N-1 stuck processors. The tricky part is that I do not receive any errors when this happens (aside from having exceeded the runtime limit). I've recreated this scenario with much simplified code:
init.py:
from neuron import h
h.load_file("SynParamSearch.hoc")
execfile("CutSpikes_HighConductanceMeasurements.py")
h.quit()
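(One portability note on init.py: execfile only exists in Python 2. If the script ever needs to run under Python 3, a minimal drop-in replacement, sketched here independently of NEURON, is:

```python
# Python 3 removed execfile(); this helper reproduces the Python 2
# behaviour of executing a script, by default in the caller's globals.
def execfile(path, globals_dict=None):
    if globals_dict is None:
        globals_dict = globals()
    with open(path) as f:
        code = compile(f.read(), path, 'exec')
        exec(code, globals_dict)
```

This keeps the `execfile("CutSpikes_HighConductanceMeasurements.py")` line in init.py working unchanged.)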
SynParamSearch.hoc:
// This script is used to search the synaptic parameter space of the IS3 model by varying the number of excitatory and inhibitory synapses as well as their presynaptic spike rates
load_file("nrngui.hoc")
proc f() {
    count = $1
    print count
}
// Set up parallel bulletin-board context
objectvar pc
pc = new ParallelContext()
{pc.runworker()}
count = 25
// Run the jobs in serial or in parallel
if (pc.nhost() == 1) {
    for l = 0, count-1 f(l)
} else {
    for l = 0, count-1 pc.submit("f", l)
    while (pc.working()) { // gather results
    }
}
{pc.done()}
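(An aside on the expected bulletin-board behaviour: jobs are submitted in order, but hosts pick them up and return them in whatever order they finish, which is why the printed indices from the hoc search come back scrambled. A pure-Python sketch of the same submit-then-gather pattern, using concurrent.futures as an analogy rather than NEURON, shows the same property:

```python
# Pure-Python analogy (not NEURON) of the bulletin-board pattern:
# submit jobs in order, gather results in completion order.
from concurrent.futures import ThreadPoolExecutor, as_completed

def f(count):
    return count  # stands in for the hoc proc f()

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(f, l) for l in range(25)]          # like pc.submit("f", l)
    gathered = [fut.result() for fut in as_completed(futures)]  # like the pc.working() loop

# every job completes exactly once, but in no guaranteed order
assert sorted(gathered) == list(range(25))
```

So the out-of-order prints below are normal; the problem is jobs that never return at all.)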
CutSpikes_HighConductanceMeasurements.py:
def getMeasures(TrIn):
    trace_index = int(TrIn)
    print('Trace Index = ' + str(trace_index))
    outputresults = [trace_index, trace_index*2, trace_index*3,
                     trace_index*4, trace_index*5, trace_index*6]
    return outputresults

from neuron import h
import numpy

pc = h.pc  # re-use the ParallelContext created in SynParamSearch.hoc
pc.runworker()

Vec_count = 25
StdVolt = numpy.zeros((Vec_count,), dtype=numpy.float64)
MeanVolt = numpy.zeros((Vec_count,), dtype=numpy.float64)
MeanAPamp = numpy.zeros((Vec_count,), dtype=numpy.float64)
ISICV = numpy.zeros((Vec_count,), dtype=numpy.float64)
NumSpikes = numpy.zeros((Vec_count,), dtype=int)

# Run the analysis in serial or in parallel
print('Number of Hosts = ' + str(pc.nhost()))
if pc.nhost() == 1:
    for l in range(0, Vec_count):
        results = getMeasures(l)
        StdVolt[results[0]] = results[1]
        MeanVolt[results[0]] = results[2]
        NumSpikes[results[0]] = results[3]
        MeanAPamp[results[0]] = results[4]
        ISICV[results[0]] = results[5]
else:
    print('Step 1: submit jobs')
    for l in range(0, Vec_count):
        pc.submit(getMeasures, l)
        print('Trace Index Submit = ' + str(l))
    print('Step 2: working')
    while pc.working():
        print('Step 2A')
        print('User ID = ' + str(pc.userid()))
        results = pc.pyret()
        print('Results = ' + str(results))
        print('Step 2B: Store Results')
        StdVolt[results[0]] = results[1]
        MeanVolt[results[0]] = results[2]
        NumSpikes[results[0]] = results[3]
        MeanAPamp[results[0]] = results[4]
        ISICV[results[0]] = results[5]
        print('Step 2C: Results Storage Complete')
    print('Step 3: Done')
pc.done()
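(To check that the gather-and-store bookkeeping in this script is sound independently of NEURON and NSG, the same pattern can be exercised with plain Python worker threads; this is an analogy only, with multiprocessing.pool.ThreadPool standing in for the bulletin board:

```python
# Same toy measurements and the same results[0]-as-index storage scheme
# as the script above, driven by a thread pool instead of ParallelContext.
from multiprocessing.pool import ThreadPool
import numpy

def getMeasures(TrIn):
    trace_index = int(TrIn)
    return [trace_index, trace_index*2, trace_index*3,
            trace_index*4, trace_index*5, trace_index*6]

Vec_count = 25
StdVolt = numpy.zeros(Vec_count)
MeanVolt = numpy.zeros(Vec_count)
MeanAPamp = numpy.zeros(Vec_count)
ISICV = numpy.zeros(Vec_count)
NumSpikes = numpy.zeros(Vec_count, dtype=int)

pool = ThreadPool(4)  # 4 workers, like 4 bulletin-board hosts
# imap_unordered plays the role of the pc.working() gather loop:
# results arrive in completion order, and results[0] carries the index
for results in pool.imap_unordered(getMeasures, range(Vec_count)):
    StdVolt[results[0]] = results[1]
    MeanVolt[results[0]] = results[2]
    NumSpikes[results[0]] = results[3]
    MeanAPamp[results[0]] = results[4]
    ISICV[results[0]] = results[5]
pool.close()
pool.join()
```

This version fills all five arrays regardless of worker count, so the indexing scheme itself does not appear to be the culprit.)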
Output:
22
0
4
6
10
14
18
23
2
7
11
15
19
24
1
5
8
12
16
20
3
9
13
17
21
Number of Hosts = 25.0
Step 1: submit jobs
Trace Index Submit = 0
Trace Index Submit = 1
Trace Index Submit = 2
Trace Index Submit = 3
Trace Index Submit = 4
Trace Index Submit = 5
Trace Index Submit = 6
Trace Index Submit = 7
Trace Index Submit = 8
Trace Index Submit = 9
Trace Index Submit = 10
Trace Index Submit = 11
Trace Index Submit = 12
Trace Index Submit = 13
Trace Index Submit = 14
Trace Index Submit = 15
Trace Index Submit = 16
Trace Index Submit = 17
Trace Index Submit = 18
Trace Index Submit = 19
Trace Index Submit = 20
Trace Index Submit = 21
Trace Index Submit = 22
Trace Index Submit = 23
Trace Index Submit = 24
Step 2: working
Trace Index = 24
Step 2A
User ID = 50.0
Results = [24, 48, 72, 96, 120, 144]
Step 2B: Store Results
Step 2C: Results Storage Complete
Thanks for your time,
Alex GM