Debug parallel NEURON on a Linux cluster

Post by **neuromau** » Thu Aug 13, 2009 7:30 pm

Hi, we are running a parallel network model on an 8-node Linux cluster and are wondering what approach to take with debugging parallel NEURON programs. I'm not too familiar with our options, but think they include the following. Do you have any suggestions as to which option is best?

1. Run a serial debugger in parallel
GDB: Not sure what all this involves. It looks like we would need to introduce a while loop early in the code that essentially pauses our program, then attach gdb to each of the 8 processes, and then unpause our program.

2. Run a parallel debugger
Totalview: http://www.totalviewtech.com/
DDT: http://www.allinea.com/?page=48
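For option 1, the pause-and-attach idea could look roughly like the sketch below. It assumes the model is launched from Python, and the helper name `wait_for_debugger` and the flag-file convention are hypothetical, not part of NEURON. Each process prints its host and pid, then waits until the flag file is removed.

```python
import os
import socket
import time


def wait_for_debugger(flag_path, poll_s=0.5, timeout_s=None):
    """Pause this process while flag_path exists (hypothetical helper).

    Returns True once the flag file is gone, or False if timeout_s
    seconds pass first.
    """
    # Report where this process lives so a debugger can be attached to it.
    print(f"{socket.gethostname()} pid {os.getpid()} waiting; "
          f"remove {flag_path} to continue", flush=True)
    start = time.monotonic()
    while os.path.exists(flag_path):
        if timeout_s is not None and time.monotonic() - start > timeout_s:
            return False
        time.sleep(poll_s)
    return True
```

Call it early in the program on every process, attach with `gdb -p <pid>` on each node, and then delete the flag file to let all processes continue. On a cluster the flag file would have to live on a filesystem that every node can see.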

If using a parallel debugger is your suggestion, do you recommend a particular one?

Thanks,
Marianne

Post by **ted** » Fri Aug 14, 2009 11:50 am

My preference would be none of the above, but it would be very interesting to hear other informed opinions.

The problems that afflict parallelized model implementations are the same as those that afflict serial implementations, plus a handful of issues that are peculiar to parallelization itself.

Most problems fall into the broad category of "mismatch between conceptual model and computational implementation." The principal task in computational modeling is to establish and verify a close match between what is in the modeler's head and what is in the computer. Absent such a match, the computational model cannot be relied on as a means for testing or gaining insight into the conceptual model on which it is based. Modular program design, incremental development and testing, and use of the Model View tool to check model properties at run time are the best ways to ensure a close match between concept and computation.

The issues that are peculiar to parallelization are related to the problem of ensuring that the implementation generates the same results no matter how many processors there are, or how cells are distributed over the processors. It is best to start with a serial implementation that is known to be correct. Reproducibility of results is an essential aspect of being "correct": the serial implementation should produce the same results on each run. So should the parallel implementation, and the parallel and serial implementations should produce identical results.

Problems often arise from failed attempts to introduce randomness into a network. The correct way to do this is by associating each cell with its own pseudorandom sequence generator. For examples of good programming practice see
Hines ML, Carnevale NT (2008) Translating network models to parallel hardware in NEURON. J. Neurosci. Meth. 169:425-455
http://senselab.med.yale.edu/modeldb/Sh ... odel=96444
and the NEURON implementation associated with
Brette R, Rudolph M, Carnevale T, Hines M, Beeman D, Bower JM, Diesmann M, Morrison A, et al. (2007) Simulation of networks of spiking neurons: A review of tools and strategies. J Comp Neurosci 23:349-98
http://senselab.med.yale.edu/modeldb/Sh ... odel=83319
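To illustrate why one generator per cell matters, here is a minimal sketch that uses Python's standard `random` module as a stand-in for NEURON's own random streams. The helper names, the base seed, and the round-robin distribution are assumptions made for the example. Because each cell's stream depends only on its gid, the values drawn for a cell do not change with the number of hosts.

```python
import random


def cell_stream(gid, base_seed=12345):
    """Return a generator that belongs to one cell (hypothetical helper)."""
    # The seed depends only on the cell's global id, never on the host rank.
    return random.Random(base_seed * 1_000_003 + gid)


def round_robin(gids, nhost):
    """Assumed distribution: cell gid lives on host gid % nhost."""
    return {rank: [g for g in gids if g % nhost == rank]
            for rank in range(nhost)}


def draw_per_cell(gids, nhost, ndraw=3):
    """Draw ndraw random values for every cell, host by host."""
    result = {}
    for rank, local_gids in round_robin(list(gids), nhost).items():
        for gid in local_gids:
            rng = cell_stream(gid)
            result[gid] = [rng.random() for _ in range(ndraw)]
    return result


# Same values per cell whether the net is on 1 host or 4.
same = draw_per_cell(range(10), 1) == draw_per_cell(range(10), 4)
```

A single generator shared by all cells on a host would not have this property, since the order of draws would then depend on which cells happen to be on that host.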

For a spiking net, the results are the spike raster (the record of all spikes produced by all cells during a run). The parallelized implementation should produce the same spike raster regardless of the number of hosts or which cells are on which hosts. If it does not, identify the cell in which the first error is seen. Verify that the biophysical and anatomical properties of the cell itself are correct in both the parallel and serial implementations (this includes verifying that spatial discretization is identical). Identify the afferent spike trains to that cell in the serial implementation and in the parallel implementation and compare them. Verify that afferents target the proper synapses, with correct weights, and that the synapses are attached to the correct locations. It may be easiest just to simulate that cell by itself, driving its synapses with the recorded afferent spike trains, to make sure that it produces the same voltage trajectory in the serial and parallel implementations. This can be done with PatternStim, as described in
Hines M, Eichner H, Schuermann F (2008) Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors. J Comput Neurosci 25(1):203-210
which has a preprint you can get from here
http://www.neuron.yale.edu/neuron/paper ... itcell.pdf
and an entry in ModelDB
http://senselab.med.yale.edu/ModelDB/Sh ... odel=97917
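The first step, finding where the serial and parallel rasters part ways, can be sketched in plain Python. This assumes each raster has already been gathered into a list of `(spike_time, gid)` pairs; the function name and the data layout are illustrative choices, not a NEURON API.

```python
def first_divergence(serial, parallel, tol=0.0):
    """Return the first mismatching pair of spikes, or None if identical.

    Each raster is a list of (spike_time, gid) tuples. Spikes gathered from
    several hosts arrive in arbitrary order, so both rasters are sorted by
    time (then gid) before they are compared.
    """
    a = sorted(serial)
    b = sorted(parallel)
    for spike_a, spike_b in zip(a, b):
        same_cell = spike_a[1] == spike_b[1]
        same_time = abs(spike_a[0] - spike_b[0]) <= tol
        if not (same_cell and same_time):
            return spike_a, spike_b
    # All shared entries match; one raster may still have extra spikes.
    if len(a) > len(b):
        return a[len(b)], None
    if len(b) > len(a):
        return None, b[len(a)]
    return None


serial = [(1.0, 0), (2.5, 1), (3.0, 2)]
shuffled = [(3.0, 2), (1.0, 0), (2.5, 1)]   # same spikes, different order
shifted = [(1.0, 0), (2.6, 1), (3.0, 2)]    # cell 1 fires late

print(first_divergence(serial, shuffled))   # None
print(first_divergence(serial, shifted))    # ((2.5, 1), (2.6, 1))
```

The default `tol=0.0` reflects the requirement that the two implementations give identical results. The gid in the returned pair points to the cell whose properties and afferent spike trains should be checked first.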

Post by **neuromau** » Fri Aug 14, 2009 12:58 pm

Thanks for the quick reply and these resource links. I've read the first one a few times and found it very helpful. Now I will look through the rest of these links as well. Plus, my NEURON book just arrived yesterday!
