Parallelizing single-cell models

General issues of interest both for network and
individual cell parallelization.

Moderator: hines

Eleftheria Pissadaki

Parallelizing single-cell models

Post by Eleftheria Pissadaki »

Dear all,

Can I run in parallel a single-cell model simulation? In "Translating network models to parallel hardware in NEURON" [1] I see the simulation is parallelized by manually distributing cells to different processes. Is there any option for the model I describe?

Thanks,
Eleftheria

[1] http://www.neuron.yale.edu/ftp/ted/neur ... _press.pdf
ted
Site Admin
Posts: 6299
Joined: Wed May 18, 2005 4:50 pm
Location: Yale University School of Medicine
Contact:

Re: Parallelizing single-cell models

Post by ted »

The easiest way to execute parallel simulations of single cell models is to take advantage of multithreaded execution on multiprocessor PCs or Macs. This will benefit simulations if the model involves more than about 3000 differential equations that require numerical integration. So given a model cell with M compartments and an average of N states per compartment, you'll probably get some speedup if the product M*N is greater than 3000 or so. Note that v is a state variable, and voltage- or ligand-gated variables such as the HH m, n, and h are other examples of state variables. To take advantage of multithreaded execution, all inserted mechanisms must be "thread safe." This is discussed elsewhere in the Forum.
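The rule of thumb above can be written as a few lines of arithmetic. This is just a sketch: the 3000 figure is a rough estimate from the text, and `likely_to_benefit` is a made-up helper name, not a NEURON function.

```python
# Ted's heuristic as code. The 3000-state threshold is his rough estimate,
# and likely_to_benefit is a hypothetical helper, not part of NEURON.
def likely_to_benefit(m_compartments, n_states_per_compartment, threshold=3000):
    """True if multithreaded execution is likely to yield some speedup."""
    return m_compartments * n_states_per_compartment > threshold

print(likely_to_benefit(1000, 4))  # 4000 states: True
print(likely_to_benefit(200, 4))   # 800 states: False
```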

An alternative to multithreaded simulation is to break up a cell into multiple pieces that are distributed over multiple processors (the "multisplit" approach, described in
Hines, M.L., Markram, H. and Schuermann, F.
Fully implicit parallel simulation of single neurons.
Journal of Computational Neuroscience 25:439-448, 2008
(paper available from a link at the "publications about NEURON" page http://www.neuron.yale.edu/neuron/nrnpubs). This requires significant program revision, and is usually done principally for load balance in network models that involve cells with a wide range of sizes.

Finally, I should mention a different style of parallel execution that is useful when one has to run a large number of simulations, each of which takes at least a few seconds to complete. This need arises commonly in optimization and parameter space exploration problems. The strategy is that a "master" processor posts a bunch of "jobs" to a "bulletin board" (each job consisting of a simulation that is to be performed with a given set of parameter values), so that each processor picks a job from the board and works on it until it is complete, at which point it puts its result back on the bulletin board (where the "master" processor can get it) and then picks another job to work on. This continues until all jobs have been completed. This approach is much easier to set up than "multisplit" parallelization (especially for MSWin users, who don't even have to install MPI since MPI is included in the standard distribution of NEURON for MSWin), and it works well for suitable tasks. It is described in the documentation of the ParallelContext class http://www.neuron.yale.edu/neuron/stati ... arcon.html
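For readers who want to see the shape of the bulletin-board pattern, here is a plain-Python analogy. NEURON's real mechanism is the ParallelContext bulletin board; in this sketch `multiprocessing.Pool` plays the role of the board plus workers, and `run_sim` with its `(gbar, tau)` parameters is a hypothetical stand-in for "one complete simulation with a given set of parameter values".

```python
# A plain-Python analogy of the bulletin-board style: a pool of workers
# repeatedly picks the next pending job until all jobs are done. NEURON's
# real implementation uses the ParallelContext bulletin board; run_sim and
# its (gbar, tau) parameters are hypothetical stand-ins for one complete
# simulation with a given set of parameter values.
from multiprocessing import Pool

def run_sim(params):
    gbar, tau = params
    return gbar / tau  # pretend this is the figure of merit from one run

if __name__ == "__main__":
    jobs = [(0.12, 2.0), (0.12, 4.0), (0.24, 2.0), (0.24, 4.0)]
    with Pool() as pool:               # workers pull jobs as they finish
        results = pool.map(run_sim, jobs)
    print(results)
```

This style pays off when each job takes at least a few seconds, so the cost of posting and retrieving jobs is negligible relative to the work done in the job itself.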
ounos

Re: Parallelizing single-cell models

Post by ounos »

Hi Ted,

I'm a complete novice with NEURON, but I'm quite knowledgeable about parallel computing, so bear with me.

(Please correct me if I'm wrong in any of the following.) My understanding is that in NEURON one constructs a model, which is then simulated by stepping time forward, advancing the whole model by one delta at each step. For example, NEURON calculates the charge of a capacitor given the state of the model at time 't' and a time difference 'dt'. This naturally creates a whole bunch of computational tasks that can run independently and would synchronize only after each dt. The execution of the simulation within any single dt is embarrassingly parallel. So I would expect NEURON to be able to take advantage of available extra cores automatically, without further intervention from the user (creating the model should be enough, and of course plugged-in mechanisms would also have to be thread-safe) - though not via MPI, but with shared memory. (I'm not fluent in C/C++, but surely this kind of thing is possible.)
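The fine-grained scheme sketched above can be illustrated in a few lines of Python. This is purely an illustration of the task structure, not how NEURON is implemented; all names are invented, and in CPython the GIL means the threads gain no real speed.

```python
# Toy version of the per-dt task structure described above: within a step,
# every compartment's update reads only the previous step's state, so the
# tasks are independent and the threads synchronize once per dt. Purely an
# illustration (invented names, not NEURON internals); in CPython the GIL
# means this gains no actual speedup.
from concurrent.futures import ThreadPoolExecutor

def step_compartment(i, v_old, dt):
    # Leaky relaxation with nearest-neighbor coupling between compartments.
    left = v_old[i - 1] if i > 0 else v_old[i]
    right = v_old[i + 1] if i < len(v_old) - 1 else v_old[i]
    coupling = 0.1 * (left + right - 2.0 * v_old[i])
    return v_old[i] + dt * (-0.5 * v_old[i] + coupling)

def run(v0, dt, nsteps, workers=4):
    v = list(v0)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        for _ in range(nsteps):
            # All tasks read the old state and produce the new one, so there
            # are no races; the only synchronization point is the step barrier.
            v = list(ex.map(lambda i: step_compartment(i, v, dt), range(len(v))))
    return v

v0 = [-65.0, -60.0, -70.0, -65.0]
# The per-step barrier keeps results deterministic for any worker count.
assert run(v0, 0.025, 100, workers=1) == run(v0, 0.025, 100, workers=4)
```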

So, can you explain what technical limitations require the user to revise his/her model construction code in order to take advantage of parallelism? Is it something to be fixed? Is it something that is too costly to fix given NEURON's current approach? Or is there a fundamental reason that makes this sort of automatic parallel simulation impossible? Or perhaps I have completely the wrong idea and this isn't how NEURON is supposed to run a simulation?

Thanks, and regards,
Jim
ted

Re: Parallelizing single-cell models

Post by ted »

Good questions, Jim. Here are some very brief answers.

First, with regard to multithreaded simulation execution--
Multithreaded parallelism is the easiest to implement, and often requires no change at all to source code. However, the NMODL programming language (one of the tools that is used to add new mechanisms to NEURON) gives users a great deal of power over program structure and execution. In particular, it allows users to save space by storing intermediate results in global variables. Since models generally have multiple instances of any given mechanism, writing to a global variable can produce race conditions during thread parallel execution. It's easy to detect the presence of global variables in NMODL source code, and there are simple ways to revise such code so that it becomes thread safe, but a human being must decide what to do.
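A Python analogy of the hazard described above (the function names are invented; in NMODL the usual fix is to make such temporaries per-instance rather than GLOBAL):

```python
# Python analogy of the GLOBAL-variable hazard: a shared scratch variable
# works in serial code but invites a race under threads; giving each call
# its own local (the NMODL analogue: a per-instance variable instead of a
# GLOBAL) removes the shared state. Names here are invented for illustration.
import threading

scratch = 0.0  # shared "GLOBAL" temporary -- fine serially, unsafe threaded

def rate_unsafe(v):
    global scratch
    scratch = v * 2.0        # another thread may overwrite scratch here...
    return scratch + 1.0     # ...before this read, corrupting the result

def rate_safe(v):
    tmp = v * 2.0            # per-call local: nothing shared, no race
    return tmp + 1.0

# The thread-safe version stays deterministic under concurrent callers.
results = []
threads = [threading.Thread(target=lambda x=i: results.append(rate_safe(float(x))))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(results) == [2.0 * i + 1.0 for i in range(8)]
```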

Distributed simulation of network models requires careful attention to program structure, because there is no guarantee that a presynaptic cell and its postsynaptic target will both exist on the same processor. Many additional subtleties are introduced by models in which parameters such as synaptic weights are assigned pseudorandom values. With careful planning, it is possible to write code in such a way that it produces identical numerical results regardless of whether it is run on serial or parallel hardware, and regardless of the number of processors that are available. However, this necessitates consideration of issues that don't arise in the context of creating a serial program. If you want a quick introduction to some of the problems involved in parallelizing network models, and their solutions, see
Hines, M.L. and Carnevale, N.T.
Translating network models to parallel hardware in NEURON.
J. Neurosci. Methods 169:425-455, 2008
(pdf available from http://www.neuron.yale.edu/neuron/nrnpubs).

Distributed simulation of networks that involve multisplit models of cells raises additional concerns, at least some of which are discussed in the reference by Hines et al. cited in my earlier post.
ounos

Re: Parallelizing single-cell models

Post by ounos »

Oh, nuts. Somehow, in replying to the most recent post by ounos, I clicked on the wrong button at some point and ended up editing away his post, leaving just a few quotes and my replies. My apologies, Jim! I had too many browser tabs open, and was more careful in composing my thoughts than in executing my actions. So much for my own abilities as a serial and/or parallel processor.

Anyway, what follows are quotes from ounos's post and my replies to them.

--Ted

ounos wrote: any NMODL mechanism is assumed not to be thread-safe
Users are merely advised to test all mod files with a utility called mkthreadsafe--see "How to make NMODL code thread safe" in http://www.neuron.yale.edu/phpBB/viewto ... =22&t=1476, which also provides specific examples of problem cases and how to fix them.
ted wrote: Distributed simulation of network models requires careful attention to program structure, because there is no guarantee that a presynaptic cell and its postsynaptic target will both exist on the same processor
I see, so in such cases there is still shared state (i.e. the state of the postsynaptic target).
Not unless the term "shared state" means something other than I think it does. In NEURON, synaptic mechanisms are attached to the postsynaptic cell, and spike triggered synaptic transmission is implemented with an event delivery system that detects a threshold crossing in the presynaptic cell and then communicates an "event", after some delay, that perturbs a state variable in a synaptic mechanism. The problem is how to ensure that events that occur in a presynaptic cell on one processor are conveyed to all target synapses that are attached to postsynaptic cells on other processors. To ensure that this happens regardless of the number of processors and how cells are distributed over them, model setup code must assign a unique integer ("global identifier" or gid) to each spike source, "tell" each synaptic mechanism the gids of all spike sources that drive it, and associate each cell with the processor that handles it. So in a sense there is "shared" information--the gids--but the "state of the postsynaptic target" is not shared. For more info about gids etc. see Hines & Carnevale 2008 (cited earlier in this discussion thread), and/or one of the following:
Migliore, M., Cannia, C., Lytton, W.W., Markram, H. and Hines, M.L.
Parallel network simulations with NEURON.
Journal of Computational Neuroscience 21:119-129, 2006.
Discussion of NEURON in Brette et al.
Simulation of networks of spiking neurons: a review of tools and strategies.
J. Comput. Neurosci. 23:349-398, 2007.
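The gid bookkeeping described above can be sketched in miniature (assumed names throughout; this is not NEURON's API, just the idea of gid-keyed, delay-stamped event delivery):

```python
# Miniature of the gid scheme: each spike source gets a unique integer gid,
# each synapse registers the gids that drive it, and spikes become delayed
# events keyed by gid. Invented names; this is the idea, not NEURON's API.
import heapq

class Net:
    def __init__(self):
        self.targets = {}    # gid -> [(synapse, delay, weight), ...]
        self.queue = []      # pending events: (delivery_time, synapse, weight)

    def connect(self, gid, synapse, delay, weight):
        self.targets.setdefault(gid, []).append((synapse, delay, weight))

    def spike(self, gid, t):
        # A threshold crossing in the source with this gid at time t
        # schedules one delayed event per registered target synapse.
        for syn, delay, w in self.targets.get(gid, []):
            heapq.heappush(self.queue, (t + delay, syn, w))

    def deliver_until(self, t):
        delivered = []
        while self.queue and self.queue[0][0] <= t:
            delivered.append(heapq.heappop(self.queue))
        return delivered

net = Net()
net.connect(gid=7, synapse="syn_A", delay=1.5, weight=0.02)
net.connect(gid=7, synapse="syn_B", delay=3.0, weight=0.01)
net.spike(gid=7, t=10.0)
assert net.deliver_until(12.0) == [(11.5, "syn_A", 0.02)]  # syn_B arrives at 13.0
```

Because delivery is keyed only by gid and time, the same bookkeeping works whether the source and target live on the same processor or on different ones.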
I wouldn't consider marginally different results, caused by essentially picking different initial seeds, as a problem.
Parallelizing serial code often requires numerous changes that may affect model setup, initialization, simulation execution, and reporting of results. Errors may be introduced at any step that destroy the fidelity of the parallel implementation to the original serial implementation. If a parallel implementation cannot produce results that are identical to those generated by a serial implementation, then one cannot claim that results generated with the parallel implementation are a reliable indication of the behavior of the serial implementation.
it didn't run long enough
Randomness has been used to modulate many model attributes--not just initial conditions, but numbers of cells, which cells are connected, where synaptic mechanisms are located, synaptic weights, channel densities, shifts of voltage dependencies, network architecture, cellular branching patterns--almost any imaginable parameter. None of these perturbations dissipates with increasing run time. Even initial condition perturbations may not dissipate with time, e.g. in a system with multiple "basins of attraction" (like a pendulum clock).
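The "basins of attraction" point can be seen in a toy example: an overdamped particle descending the double-well potential U(x) = (x^2 - 1)^2 settles at x = -1 or x = +1 depending only on which side of the ridge at x = 0 it starts, so an arbitrarily small initial perturbation can change the final state permanently rather than dissipating.

```python
# A toy bistable system: overdamped descent on the double-well potential
# U(x) = (x**2 - 1)**2 settles at x = -1 or x = +1 depending only on which
# side of x = 0 it starts, so a tiny initial perturbation across the ridge
# changes the final state permanently instead of dissipating.
def settle(x, dt=0.01, nsteps=5000):
    for _ in range(nsteps):
        x -= dt * 4.0 * x * (x * x - 1.0)  # x' = -dU/dx
    return x

assert abs(settle(0.01) - 1.0) < 1e-6   # barely right of the ridge -> +1
assert abs(settle(-0.01) + 1.0) < 1e-6  # barely left of it -> -1
```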
ounos

Re: Parallelizing single-cell models

Post by ounos »

Hi Ted,
ted wrote:
ted wrote: Distributed simulation of network models requires careful attention to program structure, because there is no guarantee that a presynaptic cell and its postsynaptic target will both exist on the same processor
I see, so in such cases there is still shared state (i.e. the state of the postsynaptic target).
Not unless the term "shared state" means something other than I think it does. In NEURON, synaptic mechanisms are attached to the postsynaptic cell, and spike triggered synaptic transmission is implemented with an event delivery system that detects a threshold crossing in the presynaptic cell and then communicates an "event", after some delay, that perturbs a state variable in a synaptic mechanism. The problem is how to ensure that events that occur in a presynaptic cell on one processor are conveyed to all target synapses that are attached to postsynaptic cells on other processors. To ensure that this happens regardless of the number of processors and how cells are distributed over them, model setup code must assign a unique integer ("global identifier" or gid) to each spike source, "tell" each synaptic mechanism the gids of all spike sources that drive it, and associate each cell with the processor that handles it. So in a sense there is "shared" information--the gids--but the "state of the postsynaptic target" is not shared. For more info about gids etc. see Hines & Carnevale 2008 (cited earlier in this discussion thread), and/or one of the following:
Migliore, M., Cannia, C., Lytton, W.W., Markram, H. and Hines, M.L.
Parallel network simulations with NEURON.
Journal of Computational Neuroscience 21:119-129, 2006.
Discussion of NEURON in Brette et al.
Simulation of networks of spiking neurons: a review of tools and strategies.
J. Comput. Neurosci. 23:349-398, 2007.
Sorry, apparently there is a miscommunication here, and I think I contributed to it. I was trying to interpret the sentence "Distributed simulation of network models requires careful attention to program structure, because there is no guarantee that a presynaptic cell and its postsynaptic target will both exist on the same processor" under the implied assumption that we are talking about shared memory, which I thought I had made clear in my first post - but from your reply it is clear that you missed that. You are talking about the particular parallelization strategy implemented in NEURON, which partitions cells and assigns them to processors, whereas I was talking about much finer-grained parallelism, at the level of individual computations such as "calculating the charge of a particular capacitor at a particular dt". This approach doesn't assign cells to a particular processor, only particular tasks. (For a better view of what I had in mind, you may want to look at Cilk.)
ted wrote:
I wouldn't consider marginally different results, caused by essentially picking different initial seeds, as a problem.
Parallelizing serial code often requires numerous changes that may affect model setup, initialization, simulation execution, and reporting of results. Errors may be introduced at any step that destroy the fidelity of the parallel implementation to the original serial implementation. If a parallel implementation cannot produce results that are identical to those generated by a serial implementation, then one cannot claim that results generated with the parallel implementation are a reliable indication of the behavior of the serial implementation.
I already described a way in which a NEURON-like simulation could be run in parallel without requiring modifications to user code, assuming a shared-memory model, modulo any unfortunate technical limitations (so far I have seen nothing fundamental; my original and current feeling is that this kind of thing is entirely feasible, though perhaps too costly given NEURON's current design).
ted wrote:
it didn't run long enough
Randomness has been used to modulate many model attributes--not just initial conditions, but numbers of cells, which cells are connected, where synaptic mechanisms are located, synaptic weights, channel densities, shifts of voltage dependencies, network architecture, cellular branching patterns--almost any imaginable parameter. None of these perturbations dissipates with increasing run time. Even initial condition perturbations may not dissipate with time, e.g. in a system with multiple "basins of attraction" (like a pendulum clock).
Allow me to repeat more concisely: if the initial random seed plays a significant role in a simulation's result, then the simulation is meaningless. If a simulation is sensitive to this number, then (assuming just a single random generator with an int seed) there are 2^32 possibly different simulation experiments, and reporting the result of just one says nothing about the remaining 2^32 - 1 cases. Result invariance under changes of the initial seed is a fundamental assumption/requirement even in the serial case. That said, I see no trouble with constructing the model serially, in the good old-fashioned way - I was talking about parallelizing the simulation of the model, not the model's creation.

Anyway, I just wanted to "test the waters", so to speak. All in all, I still see such simulations as easily parallelizable (the main reason being that in each step/dt, the computations at each part of the model are known to be truly independent and can trivially run in parallel; furthermore, they are likely to outnumber the available processors, which keeps the processors busy), and thus it is a pity that users have to jump through hoops to exploit their extra cores - or through even more hoops when their models don't map particularly well to NEURON's strategy (such as simulating a single cell). At least I hope I gave some food for thought, perhaps useful for the future.

Again, thanks for taking time to inform me on these matters.

Regards,
Jim
ted

Re: Parallelizing single-cell models

Post by ted »

ounos wrote:I was trying to interpret this sentence: "Distributed simulation of network models
The imprecision of human language strikes again. Yep, you were thinking multithreaded/shared memory parallelism, I was thinking "parallel network simulation" as being synonymous with MPI-style parallelism in which each processor has its own separate memory space (which for lack of a better term I will call "simulation of distributed models" because the programmer explicitly specifies how the cells are distributed over the processors).

NEURON can do both (and "embarrassingly parallel" simulations as well).

As I described in a previous post, multithreaded execution requires the least effort on the part of the user--often no real effort at all (a couple of mouse clicks in a GUI tool is enough). However, multithreaded execution is subject to practical limitations. For one thing, its overhead is greater than that of distributed-model simulation, for a number of reasons. Models must have a sufficient level of complexity (total number of states + voltages must be ~5000 or more) to benefit significantly from multithreaded execution.

Although more programmer effort is required to set up code for distributed models, much of this effort involves the use of "idioms" and reusable software patterns and strategies that are not difficult to learn. MPI itself is very efficient, so distributed-model run time is inversely proportional to the number of processors as long as each processor has "enough work to keep it busy" (at a minimum, ~200 ODEs to integrate). This has been verified on PCs and Macs with a few processors, on workstation clusters with dozens of processors, and on massively parallel machines with up to thousands of processors.
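As a back-of-the-envelope reading of the ~200-ODEs-per-processor figure (the helper below is hypothetical, and 200 is just the minimum quoted above):

```python
# Back-of-the-envelope reading of the "~200 ODEs per processor" floor.
# useful_ranks is a hypothetical helper, and 200 is just the minimum Ted
# quotes; beyond this processor count, communication overhead dominates.
def useful_ranks(total_odes, min_odes_per_rank=200):
    return max(1, total_odes // min_odes_per_rank)

print(useful_ranks(10000))  # a 10000-ODE model keeps about 50 ranks busy
```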
Eleftheria Pissadaki

Re: Parallelizing single-cell models

Post by Eleftheria Pissadaki »

Dear Ounos and Ted,

I am coming back to the issue of parallelizing complex single neuron models.
I would like to thank you for this wonderful and interesting conversation. The answers, however, have left me with a vague feeling about the parallelization of my very complex and huge neuron. Should I split the code or not? If so, how can I test the validity of my results?

I am sorry for dragging you back to this old post.

Wishes for a Happy New Year!
Eleftheria
ted

Re: Parallelizing single-cell models

Post by ted »

The principal question is whether it is worth the effort. Since thread parallelism is usually achieved with no effort, or almost no effort if a few mod files must be changed to be thread safe, I would suggest trying that first. If you get an M-fold speedup, and M is "sufficiently close to N, the number of processors," there's no need to bother going to the effort needed to create a multisplit implementation. However, if you have a large model cell but find that multithreaded execution results in a speedup that is unacceptably smaller than N (less than half as big?), maybe multisplit is worth the effort if you really have a lot of long runs to execute. But keep in mind that, given the current limitations of hardware, even the largest cells will not benefit from being split into more than about 16 pieces.
Eleftheria Pissadaki

Re: Parallelizing single-cell models

Post by Eleftheria Pissadaki »

Dear Ted,

Can you give a pointer regarding how I go about trying that option? I do have some mod files which I don't think are thread safe.

Many thanks for all the help,
Eleftheria
ted

Re: Parallelizing single-cell models

Post by ted »

Eleftheria Pissadaki wrote:I do have some mod files which I don't think are thread safe.
You can check them with mkthreadsafe, as described here
Using mkthreadsafe under MSWin
http://www.neuron.yale.edu/phpBB/viewto ... =28&t=1865