Speeding up simulations with threads

Post by ted

Thread parallel simulation ("multithreaded execution") is a quick and easy way to speed up simulations of models of single cells or networks on your multicore PC or Mac. The runtime improvement depends on model complexity: the bigger the model, the more closely the speedup will approach the number of independent processing units. The chief benefit will be for fairly large models (number of states + voltages ~ 5000 or more). Users with single core workstations may also benefit from improved cache efficiency (see discussion of the Parallel Computing GUI tool below).

Multithreaded simulation on individual workstations has two advantages over parallel simulations on workstation clusters or supercomputer hardware: it does not require any modification of hoc code, and it preserves use of the GUI. However, some mechanisms written in NMODL may need to be revised, as described below. Furthermore, multithreaded execution works only with fixed step and global variable time step integration, and it cannot be used with models that involve extracellular or LinearMechanism.

Parallel Computing tool

To simplify the use of threads, the GUI now has a Parallel Computing tool that can be brought up by clicking on NEURON Main Menu / Tools / Parallel Computing. This tool provides useful information about your computer and the model you are simulating, and offers a convenient interface for specifying how multithreaded simulations will be executed.

"Number of useful processors"

This is the number of available processors. Clicking on the "Refresh" button returns an estimate of the number of processors. The estimate is correct about 70% of the time, depending on whatever other programs may also be running, so it is a good idea to repeat this test a few times and take the most frequent result.

"Total model complexity"

This is the sum of the number of states and membrane potentials in the model.

"Number of pieces"

This is normally the number of cells in the model. To improve load balance, model cells that have many sections can be split into two or more pieces that are simulated by different threads (multisplit simulation; see Hines et al. 2008).

"Load imbalance"

Ideally, the total computational load during a simulation would be distributed evenly across all processors, so that all would finish at the same time. Almost always, however, the load on some processors will be smaller and they will finish early, while others will have a greater load and finish later. Load imbalance is undesirable because total run time is governed by the last processor to finish.

The Parallel Computing tool reports load imbalance as a percentage, which is calculated by
  ((cmax/cavg) - 1)*100
where
  cmax is the maximum complexity per processor
  cavg is the average complexity per processor (i.e. total complexity/# processors)
For example, if there are two processors and one does all the work, then cmax equals the total complexity and cavg is half of it, so cmax/cavg = 2 and the load imbalance is (2-1)*100 = 100%.
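The formula is easy to check by hand. Here is a minimal Python sketch of it; the function name and the complexity values are made up for illustration and are not part of NEURON.

```python
def load_imbalance_percent(complexity_per_processor):
    """Return ((cmax/cavg) - 1)*100 for a list of per-processor complexities."""
    cmax = max(complexity_per_processor)
    cavg = sum(complexity_per_processor) / len(complexity_per_processor)
    return (cmax / cavg - 1.0) * 100.0

# Two processors, one does all the work (the example from the text).
print(load_imbalance_percent([5000, 0]))     # 100.0
# Perfectly balanced load.
print(load_imbalance_percent([2500, 2500]))  # 0.0
```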

If load imbalance is > 10%, try activating "Multisplit" (see below).

"# threads"

Specifies the number of threads to use. The optimum number of threads for any particular model depends on many factors, including the complexity of the model, the number of processors, and the amount of cache memory available to each processor. A suitable choice will achieve good load balance while at the same time making efficient use of cache. Typically, the number of threads is chosen to be the number of processors.

"Thread Parallel"

This checkbox can be used to toggle between plain old serial execution and thread parallel execution.

"Cache Efficient"

When activated, reorganizes memory so that it is laid out in the order in which it is needed by the processors. This can speed up simulations even on single processor workstations.

"Use busy waiting"

May improve performance if number of threads is less than number of cores--see Programmer's Reference entry https://www.neuron.yale.edu/neuron/stat ... d_busywait


"Multisplit"

Good load balance may be difficult to achieve in network models that involve neurons with widely different complexities. For such models, activating "Multisplit" can improve load balance by splitting cells into two or more pieces that are distributed over multiple processors. This is worth trying when imbalance is 10% or more.
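To see why splitting a big cell can help, here is a toy Python sketch. It is not NEURON's actual distribution or multisplit algorithm: it uses a simple greedy assignment (largest piece first, always to the least loaded thread), ignores any overhead that splitting adds, and all the complexity numbers are hypothetical.

```python
def assign_pieces(piece_complexities, nthreads):
    """Greedy assignment: largest piece first, to the least loaded thread."""
    loads = [0] * nthreads
    for c in sorted(piece_complexities, reverse=True):
        i = loads.index(min(loads))  # index of the least loaded thread
        loads[i] += c
    return loads

def imbalance_percent(loads):
    """((cmax/cavg) - 1)*100, as reported by the Parallel Computing tool."""
    cavg = sum(loads) / len(loads)
    return (max(loads) / cavg - 1.0) * 100.0

# Hypothetical network: one complex cell plus four simple ones, on 2 threads.
whole_cells = [6000, 1000, 1000, 1000, 1000]
# Same network with the complex cell split into two pieces of 3000 each.
split_cells = [3000, 3000, 1000, 1000, 1000, 1000]

print(assign_pieces(whole_cells, 2))  # [6000, 4000]
print(assign_pieces(split_cells, 2))  # [5000, 5000]
```

With the whole cell, one thread is stuck with 6000 no matter how the small cells are arranged, which gives about 20% imbalance. After the split, both threads end up with 5000 and the imbalance is 0.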


Hines, M.L., Markram, H. and Schuermann, F.
Fully implicit parallel simulation of single neurons.
Journal of Computational Neuroscience 25:439-448, 2008.

How to make NMODL code thread safe

User-written NMODL code may need some changes to make it safe for multithreaded execution.

The most common issues have to do with assigning values to GLOBAL variables. GLOBALs are often used in ion channel mechanisms to store "volatile" values that are computed in a "rates" procedure, e.g. rate constants, or gating variable time constants and steady state values. Such mechanisms can be "fixed" by inserting the keyword
  THREADSAFE
into the NEURON block. GLOBALs will then be promoted to "thread instance" variables, i.e. each thread will have its own instance of each GLOBAL variable.

Example: the NEURON block in hh.mod declares
  GLOBAL minf, hinf, ninf, mtau, htau, ntau
each of which is computed anew at every time step for each segment of every section that has the hh mechanism. To make hh.mod thread safe it was only necessary to add the line
  THREADSAFE
to its NEURON block.

Assignment to GLOBAL variables is also often encountered in an INITIAL block. For instance, multicompartmental models of ion accumulation may involve factors that take diffusion path lengths and the surface/volume ratios of intracellular compartments into account. It makes sense to calculate such factors once during initialization, and then reuse them throughout the simulation for each segment in the model, e.g.

  if (factors_done == 0) {
    factors_done = 1
    . . . statements that compute the global factors . . .
  }
  . . . other initialization stuff . . .

(Also see "9.10.1 Modeling diffusion with kinetic schemes" in chapter 9 of The NEURON Book.) The least space-efficient way to make this thread safe would be to change all the GLOBALs to RANGE variables.

A more space-efficient alternative would be to promote the GLOBALs to thread instance variables by adding the THREADSAFE directive to the NEURON block.

The most space-efficient approach would be to use MUTEXLOCK...MUTEXUNLOCK so that only the first thread that executes the INITIAL block will execute the code that assigns values to the factors.

  MUTEXLOCK
  if (factors_done == 0) {
    factors_done = 1
    . . . statements that compute the global factors . . .
  }
  MUTEXUNLOCK
  . . .

This would be the slowest approach because each thread would have to wait its turn to execute this code. However, the performance penalty would be negligible because every thread except the first does nothing more than check the factors_done flag.
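For readers who have not used mutexes before, the same "compute once, under a lock" pattern can be sketched in Python. This is only an analogy: threading.Lock stands in for MUTEXLOCK...MUTEXUNLOCK, and the names initial, factors and compute_count are invented for this sketch.

```python
import threading

factors_lock = threading.Lock()  # stands in for MUTEXLOCK ... MUTEXUNLOCK
factors_done = False
factors = {}                     # hypothetical global factors
compute_count = 0                # how many times the expensive branch ran

def initial():
    """Stand-in for an INITIAL block that every thread executes."""
    global factors_done, compute_count
    with factors_lock:           # only one thread at a time gets past here
        if not factors_done:
            factors_done = True
            factors["frat"] = [0.25 * i for i in range(4)]  # made-up values
            compute_count += 1
    # ... other initialization stuff would go here ...

threads = [threading.Thread(target=initial) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(compute_count)  # 1
```

All eight threads take the lock in turn, but only the first one finds the flag unset and does the computation.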

Sometimes a GLOBAL is used as a "counter" or "accumulator" to keep track of the total of some quantity, summed over all instances of a mechanism. This can be done safely by surrounding the statement(s) with MUTEXLOCK...MUTEXUNLOCK. Alternatively, a single assignment statement can be prefixed with the PROTECT keyword, e.g.
PROTECT openchannels = openchannels + nopen
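The reason an accumulator needs protection is that "read the old value, add, write back" is not a single step, so two threads can overwrite each other's updates. A Python analogy of the protected update, with an explicit lock playing the role of PROTECT (the function name add_open and the thread counts and values are made up):

```python
import threading

openchannels = 0
channel_lock = threading.Lock()  # plays the role of PROTECT

def add_open(nopen, repeats):
    """Add nopen to the shared total, repeats times, one update at a time."""
    global openchannels
    for _ in range(repeats):
        with channel_lock:
            openchannels = openchannels + nopen

workers = [threading.Thread(target=add_open, args=(3, 10000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(openchannels)  # 120000 = 4 threads * 10000 updates * 3
```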

Use cnexp or derivimplicit instead of euler

The euler integration method is not thread safe (it is also numerically unstable, which should be reason enough to avoid it). Instead, use cnexp if the dstate/dt equation for each state is linear in that state and does not depend on any other state; otherwise use derivimplicit.
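The stability problem is easy to demonstrate on a single gating equation dm/dt = (minf - m)/tau. The Python sketch below compares a forward Euler step with an exponential update of the kind cnexp is understood to use for linear equations. The parameter values are hypothetical and chosen so that dt > 2*tau, and the function names are invented for this sketch.

```python
import math

def euler_step(m, minf, tau, dt):
    """Forward Euler update for dm/dt = (minf - m)/tau."""
    return m + dt * (minf - m) / tau

def cnexp_step(m, minf, tau, dt):
    """Exponential update: exact for this linear equation with constant minf, tau."""
    return m + (1.0 - math.exp(-dt / tau)) * (minf - m)

def integrate(step, m0, minf, tau, dt, nsteps):
    m = m0
    for _ in range(nsteps):
        m = step(m, minf, tau, dt)
    return m

minf, tau, dt = 1.0, 0.01, 0.025  # hypothetical values with dt > 2*tau
m_euler = integrate(euler_step, 0.0, minf, tau, dt, 20)
m_cnexp = integrate(cnexp_step, 0.0, minf, tau, dt, 20)

print(abs(m_euler - minf) > 1.0)   # True: Euler error grows at every step
print(abs(m_cnexp - minf) < 1e-6)  # True: exponential update settles at minf
```

With Euler the error is multiplied by (1 - dt/tau) = -1.5 at every step, so it oscillates and grows. The exponential update multiplies the error by exp(-dt/tau), which is always between 0 and 1.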

Updating mod files with mkthreadsafe

To check all mod files in the current directory, type
  mkthreadsafe
at the command line. This program will echo the name of each mod file, report any problems that it finds, and offer to fix the file by inserting the keyword
  THREADSAFE
into the NEURON block. When mkthreadsafe reports a problem with a file, it is best to examine the file before approving insertion of THREADSAFE.

MSWin users may find it helpful to read
Using mkthreadsafe under MSWin
http://www.neuron.yale.edu/phpBB/viewto ... =28&t=1865