The bottom line is that you are almost certain to see superlinear speedup
with your simulations as long as your per-machine high-speed cache is much faster
than main memory and your problem is large enough that each
machine is integrating more than 100 or so equations. Load balance will be extremely
good with no effort on your part if the number of cells of each type is a multiple of
the number of CPUs
used. If load balance (related to the number of equations integrated on each CPU)
becomes an issue, please be aware that the biophysical cell and network specification
is completely independent from the cell distribution strategy chosen, and, even when
random connectivity and spike stimulators are used, idioms have been devised so
that simulation results are quantitatively identical, to double precision, regardless of the number
of CPUs or cell distribution.
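
To illustrate how random connectivity can be made independent of the cell distribution, here is a minimal sketch in plain Python. It is one common way to get that property and not necessarily the idiom referred to above: every cell draws its sources from a random stream keyed only on its global id, never on the CPU that owns it. The names sources_for, build, ncell, nconn, nhost and seed are hypothetical, and the round-robin ownership rule is an assumption.

```python
import random

def sources_for(gid, ncell, nconn, seed=1):
    """Pick presynaptic sources for cell `gid` from a stream keyed only on gid."""
    # The stream depends on the cell's global id, not on which rank owns it.
    rng = random.Random(seed * 1_000_003 + gid)
    candidates = [g for g in range(ncell) if g != gid]
    return sorted(rng.sample(candidates, nconn))

def build(ncell, nconn, nhost):
    """Collect the wiring that all ranks together would build (assumed round-robin)."""
    wiring = {}
    for rank in range(nhost):
        for gid in range(rank, ncell, nhost):  # cells owned by this rank
            wiring[gid] = sources_for(gid, ncell, nconn)
    return wiring

# The wiring is the same whether 1, 4 or 7 hosts are used.
same = build(20, 3, 1) == build(20, 3, 4) == build(20, 3, 7)
```

Because no random number is ever drawn "on behalf of" a CPU, changing the number of CPUs only changes who builds each cell's connections, not what they are.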
I should also mention that our experience has shown it to be straightforward to specify
simulation setup in such a way that the setup time scales properly with the number of
CPUs. Generally, cell creation and cell connection algorithms only need to have their
outer loop modified so that the iteration is only over the cells that exist on "this" CPU.
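
As a rough sketch of that outer-loop change, in plain Python rather than actual NEURON code: the function names, the dictionary standing in for a real cell, and the round-robin rule (gid modulo number of hosts) are all assumptions made for illustration. In a real run, rank and nhost would come from the parallel context instead of being passed in by hand.

```python
def local_gids(ncell, rank, nhost):
    """Global ids of the cells that live on this rank (assumed round-robin rule)."""
    return range(rank, ncell, nhost)

def create_cells(ncell, rank, nhost):
    """Serial version loops over range(ncell); parallel version loops over local gids."""
    cells = {}
    for gid in local_gids(ncell, rank, nhost):
        cells[gid] = {"gid": gid}  # placeholder for constructing the real cell
    return cells

# With 12 cells on 4 hosts, every host builds exactly 3 cells.
counts = [len(create_cells(12, rank, 4)) for rank in range(4)]
```

Each CPU does roughly 1/nhost of the setup work, which is why setup time scales with the number of CPUs, and when the cell count is a multiple of nhost the per-CPU counts come out equal.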
. . .
I do recommend that parallel simulations be carried out in batch mode and only the
spike activity be saved for optimum performance. This in no way prevents a focus
on state trajectories since, with the entire network spike data, any subset (even 1)
of neurons can be re-simulated with the aid of the GUI to examine any variable as
a function of time. This process makes use of the PatternStim class, which provides
as input just those events that would have been generated by the rest of the network.
The results for the subnet are quantitatively identical to the full network simulation.
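
The selection step behind this can be sketched in plain Python. This is not the PatternStim implementation, only an illustration of picking out "just those events" from a saved raster; the function name, the (time, gid) tuple format, and the two-list output layout are assumptions.

```python
def external_events(spikes, subnet):
    """Return spike times and source gids for events that originate outside `subnet`.

    spikes: list of (time_ms, source_gid) pairs saved from the full network run
    subnet: set of gids that will be re-simulated
    """
    # Cells inside the subnet regenerate their own spikes, so drop those.
    keep = sorted((t, gid) for t, gid in spikes if gid not in subnet)
    tvec = [t for t, _ in keep]
    gidvec = [gid for _, gid in keep]
    return tvec, gidvec

tvec, gidvec = external_events([(2.0, 3), (1.0, 0), (1.5, 1)], subnet={1})
```

Here the spike from cell 1 is dropped because cell 1 is being re-simulated, and the remaining events are replayed in time order as stand-ins for the rest of the network.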
Lastly, I should mention that the current alpha version of NEURON
http://www.neuron.yale.edu/ftp/neuron/versions/alpha/
(sources after 5.8.105) has extended the parallel network capabilities to simulate
interprocessor gap junctions and synapses where post-synaptic state is continuously
dependent on pre-synaptic voltage. Communication overhead is greatly increased for
such models since voltages must be exchanged every time step. Gap junctions in
combination with discrete events can presently use only the fixed step method, but
this will soon be extended to the global variable step method.