netcon enhancement


vgoudar
Posts: 19
Joined: Fri Dec 03, 2010 3:41 am

netcon enhancement

Post by vgoudar »

Hello,

I'm updating some old NEURON code to speed it up by parallelizing it. The code simulates a network of IAF neurons and uses pointers to pass variables pertaining to synaptic state between neurons. While porting the code, I've had to do away with the pointers and move to the netcon/NET_RECEIVE framework. As a consequence, only spikes are reported to post-synaptic neurons, and all other synaptic computations have to occur at the post-synaptic neuron. Whereas earlier, synaptic computations that were common to a pre-synaptic neuron, including short-term synaptic plasticity, NMDA dynamics, etc., were computed once pre-synaptically and disseminated with pointers, now they are duplicated at each post-synaptic neuron, significantly offsetting the benefits of the parallelization.

Short of using gap junctions (spiking is sparse in our networks), is there any way to append information to the spike messages that get delivered via the netcon? I only want to send the pre-synaptically computed values when spikes occur, so that the communication does not add overhead. If such a facility is not immediately available, could the netcon object be easily altered to allow this? Any "pointers" will be sincerely appreciated!

Thanks.
Vishwa
ted
Site Admin
Posts: 6286
Joined: Wed May 18, 2005 4:50 pm
Location: Yale University School of Medicine

Re: netcon enhancement

Post by ted »

vgoudar wrote:Whereas earlier, synaptic computations that were common to a pre-synaptic neuron, including short-term synaptic plasticity, NMDA dynamics, etc., were computed once pre-synaptically and disseminated with pointers, now they are duplicated at each post-synaptic neuron, significantly offsetting the benefits of the parallelization.
I don't know the details of your plasticity rules, but any rule that depends only on the times of presynaptic spikes, or on the intervals between pre- and postsynaptic spikes, is probably implementable with a small number of conditional statements and algebraic calculations in the NET_RECEIVE block. These execute much more quickly than numerical integration of even a single differential equation; besides, the code in the NET_RECEIVE block is executed only when a new event arrives, not at every time step. Furthermore, many synaptic plasticity rules can be implemented in a way that allows a single instance of the synaptic mechanism (the handful of DEs necessary to represent the kinetics of a single synapse) to handle multiple afferent streams, yet with stream-specific plasticity. Have you seen these examples?
https://senselab.med.yale.edu/ModelDB/S ... model=3264
https://senselab.med.yale.edu/ModelDB/S ... model=3815
http://www.neuron.yale.edu/neuron/stati ... 23_06.html
I should also mention the GSyn mechanism illustrated in chapter 10 of The NEURON Book, which shows stream-specific paired-pulse facilitation.
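To make the point about event-driven, stream-specific plasticity concrete, here is a minimal Python emulation (not NMODL) of what such a NET_RECEIVE block computes. The rule is a simplified, depression-only variant chosen for illustration, not the model in the linked examples; U, tau_rec, and the class name are hypothetical. A real mechanism would keep the per-stream state in extra NET_RECEIVE arguments (which NEURON stores per NetCon) rather than in a dictionary.

```python
import math


class DepressingSynapseEmulation:
    """Python stand-in for one point process whose NET_RECEIVE block serves
    many afferent streams, each with its own plasticity state (hypothetical)."""

    def __init__(self, U=0.5, tau_rec=800.0):
        self.U = U                # fraction of resources used per spike (hypothetical)
        self.tau_rec = tau_rec    # recovery time constant in ms (hypothetical)
        self.state = {}           # stream id -> (resources R, last spike time)

    def net_receive(self, stream, t, weight):
        """Runs only when an event arrives; no per-time-step integration."""
        R, tlast = self.state.get(stream, (1.0, None))
        if tlast is not None:
            # Closed-form recovery toward 1 since this stream's previous spike.
            R = 1.0 - (1.0 - R) * math.exp(-(t - tlast) / self.tau_rec)
        release = self.U * R
        self.state[stream] = (R - release, t)
        return weight * release   # increment to add to the synaptic conductance
```

Two events on stream "a" in quick succession show depression on that stream, while a first event on stream "b" starts from fully recovered resources, all within one instance.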

There was a time when the saturating synapse described in chapter 10 of The NEURON Book was limited to a single afferent spike train; see
http://www.neuron.yale.edu/phpbb/viewto ... =16&t=2201
Unfortunately, my notes don't say whether that limitation has been removed; I will have to check. In any case, it's yet another example of use-dependent synaptic plasticity.

Of course other plasticity rules are possible that are amenable to efficient implementation with events alone. But if it turns out that your rules don't fit that rubric, there is something else you can do that might allow you to continue using your original mechanisms, rather than switch to event-driven synapses.
Short of using gap junctions (spiking is sparse in our networks), is there any way to append information to the spike messages that get delivered via the netcon?
No. If your plasticity rules really do require communication from pre- to postsynaptic cell at each time step, then instead of bothering with spike-driven synapses, just take advantage of the "Parallel Transfer" feature of the ParallelContext class; see https://www.neuron.yale.edu/neuron/stat ... l-transfer.
vgoudar
Posts: 19
Joined: Fri Dec 03, 2010 3:41 am

Re: netcon enhancement

Post by vgoudar »

Hi Ted,

Thanks for that detailed response. I should have described our setup more clearly, and I apologize for not doing that. Here goes:

We are using the standard Tsodyks-Markram formulation of short-term synaptic plasticity, similar to the second link in your response. As you surmised, these dynamics are event-driven and depend solely on the spike times of the pre-synaptic neuron. The relevant state variable is, of course, appropriately scaled at the post-synaptic point process, but the rest of the heavy lifting is done on the pre-synaptic side. The reason is that the efferent stream shares this computation, i.e. it is exactly the same for each efferent synapse of a neuron. To improve efficiency in our simulations of large networks, we took advantage of this fact by splitting each synapse into a post-synaptic and a pre-synaptic mechanism. All the dynamics are computed in the single instance of the pre-synaptic mechanism and then shared, via pointers, with all of the post-synaptic instances (each located on a neuron post-synaptic to this one).

Here is the computation that occurs ONCE presynaptically:


 ampa = Rinf_1 + (R0_1 - Rinf_1) * exp(- (t - lastrelease) / Rtau_1)
where R0_1, Rinf_1 and Rtau_1 are solved via a set of algebraic equations at the time step when a pre-synaptic spike occurs.

ampa is shared as a pointer with all post-synaptic instances, and here it is being scaled at each post-synaptic instance:


 gAMPA = gmaxAMPA * ampa
 i = gAMPA*(v-Erev_1)
where gmaxAMPA and Erev_1 are constant but vary across synapses. So the post-synaptic instances are very lightweight, and moreover, the bottleneck computation, done pre-synaptically, is O(N) (N = network size).
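The split described above can be sketched in Python (an emulation, not NMODL): one presynaptic object evaluates ampa once per step, and any number of lightweight postsynaptic objects read it through a shared reference that plays the role of the POINTER. The class names, the step function, and the fixed values of R0, Rinf, and Rtau are assumptions for illustration; in the model described here those parameters are re-solved at each presynaptic spike by equations not shown.

```python
import math


class PresynapticRelease:
    """Presynaptic half (hypothetical): evaluates ampa once per step per source."""

    def __init__(self, R0, Rinf, Rtau, lastrelease):
        # In the real model R0, Rinf and Rtau are re-solved at every
        # presynaptic spike; here they are fixed, hypothetical values.
        self.R0, self.Rinf, self.Rtau = R0, Rinf, Rtau
        self.lastrelease = lastrelease
        self.ampa = 0.0
        self.evaluations = 0

    def advance(self, t):
        self.ampa = self.Rinf + (self.R0 - self.Rinf) * math.exp(
            -(t - self.lastrelease) / self.Rtau)
        self.evaluations += 1


class PostsynapticAMPA:
    """Postsynaptic half: only scales the shared value (like a POINTER read)."""

    def __init__(self, pre, gmaxAMPA, Erev):
        self.pre = pre            # shared reference standing in for the POINTER
        self.gmaxAMPA = gmaxAMPA
        self.Erev = Erev

    def current(self, v):
        gAMPA = self.gmaxAMPA * self.pre.ampa
        return gAMPA * (v - self.Erev)


def step(pres, posts, t, voltages):
    for pre in pres:              # one ampa evaluation per source neuron
        pre.advance(t)
    # cheap per-synapse scaling, each at its own postsynaptic voltage
    return [post.current(v) for post, v in zip(posts, voltages)]
```

However many postsynaptic objects share one presynaptic object, each step costs a single exponential per source.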

Doing this instead in a post-synaptic NET_RECEIVE block requires R0_1, Rinf_1, Rtau_1 and ampa (and other temporary variables) to be solved for redundantly at each post-synaptic instance, resulting in a severe duplication of computational effort. As the spike frequency increases, the benefits of parallelization are offset and even overwhelmed. Basically, this is because the bottleneck computation has become O(N^2).
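A rough count of ampa evaluations makes the O(N) versus O(N^2) argument concrete. This sketch assumes, hypothetically, that every neuron fires once and that each neuron projects to a fixed fraction p_connect of the network, so its number of targets grows with N; the function name is illustrative.

```python
def ampa_evaluations(n_neurons, p_connect, shared):
    """Number of ampa evaluations when each of n_neurons spikes once.

    shared=True:  computed once per presynaptic neuron (pointer scheme).
    shared=False: recomputed in every postsynaptic NET_RECEIVE block.
    """
    out_degree = round(p_connect * n_neurons)  # targets per neuron
    if shared:
        return n_neurons
    return n_neurons * out_degree
```

With p_connect = 0.1, doubling the network from 1000 to 2000 neurons doubles the shared count but quadruples the per-synapse count.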

I don't feel I can effectively use the FOR_NETCONS or shared NETCON constructs here, because the common computation in my case is on the efferent stream rather than the afferent stream. And parallel transfer wouldn't be prudent either, because I just want to share the state variable once when the pre-synaptic neuron spikes and at no other time (i.e. not continuously).

For these reasons, I think a reasonable solution is to extend NETCON/NET_RECEIVE to share a few additional variables beyond just the spike event; assuming this is just an MPI broadcast, I would then be able to efficiently perform and share computations common to the efferent stream. Or I'm having a massive "D'oh!" moment. Either way, I'd appreciate your help/suggestions.

Thanks.
Vishwa
hines
Site Admin
Posts: 1682
Joined: Wed May 18, 2005 3:32 pm

Re: netcon enhancement

Post by hines »

You pose an interesting performance problem. The cost, of course, is a more complex model description and more administrative work for the user. Another issue is that this is a somewhat specialized case: it requires that a value computed at the source when the spike is generated still be correct at the time of delivery (delays may be heterogeneous) for every presynaptic terminal of that source.

Perhaps a half-way measure would suffice for the case where there are many targets on every process for a given source. The speedup for computing
ampa = Rinf_1 + (R0_1 - Rinf_1) * exp(- (t - lastrelease) / Rtau_1)
would then be proportional to the number of targets per process for that source. (I assume that each connection's delay is at least constant, and that the delays are not so heterogeneous that a spike arriving over the least-delay connection arrives before the previous spike arriving over the greatest-delay connection.) Anyway, one could then have one extra presynaptic ARTIFICIAL_CELL per source cell per target thread, instantiated on the target thread, that computes your ampa and makes it available to all targets on that process via a POINTER variable. The delay from the source to this ARTIFICIAL_CELL must be <= the least delay from the source to any of its targets.
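A small Python planning sketch of the half-way measure, under stated assumptions: connections are given as hypothetical (source gid, target rank, delay) tuples, and one proxy is planned per source and target rank (the post speaks of target threads; ranks are used here for simplicity). Each proxy takes the least delay from its source to that rank, as required above, so the number of ampa evaluations per source spike drops from the number of connections to the number of proxies. delay_spread_ok is one reading of the parenthetical ordering condition: for consecutive spikes separated by an interval ISI, the later spike over the shortest path cannot overtake the earlier one over the longest path when dmax - dmin <= ISI.

```python
def plan_proxies(connections):
    """Plan one proxy ARTIFICIAL_CELL per (source gid, target rank).

    connections: iterable of (source_gid, target_rank, delay_ms) tuples.
    Returns {(source_gid, target_rank): proxy_delay_ms}, using the least delay
    from that source to any of its targets on that rank.
    """
    proxies = {}
    for src, rank, delay in connections:
        key = (src, rank)
        proxies[key] = min(delay, proxies.get(key, delay))
    return proxies


def delay_spread_ok(delays, min_isi):
    """True if no spike over the shortest-delay path can arrive before the
    previous spike over the longest-delay path (dmax - dmin <= min ISI)."""
    return max(delays) - min(delays) <= min_isi
```

In a test with five connections from two sources, only three proxies are needed, one per (source, rank) pair.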

A full user-level solution would be to have one ARTIFICIAL_CELL on each source process that receives a 0-delay event from every source existing on that process, with a weight equal to the source gid. When an event arrives, compute ampa and buffer the gid and ampa (lock the buffer, e.g. with MUTEXLOCK/MUTEXUNLOCK, if there are multiple threads). The NET_RECEIVE block would also have a self-event, issued at intervals of the minimum spike delay, which on delivery would copy the buffers to every process via an MPI_Allgather, MPI_Allgatherv pair and then copy each ampa to the proper gid slot. The target synapse can then get it from the proper POINTER.
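The buffering logic of this full solution can be emulated in a few lines of Python, without NEURON or MPI. Everything here is hypothetical: the class name, the compute_ampa callback standing in for the presynaptic rule, and the exchange function, which concatenates every rank's buffer in rank order to mimic what the Allgather/Allgatherv pair would deliver to each process.

```python
class SpikeValueAggregator:
    """One per rank (hypothetical): buffers (gid, ampa) until the next exchange."""

    def __init__(self, min_delay, compute_ampa):
        self.min_delay = min_delay        # interval of the periodic self-event
        self.compute_ampa = compute_ampa  # hypothetical per-source rule f(gid, t)
        self.gids, self.ampas = [], []
        self.next_exchange = min_delay

    def net_receive(self, t, weight):
        # 0-delay NetCon from a local source cell; the weight carries its gid.
        gid = int(weight)
        self.gids.append(gid)
        self.ampas.append(self.compute_ampa(gid, t))

    def flush(self):
        # What the periodic self-event would trigger: hand off and clear buffers.
        gids, ampas = self.gids, self.ampas
        self.gids, self.ampas = [], []
        self.next_exchange += self.min_delay   # re-arm the self-event
        return gids, ampas


def exchange(aggregators):
    """Stand-in for the Allgather/Allgatherv pair: every rank gets all pairs."""
    all_gids, all_ampas = [], []
    for agg in aggregators:
        gids, ampas = agg.flush()
        all_gids.extend(gids)
        all_ampas.extend(ampas)
    return all_gids, all_ampas
```

Because every rank must join each collective, all aggregators flush on the same min_delay schedule.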
vgoudar
Posts: 19
Joined: Fri Dec 03, 2010 3:41 am

Re: netcon enhancement

Post by vgoudar »

Hi Ted,

This is brilliant, thanks for that. Getting the "half-way" solution working seems straightforward with some refactoring of the python and mod code for synaptic transmission. I will try that first.

To address the problem flexibly in the long term, I will need to apply the full solution you describe. While I understand how this solution is supposed to work, I'm a bit lost as to where to start. I understand how to calculate and buffer ampa. After that, though, this part loses me:
The NET_RECEIVE block would have a minimum spike delay selfevent which on delivery would copy the buffers to every process via an MPI_Allgather, MPI_Allgatherv pair. and then copies the ampa to the proper gid slot.

How do I make coordinated MPI_Allgather/MPI_Allgatherv calls from the mod file? Could you please flesh this out a bit for me, or point me to existing code that I could use as a template?

Thanks for your continued support on this and for the elegant solutions!

Best,
Vishwa
hines
Site Admin
Posts: 1682
Joined: Wed May 18, 2005 3:32 pm

Re: netcon enhancement

Post by hines »

As long as there is one and only one instance of the ARTIFICIAL_CELL in each rank participating in the MPI collective, all with the same net_send intervals in the NET_RECEIVE block, and there are no other events that use MPI (except the standard internal NetParEvent that manages the (gid, spiketime) exchange), things have the possibility of working. There are several ways to factor what goes on between the source voltage threshold crossing, the calculation of ampa, and the transfer of (gid, ampa) pairs. If you take the strategy that the artificial cell does everything for all sources on a rank, then I suggest giving each ARTIFICIAL_CELL several Vector instances to buffer the gid, ampa, time, and other presynaptic parameters, each with a size equal to the number of sources on the rank. (Note that you need a NetCon with 0 delay from each source cell on the rank to that ARTIFICIAL_CELL, and its weight should equal the source cell's local index into the ARTIFICIAL_CELL data vectors.) The ARTIFICIAL_CELL also needs a Vector of size nhost to exchange the counts of items, and a Vector to send all the (gid, ampa) pairs.
This is starting to get painful, as it is essentially identical to the implementation in nrn/src/nrniv/netpar.cpp. Anyway, when all your data is set up, make calls to:

extern void nrnmpi_int_allgather(int* s, int* r, int n);  with n = 1, for the counts, followed by
extern void nrnmpi_int_allgatherv(int* s, int* r, int* n, int* dspl);  for the integer gids
extern void nrnmpi_dbl_allgatherv(double* s, double* r, int* n, int* dspl);  for the double ampa values
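The count exchange and displacement bookkeeping behind these calls can be checked in plain Python. The functions below only emulate the semantics of a gather-to-all with variable block sizes; they are not the nrnmpi C functions, whose exact behavior should be confirmed in the NEURON sources. Displacements are the exclusive prefix sum of the per-rank counts, and the same counts and displacements serve both the gid and the ampa exchange.

```python
from itertools import accumulate


def allgather_counts(per_rank_counts):
    """Emulates the count exchange (n = 1): every rank ends up with all counts."""
    return list(per_rank_counts)


def displacements(counts):
    """Start offset of each rank's block: exclusive prefix sum of counts."""
    return list(accumulate(counts, initial=0))[:-1]


def allgatherv(per_rank_buffers, counts, dspl):
    """Emulates a variable-size allgather: each rank's block lands at its offset."""
    recv = [None] * sum(counts)
    for rank, buf in enumerate(per_rank_buffers):
        if len(buf) != counts[rank]:
            raise ValueError("send count does not match buffer length")
        recv[dspl[rank]:dspl[rank] + counts[rank]] = buf
    return recv
```

With counts [2, 0, 3], the displacements are [0, 2, 2]; a rank that sent nothing still contributes a (zero-length) block.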

When you receive the (gid, ampa) pairs, you will need a hash map to convert each gid to the local index where its ampa is stored. The target postsynaptic object will have a POINTER that watches the proper ampa index.
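A minimal sketch of the receive side, assuming the gathered gids and ampa values arrive as two parallel lists. The dictionary plays the role of the hash map, and ampa_slots stands in for the array a target's POINTER would reference; both names, and the two helper functions, are hypothetical. Gids with no target on this rank are skipped.

```python
def build_gid_index(needed_gids):
    """Map each source gid used on this rank to a slot in the local ampa array."""
    return {gid: i for i, gid in enumerate(sorted(set(needed_gids)))}


def scatter_received(gid_index, ampa_slots, recv_gids, recv_ampas):
    """Copy received ampa values into their slots; skip gids with no local target."""
    for gid, value in zip(recv_gids, recv_ampas):
        i = gid_index.get(gid)
        if i is not None:
            ampa_slots[i] = value
```

A target synapse on this rank would look up gid_index[gid] once at setup and then read ampa_slots at that index on each step.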

This is just an outline of something I have obviously not implemented, so there could be a few missing aspects. However, the idea is sound.