
h.parallelContext in a model being simulated on a cluster

Posted: Mon Oct 16, 2023 1:10 am
by Nilapratim_S
I am porting an old model implemented in HOC to Python. The model is a pyramidal neuron with a soma, a dendritic arbor, and a myelinated axon with collaterals. The key idea is to elicit spikes in the soma and observe their propagation along the axon and collaterals.
I am trying to parallelize (using ParallelContext) the execution of my code on a remote cluster to tackle parameter exploration.
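
For readers unfamiliar with the pattern, here is a minimal sketch of how a parameter sweep might be split across ranks with ParallelContext. It is not the code from the actual model: run_one(), the gbar parameter, and the sweep values are hypothetical placeholders, and the round-robin split is just one simple way to divide the work.

```python
# Sketch only: run_one() and the gbar sweep are hypothetical placeholders.
try:
    from neuron import h
except ImportError:
    h = None  # NEURON not installed; the splitting helper below still works


def params_for_rank(params, rank, nhost):
    """Round-robin split: rank r takes items r, r + nhost, r + 2*nhost, ..."""
    return params[rank::nhost]


def run_one(gbar):
    """Hypothetical stand-in for: build the cell, run it, save the results."""
    return gbar


if h is not None:
    pc = h.ParallelContext()
    # pc.id() is 0 and pc.nhost() is 1 when MPI has not been initialised.
    rank, nhost = int(pc.id()), int(pc.nhost())
    gbar_values = [0.05 + 0.01 * i for i in range(16)]  # made-up sweep
    for gbar in params_for_rank(gbar_values, rank, nhost):
        run_one(gbar)
    pc.barrier()  # wait until every rank has finished its share
```

A script like this would typically be launched as something like `mpiexec -n 4 nrniv -mpi -python sweep.py` (sweep.py is a made-up file name). When it is launched with plain `python` instead, MPI has to be initialised from Python, for example by calling `h.nrnmpi_init()` right after importing `h`, or by importing mpi4py before neuron.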

The code runs on my Mac (with 8 cores), and I see a speedup over serial execution in the wall-clock time taken for the simulation.

However, it fails to run on the cluster that I use and quits, citing the following error:
[n10:865021:0:865317] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))

Some relevant details as provided by the personnel maintaining the cluster:
"1) Neuron is running on a multi node compute cluster
2) Neuron was installed through Conda, not compiled
3) We are certain that neuron works with the cluster MPI because we have it running using the hoc way of doing things
4) We don't have mpi4py installed in the neuron Conda environment. Should we?
5) We use Slurm to submit jobs."

Also, I have used NetCon to detect spikes (I thought this might be relevant).
The person in charge of the cluster does not have NEURON expertise and I don't know how to tackle this issue.

I understand that one might need more details to debug the issue.
But before sharing the complex model, I wanted to see whether anyone has faced a similar issue and, if so, whether they have any pointers.

Regards,
Nil

Re: h.parallelContext in a model being simulated on a cluster

Posted: Mon Oct 16, 2023 12:13 pm
by ted
Interesting problem. Just in case you ran into a NEURON bug: what version of NEURON is running on your Mac, and what version is the cluster using?

Re: h.parallelContext in a model being simulated on a cluster

Posted: Mon Oct 16, 2023 12:59 pm
by Nilapratim_S
Thank you for replying, Ted.

On my Mac I have: NEURON -- VERSION 8.2.0 HEAD (156b9dee3) 2022-07-01
On the cluster: NEURON -- VERSION 8.0.2 HEAD (f0ca7454) 2022-02-02

Regards,
Nil

Re: h.parallelContext in a model being simulated on a cluster

Posted: Mon Oct 16, 2023 1:16 pm
by Nilapratim_S
I must also highlight a key difference between the earlier and new versions of the model.
The earlier parallelized HOC model (which I also used to simulate on the cluster) used APCount().
While porting the model to Python, I used h.NetCon for spike detection instead.
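
To make the difference concrete, here is a minimal sketch of NetCon-based spike detection on a toy single-compartment cell. It is not the code from the actual model: the helper name attach_spike_recorder, the -20 mV threshold, and the hh soma with its stimulus are all assumptions chosen for illustration.

```python
# Sketch only: a toy hh soma, not the pyramidal neuron model from this thread.
try:
    from neuron import h
except ImportError:
    h = None  # NEURON not installed; the demo at the bottom is skipped


def attach_spike_recorder(h, sec, threshold_mv=-20.0):
    """Record threshold-crossing times at the middle of `sec` with a NetCon."""
    spike_times = h.Vector()
    # A NetCon with no target (None) acts purely as a spike detector.
    nc = h.NetCon(sec(0.5)._ref_v, None, sec=sec)
    nc.threshold = threshold_mv
    nc.record(spike_times)
    # Return both so the caller keeps references; if the NetCon is
    # garbage-collected, nothing is recorded.
    return nc, spike_times


if h is not None:
    h.load_file("stdrun.hoc")
    soma = h.Section(name="soma")
    soma.L = soma.diam = 20  # microns
    soma.insert("hh")
    stim = h.IClamp(soma(0.5))
    stim.delay, stim.dur, stim.amp = 1, 50, 0.5  # ms, ms, nA
    nc, spike_times = attach_spike_recorder(h, soma)
    h.finitialize(-65)
    h.continuerun(60)
    print("spikes detected:", len(spike_times))
```

The rough APCount equivalent would be `apc = h.APCount(soma(0.5))` with `apc.thresh` set and `apc.record(vec)` called. One practical difference is that the Python NetCon and Vector objects must stay referenced for the whole run.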

Regards,
Nil