C++ memory? error, using NetPyNE

Moderator: tom_morse

Post Reply
ecker.andris
Posts: 10
Joined: Tue May 10, 2016 10:49 am

C++ memory? error, using NetPyNE

Post by ecker.andris » Tue Oct 24, 2017 4:50 pm

Hi,

I'm using jNeuroML-generated NetPyNE code to run a large-scale hippocampal model on NSG (tool: OSBPYNEURON74 @ Comet)
(see the model repo here: https://github.com/mbezaire/ca1/tree/development). After increasing the size to ~650 cells I get an error that looks like a memory error to me, but according to the NSG developers the memory per node is fine!

Here is the (main) error message:

Code: Select all

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
and it continues like this:

Code: Select all

[comet-20-53:27930] *** Process received signal ***
[comet-20-53:27930] Signal: Aborted (6)
[comet-20-53:27930] Signal code:  (-6)
[comet-20-53:27930] [ 0] /lib64/libpthread.so.0[0x3aa140f7e0]
[comet-20-53:27930] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x3aa0832495]
[comet-20-53:27930] [ 2] /lib64/libc.so.6(abort+0x175)[0x3aa0833c75]
[comet-20-53:27930] [ 3] /opt/gnu/gcc/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x15d)[0x2b1e0b64c07d]
[comet-20-53:27930] [ 4] /opt/gnu/gcc/lib64/libstdc++.so.6(+0x5e0e6)[0x2b1e0b64a0e6]
[comet-20-53:27930] [ 5] /opt/gnu/gcc/lib64/libstdc++.so.6(+0x5e131)[0x2b1e0b64a131]
[comet-20-53:27930] [ 6] /opt/gnu/gcc/lib64/libstdc++.so.6(+0x5e348)[0x2b1e0b64a348]
[comet-20-53:27930] [ 7] /opt/gnu/gcc/lib64/libstdc++.so.6(+0x5e859)[0x2b1e0b64a859]
[comet-20-53:27930] [ 8] /opt/gnu/gcc/lib64/libstdc++.so.6(_Znam+0x9)[0x2b1e0b64a8b9]
[comet-20-53:27930] [ 9] /projects/ps-nsg/home/nsguser/applications/osbneuron74_py/nrn-7.4/installdir/x86_64/lib/libnrnpython.so.0(+0x130d6)[0x2b1e0a84e0d6]
[comet-20-53:27930] [10] /projects/ps-nsg/home/nsguser/applications/osbneuron74_py/nrn-7.4/installdir/x86_64/lib/libnrniv.so.0(+0x802c7)[0x2b1e0903e2c7]
[comet-20-53:27930] [11] /projects/ps-nsg/home/nsguser/applications/osbneuron74_py/nrn-7.4/installdir/x86_64/lib/libnrnoc.so.0(hoc_call_ob_proc+0x2ab)[0x2b1e08d977cb]
[comet-20-53:27930] [12] /projects/ps-nsg/home/nsguser/applications/osbneuron74_py/nrn-7.4/installdir/x86_64/lib/libnrnoc.so.0(hoc_object_component+0x76e)[0x2b1e08d9868e]
[comet-20-53:27930] [13] /projects/ps-nsg/home/nsguser/applications/osbneuron74_py/nrn-7.4/installdir/x86_64/lib/libnrnpython.so.0(+0xb0fe)[0x2b1e0a8460fe]
[comet-20-53:27930] [14] /projects/ps-nsg/home/nsguser/applications/osbneuron74_py/nrn-7.4/installdir/x86_64/lib/libnrniv.so.0(_ZN10OcJumpImpl7fpycallEPFPvS0_S0_ES0_S0_+0x61)[0x2b1e090174e1]
[comet-20-53:27930] [15] /projects/ps-nsg/home/nsguser/applications/osbneuron74_py/nrn-7.4/installdir/x86_64/lib/libnrnpython.so.0(+0xb392)[0x2b1e0a846392]
[comet-20-53:27930] [16] /opt/python/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x2b1e0aaa7b73]
[comet-20-53:27930] [17] /opt/python/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3b2e)[0x2b1e0ab5c00e]
[comet-20-53:27930] [18] /opt/python/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5a5d)[0x2b1e0ab5df3d]
[comet-20-53:27930] [19] /opt/python/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5a5d)[0x2b1e0ab5df3d]
[comet-20-53:27930] [20] /opt/python/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5a5d)[0x2b1e0ab5df3d]
[comet-20-53:27930] [21] /opt/python/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x830)[0x2b1e0ab5f320]
[comet-20-53:27930] [22] /opt/python/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x2b1e0ab5f449]
[comet-20-53:27930] [23] /opt/python/lib/libpython2.7.so.1.0(PyImport_ExecCodeModuleEx+0x99)[0x2b1e0ab72c79]
[comet-20-53:27930] [24] /opt/python/lib/libpython2.7.so.1.0(+0x11dfce)[0x2b1e0ab72fce]
[comet-20-53:27930] [25] /opt/python/lib/libpython2.7.so.1.0(+0x11edb9)[0x2b1e0ab73db9]
[comet-20-53:27930] [26] /opt/python/lib/libpython2.7.so.1.0(PyImport_ImportModuleLevel+0x1dd)[0x2b1e0ab74a2d]
[comet-20-53:27930] [27] /opt/python/lib/libpython2.7.so.1.0(+0x1013e8)[0x2b1e0ab563e8]
[comet-20-53:27930] [28] /opt/python/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x2b1e0aaa7b73]
[comet-20-53:27930] [29] /opt/python/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x47)[0x2b1e0ab57ee7]
[comet-20-53:27930] *** End of error message ***
Do you know why I get this?
It appears after running the simulation, while gathering data to save around 20 random traces and all the spikes (500 ms long simulation).

Thanks,
András

PS: The error seems to be independent of the total number of cores and the number of cores per node, and the same code runs perfectly fine for ~450 cells (instead of ~650).

salvadord
Posts: 57
Joined: Tue Aug 18, 2015 3:49 pm

Re: C++ memory? error, using NetPyNE

Post by salvadord » Wed Oct 25, 2017 11:17 am

Hi András,
Thanks for your question. We had a similar post recently: viewtopic.php?f=45&t=3727&sid=30e51a901 ... e9ab8424ad

The conclusion from that post was that if the amount of data to be gathered and the number of cores used are above a threshold, the MPI error is triggered during the pc.py_alltoall() call (during gathering). Note that the amount of data depends on the number of cells (and how detailed they are), connections, stims, traces recorded, and simulation duration. NetPyNE's metadata can add significant overhead when gathering.
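For a rough sense of why this happens: py_alltoall serializes each node's data (via pickle) before the exchange, so you can estimate the payload size offline. The sketch below is standalone, illustrative code, not NetPyNE internals; the dictionary shapes and sizes are made up for illustration only.

```python
import pickle

# Toy stand-ins for the per-node data that pc.py_alltoall() has to
# serialize and gather (shapes are made up purely for illustration).
n_cells = 650
spikes = {"spkt": [0.5 * i for i in range(3000)],
          "spkid": [i % n_cells for i in range(3000)]}
traces = {"V_soma": {"cell_%d" % i: [0.0] * 20000 for i in range(20)}}  # ~500 ms / 0.025 ms
metadata = {"cell_%d" % i: {"secs": {"soma": {"geom": {"L": 20.0, "diam": 20.0}}}}
            for i in range(n_cells)}

for name, obj in (("spikes", spikes), ("traces", traces), ("metadata", metadata)):
    print("%-8s ~%.2f MB serialized" % (name, len(pickle.dumps(obj)) / 1e6))
```

Even in this toy example, the recorded traces dwarf the spike data, and per-cell metadata adds a further multiplier per cell.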

So there are several ways to reduce the amount of data gathered from nodes:
- reduce one or more of the above (e.g. reduce the duration or the number of traces recorded)
- set cfg.saveCellSecs=False -- removes all data on cell sections prior to gathering from nodes
- set cfg.saveCellConns=False -- removes all data on cell connections prior to gathering from nodes
- set cfg.gatherOnlySimData=True -- gathers from nodes only the output simulation data (not the network instance)
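For reference, those flags as they would appear in a NetPyNE simulation config (a fragment assuming a `specs.SimConfig()` object named `cfg`; check the option names against your installed NetPyNE version):

```python
from netpyne import specs  # assumes NetPyNE is installed

cfg = specs.SimConfig()
cfg.saveCellSecs = False       # strip section data before gathering
cfg.saveCellConns = False      # strip connection data before gathering
cfg.gatherOnlySimData = True   # gather only output data, not the network instance
```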

If you want to generate and save the full network instance in NetPyNE's format, you can always run a separate simulation on a single core, where you just generate the network, then gather and save the data (i.e. don't call the sim.runSim() function).
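A sketch of that separate single-core run (hedged: the `sim.create` / `sim.gatherData` / `sim.saveData` calls follow the NetPyNE API of that era, and the `netParams` contents and output filename are placeholders you would fill in):

```python
from netpyne import specs, sim  # assumes NetPyNE is installed

netParams = specs.NetParams()   # placeholder: your real network parameters go here
cfg = specs.SimConfig()
cfg.filename = 'net_instance'   # hypothetical output filename
cfg.saveJson = True

sim.create(netParams=netParams, simConfig=cfg)  # build the network, don't run it
sim.gatherData()                                # trivial on one core
sim.saveData()                                  # save the full instance in NetPyNE's format
```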

That said, I'm a bit surprised that just 650 cells, 500 ms and 20 traces trigger this error, since we have run much larger models on Comet directly via ssh. Not sure if the fact that this is happening via OSB and NSG has any effect -- I don't think it should. How many cores did you try running on? For the other user (see the post above), reducing the number of cores also seemed to avoid the error, but honestly I'm not sure why that would be the case.

I saw on the GitHub repo that you are working with Padraig and Marianne, so I assume the conversion of the model into NetPyNE was done correctly. I've discussed converting the CA1 model into NetPyNE with them at the last couple of conferences, so I'm glad to see this is happening. Please let me know if you have any other issues/questions or if I can help in any way.

Salva

ecker.andris
Posts: 10
Joined: Tue May 10, 2016 10:49 am

Re: C++ memory? error, using NetPyNE

Post by ecker.andris » Wed Oct 25, 2017 12:21 pm

Hi Salva,
I saw on the github repo that you are working with Padraig and Marianne, so I assume the conversion of the model into netpyne was done correctly. I've discussed with them in the last couple conferences converting the CA1 model into netpyne, so glad to see this is happening. Please let me know if you have any other issues/questions or if I can help in any way.
Yeah, the question is related to that model, and the conversion is almost done! (We just run into issues like this...)
That said, I'm a bit surprised that just 650 cells, 500 ms and 20 traces trigger this error, since we have ran much larger models on Comet directly via ssh. Not sure if the fact that this is happening via OSB and NSG has any effect -- don't think it should. How many cores did you try running on? For the other user (see post above), reducing the number of cores used also seemed to avoid the error, but honestly not sure why this would be the case.
I've tried different numbers of cores and different cores-per-node settings on NSG, but I think the run I copied the error message from used 40 cores on 4 separate nodes on Comet. The simulation was sent directly to Comet from my local machine, not through OSB (yet). (One more comment about the saving: it's not only 20 traces, but also all the spikes.)
So there are several ways to reduce the amount of data gathered from nodes:
- reduce one or more of the above (eg. reduce duration or number of traces recorded)
- set cfg.saveCellSecs=False -- removes all data on cell sections prior to gathering from nodes
- set cfg.saveCellConns=False -- removes all data on cell connections prior to gathering from nodes
- set cfg.gatherOnlySimData=True -- gathers from nodes only the output simulation data (not the network instance)
I'm gonna ask Padraig to put these into the NetPyNE code generator and I'll let you know what happens!

Thanks for your answer,
András

ecker.andris
Posts: 10
Joined: Tue May 10, 2016 10:49 am

Re: C++ memory? error, using NetPyNE

Post by ecker.andris » Fri Nov 10, 2017 12:28 pm

Hi Salva,

we've implemented the changes you suggested and tried to save spikes from only 100 cells and traces from ~40, but I still get the same error!
For a smaller network size I was able to save spikes from up to 500 cells and traces from ~40.

Do you have any other ideas in mind?

Thanks,
András

salvadord
Posts: 57
Joined: Tue Aug 18, 2015 3:49 pm

Re: C++ memory? error, using NetPyNE

Post by salvadord » Tue Nov 14, 2017 6:26 pm

Hmm, the spike times don't really use up that much space, so you can probably save them from all cells. Cell traces, however, do require more memory, so the 40 traces might be causing the problem.
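For a rough sense of scale, here is the arithmetic behind that (all parameters below are assumptions for illustration, not measured values):

```python
# Back-of-the-envelope: memory for recorded traces vs. spike times.
dt = 0.025                 # ms, a typical NEURON timestep (assumed)
duration = 500.0           # ms
n_traces = 40
bytes_per_float = 8

samples_per_trace = round(duration / dt)                 # 20,000 samples per trace
trace_bytes = n_traces * samples_per_trace * bytes_per_float

n_cells = 650
rate_hz = 10.0             # assumed mean firing rate
n_spikes = round(n_cells * rate_hz * duration / 1000.0)  # ~3,250 spikes
spike_bytes = n_spikes * 2 * bytes_per_float             # (time, cell id) per spike

print("traces: %.1f MB, spikes: %.0f KB" % (trace_bytes / 1e6, spike_bytes / 1e3))
```

So under these assumptions the 40 traces take roughly two orders of magnitude more memory than all the spikes combined, and that is before any per-cell metadata is added.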

Can you point me again to the repo with the code you are running, and I'll try it myself to see if I can find any issues?

thanks
salva

ecker.andris
Posts: 10
Joined: Tue May 10, 2016 10:49 am

Re: C++ memory? error, using NetPyNE

Post by ecker.andris » Wed Nov 15, 2017 5:27 am

Hi Salva,

you can find the repo here: https://github.com/mbezaire/ca1/tree/development/
and we are mostly using `NeuroML2/network/GenerateHippocampalNet_oc.py`, which creates an XML (or HDF5, depending on the size) network description that gets converted to NetPyNE (by jNeuroML) during zipping for NSG here: https://github.com/mbezaire/ca1/blob/de ... SG.py#L108
I've added such a computer-generated NetPyNE script for you (using scale=500 -> ~630 real cells and 900 stimulating ones, saving ~40 traces and ~350 spike trains from real cells only). You can find it here: https://github.com/mbezaire/ca1/blob/de ... netpyne.py

Thanks and let me know if I can do anything else,
András

salvadord
Posts: 57
Joined: Tue Aug 18, 2015 3:49 pm

Re: C++ memory? error, using NetPyNE

Post by salvadord » Thu Nov 16, 2017 9:31 pm

Getting this error: IOError: ``HippocampalNet_scale500_oc.net.nml.h5`` does not exist

maybe you need to add that file to the repo?

thanks

ecker.andris
Posts: 10
Joined: Tue May 10, 2016 10:49 am

Re: C++ memory? error, using NetPyNE

Post by ecker.andris » Fri Nov 17, 2017 8:33 am

oh, good point. Just added it!

András

salvadord
Posts: 57
Joined: Tue Aug 18, 2015 3:49 pm

Re: C++ memory? error, using NetPyNE

Post by salvadord » Fri Nov 17, 2017 6:08 pm

thanks. Now I'm getting a bunch of these messages: "Id not found in <neuroml> element. All ids: ['CavL', 'CavN', ...]" and then the error: "ValueError: argument not a density mechanism name."

I tried nrnivmodl on the mod files in the root folder and then symlinked the x86_64 folder into /NeuroML2/network, but the mech names don't seem to match (e.g. they all start with 'ch_').

I also saw a 'NeuroML2/channels' folder, but I'm not sure how to compile those.

ecker.andris
Posts: 10
Joined: Tue May 10, 2016 10:49 am

Re: C++ memory? error, using NetPyNE

Post by ecker.andris » Fri Nov 17, 2017 6:33 pm

Well, either you run the scripts that generate the mod files from those channels (the ones you've compiled correspond to the NEURON version of the model), or -- I think this would be best -- you could write me an e-mail at andras.ecker@epfl.ch and I'll send you everything instead of putting big files on GitHub.

Thanks,
András

salvadord
Posts: 57
Joined: Tue Aug 18, 2015 3:49 pm

Re: C++ memory? error, using NetPyNE

Post by salvadord » Fri Nov 17, 2017 6:45 pm

what's the script to generate the mod files from those channels?

I'll email you anyway, thanks.

Post Reply