Memory leak (?) in NEURON/Python
Posted: Wed Jun 24, 2009 4:40 pm
My model seems to be leaking memory.
I've had no success, so far, debugging it myself.
Results so far seem to implicate NEURON, rather than Python, but I'm not 100% confident of that.
Unless I'm very lucky, nobody's going to be able to tell me the solution, but I'd like suggestions about how to proceed.
Details follow:
The error message is:
Code:
NEURON -- Release 7.0 (281:80827e3cd201) 80827e3cd201
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2008
See http://www.neuron.yale.edu/credits.html
... snip ...
nrniv(21540,0xa0591fa0) malloc: *** mmap(size=1048580096) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
Abort
The model reads afferent spike times from a file.
It runs for a very long time (thousands of seconds).
Therefore, I cannot read in all the afferent spike times for the entire run all at once (won't fit in memory).
Therefore, I read afferent spike times in buffered fashion, a little at a time.
I do that as follows:
- At intervals I interrupt solving the model, using CVode.event(time,callback)
- The callback routine:
  - reads spikeTimes from a file on disk
  - loads them into NEURON's event queue by repeatedly calling the event method of the appropriate NetCon object, passing the appropriate spikeTime as argument
  - calls CVode.event(nextTime,callback) to ensure that another batch of spikeTimes gets loaded just before the current one runs out
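For concreteness, here is a minimal sketch of that buffering scheme. The file format (one spike time per line) and all the names are my own illustration, not the model's actual code; the NEURON-specific calls (NetCon.event, CVode.event) appear only in comments, since the point here is the batched reading.

```python
import itertools

def batched_spike_times(spike_file, batch_size):
    """Yield lists of spike times, batch_size at a time,
    without ever holding the whole file in memory."""
    times = (float(line) for line in spike_file)  # lazy: one line at a time
    while True:
        batch = list(itertools.islice(times, batch_size))
        if not batch:
            return
        yield batch

# In the model itself, each callback would consume one batch, roughly:
#   for t_spike in next(batches):
#       netcon.event(t_spike)            # load into NEURON's event queue
#   cvode.event(next_time, callback)     # schedule the next refill
```

The generator never materializes more than one batch, which is the whole point of reading the spike times in buffered fashion.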
I've tried writing a "toy" model that does the buffered spike read thing, but with a trivial (one-compartment) model, and the toy program doesn't leak memory, as far as I can tell.
I've tried instrumenting the model to find out what parts of the code are leaking.
The results seem to be:
- All parts of the code leak memory, and
- NEURON, rather than Python, is leaking
In more detail, here's what I've done:
First, I inserted, in various places throughout the code, the following
Code:
import subprocess
print(subprocess.Popen("ps -o%mem,ucomm | awk '/nrniv/ {print $1}'",
                       shell=True, stdout=subprocess.PIPE).communicate()[0])
which prints the memory owned by the 'nrniv' process (as a percent of total, which, in this case, is 4 GB).
With this, I partitioned execution into three parts:
1) loading spike times into a buffer
2) loading the buffer into NEURON's event queue
3) everything between 2 & 1 (i.e., solving the model -- what NEURON does between calls to the callback)
The amount by which phase #1 increases memory usage is fairly constant, around 1%.
The amount by which phase #2 increases memory usage is variable: 0%, 7%, or 14%. I think this means it's really 7%, but with memory allocated in integral blocks.
The amount by which phase #3 increases memory usage seems to grow each time: 2.8%, 2.8%, 5.3%, 7.5%.
When the total memory usage gets to 52%, something new happens: for the first time it decreases (by 1%, in phase #3).
On the next iteration, total memory usage decreases by 3.6%, and on the next, by 0.7%.
After that, malloc fails (mid-way through phase #2). The error message appears earlier in this post.
I have investigated the possibility of a memory leak on the Python side, using Python's gc module, which allows introspection of Python's garbage collector.
Liberal use of gc.collect() changes nothing (so the problem is not failure to garbage collect). gc.garbage always returns an empty list (so the problem is not proliferation of objects that are uncollectable, either due to circular references, or because the TopLevelHocInterpreter object references them). I assume, though, that this doesn't exclude the possibility of a memory leak on the NEURON side.
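For reference, the gc-side checks amounted to something like the following (a generic sketch, not the model's actual code):

```python
import gc

# Force a full collection pass; a Python-side leak caused by
# uncollected cycles would shrink memory usage here. It didn't.
n_unreachable = gc.collect()
print("unreachable objects collected:", n_unreachable)

# Objects the collector found but could not free would accumulate
# here; in my runs this list is always empty.
print("uncollectable:", gc.garbage)

# A steadily growing live-object count would point at a Python leak;
# a flat count alongside growing process memory points at the C/C++ side.
print("live objects tracked by gc:", len(gc.get_objects()))
```

Calling this at the same instrumentation points as the ps check lets the Python-object count be compared against the process's total memory growth.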
Debugging memory leaks is way above my skill level, but here's a hypothesis based on the above observations.
Perhaps NEURON's garbage collection is triggered when memory usage exceeds 50%.
Perhaps, then, memory is returned to the "free pool", but in some form that's not accessible to a malloc call.
Why would memory show up as free to the ps utility, but not be accessible to malloc? Maybe the memory is fragmented: perhaps there's no contiguous block of free memory of the requested size (apparently, the size is 1048580096 bytes, i.e. just over 1000 MB).