Memory leak (?) in NEURON/Python
Posted: Wed Jun 24, 2009 4:40 pm
My model seems to be leaking memory.
I've had no success, so far, debugging it myself.
Results so far seem to implicate NEURON, rather than Python, but I'm not 100% confident of that.
Unless I'm very lucky, nobody's going to be able to tell me the solution, but I'd like suggestions about how to proceed.
Details follow:
The error message is:
Code:
NEURON -- Release 7.0 (281:80827e3cd201) 80827e3cd201
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2008
See http://www.neuron.yale.edu/credits.html
... snip ...
nrniv(21540,0xa0591fa0) malloc: *** mmap(size=1048580096) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
Abort
The model reads afferent spike times from a file.
It runs for a very long time (thousands of seconds).
Therefore, I cannot read in all the afferent spike times for the entire run all at once (won't fit in memory).
Therefore, I read afferent spike times in buffered fashion, a little at a time.
I do that as follows:
- At intervals I interrupt solving the model, using CVode.event(time,callback)
- The callback routine:
  - reads spikeTimes from a file on disk
  - loads them into NEURON's event queue by repeatedly calling the event method of the appropriate NetCon object, passing the appropriate spikeTime as argument
  - calls CVode.event(nextTime,callback) to ensure that another batch of spikeTimes gets loaded just before the current one runs out
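For concreteness, here is a minimal sketch of that buffering scheme. The file format (one spike time per line) and all the names are my own illustration, not the model's actual code; the NEURON-specific calls (NetCon.event, CVode.event) appear only in comments, since the point here is the batched reading.

```python
import itertools

def batched_spike_times(spike_file, batch_size):
    """Yield lists of spike times, batch_size at a time,
    without ever holding the whole file in memory."""
    times = (float(line) for line in spike_file)  # lazy: one line at a time
    while True:
        batch = list(itertools.islice(times, batch_size))
        if not batch:
            return
        yield batch

# In the model itself, each callback would consume one batch, roughly:
#   for t_spike in next(batches):
#       netcon.event(t_spike)            # load into NEURON's event queue
#   cvode.event(next_time, callback)     # schedule the next refill
```

The generator never materializes more than one batch, which is the whole point of reading the spike times in buffered fashion.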
I've tried writing a "toy" model that does the buffered spike read thing, but with a trivial (one-compartment) model, and the toy program doesn't leak memory, as far as I can tell.
I've tried instrumenting the model to find out what parts of the code are leaking.
The results seem to be:
- All parts of the code leak memory, and
- NEURON, rather than Python, is leaking
In more detail, here's what I've done:
First, I inserted, in various places throughout the code, the following
Code:
import subprocess
print(subprocess.Popen("ps -o%mem,ucomm | awk '/nrniv/ {print $1}'",
                       shell=True, stdout=subprocess.PIPE).communicate()[0])
which prints the memory owned by the 'nrniv' process (as a percent of total, which, in this case, is 4 GB).
With this, I partitioned execution into three parts:
1) loading spike times into a buffer
2) loading the buffer into NEURON's event queue
3) everything between 2 & 1 (i.e., solving the model -- what NEURON does between calls to the callback)
The amount by which phase #1 increases memory usage is fairly constant, around 1%.
The amount by which phase #2 increases memory usage is variable: 0%, 7%, or 14%. I think this means it's really 7%, but with memory allocated in integral blocks.
The amount by which phase #3 increases memory usage seems to grow each time: 2.8%, 2.8%, 5.3%, 7.5%.
When the total memory usage gets to 52%, something new happens: for the first time it decreases (by 1%, in phase #3).
On the next iteration, total memory usage decreases by 3.6%, and on the next, by 0.7%.
After that, malloc fails (mid-way through phase #2). The error message appears earlier in this post.
I have investigated the possibility of a memory leak on the Python side, using Python's gc module, which allows introspection of Python's garbage collector.
Liberal use of gc.collect() changes nothing (so the problem is not failure to garbage collect). gc.garbage always returns an empty list (so the problem is not proliferation of objects that are uncollectable, either due to circular references, or because the TopLevelHocInterpreter object references them). I assume, though, that this doesn't exclude the possibility of a memory leak on the NEURON side.
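For reference, the gc-side checks amounted to something like the following (a generic sketch, not the model's actual code):

```python
import gc

# Force a full collection pass; a Python-side leak caused by
# uncollected cycles would shrink memory usage here. It didn't.
n_unreachable = gc.collect()
print("unreachable objects collected:", n_unreachable)

# Objects the collector found but could not free would accumulate
# here; in my runs this list is always empty.
print("uncollectable:", gc.garbage)

# A steadily growing live-object count would point at a Python leak;
# a flat count alongside growing process memory points at the C/C++ side.
print("live objects tracked by gc:", len(gc.get_objects()))
```

Calling this at the same instrumentation points as the ps check lets the Python-object count be compared against the process's total memory growth.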
Debugging memory leaks is way above my skill level, but here's a hypothesis based on the above observations.
Perhaps NEURON's garbage collection is triggered when memory usage exceeds 50%.
Perhaps, then, memory is returned to the "free pool", but in some form that's not accessible to a malloc call.
Why would memory show up as free to the ps utility, but not be accessible to malloc? Maybe the memory is fragmented: perhaps there's no contiguous block of free memory of the requested size (apparently, the size is 1048580096 bytes, i.e. just over 1000 MB).