Parallelized simulations result in PythonObject accumulation when using the bulletin-board-style parallelization

lukas

Parallelized simulations result in PythonObject accumulation when using the bulletin-board-style parallelization

Post by lukas »

Hello all,
I am using NEURON (8.2.7) + Python (3.13) with the bulletin-board style parallelization to simulate populations of (multicompartment) neurons. NEURON has so far worked wonderfully for this, but I have now run into a bit of memory trouble. I often do parameter explorations or simulate stochastic behavior, both of which require running my population of cells multiple times and saving the results (for example voltages, or i_membrane) to pickle files in between. I noticed that if I use "pc.submit(...)" and "while pc.working()" multiple times, memory accumulates until I run out of it. I thought I could "reset" the simulation and the bulletin board between rounds by deleting the generated Python objects, thereby removing any (Python-side) references that NEURON still holds on the individual cells and vectors, but this does not seem to free up memory.

To illustrate, I created a very simplified version of my simulation setup: a cell class that creates a cell and sets up its recording vectors; a simulator class that handles submitting jobs to the bulletin board, creating cells from the cell class, running the individual simulations, and saving the results; and finally a run script with which I create a simulator object and run the simulation (multiple times).

If I print the existing PythonObjects using h.allobjects('PythonObject') after the first round of simulations and again after the second round, I see after the first run:
...
PythonObject[11094] with 1 refs
PythonObject[11095] with 1 refs
PythonObject[11096] with 1 refs
PythonObject[11097] with 1 refs

and after second run:
...
PythonObject[41832] with 1 refs
PythonObject[41833] with 1 refs
PythonObject[41834] with 1 refs
PythonObject[41835] with 1 refs

So the number of PythonObjects increases with each round of simulations, and if I track the memory consumption of the main Python process, it steadily grows over the two rounds. This happens even if I delete my cell objects after each simulation, and also if I delete the whole simulator object and recreate it.
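
For reference, I track memory roughly like this from the main rank (a minimal sketch; "report_memory" is just an illustrative helper, and ru_maxrss is reported in kilobytes on Linux):

Code: Select all

import resource
from neuron import h

def report_memory(tag):
    # peak resident set size of this process (kilobytes on Linux)
    rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"{tag}: max RSS {rss_kb} kB")
    # list the HOC-side PythonObject wrappers that are still alive
    h.allobjects('PythonObject')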

Is there something I am missing to clean up the "HOC side" of NEURON between rounds, or is this style of running multiple simulations not really a good idea with NEURON in general?

To execute the run script I am running:

Code: Select all

mpiexec -n 9 python run_simple.py > output.txt
Thanks a lot and have a good day
Lukas

run_simple.py

Code: Select all

from anf_simple import ANF as model
from simulator_simple import Simulator as sim
import gc
from neuron import h

my_sim = sim(model=model, nof_runs=100, neuron_range=range(100))
my_sim.setup_mpi()

my_sim.run_fiber_population()
h.allobjects('PythonObject')
del my_sim.sim_results
del my_sim
gc.collect() # deletion does not have any effect on number of PythonObjects still found with 1 reference
h.allobjects('PythonObject')
my_sim = sim(model=model, nof_runs=100, neuron_range=range(100))

my_sim.setup_mpi()

my_sim.run_fiber_population()
h.allobjects('PythonObject')

my_sim.close_mpi()
h.quit()
simulator_simple.py

Code: Select all

from neuron import h
from neuron.units import mV, ms
import gc

h.load_file("stdrun.hoc")
h.nrnmpi_init()
pc = h.ParallelContext()

def _run_single_fiber(model, run_no, fiber_no):
    print(f"Running fiber {fiber_no} in run {run_no}")
    fiber = model(run_no, fiber_no)
    
    h.finitialize(-65 * mV)
    h.fcurrent()
    # Here recalculate e.g. leakage reversal potential
    h.fcurrent()
    h.frecord_init()
    h.continuerun(100 * ms)
    
    ret_vals = {}
    ret_vals["v"] = fiber.v_vec
    ret_vals["t"] = fiber.t_vec
    
    fiber.cleanup_fiber()
    del fiber
    gc.collect()
    return ret_vals

class Simulator:
    def __init__(self, model, nof_runs, neuron_range):
        self.model = model
        self.nof_runs = nof_runs
        self.neuron_range = neuron_range
    
    def setup_mpi(self):
        pc.runworker()
    
    def close_mpi(self):
        pc.done()
    
    def run_fiber_population(self):
        self.ret_list = []
        self.sim_results = None
        
        for run_no in range(self.nof_runs):
            for fiber_no in self.neuron_range:
                pc.submit(_run_single_fiber, self.model, run_no, fiber_no)
                
        while pc.working():
            self.ret_list.append(pc.pyret())
            
        self.sim_results = self.ret_list
        print(self.sim_results)
        del self.ret_list
anf_simple.py

Code: Select all

from neuron import h
from neuron.units import um, mV
import gc

class ANF:
    def __init__(self, run_no, fiber_no):
        self.run_no = run_no
        self.fiber_no = fiber_no
        
        self._setup_morphology()
        self._setup_biophysics()
        self._setup_recording_vecs()

    def _setup_morphology(self):
        self.sections = []
        for i in range(10):
            sec = h.Section(name=f"fiber_{self.fiber_no}_sec_{i}", cell=self)
            if i % 2 == 0:  # Node
                sec.L = 2 * um
                sec.diam = 2 * um
            else:           # internode
                sec.L = 100 * um
                sec.diam = 2 * um
            self.sections.append(sec)
            
    def _setup_biophysics(self):
        for i, sec in enumerate(self.sections):
            if i % 2 == 0:  # Node
                sec.insert("hh")
                sec.gnabar_hh = 0.12 * 10
                sec.gkbar_hh = 0.036 * 10
                sec.gl_hh = 0.0003 * 10
                sec.Ra = 50
                sec.cm = 1.0
            else:           # internode
                sec.insert("pas")
                sec.e_pas = -65 * mV
                sec.g_pas = 1e-3  # S/cm2 (i.e. 1 mS/cm2)
                sec.cm = 1.0 / 30
    
    def _setup_recording_vecs(self):
        self.v_vec = []
        self.t_vec = []
        self.t_vec.append(h.Vector())
        self.t_vec[-1].record(h._ref_t)
        for sec in self.sections:
            self.v_vec.append(h.Vector())
            self.v_vec[-1].record(sec(0.5)._ref_v)
            
    def cleanup_fiber(self):
        # drop the python-side references to the recording Vectors and Sections
        del self.v_vec
        del self.t_vec
        del self.sections
        gc.collect()
hines
Site Admin

Re: Parallelized simulations result in PythonObject accumulation when using the bulletin-board-style parallelization

Post by hines »

Running your model on my desktop with 8.2.7, I do see the steady increase in memory usage. I will try to diagnose the cause and get back when I know more.
hines
Site Admin
Posts: 1713
Joined: Wed May 18, 2005 3:32 pm

Re: Parallelized simulations result in PythonObject accumulation when using the bulletin-board-style parallelization

Post by hines »

Although my diagnosis of the memory leak is incomplete, I can give you
a temporary work-around so that you can continue your simulations
without running out of memory and continue to use your existing NEURON
version. Meanwhile, I'll work on repairing the internal reference counting
bug.

The problem is that the python object returned by

Code: Select all

pc.pyret()
cannot be freed: it is referenced by a HOC-world PythonObject wrapper
whose own refcount never goes to 0, so the PythonObject is never freed.
Note that when a PythonObject is freed, it reduces the reference count of
its wrapped Python object by 1.

The work-around, until a proper bug fix arrives, is to manually reduce
the refcount of the offending instance of the PythonObject wrapper.

In simulator_simple.py, change

Code: Select all

-            self.ret_list.append(pc.pyret())
+            self.ret_list.append(h.my_pyret(pc))
and copy the following fragment before ``def _run_single_fiber``
in simulator_simple.py:

Code: Select all

h(r'''
obfunc my_pyret() {localobj ho
    ho = $o1.pyret()
    // printf("my_pyret %s %d\n", ho, object_id(ho, 1))
    unref(ho)
    return ho
}
''')
Note that my_pyret() is a HOC function executing in the HOC world,
so its call to pyret() returns the offending HOC PythonObject wrapper
(wrapping the dict of Vector lists), which needs its refcount reduced by 1.
(It is hard to get at the correct PythonObject from the Python world;
in Python all we ever see is the wrapped Python object.)
The ``pc`` argument to ``my_pyret`` ensures that ``pyret`` uses the same ParallelContext instance that submitted the jobs.

The last fragment needed is an implementation of ``unref(ho)``, which
can be supplied as a mod file (so don't forget to run nrnivmodl).

Code: Select all

$ cat unref.mod
NEURON { SUFFIX nothing }

PROCEDURE unref() {
VERBATIM
  Object* ho = *(hoc_objgetarg(1));
  ho->refcount--;
  // printf("unref %s %d\n", hoc_object_name(ho), ho->refcount);
ENDVERBATIM
}
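Putting the pieces together, the collection loop in ``run_fiber_population`` then looks like this (a sketch of the patched version; it assumes unref.mod has been compiled with nrnivmodl and that the ``h(r'''...''')`` fragment above has already been executed at module level):

Code: Select all

while pc.working():
    # h.my_pyret(pc) calls pc.pyret() in the HOC world and drops the extra
    # reference on the returned PythonObject wrapper before the wrapped
    # python dict is handed back to python
    self.ret_list.append(h.my_pyret(pc))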
Several additional notes.

The underlying internal NEURON bug is that, in your simulation, some PythonObject instances are not properly reference counted; Python garbage collection does not help because the Python objects are still referenced by the existing PythonObject wrappers.
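
The situation is analogous to the following pure-Python sketch (illustrative only, no NEURON involved): as long as some other container still holds a reference, ``gc.collect()`` cannot reclaim the object.

Code: Select all

import gc

hidden_refs = []                 # stands in for the leaked PythonObject wrappers

results = {"v": [0.0] * 1000}
hidden_refs.append(results)      # the wrapper keeps an extra reference

del results                      # removes only the python-side name
gc.collect()                     # collects nothing: the dict is still reachable
print(len(hidden_refs))          # the data is still alive here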

The example code I am running submits 10000 jobs (100 runs x 100 fibers) and accumulates the results each time ``my_sim.run_fiber_population()`` is called, so memory usage naturally increases during each population run as the pyret return values are stored. I reduced my testing time by using

Code: Select all

-my_sim = sim(model=model, nof_runs=100, neuron_range=range(100))
+my_sim = sim(model=model, nof_runs=10, neuron_range=range(10))
Additionally, I printed memory usage with

Code: Select all

+print("memory ", h.nrn_mallinfo(0))
 my_sim.run_fiber_population()
+print("memory ", h.nrn_mallinfo(0))
for the two population calls. My memory and overall time results are

Code: Select all

$ time mpiexec -n 8 python run_simple.py |grep memory
memory  21240320.0
memory  57019440.0
memory  21590784.0
memory  57029184.0

real	0m2.603s
user	0m19.972s
sys	0m0.777s
hines@hines-ThinkStation-P5:~/models/lukas$ time python run_simple.py |grep memory
memory  21161056.0
memory  56898944.0
memory  21472944.0
memory  56917488.0

real	0m6.616s
user	0m7.747s
sys	0m0.171s
The 2.5-fold speedup with 8 processes is a bit disappointing, but it should improve with larger per-job models. One can see that most of the memory usage is recovered after the first population run by ``del my_sim``. There are other PythonObject instances causing minor memory leakage that I will also fix in an eventual pull request.
lukas

Re: Parallelized simulations result in PythonObject accumulation when using the bulletin-board-style parallelization

Post by lukas »

Dear Dr. Hines, thank you very much for your fast replies and your time. I implemented the workaround in both the simplified model and my full simulation, and the memory increase after every population run is now very small/negligible for my full simulations.

When I wanted to test whether the memory leak also happens in the latest version, I also ran into the issue you mentioned in the GitHub issue with NEURON 9.0: the order of arguments in pc.submit seemed to have been scrambled.

Best regards
Lukas