Checkpointing in NEURON

Post by **nicolangelo** » Sun Dec 17, 2006 11:29 pm

Dear all,

Has anyone had any success with, or experience of, checkpointing in NEURON, either on a Linux cluster or on a stand-alone machine?

I have also noticed in the current NEURON documentation that some new functions (which still seem to be under development) have recently been introduced to address checkpointing, but they have been developed exclusively for the BlueGene architecture. I have also seen related discussion in the Forum about the SaveState() function, which seems to address checkpointing at the single-CPU level, but with no information about parallel simulations.

Specifically, it would be great to hear from those of you who have succeeded at, or had some experience with, checkpointing in either a parallel environment or on a single CPU. I would greatly appreciate knowing what you did, or any ideas you may have.

Finally, I have a question that relates both to checkpointing and to the discussion of the SaveState() function.

For those who have introduced randomness into their simulations (i.e., stochastic channels or random synaptic inputs): how did you save the current state of the random number generator (RNG) during checkpointing, including the case where the RNG is a user-defined external function? Does the SaveState() function handle this situation, or does one need to implement a new function or strategy?

Thanks in advance for any ideas or potential solutions.

Nicolangelo

Post by **hines** » Wed Dec 20, 2006 6:15 pm

http://www.neuron.yale.edu/neuron/stati ... state.html
lists the information that is saved. It does not include the state of the random number generators, so that state, as well as any user-defined state, has to be saved explicitly.
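Here is a toy Python sketch that shows why the generator state has to be saved separately. It is not NEURON code; the Poisson input model and the function name are hypothetical. It checkpoints a random synaptic input train halfway through. Resuming with a re-seeded generator replays the old stream. Restoring the state captured with Python's `random.Random.getstate()` reproduces the uninterrupted run exactly.

```python
import random

def next_events(t, rng, n, rate=0.02):
    """Draw n further Poisson input times (ms) after t; rate in events/ms (toy model)."""
    times = []
    for _ in range(n):
        t += rng.expovariate(rate)
        times.append(t)
    return times

# Uninterrupted reference: 20 input events.
reference = next_events(0.0, random.Random(7), 20)

# Checkpoint after 10 events: keep the last event time and the generator state.
rng = random.Random(7)
first_half = next_events(0.0, rng, 10)
saved_t, saved_state = first_half[-1], rng.getstate()

# Resume from the model state only: the re-seeded generator replays its old stream.
resumed_without = first_half + next_events(saved_t, random.Random(7), 10)

# Resume from the model state plus the generator state.
rng2 = random.Random()
rng2.setstate(saved_state)
resumed_with = first_half + next_events(saved_t, rng2, 10)

print(resumed_with == reference)     # True
print(resumed_without == reference)  # False
```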

SaveState can be used to save the state of a parallel simulation, but only after you have returned normally from
http://www.neuron.yale.edu/neuron/stati ... tml#psolve
i.e., do not try to save at arbitrary times in the middle of a simulation using a callback such as http://www.neuron.yale.edu/neuron/stati ... html#event
because you would miss the buffered spikes that are waiting to be transferred at the end of each http://www.neuron.yale.edu/neuron/stati ... et_maxstep
interval. Clearly, I need to do another round of SaveState enhancements to bring it up to date with the new ParallelContext network simulation methods.
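The buffered-spike pitfall can be reproduced with a toy Python sketch. The classes are hypothetical and do not model ParallelContext internals. In the toy, spikes are held in a buffer and exchanged only at the end of each interval, and the snapshot records only cell state. A snapshot taken at an interval boundary resumes correctly, while one taken mid-interval silently drops the spikes still in the buffer.

```python
from dataclasses import dataclass, field

@dataclass
class ToyNet:
    """Hypothetical toy network: spikes are buffered and only exchanged
    (delivered) at the end of each exchange interval of `interval` steps."""
    interval: int
    t: int = 0
    delivered: int = 0
    buffer: list = field(default_factory=list)  # in-flight spikes, NOT saved

    def advance(self, nsteps):
        for _ in range(nsteps):
            if self.t % 3 == 0:                 # toy source: a spike every 3 steps
                self.buffer.append(self.t)
            self.t += 1
            if self.t % self.interval == 0:     # exchange at the interval boundary
                self.delivered += len(self.buffer)
                self.buffer.clear()

    def save(self):
        """Snapshot of cell state only; it knows nothing about the buffer."""
        return {"t": self.t, "delivered": self.delivered}

    @classmethod
    def restore(cls, interval, snapshot):
        return cls(interval=interval, **snapshot)


def run_with_checkpoint(split, total=40, interval=10):
    """Run `split` steps, save, restore into a fresh object, finish the run."""
    net = ToyNet(interval)
    net.advance(split)
    resumed = ToyNet.restore(interval, net.save())
    resumed.advance(total - split)
    return resumed.delivered


reference = ToyNet(10)
reference.advance(40)

print(reference.delivered)            # 14
print(run_with_checkpoint(split=20))  # 14: saved at an exchange boundary
print(run_with_checkpoint(split=25))  # 12: two buffered spikes were lost
```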

With regard to the state of the random generator, I highly recommend using
http://www.neuron.yale.edu/neuron/stati ... #MCellRan4
This has the interesting property of being restartable if you know the global lowindex value from http://www.neuron.yale.edu/neuron/stati ... _ran4_init
and the high index value: you can restart the Random instance with Random.MCellRan4(highindex). The problem, of course, is knowing the index, and here again I clearly need to add a method that returns its present value. Until I make that change, take a look at nrn/src/ivoc/ivocrand.cpp, specifically the MCellRan4 class:

```cpp
class MCellRan4 : public RNG {
...
        unsigned int idum_;
...
}
```

where idum_ is incremented on every pick of a random number. If you keep setting idum_ to the same value, the next pick will always be the same.
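The restart property follows from each pick being a pure function of the fixed low index and the incrementing counter. Below is a minimal Python sketch of a counter-based generator in that spirit. The SplitMix64-style mixing and the class name `CounterRNG` are hypothetical illustrations, not MCellRan4's actual algorithm or NEURON's API. Restarting only requires constructing a new instance with the saved counter as the high index.

```python
MASK64 = (1 << 64) - 1

def mix64(x):
    """SplitMix64-style finalizer: a toy stand-in for ran4's hashing."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)

class CounterRNG:
    """Hypothetical counter-based generator: a fixed (global) lowindex plus a
    counter idum that starts at highindex and is incremented on every pick."""

    def __init__(self, highindex, lowindex=0):
        self.lowindex = lowindex
        self.idum = highindex

    def pick(self):
        """Uniform value in [0, 1) determined only by (lowindex, idum)."""
        u = (mix64(mix64(self.lowindex) ^ self.idum) >> 11) / float(1 << 53)
        self.idum += 1
        return u

stream = CounterRNG(highindex=1, lowindex=3)
for _ in range(100):
    stream.pick()
saved = stream.idum                  # 101: the high index to resume from
expected = [stream.pick() for _ in range(5)]

# e.g. in a later launch: restart from the saved high index
restarted = CounterRNG(highindex=saved, lowindex=3)
resumed = [restarted.pick() for _ in range(5)]
print(resumed == expected)           # True
```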

Post by **hines** » Sat Dec 23, 2006 10:09 am

I was wrong about SaveState being usable in a parallel simulation. It also failed to save ARTIFICIAL_CELL information once artificial cells were no longer located in sections. Anyway, I spent the past few days bringing it up to date, at least to the point where my 1/10-size version of the Traub model can be saved and then restored on subsequent launches. I also modified the SaveState.fread and fwrite methods so that you can request that they leave the file open on return, which lets you add user-specific information to the file. Note that the Random.MCellRan4 generator now has a Random.seq method that allows its internal state to be saved and restored. The bottom line remains that to have confidence in SaveState, one absolutely must test it with something like:

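```hoc
stdinit()
if (0) {        // 1: run to tstop/2 and save; 0: restore the saved state
        pnm.psolve(tstop/2)
        savestate()
}else{
        restorestate()
}
pnm.psolve(tstop)   // then compare the result with an uninterrupted run
```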
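The same save-then-restore test can be prototyped as a small Python harness. This is a sketch under stated assumptions:

- a toy noisy integrator stands in for `pnm.psolve`;
- `pickle` and `random.Random.getstate()` stand in for SaveState and the Random objects;
- `launch` and `advance` are hypothetical names.

Either launch must reproduce an uninterrupted run exactly.

```python
import os
import pickle
import random
import tempfile

def advance(v, rng, nsteps, dt=0.1, tau=10.0, sigma=0.5):
    """Toy noisy leaky integrator (hypothetical stand-in for pnm.psolve)."""
    for _ in range(nsteps):
        v += dt * (-v / tau) + sigma * rng.gauss(0.0, 1.0)
    return v

def launch(save_mode, path, nsteps=2000, seed=1):
    """One 'launch': save_mode=True runs to the midpoint and saves to path,
    save_mode=False restores from path; both then continue to the end."""
    rng = random.Random(seed)
    v = 0.0
    if save_mode:
        v = advance(v, rng, nsteps // 2)
        with open(path, "wb") as f:
            pickle.dump({"v": v, "rng": rng.getstate()}, f)
    else:
        with open(path, "rb") as f:
            saved = pickle.load(f)
        v = saved["v"]
        rng.setstate(saved["rng"])
    return advance(v, rng, nsteps - nsteps // 2)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "svst.0000")
    first = launch(True, path)    # like if (1): run to the midpoint, save, continue
    second = launch(False, path)  # like if (0): restore, continue
ref = advance(0.0, random.Random(1), 2000)

print(first == ref, second == ref)  # True True
```

Exact equality is a fair criterion here because every path performs the same floating-point operations in the same order.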

I have had good success with the following procedures:

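```hoc
// Save the SaveState data plus the sequence position of every Random
// instance to a per-rank file named svst.NNNN (NNNN = pc.id).
proc savestate() {local i  localobj s, ss, f, rl
        s = new String()
        sprint(s.s, "svst.%04d", pc.id)
        f = new File(s.s)
        ss = new SaveState()
        ss.save()
        ss.fwrite(f, 0)   // 0: leave the file open so user data can be appended

        // append the number of Random instances, then each one's position
        rl = new List("Random")
        f.printf("Random %d\n", rl.count)
        for i=0, rl.count-1 {
                f.printf("%d\n", rl.object(i).seq())
        }
        f.close
}

// Initialize, read this rank's SaveState data and Random positions,
// then apply the saved state.
proc restorestate() {local i  localobj s, ss, f, rl
        stdinit()
        s = new String()
        sprint(s.s, "svst.%04d", pc.id)
        f = new File(s.s)
        ss = new SaveState()
        ss.fread(f, 0)    // 0: leave the file open to read the user data
        rl = new List("Random")
        // the first number after the "Random" label is the saved count
        if (f.scanvar() != rl.count) {
                execerror("Random count unexpected", "")
        }
        for i=0, rl.count-1 {
                rl.object(i).seq(f.scanvar())
        }
        f.close
        ss.restore()
}
```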
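The user-specific part of this file format is easy to prototype outside NEURON. The following Python sketch mirrors the procedures above under some assumptions:

- The function names are hypothetical.
- The SaveState data that ss.fwrite(f, 0) writes first is omitted, so the file holds only the appended Random section.

It shows the per-rank file name, the count header, and a count check on reading that plays the role of the execerror call.

```python
import os
import tempfile

def state_filename(rank):
    """Per-rank file name, as in sprint(s.s, "svst.%04d", pc.id)."""
    return "svst.%04d" % rank

def append_random_section(f, seqs):
    """Write a 'Random N' header, then one stream position per line."""
    f.write("Random %d\n" % len(seqs))
    for s in seqs:
        f.write("%d\n" % s)

def read_random_section(f, expected_count):
    """Read the section back; refuse a count that does not match the model."""
    label, count = f.readline().split()
    if label != "Random" or int(count) != expected_count:
        raise ValueError("Random count unexpected")
    return [int(f.readline()) for _ in range(expected_count)]

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, state_filename(3))
    with open(path, "w") as f:
        append_random_section(f, [0, 1234, 4294967295])
    with open(path) as f:
        restored = read_random_section(f, 3)

print(state_filename(3), restored)  # svst.0003 [0, 1234, 4294967295]
```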

Post by **nicolangelo** » Fri Feb 02, 2007 1:32 am

Dear Michael Hines,

Apologies for the long delay in replying. Anyway, thanks for all the information; it should be helpful. I will need to find the time to try it all.

Thanks again for all the help.

If I get bogged down again, I will drop another line.

best regards & thanks again

Nicolangelo