Regarding saving data from parallel simulations

Post by **JBall** » Tue Jun 15, 2010 9:15 pm

I have what is probably a relatively simple problem, but I haven't been able to solve it on my own.

I finally have a parallel simulation running on our university's supercluster that is based on the example "ring network" code in Hines & Carnevale 2008 in J. Neurosci Methods. I've inserted my own cell type and I'm satisfied that the cells are doing what they should based on serial simulations.

However, when I want to write the voltage data to a file, I'm having significant problems. As a test run, I'm using 20 cells on 20 nodes. Here is the code I'm trying to run to save the data:

Code:

objref voltage[NCELL],time
time = new Vector()

//In the for loop of the mkcells procedure, I'm executing these lines ("i" is the number of the cell just created):
                voltage[i]=new Vector()
                voltage[i].record(&cell.soma.v(0.5))
                if(pc.id==0) {time.record(&t)}

//then once the simulation has run, I'm trying to save the data to a file in the following general way:

objref fobj
fobj = new File("data")

fobj.wopen()
if(pc.id==0) {time.printf(fobj)}

for (i=0; i < NCELL; i += 1) {
    if(pc.id=i){
        voltage[i].printf(fobj)
        pc.barrier()
    }
}

fobj.close()

"pc.barrier()" is there in the vain hope that only one cell's output will be written to the file at a time, but I'm not terribly comfortable with all of this just yet, and it hasn't worked. I only get two sets of data in the file: usually the time vector and one voltage vector, but sometimes two voltage vectors.

Any help is of course very much appreciated.

Post by **ted** » Wed Jun 16, 2010 10:44 am

If you want just one output file:
* the file should be created, opened, and closed only by the host with pc.id == 0
* each host, in turn, should append its results to that file

If you're ok with multiple output files, each host must create a file that has a unique name (you can do this with sprint), and then write to that file.
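
A minimal sketch of the multiple-file approach, assuming `pc`, `NCELL`, and `voltage[]` exist as in the first post and that each cell's gid was registered as its cell index (as in the ring network example). The file name pattern `data_%d.dat` and the names `fname` and `fobj` are hypothetical choices for illustration.

```hoc
// One output file per host, named after the host's rank.
// Assumptions: pc, NCELL and voltage[] already exist, and
// gid == cell index was registered with pc.set_gid2node.
strdef fname                          // hypothetical buffer for the file name
objref fobj

sprint(fname, "data_%d.dat", pc.id)   // data_0.dat, data_1.dat, ...
fobj = new File(fname)
fobj.wopen()
for i = 0, NCELL-1 {
    if (pc.gid_exists(i)) {           // only cells that live on this host
        fobj.printf("# cell %d\n", i) // label each vector so files can be merged later
        voltage[i].printf(fobj)
    }
}
fobj.close()
```

Because no two hosts share a file name, no barrier or turn-taking is needed here; the files can be concatenated after the run.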

With a network of nontrivial size it's not generally useful to record and save continuous variables from each cell to a file--storage requirements are enormous, and what is anyone going to do with all that stuff anyway? Most often one is interested in spike times (much more compact). In those cases where continuous variables are of interest, it's only the activity of a small subset of cells, and that can be reconstructed by using PatternStim to play back the recorded spike times into the relevant subnet (go to http://www.neuron.yale.edu and use the search tool to look for PatternStim).

Post by **JBall** » Wed Jun 16, 2010 5:06 pm

I agree that having the voltage output at every time step of each cell during a large network simulation is not necessarily useful; I'm doing this mostly as a test to teach myself how to implement parallel neuron code. My hope is that what I learn by doing this will carry over into my actual work with the software.

Creating a data file on each node works just fine, but I'm having trouble exporting all of the data to one file. It seems that my main problem is understanding how to send the data from the vector on each node to one file. My most recent attempt is like so:

Code:

 //In the mkcell procedure:
                voltage[i]=new Vector()
                time[i]=new Vector()
                cvode.record(&cell.soma.v(0.5),voltage[i],time[i])

//After the simulation has run:
if(pc.id==0){
    fobj = new File("data")
    fobj.wopen()
}

if(pc.id==0){
    for (i=0; i < NCELL;i += 1){
        time[i].printf(fobj)
        voltage[i].printf(fobj)
        pc.barrier()
    }
}

if(pc.id==0){
    fobj.close()
}

I know the three "if" statements are superfluous, but this is code in progress and I haven't cleaned it up from previous iterations. As far as I can tell, my problem is summarized thus: My data file is created on and accessed from the main node, yet the voltage and time vectors exist on different nodes. So, the nodes can't access the file on the main node, and the main node can't access the data vectors to print them to the file. Again, I'm going to stress that I likely have a fundamental gap in my knowledge of how objects are created and accessed during parallel computing. An option I can think of is to create the file on the main node, write the data from the main node, then close the file, and do a "fobj.aopen()" on each remaining node in series until all of the data is written. However, you said that only the main node should open and close the file, so I have a feeling I'm missing something.

Again, thanks for the assistance.

Post by **hines** » Thu Jun 17, 2010 10:46 am

JBall wrote: "I'm using 20 cells on 20 nodes."

It is always a bad idea to conflate cells and processors. What you learn isn't generalizable and
the idiom:

Code:

        for (i=0; i < NCELL; i += 1) {
            if(pc.id=i){

is just accidentally correct, since it assumes that cell i is on processor i.
Anyway, the bug is at

Code:

fobj.wopen()

since you are simultaneously opening the same file and writing to it from 20 processes.
It is likely that only the last process to close the file will actually get its information saved.
Your barrier in this case only ensures that everyone is racing to close the file
at the same time, after each has written its own version of it.
The proper idiom for serialization in this case is for processor 0 to create the file, write, and close it; then
all the others, in turn, open it for appending, write, and close it. I.e. (a slight variation):

if (pc.id == 0) {
    fobj.wopen()
    fobj.close()
}
pc.barrier()
for i=0, pc.nhost-1 {
    if (pc.id == i) {
        fobj.aopen()
        // ...write to file...
        fobj.close()
    }
    pc.barrier()
}
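
For illustration, the idiom above could be filled in roughly as follows. This is only a sketch: it assumes `pc`, `NCELL`, and `voltage[]` exist as in the earlier posts, that each cell's gid was registered as its cell index, and that all hosts see the same file system. The loop variable `rank` and the `# gid` label lines are hypothetical additions.

```hoc
// Every host appends its own recordings to one shared file, one host at a time.
objref fobj
fobj = new File("data")          // File object on every host, same path

if (pc.id == 0) {                // host 0 creates (or truncates) the file
    fobj.wopen()
    fobj.close()
}
pc.barrier()                     // nobody appends before the file exists

for rank = 0, pc.nhost-1 {       // loop over hosts, not over cells
    if (pc.id == rank) {         // this host's turn
        fobj.aopen()
        for i = 0, NCELL-1 {
            if (pc.gid_exists(i)) {          // cell i lives on this host
                fobj.printf("# gid %d\n", i)
                voltage[i].printf(fobj)
            }
        }
        fobj.close()
    }
    pc.barrier()                 // called by every host on every pass
}
```

Note that `pc.barrier()` sits outside the `if`, so every host reaches it on every pass of the loop; a barrier called by only some hosts would leave the others waiting.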

Post by **JBall** » Thu Jun 17, 2010 12:32 pm

I love being accidentally correct! In seriousness, though, if in this case I want to ensure that a processor is performing an operation only if the cell exists on the node, assuming I don't have one cell per node, would a better option be:

Code:

for(i=0;i<NCELL; i += 1){
       if(pc.gid_exists(i)) { do these things } 
}

assuming that each cell's gid is set to its cell index when they are first made?

And another quick question, just to clarify: You say that my use of "fobj.wopen()" will open the file 20 times in this case, but in the second snippet of code I posted, I wrapped this command in an "if(pc.id==0)" condition, which to my understanding means that the code is only executed on the main node. You also show this in your code. This is probably an asinine question, but I want my understanding of how the code is executed to be clear.

Again, thanks. My dissertation certainly appreciates it.

Post by **hines** » Thu Jun 17, 2010 1:46 pm

Code:

for(i=0;i<NCELL; i += 1){
       if(pc.gid_exists(i)) { do these things }
}

is fine. Typically cells are kept in a list, which saves space over
objref cell[NCELL], where each processor uses only
NCELL/pc.nhost of the entries. One can retrieve the cell, knowing the gid, with
pc.gid2cell(gid)
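
As a rough sketch of that list-based bookkeeping, the recording setup might look like the following. It assumes `pc` and `NCELL` exist, that the cells have already been created and registered with `pc.set_gid2node` and `pc.cell`, and that the cell template exposes a `soma` section as in the posts above. The names `vrecs`, `gids`, `cellobj`, and `vrec` are hypothetical.

```hoc
// Record soma voltage only for the cells that live on this host,
// keeping the vectors in a List instead of an NCELL-sized objref array.
objref vrecs, gids, cellobj, vrec
vrecs = new List()               // one recording Vector per local cell
gids = new Vector()              // gids.x[k] is the gid of vrecs.object(k)

for gid = 0, NCELL-1 {
    if (pc.gid_exists(gid)) {
        cellobj = pc.gid2cell(gid)            // look the cell up by gid
        vrec = new Vector()
        vrec.record(&cellobj.soma.v(0.5))
        vrecs.append(vrec)                    // the List keeps the Vector alive
        gids.append(gid)
    }
}
```

Each host then holds about NCELL/pc.nhost vectors, and the `gids` vector records which cell each one belongs to when the data are written out.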

With regard to your second snippet, you do open a new file on processor 0, but no file is open (for writing or appending)
on any of the other processors.
