Writing output files from parallel simulations

Post by ted

Often the easiest way to save output from parallel simulations is to use ordinary print or printf statements to dump it to stdout, as in this hoc example:

objref pc
pc = new ParallelContext()
// . . . many statements and procedures later . . .
proc spikeout() { local i, rank
  pc.barrier()  // wait for all hosts to get to this point
  if (pc.id==0) printf("\ntime\t cell\n")  // print header once
  for rank=0, pc.nhost-1 {  // host 0 first, then 1, 2, etc.
    if (rank==pc.id) {
      // the elements of the tvec and idvec Vectors
      // are the time of each spike
      // and the gid of the cell that generated it, respectively
      for i=0, tvec.size-1 {
        printf("%g\t %d\n", tvec.x[i], idvec.x[i])
      }
    }
    pc.barrier()  // wait for all hosts to get to this point
  }
}
spikeout()

Notice how the ParallelContext class's barrier method, combined with a loop over host (processor) IDs, "serializes" the reporting of spike times: each host, one at a time, reports the spikes generated by the cells whose gids have been assigned to it. Program output can then be redirected to a file, which produces a plain ASCII file.
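
For readers who want to experiment with the serialization pattern without an MPI installation, here is a minimal sketch in plain Python. It is not NEURON code: threads stand in for hosts, threading.Barrier stands in for pc.barrier(), a list stands in for stdout, and the function name serialized_report and the spike data are made up for illustration.

```python
import threading

def serialized_report(spikes_by_host):
    """Return the lines that the simulated hosts 'print', in rank order.

    spikes_by_host[i] is a list of (time, gid) pairs owned by host i
    (made-up data that stands in for tvec and idvec).
    """
    nhost = len(spikes_by_host)
    barrier = threading.Barrier(nhost)  # plays the role of pc.barrier()
    lines = []                          # plays the role of stdout

    def host(my_id):
        barrier.wait()                  # wait for all hosts to get here
        if my_id == 0:
            lines.append("time\t cell")  # header, written once
        for rank in range(nhost):       # host 0 first, then 1, 2, etc.
            if rank == my_id:
                for t, gid in spikes_by_host[my_id]:
                    lines.append(f"{t:g}\t {gid}")
            barrier.wait()              # nobody moves on until this host is done

    threads = [threading.Thread(target=host, args=(i,)) for i in range(nhost)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return lines

# Three simulated hosts, each owning a few hypothetical spikes.
report = serialized_report([[(1.5, 0), (7.25, 2)], [(3.0, 1)], [(0.5, 5)]])
print("\n".join(report))
```

The barrier at the end of each loop iteration is what keeps host 1 from writing before host 0 has finished, so the lines always come out grouped by host regardless of how the threads are scheduled.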

However, sometimes it is desirable to explicitly write results to a file, to save results to two or more files, or to save results in binary format. This raises questions such as:
* does each host have its own file system, or do all hosts share the same file system?
* if the file system is shared, and the aim is to write all results to the same file, how can this be done without each host overwriting what was written by another host?

To answer these questions, I ran some tests on my own desktop PC under Linux, and on the Neuroscience Gateway Portal (NSG, http://www.nsgportal.org/). Here's what I found out.

First: is the file system shared?

On my PC the answer had to be yes, but I wasn't sure about the NSG. To find out, I wrote and executed ftest1.hoc:

objref pc
pc = new ParallelContext()
strdef nom
objref fil
{
if (pc.id==0) print "ftest1.hoc--generates nhost output files"
sprint(nom,"f%d.dat",pc.id)
printf("I am %d of %d, nom is %s\n", pc.id, pc.nhost, nom)
fil = new File(nom)
fil.wopen()
fil.printf("I am %d of %d, nom is %s\n", pc.id, pc.nhost, nom)
fil.close()
}
{pc.runworker()}
{pc.done()}
quit()

As expected, executing this command line on my PC
mpiexec -n 2 nrniv -mpi ftest1.hoc
produced files called f0.dat and f1.dat which contained
I am 0 of 2, nom is f0.dat
and
I am 1 of 2, nom is f1.dat
respectively. So on my PC, each host saw the same file system. No surprise, but it's a nice sanity check.

Executing ftest1.hoc on the NSG with two cores, each on a different node, produced the same result. This means that each host saw the same file system, even if the host was on a different node. So the NSG has a shared file system too.

And if that's the case, it should be possible to produce a program in which file output generated by one host is overwritten by file output generated by another. To this end I wrote ftest2.hoc:

objref pc
pc = new ParallelContext()
objref fil
{
if (pc.id==0) {
  print "ftest2.hoc--on a shared filesystem machine"
  print "generates one output file that is overwritten by each host"
}
fil = new File("ofil.dat")
fil.wopen()
printf("I am %d of %d\n", pc.id, pc.nhost)
fil.printf("I am %d of %d\n", pc.id, pc.nhost)
fil.close()
}
{pc.runworker()}
{pc.done()}
quit()

Executing this command line on my PC
mpiexec -n 4 nrniv -mpi ftest2.hoc
produced a single file called ofil.dat which contained
I am 1 of 4
which is what would happen if every host were writing to the same file on a shared file system, each one overwriting whatever already existed on disk, and host 1 happened to be the last to write. I tried this a few more times, and occasionally a different host was the last one, but none of these runs produced an ofil.dat that contained more than one line of text. A run on NSG with 4 cores on 1 node produced similar results.
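
The overwriting seen here is ordinary write-mode behavior, and it can be reproduced in plain Python without NEURON or MPI. The sketch below is only an illustration under simplifying assumptions: the "hosts" take turns in a fixed order instead of racing, the function name write_mode_demo is made up, and Python's "w" mode is assumed to behave like the truncating open observed in the test above.

```python
import os
import tempfile

def write_mode_demo(nhost, path):
    """Let each simulated host open the same file in write mode, in turn."""
    for host_id in range(nhost):
        # "w" truncates the file, so whatever an earlier host wrote is lost
        with open(path, "w") as f:
            f.write(f"I am {host_id} of {nhost}\n")
    with open(path) as f:
        return f.read().splitlines()

# Work in a scratch directory so no real ofil.dat is touched.
with tempfile.TemporaryDirectory() as tmp:
    survivors = write_mode_demo(4, os.path.join(tmp, "ofil.dat"))
print(survivors)  # only the line from the last writer remains
```

Because the loop runs the hosts in order, the last writer here is always host 3. In a real parallel run the last writer is whichever host happens to finish last, which is why the surviving line varied between runs.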

So that confirmed the conjecture that the NSG's file system is shared, but it raised a new, important question: how can each host's file output be kept from interfering with the output from the other hosts? In particular:

How to make all hosts write nondestructively to the same output file?

The trick is to make each host append its output to the same file. "But what if there is already a file with the same name that contains results of a previous simulation?" Of course, one must first test for the existence of such a file, and delete it if found. And that's what ftest3.hoc does:

objref pc
pc = new ParallelContext()
objref fil
{
  fil = new File("ofil.dat")
  if (pc.id==0) {
    print "ftest3.hoc--on a shared filesystem machine"
    print "generates one output file to which each host appends data"
  }
  if (pc.id==0) { // test for existence of output file, delete if found
    if (fil.ropen()==1) fil.unlink()
  }
  pc.barrier() // wait for all hosts to get to this point
  for rank=0,pc.nhost-1 { // host 0 first, then 1, 2, etc.
    if (rank==pc.id) {
      printf("I am %d of %d\n", pc.id, pc.nhost)
      fil.aopen()
      fil.printf("I am %d of %d\n", pc.id, pc.nhost)
      fil.close()
    }
    pc.barrier() // wait for all hosts to get to this point
  }
}
{pc.runworker()}
{pc.done()}
quit()

Executed on my PC with this command line
mpiexec -n 4 nrniv -mpi ftest3.hoc
it produced an ofil.dat that contained these lines
I am 0 of 4
I am 1 of 4
I am 2 of 4
I am 3 of 4

which is exactly as expected. And any ofil.dat that already exists is deleted before a new ofil.dat is generated. I repeated this test on NSG using a total of 4 cores (2 cores on each of 2 nodes), and got the same result.
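
The same delete-then-append strategy can be tried out in plain Python, as a rough sketch and not a translation of ftest3.hoc. The assumptions: threads stand in for hosts, threading.Barrier stands in for pc.barrier(), Python's "a" mode stands in for File.aopen(), and the function name append_in_rank_order and the stale file contents are hypothetical.

```python
import os
import tempfile
import threading

def append_in_rank_order(nhost, path):
    """Simulated hosts append to one shared file, one host at a time."""
    barrier = threading.Barrier(nhost)  # plays the role of pc.barrier()

    def host(my_id):
        if my_id == 0 and os.path.exists(path):
            os.remove(path)             # discard results of a previous run
        barrier.wait()                  # nobody writes until cleanup is done
        for rank in range(nhost):       # host 0 first, then 1, 2, etc.
            if rank == my_id:
                with open(path, "a") as f:  # append, never truncate
                    f.write(f"I am {my_id} of {nhost}\n")
            barrier.wait()              # wait until this host has closed the file

    threads = [threading.Thread(target=host, args=(i,)) for i in range(nhost)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    with open(path) as f:
        return f.read().splitlines()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ofil.dat")
    with open(path, "w") as f:
        f.write("stale results from an earlier run\n")  # pre-existing file
    result = append_in_rank_order(4, path)
print("\n".join(result))
```

Two details carry the design: the barrier right after the cleanup step keeps any host from appending to a file that host 0 is about to delete, and the barrier inside the loop ensures that one host has closed the file before the next one opens it.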