Gap junctions with ParallelContext


Post by mlazarew »

I am trying to understand how to use the ParallelContext class with a network of neurons connected by gap junctions across nodes.

I have tried to run a simple program (below) on two nodes. It should transfer the value 99 of the variable "a" from node 1 to node 0.

If my program is correct, I should see the value 99 assigned to the variable "a" on both nodes 0 and 1. But after running the program, "a" has the value 99 only on node 1, where it was set, and the value 1 on node 0, where it should have been transferred. That means the transfer across nodes did not happen.

Most probably I am not using these methods properly.

Does anybody know how to use them correctly?

Code:

load_file("nrngui.hoc")
objref pc

proc set_maxstep() { pc.set_maxstep(5) }
proc doinit()      { stdinit() }
proc psolve()      { pc.psolve($1) print_a() }
proc print_a()     { print "My id: ", pc.id, " a= ", a, "time= ", t }
proc tr()          { pc.setup_transfer() }
proc set_a()       { if (pc.id == 1) a = 99 }

cvode.active(0)

pc    = new ParallelContext()
tstop = 100
a     = 1

if (pc.id == 0) {
    pc.target_var(&a, 0)
    pc.setup_transfer()
}

if (pc.id == 1) {
    pc.source_var(&a, 0)
    pc.setup_transfer()
}
    
pc.runworker()

pc.context("set_maxstep()\n")
set_maxstep()

pc.context("tr()\n")
tr()

pc.context("doinit()\n")
doinit()

pc.context("set_a()\n")

pc.context("psolve", tstop)
psolve(tstop)

pc.done()
quit()

numprocs=2
NEURON -- VERSION 6.0.870 (1745) 2007-05-16 (1745M)
by John W. Moore, Michael Hines, and Ted Carnevale
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2007

1
1
0
1
1
1
1
1
My id: 1 a= 99 time= 99.975
My id: 0 a= 1 time= 99.975
0
hines
Site Admin
Posts: 1691
Joined: Wed May 18, 2005 3:32 pm

Post by hines »

Does the following work?

Code:

{load_file("nrngui.hoc")}
objref pc
proc set_maxstep() { pc.set_maxstep(5) }
proc doinit()      { stdinit() }
proc psolve()      { pc.psolve($1) print_a() }
proc print_a()     { printf("My id: %d a=%g time=%g\n", pc.id, a, t) }
proc tr()          { pc.setup_transfer() }
proc set_a()       { if (pc.id == 1) a = 99 }

{cvode.active(0)}

pc    = new ParallelContext()
tstop = 100
a     = 1

if (pc.id == 0) {
    pc.target_var(&a, 0)
}

if (pc.id == 1) {
    pc.source_var(&a, 0)
}

set_maxstep()
tr()
doinit()
set_a()
print_a()
psolve(tstop)

{pc.runworker()}
{pc.done()}
quit()

On my machine it prints:

Code:

[hines@localhost Parallel]$ mpiexec -np 2 /home/hines/neuron/nrnmpi/x86_64/bin/nrniv -mpi maciej.hoc
numprocs=2
NEURON -- VERSION 6.0.870 (1745) 2007-05-16
by John W. Moore, Michael Hines, and Ted Carnevale
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2007

My id: 0 a=1 time=0
My id: 1 a=99 time=0
My id: 0 a=99 time=99.975
My id: 1 a=99 time=99.975
[hines@localhost Parallel]$

Also, my machine hung on your original code in one of the pc.context calls. I'll look further into that. Notice how I enclose each of the function calls that would print 1 or 0 within {}. I also replaced the print statement in print_a with a printf, which prints the line atomically and prevents output from different pc.id values from being mixed.
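
To make the expected result concrete, here is a toy model in plain Python (not NEURON code) of what the output above shows: a source and a target registered under the same id on different ranks, with the target picking up the source's value once the simulation advances. The class ToyTransfer and its methods are hypothetical names chosen to mirror the hoc calls, and each "rank" is just a dictionary in one process. How NEURON actually moves the data between processes is not modeled.

```python
class ToyTransfer:
    """Toy, single-process stand-in for the source_var/target_var pairing.

    Hypothetical illustration only: each "rank" is a dict of variables.
    """

    def __init__(self, nhost):
        self.ranks = [dict() for _ in range(nhost)]
        self.sources = {}    # id -> (rank, variable name)
        self.targets = []    # (rank, variable name, id)
        self.ready = False

    def source_var(self, rank, name, sid):
        self.sources[sid] = (rank, name)

    def target_var(self, rank, name, sid):
        self.targets.append((rank, name, sid))

    def setup_transfer(self):
        # Every target needs a source registered under the same id.
        for _rank, _name, sid in self.targets:
            if sid not in self.sources:
                raise KeyError("no source registered for id %d" % sid)
        self.ready = True

    def step(self):
        # Assumption of this toy: values are copied when the simulation advances.
        if not self.ready:
            raise RuntimeError("call setup_transfer() first")
        values = {sid: self.ranks[r][n] for sid, (r, n) in self.sources.items()}
        for rank, name, sid in self.targets:
            self.ranks[rank][name] = values[sid]


def demo():
    net = ToyTransfer(nhost=2)
    for variables in net.ranks:
        variables["a"] = 1           # a = 1 on every rank
    net.target_var(0, "a", 0)        # rank 0 receives under id 0
    net.source_var(1, "a", 0)        # rank 1 sends under id 0
    net.setup_transfer()
    net.ranks[1]["a"] = 99           # like set_a(): only rank 1 changes a
    before = [variables["a"] for variables in net.ranks]
    net.step()
    after = [variables["a"] for variables in net.ranks]
    return before, after
```

Calling demo() gives the values of "a" on ranks 0 and 1 before and after one step, which corresponds to the time=0 and time=99.975 lines in the output.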

Post by mlazarew »

On my machine I get the following error:

Code:

mpiexec -n 2 /Volumes/MTL/neuron/nrn/powerpc/bin/nrniv -mpi  /Volumes/MTL/CA1/maciej.hoc 

numprocs=2

NEURON -- VERSION 6.0.870 (1745) 2007-05-16 (1745M)
by John W. Moore, Michael Hines, and Ted Carnevale
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2007

Additional mechanisms from files
 Exp2SynNMDA.mod caolmw.mod capr.mod gap.mod icaolmw.mod icappr.mod iholmw.mod imppr.mod kahpppr.mod kcaolmw.mod kcppr.mod kdrbwb.mod kdrmsw.mod kdrppr.mod koppr.mod ksmsw.mod nafbwb.mod nafmsw.mod nafppr.mod naoppr.mod
1 /Volumes/MTL/neuron/nrn/powerpc/bin/nrniv: multiple instances of source gid: 0
1  in /Volumes/MTL/CA1/test_hines.hoc near line 25
1  tr()
     ^
        1 ParallelContext[0].setup_transfer()
      1 tr()
dyld: lazy symbol binding failed: Symbol not found: _MPI_Initialized
  Referenced from: /Volumes/MTL/neuron/nrn/powerpc/lib/libnrnmpi.0.dylib
  Expected in: flat namespace

dyld: Symbol not found: _MPI_Initialized
  Referenced from: /Volumes/MTL/neuron/nrn/powerpc/lib/libnrnmpi.0.dylib
  Expected in: flat namespace

rank 1 in job 46  d221mac4_87651   caused collective abort of all ranks
  exit status of rank 1: killed by signal 5

But after I modified the code to

Code:

{load_file("nrngui.hoc")}
objref pc
proc set_maxstep() { pc.set_maxstep(5) }
proc doinit()      { stdinit() }
proc psolve()      { pc.psolve($1) print_a() }
proc print_a()     { printf("My id: %d a=%g time=%g\n", pc.id, a, t) }
proc tr()          { pc.setup_transfer() }
proc set_a()       { if (pc.id == 1) a = 99 }

{cvode.active(0)}

pc    = new ParallelContext()
tstop = 100
a     = 1

if (pc.id == 0) {
    pc.target_var(&a, 0)
    tr()
}

if (pc.id == 1) {
    pc.source_var(&a, 0)
    tr()
}

set_maxstep()
{printf("ID: %d\n", pc.id)}
//tr()
doinit()
set_a()
print_a()
psolve(tstop)

{pc.runworker()}
{pc.done()}
quit() 

then I am getting:

Code:

numprocs=2
NEURON -- VERSION 6.0.870 (1745) 2007-05-16 (1745M)
by John W. Moore, Michael Hines, and Ted Carnevale
Duke, Yale, and the BlueBrain Project -- Copyright 1984-2007

Additional mechanisms from files
 Exp2SynNMDA.mod caolmw.mod capr.mod gap.mod icaolmw.mod icappr.mod iholmw.mod imppr.mod kahpppr.mod kcaolmw.mod kcppr.mod kdrbwb.mod kdrmsw.mod kdrppr.mod koppr.mod ksmsw.mod nafbwb.mod nafmsw.mod nafppr.mod naoppr.mod
ID: 0
ID: 1
My id: 0 a=1 time=0
My id: 1 a=99 time=0
My id: 0 a=1 time=99.975
My id: 1 a=99 time=99.975

That means there is some problem on my machine that prevents me from successfully using inter-node gap junctions.

Post by hines »

Code:

1 /Volumes/MTL/neuron/nrn/powerpc/bin/nrniv: multiple instances of source gid: 0 

This is exceedingly puzzling. The order of pc.set_maxstep() and pc.setup_transfer() should not matter. The message suggests that pc.source_var(&a, 0) was executed twice, but that is manifestly not the case.
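
For reference, the condition that the message describes, the same source id registered more than once, can be pictured with a small Python sketch. This is a hypothetical illustration of a uniqueness check, not NEURON's implementation; the function name and the (rank, id) pairs are assumptions made for the example.

```python
from collections import Counter


def duplicate_source_ids(registrations):
    """Return the sorted source ids that were registered more than once.

    registrations: iterable of (rank, source_id) pairs, one pair per
    hypothetical source_var call.
    """
    counts = Counter(sid for _rank, sid in registrations)
    return sorted(sid for sid, n in counts.items() if n > 1)
```

With the single pc.source_var(&a, 0) call in the script, modeled as duplicate_source_ids([(1, 0)]), a check of this kind finds nothing. Only a second registration of id 0 would be reported, which is why the message is surprising.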

Code:

dyld: lazy symbol binding failed: Symbol not found: _MPI_Initialized

Since setup_transfer failed, NEURON called nrnmpi_abort, which calls MPI_Abort() or abort() depending on the flag that MPI_Initialized returns. The Mac requires that all the MPI functions that NEURON calls be listed in the file nrn/src/oc/ockludge.c. I see that MPI_Initialized is missing from the list, and I'll have to add it to make the message go away on the Mac.

From my previous message:
"Also, my machine hung on your original code in one of the pc.context calls."
I've concluded that this is due to a race condition between how the bulletin board communicates and the direct MPI_... calls for spike exchange, setup, transfer, etc. The bottom line: do not make a pc.context("statement\n") call where the statement does something that uses any non-bulletin-board MPI call.

Post by hines »

The problem was the lack of a reasonable error message. The svn repository version will now state:

Code:

numprocs=2
NEURON -- VERSION 6.0.872 (1747) 2007-05-19
...
0...: To use ParallelContext.setup_transfer when nhost > 1, NEURON must be configured with --with-paranrn
...