node number based numerical error

Moderator: hines

spen123

node number based numerical error

Post by spen123 »

Hi

I am new to parallel NEURON. I converted my 65-cell serial model to parallel code and ran it on my university's Linux cluster. The results vary by 5% when I choose a different number of nodes. Please help me by suggesting some ways to reduce this error.

Should I use pc.barrier()?
ted
Site Admin
Posts: 6300
Joined: Wed May 18, 2005 4:50 pm
Location: Yale University School of Medicine

Re: node number based numerical error

Post by ted »

First suggestion: execute it on your own PC or Mac, even if it is a single-processor machine. It should run and generate the same result regardless of whether you are using MPI or not, and (if you are using MPI) regardless of the number of "hosts" (processors) specified on the command line. So you can develop and debug without having access to parallel hardware.
spen123 wrote: "i converted my 65 cell serial model in to parallel code"
How? Did you follow the strategy presented in
Hines, M.L. and Carnevale, N.T.
Translating network models to parallel hardware in NEURON.
J. Neurosci. Methods 169:425-455, 2008.
?
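
To illustrate why the number of hosts should not change the result, here is a minimal sketch in plain Python (no NEURON import). It assumes a round-robin distribution of cells, where the host with rank r owns every gid with gid % nhost == r. The function names gids_on_host and all_gids are hypothetical and exist only for this illustration. The point is that every cell exists exactly once, whatever the host count, so the network being simulated is the same.

```python
def gids_on_host(ncell, rank, nhost):
    """Return the gids owned by host `rank`, assuming round-robin distribution."""
    return list(range(rank, ncell, nhost))


def all_gids(ncell, nhost):
    """Collect the gids owned by all hosts and return them sorted."""
    owned = []
    for rank in range(nhost):
        owned.extend(gids_on_host(ncell, rank, nhost))
    return sorted(owned)


NCELL = 65  # cell count mentioned in the thread

# Whatever the host count, the union of per-host gids is the full network.
for nhost in (1, 5, 10):
    assert all_gids(NCELL, nhost) == list(range(NCELL))
```

If a check like this fails for the real model (a cell missing, or created on two hosts), the parallel network is not the same network as the serial one, and differing results are to be expected.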
spen123

Re: node number based numerical error

Post by spen123 »

Thanks for the reply, Ted.

I initially ran the model on a single processor; it works fine with no errors. Using the paper you referenced, I converted it to parallel code. The parallel code runs on my laptop (Windows XP). When I execute the same code on the Linux cluster, the results don't match. With 10 nodes, the results are off by 2% compared to the parallel code I ran on my laptop. With 5 nodes, the results were off by 8%.

I am getting different results (varying by ~2-8%) when I vary the number of nodes. I am looking for a way to debug this error.

Thanks
Sandeep
ted

Re: node number based numerical error

Post by ted »

Ask the question "which cell in the parallel implementation is the first one to have an incorrect spike time?" The problem will originate either in the properties of the cell itself, or in the synaptic input to that cell.

Examples of errors in the cell itself include:
incorrect morphology, topology, biophysical properties, or spatial discretization--perhaps an instance of the wrong cell class, or incorrect assignment of some anatomical, biophysical, or spatial discretization parameter of the target cell.

Examples of errors in the synaptic input include:
--incorrect location of the synapse on the target cell
--incorrect type of synapse (ExpSyn vs. Exp2Syn vs. something else)
--error in the numeric values of synaptic parameters (e.g. dynamics, reversal potential, plasticity rule)
--incorrect weight or delay of one or more of the NetCons that project to it
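
One way to find the first cell with an incorrect spike time is to compare spike records from a trusted run and a suspect run. The sketch below is plain Python and assumes each run's spikes have already been gathered into a list of (time_ms, gid) tuples; the function name first_mismatch, the tolerance, and the sample data are hypothetical. Both lists are sorted first, because the order in which hosts write their spikes can differ between runs.

```python
def first_mismatch(reference, test, tol=1e-9):
    """Return (index, ref_event, test_event) for the first differing spike.

    Events are (time_ms, gid) tuples. Returns None if the runs agree.
    """
    ref = sorted(reference)
    tst = sorted(test)
    for i, (r, t) in enumerate(zip(ref, tst)):
        # A spike differs if it comes from another cell or at another time.
        if r[1] != t[1] or abs(r[0] - t[0]) > tol:
            return i, r, t
    if len(ref) != len(tst):
        # One run has extra spikes after the common prefix.
        i = min(len(ref), len(tst))
        r = ref[i] if i < len(ref) else None
        t = tst[i] if i < len(tst) else None
        return i, r, t
    return None


# Hypothetical data: the suspect run is out of order and cell 7 fires late.
serial = [(1.0, 3), (2.5, 0), (4.0, 7)]
parallel = [(2.5, 0), (1.0, 3), (4.2, 7)]
result = first_mismatch(serial, parallel)
```

Here result identifies cell 7 as the first to diverge, so its properties and the synapses and NetCons that target it are the first things to inspect.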
ted

Re: node number based numerical error

Post by ted »

Additional hints for debugging:
Reduce the size of the problem as much as possible. That means cutting the run time, the number of cells, and the number of synaptic connections to something manageable--something that allows you to run for a few seconds and yet see the symptom. At least two of these should already be parameterized, for the sake of facilitating development (what is development anyway, other than an iterative cycle of breaking and fixing code?).
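
As a rough sketch of that kind of parameterization, the plain-Python example below keeps the size settings in one place so they can be shrunk together. Only the 65-cell count comes from this thread; the class RunParams, the function shrink, and the tstop_ms and syn_per_cell values are hypothetical placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunParams:
    ncell: int = 65            # cell count mentioned in the thread
    tstop_ms: float = 1000.0   # hypothetical simulated time
    syn_per_cell: int = 10     # hypothetical synapses per cell


def shrink(params, factor):
    """Return a smaller copy of `params`, keeping at least 2 cells and 1 synapse."""
    return replace(
        params,
        ncell=max(2, params.ncell // factor),
        tstop_ms=params.tstop_ms / factor,
        syn_per_cell=max(1, params.syn_per_cell // factor),
    )


full = RunParams()
small = shrink(full, 4)
```

Shrinking in steps, and checking after each step that the discrepancy is still present, narrows the search without touching the model-building code.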