Regarding fitting of data sampled at irregular intervals

shailesh
Posts: 104
Joined: Thu Mar 10, 2011 12:11 am

Regarding fitting of data sampled at irregular intervals

Post by shailesh »

I have two queries:

1> If the data (voltage vs time) to be fitted is sampled at irregular intervals, does MRF while evaluating the fitness consider only the points provided (loaded from file) or does it implicitly perform a linear interpolation between the data points - and then use all these values for comparison?

2> I created a model of a cell with the HH mechanism. Its gnabar, gkbar, gl, and el were tweaked to get a slightly different shape. The voltage vs time plots were recorded with both fixed step and adaptive integration. I tried fitting data from these two cases (for a model with default HH) using MRF and found that the resulting parameter sets were quite different! I understand that the plots from the two cases differ somewhat, and that different parameter sets might therefore arise. But considering that the difference is huge, how should we decide which method (fixed/adaptive) to use when performing MRF? This might take on more importance when handling experimental data (such as data digitized from figures).
ted
Site Admin
Posts: 6286
Joined: Wed May 18, 2005 4:50 pm
Location: Yale University School of Medicine

Re: Regarding fitting of data sampled at irregular intervals

Post by ted »

shailesh wrote:If the data (voltage vs time) to be fitted is sampled at irregular intervals, does MRF while evaluating the fitness consider only the points provided (loaded from file) or does it implicitly perform a linear interpolation between the data points - and then use all these values for comparison?
Good question. Here's one way to find out:
1. Create a toy model (e.g. a passive single compartment with known membrane time constant, driven by a current step).
2. Use that to generate a set of test data sampled at irregular intervals (no more than 4 or 5 points--keep it simple).
3. Set up a "run fitness" optimization problem in which g_pas and cm are to be adjusted so that the simulated v vs. t (generated with fixed dt so that a run involves dozens or hundreds of time steps) matches the (irregularly sampled) time course of v vs. t.
4. Before doing the optimization, click on the generator's "Error" button, see what value it reports, and compare that against the sum of squared errors you would expect if the errors were evaluated only at the 4 or 5 sample points.
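A minimal sketch of steps 1 and 2 and of the "expected" error in step 4, written in plain Python rather than NEURON. It uses the analytic step response of a passive compartment instead of a simulation, and every parameter value (V_REST, R_IN, TAU, I_STEP, the sample times) is a hypothetical choice made only for illustration.

```python
import math

# Hypothetical passive single compartment (illustrative values, not from the thread).
V_REST = -65.0   # mV, resting potential
R_IN = 100.0     # megohm, input resistance
TAU = 10.0       # ms, membrane time constant
I_STEP = 0.1     # nA, current step switched on at t = 0

def v_passive(t, tau=TAU):
    """Analytic response of a passive compartment to a current step at t = 0."""
    return V_REST + I_STEP * R_IN * (1.0 - math.exp(-t / tau))

# Irregular sample times (ms): only a handful, so hand checks stay easy.
t_data = [0.5, 2.0, 3.5, 9.0, 20.0]
v_data = [v_passive(t) for t in t_data]

def sse_at_samples(tau_model):
    """Sum of squared errors evaluated only at the data sample times."""
    return sum((v_passive(t, tau_model) - v) ** 2 for t, v in zip(t_data, v_data))

print(sse_at_samples(TAU))        # model identical to the data generator
print(sse_at_samples(1.5 * TAU))  # mismatched time constant
```

The second number is the kind of reference value one could compare against what the generator's "Error" button reports for the same mismatched model.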
I created a model of a cell with the HH mechanism. Its gnabar, gkbar, gl, and el were tweaked to get a slightly different shape. The voltage vs time plots were recorded with both fixed step and adaptive integration. I tried fitting data from these two cases (for a model with default HH) using MRF and found that the resulting parameter sets were quite different! I understand that the plots from the two cases differ somewhat, and that different parameter sets might therefore arise. But considering that the difference is huge, how should we decide which method (fixed/adaptive) to use when performing MRF? This might take on more importance when handling experimental data (such as data digitized from figures).
Multiple interesting questions here. The answer to all of them depends in part on the answer to your first question--it may be as simple as "if the 'experimental data' were sampled at a particular set of times, the model's output must be sampled at exactly the same times." If that is the case, you may want to resample irregularly sampled "experimental data" at a fixed interval, e.g. by using the Vector class's interpolate() method or perhaps by implementing a resampling strategy that uses low order polynomials or splines. Simulation results generated by adaptive integration can be captured at regular intervals by using the Vector class's record(&var, Dt) or record(&var, tvec) syntax (see the Programmer's reference about these features); if you resort to that, you'll have to provide your own error function that makes use of the regularly sampled simulation results.
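As a rough illustration of the resampling idea, here is a plain Python sketch that linearly interpolates irregularly sampled points onto a fixed interval. It is only a stand-in for what one would do with the Vector class inside NEURON; the function names interp_linear and resample_uniform, and the choice dt = 0.875 ms, are assumptions made for this example.

```python
from bisect import bisect_right

def interp_linear(x_new, x, y):
    """Linearly interpolate (x, y) at each value in x_new.
    x must be strictly increasing; queries outside the range are clamped."""
    out = []
    for xq in x_new:
        if xq <= x[0]:
            out.append(y[0])
        elif xq >= x[-1]:
            out.append(y[-1])
        else:
            j = bisect_right(x, xq) - 1              # x[j] <= xq < x[j + 1]
            frac = (xq - x[j]) / (x[j + 1] - x[j])
            out.append(y[j] + frac * (y[j + 1] - y[j]))
    return out

def resample_uniform(t, v, dt):
    """Resample (t, v) at fixed interval dt, starting at t[0]."""
    n = int((t[-1] - t[0]) / dt + 1e-9) + 1          # points that fit in the span
    t_uniform = [t[0] + i * dt for i in range(n)]
    return t_uniform, interp_linear(t_uniform, t, v)

# Three irregularly spaced (t, v) samples, used here purely as test input.
t_irregular = [5.0, 5.725, 7.625]
v_irregular = [-76.2631, 43.6974, -76.8252]
t_u, v_u = resample_uniform(t_irregular, v_irregular, dt=0.875)
for t, v in zip(t_u, v_u):
    print(t, v)
```

With so few points the straight-line segments are a poor description of an action potential, which is one reason a spline or low order polynomial scheme might be preferred for real data.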
shailesh
Posts: 104
Joined: Thu Mar 10, 2011 12:11 am

Re: Regarding fitting of data sampled at irregular intervals

Post by shailesh »

Regarding the first question:
> I followed your suggestion and set up a similar toy model (preferred 'hh' instead of 'pas' as I had it ready). The results were interesting but left me with further queries!
The data to be fitted was the shape of an AP (with tweaked values of HH parameters) described by just 3 points (just before onset, peak and after-hyperpolarization). The model (with default HH parameters) was set to run using fixed step integration. On clicking "Error Value", the following was observed:
- The voltage vs time graph plotted the entire continuous waveform for the simulated run
- The MRF Generator window originally had just a red plot (with three points) showing the data to be fitted (loaded from file). After the run, it plotted a waveform in black with just three points - and these points corresponded with the (voltage,time) value pairs on the voltage graph.
- Error Value shown = 403.87
- But when I manually calculate the sum of squared errors, I get 1211.6121
Points (t, vm) from File:
5 -76.2631
5.725 43.6974
7.625 -76.8252

Points (t, vm) from Simulation
5 -64.9492
5.725 40.9602
7.625 -44.021
I am sure that if anything like interpolation was being done then certainly there would be more points and thus a higher Error Value (more terms in sum of squared errors). So I am left wondering how MRF arrived at the value of 403.87?! Quite sure I haven't goofed up, but one can never be certain I suppose.
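For reference, the manual calculation can be reproduced with a few lines of Python, using only the (t, vm) pairs listed above. The variable names are arbitrary.

```python
# (t, vm) pairs copied from the post: data file vs. simulation.
data = [(5.0, -76.2631), (5.725, 43.6974), (7.625, -76.8252)]
sim = [(5.0, -64.9492), (5.725, 40.9602), (7.625, -44.021)]

# Squared error at each of the three sample times.
sq_err = [(v_sim - v_dat) ** 2 for (_, v_dat), (_, v_sim) in zip(data, sim)]
sse = sum(sq_err)

for (t, _), e in zip(data, sq_err):
    print(f"t = {t}: squared error = {e:.4f}")
print(f"sum of squared errors = {sse:.4f}")
```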
ted
Site Admin
Posts: 6286
Joined: Wed May 18, 2005 4:50 pm
Location: Yale University School of Medicine

Re: Regarding fitting of data sampled at irregular intervals

Post by ted »

Interesting. The three points would be connected by straight line segments, right? Try limiting the interval over which the generator evaluates the error. Try to isolate a region that lies _between_ two of your "original data" points (if you're not sure how, see this part of the MRF tutorial http://www.neuron.yale.edu/neuron/stati ... imize.html), then evaluate the error over this region. If you get a value other than 0, then maybe the data are being interpolated. I wonder if you can reduce the interval width to where it contains only about 2 or 3 solution points, which would allow you to make a quick "manual" calculation that confirms or rules out interpolation.
shailesh
Posts: 104
Joined: Thu Mar 10, 2011 12:11 am

Re: Regarding fitting of data sampled at irregular intervals

Post by shailesh »

Yes, the three points are connected by straight line segments.
I tried what you suggested about restricting regions. Firstly, I found that it was not possible to position the blue lines such that they were between two of our points of interest. I even tried using the weight panel and entering the startpoint/endpoint to achieve the same. They would either leave at least one point in between or snap on to each other.

So I continued testing for restricted regions that it allowed. For all three points individually it worked fine with the MRF Error Values being same as the calculated values:
Point 1: 128
Point 2: 7.4921
Point 3: 1076.1
All is well. In these cases, the model does not plot anything on the MRF Generator graph as only one point is under consideration.

But now when I try including multiple points, the trouble arises.
Points 1 & 2 -> MRF = 67.748 vs Calculated = 135.4921
Points 2 & 3 -> MRF = 541.8 vs Calculated = 1083.5921
Points 1 & 3 -> MRF = 389.86 vs Calculated = 1204.1
Points 1, 2 & 3 -> MRF = 403.87 vs Calculated = 1211.5921
(approx values). In these cases we do have straight line segments joining the concerned points.

It is weird that the sum of squared errors is less for all 3 points together than some combinations of just two points. Not sure what is happening...
Last edited by shailesh on Wed Jun 18, 2014 10:36 am, edited 1 time in total.
hines
Site Admin
Posts: 1682
Joined: Wed May 18, 2005 3:32 pm

Re: Regarding fitting of data sampled at irregular intervals

Post by hines »

Since you mention "regions" I infer that we are talking about nrn/lib/hoc/mulfit/e_norm.hoc with the comment at the beginning of:

Code:

error is weighted sum of normalized error in each region. Region i
has weight[i] and domain boundary[i] < x < boundary[i+1].
Normalized region error is an approximation to the integral of
(y(x) - ydat(x))^2 over the integral of x.
In func efun() in that file, for model and data that are on independent non-uniform grids, the calculation used is

Code:

e = ydat_.meansqerr($o1.interpolate(xdat_,$o2), dw_)
Here, $o1 is the model trajectory y values and $o2 is the model trajectory t values.
ydat and xdat are the data y and t values. So the model trajectory is interpolated to the data trajectory and only the resulting values at the data locations are used in the "meansqerr" calculation which is defined as "return value is sum of w*(v1 - v2)^2 / size".
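The calculation described above can be sketched in plain Python as follows. This is not the hoc implementation: interp_at and mean_sq_err are hypothetical names, the per-point weights w stand in for dw_, and the clamping at the ends of the model trajectory is an assumption of this sketch.

```python
from bisect import bisect_right

def interp_at(tq, t, v):
    """Linear interpolation of the trajectory (t, v) at time tq; clamps outside the range."""
    if tq <= t[0]:
        return v[0]
    if tq >= t[-1]:
        return v[-1]
    j = bisect_right(t, tq) - 1                  # t[j] <= tq < t[j + 1]
    frac = (tq - t[j]) / (t[j + 1] - t[j])
    return v[j] + frac * (v[j + 1] - v[j])

def mean_sq_err(t_model, v_model, t_data, v_data, w=None):
    """Sum of w * (v_data - v_model_at_data)^2, divided by the number of data points."""
    if w is None:
        w = [1.0] * len(t_data)                  # unweighted by default
    v_at_data = [interp_at(tq, t_model, v_model) for tq in t_data]
    total = sum(wi * (vd - vm) ** 2 for wi, vd, vm in zip(w, v_data, v_at_data))
    return total / len(t_data)

# Toy check: the model is v = 2*t on a regular grid, the data are zeros at two
# times that fall between model grid points.
t_model = [0.0, 1.0, 2.0, 3.0, 4.0]
v_model = [0.0, 2.0, 4.0, 6.0, 8.0]
t_data = [0.5, 2.5]
v_data = [0.0, 0.0]
print(mean_sq_err(t_model, v_model, t_data, v_data))  # (1^2 + 5^2) / 2 = 13.0
```

Note that the error is evaluated only at the data times, so the number of terms equals the number of data points regardless of how many points the model trajectory contains.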

dw_ depends on the region sizes and intervals between data points and is implemented in the above file in the set_w() procedure. That is certainly complicated and the implementation goal is to make the first comment of this reply, true. Let's leave this as an open question in lieu of further code review and testing and see if the above is sufficient for you to resolve your test result differences.

I should mention that I believe the entire notion of mean square model-data trajectory difference as a fitness function can certainly be criticised, especially for action potentials. To me, there seem to be two criteria for a fitness function for the praxis method: 1) from a reasonable starting set of parameters, there must be a path to the minimum which is all downhill; 2) the minimum is meaningful in terms of one's judgment of what constitutes a reasonable fit of model to data.
shailesh
Posts: 104
Joined: Thu Mar 10, 2011 12:11 am

Re: Regarding fitting of data sampled at irregular intervals

Post by shailesh »

Sorry, I wasn't sure what you meant by:
... to make the first comment of this reply, true.
and so wanted to clarify.

Was it regarding my first question on this thread:
1> If the data (voltage vs time) to be fitted is sampled at irregular intervals, does MRF while evaluating the fitness consider only the points provided (loaded from file) or does it implicitly perform a linear interpolation between the data points - and then use all these values for comparison?
and that, yes, MRF does indeed interpolate between provided points to evaluate the error value?
hines
Site Admin
Posts: 1682
Joined: Wed May 18, 2005 3:32 pm

Re: Regarding fitting of data sampled at irregular intervals

Post by hines »

I was referring to my comment:

Code:

error is weighted sum of normalized error in each region. Region i
has weight[i] and domain boundary[i] < x < boundary[i+1].
Normalized region error is an approximation to the integral of
(y(x) - ydat(x))^2 over the integral of x.
I have been looking at the implementation of the set_w() procedure and it appears that the comment is, in fact, false. From the point of view of integration, the implementation implicitly assumes the data in each region is at uniform intervals since the weight of each data point within each region is constant. Each y(x) - ydat(x) at the x data values is given a weight proportional to the weight of the region it is in.
shailesh
Posts: 104
Joined: Thu Mar 10, 2011 12:11 am

Re: Regarding fitting of data sampled at irregular intervals

Post by shailesh »

Based on your comments:
the model trajectory is interpolated to the data trajectory and only the resulting values at the data locations are used in the "meansqerr" calculation which is defined as "return value is sum of w*(v1 - v2)^2 / size".
and
error is weighted sum of normalized error in each region
I took a second look at the values I posted earlier:
Point 1: 128 (same as calculated)
Point 2: 7.4921 (same as calculated)
Point 3: 1076.1 (same as calculated)

Points 1 & 2 -> MRF = 67.748 vs Calculated = 135.4921
Points 2 & 3 -> MRF = 541.8 vs Calculated = 1083.5921
Points 1 & 3 -> MRF = 389.86 vs Calculated = 1204.1
Points 1, 2 & 3 -> MRF = 403.87 vs Calculated = 1211.5921

I had missed the 'size' earlier, and incorporating that in calculating the error sum of squares I got:
Points 1 & 2 -> MRF = 67.748 vs Calculated = 135.4921/2 = 67.746
Points 2 & 3 -> MRF = 541.8 vs Calculated = 1083.5921/2 = 541.796
Points 1, 2 & 3 -> MRF = 403.87 vs Calculated = 1211.5921/3 = 403.864
... and the values match! It should be noted that all the above involved a single region with multiple points.
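The single-region comparisons can be checked with a short Python snippet that uses the rounded per-point values quoted above. The dictionary layout and the 0.01 tolerance are choices made for this check only.

```python
# Per-point squared errors as reported above (rounded values from the post).
sq_err = {1: 128.0, 2: 7.4921, 3: 1076.1}

def mean_sq(points):
    """Sum of squared errors over the given points, divided by their count."""
    return sum(sq_err[p] for p in points) / len(points)

# (points in the region, Error Value reported by MRF)
reported = [((1, 2), 67.748), ((2, 3), 541.8), ((1, 2, 3), 403.87)]
for points, mrf in reported:
    calc = mean_sq(points)
    print(points, round(calc, 3), mrf, abs(calc - mrf) < 0.01)
```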

The only exception is:
Points 1 & 3 -> MRF = 389.86
This involves individual points in two different regions. We find a similar situation when three regions are defined with one point in each region:
Point 1 (region 1, weight 1), Point 2 (region 2, weight 1) & Point 3 (region 3, weight 1) -> MRF = 326.16 (Total weight 1)

These have to be evaluated as:
Error Value = ( (w1*e1/s1) + (w2*e2/s2) + ... + (wN*eN/sN) ) / ( (w1/s1) + (w2/s2) + ... + (wN/sN) )
where
wX : weight assigned to interval (region) #X
eX : error sum of squares obtained in interval (region) #X
sX : size of interval (region) #X (in ms)
N: number of intervals (regions)

The Total weight (scale) simply multiplies the above Error Value to give the final Error Value of the fitness function. This is useful when we have multiple generators and want to adjust the relative contribution (weighting) of each generator to the overall MRF optimization.

As an example:
> Region 1 (5 < t < 5.3625) -> s1 = 0.3625
Weight = 1 = w1
Point 1 @ t = 5, Error sum of squares (-64.9492 vs -76.2631) = 128 = e1

> Region 2 (5.3625 < t < 6.675) -> s2 = 1.3125
Weight = 1 = w2
Point 2 @ t = 5.725, Error sum of squares (40.9602 vs 43.6974) = 7.4921 = e2

> Region 3 (6.675 < t < 7.625) -> s3 = 0.95
Weight = 1 = w3
Point 3 @ t = 7.625, Error sum of squares (-44.021 vs -76.8252) = 1076.1 = e3

From the formula, we get Error Value = 326.15
Total weight (scale) = 1, So the Error Value returned by the generator = 326.15 x 1 = 326.15 ... which matches the earlier mentioned value!
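The worked example can be reproduced with a small Python function that implements the formula exactly as written above. The function name region_weighted_error is made up for this sketch, and the formula itself is the one inferred in this post from the observed values, not something taken from the hoc source.

```python
def region_weighted_error(weights, errors, sizes, scale=1.0):
    """Error Value = scale * sum(w*e/s) / sum(w/s), one term per region."""
    num = sum(w * e / s for w, e, s in zip(weights, errors, sizes))
    den = sum(w / s for w, s in zip(weights, sizes))
    return scale * num / den

# Three regions with one data point each, values from the example above.
weights = [1.0, 1.0, 1.0]          # w1, w2, w3
errors = [128.0, 7.4921, 1076.1]   # e1, e2, e3
sizes = [0.3625, 1.3125, 0.95]     # s1, s2, s3 in ms

value = region_weighted_error(weights, errors, sizes)
print(round(value, 2))
```

When all regions have the same size and weight, this expression reduces to the plain average of the per-region errors.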

It took me quite a while to figure that out, but once obtained it seemed so straightforward and - should I say - obvious!
shailesh
Posts: 104
Joined: Thu Mar 10, 2011 12:11 am

Re: Regarding fitting of data sampled at irregular intervals

Post by shailesh »

So, in the context of my first question on this thread:
... does MRF while evaluating the fitness consider only the points provided (loaded from file) or does it implicitly perform a linear interpolation between the data points - and then use all these values for comparison?
I suppose we can summarize that the model trajectory is interpolated to the data trajectory and only the resulting values at the data locations are used in calculating the error value, i.e. the number of points at which the error is evaluated = number of points provided in the data trajectory. So, yes, it does perform linear interpolation, but only if required to evaluate the model values at the desired timestamps. (Not sure if related, but procedure "set_modelx()" in 'e_norm.hoc' appears to perform linear interpolation).

As a further check to the above, if we use adaptive integration to fit the data point:
5.873875 43.9653
The closest points that the model returns are:
5.84661 38.2502
5.90114 36.4325
(I confess that the data point was chosen in retrospect)

The MRF generator returns error value = 43.877
The model does not have a value for t = 5.873875 ms, so MRF linearly interpolates between the closest points, t1 = 5.84661 ms and t2 = 5.90114 ms. Here t = (t1+t2)/2, so similarly v = (v1+v2)/2 = (38.2502+36.4325)/2 = 37.34135
Error Sum of Squares (37.34135 vs 43.9653) = 43.8767 ... which matches the error value returned by the MRF generator!
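The same check written out in Python, using the general linear interpolation formula rather than the midpoint shortcut. Variable names are arbitrary.

```python
# Neighbouring model points returned by adaptive integration (from the post).
t1, v1 = 5.84661, 38.2502
t2, v2 = 5.90114, 36.4325
# The single data point to be fitted.
t_d, v_d = 5.873875, 43.9653

frac = (t_d - t1) / (t2 - t1)        # position of t_d between t1 and t2
v_model = v1 + frac * (v2 - v1)      # linearly interpolated model value at t_d
err = (v_model - v_d) ** 2           # squared error for the one data point

print(round(frac, 6), round(v_model, 5), round(err, 4))
```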

One last question... would you have any tips on digitizing data from figures with a view to using it for fitting? Any do's and don'ts to keep track of? I suppose the one thing that always applies is your advice to "use one's judgment of what constitutes a reasonable fit of model to data".
ted
Site Admin
Posts: 6286
Joined: Wed May 18, 2005 4:50 pm
Location: Yale University School of Medicine

Re: Regarding fitting of data sampled at irregular intervals

Post by ted »

would you have any tips on digitizing data from figures with a view to using it for fitting?
Yes. Get the original data, if at all possible. The authors are obligated to preserve it and make it available--or should be, according to scientific principles and the stated policies of scientific journals and funding agencies.
shailesh
Posts: 104
Joined: Thu Mar 10, 2011 12:11 am

Re: Regarding fitting of data sampled at irregular intervals

Post by shailesh »

Thanks. I agree, it would be best to get the raw data (whenever available) for such purposes.