Regarding fitting of data sampled at irregular intervals
I have two queries:
1> If the data (voltage vs. time) to be fitted are sampled at irregular intervals, does MRF, while evaluating the fitness, consider only the points provided (loaded from file), or does it implicitly perform a linear interpolation between the data points and then use all of these values for the comparison?
2> I created a model of a cell with the HH mechanism and tweaked its gnabar, gkbar, gl and el to get a slightly different shape. I recorded the voltage vs. time plots using both fixed step and adaptive integration. When I fitted the data from these two cases with MRF (using a model with default HH parameters), the resulting parameter sets turned out to be quite different! I understand that the plots from the two cases differ somewhat, so different parameter sets might be expected. But given how large the difference is, how should we decide which method (fixed step or adaptive) to use when performing MRF? This may matter even more when handling experimental data (e.g. data digitized from figures).
-
- Site Admin
- Posts: 6384
- Joined: Wed May 18, 2005 4:50 pm
- Location: Yale University School of Medicine
- Contact:
Re: Regarding fitting of data sampled at irregular intervals
shailesh wrote: If the data (voltage vs. time) to be fitted are sampled at irregular intervals, does MRF, while evaluating the fitness, consider only the points provided (loaded from file), or does it implicitly perform a linear interpolation between the data points and then use all of these values for the comparison?
Good question. Here's one way to find out:
1. Create a toy model (e.g. a passive single compartment with a known membrane time constant, driven by a current step).
2. Use it to generate a set of test data that are sampled at irregular intervals (no more than 4 or 5 points; keep it simple).
3. Set up a "run fitness" optimization problem in which g_pas and cm are to be adjusted so that the simulated v vs. t (generated with fixed dt, so that a run involves dozens or hundreds of time steps) matches the (irregularly sampled) time course of v vs. t.
4. Before doing the optimization, however, click the generator's "Error" button, see what value it reports, and compare that against the sum of squared errors you would expect if the errors were evaluated only at the 4 or 5 sample points.
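A rough numerical stand-in for steps 1 to 4, written in plain Python rather than hoc so it runs without NEURON. Everything specific here is an assumption for illustration, not from the thread: a single passive compartment with e_pas = -70 mV, a 1 uA/cm2 current step at t = 0, forward-Euler integration at dt = 0.025 ms, and four hypothetical sample times. It computes the error two ways, only at the sample times and with the data linearly interpolated onto every model time step, which are the two possibilities the question asks about.

```python
import math
from bisect import bisect_left

# Hypothetical toy cell (not from the thread): passive membrane, current step at t = 0.
E_PAS = -70.0   # mV
I_INJ = 1.0     # uA/cm2

def simulate(g_pas, cm, t_stop=20.0, dt=0.025):
    """Fixed-step forward Euler for cm*dv/dt = -g_pas*(v - E_PAS) + I_INJ.
    With g_pas in mS/cm2 and cm in uF/cm2, dv/dt comes out in mV/ms."""
    t, v = [0.0], [E_PAS]
    for k in range(int(round(t_stop / dt))):
        dvdt = (-g_pas * (v[-1] - E_PAS) + I_INJ) / cm
        v.append(v[-1] + dt * dvdt)
        t.append((k + 1) * dt)
    return t, v

def exact(t, g_pas=0.1, cm=1.0):
    """Analytic response of the 'true' cell; tau = cm/g_pas = 10 ms."""
    return E_PAS + (I_INJ / g_pas) * (1.0 - math.exp(-t * g_pas / cm))

def interp(tq, ts, vs):
    """Linear interpolation of (ts, vs) at time tq, clamped to the end values."""
    i = bisect_left(ts, tq)
    if i == 0:
        return vs[0]
    if i == len(ts):
        return vs[-1]
    f = (tq - ts[i - 1]) / (ts[i] - ts[i - 1])
    return vs[i - 1] + f * (vs[i] - vs[i - 1])

# Step 2: four irregularly spaced "experimental" samples from the true cell.
T_DATA = [0.7, 3.1, 8.0, 15.5]
V_DATA = [exact(t) for t in T_DATA]

def sse_at_samples(t_mod, v_mod):
    """Error evaluated only at the data sample times (model interpolated there)."""
    return sum((interp(td, t_mod, v_mod) - vd) ** 2
               for td, vd in zip(T_DATA, V_DATA))

def sse_on_model_grid(t_mod, v_mod):
    """Alternative: data linearly interpolated onto every model time step
    within the data range, with the error summed over all of those steps."""
    return sum((vm - interp(tm, T_DATA, V_DATA)) ** 2
               for tm, vm in zip(t_mod, v_mod)
               if T_DATA[0] <= tm <= T_DATA[-1])

t_true, v_true = simulate(g_pas=0.1, cm=1.0)   # correct parameters
t_bad, v_bad = simulate(g_pas=0.2, cm=1.0)     # g_pas off by a factor of 2
print("true params:", sse_at_samples(t_true, v_true), sse_on_model_grid(t_true, v_true))
print("wrong g_pas:", sse_at_samples(t_bad, v_bad), sse_on_model_grid(t_bad, v_bad))
```

With the correct parameters the sample-only error is essentially zero, while the interpolated-data version is not, because straight lines between four samples do not follow the exponential. That difference is one way to tell the two hypotheses apart.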
shailesh wrote: I created a model of a cell with the HH mechanism and tweaked its gnabar, gkbar, gl and el to get a slightly different shape. I recorded the voltage vs. time plots using both fixed step and adaptive integration. When I fitted the data from these two cases with MRF (using a model with default HH parameters), the resulting parameter sets turned out to be quite different! I understand that the plots from the two cases differ somewhat, so different parameter sets might be expected. But given how large the difference is, how should we decide which method (fixed step or adaptive) to use when performing MRF? This may matter even more when handling experimental data (e.g. data digitized from figures).
Multiple interesting questions here. The answer to all of them depends in part on the answer to your first question. It may be as simple as "if the 'experimental data' were sampled at a particular set of times, the model's output must be sampled at exactly the same times." If that is the case, you may want to resample irregularly sampled "experimental data" at a fixed interval, e.g. by using the Vector class's interpolate() method, or perhaps by implementing a resampling strategy that uses low order polynomials or splines. Simulation results generated by adaptive integration can be captured at regular intervals by using the Vector class's record(&var, Dt) or record(&var, tvec) syntax (see the Programmer's Reference for these features); if you resort to that, you'll have to provide your own error function that makes use of the regularly sampled simulation results.
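A minimal sketch of the simplest of those resampling strategies, piecewise-linear interpolation onto a fixed interval, in plain Python. It plays the role Vector.interpolate() would play in hoc but is not NEURON's implementation; `resample_uniform` is a hypothetical helper name and the sample values are made up. With sparse samples, low order polynomials or splines may give a more faithful curve.

```python
from bisect import bisect_right

def resample_uniform(t_data, v_data, dt):
    """Resample irregularly spaced (t, v) samples onto a uniform grid that
    starts at t_data[0] and steps by dt, using piecewise-linear interpolation."""
    if len(t_data) != len(v_data) or len(t_data) < 2:
        raise ValueError("need two or more (t, v) pairs")
    if any(b <= a for a, b in zip(t_data, t_data[1:])):
        raise ValueError("t_data must be strictly increasing")
    n = int((t_data[-1] - t_data[0]) / dt + 1e-9) + 1   # grid points within range
    t_grid = [t_data[0] + k * dt for k in range(n)]
    v_grid = []
    for t in t_grid:
        # data segment [t_data[i], t_data[i+1]] containing t (last one at the end)
        i = min(bisect_right(t_data, t) - 1, len(t_data) - 2)
        frac = (t - t_data[i]) / (t_data[i + 1] - t_data[i])
        v_grid.append(v_data[i] + frac * (v_data[i + 1] - v_data[i]))
    return t_grid, v_grid

# Hypothetical irregular samples, resampled every 0.1 ms.
t_u, v_u = resample_uniform([0.0, 0.4, 1.5, 2.0], [-65.0, -60.0, -52.0, -50.0], 0.1)
print(len(t_u), v_u[:5])
```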
Re: Regarding fitting of data sampled at irregular intervals
Regarding the first question:
I followed your suggestion and set up a similar toy model (I preferred 'hh' instead of 'pas' as I had it ready). The results were interesting but left me with further queries!
The data to be fitted was the shape of an AP (generated with tweaked HH parameters), represented by just 3 points (just before onset, at the peak, and during the after-hyperpolarization). The model (with default HH parameters) was run using fixed step integration. On clicking "Error Value", I observed the following:
- The voltage vs. time graph plotted the entire continuous waveform for the simulated run.
- The MRF Generator window originally showed just a red plot (three points) of the data to be fitted (loaded from file). After the run, it plotted a black waveform with just three points, and these points corresponded to the (time, voltage) value pairs on the voltage graph.
- The Error Value shown was 403.87.
- But when I calculate the sum of squared errors manually, I get 1211.6121.
I am sure that if anything like interpolation were being done, there would certainly be more points and thus a higher Error Value (more terms in the sum of squared errors). So I am left wondering how MRF arrived at the value of 403.87! I'm quite sure I haven't goofed up, but one can never be certain, I suppose.
Points (t, vm) from File:
5 -76.2631
5.725 43.6974
7.625 -76.8252
Points (t, vm) from Simulation
5 -64.9492
5.725 40.9602
7.625 -44.021
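The manual calculation can be reproduced in a few lines of Python (used purely as a calculator). It also shows that the reported 403.87 coincides with the sum of squared errors divided by the number of points, which is what the thread later confirms.

```python
# (t, v) pairs exactly as posted above.
data  = [(5.0, -76.2631), (5.725, 43.6974), (7.625, -76.8252)]
model = [(5.0, -64.9492), (5.725, 40.9602), (7.625, -44.021)]

sq_err = [(vm - vd) ** 2 for (_, vd), (_, vm) in zip(data, model)]
sse = sum(sq_err)          # plain sum of squared errors
mse = sse / len(sq_err)    # the same sum divided by the number of points

print([round(e, 4) for e in sq_err])   # [128.0043, 7.4923, 1076.1155]
print(round(sse, 4), round(mse, 2))    # 1211.6121 403.87
```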
Re: Regarding fitting of data sampled at irregular intervals
Interesting. The three points would be connected by straight line segments, right? Try limiting the interval over which the generator evaluates the error. Try to isolate a region that lies _between_ two of your "original data" points (if you're not sure how, see this part of the MRF tutorial http://www.neuron.yale.edu/neuron/stati ... imize.html), then evaluate the error over this region. If you get a value other than 0, then maybe the data are being interpolated. I wonder if you can reduce the interval width to where it contains only about 2 or 3 solution points, which would allow you to make a quick "manual" calculation that confirms or rules out interpolation.
Re: Regarding fitting of data sampled at irregular intervals
Yes, the three points are connected by straight line segments.
I tried what you suggested about restricting regions. First, I found that it was not possible to position the blue lines so that they lay between two of our points of interest. I even tried using the weight panel and entering the start point/end point to achieve the same thing. The lines would either leave at least one point in between or snap onto each other.
So I continued testing the restricted regions that it did allow. For each of the three points individually it worked fine, with the MRF Error Values being the same as the calculated values:
Point 1: 128
Point 2: 7.4921
Point 3: 1076.1
All is well. In these cases, the model does not plot anything on the MRF Generator graph, as only one point is under consideration.
But when I include multiple points, the trouble arises.
Points 1 & 2 -> MRF = 67.748 vs Calculated = 135.4921
Points 2 & 3 -> MRF = 541.8 vs Calculated = 1083.5921
Points 1 & 3 -> MRF = 389.86 vs Calculated = 1204.1
Points 1, 2 & 3 -> MRF = 403.87 vs Calculated = 1211.5921
(Values are approximate.) In these cases, straight line segments do join the points concerned.
It is weird that the sum of squared errors is lower for all 3 points together than for some combinations of just two points. Not sure what is happening...
Last edited by shailesh on Wed Jun 18, 2014 10:36 am, edited 1 time in total.
Re: Regarding fitting of data sampled at irregular intervals
Since you mention "regions", I infer that we are talking about nrn/lib/hoc/mulfit/e_norm.hoc, with the comment at the beginning of:
Code:
error is weighted sum of normalized error in each region. Region i
has weight[i] and domain boundary[i] < x < boundary[i+1].
Normalized region error is an approximation to the integral of
(y(x) - ydat(x))^2 over the integral of x.
In func efun() in that file, for model and data that are on independent non-uniform grids, the calculation used is:
Code:
e = ydat_.meansqerr($o1.interpolate(xdat_,$o2), dw_)
Here, $o1 is the model trajectory y values and $o2 is the model trajectory t values. ydat_ and xdat_ are the data y and t values. So the model trajectory is interpolated to the data trajectory, and only the resulting values at the data locations are used in the "meansqerr" calculation, which is defined as "return value is sum of w*(v1 - v2)^2 / size".
dw_ depends on the region sizes and the intervals between data points, and is implemented in the above file in the set_w() procedure. That is certainly complicated, and the implementation goal is to make the first comment of this reply true. Let's leave this as an open question pending further code review and testing, and see whether the above is sufficient for you to resolve your test result differences.
I should mention that I believe the entire notion of mean square model-data trajectory difference as a fitness function can certainly be criticised, especially for action potentials. To me, there seem to be two criteria for a fitness function for the praxis method: 1) from a reasonable starting set of parameters, there must be a path to the minimum which is all downhill; 2) the minimum is meaningful in terms of one's judgment of what constitutes a reasonable fit of model to data.
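To make the quoted hoc line concrete, here is a plain-Python sketch of the same two steps: interpolate the model trajectory onto the data times, then take the weighted mean square error ("sum of w*(v1 - v2)^2 / size"). The function names mirror the hoc calls but are hypothetical, and the end-value clamping in `interpolate` is an assumption about out-of-range behaviour, not something stated in this thread. The usage lines plug in the three fixed-step model values posted earlier with unit weights (a single region).

```python
from bisect import bisect_left

def interpolate(x_dest, x_src, y_src):
    """Linearly interpolate y_src(x_src) at each x in x_dest; points outside
    the x_src range take the nearest end value (assumed behaviour)."""
    out = []
    for x in x_dest:
        i = bisect_left(x_src, x)
        if i == 0:
            out.append(y_src[0])
        elif i == len(x_src):
            out.append(y_src[-1])
        else:
            x0, x1 = x_src[i - 1], x_src[i]
            y0, y1 = y_src[i - 1], y_src[i]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out

def meansqerr(v1, v2, w):
    """Sum of w*(v1 - v2)^2 divided by the number of points."""
    return sum(wi * (a - b) ** 2 for a, b, wi in zip(v1, v2, w)) / len(v1)

def efun(y_model, t_model, ydat, xdat, dw):
    """Analogue of e = ydat_.meansqerr($o1.interpolate(xdat_,$o2), dw_)."""
    return meansqerr(ydat, interpolate(xdat, t_model, y_model), dw)

xdat = [5.0, 5.725, 7.625]
ydat = [-76.2631, 43.6974, -76.8252]
t_model = [5.0, 5.725, 7.625]          # only these model values were posted
y_model = [-64.9492, 40.9602, -44.021]
err = efun(y_model, t_model, ydat, xdat, dw=[1.0, 1.0, 1.0])
print(round(err, 2))   # 403.87
```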
Re: Regarding fitting of data sampled at irregular intervals
Sorry, I wasn't sure what you meant by:
"... to make the first comment of this reply, true."
and so wanted to clarify. Was it regarding my first question on this thread:
"1> If the data (voltage vs time) to be fitted is sampled at irregular intervals, does MRF while evaluating the fitness consider only the points provided (loaded from file) or does it implicitly perform a linear interpolation between the data points, and then use all these values for comparison?"
and that, yes, MRF does indeed interpolate between the provided points to evaluate the error value?
Re: Regarding fitting of data sampled at irregular intervals
I was referring to my comment:
Code:
error is weighted sum of normalized error in each region. Region i
has weight[i] and domain boundary[i] < x < boundary[i+1].
Normalized region error is an approximation to the integral of
(y(x) - ydat(x))^2 over the integral of x.
I have been looking at the implementation of the set_w() procedure, and it appears that the comment is, in fact, false. From the point of view of integration, the implementation implicitly assumes that the data in each region are at uniform intervals, since the weight of each data point within a region is constant. Each y(x) - ydat(x) at the x data values is given a weight proportional to the weight of the region it is in.
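To illustrate why a constant per-point weight only approximates the integral when samples are evenly spaced, the sketch below compares giving every sample in a region the same weight with a trapezoid-rule estimate of the region's mean squared error. This is not the set_w() code, just the underlying arithmetic, and the sample values are made up.

```python
def mean_equal_weights(sq_err):
    """Every sample in a region counts equally (implicitly assumes uniform spacing)."""
    return sum(sq_err) / len(sq_err)

def mean_trapezoid(x, sq_err):
    """Trapezoid-rule estimate of integral(sq_err dx) divided by the region width."""
    area = sum(0.5 * (sq_err[i] + sq_err[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))
    return area / (x[-1] - x[0])

# Made-up region with unevenly spaced samples: two close together, one far away.
x = [0.0, 0.1, 1.0]
sq = [0.0, 0.0, 1.0]
print(mean_equal_weights(sq))   # one third
print(mean_trapezoid(x, sq))    # about 0.45
```

With unevenly spaced samples the two estimates can disagree noticeably, because the equal-weight version ignores how much of the region each sample represents.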
Re: Regarding fitting of data sampled at irregular intervals
Based on your comments:
"error is weighted sum of normalized error in each region"
and
"the model trajectory is interpolated to the data trajectory and only the resulting values at the data locations are used in the "meansqerr" calculation which is defined as "return value is sum of w*(v1 - v2)^2 / size"."
I took a second look at the values I posted earlier:
Point 1: 128 (same as calculated)
Point 2: 7.4921 (same as calculated)
Point 3: 1076.1 (same as calculated)
Points 1 & 2 -> MRF = 67.748 vs Calculated = 135.4921
Points 2 & 3 -> MRF = 541.8 vs Calculated = 1083.5921
Points 1 & 3 -> MRF = 389.86 vs Calculated = 1204.1
Points 1, 2 & 3 -> MRF = 403.87 vs Calculated = 1211.5921
I had missed the 'size' earlier; incorporating it when calculating the error sum of squares, I got:
Points 1 & 2 -> MRF = 67.748 vs Calculated = 135.4921/2 = 67.746
Points 2 & 3 -> MRF = 541.8 vs Calculated = 1083.5921/2 = 541.796
Points 1, 2 & 3 -> MRF = 403.87 vs Calculated = 1211.5921/3 = 403.864
... and the values match! Note that all of the above involved a single region containing multiple points.
The only exception is:
Points 1 & 3 -> MRF = 389.86
This involves individual points in two different regions. We find a similar situation when three regions are defined with one point in each region:
Point 1 (region 1, weight 1), Point 2 (region 2, weight 1) & Point 3 (region 3, weight 1) -> MRF = 326.16 (Total weight 1)
These have to be evaluated as:
Error Value = ( (w1*e1/s1) + (w2*e2/s2) + ... + (wN*eN/sN) ) / ( (w1/s1) + (w2/s2) + ... + (wN/sN) )
where
wX : weight assigned to interval (region) #X
eX : error sum of squares obtained in interval (region) #X
sX : size of interval (region) #X (in ms)
N: number of intervals (regions)
The Total weight (scale) simply multiplies the above Error Value to give the final Error Value of the fitness function. This is useful when we have multiple generators and want to adjust the relative contribution (weighting) of each generator to the overall MRF optimization.
As an example:
> Region 1 (5 < t < 5.3625) -> s1 = 0.3625
Weight = 1 = w1
Point 1 @ t = 5, Error sum of squares (-64.9492 vs -76.2631) = 128 = e1
> Region 2 (5.3625 < t < 6.675) -> s2 = 1.3125
Weight = 1 = w2
Point 2 @ t = 5.725, Error sum of squares (40.9602 vs 43.6974) = 7.4921 = e2
> Region 3 (6.675 < t < 7.625) -> s3 = 0.95
Weight = 1 = w3
Point 3 @ t = 7.625, Error sum of squares (-44.021 vs -76.8252) = 1076.1 = e3
From the formula, we get Error Value = 326.15
Total weight (scale) = 1, So the Error Value returned by the generator = 326.15 x 1 = 326.15 ... which matches the earlier mentioned value!
Took me quite a while to figure out that, but once obtained it seemed so straightforward and - should I say - obvious!
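The formula above can be checked numerically with a few lines of Python, using the three (model, data) pairs posted earlier and the region widths from the example. `region_error` is a hypothetical name for the formula as written in this post, which was checked here only for one data point per region.

```python
def region_error(w, e, s, scale=1.0):
    """sum(w_i*e_i/s_i) / sum(w_i/s_i), multiplied by the generator's Total weight."""
    num = sum(wi * ei / si for wi, ei, si in zip(w, e, s))
    den = sum(wi / si for wi, si in zip(w, s))
    return scale * num / den

# Squared errors at the three points (model minus data), as posted earlier.
e1 = (-64.9492 - (-76.2631)) ** 2
e2 = (40.9602 - 43.6974) ** 2
e3 = (-44.021 - (-76.8252)) ** 2

three_regions = region_error(w=[1, 1, 1], e=[e1, e2, e3], s=[0.3625, 1.3125, 0.95])
print(round(three_regions, 2))   # 326.16

# Assumed widths for the "Points 1 & 3" case (not posted in the thread).
two_regions = region_error(w=[1, 1], e=[e1, e3], s=[0.3625, 0.95])
print(round(two_regions, 2))     # 389.86
```

If the two-region case (Points 1 & 3) used region widths of 0.3625 and 0.95 ms, which is an assumption since those widths were not posted, the same formula reproduces the reported 389.86.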
Re: Regarding fitting of data sampled at irregular intervals
So, in the context of my first question on this thread:
"... does MRF while evaluating the fitness consider only the points provided (loaded from file) or does it implicitly perform a linear interpolation between the data points, and then use all these values for comparison?"
I suppose we can summarize that the model trajectory is interpolated to the data trajectory and only the resulting values at the data locations are used in calculating the error value, i.e. the number of points at which the error is evaluated equals the number of points provided in the data trajectory. So, yes, it does perform linear interpolation, but only where required to evaluate the model values at the desired timestamps. (Not sure if related, but the procedure "set_modelx()" in 'e_norm.hoc' appears to perform linear interpolation.)
As a further check of the above, I used adaptive integration to fit a single data point:
5.873875 43.9653
(I confess that the data point was chosen in retrospect.)
The closest points that the model returns are:
5.84661 38.2502
5.90114 36.4325
The MRF generator returns error value = 43.877
The model does not have a value at t = 5.873875 ms, so MRF linearly interpolates between the closest points t1 = 5.84661 ms and t2 = 5.90114 ms. Since t = (t1+t2)/2, similarly v = (v1+v2)/2 = (38.2502+36.4325)/2 = 37.34135
Error Sum of Squares (37.34135 vs 43.9653) = 43.8767 ... which matches the error value returned by the MRF generator!
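The interpolation step of this adaptive-integration check, written out as a short Python calculation that uses only the numbers posted above:

```python
t1, v1 = 5.84661, 38.2502      # model point just before the data time
t2, v2 = 5.90114, 36.4325      # model point just after
t_dat, v_dat = 5.873875, 43.9653

frac = (t_dat - t1) / (t2 - t1)     # t_dat is the midpoint, so this is 0.5
v_model = v1 + frac * (v2 - v1)     # linearly interpolated model value
sq_err = (v_model - v_dat) ** 2     # one point in one region, so no further scaling

print(round(frac, 6), round(v_model, 5), round(sq_err, 4))   # roughly 0.5 37.34135 43.8767
```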
One last question... would you have any tips on digitizing data from figures with a view to using them for fitting? Any dos and don'ts to keep in mind? I suppose the one thing that always applies is your advice to use "one's judgment of what constitutes a reasonable fit of model to data".
Re: Regarding fitting of data sampled at irregular intervals
shailesh wrote: would you have any tips on digitizing data from figures with a view to using them for fitting?
Yes: use the original data, if at all possible. The authors are obligated to preserve it and make it available, or should be, according to scientific principles and the stated policies of scientific journals and funding agencies.
Re: Regarding fitting of data sampled at irregular intervals
Thanks. I agree, it would be best to get the raw data (whenever available) for such purposes.