DEBtool Toolbox: lib_regr
Functions for the identification of parameters of user-defined models are parked in this domain. They also quantify the uncertainty in parameter values and show the goodness of fit in graphs. All functions use a Maximum Likelihood criterion for parameter estimation; the underlying assumptions, the algorithm and the number of variables are indicated in the names of the routines.
The string regr
in the function names stands for "regression", and assumes a constant normally distributed scatter around a deterministic user-defined model.
The script-files mydata_regr
and mydata_regr2
show examples of applications of the regression routines.
The string surv
in the function names stands for "survivor", and assumes a multinomially distributed number of surviving individuals.
The first independent variable always has the interpretation "time".
The script-files mydata_surv
and mydata_surv2
show examples of applications of the survivor routines.
The string vc
in the function name
(nrvcregr
,
nmvcregr
,
nmvcregr2
,
pvcregr
,
mrevc
) stands for "variation coefficient".
These functions assume that the scatter is (independently) normally distributed with a variation coefficient (i.e. the ratio of the standard deviation and the mean) that is constant.
The following table gives a survey of available parameter-identification routines:
general data | survival data | ||||
---|---|---|---|---|---|
algorithm | 1 var | 2 vars | 1 var | 2 vars | 3 vars |
newton raphson | nrregr
nrvcregr |
nrregr2 |
scsurv |
scsurv2 |
scsurv3 |
nelder mead | nmregr
nmvcregr |
nmregr2
nmvcregr2 |
nmsurv |
nmsurv2 |
nmsurv3 |
genetic algorithm | garegr |
garegr2 |
gasurv |
gasurv2 |
gasurv3 |
par-covariances | pregr
pvcregr |
pregr2 |
psurv |
psurv2 |
psurv3 |
goodness of fit | ssq |
ssq2 |
dev |
dev2 |
dev3 |
show routines | shregr |
shregr2 |
shsurv |
shsurv2 |
|
example script | mydata_regr |
mydata_regr2 |
mydata_surv |
mydata_surv2 |
mydata_surv3 |
The uni-variate routines can handle multiple samples, each described by its own model, which allows for the implementation of fuzzy constraints on parameter values. Particular parameters of DEB models occur in respiration as well as in growth data; the maximum throughput rate in a chemostat is frequently known, knowledge that can be used in fitting substrate and biomass data by including this function of parameter values as a new data set.
The Newton Raphson method converges typically fast for models that are sufficiently smooth, but it has a small domain of attraction; the initial parameter estimate must be chosen within that domain for convergion. The jacobian (i.e. the matrix of derivatives of the loss function with respect to the parameters) is obtained numerially using a fixed (small) step in the parameter values. The "best" parameter values are typically rather accurate. The method of scores that is used for survival data, is a Newton Raphson method, applied to the expected values, rather than observed one.
The Nelder Mead method converges much slower, but has a much larger domain of attraction. Since it does not use derivatives it typically works well on models that are not smooth. An initial parameter estimate must still be specified in the domain of attraction to the "best" value. The "best" parameter values are typically less accurate, and it is a good idea to run a Newton Raphson method on the result of a Nelder Mead method.
The Genetic Algorithm converges slowly, and uses a stochastic rather than a deterministic algorithm to arrive at the "best" value. It does not require an initial parameter estimate, just a range in which the "best" value must be. The end result is less sensitive to local minima. It stops if a fixed number of steps does not lead to further improvement (which does not garantee that no improvement is possible). Since genetic algorithms maximize functions, the method is run on minus the weighted sum of squared deviations. It is a good idea to run a Newton Raphson method on the result of a genetic algorithm.
All methods suffer from the problem that the result can represent a local rather than a global minimum (in decreasing sensitivity of the three methods).
It is always a good idea to check the result graphically (using the "show" routines that start with the string sh
,
and to use common sense (both for checking the goodness of fit and the parameter values).
Since the user-interface of the methods is rather similar, only the instructions for the Newton Raphson method are more detailed.s
shregr
Plots model predictions and data(optional). Plot options can be set with routineshregr_options
.
Input as in nrregr
.
The plot-range can set optionally in a vector of length 2, with the lower and upper boundaries.
The columns >1 in the parameter matrix are ignored.
If column 2 in the data matrix is not present, model predictions are plotted only.
The colums >2 in the data matrices are ignored.
Example of use, assuming that function_name
, pars
, and data xy
are defined properly:
shregr('function_name', pars, xy)
, or if the user-defined function codes for two data sets, for instance
shregr('function_name', pars, xy1, xy2)
.
shregr_options
Sets options for plotting routines shregr
and shsurv
.
This routine sets default values which you can overwrite, once you have run shregr
.
Input:
- 'default': resets all
shregr_options
to default values;shregr_options('default')
- string, scalar:
- 'dataset': sets data sets that must be plotted; default is all.
In the case of 4 data sets, of instance, you can select number 2 and 4 by
shregr_options('dataset', [2 4])
- 'Range': plot range for the independent variables.
In the case of 4 data sets, of instance, you can set the plot range of data set 3 from 0 to 15 by
shregr_options('Range', 3, [0 15])
- 'xlabel': x-labels for the plots.
In the case of 4 data sets, for instance, you can label the x-axis of data set 2 with 'time, h' by
shregr_options('xlabel', 2, 'time, h')
- 'ylabel': y-labels for the plots, see
xlabel
- 'all_in_one': all selected data sets are plotted in one graph.
The x,y-labels are those of the first selected data set.
The data points are projected on the curves.
Default: multiplot.
Set single plot by
shregr_options('all_in_one', 1)
- 'dataset': sets data sets that must be plotted; default is all.
In the case of 4 data sets, of instance, you can select number 2 and 4 by
nrregr2
Finds parameter values in (non-linear) weighted least-squares regression problems likenrregr
,
but for 2 independent variables, rather than a single one.
So this routine fits a surface, no a curve.
See nmregr2
for the simplex method, and garegr2
for a genetic
algorithm in 2 independent variables.
Options in the numerical procedure can be set by nrregr_options
.
Input:
- string with name of user-defined function.
The user-defined function must take the two column vectors of values for the independent variable and a single vector-argument with parameter values as inputs, and a matrix value of model predictions as output. The size of the output matrix must match the number of values of the independent variable: the first independent variable in the rows, the second one in the columns - (n,1 or 2)-matrix with initial estimates, see
nrregr
- (nx,1)-matrix with values of first independent variable
- (ny,1)-matrix with values of second independent variable
- (nx,ny)-matrix with values of the dependent variable
- (nx,ny)-matrix with values of the weight coefficients. Optional; default is all ones. The weight coefficients need not sum to unity
Output:
- parameter matrix similar to the parameter input matrix, but the initial estimates are replaced by the values the minimize the weighted sum of squared deviations
(see
ssq2
) - scalar, which equals 0 for unsuccessful termination, and 1 for successful termination. If the option report is set to 1 (default), unsuccessful termination is reported
The iteration is terminated if the norm, i.e. the sum of squared derivetives of sum of squared deviations with respect to the iterated parameters,
is less than the maximum norm or if the number of iterations exceeds a maximum values (see nrregr_options
).
For advices see nrregr
.
There is no need to use the values of the second dependent variable in the user defined function.
This allows the implementation of a multi-variable function in compact form, where all clumns are treated as dependent variables that have
the same values for the single independent variable.
Use nrregr
, if the dependent variables have different values for the independent variable.
Both matrices for the independent variables can have more than one column, to pass information about other variables to the user-defined function.
See cnplim
for an application.
Example of use, assuming that 'xvalues', 'yvalues', 'zdata', 'weights', function 'function_name' and initial paramer estimates 'ipars' are properly defined:
pars = nrregr2('function_name', ipars, xvalues, yvalues, zdata, weights)
.
See script file mydata_regr2.m
for an example of specification.
If progression seems hopeful, but the number of iterations not large enough, you can continue with
pars = nrregr2('function_name', pars, xvalues, yvalues, zdata)
.
Alternatively you can increase the maximum number of iterations with nrregr_options
.
nmregr2
Finds parameter values in (non-linear) weighted least-squares regression problems using Nelder Mead's simplex method. The convergence is usually slow, but the domain of attraction is rather large. Seenrregr2
for the definition of the user-defined function and
nmregr
for one independent variable with multiple samples.
Options in the numerical procedure can be set by nmregr_options
.
It is usually a good idea to run nrregr2
on the result of nmregr2
.
nmvcregr2
Finds parameter values in Maximum Likelihood regression problems using Nead Melder's simplex method. The standard deviation is here taken to be proportional to the mean (unlikenmregr2
, where the standard deviation is assumed to be constant).
Options in the numerical procedure can be set by nmregr_options
.
garegr2
Finds parameter values in (non-linear) weighted least-squares regression problems using a genetic algorithm. The (stochastic) convergence is usually slow, but it does not require an initial estimate, only a range. The process starts from an initial set of (randomly generated) trials. Seenrregr2
for the definition
of the user-defined function and garegr
for one independent variable with multiple samples.
Options in the numerical procedure can be set by garegr_options
.
It is usually a good idea to run nrregr2
on the result of garegr2
.
pregr2
Calculates covariance matrix and standard deviations of parameters in regression models, likepregr
, but for 2 independent variables.
Input as in nrregr2
.
Output (4 items):
- covariance matrix of parameter estimates
- correlation matrix of parameter estimates
- vector of standard deviations of parameter estimates
- weighted sum of squared deviations from model predictions
The elements in the covariance and correlation matrices equal zero for parameters that have code 0 in the second row of the parameter input matrix. The values are the maximum likelihood estimates in the case of a identially normally distributed scatter distribution. Therefore, no corrections for bias are made.
Example of use, assuming that function_name
, pars
, xvalues
, yvalues
, zdata
,
and weights
are defined properly:
[cov, cor, sd, ss] = pregr2('function_name', pars, xvalues, yvalues, zdata, weights)
.
ssq2
Calculates weighted sum of squared deviations from model predictions.
Input as in nrregr2
.
Output:
- scalar with sum of squared deviations.
Identical to 4th output of
pregr2
Example of use, assuming that function_name
, pars
, xvalues
, yvalues
, zdata,
and weights
are defined properly:
ssq2('function_name', pars, xvalues, yvalues, zdata, weights)
.
shregr2
Plots model predictions and data (optionally). Options can be set byshregr2_options
.
It allows you to plot a response surface in 3 dimensions, together with your data, or sets of xz-curves or yz-curves.
The data are projected on the surface, and on the curves, to show the difference between the data and model predictions.
These projections help to reveal which data-point relates to which curve.
Inputs as in nrregr2
;
the data matrix is optional.
The ranges of the two independent variables can be set optionally in two 2-vectors with upper and lower values.
Example of use, assuming that function_name
, pars
, xvalues
, yvalues
, and zdata
are defined properly:
shregr2('function_name', pars, xvalues, yvalues, zdata)
or
shregr2 ('function_name', pars, xvalues, yvalues)
.
shregr2_options
Sets options for plotting routine shregr2
.
This routine sets default values which you can overwrite, once you have run shregr2
.
Input:
- 'default': resets all
shregr2_options
to default values;shregr2_options('default')
- string, scalar:
- 'plotnr': selects the plots that must be shown; default is [1 2];
plotnr 1 is z versus x and plotnr 2 is z versus y.
Select plotnr 1 by
shregr2_options('plotnr', 1)
- 'Range': plot range for the independent variables.
Set plot-range for y-values as independent variables in plot number 2 from 0 to 15 by
shregr2_options('Range', 2, [0 15])
- 'xlabel': x-label for the plots.
You can label the x-data with 'time, h' by
shregr2_options('xlabel', 'time, h')
- 'ylabel': y-label for the plots, see 'xlabel'
- 'zlabel': z-label for the plots, see 'xlabel'
- 'all_in_one':
all selected data sets are plotted in one 3 dimensional graph.
The data points are projected on the surface.
Default: multiplot.
Set single plot by
shregr2_options('all_in_one',1)
- 'plotnr': selects the plots that must be shown; default is [1 2];
plotnr 1 is z versus x and plotnr 2 is z versus y.
Select plotnr 1 by
scsurv
Finds maximum likelihood estimates from survival data using the method of scores, with numerically obtained values for the jacobian. It can deal with an arbitrary number of samples, which might share one or more parameters. The convergence is usually fast, but the domain of attraction can be small, depending on data and model. Seenmsurv
for simplex method, and
scsurv2
and
nmsurv2
for 2 independent variables.
Options in the numerical procedure can be set by scsurv_options
.
Input:
- string with name of a user-defined function for the survival model.
The user-defined function must take a single vector-argument with parameter values and one or more data matrices as inputs, and a vector value of model predictions for each data matrix as output. Examples:f = func(p, tn)
or[f1, f2] = func(p, tn1, tn2)
or[f1, f2, f3] = func(p, tn1, tn2, tn3)
. The rows offi
must match those oftni
. The number of samples is arbitrary (in principle) - (k, 1 or 2)-matrix with initial parameter estimates The initial parameter estimates in column 1, and booleans in column 2, where 0 stands for "keeping the corresponding parameter fixed", and 1 for "iterate the corresponding parameter values". If the second column is omitted, values one are assumed (i.e. all parameters are iterated).
- one or more (n,2) matrices with data: time in column 1 and the number of survivors in column 2. The matrices might differ in number of rows between the samples. The procedure uses only the first two columns, and pass the input matrices to the user defined function. This allows the addition of more columns, which might contain data-point specific information, as a kind of fixed parameters
Output:
- parameter matrix similar to the parameter input matrix, but the initial estimates are replaced by the values that maximize the likelihood function
(see
dev
) - scalar, which equals 0 for unsuccessful termination, and 1 for successful termination. If the option report is set to 1 (default), unsuccessful termination is reported
The iteration is terminated if the norm, i.e. the sum of squared derivetives of the deviance with respect to the iterated parameters,
is less than the maximum norm or if the number of iterations exceeds a maximum values (see scsurv_options
).
Example of use, assuming that data
and function function_name
and initial paramer estimates ipars
are properly defined:
pars = scsurv('function_name', ipars, data)
or
pars = scsurv('function_name', ipars, data1, data2)
(or more data sets, depending on the definition of the model functions.).
See script file mydata_surv.m
for an example of specification.
If progression seems hopeful, but the number of iterations not large enough, you can continue with
pars = scsurv('function_name', pars, data)
.
Alternatively you can increase the maximum number of iterations with scsurv_options
.
scsurv_options
Sets options for parameter finding routinescsurv
and scsurv2
and scsurv3
.
These routines set default value which you can overwrite. See nrregr_options
for further explanation.
nmsurv
Finds maximum likelihood estimates from survival data using Nelder Mead's simplex method. The convergence is usually slow, but the domain of attraction is rather large. Seescsurv
for the definition of the user-defined function, and
scsurv2
and
nmsurv2
for 2 independent variables and
scsurv3
and
nmsurv3
for 3 independent variables.
Options in the numerical procedure can be set by nmsurv_options
.
It is usually a good idea to run scsurv
on the result of nmsurv.
nmsurv_options
Sets options for parameter finding routinenmsurv
, nmsurv2
and nmsurv3
.
These routines set default value which you can overwrite.
See nmregr_options
for further explanation.
gasurv
Finds maximum likelihood estimates from survival data using a genetic algorithm. The (stochastic) convergence is usually slow, but it does not require an initial estimate, only a range. The process starts from an initial set of (randomly generated) trials. Seescsurv
for the definition of the user-defined function, and
scsurv2
,
nmsurv2
and
gasurv2
for 2 independent variables and
scsurv3
,
nmsurv3
and
gasurv3
for 3 independent variables.
See
garegr
for a description of the outputs.
Options in the numerical procedure can be set by gasurv_options
.
It is usually a good idea to run scsurv
on the result of gasurv
.
gasurv_options
Sets options for parameter finding routinegasurv
, gasurv2
and gasurv3
.
These routines set default value which you can overwrite.
See garegr_options
for further explanation.
psurv
Calculates covariance matrix and standard deviations of parameters in survivor models.
Input:
user-defined function name, data matrix, and parameter values, as in scsurv
.
Output:
- covariance matrix of parameter estimates
- correlation matrix of parameter estimates
- vector of standard deviations of parameter estimates
- deviance
The elements in the covariance and correlation matrices equal zero for parameters that have code 0 in the second row of the parameter input matrix. The values are the maximum likelihood estimates, therefore, no corrections for bias are made.
Example of use, assuming that function_name
, pars
, and tn1
(and possibly more data matrices) are defined properly:
[cov, cor, sd, ss] = psurv('function_name', pars, tn1, tn2, ...)
.
dev
Calculates the deviance: two times the difference between the log likelihood function and its maximum (which based on the multinomial distribution where the death probabilities correspond with the observed relative frequencies). The maximum likelihood estimates minimize the deviance. Seedev2
for the surface-equivalent.
Input:
user-defined function name, data matrix, and parameter values, as in scsurv
.
Output:
scalar with deviance. Identical with 4th output of psurv
.
Example of use, assuming that function_name
, pars
, and tn1
(and possibly more data matrices) are defined properly:
dev('function_name', pars, tn1, tn2, ...)
.
plsurv
Calculates the profile likelihood function for a parameter that is indicated with code2
in the second column of the parameter matrix.
The other parameters with positive codes are ml-estimated using the simplex method for each evaluation of the target parameter.
See plsurv2
for response surfaces.
Input:
user-defined function name, parameter values, and one or more data matrices as in scsurv
.
The last argument is a 2 or 3 -vector with the lower and upper boundaries of the indicated parameter.
The value of that parameter in the parameter matrix must be within the range.
The optional element 3 specifies the numer of parameter evaluations; default: 100.
Output:
- (99,2)-matrix with parameter values and deviance
- scalar with value 1 for successful convergence at all parameter evalation points; and value 0 with one or more convergence problems
Example of use, assuming that function_name
, pars
, tn1
(and possibly more data matrices) and Range
are defined properly:
plsurv('function_name', pars, tn1, tn2, ..., Range)
.
shsurv
Plots model predictions and data (optional). Plot options can be set with routineshsurv_options
.
Input:
user-defined function name, parameter values, and one or more data matrices as in scsurv
.
The plot-range can set optionally in a vector of length 2, with the lower and upper boundaries.
The columns >1 in the parameter matrix are ignored.
If column 2 in the data matrix is not present, model predictions are plotted only.
The colums >2 in the data matrices are ignored.
Example of use, assuming that function_name
, pars
, and data tn
are defined properly:
shsurv('function_name', pars, tn)
, or if the user-defined function codes for two data sets, for instance
shsurv('function_name', pars, tn1, tn2)
.
Set options for plotting routine shsurv
with shregr_options
.
scsurv2
Finds maximum likelihood estimates from survivor data likescsurv
, but for an additional independent variable,
rather than time only. So this routine fits a surface, no a curve.
Options in the numerical procedure can be set by scsurv_options
.
Input:
- string with the name of a user-defined function for the survival model. The user-defined function must take the two column vectors of values for the independent variable and a single vector-argument with parameter values as inputs, and a matrix value of model predictions as output. The size of the output matrix must match the number of values of the independent variable: the first independent variable (time) in the rows, the second one in the columns
- (k, 1 or 2)-matrix with initial parameter estimates in column 1 and booleans in column 2, where 0 stands for "keeping the corresponding parameter fixed", and 1 for "iterate the corresponding parameter values". If column 2 is omitted, values one are assumed (i.e. all parameters are iterated)
- (nt,1 or more)-matrix with time points
- (ny,1 or more)-matrix with second independent variable The second and other columns can accomodate modifying factors. The extra columns are just passed to the user-defined function, see e.g.
- (nt,ny)-matrix with the number of surviving individual
fomortph
for an example;
notice that the plotting routine shsurv2
then only works properly with option plotnr
is either 1 or 2.
Output:
- parameter matrix similar to the parameter input matrix, but the initial estimates are replaced by the values the that maximize the likelihood function
(see
dev2
) - scalar, which equals 0 for unsuccessful termination, and 1 for successful termination. If the option report is set to 1 (default), unsuccessful termination is reported
The iteration is terminated if the norm, i.e. the sum of squared derivetives of the deviance with respect to the iterated parameters,
is less than the maximum norm or if the number of iterations exceeds a maximum values
(see scsurv_options
).
Example of use, assuming that 'tvalues', 'yvalues', 'numbers', function 'function_name' and initial paramer estimates 'ipars' are properly defined:
pars = scsurv2('function_name', ipars, tvalues, yvalues, numbers)
.
See sample file 'mydata_surv2.m' for an example of specification.
If progression seems hopeful, but the number of iterations not large enough, you can continue with
pars = scsurv2('function_name', pars, tvalues, yvalues, numbers)
.
Alternatively you can increase the maximum number of iterations with scsurv_options
.
nmsurv2
Finds maximum likelihood estimates from survival data using Nelder Mead's simplex method. The convergence is usually slow, but the domain of attraction is rather large. Seescsurv2
for the definition of the user-defined function, and
scsurv
and
nmsurv
for one independent variable (time) with multiple samples.
Options in the numerical procedure can be set by nmsurv_options
.
It is usually a good idea to run scsurv2
on the result of nmsurv2
.
gasurv2
Finds maximum likelihood estimates from survival data using a genetic algorithm. The (stochastic) convergence is usually slow, but it does not require an initial estimate, only a range. The process starts from an initial set of (randomly generated) trials. Seescsurv2
for the definition of the user-defined function, and
scsurv
,
nmscsurv
and
gasurv
for one independent variable (time) with multiple samples.
Options in the numerical procedure can be set by gasurv_options
.
It is usually a good idea to run scsurv2
on the result of gasurv2
.
psurv2
Calculates covariance matrix and standard deviations of parameters in survivor models, likepsurv
, but for 2 independent variables.
Input:
user-defined function name, 2 vectors of independent variables, a matrix with surviving individuals, and parameter values, as in scsurv2
.
Output:
- covariance matrix of parameter estimates
- correlation matrix of parameter estimates
- vector of standard deviations of parameter estimates
- deviance
The elements in the covariance and correlation matrices equal zero for parameters that have code 0 in the second row of the parameter input matrix. The values are the maximum likelihood estimates, therefore, no corrections for bias are made.
Example of use, assuming that function_name
, pars
, tvalues
, yvalues
, numbers,
are defined properly:
[cov, cor, sd, ss] = pregr2('function_name', pars, tvalues, yvalues, numbers)
.
dev2
Calculates deviance, seedev
.
Input:
user-defined function name, 2 vectors of independent variables, a matrix with surviving individuals, and parameter values, as in scsurv2
.
Output:
scalar with the deviance. Identical with 4th output of psurv2
.
Example of use, assuming that function_name
, pars
, tvalues
, yvalues
, numbers
are defined properly:
dev2('function_name', pars, tvalues, yvalues, numbers)
.
plsurv2
Calculates the profile likelihood function for a parameter that is indicated with code2
in the second column of the parameter matrix.
The other parameters with positive codes are ml-estimated using the simplex method for each evaluation of the target parameter.
See plsurv
for the multiple sample case.
Input:
user-defined function name, parameter values, times, second dependent variable, data matrix as in scsurv2
.
The last argument is a 2 or 3 -vector with the lower and upper boundaries of the indicated parameter.
The value of that parameter in the parameter matrix must be within the range.
The optional element 3 specifies the numer of parameter evaluations; default: 100.
Output:
- (99,2)-matrix with parameter values and deviance
- scalar with value 1 for successful convergence at all parameter evalation points; and value 0 with one or more convergence problems
Example of use, assuming that
function_name
,
pars
(parameter matrix),
t
(time-vector),
y
(vector with second dependent variable),
tny
(matrix with numbers of survivors) and
Range
(2-vector with lower and upper boundary for target parameter) are defined properly:
plsurv('function_name', pars, t, y, tny, Range)
.