Sunday, 31 May 2015

optimoptions


Answering some questions to better understand the behavior of optimoptions.
Although the answers have been discussed in [1], I am pasting some examples for clarity.
*I was using the terms solver/algorithm in the wrong context before. Solver names are functions such as lsqnonlin, fmincon, etc. and one solver can have multiple algorithms such as interior-point, lev-mar, etc. *

1) Is there an error or warning if optimoptions is used to assign an option not contained in the specified algorithm? 

If the option belongs to a different algorithm of the solver, Matlab stores it as options "not" used by the current algorithm so if we change the algorithm to which the option belongs, we do not need to set it again. 

For eg. The default algorithm of lsqnonlin is trust-region-reflective but when I try to set the option 'ScaleProblem' which is specific to 'levenberg-marquardt' algorithm, I get: 

opts = optimoptions ('lsqnonlin', 'ScaleProblem', 'Jacobian')

opts = 

  lsqnonlin options:

   Options used by current Algorithm ('trust-region-reflective'):
   (Other available algorithms: 'levenberg-marquardt')

   Set by user:
     No options set by user.

   Default:

 Algorithm: 'trust-region-reflective'
    DerivativeCheck: 'off'
     .               .  
     .               .  
     .               .  
     .               .  
    TolX: 1.0000e-06

   Show options not used by current Algorithm ('trust-region-reflective')

Set by user:
    ScaleProblem: 'jacobian'

This gives the same result using dot notation:

opts = optimoptions ('lsqnonlin')
opts.ScaleProblem = 'Jacobian'

2) ... or to assign an option that does not exist in the specified solver or any solver?

This gives an error. Trying to set the option for SQP Algorithm, which is not used by lsqnonlin:

opts=optimoptions('lsqnonlin', 'MaxSQPIter', 100)
Error using optimoptions
'MaxSQPIter' is not an option for LSQNONLIN

using dot notation:

opts.MaxSQPIter = 100;
No public field MaxSQPIter exists for class optim.options.Lsqnonlin.
 
3) If options are transfered to a different solver with optimoptions, are there errors or warnings if the new solver does not have some of these options?

No errors or warnings. The options common to both solvers are copied. They could be options of different algorithms. For eg. 

opts = optimoptions ('fmincon', 'Algorithm', 'sqp', 'TolX', 1e-10) 
opts_lsq = optimoptions ('lsqnonlin', opts) 

Options set in opts_lsq are: 
                PrecondBandWidth: 0 
                TolX: 1.0000e-10 

The option PrecondBandwidth belongs to the trust-region algorithm of fmincon solver. 
Another option copied from opts in opts_lsq belongs to the lev-mar algorithm of lsqnonlin. It is the stored option as mentioned in 1) 

  ScaleProblem: 'none' 
 
4) Can options returned by optimoptions be used with optimset, and   vice versa? 

This returns an error in both cases:

opts = optimoptions ('lsqnonlin','MaxIter',100);
opts = optimset (opts, 'TolX', 1e-6);
Error using optimset
Cannot use OPTIMSET to alter options created using OPTIMOPTIONS.
Use OPTIMOPTIONS instead.

opts = optimset ('TolX', 1e-6);
opts = optimoptions ('lsqnonlin', opts);
Error using optimoptions
Invalid option name specified. Provide a string (such as 'Display').

References: 
[1] http://octave.1599824.n4.nabble.com/lsqnonlin-and-nonlin-residmin-tp4670540p4670569.html


lsqnonlin wrapping nonlin_residmin

So the first week of GSoC is officially over.

I was working on lsqnonlin. My code is accessible here:

https://github.com/AsmaAfzal/octave_workspace/blob/master/lsqnonlin.m


[x, resnorm, residual, exitflag, output, lambda, jacobian]...
= lsqnonlin (fun, x0, lb, ub, options)

[x, resid, cvg, outp] = nonlin_residmin (f, x0, settings)

A recap of encountered problems:
  1. output lambda- it wasn't previously returned from the backend [1]
  2. options- In addition to optimset, Matlab uses optimoptions to set options specific to a certain optimization function. optimoptions
    • Creates and modifies only the options that apply to a solver
    • Shows your option choices and default values for a specific solver/algorithm
    • Displays links for more information on solver options and other available solver algorithms [2].
    octave currently does not have this functionality.  For more discussion on this, check [3]
Things to do:
  1. check how user-specified jacobian is to be provided.
  2. check for matrix/complex inputs.
  3. come up with a plan for writing optimoptions in octave.
 References:
[1] http://octave.1599824.n4.nabble.com/lsqnonlin-and-nonlin-residmin-tp4670540p4670557.html
[2] http://uk.mathworks.com/help/optim/ug/optimoptions-and-optimset.html
[3] http://octave.1599824.n4.nabble.com/lsqnonlin-and-nonlin-residmin-tp4670540p4670560.html

Saturday, 23 May 2015

Quadratic Programming quadprog, qp

Just a post to review the things I've learned over the past week:

Intuitive explanation of Lagrange multipliers [1]. Given a D-dimensional function to optimize and a set of a few equality constraints.
  • The vector normal to the function should be a scaled version of the vector normal to the constraint function (In other words, both normals are parallel to each other). This scaling factor is the Lagrange multiplier corresponding a particular constraint. 
  • $\nabla f=\lambda_1\nabla g+\lambda_2 \nabla h$, where $g$ and $h$ are two constraint functions.

KKT Conditions: Generalized method of Lagrange multipliers to be applicable on Inequality constraints.  $g_i(x)-b_i\geq 0$

  • Feasibility- $g_i(x^*)-b_i\geq 0$
  • Stationarity- $\nabla f(x^*)-\sum\limits_i\lambda_i^*\nabla g_i(x^*)=0$
  • Complementary Slackness $\lambda_i^*(g_i(x^*)-b_i)=0$
  • Positive Lagrange multipliers $\lambda_i \geq 0, \forall i$

In case we obtain negative Lagrange multiplier, the constraint corresponding to the most negative multiplier is removed and the optimization is performed again until all multipliers are positive.

A bit about active set algorithm [2]:

  • Possibility that none of the constraints are active or may be some are active. We only need to solve for equality constraints that are active at the optimum (binding).
  • When we have an active set $S^*, x^* \in F$, where $F=\{x|A x^*\leq b\}$,$\lambda^* \geq0$, where $\lambda^*$ is the set of Lagrange multipliers for equality constraints $Ax=b$ 
Algorithm:
  • Start at $x=x_0$ and initial active set
  • Calculate $x^*_{EQP}, \lambda_{EQP}^*$ which minimizes the EQP defined by the current active set. Two possible outcomes:
    1. $x^*_{EQP}$ is feasible  ($x_{EQP}^* \in F$). Set $x=x_{EQP}$ and check Lagrange mulitpliers $\lambda_{EQP}^*$. If positive, solution found! Otherwise, remove constraint with $\min(\lambda_{EQP}^*)$ and repeat.
    2. $x^*_{EQP}$is infeasible. We move as far as possible along the line segment from $x_0$ to $x^*_{EQP}$ while staying feasible. Add to $S$ the constraint we encounter that prevents further progress. This is the blocking constraint.

Quadratic programming:

$\min\limits_{x}\frac{1}{2} x^THx + x^Tq$
s.t.
$ A_{eq} x = b_{eq}$
$lb \leq x \leq ub$
$A_{lb} \leq A_{in}x \leq A_{ub}$

What qp.m does:

 [xobjinfolambda] = qp (x0HqAeqbeqlbubA_lbA_inA_ub)     
  • Checks feasibility of initial guess $x_0$
  • Checks size of inputs and that they make sense.
  • Checks if bounds lb,ub too close or A_lb or A_ub too close. If they are very close then the inequality is treated as an equality constraint instead.
  • Checks if any bound is set to Inf or -Inf. qp simply strikes it off.
  • Calls backend solver __qp__ using null space active set algorithm. 
The ordering of lambda

  • quadprog returns Lagrange multipliers in a structure (with fields upper, lower, eqlin, ineqlin) and the multipliers corresponding to the constraints not provided are left empty.  
  • In qp, lambda is a column vector with Lagrange multipliers associated to the constraints in the following order: [equality constraints; lower bounds; upper bounds; other inequality constraints]
  • The length of lambda vector output from qp depends on the number of different constraints provided as  input.
  • Two issues in wrapping qp.m
    1.  the order, i.e. the position of the bounds constraints within the inequality constraints) is not specified by qp.m. The code could change and the ordering too. 
    2. qp.m strips off the INF constraints before calling __qp__ but does not process the lambda (returned by __qp__) accordingly.
    • Solution:
      • If this order is "specified" then we could extract parts of lambda. Patch for Inf checks in qp output lambda will make things easier but is not critical.
References
[1] http://www.slimy.com/~steuard/teaching/tutorials/Lagrange.html
[2] https://www.youtube.com/user/ucmath352/videos

Saturday, 9 May 2015

Nonlinear Regression and 'nlinfit'

In MATLAB, all three fucntions 'lsqnonlin', 'lsqcurvefit' and 'nlinfit' are used to perform non-linear curve fitting.

To better understand the differences and similarities in these functions, consider the model function:
$y= \beta_1+\beta_2  \text{exp}(-\beta_3x)$

We wish to estimate the $\beta=\{\beta_1,\beta_2,\beta_3\}$ for the set of independents {$x_i$} and observed values {$y_i$} such that the model fits the data.

Both 'nlinfit' and 'lsqcurvefit' are very similar as we can pass the regression function to compute the parameters. 'lsqnonlin' on the other hand, solves optimization problems of the type $min_{\beta} \sum_k f_k(\beta)^2$, so we cannot directly specify the regression function and instead, an error function has to be provided.  This is shown in the code below:


modelfun = @(b,x)(b(1)+b(2)*exp(-b(3)*x));
b = [1;3;2]; %actual
x = exprnd(2,100,1); %independents
y = modelfun(b,x) + normrnd(0,0.1,100,1); %noisy observation
beta0 = [2;2;2]; %guess
beta = nlinfit(x,y,modelfun,beta0)
beta = lsqcurvefit(modelfun,beta0,x,y)
beta = lsqnonlin(@(b)err_fun(b,x,y),beta0) %err_fun = modelfun-y

All three functions generate:

beta =

    1.0071
    3.0805
    2.1418

Observations:
  • lsqcurvefit is more superior in the sense that we can define the bounds for the design variable (unlike nlinfit) while inputting the observed values separately (unlike lsqnonlin). 
  • Nlinfit provides extra statistics such as covariance matrix of the fitted coefficients and information about error model.
  • As an alternative to defining weights for the observed values in 'nlinfit', 'RobustWgtFtn' option can choose from different pre-defined weight functions for robust regression (with robust regression, fitting criterion is not as vulnerable to unusual data as least squares weighting function.)


References:

Tuesday, 5 May 2015

Project Goals

Here, I list down the project goals as stated in my Wiki page:

          Start of GSoC (May) 
  1. 'lsqnonlin' using 'nonlin_residmin'
  2. 'lsqcurvefit' using 'nonlin_curvefit', 'nonlin_residmin', or 'lsqnonlin',
  3. 'fmincon' using 'nonlin_min',
  4. 'nlinfit' using 'leasqr',
    Midterm
     
  5. Test cases for the above functions [10] .
  6. Instead of wrappers for top-level functions like qp, call back-end function (__qp__) to be able to extract lambda. See [11].
    Stretch Goals
  7.  Further missing functions in Optim package. See [12] Implement another back-end algorithm/add variant.
 Details

 6.  quadprog and lsqlin should call a private intermediate function instead of qp.m
 This private function should do the argument processing for calling __qp__. It could also be configured to call yet to be written alternatives to __qp__ .
Among other things this should make ordering of the 'lambda' output feasible.

((I have yet to study __qp__ and how this will be done))

GSoC Acceptance :)

I got selected to work on Octave GSoC! ( Yeaa! :) )

Here is the link to my Wiki application:

http://wiki.octave.org/User:Asma

The project officially starts on 25th May, 2015. During this community bonding period, I will try to get a clear breakdown of the project goals with the help of my mentors.