Wednesday, 24 June 2015

Progress Update: Midterm Evaluation


Objective:
Adding functions to the Optim package for Octave using existing back-ends.

Expected deliverables before midterm:
  • 'lsqnonlin' using 'nonlin_residmin'
Done in [1]. There are differences in the back-ends: nonlin_residmin currently uses the "lm_svd" algorithm as its only back-end, whereas lsqnonlin in Matlab can choose between the "trust-region-reflective" and "Levenberg-Marquardt" (LM) algorithms.
Another difference concerns complex inputs. lm_svd does not support complex-valued inputs, whereas Matlab's LM algorithm can accept complex input arguments. One way of providing complex inputs to lsqnonlin in Octave is to split the real and imaginary parts into separate variables and run the optimization (see the sketch after this list).
  • 'lsqcurvefit' using 'nonlin_curvefit', 'nonlin_residmin', or 'lsqnonlin'
Done in [2] using nonlin_curvefit. lsqcurvefit is very similar to lsqnonlin, with only a few minor interface differences: lsqcurvefit explicitly takes the independent variables and the observations as inputs, whereas with lsqnonlin these values have to be wrapped inside the objective function. Bounds on the optimized parameters can also be specified.
  • 'nlinfit' using 'leasqr',
I wrapped nlinfit around nonlin_curvefit and curvefit_stat, since leasqr repeats the optimization to compute the additional statistics (Jacobian and covariance matrix) while curvefit_stat avoids this computational overhead. I have partially implemented nlinfit in [3] (it hasn't been thoroughly reviewed yet). Two missing features are: 1) error models and error-parameter estimation, and 2) the robust weight function. No corresponding functionality currently exists in Octave's optimization back-ends for these features. My current implementation supports an array of positive scalar weights as input for weighted regression.
Since nlinfit comes from Matlab's Statistics Toolbox, it uses statset and statget to create and retrieve options, respectively. I created additional functions statset, statget and __all_stat_opts__ with minor changes to the code of optimset, optimget and __all_opts__.
  • 'fmincon' using 'nonlin_min',
In progress [4].
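Returning to the complex-input point under lsqnonlin above, here is a hypothetical sketch of the splitting approach (the residual function and data are made up for illustration):

z = [1+2i; 3-1i];                              # complex "data"
cfun = @(p) (p(1) + 1i*p(2)) * [1; 2] - z;     # complex residual in two real parameters
rfun = @(p) [real(cfun (p)); imag(cfun (p))];  # stack real and imaginary parts
[p, resnorm] = lsqnonlin (rfun, [0; 0]);       # optimize over the real variables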


Future goals:
    1. Complete the fmincon implementation.
    2. Create solver-specific options using optimoptions, while ideally still being able to use optimget.
    3. Arrange the lambda output for quadprog by wrapping it around __qp__ instead of qp.m.
    4. Write test cases for all the implemented functions.

[1] https://github.com/AsmaAfzal/octave_workspace/blob/master/lsqnonlin.m
[2] https://github.com/AsmaAfzal/octave_workspace/blob/master/lsqcurvefit.m
[3] https://github.com/AsmaAfzal/octave_workspace/blob/master/nlinfit.m
[4] https://github.com/AsmaAfzal/octave_workspace/blob/master/fmincon.m

Monday, 22 June 2015

Week 4: fmincon wrapping nonlin_min

Time flies.. A third of the way through already..

fmincon/nonlin_min is the most elaborate function of all those I have implemented so far,
so before the actual coding I would like to thoroughly check the mapping of arguments
and options.

[x,fval,exitflag,output,lambda,grad,hessian] = fmincon (fun,x0,A,b,Aeq,beq,lb,ub,nonlcon,options)
[x,fval,exitflag,output] = nonlin_min (fun,x0,settings)

A, b, Aeq, beq- Linear inequality and equality constraints
Inequality constraints:
Matlab standard: A * x - b <= 0 ,
Octave standard: m.' * x  + v >= 0. 
This implies: m=-A.' , v=b.
Set in Octave using 
optimset ("inequc", {m,v})
Similar for equality constraints:
optimset ("equc", {m,v})

lb, ub- Lower and upper bounds
Set in Octave using optimset
 optimset ("lbound", ..., "ubound",...)

nonlcon- Nonlinear constraint function handle
In Matlab, nonlinear constraints are given in a function with the following format:

function [c,ceq,gradc,gradceq] = mycon(x)
c = ...     % Compute nonlinear inequalities at x.  
ceq = ...   % Compute nonlinear equalities at x.
% Optional output arguments:
gradc = ...   % Gradient of c(x).
gradceq = ...   % Gradient of ceq(x).
options = optimoptions('fmincon','GradConstr','on')

Alternative to nonlcon in nonlin_min
  optimset ("equc", {constraint_function})

Options- Options common to both fmincon and nonlin_min
User-supplied Gradient
In Matlab, the objective function must return the gradient as its second output when the
GradObj option is set:
optimset('GradObj','on')
In Octave, the gradient is supplied as a separate function handle:
optimset("objf_grad",@fun)
User-supplied Hessian
In Matlab, the objective function must return the Hessian as its third output when the
Hessian option is set:
optimset('Hessian','user-supplied')
In Octave, the Hessian is supplied as a separate function handle:
optimset("objf_hessian",@fun)

Lambda- returned as a structure in Matlab, but in Octave as a vector stored in a field of
the "output" output argument.

Things to investigate:
1. Algorithm mapping. 
2. Exitflag mapping.
3. Setting gradc (the gradient of general equality/inequality constraint functions). The
second entry of the equc/inequc setting implements this feature in Octave, as stated in
optim_doc, but I was unable to make it work properly.
4. Returning additional statistics (Hessian, gradient)- I previously used
residmin_stat/curvefit_stat for lsqnonlin and lsqcurvefit; no analogous function exists for nonlin_min.
5. Rearranging values of lambda in the fields of a structure.

P.S. This week I spent quite some time with Mercurial and figuring out how to work within the optim package. There was a slight miscommunication/confusion, but it is cleared up now and I will continue to publish my code on GitHub.




Sunday, 14 June 2015

Week 3: nlinfit, statset, statget and __all_stat_opts__

So this week I achieved the following milestones:

Wrapping nlinfit on nonlin_curvefit

[beta,R,J,CovB,MSE,ErrorModelInfo] = nlinfit(X,Y,modelfun,beta0,options,Name,Value)

Implementation [1]:

I chose not to wrap nlinfit around lsqcurvefit because:
  1. We might end up wrapping lsqcurvefit around lsqnonlin eventually, so that interface is still undecided.
  2. The default options of lsqnonlin/lsqcurvefit differ from those of nlinfit.
Missing features:
  1. RobustWgtFun - The RobustWgtFun field of options can be given a function handle that computes robust weights from the residuals at every iteration. The back-end optimization algorithm in Octave currently does not support this functionality.
  2. Name-value pairs. Currently the only implemented one is "weights", which takes an array of weights for weighted optimization. "ErrorModel" and "ErrorParameters" are not implemented. The possible error models include "constant", "proportional" and "combined". An error model also translates into a weight function, which helps reduce the effect of outliers.
  3. ErrorModelInfo - an output field which gives information about the error variance and the estimated parameters of the error model.
Setting options using statset

In Matlab, options for the Statistics Toolbox are set using statset [2]. The functionality is almost identical to optimset; the separate functions exist only because the options belong to different toolboxes (statset for the Statistics Toolbox, optimset for the Optimization Toolbox).

Added functions:
statset.m, statget.m and __all_stat_opts__.m [3]-[5]
Creating these functions was pretty straightforward.
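A quick usage sketch of the wrapped nlinfit with statset (the model, data and weights below are made up for illustration):

x = (1:10).';
y = 2 * exp (-0.3 * x) + 0.05 * randn (10, 1);
modelfun = @(b, x) b(1) * exp (b(2) * x);
opts = statset ("TolFun", 1e-8, "MaxIter", 200);
w = ones (10, 1);                                # positive weights
beta = nlinfit (x, y, modelfun, [1; -0.1], opts, "weights", w);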

Still to do:
  1. Check whether the weighted residual and weighted Jacobian outputs in Octave are consistent with Matlab's, and further refine the functions based on feedback from my mentors.
  2. Move on to fmincon wrapping nonlin_min.
[1] https://github.com/AsmaAfzal/octave_workspace/blob/master/nlinfit.m
[2] http://uk.mathworks.com/help/stats/statset.html
[3] https://github.com/AsmaAfzal/octave_workspace/blob/master/statset.m
[4] https://github.com/AsmaAfzal/octave_workspace/blob/master/statget.m
[5] https://github.com/AsmaAfzal/octave_workspace/blob/master/__all_stat_opts__.m

Tuesday, 9 June 2015

Week 2: lsqnonlin and lsqcurvefit

A bit late blogging about week 2. 

Almost completed the functions lsqnonlin and lsqcurvefit, and successfully mapped the user-specified Jacobian.
In Matlab, if the Jacobian option is set to "on", the model function must return the
Jacobian as its second output.
In Octave, the Jacobian function handle is given to the dfdp option using optimset (see the sketch below).
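A minimal sketch of the Octave side (the residual function, its Jacobian and the data are made up for illustration):

t = (0:9).';
y = 3 * exp (-0.5 * t);
fun  = @(p) p(1) * exp (p(2) * t) - y;                 # residual function
dfdp = @(p) [exp(p(2)*t), p(1) * t .* exp(p(2)*t)];    # its Jacobian w.r.t. p
settings = optimset ("dfdp", dfdp);
[x, resid] = nonlin_residmin (fun, [1; -0.1], settings);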

lsqnonlin function description:

[x,resnorm,residual,exitflag,output,lambda,jacobian] = ...
lsqnonlin(fun,x0,lb,ub,options)

This function maps on:

[x, residual, exitflag, output] = nonlin_residmin (fun, x0, options)

Features of lsqnonlin in Octave:
  • Input arguments: Acceptable forms are lsqnonlin(fun,x0), lsqnonlin(fun,x0,lb,ub) and lsqnonlin(fun,x0,lb,ub,options)
  • Outputs
    • x, exitflag, residual and output currently same as nonlin_residmin.
    • resnorm=sum(residual.^2)
    • Lambda is computed using the complementary pivoting in __lm_svd__.
      Its values differ from Matlab's due to the difference in back-ends.
    • Jacobian is computed using the function residmin_stat ().


lsqcurvefit function description:

[x,resnorm,residual,exitflag,output,lambda,jacobian] = ...
lsqcurvefit(fun,x0,xdata,ydata,lb,ub,options)

This function maps on

[x, fy, exitflag, output] = nonlin_curvefit (fun, x0, xdata, ydata, options)

Features of lsqcurvefit in Octave:
  • Input arguments: Acceptable forms are lsqcurvefit (fun,x0,xdata,ydata), lsqcurvefit (fun,x0,xdata,ydata,lb,ub) and lsqcurvefit (fun,x0,xdata,ydata,lb,ub,options)
  • Outputs
    • x, exitflag and output currently same as nonlin_curvefit.
    • residual = fy-ydata, resnorm = sum(residual.^2)
    • Lambda and Jacobian same as in lsqnonlin.
There are only minor interface differences between lsqcurvefit and lsqnonlin.
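A quick usage sketch of the wrapped lsqcurvefit (model and data are made up; the model takes the parameters first and the independent variable second, as in Matlab):

xdata = linspace (0, 3, 20).';
ydata = 2.5 * exp (-1.3 * xdata);
model = @(p, xdata) p(1) * exp (p(2) * xdata);
[p, resnorm, residual] = lsqcurvefit (model, [1; -1], xdata, ydata);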

This week's plan:

  • Hopefully, with lsqnonlin and lsqcurvefit wrapped up, I'll move on to nlinfit
  • Three key challenges need to be addressed when wrapping nlinfit using nonlin_curvefit
    and curvefit_stat:
    • Weight functions: nonlin_curvefit currently has no functionality for user-specified
      weight functions to perform robust regression (weights recomputed by the specified
      function at every iteration).
    • Error models and ErrorModelInfo.
    • Setting options using statset instead of optimset or optimoptions.

Sunday, 31 May 2015

optimoptions


Answering some questions to better understand the behavior of optimoptions.
Although the answers have been discussed in [1], I am pasting some examples for clarity.
*I was using the terms solver/algorithm in the wrong context before. Solver names are functions such as lsqnonlin, fmincon, etc., and one solver can have multiple algorithms, such as interior-point, lev-mar, etc.*

1) Is there an error or warning if optimoptions is used to assign an option not contained in the specified algorithm? 

If the option belongs to a different algorithm of the same solver, Matlab stores it among the options "not" used by the current algorithm, so if we later switch to the algorithm to which the option belongs, we do not need to set it again.

For example, the default algorithm of lsqnonlin is 'trust-region-reflective', but when I try to set the option 'ScaleProblem', which is specific to the 'levenberg-marquardt' algorithm, I get:

opts = optimoptions ('lsqnonlin', 'ScaleProblem', 'Jacobian')

opts = 

  lsqnonlin options:

   Options used by current Algorithm ('trust-region-reflective'):
   (Other available algorithms: 'levenberg-marquardt')

   Set by user:
     No options set by user.

   Default:

 Algorithm: 'trust-region-reflective'
    DerivativeCheck: 'off'
     .               .  
     .               .  
     .               .  
     .               .  
    TolX: 1.0000e-06

   Show options not used by current Algorithm ('trust-region-reflective')

Set by user:
    ScaleProblem: 'jacobian'

This gives the same result using dot notation:

opts = optimoptions ('lsqnonlin')
opts.ScaleProblem = 'Jacobian'

2) ... or to assign an option that does not exist in the specified solver or any solver?

This gives an error. Trying to set 'MaxSQPIter', an option of the SQP algorithm, which is not used by lsqnonlin:

opts=optimoptions('lsqnonlin', 'MaxSQPIter', 100)
Error using optimoptions
'MaxSQPIter' is not an option for LSQNONLIN

using dot notation:

opts.MaxSQPIter = 100;
No public field MaxSQPIter exists for class optim.options.Lsqnonlin.
 
3) If options are transferred to a different solver with optimoptions, are there errors or warnings if the new solver does not have some of these options?

No errors or warnings. The options common to both solvers are copied; they can even belong to different algorithms. For example:

opts = optimoptions ('fmincon', 'Algorithm', 'sqp', 'TolX', 1e-10) 
opts_lsq = optimoptions ('lsqnonlin', opts) 

Options set in opts_lsq are: 
                PrecondBandWidth: 0 
                TolX: 1.0000e-10 

The option PrecondBandWidth belongs to the trust-region-reflective algorithm of the fmincon solver.
Another option copied from opts into opts_lsq belongs to the levenberg-marquardt algorithm of lsqnonlin; it is stored as an option "not used by the current algorithm", as mentioned in 1):

  ScaleProblem: 'none' 
 
4) Can options returned by optimoptions be used with optimset, and vice versa?

This returns an error in both cases:

opts = optimoptions ('lsqnonlin','MaxIter',100);
opts = optimset (opts, 'TolX', 1e-6);
Error using optimset
Cannot use OPTIMSET to alter options created using OPTIMOPTIONS.
Use OPTIMOPTIONS instead.

opts = optimset ('TolX', 1e-6);
opts = optimoptions ('lsqnonlin', opts);
Error using optimoptions
Invalid option name specified. Provide a string (such as 'Display').

References: 
[1] http://octave.1599824.n4.nabble.com/lsqnonlin-and-nonlin-residmin-tp4670540p4670569.html


lsqnonlin wrapping nonlin_residmin

So the first week of GSoC is officially over.

I was working on lsqnonlin. My code is accessible here:

https://github.com/AsmaAfzal/octave_workspace/blob/master/lsqnonlin.m


[x, resnorm, residual, exitflag, output, lambda, jacobian]...
= lsqnonlin (fun, x0, lb, ub, options)

[x, resid, cvg, outp] = nonlin_residmin (f, x0, settings)

A recap of encountered problems:
  1. Output lambda - it wasn't previously returned from the back-end [1].
  2. Options - in addition to optimset, Matlab uses optimoptions to set options specific to a particular optimization function. optimoptions:
    • Creates and modifies only the options that apply to a solver
    • Shows your option choices and default values for a specific solver/algorithm
    • Displays links for more information on solver options and other available solver algorithms [2].
    Octave currently does not have this functionality. For more discussion on this, see [3].
Things to do:
  1. Check how a user-specified Jacobian is to be provided.
  2. Check for matrix/complex inputs.
  3. Come up with a plan for writing optimoptions in Octave.
 References:
[1] http://octave.1599824.n4.nabble.com/lsqnonlin-and-nonlin-residmin-tp4670540p4670557.html
[2] http://uk.mathworks.com/help/optim/ug/optimoptions-and-optimset.html
[3] http://octave.1599824.n4.nabble.com/lsqnonlin-and-nonlin-residmin-tp4670540p4670560.html

Saturday, 23 May 2015

Quadratic Programming quadprog, qp

Just a post to review the things I've learned over the past week:

Intuitive explanation of Lagrange multipliers [1]. Given a D-dimensional function to optimize and a set of equality constraints:
  • The vector normal to the level sets of the objective should be a scaled version of the vector normal to the constraint surface (in other words, the two normals are parallel). This scaling factor is the Lagrange multiplier corresponding to a particular constraint.
  • $\nabla f=\lambda_1\nabla g+\lambda_2 \nabla h$, where $g$ and $h$ are two constraint functions.

KKT conditions: the method of Lagrange multipliers generalized to handle inequality constraints $g_i(x)-b_i\geq 0$:

  • Feasibility- $g_i(x^*)-b_i\geq 0$
  • Stationarity- $\nabla f(x^*)-\sum\limits_i\lambda_i^*\nabla g_i(x^*)=0$
  • Complementary Slackness $\lambda_i^*(g_i(x^*)-b_i)=0$
  • Positive Lagrange multipliers $\lambda_i \geq 0, \forall i$
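A one-dimensional example to make these conditions concrete: minimize $f(x)=x^2$ subject to $g(x)-b=x-1\geq 0$. The unconstrained minimum $x=0$ is infeasible, so the constraint is active at $x^*=1$. Stationarity gives $\nabla f(x^*)-\lambda^*\nabla g(x^*)=2x^*-\lambda^*=0$, hence $\lambda^*=2\geq 0$, and complementary slackness $\lambda^*(x^*-1)=0$ holds because $x^*-1=0$.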

If we obtain negative Lagrange multipliers, the constraint corresponding to the most negative multiplier is removed and the optimization is performed again, until all multipliers are positive.

A bit about active set algorithm [2]:

  • It is possible that none of the constraints are active at the optimum, or that only some of them are. We only need to solve for the equality constraints that are active (binding) at the optimum.
  • At the optimal active set $S^*$: $x^* \in F$, where $F=\{x \mid Ax\leq b\}$ is the feasible set, and $\lambda^* \geq 0$, where $\lambda^*$ are the Lagrange multipliers of the active constraints treated as equalities $Ax=b$.
Algorithm:
  • Start at $x=x_0$ and initial active set
  • Calculate $x^*_{EQP}, \lambda_{EQP}^*$ which minimizes the EQP defined by the current active set. Two possible outcomes:
    1. $x^*_{EQP}$ is feasible ($x_{EQP}^* \in F$). Set $x=x_{EQP}^*$ and check the Lagrange multipliers $\lambda_{EQP}^*$. If all are positive, the solution is found! Otherwise, remove the constraint with $\min(\lambda_{EQP}^*)$ and repeat.
    2. $x^*_{EQP}$ is infeasible. We move as far as possible along the line segment from $x_0$ to $x^*_{EQP}$ while staying feasible, and add to $S$ the blocking constraint, i.e. the one we encounter that prevents further progress.

Quadratic programming:

$\min\limits_{x}\frac{1}{2} x^THx + x^Tq$
s.t.
$ A_{eq} x = b_{eq}$
$lb \leq x \leq ub$
$A_{lb} \leq A_{in}x \leq A_{ub}$

What qp.m does:

 [x, obj, info, lambda] = qp (x0, H, q, Aeq, beq, lb, ub, A_lb, A_in, A_ub)
  • Checks feasibility of initial guess $x_0$
  • Checks size of inputs and that they make sense.
  • Checks whether the bounds lb and ub, or A_lb and A_ub, are very close to each other; if so, the inequality is treated as an equality constraint instead.
  • Checks whether any bound is set to Inf or -Inf; qp simply strikes such constraints off.
  • Calls backend solver __qp__ using null space active set algorithm. 
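A tiny made-up example to see this in action (one equality constraint and lower bounds only, so the Inf upper bounds get stripped):

H = eye (2);  q = [-1; -2];      # minimize 0.5*x'*H*x + q'*x
Aeq = [1, 1];  beq = 1;          # single equality constraint
lb = [0; 0];  ub = [Inf; Inf];   # only the lower bounds are finite
[x, obj, info, lambda] = qp ([0; 0], H, q, Aeq, beq, lb, ub);
# lambda stacks the multipliers in the order discussed below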
The ordering of lambda

  • quadprog returns Lagrange multipliers in a structure (with fields upper, lower, eqlin, ineqlin) and the multipliers corresponding to the constraints not provided are left empty.  
  • In qp, lambda is a column vector with Lagrange multipliers associated to the constraints in the following order: [equality constraints; lower bounds; upper bounds; other inequality constraints]
  • The length of lambda vector output from qp depends on the number of different constraints provided as  input.
  • Two issues in wrapping qp.m
    1. The order (i.e. the position of the bound constraints within the inequality constraints) is not specified by qp.m, so the code could change and the ordering with it.
    2. qp.m strips off the Inf constraints before calling __qp__ but does not adjust the lambda (returned by __qp__) accordingly.
    • Solution:
      • If this order were "specified", we could extract the corresponding parts of lambda. A patch making qp account for the stripped Inf constraints in the returned lambda would make things easier, but is not critical.
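A small sketch of how qp's lambda vector might be split into quadprog-style fields, assuming the ordering listed above and that no Inf constraints were stripped (variable names are mine):

n_eq = numel (beq);  n_x = numel (x0);  n_in = rows (A_in);
lam.eqlin   = lambda(1:n_eq);
lam.lower   = lambda(n_eq + (1:n_x));
lam.upper   = lambda(n_eq + n_x + (1:n_x));
lam.ineqlin = lambda(n_eq + 2*n_x + (1:n_in));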
References
[1] http://www.slimy.com/~steuard/teaching/tutorials/Lagrange.html
[2] https://www.youtube.com/user/ucmath352/videos