module Bfgs: sig .. end
BFGS (Broyden-Fletcher-Goldfarb-Shanno) quasi-Newton local optimization method.
The pure Newton method uses a local quadratic approximation of the
objective function f being minimized:

f(x) = f(x0) + t(x-x0).G + 1/2 t(x-x0).H.(x-x0)

where G and H are respectively the gradient vector and the hessian
matrix evaluated at x0, and t is the transpose operator.
When this approximation is valid, and when the function is locally convex,
the minimum is found at the point x where G + H(x-x0) = 0. This gives us
the Newton step x = x0 - inv(H).G, where inv(H) is the inverse hessian.
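As a minimal stand-alone illustration (not part of this module's interface), consider a one-dimensional function whose hessian is a scalar, so that inv(H) reduces to 1/H:

let newton_step x0 =
  (* f(x) = (x - 3)^2 is exactly quadratic, so one step reaches the minimum *)
  let g = 2. *. (x0 -. 3.) in   (* gradient of f at x0 *)
  let h = 2. in                 (* hessian of f (constant here) *)
  x0 -. g /. h                  (* x = x0 - inv(H).G *)

let () = Printf.printf "%g\n" (newton_step 10.)   (* prints 3, the true minimum *)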
Quasi-Newton methods rely on an iterative approximation of the inverse
hessian, usually starting with the identity matrix as an initial
approximation (which means that we follow the gradient descent direction).
At each step k of the algorithm, the function is minimized along the
search direction. Let us call lambda the step size along the quasi-Newton
direction, and h(k) the approximation of the INVERSE hessian at point x(k).
The algorithm is the following. At each step k:

1) minimize the function along the quasi-Newton direction. The line search
method minimizes f(x(k) - lambda.h(k).G) as a function of lambda.

2) make a step in the quasi-Newton direction. If the minimum along
that direction is found at lambda_k, the quasi-Newton step is then
x(k+1) = x(k) - lambda_k.h(k).G

3) compute a new approximation h(k+1) of the inverse hessian (see the
sketch after the formula below).
Several expressions can be found for this update. We used the
analytical expression:

h(k+1) = t(I - yk.t(sk)/(t(yk).sk)).h(k).(I - yk.t(sk)/(t(yk).sk)) + sk.t(sk)/(t(yk).sk)

where yk = G(k+1) - G(k) and sk = x(k+1) - x(k)
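For concreteness, here is a small stand-alone sketch of this update with dense matrices stored as float array array; the name update_inverse_hessian and the helpers are hypothetical and this is not the module's actual implementation:

let update_inverse_hessian h s y =
  let n = Array.length s in
  let dot a b =
    let acc = ref 0. in
    Array.iteri (fun i ai -> acc := !acc +. ai *. b.(i)) a;
    !acc
  in
  let rho = 1. /. dot y s in                        (* 1 / (t(yk).sk) *)
  (* a = I - rho.sk.t(yk), so that t(a) = I - rho.yk.t(sk) *)
  let a =
    Array.init n (fun i ->
      Array.init n (fun j ->
        (if i = j then 1. else 0.) -. rho *. s.(i) *. y.(j)))
  in
  let at = Array.init n (fun i -> Array.init n (fun j -> a.(j).(i))) in
  let matmul m1 m2 =
    Array.init n (fun i ->
      Array.init n (fun j ->
        let acc = ref 0. in
        for k = 0 to n - 1 do acc := !acc +. m1.(i).(k) *. m2.(k).(j) done;
        !acc))
  in
  let aha = matmul (matmul a h) at in               (* a.h(k).t(a) *)
  (* add the rank-one term sk.t(sk)/(t(yk).sk) *)
  Array.init n (fun i ->
    Array.init n (fun j -> aha.(i).(j) +. rho *. s.(i) *. s.(j)))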
The search stops when the change in the objective function falls below an
absolute tolerance, or when the function cannot be improved any more.

Now that we have presented the main features of the algorithm, let us talk
about a few additional subtleties. Three line search methods have been
implemented: backtracking, golden section search, and Brent's parabolic
interpolation. Backtracking starts with an initial step (default size 1)
along the quasi-Newton direction.
val verbose : bool ref
val logpoints : string ref
type linesearch_method =
  | Backtracking
      (* RECOMMENDED. Line search with a backtracking method, starting
         with an initial step (default 1) in a descent direction, and
         reducing the step until the objective function is sufficiently
         improved (Armijo-Goldstein condition to ensure convergence). *)
  | Golden
      (* Line search using the golden section method *)
  | Brent
      (* Line search using Brent's parabolic interpolation *)
val linesearch_method : linesearch_method ref
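For example, to use the golden section method instead of the recommended backtracking, set this reference before calling bfgs:

let () = Bfgs.linesearch_method := Bfgs.Golden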
bt_acctol, bt_reduc, and bt_epsilon are parameters for the
backtracking method. Backtracking consists in trying a candidate step
along the search direction. If the objective function is sufficiently
decreased, then the step is accepted. The decrease must be at least equal
to !bt_acctol multiplied by the norm of the gradient projection along the
search direction (simplified Armijo-Goldstein condition).
Otherwise, if the current step is not accepted, a new step is tried
by multiplying the current step by a reduction factor !bt_reduc.
The search stops when the step size falls under !bt_epsilon.
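As an illustration, a rough sketch of this backtracking loop is given below. Here f1d is the objective restricted to the search direction, slope is the gradient projection along that direction, and init_step is the initial step; these names are hypothetical and the exact acceptance test used by the module (in particular whether it scales with the step size) may differ:

let backtrack f1d slope init_step =
  let f0 = f1d 0. in
  let rec loop lambda =
    if lambda < !Bfgs.bt_epsilon then lambda                      (* step became too small: stop *)
    else if f0 -. f1d lambda >= !Bfgs.bt_acctol *. abs_float slope
    then lambda                                                   (* sufficient decrease: accept *)
    else loop (lambda *. !Bfgs.bt_reduc)                          (* reduce the step and retry *)
  in
  loop init_step   (* default initial step is 1 *)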
ls_abstol, ls_reltol, and ls_max_iter are parameters for golden section
search and Brent's parabolic interpolation.
For these methods, the search stops when the range of the search interval
falls under an absolute tolerance !ls_abstol, or when it cannot be
significantly reduced (relative tolerance !ls_reltol). If none of these
criteria is met, the search stops after !ls_max_iter iterations.
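A stand-alone sketch of a golden section search with these stopping criteria could look like the following; the precise form of the relative-tolerance test is an assumption, and this is not the module's actual implementation:

let golden_section f a b =
  let r = (sqrt 5. -. 1.) /. 2. in                      (* inverse golden ratio, about 0.618 *)
  let rec loop a b iter =
    let range = b -. a in
    if range < !Bfgs.ls_abstol
       || range < !Bfgs.ls_reltol *. (abs_float a +. abs_float b)
       || iter >= !Bfgs.ls_max_iter
    then (a +. b) /. 2.                                 (* middle of the remaining interval *)
    else begin
      let x1 = b -. r *. range in
      let x2 = a +. r *. range in
      if f x1 < f x2 then loop a x2 (iter + 1)          (* minimum lies in [a, x2] *)
      else loop x1 b (iter + 1)                         (* minimum lies in [x1, b] *)
    end
  in
  loop a b 0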
val bt_acctol : float ref
val bt_reduc : float ref
val bt_epsilon : float ref
val ls_max_iter : int ref
val ls_abstol : float ref
val ls_reltol : float ref
val step : float ref
val bfgs :
  (float array -> float) ->
  (float array -> float array) ->
  float array ->
  int -> float -> float -> int -> (int -> bool) ->
  int * float array * float * float array
bfgs f g xinit max_iter abstol reltol freq_verbose debug returns a tuple
(i, x, gx, hx) where i is the iteration at which the BFGS search stopped,
x is the best local minimum found for the function f, gx is the value of
the gradient g at this point, and hx is the approximation of the inverse
hessian at the minimum.
The search for a minimum stops when the maximum number of iterations is
reached, when two successive values f1 and f2 of the objective function
are closer than the absolute tolerance abstol, or when the relative
difference between the two values falls under the relative tolerance
reltol.
The exact formulation of the two last conditions is:
|f2-f1| < abstol   or   |f2-f1| < reltol*(|f1|+reltol).
(We add reltol to |f1| to account for the case when f1 is zero.)
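Expressed directly in OCaml, this test amounts to something like the following (names are illustrative):

let converged f1 f2 abstol reltol =
  abs_float (f2 -. f1) < abstol
  || abs_float (f2 -. f1) < reltol *. (abs_float f1 +. reltol)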
xinit is the initial point at which the search starts.
freq_verbose is the frequency at which some information is printed
on the standard output.
debug is a function of the iteration number i. Outputs for debugging
are printed when debug i is true.