Re: Are there resources explaining Matlab optimization results and displays?



 Science > Physics > Re: Are there resources explaining Matlab optimization results and displays?

LINK TO THIS PAGE  


rating :  0   |  0


  Page 1 of 1

1

 
Topic: Science > Physics
User: "Gino Victor"
Date: 19 Jul 2007 10:42:54 PM
Object: Re: Are there resources explaining Matlab optimization results and displays?
"Peter Spellucci" <spellucci@fb04373.mathematik.tu-darmstadt.de> wrote in
message news:f7o0kl$5g0$1@fb04373.mathematik.tu-darmstadt.de...


In article <f7nisf$9jr$1@news.Stanford.EDU>,
"Gino Victor" <gino_victor@hotmail.com> writes:


"Peter Spellucci" <spellucci@fb04373.mathematik.tu-darmstadt.de> wrote in
message news:f7nhn8$3r3$1@fb04373.mathematik.tu-darmstadt.de...


In article <f7msus$e9u$1@news.Stanford.EDU>,
"Gino Victor" <gino_victor@hotmail.com> writes:

Hi there,

I was wondering if there are resources and articles that explain the
Matlab's optimization results and iteration displays better?

For example, Matlab "fmincon" displays the following during execution,
and
at the end it will terminate with message showing that why it ends.
And
it
will also mention how many constraints are active or inactive, etc.

What do all these stuff mean? I searched Matlab's own documentation
but
couldn't find a good explanation of these terms. Specifically I ask
these
questions because my optimization doesn't yield good results and a lot
of
times the solver seems to terminate very fast after just few function
evaluations, and the "optimal" x values are some boundary values.

I imagine an expert on practical optimization will be able to gain a
lot
of
insights from Matlab's output messages during and at the end of the
iterations... I want to learn such technical skills. Could anybody
give
me
some pointers? Thanks a lot and much appreciated!

--------------------------------------------------------------------

Max Line search Directional
First-order
Iter F-count f(x) constraint steplength derivative
optimality Procedure
0 15 6.21466 1.11e-016



the written documentation for the optimization toolbox contains some
info
on the meaning of this output, but in order to be able to properly
interpret
this you must know a lot about the optimization method behind:
the book "numerical optimization" by Nocedal and Wright is quite
accessible
for nonspecialists and explains many of these algorithms.
in the above case, F-count means the objective has been evaluated 15
times,
the current point satisfies all constraints up to 1.11e-16
(which by far is enough in the sense of the defaults)
, and no step has been done at all
because the Kuhn-Tucker criteria are satisfied sufficiently (in the
menaing
of the code). if the input x is no solution in your sense, there might
be
an effect I have observed quite often in practice: obviously you force
fmincon to compute the gradients by finite differences. (otherwise it
is
hard to
see no step computed but 15 function evaluations spent).
If your function evaluation is insensisitve against very small changes
in
x
(for example a simulation program where output depends not sensitive on
design due to discretization) , and the default variations in the
finite
differencing in fmincon are very small, (you can change this via
optimset),
then the computed finite difference gradient of f appears as (almost)
zero,
the multipliers are set to zero and the problem is solved (in the sense
of
the software)
hth
peter


Thanks a lot Peter. Your explaination makes sense.Do you think

(1) Supplying a gradient myself to Matlab;

supplying the gradient is always useful and you should do that if
it is feasible for you at all,
there are of course occasions there this is impossible
(functions coming from a simulation program for which you have the
binary
only and which doesn't deliver sensitivity information)

and/or
(2) make the step size of finite differencing larger;

will work?

For (2), I am afraid that if I make the step size large, it may not work
for
other part of the function. Let's say some parts of the function is
sensitive to a small variation, some other parts of the function is
sensitive to a large variation. Then there is no uniformly good step
size?

I've read the Nocedal book and other books and have even taken classes on
optimization. But it is just to link the class stuff to the practical
stuff.
It's all about experience and no school can teach that unless there are
TAs
or the professors are good at practical stuff themselves and are willing
to
spend time with students hands-on. But these are not realistic
expectations.
So please shed some lights on how to further enhance my learning about
these
experience stuff -- any books/resources/tutorials at least more
practical?
Thanks again!



using numerical gradients is pretty tricky in real life:
your function will also have some evaluation noise
(for example if you are running a FEM analysis which uses adaptivity
then the output will _not_ behave as a smooth function of the current
design if a change in the design changes the grid)
this noise can completely spoil your information

therefore , if forced to use numerical gradients, you should first
run some experiments with numerical differencing. In the situation you
describe you could use scaling of the variables. one rule of thumb says
that the Hessian should have diagonal entries of approximately equal size
(this applies for uniformly convex linearly constrained problems) but
another rule of thumb proposes to scale the variables to make them all
of approximately equal size. this rule may contradict the first one.
example
f(x) = 0.5*( 0.01*x_1^2 +x_2^2) -b*x_1 - x_2 , b in [0,1000]

matlab gives you the opportunity to provide a "typical X"
which is used for defining such a scaling.
if you have decided on this, then compute your gradient numerically
and check whether
f(x+t*g) = f(x)+t*norm(g)^2 approximately for small t
and play around with the differencing stepsize for this. this might help.

Hi Peter,
I don't understand your two rules of thumbs and your example. Could you
please elaborate a bit more? Thanks!
Perhaps there are books called "practical optimization with Matlab" or
something like that?
.

User: "Peter Spellucci"

Title: Re: Are there resources explaining Matlab optimization results and displays? 20 Jul 2007 01:12:39 PM
In article <f7pb3o$6rm$1@news.Stanford.EDU>,
"Gino Victor" <gino_victor@hotmail.com> writes:


"Peter Spellucci" <spellucci@fb04373.mathematik.tu-darmstadt.de> wrote in
message news:f7o0kl$5g0$1@fb04373.mathematik.tu-darmstadt.de...


In article <f7nisf$9jr$1@news.Stanford.EDU>,
"Gino Victor" <gino_victor@hotmail.com> writes:


"Peter Spellucci" <spellucci@fb04373.mathematik.tu-darmstadt.de> wrote in
message news:f7nhn8$3r3$1@fb04373.mathematik.tu-darmstadt.de...


In article <f7msus$e9u$1@news.Stanford.EDU>,
"Gino Victor" <gino_victor@hotmail.com> writes:

Hi there,

I was wondering if there are resources and articles that explain the
Matlab's optimization results and iteration displays better?

For example, Matlab "fmincon" displays the following during execution,
and
at the end it will terminate with message showing that why it ends.
And
it
will also mention how many constraints are active or inactive, etc.

What do all these stuff mean? I searched Matlab's own documentation
but
couldn't find a good explanation of these terms. Specifically I ask
these
questions because my optimization doesn't yield good results and a lot
of
times the solver seems to terminate very fast after just few function
evaluations, and the "optimal" x values are some boundary values.

I imagine an expert on practical optimization will be able to gain a
lot
of
insights from Matlab's output messages during and at the end of the
iterations... I want to learn such technical skills. Could anybody
give
me
some pointers? Thanks a lot and much appreciated!

--------------------------------------------------------------------

Max Line search Directional
First-order
Iter F-count f(x) constraint steplength derivative
optimality Procedure
0 15 6.21466 1.11e-016



the written documentation for the optimization toolbox contains some
info
on the meaning of this output, but in order to be able to properly
interpret
this you must know a lot about the optimization method behind:
the book "numerical optimization" by Nocedal and Wright is quite
accessible
for nonspecialists and explains many of these algorithms.
in the above case, F-count means the objective has been evaluated 15
times,
the current point satisfies all constraints up to 1.11e-16
(which by far is enough in the sense of the defaults)
, and no step has been done at all
because the Kuhn-Tucker criteria are satisfied sufficiently (in the
menaing
of the code). if the input x is no solution in your sense, there might
be
an effect I have observed quite often in practice: obviously you force
fmincon to compute the gradients by finite differences. (otherwise it
is
hard to
see no step computed but 15 function evaluations spent).
If your function evaluation is insensisitve against very small changes
in
x
(for example a simulation program where output depends not sensitive on
design due to discretization) , and the default variations in the
finite
differencing in fmincon are very small, (you can change this via
optimset),
then the computed finite difference gradient of f appears as (almost)
zero,
the multipliers are set to zero and the problem is solved (in the sense
of
the software)
hth
peter


Thanks a lot Peter. Your explaination makes sense.Do you think

(1) Supplying a gradient myself to Matlab;

supplying the gradient is always useful and you should do that if
it is feasible for you at all,
there are of course occasions there this is impossible
(functions coming from a simulation program for which you have the
binary
only and which doesn't deliver sensitivity information)

and/or
(2) make the step size of finite differencing larger;

will work?

For (2), I am afraid that if I make the step size large, it may not work
for
other part of the function. Let's say some parts of the function is
sensitive to a small variation, some other parts of the function is
sensitive to a large variation. Then there is no uniformly good step
size?

I've read the Nocedal book and other books and have even taken classes on
optimization. But it is just to link the class stuff to the practical
stuff.
It's all about experience and no school can teach that unless there are
TAs
or the professors are good at practical stuff themselves and are willing
to
spend time with students hands-on. But these are not realistic
expectations.
So please shed some lights on how to further enhance my learning about
these
experience stuff -- any books/resources/tutorials at least more
practical?
Thanks again!



using numerical gradients is pretty tricky in real life:
your function will also have some evaluation noise
(for example if you are running a FEM analysis which uses adaptivity
then the output will _not_ behave as a smooth function of the current
design if a change in the design changes the grid)
this noise can completely spoil your information

therefore , if forced to use numerical gradients, you should first
run some experiments with numerical differencing. In the situation you
describe you could use scaling of the variables. one rule of thumb says
that the Hessian should have diagonal entries of approximately equal size
(this applies for uniformly convex linearly constrained problems) but
another rule of thumb proposes to scale the variables to make them all
of approximately equal size. this rule may contradict the first one.
example
f(x) = 0.5*( 0.01*x_1^2 +x_2^2) -b*x_1 - x_2 , b in [0,1000]

matlab gives you the opportunity to provide a "typical X"
which is used for defining such a scaling.
if you have decided on this, then compute your gradient numerically
and check whether
f(x+t*g) = f(x)+t*norm(g)^2 approximately for small t
and play around with the differencing stepsize for this. this might help.


Hi Peter,

I don't understand your two rules of thumbs and your example. Could you
please elaborate a bit more? Thanks!

Perhaps there are books called "practical optimization with Matlab" or
something like that?


these two rules are quite easy to understand (and quite common, compare
the old classic "practical optimization" by Gill and Murray )
in 1969, van der Sluis proved the following:
a symmetric positive definite matrix is optimally scaled with respect
to diagonal scaling
A-> DAD
up to a factor sqrt(n) if its diagonal is a multiple of the unit matrix.
as you will (should) know the conditioning of the Hessian is quite important
for the performance of optimization algorithms and the Hessian of the objective
is also the Hessian of the Lagrangian if all constraints are affine linear.
hence rule of thumb number one:
if you transform
Dz = x
and
ftilde(z)=f(Dz)=f(x)
then
Hessian(ftilde) =D Hessian(f)*D
and of course
z=inv(D)*x
hence you might want to minimize ftilde with a scaling which makes
the diagonal of its Hessian approximately a multiple of the unit matrix.
in the example above
D=diag(0.1,1)
and
ftilde(z)=0.5*(z_1^2+z_2^2) -10*b*z_1-z_2.
here one step of the ordinary gradient minimization does the complete job!
on the other side, imagine a problem where in the original variables the solution
is
(100000,60000,0.1,-0.2).
the initial guess being 50000,30000,0.01,0.
the problem being highly sensitive with respect to x_3 and x_4.
(this is no artifice, many practical problems show such effects)
clearly you must make large moves in x_1 and x_2 , but small moves
in x_3 and x_4. now you cannot hope that search directions generated, either by
quasi-Newton , true Newton , cg or whatever will always reflect these relative
sizes and you may have to make horribly many very small moves to reach the
solution. here it might be much better to introduce
z_1=10^(-6)*x_1,z_2=10^(-5)*x_2, z_3=x_3, z_4=x_4
and exactly this is done if you provide fmincon with typical x =
(1000000,100000,1,1).
lasdon and coworkers performed a study on the benefits of automatic scaling with
no clearcut outcome: sometimes this scaling is better, sometimes the other,
sometimes no scaling at all is the best advice:
Lasdon, L.S.; Beck, P.O.
Scaling nonlinear programs. (English)
[J] Oper. Res. Lett. 1, 6-9 (1981). ISSN 0167-6377
you simply must experiment with your problem in order to see what is best.
hth
peter
.
User: "Gino Victor"

Title: Re: Are there resources explaining Matlab optimization results and displays? 20 Jul 2007 10:49:51 PM
"Peter Spellucci" <spellucci@fb04373.mathematik.tu-darmstadt.de> wrote in
message news:f7qtun$gjd$1@fb04373.mathematik.tu-darmstadt.de...


In article <f7pb3o$6rm$1@news.Stanford.EDU>,
"Gino Victor" <gino_victor@hotmail.com> writes:


"Peter Spellucci" <spellucci@fb04373.mathematik.tu-darmstadt.de> wrote in
message news:f7o0kl$5g0$1@fb04373.mathematik.tu-darmstadt.de...


In article <f7nisf$9jr$1@news.Stanford.EDU>,
"Gino Victor" <gino_victor@hotmail.com> writes:


"Peter Spellucci" <spellucci@fb04373.mathematik.tu-darmstadt.de> wrote
in
message news:f7nhn8$3r3$1@fb04373.mathematik.tu-darmstadt.de...


In article <f7msus$e9u$1@news.Stanford.EDU>,
"Gino Victor" <gino_victor@hotmail.com> writes:

Hi there,

I was wondering if there are resources and articles that explain
the
Matlab's optimization results and iteration displays better?

For example, Matlab "fmincon" displays the following during
execution,
and
at the end it will terminate with message showing that why it ends.
And
it
will also mention how many constraints are active or inactive, etc.

What do all these stuff mean? I searched Matlab's own documentation
but
couldn't find a good explanation of these terms. Specifically I ask
these
questions because my optimization doesn't yield good results and a
lot
of
times the solver seems to terminate very fast after just few
function
evaluations, and the "optimal" x values are some boundary values.

I imagine an expert on practical optimization will be able to gain
a
lot
of
insights from Matlab's output messages during and at the end of the
iterations... I want to learn such technical skills. Could anybody
give
me
some pointers? Thanks a lot and much appreciated!

--------------------------------------------------------------------

Max Line search Directional
First-order
Iter F-count f(x) constraint steplength derivative
optimality Procedure
0 15 6.21466 1.11e-016



the written documentation for the optimization toolbox contains some
info
on the meaning of this output, but in order to be able to properly
interpret
this you must know a lot about the optimization method behind:
the book "numerical optimization" by Nocedal and Wright is quite
accessible
for nonspecialists and explains many of these algorithms.
in the above case, F-count means the objective has been evaluated 15
times,
the current point satisfies all constraints up to 1.11e-16
(which by far is enough in the sense of the defaults)
, and no step has been done at all
because the Kuhn-Tucker criteria are satisfied sufficiently (in the
menaing
of the code). if the input x is no solution in your sense, there
might
be
an effect I have observed quite often in practice: obviously you
force
fmincon to compute the gradients by finite differences. (otherwise
it
is
hard to
see no step computed but 15 function evaluations spent).
If your function evaluation is insensisitve against very small
changes
in
x
(for example a simulation program where output depends not sensitive
on
design due to discretization) , and the default variations in the
finite
differencing in fmincon are very small, (you can change this via
optimset),
then the computed finite difference gradient of f appears as
(almost)
zero,
the multipliers are set to zero and the problem is solved (in the
sense
of
the software)
hth
peter


Thanks a lot Peter. Your explaination makes sense.Do you think

(1) Supplying a gradient myself to Matlab;

supplying the gradient is always useful and you should do that if
it is feasible for you at all,
there are of course occasions there this is impossible
(functions coming from a simulation program for which you have the
binary
only and which doesn't deliver sensitivity information)

and/or
(2) make the step size of finite differencing larger;

will work?

For (2), I am afraid that if I make the step size large, it may not
work
for
other part of the function. Let's say some parts of the function is
sensitive to a small variation, some other parts of the function is
sensitive to a large variation. Then there is no uniformly good step
size?

I've read the Nocedal book and other books and have even taken
classes on
optimization. But it is just to link the class stuff to the practical
stuff.
It's all about experience and no school can teach that unless there
are
TAs
or the professors are good at practical stuff themselves and are
willing
to
spend time with students hands-on. But these are not realistic
expectations.
So please shed some lights on how to further enhance my learning about
these
experience stuff -- any books/resources/tutorials at least more
practical?
Thanks again!



using numerical gradients is pretty tricky in real life:
your function will also have some evaluation noise
(for example if you are running a FEM analysis which uses adaptivity
then the output will _not_ behave as a smooth function of the current
design if a change in the design changes the grid)
this noise can completely spoil your information

therefore , if forced to use numerical gradients, you should first
run some experiments with numerical differencing. In the situation you
describe you could use scaling of the variables. one rule of thumb says
that the Hessian should have diagonal entries of approximately equal
size
(this applies for uniformly convex linearly constrained problems) but
another rule of thumb proposes to scale the variables to make them all
of approximately equal size. this rule may contradict the first one.
example
f(x) = 0.5*( 0.01*x_1^2 +x_2^2) -b*x_1 - x_2 , b in [0,1000]

matlab gives you the opportunity to provide a "typical X"
which is used for defining such a scaling.
if you have decided on this, then compute your gradient numerically
and check whether
f(x+t*g) = f(x)+t*norm(g)^2 approximately for small t
and play around with the differencing stepsize for this. this might
help.


Hi Peter,

I don't understand your two rules of thumbs and your example. Could you
please elaborate a bit more? Thanks!

Perhaps there are books called "practical optimization with Matlab" or
something like that?



these two rules are quite easy to understand (and quite common, compare
the old classic "practical optimization" by Gill and Murray )

in 1969, van der Sluis proved the following:
a symmetric positive definite matrix is optimally scaled with respect
to diagonal scaling
A-> DAD
up to a factor sqrt(n) if its diagonal is a multiple of the unit matrix.
as you will (should) know the conditioning of the Hessian is quite
important
for the performance of optimization algorithms and the Hessian of the
objective
is also the Hessian of the Lagrangian if all constraints are affine
linear.
hence rule of thumb number one:
if you transform
Dz = x
and
ftilde(z)=f(Dz)=f(x)
then
Hessian(ftilde) =D Hessian(f)*D
and of course
z=inv(D)*x
hence you might want to minimize ftilde with a scaling which makes
the diagonal of its Hessian approximately a multiple of the unit matrix.
in the example above
D=diag(0.1,1)
and
ftilde(z)=0.5*(z_1^2+z_2^2) -10*b*z_1-z_2.
here one step of the ordinary gradient minimization does the complete job!

on the other side, imagine a problem where in the original variables the
solution
is
(100000,60000,0.1,-0.2).
the initial guess being 50000,30000,0.01,0.
the problem being highly sensitive with respect to x_3 and x_4.
(this is no artifice, many practical problems show such effects)
clearly you must make large moves in x_1 and x_2 , but small moves
in x_3 and x_4. now you cannot hope that search directions generated,
either by
quasi-Newton , true Newton , cg or whatever will always reflect these
relative
sizes and you may have to make horribly many very small moves to reach the
solution. here it might be much better to introduce
z_1=10^(-6)*x_1,z_2=10^(-5)*x_2, z_3=x_3, z_4=x_4
and exactly this is done if you provide fmincon with typical x =
(1000000,100000,1,1).

lasdon and coworkers performed a study on the benefits of automatic
scaling with
no clearcut outcome: sometimes this scaling is better, sometimes the
other,
sometimes no scaling at all is the best advice:



Lasdon, L.S.; Beck, P.O.
Scaling nonlinear programs. (English)
[J] Oper. Res. Lett. 1, 6-9 (1981). ISSN 0167-6377


you simply must experiment with your problem in order to see what is best.


hth
peter

Have to digest it... Thanks Peter!
.
User: "A.L."

Title: Re: Are there resources explaining Matlab optimization results and displays? 21 Jul 2007 06:40:19 AM
On Fri, 20 Jul 2007 23:49:51 -0400, "Gino Victor"
<gino_victor@hotmail.com> wrote:
[2 pages cut}

you simply must experiment with your problem in order to see what is best.


hth
peter


Have to digest it... Thanks Peter!

Hey! Don't qutoe 2 pages to add one sentence!
A.L.
.

User: "Timo A. Nieminen"

Title: Re: Are there resources explaining Matlab optimization results anddisplays? 21 Jul 2007 02:37:53 AM
On Fri, 20 Jul 2007, Gino Victor wrote:

Peter Spellucci wrote:

you simply must experiment with your problem in order to see what is best.


Have to digest it... Thanks Peter!

The technical content is secondary. The sentence above is primary!
Computational work is an art, not a crank-the-handle method.
People use computational methods which have essentially unknown
convergence, even though it seems to work in many cases.
Computational methods that work analytically fail with finite precision
arithmetic (wrt your problem, this is especially the case with numerical
differentiation).
There are no universal answers, nothing that you can plug an arbitrary set
of linear equations/nonlinear equations/ODEs/PDes into and get answer that
you can have faith in.
I have no specific advice, not having done much in the way of
optimisation, especially using matlab functions. However, I can tell you
that many, many people share your pain and suffering (well, your posts
indicate no shortage of these). Perhaps this will be a consolation, or
perhaps it will feed infuriation about documentation etc. Ah well!
--
Timo Nieminen - Home page: http://www.physics.uq.edu.au/people/nieminen/
E-prints: http://eprint.uq.edu.au/view/person/Nieminen,_Timo_A..html
Shrine to Spirits: http://www.users.bigpond.com/timo_nieminen/spirits.html
.



User: "A.L."

Title: Re: Are there resources explaining Matlab optimization results and displays? 19 Jul 2007 11:10:47 PM
On Thu, 19 Jul 2007 23:42:54 -0400, "Gino Victor"
<gino_victor@hotmail.com> wrote:


Hi Peter,

I don't understand your two rules of thumbs and your example. Could you
please elaborate a bit more? Thanks!

I don't understand these rules either.
Consider function of 2 variables named "Rosenbrock banana valley" that
is standard optimization benchmark
100*(x[2]-x[1]^2)^2+(1-x[1])^2
Plot this function and thing what is the "best" value of increment to
calulate. Plot you can find here
http://en.wikipedia.org/wiki/Rosenbrock_function
This function has deep, narrow, "banana shape" valley with flat
bottom. Gradient optimization gets to the bottom of valley very
quickly, and then progreses along the bottom of the valley. Think what
discretization increment would be needed in various parts of the
valley. Since valley is "banana shape", the best increment will depend
strongly on the point where you are calculating the derivative.
This illustrates that a) finding "universal" step can be hard if not
impossible, b) "best" step can be defined only locally, c) "best" step
changes as optimziation progresses and some adaptive algorithm for
step selection must be implemented.
A.L.
.
User: "Randy Poe"

Title: Re: Are there resources explaining Matlab optimization results and displays? 20 Jul 2007 05:56:28 AM
On Jul 20, 12:10 am, A.L. <f...@2005.com> wrote:

On Thu, 19 Jul 2007 23:42:54 -0400, "Gino Victor"

<gino_vic...@hotmail.com> wrote:

Hi Peter,


I don't understand your two rules of thumbs and your example. Could you
please elaborate a bit more? Thanks!


I don't understand these rules either.

Consider function of 2 variables named "Rosenbrock banana valley" that
is standard optimization benchmark

100*(x[2]-x[1]^2)^2+(1-x[1])^2

Plot this function and thing what is the "best" value of increment to
calulate. Plot you can find here

http://en.wikipedia.org/wiki/Rosenbrock_function

This function has deep, narrow, "banana shape" valley with flat
bottom. Gradient optimization gets to the bottom of valley very
quickly, and then progreses along the bottom of the valley. Think what
discretization increment would be needed in various parts of the
valley. Since valley is "banana shape", the best increment will depend
strongly on the point where you are calculating the derivative.

This illustrates that a) finding "universal" step can be hard if not
impossible, b) "best" step can be defined only locally, c) "best" step
changes as optimziation progresses and some adaptive algorithm for
step selection must be implemented.

Here is a reference to GNU library with an adaptive step size.
http://www.network-theory.co.uk/docs/gslref/NumericalDifferentiationfunctions.html
The derivative is calculated based on 3-point and 5-point
interpolating polynomials, and the difference between them
used for an error estimate. Note that this means at least 5
function evaluations for each derivative estimate.
Note that a central difference is a lot better than a forward
or backward difference. The error term based on f(x+h)-
f(x-h) is much smaller than that for f(x+2h)-f(h), and as I
recall is quadratic rather than linear in h.
- Randy
.
User: "Gino Victor"

Title: Re: Are there resources explaining Matlab optimization results and displays? 20 Jul 2007 06:35:24 AM
"Randy Poe" <poespam-trap@yahoo.com> wrote in message
news:1184928988.394420.56950@w3g2000hsg.googlegroups.com...

On Jul 20, 12:10 am, A.L. <f...@2005.com> wrote:

On Thu, 19 Jul 2007 23:42:54 -0400, "Gino Victor"

<gino_vic...@hotmail.com> wrote:

Hi Peter,


I don't understand your two rules of thumbs and your example. Could you
please elaborate a bit more? Thanks!


I don't understand these rules either.

Consider function of 2 variables named "Rosenbrock banana valley" that
is standard optimization benchmark

100*(x[2]-x[1]^2)^2+(1-x[1])^2

Plot this function and thing what is the "best" value of increment to
calulate. Plot you can find here

http://en.wikipedia.org/wiki/Rosenbrock_function

This function has deep, narrow, "banana shape" valley with flat
bottom. Gradient optimization gets to the bottom of valley very
quickly, and then progreses along the bottom of the valley. Think what
discretization increment would be needed in various parts of the
valley. Since valley is "banana shape", the best increment will depend
strongly on the point where you are calculating the derivative.

This illustrates that a) finding "universal" step can be hard if not
impossible, b) "best" step can be defined only locally, c) "best" step
changes as optimziation progresses and some adaptive algorithm for
step selection must be implemented.


Here is a reference to GNU library with an adaptive step size.

http://www.network-theory.co.uk/docs/gslref/NumericalDifferentiationfunctions.html

The derivative is calculated based on 3-point and 5-point
interpolating polynomials, and the difference between them
used for an error estimate. Note that this means at least 5
function evaluations for each derivative estimate.

Note that a central difference is a lot better than a forward
or backward difference. The error term based on f(x+h)-
f(x-h) is much smaller than that for f(x+2h)-f(h), and as I
recall is quadratic rather than linear in h.

- Randy

Thanks Randy.
Is the task of the finding a good step size adaptively the job of a modern
optimization solver?
Or I should take care of it in my call to solvers such as Matlab's
"fmincon"?
Thanks again!
.
User: "Randy Poe"

Title: Re: Are there resources explaining Matlab optimization results and displays? 20 Jul 2007 08:08:03 AM
On Jul 20, 7:35 am, "Gino Victor" <gino_vic...@hotmail.com> wrote:

"Randy Poe" <poespam-t...@yahoo.com> wrote in message

news:1184928988.394420.56950@w3g2000hsg.googlegroups.com...



On Jul 20, 12:10 am, A.L. <f...@2005.com> wrote:

On Thu, 19 Jul 2007 23:42:54 -0400, "Gino Victor"


<gino_vic...@hotmail.com> wrote:


Hi Peter,


I don't understand your two rules of thumbs and your example. Could you
please elaborate a bit more? Thanks!


I don't understand these rules either.


Consider function of 2 variables named "Rosenbrock banana valley" that
is standard optimization benchmark


100*(x[2]-x[1]^2)^2+(1-x[1])^2


Plot this function and thing what is the "best" value of increment to
calulate. Plot you can find here


http://en.wikipedia.org/wiki/Rosenbrock_function


This function has deep, narrow, "banana shape" valley with flat
bottom. Gradient optimization gets to the bottom of valley very
quickly, and then progreses along the bottom of the valley. Think what
discretization increment would be needed in various parts of the
valley. Since valley is "banana shape", the best increment will depend
strongly on the point where you are calculating the derivative.


This illustrates that a) finding "universal" step can be hard if not
impossible, b) "best" step can be defined only locally, c) "best" step
changes as optimziation progresses and some adaptive algorithm for
step selection must be implemented.


Here is a reference to GNU library with an adaptive step size.


http://www.network-theory.co.uk/docs/gslref/NumericalDifferentiationf...


The derivative is calculated based on 3-point and 5-point
interpolating polynomials, and the difference between them
used for an error estimate. Note that this means at least 5
function evaluations for each derivative estimate.


Note that a central difference is a lot better than a forward
or backward difference. The error term based on f(x+h)-
f(x-h) is much smaller than that for f(x+2h)-f(h), and as I
recall is quadratic rather than linear in h.


- Randy


Thanks Randy.

Is the task of the finding a good step size adaptively the job of a modern
optimization solver?

Or I should take care of it in my call to solvers such as Matlab's
"fmincon"?

We're still talking about the step-size for differentiation,
right? Because there is also the descent step size. Certainly
I would think all modern optimizers use an adaptive line search
procedure of some kind to determine descent step size. But
I'm not sure what FMINCON, for example, does for gradient step
size. If you are concerned, you should provide your own gradient
function rather than let FMINCON estimate it.
- Randy
.
User: "A.L."

Title: Re: Are there resources explaining Matlab optimization results and displays? 20 Jul 2007 08:17:36 AM
On Fri, 20 Jul 2007 06:08:03 -0700, Randy Poe <poespam-trap@yahoo.com>
wrote:


We're still talking about the step-size for differentiation,
right? Because there is also the descent step size. Certainly
I would think all modern optimizers use an adaptive line search
procedure of some kind to determine descent step size. But
I'm not sure what FMINCON, for example, does for gradient step
size.

Read the manual. It says what strategy is used for directional search
and refers to publications that describe this method.
"A line search is performed using a merit function similar to that
proposed by [1] and [2, 3]. The QP subproblem is solved using an
active set strategy similar to that described in [4]. A full
description of this algorithm is found in the "Constrained
Optimization" section of the Introduction to Algorithms chapter of the
toolbox manual...."
1] Han, S.P., "A Globally Convergent Method for Nonlinear
Programming," Journal of Optimization Theory and Applications, Vol.
22, p. 297, 1977.
[2] Powell, M.J.D., "The Convergence of Variable Metric Methods For
Nonlinearly Constrained Optimization Calculations," Nonlinear
Programming 3, (O.L. Mangasarian, R.R. Meyer, and S.M. Robinson, eds.)
Academic Press, 1978.
[3] Powell, M.J.D., "A Fast Algorithm for Nonlineary Constrained
Optimization Calculations," Numerical Analysis, ed. G.A. Watson,
Lecture Notes in Mathematics, Springer Verlag, Vol. 630, 1978.
Try to understand (in-depth) the algorithm used by fmicon. Otherwise
you will be playing computer alchemistry....
A.L.
.
User: "Gino Victor"

Title: Re: Are there resources explaining Matlab optimization results and displays? 20 Jul 2007 09:35:35 PM
"A.L." <fela@2005.com> wrote in message
news:l6d1a35vqgu0j54hv54fp57ok4gtegak4s@4ax.com...

On Fri, 20 Jul 2007 06:08:03 -0700, Randy Poe <poespam-trap@yahoo.com>
wrote:


We're still talking about the step-size for differentiation,
right? Because there is also the descent step size. Certainly
I would think all modern optimizers use an adaptive line search
procedure of some kind to determine descent step size. But
I'm not sure what FMINCON, for example, does for gradient step
size.


Read the manual. It says what strategy is used for directional search
and refers to publications that describe this method.

"A line search is performed using a merit function similar to that
proposed by [1] and [2, 3]. The QP subproblem is solved using an
active set strategy similar to that described in [4]. A full
description of this algorithm is found in the "Constrained
Optimization" section of the Introduction to Algorithms chapter of the
toolbox manual...."

1] Han, S.P., "A Globally Convergent Method for Nonlinear
Programming," Journal of Optimization Theory and Applications, Vol.
22, p. 297, 1977.

[2] Powell, M.J.D., "The Convergence of Variable Metric Methods For
Nonlinearly Constrained Optimization Calculations," Nonlinear
Programming 3, (O.L. Mangasarian, R.R. Meyer, and S.M. Robinson, eds.)
Academic Press, 1978.

[3] Powell, M.J.D., "A Fast Algorithm for Nonlineary Constrained
Optimization Calculations," Numerical Analysis, ed. G.A. Watson,
Lecture Notes in Mathematics, Springer Verlag, Vol. 630, 1978.

Try to understand (in-depth) the algorithm used by fmicon. Otherwise
you will be playing computer alchemistry....

A.L.

These manuals are badly written and not practical at all, in the sense that
after hinting you by a few sentences, it points you to some literature,
which is often too complicated for a non-optimization expert to read. Please
remember that most of the people who uses optimization are not optimization
PHD students and professors...
.
User: "A.L."

Title: Re: Are there resources explaining Matlab optimization results and displays? 20 Jul 2007 10:22:12 PM
On Fri, 20 Jul 2007 22:35:35 -0400, "Gino Victor"
<gino_victor@hotmail.com> wrote:



These manuals are badly written and not practical at all, in the sense that
after hinting you by a few sentences, it points you to some literature,
which is often too complicated for a non-optimization expert to read. Please
remember that most of the people who uses optimization are not optimization
PHD students and professors...

You don't need to be PhD or professor to use optimization, but
optimization, and mumerical methods in general, differ a bit from
hammer or electric drill. To use numerical methods YOU MUST, repeat,
MUST understand how they work. To the level used by papers quoted by
the manual.
A.L.
.


User: "Randy Poe"

Title: Re: Are there resources explaining Matlab optimization results and displays? 20 Jul 2007 09:27:04 AM
On Jul 20, 9:17 am, A.L. <f...@2005.com> wrote:

On Fri, 20 Jul 2007 06:08:03 -0700, Randy Poe <poespam-t...@yahoo.com>
wrote:



We're still talking about the step-size for differentiation,
right? Because there is also the descent step size. Certainly
I would think all modern optimizers use an adaptive line search
procedure of some kind to determine descent step size. But
I'm not sure what FMINCON, for example, does for gradient step
size.


Read the manual. It says what strategy is used for directional search
and refers to publications that describe this method.

Right. Just what I said, a line search is used to determine
descent step size, and no mention is made of the details
of the gradient estimation or its step size.

"A line search is performed using a merit function similar to that
proposed by [1] and [2, 3]. The QP subproblem is solved using an
active set strategy similar to that described in [4]. A full
description of this algorithm is found in the "Constrained
Optimization" section of the Introduction to Algorithms chapter of the
toolbox manual...."

And again that section makes no mention, so far as I can
tell, for how the gradient is estimated or what step size is
used in that estimation.

1] Han, S.P., "A Globally Convergent Method for Nonlinear
Programming," Journal of Optimization Theory and Applications, Vol.
22, p. 297, 1977.

[2] Powell, M.J.D., "The Convergence of Variable Metric Methods For
Nonlinearly Constrained Optimization Calculations," Nonlinear
Programming 3, (O.L. Mangasarian, R.R. Meyer, and S.M. Robinson, eds.)
Academic Press, 1978.

[3] Powell, M.J.D., "A Fast Algorithm for Nonlineary Constrained
Optimization Calculations," Numerical Analysis, ed. G.A. Watson,
Lecture Notes in Mathematics, Springer Verlag, Vol. 630, 1978.

Try to understand (in-depth) the algorithm used by fmicon. Otherwise
you will be playing computer alchemistry....

In general such methods include a gradient term but don't
direct you as to how you are supposed to calculate that gradient.
So my comments stand. FMINCON uses a line search for
descent step size, and there is no documentation on how it
determines gradient step size.
- Randy
.







  Page 1 of 1

1

 


Related Articles
Are there any reliable results in present from measuring distances and distribution of GRBs ?
MiniBooNE results
First Results from the Cryogenic Dark Matter Search in the SoudanUnderground Lab
Does absorption in semiconductors results from density of states ?
Test Results You Want, Test Results You Get -- MAN AS OLD AS COAL.
Re: Null Results Prove Nothing.
Results: full parity Eotvos experiment in quartz.
Re: gas mode michaelson interferometer - 4 experimental results
Gravity Probe B results
We never said they gave negative results, and they did not in fact give negative results (Miller 1926)
Stringent Data Results
New Three Year Results on the Oldest Light in the Universe
Re: Good summaries and reviews of the results of the MER rover Mars missions.
MAXIPOL CMB Polarization Results
Experiment Results of Primary State Diffusion
 

NEWER

pg.1612     pg.1232     pg.940     pg.716     pg.544     pg.412     pg.311     pg.234     pg.175     pg.130     pg.96     pg.70     pg.50     pg.35     pg.24     pg.16     pg.10     pg.6     pg.3     pg.1

OLDER