…necessary, sufficient, and existence conditions?
– Help understand what is really going on
– Can be used to terminate an algorithm when these conditions are satisfied
– Can be used to develop or improve algorithms: move x in a direction so that these conditions are "more" satisfied

9/11/2021 Fall 2021, Copyright P. B. Luh (slide 26)

Gradient Methods

• Framework: Iterative Descent Algorithms
  Minimize f(x), with x ∈ X ⊂ Rⁿ; f(x) continuously differentiable

• Analogy
  – A blind person with a stick goes down a hill
  – Knows the current elevation and the slope
  – Can measure a few more points, one at a time and at a cost
  – Wants to decide which direction to go, and by how far, to reach the bottom

• Mathematically
  – Start with an initial point x0; evaluate f(x0), ∇f(x0) (and possibly ∇²f(x0))
  – Based on this information, select a search direction d emanating from x0
  – Find the next point x1 along direction d, and evaluate f(x1), ∇f(x1) (and possibly ∇²f(x1))
  – Repeat the above, successively generating x2, x3, …, such that f(x) decreases at each iteration ⇒ iterative descent

  [Figure: level curve f(x) = c with iterates x0, x1, x2, x3 and search direction d]

• Key questions: Which direction to go? How far? Is the algorithm guaranteed to reach x*? How fast does it reach x*?

Gradient Type Methods

• Key ideas
  – The gradient ∇f(x) is the direction of steepest ascent
  – A very intuitive direction: d = −∇f(x), giving
      xk+1 = xk − αk∇f(xk), with step size αk ≥ 0
  – The selection of αk is discussed later ~ line search

• Is this an iterative descent algorithm? How to show this?
      f(xk+1) = f(xk − αk∇f(xk))
              = f(xk) + ∇f(xk)'[−αk∇f(xk)] + o(‖αk∇f(xk)‖)
              = f(xk) − αk‖∇f(xk)‖² + o(αk)‖∇f(xk)‖
              < f(xk) for sufficiently small αk (when ∇f(xk) ≠ 0)
  – Most of the gradient methods considered are descent algorithms

• The method is computationally simple. However, is the negative gradient a good direction to follow?
  – Slow convergence when the level curves are "elongated":
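The iteration xk+1 = xk − αk∇f(xk) can be sketched as follows, here with a constant step size rather than a line search. The quadratic f(x) = x1² + 10·x2², its gradient, the starting point, and the step size are illustrative assumptions, not from the slides; this quadratic has exactly the kind of elongated level curves mentioned above.

```python
import numpy as np

def steepest_descent(grad, x0, alpha, tol=1e-8, max_iter=1000):
    """Iterate x_{k+1} = x_k - alpha * grad(x_k) until the gradient is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # termination test based on ||grad f||
            break
        x = x - alpha * g             # move along d = -grad f(x_k)
    return x

# Illustrative problem (an assumption): f(x) = x1^2 + 10*x2^2, minimizer x* = (0, 0)
grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
x_star = steepest_descent(grad_f, x0=[4.0, 1.0], alpha=0.05)
print(x_star)  # approximately [0, 0]
```

Note that a constant step only converges here because α is small relative to the largest curvature (α < 2/20 on this quadratic); a line search over αk removes that hand-tuning.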
    The gradient is almost orthogonal to the direction leading to x*, resulting in zigzagging of the trajectory.

• Other possibilities? Is d = −∇f(x) really needed to have a descent algorithm?

• xk+1 = xk + αk·dk is a descent iteration as long as ∇f(xk)'dk < 0:
      f(xk+1) = f(xk) + αk∇f(xk)'dk + o(αk) < f(xk) for sufficiently small αk

  [Figure: at x, directions dk with ∇f(x)'dk < 0 are potential (descent) directions; a direction with ∇f(x)'dk > 0 is a bad direction]

Selecting a Descent Direction – General Cases

• dk = −Dk∇f(xk) with Dk > 0 (positive definite)
  – The requirement ∇f(xk)'dk < 0, i.e., −∇f(xk)'Dk∇f(xk) < 0, is satisfied since Dk > 0
  – This is called the "gradient method" in the text
  – Generally, though, "the gradient method" refers to the case d = −∇f(xk)

• Many methods are special cases of the above
  – Steepest descent: Dk = I
  – Newton's method: Dk = (∇²f(xk))⁻¹ ~ more in the future
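The two special cases above can be compared in a short sketch. The quadratic f(x) = ½·x'Qx with Q = diag(2, 20) is an illustrative assumption; for a quadratic, the Newton scaling Dk = (∇²f)⁻¹ with a unit step lands on the minimizer in one iteration, while Dk = I (steepest descent) only makes partial progress.

```python
import numpy as np

# Illustrative quadratic (an assumption): f(x) = 0.5 * x' Q x, minimizer x* = 0
Q = np.array([[2.0, 0.0], [0.0, 20.0]])
grad = lambda x: Q @ x   # gradient of the quadratic
hess = lambda x: Q       # constant Hessian

def scaled_gradient_step(x, D, alpha=1.0):
    """One step of x_{k+1} = x_k - alpha * D @ grad(x_k).
    Any positive definite D makes d = -D grad(x) a descent direction."""
    return x - alpha * D @ grad(x)

x0 = np.array([4.0, 1.0])

# Steepest descent: D = I, small step needed on the elongated quadratic
x_sd = scaled_gradient_step(x0, np.eye(2), alpha=0.05)

# Newton's method: D = (Hessian)^{-1}; exact on a quadratic with alpha = 1
x_nt = scaled_gradient_step(x0, np.linalg.inv(hess(x0)))
print(x_nt)  # [0. 0.]
```

Both steps satisfy the descent test ∇f(x0)'d < 0; the difference is purely in how D rescales the gradient to account for curvature.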