Outline
• Regression Tree and Ensemble (What are we Learning)
• Gradient Boosting (How do we Learn)
• Summary

Take Home Message for this section
• Bias-variance tradeoff is everywhere
• The loss + regularization objective pattern applies for regression tree learning (function learning)
• We want predictive and simple functions
• This defines what we want to learn (objective, model). But how do we learn it? That is the topic of the next section.

So How do we Learn?
• Objective: $\mathrm{Obj} = \sum_{i=1}^n l(y_i, \hat{y}_i) + \sum_k \Omega(f_k)$
• We cannot use methods such as SGD to find the $f_k$, since they are trees, not just numerical vectors
• Solution: Additive Training (Boosting) — start from a constant prediction and add one new function each time:
  $\hat{y}_i^{(0)} = 0$
  $\hat{y}_i^{(1)} = f_1(x_i) = \hat{y}_i^{(0)} + f_1(x_i)$
  $\hat{y}_i^{(2)} = f_1(x_i) + f_2(x_i) = \hat{y}_i^{(1)} + f_2(x_i)$
  $\dots$
  $\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$
  Here $\hat{y}_i^{(t)}$ is the model at training round $t$: it keeps the functions added in previous rounds and adds one new function $f_t$.

Additive Training
• How do we decide which $f$ to add? Optimize the objective!
• The prediction at round $t$ is $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$, where $f_t$ is what we need to decide in round $t$
• Goal: find $f_t$ to minimize
  $\mathrm{Obj}^{(t)} = \sum_{i=1}^n l\big(y_i,\, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) + \mathrm{const}$
• Consider square loss:
  $\sum_{i=1}^n \big(y_i - (\hat{y}_i^{(t-1)} + f_t(x_i))\big)^2 + \Omega(f_t) + \mathrm{const} = \sum_{i=1}^n \big[\, 2(\hat{y}_i^{(t-1)} - y_i)\, f_t(x_i) + f_t(x_i)^2 \,\big] + \Omega(f_t) + \mathrm{const}$
  where $(\hat{y}_i^{(t-1)} - y_i)$ is usually called the residual from the previous round.

Taylor Expansion Approximation of Loss
• Goal: minimize $\sum_{i=1}^n l\big(y_i,\, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t) + \mathrm{const}$ — this still seems complicated, except in the case of square loss
• Take a Taylor expansion of the objective
  – Recall $f(x + \Delta x) \simeq f(x) + f'(x)\,\Delta x + \tfrac{1}{2} f''(x)\,\Delta x^2$
  – Define $g_i = \partial_{\hat{y}^{(t-1)}}\, l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}}\, l(y_i, \hat{y}^{(t-1)})$, so that
    $\mathrm{Obj}^{(t)} \simeq \sum_{i=1}^n \big[\, l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \,\big] + \Omega(f_t) + \mathrm{const}$
• If you are not comfortable with this, think of square loss: $g_i = 2(\hat{y}_i^{(t-1)} - y_i)$ and $h_i = 2$
• Compare what we get to the previous slide: it recovers exactly the square-loss expression above

Our New Goal
• Objective, with constants removed:
  $\sum_{i=1}^n \big[\, g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \,\big] + \Omega(f_t)$
  where $g_i = \partial_{\hat{y}^{(t-1)}}\, l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}}\, l(y_i, \hat{y}^{(t-1)})$
• Why spend so much effort deriving the objective — why not just grow trees?
  – Theoretical benefit: we know what we are learning, and we can reason about convergence
  – Engineering benefit (recall the elements of supervised learning): $g_i$ and $h_i$ come from the definition of the loss function, and the learning of the function depends on the objective only via $g_i$ and $h_i$
  – Think of how you can separate the modules of your code when you are asked to implement boosted trees for both square loss and logistic loss (a minimal sketch of this separation appears at the end of this section)

Refine the definition of tree
• We define a tree by a vector of scores in the leaves, plus a leaf index mapping function that maps an instance to a leaf:
  $f_t(x) = w_{q(x)}$, where $w \in \mathbb{R}^T$ is the leaf weight of the tree and $q : \mathbb{R}^d \to \{1, 2, \dots, T\}$ is the structure of the tree
• [Figure: an example tree splitting first on "age < 15" (Y/N), then on "is male?" (Y/N); leaf scores $w_1 = +2$, $w_2 = 0.1$, $w_3 = -1$; two example instances are mapped by $q$ to leaf 1 and leaf 3 respectively.]

Define Complexity of a Tree (cont')
• Define complexity as (this is not the only possible definition)
  $\Omega(f_t) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^T w_j^2$
  where $T$ is the number of leaves and the second term is the squared L2 norm of the leaf scores
• [Figure: for the example tree above, $\Omega = 3\gamma + \tfrac{1}{2}\lambda\,(4 + 0.01 + 1)$.]

Revisit the Objectives
• Define the instance set in leaf $j$ as $I_j = \{\, i \mid q(x_i) = j \,\}$
• Regroup the objective by each leaf:
  $\mathrm{Obj}^{(t)} \simeq \sum_{i=1}^n \big[\, g_i\, w_{q(x_i)} + \tfrac{1}{2} h_i\, w_{q(x_i)}^2 \,\big] + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^T w_j^2 = \sum_{j=1}^T \Big[ \big(\textstyle\sum_{i \in I_j} g_i\big) w_j + \tfrac{1}{2} \big(\textstyle\sum_{i \in I_j} h_i + \lambda\big) w_j^2 \Big] + \gamma T$
• This is a sum of $T$ independent quadratic functions, one per leaf

The Structure Score
• Two facts about a single-variable quadratic function (for $H > 0$):
  $\arg\min_x \; Gx + \tfrac{1}{2} H x^2 = -\frac{G}{H}$, and $\min_x \; Gx + \tfrac{1}{2} H x^2 = -\tfrac{1}{2}\frac{G^2}{H}$
• Let us define $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, so that
  $\mathrm{Obj}^{(t)} = \sum_{j=1}^T \big[\, G_j w_j + \tfrac{1}{2}(H_j + \lambda)\, w_j^2 \,\big] + \gamma T$
• Assume the structure of the tree ($q(x)$) is fixed. Then the optimal weight in each leaf, and the resulting objective value, are
  $w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad \mathrm{Obj} = -\tfrac{1}{2} \sum_{j=1}^T \frac{G_j^2}{H_j + \lambda} + \gamma T$
• This measures how good a tree structure is! (A second sketch at the end of this section computes $w_j^*$ and this score.)
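To make the engineering benefit above concrete, here is a minimal Python sketch (not from the original slides; all class and function names are illustrative) of the module separation the slides ask for: each loss only supplies the derivatives $g_i$ and $h_i$, and a round of additive training consumes nothing else.

    import numpy as np

    # Each loss module exposes only the first and second derivatives of
    # l(y, y_hat) with respect to y_hat; the rest of the learner never
    # needs to know which loss it is training against.

    class SquareLoss:
        # l(y, y_hat) = (y - y_hat)^2
        def grad(self, y, y_hat):              # g_i = 2 * (y_hat - y)
            return 2.0 * (y_hat - y)
        def hess(self, y, y_hat):              # h_i = 2, a constant
            return np.full_like(y_hat, 2.0)

    class LogisticLoss:
        # negative log-likelihood for labels y in {0, 1}; y_hat is a raw score
        def grad(self, y, y_hat):              # g_i = p_i - y_i
            p = 1.0 / (1.0 + np.exp(-y_hat))
            return p - y
        def hess(self, y, y_hat):              # h_i = p_i * (1 - p_i)
            p = 1.0 / (1.0 + np.exp(-y_hat))
            return p * (1.0 - p)

    def boosting_round(loss, y, y_hat_prev):
        """One round of additive training: reduce the loss to the (g, h)
        statistics; growing f_t (finding q and w) depends on the
        objective only through these two arrays."""
        g = loss.grad(y, y_hat_prev)
        h = loss.hess(y, y_hat_prev)
        return g, h

    # Start from the constant prediction y_hat^(0) = 0, as in the slides.
    y = np.array([1.0, 0.0, 1.0])
    g, h = boosting_round(LogisticLoss(), y, np.zeros(3))

Swapping SquareLoss for LogisticLoss changes nothing downstream, which is exactly the point of deriving the objective in terms of $g_i$ and $h_i$.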
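The closed forms at the end of the section also translate directly into code. A minimal sketch, assuming $g$ and $h$ come from a round as above and the fixed structure $q$ is encoded as a 0-indexed leaf number per instance (again, the names and toy values are my own, not the slides'):

    import numpy as np

    def leaf_weights_and_score(g, h, leaf, n_leaves, lam, gamma):
        """For a fixed structure q (leaf[i] = q(x_i)), return the optimal
        leaf weights w_j* = -G_j / (H_j + lambda) and the structure score
        Obj = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
        G = np.zeros(n_leaves)                 # G_j = sum of g_i over I_j
        H = np.zeros(n_leaves)                 # H_j = sum of h_i over I_j
        np.add.at(G, leaf, g)                  # scatter-add per-instance stats
        np.add.at(H, leaf, h)
        w = -G / (H + lam)
        obj = -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * n_leaves
        return w, obj

    # Toy usage: four instances mapped into T = 3 leaves.
    g = np.array([1.0, -0.5, 0.3, -2.0])
    h = np.array([0.2, 0.2, 0.1, 0.3])
    leaf = np.array([0, 0, 1, 2])
    w, obj = leaf_weights_and_score(g, h, leaf, n_leaves=3, lam=1.0, gamma=0.1)

Because Obj scores a whole structure at once, comparing it across candidate structures is what the slide's closing remark ("this measures how good a tree structure is") sets up.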