
Chapter 9 Linear Predictors


In this chapter we will study the family of linear predictors, one of the most useful families of hypothesis classes. Many learning algorithms widely used in practice rely on linear predictors, first and foremost because they can be learned efficiently in many cases. In addition, linear predictors are intuitive, easy to interpret, and fit the data reasonably well in many natural learning problems.

We will introduce several hypothesis classes belonging to this family (halfspaces, linear regression predictors, and logistic regression predictors) and present relevant learning algorithms: linear programming and the Perceptron algorithm for the class of halfspaces, and the Least Squares algorithm for linear regression. This chapter focuses on learning linear predictors using the ERM approach; however, in later chapters we will see alternative paradigms for learning these hypothesis classes.

First, we define the class of affine functions as

$$L_d = \{h_{\mathbf{w},b} : \mathbf{w} \in \mathbb{R}^d,\ b \in \mathbb{R}\}$$ (1)

where

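$$h_{\mathbf{w},b}(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b = \left(\sum_{i=1}^{d} w_i x_i\right) + b$$ (2)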

It will be convenient also to use the notation

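$$L_d = \{\mathbf{x} \mapsto \langle \mathbf{w}, \mathbf{x} \rangle + b : \mathbf{w} \in \mathbb{R}^d,\ b \in \mathbb{R}\}$$ (3)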

which reads as follows: $L_d$ is a set of functions, where each function is parameterized by $\mathbf{w} \in \mathbb{R}^d$ and $b \in \mathbb{R}$, and each function takes as input a vector $\mathbf{x}$ and returns as output the scalar $\langle \mathbf{w}, \mathbf{x} \rangle + b$.

The different hypothesis classes of linear predictors are compositions of a function $\phi : \mathbb{R} \to \mathcal{Y}$ on $L_d$. For example, in binary classification, we can choose $\phi$ to be the sign function, and for regression problems, where $\mathcal{Y} = \mathbb{R}$, $\phi$ is simply the identity function.
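The composition described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names `affine`, `sign`, `identity`, and `compose` are hypothetical helpers introduced here, the weights are arbitrary, and mapping a score of exactly 0 to +1 is a convention chosen for the sketch rather than something fixed by the text.

```python
from typing import Callable, Sequence


def affine(w: Sequence[float], b: float) -> Callable[[Sequence[float]], float]:
    """Return the affine function x -> <w, x> + b (hypothetical helper)."""
    def h(x: Sequence[float]) -> float:
        if len(x) != len(w):
            raise ValueError("x and w must have the same dimension")
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return h


def sign(z: float) -> int:
    """Sign function; sending 0 to +1 is an assumption made for this sketch."""
    return 1 if z >= 0 else -1


def identity(z: float) -> float:
    """Identity function, used when the label set is the real line."""
    return z


def compose(phi, h):
    """Return the predictor x -> phi(h(x))."""
    return lambda x: phi(h(x))


# Arbitrary illustrative parameters in R^2.
h = affine(w=[2.0, -1.0], b=0.5)
halfspace = compose(sign, h)      # binary classifier with labels in {-1, +1}
regressor = compose(identity, h)  # regression predictor with real-valued output

x = [1.0, 3.0]
print(h(x))          # 2*1 + (-1)*3 + 0.5 = -0.5
print(halfspace(x))  # -1
print(regressor(x))  # -0.5
```

The same affine function is shared by both predictors; only the outer function changes, which is what makes the family convenient to treat uniformly.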

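It may be more convenient to incorporate $b$, called the bias, into $\mathbf{w}$ as an extra coordinate and add an extra coordinate with a value of 1 to all $\mathbf{x} \in \mathcal{X}$; namely, let $\mathbf{w}' = (b, w_1, w_2, \ldots, w_d) \in \mathbb{R}^{d+1}$ and let $\mathbf{x}' = (1, x_1, x_2, \ldots, x_d) \in \mathbb{R}^{d+1}$. Therefore,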

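$$h_{\mathbf{w},b}(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b = \langle \mathbf{w}', \mathbf{x}' \rangle$$ (4)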

It follows that each affine function in $\mathbb{R}^d$ can be rewritten as a homogeneous linear function in $\mathbb{R}^{d+1}$ applied over the transformation that appends the constant 1 to each input vector. Therefore, whenever it simplifies the presentation, we will omit the bias term and refer to $L_d$ as the class of homogeneous linear functions of the form $h_{\mathbf{w}}(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle$.
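As a quick check of this rewriting, the following sketch compares an affine function with its homogeneous counterpart on the augmented vectors. The function names (`affine_predict`, `augment_weights`, `augment_input`, `homogeneous_predict`) and the numeric values are assumptions made for illustration only; the bias is placed in the first coordinate, and the constant 1 is placed in the matching position of the input.

```python
def affine_predict(w, b, x):
    """Compute <w, x> + b in R^d."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def augment_weights(w, b):
    """Fold the bias into the weights as an extra first coordinate."""
    return [b, *w]


def augment_input(x):
    """Add a constant 1 as an extra first coordinate of the input."""
    return [1.0, *x]


def homogeneous_predict(w_prime, x_prime):
    """Compute <w', x'> in R^(d+1), with no separate bias term."""
    return sum(wi * xi for wi, xi in zip(w_prime, x_prime))


# Arbitrary illustrative values with d = 2.
w, b = [2.0, -1.0], 0.5
x = [1.0, 3.0]

w_prime = augment_weights(w, b)  # [0.5, 2.0, -1.0]
x_prime = augment_input(x)       # [1.0, 1.0, 3.0]

print(affine_predict(w, b, x))                # -0.5
print(homogeneous_predict(w_prime, x_prime))  # -0.5
```

Both calls give the same value because the extra coordinate contributes exactly $b \cdot 1$ to the inner product, so an algorithm written for homogeneous predictors can handle the biased case by augmenting its inputs.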

