
LOGISTIC REGRESSION


In logistic regression we learn a family of functions $h$ from $\mathbb{R}^d$ to the interval $[0,1]$. However, logistic regression is used for classification tasks: we can interpret $h(\mathbf{x})$ as the probability that the label of $\mathbf{x}$ is $1$. The hypothesis class associated with logistic regression is the composition of a sigmoid function $\phi_{\mathrm{sig}} : \mathbb{R} \to [0,1]$ over the class of linear functions $L_d$. In particular, the sigmoid function used in logistic regression is the logistic function, defined as

$$\phi_{\mathrm{sig}}(z) = \frac{1}{1 + \exp(-z)} \qquad (6)$$

The hypothesis class is therefore (where for simplicity we are using homogeneous linear functions):

$$H_{\mathrm{sig}} = \phi_{\mathrm{sig}} \circ L_d = \left\{ \mathbf{x} \mapsto \phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle) : \mathbf{w} \in \mathbb{R}^d \right\} \qquad (7)$$
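To make equation (7) concrete, here is a minimal NumPy sketch of the logistic hypothesis $h_{\mathbf{w}}(\mathbf{x}) = \phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle)$; the function names `sigmoid` and `logistic_hypothesis` are illustrative choices of ours, not part of any library.

```python
import numpy as np

def sigmoid(z):
    """Logistic function phi_sig(z) = 1 / (1 + exp(-z)) from equation (6)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_hypothesis(w, x):
    """h_w(x) = phi_sig(<w, x>): maps x to a value in [0, 1],
    interpreted as the probability that the label of x is 1."""
    return sigmoid(np.dot(w, x))

# Hypothetical weight vector and example, chosen only for illustration.
w = np.array([2.0, -1.0])
x = np.array([0.5, 0.3])
print(logistic_hypothesis(w, x))  # a probability strictly between 0 and 1
```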

Note that when $\langle \mathbf{w}, \mathbf{x} \rangle$ is very large then $\phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle)$ is close to $1$, whereas if $\langle \mathbf{w}, \mathbf{x} \rangle$ is very small then $\phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle)$ is close to $0$. Recall that the prediction of the halfspace corresponding to a vector $\mathbf{w}$ is $\operatorname{sign}(\langle \mathbf{w}, \mathbf{x} \rangle)$. Therefore, the predictions of the halfspace hypothesis and the logistic hypothesis are very similar whenever $|\langle \mathbf{w}, \mathbf{x} \rangle|$ is large. However, when $|\langle \mathbf{w}, \mathbf{x} \rangle|$ is close to $0$ we have that $\phi_{\mathrm{sig}}(\langle \mathbf{w}, \mathbf{x} \rangle) \approx \frac{1}{2}$. Intuitively, the logistic hypothesis is not sure about the value of the label, so it guesses that the label is $\operatorname{sign}(\langle \mathbf{w}, \mathbf{x} \rangle)$ with probability slightly larger than $\frac{1}{2}$. In contrast, the halfspace hypothesis always outputs a deterministic prediction of either $1$ or $-1$, even if $\langle \mathbf{w}, \mathbf{x} \rangle$ is very close to $0$.
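This contrast is easy to see numerically. The following self-contained sketch compares the halfspace prediction $\operatorname{sign}(\langle \mathbf{w}, \mathbf{x} \rangle)$ with the logistic output for a few assumed values of the margin $\langle \mathbf{w}, \mathbf{x} \rangle$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed values of <w, x>, from strongly negative to strongly positive.
for margin in [-5.0, -0.1, 0.0, 0.1, 5.0]:
    halfspace = np.sign(margin)   # deterministic prediction: -1, 0, or 1
    logistic = sigmoid(margin)    # probability that the label is 1
    print(f"<w,x> = {margin:5.1f}  halfspace -> {halfspace:4.0f}  logistic -> {logistic:.3f}")
```

For $|\langle \mathbf{w}, \mathbf{x} \rangle| = 5$ the two hypotheses essentially agree, while near $0$ the logistic output hovers around $\frac{1}{2}$ even though the halfspace still commits to a hard label.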

Next, we need to specify a loss function. That is, we should define how bad it is to predict some $h_{\mathbf{w}}(\mathbf{x}) \in [0,1]$ given that the true label is $y \in \{\pm 1\}$. Clearly, we would like $h_{\mathbf{w}}(\mathbf{x})$ to be large if $y = 1$, and $1 - h_{\mathbf{w}}(\mathbf{x})$ (i.e., the probability of predicting $-1$) to be large if $y = -1$. Note that

$$1 - h_{\mathbf{w}}(\mathbf{x}) = 1 - \frac{1}{1 + \exp(-\langle \mathbf{w}, \mathbf{x} \rangle)} = \frac{\exp(-\langle \mathbf{w}, \mathbf{x} \rangle)}{1 + \exp(-\langle \mathbf{w}, \mathbf{x} \rangle)} = \frac{1}{1 + \exp(\langle \mathbf{w}, \mathbf{x} \rangle)} \qquad (8)$$

Therefore, any reasonable loss function would increase monotonically with $\frac{1}{1 + \exp(y \langle \mathbf{w}, \mathbf{x} \rangle)}$ (which, for both $y = 1$ and $y = -1$, is the probability the hypothesis assigns to the wrong label), or equivalently, would increase monotonically with $1 + \exp(-y \langle \mathbf{w}, \mathbf{x} \rangle)$. The logistic loss function used in logistic regression penalizes $h_{\mathbf{w}}$ based on the log of $1 + \exp(-y \langle \mathbf{w}, \mathbf{x} \rangle)$ (recall that $\log$ is a monotonic function). That is,

$$\ell(h_{\mathbf{w}}, (\mathbf{x}, y)) = \log\left(1 + \exp(-y \langle \mathbf{w}, \mathbf{x} \rangle)\right) \qquad (9)$$
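A direct implementation of equation (9) can overflow when $-y \langle \mathbf{w}, \mathbf{x} \rangle$ is large, so the sketch below uses `np.logaddexp`, which evaluates $\log(\exp(a) + \exp(b))$ stably; this is a standard numerical trick, not something the derivation above requires.

```python
import numpy as np

def logistic_loss(w, x, y):
    """Logistic loss from equation (9): log(1 + exp(-y <w, x>)).
    np.logaddexp(0, t) = log(exp(0) + exp(t)) = log(1 + exp(t)),
    computed without overflow for large |t|."""
    return np.logaddexp(0.0, -y * np.dot(w, x))

# Illustrative values only; here <w, x> = 0.7, so w "votes" positive.
w = np.array([2.0, -1.0])
x = np.array([0.5, 0.3])
print(logistic_loss(w, x, +1))  # smaller loss: label agrees with <w, x> > 0
print(logistic_loss(w, x, -1))  # larger loss for the opposite label
```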

Therefore, given a training set $S = (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m)$, the ERM problem associated with logistic regression is

$$\operatorname*{argmin}_{\mathbf{w} \in \mathbb{R}^d} \; \frac{1}{m} \sum_{i=1}^{m} \log\left(1 + \exp(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle)\right) \qquad (10)$$

The advantage of the logistic loss function is that it is a convex function with respect to $\mathbf{w}$; hence the ERM problem can be solved efficiently using standard methods. We will study how to learn with convex functions, and in particular specify a simple algorithm for minimizing convex functions, in later chapters.
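Because the objective in equation (10) is convex in $\mathbf{w}$, even plain gradient descent finds a global minimizer. The sketch below is a minimal illustration under assumed toy data and an assumed step size, not a production solver; it uses the fact that the gradient of the average loss is $\frac{1}{m} \sum_i \frac{-y_i \mathbf{x}_i}{1 + \exp(y_i \langle \mathbf{w}, \mathbf{x}_i \rangle)}$.

```python
import numpy as np

def erm_logistic(X, y, lr=0.1, n_iters=1000):
    """Gradient descent on the ERM objective (10):
    min_w (1/m) * sum_i log(1 + exp(-y_i <w, x_i>)).
    X: (m, d) matrix of examples; y: (m,) labels in {-1, +1}."""
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ w)                  # y_i <w, x_i> for all i
        # Gradient of the average logistic loss:
        # (1/m) * sum_i  -y_i x_i / (1 + exp(y_i <w, x_i>))
        coeffs = -y / (1.0 + np.exp(margins))
        grad = (X.T @ coeffs) / m
        w -= lr * grad
    return w

# Hypothetical toy data: label is positive when the first coordinate
# outweighs half of the second.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] - 0.5 * X[:, 1])
w = erm_logistic(X, y)
print(w)  # points approximately in the direction (1, -0.5)
```

In practice one would add a stopping criterion or use a library solver, but convexity is what makes such a simple scheme sufficient here.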

