
Chapter 1: Introduction to Machine Learning

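The main purpose of Chapter 1 is to introduce the basic concepts: what machine learning is, what supervised and unsupervised learning are, and so on.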

I. What is machine learning

1. Machine learning is a new field of study. In essence, it gives computers the ability to learn without being explicitly programmed:

"Field of study that gives computers the ability to learn without being explicitly programmed." (Arthur Samuel, 1959)

2. "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." (Tom Mitchell, 1998)

Question:

Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?

A. Classifying emails as spam or not spam. (T)

B. Watching you label emails as spam or not spam. (E)

C. The number (or fraction) of emails correctly classified as spam/not spam. (P)

D. None of the above—this is not a machine learning problem.
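To make T, E and P concrete, here is a minimal Python sketch of the spam example. It assumes a toy word-count scorer (not a real spam filter) and a handful of made-up emails. `learn`, `classify` and `accuracy` are hypothetical names chosen for illustration. The task T is `classify`, the experience E is the list of labeled emails, and the performance measure P is the fraction of test emails classified correctly.

```python
from collections import Counter

# Experience E: emails the user has marked (text, is_spam). Made-up toy data.
labeled = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]

def learn(experience):
    """Count how often each word appears in spam vs. non-spam emails."""
    spam, ham = Counter(), Counter()
    for text, is_spam in experience:
        (spam if is_spam else ham).update(text.split())
    return spam, ham

def classify(model, text):
    """Task T: label an email as spam (True) or not spam (False)."""
    spam, ham = model
    words = text.split()
    # Positive score: the email's words occurred more often in spam than in non-spam.
    score = sum(spam[w] for w in words) - sum(ham[w] for w in words)
    return score > 0

def accuracy(model, test_set):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(classify(model, text) == label for text, label in test_set)
    return correct / len(test_set)

test_set = [("claim your free money", True), ("agenda for the meeting", False)]

print(accuracy(learn([]), test_set))       # 0.5  (no experience yet)
print(accuracy(learn(labeled), test_set))  # 1.0  (after learning from E)
```

The point is Mitchell's definition rather than the scoring rule. The same program scores higher on P once it has seen more experience E.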

II. Machine learning algorithms

1. Supervised learning

2. Unsupervised learning

3. Reinforcement learning

4. Recommender systems

III. Supervised learning

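Key feature of supervised learning: the samples are labeled, i.e. each training sample comes with the correct output.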

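1. Regression: predict the output value (a continuous quantity) for a given (test) sample.

2. Classification: predict the label (a discrete class) of a given (test) sample. Example: the tumor problem, where 1 means the tumor is malignant and 0 means it is benign.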
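The two problem types can be contrasted with a minimal Python sketch. The data are made up, and the methods are illustrative choices that the text does not name: ordinary least squares for regression and 1-nearest-neighbour for classification. `fit_line` and `nearest_neighbor` are hypothetical helper names. Using tumor size as the only feature of the tumor problem is also an assumption.

```python
# Supervised learning: every training sample is a pair (x, y) with a known label y.

# Regression: y is a continuous value. Made-up data: house size -> price.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

w, b = fit_line(sizes, prices)
print(w * 90.0 + b)  # 270.0: predicted price for an unseen size of 90

# Classification: y is a discrete label. Made-up data: tumor size (cm) -> 1 malignant, 0 benign.
tumor_sizes = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]
labels = [0, 0, 0, 1, 1, 1]

def nearest_neighbor(xs, ys, x_new):
    """1-nearest-neighbour: copy the label of the closest training sample."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x_new))
    return ys[i]

print(nearest_neighbor(tumor_sizes, labels, 3.8))  # 1 (malignant)
print(nearest_neighbor(tumor_sizes, labels, 1.2))  # 0 (benign)
```

Both halves learn from labeled (x, y) pairs. The difference lies in the output: a real number for regression, and one of a fixed set of classes for classification.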

Question:

Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.

Problem 2: You'd like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.

Should you treat these as classification or as regression problems?

A. Treat both as classification problems. 

B. Treat problem 1 as a classification problem, problem 2 as a regression problem. 

C. Treat problem 1 as a regression problem, problem 2 as a classification problem. 

D. Treat both as regression problems. 

IV. Unsupervised learning

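Key feature of unsupervised learning: the samples have no labels. Clustering (see the figure) is the classic unsupervised learning task.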
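Clustering can be sketched with k-means, one common clustering algorithm. The text does not name a specific algorithm. This minimal Python sketch assumes made-up 2-D points, k = 2, and a naive initialization that uses the first k points as the starting centers. `kmeans` and `dist2` are hypothetical names. No labels are passed in; the groups come from the data alone.

```python
# Unsupervised learning: the samples are only x's, with no labels y. Made-up 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=10):
    """Plain k-means; the initial centers are simply the first k points."""
    centers = list(points[:k])
    assign = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        assign = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return assign, centers

assign, centers = kmeans(points, 2)
print(assign)  # [0, 0, 0, 1, 1, 1]
```

Plain k-means can settle on a poor grouping depending on the initial centers. Practical implementations usually run several initializations and keep the best result.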

 

Question:

Which of the following would you address using an unsupervised learning algorithm?

A. Given emails labeled as spam/not spam, learn a spam filter.

B. Given a set of news articles found on the web, group them into sets of articles about the same story.

C. Given a database of customer data, automatically discover market segments and group customers into different market segments. 

D. Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not. 
