
Kaggle Competition Problem: Digit Recognizer

2024-11-06 04:47:02, 204 views

Classify handwritten digits using the famous MNIST data

This competition is the first in a series of tutorial competitions designed to introduce people to Machine Learning.

The goal in this competition is to take an image of a handwritten single digit, and determine what that digit is. As the competition progresses, we will release tutorials which explain different machine learning algorithms and help you to get started.

The data for this competition were taken from the MNIST dataset. The MNIST ("Modified National Institute of Standards and Technology") dataset is a classic within the Machine Learning community that has been extensively studied. More detail about the dataset, including Machine Learning algorithms that have been tried on it and their levels of success, can be found at http://yann.lecun.com/exdb/mnist/index.html.

Competition link: http://www.kaggle.com/c/digit-recognizer

Handwritten digit recognition

Data description: http://www.kaggle.com/c/digit-recognizer/data

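Each image is 28 pixels high and 28 pixels wide, and each pixel is represented by one integer between 0 and 255, so every image is described by 28×28 = 784 numbers. The training data contains one label column and 784 pixel-value columns; the test data has no label column. The goal is to train a model on the training data and use it to predict the label of each test image.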
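The row layout described above can be sketched in a few lines. This is a minimal illustration, not code from the original post: `split_row`, `pixel_position`, and `fake_row` are hypothetical names, and the row-major ordering (column `pixelK` sits at row `K // 28`, column `K % 28`) is assumed from the competition's data description page.

```python
import numpy as np

SIDE = 28  # images are 28 x 28 pixels


def split_row(row):
    """Split one train.csv row into (label, 28x28 image)."""
    row = np.asarray(row)
    label = int(row[0])                     # first column is the label
    image = row[1:].reshape(SIDE, SIDE)     # remaining 784 values, row-major
    return label, image


def pixel_position(k):
    """Return (row, column) of column pixel<k> in the 28x28 grid (assumed row-major)."""
    return divmod(k, SIDE)


# Hypothetical row: label 7 followed by 784 made-up pixel values in 0..255.
fake_row = np.concatenate(([7], np.arange(784) % 256))
label, image = split_row(fake_row)
print(label, image.shape, pixel_position(30))
```

A test row has no label column, so there the whole row would be reshaped directly.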

Below, the pixel values are turned back into actual images, using an IPython notebook:

In [1]:

pwd

C:\Users\zhaohf\Desktop

In [5]:

cd ../../../workspace/kaggle/DigitRecognizer/Data/

C:\workspace\kaggle\DigitRecognizer\Data

In [6]:

ls

 Volume in drive C is OS
 Volume Serial Number is 6C93-0DF3

 Directory of C:\workspace\kaggle\DigitRecognizer\Data

2015/01/15  16:04    <DIR>          .
2015/01/15  16:04    <DIR>          ..
2014/12/28  15:06           240,909 rf_benchmark.csv
2015/01/15  16:04        51,118,294 test.csv
2014/12/28  15:06        51,118,296 test.csv.bak
2014/12/28  15:06        76,775,041 train.csv
               4 File(s)    179,252,540 bytes
               2 Dir(s) 105,536,135,168 bytes free

In [7]:

import pandas as pd

df = pd.read_csv('train.csv', header=0).head()  # keep only the first 5 rows

In [8]:

df

Out[8]:

	label	...
0	1	...
1	0	...
2	1	...
3	4	...
4	0	...

5 rows × 785 columns

In [9]:

df['label']

Out[9]:

0    1
1    0
2    1
3    4
4    0
Name: label, dtype: int64

In [14]:

df = df.loc[:, 'pixel0':]  # drop the label column (.ix was removed from pandas; .loc does the same label-based slice)

In [15]:

df

Out[15]:

	pixel0	pixel1	pixel2	pixel3	pixel4	pixel5	pixel6	pixel7	pixel8	pixel9	...	pixel774	pixel775	pixel776	pixel777	pixel778	pixel779	pixel780	pixel781	pixel782	pixel783
0	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
2	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
3	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
4	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0

5 rows × 784 columns

In [21]:

%matplotlib inline
import matplotlib.pyplot as plt

for i in range(df.shape[0]):
    # Reshape the 784 pixel values of row i into a 28x28 image
    img = df.iloc[i].values.reshape((28, 28))
    plt.subplot(2, 5, i + 1)
    plt.imshow(img)
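When no plotting backend is at hand, the same reshape can be sanity-checked as text. The following is only a sketch under stated assumptions: `ascii_preview`, the threshold of 128, and the `demo` image are all made up for illustration and are not part of the original notebook.

```python
import numpy as np


def ascii_preview(image, threshold=128):
    """Render a 2-D array of 0-255 intensities as a list of text rows."""
    image = np.asarray(image)
    # '#' marks pixels at or above the threshold, '.' marks the rest
    return ["".join("#" if v >= threshold else "." for v in row) for row in image]


# Hypothetical image: a vertical stroke, roughly like the digit 1.
demo = np.zeros((28, 28), dtype=int)
demo[4:24, 13:15] = 255

lines = ascii_preview(demo)
print("\n".join(lines))
```

With the notebook's data, something like `ascii_preview(df.iloc[0].values.reshape((28, 28)))` would give the text rows for the first training image.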

Below, a random forest is used for training and prediction:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Load the CSV files, skipping the header row
train = np.loadtxt('../Data/train.csv', delimiter=',', skiprows=1)
X_train = train[:, 1:]  # the 784 pixel columns
print(X_train.shape)
Y_train = train[:, 0]   # the label column
print(Y_train.shape)
X_test = np.loadtxt('../Data/test.csv', delimiter=',', skiprows=1)
print(X_test.shape)
print('Training...')
rf = RandomForestClassifier(n_estimators=100)
rf_model = rf.fit(X_train, Y_train)
print('Predicting...')
# Pair each prediction with its 1-based ImageId
pred = [[index + 1, x] for index, x in enumerate(rf_model.predict(X_test))]
np.savetxt('../Submissions/myrf_benchmark.csv', pred, delimiter=',', fmt='%d,%d', header='ImageId,Label', comments='')
print('Done.')
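The submission file written by the script above has an `ImageId,Label` header and image ids that start at 1. A minimal sketch of just that step, writing to an in-memory buffer so the result can be inspected, might look like this. `write_submission` is a hypothetical helper and the three predictions are made-up values, not model output.

```python
import io

import numpy as np


def write_submission(predictions, target):
    """Write predictions as ImageId,Label rows, with ImageId starting at 1."""
    labels = np.asarray(predictions).astype(int)   # predict() may return floats
    image_ids = np.arange(1, len(labels) + 1)
    rows = np.column_stack((image_ids, labels))
    # comments='' keeps savetxt from prefixing the header line with '# '
    np.savetxt(target, rows, fmt="%d", delimiter=",",
               header="ImageId,Label", comments="")


# Hypothetical predictions for three test images.
buffer = io.StringIO()
write_submission([2.0, 0.0, 9.0], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

Passing a file path instead of the buffer writes the same content to disk.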

Result of the first submission:

[Image: screenshot of the first submission result]
