Learning Mahout

A small hands-on Mahout example: running k-means clustering with the bundled synthetic control job.

Environment: OS: CentOS 6.5 x64; software: Hadoop 1.2.1, Mahout 0.9

1. Download the test data

[huser@master hadoop]$ wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
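
Before uploading, it can help to confirm that the download is complete and regular. The sketch below assumes the file is whitespace-separated with one series per line; `check_shape` is a hypothetical helper name, not part of Hadoop or Mahout.

```shell
# check_shape FILE (hypothetical helper): report the row count and the
# smallest and largest number of columns, assuming whitespace-separated
# values and ignoring blank lines.
check_shape() {
  awk 'NF > 0 {
         rows++
         if (min == "" || NF < min) min = NF
         if (NF > max) max = NF
       }
       END { printf "rows=%d min_cols=%d max_cols=%d\n", rows, min, max }' "$1"
}
```

Run it as `check_shape synthetic_control.data`. The UCI description of this dataset lists 600 series of 60 points each, so matching row and column counts (with min_cols equal to max_cols) would suggest the file arrived intact.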

2. Copy the data to HDFS

[huser@master hadoop]$ hadoop-1.2.1/bin/hadoop fs -mkdir ./testdata
Warning: $HADOOP_HOME is deprecated.

[huser@master hadoop]$ hadoop-1.2.1/bin/hadoop fs -put ./synthetic_control.data ./testdata
Warning: $HADOOP_HOME is deprecated.

[huser@master hadoop]$ hadoop-1.2.1/bin/hadoop fs -ls ./testdata
Warning: $HADOOP_HOME is deprecated.
Found 1 items
-rw-r--r-- 1 huser supergroup 288374 2014-04-17 14:02 /user/huser/testdata/synthetic_control.data
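
One quick way to check the upload is to compare the size shown in the listing with the size of the local file. The sketch below is only text processing over `hadoop fs -ls` output; it assumes the column layout shown above (size in the fifth column, path last), and `hdfs_size` is a hypothetical helper name.

```shell
# hdfs_size NAME (hypothetical helper): read `hadoop fs -ls` output on stdin
# and print the size column of the entry whose path ends in NAME.
# Assumes the layout shown above: permissions first, size fifth, path last.
hdfs_size() {
  awk -v name="$1" '
    $1 ~ /^[-d]/ && substr($NF, length($NF) - length(name) + 1) == name {
      print $5
    }'
}
```

For example, `hadoop-1.2.1/bin/hadoop fs -ls ./testdata | hdfs_size synthetic_control.data` should print the same number as `wc -c < synthetic_control.data` if the copy is complete.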

3. Run a k-means clustering test

[huser@master hadoop]$ mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
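
As a reminder of what a k-means iteration does, here is a toy version in awk. It is not Mahout's implementation: it works on 1-D values with exactly two centroids on a single machine, whereas the job above clusters whole time series on Hadoop. The name `kmeans_1d` and its arguments are hypothetical and chosen only for this illustration.

```shell
# kmeans_1d FILE C1 C2 ITERS (hypothetical helper, illustration only):
# toy 1-D k-means with two centroids, starting from C1 and C2.
# Each iteration assigns every value to the nearer centroid, then moves
# each centroid to the mean of the values assigned to it.
kmeans_1d() {
  awk -v c1="$2" -v c2="$3" -v iters="$4" '
    { x[NR] = $1 }                       # one value per input line
    END {
      for (it = 0; it < iters; it++) {
        s1 = s2 = n1 = n2 = 0
        for (i = 1; i <= NR; i++) {
          d1 = (x[i] - c1) ^ 2           # squared distance to centroid 1
          d2 = (x[i] - c2) ^ 2           # squared distance to centroid 2
          if (d1 <= d2) { s1 += x[i]; n1++ } else { s2 += x[i]; n2++ }
        }
        if (n1 > 0) c1 = s1 / n1         # keep the old centroid if its cluster is empty
        if (n2 > 0) c2 = s2 / n2
      }
      printf "%.2f %.2f\n", c1, c2
    }' "$1"
}
```

With a file containing the six values 1, 2, 3, 10, 11, 12 (one per line), `kmeans_1d points.txt 0 5 5` settles on the centroids 2.00 and 11.00.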

4. Inspect the output

[huser@master hadoop]$ hadoop-1.2.1/bin/hadoop fs -ls ./output
Warning: $HADOOP_HOME is deprecated.

Found 15 items
-rw-r--r-- 1 huser supergroup 194 2014-04-17 14:18 /user/huser/output/_policy
drwxr-xr-x - huser supergroup 0 2014-04-17 14:19 /user/huser/output/clusteredPoints
drwxr-xr-x - huser supergroup 0 2014-04-17 14:10 /user/huser/output/clusters-0
drwxr-xr-x - huser supergroup 0 2014-04-17 14:13 /user/huser/output/clusters-1
drwxr-xr-x - huser supergroup 0 2014-04-17 14:18 /user/huser/output/clusters-10-final
drwxr-xr-x - huser supergroup 0 2014-04-17 14:14 /user/huser/output/clusters-2
drwxr-xr-x - huser supergroup 0 2014-04-17 14:14 /user/huser/output/clusters-3
drwxr-xr-x - huser supergroup 0 2014-04-17 14:15 /user/huser/output/clusters-4
drwxr-xr-x - huser supergroup 0 2014-04-17 14:15 /user/huser/output/clusters-5
drwxr-xr-x - huser supergroup 0 2014-04-17 14:16 /user/huser/output/clusters-6
drwxr-xr-x - huser supergroup 0 2014-04-17 14:17 /user/huser/output/clusters-7
drwxr-xr-x - huser supergroup 0 2014-04-17 14:17 /user/huser/output/clusters-8
drwxr-xr-x - huser supergroup 0 2014-04-17 14:18 /user/huser/output/clusters-9
drwxr-xr-x - huser supergroup 0 2014-04-17 14:10 /user/huser/output/data
drwxr-xr-x - huser supergroup 0 2014-04-17 14:10 /user/huser/output/random-seeds

[huser@master hadoop]$ hadoop-1.2.1/bin/hadoop fs -ls ./output/data
Warning: $HADOOP_HOME is deprecated.

Found 3 items
-rw-r--r-- 1 huser supergroup 0 2014-04-17 14:10 /user/huser/output/data/_SUCCESS
drwxr-xr-x - huser supergroup 0 2014-04-17 14:07 /user/huser/output/data/_logs
-rw-r--r-- 1 huser supergroup 335470 2014-04-17 14:10 /user/huser/output/data/part-m-00000
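
In the listing of ./output above, the directory whose name ends in -final holds the clusters from the last iteration. The sketch below picks that path out of `hadoop fs -ls` output with plain text matching; `final_clusters_dir` is a hypothetical helper name, and it assumes the `clusters-N-final` naming seen in this run.

```shell
# final_clusters_dir (hypothetical helper): read `hadoop fs -ls` output on
# stdin and print the path of any entry named clusters-<N>-final.
# Assumes the path is the last column, as in the listing above.
final_clusters_dir() {
  awk '$NF ~ /\/clusters-[0-9]+-final$/ { print $NF }'
}
```

For example, `hadoop-1.2.1/bin/hadoop fs -ls ./output | final_clusters_dir` would print /user/huser/output/clusters-10-final for the run shown here. That path is the natural input for Mahout's clusterdump utility; to my knowledge it takes the cluster directory with -i, the clusteredPoints directory with -p, and a local output file with -o, but check `mahout clusterdump --help` on your installation before relying on those flags.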