首页 > 代码库 > 基于Java的数据采集(终结篇)

基于Java的数据采集(终结篇)

关于写过关于JAVA采集入库的三篇文章:

基于Java数据采集入库(一):http://www.cnblogs.com/lichenwei/p/3904715.html

基于Java数据采集入库(二):http://www.cnblogs.com/lichenwei/p/3905370.html

基于Java数据采集入库(三):http://www.cnblogs.com/lichenwei/p/3907007.html

分别实现了

①抓取页面信息并显示

②简单采集入库存储

③调用本地数据库查询

以上这些功能都是基于本地的,有时候我们需要远程去调用这类数据,这时我们就可以用JAVA提供的RMI机制实行远程调用访问。

当然也可以用WebServices实现(PHP版本,有时间再写个JAVA版本的):http://www.cnblogs.com/lichenwei/p/3891297.html

什么是RMI?

RMI 指的是远程方法调用 (Remote Method Invocation)。它是一种机制,能够让在某个 Java虚拟机上的对象调用另一个 Java 虚拟机中的对象上的方法。可以用此方法调用的任何对象必须实现该远程接口。调用这样一个对象时,其参数为 "marshalled" 并将其从本地虚拟机发送到远程虚拟机(该远程虚拟机的参数为 "unmarshalled")上。该方法终止时,将编组来自远程机的结果并将结果发送到调用方的虚拟机。如果方法调用导致抛出异常,则该异常将指示给调用方。

简单了解下RMI,看下简单实现吧

 

1、定义远程接口

首先,我们需要写个远程接口IHello 该接口继承了远程对象Remote.

接口IHello里面有个hello的方法,用于客户端连接后 打招呼.

由于IHello继承了远程Remote对象, 所以需要抛一个 RemoteException 远程异常.

1 import java.rmi.Remote;2 import java.rmi.RemoteException;3 4 5 public interface IHello extends Remote{6 7     public String hello(String name) throws RemoteException;8 }

2、实现接口

接下来,我们实现下 该接口里的方法, 实现接口的方法在服务端.

这里的HelloImpl类 实现了接口IHello里的方法.

注意:这里HelloImpl同样继承了 UnicastRemoteObject 远程对象,这个必须写,不然服务端启动后会莫名其妙报错.

 1 import java.rmi.RemoteException; 2 import java.rmi.server.UnicastRemoteObject; 3  4 /** 5  * UnicastRemoteObject 这个必须写,虽然不写代码也不会出错,但在运行服务器的时候会出现莫名错误 6  * @author Balla_兔子 7  * 8  */ 9 public class HelloImpl extends UnicastRemoteObject implements IHello {10 11     protected HelloImpl() throws RemoteException {12         super();13     }14 15     @Override16     public String hello(String name) {17         String strHello="你好!"+name+"正在访问服务端";18         System.out.println(name+"正在访问服务端");19         return strHello;20     }21 22 }

3、编写服务端

服务端,由于RMI实现远程访问的机制是指:客户端通过在RMI注册表上寻找远程接口对象的地址(服务端地址) 达到实现远程访问的目的,

所以,我们需要在服务端创建一个远程对象的注册表,用于绑定和注册 服务端地址 和 远程接口对象,便于后期客户端能够成功找到服务端

 1 import java.rmi.Naming; 2 import java.rmi.RemoteException; 3 import java.rmi.registry.LocateRegistry; 4  5  6 public class Server { 7  8     /** 9      * @param args10      */11     public static void main(String[] args) {12         try {13             IHello hello=new HelloImpl();14             int port=6666;15             LocateRegistry.createRegistry(port);16             String address="rmi://localhost:"+port+"/tuzi";17             Naming.bind(address, hello);18             System.out.println(">>>服务端启动成功");19             System.out.println(">>>请启动客户端进行连接访问..");20             21         } catch (Exception e) {22             e.printStackTrace();23         }24     }25 26 }

4、编写客户端

客户端上同样需要定义一个 远程访问的地址 - 即服务端地址,

然后,通过在RMI注册表上寻找该地址;  如果找到 则建立连接.

 1 import java.net.MalformedURLException; 2 import java.rmi.Naming; 3 import java.rmi.NotBoundException; 4 import java.rmi.RemoteException; 5 import java.util.Scanner; 6  7  8 public class Client { 9     public static void main(String[] args) {10         11         int port=6666;12         String address="rmi://localhost:"+port+"/tuzi";13         try {14             IHello hello=(IHello) Naming.lookup(address);15             System.out.println("<<<客户端访问成功!");16             //客户端 Client 调用 远程接口里的 sayHello 方法  并打印出来17             System.out.println(hello.hello("Rabbit"));             18             Scanner scanner=new Scanner(System.in);19             String input=scanner.next();20         } catch (MalformedURLException e) {21             // TODO Auto-generated catch block22             e.printStackTrace();23         } catch (RemoteException e) {24             // TODO Auto-generated catch block25             e.printStackTrace();26         } catch (NotBoundException e) {27             // TODO Auto-generated catch block28             e.printStackTrace();29         }30         31     }32 }

运行效果图:

 

接下来就来看看我们的程序吧,先上效果图:

 

好了,剩下的上代码吧,具体看代码注释:

 

IdoAction.java (功能调用接口代码)

 1 package com.lcw.rmi.collection; 2  3 import java.rmi.Remote; 4 import java.rmi.RemoteException; 5 import java.util.List; 6  7 public interface IdoAction extends Remote{ 8      9     10     public void initData() throws RemoteException;11     12     public void getAllDatas() throws RemoteException;13     14     public List<String> getAllTeams() throws RemoteException;15     16     public List<String> getTeamInfo(String team) throws RemoteException;17     18     public List<String> getAllInfo() throws RemoteException;19     20 }
IdoAction.java

doActionImpl.java (接口实现类)

  1 package com.lcw.rmi.collection;  2   3 import java.rmi.RemoteException;  4 import java.rmi.server.UnicastRemoteObject;  5 import java.sql.ResultSet;  6 import java.sql.SQLException;  7 import java.util.ArrayList;  8 import java.util.List;  9  10 public class doActionImpl extends UnicastRemoteObject implements IdoAction { 11  12     /** 13      *  14      */ 15     private static final long serialVersionUID = 1L; 16     private Mysql mysql; 17     private ResultSet resultSet; 18  19     public doActionImpl() throws RemoteException { 20         mysql = new Mysql(); 21     } 22  23     @Override 24     public void getAllDatas() throws RemoteException { 25         // 调用采集类,获取所有数据 26         CollectData data = http://www.mamicode.com/new CollectData(); 27         data.getAllDatas(); 28         System.out.println("数据采集成功!"); 29     } 30  31     @Override 32     public List<String> getAllInfo() throws RemoteException { 33         // 查询所有数据 34         String sql = "select * from data"; 35         resultSet = mysql.querySQL(sql); 36         List<String> list=new ArrayList<String>(); 37         System.out.println("当前执行命令5,正在获取NBA(2013-2014)赛季常规赛队伍所有信息.."); 38         System.out.println("获取成功,已在客户端展示.."); 39         try { 40             while(resultSet.next()) { 41                 for (int i = 2; i < 17; i++) { 42                     //System.out.println("++++++++++++++");调试 43                     list.add(resultSet.getString(i)); 44                 } 45                 System.out.println(); 46             } 47         } catch (SQLException e) { 48             e.printStackTrace(); 49         } 50         return list; 51     } 52  53     @Override 54     public List<String> getAllTeams() throws RemoteException { 55         // 查询所有队伍名称 56         String sql = "select team from data"; 57         resultSet = mysql.querySQL(sql); 58         List<String> list = new ArrayList<String>(); 59         System.out.println("当前执行命令3,正在获取NBA(2013-2014)赛季常规赛队伍.."); 60         System.out.println("获取成功,已在客户端展示.."); 61         try { 62             while (resultSet.next()) { 63                 list.add(resultSet.getString("team")); 64             } 65         } catch (SQLException e) { 66             System.out.println("数据库暂无信息,请执行自动化采集命令"); 67             e.printStackTrace(); 68         } 69         return list; 70  71     } 72  73     @Override 74     public List<String> getTeamInfo(String team) throws RemoteException { 75         // 根据队伍查询队伍信息 76         ResultSet resultSet = mysql.querySQL("select * from data where team=‘" 77                 + team + "‘"); 78         List<String> list=new ArrayList<String>(); 79         System.out.println("当前执行命令4,正在获取用户所查询队伍信息.."); 80         System.out.println("获取成功,已在客户端展示.."); 81         try { 82             if (resultSet.next()) { 83                 for (int i = 2; i < 17; i++) { 84                     list.add(resultSet.getString(i)); 85                 } 86             } 87             System.out.println(); 88         } catch (SQLException e) { 89             System.out.println("数据库暂无信息,请执行自动化采集命令"); 90             e.printStackTrace(); 91         } 92         return list; 93     } 94  95     @Override 96     public void initData() throws RemoteException { 97         // 初始化数据库 98         String sql = "delete from data"; 99         try {100             mysql.updateSQL(sql);101             System.out.println("数据库初始化成功!");102         } catch (Exception e) {103             System.out.println("数据库初始化失败!");104         }105 106     }107 108 }
doActionImpl.java

CollectData.java (采集主类)

 1 package com.lcw.rmi.collection; 2  3 import java.io.BufferedReader; 4 import java.io.IOException; 5 import java.io.InputStream; 6 import java.io.InputStreamReader; 7 import java.net.MalformedURLException; 8 import java.net.URL; 9 import java.util.ArrayList;10 import java.util.Arrays;11 import java.util.List;12 13 public class CollectData {14 15     /**16      * 采集类,获取所有数据17      */18     public void getAllDatas() {19         String address = "http://nbadata.sports.qq.com/teams_stat.aspx";// 要采集数据的url20         try {21             URL url = new URL(address);22             try {23                 InputStream inputStream = url.openStream();// 打开url,返回字节流24                 InputStreamReader inputStreamReader = new InputStreamReader(25                         inputStream, "gbk");// 将字节流转换为字符流,编码utf-826                 BufferedReader reader = new BufferedReader(inputStreamReader);// 提高效率,缓存27                 String rankRegEx = ">\\d{1,2}</td>";// 排名正则28                 String teamRegEx = ">[^<>]*</a>";// 队名正则29                 String dataRegEx = ">\\d{1,3}(\\.)\\d{0,2}</td>";// 正常数据正则30                 String percentRegEX = ">\\d{1,2}(\\.)*(\\d)*%</span></td>";// 百分比数据31                 GetRegExData regExData = http://www.mamicode.com/new GetRegExData();32                 String temp = "";// 存放临时读取数据33                 int flag = 0;34                 String tempRank = "";// 存放匹配到的返回数据35                 String tempTeam = "";// 存放匹配到的返回数据36                 String tempDatahttp://www.mamicode.com/= "";37                 String tempPercent = "";38                 List<String> list = new ArrayList<String>();39                 Mysql mysql = new Mysql();40                 while ((temp = reader.readLine()) != null) {41                     // 匹配排名42                     if ((tempRank = regExData.getData(rankRegEx, temp)) != "") {43                         tempRank = tempRank.substring(1, tempRank44                                 .indexOf("</td>"));45                         // System.out.println("排名:" + tempRank);46                         list.add(tempRank);47                         flag++;48                     }49                     // 匹配球队50                     // 由于该正则会匹配到其他地方的数据,需给它一个标识符,让它从"找到排名位置"才开始匹配51                     if ((tempTeam = regExData.getData(teamRegEx, temp)) != ""52                             && flag == 1) {53                         tempTeam = tempTeam.substring(1, tempTeam54                                 .indexOf("</a>"));55                         // System.out.println("球队名称:" + tempTeam);56                         list.add(tempTeam);57                         flag = 0;58                     }59                     // 匹配正常数据60                     if ((tempData = http://www.mamicode.com/regExData.getData(dataRegEx, temp)) !="") {61                         tempData = http://www.mamicode.com/tempData.substring(1, tempData62                                 .indexOf("</td>"));63                         // System.out.println(tempData);64                         list.add(tempData);65 66                     }67                     // 匹配百分比数据68                     if ((tempPercent = regExData.getData(percentRegEX, temp)) != "") {69                         tempPercent = tempPercent.substring(1, tempPercent70                                 .indexOf("</span></td>"));71                         // System.out.println(tempPercent);72                         list.add(tempPercent);73                     }74 75                 }76                 reader.close();77                 Object[] arr = list.toArray();// 将集合转换为数组78                 int a = -15;79                 int b = 0;80                 String sql = "insert into data(rank,team,chushou1,mingzhong1,chushou2,mingzhong2,chushou3,mingzhong3,qianchang,houchang,zong,zhugong,shiwu,fangui,defen) values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";81                 for (int i = 0; i < 30; i++) {82                     a += 15;83                     b += 15;84                     if (b <= 450) {85                         Object[] arr1 = Arrays.copyOfRange(arr, a, b);86                         mysql.insertNewData(sql, arr1);87                         System.out.println("正在采集数据..当前采集数据:" + (i + 1) + "条");88                     }89                 }90 91             } catch (IOException e) {92                 e.printStackTrace();93             }94         } catch (MalformedURLException e) {95             e.printStackTrace();96         }97     }98 99 }
CollectData.java

GetRegExData.java (正则过滤功能类)

 1 package com.lcw.rmi.collection; 2  3 import java.util.regex.Matcher; 4 import java.util.regex.Pattern; 5  6 public class GetRegExData { 7  8     public String getData(String regex, String content) { 9         Pattern pattern = Pattern.compile(regex);10         Matcher matcher = pattern.matcher(content);11         if (matcher.find()) {12             return matcher.group();13         } else {14             return "";15         }16 17     }18 }
GetRegExData.java

Mysql.java (数据库操作类)

  1 package com.lcw.rmi.collection;  2   3 import java.sql.Connection;  4 import java.sql.DriverManager;  5 import java.sql.PreparedStatement;  6 import java.sql.ResultSet;  7 import java.sql.SQLException;  8   9 public class Mysql { 10  11     private String driver = "com.mysql.jdbc.Driver"; 12     private String url = "jdbc:mysql://localhost:3306/nba"; 13     private String user = "root"; 14     private String password = ""; 15  16     private PreparedStatement stmt = null; 17     private Connection conn = null; 18     private ResultSet resultSet = null; 19  20     /** 21      *  22      * @param insertSql 23      *            采集类,插入数据操作 24      * @param arr 25      */ 26     public void insertNewData(String insertSql, Object[] arr) { 27  28         try { 29             Class.forName(driver).newInstance(); 30             try { 31                 conn = DriverManager.getConnection(url, user, password); 32                 stmt = conn.prepareStatement(insertSql); 33                 stmt.setString(1, arr[0].toString()); 34                 stmt.setString(2, arr[1].toString()); 35                 stmt.setString(3, arr[2].toString()); 36                 stmt.setString(4, arr[3].toString()); 37                 stmt.setString(5, arr[4].toString()); 38                 stmt.setString(6, arr[5].toString()); 39                 stmt.setString(7, arr[6].toString()); 40                 stmt.setString(8, arr[7].toString()); 41                 stmt.setString(9, arr[8].toString()); 42                 stmt.setString(10, arr[9].toString()); 43                 stmt.setString(11, arr[10].toString()); 44                 stmt.setString(12, arr[11].toString()); 45                 stmt.setString(13, arr[12].toString()); 46                 stmt.setString(14, arr[13].toString()); 47                 stmt.setString(15, arr[14].toString()); 48                 stmt.executeUpdate(); 49                 stmt.close(); 50                 conn.close(); 51  52             } catch (SQLException e) { 53                 e.printStackTrace(); 54             } 55         } catch (InstantiationException e) { 56             e.printStackTrace(); 57         } catch (IllegalAccessException e) { 58             e.printStackTrace(); 59         } catch (ClassNotFoundException e) { 60             e.printStackTrace(); 61         } 62  63     } 64  65     /** 66      *  67      * @param sql更新数据库语句 68      */ 69     public void updateSQL(String updateSql) { 70         try { 71             Class.forName(driver).newInstance(); 72             try { 73                 conn = DriverManager.getConnection(url, user, password); 74             } catch (SQLException e) { 75                 e.printStackTrace(); 76             } 77             try { 78                 stmt = conn.prepareStatement(updateSql); 79                 stmt.execute(updateSql); 80             } catch (SQLException e) { 81                 e.printStackTrace(); 82             } 83  84         } catch (InstantiationException e) { 85             e.printStackTrace(); 86         } catch (IllegalAccessException e) { 87             e.printStackTrace(); 88         } catch (ClassNotFoundException e) { 89             e.printStackTrace(); 90         } 91     } 92  93     /** 94      *  95      * @param sql一般查询 96      */ 97     public ResultSet querySQL(String searchSql) { 98         try { 99             Class.forName(driver).newInstance();100             try {101                 conn = DriverManager.getConnection(url, user, password);102             } catch (SQLException e) {103                 e.printStackTrace();104             }105             try {106                 stmt = conn.prepareStatement(searchSql);107                 resultSet = stmt.executeQuery();108             } catch (SQLException e) {109                 e.printStackTrace();110             }111 112         } catch (InstantiationException e) {113             e.printStackTrace();114         } catch (IllegalAccessException e) {115             e.printStackTrace();116         } catch (ClassNotFoundException e) {117             e.printStackTrace();118         }119         return resultSet;120     }121 }
Mysql.java

Server.java (服务端类)

 1 package com.lcw.rmi.collection; 2  3 import java.net.MalformedURLException; 4 import java.rmi.AlreadyBoundException; 5 import java.rmi.Naming; 6 import java.rmi.RemoteException; 7 import java.rmi.registry.LocateRegistry; 8  9 public class Server {10 11     /**12      * @param args13      */14     public static void main(String[] args) {15         try {16             int port = 9797;17             String address = "rmi://localhost:"+port+"/nba";18             IdoAction action = new doActionImpl();19             LocateRegistry.createRegistry(port);20             try {21                 Naming.bind(address, action);22                 System.out.println(">>>正在启动服务端..");23                 System.out.println(">>>服务端启动成功!");24                 System.out.println(">>>等待客户端连接...");25                 System.out.println(">>>客户端Balla_兔子已连接。");26             } catch (MalformedURLException e) {27                 e.printStackTrace();28             } catch (AlreadyBoundException e) {29                 e.printStackTrace();30             }31         } catch (RemoteException e) {32             e.printStackTrace();33         }34     }35 36 }
Server.java

Client.java (客户端类)

  1 package com.lcw.rmi.collection;  2   3 import java.net.MalformedURLException;  4 import java.rmi.Naming;  5 import java.rmi.NotBoundException;  6 import java.rmi.RemoteException;  7 import java.util.List;  8 import java.util.Scanner;  9  10 public class Client { 11  12     public static void main(String[] args) { 13         int port = 9797; 14         String address = "rmi://localhost:" + port + "/nba"; 15  16         try { 17             IdoAction action = (IdoAction) Naming.lookup(address); 18             System.out.println("正在启动客户端.."); 19             System.out.println("客户端启动完毕,正在连接服务端.."); 20             System.out.println("连接成功..."); 21             System.out.println("---------------------------"); 22  23             while (true) { 24                 System.out.println("①初始化数据库-请按 (1)"); 25                 System.out.println(); 26                 System.out.println("②自动化采集NBA(2013-2014)赛季常规赛排名数据-请按(2)"); 27                 System.out.println(); 28                 System.out.println("③查询NBA(2013-2014)赛季常规赛排名所有队伍-请按(3)"); 29                 System.out.println(); 30                 System.out.println("④查询具体球队(2013-2014)赛季常规赛排名-请按(4)"); 31                 System.out.println(); 32                 System.out.println("⑤查询具体详情-请按(5)"); 33                 System.out.println(); 34  35                 Scanner scanner = new Scanner(System.in); 36                 String input = scanner.next(); 37  38                 if (input.equals("1")) { 39                     System.out 40                             .println("---------------------------------------------------------"); 41                     System.out.println("服务端数据已初始化,请按2进行数据自动化采集.."); 42                     action.initData(); 43                     System.out 44                             .println("---------------------------------------------------------"); 45                 } 46                 if (input.equals("2")) { 47                     System.out 48                             .println("---------------------------------------------------------"); 49                     System.out.println("数据自动化采集中,请稍后.."); 50                     int i=0; 51                     while(i<10000){//延迟操作,给数据采集缓冲时间 52                         i++; 53                     } 54                     System.out.println("数据采集完毕..按3,4,5进行相关操作"); 55                     action.getAllDatas(); 56                     System.out 57                             .println("---------------------------------------------------------"); 58                 } 59                 if (input.equals("3")) { 60                     System.out 61                             .println("---------------------------------------------------------"); 62                     System.out.println("正在获取NBA(2013-2014)赛季常规赛队伍,请稍后.."); 63                     System.out.println(); 64                     List<String> list = action.getAllTeams(); 65                     for (int i = 0; i < list.size(); i++) { 66                         if (i % 5 == 0 && i != 0) { 67                             System.out.println(); 68                         } 69                         System.out.print(list.get(i) + "\t"); 70                     } 71                     System.out.println(); 72  73                     System.out 74                             .println("---------------------------------------------------------"); 75                 } 76                 if (input.equals("4")) { 77                     System.out 78                             .println("---------------------------------------------------------"); 79                     System.out.println("请输入你要查询的队伍名称(如:76人)"); 80                     String team = scanner.next(); 81                     System.out 82                             .print("排名\t球队\t出手\t命中率\t出手\t命中率\t出手\t命中率\t前场\t后场\t总\t助攻\t失误\t犯规\t得分"); 83                     System.out.println(); 84                     List<String> list=action.getTeamInfo(team); 85                     for (int i = 0; i < 15; i++) { 86                         System.out.print(list.get(i)+"\t"); 87                     } 88                     System.out.println(); 89                     System.out 90                             .println("---------------------------------------------------------"); 91                 } 92                 if (input.equals("5")) { 93                     System.out 94                             .println("---------------------------------------------------------"); 95                     System.out.println("数据获取中,请稍后..."); 96                     System.out.println(); 97                     System.out 98                             .print("排名\t球队\t出手\t命中率\t出手\t命中率\t出手\t命中率\t前场\t后场\t总\t助攻\t失误\t犯规\t得分"); 99                     System.out.println();100                     List<String> list=action.getAllInfo();101                     for(int i=0;i<450;i++){102                         if(i%15==0&&i!=0){103                             System.out.println();104                         }105                         System.out.print(list.get(i)+"\t");106                     }107                     System.out.println();108                     System.out109                             .println("---------------------------------------------------------");110                 }111             }112         } catch (MalformedURLException e) {113             e.printStackTrace();114         } catch (RemoteException e) {115             e.printStackTrace();116         } catch (NotBoundException e) {117             e.printStackTrace();118         }119     }120 }
Client.java

 

好了,关于JAVA采集数据文章就到此为止了~ 撤··