Java-Based Data Collection (Final Part)
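I have previously written three articles on collecting data with Java and storing it in a database:
Java-Based Data Collection and Storage (1): http://www.cnblogs.com/lichenwei/p/3904715.html
Java-Based Data Collection and Storage (2): http://www.cnblogs.com/lichenwei/p/3905370.html
Java-Based Data Collection and Storage (3): http://www.cnblogs.com/lichenwei/p/3907007.html
They implemented, respectively:
① Fetching page content and displaying it
② Simple collection and storage in a database
③ Querying the local database
All of these features run locally. Sometimes we need to call them remotely, and for that we can use the RMI mechanism that Java provides.
Web services are another option (PHP version; I will write a Java version when I have time): http://www.cnblogs.com/lichenwei/p/3891297.html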
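What is RMI?
RMI stands for Remote Method Invocation. It is a mechanism that lets an object in one Java virtual machine call methods on an object in another Java virtual machine. Any object that can be called this way must implement a remote interface, that is, an interface that extends java.rmi.Remote. When such an object is invoked, the arguments are marshalled and sent from the local virtual machine to the remote one, where they are unmarshalled. When the method returns, the result is marshalled on the remote machine and sent back to the caller's virtual machine. If the call throws an exception, the exception is propagated to the caller.
With that brief introduction to RMI, let's look at a simple implementation.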
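To make the marshalling step concrete, here is a minimal sketch that runs in a single JVM. The names MarshalDemo, Appender and AppenderImpl are made up for this illustration and are not part of the article's code. It assumes that exporting with UnicastRemoteObject.exportObject(impl, 0) is acceptable for a demo; the call through the returned stub still goes through marshalling, so the server works on a copy of the argument, and an exception thrown on the server side reaches the caller.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

public class MarshalDemo {

    /** Hypothetical remote interface used only for this illustration. */
    public interface Appender extends Remote {
        List<String> append(List<String> items, String extra) throws RemoteException;

        void fail() throws RemoteException;
    }

    /** Server-side implementation; it mutates the list it receives. */
    public static class AppenderImpl implements Appender {
        @Override
        public List<String> append(List<String> items, String extra) {
            items.add(extra); // changes the unmarshalled copy, not the caller's list
            return items;
        }

        @Override
        public void fail() {
            throw new IllegalStateException("raised on the server side");
        }
    }

    /** Exports impl on an anonymous port and returns the stub that callers use. */
    public static Appender export(AppenderImpl impl) throws RemoteException {
        return (Appender) UnicastRemoteObject.exportObject(impl, 0);
    }

    public static void main(String[] args) throws Exception {
        AppenderImpl impl = new AppenderImpl();
        Appender stub = export(impl);
        try {
            List<String> mine = new ArrayList<String>();
            mine.add("a");
            List<String> result = stub.append(mine, "b");
            System.out.println("caller's list: " + mine);
            System.out.println("returned copy: " + result);
            try {
                stub.fail();
            } catch (IllegalStateException e) {
                System.out.println("exception reached the caller: " + e.getMessage());
            }
        } finally {
            UnicastRemoteObject.unexportObject(impl, true); // lets the JVM exit
        }
    }
}
```

The practical consequence is that arguments and return values must be serializable (or remote objects themselves), and that changes made to an argument on the server are not visible to the client unless they are returned.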
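1. Define the remote interface
First, we write a remote interface, IHello, which extends the Remote interface.
IHello declares a hello method that the client calls to say hello once it has connected.
Because IHello extends Remote, each of its methods must declare that it throws RemoteException.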
import java.rmi.Remote;
import java.rmi.RemoteException;

public interface IHello extends Remote {

    public String hello(String name) throws RemoteException;
}
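2. Implement the interface
Next, we implement the interface's method. The implementation lives on the server side.
The HelloImpl class implements the hello method declared in IHello.
Note: HelloImpl also extends UnicastRemoteObject. The code compiles without it, but the object is then never exported as a remote object, so the server fails at run time when it tries to bind it (typically with a marshalling error), unless you export the object yourself with UnicastRemoteObject.exportObject.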
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

/**
 * Extending UnicastRemoteObject is required here. The code compiles without it,
 * but the server then fails at run time.
 *
 * @author Balla_兔子
 */
public class HelloImpl extends UnicastRemoteObject implements IHello {

    protected HelloImpl() throws RemoteException {
        super(); // exports this object so that it can receive remote calls
    }

    @Override
    public String hello(String name) {
        String strHello = "Hello! " + name + " is accessing the server";
        System.out.println(name + " is accessing the server");
        return strHello;
    }

}
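If a class cannot extend UnicastRemoteObject (for example because it already has another superclass), it can be exported explicitly instead. The sketch below is an illustration only: ExplicitExportDemo, Greeter and PlainGreeter are hypothetical names standing in for the article's IHello and HelloImpl, and port 0 is an assumption meaning "let RMI pick any free port".

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

public class ExplicitExportDemo {

    /** Hypothetical stand-in for the article's IHello interface. */
    public interface Greeter extends Remote {
        String hello(String name) throws RemoteException;
    }

    /** Implements the remote interface without extending UnicastRemoteObject. */
    public static class PlainGreeter implements Greeter {
        @Override
        public String hello(String name) {
            return "Hello, " + name;
        }
    }

    public static void main(String[] args) throws Exception {
        PlainGreeter impl = new PlainGreeter();
        // Port 0 lets RMI choose any free port for incoming calls.
        Greeter stub = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
        try {
            // The stub is what would be bound in a registry and handed to clients.
            System.out.println(stub.hello("Rabbit"));
        } finally {
            UnicastRemoteObject.unexportObject(impl, true);
        }
    }
}
```

Either approach ends in the same state: an exported object plus a stub. Extending UnicastRemoteObject simply performs the export inside the constructor.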
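3. Write the server
RMI achieves remote access by having the client look up the remote object's address (the server address) in the RMI registry.
So on the server side we create a registry and bind the remote object to the server address in it, so that clients can later find the server.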
import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;

public class Server {

    public static void main(String[] args) {
        try {
            IHello hello = new HelloImpl();
            int port = 6666;
            // Create a registry in this JVM that listens on the given port
            LocateRegistry.createRegistry(port);
            String address = "rmi://localhost:" + port + "/tuzi";
            // Register the remote object under the name "tuzi"
            Naming.bind(address, hello);
            System.out.println(">>>Server started successfully");
            System.out.println(">>>Start the client to connect..");

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}
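As a variation on the server above, the Registry object returned by LocateRegistry.createRegistry can be used directly instead of building an rmi:// URL for Naming. The sketch below is a hedged illustration: RegistryDemo, Greeter, GreeterImpl and the publish helper are hypothetical names, and using rebind rather than bind is a design choice made here so that binding the same name twice does not throw AlreadyBoundException.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.Arrays;

public class RegistryDemo {

    /** Hypothetical stand-in for the article's IHello interface. */
    public interface Greeter extends Remote {
        String hello(String name) throws RemoteException;
    }

    public static class GreeterImpl implements Greeter {
        @Override
        public String hello(String name) {
            return "Hello, " + name;
        }
    }

    /**
     * Creates a registry on the given port, exports impl, and binds its stub
     * under the given name, replacing any binding that already exists.
     */
    public static Registry publish(int port, String name, Remote impl) throws RemoteException {
        Registry registry = LocateRegistry.createRegistry(port);
        Remote stub = UnicastRemoteObject.exportObject(impl, 0);
        registry.rebind(name, stub);
        return registry;
    }

    public static void main(String[] args) throws Exception {
        // Same port and name as the article's server; the process keeps running to serve calls.
        Registry registry = publish(6666, "tuzi", new GreeterImpl());
        System.out.println("bound names: " + Arrays.toString(registry.list()));
    }
}
```

A client on the same machine would still find this object under rmi://localhost:6666/tuzi, since the registry does not care which API was used to bind the name.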
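4. Write the client
The client also defines the remote address, that is, the server address,
then looks that address up in the RMI registry. If it is found, the connection is established.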
import java.net.MalformedURLException;
import java.rmi.Naming;
import java.rmi.NotBoundException;
import java.rmi.RemoteException;
import java.util.Scanner;

public class Client {
    public static void main(String[] args) {

        int port = 6666;
        String address = "rmi://localhost:" + port + "/tuzi";
        try {
            IHello hello = (IHello) Naming.lookup(address);
            System.out.println("<<<Client connected successfully!");
            // The client calls the hello method of the remote interface and prints the result
            System.out.println(hello.hello("Rabbit"));
            // Wait for input so that the console stays open
            Scanner scanner = new Scanner(System.in);
            scanner.next();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (RemoteException e) {
            e.printStackTrace();
        } catch (NotBoundException e) {
            e.printStackTrace();
        }

    }
}
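The lookup step can fail in two different ways: the registry may be unreachable, or it may be reachable but have nothing bound under the requested name. The following sketch separates the two cases. LookupDemo and lookupOrNull are hypothetical names introduced for this illustration; the host, port and name in main simply mirror the values used in the article.

```java
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

public class LookupDemo {

    /** Returns the stub bound under name, or null if nothing is bound there. */
    public static Remote lookupOrNull(Registry registry, String name) throws RemoteException {
        try {
            return registry.lookup(name);
        } catch (NotBoundException e) {
            return null; // the registry answered, but the name is not registered
        }
    }

    public static void main(String[] args) {
        try {
            // getRegistry only builds a local reference; the first network call is lookup.
            Registry registry = LocateRegistry.getRegistry("localhost", 6666);
            Remote stub = lookupOrNull(registry, "tuzi");
            System.out.println(stub == null ? "name not bound" : "found: " + stub);
        } catch (RemoteException e) {
            System.out.println("registry not reachable: " + e.getMessage());
        }
    }
}
```

In a real client the returned Remote would be cast to the remote interface (IHello in the article) before calling its methods.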
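Screenshot of the run:
Now let's look at our program. First, a screenshot of the result:
And now the code itself. See the code comments for details: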
IdoAction.java (remote interface for the available operations)
package com.lcw.rmi.collection;

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

public interface IdoAction extends Remote {

    // Clear the data table
    public void initData() throws RemoteException;

    // Collect the statistics page and store the rows in the database
    public void getAllDatas() throws RemoteException;

    // Names of all teams
    public List<String> getAllTeams() throws RemoteException;

    // All fields of a single team
    public List<String> getTeamInfo(String team) throws RemoteException;

    // All fields of every team, flattened into one list
    public List<String> getAllInfo() throws RemoteException;

}
doActionImpl.java (interface implementation class)
package com.lcw.rmi.collection;

import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class doActionImpl extends UnicastRemoteObject implements IdoAction {

    private static final long serialVersionUID = 1L;
    private Mysql mysql;

    public doActionImpl() throws RemoteException {
        mysql = new Mysql();
    }

    @Override
    public void getAllDatas() throws RemoteException {
        // Call the collector class to fetch all data
        CollectData data = new CollectData();
        data.getAllDatas();
        System.out.println("Data collected successfully!");
    }

    @Override
    public List<String> getAllInfo() throws RemoteException {
        // Query all rows
        String sql = "select * from data";
        ResultSet resultSet = mysql.querySQL(sql);
        List<String> list = new ArrayList<String>();
        System.out.println("Running command 5: fetching all NBA (2013-2014) regular-season team data..");
        System.out.println("Done, results are shown on the client..");
        try {
            while (resultSet.next()) {
                // Columns 2 to 16 hold the 15 data fields (column 1 is skipped)
                for (int i = 2; i < 17; i++) {
                    list.add(resultSet.getString(i));
                }
                System.out.println();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return list;
    }

    @Override
    public List<String> getAllTeams() throws RemoteException {
        // Query all team names
        String sql = "select team from data";
        ResultSet resultSet = mysql.querySQL(sql);
        List<String> list = new ArrayList<String>();
        System.out.println("Running command 3: fetching NBA (2013-2014) regular-season teams..");
        System.out.println("Done, results are shown on the client..");
        try {
            while (resultSet.next()) {
                list.add(resultSet.getString("team"));
            }
        } catch (SQLException e) {
            System.out.println("The database has no data yet; run the automatic collection command first");
            e.printStackTrace();
        }
        return list;

    }

    @Override
    public List<String> getTeamInfo(String team) throws RemoteException {
        // Query one team's data by team name.
        // Note: concatenating user input into SQL is open to SQL injection;
        // a PreparedStatement parameter would be safer.
        ResultSet resultSet = mysql.querySQL("select * from data where team='"
                + team + "'");
        List<String> list = new ArrayList<String>();
        System.out.println("Running command 4: fetching data for the requested team..");
        System.out.println("Done, results are shown on the client..");
        try {
            if (resultSet.next()) {
                for (int i = 2; i < 17; i++) {
                    list.add(resultSet.getString(i));
                }
            }
            System.out.println();
        } catch (SQLException e) {
            System.out.println("The database has no data yet; run the automatic collection command first");
            e.printStackTrace();
        }
        return list;
    }

    @Override
    public void initData() throws RemoteException {
        // Initialize the database by deleting all rows
        String sql = "delete from data";
        try {
            mysql.updateSQL(sql);
            System.out.println("Database initialized successfully!");
        } catch (Exception e) {
            System.out.println("Database initialization failed!");
        }

    }

}
CollectData.java (main collector class)
package com.lcw.rmi.collection;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectData {

    /**
     * Collector class: fetches all data and stores it in the database
     */
    public void getAllDatas() {
        String address = "http://nbadata.sports.qq.com/teams_stat.aspx"; // URL to collect data from
        String rankRegEx = ">\\d{1,2}</td>"; // rank pattern
        String teamRegEx = ">[^<>]*</a>"; // team name pattern
        String dataRegEx = ">\\d{1,3}(\\.)\\d{0,2}</td>"; // ordinary numeric data pattern
        String percentRegEX = ">\\d{1,2}(\\.)*(\\d)*%</span></td>"; // percentage data pattern
        GetRegExData regExData = new GetRegExData();
        List<String> list = new ArrayList<String>();
        Mysql mysql = new Mysql();
        // Open the URL as a byte stream, decode it as GBK, and buffer it for line-by-line reading
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new URL(address).openStream(), "gbk"))) {
            String temp; // line currently being read
            int flag = 0;
            while ((temp = reader.readLine()) != null) {
                // Match the rank
                String tempRank = regExData.getData(rankRegEx, temp);
                if (!tempRank.isEmpty()) {
                    list.add(tempRank.substring(1, tempRank.indexOf("</td>")));
                    flag++;
                }
                // Match the team name.
                // This pattern also matches other parts of the page, so the flag
                // makes it match only after a rank has been found.
                String tempTeam = regExData.getData(teamRegEx, temp);
                if (!tempTeam.isEmpty() && flag == 1) {
                    list.add(tempTeam.substring(1, tempTeam.indexOf("</a>")));
                    flag = 0;
                }
                // Match ordinary numeric data
                String tempData = regExData.getData(dataRegEx, temp);
                if (!tempData.isEmpty()) {
                    list.add(tempData.substring(1, tempData.indexOf("</td>")));
                }
                // Match percentage data
                String tempPercent = regExData.getData(percentRegEX, temp);
                if (!tempPercent.isEmpty()) {
                    list.add(tempPercent.substring(1, tempPercent.indexOf("</span></td>")));
                }
            }
            Object[] arr = list.toArray(); // convert the list to an array
            String sql = "insert into data(rank,team,chushou1,mingzhong1,chushou2,mingzhong2,chushou3,mingzhong3,qianchang,houchang,zong,zhugong,shiwu,fangui,defen) values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
            // Each team occupies 15 consecutive values; insert one row per team (30 teams)
            for (int i = 0; i < 30 && (i + 1) * 15 <= arr.length; i++) {
                Object[] arr1 = Arrays.copyOfRange(arr, i * 15, (i + 1) * 15);
                mysql.insertNewData(sql, arr1);
                System.out.println("Collecting data.. rows collected so far: " + (i + 1));
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}
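The collector cuts the value out of each match with substring and indexOf. A capture group can return just the value in one step, which is one possible alternative. The sketch below is only an illustration under assumptions: CaptureGroupDemo and firstGroup are hypothetical names, the percentage pattern is a simplified variant of the one in the article, and the sample HTML lines are invented, so the real page markup may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {

    // Same shape as the article's rank pattern, with the digits in a capture group.
    private static final Pattern RANK = Pattern.compile(">(\\d{1,2})</td>");
    // Percentage cell; the group keeps the number together with its % sign.
    private static final Pattern PERCENT = Pattern.compile(">(\\d{1,2}(?:\\.\\d+)?%)</span></td>");

    /** Returns group 1 of the first match, or an empty string if the line does not match. */
    public static String firstGroup(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(1) : "";
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; the real page markup may differ.
        System.out.println(firstGroup(RANK, "<td class=\"a\">12</td>"));        // 12
        System.out.println(firstGroup(PERCENT, "<td><span>45.6%</span></td>")); // 45.6%
        System.out.println(firstGroup(RANK, "<td>Lakers</td>").isEmpty());      // true
    }
}
```

Compiling each Pattern once as a constant also avoids recompiling the same expression for every line of the page. Note that "no match" is tested with isEmpty() here, because comparing strings with != checks object identity rather than content.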
GetRegExData.java (regular-expression filter helper class)
package com.lcw.rmi.collection;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetRegExData {

    // Returns the first match of regex in content, or an empty string if there is none
    public String getData(String regex, String content) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()) {
            return matcher.group();
        } else {
            return "";
        }

    }
}
Mysql.java (database access class)
package com.lcw.rmi.collection;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Mysql {

    private String driver = "com.mysql.jdbc.Driver";
    private String url = "jdbc:mysql://localhost:3306/nba";
    private String user = "root";
    private String password = "";

    private PreparedStatement stmt = null;
    private Connection conn = null;
    private ResultSet resultSet = null;

    /**
     * Used by the collector class to insert one row.
     *
     * @param insertSql INSERT statement with one ? placeholder per value
     * @param arr values to bind, in placeholder order
     */
    public void insertNewData(String insertSql, Object[] arr) {
        try {
            Class.forName(driver);
            conn = DriverManager.getConnection(url, user, password);
            stmt = conn.prepareStatement(insertSql);
            // JDBC parameter indexes start at 1
            for (int i = 0; i < arr.length; i++) {
                stmt.setString(i + 1, arr[i].toString());
            }
            stmt.executeUpdate();
            stmt.close();
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    /**
     * @param updateSql statement that modifies the database
     */
    public void updateSQL(String updateSql) {
        try {
            Class.forName(driver);
            conn = DriverManager.getConnection(url, user, password);
            stmt = conn.prepareStatement(updateSql);
            stmt.executeUpdate();
            stmt.close();
            conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    /**
     * @param searchSql ordinary query
     * @return the result set; the connection stays open so that the caller can read it
     */
    public ResultSet querySQL(String searchSql) {
        try {
            Class.forName(driver);
            conn = DriverManager.getConnection(url, user, password);
            stmt = conn.prepareStatement(searchSql);
            resultSet = stmt.executeQuery();
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
        return resultSet;
    }
}
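One way to make the insert path more robust is to bind the values in a loop and let try-with-resources close the connection and statement. The sketch below shows that idea under stated assumptions: InsertSketch, buildInsertSql and insertRow are hypothetical helpers, not part of the article's Mysql class, and insertRow needs a reachable database and a JDBC driver on the classpath, so main only prints the generated SQL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

public class InsertSketch {

    /** Builds "insert into t(a,b) values(?,?)" for the given table and columns. */
    public static String buildInsertSql(String table, List<String> columns) {
        StringBuilder sql = new StringBuilder("insert into ").append(table).append('(');
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sql.append(',');
                marks.append(',');
            }
            sql.append(columns.get(i));
            marks.append('?');
        }
        return sql.append(") values(").append(marks).append(')').toString();
    }

    /** Inserts one row; connection and statement are closed even if an exception is thrown. */
    public static int insertRow(String url, String user, String password,
                                String sql, List<String> values) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (int i = 0; i < values.size(); i++) {
                stmt.setString(i + 1, values.get(i)); // JDBC parameters are 1-based
            }
            return stmt.executeUpdate();
        }
    }

    public static void main(String[] args) {
        String sql = buildInsertSql("data", Arrays.asList("rank", "team", "defen"));
        System.out.println(sql); // insert into data(rank,team,defen) values(?,?,?)
    }
}
```

Letting SQLException propagate out of insertRow, instead of printing the stack trace and continuing, also allows the caller to report a failed insert accurately.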
Server.java (server class)
package com.lcw.rmi.collection;

import java.net.MalformedURLException;
import java.rmi.AlreadyBoundException;
import java.rmi.Naming;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;

public class Server {

    public static void main(String[] args) {
        try {
            int port = 9797;
            String address = "rmi://localhost:" + port + "/nba";
            System.out.println(">>>Starting server..");
            IdoAction action = new doActionImpl();
            // Create the registry, then bind the remote object under the name "nba"
            LocateRegistry.createRegistry(port);
            Naming.bind(address, action);
            System.out.println(">>>Server started successfully!");
            System.out.println(">>>Waiting for client connections...");
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (AlreadyBoundException e) {
            e.printStackTrace();
        } catch (RemoteException e) {
            e.printStackTrace();
        }
    }

}
Client.java (client class)
package com.lcw.rmi.collection;

import java.net.MalformedURLException;
import java.rmi.Naming;
import java.rmi.NotBoundException;
import java.rmi.RemoteException;
import java.util.List;
import java.util.Scanner;

public class Client {

    private static final String LINE = "---------------------------------------------------------";
    private static final String HEADER = "Rank\tTeam\tAtt\tPct\tAtt\tPct\tAtt\tPct\tOff\tDef\tTotal\tAst\tTO\tPF\tPts";

    public static void main(String[] args) {
        int port = 9797;
        String address = "rmi://localhost:" + port + "/nba";

        try {
            System.out.println("Starting client..");
            System.out.println("Client started, connecting to server..");
            IdoAction action = (IdoAction) Naming.lookup(address);
            System.out.println("Connected...");
            System.out.println("---------------------------");

            // One Scanner for the whole session
            Scanner scanner = new Scanner(System.in);
            while (true) {
                System.out.println("① Initialize the database: press (1)");
                System.out.println();
                System.out.println("② Automatically collect NBA (2013-2014) regular-season ranking data: press (2)");
                System.out.println();
                System.out.println("③ List all teams in the NBA (2013-2014) regular-season ranking: press (3)");
                System.out.println();
                System.out.println("④ Show one team's (2013-2014) regular-season ranking: press (4)");
                System.out.println();
                System.out.println("⑤ Show full details: press (5)");
                System.out.println();

                String input = scanner.next();

                if (input.equals("1")) {
                    System.out.println(LINE);
                    action.initData();
                    System.out.println("Server data initialized; press 2 to collect data automatically..");
                    System.out.println(LINE);
                } else if (input.equals("2")) {
                    System.out.println(LINE);
                    System.out.println("Collecting data automatically, please wait..");
                    // The remote call returns once the server has finished collecting
                    action.getAllDatas();
                    System.out.println("Data collection finished.. press 3, 4 or 5 to continue");
                    System.out.println(LINE);
                } else if (input.equals("3")) {
                    System.out.println(LINE);
                    System.out.println("Fetching NBA (2013-2014) regular-season teams, please wait..");
                    System.out.println();
                    List<String> list = action.getAllTeams();
                    for (int i = 0; i < list.size(); i++) {
                        // Five team names per line
                        if (i % 5 == 0 && i != 0) {
                            System.out.println();
                        }
                        System.out.print(list.get(i) + "\t");
                    }
                    System.out.println();
                    System.out.println(LINE);
                } else if (input.equals("4")) {
                    System.out.println(LINE);
                    System.out.println("Enter the name of the team to query (for example: 76人)");
                    String team = scanner.next();
                    List<String> list = action.getTeamInfo(team);
                    if (list.isEmpty()) {
                        System.out.println("No data found for team: " + team);
                    } else {
                        System.out.println(HEADER);
                        for (String value : list) {
                            System.out.print(value + "\t");
                        }
                        System.out.println();
                    }
                    System.out.println(LINE);
                } else if (input.equals("5")) {
                    System.out.println(LINE);
                    System.out.println("Fetching data, please wait...");
                    System.out.println();
                    System.out.println(HEADER);
                    List<String> list = action.getAllInfo();
                    for (int i = 0; i < list.size(); i++) {
                        // 15 fields per team, one team per line
                        if (i % 15 == 0 && i != 0) {
                            System.out.println();
                        }
                        System.out.print(list.get(i) + "\t");
                    }
                    System.out.println();
                    System.out.println(LINE);
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (RemoteException e) {
            e.printStackTrace();
        } catch (NotBoundException e) {
            e.printStackTrace();
        }
    }
}
That wraps up this series of articles on collecting data with Java. Signing off.