首页 > 代码库 > R爬取网页信息

R爬取网页信息

#爬取电影票房信息
library(stringr)
library(XML)
library(maps)
#htmlParse()用来interpreting HTML
#创建一个object
movie_parsed<-htmlParse("http://58921.com/boxoffice/wangpiao/20161004",
                        encoding = "UTF-8")
#the next step:extract tables/data
#readHTMLTable() for identifying and reading out those tables
tables<-readHTMLTable(movie_parsed,stringsAsFactors=FALSE)
is.matrix(tables)
is.character(tables)
is.data.frame(tables)
is.list(tables)
#so we got an "list" format#

R爬取网页信息