Jsoup
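jsoup is an HTML parser for Java that can parse a URL or a string of HTML directly. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods (source: Baidu Baike). Once the jar is downloaded, you can try an example like the following: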
package com.gqx.jsoupTest;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Crawler {
    public static void main(String[] args) throws Exception {
        // Fetch the page and parse it into a Document.
        Document document = Jsoup.connect("http://www.cnblogs.com/helloworldcode/").get();
        // Select every <a> element whose id attribute is Header1_HeaderTitle.
        Elements select = document.select("a[id=Header1_HeaderTitle]");
        // Print the text of each matching element.
        for (Element element : select) {
            System.out.println(element.text());
        }
    }
}
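Because jsoup can also parse HTML held in a string, the same selector can be tried without any network request. The following is only a minimal sketch: the class name ParseSketch, the helper headerTitle, and the sample HTML are made up for illustration and do not come from the original example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParseSketch {
    // Hypothetical helper: returns the text of the first <a> whose id is
    // Header1_HeaderTitle, or an empty string if no such element exists.
    static String headerTitle(String html) {
        Document document = Jsoup.parse(html);
        Element link = document.select("a[id=Header1_HeaderTitle]").first();
        return link == null ? "" : link.text();
    }

    public static void main(String[] args) {
        // Made-up HTML standing in for a downloaded page.
        String html = "<html><body>"
                + "<a id=\"Header1_HeaderTitle\" href=\"/\">Sample blog</a>"
                + "<a id=\"other\" href=\"/x\">Other link</a>"
                + "</body></html>";
        System.out.println(headerTitle(html));
    }
}
```

Keeping the selection logic in a method that takes a string makes it easy to check a selector against saved HTML before pointing it at a live site.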
The API documentation describes Jsoup's connect() method as follows:
public static Connection connect(String url)

Creates a new Connection to a URL. Use to fetch and parse a HTML page.

Use examples:
Document doc = Jsoup.connect("http://example.com").userAgent("Mozilla").data("name", "jsoup").get();
Document doc = Jsoup.connect("http://example.com").cookie("auth", "token").post();

Parameters:
url - URL to connect to. The protocol must be http or https.

Returns:
the connection. You can add data, cookies, and headers; set the user-agent, referrer, method; and then execute.
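To make the "configure, then execute" flow in this description more concrete, here is a small sketch that builds a Connection step by step and then sends it. The class name ConnectSketch, the helper configure, the user-agent string, the referrer, and the 10-second timeout are all assumptions chosen for illustration; userAgent, referrer, timeout, method, execute, and parse are methods of jsoup's Connection API.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ConnectSketch {
    // Hypothetical helper: builds a configured Connection. Nothing is sent yet.
    static Connection configure(String url) {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0")          // illustrative user-agent value
                .referrer("http://example.com")    // illustrative referrer value
                .timeout(10_000)                   // timeout in milliseconds
                .method(Connection.Method.GET);
    }

    public static void main(String[] args) throws IOException {
        Connection connection = configure("http://example.com");
        // execute() sends the request and returns the raw response.
        Connection.Response response = connection.execute();
        System.out.println(response.statusCode());
        // parse() turns the response body into a Document.
        Document doc = response.parse();
        System.out.println(doc.title());
    }
}
```

Calling execute() followed by parse() is a longer route to the same Document that get() returns, and it is useful when the status code or headers need to be inspected first.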
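So Jsoup.connect(String url) returns a Connection object. Its documentation defines it as: "A Connection provides a convenient interface to fetch content from the web, and parse them into Documents." A Connection therefore gives us access to the full content of a page. The remaining question is how to reach the tags and their content from our own class, that is, how to turn the page's HTML into a Document object. That is what the get() method does.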
Document get() throws IOException

Execute the request as a GET, and parse the result.

Returns:
parsed Document

Throws:
MalformedURLException - if the request URL is not a HTTP or HTTPS URL, or is otherwise malformed
HttpStatusException - if the response is not OK and HTTP response errors are not ignored
UnsupportedMimeTypeException - if the response mime type is not supported and those errors are not ignored
SocketTimeoutException - if the connection times out
IOException - on error
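All of the exceptions listed for get() are subclasses of IOException, so a caller can catch IOException once and then distinguish the cases. The sketch below shows one possible way to do that; the class name FetchSketch, the helper describe, and the message strings are hypothetical and only serve as an illustration.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;

import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.UnsupportedMimeTypeException;
import org.jsoup.nodes.Document;

public class FetchSketch {
    // Hypothetical helper: maps each documented exception type to a short message.
    static String describe(IOException e) {
        if (e instanceof HttpStatusException) {
            return "HTTP error " + ((HttpStatusException) e).getStatusCode();
        }
        if (e instanceof UnsupportedMimeTypeException) {
            return "unsupported content type " + ((UnsupportedMimeTypeException) e).getMimeType();
        }
        if (e instanceof SocketTimeoutException) {
            return "timed out";
        }
        if (e instanceof MalformedURLException) {
            return "bad URL";
        }
        return "I/O error: " + e.getMessage();
    }

    public static void main(String[] args) {
        try {
            // get() sends a GET request and parses the response into a Document.
            Document doc = Jsoup.connect("http://example.com").get();
            System.out.println(doc.title());
        } catch (IOException e) {
            System.err.println(describe(e));
        }
    }
}
```

If non-OK responses or unsupported content types should not raise exceptions at all, the Connection methods ignoreHttpErrors(true) and ignoreContentType(true) can be set before calling get().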