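Parsing HTML with the HtmlAgilityPack Library
General approach: http://www.cnblogs.com/kissdodog/archive/2013/02/28/2936950.html
Special case: when the requested page comes back with ContentEncoding=gzip, the body has to be decompressed before it can be parsed.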
// Print the response's CharacterSet, ContentEncoding and ContentType
// Requires: using System; using System.Net;
static void getch(string url)
{
    WebRequest rebRequest = WebRequest.Create(url);
    // Dispose the response so the underlying connection is released
    using (HttpWebResponse web = (HttpWebResponse)rebRequest.GetResponse())
    {
        string chart = web.CharacterSet;
        string conending = web.ContentEncoding;
        string contenttype = web.ContentType;
        Console.WriteLine(chart);
        Console.WriteLine(conending);
        Console.WriteLine(contenttype);
    }
}
1. Add the following header to the HttpWebRequest object:
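The header line itself did not survive in this copy of the post. A plausible candidate, offered as an assumption and not as the author's exact code, is Accept-Encoding, which tells the server that a compressed response is acceptable. The class and method names below (RequestSketch, CreateCompressedRequest) are hypothetical.

```csharp
using System;
using System.Net;

static class RequestSketch
{
    // Hypothetical helper: builds a request that advertises gzip/deflate support.
    public static HttpWebRequest CreateCompressedRequest(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        // Assumption: the header missing from the post is Accept-Encoding.
        request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip, deflate");
        return request;
    }
}
```

As an alternative, HttpWebRequest also has an AutomaticDecompression property (DecompressionMethods.GZip | DecompressionMethods.Deflate) that sends this header and decompresses the body itself. If that route is taken, it should probably not be combined with the manual decoding in step 2.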
2. Decode the received stream:
// Requires: using System.IO; using System.IO.Compression; using System.Net; using System.Text;
// The method signature is missing from the original; ReadResponseBody is a placeholder name.
static string ReadResponseBody(HttpWebResponse response)
{
    string responseBody = string.Empty;
    if (response.ContentEncoding.ToLower().Contains("gzip"))
    {
        // gzip-compressed body
        using (GZipStream stream = new GZipStream(
            response.GetResponseStream(), CompressionMode.Decompress))
        {
            using (StreamReader reader =
                new StreamReader(stream, Encoding.UTF8))
            {
                responseBody = reader.ReadToEnd();
            }
        }
    }
    else if (response.ContentEncoding.ToLower().Contains("deflate"))
    {
        // deflate-compressed body
        using (DeflateStream stream = new DeflateStream(
            response.GetResponseStream(), CompressionMode.Decompress))
        {
            using (StreamReader reader =
                new StreamReader(stream, Encoding.UTF8))
            {
                responseBody = reader.ReadToEnd();
            }
        }
    }
    else
    {
        // uncompressed body
        using (Stream stream = response.GetResponseStream())
        {
            using (StreamReader reader =
                new StreamReader(stream, Encoding.UTF8))
            {
                responseBody = reader.ReadToEnd();
            }
        }
    }
    return responseBody;
}
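To try the decoding logic without a live server, the same branching can be run against an in-memory stream. This is a minimal sketch under stated assumptions: DecodeSketch, DecodeBody and GzipBytes are hypothetical names, and UTF-8 is assumed as the text encoding, as in the code above.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class DecodeSketch
{
    // Hypothetical helper: wraps the raw body according to the Content-Encoding value.
    public static string DecodeBody(Stream raw, string contentEncoding)
    {
        string enc = (contentEncoding ?? string.Empty).ToLowerInvariant();
        Stream decoded = raw;
        if (enc.Contains("gzip"))
            decoded = new GZipStream(raw, CompressionMode.Decompress);
        else if (enc.Contains("deflate"))
            decoded = new DeflateStream(raw, CompressionMode.Decompress);

        // Disposing the reader also disposes the wrapped streams.
        using (StreamReader reader = new StreamReader(decoded, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Compresses text with gzip so the helper can be tried offline.
    public static byte[] GzipBytes(string text)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] data = Encoding.UTF8.GetBytes(text);
                gzip.Write(data, 0, data.Length);
            }
            // ToArray still works after the stream has been closed.
            return buffer.ToArray();
        }
    }
}
```

For example, DecodeSketch.DecodeBody(new MemoryStream(DecodeSketch.GzipBytes("<html></html>")), "gzip") gives back the original string, and passing an empty encoding name reads the stream unchanged.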
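Parsing:
// Requires a reference to the HtmlAgilityPack assembly: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
//string html = wc.DownloadString("agenthome/-i31-j310-kw/");
doc.LoadHtml(responseBody);
HtmlNode node = doc.DocumentNode.SelectSingleNode("/html/body/div[1]");
// SelectSingleNode returns null when the XPath matches nothing
if (node != null)
{
    Console.WriteLine(node.InnerText);
    Console.WriteLine(node.InnerHtml);
    Console.WriteLine(node.Name);
}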
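The XPath lookup above can also be tried on an inline HTML string, with no download step. The following is a small sketch that assumes the HtmlAgilityPack package is referenced; ParseSketch and FirstDivText are hypothetical names used only for illustration.

```csharp
using System;
using HtmlAgilityPack;

static class ParseSketch
{
    // Hypothetical helper: returns the text of the first <div> directly under <body>,
    // or null when the document has no such element.
    public static string FirstDivText(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode("/html/body/div[1]");
        return node == null ? null : node.InnerText;
    }
}
```

Called with "<html><body><div>first</div><div>second</div></body></html>", the helper returns "first". Called with a document that has no div under body, it returns null and does not throw.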
Reference: http://www.csharpwin.com/csharpspace/13345r5893.shtml
HtmlAgilityPack, a powerful tool for parsing HTML
Reference: http://zhoufoxcn.blog.51cto.com/792419/595344/