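Ganon web scraping example
Project page: http://code.google.com/p/ganon/
Documentation: http://code.google.com/p/ganon/w/list
Ganon is quite powerful: it identifies DOM elements with tag selectors similar to those used in JavaScript.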
The Ganon library gives access to HTML/XML documents in a very simple object oriented way. It eases modifying the DOM and makes finding elements easy with CSS3-like queries.
Ganon usage example:
    // Load the library (assumes ganon.php sits next to this script)
    include 'ganon.php';

    // Parse the google code website into a DOM
    $html = file_get_dom('http://code.google.com/');
Access
Accessing elements is made easy through the CSS3-like selectors and the object model.
    // Find all the paragraph tags with a class attribute and print the
    // value of the class attribute
    foreach ($html('p[class]') as $element) {
        echo $element->class, "<br>\n";
    }

    // Find the first div with ID "gc-header" and print the plain text of
    // the parent element (plain text means no HTML tags, just the text)
    echo $html('div#gc-header', 0)->parent->getPlainText();

    // Find out how many tags there are which are "ns:tag" or "div", but not
    // "a" and do not have a class attribute
    echo count($html('(ns|tag, div + !a)[!class]'));
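For comparison, the `p[class]` query can be approximated without Ganon. The sketch below swaps in PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`) and an XPath expression instead of a CSS3-like selector. It is not Ganon's API, and the sample markup and variable names are made up for illustration.

```php
<?php
// Comparison sketch using PHP's bundled DOM extension, not Ganon.
// The sample markup below is hypothetical and only serves as test input.
$markup = '<div id="gc-header">'
        . '<p class="intro">Hello</p>'
        . '<p>No class here</p>'
        . '<p class="note">Bye</p>'
        . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// "//p[@class]" is the XPath counterpart of the CSS-style selector p[class]
$classes = [];
foreach ($xpath->query('//p[@class]') as $p) {
    $classes[] = $p->getAttribute('class');
}

echo implode(', ', $classes), "\n";
```

With Ganon the same lookup is the one-line `$html('p[class]')` call; the built-in route needs an explicit XPath object but has no external dependency.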
Modification
Elements can be easily modified after you've found them.
    // Find all paragraph tags which are nested inside a div tag, change
    // their ID attribute and print the new HTML code
    foreach ($html('div p') as $index => $element) {
        $element->id = "id$index";
    }
    echo $html;

    // Center all the links inside a document which start with "http://"
    // and print out the new HTML
    foreach ($html('a[href ^= "http://"]') as $element) {
        $element->wrap('center');
    }
    echo $html;

    // Find all odd indexed "td" elements and change the HTML to make them links
    foreach ($html('table td:odd') as $element) {
        $element->setInnerText('<a href="#">' . $element->getPlainText() . '</a>');
    }
    echo $html;
Beautify
Ganon can also help you beautify your code and format it properly.
    // Beautify the old HTML code and print out the new, formatted code
    dom_format($html, array('attributes_case' => CASE_LOWER));
    echo $html;