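An introduction to Mojolicious's DOM selector Mojo::DOM and its Mojo::UserAgent (compared with Web::Scraper)
I recently needed to do some page analysis again. Until now I had always used AnyEvent::HTTP and Web::Scraper; this time I tried Mojo::DOM and Mojo::UserAgent.
Conclusion first: if your program is not itself a web application and is just a page-analysis or file-processing script, the former pair (AnyEvent::HTTP plus Web::Scraper) is still the better choice. Otherwise, Mojo is worth considering.
First, the strengths of Mojo::DOM and Mojo::UserAgent:
The DOM selector that Mojo::DOM provides is very convenient in some situations.
Once the HTML is loaded, you can pinpoint exactly the elements you need with CSS selectors, or traverse them with callbacks.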
    use Mojo::DOM;
    use feature 'say';
    my $dom = Mojo::DOM->new($html_string);
    $dom->find('p[id]')->each(sub { say shift->{id} });  # print the id of every <p> that has one
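For a version that can be run as-is, here is a minimal sketch showing both styles side by side: a precise lookup with at() and a callback traversal with find()->each. The sample markup and its ids are made up for this example.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical sample markup, invented for this example.
my $html = <<'HTML';
<div id="main">
  <p id="intro">Hello</p>
  <p>No id here</p>
  <p id="outro">Bye</p>
</div>
HTML

my $dom = Mojo::DOM->new($html);

# Precise lookup: at() returns the first element matching the selector.
my $intro = $dom->at('#intro')->text;
say $intro;

# Callback traversal: collect the id of every <p> that carries one.
my @ids;
$dom->find('p[id]')->each(sub { push @ids, $_->{id} });
say join ',', @ids;
```

Note that at() returns undef when nothing matches, so real code should check the result before calling text on it.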
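It gets even more convenient in combination with Mojo::UserAgent. Mojo::UserAgent has a rich feature set, but if you don't want any of that you can treat it as a plain wget (an HTTP client). It supports both synchronous and non-blocking GET requests, and it integrates well with Mojo::DOM. For example: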
    use Mojo::UserAgent;
    my $ua = Mojo::UserAgent->new;
    my $tx = $ua->get('http://www.example.com/');  # blocking GET (placeholder URL)
    my $title = $tx->res->dom->at('head title')->text;
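To make the synchronous and non-blocking styles concrete, here is a minimal sketch. So that it runs without network access, it points the user agent at a tiny in-process Mojolicious::Lite app; that app, its route, and its page title are assumptions invented for this example, not something from my original experiment.

```perl
use strict;
use warnings;
use Mojolicious::Lite;
use Mojo::UserAgent;
use Mojo::IOLoop;

# Hypothetical in-process app standing in for a real site.
get '/' => {text => '<html><head><title>Hello Mojo</title></head></html>'};
app->log->level('fatal');    # keep request logging out of the output

my $ua = Mojo::UserAgent->new;
$ua->server->app(app);       # relative URLs are served by the embedded app

# Synchronous: get() blocks and returns the finished transaction.
my $title_sync = $ua->get('/')->res->dom->at('head title')->text;

# Non-blocking: get() returns immediately; the callback fires once the
# event loop has processed the response.
my $title_async;
$ua->get('/' => sub {
    my ($agent, $tx) = @_;
    $title_async = $tx->res->dom->at('head title')->text;
    Mojo::IOLoop->stop;
});
Mojo::IOLoop->start unless Mojo::IOLoop->is_running;

print "$title_sync / $title_async\n";
```

Against a real site you would drop the embedded app and pass an absolute URL to get() instead.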
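Put all of this inside the Mojolicious web framework and things get better still: everything comes from the same author, so it integrates very well. Work that used to be a major undertaking now takes two or three lines of code.
All of that looks great, so here are what I see as the drawbacks.
1. No XPath support.
I know XPath very well, but unfortunately it is not supported. Many things can be done the Mojo way, yet I can still name features I use often that have no equivalent. I also suspect that performance is much worse as a result. Web::Scraper uses XPath and can parse HTML/XML with XML::LibXML, which I consider the fastest of the current DOM approaches (libxml2 > expat). So I think a pure-Perl, non-XPath DOM selector is not fast enough for large-scale data analysis. (This is only a guess.)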
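To illustrate what "the Mojo way" looks like for someone coming from XPath, here is a rough sketch that rewrites three XPath expressions with Mojo::DOM. The markup is invented, and the text-matching case is my assumption of something that needs a Perl callback rather than a plain selector; newer Mojolicious releases may offer more.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical markup, invented for this example.
my $dom = Mojo::DOM->new(<<'HTML');
<div class="list">
  <ul>
    <li><a href="/a">First</a></li>
    <li><a href="/b">Next page</a></li>
    <li><span>No link</span></li>
  </ul>
</div>
HTML

# XPath: //div[@class="list"]//a/@href
# Mojo:  descendant selector, then read the attribute from each match.
my @hrefs = $dom->find('div.list a')->map(attr => 'href')->each;

# XPath: //ul/li[1]
# Mojo:  the :first-child pseudo-class.
my $first = $dom->at('ul > li:first-child')->all_text;

# XPath: //a[contains(text(), "Next")]
# Mojo:  assumed to have no plain selector equivalent here, so filter
#        the collection with a Perl callback instead.
my $next = $dom->find('a')->first(sub { $_->text =~ /Next/ });

print join(',', @hrefs), "\n";
print "$first\n";
print $next->attr('href'), "\n";
```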
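2. Perhaps it is just habit, but for complex pages I still prefer Web::Scraper.
Anyone who has used Web::Scraper knows the workflow: you first write a unified set of XPath rules that fits a class of pages, then apply that whole rule set to every page of that class. When the page information is complex, the rule set may run to dozens or even hundreds of lines. With Mojo::DOM you can only wrap many find->each calls and Perl callbacks around each other, which is hard to debug, and whoever writes the page-analysis rules has to know Perl.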
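As a small illustration of the rule-set style described above, here is a sketch of a Web::Scraper rule block. The HTML, the field names (title, names, links), and the XPath expressions are all hypothetical; a real rule set for a complex page would be much longer.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical page, invented for this example.
my $html = <<'HTML';
<html><body>
  <h1 id="title">Widgets</h1>
  <ul>
    <li class="item"><a href="http://www.example.com/a">Alpha</a></li>
    <li class="item"><a href="http://www.example.com/b">Beta</a></li>
  </ul>
</body></html>
HTML

# One declarative rule per field; a trailing [] collects every match.
my $rules = scraper {
    process '//h1[@id="title"]',     title     => 'TEXT';
    process '//li[@class="item"]/a', 'names[]' => 'TEXT';
    process '//li[@class="item"]/a', 'links[]' => '@href';
};

# The same rule set can be applied to every page of this class.
my $result = $rules->scrape($html);

print "$result->{title}\n";
print join(',', @{ $result->{names} }), "\n";
```

The rules read almost like a configuration file, which is why I find them easier to hand to someone who is comfortable with XPath but not with Perl.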
3. Coro::rouse_cb and Coro::rouse_wait can no longer be used.
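    use Coro;
    use AnyEvent::HTTP;
    use Data::Dumper;
    my $coro = async {
        http_get "http://www.example.com/", Coro::rouse_cb;  # rouse_cb is invoked when the request completes
        my ($data, $header) = Coro::rouse_wait;              # suspend this coroutine until then
        print Dumper $header;
    };
    $coro->join;  # wait for the coroutine to finish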
The version above works. The one below does not.
    my $coro = async {
        my $ua = Mojo::UserAgent->new;
        $ua->get('http://www.example.com/' => Coro::rouse_cb);
        my ($ua2, $tx) = Coro::rouse_wait;
        my $title = $tx->res->dom->at('head title')->text;
        print "$title\n";
    };