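Selenium FF WebDriver: Traversing All Links (an Unconventional Crawler)
Take a look at this announcement listing page. To find the content of one particular announcement I would have to open the links one by one, and there are an awful lot of them.
So I used Selenium to open every link and write each announcement's content to a txt file.
The steps are:
1. Open the announcements one at a time.
2. Switch focus to the new window, find the announcement content, and write it to the txt file.
3. Close that window.
4. Switch back to the main window.
5. Once every link on the current page has been visited, click "next page" (下一页).
6. Repeat from step 1.
The "next page" link makes a handy flag. It can serve as the loop condition, because the last page has no "next page" element.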
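The control flow of these steps can be sketched without a browser. This is only a minimal sketch under stated assumptions: the `Page` class and the `crawl` method are hypothetical stand-ins for the real site and the WebDriver calls, and `next == null` plays the role of the missing "next page" element on the last page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PagingLoopSketch {

    // Hypothetical stand-in for one listing page: its announcement titles
    // and a reference to the following page (null on the last page).
    static final class Page {
        final List<String> announcements;
        final Page next;

        Page(List<String> announcements, Page next) {
            this.announcements = announcements;
            this.next = next;
        }
    }

    // Visits every announcement on every page. The presence of a next page
    // is the loop condition, and the last page is still processed before
    // the loop ends.
    static List<String> crawl(Page first) {
        List<String> visited = new ArrayList<>();
        Page current = first;
        while (current != null) {
            for (String title : current.announcements) {
                // Steps 1-4: open, read, close, switch back (simulated here).
                visited.add(title);
            }
            // Steps 5-6: move on to the next page, or stop if there is none.
            current = current.next;
        }
        return visited;
    }

    public static void main(String[] args) {
        Page last = new Page(Arrays.asList("C"), null);
        Page first = new Page(Arrays.asList("A", "B"), last);
        System.out.println(crawl(first)); // prints [A, B, C]
    }
}
```

Checking for the next page after the inner loop, rather than before it, is what keeps the links on the final page from being skipped.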
Next, work out the relevant XPath expressions:
Number of list entries: count(//tr/td/a[starts-with(@href,'article_show.asp?ID=') and @title!=''])
List entries: //tr/td/a[starts-with(@href,'article_show.asp?ID=') and @title!='']
Next page: //div/a[text()='下一页']
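Expressions like these can be tried out offline before pointing a browser at the site. The sketch below evaluates them with the JDK's built-in `javax.xml.xpath` API against a tiny, made-up XHTML fragment. The `SAMPLE` markup and the `XPathCheck` class are assumptions for illustration only and do not reproduce the real page; the "next page" text 下一页 is written with Unicode escapes so the source compiles regardless of file encoding.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathCheck {

    // The "next page" link text, written as Unicode escapes.
    static final String NEXT_PAGE = "\u4e0b\u4e00\u9875";

    static final String LINKS =
        "//tr/td/a[starts-with(@href,'article_show.asp?ID=') and @title!='']";

    // Made-up sample markup (an assumption, not the real page).
    static final String SAMPLE =
        "<html><body><table>"
        + "<tr><td><a href='article_show.asp?ID=1' title='Notice 1'>Notice 1</a></td></tr>"
        + "<tr><td><a href='article_show.asp?ID=2' title='Notice 2'>Notice 2</a></td></tr>"
        + "<tr><td><a href='article_show.asp?ID=3' title=''>untitled</a></td></tr>"
        + "<tr><td><a href='other.asp?ID=4' title='Other'>Other</a></td></tr>"
        + "</table><div><a href='article.asp?page=2'>" + NEXT_PAGE + "</a></div>"
        + "</body></html>";

    // Counts the announcement links with the count(...) expression.
    static int countLinks(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate("count(" + LINKS + ")",
            new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        return n.intValue();
    }

    // Reports whether a "next page" link is present.
    static boolean hasNextPage(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            "//div/a[text()='" + NEXT_PAGE + "']",
            new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return nodes.getLength() > 0;
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(countLinks(SAMPLE));  // prints 2
        System.out.println(hasNextPage(SAMPLE)); // prints true
    }
}
```

In the sample, the link with an empty title and the link whose href does not start with `article_show.asp?ID=` are both filtered out, which is why the count is 2 rather than 4.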
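When Selenium WebDriver tests a page and clicks a link with target="_blank", the link opens in a new page, so the driver has to switch to the new window.
That takes the following calls:
String currentWindow = driver.getWindowHandle(); // get the current window handle
Set<String> handles = driver.getWindowHandles(); // get all window handles
Iterator<String> it = handles.iterator(); // iterate over them, skipping currentWindow
WebDriver window = driver.switchTo().window(it.next()); // switch to the new window
driver.switchTo().window(currentWindow); // return to the original page
driver = driver.switchTo().window(driver.getWindowHandle()); // make the next page the current driver
Put together, closing every window except the main one looks like this:
currentWindow = driver.getWindowHandle();
// get all windows
Set<String> handles = driver.getWindowHandles();
for (String s : handles) {
    // do not close the current page
    if (s.equals(currentWindow)) {
        continue;
    }
    window = driver.switchTo().window(s);
    window.close();
}
driver.switchTo().window(currentWindow);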
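The handle-picking part of that loop is plain set logic, so it can be pulled out and checked without a browser. A minimal sketch follows, assuming a hypothetical helper named `findOtherWindow`; the handle strings in `main` are made up, since real handles are opaque values returned by the driver.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

class WindowHandles {

    // Returns the first handle that is not the current window, or an empty
    // Optional when no other window is open.
    static Optional<String> findOtherWindow(Set<String> handles, String current) {
        for (String handle : handles) {
            if (!handle.equals(current)) {
                return Optional.of(handle);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up handle values for illustration.
        Set<String> handles = new LinkedHashSet<>(Arrays.asList("main", "popup"));
        System.out.println(findOtherWindow(handles, "main").orElse("none")); // prints popup
    }
}
```

With a real driver, the arguments would be `driver.getWindowHandles()` and `driver.getWindowHandle()`, and the result would be passed to `driver.switchTo().window(...)`.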
The full code
package com.packt.webdriver.chapter3;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class TraversalAllLinks {

    private static final String LINKS_XPATH =
        "//tr/td/a[starts-with(@href,'article_show.asp?ID=') and @title!='']";
    private static final String NEXT_PAGE_XPATH = "//tr/td/a[@title='下一页']";

    private static String currentWindow;

    public static void main(String[] args) {
        WebDriver driver = DriverFactory.getFirefoxDriver();
        driver.get("http://www.lhgtj.gov.cn/article.asp?ClassID=86&page=1");
        driver.manage().window().maximize();
        // timeouts set on the driver apply to every window it opens
        driver.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS);
        driver.manage().timeouts().pageLoadTimeout(60, TimeUnit.SECONDS);

        while (true) {
            currentWindow = driver.getWindowHandle();
            List<WebElement> links = driver.findElements(By.xpath(LINKS_XPATH));
            for (WebElement link : links) {
                System.out.println(link.getText());
                try {
                    writeToTXT(link.getText());
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
                // the announcement opens in a new window
                link.click();
                // get all windows
                Set<String> handles = driver.getWindowHandles();
                for (String s : handles) {
                    // do not close the current page
                    if (s.equals(currentWindow)) {
                        continue;
                    }
                    WebDriver window = driver.switchTo().window(s);
                    window.manage().window().maximize();
                    // get all paragraphs of the announcement
                    List<WebElement> tbs = window.findElements(By.xpath("//tbody/tr/td/p"));
                    for (WebElement tb : tbs) {
                        System.out.println(tb.getText());
                        try {
                            writeToTXT(tb.getText() + "\n");
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                    // close the announcement window
                    window.close();
                }
                // switch back to the main window
                driver.switchTo().window(currentWindow);
            }

            // findElements returns an empty list instead of throwing when the
            // last page has no "next page" link (after the implicit wait expires)
            List<WebElement> nextPage = driver.findElements(By.xpath(NEXT_PAGE_XPATH));
            if (nextPage.isEmpty() || !nextPage.get(0).isDisplayed()) {
                break;
            }
            // click next page
            nextPage.get(0).click();
            // set next page to current page
            driver = driver.switchTo().window(driver.getWindowHandle());
        }
        driver.quit();
    }

    // write logs
    public static void writeToTXT(String message) throws IOException {
        BufferedWriter bf = null;
        try {
            // true = append, so earlier output is not overwritten
            bf = new BufferedWriter(new FileWriter("report.txt", true));
            bf.write(message);
            bf.flush();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // bf is still null if opening the file failed
            if (bf != null) {
                bf.close();
            }
        }
    }
}
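As an alternative to closing the writer by hand, the append step can be written with try-with-resources, which closes the writer even when the write fails. The following is a minimal sketch, not the original author's code: the `ReportWriter` class and its `append` method are hypothetical names, and the demo writes to a temporary file rather than report.txt.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReportWriter {

    // Appends the message to the file, creating the file if it is missing.
    // The writer is closed automatically when the try block exits.
    static void append(Path file, String message) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(
                file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(message);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("report", ".txt");
        append(file, "first line\n");
        append(file, "second line\n");
        // Both lines are present because each call appends.
        System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Fixing the charset to UTF-8 also keeps Chinese announcement text readable regardless of the platform default encoding.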
DriverFactory
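import java.io.File;
import java.io.IOException;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;
import org.openqa.selenium.os.WindowsUtils;

public class DriverFactory {

    public static WebDriver getFirefoxDriver() {
        // kill any Firefox process left over from an earlier run
        try {
            WindowsUtils.tryToKillByName("firefox.exe");
        } catch (Exception e) {
            System.out.println("can not find firefox process");
        }

        // load the Firebug extension into a fresh profile
        File file = new File("d:\\firebug-2.0.4-fx.xpi");
        FirefoxProfile profile = new FirefoxProfile();
        try {
            profile.addExtension(file);
            profile.setPreference("extensions.firebug.currentVersion", "2.0.4");
            profile.setPreference("extensions.firebug.allPagesActivation", "on");
        } catch (IOException e3) {
            e3.printStackTrace();
        }

        WebDriver driver = new FirefoxDriver(profile);
        return driver;
    }
}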