java爬取百度首页logo

首页 > 代码库 > java爬取百度首页logo

2024-07-24 02:10:11 223人阅读

两个方法
- 一个获得Url的网页源代码getUrlContentString，另外一个从源代码中得到想要的地址片段，其中需要用到正则表达式去匹配
得到网页源代码的过程：
- 地址为string，将地址转换为java中的url对象
- url的openConnection方法返回urlConnection
- urlConnection的connect方法建立连接
- 新建一个InputStreamReader对象，其中InputStreamReader的构建需要InputStream输入流对象，而URLConnection的getInputStream方法则返回输入流对象，所以可以连接起来
- 然后利用建立好的InputStreamReader对象建立BuffereReader对象
- 从bufferedreader对象中按行读入网页源码，追加到result字符串中，result字符串即为网页源代码字符串
logo地址匹配
- Pattern pattern = Pattern.compile(patternString);
  - java.util.regex：java类库包，用正则表达式所定义的模式对字符串进行匹配

它包括两个类：Pattern和Matcher 。

Pattern：创建匹配模式字符串。

Matcher：将匹配模式字符串与输入字符串。

pattern的compile方法：将指定的字符编译到模式中

Matcher matcher = pattern.matcher(contentString);

package com.test;
 
import java.io.*;
import java.net.*;
import java.util.regex.*;
 
public class baidulogo {
    static String  getUrlContentString(String urlString) throws Exception {
        String result = "";
        URL url = new URL(urlString);
        URLConnection urlConnection = url.openConnection();
        urlConnection.connect();
        InputStreamReader inputStreamReader = new InputStreamReader(
                urlConnection.getInputStream(), "utf-8");
        BufferedReader in = new BufferedReader(inputStreamReader);
        String line;
        while ((line = in.readLine()) != null)  {
            result += line;
        }
        return result;
    }
 
    static String  getLogoUrl(String contentString, String patternString) {
        String LogoUrl = null;
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(contentString);
        if (matcher.find()) {
            LogoUrl = matcher.group(1);
        }
        return LogoUrl;
 
    }
 
    public staticvoid main(String[] args) throws Exception {
        // 定义即将访问的链接
        String urlString = "http://www.baidu.com";
        String result = getUrlContentString(urlString);
        String patternString = "src=http://www.mamicode.com/"(.+?)\"";
        String contentString = result;
        String logoUrl = getLogoUrl(contentString, patternString);
        System.out.println(logoUrl);
    }
}

java爬取百度首页logo

声明：以上内容来自用户投稿及互联网公开渠道收集整理发布，本网站不拥有所有权，未作人工编辑处理，也不承担相关法律责任，若内容有误或涉及侵权可进行投诉：投诉/举报工作人员会在5个工作日内联系你，一经查实，本站将立刻删除涉嫌侵权内容。

联系
我们

首页 > 代码库 > java爬取百度首页logo

java爬取百度首页logo

看完仍有疑问？有类似问题直接问程序猿