Fetching HTML over the network and extracting the URLs it contains
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TestIndex {

    private String rootUrl = "http://localhost/apk/";
    private String listUrl = rootUrl + "test-index.htm";
    private static List<String> imageUrlList = new ArrayList<String>();

    public static void main(String[] args) {
        TestIndex ti = new TestIndex();
        ti.getData();
        System.out.println(imageUrlList.size());
        for (int i = 0; i < imageUrlList.size(); i++) {
            System.out.println(imageUrlList.get(i));
        }
    }

    // Opens a connection to urlStr and returns its response stream.
    private InputStream getNetInputStream(String urlStr) throws IOException {
        URL url = new URL(urlStr);
        URLConnection conn = url.openConnection();
        conn.connect();
        return conn.getInputStream();
    }

    // Downloads the page at listUrl and collects every image URL found in it.
    private void getData() {
        StringBuilder html = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream.
        // UTF-8 is an assumption; the original relied on the platform default charset.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                getNetInputStream(listUrl), StandardCharsets.UTF_8))) {
            String s;
            while ((s = br.readLine()) != null) {
                html.append(s);
            }
        } catch (IOException e) {
            // Report the failure instead of silently swallowing it.
            System.err.println("Failed to fetch " + listUrl + ": " + e);
            return;
        }

        // Each image is expected to look like: src="https://..." width=...
        String startStr = "src=\"https://";
        String endStr = " width=";
        int index = 0;
        imageUrlList.clear();
        while (true) {
            int start = html.indexOf(startStr, index);
            if (start < 0)
                break;
            int end = html.indexOf(endStr, start);
            if (end < 0)
                break; // no terminator after this match; stop scanning
            // Skip the 5 characters of src=" and drop the closing quote.
            imageUrlList.add(html.substring(start + 5, end - 1));
            index = end;
        }
    }
}
The program extracts the URLs contained in the .htm file.
Result:
20
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcRvQgUjsVDBncM3mVIgIyIuE87BnlyJUy2BNsAp8kUoTanrC_css5mVAw
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcThd8cYjOTmCgYJZxX5ls-xpxaAlH1_yocOSCqI5_7OkL29SNtbCZ7q2Yoj
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTl-FzKmsppxuwzmTITGCv9uDxmrWr1pG0lw8mUD9wkWIloASxQeBEMnVjz
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcQWbmiZJIXKHV2IoTBp7zSY6kD5g5VPzVtBTLJYYR5nwTtKi2-0_u93qL4e
https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcSlrLi_GtVgUehU7coFe1eMdrJxPdvS42iTqXkla0g75s31NBfAq2u1LE4
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcSkrlyGxSs8Dr_7k3MUvoGq1vE45LgHZ0zEhIEdD9LLZiaoMcE7IAqn8ho
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTu__OUSJ4R4EKBu4jOi2ZAdHohpVQIBy3-SfnI8FYpN8wVC9kJG_aWuk_w
https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcR3Bf7YtsHJ813A5_wWzpxIy4MbEmqz5NLw3qv1nPxOZqVjH7QlY-qYSCg
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcToB4nJPqVwnzn0xeasnXyhxGgOqHXdypE6KZIMTfV9k52eIrE3iYsA6Ixm
https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcTkKw0LpqdB2eQMUpwdQdvM9DTeNtq1mrvMNivoQtN37p3m0OPsx4ME9i4O
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSZGzMf_3hmdDktz91yp5ZQi-eGWLCenZ0U446sXT2nqYuwlWRI_V_BVIWi
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTQF-55T5GM3dLdaoafPdlIYK0ESNvM6-Bsb4-B2rQTeyD5gGoCKxokExM-
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcRoRjo4TFeXmx47zE6VH0ylcO0IQ2HBsOHYIMJCI9MsRyg_PF1WhHbqG76Q
https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcRrdegt1koEy51dLWrJAbVMJBlCEZ7fPl2mztYYM6onvxocRCq030Ft1gE
https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcTtnQpte0uq9Ue9nsg25GeO1kw_-Hcn69ozTQkiMBHrXKwlANutyhwKD9XM
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRNRdxzmuFKABoGgyv2SC0gMticosL2LB3V1fBMOwNtVBZxHkyMw4IcWBFj
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQr40CEf75nWCj5dg-oeKtb9zK6mhktu7vnfoYAh5ioy34goC3c9ptDkQwP
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQUnyHrVEbppqhZnWnQrijhBFP0X34gRf7pKw6PdT4ggepB2k9g-p71sgGh
https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcR9Us9qblbTJaw47gULXCI8sHKN4I61gYsT2ijebtZzgsMDI8GmYqQpIIw
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSIrW-IbBZjM9Ztn60r9QE1_FIMjt494qGX12tqsLsibYPLuFVwyVSgz1I
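The scan in TestIndex only works when every image tag has a double-quoted src attribute followed directly by " width=". As a possible alternative, here is a minimal sketch that pulls the src values out with a regular expression, so the attribute order and quoting style matter less. The class name SrcExtractor, the method extractSrcUrls, and the sample HTML are hypothetical names made up for illustration; they are not part of the original program. A regular expression is still not a real HTML parser, and this pattern would also match attributes whose name merely ends in "src", so treat it as an approximation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SrcExtractor {

    // Matches src=VALUE where VALUE is double-quoted, single-quoted, or unquoted.
    private static final Pattern SRC = Pattern.compile(
            "src\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
            Pattern.CASE_INSENSITIVE);

    // Returns every src value in html that starts with https://, in document order.
    public static List<String> extractSrcUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = SRC.matcher(html);
        while (m.find()) {
            // Exactly one of the three capture groups is non-null per match.
            String value = m.group(1) != null ? m.group(1)
                    : m.group(2) != null ? m.group(2) : m.group(3);
            if (value.startsWith("https://")) {
                urls.add(value);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample input; the real page would come from the network.
        String html = "<img src=\"https://example.com/a.png\" width=100>"
                + "<img height=50 src='https://example.com/b.png'>";
        for (String url : extractSrcUrls(html)) {
            System.out.println(url);
        }
    }
}
```

To use it with the program above, the downloaded page text would be passed to extractSrcUrls in place of the indexOf loop.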