Automatically Downloading Esri Conference Materials with a Script

2024-10-01 07:22:02

Materials from the conferences that Esri hosts or attends can be downloaded from the Esri website. The official address is http://proceedings.esri.com/library/userconf/index.html.

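For a given conference, the site usually provides a table listing the conference materials. Some of them, such as PPT or PDF documents, are available for download.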
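To make the table structure concrete, here is a minimal sketch that pulls (title, links) pairs out of such a table using only the standard library. It assumes the title sits in the first column and the download links in the second, which is the layout the script in this post relies on. The class name `MaterialTableParser`, the function `extract_materials`, and the sample HTML are hypothetical and only serve as illustration; real pages may be laid out differently.

```python
from html.parser import HTMLParser


class MaterialTableParser(HTMLParser):
    """Collect (title, [hrefs]) pairs from rows of a two-column table."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows: (title, [hrefs])
        self._cells = None     # cells of the current row, None outside <tr>
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._cells.append({"text": [], "hrefs": []})
            self._in_cell = True
        elif tag == "a" and self._in_cell:
            href = dict(attrs).get("href")
            if href:
                self._cells[-1]["hrefs"].append(href)

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1]["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            # Keep only rows that have a title cell and at least one link.
            if len(self._cells) >= 2 and self._cells[1]["hrefs"]:
                title = "".join(self._cells[0]["text"]).strip()
                self.rows.append((title, self._cells[1]["hrefs"]))
            self._cells = None
            self._in_cell = False


def extract_materials(html):
    """Return a list of (title, [hrefs]) for every row with downloads."""
    parser = MaterialTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample data, not taken from the real site.
sample = """
<table>
  <tr><th>Title</th><th>Files</th></tr>
  <tr><td>Sample Session: Intro</td><td><a href="papers/dev_int_01.pdf">PDF</a></td></tr>
  <tr><td>No Download Yet</td><td></td></tr>
</table>
"""
print(extract_materials(sample))
```

Rows without a link in the second column are skipped, which mirrors the fact that only part of the listed material can be downloaded.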

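The script in this post helps download conference presentations from the Esri Proceedings site. Each downloaded document is automatically renamed to its title, which makes it easier to search and use later.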

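Before running it, specify the local folder where the downloaded documents will be stored (rootpath in the script), and use a browser to save the page that contains the materials table to local disk.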

# -*- coding: utf-8 -*-
import re
import urllib.request
from os import makedirs, path, rename

from lxml.html import soupparser  # needs BeautifulSoup installed

rootpath = 'D:/EsriPPT/'
baseurl = 'http://proceedings.esri.com/library/userconf/devsummit17/papers/'

# Make sure the target folder exists before anything is written to it.
makedirs(rootpath, exist_ok=True)

# Read the locally saved page that contains the materials table.
with open('D:/Recent Proceedings.html', 'r', encoding='windows-1252', errors='ignore') as f:
    t = f.read()

tree = soupparser.fromstring(t)
# Match rows whether or not the saved page wraps them in <tbody>.
rows = tree.xpath('//table//tr')
for r in rows:
    cols = r.xpath('td')
    if len(cols) < 2:
        continue  # header or layout row without title/link cells
    # itertext() also picks up titles nested inside child elements.
    title = ''.join(cols[0].itertext()).strip()
    if not title:
        continue
    for link in cols[1].iterchildren(tag='a'):
        # href may be missing, so fall back to an empty string.
        result = re.search(r'dev_int_\d+\.pdf', link.get('href') or '')
        if result is None:
            continue
        oldpath = rootpath + result.group(0)
        # Replace ':' with '_' and drop other characters Windows rejects in file names.
        newname = re.sub(r'[\\/*?"<>|]', '', title.replace(':', '_')) + '.pdf'
        newpath = rootpath + newname
        if path.exists(newpath):
            continue  # already downloaded and renamed
        # Download only if the original file is not on disk yet.
        if not path.exists(oldpath):
            urllib.request.urlretrieve(baseurl + result.group(0), oldpath)
        rename(oldpath, newpath)
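
The script derives each target file name from the title cell, so titles containing characters that the file system rejects need cleaning first. Below is a slightly more general sketch of that step. The helper name `safe_filename`, its parameters, and the sample title are hypothetical. Unlike the script, which deletes some characters, this version replaces every character that Windows disallows with an underscore and also handles empty titles and trailing dots.

```python
import re

# Characters that Windows does not allow in file names.
_INVALID = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title, ext=".pdf", fallback="untitled"):
    """Turn a document title into a file name that is valid on Windows."""
    name = _INVALID.sub("_", title)
    # Collapse runs of whitespace (including newlines from the HTML) to one space.
    name = re.sub(r"\s+", " ", name).strip()
    # Windows rejects names that end in a dot or a space.
    name = name.rstrip(". ")
    # Fall back to a placeholder when nothing usable is left.
    return (name or fallback) + ext


# Hypothetical title, used only to show the call.
print(safe_filename("What's New: ArcGIS API for JavaScript?"))
```

Replacing rather than deleting keeps word boundaries visible in the file name, at the cost of a few extra underscores. Two different titles can still map to the same name, so a caller that needs uniqueness would have to check for an existing file first.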
