Software Engineering Blog Archiving Tool (Personal Use)

2021-06-21 20:04:23  Views: 149  Source: Internet

Tags: text, software engineering, date, html, personal use, archiving, print, div, class

# -*- coding: utf-8 -*-
#@Time :2021/6/21 16:51
#@Author :Xxg
#@Site :
#@File :作业归档完善版.py
#@Software :PyCharm
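import requests
from lxml import etree
import docx  # provided by the python-docx package

# Both values are left blank in the original post: fill in a real
# User-Agent string and the URL of the blog list page before running.
headers={
    "User-Agent": ""
}
url = ''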

response = requests.get(url, headers=headers)
html = etree.HTML(response.text)
# print(html)
# Dates, titles and summaries from the list page
date = html.xpath('//div[@class="dayTitle"]/a/text()')
name = html.xpath('//div[@class="postTitle"]/a/span/text()')
zhaiyao = html.xpath('//div[@class="postCon"]/div[@class="c_b_p_desc"]/text()')
# "Read more" links to the full posts
yueduquanwen = html.xpath('//div[@class="postCon"]/div[@class="c_b_p_desc"]/a/@href')
for url1 in yueduquanwen:
    # url1 = "https://www.cnblogs.com/sakura-xxg/category/1990334.html"
    response1 = requests.get(url1, headers=headers)
    html_son = etree.HTML(response1.text)
    title = html_son.xpath('//div[@class="post"]/h1[@class="postTitle"]/a/span/text()')
    print(title)
    content = html_son.xpath('//div[@class="blogpost-body blogpost-body-html"]/p/text()')
    print(content)
    date = html_son.xpath('//div[@class="postDesc"]/span[@id="post-date"]/text()')
    print(date)
    # title[0] becomes the file name, so skip pages where no title was found
    if not title:
        continue
    # Create the docx object
    file = docx.Document()
    # xpath() returns a list; add_paragraph() takes a string, so join it first
    file.add_paragraph("".join(date))
    for paragraph in content:
        file.add_paragraph(paragraph)
    # Strip surrounding whitespace and newlines from the title before saving
    file.save("D:\\"+title[0].strip()+".docx")
    # for j in range(len(content)):
    #   file.add_paragraphy(content[j])
    # date_son = html.xpath('//div[@class="dayTitle"]/a/text()')
    # name_son = html.xpath('//div[@class="postTitle"]/a/span/text()')
    # zhaiyao_son = html.xpath('//div[@class="postCon"]/div[@class="c_b_p_desc"]/text()')
    # print(date_son)
    # print(zhaiyao_son)
print(yueduquanwen)
# print(date[0])
# print(name[0].replace(" ","").replace("\n",""))
# print(zhaiyao[0].replace("\n",""))
# print(zhaiyao[0])
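
The commented-out prints above remove spaces and newlines from the scraped strings by hand. A minimal sketch of the same cleanup as a reusable helper follows; the name `clean_text_nodes` and the sample strings are made up for illustration and are not part of the original script.

```python
def clean_text_nodes(nodes):
    """Strip whitespace from each text node and drop the ones left empty."""
    cleaned = []
    for node in nodes:
        # Newlines inside a node become spaces, then the ends are trimmed.
        text = str(node).replace("\n", " ").strip()
        if text:
            cleaned.append(text)
    return cleaned


# Hypothetical sample shaped like the result of an xpath('.../text()') call.
sample = ["\n    First paragraph. \n", "\n", "  Second\nparagraph.  "]
print(clean_text_nodes(sample))
```

Applied to `content` before the paragraphs are written, this would keep whitespace-only text nodes from becoming empty paragraphs in the Word file.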

# Save as Word
# for n in range(len(date)):
#     file = docx.Document()
#     file.add_paragraph(date[n])
#     file.add_paragraph(zhaiyao[2*n].replace("\n",""))
#     # file.save("F:\\word\\"+name[n].replace(" ","").replace("\n","")+".docx")
#     print(date[n])
#     print(zhaiyao[2*n])
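
The script builds each file name directly from the scraped post title. A title that contains characters such as `:` or `?` cannot be used as a file name on Windows, so the following is a small sketch of one way to sanitize it first. The helper `safe_filename`, its `fallback` and `max_len` parameters, and the sample title are assumptions for illustration, not part of the original script.

```python
import re

# Characters Windows does not allow in file names, plus control characters.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, fallback="untitled", max_len=100):
    """Turn a scraped post title into a string usable as a file name."""
    # Collapse runs of whitespace (including newlines) into single spaces.
    name = " ".join(title.split())
    # Replace each illegal character with an underscore.
    name = _ILLEGAL.sub("_", name)
    # Windows also rejects names that end in a dot or a space.
    name = name[:max_len].rstrip(". ")
    return name or fallback


# Hypothetical title with surrounding whitespace and illegal characters.
print(safe_filename("\n  Week 3: A/B test?  \n"))
```

In the loop, the save call could then read `file.save("D:\\" + safe_filename(title[0]) + ".docx")`.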

Source: https://www.cnblogs.com/sakura-xxg/p/14915406.html
