Tags: get, python, text, 30, some, soup, beginner, data, find
import csv
import time

import requests
from bs4 import BeautifulSoup

# The 30 list pages of 30daydo.com, sorted by newest
urls = ["http://30daydo.com/sort_type-new__day-0__is_recommend-0__page-{}".format(i)
        for i in range(1, 31)]

f = open('filename.csv', 'w', encoding='utf-8', newline="")
csv_writer = csv.writer(f)
csv_writer.writerow(["title", "category", "author", "comments", "views", "time", "url"])
for page_no, url in enumerate(urls, start=1):
    response = requests.get(url)
    response.encoding = "utf-8"
    print("Fetching page", page_no, ",", response.status_code, ",", url)
    soup = BeautifulSoup(response.text, "html.parser")
    posts = soup.find_all(class_="aw-question-content")
    # print(len(posts))
    for post in posts:
        meta_spans = post.find_all("span", attrs={"class": "text-color-999"})
        some_data = meta_spans[0].get_text()
        # Skip a leading "贡献" (contribution) span if present
        if some_data == "贡献":
            some_data = meta_spans[1].get_text()
        # The metadata string is " • "-separated; entries containing
        # "关注" (followers) carry one extra leading segment.
        parts = some_data.split(" • ")
        if "关注" in some_data:
            pls, llcs, times = parts[2], parts[3], parts[4]  # comments, views, time
        else:
            pls, llcs, times = parts[1], parts[2], parts[3]
        title_link = post.find("h4").find("a")
        data = {
            "name": title_link.get_text(),
            "fl": post.find("a", attrs={"class": "aw-question-tags"}).get_text(),
            "author": post.find("a", attrs={"class": "aw-user-name"}).get_text(),
            "pls": pls,
            "llcs": llcs,
            "time": times,
            "url": title_link.get("href"),
        }
        # print(data)
        csv_writer.writerow(
            [data["name"], data["fl"], data["author"], data["pls"], data["llcs"], data["time"], data["url"]])
    time.sleep(2)  # pause between pages to be polite to the server
f.close()
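The " • "-split index arithmetic above is the fragile part of the script, so it can help to pull it into a small helper that is easy to test on its own. This is a sketch; the sample strings below are assumptions modeled on the site's metadata layout, not captured page output:

```python
def parse_meta(meta: str):
    """Split a " • "-separated metadata string into (comments, views, time).

    Entries containing a "关注" (followers) field carry one extra
    leading segment, which shifts every index by one.
    """
    parts = meta.split(" • ")
    offset = 1 if "关注" in meta else 0
    return parts[1 + offset], parts[2 + offset], parts[3 + offset]

# Hypothetical metadata strings modeled on the page layout
print(parse_meta("某作者 • 3 个评论 • 120 次浏览 • 2020-01-20"))
print(parse_meta("2 人关注 • 某作者 • 5 个评论 • 300 次浏览 • 2020-01-21"))
```

With this helper, both branches of the `if "关注" in some_data` block collapse into a single call, and a malformed entry fails in one obvious place instead of deep inside the loop.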
Screenshot of the exported CSV opened in Excel: [image not included]
Author: Ferencz (6 original articles, 2 likes, 778 views)
Source: https://blog.csdn.net/Ferencz/article/details/104071989