
NetEase Cloud Music comment crawler

2021-10-27 23:35:40  Views: 191  Source: Internet

Tags: NetEase, url, crawler, comments, json, likedCount, print, dict


import requests
import json
import my_fake_useragent
import threading

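def getHTMLText(url):
    """Fetch url and return the response body as text, or None on failure."""
    try:
        headers = {"user-agent": my_fake_useragent.UserAgent().random()}
        # A timeout keeps a stalled connection from blocking the crawl forever
        r = requests.get(url, headers=headers, timeout=10)
        r.raise_for_status()
        # Do not set r.encoding = r.apparent_encoding here: auto-detecting the
        # encoding garbles the text, so the default encoding is kept.
        return r.text
    except requests.RequestException:
        return None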

def fillList(music_id, url, commentlist):
    # Build the comment API URL for this song (commentlist is currently unused)
    new_url = url + "{0}".format(music_id)
    html = getHTMLText(new_url)

    try:
        json_dict = json.loads(html)  # parse the JSON response into a dict
        top = json_dict['hotComments'][0]
        likedCount = top['likedCount']  # like count of the first hot comment
        nickname = top['user']['nickname']
        content = top['content']
    except (TypeError, ValueError, KeyError, IndexError):
        # The request failed, the body was not JSON, or there are no hot comments
        return

    # Only record comments with more than 300,000 likes
    if likedCount > 300000:
        m = ("likes===>" + str(likedCount) + " song ID===>" + str(music_id)
             + " user===>" + nickname + " comment===>" + content)
        print(m)
        save(m, "./网抑云.txt")

def save(m, path):
    # Append one line; the with block closes the file automatically
    with open(path, 'a', encoding='utf-8') as f:
        f.write(m + "\n")

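START_ID = 254574
END_ID = 2000000000  # the newest song IDs are around 1882041535 (about 1.8 billion)
THREAD_COUNT = 6

def main(offset):
    url = "http://music.163.com/api/v1/resource/comments/R_SO_4_"
    # Each thread takes every THREAD_COUNT-th ID starting at its own offset,
    # so no two threads request the same song and no lock is needed here.
    for i in range(START_ID + offset, END_ID + 1, THREAD_COUNT):
        commentlist = []
        fillList(str(i), url, commentlist)

def thread():
    threads = []
    for offset in range(THREAD_COUNT):
        # Pass the function itself; target=main() would call main right away
        # in the calling thread and no worker thread would do any crawling.
        t = threading.Thread(target=main, args=(offset,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

thread()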
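
As an alternative to starting threading.Thread objects by hand, the same kind of work split can be written with the standard library's thread pool. This is only a minimal sketch, not the original author's method: split_ids, scan and run are hypothetical helper names, and fetch is a stand-in for fillList so the example runs without any network access.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def split_ids(start, end, workers):
    """Return one range per worker; worker k gets start+k, start+k+workers, ..."""
    return [range(start + k, end + 1, workers) for k in range(workers)]

def scan(ids, fetch, results, lock):
    """Call fetch on every ID and collect the non-None results under a lock."""
    for music_id in ids:
        item = fetch(music_id)
        if item is not None:
            with lock:
                results.append(item)

def run(start, end, workers, fetch):
    """Scan the inclusive ID range [start, end] with a pool of worker threads."""
    results = []
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan, ids, fetch, results, lock)
                   for ids in split_ids(start, end, workers)]
        for f in futures:
            f.result()  # re-raise any exception that happened in a worker
    return sorted(results)

if __name__ == "__main__":
    # Hypothetical stand-in for fillList: "keep" only IDs divisible by 5
    print(run(1, 20, 6, lambda i: i if i % 5 == 0 else None))
```

Interleaving the IDs (rather than giving each worker one contiguous block) keeps all workers in roughly the same region of the ID space, which may matter if valid song IDs are unevenly distributed.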

Reference: https://blog.csdn.net/weixin_43881394/article/details/109240813
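
The parsing step in fillList can be tried offline against a hand-written payload. The sketch below is an illustration under stated assumptions: the helper name top_hot_comment and the sample values are invented for this example, and only the field names (hotComments, likedCount, user, nickname, content) come from the code above. The real API response may contain more fields or differ from this shape.

```python
import json

def top_hot_comment(text, min_likes=300000):
    """Return (likedCount, nickname, content) for the first hot comment,
    or None if the payload is unusable or has min_likes likes or fewer."""
    try:
        top = json.loads(text)["hotComments"][0]
        liked = top["likedCount"]
        nickname = top["user"]["nickname"]
        content = top["content"]
    except (TypeError, ValueError, KeyError, IndexError):
        # None input, invalid JSON, missing field, or empty hot comment list
        return None
    if liked <= min_likes:
        return None
    return liked, nickname, content

# Hand-written sample, not real API output
sample = json.dumps({
    "hotComments": [
        {"likedCount": 350000, "user": {"nickname": "alice"}, "content": "hello"}
    ]
})

print(top_hot_comment(sample))
print(top_hot_comment('{"hotComments": []}'))
print(top_hot_comment("not json"))
```

Returning None for every failure mode lets the caller skip a song with a single check instead of wrapping each call in its own try block.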

Source: https://www.cnblogs.com/yzgblogs/p/15473514.html
