Weibo Keyword Search Crawler

2022-04-24 16:04:19  Views: 242  Source: Internet

Tags: weibo, cookies, crawler, keyword, user, div


1. Log in and obtain the cookies
2. Convert the cookie string into a cookies dict

# -*- coding: utf-8 -*-
# Convert cookies_str into cookies_dic
# @Date    : 2022/4/22 9:38
# @Author  : layman
cookies_str = "SINAGLOBAL=462092313429110.737.1648189947190; login_sid_t=799d349cdfsd25759903d131ca6fd0ad0; cross_origin_proto=SSL; _s_tentry=weibo.com; Apache=8348613412866.332.1650589816565; ULV=1650589816569:2:1:1:8348613412866.332.1650589816565:1648189947200; SUB=_2A25PZnDJDeRhGeFN6VUW-S_Kyj6IHXVsEuUBrDV8PUNbmtAKLUL6kW9NQFh55mlCd6g7TuU659NR2F5DNWShYC_i; SUBP=0033WrSXqPxfM725Ws9jqgMF55529P9D9WF4kv-4n5KEAdq3XeiQfdqc5JpX5KzhUgL.FoM0eoMN1K2ceKz2dJLoI7LbIgUjqPL_qgRt; ALF=1682125848; SSOLoginState=1650589849; wvr=6; webim_unReadCount=%7B%22time%22%3A1650589853165%2C%22dm_pub_total%22%3A9%2C%22chat_group_client%22%3A0%2C%22chat_group_notice%22%3A0%2C%22allcountNum%22%3A32%2C%22msgbox%22%3A0%7D; PC_TOKEN=0d19237494; WBStorage=4d96c54e|undefined"

cookies_dic = {}
for cookie in cookies_str.split('; '):
    # Split on the first '=' only, so values that contain '=' stay intact
    name, _, value = cookie.partition('=')
    cookies_dic[name] = value

print(cookies_dic)
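
The conversion above can be packaged as a small reusable helper. The sketch below is one way to do it, not the original author's code: `parse_cookie_string` is a hypothetical name, and it assumes the string was copied from the `Cookie` request header in the browser's developer tools. It splits each pair on the first `=` only, so a value that itself contains `=` is kept whole.

```python
def parse_cookie_string(raw):
    """Turn a 'k1=v1; k2=v2' Cookie header string into a dict."""
    cookies = {}
    for pair in raw.split(';'):
        pair = pair.strip()
        if not pair or '=' not in pair:
            continue  # skip empty fragments and bare flags
        name, value = pair.split('=', 1)  # split on the first '=' only
        cookies[name.strip()] = value.strip()
    return cookies


# Made-up sample values, for illustration only
sample = 'wvr=6; token=abc=def; WBStorage=4d96c54e|undefined'
print(parse_cookie_string(sample))
```

The resulting dict can be passed to `requests.get(..., cookies=...)`, which is how the crawler in step 3 uses its hard-coded dict.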

3. Crawl and collect the data
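
The script in this step inserts rows into a MySQL table named `weibo` with the columns `user_name`, `official`, and `user_fans`, but the source does not show how that table was created. The following is a sketch of one possible schema. The column types and lengths, the `id` column, and the helper name `ensure_weibo_table` are all assumptions rather than the author's definitions.

```python
# Hypothetical schema: only the table and column names come from the INSERT
# statement in the crawler; the types and the id column are assumptions.
CREATE_WEIBO_TABLE = """
CREATE TABLE IF NOT EXISTS weibo (
    id INT AUTO_INCREMENT PRIMARY KEY,
    user_name VARCHAR(255) NOT NULL,
    official VARCHAR(255),
    user_fans VARCHAR(64)
) DEFAULT CHARSET = utf8mb4
"""


def ensure_weibo_table(cursor):
    """Create the weibo table if it is missing, using any DB-API cursor."""
    cursor.execute(CREATE_WEIBO_TABLE)
```

Calling `ensure_weibo_table(cursor)` once, right after the cursor is created, would let the script run against an empty database.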

# -*- coding: utf-8 -*-
# Weibo user search
# @Date    : 2022/4/22 9:12
# @Author  : layman
import time

import pymysql
import requests
from lxml import etree

headers = {
    'referer': 'https://s.weibo.com/user?q=%E5%AE%9C%E6%98%8C&Refer=weibo_user',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36',
}
cookies = {'SINAGLOBAL': '462092384310.737.1648189947190', 'login_sid_t': '799d349cf324w903d131ca6fd0ad0',
           'cross_origin_proto': 'SSL', 'PC_TOKEN': 'c797273222', '_s_tentry': 'weibo.com',
           'Apache': '8348613412866.332.1650589816565',
           'ULV': '1650589816569:2:1:1:8348613412866.332.1650589816565:1648189947200',
           'SUB': '_2A25PZnDJDeRhewrN6VUW-S_Kyj6IHXVsEuUBrDV8PUNbmtAKLUL6kW9NQFh55mlCd6g7TuU659NR2F5DNWShYC_i',
           'SUBP': '0033WrSXqPxfM725Ws9jqgMF55529P9D9WF4kv-4n5KEAdq3XeiQfdqc5JpX5KzhUgL.FoM0eoMN1K2ceKz2dJLoI7LbIgUjqPL_qgRt',
           'ALF': '1682125848', 'SSOLoginState': '1650589849', 'wvr': '6',
           'webim_unReadCount': '%7B%22time%22%3A1650589853165%2C%22dm_pub_total%22%3A9%2C%22chat_group_client%22%3A0%2C%22chat_group_notice%22%3A0%2C%22allcountNum%22%3A32%2C%22msgbox%22%3A0%7D',
           'WBStorage': '4d96c54e|undefined'}
# utf8mb4 covers the full Unicode range, including emoji in user names
db = pymysql.connect(host='localhost', port=3306,
                     user='root', password='root', database='wxb', charset='utf8mb4')

cursor = db.cursor()
for page in range(1, 51):
    try:
        resp = requests.get(url=f'https://s.weibo.com/user?q=%E5%AE%9C%E6%98%8C&Refer=weibo_user&page={page}',
                            headers=headers, cookies=cookies, timeout=10)
    except requests.RequestException as exc:
        print(f'page {page}: request failed: {exc}')
        continue
    time.sleep(1)  # pause between pages
    html = etree.HTML(resp.text)
    feed = html.xpath('//*[@id="pl_user_feedList"]') if html is not None else []
    if not feed:
        # No result list on this page, e.g. expired cookies or no more results
        continue
    # Read each user card separately so the three fields stay aligned
    # even when a card has no verification line.
    for card in feed[0].xpath('./div[*]'):
        user_name = ''.join(card.xpath('./div[2]/div/a[1]/text()')).strip()
        official = ''.join(card.xpath('./div[2]/p[2]/text()')).strip()
        user_fans = ''.join(card.xpath('./div[2]/p[3]/span[2]/a/text()')).strip()
        if not user_name:
            continue
        if not official:
            official = '非官微'  # "not an official account"
        values = (user_name, official, user_fans)
        try:
            sql = "INSERT INTO weibo(user_name, official, user_fans) VALUES (%s,%s,%s)"
            cursor.execute(sql, values)
            db.commit()
        except pymysql.MySQLError as exc:
            db.rollback()
            print(f'insert failed for {user_name!r}: {exc}')

cursor.close()
db.close()
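
The search keyword in the crawler is hard-coded in percent-encoded form (`%E5%AE%9C%E6%98%8C` is the UTF-8 encoding of 宜昌). To search for a different keyword, the URL could be built with the standard library instead. This is a minimal sketch: `build_search_url` is a hypothetical helper, and it assumes the endpoint keeps the same `q`, `Refer`, and `page` parameters seen in the source.

```python
from urllib.parse import quote

SEARCH_URL = 'https://s.weibo.com/user'


def build_search_url(keyword, page):
    """Build the user-search URL for a keyword and a 1-based page number."""
    # quote() percent-encodes the keyword as UTF-8, e.g. 宜昌 -> %E5%AE%9C%E6%98%8C
    return f'{SEARCH_URL}?q={quote(keyword)}&Refer=weibo_user&page={page}'


print(build_search_url('宜昌', 1))
```

The `referer` header could be derived from the same helper so that it always matches the keyword being searched.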

Source: https://www.cnblogs.com/shun998/p/16186209.html
