
How to fix a crawler IP ban in Python: "您操作太频繁,请稍后再访问" ("You are operating too frequently, please try again later")

2022-10-26 13:07:14  Views: 345  Source: Internet

Tags: python, crawler, web scraping, rate limiting, job site


Environment

Python 3.9.6, PyCharm

Problem

When I tried to scrape listings from a job site, the request came back with the following response:

{"status":false,"msg":"您操作太频繁,请稍后再访问","clientIp":"113.92.xxx.xxx","state":2402}

Cause

The job site's anti-crawler mechanism inspects incoming requests and tracks the source IP. A request that carries no browser-like headers is assumed to come from a bot, and the originating IP gets blocked.
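You can see why a bare requests call looks suspicious: by default, requests announces itself as a script rather than a browser. A quick local check (the exact version string depends on your installed requests):

```python
import requests

# Prepare a request without setting any headers ourselves
session = requests.Session()
prepared = session.prepare_request(requests.Request('GET', 'https://example.com'))

# The default User-Agent clearly identifies the client as python-requests
print(prepared.headers.get('User-Agent'))  # e.g. 'python-requests/2.28.1'
```

Any server that filters on the User-Agent header can flag this immediately, which is why the fix below replaces it with a real browser's headers.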

Solution

Add request headers that mimic a real browser session. In the browser's DevTools Network tab, right-click the request → Copy → Copy as cURL, then paste the copied cURL command into a curl-to-Python converter (e.g. curlconverter.com) to generate the headers dict. Copy the generated headers into your code and pass them with the request:

req = requests.post(url, data=data, headers=headers)

With the headers in place, the request succeeds.
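Note that the block response comes back as HTTP 200 with a JSON body, so it helps to detect it explicitly before trying to parse results. A minimal sketch, keying off the status/msg fields of the response shown above (the helper name is hypothetical):

```python
def is_blocked(payload: dict) -> bool:
    # The site returns {"status": false, "msg": "您操作太频繁...", ...} when rate-limited
    return payload.get("status") is False

# The blocked response from above, and a normal-looking one
blocked = {"status": False, "msg": "您操作太频繁,请稍后再访问", "state": 2402}
ok = {"status": True, "content": {"positionResult": {"result": []}}}

print(is_blocked(blocked))  # → True
print(is_blocked(ok))       # → False
```

In a real crawler you would call this on `resp.json()` after each POST and stop (or slow down) as soon as it returns True.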

Full code

import requests

data = {
    'first': 'true',
    'pn': 1,
    'kd': 'devops',
}

headers = {
    'authority': 'www.lagou.com',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'x-anit-forge-code': '0',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'accept': 'application/json, text/javascript, */*; q=0.01',
    'x-requested-with': 'XMLHttpRequest',
    'x-anit-forge-token': 'None',
    'origin': 'https://www.lagou.com',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.lagou.com/jobs/list_devops?labelWords=&fromSearch=true&suginput=',
    'accept-language': 'zh-CN,zh;q=0.9',
    'cookie': 'user_trace_token=20210701180011-4072c9db-d003-4844-a073-736f42bf40d2; _ga=GA1.2.990750347.1625133612; LGUID=20210701180012-2e17d8bd-5ea4-44c5-8778-f1c7a1d55733; RECOMMEND_TIP=true; privacyPolicyPopup=false; _gid=GA1.2.1172577386.1625133628; index_location_city=%E5%85%A8%E5%9B%BD; __lg_stoken__=c464107bfc8c7699b4b9ab091a02b36fa0da7206bb819632fd3fd24aaa845416a2fedb45e6ce11b7c47e4caf7f6cdcb4148deec393528ad92441dded9e313ab97f29157b284b; JSESSIONID=ABAAAECAAEBABIICDEA3CABC2939F48693F2083DDF69F92; WEBTJ-ID=2021072%E4%B8%8A%E5%8D%8811:04:33110433-17a652cd0ed36b-005519fd181336-6373264-921600-17a652cd0eee17; sensorsdata2015session=%7B%7D; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1625133612,1625133614,1625133628,1625206020; PRE_UTM=; PRE_LAND=https%3A%2F%2Fwww.lagou.com%2F; LGSID=20210702140659-b01cbbaa-d692-4da4-8e24-f1f4d2d57725; PRE_HOST=www.baidu.com; PRE_SITE=https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DiBegwdc5MIYG8VRAnt1Sl3KH1qai9frV%5FGMfPmg2wuO%26wd%3D%26eqid%3Defb6541e0006959b0000000660deacff; TG-TRACK-CODE=index_search; X_HTTP_TOKEN=6d7dc50382c24c1a0906025261711c7aa8b8ab0f8e; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2217a61833b30333-0d73eba337c105-6373264-921600-17a61833b31a06%22%2C%22first_id%22%3A%22%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24os%22%3A%22Windows%22%2C%22%24browser%22%3A%22Chrome%22%2C%22%24browser_version%22%3A%2291.0.4472.124%22%7D%2C%22%24device_id%22%3A%2217a61833b30333-0d73eba337c105-6373264-921600-17a61833b31a06%22%7D; _gat=1; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1625206091; LGRID=20210702140811-09ff2eee-5c0f-44d2-8501-2117d8d83d89; SEARCH_ID=29f013ed02e6461cb49f2da2573cf25a',
}

url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'

# Visit the listing page first so the session picks up fresh cookies
session = requests.session()
session.get('https://www.lagou.com/jobs/list_devops?labelWords=&fromSearch=true&suginput=', headers=headers)
cookies = session.cookies

# Send the POST with the browser headers and the freshly obtained cookies
req = requests.post(url, data=data, headers=headers, cookies=cookies)
print(req.text)
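If the IP has already been flagged, even a header-carrying request may still be rejected for a while. A common mitigation is to back off and retry with increasing delays. The sketch below is a generic helper, not part of the original post; the function names are illustrative:

```python
import time

def retry_with_backoff(fetch, is_blocked, max_retries=3, base_delay=1.0):
    """Call fetch() until is_blocked(result) is False, doubling the delay each attempt."""
    for attempt in range(max_retries):
        result = fetch()
        if not is_blocked(result):
            return result
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return None

# Usage with a fake fetcher that succeeds on the third call:
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    return {"status": calls["n"] >= 3}

result = retry_with_backoff(fake_fetch, lambda r: r["status"] is False, base_delay=0.01)
print(result)  # → {'status': True}
```

In the crawler above, `fetch` would be a lambda wrapping the `requests.post(...)` call and `is_blocked` the check on the JSON `status` field. Keeping the delays generous is also simply politer to the site.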
