
Web Scraping: Basic Usage of requests

2022-05-26 17:01:04


1. Basic usage

1.1 Documentation

Official documentation:

  http://cn.python-requests.org/zh_CN/latest/

Quickstart:

  http://cn.python-requests.org/zh_CN/latest/user/quickstart.html

1.2 Installation

pip install requests

or, to install from the Douban PyPI mirror:

pip install requests -i https://pypi.douban.com/simple
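
A quick way to confirm the installation worked is to import the package and print its version. This is a minimal sketch; the variable name installed_version is only illustrative.

```python
import requests

# If the import succeeds, the package is installed in the current environment.
installed_version = requests.__version__
print("requests version:", installed_version)
```

If the import raises ModuleNotFoundError, pip most likely installed the package into a different Python environment than the one running the script.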

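1.3 Type and attributes of the response

Type: requests.models.Response

r.text: the response body decoded to a string (the page source)

r.encoding: read or set the encoding used to decode r.text

r.url: the URL of the request

r.content: the response body as bytes

r.status_code: the HTTP status code of the response

r.headers: the response headers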

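import requests

url = 'http://www.baidu.com'
# One type and six attributes
response = requests.get(url=url)
print(type(response))  # <class 'requests.models.Response'>

# Set the encoding used to decode the response body
response.encoding = 'utf-8'

# Return the page source as a string (the most commonly used attribute)
print(response.text)

# Return the URL of the request
print(response.url)

# Return the response body as raw bytes
print(response.content)

# Return the HTTP status code of the response
print(response.status_code)

# Return the response headers
print(response.headers)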
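
The reason for setting response.encoding before reading response.text is that the text is the raw bytes decoded with that encoding. The sketch below imitates this relationship with plain Python bytes, so it runs without any network access. The sample bytes and the variable names are assumptions made for illustration and do not come from a real response.

```python
# Stand-in for response.content: raw bytes as a GBK-encoded page might send them.
content = "北京".encode("gbk")

# Decoding with the wrong codec produces garbled text; errors="replace"
# substitutes U+FFFD for bytes that are invalid in that codec.
wrong = content.decode("utf-8", errors="replace")

# Decoding with the matching codec recovers the original text.
right = content.decode("gbk")

print(repr(wrong))
print(right)
```

In practice, garbled characters in response.text usually mean response.encoding does not match the page's real encoding, and assigning the correct value before reading response.text fixes it.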

2. GET requests with requests

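import requests

url = 'https://www.baidu.com/s?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36'
}
data = {
    'wd': '北京'
}
# url: path of the requested resource
# params: query-string parameters
# kwargs: additional keyword arguments (headers is passed this way)
response = requests.get(url=url, params=data, headers=headers)
content = response.text
print(content)
# Summary:
# (1) Query parameters are passed with params
# (2) The parameters do not need to be urlencoded by hand
# (3) There is no need to build a custom request object
# (4) The trailing ? in the resource path is optional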
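
Points (2) and (4) of the summary can be checked without sending anything: requests.Request(...).prepare() builds the request and exposes the final URL. This is a minimal sketch; the helper name build_search_url is hypothetical and not part of requests.

```python
import requests


def build_search_url(base_url, keyword):
    """Return the URL requests would use for a GET with a wd parameter."""
    # prepare() assembles the request (including the query string)
    # but does not send it, so no network access is needed.
    prepared = requests.Request("GET", base_url, params={"wd": keyword}).prepare()
    return prepared.url


with_mark = build_search_url("https://www.baidu.com/s?", "北京")
without_mark = build_search_url("https://www.baidu.com/s", "北京")

print(with_mark)                  # the keyword appears percent-encoded
print(with_mark == without_mark)  # the trailing ? makes no difference
```

The non-ASCII keyword is percent-encoded as UTF-8 automatically, which is why no manual urlencode step is needed.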

Source: https://www.cnblogs.com/ckfuture/p/16314077.html