
Installing Scrapy, Scrapy-Splash, and Scrapy-Redis

2021-02-01 19:04:50



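Prerequisite: Docker is installed and working.

After installation, add registry mirrors hosted in China to Docker's JSON configuration file (daemon.json). The Aliyun mirror address is account-specific, so you need to apply for your own.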

"registry-mirrors": [
    "https://********.mirror.aliyuncs.com",
    "https://registry.docker-cn.com",
    "http://hub-mirror.c.163.com",
    "https://docker.mirrors.ustc.edu.cn"
]
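
The fragment above has to sit inside the top-level JSON object of the configuration file. As a minimal sketch of merging mirrors into an existing configuration without losing other keys, the helper below works on the file's text. The function name add_registry_mirrors is hypothetical, and reading or writing the actual file is left out because its location varies by platform and is not given here.

```python
import json


def add_registry_mirrors(config_text, mirrors):
    """Return daemon.json text with `mirrors` merged into "registry-mirrors"."""
    # An empty file is treated as an empty JSON object.
    config = json.loads(config_text) if config_text.strip() else {}
    merged = list(config.get("registry-mirrors", []))
    for url in mirrors:
        if url not in merged:  # keep the existing order, skip duplicates
            merged.append(url)
    config["registry-mirrors"] = merged
    return json.dumps(config, indent=2)
```

Calling add_registry_mirrors('{"experimental": false}', ["http://hub-mirror.c.163.com"]) keeps the existing key and adds the mirror list. Docker generally needs a restart before a changed configuration takes effect.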

Other prerequisites:

Anaconda has been added to PATH.

Installing Scrapy

In JupyterLab, enter:

pip install scrapy
import scrapy

Installing Splash

In a terminal, enter:

docker run -p 8050:8050 scrapinghub/splash

On success, the output looks similar to the following:

Digest: sha256:b4173a88a9d11c424a4df4c8a41ce67ff6a6a3205bd093808966c12e0b06dacf
Status: Downloaded newer image for scrapinghub/splash:latest
2021-02-01 04:53:28+0000 [-] Log opened.
2021-02-01 04:53:29.033164 [-] Xvfb is started: ['Xvfb', ':846388905', '-screen', '0', '1024x768x24', '-nolisten', 'tcp']
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-splash'
2021-02-01 04:53:29.354172 [-] Splash version: 3.5
2021-02-01 04:53:29.420819 [-] Qt 5.14.1, PyQt 5.14.2, WebKit 602.1, Chromium 77.0.3865.129, sip 4.19.22, Twisted 19.7.0, Lua 5.2
2021-02-01 04:53:29.421057 [-] Python 3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
2021-02-01 04:53:29.421504 [-] Open files limit: 1048576
2021-02-01 04:53:29.421711 [-] Can't bump open files limit
2021-02-01 04:53:29.441758 [-] proxy profiles support is enabled, proxy profiles path: /etc/splash/proxy-profiles
2021-02-01 04:53:29.442007 [-] memory cache: enabled, private mode: enabled, js cross-domain access: disabled
2021-02-01 04:53:29.616771 [-] verbosity=1, slots=20, argument_cache_max_entries=500, max-timeout=90.0
2021-02-01 04:53:29.617331 [-] Web UI: enabled, Lua: enabled (sandbox: enabled), Webkit: enabled, Chromium: enabled
2021-02-01 04:53:29.618280 [-] Site starting on 8050
2021-02-01 04:53:29.618402 [-] Starting factory <twisted.web.server.Site object at 0x7f07b402c5c0>
2021-02-01 04:53:29.618800 [-] Server listening on http://0.0.0.0:8050
2021-02-01 04:54:39.943377 [-] "172.17.0.1" - - [01/Feb/2021:04:54:39 +0000] "GET / HTTP/1.1" 200 7675 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
2021-02-01 04:54:40.007321 [-] "172.17.0.1" - - [01/Feb/2021:04:54:39 +0000] "GET /_ui/style.css HTTP/1.1" 200 2591 "http://localhost:8050/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
2021-02-01 04:54:40.025381 [-] "172.17.0.1" - - [01/Feb/2021:04:54:39 +0000] "GET /_ui/main.js HTTP/1.1" 200 13055 "http://localhost:8050/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
2021-02-01 04:54:42.573986 [-] "172.17.0.1" - - [01/Feb/2021:04:54:42 +0000] "GET /_ui/inspections/splash-auto.json HTTP/1.1" 200 177721 "http://localhost:8050/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
2021-02-01 04:54:42.698853 [-] "172.17.0.1" - - [01/Feb/2021:04:54:42 +0000] "GET /_ui/favicon.ico HTTP/1.1" 200 4286 "http://localhost:8050/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
2021-02-01 04:55:39.940831 [-] Timing out client: IPv4Address(type='TCP', host='172.17.0.1', port=55762)
2021-02-01 04:55:42.699230 [-] Timing out client: IPv4Address(type='TCP', host='172.17.0.1', port=55764)
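
Besides reading the log, you can check from Python that the server is answering on port 8050. The sketch below is one possible check using only the standard library. It assumes Splash is reachable at http://localhost:8050 (matching the -p 8050:8050 mapping) and that its /_ping endpoint reports a "status" of "ok", as the Splash HTTP API documentation describes; the function names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

SPLASH_URL = "http://localhost:8050"  # assumption: matches the -p 8050:8050 mapping


def splash_is_up(base_url=SPLASH_URL, timeout=3.0):
    """Return True if Splash answers /_ping with status "ok", else False."""
    try:
        with urllib.request.urlopen(f"{base_url}/_ping", timeout=timeout) as resp:
            return json.load(resp).get("status") == "ok"
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or non-JSON reply.
        return False


def render_html_url(target, wait=0.5, base_url=SPLASH_URL):
    """Build a render.html URL asking Splash to render the page at `target`."""
    query = urllib.parse.urlencode({"url": target, "wait": wait})
    return f"{base_url}/render.html?{query}"
```

If splash_is_up() returns True, opening the URL from render_html_url("https://example.com") in a browser should show the page as rendered by Splash.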

Stopping Splash

Stop the container first, then remove it.

sudo docker ps -a
sudo docker stop CONTAINER_ID
sudo docker rm CONTAINER_ID
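
The same stop-then-remove order can be scripted. This is only a sketch: it assumes the current user can run docker without sudo, and it looks containers up by the scrapinghub/splash image name through the ancestor filter of docker ps instead of copying CONTAINER_ID by hand. The helper names are hypothetical.

```python
import subprocess

IMAGE = "scrapinghub/splash"  # image name from the docker run command


def find_container_ids(image=IMAGE):
    """Return IDs of all containers (running or stopped) created from `image`."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--filter", f"ancestor={image}", "--format", "{{.ID}}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def stop_and_remove(container_id, run=subprocess.run):
    """Stop the container first, then remove it, failing fast on errors."""
    for cmd in (["docker", "stop", container_id], ["docker", "rm", container_id]):
        run(cmd, check=True)
```

A typical use would be to loop over find_container_ids() and call stop_and_remove() on each ID. The run parameter exists only so the command order can be checked without touching Docker.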

Installing Scrapy-Splash

In JupyterLab, enter:

pip install scrapy-splash

The package cannot be imported under its hyphenated name; the module is named scrapy_splash.
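
In a Scrapy project the package is normally not imported directly but referenced by dotted paths in settings.py. The fragment below is a sketch based on the configuration shown in the scrapy-splash README. The SPLASH_URL value is an assumption that the Splash container from the earlier step is running locally on port 8050.

```python
# settings.py (fragment), following the scrapy-splash README
SPLASH_URL = "http://localhost:8050"  # assumption: local Splash container

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter that takes Splash request arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```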

Installing Scrapy-Redis

In JupyterLab, enter:

pip install scrapy-redis
import scrapy_redis
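
Once the import succeeds, Scrapy-Redis is switched on through project settings. The fragment below is a minimal sketch using setting names from the scrapy-redis README. It assumes a Redis server is already running at redis://localhost:6379, which these notes do not cover.

```python
# settings.py (fragment), following the scrapy-redis README
SCHEDULER = "scrapy_redis.scheduler.Scheduler"  # queue requests in Redis
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"  # dedupe via Redis
SCHEDULER_PERSIST = True  # keep the queue between runs instead of clearing it
REDIS_URL = "redis://localhost:6379"  # assumption: local Redis server
```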

Installing Scrapyd and related tools

pip install scrapyd

pip install scrapyd-client

pip install python-scrapyd-api

Installing Scrapyrt (a lightweight Scrapyd)

pip install scrapyrt
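
To confirm which of the packages above actually ended up in the current environment, you can query pip's metadata instead of importing each one. A minimal sketch, assuming Python 3.8 or newer for importlib.metadata; the helper names are hypothetical and the list uses the distribution names passed to pip, not the import names.

```python
from importlib import metadata

# Distribution names exactly as given to pip install above.
PACKAGES = [
    "scrapy",
    "scrapy-splash",
    "scrapy-redis",
    "scrapyd",
    "scrapyd-client",
    "python-scrapyd-api",
    "scrapyrt",
]


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(names=PACKAGES):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}
```

Running report() in a JupyterLab cell shows a version for every package that installed correctly and None for any that is missing.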

Source: https://blog.csdn.net/jelatinprotain/article/details/113506670
