Scraping images with Scrapy (using Autohome pictures as an example)

2021-07-01 15:03:19

Tags: category, url, scraping, scrapy, urls, div, carhome, images


settings:

from fake_useragent import UserAgent

BOT_NAME = 'carhome'

SPIDER_MODULES = ['carhome.spiders']
NEWSPIDER_MODULE = 'carhome.spiders'
ROBOTSTXT_OBEY = False
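DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    # Evaluated once when the settings module is loaded, so every request
    # in a run shares the same randomly chosen User-Agent.
    'User-Agent': UserAgent().random,
}
ITEM_PIPELINES = {
    # 'carhome.pipelines.CarhomePipeline': 300,
    # Built-in pipeline: downloads every URL listed in item['image_urls'] (requires Pillow)
    'scrapy.pipelines.images.ImagesPipeline': 1,
}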
# Raw string, so the backslashes in the Windows path are not read as escape sequences
IMAGES_STORE = r"D:\python\scrapy_demo\carhome\carhome\images"
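
With the built-in pipeline, every downloaded file is stored under full/ inside IMAGES_STORE, so the category collected by the spider is not reflected on disk. The following is only a sketch of one way to group files by category, assuming Scrapy 2.4 or later (where file_path receives the item). The names CategoryImagesPipeline and category_image_path are hypothetical and do not come from the original post.

```python
import hashlib
import re

try:
    from scrapy.pipelines.images import ImagesPipeline
except ImportError:
    # Fallback so the path helper below can be tried without Scrapy installed.
    ImagesPipeline = object


def category_image_path(url, category):
    """Return a relative path of the form '<category>/<sha1-of-url>.jpg'."""
    # Replace characters that are unsafe in directory names; fall back to
    # 'uncategorized' when the XPath found no title (category is None or empty).
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", (category or "").strip())
    folder = cleaned or "uncategorized"
    # Hash the URL so that file names are unique and stable across runs.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"{folder}/{digest}.jpg"


class CategoryImagesPipeline(ImagesPipeline):  # hypothetical class name
    # Scrapy 2.4+ passes the item being processed as a keyword-only argument.
    def file_path(self, request, response=None, info=None, *, item=None):
        category = item.get("category") if item is not None else None
        return category_image_path(request.url, category)
```

If this class were placed in carhome/pipelines.py, it could presumably be enabled by pointing ITEM_PIPELINES at 'carhome.pipelines.CategoryImagesPipeline' instead of the built-in pipeline.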

items:

import scrapy


class CarhomeItem(scrapy.Item):
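    # Title of the gallery section the images belong to
    category = scrapy.Field()
    # ImagesPipeline reads the URLs to download from this field
    image_urls = scrapy.Field()
    # ImagesPipeline writes its download results to 'images' by default
    # (configurable through the IMAGES_RESULT_FIELD setting)
    images = scrapy.Field()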

 

spiders/carhome_spider:

import scrapy
from carhome.items import CarhomeItem

class CarhomeSpiderSpider(scrapy.Spider):
    name = 'carhome_spider'
    # Must match the host used in start_urls (note the .cn suffix)
    allowed_domains = ['car.autohome.com.cn']
    start_urls = ['https://car.autohome.com.cn/pic/series/66.html#pvareaid=3454438']

    def parse(self, response):
        divs = response.xpath("//div[@class='uibox']")[1:]
        for div in divs:
            category = div.xpath('.//div[@class="uibox-title"]/a/text()').get()
            urls = div.xpath(".//ul/li/a/img/@src").getall()
            # Resolve each src against the page URL so the pipeline receives absolute URLs
            urls = [response.urljoin(url) for url in urls]
            item = CarhomeItem(category = category,image_urls=urls)
            yield item
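
The spider calls response.urljoin because the src attributes on the page are not guaranteed to be absolute URLs. As a rough illustration of what that resolution does, the sketch below uses urllib.parse.urljoin from the standard library, assuming the page has no <base> tag. The image host img.example.com and the helper name absolutize are made up for the example.

```python
from urllib.parse import urljoin


def absolutize(page_url, srcs):
    """Resolve every src against page_url and return absolute URLs."""
    return [urljoin(page_url, src) for src in srcs]


page_url = "https://car.autohome.com.cn/pic/series/66.html"
srcs = [
    "//img.example.com/cars/a.jpg",   # protocol-relative: inherits the page's scheme
    "/static/b.jpg",                  # root-relative: inherits scheme and host
    "c.jpg",                          # relative to the page's directory
    "https://img.example.com/d.jpg",  # already absolute: returned unchanged
]

for absolute in absolutize(page_url, srcs):
    print(absolute)
```

The image pipeline needs a scheme and host on every entry in image_urls, which is why the spider resolves the URLs before building the item.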

 

Source: https://www.cnblogs.com/djwww/p/14958735.html
