Downloading the ActivityNet Dataset

2019-09-18 14:09:16  Source: Internet

Tags: tmp ActivityNet label command output video data annotations download

Preparation:

Install Anaconda.

Install the dependencies below. For details on installing ffmpeg, see my other blog post: https://blog.csdn.net/weixin_43659035/article/details/99445208

pip install youtube-dl
sudo apt install ffmpeg
pip install pafy
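
Before going further, it can help to confirm that the command-line tools the download script shells out to are actually on PATH. The following is a minimal sketch using only the standard library; the helper name `missing_tools` is hypothetical and not part of the original post.

```python
import shutil


def missing_tools(names):
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # youtube-dl and ffmpeg are the two executables the download script calls.
    missing = missing_tools(["youtube-dl", "ffmpeg"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If anything is reported as missing, re-run the matching install command above inside the same environment before starting the download.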

1. First, clone the GitHub project with the command below, or download it directly from GitHub.

git clone https://github.com/activitynet/ActivityNet.git

2. Enter the Crawler folder.

4. Create a folder to hold the dataset you are about to download, using the command below:

mkdir dataset

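5. Create a project anywhere, add a new Python file named download.py, and copy the code below into it. The script opens activity_net.v1-3.min.json by a relative path, so the simplest setup is to keep download.py in the Crawler folder as well.

6. Download the JSON file from the ActivityNet website:

URL: http://activity-net.org/download.html

Save the file as: activity_net.v1-3.min.json

Copy it into the Crawler folder.

7. In the code below, change directory = 'dataset/' to whatever location you need. Note that the trailing '/' must not be omitted.

8. cd to the directory containing download.py and run python download.py to start the download.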

import os
import json

import uuid
import glob
import subprocess
from joblib import delayed
from joblib import Parallel

# specify download directory (keep the trailing '/')
directory = 'dataset/'
# index of the first video to process; raise it to resume an interrupted run
videoCounter = 0
# number of parallel download jobs
num_jobs = 24

# open json file
with open('activity_net.v1-3.min.json') as data_file:
    data = json.load(data_file)

# take only video information from database object
videos = data['database']
total = len(videos)


# download one video and cut it into its annotated segments
def download_clip(videos, i, key, directory, total):
    video = videos[key]

    # find video subset
    subset = video['subset']

    # find video label (taken from the first annotation, if any)
    annotations = video['annotations']
    label = ''
    if len(annotations) != 0:
        label = annotations[0]['label']
        label = '/' + label.replace(' ', '_')

    # create folder named as <label> if it does not exist
    # (exist_ok avoids a race between parallel jobs)
    label_dir = directory + subset + label
    os.makedirs(label_dir, exist_ok=True)

    # build the video url from its YouTube id (the dictionary key)
    tmp_dir = 'tmp'
    url_base = 'https://www.youtube.com/watch?v='
    video_identifier = key

    # download to a uniquely named temporary file; retry up to 5 times
    tmp_filename = os.path.join(tmp_dir,
                                '%s.%%(ext)s' % uuid.uuid4())
    command = ['youtube-dl',
               '--quiet', '--no-warnings',
               '-f', 'mp4',
               '-o', '"%s"' % tmp_filename,
               '"%s"' % (url_base + video_identifier)]
    command = ' '.join(command)
    attempts = 0
    num_attempts = 5
    while True:
        try:
            output = subprocess.check_output(command, shell=True,
                                             stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as err:
            attempts += 1
            if attempts == num_attempts:
                return err.output
        else:
            break

    # resolve the real file name (youtube-dl fills in the extension)
    tmp_filename = glob.glob('%s*' % tmp_filename.split('.')[0])[0]
    # Construct command to trim the videos (ffmpeg required).
    if len(annotations) == 0:
        # no annotations: rescale and save the whole video
        sstr = label_dir + '/' + key + '.mp4'
        command = ['ffmpeg',
                   '-i', '"%s"' % tmp_filename,
                   '-strict', '-2',
                   '-s', '320x240',
                   '-loglevel', 'panic',
                   '"%s"' % sstr]
        command = ' '.join(command)
        try:
            output = subprocess.check_output(command, shell=True,
                                             stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError as err:
            return err.output
    else:
        # one output clip per annotated segment: <key>_<start>_<end>.mp4
        for annotation in annotations:
            start, end = annotation['segment'][0], annotation['segment'][1]
            sstr = (label_dir + '/' + key + '_' + str(start) + '_'
                    + str(end) + '.mp4')
            command = ['ffmpeg',
                       '-i', '"%s"' % tmp_filename,
                       '-ss', str(start),
                       '-t', str(end - start),
                       '-strict', '-2',
                       '-s', '320x240',
                       '-c:v', 'libx264', '-c:a', 'copy',
                       '-threads', '1',
                       '-loglevel', 'panic',
                       '"%s"' % sstr]
            command = ' '.join(command)
            try:
                output = subprocess.check_output(command, shell=True,
                                                 stderr=subprocess.STDOUT)
            except subprocess.CalledProcessError as err:
                return err.output

    # remove the temporary download and report progress
    os.remove(tmp_filename)
    print('Downloaded {}/{}, please be patient'.format(i, total))
    return True


if num_jobs == 1:
    for i, key in enumerate(videos):
        # apply the same resume filter as the parallel branch
        if i >= videoCounter:
            download_clip(videos, i, key, directory, total)
else:
    Parallel(n_jobs=num_jobs)(delayed(download_clip)(
        videos, i, key, directory, total)
        for i, key in enumerate(videos) if i >= videoCounter)
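
To make the output layout concrete, the path logic of the script can be isolated into a small pure function. This is only a sketch: `planned_outputs` is a hypothetical helper, and the sample entry is made up to mirror the fields the script reads (`subset`, `annotations`, `label`, `segment`); it is not real data from the JSON file.

```python
def planned_outputs(directory, key, video):
    """List the clip paths the script would write for one database entry."""
    annotations = video["annotations"]
    label = ""
    if annotations:
        # The folder is named after the first annotation's label.
        label = "/" + annotations[0]["label"].replace(" ", "_")
    label_dir = directory + video["subset"] + label

    if not annotations:
        # No annotations: the whole video is saved once.
        return [label_dir + "/" + key + ".mp4"]
    # One clip per annotated segment, named <key>_<start>_<end>.mp4.
    return [
        "%s/%s_%s_%s.mp4" % (label_dir, key, a["segment"][0], a["segment"][1])
        for a in annotations
    ]


# Made-up entry shaped like one value of data['database'].
sample = {
    "subset": "training",
    "annotations": [
        {"label": "Long jump", "segment": [1.5, 10.0]},
        {"label": "Long jump", "segment": [20.0, 31.25]},
    ],
}
print(planned_outputs("dataset/", "abc123", sample))
```

Because the directory string is concatenated directly with the subset name, passing 'dataset' without the trailing '/' would produce paths starting with 'datasettraining', which is why the trailing slash matters.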

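# The `if i >= videoCounter` condition at the end of the script guards against an interruption partway through the download.
# videoCounter is the number of videos already processed. After an interruption, look at the last progress line (for example 8887/9682) and change videoCounter = 0 accordingly before rerunning. Because the jobs run in parallel and finish out of order, it is safer to pick a somewhat smaller value, such as videoCounter = 8500.
# The condition acts as a resume point without rescanning every downloaded file one by one, which is very slow. When I downloaded Kinetics-600, the connection dropped midway and rescanning the files already downloaded took one to two days.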
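
The resume idea can be tried on a toy dictionary. The sketch below assumes Python 3.7 or later, where dictionaries (including those returned by `json.load`) iterate in a stable insertion order, so the same index refers to the same video on every run. `restart_index` and `pending_items` are hypothetical names, and the margin is only an illustration of backing off from the last reported index.

```python
def restart_index(last_reported, margin):
    """Back off from the last reported index to allow for out-of-order jobs."""
    return max(0, last_reported - margin)


def pending_items(videos, video_counter):
    """Return the (index, key) pairs that would still be processed."""
    return [(i, key) for i, key in enumerate(videos) if i >= video_counter]


# Toy stand-in for data['database']; real keys are YouTube video ids.
videos = {"a": {}, "b": {}, "c": {}, "d": {}, "e": {}}
start = restart_index(last_reported=4, margin=1)
print(pending_items(videos, start))
```

Videos between the restart index and the point of interruption are processed a second time, so a larger margin trades some repeated work for a lower chance of skipping a video that never finished.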

Source: https://blog.csdn.net/weixin_43659035/article/details/100980672
