How to know that new data has been added to HDFS?

2019-09-09 19:07:56  Views: 51  Source: Internet

Tags: HDFS added jobs system been Oozie new data


I am implementing a notification system based on the publish-subscribe model to announce the availability of data as it arrives in (or is loaded into) HDFS. I could not find where to start. Is there an HDFS API that can be used for this, or what method should I use to learn about new data written to HDFS? I am using Hadoop v2.0.2, and I don't want to use HCatalog; I want to implement my own tool for this.

What you are looking for is Oozie Coordinator.

HDFS is a file system, so something must be built on top of it to check for file availability. HBase has coprocessors, which behave like triggered procedures, but they are only available for HBase tables, so they cannot be used to detect data availability in HDFS.
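To make "something built on top of HDFS" concrete, here is a minimal polling sketch. It is not part of the original answer: `NewPathDetector` is a hypothetical helper that remembers the paths it has already seen and returns only the new ones. It takes plain path strings so that it stays self-contained; in a real tool the listing would presumably come from Hadoop's `FileSystem.listStatus(Path)`, called on a timer, with each returned path published to subscribers.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not a Hadoop class): remembers which paths it has
 * already seen and reports the ones that are new on each poll.
 */
final class NewPathDetector {
    private final Set<String> seen = new HashSet<>();

    /** Returns the paths in {@code listing} not seen on any previous call, in listing order. */
    List<String> poll(Collection<String> listing) {
        List<String> fresh = new ArrayList<>();
        for (String path : listing) {
            // Set.add returns true only the first time a path is recorded.
            if (seen.add(path)) {
                fresh.add(path);
            }
        }
        return fresh;
    }
}
```

One caveat with this approach: a file appears in a listing as soon as it is created, before the writer has finished. A common convention is for writers to use a temporary name and rename the file when done, or to drop a marker file at the end, so the poller only reacts to complete data.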

Oozie is a workflow scheduler system for managing Hadoop jobs. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. You can also execute other programs from it.
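For readers who want to see what "triggered by data availability" amounts to, the sketch below imitates the idea in plain Java. It is an illustration only, not Oozie's code. A coordinator dataset is described by a URI template with time placeholders such as `${YEAR}`, `${MONTH}`, `${DAY}` and `${HOUR}`, and an instance counts as available once a done-flag file (by default `_SUCCESS`) exists in the resolved directory. The class `DatasetInstance`, its method names, and the `exists` predicate are assumptions made for this example; against a real cluster the predicate would presumably wrap `FileSystem.exists(Path)`.

```java
import java.time.LocalDateTime;
import java.util.function.Predicate;

/** Hypothetical illustration of a time-templated dataset with a done flag. */
final class DatasetInstance {

    /** Fills ${YEAR}, ${MONTH}, ${DAY} and ${HOUR} with zero-padded values. */
    static String resolve(String uriTemplate, LocalDateTime nominalTime) {
        return uriTemplate
                .replace("${YEAR}", String.format("%04d", nominalTime.getYear()))
                .replace("${MONTH}", String.format("%02d", nominalTime.getMonthValue()))
                .replace("${DAY}", String.format("%02d", nominalTime.getDayOfMonth()))
                .replace("${HOUR}", String.format("%02d", nominalTime.getHour()));
    }

    /**
     * Returns true when the done-flag file exists under the resolved directory.
     * The existence check is injected so the sketch runs without a cluster.
     */
    static boolean isAvailable(String uriTemplate, LocalDateTime nominalTime,
                               String doneFlag, Predicate<String> exists) {
        return exists.test(resolve(uriTemplate, nominalTime) + "/" + doneFlag);
    }
}
```

With Oozie itself none of this needs to be written by hand: the dataset, its done flag, and the action to run when the data shows up are declared in the coordinator application definition, and the action can be a workflow that sends the notification.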

Source: https://blog.csdn.net/hayaqi0504/article/details/100671331

