How can I tell when new data has been added to HDFS?

2019-09-09 19:07:56  Source: Internet

Original link: https://stackoverflow.com/questions/14934079/how-to-know-that-a-new-data-is-been-added-to-hdfs

I am implementing a notification system based on the publish/subscribe model to announce the availability of data as it arrives in (is loaded into) HDFS. I couldn't find where to look for this. Is there an HDFS API that can be used for this, or what method should I use to find out about new data written to HDFS? I am using Hadoop v2.0.2 and I don't want to use HCatalog; I want to implement my own tool to do this.
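The publish/subscribe half of such a tool does not depend on HDFS at all. As a minimal in-process sketch, assuming subscribers register interest in a directory prefix and some separate detector reports each new file path, the fan-out could look like this. `DataAvailabilityNotifier`, `subscribe` and `publish` are hypothetical names, not part of any Hadoop API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical callback type: receives the path of a newly available file.
Subscriber = Callable[[str], None]


class DataAvailabilityNotifier:
    """Minimal in-process publish/subscribe hub keyed by directory prefix."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, prefix: str, callback: Subscriber) -> None:
        # Register interest in every file that appears under `prefix`.
        self._subscribers[prefix.rstrip("/")].append(callback)

    def publish(self, path: str) -> int:
        # Deliver `path` to every subscriber whose prefix contains it
        # and return how many callbacks were invoked.
        delivered = 0
        for prefix, callbacks in self._subscribers.items():
            if path == prefix or path.startswith(prefix + "/"):
                for callback in callbacks:
                    callback(path)
                    delivered += 1
        return delivered
```

Matching on `prefix + "/"` rather than a bare `startswith(prefix)` keeps a subscriber to `/data/clicks` from also receiving files under a sibling directory such as `/data/clicksX`.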

What you are looking for is Oozie Coordinator.

HDFS is a file system, so something must be built on top of it to check for file availability. HBase has coprocessors, which are triggered procedures, but they are only available for HBase tables, so they cannot be used to detect data availability in HDFS.
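The simplest thing to build on top of a file system for this purpose is a poller: list a directory tree at intervals and report any path not seen in the previous listing. The sketch below assumes a hypothetical `list_files` helper that walks a *local* directory as a stand-in for an HDFS listing; against a real cluster you would substitute an HDFS client call (for example `FileSystem.listStatus` in Hadoop's Java API). `PollingWatcher` is likewise a hypothetical name.

```python
import os
import time
from typing import Callable, Set


def list_files(root: str) -> Set[str]:
    # Stand-in for an HDFS listing: walks a local directory tree.
    # Replace with an HDFS client call when running against a cluster.
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.join(dirpath, name))
    return found


class PollingWatcher:
    """Detects newly added files by diffing successive directory snapshots."""

    def __init__(self, root: str, on_new_file: Callable[[str], None]) -> None:
        self.root = root
        self.on_new_file = on_new_file
        # Files already present at start-up are treated as already seen.
        self._seen: Set[str] = list_files(root)

    def poll_once(self) -> Set[str]:
        # Take a fresh snapshot, report paths absent from the last one,
        # then make this snapshot the new baseline.
        current = list_files(self.root)
        new_files = current - self._seen
        for path in sorted(new_files):
            self.on_new_file(path)
        self._seen = current
        return new_files

    def run(self, interval_seconds: float, iterations: int) -> None:
        # Poll a fixed number of times, sleeping between scans.
        for _ in range(iterations):
            self.poll_once()
            time.sleep(interval_seconds)
```

One caveat: a file shows up in a listing as soon as it is created, possibly before its writer has finished, so a poller like this can report incomplete data. Completion-marker conventions exist to address exactly that.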

Oozie is a workflow scheduler system for managing Hadoop jobs. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. You can also execute other programs from it.
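Oozie coordinators are declared in XML rather than written as code, but the data-availability rule they apply can be illustrated. In the coordinator specification, a dataset has a `uri-template` with placeholders such as `${YEAR}`, `${MONTH}`, `${DAY}` and `${HOUR}`, plus a `done-flag` that defaults to `_SUCCESS`; an empty done-flag means the directory's existence alone marks the instance as ready. The Python below only mimics that rule on a local file system under those assumptions. The function names are hypothetical, and this is not Oozie's implementation or API.

```python
import os
from datetime import datetime
from typing import List, Sequence

# Assumed default, mirroring the coordinator spec's done-flag default.
DEFAULT_DONE_FLAG = "_SUCCESS"


def resolve_instance(uri_template: str, nominal_time: datetime) -> str:
    # Substitute zero-padded date fields to get one dataset instance path.
    replacements = {
        "${YEAR}": f"{nominal_time.year:04d}",
        "${MONTH}": f"{nominal_time.month:02d}",
        "${DAY}": f"{nominal_time.day:02d}",
        "${HOUR}": f"{nominal_time.hour:02d}",
        "${MINUTE}": f"{nominal_time.minute:02d}",
    }
    uri = uri_template
    for placeholder, value in replacements.items():
        uri = uri.replace(placeholder, value)
    return uri


def instance_ready(instance_dir: str, done_flag: str = DEFAULT_DONE_FLAG) -> bool:
    # Empty done_flag: the directory existing is enough.
    # Otherwise the flag file must exist inside the directory.
    if done_flag == "":
        return os.path.isdir(instance_dir)
    return os.path.isfile(os.path.join(instance_dir, done_flag))


def ready_instances(
    uri_template: str,
    nominal_times: Sequence[datetime],
    done_flag: str = DEFAULT_DONE_FLAG,
) -> List[datetime]:
    # Keep only the nominal times whose dataset instance is ready,
    # i.e. the ones a coordinator-like scheduler could act on now.
    return [
        t for t in nominal_times
        if instance_ready(resolve_instance(uri_template, t), done_flag)
    ]
```

A real coordinator also materializes actions on its time frequency and waits for each input instance to become ready; here `ready_instances` simply filters a caller-supplied list of nominal times.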

Source: https://blog.csdn.net/hayaqi0504/article/details/100671331