
Getting Started with Spark Table Operations [Writing and Reading Data], with Detailed Steps

2021-07-09

0 Preparation

The working directory is /usr/app/spark-2.4.7-bin-hadoop2.7.

1 Running with a script

Execute the following commands to run the Python file shown below:

export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export  PATH=$PATH:$LD_LIBRARY_PATH

bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.3 bin/testSpark.py

Alternatively, submit the job to the standalone cluster with the full Hudi setup; here $HUDI_SPARK_BUNDLE is assumed to point at the hudi-spark-bundle jar:

/usr/app/spark-2.4.7-bin-hadoop2.7/bin/spark-submit --jars $HUDI_SPARK_BUNDLE --master spark://10.20.3.72:7077 --driver-class-path $HADOOP_CONF_DIR:/usr/app/apache-hive-2.3.8-bin/conf/:/software/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --deploy-mode client --driver-memory 1G --executor-memory 1G --num-executors 3 --packages org.apache.spark:spark-avro_2.11:2.4.4 /usr/app/spark-2.4.7-bin-hadoop2.7/bin/testSpark.py
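
The dependency and serializer settings can also be declared in code on the SparkSession builder instead of on the spark-submit command line. A minimal sketch, assuming the options are set before the session is created (spark.jars.packages is only honored at session startup):

# coding=utf-8
# Minimal sketch: the submit-time options expressed as SparkSession configs.
# These must be set before the SparkSession is created to take effect.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("hudi-datalake-test") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.4") \
    .getOrCreate()  # the Hudi bundle jar can be added via spark.jars as well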

The Python file in which Spark reads the data and creates the new table:

# coding=utf-8
from pyspark.sql import SparkSession

spark = SparkSession.builder \
      .master("local[*]") \
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
      .config("spark.default.parallelism", 2) \
      .appName("hudi-datalake-test") \
      .getOrCreate()
sparkContext = spark.sparkContext

# Read a snapshot of the existing Hudi table
basePath = '/user/hive/warehouse/test_spark_mor'
tripsSnapshotDF = spark. \
  read. \
  format("hudi"). \
  load(basePath + "/*/*/*/*")
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
print('Original data:')
df = spark.sql("select * from hudi_trips_snapshot where category = 'E'")
df.show()

# Write the filtered rows to a new Hudi table
savePath = '/user/hive/warehouse/test_spark_mor10'

hudi_options = {
  'hoodie.table.name': 'test_spark_mor10',
  'hoodie.datasource.write.recordkey.field': 'id',               # record key
  'hoodie.datasource.write.partitionpath.field': 'create_date',  # partition path
  'hoodie.datasource.write.table.name': 'test_spark_mor10',
  'hoodie.datasource.write.operation': 'insert',
  'hoodie.datasource.write.precombine.field': 'ts',              # ordering field for de-duplication
  'hoodie.upsert.shuffle.parallelism': 2,
  'hoodie.insert.shuffle.parallelism': 2
}

df.write.format("hudi"). \
  options(**hudi_options). \
  mode("append"). \
  save(savePath)
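# Note: mode("append") adds a new commit to the target table; the Hudi
# quickstart uses mode("overwrite") only for the very first write, to
# (re)create the table from scratch.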

# Read the newly written data back
basePath = savePath
tripsSnapshotDF = spark. \
  read. \
  format("hudi"). \
  load(basePath + "/*/*/*/*")
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
print('New data:')
spark.sql("select * from hudi_trips_snapshot").show()

2 Running from the shell

Start the PySpark shell as follows:

bin/pyspark --jars $HUDI_SPARK_BUNDLE --master spark://10.20.3.72:7077 --driver-class-path $HADOOP_CONF_DIR:/usr/app/apache-hive-2.3.8-bin/conf/:/software/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --deploy-mode client --driver-memory 1G --executor-memory 1G --num-executors 3 --packages org.apache.spark:spark-avro_2.11:2.4.4

After the shell starts, run the following commands (the same logic as the script above):

basePath = '/user/hive/warehouse/test_spark_mor'
tripsSnapshotDF = spark. \
  read. \
  format("hudi"). \
  load(basePath + "/*/*/*/*")
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
print('Original data:')
df = spark.sql("select * from hudi_trips_snapshot where category = 'E'")
df.show()

# Write the filtered rows to a new Hudi table
savePath = '/user/hive/warehouse/test_spark_mor10'

hudi_options = {
  'hoodie.table.name': 'test_spark_mor10',
  'hoodie.datasource.write.recordkey.field': 'id',
  'hoodie.datasource.write.partitionpath.field': 'create_date',
  'hoodie.datasource.write.table.name': 'test_spark_mor10',
  'hoodie.datasource.write.operation': 'insert',
  'hoodie.datasource.write.precombine.field': 'ts',
  'hoodie.upsert.shuffle.parallelism': 2,
  'hoodie.insert.shuffle.parallelism': 2
}

df.write.format("hudi"). \
  options(**hudi_options). \
  mode("append"). \
  save(savePath)

# Read the newly written data back
basePath = savePath
tripsSnapshotDF = spark. \
  read. \
  format("hudi"). \
  load(basePath + "/*/*/*/*")
tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
print('New data:')
spark.sql("select * from hudi_trips_snapshot").show()
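
To double-check what a write actually committed, one option is to query Hudi's built-in metadata columns; these are added to every Hudi record, so no extra configuration is assumed:

# Every Hudi record carries metadata columns; inspect the commit time,
# record key, and partition path of the rows just written.
spark.sql("""
    select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path
    from hudi_trips_snapshot
""").show(truncate=False)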

3 Results of the script run

Part of the log output is shown below:

21/06/30 13:54:49 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[9] at map at HoodieSparkSqlWriter.scala:152), which has no missing parents
21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 23.9 KB, free 365.4 MB)
21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 11.4 KB, free 365.4 MB)
21/06/30 13:54:49 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on hdp-jk-1:36125 (size: 11.4 KB, free: 366.2 MB)
21/06/30 13:54:49 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[9] at map at HoodieSparkSqlWriter.scala:152) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:49 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
21/06/30 13:54:49 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, executor driver, partition 0, PROCESS_LOCAL, 8527 bytes)
21/06/30 13:54:49 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
21/06/30 13:54:49 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:49 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:49 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:49 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/30 13:54:49 INFO codegen.CodeGenerator: Code generated in 39.268569 ms
21/06/30 13:54:49 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 1400 bytes result sent to driver
21/06/30 13:54:49 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 283 ms on localhost (executor driver) (1/1)
21/06/30 13:54:49 INFO scheduler.DAGScheduler: ResultStage 2 (isEmpty at HoodieSparkSqlWriter.scala:181) finished in 0.309 s
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Job 2 finished: isEmpty at HoodieSparkSqlWriter.scala:181, took 0.317609 s
21/06/30 13:54:49 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/06/30 13:54:49 INFO client.AbstractHoodieWriteClient: Generate a new instant time: 20210630135447 action: commit
21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Creating a new instant [==>20210630135447__commit__REQUESTED]
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:49 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:49 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__REQUESTED]]
21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:49 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:49 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/06/30 13:54:49 INFO client.AsyncCleanerService: Auto cleaning is not enabled. Not running cleaner now
21/06/30 13:54:49 INFO spark.SparkContext: Starting job: countByKey at BaseSparkCommitActionExecutor.java:158
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Registering RDD 11 (countByKey at BaseSparkCommitActionExecutor.java:158) as input to shuffle 0
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Got job 3 (countByKey at BaseSparkCommitActionExecutor.java:158) with 2 output partitions
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (countByKey at BaseSparkCommitActionExecutor.java:158)
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 3)
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 3 (MapPartitionsRDD[11] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_9 stored as values in memory (estimated size 26.8 KB, free 365.3 MB)
21/06/30 13:54:49 INFO memory.MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 12.8 KB, free 365.3 MB)
21/06/30 13:54:49 INFO storage.BlockManagerInfo: Added broadcast_9_piece0 in memory on hdp-jk-1:36125 (size: 12.8 KB, free: 366.2 MB)
21/06/30 13:54:49 INFO spark.SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:49 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 3 (MapPartitionsRDD[11] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:49 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
21/06/30 13:54:49 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, localhost, executor driver, partition 0, PROCESS_LOCAL, 8516 bytes)
21/06/30 13:54:49 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 4, localhost, executor driver, partition 1, PROCESS_LOCAL, 8514 bytes)
21/06/30 13:54:49 INFO executor.Executor: Running task 0.0 in stage 3.0 (TID 3)
21/06/30 13:54:49 INFO executor.Executor: Running task 1.0 in stage 3.0 (TID 4)
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO memory.MemoryStore: Block rdd_9_0 stored as values in memory (estimated size 196.0 B, free 365.3 MB)
21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added rdd_9_0 in memory on hdp-jk-1:36125 (size: 196.0 B, free: 366.2 MB)
21/06/30 13:54:50 INFO compat.FilterCompat: Filtering using predicate: and(noteq(category, null), eq(category, Binary{"E"}))
21/06/30 13:54:50 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/30 13:54:50 INFO memory.MemoryStore: Block rdd_9_1 stored as values in memory (estimated size 251.0 B, free 365.3 MB)
21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added rdd_9_1 in memory on hdp-jk-1:36125 (size: 251.0 B, free: 366.2 MB)
21/06/30 13:54:50 INFO executor.Executor: Finished task 0.0 in stage 3.0 (TID 3). 1370 bytes result sent to driver
21/06/30 13:54:50 INFO executor.Executor: Finished task 1.0 in stage 3.0 (TID 4). 1370 bytes result sent to driver
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 392 ms on localhost (executor driver) (1/2)
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 4) in 391 ms on localhost (executor driver) (2/2)
21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
21/06/30 13:54:50 INFO scheduler.DAGScheduler: ShuffleMapStage 3 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.432 s
21/06/30 13:54:50 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/06/30 13:54:50 INFO scheduler.DAGScheduler: running: Set()
21/06/30 13:54:50 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 4)
21/06/30 13:54:50 INFO scheduler.DAGScheduler: failed: Set()
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (ShuffledRDD[12] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_10 stored as values in memory (estimated size 3.7 KB, free 365.3 MB)
21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 2.1 KB, free 365.3 MB)
21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added broadcast_10_piece0 in memory on hdp-jk-1:36125 (size: 2.1 KB, free: 366.2 MB)
21/06/30 13:54:50 INFO spark.SparkContext: Created broadcast 10 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 4 (ShuffledRDD[12] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 4.0 (TID 5, localhost, executor driver, partition 1, PROCESS_LOCAL, 7662 bytes)
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 6, localhost, executor driver, partition 0, ANY, 7662 bytes)
21/06/30 13:54:50 INFO executor.Executor: Running task 1.0 in stage 4.0 (TID 5)
21/06/30 13:54:50 INFO executor.Executor: Running task 0.0 in stage 4.0 (TID 6)
21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks
21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Getting 2 non-empty blocks including 2 local blocks and 0 remote blocks
21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 13 ms
21/06/30 13:54:50 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 13 ms
21/06/30 13:54:50 INFO executor.Executor: Finished task 1.0 in stage 4.0 (TID 5). 1098 bytes result sent to driver
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 4.0 (TID 5) in 128 ms on localhost (executor driver) (1/2)
21/06/30 13:54:50 INFO executor.Executor: Finished task 0.0 in stage 4.0 (TID 6). 1176 bytes result sent to driver
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 6) in 143 ms on localhost (executor driver) (2/2)
21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
21/06/30 13:54:50 INFO scheduler.DAGScheduler: ResultStage 4 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.185 s
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Job 3 finished: countByKey at BaseSparkCommitActionExecutor.java:158, took 0.737207 s
21/06/30 13:54:50 INFO commit.BaseSparkCommitActionExecutor: Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=4, numUpdates=0}, partitionStat={2021/06/29=WorkloadStat {numInserts=1, numUpdates=0}, 2021/06/30=WorkloadStat {numInserts=3, numUpdates=0}}, operationType=INSERT}
21/06/30 13:54:50 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.commit.requested
21/06/30 13:54:50 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.inflight
21/06/30 13:54:50 INFO commit.UpsertPartitioner: AvgRecordSize => 1024
21/06/30 13:54:50 INFO spark.SparkContext: Starting job: collectAsMap at UpsertPartitioner.java:252
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Got job 4 (collectAsMap at UpsertPartitioner.java:252) with 2 output partitions
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Final stage: ResultStage 5 (collectAsMap at UpsertPartitioner.java:252)
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[14] at mapToPair at UpsertPartitioner.java:251), which has no missing parents
21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_11 stored as values in memory (estimated size 231.2 KB, free 365.1 MB)
21/06/30 13:54:50 INFO memory.MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 82.6 KB, free 365.0 MB)
21/06/30 13:54:50 INFO storage.BlockManagerInfo: Added broadcast_11_piece0 in memory on hdp-jk-1:36125 (size: 82.6 KB, free: 366.1 MB)
21/06/30 13:54:50 INFO spark.SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:50 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 5 (MapPartitionsRDD[14] at mapToPair at UpsertPartitioner.java:251) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:50 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 7, localhost, executor driver, partition 0, PROCESS_LOCAL, 7733 bytes)
21/06/30 13:54:50 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 5.0 (TID 8, localhost, executor driver, partition 1, PROCESS_LOCAL, 7733 bytes)
21/06/30 13:54:50 INFO executor.Executor: Running task 0.0 in stage 5.0 (TID 7)
21/06/30 13:54:51 INFO executor.Executor: Running task 1.0 in stage 5.0 (TID 8)
21/06/30 13:54:51 INFO executor.Executor: Finished task 1.0 in stage 5.0 (TID 8). 705 bytes result sent to driver
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 5.0 (TID 8) in 73 ms on localhost (executor driver) (1/2)
21/06/30 13:54:51 INFO executor.Executor: Finished task 0.0 in stage 5.0 (TID 7). 705 bytes result sent to driver
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 7) in 142 ms on localhost (executor driver) (2/2)
21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
21/06/30 13:54:51 INFO scheduler.DAGScheduler: ResultStage 5 (collectAsMap at UpsertPartitioner.java:252) finished in 0.225 s
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Job 4 finished: collectAsMap at UpsertPartitioner.java:252, took 0.235333 s
21/06/30 13:54:51 INFO view.AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:51 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:51 INFO commit.UpsertPartitioner: For partitionPath : 2021/06/29 Small Files => []
21/06/30 13:54:51 INFO commit.UpsertPartitioner: After small file assignment: unassignedInserts => 1, totalInsertBuckets => 1, recordsPerBucket => 122880
21/06/30 13:54:51 INFO commit.UpsertPartitioner: Total insert buckets for partition path 2021/06/29 => [(InsertBucket {bucketNumber=0, weight=1.0},1.0)]
21/06/30 13:54:51 INFO commit.UpsertPartitioner: For partitionPath : 2021/06/30 Small Files => []
21/06/30 13:54:51 INFO commit.UpsertPartitioner: After small file assignment: unassignedInserts => 3, totalInsertBuckets => 1, recordsPerBucket => 122880
21/06/30 13:54:51 INFO commit.UpsertPartitioner: Total insert buckets for partition path 2021/06/30 => [(InsertBucket {bucketNumber=1, weight=1.0},1.0)]
21/06/30 13:54:51 INFO commit.UpsertPartitioner: Total Buckets :2, buckets info => {0=BucketInfo {bucketType=INSERT, fileIdPrefix=20421052-ee81-4427-ab8f-464d81b40b8f, partitionPath=2021/06/29}, 1=BucketInfo {bucketType=INSERT, fileIdPrefix=49a56794-40a5-45c2-bd4a-08a566590703, partitionPath=2021/06/30}},
Partition to insert buckets => {2021/06/29=[(InsertBucket {bucketNumber=0, weight=1.0},1.0)], 2021/06/30=[(InsertBucket {bucketNumber=1, weight=1.0},1.0)]},
UpdateLocations mapped to buckets =>{}
21/06/30 13:54:51 INFO commit.BaseCommitActionExecutor: Auto commit disabled for 20210630135447
21/06/30 13:54:51 INFO spark.SparkContext: Starting job: count at HoodieSparkSqlWriter.scala:470
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Registering RDD 15 (mapToPair at BaseSparkCommitActionExecutor.java:192) as input to shuffle 1
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Got job 5 (count at HoodieSparkSqlWriter.scala:470) with 2 output partitions
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Final stage: ResultStage 7 (count at HoodieSparkSqlWriter.scala:470)
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 6)
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[15] at mapToPair at BaseSparkCommitActionExecutor.java:192), which has no missing parents
21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_12 stored as values in memory (estimated size 253.2 KB, free 364.8 MB)
21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 92.7 KB, free 364.7 MB)
21/06/30 13:54:51 INFO storage.BlockManagerInfo: Added broadcast_12_piece0 in memory on hdp-jk-1:36125 (size: 92.7 KB, free: 366.0 MB)
21/06/30 13:54:51 INFO spark.SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[15] at mapToPair at BaseSparkCommitActionExecutor.java:192) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Adding task set 6.0 with 2 tasks
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 9, localhost, executor driver, partition 0, PROCESS_LOCAL, 8516 bytes)
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 6.0 (TID 10, localhost, executor driver, partition 1, PROCESS_LOCAL, 8514 bytes)
21/06/30 13:54:51 INFO executor.Executor: Running task 1.0 in stage 6.0 (TID 10)
21/06/30 13:54:51 INFO executor.Executor: Running task 0.0 in stage 6.0 (TID 9)
21/06/30 13:54:51 INFO storage.BlockManager: Found block rdd_9_0 locally
21/06/30 13:54:51 INFO storage.BlockManager: Found block rdd_9_1 locally
21/06/30 13:54:51 INFO executor.Executor: Finished task 0.0 in stage 6.0 (TID 9). 1370 bytes result sent to driver
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 6.0 (TID 9) in 139 ms on localhost (executor driver) (1/2)
21/06/30 13:54:51 INFO executor.Executor: Finished task 1.0 in stage 6.0 (TID 10). 1327 bytes result sent to driver
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 6.0 (TID 10) in 131 ms on localhost (executor driver) (2/2)
21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
21/06/30 13:54:51 INFO scheduler.DAGScheduler: ShuffleMapStage 6 (mapToPair at BaseSparkCommitActionExecutor.java:192) finished in 0.195 s
21/06/30 13:54:51 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/06/30 13:54:51 INFO scheduler.DAGScheduler: running: Set()
21/06/30 13:54:51 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 7)
21/06/30 13:54:51 INFO scheduler.DAGScheduler: failed: Set()
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting ResultStage 7 (MapPartitionsRDD[20] at filter at HoodieSparkSqlWriter.scala:470), which has no missing parents
21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_13 stored as values in memory (estimated size 325.0 KB, free 364.3 MB)
21/06/30 13:54:51 INFO memory.MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 118.5 KB, free 364.2 MB)
21/06/30 13:54:51 INFO storage.BlockManagerInfo: Added broadcast_13_piece0 in memory on hdp-jk-1:36125 (size: 118.5 KB, free: 365.9 MB)
21/06/30 13:54:51 INFO spark.SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:51 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (MapPartitionsRDD[20] at filter at HoodieSparkSqlWriter.scala:470) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:51 INFO scheduler.TaskSchedulerImpl: Adding task set 7.0 with 2 tasks
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 7.0 (TID 11, localhost, executor driver, partition 0, ANY, 7662 bytes)
21/06/30 13:54:51 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 7.0 (TID 12, localhost, executor driver, partition 1, ANY, 7662 bytes)
21/06/30 13:54:51 INFO executor.Executor: Running task 0.0 in stage 7.0 (TID 11)
21/06/30 13:54:51 INFO executor.Executor: Running task 1.0 in stage 7.0 (TID 12)
21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks
21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks
21/06/30 13:54:51 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: starting to buffer records
21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: starting to buffer records
21/06/30 13:54:51 INFO queue.BoundedInMemoryExecutor: starting consumer thread
21/06/30 13:54:51 INFO queue.BoundedInMemoryExecutor: starting consumer thread
21/06/30 13:54:51 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: finished buffering records
21/06/30 13:54:51 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:51 INFO queue.IteratorBasedQueueProducer: finished buffering records
21/06/30 13:54:52 INFO table.MarkerFiles: Creating Marker Path=/user/hive/warehouse/test_spark_mor10/.hoodie/.temp/20210630135447/2021/06/30/49a56794-40a5-45c2-bd4a-08a566590703-0_1-7-12_20210630135447.parquet.marker.CREATE
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO table.MarkerFiles: Creating Marker Path=/user/hive/warehouse/test_spark_mor10/.hoodie/.temp/20210630135447/2021/06/29/20421052-ee81-4427-ab8f-464d81b40b8f-0_0-7-11_20210630135447.parquet.marker.CREATE
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO compress.CodecPool: Got brand-new compressor [.gz]
21/06/30 13:54:52 INFO compress.CodecPool: Got brand-new compressor [.gz]
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 99
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 102
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 85
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 155
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 128
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 93
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 111
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 94
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 101
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 105
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 140
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 83
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 131
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 72
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 84
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 59
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 123
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned shuffle 0
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 134
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 115
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 97
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 139
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_11_piece0 on hdp-jk-1:36125 in memory (size: 82.6 KB, free: 366.0 MB)
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 67
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 124
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 138
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 103
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 81
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 82
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 95
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 79
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 156
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 89
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 98
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 146
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 77
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 57
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 96
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 141
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 100
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 86
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on hdp-jk-1:36125 in memory (size: 12.8 KB, free: 366.0 MB)
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 117
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 147
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 150
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 66
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 114
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 65
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 116
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 109
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 113
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 60
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 62
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 78
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 143
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 104
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 135
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 112
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 58
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 63
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_10_piece0 on hdp-jk-1:36125 in memory (size: 2.1 KB, free: 366.0 MB)
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 142
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 76
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 107
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 121
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 71
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 129
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 137
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 91
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 125
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 74
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 148
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 133
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 144
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 73
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 64
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 88
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 120
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 149
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 122
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 92
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 87
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 126
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 154
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 106
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 136
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 152
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 61
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 108
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 118
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_12_piece0 on hdp-jk-1:36125 in memory (size: 92.7 KB, free: 366.1 MB)
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 145
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 151
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 70
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 90
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 119
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 110
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 127
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 80
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on hdp-jk-1:36125 in memory (size: 11.4 KB, free: 366.1 MB)
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: ], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:52 INFO io.HoodieCreateHandle: New CreateHandle for partition :2021/06/30 with fileId 49a56794-40a5-45c2-bd4a-08a566590703-0
21/06/30 13:54:52 INFO io.HoodieCreateHandle: New CreateHandle for partition :2021/06/29 with fileId 20421052-ee81-4427-ab8f-464d81b40b8f-0
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 68
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 132
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 69
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 153
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 130
21/06/30 13:54:52 INFO spark.ContextCleaner: Cleaned accumulator 75
21/06/30 13:54:52 INFO io.HoodieCreateHandle: Closing the file 20421052-ee81-4427-ab8f-464d81b40b8f-0 as we are done with all the records 1
21/06/30 13:54:52 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 240
21/06/30 13:54:52 INFO io.HoodieCreateHandle: Closing the file 49a56794-40a5-45c2-bd4a-08a566590703-0 as we are done with all the records 3
21/06/30 13:54:52 INFO hadoop.InternalParquetRecordWriter: Flushing mem columnStore to file. allocated memory: 461
21/06/30 13:54:52 INFO io.HoodieCreateHandle: CreateHandle for partitionPath 2021/06/30 fileID 49a56794-40a5-45c2-bd4a-08a566590703-0, took 1012 ms.
21/06/30 13:54:52 INFO queue.BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads
21/06/30 13:54:52 INFO memory.MemoryStore: Block rdd_19_1 stored as values in memory (estimated size 301.0 B, free 365.0 MB)
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Added rdd_19_1 in memory on hdp-jk-1:36125 (size: 301.0 B, free: 366.1 MB)
21/06/30 13:54:52 INFO io.HoodieCreateHandle: CreateHandle for partitionPath 2021/06/29 fileID 20421052-ee81-4427-ab8f-464d81b40b8f-0, took 1044 ms.
21/06/30 13:54:52 INFO queue.BoundedInMemoryExecutor: Queue Consumption is done; notifying producer threads
21/06/30 13:54:52 INFO memory.MemoryStore: Block rdd_19_0 stored as values in memory (estimated size 301.0 B, free 365.0 MB)
21/06/30 13:54:52 INFO storage.BlockManagerInfo: Added rdd_19_0 in memory on hdp-jk-1:36125 (size: 301.0 B, free: 366.1 MB)
21/06/30 13:54:52 INFO executor.Executor: Finished task 1.0 in stage 7.0 (TID 12). 1428 bytes result sent to driver
21/06/30 13:54:52 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 7.0 (TID 12) in 1256 ms on localhost (executor driver) (1/2)
21/06/30 13:54:52 INFO executor.Executor: Finished task 0.0 in stage 7.0 (TID 11). 1428 bytes result sent to driver
21/06/30 13:54:52 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 7.0 (TID 11) in 1261 ms on localhost (executor driver) (2/2)
21/06/30 13:54:52 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool
21/06/30 13:54:52 INFO scheduler.DAGScheduler: ResultStage 7 (count at HoodieSparkSqlWriter.scala:470) finished in 1.351 s
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Job 5 finished: count at HoodieSparkSqlWriter.scala:470, took 1.556432 s
21/06/30 13:54:52 INFO hudi.HoodieSparkSqlWriter$: No errors. Proceeding to commit the write.
21/06/30 13:54:52 INFO spark.SparkContext: Starting job: collect at SparkRDDWriteClient.java:120
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Got job 6 (collect at SparkRDDWriteClient.java:120) with 2 output partitions
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Final stage: ResultStage 9 (collect at SparkRDDWriteClient.java:120)
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 8)
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:52 INFO scheduler.DAGScheduler: Submitting ResultStage 9 (MapPartitionsRDD[21] at map at SparkRDDWriteClient.java:120), which has no missing parents
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_14 stored as values in memory (estimated size 325.5 KB, free 364.6 MB)
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 118.8 KB, free 364.5 MB)
21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_14_piece0 in memory on hdp-jk-1:36125 (size: 118.8 KB, free: 366.0 MB)
21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 9 (MapPartitionsRDD[21] at map at SparkRDDWriteClient.java:120) (first 15 tasks are for partitions Vector(0, 1))
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 9.0 with 2 tasks
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 9.0 (TID 13, localhost, executor driver, partition 0, PROCESS_LOCAL, 7662 bytes)
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 9.0 (TID 14, localhost, executor driver, partition 1, PROCESS_LOCAL, 7662 bytes)
21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 9.0 (TID 13)
21/06/30 13:54:53 INFO executor.Executor: Running task 1.0 in stage 9.0 (TID 14)
21/06/30 13:54:53 INFO storage.BlockManager: Found block rdd_19_1 locally
21/06/30 13:54:53 INFO storage.BlockManager: Found block rdd_19_0 locally
21/06/30 13:54:53 INFO executor.Executor: Finished task 1.0 in stage 9.0 (TID 14). 1486 bytes result sent to driver
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 9.0 (TID 14) in 125 ms on localhost (executor driver) (1/2)
21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 9.0 (TID 13). 1486 bytes result sent to driver
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 9.0 (TID 13) in 130 ms on localhost (executor driver) (2/2)
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 9 (collect at SparkRDDWriteClient.java:120) finished in 0.219 s
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 6 finished: collect at SparkRDDWriteClient.java:120, took 0.225906 s
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:53 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__INFLIGHT]]
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:53 INFO util.CommitUtils: Creating  metadata for INSERT numWriteStats:2numReplaceFileIds:0
21/06/30 13:54:53 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Got job 7 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 10 (collect at HoodieSparkEngineContext.java:78)
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting ResultStage 10 (MapPartitionsRDD[23] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_15 stored as values in memory (estimated size 72.1 KB, free 364.4 MB)
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 26.2 KB, free 364.4 MB)
21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_15_piece0 in memory on hdp-jk-1:36125 (size: 26.2 KB, free: 366.0 MB)
21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 10 (MapPartitionsRDD[23] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 10.0 with 1 tasks
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 10.0 (TID 15, localhost, executor driver, partition 0, PROCESS_LOCAL, 7816 bytes)
21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 10.0 (TID 15)
21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 10.0 (TID 15). 833 bytes result sent to driver
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 10.0 (TID 15) in 40 ms on localhost (executor driver) (1/1)
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks have all completed, from pool
21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 10 (collect at HoodieSparkEngineContext.java:78) finished in 0.071 s
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 7 finished: collect at HoodieSparkEngineContext.java:78, took 0.075028 s
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:53 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__INFLIGHT]]
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:53 INFO client.AbstractHoodieWriteClient: Committing 20210630135447 action commit
21/06/30 13:54:53 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Got job 8 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 11 (collect at HoodieSparkEngineContext.java:78)
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting ResultStage 11 (MapPartitionsRDD[25] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_16 stored as values in memory (estimated size 72.1 KB, free 364.4 MB)
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 26.2 KB, free 364.3 MB)
21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on hdp-jk-1:36125 (size: 26.2 KB, free: 365.9 MB)
21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 11 (MapPartitionsRDD[25] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 11.0 with 1 tasks
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 11.0 (TID 16, localhost, executor driver, partition 0, PROCESS_LOCAL, 7816 bytes)
21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 11.0 (TID 16)
21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 11.0 (TID 16). 876 bytes result sent to driver
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 11.0 (TID 16) in 36 ms on localhost (executor driver) (1/1)
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks have all completed, from pool
21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 11 (collect at HoodieSparkEngineContext.java:78) finished in 0.067 s
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 8 finished: collect at HoodieSparkEngineContext.java:78, took 0.073743 s
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Marking instant complete [==>20210630135447__commit__INFLIGHT]
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.inflight
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hive/warehouse/test_spark_mor10/.hoodie/20210630135447.commit
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Completed [==>20210630135447__commit__INFLIGHT]
21/06/30 13:54:53 INFO spark.SparkContext: Starting job: foreach at HoodieSparkEngineContext.java:83
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Got job 9 (foreach at HoodieSparkEngineContext.java:83) with 1 output partitions
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Final stage: ResultStage 12 (foreach at HoodieSparkEngineContext.java:83)
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting ResultStage 12 (ParallelCollectionRDD[26] at parallelize at HoodieSparkEngineContext.java:83), which has no missing parents
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_17 stored as values in memory (estimated size 71.2 KB, free 364.3 MB)
21/06/30 13:54:53 INFO memory.MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 25.7 KB, free 364.2 MB)
21/06/30 13:54:53 INFO storage.BlockManagerInfo: Added broadcast_17_piece0 in memory on hdp-jk-1:36125 (size: 25.7 KB, free: 365.9 MB)
21/06/30 13:54:53 INFO spark.SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 12 (ParallelCollectionRDD[26] at parallelize at HoodieSparkEngineContext.java:83) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Adding task set 12.0 with 1 tasks
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 12.0 (TID 17, localhost, executor driver, partition 0, PROCESS_LOCAL, 7816 bytes)
21/06/30 13:54:53 INFO executor.Executor: Running task 0.0 in stage 12.0 (TID 17)
21/06/30 13:54:53 INFO executor.Executor: Finished task 0.0 in stage 12.0 (TID 17). 666 bytes result sent to driver
21/06/30 13:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 12.0 (TID 17) in 37 ms on localhost (executor driver) (1/1)
21/06/30 13:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool
21/06/30 13:54:53 INFO scheduler.DAGScheduler: ResultStage 12 (foreach at HoodieSparkEngineContext.java:83) finished in 0.066 s
21/06/30 13:54:53 INFO scheduler.DAGScheduler: Job 9 finished: foreach at HoodieSparkEngineContext.java:83, took 0.071339 s
21/06/30 13:54:53 INFO table.MarkerFiles: Removing marker directory at /user/hive/warehouse/test_spark_mor10/.hoodie/.temp/20210630135447
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210630135447__commit__REQUESTED], [==>20210630135447__commit__INFLIGHT], [20210630135447__commit__COMPLETED]]
21/06/30 13:54:53 INFO table.HoodieTimelineArchiveLog: No Instants to archive
21/06/30 13:54:53 INFO client.AbstractHoodieWriteClient: Auto cleaning is enabled. Running cleaner now
21/06/30 13:54:53 INFO client.AbstractHoodieWriteClient: Scheduling cleaning at instant time :20210630135453
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:53 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]]
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating remote view for basePath /user/hive/warehouse/test_spark_mor10. Server=hdp-jk-1:35915, Timeout=300
21/06/30 13:54:53 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:53 INFO view.AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:53 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:53 INFO view.RemoteHoodieTableFileSystemView: Sending request : (http://hdp-jk-1:35915/v1/hoodie/view/compactions/pending/?basepath=%2Fuser%2Fhive%2Fwarehouse%2Ftest_spark_mor10&lastinstantts=20210630135447&timelinehash=09ac7cb732068e0c3926289658471433dcd40aa40b28b7eee22c414b299f5bf3)
21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:54 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]]
21/06/30 13:54:54 INFO view.AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:54 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:54 INFO service.RequestHandler: TimeTakenMillis[Total=92, Refresh=81, handle=10, Check=1], Success=true, Query=basepath=%2Fuser%2Fhive%2Fwarehouse%2Ftest_spark_mor10&lastinstantts=20210630135447&timelinehash=09ac7cb732068e0c3926289658471433dcd40aa40b28b7eee22c414b299f5bf3, Host=hdp-jk-1:35915, synced=false
21/06/30 13:54:54 INFO clean.CleanPlanner: No earliest commit to retain. No need to scan partitions !!
21/06/30 13:54:54 INFO clean.CleanPlanner: Nothing to clean here. It is already clean
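After every successful commit Hudi's auto-clean runs, but with only a single commit on the timeline there is nothing old enough to remove, hence "Nothing to clean here". How many historical file versions survive cleaning is configurable; as a hedged example (the key below is a standard Hudi option, the value is purely illustrative), you could extend the hudi_options dict from the script above:

# retain file versions for the last 10 commits before the cleaner deletes them
hudi_options['hoodie.cleaner.commits.retained'] = 10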
21/06/30 13:54:54 INFO client.AbstractHoodieWriteClient: Cleaner started
21/06/30 13:54:54 INFO client.AbstractHoodieWriteClient: Cleaned failed attempts if any
21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:54 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hive/warehouse/test_spark_mor10
21/06/30 13:54:54 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]]
21/06/30 13:54:54 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/06/30 13:54:54 INFO view.FileSystemViewManager: Creating remote first table view
21/06/30 13:54:54 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/06/30 13:54:54 INFO client.AbstractHoodieWriteClient: Committed 20210630135447
21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Commit 20210630135447 successful!
21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Config.inlineCompactionEnabled ? false
21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Compaction Scheduled is Optional.empty
21/06/30 13:54:54 INFO hudi.HoodieSparkSqlWriter$: Is Async Compaction Enabled ? false
21/06/30 13:54:54 INFO client.AbstractHoodieClient: Stopping Timeline service !!
21/06/30 13:54:54 INFO embedded.EmbeddedTimelineService: Closing Timeline server
21/06/30 13:54:54 INFO service.TimelineService: Closing Timeline Service
21/06/30 13:54:54 INFO javalin.Javalin: Stopping Javalin ...
21/06/30 13:54:54 INFO javalin.Javalin: Javalin has stopped
21/06/30 13:54:54 INFO view.AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:54 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:54 INFO service.TimelineService: Closed Timeline Service
21/06/30 13:54:54 INFO embedded.EmbeddedTimelineService: Closed Timeline server
21/06/30 13:54:54 INFO rdd.MapPartitionsRDD: Removing RDD 19 from persistence list
21/06/30 13:54:54 INFO storage.BlockManager: Removing RDD 19
21/06/30 13:54:54 INFO rdd.MapPartitionsRDD: Removing RDD 9 from persistence list
21/06/30 13:54:54 INFO storage.BlockManager: Removing RDD 9
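The RDD unpersist messages above mark the end of the write phase; the script now re-reads the table it just wrote. Besides the snapshot query used here, Hudi also supports incremental queries that return only records committed after a given instant. A rough sketch, assuming the same spark session and savePath (the option keys are standard Hudi datasource options; the begin instant is chosen just below the commit time seen in these logs):

incr_options = {
    'hoodie.datasource.query.type': 'incremental',
    'hoodie.datasource.read.begin.instanttime': '20210630135446',  # exclusive lower bound
}
incr_df = spark.read.format("hudi").options(**incr_options).load(savePath)
incr_df.show()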
21/06/30 13:54:55 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:55 INFO hudi.DataSourceUtils: Getting table path..
21/06/30 13:54:55 INFO util.TablePathUtils: Getting table path from path : hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/.hoodie/.aux/.bootstrap/.fileids
21/06/30 13:54:55 INFO hudi.DefaultSource: Obtained hudi table path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:55 INFO table.HoodieTableConfig: Loading table properties from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO hudi.DefaultSource: Is bootstrapped table => false
21/06/30 13:54:55 WARN hudi.DefaultSource: Loading Base File Only View.
21/06/30 13:54:55 INFO hudi.DefaultSource: Constructing hoodie (as parquet) data source with options :Map(path -> /user/hive/warehouse/test_spark_mor10/*/*/*/*, hoodie.datasource.query.type -> snapshot)
21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://hdp-jk-1:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml, file:/usr/app/apache-hive-2.3.8-bin/conf/hive-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_1073050767_18, ugi=root (auth:SIMPLE)]]]
21/06/30 13:54:55 INFO table.HoodieTableConfig: Loading table properties from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/.hoodie/hoodie.properties
21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO table.HoodieTableMetaClient: Loading Active commit timeline for hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210630135447__commit__COMPLETED]]
21/06/30 13:54:55 INFO view.FileSystemViewManager: Creating InMemory based view for basePath hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Building file system view for partition (2021/06/29)
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: #files found in partition (2021/06/29) =2, Time taken =2
21/06/30 13:54:55 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :2021/06/29, #FileGroups=1
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=2, NumFileGroups=1, FileGroupsCreationTime=0, StoreTimeTaken=1
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Time to load partition (2021/06/29) =4
21/06/30 13:54:55 INFO hadoop.HoodieROTablePathFilter: Based on hoodie metadata from base path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10, caching 1 files under hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/29
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 1 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:55 INFO view.FileSystemViewManager: Creating InMemory based view for basePath hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Building file system view for partition (2021/06/30)
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: #files found in partition (2021/06/30) =2, Time taken =1
21/06/30 13:54:55 INFO view.HoodieTableFileSystemView: Adding file-groups for partition :2021/06/30, #FileGroups=1
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: addFilesToView: NumFiles=2, NumFileGroups=1, FileGroupsCreationTime=1, StoreTimeTaken=1
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Time to load partition (2021/06/30) =4
21/06/30 13:54:55 INFO hadoop.HoodieROTablePathFilter: Based on hoodie metadata from base path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10, caching 1 files under hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/30
21/06/30 13:54:55 INFO view.AbstractTableFileSystemView: Took 0 ms to read  0 instants, 0 replaced file groups
21/06/30 13:54:55 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/06/30 13:54:55 INFO datasources.InMemoryFileIndex: It took 93 ms to list leaf files for 6 paths.
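The four-level glob in the load path (/*/*/*/*) exists because the create_date partition layout is yyyy/MM/dd (three directory levels) plus the data file itself, which is why the file system view above builds the partitions 2021/06/29 and 2021/06/30. To read a single partition you can simply narrow the glob; a small sketch with the same spark session and savePath:

# snapshot-read only the 2021/06/30 partition
one_day_df = spark.read.format("hudi").load(savePath + "/2021/06/30/*")
one_day_df.createOrReplaceTempView("hudi_one_day")
spark.sql("select id, number, create_date from hudi_one_day").show()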
21/06/30 13:54:55 INFO spark.SparkContext: Starting job: resolveRelation at DefaultSource.scala:193
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Got job 10 (resolveRelation at DefaultSource.scala:193) with 1 output partitions
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Final stage: ResultStage 13 (resolveRelation at DefaultSource.scala:193)
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Submitting ResultStage 13 (MapPartitionsRDD[30] at resolveRelation at DefaultSource.scala:193), which has no missing parents
21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_18 stored as values in memory (estimated size 73.5 KB, free 364.2 MB)
21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_18_piece0 stored as bytes in memory (estimated size 26.7 KB, free 364.1 MB)
21/06/30 13:54:55 INFO storage.BlockManagerInfo: Added broadcast_18_piece0 in memory on hdp-jk-1:36125 (size: 26.7 KB, free: 365.9 MB)
21/06/30 13:54:55 INFO spark.SparkContext: Created broadcast 18 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 13 (MapPartitionsRDD[30] at resolveRelation at DefaultSource.scala:193) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:55 INFO scheduler.TaskSchedulerImpl: Adding task set 13.0 with 1 tasks
21/06/30 13:54:55 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 13.0 (TID 18, localhost, executor driver, partition 0, PROCESS_LOCAL, 7869 bytes)
21/06/30 13:54:55 INFO executor.Executor: Running task 0.0 in stage 13.0 (TID 18)
21/06/30 13:54:55 INFO executor.Executor: Finished task 0.0 in stage 13.0 (TID 18). 1308 bytes result sent to driver
21/06/30 13:54:55 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 13.0 (TID 18) in 197 ms on localhost (executor driver) (1/1)
21/06/30 13:54:55 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool
21/06/30 13:54:55 INFO scheduler.DAGScheduler: ResultStage 13 (resolveRelation at DefaultSource.scala:193) finished in 0.227 s
21/06/30 13:54:55 INFO scheduler.DAGScheduler: Job 10 finished: resolveRelation at DefaultSource.scala:193, took 0.230370 s
New data:
21/06/30 13:54:55 INFO datasources.FileSourceStrategy: Pruning directories with:
21/06/30 13:54:55 INFO datasources.FileSourceStrategy: Post-Scan Filters:
21/06/30 13:54:55 INFO datasources.FileSourceStrategy: Output Data Schema: struct<_hoodie_commit_time: string, _hoodie_commit_seqno: string, _hoodie_record_key: string, _hoodie_partition_path: string, _hoodie_file_name: string ... 9 more fields>
21/06/30 13:54:55 INFO execution.FileSourceScanExec: Pushed Filters:
21/06/30 13:54:55 INFO codegen.CodeGenerator: Code generated in 63.038365 ms
21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_19 stored as values in memory (estimated size 292.0 KB, free 363.8 MB)
21/06/30 13:54:55 INFO memory.MemoryStore: Block broadcast_19_piece0 stored as bytes in memory (estimated size 25.9 KB, free 363.8 MB)
21/06/30 13:54:55 INFO storage.BlockManagerInfo: Added broadcast_19_piece0 in memory on hdp-jk-1:36125 (size: 25.9 KB, free: 365.9 MB)
21/06/30 13:54:55 INFO spark.SparkContext: Created broadcast 19 from showString at NativeMethodAccessorImpl.java:0
21/06/30 13:54:56 INFO execution.FileSourceScanExec: Planning scan with bin packing, max size: 4629786 bytes, open cost is considered as scanning 4194304 bytes.
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 205
21/06/30 13:54:56 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 238
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Got job 11 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Final stage: ResultStage 14 (showString at NativeMethodAccessorImpl.java:0)
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 256
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 305
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 266
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 167
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 298
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 264
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting ResultStage 14 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_15_piece0 on hdp-jk-1:36125 in memory (size: 26.2 KB, free: 365.9 MB)
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 182
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 165
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 302
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 301
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 172
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 299
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_13_piece0 on hdp-jk-1:36125 in memory (size: 118.5 KB, free: 366.0 MB)
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 204
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 232
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 161
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_17_piece0 on hdp-jk-1:36125 in memory (size: 25.7 KB, free: 366.0 MB)
... (a long run of repetitive spark.ContextCleaner "Cleaned accumulator" INFO lines omitted for brevity)
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned shuffle 1
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 222
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 194
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 311
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 278
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 276
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 328
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 215
21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_20 stored as values in memory (estimated size 14.6 KB, free 364.5 MB)
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_16_piece0 on hdp-jk-1:36125 in memory (size: 26.2 KB, free: 366.1 MB)
21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_20_piece0 stored as bytes in memory (estimated size 5.7 KB, free 364.5 MB)
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Added broadcast_20_piece0 in memory on hdp-jk-1:36125 (size: 5.7 KB, free: 366.1 MB)
... (a long run of repetitive spark.ContextCleaner "Cleaned accumulator" INFO lines omitted for brevity)
21/06/30 13:54:56 INFO spark.SparkContext: Created broadcast 20 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 233
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 257
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 259
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned accumulator 293
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 14 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Adding task set 14.0 with 1 tasks
21/06/30 13:54:56 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 14.0 (TID 19, localhost, executor driver, partition 0, ANY, 8353 bytes)
21/06/30 13:54:56 INFO executor.Executor: Running task 0.0 in stage 14.0 (TID 19)
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_14_piece0 on hdp-jk-1:36125 in memory (size: 118.8 KB, free: 366.2 MB)
21/06/30 13:54:56 INFO datasources.FileScanRDD: Reading File path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/30/49a56794-40a5-45c2-bd4a-08a566590703-0_1-7-12_20210630135447.parquet, range: 0-435537, partition values: [empty row]
... (a long run of repetitive spark.ContextCleaner "Cleaned accumulator" INFO lines omitted for brevity)
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Removed broadcast_18_piece0 on hdp-jk-1:36125 in memory (size: 26.7 KB, free: 366.2 MB)
21/06/30 13:54:56 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/30 13:54:56 INFO executor.Executor: Finished task 0.0 in stage 14.0 (TID 19). 1555 bytes result sent to driver
21/06/30 13:54:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 14.0 (TID 19) in 63 ms on localhost (executor driver) (1/1)
21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks have all completed, from pool
21/06/30 13:54:56 INFO scheduler.DAGScheduler: ResultStage 14 (showString at NativeMethodAccessorImpl.java:0) finished in 0.091 s
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Job 11 finished: showString at NativeMethodAccessorImpl.java:0, took 0.130874 s
... (a long run of repetitive spark.ContextCleaner "Cleaned accumulator" INFO lines omitted for brevity)
21/06/30 13:54:56 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Got job 12 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Final stage: ResultStage 15 (showString at NativeMethodAccessorImpl.java:0)
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Missing parents: List()
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting ResultStage 15 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_21 stored as values in memory (estimated size 14.6 KB, free 365.0 MB)
21/06/30 13:54:56 INFO memory.MemoryStore: Block broadcast_21_piece0 stored as bytes in memory (estimated size 5.7 KB, free 365.0 MB)
21/06/30 13:54:56 INFO storage.BlockManagerInfo: Added broadcast_21_piece0 in memory on hdp-jk-1:36125 (size: 5.7 KB, free: 366.2 MB)
21/06/30 13:54:56 INFO spark.SparkContext: Created broadcast 21 from broadcast at DAGScheduler.scala:1184
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 15 (MapPartitionsRDD[34] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(1))
21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Adding task set 15.0 with 1 tasks
21/06/30 13:54:56 INFO storage.BlockManager: Removing RDD 19
21/06/30 13:54:56 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 15.0 (TID 20, localhost, executor driver, partition 1, ANY, 8353 bytes)
21/06/30 13:54:56 INFO executor.Executor: Running task 0.0 in stage 15.0 (TID 20)
21/06/30 13:54:56 INFO datasources.FileScanRDD: Reading File path: hdfs://hdp-jk-1:8020/user/hive/warehouse/test_spark_mor10/2021/06/29/20421052-ee81-4427-ab8f-464d81b40b8f-0_0-7-11_20210630135447.parquet, range: 0-435428, partition values: [empty row]
21/06/30 13:54:56 INFO spark.ContextCleaner: Cleaned RDD 19
... (a long run of repetitive spark.ContextCleaner "Cleaned accumulator" INFO lines omitted for brevity)
21/06/30 13:54:56 INFO compress.CodecPool: Got brand-new decompressor [.gz]
21/06/30 13:54:56 INFO executor.Executor: Finished task 0.0 in stage 15.0 (TID 20). 1476 bytes result sent to driver
21/06/30 13:54:56 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 15.0 (TID 20) in 66 ms on localhost (executor driver) (1/1)
21/06/30 13:54:56 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 15.0, whose tasks have all completed, from pool
21/06/30 13:54:56 INFO scheduler.DAGScheduler: ResultStage 15 (showString at NativeMethodAccessorImpl.java:0) finished in 0.082 s
21/06/30 13:54:56 INFO scheduler.DAGScheduler: Job 12 finished: showString at NativeMethodAccessorImpl.java:0, took 0.086565 s
+-------------------+--------------------+------------------+----------------------+--------------------+---+--------+------+-------------+-----------+-------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|category|number|  create_time|create_date|  update_time|
+-------------------+--------------------+------------------+----------------------+--------------------+---+--------+------+-------------+-----------+-------------+
|     20210630135447|  20210630135447_1_2|                31|            2021/06/30|49a56794-40a5-45c...| 31|       E|   0.5|1624989817000| 2021/06/30|1624989817000|
|     20210630135447|  20210630135447_1_3|                13|            2021/06/30|49a56794-40a5-45c...| 13|       E|   0.5|1624987648000| 2021/06/30|1624987648000|
|     20210630135447|  20210630135447_1_4|                22|            2021/06/30|49a56794-40a5-45c...| 22|       E|   0.5|1624988926000| 2021/06/30|1624988926000|
|     20210630135447|  20210630135447_0_1|                 6|            2021/06/29|20421052-ee81-442...|  6|       E|   0.5|1624976243000| 2021/06/29|1624976243000|
+-------------------+--------------------+------------------+----------------------+--------------------+---+--------+------+-------------+-----------+-------------+
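The first five columns are Hudi's metadata columns: _hoodie_commit_time (the instant that wrote the row), _hoodie_commit_seqno, _hoodie_record_key (populated from id per the write options), _hoodie_partition_path (from create_date) and _hoodie_file_name; the remaining columns are the original business fields. The metadata columns can be filtered like any other column, e.g. to pull only the rows written by this commit:

spark.sql(
    "select _hoodie_record_key, id, category, number "
    "from hudi_trips_snapshot where _hoodie_commit_time = '20210630135447'"
).show()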

21/06/30 13:54:56 INFO spark.SparkContext: Invoking stop() from shutdown hook
21/06/30 13:54:56 INFO server.AbstractConnector: Stopped Spark@6260df1d{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
21/06/30 13:54:56 INFO ui.SparkUI: Stopped Spark web UI at http://hdp-jk-1:4041
21/06/30 13:54:56 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/06/30 13:54:56 INFO memory.MemoryStore: MemoryStore cleared
21/06/30 13:54:56 INFO storage.BlockManager: BlockManager stopped
21/06/30 13:54:56 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/06/30 13:54:56 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/06/30 13:54:56 INFO spark.SparkContext: Successfully stopped SparkContext
21/06/30 13:54:56 INFO util.ShutdownHookManager: Shutdown hook called
21/06/30 13:54:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-8f70e7ed-c33c-460b-ace9-787227e65264
21/06/30 13:54:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ea529532-e2c3-44a4-b5ad-4105a92b738c/pyspark-cb620c5f-4452-4845-8ac0-a25b0dc606fa
21/06/30 13:54:56 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ea529532-e2c3-44a4-b5ad-4105a92b738c
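That completes one insert-and-read round trip. The run above used the insert operation; to modify existing rows on a later run, a plausible variant (not shown in the original script) is to switch the write operation to upsert, in which case Hudi deduplicates by the record key (id) and keeps the row with the larger value of the configured precombine field:

hudi_options['hoodie.datasource.write.operation'] = 'upsert'
df.write.format("hudi").options(**hudi_options).mode("append").save(savePath)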

Source: https://blog.csdn.net/qq_33375598/article/details/118539790
