SparkContext

2. Spark Core Responsibilities: Initialization (1): SparkContext (2020-03-24 13:03:37)

SparkContext (the Spark context): /** Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster. @note Only one `SparkConte...
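Differences Between Spark's yarn-cluster and yarn-client Modes (2020-02-26 21:59:08)

1. Differences between Spark's yarn-cluster and yarn-client modes: Driver: the process that runs the main() function and creates the SparkContext. The SparkContext communicates with the cluster manager to request resources and to assign and monitor tasks. When the program finishes, the SparkContext is closed. Before digging into the deeper differences between YARN-Client and YARN-Cluster, first get one concept clear: Applica...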
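[Spark] Summary of Common Spark Methods 1: Creating the Programming Entry Point (Python Version) (2020-01-29 19:09:00)

Preface: I have some time today, so I am sharing my notes for colleagues to consult. They are being expanded and updated continually. For convenience, the examples are all written in Python; versions in other languages will follow. Creating the programming entry point. The SparkContext entry point: from pyspark import SparkConf, SparkContext if __name__ == '__main__': conf = SparkConf().setApp...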
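[Repost] Spark Design Philosophy and Basic Architecture (2019-12-01 15:02:56)

Spark Design Philosophy and Basic Architecture https://www.cnblogs.com/swordfall/p/9280006.html 1. Basic concepts. Some concepts in Spark: RDD (resilient distributed dataset): a fault-tolerant dataset distributed across the cluster. Partition: a data partition, i.e. how many partitions an RDD's data is split into. NarrowDependency: a narrow dependency, i.e. the child RDD depends on...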
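A First Look at Spark (2019-11-25 11:03:52)

Spark (1): Basic Architecture and Principles. Apache Spark is a big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 at UC Berkeley's AMPLab and was open-sourced in 2010 (it was donated to the Apache Software Foundation in 2013). Compared with other big data and MapReduce technologies such as Hadoop and Storm, Spark has the following advantages: Spark provides a comprehensive, ...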
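Spark (1): Basic Architecture and Principles (2019-10-01 23:56:01)

Apache Spark is a big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 at UC Berkeley's AMPLab and was open-sourced in 2010 (it was donated to the Apache Software Foundation in 2013). Compared with other big data and MapReduce technologies such as Hadoop and Storm, Spark has the following advantages: Spark provides a comprehensive, unified framework for managing various ... with different...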
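Basic Spark Execution Flow (2019-09-16 10:06:40)

1) Build the SparkContext. 2) Request from the resource manager the executor resources needed for this Spark run, and launch the executors distributed across the nodes. 3) The SparkContext breaks the job down into tasks, generates task sets (TaskSet), and hands them to the task scheduler (Task Scheduler). 4) Executors request tasks from the task scheduler; the task scheduler assigns tasks to the executors and spa...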
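Spark Learning Examples (Python): RDD Transformations (2019-08-20 17:06:23)

An RDD is a resilient distributed dataset, a special kind of collection that can be cached and supports parallel operations. An RDD represents a dataset divided into partitions. The transformations are: map(func) filter(func) flatMap(func) mapPartitions(func) sample(withReplacement, fraction, seed) union(otherDataset) intersection(otherDataset) distinct([numPa...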
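Spark Source Code: SparkContext (2019-07-18 18:56:07)

SparkContext can be described as the engine of a Spark application; initialization of the Spark driver revolves around initialization of the SparkContext. SparkContext overview. Main components of SparkContext: SparkEnv: the Spark runtime environment. The Executor is the component that runs tasks, and it depends on the SparkEnv. The driver also contains a SparkEnv, so that tasks can run in local mode. In addition, Spa...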
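Spark Runtime Structure, Concise Version (2019-07-16 09:04:03)

mapPartitions: processes the contents of each partition as a whole. mapPartitionsWithIndex: works like mapPartitions, except that the function takes two parameters, the first of which is the partition index. Before calling mapPartitions, set the partitioning first with repartition. partition: the number of partitions, 1 by default; it can be set in local[] or when calling parallelize. TaskSetM...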
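The difference between the two operations can be sketched without a cluster. The following is a plain-Python model, not Spark code: the partitions are ordinary lists, and `split_into_partitions`, `map_partitions`, and `map_partitions_with_index` are hypothetical helpers written only for this illustration. It assumes, as the snippet describes, that the function receives a whole partition as an iterator, and that the indexed variant also receives the partition number as its first argument.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def split_into_partitions(data: List[T], num_partitions: int) -> List[List[T]]:
    """Hypothetical helper: cut a list into contiguous chunks, one per partition."""
    n = len(data)
    return [
        data[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


def map_partitions(
    partitions: List[List[T]],
    func: Callable[[Iterator[T]], Iterable[U]],
) -> List[List[U]]:
    """Call func once per partition, passing the whole partition as an iterator."""
    return [list(func(iter(part))) for part in partitions]


def map_partitions_with_index(
    partitions: List[List[T]],
    func: Callable[[int, Iterator[T]], Iterable[U]],
) -> List[List[U]]:
    """Same as map_partitions, but func also receives the partition index first."""
    return [list(func(index, iter(part))) for index, part in enumerate(partitions)]


parts = split_into_partitions([1, 2, 3, 4, 5, 6], 3)

# One result per partition: the function sees all elements of a partition at once.
sums = map_partitions(parts, lambda it: [sum(it)])

# Tag every element with the index of the partition it lives in.
tagged = map_partitions_with_index(parts, lambda i, it: ((i, x) for x in it))

print(parts)
print(sums)
print(tagged)
```

Because the function runs once per partition rather than once per element, any per-call setup cost is paid once per partition, which is the usual reason to prefer this style over an element-wise map.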
8. Spark Core Advanced, Part 1 (2019-07-15 09:54:20)

(e.g. standalone manager, Mesos, YARN) In "cluster" mode, the framework launches the driver inside of the cluster. In "client" mode, the submitter launches the driver outside of the cluster.
SparkCore: Runtime Architecture (2019-07-01 14:01:58)

Cluster Mode Overview. This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved. Read through the application submission guide to learn about launc...
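Spark (51): Spark on YARN (yarn-cluster Mode) Startup Flow Source Analysis (2) (2019-06-25 22:48:46)

The previous article, "Spark (49): Spark on YARN Startup Flow Source Analysis (1)", covered SparkContext initialization and the ApplicationMaster's resource startup, but that coverage was clearly incomplete. This chapter gives a fuller code walkthrough for yarn-cluster mode (--master yarn --deploy-mode cluster): 1) when the SparkContext is initialized; 2) how Applicat...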
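An Introduction to Spark and Some Core Concepts (2019-06-22 16:47:53)

1. What Spark is. The official site's description: Apache Spark is a unified analytics engine for large-scale data processing. 2. Spark's features. 1. Speed: Apache Spark achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine; the official site advertises workloads running up to 100 times faster than on Hadoop. 2. Ease of use: Apache Spark programs can...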
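Spark Core API (2019-06-15 12:55:32)

1. SparkConf: the Spark configuration object; sets various parameters as key-value pairs. 2. SparkContext: the main entry point to Spark. It represents the connection to a Spark cluster and can create RDDs, accumulators, and broadcast variables. Only one SparkContext may exist per JVM; the existing one must be stopped before a new one is started. val rdd1 = sc.t...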
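Spark Study (7): Spark's Execution Flow (2019-06-08 22:40:15)

1. Basic concepts in Spark. 2. Spark's execution flow. 3. Spark's runtime architecture on different clusters: 3.1 Spark on Standalone execution flow; 3.2 Spark on YARN execution flow. Main text. Original article: https://www.cnblogs.com/qingyunzong/p/8945933.html 1. Basic concepts in Spark. Before analyzing how Spark operates, please see...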
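Spark 2.4.0 Source Code: SparkContext (2019-05-30 14:42:01)

Before looking at SparkContext, let us review some Scala syntax. Scala constructors are divided into the primary constructor and auxiliary constructors. Auxiliary constructors are defined with the keyword def plus this, while code in the class body that sits in neither a method body nor an auxiliary constructor makes up the primary constructor. The primary constructor always runs when an object is instantiated. Example: class person(name: String, age: Int){ println("主...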
2. Initializing Spark (2019-05-27 10:37:46)

References: RDD programming guide http://spark.apache.org/docs/latest/rdd-programming-guide.html SQL programming guide http://spark.apache.org/docs/latest/sql-programming-guide.html we highly recommend you to switch to use Dataset, which has...
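Spark Core Knowledge from the Official Site (2019-05-10 17:39:57)

The article below is an overview from the official Spark site of how Spark runs on clusters, along with explanations of some terms and a few architecture diagrams. Cluster Mode Overview. This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved. Read through the...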
Exception: User class threw exception: java.lang.IllegalStateException: Cannot call methods on a stopped Spa... (2019-05-09 19:44:10)

1. Details: User class threw exception: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext. This stopped SparkContext was created at: org.apache.spark.SparkContext.<init>(SparkContext.scala:76) com.wm.bigdata.spark.etl.RentO...
Spark3_SparkContext (2019-05-06 17:55:02)

SparkContext 1. Introduction 1.1. Tells Spark how to access a cluster. The run modes used during development include local, standalone, yarn, and mesos; once one is set, Spark knows which mode the job runs in. 1.2. Create a SparkConf (key-value pairs). SparkConf contains Applicat...
Spark Local Test Exception: System memory 259522560 must be at least 471859200. (2019-04-08 20:50:45)

Resolving the Spark local test exception "System memory 259522560 must be at least 471859200". 1. The exception. 2. Why the exception is thrown. 3. The fix. 1. The exception: java.lang.IllegalArgumentException: System memory 259522560 must be at least 471859200. Please increase heap size using the --...
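The two numbers in that message are easier to read in MiB. The sketch below is plain Python arithmetic, not Spark code. It assumes the threshold is a 300 MiB reserve multiplied by 1.5, which reproduces 471859200 exactly; the snippet above does not state this derivation, so treat it as an assumption. The names `RESERVED_SYSTEM_MEMORY_BYTES`, `MIN_SYSTEM_MEMORY_BYTES`, `to_mib`, and `check_system_memory` are illustrative only.

```python
MIB = 1024 * 1024

# Assumption: a 300 MiB reserve, with the minimum set to 1.5 times that reserve.
RESERVED_SYSTEM_MEMORY_BYTES = 300 * MIB
MIN_SYSTEM_MEMORY_BYTES = int(RESERVED_SYSTEM_MEMORY_BYTES * 1.5)


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return num_bytes / MIB


def check_system_memory(system_memory_bytes: int) -> None:
    """Raise an error shaped like the one in the message when the heap is too small."""
    if system_memory_bytes < MIN_SYSTEM_MEMORY_BYTES:
        raise ValueError(
            f"System memory {system_memory_bytes} must be at least "
            f"{MIN_SYSTEM_MEMORY_BYTES}."
        )


reported = 259522560  # value from the exception message
print(to_mib(reported))                 # heap available to the JVM, in MiB
print(to_mib(MIN_SYSTEM_MEMORY_BYTES))  # required minimum, in MiB

try:
    check_system_memory(reported)
except ValueError as err:
    print(err)
```

Under this reading, the local JVM had about 247.5 MiB of heap while at least 450 MiB was required, so the fix is to give the driver JVM a larger heap.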
Spark Programming in IntelliJ: WordCount (2019-03-25 09:47:34)

The project directory is as follows. Code: import org.apache.spark.{SparkConf, SparkContext} object WordCount{ def main(args:Array[String]) : Unit ={ val conf=new SparkConf().setAppName("word count").setMaster("local"); val sc=new SparkContext(conf);...
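SparkSQL: Using HiveContext (2019-01-25 09:00:53)

Using HiveContext. HiveContext is also deprecated and no longer recommended. Configuration: to access Hive tables from Spark, the following setup is needed: 1. Copy ${HIVE_HOME}/conf/hive-site.xml to ${SPARK_HOME}/conf. 2. Add the following dependency to pom.xml. Example code: package com.spark import org.apache.spar...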
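SparkSQL: Using SparkSession (2019-01-25 09:00:23)

In early versions of Spark, SparkContext was the main entry point. Because the RDD was the main API, we created and manipulated RDDs through SparkContext. Every other API needed its own context: for example, StreamingContext for Streaming, SQLContext for SQL, and HiveContext for Hive. But as D...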
