Tags: closure, even, run, executors, driver, accumulators, example, transformation, sent
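1. RDD properties:
1. Persistent: an RDD can be kept (cached) and reused across operations.
2. Lazy transformations: a transformation only records what to do; nothing is computed until an action is invoked.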
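The lazy behavior can be illustrated without a cluster. The following is a minimal pure-Python sketch, not Spark code: the class `LazyDataset` and the function `double` are hypothetical names that only mimic how an RDD records a transformation and runs it once an action such as `collect()` is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of Spark)."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)  # recorded transformations, not yet applied

    def map(self, func):
        # "Transformation": only record the function and return a new dataset.
        return LazyDataset(self._data, self._steps + (func,))

    def collect(self):
        # "Action": apply every recorded transformation now.
        result = self._data
        for func in self._steps:
            result = [func(x) for x in result]
        return result


calls = []


def double(x):
    calls.append(x)  # side effect, so we can see when the function really runs
    return 2 * x


ds = LazyDataset([1, 2, 3]).map(double)
calls_before_action = len(calls)  # map() has not executed double() yet
result = ds.collect()             # the action triggers the computation
calls_after_action = len(calls)
```

In this sketch `double` is not called when `map()` returns, only when `collect()` runs, which is the same ordering the notes describe for transformations and actions.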
2. Cluster mode
Typically, a machine runs at most one master and at most one worker, but the same machine can act as both a master and a worker.
3. Where code runs
Most of the program's code runs on the driver.
Transformations run on the executors.
Actions run on both the executors and the driver.
Example: Let's say you want to combine two RDDs, a and b. You remember that rdd.collect() returns a list, and that in Python you can combine two lists with +. A naïve implementation would be:
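a = RDDa.collect()  # the result is gathered on the driver
b = RDDb.collect()  # the result is gathered on the driver
RDDc = sc.parallelize(a + b)  # the data is sent back out to the executors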
Where does this code run?
In the first two lines, all of the distributed data for a and b is sent to the driver. If a and/or b is very large, the driver could run out of memory, and sending the data to the driver also takes a long time. In the third line, all of the data is sent from the driver back to the executors.
The correct way:
RDDc = RDDa.union(RDDb)
This runs entirely on the executors; no data passes through the driver.
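4. Closure - transformation
A closure consists of the functions that run on the executors plus any global variables those functions use. The executors receive copies of these variables, so changes made on an executor are not visible on the driver.
The closure is serialized and sent from the driver to each executor when an action is invoked.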
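The copy semantics can be sketched in plain Python. This is only an analogy under stated assumptions, not how Spark is implemented: the helper `run_on_fake_executor` and the dictionary `driver_variables` are hypothetical, and the standard `pickle` module stands in for the serialization step.

```python
import pickle


def run_on_fake_executor(serialized_closure, elements):
    # Hypothetical helper: deserializing gives the "executor" its own
    # private copy of the variables captured on the driver.
    variables = pickle.loads(serialized_closure)
    for element in elements:
        if element % 2 == 0:
            variables["even"] += 1  # changes the copy only
    return variables["even"]


driver_variables = {"even": 0}
serialized = pickle.dumps(driver_variables)  # what the driver would ship out

executor_even = run_on_fake_executor(serialized, [1, 6, 7, 8])
driver_even = driver_variables["even"]       # untouched on the driver
```

The "executor" counts two even numbers in its copy, while the driver's value stays at 0, because nothing sends the updated copy back.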
5. Accumulators - action
Executors can only add to an accumulator; only the driver can read its value.
Comparing closures and accumulators
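# sc is assumed to be an existing SparkContext
odd = sc.accumulator(0)  # accumulator: executors add to it, the driver reads it
even = 0  # plain global: each executor works on its own copy

def count(element):
    global even
    if element % 2 == 0:
        even += 1  # updates only the executor's copy
    else:
        odd.add(1)

sc.parallelize([1, 6, 7, 8, 3, 4, 4, 2]).foreach(count)
print(odd.value, even)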
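The output is 3 0. The driver's even is never actually updated: each executor increments only its own copy, and those copies are never sent back.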
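Accumulators are not well suited to use inside transformations, where they can easily give wrong results (for example, an update may be applied more than once if a task is re-executed).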
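One way this can go wrong is double counting when the same transformation is computed more than once. The following pure-Python sketch is an analogy, not Spark code: `FakeAccumulator`, `lazy_map`, and `tag_odd` are hypothetical names, and calling `mapped()` twice stands in for running two actions on an RDD that is not cached.

```python
class FakeAccumulator:
    """Hypothetical stand-in for a Spark accumulator."""

    def __init__(self, value=0):
        self.value = value

    def add(self, amount):
        self.value += amount


def lazy_map(func, data):
    # Returns a callable that recomputes the mapping on every call,
    # similar in spirit to an uncached, lazily evaluated dataset.
    return lambda: [func(x) for x in data]


odd = FakeAccumulator(0)


def tag_odd(x):
    if x % 2 == 1:
        odd.add(1)  # side effect inside a "transformation"
    return x


mapped = lazy_map(tag_odd, [1, 6, 7, 8, 3, 4, 4, 2])

mapped()                          # first "action"
count_after_one_action = odd.value
mapped()                          # second "action" recomputes the map
count_after_two_actions = odd.value
```

After the second call the counter has doubled even though the data still contains only three odd numbers. Updating the accumulator inside an action such as foreach, as in the example above, avoids this.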
Source: https://blog.csdn.net/m0_37754282/article/details/111143826