The Difference Between Spark TempView and GlobalTempView

TempView and GlobalTempView are used frequently with Spark DataFrames. How do the two differ, and when should each be used?

The example below compares the two.

    from pyspark.sql import SparkSession
    import numpy as np
    import pandas as pd

    spark = SparkSession.builder.getOrCreate()

    # Build a 5x5 DataFrame of random integers in [1, 100)
    d = np.random.randint(1, 100, 5 * 5).reshape(5, -1)
    data = pd.DataFrame(d, columns=list('abcde'))
    df = spark.createDataFrame(data)
    df.show()

    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 17| 30| 61| 61| 33|
    | 32| 23| 24|  7|  7|
    | 47|  6|  4| 95| 34|
    | 50| 69| 83| 21| 46|
    | 52| 12| 83| 49| 85|
    +---+---+---+---+---+

Reading data from a temp view

    # createTempView returns None, so its result is not assigned
    df.createTempView('temp')
    temp_sql = "select * from temp where a=50"
    res = spark.sql(temp_sql)
    res.show()

    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 50| 69| 83| 21| 46|
    +---+---+---+---+---+

Reading data from a global temp view

    # Global temp views are registered under the global_temp database
    df.createGlobalTempView('glob')
    glob_sql = "select * from global_temp.glob where a = 17"
    res2 = spark.sql(glob_sql)
    res2.show()

    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 17| 30| 61| 61| 33|
    +---+---+---+---+---+

Global temp view data can be shared across multiple SparkSessions

    # Create a new SparkSession
    spark2 = spark.newSession()
    spark2 == spark

    False

    # The new SparkSession can read data from the global temp view
    new_sql = "select * from global_temp.glob where a = 47"
    temp = spark2.sql(new_sql)
    temp.show()

    +---+---+---+---+---+
    |  a|  b|  c|  d|  e|
    +---+---+---+---+---+
    | 47|  6|  4| 95| 34|
    +---+---+---+---+---+

    # The new SparkSession cannot read data from the temp view;
    # this fails because the table temp cannot be found
    new_sql2 = "select * from temp where a = 47"
    temp = spark2.sql(new_sql2)
    temp.show()

    # Adding the global_temp prefix does not help either
    new_sql2 = "select * from global_temp.temp where a = 47"
    temp = spark2.sql(new_sql2)
    temp.show()

    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    # (several lines of exception output omitted here)
    AnalysisException: "Table or view not found: `global_temp`.`temp`; line 1 pos 14;\n'Project [*]\n+- 'Filter ('a = 47)\n +- 'UnresolvedRelation `global_temp`.`temp`\n"

A temp view cannot be used after it is dropped

    spark.catalog.dropTempView('temp')
    spark.catalog.dropGlobalTempView('glob')

    # Error: the table temp cannot be found
    temp_sql2 = "select * from temp where a = 47"
    temp = spark.sql(temp_sql2)

    # Error: global_temp.glob cannot be found, in both spark and spark2
    glob_sql2 = "select * from global_temp.glob where a = 47"
    temp = spark.sql(glob_sql2)
    temp = spark2.sql(glob_sql2)

Summary

A Spark DataFrame has four methods for creating temp views:

df.createGlobalTempView

df.createOrReplaceGlobalTempView

df.createOrReplaceTempView

df.createTempView

The replace variants create the view if it does not exist and replace it if it does.

A temp view can no longer be used once it has been dropped.

There are two drop methods:
spark.catalog.dropTempView('temp')
spark.catalog.dropGlobalTempView('glob')

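Similarities and differences between TempView and GlobalTempView

A temp view can be used only within the SparkSession that created it.

A global temp view can be shared across multiple SparkSessions and is referenced with the global_temp prefix.

Neither of them, however, can be used across Spark applications.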
