pyspark.sql.SparkSession#
class pyspark.sql.SparkSession(sparkContext, jsparkSession=None, options={})[source]#
The entry point to programming Spark with the Dataset and DataFrame API.

A SparkSession can be used to create DataFrame, register DataFrame as tables, execute SQL over tables, cache tables, and read parquet files. To create a SparkSession, use the following builder pattern:

Changed in version 3.4.0: Supports Spark Connect.
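For instance, the pieces above compose as follows. This is a minimal sketch rather than part of the official docstring: the local master, the sample rows, the "people" view name, and the "people.parquet" path are illustrative assumptions.

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.master("local").appName("example").getOrCreate()
>>> df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
>>> df.createOrReplaceTempView("people")  # register the DataFrame as a table
>>> spark.sql("SELECT name FROM people WHERE id = 1").count()  # execute SQL over the table
1
>>> spark.catalog.cacheTable("people")  # cache the table
>>> df.write.parquet("people.parquet")  # hypothetical path, written here so the read below works
>>> spark.read.parquet("people.parquet").count()  # read parquet files back
2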
builder#

Creates a Builder for constructing a SparkSession.

Changed in version 3.4.0: Supports Spark Connect.
Examples

Create a Spark session.

>>> spark = (
...     SparkSession.builder
...     .master("local")
...     .appName("Word Count")
...     .config("spark.some.config.option", "some-value")
...     .getOrCreate()
... )

Create a Spark session with Spark Connect.

>>> spark = (
...     SparkSession.builder
...     .remote("sc://localhost")
...     .appName("Word Count")
...     .config("spark.some.config.option", "some-value")
...     .getOrCreate()
... )

Methods

- active(): Returns the active or default SparkSession for the current thread, returned by the builder.
- addArtifact(*path[, pyfile, archive, file]): Add artifact(s) to the client session.
- addArtifacts(*path[, pyfile, archive, file]): Add artifact(s) to the client session.
- addTag(tag): Add a tag to be assigned to all the operations started by this thread in this session (the tag workflow is sketched at the end of this page).
- clearProgressHandlers(): Clear all registered progress handlers.
- clearTags(): Clear the current thread's operation tags.
- copyFromLocalToFs(local_path, dest_path): Copy a file from the local filesystem to the cloud storage file system.
- createDataFrame(data[, schema, ...]): Creates a DataFrame from an RDD, a list, a pandas.DataFrame or a numpy.ndarray.
- getActiveSession(): Returns the active SparkSession for the current thread, returned by the builder.
- getTags(): Get the tags that are currently set to be assigned to all the operations started by this thread.
- interruptAll(): Interrupt all operations of this session currently running on the connected server.
- interruptOperation(op_id): Interrupt an operation of this session with the given operation ID.
- interruptTag(tag): Interrupt all operations of this session with the given operation tag.
- newSession(): Returns a new SparkSession as a new session that has separate SQLConf, registered temporary views and UDFs, but shared SparkContext and table cache.
- range(start[, end, step, numPartitions]): Create a DataFrame with a single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.
- registerProgressHandler(handler): Register a progress handler to be called when a progress update is received from the server.
- removeProgressHandler(handler): Remove a progress handler that was previously registered.
- removeTag(tag): Remove a tag previously added to be assigned to all the operations started by this thread in this session.
- sql(sqlQuery[, args]): Returns a DataFrame representing the result of the given query (parameterized usage is sketched at the end of this page).
- stop(): Stop the underlying SparkContext.
- table(tableName): Returns the specified table as a DataFrame.

Attributes

- builder: Creates a Builder for constructing a SparkSession.
- catalog: Interface through which the user may create, drop, alter or query underlying databases, tables, functions, etc.
- client: Gives access to the Spark Connect client.
- conf: Runtime configuration interface for Spark.
- dataSource: Returns a DataSourceRegistration for data source registration.
- profile: Returns a Profile for performance/memory profiling.
- read: Returns a DataFrameReader that can be used to read data in as a DataFrame.
- readStream: Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.
- sparkContext: Returns the underlying SparkContext.
- streams: Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.
- udf: Returns a UDFRegistration for UDF registration.
- udtf: Returns a UDTFRegistration for UDTF registration.
- version: The version of Spark on which this application is running.
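The Methods list is terse, so here is a hedged sketch of range() and parameterized sql(). Named parameter markers with an args mapping require Spark 3.4+; the query and the bound value are illustrative assumptions.

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.master("local").getOrCreate()
>>> spark.range(0, 10, 2).count()  # one LongType column "id": 0, 2, 4, 6, 8 (end is exclusive)
5
>>> spark.sql("SELECT * FROM range(10) WHERE id > :bound", args={"bound": 6}).count()  # ids 7, 8, 9
3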
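Continuing with the same session, a sketch of the thread-scoped tag workflow (addTag, getTags, interruptTag, removeTag, clearTags). Assumptions to flag: the tag name is invented, and interruptTag() returning a list of interrupted operation IDs reflects the Spark Connect behavior, where these methods first became available in 3.5.

>>> spark.addTag("nightly-etl")  # operations started by this thread now carry the tag
>>> spark.getTags()
{'nightly-etl'}
>>> _ = spark.range(100).collect()  # this query runs under the tag
>>> spark.interruptTag("nightly-etl")  # interrupt still-running tagged operations (none left here)
[]
>>> spark.removeTag("nightly-etl")  # stop tagging new operations
>>> spark.clearTags()  # or drop all of this thread's tags at once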