PySpark Part 1: How to Create a Spark Session?

A Spark session is the entry point to programming Spark. An entry point is where control is transferred from the operating system to the program; in PySpark, the SparkSession object plays that role.

# pip install pyspark
from pyspark.sql import SparkSession

# Create a new session named 'abc', or return the active one if it already exists
spark = SparkSession.builder.appName('abc').getOrCreate()
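
A quick illustration of that behaviour: calling the builder a second time returns the existing session instead of creating a new one (the app name 'xyz' below is arbitrary):

same_spark = SparkSession.builder.appName('xyz').getOrCreate()
assert same_spark is spark   # the already-running session is reused
print(spark.version)         # the Spark version string, e.g. '3.5.0'
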
  • SparkSession is part of the pyspark.sql module.
  • Class: pyspark.sql.SparkSession(sparkContext, jsparkSession=None)
  • A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read Parquet files (a short sketch follows this list).
  • Refer to the official Spark documentation for the full API.
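
Here is a minimal sketch of those operations; the table name 'people', the sample rows, and the Parquet path are made up for illustration:

df = spark.createDataFrame([(1, 'alice'), (2, 'bob')], ['id', 'name'])
df.createOrReplaceTempView('people')     # register the DataFrame as a table
spark.sql("SELECT name FROM people WHERE id = 1").show()
spark.catalog.cacheTable('people')       # cache the table in memory
spark.read.parquet('/path/to/data')      # read Parquet files (placeholder path)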

Add configurations to the Spark session. Note that resource settings such as spark.executor.memory and spark.driver.memory cannot be changed on a running session with spark.conf.set(), so pass them to the builder before calling getOrCreate():

spark.conf.set("spark.executor.memory", '4g')
spark.conf.set('spark.executor.cores', '2')
spark.conf.set('spark.cores.max', '2')
spark.conf.set("spark.driver.memory",'4g')

Read Part 2
