PySpark Part 2.1: How to Create a Spark DataFrame from JSON?

Step 1: Create a Spark Session

from pyspark.sql import SparkSession
sparkSession = SparkSession.builder.appName('abc').getOrCreate()

Step 2: Define the JSON Data

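import json

# A Python dict holding one record; json.dumps() turns it into a JSON string in Step 3.
# (Named "data" rather than "input" to avoid shadowing Python's built-in input().)
data = {"column1": "value1",
        "column2": "value2"}

Step 3: Create a Spark DataFrame

# sparkSession.read.json() accepts an RDD of strings, where each string is one JSON object (one row).
json_strings = [json.dumps(data)]
json_rdd = sparkSession.sparkContext.parallelize(json_strings)
df = sparkSession.read.json(json_rdd)
df.show()
# +-------+-------+
# |column1|column2|
# +-------+-------+
# | value1| value2|
# +-------+-------+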

Read Part 3: Ways to Select Column of DataFrame
