pyspark.sql.functions.flatten
- pyspark.sql.functions.flatten(col)
- Array function: creates a single array from an array of arrays. If a structure of nested arrays is deeper than two levels, only one level of nesting is removed.
- New in version 2.4.0.
- Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- col : Column or str
- The name of the column or expression to be flattened. 
 
- Returns
- Column
- A new column that contains the flattened array. 
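The semantics described above can be modelled in plain Python. This is only an illustrative sketch, not Spark's implementation: `flatten_one_level` is a hypothetical helper that does not exist in PySpark, and it assumes that a null (`None`) outer array or any null sub-array produces a null result, which matches the null-handling example on this page.

```python
from typing import Any, List, Optional


def flatten_one_level(arrays: Optional[List[Optional[List[Any]]]]) -> Optional[List[Any]]:
    """Hypothetical plain-Python model of flatten: remove exactly one level of nesting."""
    # Assumption: a null outer array, or any null sub-array, makes the result null.
    if arrays is None or any(inner is None for inner in arrays):
        return None
    # Concatenate the sub-arrays in order; deeper levels are left untouched.
    return [item for inner in arrays for item in inner]


print(flatten_one_level([[1, 2, 3], [4, 5], [6]]))  # [1, 2, 3, 4, 5, 6]
print(flatten_one_level([None, [4, 5]]))  # None
print(flatten_one_level([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))  # [[1, 2], [3, 4], [5, 6], [7, 8]]
```

The last call shows that a three-level structure comes back as a two-level one, since only the outermost level of nesting is removed.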
 
- Examples

Example 1: Flattening a simple nested array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([[1, 2, 3], [4, 5], [6]],)], ['data'])
>>> df.select(sf.flatten(df.data)).show()
+------------------+
|     flatten(data)|
+------------------+
|[1, 2, 3, 4, 5, 6]|
+------------------+

Example 2: Flattening an array with null values

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([None, [4, 5]],)], ['data'])
>>> df.select(sf.flatten(df.data)).show()
+-------------+
|flatten(data)|
+-------------+
|         NULL|
+-------------+

Example 3: Flattening an array with more than two levels of nesting

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([[[1, 2], [3, 4]], [[5, 6], [7, 8]]],)], ['data'])
>>> df.select(sf.flatten(df.data)).show(truncate=False)
+--------------------------------+
|flatten(data)                   |
+--------------------------------+
|[[1, 2], [3, 4], [5, 6], [7, 8]]|
+--------------------------------+

Example 4: Flattening an array with mixed types

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([['a', 'b', 'c'], [1, 2, 3]],)], ['data'])
>>> df.select(sf.flatten(df.data)).show()
+------------------+
|     flatten(data)|
+------------------+
|[a, b, c, 1, 2, 3]|
+------------------+