pyspark.sql.table_arg.TableArg.withSinglePartition#
- TableArg.withSinglePartition()[source]#
Forces the data to be processed in a single partition.
This method indicates that all input rows should be treated as a single partition. It cannot be called after partitionBy() has been called. orderBy() may be called after this method to order the rows within the single partition.
- Returns
TableArg
A new TableArg instance with the single-partition constraint applied.
Examples
>>> from pyspark.sql.functions import udtf
>>>
>>> @udtf(returnType="key: int, value: string")
... class ProcessUDTF:
...     def eval(self, row):
...         yield row["key"], row["value"]
...
>>> df = spark.createDataFrame(
...     [(1, "a"), (2, "b"), (3, "c")], ["key", "value"]
... )
>>>
>>> # Process all data in a single partition
>>> result = ProcessUDTF(df.asTable().withSinglePartition())
>>> result.show()
+---+-----+
|key|value|
+---+-----+
|  1|    a|
|  2|    b|
|  3|    c|
+---+-----+
>>>
>>> # Use withSinglePartition and orderBy together
>>> df2 = spark.createDataFrame(
...     [(3, "c"), (1, "a"), (2, "b")], ["key", "value"]
... )
>>> result2 = ProcessUDTF(df2.asTable().withSinglePartition().orderBy("key"))
>>> result2.show()
+---+-----+
|key|value|
+---+-----+
|  1|    a|
|  2|    b|
|  3|    c|
+---+-----+