pyspark.sql.table_arg.TableArg.withSinglePartition

TableArg.withSinglePartition()

Forces the data to be processed in a single partition.

This method directs Spark to treat the entire input as a single partition. It cannot be called after partitionBy(). orderBy() may still be called afterward to order the rows within that single partition.

Returns
TableArg

A new TableArg instance with the single-partition constraint applied.

Examples

>>> from pyspark.sql.functions import udtf
>>>
>>> @udtf(returnType="key: int, value: string")
... class ProcessUDTF:
...     def eval(self, row):
...         yield row["key"], row["value"]
...
>>> df = spark.createDataFrame(
...     [(1, "a"), (2, "b"), (3, "c")], ["key", "value"]
... )
>>>
>>> # Process all data in a single partition
>>> result = ProcessUDTF(df.asTable().withSinglePartition())
>>> result.show()
+---+-----+
|key|value|
+---+-----+
|  1|    a|
|  2|    b|
|  3|    c|
+---+-----+
>>>
>>> # Use withSinglePartition and orderBy together
>>> df2 = spark.createDataFrame(
...     [(3, "c"), (1, "a"), (2, "b")], ["key", "value"]
... )
>>> result2 = ProcessUDTF(df2.asTable().withSinglePartition().orderBy("key"))
>>> result2.show()
+---+-----+
|key|value|
+---+-----+
|  1|    a|
|  2|    b|
|  3|    c|
+---+-----+
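
>>> # The partitionBy() restriction can be demonstrated directly. This is a
>>> # minimal sketch reusing df from above; the exact exception type and
>>> # message vary by Spark version, hence the elided traceback.
>>> df.asTable().partitionBy("key").withSinglePartition()  # doctest: +SKIP
Traceback (most recent call last):
    ...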