pyspark.RDD.repartitionAndSortWithinPartitions
RDD.repartitionAndSortWithinPartitions(numPartitions=None, partitionFunc=<function portable_hash>, ascending=True, keyfunc=<function RDD.<lambda>>)
Repartition the RDD according to the given partitioner and, within each resulting partition, sort records by their keys.

New in version 1.2.0.

Parameters
numPartitions : int, optional
    the number of partitions in the new RDD
partitionFunc : function, optional, default portable_hash
    a function to compute the partition index
ascending : bool, optional, default True
    sort the keys in ascending or descending order
keyfunc : function, optional, default identity mapping
    a function to compute the key
 
Returns
RDD
    a new RDD
Examples
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
>>> rdd2.glom().collect()
[[(0, 5), (0, 8), (2, 6)], [(1, 3), (3, 8), (3, 8)]]
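The ascending and keyfunc parameters control the within-partition sort order. A minimal sketch under the same assumptions as above (a live SparkContext bound to sc); the modulo partitioner, the unique-key input, and the negating keyfunc are illustrative choices, not part of the API:

>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (1, 3)])
>>> # Sort keys in descending order within each partition
>>> rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, ascending=False).glom().collect()
[[(2, 6), (0, 5)], [(3, 8), (1, 3)]]
>>> # Equivalent effect via keyfunc: records are ordered by keyfunc(key),
>>> # so negating each integer key reverses the ascending sort
>>> rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, keyfunc=lambda k: -k).glom().collect()
[[(2, 6), (0, 5)], [(3, 8), (1, 3)]]

Because the sort is performed as part of the shuffle, this is more efficient than repartitioning and then sorting within each partition separately.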