pyspark.RDD.intersection
- RDD.intersection(other)
- Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.
- New in version 1.0.0.
- See also
- Notes
  - This method performs a shuffle internally.
- Examples
  >>> rdd1 = sc.parallelize([1, 10, 2, 3, 4, 5])
  >>> rdd2 = sc.parallelize([1, 6, 2, 3, 7, 8])
  >>> rdd1.intersection(rdd2).collect()
  [1, 2, 3]
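To make the shuffle note and the de-duplication behavior concrete, here is a small pure-Python model of the semantics that runs without Spark. It is a sketch under stated assumptions, not PySpark's implementation: it assumes the shuffle hash-partitions elements by value (the default for Spark's key-based shuffles), and the helpers `hash_partition` and `simulated_intersection` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


# Hypothetical helper (not a PySpark API): mimic a hash-partitioned shuffle
# by routing each (key, tag) pair to partition hash(key) % num_partitions.
def hash_partition(
    pairs: Iterable[Tuple[T, int]], num_partitions: int
) -> List[List[Tuple[T, int]]]:
    partitions: List[List[Tuple[T, int]]] = [[] for _ in range(num_partitions)]
    for key, tag in pairs:
        partitions[hash(key) % num_partitions].append((key, tag))
    return partitions


# Hypothetical helper (not a PySpark API): a single-process model of what
# RDD.intersection returns, built on the simulated shuffle above.
def simulated_intersection(
    left: Iterable[T], right: Iterable[T], num_partitions: int = 4
) -> List[T]:
    # Tag every element with the input it came from: 0 = left, 1 = right.
    tagged = [(v, 0) for v in left] + [(v, 1) for v in right]
    result: List[T] = []
    for partition in hash_partition(tagged, num_partitions):
        # Equal elements hash equally, so every copy of a value, from both
        # inputs, lands in the same partition and can be compared locally.
        sides_seen: Dict[T, Set[int]] = defaultdict(set)
        for key, tag in partition:
            sides_seen[key].add(tag)
        # Keep a value only if it came from both inputs; emitting each dict
        # key once is what drops duplicates from the output.
        result.extend(k for k, sides in sides_seen.items() if sides == {0, 1})
    return result


if __name__ == "__main__":
    # Duplicates in either input appear only once in the result.
    out = simulated_intersection([1, 10, 2, 3, 4, 5, 2, 2], [1, 6, 2, 3, 7, 8, 3])
    print(sorted(out))  # [1, 2, 3]
```

The model shows why a shuffle is needed: matching copies of a value may start in different partitions of the two inputs and must be brought together before they can be compared. Because the result order depends on how values are partitioned, sorting before comparing (for example, `sorted(rdd1.intersection(rdd2).collect())`) is a safer habit than relying on the order returned by `collect()`.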