pyspark.SparkContext.union#
- SparkContext.union(rdds)[source]#
- Build the union of a list of RDDs.

  This supports unions() of RDDs with different serialized formats, although this forces them to be reserialized using the default serializer.

  New in version 0.7.0.

  See also

  RDD.union

  Examples

  >>> import os
  >>> import tempfile
  >>> with tempfile.TemporaryDirectory(prefix="union") as d:
  ...     # generate a text RDD
  ...     with open(os.path.join(d, "union-text.txt"), "w") as f:
  ...         _ = f.write("Hello")
  ...     text_rdd = sc.textFile(d)
  ...
  ...     # generate another RDD
  ...     parallelized = sc.parallelize(["World!"])
  ...
  ...     unioned = sorted(sc.union([text_rdd, parallelized]).collect())

  >>> unioned
  ['Hello', 'World!']