pyspark.sql.functions.count_distinct

pyspark.sql.functions.count_distinct(col, *cols)
Returns a new Column for the distinct count of col or cols.

New in version 3.2.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
  col : Column or str
    First column to compute on.
  cols : Column or str
    Other columns to compute on.
Returns
  Column
    The number of distinct values of the given column, or of distinct combinations of values when several columns are given.
 
Examples

Example 1: Counting distinct values of a single column

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1,), (1,), (3,)], ["value"])
>>> df.select(sf.count_distinct(df.value)).show()
+---------------------+
|count(DISTINCT value)|
+---------------------+
|                    2|
+---------------------+

Example 2: Counting distinct values of multiple columns

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 1), (1, 2)], ["value1", "value2"])
>>> df.select(sf.count_distinct(df.value1, df.value2)).show()
+------------------------------+
|count(DISTINCT value1, value2)|
+------------------------------+
|                             2|
+------------------------------+

Example 3: Counting distinct values with column names as strings

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 1), (1, 2)], ["value1", "value2"])
>>> df.select(sf.count_distinct("value1", "value2")).show()
+------------------------------+
|count(DISTINCT value1, value2)|
+------------------------------+
|                             2|
+------------------------------+
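
Note on semantics

The multi-column examples count distinct (value1, value2) combinations, not distinct values per column. The plain-Python sketch below models that behavior so it can be checked without a Spark session. It is an illustration only, not how Spark computes the result: `count_distinct_model` is a hypothetical helper, and the sketch assumes the behavior Spark SQL documents for COUNT(DISTINCT ...), where a row is skipped if any counted column is null.

```python
from typing import Iterable, Optional, Tuple

# One row of counted column values; None stands in for SQL NULL.
Row = Tuple[Optional[int], ...]


def count_distinct_model(rows: Iterable[Row]) -> int:
    """Plain-Python model of COUNT(DISTINCT c1, c2, ...) over rows of column values."""
    seen = set()
    for row in rows:
        # Assumption: a row with NULL in any counted column is not counted.
        if any(value is None for value in row):
            continue
        # Distinctness is judged on the whole combination of column values.
        seen.add(tuple(row))
    return len(seen)


# Same data as Example 1 and Example 2.
single = [(1,), (1,), (3,)]
pairs = [(1, 1), (1, 2)]
# Hypothetical extra data with nulls, to show which rows are skipped.
with_nulls = [(1, 1), (1, 1), (1, None), (None, 2), (2, 1)]

print(count_distinct_model(single))      # 2
print(count_distinct_model(pairs))       # 2
print(count_distinct_model(with_nulls))  # 2: only (1, 1) and (2, 1) are counted
```

In the last case, (1, None) and (None, 2) are dropped entirely, so a column that contains nulls can lower the multi-column count even when the other column has a value in that row.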