pyspark.sql.functions.collect_set
pyspark.sql.functions.collect_set(col: ColumnOrName) → pyspark.sql.column.Column
- Aggregate function: returns a set of objects with duplicate elements eliminated.
- New in version 1.6.0.
- Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- col : Column or str
- target column to compute on. 
 
- Returns
- Column
- an array column holding the distinct values of col; the order of the elements is not guaranteed.
 
- Notes
- The function is non-deterministic because the order of collected results depends on the order of the rows, which may be non-deterministic after a shuffle.
- Examples
>>> from pyspark.sql.functions import array_sort, collect_set
>>> df2 = spark.createDataFrame([(2,), (5,), (5,)], ('age',))
>>> df2.agg(array_sort(collect_set('age')).alias('c')).collect()
[Row(c=[2, 5])]