pyspark.sql.functions.theta_union_agg
- pyspark.sql.functions.theta_union_agg(col, lgNomEntries=None)
Aggregate function: returns the compact binary representation of the DataSketches Theta sketch that is the union of all Theta sketches in the input column.
New in version 4.1.0.
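- Parameters
col: Column or column name holding the binary Theta sketches to merge.
lgNomEntries: Optional; defaults to None. When it is omitted, the example output below shows a value of 12 being applied.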
- Returns
Column: The binary representation of the merged ThetaSketch.
See also
Examples
>>> from pyspark.sql import functions as sf
>>> df1 = spark.createDataFrame([1,2,2,3], "INT")
>>> df1 = df1.agg(sf.theta_sketch_agg("value").alias("sketch"))
>>> df2 = spark.createDataFrame([4,5,5,6], "INT")
>>> df2 = df2.agg(sf.theta_sketch_agg("value").alias("sketch"))
>>> df3 = df1.union(df2)
>>> df3.agg(sf.theta_sketch_estimate(sf.theta_union_agg("sketch"))).show()
+--------------------------------------------------+
|theta_sketch_estimate(theta_union_agg(sketch, 12))|
+--------------------------------------------------+
|                                                 6|
+--------------------------------------------------+
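To make it clearer why merging sketches yields a distinct count over the combined data, the following is a minimal pure-Python sketch of a theta-style union. It is not the DataSketches or Spark implementation: `ToyThetaSketch`, `union`, `_hash`, and the SHA-256 based hashing are hypothetical names and choices made for illustration only. It assumes that `lgNomEntries` denotes the base-2 logarithm of the nominal number of retained entries, as the name suggests, so the value 12 seen above would correspond to `k = 4096` here.

```python
import hashlib

MAX_HASH = 2 ** 64  # hashes are treated as integers in [0, 2**64)


def _hash(item):
    # Hypothetical hash choice: first 8 bytes of SHA-256 over repr(item).
    digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyThetaSketch:
    """Toy sketch: keeps at most k hashes, all strictly below theta."""

    def __init__(self, k=4096):
        self.k = k
        self.theta = MAX_HASH  # no sampling yet, so the sketch is exact
        self.hashes = set()

    def update(self, item):
        h = _hash(item)
        if h < self.theta:
            self.hashes.add(h)
            self._trim()

    def _trim(self):
        # Once more than k hashes are held, lower theta to the
        # (k+1)-th smallest hash and keep only the k smallest.
        if len(self.hashes) > self.k:
            ordered = sorted(self.hashes)
            self.theta = ordered[self.k]
            self.hashes = set(ordered[: self.k])

    def estimate(self):
        # Retained count scaled by the sampled fraction theta / MAX_HASH.
        return len(self.hashes) * MAX_HASH / self.theta


def union(sketches, k=4096):
    """Merge sketches: take the smallest theta, pool hashes below it."""
    merged = ToyThetaSketch(k=k)
    merged.theta = min(s.theta for s in sketches)
    for s in sketches:
        merged.hashes.update(h for h in s.hashes if h < merged.theta)
    merged._trim()
    return merged


# Mirror the data used in the example above.
s1, s2 = ToyThetaSketch(), ToyThetaSketch()
for v in [1, 2, 2, 3]:
    s1.update(v)
for v in [4, 5, 5, 6]:
    s2.update(v)

print(union([s1, s2]).estimate())  # 6.0
```

Because duplicates hash to the same value, the pooled set counts each distinct item once, which is why the union of the two sketches reports 6 rather than 8. The toy re-sorts on every trim for brevity; a real implementation would avoid that cost.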