pyspark.sql.functions.map_filter
- pyspark.sql.functions.map_filter(col, f)
Collection function: Returns a new map column whose key-value pairs satisfy a given predicate function.
New in version 3.1.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- col : Column or str
The name of the column or a column expression representing the map to be filtered.
- f : function
A binary function (k: Column, v: Column) -> Column... that defines the predicate. It should return a boolean Column, which is used to filter the input map. The function can use methods of Column, functions defined in pyspark.sql.functions, and Scala UserDefinedFunctions; Python UserDefinedFunctions are not supported (SPARK-27052). A sketch of calling a built-in function inside the predicate appears after the examples below.
- Returns
Column
A new map column containing only the key-value pairs that satisfy the predicate.
Examples
Example 1: Filtering a map with a simple condition
>>> from pyspark.sql import functions as sf >>> df = spark.createDataFrame([(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data")) >>> row = df.select( ... sf.map_filter("data", lambda _, v: v > 30.0).alias("data_filtered") ... ).head() >>> sorted(row["data_filtered"].items()) [('baz', 32.0), ('foo', 42.0)]
Example 2: Filtering a map with a condition on keys
>>> from pyspark.sql import functions as sf >>> df = spark.createDataFrame([(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data")) >>> row = df.select( ... sf.map_filter("data", lambda k, _: k.startswith("b")).alias("data_filtered") ... ).head() >>> sorted(row["data_filtered"].items()) [('bar', 1.0), ('baz', 32.0)]
Example 3: Filtering a map with a complex condition
>>> from pyspark.sql import functions as sf >>> df = spark.createDataFrame([(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data")) >>> row = df.select( ... sf.map_filter("data", lambda k, v: k.startswith("b") & (v > 1.0)).alias("data_filtered") ... ).head() >>> sorted(row["data_filtered"].items()) [('baz', 32.0)]