pyspark.sql.functions.map_filter

pyspark.sql.functions.map_filter(col, f)[source]

Collection function: Returns a new map column containing only the key-value pairs of the input map that satisfy a given predicate function.

New in version 3.1.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

The name of the column or a column expression representing the map to be filtered.

f : function

A binary function (k: Column, v: Column) -> Column that defines the predicate. It should return a boolean Column, which determines whether each key-value pair is kept. The predicate can use methods of Column, functions defined in pyspark.sql.functions, and Scala UserDefinedFunctions. Python UserDefinedFunctions are not supported (SPARK-27052).

Returns
Column

A new map column containing only the key-value pairs that satisfy the predicate.

Examples

Example 1: Filtering a map with a simple condition

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data"))
>>> row = df.select(
...   sf.map_filter("data", lambda _, v: v > 30.0).alias("data_filtered")
... ).head()
>>> sorted(row["data_filtered"].items())
[('baz', 32.0), ('foo', 42.0)]

Example 2: Filtering a map with a condition on keys

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data"))
>>> row = df.select(
...   sf.map_filter("data", lambda k, _: k.startswith("b")).alias("data_filtered")
... ).head()
>>> sorted(row["data_filtered"].items())
[('bar', 1.0), ('baz', 32.0)]

Example 3: Filtering a map with a complex condition

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, {"foo": 42.0, "bar": 1.0, "baz": 32.0})], ("id", "data"))
>>> row = df.select(
...   sf.map_filter("data", lambda k, v: k.startswith("b") & (v > 1.0)).alias("data_filtered")
... ).head()
>>> sorted(row["data_filtered"].items())
[('baz', 32.0)]
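
Example 4: Filtering a map using a function from pyspark.sql.functions

As noted in the description of f above, the predicate can be built from functions in pyspark.sql.functions as well as Column methods. The following sketch is illustrative (it uses a small sample DataFrame chosen for demonstration) and applies sf.length to the map keys.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, {"a": 1.0, "ab": 2.0, "abc": 3.0})], ("id", "data"))
>>> row = df.select(
...   sf.map_filter("data", lambda k, _: sf.length(k) > 1).alias("data_filtered")
... ).head()
>>> sorted(row["data_filtered"].items())
[('ab', 2.0), ('abc', 3.0)]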