
[Python] How can I filter a list within a Polars column?

Discussion in 'Python' started by Stack, October 4, 2024 at 00:22.


    Say for example I have data like this:

    import polars as pl

    df = pl.DataFrame(
        {
            "subject": ["subject1", "subject2"],
            "emails": [
                ["samATxyz.com", "janeATxyz.com", "jimATcustomer.org"],
                ["samATxyz.com", "zaneATxyz.com", "basATcustomer.org", "jimATcustomer.org"],
            ],
        }
    )

    df


    shape: (2, 2)
    ┌──────────┬─────────────────────────────────────────────────────────────────────────────┐
    │ subject  ┆ emails                                                                      │
    │ ---      ┆ ---                                                                         │
    │ str      ┆ list[str]                                                                   │
    ╞══════════╪═════════════════════════════════════════════════════════════════════════════╡
    │ subject1 ┆ ["samATxyz.com", "janeATxyz.com", "jimATcustomer.org"]                      │
    │ subject2 ┆ ["samATxyz.com", "zaneATxyz.com", "basATcustomer.org", "jimATcustomer.org"] │
    └──────────┴─────────────────────────────────────────────────────────────────────────────┘


    I want to filter the data so that the emails column contains only emails that end in "ATxyz.com".

    shape: (2, 2)
    ┌──────────┬───────────────────────────────────┐
    │ subject  ┆ emails                            │
    │ ---      ┆ ---                               │
    │ str      ┆ list[str]                         │
    ╞══════════╪═══════════════════════════════════╡
    │ subject1 ┆ ["samATxyz.com", "janeATxyz.com"] │
    │ subject2 ┆ ["samATxyz.com", "zaneATxyz.com"] │
    └──────────┴───────────────────────────────────┘


    How can I do this using polars?

    I had a few ideas, but I cannot figure out the right syntax, or it seems more complex/verbose than I would expect:

    • Maybe I could somehow filter the data using .list.eval(pl.element() ..., but I cannot figure out how to filter items in the list with this syntax.
    • I could reshape the data using .explode, but this seems verbose and more complex than needed.

    This is as close as I have got:

    import polars as pl

    df = pl.DataFrame(
        {
            "subject": ["subject1", "subject2"],
            "emails": [
                ["samATxyz.com", "janeATxyz.com", "jimATcustomer.org"],
                ["samATxyz.com", "zaneATxyz.com", "basATcustomer.org", "jimATcustomer.org"],
            ],
        }
    )

    df.with_columns(
        pl.col("emails").list.eval(pl.element().str.contains("ATxyz")),
    )


    shape: (2, 2)
    ┌──────────┬────────────────────────────┐
    │ subject  ┆ emails                     │
    │ ---      ┆ ---                        │
    │ str      ┆ list[bool]                 │
    ╞══════════╪════════════════════════════╡
    │ subject1 ┆ [true, true, false]        │
    │ subject2 ┆ [true, true, false, false] │
    └──────────┴────────────────────────────┘
