1. Anuncie Aqui ! Entre em contato fdantas@4each.com.br

[Python] Is there a way to utilize polars map_batches to make this code more efficient?

Discussão em 'Python' iniciado por Stack, Outubro 3, 2024 às 15:12.

  1. Stack

    Stack Membro Participativo

    I have some polars code that functionally can do what I want, but I feel it is an inefficient implementation at best. I feel that there must be some way to achieve the same result with .map_batches(), but I can't figure out how. Any thoughts or suggestions?

    Specifically, my data are organized as follows: Each column is a location, and each row is a datetime. What I'm trying to do is calculate the maximum count of consecutive non-zero values (which I converted to Booleans because I don't need the magnitude of the value, I just need to know if the value is zero or not). Example data and example expected output below:

    Example Dummy Data

    pivoted_df = pl.from_repr("""
    ┌─────────────────────┬────────────┬────────────┐
    │ Date ┆ Location 1 ┆ Location 2 │
    │ --- ┆ --- ┆ --- │
    │ datetime[ns] ┆ i64 ┆ i64 │
    ╞═════════════════════╪════════════╪════════════╡
    │ 2023-01-01 00:00:00 ┆ 0 ┆ 1 │
    │ 2023-01-01 01:00:00 ┆ 1 ┆ 1 │
    │ 2023-01-01 02:00:00 ┆ 1 ┆ 1 │
    │ 2023-01-01 03:00:00 ┆ 0 ┆ 1 │
    │ 2023-01-01 04:00:00 ┆ 1 ┆ 1 │
    │ 2023-01-01 05:00:00 ┆ 1 ┆ 0 │
    │ 2023-01-01 06:00:00 ┆ 1 ┆ 0 │
    └─────────────────────┴────────────┴────────────┘
    """)


    Expected Output:

    ┌────────────┬───────┐
    │ Location ┆ Value │
    │ --- ┆ --- │
    │ str ┆ i32 │
    ╞════════════╪═══════╡
    │ Location 1 ┆ 3 │
    │ Location 2 ┆ 5 │
    └────────────┴───────┘


    Below is the code I have that is functional, but feels like it can be improved my someone smarter and more well-versed in polars than I am.

    for col in pivoted_df.drop("Date").columns:
    xy_cont_df_a = (
    pivoted_df.select(pl.col(col))
    .with_columns(
    pl.when(
    pl.col(col).cast(pl.Boolean)
    & pl.col(col)
    .cast(pl.Boolean)
    .shift(-1, fill_value=False)
    .not_()
    ).then(
    pl.count().over(
    (
    pl.col(col).cast(pl.Boolean)
    != pl.col(col).cast(pl.Boolean).shift()
    ).cum_sum()
    )
    )
    )
    .max()
    )

    Continue reading...

Compartilhe esta Página