
[Python] Replicate pandas ngroup behaviour in polars

Discussion in 'Python' started by Stack, September 27, 2024 at 21:12.


    I am currently trying to replicate ngroup behaviour in polars to get consecutive group indexes (the dataframe will be grouped over two columns). For the R crowd, this would be achieved in the dplyr world with dplyr::group_indices or the newer dplyr::cur_group_id.

    As shown in the repro, I've tried a couple of avenues without much success: both approaches lose the sequential group numbering and merely return row counts per group.

    Quick repro:

    import polars as pl
    import pandas as pd

    df = pd.DataFrame(
    {
    "id": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "cat": [1, 1, 2, 2, 1, 1, 2, 2],
    }
    )

    df_pl = pl.from_pandas(df)

    print(df.groupby(["id", "cat"]).ngroup())
    # This is the desired behaviour
    # 0    0
    # 1    0
    # 2    1
    # 3    1
    # 4    2
    # 5    2
    # 6    3
    # 7    3

    print(df_pl.select(pl.len().over("id", "cat")))
    # This only counts the observations in each group
    # ┌─────┐
    # │ len │
    # │ --- │
    # │ u32 │
    # ╞═════╡
    # │ 2   │
    # │ 2   │
    # │ 2   │
    # │ 2   │
    # │ 2   │
    # │ 2   │
    # │ 2   │
    # │ 2   │
    # └─────┘

    print(df_pl.group_by("id", "cat").agg(pl.len().alias("test")))
    # shape: (4, 3)
    # ┌─────┬─────┬──────┐
    # │ id  ┆ cat ┆ test │
    # │ --- ┆ --- ┆ ---  │
    # │ str ┆ i64 ┆ u32  │
    # ╞═════╪═════╪══════╡
    # │ a   ┆ 1   ┆ 2    │
    # │ a   ┆ 2   ┆ 2    │
    # │ b   ┆ 1   ┆ 2    │
    # │ b   ┆ 2   ┆ 2    │
    # └─────┴─────┴──────┘

