[Python] How to flatten/split a tuple of arrays and calculate column means in a Polars DataFrame?

Discussion in 'Python' started by Stack, October 3, 2024 at 20:32.

  1. Stack

    Stack Participating Member

    I have a dataframe as follows:

    import polars as pl

    df = pl.DataFrame(
        {"a": [([1, 2, 3], [2, 3, 4], [6, 7, 8]), ([1, 2, 3], [3, 4, 5], [5, 7, 9])]}
    )


    Each cell of column "a" is a tuple of three arrays of the same length. I want to split them fully into separate columns (one scalar per column), giving the following shape:

    shape: (2, 9)
    ┌─────────┬─────────┬─────────┬─────────┬─────┬─────────┬─────────┬─────────┬─────────┐
    │ field_0 ┆ field_1 ┆ field_2 ┆ field_3 ┆ ... ┆ field_5 ┆ field_6 ┆ field_7 ┆ field_8 │
    │ ---     ┆ ---     ┆ ---     ┆ ---     ┆     ┆ ---     ┆ ---     ┆ ---     ┆ ---     │
    │ i64     ┆ i64     ┆ i64     ┆ i64     ┆     ┆ i64     ┆ i64     ┆ i64     ┆ i64     │
    ╞═════════╪═════════╪═════════╪═════════╪═════╪═════════╪═════════╪═════════╪═════════╡
    │ 1       ┆ 2       ┆ 3       ┆ 2       ┆ ... ┆ 4       ┆ 6       ┆ 7       ┆ 8       │
    │ 1       ┆ 2       ┆ 3       ┆ 3       ┆ ... ┆ 5       ┆ 5       ┆ 7       ┆ 9       │
    └─────────┴─────────┴─────────┴─────────┴─────┴─────────┴─────────┴─────────┴─────────┘


    One approach I tried is to call list.to_struct and unnest twice to flatten both nesting levels. That is fine for two levels, but if the nesting depth varies and cannot be determined in advance, the code becomes very long.

    Is there any simpler (or more systematic) way to achieve this?
