[Python] How to extract struct values to a list in Polars

Discussion in 'Python' started by Stack, September 12, 2024.

    I have a pipeline that extracts regular expression groups from a Polars text column, and I want to display the text and the matches in a Streamlit st.table. As long as the column containing the matches is a List[str], this works well with Streamlit, but Polars' str.extract_groups returns a Struct instead (understandable, since regex groups can be named).

    The code below works, but is there a way to do this without map_elements? In general, there could be zero, one, or more match groups, and I'd like to keep the output dtype as list[str] for Streamlit compatibility.

    import polars as pl

    dataframe = pl.DataFrame([{"text": "ABC"}, {"text": "123"}])

    regexp = r"^.*(\d+).*$"  # unnamed group -> struct field "1"
    dataframe = dataframe.with_columns(
        regexp_match=pl.col("text").str.extract_groups(regexp)
    ).filter(pl.col("regexp_match").struct["1"].is_not_null())
    # Works, but runs a Python lambda per row to turn the struct into a list[str]
    dataframe = dataframe.with_columns(
        regexp_match=pl.struct(["regexp_match"]).map_elements(
            lambda x: list(x["regexp_match"].values()), return_dtype=pl.List(pl.String)
        )
    )
