
[Python] How to apply function to multiple columns

Discussion in 'Python' started by Stack, September 28, 2024 at 12:43.

  1. Stack

    Stack Participating Member

    I would like to replace with NaN all values that are above the 0.99 quantile or below the 0.01 quantile, across the whole dataframe.
    So far I have found a way to do this for a single column, so I could process the columns one at a time, but is there a way to apply the function to all columns without ugly for-loops?

    I also tried a numpy implementation with masking, but since the length of the result is not constant, it does not seem like a proper solution to me.

    Quantile replacer for one column that works:

    import polars as pl

    # Replace values above the 0.99 quantile of 'B_14' with NaN
    train_pl.select(
        pl.when(pl.col('B_14') > pl.col('B_14').quantile(0.99))
        .then(float("nan"))
        .otherwise(pl.col('B_14'))
    )


    And here are my numpy functions in case you need them:

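    import numpy as np
    import numpy.ma as ma

    def replace_high_quantile(arr, q=0.99):
        # Keep values at or below the q quantile; fill the rest with NaN
        mask = arr <= np.quantile(arr, q)
        return ma.masked_array(arr, ~mask).filled(np.nan)

    def replace_low_quantile(arr, q=0.01):
        # Keep values at or above the q quantile; fill the rest with NaN
        mask = arr >= np.quantile(arr, q)
        return ma.masked_array(arr, ~mask).filled(np.nan)

    def replace_both_quantiles(arr, low=0.01, high=0.99):
        # Keep values between the low and high quantiles; fill the rest with NaN
        mask = (arr >= np.quantile(arr, low)) & (arr <= np.quantile(arr, high))
        return ma.masked_array(arr, ~mask).filled(np.nan)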
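
    For comparison, the numpy approach could be applied to all columns of a 2-D array at once by computing the quantiles along axis 0. This is a minimal sketch, not a definitive implementation: replace_outer_quantiles is a hypothetical name, the sample array is made up, and np.where is swapped in for the masked-array step so that the output always has the same shape as the input.

```python
import numpy as np

def replace_outer_quantiles(arr, low=0.01, high=0.99):
    # Hypothetical helper: per-column quantile filter for a 2-D array.
    arr = np.asarray(arr, dtype=float)
    # keepdims=True gives shape (1, n_cols), which broadcasts against arr.
    lo = np.nanquantile(arr, low, axis=0, keepdims=True)
    hi = np.nanquantile(arr, high, axis=0, keepdims=True)
    # Keep in-range values and put NaN everywhere else; the shape is unchanged.
    return np.where((arr >= lo) & (arr <= hi), arr, np.nan)

# Made-up sample data: two columns with values 0..100 and 0..1000.
data = np.column_stack([np.arange(101.0), np.arange(101.0) * 10])
cleaned_np = replace_outer_quantiles(data)
```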
