1. Anuncie Aqui ! Entre em contato fdantas@4each.com.br

[Python] how can Python's pd.qcut give the same result as R's statar::xtile?

Discussão em 'Python' iniciado por Stack, Outubro 2, 2024 às 16:22.

  1. Stack

    Stack Membro Participativo

    I need to create bins based on one column in a dataframe. One problem is the values of that column are oddly distributed. Consequently, Python's pd.qcut may arbitrarily put observations into different bins, even though they have the same value.

    In R (or in Stata), I use the xtile function of the statar package. R is able to group all observations with the same value into one bin.

    library(tidyverse)

    sample_df <- data.frame(customer_id = seq(1:10),
    purch_frequency = c(1, 1, 1, 1, 1, 2, 3, 10, 11, 11))

    sample_df <- sample_df %>%
    mutate(freq_bins1=statar::xtile(purch_frequency, 2),
    freq_bins2=statar::xtile(purch_frequency, 3))

    print(sample_df)


    A corresponding implementation in Python,

    import pandas as pd

    data = {'customer_id': range(1,11),
    'purch_frequency': [1, 1, 1, 1, 1, 2, 3, 10, 11, 11]}
    sample_df = pd.DataFrame(data)

    sample_df['freq_bins1'] = \
    (sample_df['purch_frequency'].rank(method = 'first')
    .transform(lambda x: pd.qcut(x, 2, labels = False)))
    sample_df['freq_bins2'] = \
    (sample_df['purch_frequency'].rank(method = 'first')
    .transform(lambda x: pd.qcut(x, 3, labels = False)))
    print(sample_df)



    As you can see, R and Python give different answers for the last column, freq_bins2. I wondered how I can modify the Python code so that it matches R's result. Thanks!


    A quick follow-up. The R and Python outputs are now attached below. For R (python's index is "1" less than R's, which is fine):

    customer_id purch_frequency freq_bins1 freq_bins2
    1 1 1 1 1
    2 2 1 1 1
    3 3 1 1 1
    4 4 1 1 1
    5 5 1 1 1
    6 6 2 2 2
    7 7 3 2 2
    8 8 10 2 3
    9 9 11 2 3
    10 10 11 2 3


    For Python:

    customer_id purch_frequency freq_bins1 freq_bins2
    0 1 1 0 0
    1 2 1 0 0
    2 3 1 0 0
    3 4 1 0 0
    4 5 1 0 1
    5 6 2 1 1
    6 7 3 1 1
    7 8 10 1 2
    8 9 11 1 2
    9 10 11 1 2

    Continue reading...

Compartilhe esta Página