[Python] plotnine geom_histogram wrong bin placement

Discussion in 'Python' started by Stack, September 13, 2024.

  1. Stack

    Stack, Active Member

    I'm trying to define the bins of my histogram explicitly so that each bin is exactly 10 wide.

    Here is an example. I defined a list of numbers. The list contains 10 numbers with 1 digit, and then 50 numbers between 50 and 59, 60 numbers between 60 and 69, and so on.

    rand_numbers = ([0]*5 + [9]*5) + \
    ([50]*20 + [59]*30) + \
    ([60]*30 + [69]*30) + \
    ([70]*35 + [79]*35) + \
    ([80]*40 + [89]*40) + \
    ([90]*45 + [99]*45)


    Then I create a data frame where I "classified" the numbers so that numbers up to 69 get one color, numbers in the 70s get another color, and all numbers from 80 upwards get a third color:

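    import pandas as pd

    # 120 values up to 69 -> 'foo', 70 values in the 70s -> 'bar', 170 values from 80 up -> 'baz'
    df = pd.DataFrame({
        'c1': rand_numbers,
        'c2': ['foo'] * 120 + ['bar'] * 70 + ['baz'] * 170
    })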


    To make the histogram, I'm doing:

    import plotnine as p9

    p = p9.ggplot(df, p9.aes(x='c1', fill = 'c2')) + \
    p9.scale_x_continuous(breaks=range(0, 120, 10)) +\
    p9.geom_histogram(size=0.5, colour='black', breaks=range(0, 120, 10))


    [IMG]

    As you can see, the bins are "spilling" onto one another. Here is more or less what I expected:

    [IMG]

    That is, I expected a histogram with exactly 10 elements in the first bin, exactly 50 elements in the next bin (between 50 and 59), then exactly 60 elements in the next one. All of the aforementioned bins should be completely blue. Then, a red bin with exactly 70 elements, and then two green bins with exactly 80 and 90 elements.
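
    The "spilling" is what right-closed bins would produce. The sketch below is not plotnine code: it re-counts rand_numbers by hand with the standard library, assuming bins of the form (a, b] with the lowest break included in the first bin. That is ggplot2's documented default; that plotnine bins the same way is an assumption here. The helper bin_counts is hypothetical and exists only for this illustration.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

rand_numbers = ([0]*5 + [9]*5) + \
    ([50]*20 + [59]*30) + \
    ([60]*30 + [69]*30) + \
    ([70]*35 + [79]*35) + \
    ([80]*40 + [89]*40) + \
    ([90]*45 + [99]*45)

breaks = list(range(0, 120, 10))


def bin_counts(values, breaks, closed="right"):
    """Count values per bin; each key is the left edge of a bin."""
    counts = Counter()
    for x in values:
        if closed == "right":
            # (a, b]: a value equal to a break falls into the bin on its left.
            # max(..., 0) keeps the lowest break inside the first bin.
            i = max(bisect_left(breaks, x) - 1, 0)
        else:
            # [a, b): a value equal to a break falls into the bin on its right.
            i = bisect_right(breaks, x) - 1
        counts[breaks[i]] += 1
    return dict(sorted(counts.items()))


right_closed = bin_counts(rand_numbers, breaks, closed="right")
left_closed = bin_counts(rand_numbers, breaks, closed="left")

print(right_closed)  # {0: 10, 40: 20, 50: 60, 60: 65, 70: 75, 80: 85, 90: 45}
print(left_closed)   # {0: 10, 50: 50, 60: 60, 70: 70, 80: 80, 90: 90}
```

    Under the right-closed assumption, every value that sits exactly on a break (50, 60, 70, ...) is counted in the bin below it, which mixes the colors. Only the left-closed count matches the 10 / 50 / 60 / 70 / 80 / 90 I expected.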

    As you can see, I'm using the solution suggested here and here on how to predefine the bins in geom_histogram(), but it didn't work the way I expected.

    In attempting to solve this problem, I found:


    EDIT: I noticed that starting the breaks at -1 instead of 0 "works". Still, I'm not sure whether this is a reliable solution.

    p9.geom_histogram(size=0.5, colour='black',
    breaks=range(-1, 120, 10)) # <------ here, starting at -1
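
    One way to judge how reliable the shifted breaks are is to count by hand under both possible edge rules. The sketch below uses only plain Python, not plotnine. Which rule plotnine actually applies is an assumption, and count_in_bins is a hypothetical helper written for this check. It compares breaks starting at -1 with breaks placed on half-integers (-0.5, 9.5, ...), where no integer value can ever land on an edge.

```python
rand_numbers = ([0]*5 + [9]*5) + \
    ([50]*20 + [59]*30) + \
    ([60]*30 + [69]*30) + \
    ([70]*35 + [79]*35) + \
    ([80]*40 + [89]*40) + \
    ([90]*45 + [99]*45)


def count_in_bins(values, breaks, closed):
    """Count values in each consecutive pair of breaks; empty bins are omitted."""
    counts = {}
    for lo, hi in zip(breaks, breaks[1:]):
        if closed == "right":
            n = sum(lo < x <= hi for x in values)   # bin is (lo, hi]
        else:
            n = sum(lo <= x < hi for x in values)   # bin is [lo, hi)
        if n:
            counts[lo] = n
    return counts


shifted = list(range(-1, 120, 10))               # -1, 9, 19, ..., 119
half = [b - 0.5 for b in range(0, 120, 10)]      # -0.5, 9.5, ..., 109.5

shifted_right = count_in_bins(rand_numbers, shifted, "right")
shifted_left = count_in_bins(rand_numbers, shifted, "left")
half_right = count_in_bins(rand_numbers, half, "right")
half_left = count_in_bins(rand_numbers, half, "left")

print(shifted_right)  # {-1: 10, 49: 50, 59: 60, 69: 70, 79: 80, 89: 90}
print(shifted_left)   # {-1: 5, 9: 5, 49: 20, 59: 60, 69: 65, 79: 75, 89: 85, 99: 45}
print(half_right == half_left)  # True
```

    So breaks starting at -1 give the expected counts only if the bins are right-closed, and only because the data are integers: the values 9, 59, 69, ... still sit exactly on an edge. Half-integer breaks give 10 / 50 / 60 / 70 / 80 / 90 under either rule. In both cases the bars are drawn between the shifted breaks, not between the multiples of 10 on the axis.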
