1. Anuncie Aqui ! Entre em contato fdantas@4each.com.br

[Python] pandas generate columns of cumsums based on variable names in two different columns

Discussão em 'Python' iniciado por Stack, Setembro 13, 2024.

  1. Stack

    Stack Membro Participativo

    I have a dataframe as follows:

    import pandas
    import numpy
    df = pandas.DataFrame( data= {'s1' :numpy.random.choice( ['A', 'B', 'C', 'D', 'E'], size=20 ),
    's2' :numpy.random.choice( ['A', 'B', 'C', 'D', 'E'], size=20 ),
    'val':numpy.random.randint(low=-1, high=3, size=20)}, )


    I want to generate two result columns that provide a cumulative sum of a value (val) based on the categories in 's1' and/or 's2'. I can generate the correct answer columns (ans1 and ans2 - corresponding to set1 and set2 columns) as follows:

    temp={}
    df['ans1'] = numpy.nan
    df['ans2'] = numpy.nan
    for idx, row in df.iterrows():
    if row['s1'] in temp:
    df.loc[idx,'ans1'] = temp[ row['s1'] ]
    temp[ row['s1'] ] = temp[ row['s1'] ] + row['val']
    else:
    temp[ row['s1'] ] = row['val']
    df.loc[idx,'ans1'] = 0

    if row['s2'] in temp:
    df.loc[idx,'ans2'] = temp[ row['s2'] ]
    temp[ row['s2'] ] = temp[ row['s2'] ] + row['val']
    else:
    temp[ row['s2'] ] = row['val']
    df.loc[idx,'ans2'] = 0


    using 'temp' as a dictionary to hold the running totals of each category (A-E) I can get the two answer columns... What i cant do is find a solution to this without iterating over each row of the dataframe. I dont can an issue in the case with only s1 - where i can use .groupby().cumsum().shift(1) and get the correct values in correct rows, but cannot find a solution where there are two sets s1 and s2 (or more as I have multiple sensors to track), so i am hoping there is a general more vectorised solution that will work?

    Continue reading...

Compartilhe esta Página