[Python] pandas generate columns of cumsums based on variable names in two different columns

Stack · Setembro 13, 2024

I have a dataframe as follows:

import pandas
import numpy
df = pandas.DataFrame( data= {'s1' :numpy.random.choice( ['A', 'B', 'C', 'D', 'E'], size=20 ),
's2' :numpy.random.choice( ['A', 'B', 'C', 'D', 'E'], size=20 ),
'val':numpy.random.randint(low=-1, high=3, size=20)}, )

I want to generate two result columns that provide a cumulative sum of a value (val) based on the categories in 's1' and/or 's2'. I can generate the correct answer columns (ans1 and ans2 - corresponding to set1 and set2 columns) as follows:

temp={}
df['ans1'] = numpy.nan
df['ans2'] = numpy.nan
for idx, row in df.iterrows():
if row['s1'] in temp:
df.loc[idx,'ans1'] = temp[ row['s1'] ]
temp[ row['s1'] ] = temp[ row['s1'] ] + row['val']
else:
temp[ row['s1'] ] = row['val']
df.loc[idx,'ans1'] = 0

if row['s2'] in temp:
df.loc[idx,'ans2'] = temp[ row['s2'] ]
temp[ row['s2'] ] = temp[ row['s2'] ] + row['val']
else:
temp[ row['s2'] ] = row['val']
df.loc[idx,'ans2'] = 0

using 'temp' as a dictionary to hold the running totals of each category (A-E) I can get the two answer columns... What i cant do is find a solution to this without iterating over each row of the dataframe. I dont can an issue in the case with only s1 - where i can use .groupby().cumsum().shift(1) and get the correct values in correct rows, but cannot find a solution where there are two sets s1 and s2 (or more as I have multiple sensors to track), so i am hoping there is a general more vectorised solution that will work?

Continue reading...

Logar ou Criar uma Conta

[Python] pandas generate columns of cumsums based on variable names in two different columns

Stack Membro Participativo

Compartilhe esta Página

Logar ou Criar uma Conta

[Python] pandas generate columns of cumsums based on variable names in two different columns

Stack Membro Participativo

Compartilhe esta Página

Pesquisas Úteis