1. Anuncie Aqui ! Entre em contato fdantas@4each.com.br

[Python] Python recalculate cumulative values of each row of multidimensional data set without...

Discussão em 'Python' iniciado por Stack, Outubro 7, 2024.

  1. Stack

    Stack Membro Participativo

    I am working on complex calculations of a dataset that requires recalculating based on the sum of the previous rows. I'll try to articulate as best as I can.

    I have an n x n dataset, I need to create a new n x n dataset that takes the value from the first record, subtracts the cumulative sum of that row up to that value, and then multiplies by a value from another 1 x n series.

    For example: n x n Dataset :

    i 0 1 2 3 4 5 6
    0 0 0 0 0 0 0 0
    1 0 0 0 0 0 0 0
    2 200 0 0 0 0 0 0
    3 200 150 0 0 0 0 0
    4 200 150 125 0 0 0 0
    5 200 150 125 100 0 0 0
    6 200 150 125 100 75 0 0

    1 x n Series :

    i 0
    0 .000
    1 .001
    2 .002
    3 .003
    4 .004
    5 .005
    6 .006

    Result:

    i 0 1 2 3 4 5
    0 0 0 0 0 0 0
    1 0 0 0 0 0 0
    2 .4 [(200 - sum 0) * .002] 0 0 0 0 0
    3 .5988 [(200 - sum .4) * .003] .45 [(150 - sum 0) * .003] 0 0 0 0
    4 .7960 [(200 - sum (.4, .5988)) * .004] .5982 [(150 - sum .45) * .004] .50 0 0 0
    5 .9910 [(200 - sum (.4, .5988, .7960)) * .005] .7447 [(150 - sum .45, .5982) * .005] .6225 .50 0 0
    6 1.1832 [(200 - sum (.4, .5988, .7960, .9910)) * .006] .8892 [(150 - sum .45, .5982, .7447) * .006] .74326 .5970 .450 0

    I am good at using pandas or numpy to get this data. I am currently looping over each row and column and it is taking a really long time to calculate.

    Any help is greatly appreciated.

    This is code that works using looping.

    import numpy as np
    import pandas as pd


    df_a =pd.DataFrame([[0,0,0,0,0,0,0],[0,0,0,0,0,0,0],[200,0,0,0,0,0,0],[200,150,0,0,0,0,0],[200,150,125,0,0,0,0],[200,150,125,100,0,0,0],[200,150,125,100,75,0,0]]).astype(float)

    df_series = pd.Series([0.0,.001, .002, .003, .004, .005, .006])
    df_b = pd.DataFrame(np.arange(49).reshape(7,7)).astype(float)

    for i in range(df_a.shape[0]): #iterate over rows
    for j in range(df_a.shape[1]): #iterate over columns
    df_a_val = df_a.iloc[i, j] #source dataset value
    ser_val = df_series.iloc #value of series at row
    df_b_prev = df_b.iloc[:i,j] #get series of previous values
    df_b_sum = df_b_prev.sum() #sum previous values


    df_b.at[i, j] = (df_a_val-df_b_sum)*(ser_val) #cell based calculation and set value
    print("(df_a_val: {} - df_b_sum: {}) * ser_val : {} = {}".format(df_a_val, df_b_sum, ser_val, df_b.at[i, j]))
    print(df_series)
    print(df_b)

    Continue reading...

Compartilhe esta Página