
[Python] Getting data from two nested columns in one dataframe

Discussion in 'Python' started by Stack, September 13, 2024.


    I have a pandas dataframe in which two columns contain nested data. Those columns are called payments and ledger_account_bookings. Both can contain lists of dicts. An example of an entry in the payments column is as follows:

    [{'id': '426659696713139847', 'administration_id': '394860835892102964', 'price_base': '400.0'},
    {'id': '426577578318365974', 'administration_id': '394860835892102964', 'price_base': '39.0'},
    {'id': '426577578318365974', 'administration_id': '394860835892102964', 'price_base': '389.0'},
    {'id': '426577578318365974', 'administration_id': '394860835892102964', 'price_base': '12.0'}]


    I need a row for every dict in this column, so that I can use the price_base value as a field in my dataframe (it is different for every entry, as you can see).

    I know how to achieve this, for example by using the following code:

    import pandas as pd

    # One row per dict in the payments column
    df_explode = df_financial_mutations.explode(['payments'])
    # Normalize the JSON column into separate columns
    df_normalized = pd.json_normalize(df_explode['payments'])
    # Add a prefix to the columns that were 'exploded'
    df_normalized = df_normalized.add_prefix('payments_')


    The problem is that I have another column, ledger_account_bookings, with similar nested data. If I call explode again, the result becomes murky: exploding the payments column has already introduced 'duplicate' rows into my dataframe, so wherever a payment was exploded there are now several rows with exactly the same value in the ledger_account_bookings column. When I explode again, this time on the other column, those 'duplicate' rows are exploded as well, and my dataframe ends up containing rows that don't make sense.
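
    The effect described above can be reproduced on a toy frame. This is only a minimal sketch: the `id` column and the dict contents are made up for illustration and are not taken from the real data.

```python
import pandas as pd

# Hypothetical toy frame: a single parent row with 2 payments and 3 bookings.
df = pd.DataFrame({
    "id": [1],
    "payments": [[{"price_base": "400.0"}, {"price_base": "39.0"}]],
    "ledger_account_bookings": [[{"price": "1.0"}, {"price": "2.0"}, {"price": "3.0"}]],
})

# Exploding one column after the other pairs every payment with every booking.
twice = df.explode("payments").explode("ledger_account_bookings")

# 2 payments x 3 bookings = 6 rows for what was a single parent row.
print(len(twice))
```

    Each payment now appears once per booking, so summing price_base over these rows would overcount.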

    How do I solve a problem like this, where I need to explode two columns at once? I've seen "Efficient way to unnest (explode) multiple list columns in a pandas DataFrame", but unfortunately the lists in payments and ledger_account_bookings can differ in size and are dynamic as well (e.g. a row can have 0-5 payments and 0-5 ledger_account_bookings; there is no fixed count).

    Any help would be greatly appreciated.
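
    One possible direction, sketched under assumptions rather than offered as a definitive answer: since the two lists are unrelated and of different lengths, each nested column could be flattened into its own table, keyed by the index of the row it came from, instead of forcing both into one exploded frame. The helper name `normalize_nested`, the `parent_id` key and the toy data below are hypothetical.

```python
import pandas as pd

def normalize_nested(df: pd.DataFrame, column: str, key: str = "parent_id") -> pd.DataFrame:
    """Return one row per dict in `column`, tagged with the parent row's index."""
    # Empty lists explode to NaN; dropna removes them so only real dicts remain.
    exploded = df[column].explode().dropna()
    flat = pd.json_normalize(exploded.tolist()).add_prefix(f"{column}_")
    # Remember which original row each dict belonged to.
    flat[key] = exploded.index.to_numpy()
    return flat

# Hypothetical data: row 0 has 2 payments and 1 booking, row 1 has 0 payments and 2 bookings.
df = pd.DataFrame({
    "payments": [[{"id": "a", "price_base": "400.0"}, {"id": "b", "price_base": "39.0"}], []],
    "ledger_account_bookings": [[{"id": "x"}], [{"id": "y"}, {"id": "z"}]],
})

payments = normalize_nested(df, "payments")
bookings = normalize_nested(df, "ledger_account_bookings")
```

    Each table keeps exactly one row per nested dict, and either one can be joined back to the parent rows on parent_id when needed, so no payment is repeated once per booking.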
