[Python] Getting data from two nested columns in one dataframe

Stack · September 13, 2024

I have a pandas dataframe in which two columns, payments and ledger_account_bookings, contain nested data. Each cell in these columns can hold a list of dicts. An example of an entry in the payments column:

[{'id': '426659696713139847', 'administration_id': '394860835892102964', 'price_base': '400.0'},
{'id': '426577578318365974', 'administration_id': '394860835892102964', 'price_base': '39.0'},
{'id': '426577578318365974', 'administration_id': '394860835892102964', 'price_base': '389.0'},
{'id': '426577578318365974', 'administration_id': '394860835892102964', 'price_base': '12.0'}]

I need one row for every dict in this column, so that I can use price_base (which differs for every entry, as you can see) as a field in my dataframe.

I know how to achieve this, for example by using the following code:

import pandas as pd

# One row per payment dict; the other columns are repeated for each exploded row
df_explode = df_financial_mutations.explode('payments')
# Normalize the json column into separate columns
df_normalized = pd.json_normalize(df_explode['payments'])
# Add prefix to the columns that were 'exploded'
df_normalized = df_normalized.add_prefix('payments_')

The problem is that I have another column, ledger_account_bookings, with similar nested data. If I call explode a second time, the result becomes murky: exploding payments has already introduced 'duplicate' rows, so every row that came from the same original record carries exactly the same list in ledger_account_bookings. When I then explode on that column, each of those 'duplicate' rows is exploded as well, and every payment ends up paired with every ledger booking. The dataframe then contains rows that don't make sense.
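To make the effect concrete, here is a minimal sketch on a made-up one-row frame (the mutation_id column and the dict contents are hypothetical, only the two nested column names come from my data). Two payments and three bookings turn into six rows:

```python
import pandas as pd

# Hypothetical toy data: a single record with 2 payments and 3 ledger bookings.
df = pd.DataFrame({
    "mutation_id": [1],
    "payments": [[{"price_base": "400.0"}, {"price_base": "39.0"}]],
    "ledger_account_bookings": [[{"price": "1.0"}, {"price": "2.0"}, {"price": "3.0"}]],
})

# First explode: one row per payment, the bookings list is copied onto each row.
once = df.explode("payments")

# Second explode: each copied bookings list is exploded again,
# so every payment is paired with every booking.
twice = once.explode("ledger_account_bookings")

print(len(once), len(twice))
```

The second row count is the product of the two list lengths, not their sum, which is where the meaningless combinations come from.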

How do I solve a problem like this, where I need to explode two columns at once? I've seen "Efficient way to unnest (explode) multiple list columns in a pandas DataFrame", but unfortunately the lists in payments and ledger_account_bookings can differ in length from each other and from row to row (e.g. a row can have 0-5 payments and 0-5 ledger_account_bookings; there is no fixed size).
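One direction that might work, sketched here under assumptions rather than as a confirmed solution: explode each nested column on its own into a separate long table that keeps the original row label as its index, so the two lists never get multiplied together. The helper name explode_nested, the mutation_id column and the sample dicts are all made up for illustration.

```python
import pandas as pd

def explode_nested(df, column):
    """Return one row per dict in `column`, indexed by the original row label.

    Hypothetical helper: rows whose list is empty contribute no rows.
    """
    # explode() turns empty lists into NaN; drop those before normalizing.
    exploded = df[column].explode().dropna()
    flat = pd.json_normalize(exploded.tolist()).add_prefix(f"{column}_")
    # Restore the original row labels so the table can be joined back later.
    flat.index = exploded.index
    return flat

# Made-up data with lists of different lengths, including an empty one.
df = pd.DataFrame(
    {
        "mutation_id": [1, 2],
        "payments": [
            [{"id": "a", "price_base": "400.0"}, {"id": "b", "price_base": "39.0"}],
            [],
        ],
        "ledger_account_bookings": [
            [{"id": "x", "price": "1.0"}],
            [{"id": "y", "price": "2.0"}, {"id": "z", "price": "3.0"}],
        ],
    },
    index=[10, 11],
)

payments = explode_nested(df, "payments")
bookings = explode_nested(df, "ledger_account_bookings")

# Each long table can be joined back to the scalar columns independently.
base = df.drop(columns=["payments", "ledger_account_bookings"])
payments_full = base.join(payments, how="inner")
bookings_full = base.join(bookings, how="inner")
```

This gives one table with a row per payment and another with a row per booking, both traceable to the original record through the index. It does not put payments and bookings side by side in a single frame, which would only make sense if there is a real relationship between an individual payment and an individual booking.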

Any help would be greatly appreciated.
