[Python] Pandas apply function behaving differently based on input size?

Discussion in 'Python' started by Stack, October 3, 2024 at 19:02.

    I have a function that works fine on a tiny pandas DataFrame and produces the expected adjustments. When I apply it to a non-test DataFrame, which is still small (300 rows × 20 columns), the results are completely messed up.

    Using the example below as a reference, it populates columns 3 to 7 ("has parent" through "is friends with") with the same list of values in every row of each column.

    To add to the confusion, when I take the code out of the function and run it inline as part of a single file, I get the expected results on the non-test DataFrame.

    Can anyone suggest what might be happening here, why the behaviour seems to vary, and what I am changing between the two cases?
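One possible explanation is offered here as an assumption, because the code that builds the 300 × 20 DataFrame is not shown. `populate_columns_from_Key_Value_String` returns nothing and works only by appending to the list objects already stored in the cells. In the test, each `[]` literal is a separate list. If the real columns were instead created with something like `[[]] * len(df)`, every row of a column would reference one shared list. Values appended for any row would then appear in all rows, which matches the symptom described above. The DataFrames and column in this sketch are illustrative only.

```python
import pandas as pd

# Hypothetical initialisation: `[[]] * n` repeats a reference to ONE list n times,
# so every row of the column holds the same list object.
shared = pd.DataFrame({"Names": ["Sarah", "John"]})
shared["has parent"] = [[]] * len(shared)

# Appending through row 0 is also visible in row 1, because both cells hold one object.
shared.loc[0, "has parent"].append("Maggie")
print(shared["has parent"].tolist())  # [['Maggie'], ['Maggie']]
print(shared.loc[0, "has parent"] is shared.loc[1, "has parent"])  # True

# Creating a new list per row (as the test's [[], []] literal does) avoids the aliasing.
independent = pd.DataFrame({"Names": ["Sarah", "John"]})
independent["has parent"] = [[] for _ in range(len(independent))]
independent.loc[0, "has parent"].append("Maggie")
print(independent["has parent"].tolist())  # [['Maggie'], []]
```

If this is the cause, creating the lists per row should make the real DataFrame behave like the test one. Alternatively, the function could return new values instead of mutating the existing ones.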

    Here is the code to reproduce the issue:

    import pandas as pd


    def test_populate_columns_from_Key_Value_String():
        starting_df = pd.DataFrame(
            {
                "Names": ["Sarah", "John"],
                "Relations": [
                    "has parent: Maggie, is parent of: Tom, is parent of: Grace, is parent of: Bart, is related to: Grandpa Simpson, is hated by: Joseph",
                    "is friends with: Tracey, has parent: Greg",
                ],
                "has parent": [[], []],
                "is parent of": [[], []],
                "is related to": [[], []],
                "is hated by": [[], []],
                "is friends with": [[], []],
            }
        )

        desire_df = pd.DataFrame(
            {
                "Names": ["Sarah", "John"],
                "Relations": [
                    "has parent: Maggie, is parent of: Tom, is parent of: Grace, is parent of: Bart, is related to: Grandpa Simpson, is hated by: Joseph",
                    "is friends with: Tracey, has parent: Greg",
                ],
                "has parent": [["Maggie"], ["Greg"]],
                "is parent of": [["Tom", "Grace", "Bart"], []],
                "is related to": [["Grandpa Simpson"], []],
                "is hated by": [["Joseph"], []],
                "is friends with": [[], ["Tracey"]],
            }
        )

        # The return value of apply() is discarded: the function works only by
        # appending to the list objects stored in starting_df's cells.
        starting_df.apply(
            populate_columns_from_Key_Value_String, axis=1, args=("Relations",)
        )

        print(starting_df)
        assert desire_df.loc[0, "has parent"] == starting_df.loc[0, "has parent"]
        assert starting_df.loc[0, "is hated by"] == ["Joseph"]
        assert starting_df.loc[1, "is friends with"] == ["Tracey"]
        assert starting_df.loc[0, "is parent of"] == desire_df.loc[0, "is parent of"]


    def populate_columns_from_Key_Value_String(
        row: pd.Series, column_to_extract_from="Column_name_with_Relational_String"
    ):
        string_to_extract_values_from = row[column_to_extract_from]
        if isinstance(string_to_extract_values_from, str):
            # Split "key: value, key: value, ..." into pairs and append each value
            # to the list held in the column whose name matches the key.
            for key_value_pair in string_to_extract_values_from.split(","):
                if isinstance(row[key_value_pair.split(":")[0].strip()], list):
                    row[key_value_pair.split(":")[0].strip()].append(
                        key_value_pair.split(":")[1].strip()
                    )
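For comparison, here is a sketch of the same "key: value" parsing written so that it does not depend on how the list columns were created. Instead of appending to lists already stored in the row, it builds new lists for every row and returns them as a `pd.Series`. `DataFrame.apply(..., axis=1)` then expands that Series into columns. This is not the poster's code: `extract_relations` and `RELATION_KEYS` are hypothetical names. Listing the keys explicitly is also an assumption; the original discovers them from the pre-created empty-list columns.

```python
import pandas as pd

# Hypothetical: the relation names are listed explicitly rather than taken
# from pre-created empty-list columns.
RELATION_KEYS = ["has parent", "is parent of", "is related to", "is hated by", "is friends with"]


def extract_relations(row: pd.Series, column: str = "Relations") -> pd.Series:
    """Parse 'key: value, key: value' in row[column] into one new list per key."""
    result = {key: [] for key in RELATION_KEYS}  # fresh list objects on every call
    text = row[column]
    if isinstance(text, str):
        for pair in text.split(","):
            key, sep, value = pair.partition(":")
            key = key.strip()
            if sep and key in result:  # skip pairs without ':' and unknown keys
                result[key].append(value.strip())
    return pd.Series(result)


df = pd.DataFrame(
    {
        "Names": ["Sarah", "John"],
        "Relations": [
            "has parent: Maggie, is parent of: Tom, is parent of: Grace, is parent of: Bart, "
            "is related to: Grandpa Simpson, is hated by: Joseph",
            "is friends with: Tracey, has parent: Greg",
        ],
    }
)

# Returning a Series from a row-wise apply expands into one column per key;
# join attaches those new columns to the original frame by index.
df = df.join(df.apply(extract_relations, axis=1))
print(df.loc[0, "is parent of"])  # ['Tom', 'Grace', 'Bart']
print(df.loc[1, "is friends with"])  # ['Tracey']
```

Nothing is mutated in place, so the result is the same whether or not any existing columns share list objects. Unknown keys and pairs without a ":" are skipped rather than raising an exception, as the indexing in `populate_columns_from_Key_Value_String` would.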
