
[Python] How to conditionally extract strings using different regex rules depending on column...

Discussion in 'Python' started by Stack, September 12, 2024.

  1. Stack

    Stack Active Member

    I'm trying to extract strings using different regex rules conditional on some column value in Python. For example, given the following dataframe:

       col1                 col2
    1  John Smith           First
    2  Jane Smith           First
    3  Pritchard James Doe  Second
    4  Helen Joanne Doe     Second
    5  Walker Jean          Last
    6  Hall Jensen          Last


    I want to obtain:

       col1                 col2    col3
    1  John Smith           First   John
    2  Jane Smith           First   Jane
    3  Pritchard James Doe  Second  James
    4  Helen Joanne Doe     Second  Joanne
    5  Walker Jean          Last    Jean
    6  Hall Jensen          Last    Jensen


    In R I would do something like:

    library(tidyverse)
    df <- data.frame(col1 = c("John Smith", "Jane Smith", "Pritchard James Doe",
                              "Helen Joanne Doe", "Walker Jean", "Hall Jensen"),
                     col2 = c("First", "First", "Second", "Second", "Last", "Last"))
    df %>%
      mutate(col3 = case_when(
        col2 == "First"  ~ str_extract(col1, "^[A-Za-z]+"),        # first word
        col2 == "Second" ~ str_extract(col1, "(?<=\\s)[A-Za-z]+"), # word after the first whitespace
        col2 == "Last"   ~ str_extract(col1, "[A-Za-z]+$")))       # last word


    However, I'm unsure how to achieve this in Python. I tried case_when in pandas, using either a lambda function or pd.Series.str.extract, but could not get it to work:

    import pandas as pd
    import re

    d = {'col1': ['John Smith', 'Jane Smith', 'Pritchard James Doe',
                  'Helen Joanne Doe', 'Walker Jean', 'Hall Jensen'],
         'col2': ['First', 'First', 'Second', 'Second', 'Last', 'Last']}
    df = pd.DataFrame(d)
    cl = [(df['col2'] == 'First', lambda x: re.search('(^[A-Za-z]+)', x).group()),
          (df['col2'] == 'Second', lambda x: re.search(r'(?<=\s)([A-Za-z]+)', x).group()),
          (df['col2'] == 'Last', lambda x: re.search('([A-Za-z]+$)', x).group())]
    df.assign(col3 = df['col1'].case_when(cl))
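
    One possible way to get the desired col3, offered as a sketch rather than a definitive answer: compute each extraction for the whole column with Series.str.extract, then pick per row according to col2. The variable names first, second, last and col3 below are illustrative only, and the Series.where / Series.mask chain is a substitute for case_when chosen here because it does not depend on a recent pandas version. A likely reason the attempt above fails is that a callable passed to case_when is applied to the whole Series, so re.search receives a Series rather than a single string.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["John Smith", "Jane Smith", "Pritchard James Doe",
             "Helen Joanne Doe", "Walker Jean", "Hall Jensen"],
    "col2": ["First", "First", "Second", "Second", "Last", "Last"],
})

# Run every rule on the full column; each result is a Series aligned with df.
# expand=False returns a Series (not a DataFrame) for a single capture group.
first = df["col1"].str.extract(r"^([A-Za-z]+)", expand=False)    # first word
second = df["col1"].str.extract(r"\s([A-Za-z]+)", expand=False)  # word after first whitespace
last = df["col1"].str.extract(r"([A-Za-z]+)$", expand=False)     # last word

# Choose the rule per row: start from `first` where col2 == "First" (NaN elsewhere),
# then overwrite the rows matching the other two conditions.
col3 = first.where(df["col2"] == "First")
col3 = col3.mask(df["col2"] == "Second", second)
col3 = col3.mask(df["col2"] == "Last", last)

df = df.assign(col3=col3)
print(df)
```

    Rows whose col2 matches none of the three labels are left as NaN in this sketch. On pandas 2.2 or later, the same three precomputed Series should also be usable as the replacements in Series.case_when, in place of the lambdas.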
