
[Python] How to check similarities between two datasets and return a score in Snowflake (is it...

Discussion in 'Python' started by Stack, September 14, 2024.

  1. Stack

    Stack, Participating Member

    I have two datasets containing the full names of my company's customers. Both are fairly large (40-70k rows). I would like to check for similarities between the two groups. For example, if one set has a record with the value 'John Smith' and the other has 'John W. Smith', I'd like to treat it as a possible match.

    Column A      Column B         Similarity Score
    John Smith    John W. Smith    0.97
    James Bond    Andrew Bond      0.5
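A minimal sketch of one way to make 'John Smith' and 'John W. Smith' score as close: normalize both names (lowercase, strip punctuation, drop single-letter middle initials) and compare their token sets with Jaccard similarity. This is a swapped-in technique, not something from the question, and the function names `normalize_name` and `token_jaccard` are hypothetical. Its scores will differ from the example values in the table above.

```python
import re


def normalize_name(name: str, drop_initials: bool = True) -> list[str]:
    """Lowercase a full name, remove apostrophes, and split it into letter-only tokens.
    Single-letter tokens (middle initials such as 'W.') are dropped if requested."""
    tokens = re.findall(r"[^\W\d_]+", name.lower().replace("'", ""))
    if drop_initials:
        tokens = [t for t in tokens if len(t) > 1]
    return tokens


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two names' token sets: |A & B| / |A | B|, from 0.0 to 1.0."""
    sa, sb = set(normalize_name(a)), set(normalize_name(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


print(token_jaccard("John Smith", "John W. Smith"))  # 1.0
print(token_jaccard("James Bond", "Andrew Bond"))    # 0.3333333333333333
```

Dropping initials makes 'John Smith' and 'John W. Smith' identical, which may be too lenient if two different customers share a first and last name; keeping initials (`drop_initials=False`) gives 2/3 for that pair instead.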

    The datasets have different sizes, and both are stored in Snowflake. Let me know if any information is missing.
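Because both datasets are large, comparing every pair is expensive: 40,000 × 70,000 is already 2.8 billion comparisons. A common way to cut this down is blocking, i.e. only comparing names that share a cheap key. The sketch below assumes the last token of a name (usually the surname) is a good enough key; `block_key` and `candidate_pairs` are hypothetical names, not part of any library.

```python
import re
from collections import defaultdict


def block_key(name: str) -> str:
    """Hypothetical blocking key: the last letter-only token, lowercased (usually the surname)."""
    tokens = re.findall(r"[^\W\d_]+", name.lower())
    return tokens[-1] if tokens else ""


def candidate_pairs(names_a, names_b):
    """Yield only the (a, b) pairs that share a blocking key,
    instead of all len(names_a) * len(names_b) pairs."""
    index = defaultdict(list)
    for b in names_b:
        index[block_key(b)].append(b)
    for a in names_a:
        for b in index.get(block_key(a), []):
            yield a, b


print(list(candidate_pairs(["John Smith", "James Bond"],
                           ["John W. Smith", "Andrew Bond", "Jane Doe"])))
# [('John Smith', 'John W. Smith'), ('James Bond', 'Andrew Bond')]
```

The trade-off is recall: a pair whose surnames differ or are misspelled is never compared, so the key should be chosen (or several keys combined) with the data's typical errors in mind.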

    Also, do you think this is solvable with SQL, or is Python more suitable?

    I tried using Jaro-Winkler similarity, but it only compares two individual strings, not whole datasets.
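Jaro-Winkler itself only compares two strings, but it can still be applied to whole datasets by evaluating it over pairs of rows and keeping the best-scoring match for each name. The sketch below implements the standard Jaro-Winkler formula in pure Python (prefix scale 0.1, shared prefix capped at 4 characters) and runs a brute-force best-match search; `jaro`, `jaro_winkler`, `best_matches` and the 0.9 threshold are assumptions for illustration. Brute force costs len(A) × len(B) comparisons, so on 40-70k rows it would normally be combined with a candidate-reduction step such as blocking.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they lie within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by p for each shared leading character (up to max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def best_matches(names_a, names_b, threshold=0.9):
    """For each name in names_a, find the most similar name in names_b (brute force,
    case-insensitive). Returns (name_a, name_b, score) tuples with score >= threshold."""
    results = []
    for a in names_a:
        best_b, best_score = None, 0.0
        for b in names_b:
            score = jaro_winkler(a.lower(), b.lower())
            if score > best_score:
                best_b, best_score = b, score
        if best_b is not None and best_score >= threshold:
            results.append((a, best_b, round(best_score, 2)))
    return results


print(best_matches(["John Smith"], ["John W. Smith", "Andrew Bond"]))
# [('John Smith', 'John W. Smith', 0.95)]
```

This sketch applies the prefix boost unconditionally; some implementations only apply it when the Jaro score exceeds 0.7. If you compare against Snowflake's built-in `JAROWINKLER_SIMILARITY`, expect a different scale (it returns an integer from 0 to 100) and possibly small differences in detail.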

