[Python] How to query a large file using pandas (or an alternative)?

Discussion in 'Python' started by Stack, September 28, 2024 at 06:52.

  1. Stack

    Stack Active Member

    I have a large file (6 GB, around 17 million lines) with 5 columns. I want to use the first column as a key and retrieve the other 4 columns for it, and I want to be able to look up multiple keys. My code so far is this:

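    import pandas as pd

    # Read the compressed file lazily, one million rows per chunk
    df = pd.read_csv('large_file.gz',
                     chunksize=1000000)

    key = "find_this"  # an element of column 1

    found = None  # stays None if no chunk contains the key
    for data in df:
        if key in data['col1'].tolist():
            found = data[data['col1'] == key]
            break  # stop at the first chunk containing the key

    print(found)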


    How could I speed this up, and also make it work for multiple keys?
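
One possible direction for the multi-key case, as a minimal sketch rather than a definitive answer: scan the file once in chunks and filter each chunk with `Series.isin`, so all keys are matched in a single pass. The helper name `find_keys` and its parameters are hypothetical. It assumes the key column is named `col1` as in the code above and that the keys have the same type as the values pandas infers for that column.

```python
import pandas as pd


def find_keys(path, keys, key_col="col1", chunksize=1_000_000):
    """Return every row whose key_col value is in keys (hypothetical helper).

    The file is read in chunks, so only one chunk plus the matching rows
    are held in memory at a time.
    """
    wanted = set(keys)  # duplicates in the input keys are irrelevant
    hits = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        mask = chunk[key_col].isin(wanted)  # vectorized membership test
        if mask.any():
            hits.append(chunk[mask])
    if not hits:
        return pd.DataFrame()  # none of the keys occur in the file
    return pd.concat(hits, ignore_index=True)


# Usage (file name taken from the question, keys are placeholders):
# found = find_keys("large_file.gz", ["find_this", "and_this"])
# print(found)
```

Unlike the loop in the question, this sketch does not stop at the first match, because a key may occur in more than one chunk. If each key is known to occur only once, the loop could break as soon as every key has been found.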
