
[Python] Performance of reading an .ods (OpenDocument Spreadsheet) file with Python 3 / pandas

Discussion in 'Python' started by Stack, November 5, 2024 at 13:22.


    Environment:


    inside a Docker container on WSL2 / Ubuntu 22.04
    python 3.12
    pandas 2.2.2
    odfpy 1.4.1
    openpyxl 3.1.3


    The .ods file on disk is 6.8 MB and has two sheets, one with 16,000 rows and the other with 74,000 rows. MS Excel opens this file almost instantly.

    I use the following code to read the file, whose content has first been read into a bytes variable (file_content):

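    import time
    from io import BytesIO
    import pandas as pd

    # file_content holds the raw bytes of the .ods file, read beforehand
    t1 = time.perf_counter()
    excel = pd.ExcelFile(BytesIO(file_content), engine="odf")
    t2 = time.perf_counter()
    data = pd.read_excel(BytesIO(file_content), engine="odf")
    t3 = time.perf_counter()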


    These are the values of t1, t2 and t3 (in seconds, as returned by time.perf_counter()) after the file content has been read into excel and data:


    t1: 366108.7721855
    t2: 366606.7265884
    t3: 367100.7166519
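
Because time.perf_counter() returns seconds, the duration of each step is the difference between consecutive readings. A quick check of that arithmetic on the values above (the names excel_file_seconds, read_excel_seconds and total_seconds are only illustrative and do not appear in the original code):

```python
# Readings copied from the post; time.perf_counter() reports seconds.
t1 = 366108.7721855
t2 = 366606.7265884
t3 = 367100.7166519

excel_file_seconds = t2 - t1   # time spent constructing the ExcelFile
read_excel_seconds = t3 - t2   # time spent in pd.read_excel
total_seconds = t3 - t1        # both steps together

for label, seconds in [
    ("ExcelFile", excel_file_seconds),
    ("read_excel", read_excel_seconds),
    ("total", total_seconds),
]:
    # Show each duration in seconds and in minutes.
    print(f"{label}: {seconds:.1f} s ({seconds / 60:.1f} min)")
```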


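    Each call takes a little over 8 minutes (about 498 s for ExcelFile and about 494 s for read_excel), whether the data is read as an ExcelFile or as a pandas DataFrame. Is there anything I can tune to improve the read performance?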
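
To narrow down where the time goes, one option is to run a single call under the standard-library cProfile module and look at the functions with the highest cumulative time. The sketch below is only an assumption about how that could be set up: profile_call is a hypothetical helper, and slow_reader is a stand-in for the real call so that the snippet runs without the .ods file.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_reader(n):
    # Hypothetical stand-in for the slow call, e.g. pd.read_excel(...).
    return sum(i * i for i in range(n))


result, report = profile_call(slow_reader, 100_000)
print(report)
```

With the code from the post, the equivalent call would be profile_call(pd.read_excel, BytesIO(file_content), engine="odf"). This has not been run against the file in question, so what the report shows for it is unknown.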
