
[Python] FTP server - caged

Discussion in 'Python' started by Stack, September 28, 2024 at 07:42.


    I am trying to access an FTP server to download the CAGED files, but the downloaded files appear to be corrupted: when I try to read them, I get the message "Corrupt input data". I'm unsure whether the problem is in my code or in the FTP server that the script downloads the data from.

    My question is: has anyone had a similar problem with an FTP server?

    This is the FTP link that I'm trying to access: ftp://ftp.mtps.gov.br/pdet/microdados/

    The "dicionario_dados.xlsx" file used below can be downloaded from the same FTP server, at: ftp://ftp.mtps.gov.br/pdet/microdad...mentações/Layout Novo Caged Movimentação.xlsx

    from os import remove
    from py7zr import SevenZipFile
    import pandas as pd
    import wget
    import numpy as np
    import warnings



    excel = pd.ExcelFile("data/dicionario_dados.xlsx")
    get_dict = lambda x: pd.read_excel(excel, sheet_name=x)

    data_dict = {
        sheet: {row[1]: row[2] for row in get_dict(sheet).itertuples()}
        for sheet in excel.sheet_names[1:]
    }


    url = lambda year, month: f"ftp://ftp.mtps.gov.br/pdet/microdados/NOVO CAGED/{year}/{year}{month:02d}/CAGEDMOV{year}{month:02d}.7z"



    dfs = []
    start_year = 2020
    start_month = 4
    dates = []
    #dates = data["competênciamov"].unique()


    for year in range(start_year, 2025):
        for month in range(start_month, 13):
            # Skip months that are already listed in `dates`.
            if f"{year}-{month:02d}-01" in dates:
                continue
            try:
                print(f"{month:02d}/{year}")
                wget.download(url(year, month), 'caged.7z')
                archive = SevenZipFile('caged.7z', mode='r')
                print('Microdata downloaded successfully, ready for reading')
                # Read every .txt member of the archive.
                for name, fd in archive.read(name for name in archive.getnames() if name.endswith(".txt")).items():
                    caged_raw = pd.read_csv(fd, delimiter=";", decimal=",")
                    # Keep only the rows where uf == 25.
                    caged_raw = caged_raw.loc[caged_raw["uf"] == 25, :].reset_index(drop=True)
                    for col in caged_raw.columns:
                        if col in data_dict:
                            # Keep the raw code in "<col>_cod" and replace the column with its label.
                            caged_raw[f"{col}_cod"] = caged_raw[col]
                            caged_raw[col] = caged_raw[col].apply(lambda x: data_dict[col][x]
                                                                  if x in data_dict[col] else np.nan)

                    dfs.append(caged_raw)
                archive.close()
                remove('caged.7z')
                print('Reading completed successfully')
            except Exception as e:
                print(f'Error processing {month:02d}/{year}: {e}')
                print('Microdata for the selected month is not yet available')
                # Stop trying further months of this year.
                break


    Output of the code:

    04/2020
    Microdata downloaded successfully, ready for reading
    Error processing 04/2020: Corrupt input data
    Microdata for the selected month is not yet available
    04/2021
    Microdata downloaded successfully, ready for reading
    Error processing 04/2021: Corrupt input data
    Microdata for the selected month is not yet available
    04/2022
    Microdata downloaded successfully, ready for reading
    Error processing 04/2022: Corrupt input data
    Microdata for the selected month is not yet available
    04/2023
    Microdata downloaded successfully, ready for reading
    Error processing 04/2023: Corrupt input data
    Microdata for the selected month is not yet available
    04/2024
    Microdata downloaded successfully, ready for reading
    Error processing 04/2024: Corrupt input data
    Microdata for the selected month is not yet available
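The output lists only April of each year because the `break` in the `except` branch leaves the month loop at the first failure, and `range(start_month, 13)` starts again at month 4 for every year (so January to March of later years are never requested). A minimal sketch of one way to walk every month and record failures instead of stopping; `month_range` and `process_all` are hypothetical helper names, not part of the original script:

```python
def month_range(start_year, start_month, end_year, end_month):
    """Yield (year, month) pairs from the start to the end, inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        # Roll over to January of the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1


def process_all(pairs, process):
    """Call process(year, month) for each pair and collect failures."""
    failures = {}
    for year, month in pairs:
        try:
            process(year, month)
        except Exception as exc:
            # Record the error and continue with the next month.
            failures[(year, month)] = str(exc)
    return failures
```

Here `process` would be a function wrapping the download-and-read steps of the loop above. The returned dictionary then shows at a glance whether every month fails with the same message or only some of them do.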


    I have tried modifying the code and checking the files on the FTP server.
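One way to narrow down whether the download or the decoding is at fault is to fetch the file in binary mode with the standard library and check that it starts with the 7z signature. This is only a sketch: `download_binary` and `looks_like_7z` are hypothetical helper names, and the anonymous login is an assumption about the server.

```python
from ftplib import FTP

# First six bytes of every 7z archive (the format's signature).
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def download_binary(host, remote_path, local_path):
    """Download one file over FTP in binary mode."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login; an assumption about the server
        with open(local_path, "wb") as fh:
            ftp.retrbinary(f"RETR {remote_path}", fh.write)


def looks_like_7z(local_path):
    """Return True if the file starts with the 7z signature."""
    with open(local_path, "rb") as fh:
        return fh.read(len(SEVEN_ZIP_MAGIC)) == SEVEN_ZIP_MAGIC
```

For example, `download_binary("ftp.mtps.gov.br", "/pdet/microdados/NOVO CAGED/2020/202004/CAGEDMOV202004.7z", "caged.7z")` followed by `looks_like_7z("caged.7z")`. A `False` result suggests the wrong content was fetched. A `True` result does not prove the archive is intact (it may still be truncated), but it points toward the archive contents or the way they are decoded rather than toward a wrong file.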
