
[Python] Monkeypatch Extract step in ETL data pipeline for functional testing

Discussion in 'Python' started by Stack, September 12, 2024.


Consider an ETL pipeline repo built like this:

    etl_repo
    ├── app
    │   ├── extract
    │   │   ├── extr_a.py
    │   │   └── extr_b.py
    │   ├── transform
    │   │   ├── trans_a.py
    │   │   └── trans_b.py
    │   ├── load
    │   │   ├── load_a.py
    │   │   └── load_b.py
    │   ├── config.py
    │   └── my_job1.py
    └── tests
        └── test_my_job1.py


On a production server, I run python app/my_job1.py on a periodic basis. The job(s) import functions from the different ETL modules stored in the repo (extract, transform and load). I have unit test coverage for the ETL modules, but I would like functional (end-to-end) testing for the actual job(s).
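For concreteness, assume the job looks roughly like this (the function names are placeholders for illustration, not my real code):

    # app/my_job1.py -- hypothetical sketch; fetch_records, clean_records and
    # write_records are placeholder names, not the actual functions
    from extract.extr_a import fetch_records    # hits network resources in production
    from transform.trans_a import clean_records
    from load.load_a import write_records

    def main():
        raw = fetch_records()
        cleaned = clean_records(raw)
        write_records(cleaned)

    if __name__ == "__main__":
        main()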

I learned about pytest's monkeypatch fixture to load static data instead of relying on my extract network resources. It is working as expected.
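A minimal sketch of that working setup, reusing the placeholder names from above:

    # tests/test_with_static_data.py -- sketch of the unit-level patching that works
    import extract.extr_a
    from transform.trans_a import clean_records  # placeholder name, as above

    def test_clean_records_with_static_data(monkeypatch):
        static_rows = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
        # Swap the network-bound extractor for a stub, for this test only
        monkeypatch.setattr(extract.extr_a, "fetch_records", lambda: static_rows)
        raw = extract.extr_a.fetch_records()  # returns static data, no network call
        assert clean_records(raw)             # exercises the real transform code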

However, I cannot figure out the best way to monkeypatch my extract modules and have the test execute the python app/my_job1.py command, as if it were running in production.

I would like to avoid copying the full job into another test function that uses the monkeypatch fixture. Although that technically works, it would be painful to modify both the job and its test every single time.

The functional test has to be as close as possible to what the production system does.
I tried using subprocess to create a child process from inside the test method, but the child process does not inherit the monkeypatched imports (see the sketch below).
I would like to avoid injecting test code/imports into my_job1.py behind conditions like if Config.ETL_ENV == "TEST", just to keep a clean separation between code and tests.
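For reference, the failing subprocess attempt looked roughly like this (placeholder names again). The monkeypatch only exists inside the pytest process, while subprocess starts a fresh interpreter that re-imports the real, unpatched extract modules:

    # tests/test_my_job1.py -- sketch of the attempt that does not work
    import subprocess
    import sys
    import extract.extr_a

    def test_my_job1_end_to_end(monkeypatch):
        # This patch lives only in the current (pytest) process...
        monkeypatch.setattr(extract.extr_a, "fetch_records", lambda: [])
        # ...but the child below is a brand-new interpreter, so it imports the
        # real extract modules and hits the network again
        result = subprocess.run([sys.executable, "app/my_job1.py"])
        assert result.returncode == 0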

