
[Python] Monkeypatch Extract step in ETL data pipeline for functional testing

Discussion in 'Python' started by Stack, September 12, 2024.


Consider an ETL pipeline repo built like this:

    etl_repo
    ├── app
    │   ├── extract
    │   │   ├── extr_a.py
    │   │   └── extr_b.py
    │   ├── transform
    │   │   ├── trans_a.py
    │   │   └── trans_b.py
    │   ├── load
    │   │   ├── load_a.py
    │   │   └── load_b.py
    │   ├── config.py
    │   └── my_job1.py
    └── tests
        └── test_my_job1.py


On a production server, I run python app/my_job1.py on a periodic basis. The job(s) import functions from the different ETL modules stored in the repo (extract, transform and load). I have unit test coverage for the ETL modules, but I would like functional (end-to-end) testing for the actual job(s).
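For concreteness, assume the job looks roughly like this (the function names are placeholders for illustration, not my real code):

    # app/my_job1.py -- hypothetical sketch; fetch_records, clean_records and
    # write_records are placeholder names, not the actual functions
    from extract.extr_a import fetch_records    # hits network resources in production
    from transform.trans_a import clean_records
    from load.load_a import write_records

    def main():
        raw = fetch_records()
        cleaned = clean_records(raw)
        write_records(cleaned)

    if __name__ == "__main__":
        main()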

I learned about pytest's monkeypatch fixture to load static data instead of relying on my extract network resources. It is working as expected.
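A minimal sketch of that working setup, reusing the placeholder names from above:

    # tests/test_with_static_data.py -- sketch of the unit-level patching that works
    import extract.extr_a
    from transform.trans_a import clean_records  # placeholder name, as above

    def test_clean_records_with_static_data(monkeypatch):
        static_rows = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
        # Swap the network-bound extractor for a stub, for this test only
        monkeypatch.setattr(extract.extr_a, "fetch_records", lambda: static_rows)
        raw = extract.extr_a.fetch_records()  # returns static data, no network call
        assert clean_records(raw)             # exercises the real transform code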

However, I cannot figure out the best way to monkeypatch my extract modules and have the test execute the python app/my_job1.py command, as if it were running in production.

I would like to avoid copying the full job into another test function that uses the monkeypatch fixture. Although that technically works, it would be painful to modify both the job and its test every single time.

The functional test has to be as close as possible to what the production system does.
I tried using subprocess to create a child process from inside the test method, but the child process does not inherit the monkeypatched imports (see the sketch below).
I would like to avoid injecting test code/imports into my_job1.py behind conditions like if Config.ETL_ENV == "TEST", just to keep a clean separation between code and tests.
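For reference, the failing subprocess attempt looked roughly like this (placeholder names again). The monkeypatch only exists inside the pytest process, while subprocess starts a fresh interpreter that re-imports the real, unpatched extract modules:

    # tests/test_my_job1.py -- sketch of the attempt that does not work
    import subprocess
    import sys
    import extract.extr_a

    def test_my_job1_end_to_end(monkeypatch):
        # This patch lives only in the current (pytest) process...
        monkeypatch.setattr(extract.extr_a, "fetch_records", lambda: [])
        # ...but the child below is a brand-new interpreter, so it imports the
        # real extract modules and hits the network again
        result = subprocess.run([sys.executable, "app/my_job1.py"])
        assert result.returncode == 0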

