
[SQL] How to efficiently FOR-loop over a huge SELECT result set in PL/pgSQL

Discussion in 'Outras Linguagens' started by Stack, October 25, 2024 at 09:02.

  1. Stack (Membro Participativo)

    In a PL/pgSQL stored procedure, I would like to calculate multiple averages (simple, weighted, etc) over different periods of time and over different number of data points, all in a single loop.

    Therefore, I cannot set a LIMIT boundary on the SELECT that the loop iterates over, because even the largest time period may still not have enough data points; I can't use LIMIT 1000, nor can I use WHERE date_created >= (one year ago).

    Thus I would like to leave the SELECT query unbounded, loop through it until I have calculated all the averages I need over all the chunks of data (time- or quantity-based), and then simply call EXIT to stop the loop.

    My concern is: what if, without LIMIT or WHERE constraints, the SELECT matches 100 million rows while I only end up needing the first 10,000 of them? Will my PL/pgSQL stored procedure actually build the huge 100-million-row result set and then loop over it, or will it fetch rows only as needed?

    I do understand that the FOR loop uses cursors internally. But my question is about the result set that needs to be built before a cursor can loop through it. So, is it a bad idea to run a FOR loop over a SELECT query with no LIMIT/WHERE constraints that could potentially return 100M rows, and then EXIT as needed? Or should I just process the data in chunks of ~1k rows?
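    For comparison, the chunked alternative mentioned above could be sketched with keyset pagination, re-issuing the query with LIMIT and advancing a boundary value. The table name samples, the value column, and the batch size are hypothetical stand-ins, and the sketch assumes date_created values are distinct (ties would need an extra tiebreaker column):

    ```sql
    DO $$
    DECLARE
        rec record;
        last_seen timestamptz := now();  -- plays the role of START_TIME
        got_rows boolean;
    BEGIN
        LOOP
            got_rows := false;
            -- Fetch the next batch of ~1k rows, walking backwards in time.
            FOR rec IN
                SELECT value, date_created
                  FROM samples
                 WHERE date_created < last_seen
                 ORDER BY date_created DESC
                 LIMIT 1000
            LOOP
                got_rows := true;
                last_seen := rec.date_created;  -- advance the keyset boundary
                -- ... update running sums/averages from rec.value here ...
            END LOOP;
            EXIT WHEN NOT got_rows;  -- no more rows: done
        END LOOP;
    END $$;
    ```

    Each batch is a cheap, index-friendly query (given an index on date_created), at the cost of re-planning the query once per chunk.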

    The SELECT statement looks backwards starting from some point in time, and that is why it has:

    1. where date_created < START_TIME
    2. order by date_created desc
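    The unbounded-loop approach described above could be sketched as follows. The table name samples, the value column, and the 10,000-row cutoff are hypothetical stand-ins for whatever the real schema and stopping condition are:

    ```sql
    CREATE OR REPLACE FUNCTION calc_averages(start_time timestamptz)
    RETURNS void AS $$
    DECLARE
        rec record;
        n bigint := 0;
        running_sum numeric := 0;
    BEGIN
        -- No LIMIT: the loop itself decides when to stop.
        FOR rec IN
            SELECT value, date_created
              FROM samples
             WHERE date_created < start_time
             ORDER BY date_created DESC
        LOOP
            n := n + 1;
            running_sum := running_sum + rec.value;
            -- ... update the other (weighted, windowed) accumulators here ...
            EXIT WHEN n >= 10000;  -- stop as soon as all averages are computed
        END LOOP;
        IF n > 0 THEN
            RAISE NOTICE 'simple average over % rows: %', n, running_sum / n;
        END IF;
    END;
    $$ LANGUAGE plpgsql;
    ```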

