1. Anuncie Aqui ! Entre em contato fdantas@4each.com.br

[SQL] How to set LIMIT for MATERIALIZED without losing data?

Discussão em 'Outras Linguagens' iniciado por Stack, Outubro 17, 2024 às 22:12.

  1. Stack

    Stack Membro Participativo


    There are transactions table and logs table. logs are linked to transactions by transaction_id. I need to query logs by address, join it with transactions, aggregate logs to array, LIMIT transactions (example is LIMIT 2) and FETCH ALL LOGS that were in that transaction (but query only by one address field).

    create table transactions
    (id int,
    hash varchar);

    create table logs
    (transaction_id int,
    address varchar,
    value varchar
    );

    create index on logs(address);

    insert into transactions values
    (1, 'h1'),
    (2, 'h2'),
    (3, 'h3'),
    (4, 'h4'),
    (5, 'h5')
    ;

    insert into logs values
    (1, 'a1', 'h1.a1.1'),
    (1, 'a1', 'h1.a1.2'),
    (1, 'a3', 'h1.a3.1'),
    (2, 'a1', 'h2.a1.1'),
    (2, 'a2', 'h2.a2.1'),
    (2, 'a2', 'h2.a2.2'),
    (2, 'a3', 'h2.a3.1'),
    (3, 'a2', 'h3.a2.1'),
    (4, 'a1', 'h4.a1.1'),
    (5, 'a2', 'h5.a2.1'),
    (5, 'a3', 'h5.a3.1')
    ;


    Result must be with query WHERE log.address='a2' LIMIT 2:

    id logs_array
    2 [{"address":"a1","value":"h2.a1.1"},{"address":"a2","value":"h2.a2.1"},{"address":"a2","value":"h2.a2.2"},{"address":"a3","value":"h2.a3.1"}]
    3 [{"address":"a2","value":"h3.a2.1"}]


    Problem: sql query below works correct, but on very high amount of logs (100k+ logs for 1 address) it can take many minutes for search. The solution would be set LIMIT in MATERIALIZED, but in that case I can get transactions with not fully correct list of logs. How to fix? Either rewrite query without MATERIALIZED and use multiple SELECT inside each other, but I don't know how, or fix with MATERIALIZED.

    So problem is that postgres does not understand correct in MATERIALIZED that I need limited number of transactions, it first searches all logs, only then append them to transactions with limit (as I guess). Index on address logs is set.

    WITH
    b AS MATERIALIZED (
    SELECT lg.transaction_id
    FROM logs lg
    WHERE lg.address='a2'

    -- this must be commented, otherwise not correct results, although fast execution
    -- LIMIT 2
    )
    SELECT
    id,
    (SELECT array_agg(JSON_BUILD_OBJECT('address',address,'value',value)) FROM logs WHERE transaction_id = t.id) logs_array
    FROM transactions t
    WHERE t.id IN
    (SELECT transaction_id FROM b)
    LIMIT 2


    Real-world example, query was executing ~30 seconds:

    EXPLAIN WITH
    b AS MATERIALIZED (
    SELECT lg.transaction_id
    FROM _logs lg
    WHERE lg.address in ('0xca530408c3e552b020a2300debc7bd18820fb42f', '0x68e78497a7b0db7718ccc833c164a18d8e626816')
    )
    SELECT
    (SELECT array_agg(JSON_BUILD_OBJECT('address',address)) FROM _logs WHERE transaction_id = t.id) logs_array
    FROM _transactions t
    WHERE t.id IN
    (SELECT transaction_id FROM b)
    LIMIT 5000;
    QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------------------------
    Limit (cost=87540.62..3180266.26 rows=5000 width=32)
    CTE b
    -> Index Scan using _logs_address_idx on _logs lg (cost=0.70..85820.98 rows=76403 width=8)
    Index Cond: ((address)::text = ANY ('{0xca530408c3e552b020a2300debc7bd18820fb42f,0x68e78497a7b0db7718ccc833c164a18d8e626816}'::text[]))
    -> Nested Loop (cost=1719.64..47260423.09 rows=76403 width=32)
    -> HashAggregate (cost=1719.07..1721.07 rows=200 width=8)
    Group Key: b.transaction_id
    -> CTE Scan on b (cost=0.00..1528.06 rows=76403 width=8)
    -> Index Only Scan using _transactions_pkey on _transactions t (cost=0.57..2.79 rows=1 width=8)
    Index Cond: (id = b.transaction_id)
    SubPlan 2
    -> Aggregate (cost=618.53..618.54 rows=1 width=32)
    -> Index Scan using _logs_transaction_id_idx on _logs (cost=0.57..584.99 rows=6707 width=43)
    Index Cond: (transaction_id = t.id)
    JIT:
    Functions: 17
    Options: Inlining true, Optimization true, Expressions true, Deforming true
    (17 rows)

    Continue reading...

Compartilhe esta Página