
[SQL] Slow query with filter on Postgres (using index, requiring many different filters)

Discussion in 'Outras Linguagens' started by Stack, October 26, 2024 at 06:22.

  1. Stack

    Stack Participating Member

    I have a table:

    create table accounts_service.operation_history
    (
        history_id     bigint generated always as identity primary key,
        operation_id   varchar(36) not null unique,
        operation_type varchar(30) not null,
        operation_time timestamp with time zone default now() not null,
        from_phone     varchar(20),
        user_id        varchar(21),
        -- plus many more varchar(x) and text columns, and a few numeric, boolean, jsonb and timestamp columns
    );


    create index operation_history_user_id_operation_time_idx
    on accounts_service.operation_history (user_id, operation_time);

    create index operation_history_operation_time_idx
    on accounts_service.operation_history (operation_time);


    I want to run a simple select with a where filter on operation_time (this filter is always present and usually spans a day or two), plus additional filters on other columns, most commonly of varchar(x) type.

    But my queries are slow:

    explain (buffers, analyze)
    select *
    from operation_history operationh0_
    where (null is null or operationh0_.user_id = null)
    and operationh0_.operation_time >= '2024-09-30 20:00:00.000000 +00:00'
    and operationh0_.operation_time <= '2024-10-02 20:00:00.000000 +00:00'
    and (operationh0_.from_phone = '+000111223344')
    order by operationh0_.operation_time asc, operationh0_.history_id asc
    limit 25;

    Limit  (cost=8063.39..178328.00 rows=25 width=1267) (actual time=174373.106..174374.395 rows=0 loops=1)
      Buffers: shared hit=532597 read=1433916
      I/O Timings: read=517880.241
      ->  Incremental Sort  (cost=8063.39..198759.76 rows=28 width=1267) (actual time=174373.105..174374.394 rows=0 loops=1)
            Sort Key: operation_time, history_id
            Presorted Key: operation_time
            Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
            Buffers: shared hit=532597 read=1433916
            I/O Timings: read=517880.241
            ->  Gather Merge  (cost=1000.60..198758.50 rows=28 width=1267) (actual time=174373.099..174374.388 rows=0 loops=1)
                  Workers Planned: 2
                  Workers Launched: 2
                  Buffers: shared hit=532597 read=1433916
                  I/O Timings: read=517880.241
                  ->  Parallel Index Scan using operation_history_operation_time_idx on operation_history operationh0_  (cost=0.57..197755.24 rows=12 width=1267) (actual time=174362.932..174362.933 rows=0 loops=3)
                        Index Cond: ((operation_time >= '2024-09-30 20:00:00+00'::timestamp with time zone) AND (operation_time <= '2024-10-02 20:00:00+00'::timestamp with time zone))
                        Filter: ((from_phone)::text = '+000111223344'::text)
                        Rows Removed by Filter: 723711
                        Buffers: shared hit=532597 read=1433916
                        I/O Timings: read=517880.241
    Planning Time: 0.193 ms
    Execution Time: 174374.449 ms

    -- For simplicity:
    set max_parallel_workers_per_gather = 0;
    -- This only simplifies the plan; the numbers stay representative. Re-run the previous query:

    Limit  (cost=7535.40..189179.35 rows=25 width=1267) (actual time=261432.728..261432.729 rows=0 loops=1)
      Buffers: shared hit=374346 read=1591362
      I/O Timings: read=257253.065
      ->  Incremental Sort  (cost=7535.40..210976.63 rows=28 width=1267) (actual time=261432.727..261432.727 rows=0 loops=1)
            Sort Key: operation_time, history_id
            Presorted Key: operation_time
            Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
            Buffers: shared hit=374346 read=1591362
            I/O Timings: read=257253.065
            ->  Index Scan using operation_history_operation_time_idx on operation_history operationh0_  (cost=0.57..210975.37 rows=28 width=1267) (actual time=261432.720..261432.720 rows=0 loops=1)
                  Index Cond: ((operation_time >= '2024-09-30 20:00:00+00'::timestamp with time zone) AND (operation_time <= '2024-10-02 20:00:00+00'::timestamp with time zone))
                  Filter: ((from_phone)::text = '+000111223344'::text)
                  Rows Removed by Filter: 2171134
                  Buffers: shared hit=374346 read=1591362
                  I/O Timings: read=257253.065
    Planning Time: 0.170 ms
    Execution Time: 261432.774 ms


    So it only had to filter out 2,171,134 rows, yet it took more than 4 minutes. That seems too long, doesn't it?
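    As a rough sanity check on the numbers above (assuming the default 8 kB block size, which is not stated in the plan itself): the ~1.59 million blocks read from disk amount to roughly 12 GB, and the reported I/O time covers about 257 of the ~261 seconds, so nearly all of the time is spent on disk reads.

    -- Back-of-the-envelope check of the disk reads reported in the serial plan above.
    -- Assumes the default block size; current_setting('block_size') returns it in bytes.
    select pg_size_pretty(1591362 * current_setting('block_size')::bigint) as read_from_disk;
    -- ≈ 12 GB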

    I tried selecting only specific columns (e.g. operation_time, from_phone, to_phone, history_id); it had no effect. I tried vacuum analyze; it had no effect.
    I also checked several Postgres parameters such as shared_buffers and work_mem; changing them had no effect, and the values match what pgTune recommends.
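    For reference, roughly what those checks looked like (the exact invocations may have differed):

    -- roughly the maintenance and configuration checks described above
    vacuum analyze accounts_service.operation_history;
    show shared_buffers;
    show work_mem;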

    Some additional info:


    SELECT relpages, pg_size_pretty(pg_total_relation_size(oid)) AS table_size
    FROM pg_class
    WHERE relname = 'operation_history';

    relpages | table_size
    18402644 | 210 GB

    select count(*) from operation_history;

    352402877


    Server drives: AWS gp3
    I don't want to create indexes on all of these filter columns, because this table receives massive writes...
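    For illustration only, this is the kind of extra per-filter index I am reluctant to add (the index name and column pair below are just an example, not something that exists):

    -- example only: a per-filter multicolumn index; name and columns are illustrative
    create index operation_history_from_phone_operation_time_idx
        on accounts_service.operation_history (from_phone, operation_time);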

    Is there any way to optimize this?
    Or is this volume of reads from the index and the table simply expected, meaning we need to move to sharding, etc.?

    Continue reading...
