
[SQL] Slow query with filter on Postgres (using index, requiring many different filters)

Discussion in 'Outras Linguagens' started by Stack, October 26, 2024 at 06:22.

  1. Stack

    Stack Participating Member

    I have a table:

    create table accounts_service.operation_history
    (
        history_id     bigint generated always as identity primary key,
        operation_id   varchar(36) not null unique,
        operation_type varchar(30) not null,
        operation_time timestamp with time zone default now() not null,
        from_phone     varchar(20),
        user_id        varchar(21),
        -- plus many more varchar(x) and text columns, and a few numeric, boolean, jsonb and timestamp columns
    );


    create index operation_history_user_id_operation_time_idx
    on accounts_service.operation_history (user_id, operation_time);

    create index operation_history_operation_time_idx
    on accounts_service.operation_history (operation_time);


    I want to run a simple select with a where filter on operation_time (this filter is always present and usually spans a day or two), plus additional filters on other columns, most commonly of varchar(x) type.

    But my queries are slow:

    explain (buffers, analyze)
    select *
    from operation_history operationh0_
    where (null is null or operationh0_.user_id = null)
    and operationh0_.operation_time >= '2024-09-30 20:00:00.000000 +00:00'
    and operationh0_.operation_time <= '2024-10-02 20:00:00.000000 +00:00'
    and (operationh0_.from_phone = '+000111223344')
    order by operationh0_.operation_time asc, operationh0_.history_id asc
    limit 25;

    Limit  (cost=8063.39..178328.00 rows=25 width=1267) (actual time=174373.106..174374.395 rows=0 loops=1)
      Buffers: shared hit=532597 read=1433916
      I/O Timings: read=517880.241
      ->  Incremental Sort  (cost=8063.39..198759.76 rows=28 width=1267) (actual time=174373.105..174374.394 rows=0 loops=1)
            Sort Key: operation_time, history_id
            Presorted Key: operation_time
            Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
            Buffers: shared hit=532597 read=1433916
            I/O Timings: read=517880.241
            ->  Gather Merge  (cost=1000.60..198758.50 rows=28 width=1267) (actual time=174373.099..174374.388 rows=0 loops=1)
                  Workers Planned: 2
                  Workers Launched: 2
                  Buffers: shared hit=532597 read=1433916
                  I/O Timings: read=517880.241
                  ->  Parallel Index Scan using operation_history_operation_time_idx on operation_history operationh0_  (cost=0.57..197755.24 rows=12 width=1267) (actual time=174362.932..174362.933 rows=0 loops=3)
                        Index Cond: ((operation_time >= '2024-09-30 20:00:00+00'::timestamp with time zone) AND (operation_time <= '2024-10-02 20:00:00+00'::timestamp with time zone))
                        Filter: ((from_phone)::text = '+000111223344'::text)
                        Rows Removed by Filter: 723711
                        Buffers: shared hit=532597 read=1433916
                        I/O Timings: read=517880.241
    Planning Time: 0.193 ms
    Execution Time: 174374.449 ms

    -- For simplicity:
    set max_parallel_workers_per_gather = 0;
    -- This only simplifies the plan; the numbers stay representative. Re-run the previous query:

    Limit  (cost=7535.40..189179.35 rows=25 width=1267) (actual time=261432.728..261432.729 rows=0 loops=1)
      Buffers: shared hit=374346 read=1591362
      I/O Timings: read=257253.065
      ->  Incremental Sort  (cost=7535.40..210976.63 rows=28 width=1267) (actual time=261432.727..261432.727 rows=0 loops=1)
            Sort Key: operation_time, history_id
            Presorted Key: operation_time
            Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
            Buffers: shared hit=374346 read=1591362
            I/O Timings: read=257253.065
            ->  Index Scan using operation_history_operation_time_idx on operation_history operationh0_  (cost=0.57..210975.37 rows=28 width=1267) (actual time=261432.720..261432.720 rows=0 loops=1)
                  Index Cond: ((operation_time >= '2024-09-30 20:00:00+00'::timestamp with time zone) AND (operation_time <= '2024-10-02 20:00:00+00'::timestamp with time zone))
                  Filter: ((from_phone)::text = '+000111223344'::text)
                  Rows Removed by Filter: 2171134
                  Buffers: shared hit=374346 read=1591362
                  I/O Timings: read=257253.065
    Planning Time: 0.170 ms
    Execution Time: 261432.774 ms


    So it only had to filter out 2,171,134 rows, yet it took more than 4 minutes. That seems too long, doesn't it?
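    As a rough sanity check on the numbers above (assuming the default 8 kB block size, which is not stated in the plan itself): the ~1.59 million blocks read from disk amount to roughly 12 GB, and the reported I/O time covers about 257 of the ~261 seconds, so nearly all of the time is spent on disk reads.

    -- Back-of-the-envelope check of the disk reads reported in the serial plan above.
    -- Assumes the default block size; current_setting('block_size') returns it in bytes.
    select pg_size_pretty(1591362 * current_setting('block_size')::bigint) as read_from_disk;
    -- ≈ 12 GB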

    I tried selecting only specific columns (e.g. operation_time, from_phone, to_phone, history_id); it had no effect. I tried vacuum analyze; it had no effect.
    I also checked several Postgres parameters such as shared_buffers and work_mem; changing them had no effect, and the values match what pgTune recommends.
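    For reference, roughly what those checks looked like (the exact invocations may have differed):

    -- roughly the maintenance and configuration checks described above
    vacuum analyze accounts_service.operation_history;
    show shared_buffers;
    show work_mem;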

    Some additional info:


    SELECT relpages, pg_size_pretty(pg_total_relation_size(oid)) AS table_size
    FROM pg_class
    WHERE relname = 'operation_history';

    relpages | table_size
    18402644 | 210 GB

    select count(*) from operation_history;

    352402877


    Server drives: AWS gp3
    I don't want to create indexes on all of these filter columns, because this table receives massive writes...
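    For illustration only, this is the kind of extra per-filter index I am reluctant to add (the index name and column pair below are just an example, not something that exists):

    -- example only: a per-filter multicolumn index; name and columns are illustrative
    create index operation_history_from_phone_operation_time_idx
        on accounts_service.operation_history (from_phone, operation_time);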

    Is there any way to optimize this?
    Or is this volume of reads from the index and the table simply expected, meaning we need to move to sharding, etc.?

    Continue reading...
