1. Anuncie Aqui ! Entre em contato fdantas@4each.com.br

[SQL] PostgresSQL (AWS Redshift) Forward-fill null values based on previous non-null value in...

Discussão em 'Outras Linguagens' iniciado por Stack, Outubro 4, 2024 às 06:02.

  1. Stack

    Stack Membro Participativo

    I have the following dataset:

    create schema m;
    create table m.parent_child_lvl_1(customer_id,date,order_type,order_id,sub_id)
    as values
    (108384372,'18/09/2023'::date,'sub_parent_first_order',5068371361861,407284605)
    ,(108384372, '13/11/2023', 'sub_order', 5134167539781, null)
    ,(108384372, '8/01/2024', 'sub_order', 5214687526981, null)
    ,(108384372, '4/03/2024', 'sub_order', 5283166126149, null)
    ,(108384372, '18/06/2024', 'sub_parent_order', 5421811138629, 500649255)
    ,(108384372, '12/08/2024', 'sub_order', 5508433641541, null)
    ,(108384372, '12/08/2024', 'sub_order', 5508433641541, null);


    I need to fill the yellow null's with the Sub_ID until it hits the next non-null Sub_ID, then fill until the next, and so on. The null order only works based on the Date ordering. Also, needs to be grouped by Customer_ID. The full dataset has 15M rows, so any speed optimisation would be a bonus.

    I've tried lead(), lag(), lastvalue(), coalesce() but the closest I got was only filling in the first null value after each real value.

    The closest I've come to is this:

    SELECT
    first_value(sub_id) OVER (
    PARTITION BY customer_id, value_partition
    ORDER BY customer_id, date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS parent_sub_id,
    customer_id,
    order_id
    FROM (
    SELECT
    customer_id,
    order_id,
    sub_id,
    date,
    SUM(CASE WHEN sub_id IS NULL THEN 0 ELSE 1 END) OVER (
    ORDER BY customer_id, date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS value_partition
    from m.parent_child_lvl_1
    where customer_id in ('227330109','90872199','102694972')
    and
    order_type in (
    'sub_parent_first_order',
    'sub_parent_order',
    'sub_order'
    )
    ORDER BY date ASC
    ) AS q
    ORDER BY customer_id, date
    ;


    Which seems to work for some records, but then goes wrong for others and I can't figure out why.

    Continue reading...

Compartilhe esta Página