1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

  2. Anuncie Aqui ! Entre em contato fdantas@4each.com.br

How to write an efficient, readable and scalable SQL query? [closed]

Discussão em 'Outras Linguagens' iniciado por Stack, Janeiro 27, 2021.

  1. Stack

    Stack Membro Participativo

    I am a beginner with Data Analysis and I am learning SQL at this moment. Hence, my apologies if there are problems with my question if so let me know and I will modify it.

    Context

    First of all, to provide some context, this exercise is part of a test given to me by an employer for a Data Analyst position in the technical assessment of the selection process. Since I struggled to produce the result I attached here, I decided to publish this question with changes in the task to not disclose the original problem.

    Problem

    You have the client_agent_messages table that stores data about individual messages exchanged between clients and agents via the web-chat. An example of the table is below:

    Sender OS Client Id From Id To Id First Message Order Id Order status Agent Id Sent time
    Client Linux 101 101 502 FALSE A1001 Sent to adress 502 2020-09-20 9:02:48
    Agent Linux 101 502 101 FALSE A1001 Preparing 502 2020-09-20 9:02:05
    Client Linux 101 101 502 FALSE A1001 Preparing 502 2020-09-20 9:01:05
    Agent Windows 152 513 152 TRUE A1005 Delivered 513 2020-09-20 8:59:44

    Besides this table, there is an orders table where you could find order_id and state.

    Create a table client_agent_chat with a query that aggregates individual messages into conversations. Take into consideration that a conversation is unique per order. The required fields are the following:

    • Order id
    • State
    • Time of first agent message
    • Time of first client message
    • Number of messages from agent
    • Number of messages from client
    • First message sender (agent or client)
    • Time of first message in chat
    • Time between first and second message (in sec)
    • Time of last message
    • Status of order when last message was sent

    You have to make your query efficient, readable and scalable.

    My Question

    I created the queries attached having to looked over internet for one day in order to find the needed code. In my search, I came across posts and articles about the use of CTEs, subqueries and functions but I didn't found some resource that could tell me which code must be preferred over others regarding efficiency and scalability and I thought I got maybe a messy result. That's why I would like to ask: What is the best (or a better) answer for this problem if the three requirements (efficient, readable and scalable) must be met?

    My queries:

    CREATE TABLE client_agent_messages (
    sender_os varchar(256),
    client_id varchar(32),
    from_id varchar(32),
    to_id varchar(32),
    first_message varchar(64),
    order_id varchar(32),
    order_status varchar(32),
    agent_id varchar(32),
    sent_time datetime);

    INSERT INTO client_agent_messages VALUES
    ('Client Linux', '101', '101', '502', 'FALSE', '59528555', 'DELIVERING', '502', '2019-08-19 8:01:47'),
    ('Agent Linux', '101', '502', '101', 'FALSE', '59528555', 'PICKING_UP', '502', '2019-08-19 8:01:04'),
    ('Client Linux', '101', '101', '502', 'FALSE', '59528555', 'PICKING_UP', '502', '2019-08-19 8:00:04'),
    ('Agent Windows', '152', '513', '152', 'FALSE', '59528038', 'ARRIVING', '513', '2019-08-19 7:59:33'),
    ('Agent Linux', '101', '502', '101', 'TRUE', '59528555', 'PICKING_UP', '502', '2019-08-19 7:59:04'),
    ('Client Windows', '152', '152', '513', 'TRUE', '59528038', 'ARRIVING', '513', '2019-08-19 7:58:33');


    CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    state_code INT);

    INSERT INTO orders VALUES
    ('59528555', '23'),
    ('59528038', '33');


    CREATE TABLE client_agent_chat (
    order_id INT PRIMARY KEY,
    state_code INT,
    fs_agent_mess DATE,
    fs_client_mess DATE,
    count_mess_agent INT,
    count_mess_client INT,
    fs_mess_sender VARCHAR(8),
    fs_mess_time DATE,
    fs_respond_time INT,
    last_mess DATE,
    last_mess_status VARCHAR(16));

    INSERT INTO client_agent_chat
    SELECT agg.order_id,
    state_code,
    fs_agent_mess,
    fs_client_mess,
    count_mess_agent,
    count_mess_client,
    fs_mess_sender,
    fs_mess_time,
    fs_respond_time,
    last_mess,
    last_mess_status
    FROM
    (
    SELECT
    order_id,
    MIN(CASE WHEN sender_app_type LIKE 'Courier%' THEN message_sent_time END) AS fs_agent_mess,
    MIN(CASE WHEN sender_app_type LIKE 'Customer%' THEN message_sent_time END) AS fs_client_mess,
    COUNT(CASE WHEN from_id = courier_id THEN 1 END) AS count_mess_agent,
    COUNT(CASE WHEN from_id = customer_id THEN 1 END) AS count_mess_client,
    MIN(message_sent_time) AS fs_mess_time,
    MAX(message_sent_time) AS last_mess
    FROM client_agent_messages
    GROUP BY order_id
    ) AS agg
    LEFT JOIN (
    SELECT
    order_id,
    (SELECT state_code FROM orders AS o WHERE o.order_id = c.order_id) AS state_code,
    LAST_VALUE(order_status) OVER(PARTITION BY order_id ORDER BY sent_time DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_mess_status,
    DATEDIFF(second, LAG(sent_time) OVER(PARTITION BY order_id ORDER BY sent_time),sent_time) AS fs_respond_time,
    ROW_NUMBER() OVER(PARTITION BY order_id ORDER BY sent_time) as row_count,
    (CASE WHEN first_message != LAG(first_message) OVER(PARTITION BY order_id ORDER BY sent_time)
    THEN SUBSTRING(sender_os, 1, CHARINDEX(' ', sender_os)) END) AS fs_mess_sender
    FROM client_agent_messages AS c
    ) AS t
    ON agg.order_id = t.order_id
    WHERE row_count = 2;

    Continue reading...

Compartilhe esta Página