
[SQL] Using SQL or Spark-SQL - How To Update Column Value Based on Aggregation Record Count...

Discussion in 'Outras Linguagens' started by Stack, November 5, 2024 at 15:52.

  1. Stack (Membro Participativo)

    I have a dataset that involves customers opting out of a promotion. Each customer always has a unique customer_id but can share a living space. The issue arises when two customers living in the same household take different promotion actions (ADD, DELETE, NULL) on the same day.

    If one household member chooses DELETE, all household members must be treated as deletions for the promotion. Given multiple records that share the same household_id and promo_action_date but have different promo_action values, how can the promo_action column be updated? The requirement is to set promo_action to "DELETE" for every such record.

    The promo_action_household table has the four columns shown in the headers below.

    Before

    customer_id|household_id|promo_action|promo_action_date
    101|54|DELETE|2024-10-03
    157|54|NULL|2024-10-03

    After

    customer_id|household_id|promo_action|promo_action_date
    101|54|DELETE|2024-10-03
    157|54|DELETE|2024-10-03
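
    For reference, one way to express the Before → After transformation without an in-place update is a windowed CASE over each household/date group. This is a minimal sketch, assuming the data is readable as the promo_action_household table described above:

    -- Sketch (assumed schema): flag any household/date group that contains a
    -- DELETE, and propagate it to every row in that group via a window function.
    select
        customer_id,
        household_id,
        case
            when max(case when promo_action = 'DELETE' then 1 else 0 end)
                 over (partition by household_id, promo_action_date) = 1
            then 'DELETE'
            else promo_action
        end as promo_action,
        promo_action_date
    from promo_action_household

    In Spark SQL, where tables generally cannot be updated in place, the result of a query like this would typically be written back over the source (for example with INSERT OVERWRITE).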

    I was hoping to update the promo_action value of these records to "DELETE" when this condition is met, querying the promo_action_household table. SQL or Spark SQL are the target tools here.

    select household_id, promo_action_date, count(*) from promo_action_household group by household_id, promo_action_date having count(*) > 1
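
    Building on that detection query, an engine that supports correlated subqueries in UPDATE (PostgreSQL, for example; Delta Lake on Spark typically uses MERGE INTO instead) could apply the fix directly. A sketch under those assumptions, not a definitive implementation:

    -- Sketch: set promo_action to DELETE wherever another row in the same
    -- household on the same date already has DELETE.
    update promo_action_household t
    set promo_action = 'DELETE'
    where (t.promo_action is null or t.promo_action <> 'DELETE')
      and exists (
          select 1
          from promo_action_household d
          where d.household_id = t.household_id
            and d.promo_action_date = t.promo_action_date
            and d.promo_action = 'DELETE'
      );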

