[Python] AWS DynamoDB download speed with Boto3

Stack · October 25, 2024 at 07:42

I am using Python/Boto3 to download logs from a health tracking device. I wrote a Python app that lets you run a paginated query for a given user and timeframe:

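from boto3.dynamodb.conditions import Key

def paginated_query(self, tstart, tend, user):
    # Build a page iterator for one user's items with timestamp in [tstart, tend]
    paginator = self.datapoints_table.meta.client.get_paginator('query')
    page_iterator = paginator.paginate(
        TableName=self.table_name,
        KeyConditionExpression=Key('userId').eq(user) & Key('timestamp').between(tstart, tend)
    )
    return page_iterator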

The query itself works and executes within a few milliseconds.

Then, to download the data and process it with Python, I convert the generator into a list:

page_iterator = self.paginated_query(tstart, tend, user)

log_list = []
for p in page_iterator:
    items = p.get('Items', [])  # download the data
    log_list.extend(items)

This last snippet takes a very long time, depending on how much data I am trying to download. I tried processing the pages in parallel instead of sequentially in a for loop, but it made no difference to the execution time. I measured a maximum download rate from DynamoDB/AWS of 1.8 MB/s, which seems much slower than what my connection can achieve. I can only assume there is a bottleneck on the AWS side.
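For reference, a download rate like the one quoted above could be estimated with something like the sketch below. The function name `measure_pages` is hypothetical, and the byte count is only an approximation obtained by JSON-encoding each page, not the exact size on the wire. Because the paginator fetches pages lazily, the loop time includes the network round trip for every page.

```python
import json
import time


def measure_pages(page_iterator):
    """Collect all items and return (items, bytes, seconds, MB/s).

    Hypothetical helper: the size is approximated by JSON-encoding the
    items of each page, so the rate is an estimate only.
    """
    items = []
    total_bytes = 0
    start = time.perf_counter()
    for page in page_iterator:
        page_items = page.get("Items", [])
        # default=str covers values json cannot encode directly (e.g. Decimal)
        total_bytes += len(json.dumps(page_items, default=str).encode("utf-8"))
        items.extend(page_items)
    elapsed = time.perf_counter() - start
    rate_mb_s = total_bytes / 1e6 / elapsed if elapsed > 0 else 0.0
    return items, total_bytes, elapsed, rate_mb_s
```

It would be called as `measure_pages(self.paginated_query(tstart, tend, user))`.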

Does anyone know how to make the data download faster?
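One direction that may be worth testing: each Query response is capped at 1 MB, and the request for the next page needs the `LastEvaluatedKey` of the previous one, so the pages of a single query can only be fetched one after another. Processing them in parallel therefore cannot speed up the download itself. Splitting the time range into non-overlapping sub-ranges and running one query per sub-range concurrently does allow several requests in flight. The sketch below assumes integer timestamps (since `between` is inclusive on both ends); `split_range`, `fetch_parallel` and `fetch_range` are hypothetical names, and whether this actually helps depends on the table's capacity and throttling.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(tstart, tend, parts):
    """Split the inclusive integer range [tstart, tend] into at most
    `parts` non-overlapping inclusive sub-ranges."""
    total = tend - tstart + 1
    parts = max(1, min(parts, total))
    step, extra = divmod(total, parts)
    ranges, lo = [], tstart
    for i in range(parts):
        # The first `extra` sub-ranges get one additional element
        hi = lo + step - 1 + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def fetch_parallel(fetch_range, tstart, tend, parts=8):
    """Call fetch_range(lo, hi) for each sub-range in a thread pool and
    concatenate the returned lists in time order."""
    ranges = split_range(tstart, tend, parts)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # map() yields results in the order of `ranges`
        chunks = pool.map(lambda r: fetch_range(*r), ranges)
        return [item for chunk in chunks for item in chunk]
```

Here `fetch_range(lo, hi)` would wrap the paginated query from the question and return the collected items for that sub-range. Boto3 clients are documented as generally thread safe while resource objects are not, so creating a separate session and client per thread is the more cautious setup.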
