[Python] How to accelerate getting points within distance using two DataFrames?

Stack · October 8, 2024

I have two DataFrames (df and locations_df), both with longitude and latitude values. For each row of locations_df, I want to find the points in df that lie within 2 km of it.

I tried to vectorize the function, but it is still slow when locations_df is large (more than 1000 rows). Any ideas on how to speed it up?

import pandas as pd
import numpy as np

def select_points_for_multiple_locations_vectorized(df, locations_df, radius_km):
    R = 6371  # Earth's radius in kilometers

    # Convert degrees to radians; df values become a column so they broadcast against locations
    df_lat_rad = np.radians(df['latitude'].values)[:, np.newaxis]
    df_lon_rad = np.radians(df['longitude'].values)[:, np.newaxis]
    loc_lat_rad = np.radians(locations_df['lat'].values)
    loc_lon_rad = np.radians(locations_df['lon'].values)

    # Haversine formula (vectorized); every intermediate has shape (len(df), len(locations_df))
    dlat = df_lat_rad - loc_lat_rad
    dlon = df_lon_rad - loc_lon_rad
    a = np.sin(dlat/2)**2 + np.cos(df_lat_rad) * np.cos(loc_lat_rad) * np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    distances = R * c

    # Create a mask for points within the radius
    mask = distances <= radius_km

    # Get indices of True values in the mask (row indices into df, column indices into locations_df)
    indices = np.where(mask)

    result = pd.concat(
        [df.iloc[indices[0]].reset_index(drop=True),
         locations_df.iloc[indices[1]].reset_index(drop=True)],
        axis=1,
    )

    return result

def random_lat_lon(n=1, lat_min=-10., lat_max=10., lon_min=-5., lon_max=5.):
    """
    Return an array of shape (n, 2) with random (lat, lon) pairs in degrees.
    """
    lat = np.random.uniform(lat_min, lat_max, n)
    lon = np.random.uniform(lon_min, lon_max, n)

    # column_stack builds the (n, 2) array directly, without a Python-level zip over n items
    return np.column_stack((lat, lon))

df = pd.DataFrame(random_lat_lon(n=10000000), columns=['latitude', 'longitude'])
locations_df = pd.DataFrame(random_lat_lon(n=20), columns=['lat', 'lon'])

result = select_points_for_multiple_locations_vectorized(df, locations_df, radius_km=2)
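
One possible direction, sketched here under assumptions rather than as a definitive fix: the function above builds several arrays of shape (len(df), len(locations_df)), so most of the work goes into pairs that are nowhere near each other. Because the great-circle distance between two points is never smaller than R times their latitude difference, sorting the points by latitude once and using np.searchsorted to pick a narrow latitude band per location should leave far fewer candidates for the haversine step. The names pairs_within_radius and haversine_km are hypothetical helpers, not part of the original code, and this version has not been benchmarked here.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # same value as R in the function above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; all inputs are in radians and broadcast together."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def pairs_within_radius(pt_lat, pt_lon, loc_lat, loc_lon, radius_km):
    """Hypothetical helper: return (point_idx, location_idx) for pairs within radius_km.

    All inputs are 1-D arrays in degrees. Indices refer to the original input order.
    """
    pt_lat = np.asarray(pt_lat, dtype=float)
    pt_lon = np.asarray(pt_lon, dtype=float)
    loc_lat_rad = np.radians(np.asarray(loc_lat, dtype=float))
    loc_lon_rad = np.radians(np.asarray(loc_lon, dtype=float))

    # Sort the points by latitude once so each location can binary-search its band.
    order = np.argsort(pt_lat, kind="stable")
    lat_sorted = np.radians(pt_lat[order])
    lon_sorted = np.radians(pt_lon[order])

    # Angular radius in radians, with a tiny margin against rounding at the band edge.
    band = radius_km / EARTH_RADIUS_KM * (1.0 + 1e-9)

    pt_idx, loc_idx = [], []
    for j, (lat_j, lon_j) in enumerate(zip(loc_lat_rad, loc_lon_rad)):
        lo = np.searchsorted(lat_sorted, lat_j - band, side="left")
        hi = np.searchsorted(lat_sorted, lat_j + band, side="right")
        # Exact haversine check only on the candidates inside the latitude band.
        d = haversine_km(lat_sorted[lo:hi], lon_sorted[lo:hi], lat_j, lon_j)
        hit = np.nonzero(d <= radius_km)[0]
        pt_idx.append(order[lo + hit])
        loc_idx.append(np.full(hit.size, j, dtype=np.intp))

    if not pt_idx:  # no locations were given
        empty = np.empty(0, dtype=np.intp)
        return empty, empty
    return np.concatenate(pt_idx), np.concatenate(loc_idx)
```

With the DataFrames from the question, this could be called as pairs_within_radius(df['latitude'].values, df['longitude'].values, locations_df['lat'].values, locations_df['lon'].values, 2), and the two returned index arrays can replace indices[0] and indices[1] in the pd.concat step. Note that the pairs come back grouped by location rather than in the row order that np.where produces.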
