
[Python] Can I force sklearn to use float32 instead of float64?

Discussion in 'Python' started by Stack, September 10, 2024.


    I am building a product recommender that uses product descriptions to find similar products and recommend them. I run CountVectorizer over the descriptions to find similar descriptions, rank them, and suggest the closest matches.

    The problem comes when calculating the cosine similarity matrix. My initial dataframe has 47,046 rows, so I'm running into RAM issues both on my local PC and in my Colab notebook.

    Checking the count matrix that CountVectorizer produces, I see that it is output as int64:

    <47046x3607 sparse matrix of type '<class 'numpy.int64'>'
    with 699336 stored elements in Compressed Sparse Row format>


    Casting it to int32 with count_matrix = count_matrix.astype(np.int32) works without issue, but cosine_similarity from sklearn still returns float64 instead of float32 (I confirmed this by testing with a smaller dataset that can be processed fine).
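
A minimal sketch of one thing that may be worth trying: as far as I can tell, cosine_similarity only keeps float32 when its input is already float32, and any integer dtype (int32 or int64) is promoted to float64. The small random matrix below is a hypothetical stand-in for the real CountVectorizer output. Note that a dense 47,046 x 47,046 result still needs roughly 8.9 GB in float32 (about 17.7 GB in float64), so the cast halves the memory but may not be enough on its own.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for the CountVectorizer output (int64 counts).
rng = np.random.default_rng(0)
count_matrix = sparse.csr_matrix(rng.integers(0, 3, size=(200, 50), dtype=np.int64))

# Integer input: scikit-learn converts it to float64 internally.
sim64 = cosine_similarity(count_matrix)

# float32 input: the float32 dtype is preserved in the result.
count_f32 = count_matrix.astype(np.float32)
sim32 = cosine_similarity(count_f32)

print(sim64.dtype, sim32.dtype)
print(f"float64: {sim64.nbytes} bytes, float32: {sim32.nbytes} bytes")
```

CountVectorizer also accepts a dtype argument (for example CountVectorizer(dtype=np.float32)), which should avoid the separate cast.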

    Is there any way to force the use of float32? Or a way to actually solve the high RAM usage with matrices altogether?
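
For the broader RAM question, a recommender usually needs only the few most similar items per product, not the full matrix. The sketch below is an assumption about what would fit this use case, not something taken from sklearn itself: it L2-normalises the sparse rows, multiplies one block of rows at a time, and keeps only the top k neighbours per row. The function name top_k_similar and the parameters k and chunk_size are hypothetical, and it assumes at least two rows.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize


def top_k_similar(count_matrix, k=5, chunk_size=1000):
    """Return (indices, scores) of the k most similar rows for every row."""
    # After L2-normalising the rows, a dot product equals cosine similarity.
    X = normalize(count_matrix.astype(np.float32), norm="l2", axis=1)
    n = X.shape[0]
    k = min(k, n - 1)
    top_idx = np.empty((n, k), dtype=np.int64)
    top_sim = np.empty((n, k), dtype=np.float32)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Only a (chunk_size x n) dense block is held in memory at once.
        block = (X[start:stop] @ X.T).toarray()
        # Mask each row's similarity with itself.
        block[np.arange(stop - start), np.arange(start, stop)] = -np.inf
        # Pick the k largest entries per row, then sort just those k.
        part = np.argpartition(-block, k - 1, axis=1)[:, :k]
        part_sim = np.take_along_axis(block, part, axis=1)
        order = np.argsort(-part_sim, axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_sim[start:stop] = np.take_along_axis(part_sim, order, axis=1)
    return top_idx, top_sim


# Hypothetical toy data standing in for the real count matrix.
rng = np.random.default_rng(0)
demo = sparse.csr_matrix(rng.integers(0, 3, size=(50, 20), dtype=np.int64))
idx, sim = top_k_similar(demo, k=3, chunk_size=16)
print(idx.shape, sim.dtype)
```

With chunk_size=1000 and 47,046 rows, each dense block is about 1000 x 47,046 float32 values (roughly 188 MB), and the stored result grows with n * k rather than n squared.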

