
[Python] Error when using RAG model from Hugging Face Transformers

Discussion in 'Python' started by Stack, October 1, 2024 at 04:12.

  1. Stack

    Stack Active Member

    I'm working on a project that uses the RAG (Retrieval-Augmented Generation) model from Hugging Face's Transformers library. The code below is simply the example code for facebook/rag-sequence-nq; it initializes the model and generates a response:

    from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

    tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
    retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True)
    model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

    input_dict = tokenizer.prepare_seq2seq_batch("how many countries are in europe", return_tensors="pt")

    generated = model.generate(input_ids=input_dict["input_ids"])
    print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])


    I have also created a requirements.txt file with the following contents:

    torch
    transformers
    datasets
    flask
    faiss-cpu


    However, when I run the code, the following messages are printed repeatedly.

    Error:

    The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
    The tokenizer class you load from this checkpoint is 'RagTokenizer'.
    The class this function is called from is 'DPRQuestionEncoderTokenizer'.
    The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
    The tokenizer class you load from this checkpoint is 'RagTokenizer'.
    The class this function is called from is 'DPRQuestionEncoderTokenizerFast'.
    The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
    The tokenizer class you load from this checkpoint is 'RagTokenizer'.
    ...


    Question: How can I resolve this tokenizer class mismatch issue? Any guidance or recommendations would be greatly appreciated!
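
    The messages above read like logged warnings rather than raised exceptions. As a minimal sketch, assuming they are emitted through Python's standard logging module under the logger name transformers.tokenization_utils_base (an assumption, the log itself does not show the logger name), raising that logger's level before calling from_pretrained should hide them without changing how text is tokenized. The helper name quiet_tokenizer_warnings is hypothetical:

```python
import logging

# Assumption: the mismatch messages are logged by this Transformers module.
TOKENIZER_LOGGER_NAME = "transformers.tokenization_utils_base"


def quiet_tokenizer_warnings(logger_name: str = TOKENIZER_LOGGER_NAME) -> logging.Logger:
    """Raise one logger's threshold so WARNING-level records are dropped.

    This only hides the messages; it does not alter tokenizer behaviour.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.ERROR)
    return logger


# Call this before RagTokenizer.from_pretrained(...) and
# RagRetriever.from_pretrained(...) in the script above.
quiet_tokenizer_warnings()
```

    Transformers also provides transformers.logging.set_verbosity_error(), which lowers verbosity for the whole library instead of a single module.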

    Environment:

    • Python version: 3.11.2
    • Operating System: Debian GNU/Linux 12 (bookworm) x86_64
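
    Since requirements.txt does not pin versions, the exact library versions may matter here. A small sketch for collecting them, assuming the packages were installed with pip under the distribution names listed in requirements.txt (installed_versions is a hypothetical helper name):

```python
from importlib import metadata

# Distribution names taken from requirements.txt above.
PACKAGES = ["torch", "transformers", "datasets", "flask", "faiss-cpu"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

    Run in the same virtual environment as the script, this prints one line per package that can be pasted alongside the Python and OS details.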

    Thank you!
