
[Python] ValueError: invalid literal for int() with base 10: 's' when decoding

Discussion in 'Python' started by Stack, September 27, 2024 at 22:32.


    I have this code as part of a larger chunk that tries to predict classes, and this part:

    tokenizer_ = RobertaTokenizer.from_pretrained("codeT5-base")
    for epoch in range(model_params["VAL_EPOCHS"]):
        model.eval()
        with torch.no_grad():  # or .set_grad_enabled(False)
            for _, data in enumerate(testing_data, 0):
                nlc_ids = data['source_ids'].to(device, dtype=torch.long)
                cmd = data['target_ids'].to(device, dtype=torch.long)
                mask = data['source_mask'].to(device, dtype=torch.long)

                output_ids = model.generate(  # max_new_tokens=5
                    input_ids=nlc_ids,
                    attention_mask=mask,
                    max_length=cmd.shape[1],  # 200
                    repetition_penalty=1.2,  # 2.5, 1.8
                    # length_penalty = 1.0,  # default
                    # length_penalty > 0.0 promotes longer sequences, while length_penalty < 0.0 encourages shorter sequences
                    # score = sum_logprobs ( = F.log_softmax = negative) / len(hyp) ** self.length_penalty
                    early_stopping=False,
                    num_beams=3,
                    num_return_sequences=3,
                    return_dict_in_generate=True,
                    output_scores=True,
                )


    and it gives this error:

    ValueError Traceback (most recent call last)
    Cell In[52], line 61
    59 model_ = T5ForConditionalGeneration.from_pretrained(os.path.join(model_output_dir, "model")).to(device)
    60 tokenizer_ = RobertaTokenizer.from_pretrained(os.path.join(model_output_dir, "tokenizer"))
    ---> 61 predict(tokenizer_, model_)

    Cell In[52], line 44
    42 target = []
    43 for g in output_ids:
    ---> 44 dec_pred = tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    45 preds.append(dec_pred)
    47 for t in cmd:

    File d:\Apps\envs\cuda11\Lib\site-packages\transformers\tokenization_utils_base.py:4016, in PreTrainedTokenizerBase.decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
    4013 # Convert inputs to python lists
    4014 token_ids = to_py_obj(token_ids)
    -> 4016 return self._decode(
    4017 token_ids=token_ids,
    4018 skip_special_tokens=skip_special_tokens,
    4019 clean_up_tokenization_spaces=clean_up_tokenization_spaces,
    4020 **kwargs,
    4021 )

    File d:\Apps\envs\cuda11\Lib\site-packages\transformers\tokenization_utils.py:1081, in PreTrainedTokenizer._decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, spaces_between_special_tokens, **kwargs)
    ...
    -> 1056 index = int(index)
    1057 if skip_special_tokens and index in self.all_special_ids:
    1058 continue

    ValueError: invalid literal for int() with base 10: 's'


    I can't find a clue as to what's going on. Any help? I am already skipping the special tokens, and I don't understand why this would have anything to do with that.
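    A hedged guess at the mechanism, sketched below with hypothetical stand-in objects (no transformers install needed): with `return_dict_in_generate=True`, `model.generate` returns a dict-like ModelOutput rather than a tensor, and iterating a dict-like object yields its string *keys* (e.g. `'sequences'`), not the generated token ids. A decode loop that expects token ids then ends up calling `int()` on the first character of that key, which is exactly `int('s')`. `FakeGenerateOutput` and `toy_decode` are assumptions for illustration, not the real transformers classes.

    ```python
    from collections import OrderedDict

    class FakeGenerateOutput(OrderedDict):
        """Hypothetical stand-in for the dict-like output returned by
        model.generate(..., return_dict_in_generate=True)."""
        @property
        def sequences(self):
            return self["sequences"]

    output_ids = FakeGenerateOutput(
        sequences=[[0, 31, 42, 2], [0, 7, 2]],  # the generated token ids live here
        sequences_scores=[-0.7, -1.1],
    )

    def toy_decode(token_ids):
        """Stand-in for tokenizer.decode: every item must be coercible to int."""
        return " ".join(str(int(i)) for i in token_ids)

    # Iterating the output object itself yields its keys, not token-id tensors:
    print(list(output_ids))  # ['sequences', 'sequences_scores']

    # So `for g in output_ids: tokenizer.decode(g)` feeds the string 'sequences'
    # into decode, which fails on its first character:
    try:
        toy_decode("sequences")
    except ValueError as e:
        print(e)  # invalid literal for int() with base 10: 's'

    # Likely fix: iterate the .sequences field instead of the output object.
    preds = [toy_decode(g) for g in output_ids.sequences]
    print(preds)  # ['0 31 42 2', '0 7 2']
    ```

    If that is the cause, changing the decode loop to `for g in output_ids.sequences:` (or dropping `return_dict_in_generate=True`) should make the error go away.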

