
[Python] ValueError: invalid literal for int() with base 10: 's' when decoding

Discussion in 'Python' started by Stack, September 27, 2024 at 22:32.


    I have this code as part of a larger chunk that tries to predict classes, and this part:

    tokenizer_ = RobertaTokenizer.from_pretrained("codeT5-base")
    for epoch in range(model_params["VAL_EPOCHS"]):
        model.eval()
        with torch.no_grad():  # or .set_grad_enabled(False)
            for _, data in enumerate(testing_data, 0):
                nlc_ids = data['source_ids'].to(device, dtype=torch.long)
                cmd = data['target_ids'].to(device, dtype=torch.long)
                mask = data['source_mask'].to(device, dtype=torch.long)

                output_ids = model.generate(  # max_new_tokens=5
                    input_ids=nlc_ids,
                    attention_mask=mask,
                    max_length=cmd.shape[1],  # 200
                    repetition_penalty=1.2,  # 2.5, 1.8
                    # length_penalty = 1.0,  # default
                    # length_penalty > 0.0 promotes longer sequences, while length_penalty < 0.0 encourages shorter sequences
                    # score = sum_logprobs ( = F.log_softmax = negative) / len(hyp) ** self.length_penalty
                    early_stopping=False,
                    num_beams=3,
                    num_return_sequences=3,
                    return_dict_in_generate=True,
                    output_scores=True,
                )


    and it gives this error:

    ValueError Traceback (most recent call last)
    Cell In[52], line 61
    59 model_ = T5ForConditionalGeneration.from_pretrained(os.path.join(model_output_dir, "model")).to(device)
    60 tokenizer_ = RobertaTokenizer.from_pretrained(os.path.join(model_output_dir, "tokenizer"))
    ---> 61 predict(tokenizer_, model_)

    Cell In[52], line 44
    42 target = []
    43 for g in output_ids:
    ---> 44 dec_pred = tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    45 preds.append(dec_pred)
    47 for t in cmd:

    File d:\Apps\envs\cuda11\Lib\site-packages\transformers\tokenization_utils_base.py:4016, in PreTrainedTokenizerBase.decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
    4013 # Convert inputs to python lists
    4014 token_ids = to_py_obj(token_ids)
    -> 4016 return self._decode(
    4017 token_ids=token_ids,
    4018 skip_special_tokens=skip_special_tokens,
    4019 clean_up_tokenization_spaces=clean_up_tokenization_spaces,
    4020 **kwargs,
    4021 )

    File d:\Apps\envs\cuda11\Lib\site-packages\transformers\tokenization_utils.py:1081, in PreTrainedTokenizer._decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, spaces_between_special_tokens, **kwargs)
    ...
    -> 1056 index = int(index)
    1057 if skip_special_tokens and index in self.all_special_ids:
    1058 continue

    ValueError: invalid literal for int() with base 10: 's'


    I can't find a clue as to what's going on. Any help? I am already skipping the special tokens, and I don't understand why this would have anything to do with that.
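    A hedged guess at the mechanism, sketched below with hypothetical stand-in objects (no transformers install needed): with `return_dict_in_generate=True`, `model.generate` returns a dict-like ModelOutput rather than a tensor, and iterating a dict-like object yields its string *keys* (e.g. `'sequences'`), not the generated token ids. A decode loop that expects token ids then ends up calling `int()` on the first character of that key, which is exactly `int('s')`. `FakeGenerateOutput` and `toy_decode` are assumptions for illustration, not the real transformers classes.

    ```python
    from collections import OrderedDict

    class FakeGenerateOutput(OrderedDict):
        """Hypothetical stand-in for the dict-like output returned by
        model.generate(..., return_dict_in_generate=True)."""
        @property
        def sequences(self):
            return self["sequences"]

    output_ids = FakeGenerateOutput(
        sequences=[[0, 31, 42, 2], [0, 7, 2]],  # the generated token ids live here
        sequences_scores=[-0.7, -1.1],
    )

    def toy_decode(token_ids):
        """Stand-in for tokenizer.decode: every item must be coercible to int."""
        return " ".join(str(int(i)) for i in token_ids)

    # Iterating the output object itself yields its keys, not token-id tensors:
    print(list(output_ids))  # ['sequences', 'sequences_scores']

    # So `for g in output_ids: tokenizer.decode(g)` feeds the string 'sequences'
    # into decode, which fails on its first character:
    try:
        toy_decode("sequences")
    except ValueError as e:
        print(e)  # invalid literal for int() with base 10: 's'

    # Likely fix: iterate the .sequences field instead of the output object.
    preds = [toy_decode(g) for g in output_ids.sequences]
    print(preds)  # ['0 31 42 2', '0 7 2']
    ```

    If that is the cause, changing the decode loop to `for g in output_ids.sequences:` (or dropping `return_dict_in_generate=True`) should make the error go away.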

