
[Python] Masking and computing loss for a padded batch in a transformer architecture

Discussion in 'Python', started by Stack, September 10, 2024.

  1. Stack (Participating Member)

    I am trying to re-create a transformer model, basing it loosely on the Annotated Transformer. My question concerns padding:

    1. How does the Annotated Transformer deal with padded sequences? I can see the method Batch.make_std_mask being defined, which supposedly (also) masks all padded tokens, but it is only applied to the synthetic data.
    2. How should one generally proceed with padded sequences in a transformer-based architecture? I can see mentions (again in the Annotated Transformer, search for "Batching matters a ton for speed.") of minimising padding, by which, I guess, is meant choosing batches so that the amount of padding is minimised? I am fairly certain one has to pass the whole padded sequence into the model, otherwise the self-attention runs into problems. Apart from setting something like padding_idx in the embeddings (PyTorch), should the model itself treat padded tokens specially? Should the loss calculation index the target and the model output so that all padded tokens are ignored (see the sketch below)?
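
    To make the question concrete, here is a minimal sketch of what I currently have in mind, assuming PyTorch and a pad id of 0; PAD_IDX, make_pad_mask and make_causal_mask are names I made up for illustration, not code from the Annotated Transformer:

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # assumed padding token id (whatever the vocabulary uses for <pad>)

def make_pad_mask(seq, pad_idx=PAD_IDX):
    # (batch, 1, 1, seq_len): True where a token is real, False where it is padding,
    # broadcastable over attention scores of shape (batch, heads, q_len, k_len)
    return (seq != pad_idx).unsqueeze(1).unsqueeze(2)

def make_causal_mask(size):
    # (1, 1, size, size): lower-triangular mask for decoder self-attention
    return torch.tril(torch.ones(size, size, dtype=torch.bool)).unsqueeze(0).unsqueeze(0)

# padding_idx keeps the pad embedding (and its gradient) at zero, but attention
# still needs an explicit mask, since pad positions would otherwise receive weight.
embed = nn.Embedding(num_embeddings=10_000, embedding_dim=512, padding_idx=PAD_IDX)

# ignore_index drops pad positions from the loss average, so no manual indexing is needed.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

tgt = torch.tensor([[5, 7, 2, PAD_IDX],
                    [3, 2, PAD_IDX, PAD_IDX]])          # (batch, seq_len), toy example
logits = torch.randn(2, 4, 10_000)                      # (batch, seq_len, vocab), stand-in for model output

tgt_mask = make_pad_mask(tgt) & make_causal_mask(tgt.size(1))   # combined decoder self-attention mask
loss = criterion(logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
```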

    I have already seen this question (Masking and computing loss for a padded batch sent through an RNN with a linear output layer in pytorch) - is the procedure the same for attention-based models, or is there a difference because of how the attention mechanism 'sees' the whole sequence at once?

    I have also seen this question (Query padding mask and key padding mask in Transformer encoder), but I am not sure whether it addresses the same problem. If it does, I would still love a clarification (even if it comes in the form of an answer there), because I do not completely understand either the question or the currently accepted (only) answer.
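
    For reference, this is how I currently understand the key padding mask when using PyTorch's built-in nn.TransformerEncoder; the toy values are arbitrary, and I am only assuming the documented convention that True marks positions attention should ignore:

```python
import torch
import torch.nn as nn

PAD_IDX = 0                      # assumed pad token id
d_model, nhead = 512, 8

tokens = torch.tensor([[5, 7, 2, PAD_IDX],
                       [3, 2, PAD_IDX, PAD_IDX]])          # (batch, seq_len)

embed = nn.Embedding(10_000, d_model, padding_idx=PAD_IDX)

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# In PyTorch's convention, True marks positions attention should ignore (the pads).
src_key_padding_mask = tokens == PAD_IDX                    # (batch, seq_len), bool

out = encoder(embed(tokens), src_key_padding_mask=src_key_padding_mask)
# out: (batch, seq_len, d_model); padded positions still produce outputs, but no real
# token attends to them, and they can be excluded later when computing the loss.
```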

