
[Python] Pytorch - Realizing identical training with or without using batches

Discussion in 'Python' started by Stack, September 28, 2024 at 05:52.

  1. Stack

    I have a model in PyTorch which converges very well on a reference example when using a standard training process, in which the optimizer trains on all samples at once:

    # Full-batch training: a single forward/backward pass over all samples,
    # followed by one optimizer step
    loss = loss_fnc(samples)

    model.optim.zero_grad()
    loss.backward()
    model.optim.step()


    Now the model needs to train on more memory-intensive tasks (more samples, a bigger input size), so I thought it would help to train in batches. For this, I’m using this fairly standard approach with a for loop:

    # Mini-batch training: one optimizer step per batch
    loss_sum = 0

    for k_batch in range(0, len(samples), batch_size):
        samples_batch = samples[k_batch:k_batch + batch_size]

        loss = loss_fnc(samples_batch)
        loss_sum += loss * len(samples_batch)

        model.optim.zero_grad()
        loss.backward()
        model.optim.step()

    loss_comp = loss_sum / len(samples)


    Now, when I skip the training part in this method and only calculate the loss, the resulting loss_comp is identical to the loss computed on all samples at once (as in the first method). Naturally, when actually training in batches, loss_comp differs, since the model changes between batches.
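
    For illustration, a minimal sketch of that check, assuming loss_fnc returns the mean loss over the samples it is given and has no cross-sample terms (all names are taken from the code above; torch.no_grad() only skips gradient tracking):

    import torch

    # With the model unchanged, computing the loss batch-wise and re-weighting
    # each batch mean by its batch size reproduces the all-at-once loss
    # (up to floating-point accumulation order).
    with torch.no_grad():
        loss_all = loss_fnc(samples)  # all samples at once

        loss_sum = 0.0
        for k_batch in range(0, len(samples), batch_size):
            samples_batch = samples[k_batch:k_batch + batch_size]
            loss_sum += loss_fnc(samples_batch).item() * len(samples_batch)

        loss_comp = loss_sum / len(samples)
        print(loss_all.item(), loss_comp)  # should agree to numerical precision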

    The problem is that the model no longer converges on my reference example when using batch training. The samples are already shuffled before training in both cases.

    Since I’m only interested in reducing the memory demand of the computation, and before I try anything else: is there a way to get truly identical training when using batches instead of "all-at-once"? I tried changing the function so that only the loss is computed in batches and the optimization step is done at the end, but I couldn’t get it to work.
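
    For reference, one common way to get this to work is gradient accumulation: scale each batch loss by its share of the dataset, call backward() per batch so each batch's graph is freed immediately, and take a single optimizer step at the end. A minimal sketch, assuming loss_fnc returns a per-batch mean and the model has no batch-dependent layers such as BatchNorm:

    # One optimizer step per pass over the data; gradients are summed across
    # batches. Scaling each batch loss by len(batch) / len(samples) makes the
    # accumulated gradient equal to the gradient of the full-dataset mean loss.
    model.optim.zero_grad()
    loss_sum = 0.0

    for k_batch in range(0, len(samples), batch_size):
        samples_batch = samples[k_batch:k_batch + batch_size]

        loss = loss_fnc(samples_batch)
        # backward() accumulates into .grad and frees this batch's graph,
        # so only one batch is held in memory at a time
        (loss * len(samples_batch) / len(samples)).backward()
        loss_sum += loss.item() * len(samples_batch)

    model.optim.step()
    loss_comp = loss_sum / len(samples)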

    As I understand it, using a DataLoader wouldn't change this behavior.

    Continue reading...
