[Python] Is the problem with the DataLoader or the model?

Stack · October 6, 2024 at 19:52

Is the issue with the DataLoader or the model? When inputs have different sizes, how are they usually transformed to a fixed size? And what is the best way to feed the outputs of a GAT into a CNN?

The complete code is very lengthy. I tried to include the important parts to simplify the issue.

Model:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GATConv
from torch_geometric.data import Dataset, Data
from torch_geometric.loader import DataLoader

class GAT(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, heads):
        super(GAT, self).__init__()
        self.conv1 = GATConv(in_channels, hidden_channels, heads=heads)
        # ... code truncated for brevity : conv2,3,4,5,6 ...
        self.conv7 = GATConv(hidden_channels * heads, out_channels, heads=1, concat=False)
        self.bn = nn.BatchNorm1d(hidden_channels * heads)
        self.dropout = nn.Dropout(0.1)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = self.bn(x)
        x = F.relu(x)
        x = self.dropout(x)
        # ... code truncated for brevity : conv2,3,4,5,6 ...
        x = self.conv7(x, edge_index)
        return x

class MODELGAT(nn.Module):
    def __init__(self):
        super().__init__()
        self.gat = GAT(5, 16, 32, heads=16)
        # self.cnn is used in forward() but its definition is not shown in this excerpt

    def forward(self, D1, D2, Distances1, Distances2, Angles1, Angles2):
        # Distances1, Distances2, Angles1, Angles2: I will use these later in the model
        print(D1)  # DataBatch(x=[214, 5], edge_index=[2, 228], batch=[214], ptr=[6])
        print(D2)  # DataBatch(x=[110, 5], edge_index=[2, 115], batch=[110], ptr=[6])
        x1 = self.gat(D1)
        x2 = self.gat(D2)
        print(x1.shape)  # torch.Size([214, 32])
        print(x2.shape)  # torch.Size([110, 32])

        x1 = x1.unsqueeze(0).unsqueeze(0)  # [1, 1, 214, 32]
        x1 = F.interpolate(x1, size=(x2.size(0), 32), mode='bicubic', align_corners=False)  # [1, 1, 110, 32]
        x1 = x1.squeeze(0)  # [1, 110, 32]
        x2 = x2.unsqueeze(0)  # [1, 110, 32]
        combined = x1 + x2
        combined = combined.unsqueeze(0)  # [1, 1, 110, 32]

        x = self.cnn(combined)
        x = torch.sigmoid(x)
        x = x.squeeze(0).squeeze(0)

        return x  # Problem

I set batch_size to 5, so why do the inputs look like this? DataBatch(x=[214, 5], edge_index=[2, 228], batch=[214], ptr=[6])

The model should ultimately produce 5 outputs, one per sample, to be compared against the labels L. So why does the GAT output have shape torch.Size([214, 32])?

Normally, when the inputs have different sizes, resulting in outputs like torch.Size([214, 32]) and torch.Size([110, 32]), how should I transform them to the same size before feeding them into a CNN? Is the method I'm using even correct?
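For context on the shapes above: PyG mini-batching concatenates all graphs in a batch into one large disconnected graph, so `x=[214, 5]` is the total node count over the 5 graphs, `batch=[214]` maps each node to its graph, and `ptr=[6]` holds the 5 + 1 graph boundaries. A GAT therefore returns one row per node, and a readout step is needed to get one row per graph. The following is a minimal sketch of a per-graph mean readout in plain PyTorch, similar in spirit to `torch_geometric.nn.global_mean_pool(x, batch)`. The helper name `mean_pool_per_graph` and the per-graph node counts are hypothetical, chosen only so that they sum to 214.

```python
import torch

def mean_pool_per_graph(x: torch.Tensor, batch: torch.Tensor, num_graphs: int) -> torch.Tensor:
    """Average node embeddings per graph: [num_nodes, C] -> [num_graphs, C]."""
    sums = torch.zeros(num_graphs, x.size(1), dtype=x.dtype, device=x.device)
    sums.index_add_(0, batch, x)  # add each node's row to its graph's row
    counts = torch.bincount(batch, minlength=num_graphs).clamp(min=1).unsqueeze(1)
    return sums / counts

# Hypothetical node counts for 5 graphs (59 + 40 + 35 + 50 + 30 = 214)
sizes = torch.tensor([59, 40, 35, 50, 30])
batch = torch.repeat_interleave(torch.arange(5), sizes)  # node -> graph index
x1 = torch.randn(214, 32)  # stand-in for the GAT node embeddings

g1 = mean_pool_per_graph(x1, batch, num_graphs=5)
print(g1.shape)  # torch.Size([5, 32])
```

Applied to both `x1` and `x2` with their own `batch` vectors, a readout like this would give two `[5, 32]` tensors that can be combined directly, without interpolating one graph batch to the node count of the other.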

DataLoader:

def DATA(D):
    # ... code truncated for brevity : OUT DD1 & DD2 & L...
    L = torch.tensor(L, dtype=torch.float).squeeze(0)  # 0 OR 1
    D1 = DD1[0]  # graph_data -> Data(x=[59, 5], edge_index=[2, 64])
    Distances1 = torch.tensor(DD1[1], dtype=torch.float32)  # torch.Size([57])
    Angles1 = torch.tensor(DD1[2], dtype=torch.float32)  # torch.Size([57])
    D2 = DD2[0]  # graph_data -> Data(x=[22, 5], edge_index=[2, 23])
    Distances2 = torch.tensor(DD2[1], dtype=torch.float32)  # torch.Size([20])
    Angles2 = torch.tensor(DD2[2], dtype=torch.float32)  # torch.Size([20])
    data = Data(
        Distances1=Distances1,
        Distances2=Distances2,
        Angles1=Angles1,
        Angles2=Angles2,
    )
    return D1, D2, data, L

def TRAIN(D):
    inputs = []
    targets = []
    for train_list in D:
        D1, D2, data, l = DATA(train_list)
        inputs.append((D1, D2, data))
        targets.append(l)
    return inputs, targets

class GATDataset(Dataset):
    def __init__(self, input, target):
        self.input = input
        self.target = target

    def __len__(self):
        return len(self.input)

    def __getitem__(self, idx):
        return self.input[idx], self.target[idx]

input, target = TRAIN(LIST) # List of data
dataset = GATDataset(input, target)
Train_Loader = DataLoader(dataset, batch_size=5)

Is the DataLoader code written correctly? In the end, D1, D2, Distances1, Distances2, Angles1, Angles2 should be fed into the network.

Usually, in cases where the data has different sizes, how should it be transformed to the same size without losing important features or removing parts of the data?
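One common option for variable-length data is to zero-pad every sample to the longest length in the batch and keep a boolean mask of the real entries, so nothing is discarded or resampled. The sketch below does this in plain PyTorch for per-sample vectors such as Distances1 and Distances2. The helper name `pad_with_mask` and the random values are hypothetical; only the lengths 57 and 20 come from the code above. For node features in a PyG batch, `torch_geometric.utils.to_dense_batch(x, batch)` returns a padded tensor and mask in the same way.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_with_mask(seqs):
    """Pad a list of [len_i, ...] tensors to [B, max_len, ...] and return a validity mask."""
    lengths = torch.tensor([s.size(0) for s in seqs])
    padded = pad_sequence(seqs, batch_first=True)  # zero-padded to the longest sequence
    # mask[i, j] is True where position j holds real data for sample i
    mask = torch.arange(padded.size(1)).unsqueeze(0) < lengths.unsqueeze(1)
    return padded, mask

# Stand-ins for two per-sample distance vectors of different lengths
distances = [torch.rand(57), torch.rand(20)]
padded, mask = pad_with_mask(distances)

print(padded.shape)     # torch.Size([2, 57])
print(mask.sum(dim=1))  # tensor([57, 20])
```

Downstream layers can then use the mask to ignore padded positions, for example by zeroing them before a sum or excluding them from a mean.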

Train:

torch.manual_seed(123)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MODELGAT().to(device)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.00005, weight_decay=0.0001)

for Batch, L in Train_Loader:
    model.train()
    optimizer.zero_grad()
    # Move the graph batches to the same device as the model
    D1 = Batch[0].to(device)
    D2 = Batch[1].to(device)
    Distances1 = Batch[2].Distances1.to(device)
    Distances2 = Batch[2].Distances2.to(device)
    Angles1 = Batch[2].Angles1.to(device)
    Angles2 = Batch[2].Angles2.to(device)
    output = model(D1, D2, Distances1, Distances2, Angles1, Angles2)
    output = output.float()
    target = L.float().to(device)
    train_loss = criterion(output, target)
    train_loss.backward()
    optimizer.step()
    print(train_loss)

Why is the output of output = model(Batch[0], Batch[1], Distances1, Distances2, Angles1, Angles2) a single number?

output.shape -> torch.Size([]), target.shape -> torch.Size([5])
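A likely reading of the scalar output, given that self.cnn is not shown: in forward() the whole batch is reshaped into a single [1, 1, 110, 32] "image", so the leading batch dimension is 1 rather than 5, and the final squeezes reduce it to one number. The sketch below shows the shape agreement the loss expects, assuming per-graph embeddings of shape [5, 32] are already available for both inputs. The linear `head` and the random embeddings are hypothetical stand-ins, not the model from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_graphs, channels = 5, 32

# Stand-ins for per-graph embeddings of the two inputs, one row per sample
g1 = torch.randn(num_graphs, channels)
g2 = torch.randn(num_graphs, channels)

# Hypothetical head: concatenate both embeddings and map to one logit per sample
head = nn.Linear(2 * channels, 1)
logits = head(torch.cat([g1, g2], dim=1)).squeeze(-1)  # shape [5]

target = torch.tensor([0.0, 1.0, 1.0, 0.0, 1.0])  # shape [5]

# BCEWithLogitsLoss applies the sigmoid internally, so it takes raw logits
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, target)

print(logits.shape, target.shape)
```

Because BCEWithLogitsLoss already includes the sigmoid, applying torch.sigmoid inside the model as well would squash the values twice; either return raw logits with this loss or keep the sigmoid and use nn.BCELoss.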
