
[Python] Is the problem with the DataLoader or the model?

Discussion in 'Python' started by Stack, October 6, 2024 at 19:52.

  1. Stack

    Stack Participating Member

    Is the issue with the DataLoader or the model? When the inputs have different sizes, how are they typically transformed to a fixed size? And if we want to feed the outputs of a GAT into a CNN, what is the best approach?

    The complete code is very long, so I have included only the important parts to simplify the issue.

    Model:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch_geometric.nn import GATConv
    from torch_geometric.data import Dataset,Data
    from torch_geometric.loader import DataLoader

    class GAT(nn.Module):
        def __init__(self, in_channels, hidden_channels, out_channels, heads):
            super(GAT, self).__init__()
            self.conv1 = GATConv(in_channels, hidden_channels, heads=heads)
            # ... code truncated for brevity : conv2,3,4,5,6 ...
            self.conv7 = GATConv(hidden_channels * heads, out_channels, heads=1, concat=False)
            self.bn = nn.BatchNorm1d(hidden_channels * heads)
            self.dropout = nn.Dropout(0.1)

        def forward(self, data):
            x, edge_index = data.x, data.edge_index
            x = self.conv1(x, edge_index)
            x = self.bn(x)
            x = F.relu(x)
            x = self.dropout(x)
            # ... code truncated for brevity : conv2,3,4,5,6 ...
            x = self.conv7(x, edge_index)
            return x

    class MODELGAT(nn.Module):
        def __init__(self):
            super(MODELGAT, self).__init__()  # was super(DTI, self), which names a class that is not defined here
            self.gat = GAT(5, 16, 32, heads=16)
            # self.cnn is used in forward() but its definition is not shown in this excerpt

        def forward(self, D1, D2, Distances1, Distances2, Angles1, Angles2):
            # Distances1, Distances2, Angles1, Angles2: I will use these later in the model
            print(D1) # DataBatch(x=[214, 5], edge_index=[2, 228], batch=[214], ptr=[6])
            print(D2) # DataBatch(x=[110, 5], edge_index=[2, 115], batch=[110], ptr=[6])
            x1 = self.gat(D1)
            x2 = self.gat(D2)
            print(x1.shape) # torch.Size([214, 32])
            print(x2.shape) # torch.Size([110, 32])

            x1 = x1.unsqueeze(0).unsqueeze(0)  # [1, 1, 214, 32]
            x1 = F.interpolate(x1, size=(x2.size(0), 32), mode='bicubic', align_corners=False)  # [1, 1, 110, 32]
            x1 = x1.squeeze(0)  # [1, 110, 32]
            x2 = x2.unsqueeze(0)  # [1, 110, 32]
            combined = x1 + x2
            combined = combined.unsqueeze(0)  # [1, 1, 110, 32]

            x = self.cnn(combined)
            x = torch.sigmoid(x)
            x = x.squeeze(0).squeeze(0)

            return x # Problem


    Didn't we set batch_size to 5? Why do the inputs look like this? DataBatch(x=[214, 5], edge_index=[2, 228], batch=[214], ptr=[6])

    In the end, the model should produce 5 outputs to be compared against L. But why is the GAT output torch.Size([214, 32])?

    Normally, when the inputs have different sizes, resulting in outputs like torch.Size([214, 32]) and torch.Size([110, 32]), how should I transform them to the same size before feeding them into a CNN? Is the method I'm using even correct?
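One possible reading of the shapes above: a PyTorch Geometric DataBatch stacks the nodes of all graphs in the mini-batch into a single tensor, so 214 would be the total node count of the 5 graphs, `batch` would hold the graph index of every node, and `ptr=[6]` would be the 5 graph boundaries plus one. Under that assumption, a per-graph readout gives one fixed-size row per graph no matter how many nodes each graph has. The sketch below does the averaging in plain torch so it runs on its own; `mean_pool_per_graph` is a hypothetical helper, the node counts are made up so that they sum to 214 and 110, and the random tensors only stand in for the GAT outputs. `torch_geometric.nn.global_mean_pool(x, batch)` should give the same result on a real DataBatch.

```python
import torch

def mean_pool_per_graph(x: torch.Tensor, batch: torch.Tensor, num_graphs: int) -> torch.Tensor:
    """Average node features per graph.

    x:     [total_nodes, channels] node embeddings of the whole mini-batch
    batch: [total_nodes] graph index (0 .. num_graphs - 1) of every node
    """
    sums = torch.zeros(num_graphs, x.size(1), dtype=x.dtype, device=x.device)
    sums.index_add_(0, batch, x)  # add up the rows that share a graph index
    counts = torch.bincount(batch, minlength=num_graphs).clamp(min=1)
    return sums / counts.unsqueeze(1).to(x.dtype)

# Hypothetical node counts for 5 graphs, chosen only to sum to 214 and 110.
sizes1 = [59, 40, 45, 30, 40]
sizes2 = [22, 20, 25, 18, 25]
batch1 = torch.repeat_interleave(torch.arange(5), torch.tensor(sizes1))
batch2 = torch.repeat_interleave(torch.arange(5), torch.tensor(sizes2))

x1 = torch.randn(214, 32)  # stand-in for self.gat(D1)
x2 = torch.randn(110, 32)  # stand-in for self.gat(D2)

g1 = mean_pool_per_graph(x1, batch1, num_graphs=5)  # [5, 32]
g2 = mean_pool_per_graph(x2, batch2, num_graphs=5)  # [5, 32]
combined = torch.cat([g1, g2], dim=1)               # [5, 64], one row per sample
print(g1.shape, g2.shape, combined.shape)
```

With one row per sample, a head that maps each row to a single logit would produce 5 outputs, matching the 5 labels in L. Mean pooling does discard node order and count, so whether it keeps enough information for this task is an open question.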

    DataLoader:

    def DATA(D):
        # ... code truncated for brevity : OUT DD1 & DD2 & L...
        L = torch.tensor(L, dtype=torch.float).squeeze(0) # 0 OR 1
        D1 = DD1[0] # graph_data -> Data(x=[59, 5], edge_index=[2, 64])
        Distances1 = torch.tensor(DD1[1], dtype=torch.float32) # torch.Size([57])
        Angles1 = torch.tensor(DD1[2], dtype=torch.float32) # torch.Size([57])
        D2 = DD2[0] # graph_data -> Data(x=[22, 5], edge_index=[2, 23])
        Distances2 = torch.tensor(DD2[1], dtype=torch.float32) # torch.Size([20])
        Angles2 = torch.tensor(DD2[2], dtype=torch.float32) # torch.Size([20])
        data = Data(
            Distances1=Distances1,
            Distances2=Distances2,
            Angles1=Angles1,
            Angles2=Angles2,
        )
        return D1, D2, data, L

    def TRAIN(D):
        inputs = []
        targets = []
        for train_list in D:
            D1, D2, data, l = DATA(train_list)
            inputs.append((D1, D2, data))
            targets.append(l)
        return inputs, targets

    class GATDataset(Dataset):
        def __init__(self, input, target):
            self.input = input
            self.target = target

        def __len__(self):
            return len(self.input)

        def __getitem__(self, idx):
            return self.input[idx], self.target[idx]

    input, target = TRAIN(LIST) # List of data
    dataset = GATDataset(input, target)
    Train_Loader = DataLoader(dataset, batch_size=5)


    Is the DataLoader code written correctly? In the end, D1, D2, Distances1, Distances2, Angles1, Angles2 should be fed into the network.

    When the samples have different sizes, how are they usually brought to the same size without losing important features or dropping parts of the data?
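One common way to get equal sizes without discarding nodes is to pad every graph to the largest graph in the mini-batch and keep a boolean mask that marks the real rows. The sketch below shows the idea in plain torch; `pad_node_features` is a hypothetical helper and the node counts are made up. On a real DataBatch, `torch_geometric.utils.to_dense_batch(x, batch)` returns a padded tensor and mask of the same kind.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_node_features(x: torch.Tensor, sizes: list):
    """Split [total_nodes, C] into per-graph chunks and zero-pad to [B, max_nodes, C]."""
    chunks = list(torch.split(x, sizes))               # list of [n_i, C] tensors
    padded = pad_sequence(chunks, batch_first=True)    # [B, max_nodes, C]
    lengths = torch.tensor(sizes)
    # mask[b, i] is True where row i of graph b is a real node, False where it is padding
    mask = torch.arange(padded.size(1)).unsqueeze(0) < lengths.unsqueeze(1)
    return padded, mask

sizes = [59, 40, 45, 30, 40]        # hypothetical node counts of 5 graphs
x = torch.randn(sum(sizes), 32)     # stand-in for GAT node embeddings
padded, mask = pad_node_features(x, sizes)
print(padded.shape)                 # torch.Size([5, 59, 32])
print(mask.sum(dim=1).tolist())     # [59, 40, 45, 30, 40]
```

The padded length still changes from batch to batch, so a CNN placed after this step would need something like `nn.AdaptiveAvgPool2d` before any fully connected layer. That is a design choice, not something the original code prescribes.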

    Train:

    torch.manual_seed(123)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = MODELGAT().to(device)
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.00005, weight_decay=0.0001)

    for Batch, L in Train_Loader:
        model.train()
        optimizer.zero_grad()
        Distances1 = Batch[2].Distances1.to(device)
        Distances2 = Batch[2].Distances2.to(device)
        Angles1 = Batch[2].Angles1.to(device)
        Angles2 = Batch[2].Angles2.to(device)
        # Move the graph batches to the same device as the model
        D1 = Batch[0].to(device)
        D2 = Batch[1].to(device)
        output = model(D1, D2, Distances1, Distances2, Angles1, Angles2)
        output = output.float()
        target = L.float().to(device)
        train_loss = criterion(output, target)
        train_loss.backward()
        optimizer.step()
        print(train_loss)


    Why is the output of output = model(Batch[0], Batch[1], Distances1, Distances2, Angles1, Angles2) a single number?

    output.shape -> torch.Size([]), target.shape -> torch.Size([5])
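Two details of the training loop can be checked in isolation, independent of the model. First, `nn.BCEWithLogitsLoss` applies the sigmoid itself, so it expects raw logits; the `torch.sigmoid` inside `forward` would then be applied twice. Second, the loss expects `output` and `target` to have the same shape, here `[5]`. The sketch below uses made-up logits and labels, not values from the post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(123)
logits = torch.randn(5)                       # stand-in for a model output of shape [5]
target = torch.tensor([0., 1., 1., 0., 1.])   # made-up labels, one per sample

criterion = nn.BCEWithLogitsLoss()
loss_from_logits = criterion(logits, target)

# Same loss written out by hand: explicit sigmoid, then plain binary cross-entropy.
loss_from_probs = F.binary_cross_entropy(torch.sigmoid(logits), target)
print(torch.allclose(loss_from_logits, loss_from_probs, atol=1e-5))

# A scalar output against a [5] target: the shapes differ, so the loss refuses it.
scalar_output = torch.tensor(0.3)
try:
    criterion(scalar_output, target)
    shape_error = None
except (ValueError, RuntimeError) as err:
    shape_error = str(err)
print(shape_error)
```

If the model returned one logit per graph pair, with shape `[5]` and no sigmoid in `forward`, the output would line up with `target` directly.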

