[Python] Knowledge Graph Embeddings on a Custom Dataset with GraphSAGE or (V)GAE in Torch Geometric

Discussion in 'Python' started by Stack, October 5, 2024 at 11:32.

  1. Stack

    Stack (Participating Member)

    I'm currently working on creating an embedding from my own knowledge graph. This embedding will be used to identify similar data assets based on their structure. Node2Vec and RandomWalk are working well, but I also want to explore other approaches such as GraphSAGE or a graph autoencoder (GAE) to see whether they perform better, worse, or comparably.

    I've followed the official tutorials for implementing these approaches, but I've run into several problems. One of the main issues is converting the NetworkX graph into a Torch Geometric dataset. Additionally, I don't need to predict a "y" class or add missing edges; I simply want to create an embedding from the knowledge graph that places similar data assets close to each other.

    I would greatly appreciate any tips on how best to build the knowledge graph from the dataset, as well as specific code examples for creating a Torch dataset or implementing GraphSAGE on this foundation.

    Here's the initial situation:

    I use a self-generated dataset that describes different datasets on Kaggle - an example can be viewed here.

    Various entities and edges were extracted from this dataset using GLiNER. The reference is always the name of the dataset. You can find an excerpt here.

    This is the basis for creating the knowledge graph. I have created the KG with the following code:

    import networkx as nx

    # Increment the weight of edge (u, v); add it with weight 1 if it does not exist yet
    def increment_edge_weight(G, u, v):
        if G.has_edge(u, v):
            G[u][v]['weight'] += 1
        else:
            G.add_edge(u, v, weight=1)

    # Create graph
    G = nx.Graph()
    for node in nodes:
        entity, label, dataset = node[0].lower(), node[1].lower(), node[2].lower()
        # Add nodes; check the lowercased name so an existing node keeps its type
        for name, node_type in ((entity, "entity"), (label, "label"), (dataset, "dataset")):
            if not G.has_node(name):
                G.add_node(name, type=node_type)
        # Add tuple edges
        increment_edge_weight(G, entity, label)
        increment_edge_weight(G, label, dataset)
        increment_edge_weight(G, entity, dataset)

    print(f"Number of nodes: {G.number_of_nodes()}")
    print(f"Number of edges: {G.number_of_edges()}")
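The edge-counting logic above can be sanity-checked without NetworkX. The following is only a minimal sketch: the triples are hypothetical stand-ins for the GLiNER output, and `edge_weights` is an illustrative helper, not part of the code above. It counts how often each undirected edge would be incremented, which is what the `weight` attribute should end up holding.

```python
from collections import Counter

# Hypothetical (entity, label, dataset) triples standing in for the GLiNER output.
triples = [
    ("Titanic", "event", "titanic-survival"),
    ("titanic", "event", "titanic-passengers"),
    ("Passenger", "person", "titanic-survival"),
]


def edge_weights(triples):
    """Count how often each undirected edge occurs across all triples."""
    weights = Counter()
    for entity, label, dataset in triples:
        e, l, d = entity.lower(), label.lower(), dataset.lower()
        for u, v in ((e, l), (l, d), (e, d)):
            # frozenset treats (u, v) and (v, u) as the same key,
            # mirroring an undirected nx.Graph
            weights[frozenset((u, v))] += 1
    return weights


weights = edge_weights(triples)
for edge, weight in weights.items():
    print(sorted(edge), weight)
```

Comparing these counts with `G[u][v]['weight']` for a few known pairs is a quick way to confirm that the graph was built as intended.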


    My approach to creating a torch_geometric dataset is this one:

    import torch
    from torch_geometric.utils.convert import from_networkx

    # Step 1: map each node name (string) to an integer id
    id_mapping = {name: i for i, name in enumerate(G.nodes)}

    # Step 2: map each node type (string) to an integer class id
    # (built from the sorted set of types so each type gets exactly one index)
    node_types = sorted({attrs['type'] for _, attrs in G.nodes(data=True)})
    group_mapping = {type_str: i for i, type_str in enumerate(node_types)}

    # Convert the NetworkX graph to a PyTorch Geometric graph
    data = from_networkx(G)

    # Use the integer node id as the feature (data.x)
    data.x = torch.tensor([id_mapping[node] for node in G.nodes], dtype=torch.long).unsqueeze(1)

    # Use the integer node type as the label (data.y)
    data.y = torch.tensor([group_mapping[attrs['type']] for _, attrs in G.nodes(data=True)], dtype=torch.long)

    # Create train and validation masks (e.g. 80% train, 20% validation)
    # from a single permutation so that the two sets do not overlap
    perm = torch.randperm(data.num_nodes)
    n_train = int(0.8 * data.num_nodes)

    train_mask = torch.zeros(data.num_nodes, dtype=torch.bool)
    val_mask = torch.zeros(data.num_nodes, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:]] = True

    data.train_mask = train_mask
    data.val_mask = val_mask


    Based on this, are there any suggestions or ideas on how best to use such a graph with GraphSAGE, GAE, or other approaches? Since I don't have a ground truth, as described above, I have focused on unsupervised methods. I am grateful for any help!
