[Python] Partitioning Large R-MAT Graph Datasets From Binary File

Discussion in 'Python' started by Stack, October 7, 2024 at 12:52.

    I am currently using a Python script (from this answer) that finds the ideal row split points of a sparse graph's adjacency matrix, so that each partition gets roughly the same number of non-zeros (nnz):

    import numpy as np
    from scipy import sparse


    def myhorsplit(
        matrix: sparse.csr_array, n_compute_units: int = 4,
    ) -> list[sparse.csr_array]:
        # Stored entries per row, read from the CSR row-pointer array,
        # accumulated so that nnz[i] = non-zeros in rows 0..i
        nnz = np.diff(matrix.indptr).cumsum()
        total = nnz[-1]
        # Evenly spaced nnz targets at which a new partition should start
        ideal_breaks = np.arange(0, total, total / n_compute_units)
        # First row whose cumulative nnz reaches each target; None means "to the end"
        break_idx = [*nnz.searchsorted(ideal_breaks), None]
        return [
            matrix[i:j, :]
            for i, j in zip(break_idx[:-1], break_idx[1:])
        ]


    def main() -> None:
        # 8x8 example adjacency matrix
        adjacency_matrix = [
            (1, 1, 1, 1, 0, 0, 0, 0),
            (1, 0, 1, 0, 0, 0, 0, 0),
            (1, 1, 0, 1, 0, 0, 0, 0),
            (1, 0, 1, 0, 0, 0, 0, 0),
            (0, 0, 1, 0, 0, 1, 0, 1),
            (0, 0, 0, 0, 1, 0, 0, 0),
            (0, 0, 0, 0, 1, 1, 0, 1),
            (0, 0, 1, 0, 1, 0, 1, 0),
        ]
        csr_matrix = sparse.csr_array(adjacency_matrix)
        # Larger random test matrix instead:
        # rand = np.random.default_rng(seed=0)
        # csr_matrix = sparse.csr_array(
        #     rand.integers(low=0, high=2, size=(10_000, 50), dtype=np.uint8)
        # )

        partitions = myhorsplit(csr_matrix)

        for i, partition in enumerate(partitions):
            print(f"Partition {i}: {partition.nnz} ones, shape {partition.shape}")
            print(partition.toarray())


    if __name__ == "__main__":
        main()


    With the 8x8 example matrix, the code produces these 4 partitions:

    Partition 0: 4 ones, shape (1, 8)
    [[1 1 1 1 0 0 0 0]]
    Partition 1: 5 ones, shape (2, 8)
    [[1 0 1 0 0 0 0 0]
    [1 1 0 1 0 0 0 0]]
    Partition 2: 6 ones, shape (3, 8)
    [[1 0 1 0 0 0 0 0]
    [0 0 1 0 0 1 0 1]
    [0 0 0 0 1 0 0 0]]
    Partition 3: 6 ones, shape (2, 8)
    [[0 0 0 0 1 1 0 1]
    [0 0 1 0 1 0 1 0]]
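To make the `cumsum`/`searchsorted` step concrete, here is the same arithmetic done by hand for the 8x8 example, starting from its per-row non-zero counts. It is only a worked illustration of what `myhorsplit` computes internally; the variable names are illustrative.

```python
import numpy as np

# Non-zeros per row (out-degree per source node) of the 8x8 example matrix
row_nnz = np.array([4, 2, 3, 2, 3, 1, 3, 3])

cum = row_nnz.cumsum()   # [ 4  6  9 11 14 15 18 21]
total = cum[-1]          # 21 non-zeros in total
n_parts = 4

# Ideal cut points in "nnz space": 0, 5.25, 10.5, 15.75
ideal = np.arange(0, total, total / n_parts)

# First row whose cumulative nnz reaches each ideal cut point
starts = cum.searchsorted(ideal)  # [0 1 3 6]

# nnz per partition = difference of cumulative counts at the row boundaries
bounds = np.append(starts, len(row_nnz))
padded = np.concatenate(([0], cum))
per_part = padded[bounds[1:]] - padded[bounds[:-1]]

print(starts.tolist(), per_part.tolist())  # [0, 1, 3, 6] [4, 5, 6, 6]
```

The row starts 0, 1, 3 and 6 and the counts 4, 5, 6, 6 match the partitions printed above.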


    I am now trying to split an R-MAT graph with scale=29 and edge factor=16, stored as a binary file of around 120 GB. Reading that file and converting it to a sparse array makes the script hang until it is terminated with an "OOM Killed" message.
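A rough size estimate shows why building the whole matrix in memory fails. This is a back-of-the-envelope sketch under stated assumptions: each edge stored as two 64-bit vertex IDs, and a CSR array with 64-bit `indices`/`indptr` (required here, since nnz exceeds 2^31 - 1) plus 1-byte data values. The real graph500 file layout may differ.

```python
# Back-of-the-envelope size of an R-MAT graph with scale=29, edge factor=16.
# ASSUMPTION: each edge is stored as two int64 vertex IDs (16 bytes).
scale = 29
edge_factor = 16

n_vertices = 2 ** scale                # 536,870,912
n_edges = edge_factor * n_vertices     # 8,589,934,592
edge_list_bytes = n_edges * 2 * 8      # two int64 IDs per edge

# CSR = indices (one per edge) + indptr (one per row, plus one) + data.
# ASSUMPTION: int64 indices/indptr and uint8 data values.
csr_bytes = n_edges * 8 + (n_vertices + 1) * 8 + n_edges * 1

GiB = 2 ** 30
print(f"vertices: {n_vertices:,}, edges: {n_edges:,}")
print(f"edge list: {edge_list_bytes / GiB:.0f} GiB, CSR: {csr_bytes / GiB:.0f} GiB")
```

Under these assumptions the edge list alone is about 128 GiB and the CSR array about 76 GiB, so holding both at once during conversion needs far more RAM than a typical node has.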

    Is there a way to achieve this partitioning directly from the binary file (generated with graph500), that is, by only reading the file instead of building the whole sparse array in memory?
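One possible approach, sketched here under assumptions rather than as a tested solution for graph500 output: the split points only depend on the number of edges per source node, so they can be computed by memory-mapping the file with `np.memmap` and counting edges per source in chunks. Only a per-vertex degree array stays in RAM (for scale=29 that is 2^29 int64 values, i.e. 4 GiB). The file layout (`EDGE_DTYPE`, little-endian int64 pairs) and the helper `row_breaks_from_file` with its `chunk_edges` parameter are hypothetical; adapt them to the actual format.

```python
import os
import tempfile

import numpy as np

# ASSUMPTION: the binary file is a flat sequence of (source, target) pairs of
# little-endian int64 vertex IDs. Adjust EDGE_DTYPE to the real layout.
EDGE_DTYPE = np.dtype([("src", "<i8"), ("dst", "<i8")])


def row_breaks_from_file(path, n_vertices, n_parts=4, chunk_edges=1 << 24):
    """Hypothetical helper: first row of each of n_parts row blocks with ~equal nnz.

    The edge file is memory-mapped and scanned in chunks, so only the
    per-vertex degree array (n_vertices int64 values) is held in RAM.
    """
    edges = np.memmap(path, dtype=EDGE_DTYPE, mode="r")
    degree = np.zeros(n_vertices, dtype=np.int64)
    for start in range(0, len(edges), chunk_edges):
        src = np.asarray(edges["src"][start:start + chunk_edges])
        # Count edges per source vertex in this chunk (= nnz of that matrix row)
        vertices, counts = np.unique(src, return_counts=True)
        degree[vertices] += counts
    cum = degree.cumsum()  # cum[r] = nnz in rows 0..r, as in myhorsplit
    total = cum[-1]
    ideal = np.arange(n_parts) * (total / n_parts)
    return cum.searchsorted(ideal).tolist()


# Demo: write the 8x8 example matrix as an edge file, then split it.
adjacency = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0, 1, 0],
])
src, dst = np.nonzero(adjacency)
demo = np.empty(len(src), dtype=EDGE_DTYPE)
demo["src"] = src
demo["dst"] = dst

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.bin")
    demo.tofile(path)
    breaks = row_breaks_from_file(path, n_vertices=8)
    breaks_chunked = row_breaks_from_file(path, n_vertices=8, chunk_edges=5)

print(breaks, breaks_chunked)  # [0, 1, 3, 6] [0, 1, 3, 6]
```

The result matches `myhorsplit` on the in-memory matrix and does not depend on the chunk size. A second pass over the file could then route each edge to the block whose row range contains its source, writing each block to its own file.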

    One option I considered is ranking all edges by their source node and then splitting that ordered list. Would that still keep the regions correct? How could I approach this problem?
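The sorting idea can be sketched in memory with the 8x8 example, turned into an edge list and shuffled on purpose. Sorting by source makes each row's edges contiguous, which is the row order CSR uses, so cutting the sorted list only at source-node boundaries gives row blocks like the ones from slicing the CSR matrix. Here it reproduces the 4/5/6/6 split. The names (`cut_pos`, `cut_rows`, `parts`) are illustrative, and for the real file the sort would have to be done out of core (for example a chunked sort followed by a merge), which this sketch does not show.

```python
import numpy as np

# The 8x8 example matrix, turned into a (source, target) edge list
adjacency = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0, 1, 0],
])
src, dst = np.nonzero(adjacency)

# Shuffle to mimic an edge stream that is not ordered by source node
rng = np.random.default_rng(seed=0)
perm = rng.permutation(len(src))
src, dst = src[perm], dst[perm]

# Rank edges by source node. Any sort groups each row's edges together;
# kind="stable" additionally keeps their original relative order.
order = np.argsort(src, kind="stable")
src_sorted, dst_sorted = src[order], dst[order]

n_parts = 4
total = len(src_sorted)
# Ideal cut positions in the sorted edge list: floor(k * total / n_parts)
cut_pos = np.arange(n_parts) * total // n_parts
# Snap each cut back to the first edge of the source node it falls in,
# so no row is split across two partitions
cut_rows = src_sorted[cut_pos]  # [0 1 3 6]
starts = np.searchsorted(src_sorted, cut_rows, side="left")
bounds = np.append(starts, total)

parts = [
    (src_sorted[a:b], dst_sorted[a:b])
    for a, b in zip(bounds[:-1], bounds[1:])
]
for i, (s, _) in enumerate(parts):
    rows = f"rows {s.min()}..{s.max()}" if len(s) else "empty"
    print(f"Partition {i}: {len(s)} edges, {rows}")
```

When a target position falls exactly on a row boundary, this snapping rule and the `searchsorted` rule in `myhorsplit` can pick adjacent rows, so the two splits are close but not always identical.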
