Structure Similarity Task training #273

Open
ahariri13 opened this issue May 15, 2024 · 1 comment

Comments

@ahariri13

Hello! I'm still new to learning on proteins, and I was wondering how to train on the Structure Similarity Task (at least in an efficient manner) when using the graph format for PyTorch Geometric.

To load the data, I am using the following lines:

"""## Load the task and the dataset"""
datapath = './data/ec'
task = ps_tasks.StructureSimilarityTask(root=datapath)
dset = task.dataset

"""We convert the protein 3D structures to $\epsilon$-graphs ($\epsilon=8$ here):"""

def transform(data):
    data, protein_dict = data
    data.y = protein_dict['protein']['ID']
    return data

dset2 = dset.to_graph(eps=8.0).pyg(transform=transform)

from torch.utils.data import Subset
from torch_geometric.loader import DataLoader

batch_size = args.batch_size  # batch size from my argument parser

train_loader = DataLoader(Subset(dset2, task.train_index), batch_size=batch_size, shuffle=True, num_workers=0)
val_loader = DataLoader(Subset(dset2, task.val_index), batch_size=batch_size, shuffle=False, num_workers=0)
test_loader = DataLoader(Subset(dset2, task.test_index), batch_size=batch_size, shuffle=False, num_workers=0)
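
To check what the loaders actually yield, I printed one batch (a quick sanity check with plain PyG, nothing proteinshake-specific):

# Peek at one training batch to see how PyG collates the pairs.
first = next(iter(train_loader))
print(type(first), first)
# In my runs each batch appears to be a pair of PyG Batch objects,
# which is why I index batch[0] / batch[1] in the evaluation code below.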

My understanding is that we need to take two graph (protein) samples, embed them, and predict a regression value for their similarity. Using the PyG dataloader would batch all the dictionaries together, which is why I decided to keep only the protein ID as the batched label, and so I removed the ['protein']['ID'] part from the target function of the task in structure_similarity.py. As a result, my model looks as follows:

    def forward(self, batch):
        s1, s2 = None, None
        for it, sample in enumerate(batch):  # embed each of the two batched graphs separately
            x = sample.x
            edge_index = sample.edge_index

            x = self.x_embedding(x)
            x = self.conv1(x, edge_index)
            x = F.leaky_relu(x)
            x = self.bano1(x)
            # x = F.dropout(x, training=self.training, p=0.2)

            x = self.conv2(x, edge_index)
            x = F.leaky_relu(x)
            x = self.bano2(x)
            # x = F.dropout(x, training=self.training, p=0.2)

            x = self.conv3(x, edge_index)
            x = F.relu(x)
            x = self.bano3(x)
            # x = F.dropout(x, training=self.training, p=0.2)

            x = self.conv4(x, edge_index)
            # x = F.relu(x)
            # x = self.bano3(x)
            # x = F.dropout(x, training=self.training, p=0.2)

            if it == 0:
                s1 = global_add_pool(x, sample.batch)
            else:
                s2 = global_add_pool(x, sample.batch)

        final = self.mlpRep(s1 + s2)
        return final
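
For reference, here is the same forward pass with the shared encoder factored out into its own method, which is how I picture the Siamese setup (a sketch only; it reuses exactly the same layers and imports as above, and assumes batch is the pair of PyG Batch objects described earlier):

    def encode(self, data):
        # Shared GNN encoder applied to one whole PyG Batch at a time.
        x = self.x_embedding(data.x)
        x = self.bano1(F.leaky_relu(self.conv1(x, data.edge_index)))
        x = self.bano2(F.leaky_relu(self.conv2(x, data.edge_index)))
        x = self.bano3(F.relu(self.conv3(x, data.edge_index)))
        x = self.conv4(x, data.edge_index)
        return global_add_pool(x, data.batch)  # one embedding per graph

    def forward(self, batch):
        # batch = (graphs_1, graphs_2), each a PyG Batch holding one side of the pairs.
        s1 = self.encode(batch[0])
        s2 = self.encode(batch[1])
        return self.mlpRep(s1 + s2)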

And here is the evaluation function, where I loop over the batch to collect the ground-truth similarity labels:

@torch.no_grad()
def eval_epoch(model, loader):
    model.eval()

    y_true = []
    y_pred = []

    for step, batch in enumerate(loader):  # use the loader that was passed in, not val_loader
        size = len(batch[0].y)
        batch[0] = batch[0].to(device)
        batch[1] = batch[1].to(device)

        y_hat = model(batch)

        # Recompute the ground-truth similarity for every pair in the batch.
        truths = [task.targetBatch(batch[0].y[g], batch[1].y[g]) for g in range(size)]
        y_pred.append(y_hat)
        y_true.append(torch.Tensor(truths))

    y_true = torch.hstack(y_true).detach().cpu().numpy()
    y_pred = torch.vstack(y_pred).detach().cpu().numpy()
    scores = task.evaluate(y_true, y_pred)
    return scores
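
Since the validation and test pairs are fixed (shuffle=False), one idea I have been considering is to precompute their ground-truth similarities once and reuse them across epochs, roughly like this (a sketch only, relying on my modified targetBatch helper; I am not sure whether proteinshake already exposes these targets directly):

@torch.no_grad()
def precompute_targets(loader):
    # The pairs in an unshuffled loader come in a fixed order, so their
    # ground-truth similarities only need to be computed once.
    targets = []
    for batch in loader:
        for y0, y1 in zip(batch[0].y, batch[1].y):
            targets.append(task.targetBatch(y0, y1))
    return torch.Tensor(targets)

val_targets = precompute_targets(val_loader)  # computed once, reused every epoch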

Of course, training is taking far too long, and I would appreciate any tips on how to use the proteinshake package more efficiently for this task. Thanks a lot in advance!

@cgoliver
Collaborator

cgoliver commented May 16, 2024

Dear @ahariri13, thank you for contacting us! Glad to hear you are using the tool.

I am very busy for the next week and haven't had a chance to look closely at your code, but I know @claying has run some experiments on this dataset and would have some suggestions.

Best,
Carlos
