Extensive knowledge graphs (KGs) have been constructed to facilitate knowledge-driven tasks across various scenarios. However, existing work usually develops separate reasoning models for different KGs, lacking the ability to generalize and transfer knowledge across diverse KGs and reasoning settings. In this paper, we propose a prompt-based KG foundation model via in-context learning, namely KG-ICL, to achieve a universal reasoning ability. Specifically, we introduce a prompt graph centered with a query-related example fact as context to understand the query relation. To encode prompt graphs with the generalization ability to unseen entities and relations in queries, we first propose a unified tokenizer that maps entities and relations in prompt graphs to predefined tokens. Then, we propose two message passing neural networks to perform prompt encoding and KG reasoning, respectively. We conduct evaluation on 43 different KGs in both transductive and inductive settings. Results indicate that the proposed KG-ICL outperforms baselines on most datasets, showcasing its outstanding generalization and universal reasoning capabilities.
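For intuition, here is a minimal, self-contained sketch of the unified-tokenizer idea described above. It is not the released implementation: the concrete token scheme used here (labeling each entity in the prompt graph by its hop distances to the head and tail of the example fact, and assigning relations anonymous local indices) is an assumption made purely for illustration.

```python
# Hypothetical illustration of a "unified tokenizer" for a prompt graph:
# entities are mapped to position-based tokens (hop distances to the example
# fact's head and tail, capped), and relations to anonymous local indices,
# so the encoder never sees dataset-specific entity/relation identifiers.
from collections import deque, defaultdict

def tokenize_prompt_graph(triples, head, tail, max_dist=3):
    """triples: list of (h, r, t); head/tail: entities of the example fact."""
    # Build an undirected adjacency list over the prompt graph.
    adj = defaultdict(set)
    for h, r, t in triples:
        adj[h].add(t)
        adj[t].add(h)

    def bfs(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist

    d_head, d_tail = bfs(head), bfs(tail)
    # Entity token = (distance-to-head, distance-to-tail), capped at max_dist.
    ent_tok = {
        e: (min(d_head.get(e, max_dist), max_dist),
            min(d_tail.get(e, max_dist), max_dist))
        for e in adj
    }
    # Relation token = anonymous index local to this prompt graph.
    rel_tok = {r: i for i, r in enumerate(sorted({r for _, r, _ in triples}))}
    return ent_tok, rel_tok

# Example: a tiny prompt graph around the example fact (a, works_for, c).
graph = [("a", "works_for", "c"), ("a", "lives_in", "b"), ("b", "located_in", "c")]
print(tokenize_prompt_graph(graph, head="a", tail="c"))
```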
- 2024-10-24: CPU version: KG-ICL can now run on CPU-only devices; no GPU is required.
- 2024-10-14: We have released the code and datasets for KG-ICL.
- 2024-09-26: Our paper has been accepted by NeurIPS 2024!
Quick-start instructions for reproducing the whole pipeline are given below. Please use Python 3.9 to run the code.
pip install torch==2.2.0 --index-url https://download.pytorch.org/whl/cu118
pip install torch-scatter==2.1.2 torch-sparse==0.6.18 torch-geometric==2.4.0 -f https://data.pyg.org/whl/torch-2.2.0+cu118.html
pip install ninja easydict pyyaml tqdm
If the installed NumPy version is incompatible, reinstall it as follows:
pip uninstall numpy
pip install numpy==1.24.0
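After installation, you can quickly verify that the dependencies import and their versions match the ones above (a small sanity check, not part of the repository):

```python
# Quick sanity check for the installed dependencies (not part of the repo).
import numpy, torch, torch_geometric, torch_scatter, torch_sparse

print("numpy:", numpy.__version__)                        # expected 1.24.0
print("torch:", torch.__version__)                         # expected 2.2.0 (+cu118)
print("torch_geometric:", torch_geometric.__version__)    # expected 2.4.0
print("torch_scatter:", torch_scatter.__version__)        # expected 2.1.2
print("torch_sparse:", torch_sparse.__version__)          # expected 0.6.18
print("CUDA available:", torch.cuda.is_available())
```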
We use the `rspmm` kernel. Please make sure your `CUDA_HOME` environment variable is set properly to avoid potential compilation errors, e.g.,

export CUDA_HOME=/usr/local/cuda-11.8/

If you do not have GPUs, or the `rspmm` kernel does not compile successfully, please set the hyperparameter `use_rspmm` to `False`.
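To decide whether the compiled kernel can be used on your machine, a check along the following lines may help. This is only an illustrative sketch; how `use_rspmm` is actually set (e.g., in the run configuration) follows the repository's hyperparameter mechanism.

```python
# Illustrative check (not part of the repo): if no GPU or no CUDA toolkit is
# visible, the rspmm extension is unlikely to compile, so fall back by setting
# the use_rspmm hyperparameter to False in your run configuration.
import os
import torch

cuda_home = os.environ.get("CUDA_HOME")
has_gpu = torch.cuda.is_available()

print("CUDA_HOME:", cuda_home)
print("GPU available:", has_gpu)

use_rspmm = bool(cuda_home) and has_gpu
print("suggested use_rspmm:", use_rspmm)
```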
Release the datasets:
unzip datasets.zip
Process the datasets:
cd datasets
chmod +x process.sh
./process.sh
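If you want to inspect the data yourself, most KG benchmarks store triples as tab-separated `head  relation  tail` lines. The exact file names and layout produced by `process.sh` may differ, so treat the following loader as an assumption-laden sketch:

```python
# Sketch: load a triple file assuming the common tab-separated format
# "head\trelation\ttail" per line. The file path below is a hypothetical
# example; adjust it to whatever process.sh actually produces.
def load_triples(path):
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 3:
                triples.append(tuple(parts))
    return triples

triples = load_triples("datasets/FB15k-237/train.txt")  # hypothetical path
print(len(triples), "triples, e.g.", triples[:2])
```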
If you have any difficulty or questions about running the code or reproducing the experimental results, please email yncui.nju@gmail.com.
For pre-training, run:
cd src
python pretrain.py
The checkpoints will be stored in the `./checkpoint/pretrain/` folder.
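A released or self-trained checkpoint can be inspected with plain PyTorch. The exact file layout under `./checkpoint/pretrain/` is an assumption here; if the path is a directory, point `torch.load` at a file inside it.

```python
# Inspect a saved checkpoint with plain PyTorch. The path below mirrors the
# evaluation command in this README but is otherwise an assumption; adjust it
# to an actual file under ./checkpoint/pretrain/.
import torch

ckpt = torch.load("./checkpoint/pretrain/kg_icl_6l", map_location="cpu")
if isinstance(ckpt, dict):
    # Either a plain state_dict or a dict wrapping one; list keys to find out.
    print(list(ckpt.keys())[:10])
```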
For evaluation, please replace the checkpoint path and test dataset path in the following shell script:
cd shell
chmod +x test.sh
./test.sh
If you want to run inference on a specific dataset, please specify the checkpoint path and the evaluation dataset name:
cd src
python evaluation.py --checkpoint_path ./checkpoint/pretrain/kg_icl_6l --test_dataset_list [dataset_name]
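To evaluate the same checkpoint on several datasets in one go, you could wrap the command above in a small driver script. The dataset names below are placeholders; use the actual folder names under `datasets/`.

```python
# Evaluate one checkpoint on several datasets by invoking evaluation.py
# repeatedly (run this from the src/ directory). Dataset names are
# placeholders; replace them with the actual folder names under datasets/.
import subprocess

CHECKPOINT = "./checkpoint/pretrain/kg_icl_6l"
DATASETS = ["dataset_a", "dataset_b"]  # placeholders

for name in DATASETS:
    subprocess.run(
        ["python", "evaluation.py",
         "--checkpoint_path", CHECKPOINT,
         "--test_dataset_list", name],
        check=True,
    )
```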
For fine-tuning, please replace the checkpoint path and the fine-tuning dataset path in the following shell script:
cd shell
chmod +x finetune.sh
./finetune.sh
We have released three versions of the model: KG-ICL-4L, KG-ICL-5L, and KG-ICL-6L. They are trained with 4, 5, and 6 layers of the KG encoder, respectively. If your device has limited memory, you can choose the KG-ICL-4L model; if you have enough memory, you can choose the KG-ICL-6L model. Here are the results of the three models:
MRR results:

Model | Inductive | Fully-Inductive | Transductive | Average |
---|---|---|---|---|
Supervised SOTA | 0.466 | 0.210 | 0.365 | 0.351 |
ULTRA (pretrain) | 0.513 | 0.352 | 0.329 | 0.396 |
ULTRA (finetune) | 0.528 | 0.350 | 0.384 | 0.421 |
KG-ICL-4L (pretrain) | 0.550 | 0.434 | 0.328 | 0.433 |
KG-ICL-5L (pretrain) | 0.554 | 0.438 | 0.346 | 0.441 |
KG-ICL-6L (pretrain) | 0.550 | 0.442 | 0.350 | 0.443 |
KG-ICL-6L (finetune) | 0.592 | 0.444 | 0.413 | 0.481 |
Hits@10 results:

Model | Inductive | Fully-Inductive | Transductive | Average |
---|---|---|---|---|
Supervised SOTA | 0.607 | 0.347 | 0.511 | 0.493 |
ULTRA (pretrain) | 0.664 | 0.536 | 0.479 | 0.557 |
ULTRA (finetune) | 0.684 | 0.542 | 0.548 | 0.590 |
KG-ICL-4L (pretrain) | 0.696 | 0.622 | 0.471 | 0.590 |
KG-ICL-5L (pretrain) | 0.705 | 0.635 | 0.501 | 0.608 |
KG-ICL-6L (pretrain) | 0.706 | 0.642 | 0.504 | 0.611 |
KG-ICL-6L (finetune) | 0.738 | 0.640 | 0.566 | 0.644 |
If you find this repository helpful, please cite the following paper:
@inproceedings{cui2024prompt,
  title     = {A Prompt-based Knowledge Graph Foundation Model for Universal In-Context Reasoning},
  author    = {Cui, Yuanning and Sun, Zequn and Hu, Wei},
  booktitle = {NeurIPS},
  year      = {2024}
}