Existing code summarization approaches primarily leverage Abstract Syntax Trees (ASTs) and sequential information from source code to generate code summaries, while often overlooking the dependencies among code elements and the hierarchical structure of code. However, effective summarization requires a holistic analysis of code snippets from three distinct aspects: lexical, syntactic, and semantic information. In this paper, we propose a novel code summarization approach utilizing Heterogeneous Code Representations (HCRs) and our specially designed HierarchyNet. HCRs capture essential code features at the lexical, syntactic, and semantic levels within a hierarchical structure. HierarchyNet processes each layer of the HCR separately, employing a Heterogeneous Graph Transformer, a Tree-based CNN, and a Transformer Encoder. In addition, our approach demonstrates superior performance compared to fine-tuned pre-trained models, including CodeT5 and CodeBERT, as well as large language models in zero-/few-shot settings, such as StarCoder and CodeGen.
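The repository contains the full implementation. As a rough orientation only, the following is a minimal sketch of the layer-per-encoder idea in plain PyTorch; the stand-in modules (a 1-D convolution for the Tree-based CNN, a linear projection for the Heterogeneous Graph Transformer), the dimensions, and the concatenation fusion are illustrative assumptions, not the actual HierarchyNet code.

```python
# Illustrative sketch only -- not the actual HierarchyNet implementation.
import torch
import torch.nn as nn

class HierarchySketch(nn.Module):
    """One encoder per HCR layer: lexical (tokens), syntactic (AST), semantic (graph)."""
    def __init__(self, d_model=256, vocab_size=50000):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Lexical layer: Transformer encoder over the code token sequence.
        self.lexical_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        # Syntactic layer: 1-D convolution standing in for the Tree-based CNN over AST node features.
        self.tree_cnn = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        # Semantic layer: linear projection standing in for the Heterogeneous Graph Transformer.
        self.graph_enc = nn.Linear(d_model, d_model)

    def forward(self, token_ids, ast_node_feats, graph_node_feats):
        lex = self.lexical_enc(self.tok_emb(token_ids))                      # (B, T, d)
        syn = self.tree_cnn(ast_node_feats.transpose(1, 2)).transpose(1, 2)  # (B, N_ast, d)
        sem = torch.relu(self.graph_enc(graph_node_feats))                   # (B, N_graph, d)
        # Naive fusion: concatenate the layer-wise representations for a downstream summary decoder.
        return torch.cat([lex, syn, sem], dim=1)

# Quick shape check with random inputs.
model = HierarchySketch()
out = model(torch.randint(0, 50000, (2, 16)), torch.randn(2, 12, 256), torch.randn(2, 10, 256))
print(out.shape)  # torch.Size([2, 38, 256])
```

In the repository, the semantic layer operates on heterogeneous dependency graphs, presumably via DGL, which is listed among the dependencies below.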
All source code is written in Python. Besides PyTorch, we also use several other libraries such as DGL, scikit-learn, pandas, and jsonlines.
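If you are setting up a fresh environment, the dependencies can typically be installed with pip (exact versions are not pinned here; check the repository for a requirements file):

```bash
pip install torch dgl scikit-learn pandas jsonlines
```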
- Datasets: All the datasets used in the paper are publicly accessible.
- Data preprocessing: The folder `preprocessing` is used to prepare data in the proper format before training. Go to this folder for more information (a small inspection sketch is shown below).
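  After preprocessing, the splits can be inspected with the `jsonlines` library. This is only a minimal sketch: the file path and record fields shown here are assumptions; the actual format is documented in the `preprocessing` folder.

  ```python
  # Hypothetical path and fields -- check the preprocessing folder's README for the real format.
  import jsonlines

  with jsonlines.open("data/train.jsonl") as reader:
      for i, record in enumerate(reader):
          print(sorted(record.keys()))  # e.g. source code, AST, and reference summary fields
          if i == 2:
              break
  ```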
- Modify the configuration file in the folder `c2nl/configs` such that all the paths are valid (a quick sanity check is sketched below).
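  A simple way to verify the paths is to load the config and test each path-like value. This sketch assumes a YAML config (PyYAML required), which may not match the actual format used in `c2nl/configs`; adapt as needed.

  ```python
  # Hypothetical check -- the config file name and format are assumptions.
  import os
  import yaml  # pip install pyyaml

  with open("c2nl/configs/config.yaml") as f:
      cfg = yaml.safe_load(f)

  for key, value in cfg.items():
      if isinstance(value, str) and ("/" in value or value.endswith((".json", ".jsonl", ".txt"))):
          print(f"{key}: {value} [{'OK' if os.path.exists(value) else 'MISSING'}]")
  ```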
- Train model:

  ```bash
  cd c2nl
  bash main/train.sh
  ```
- Baselines: The examined baselines are grouped into three categories:
  - Training from scratch: PA-former, CAST, NCS
  - Fine-tuning pretrained models: CodeT5, CodeBERT
  - In-context learning: StarCoder and CodeGen-Multi 2B (see the prompt sketch below)
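For the in-context-learning baselines, prompting roughly follows the pattern sketched below. The model id, prompt wording, and decoding settings here are illustrative assumptions and not the exact configuration used in the paper.

```python
# Hypothetical zero-shot summarization prompt using Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/codegen-2B-multi"  # "bigcode/starcoder" also works but requires access approval
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

code = "def add(a, b):\n    return a + b"
prompt = f"# Summarize the following Python function in one sentence.\n{code}\n# Summary:"
input_ids = tok(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=30)
print(tok.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```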