This repository is the official implementation of the group polarization detection model proposed in our paper "Heterogeneous Graph-based Polarization Detection (HG-PD): a Model Balancing Crude Processing with Rich Semantics".
Previous studies examining the detection and analysis of online group polarization have primarily concentrated on a single data type, such as self-reported measures, texts, or graph structure. Typically, these approaches depict group polarization as two factions with opposing viewpoints. However, such methodologies are limited when applied to entertainment topics. To fill this research gap, this paper proposes HG-PD, a novel self-supervised model for polarization detection on online social media.
Leveraging a heterogeneous graph, the model integrates multiple data types. Graph Neural Networks (GNNs) then learn representations of the user nodes, guided by the MCR² (Maximal Coding Rate Reduction) loss function, for the downstream clustering task. On a real-world dataset, our model discerns nuanced differences among users with similar stances, moving beyond the traditional dichotomy. Furthermore, ablation experiments show that incorporating multifaceted information enriches the semantic depth of the graph, yielding meaningful interpretations that facilitate group polarization detection.
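For reference, the standard MCR² objective (Yu et al., 2020) can be written as below. This is the general formulation, not a transcription of `mcrLoss`. The repository's implementation may differ in details such as the precision parameter $\epsilon$ or extra weighting terms. Here $Z \in \mathbb{R}^{d \times m}$ stacks the $m$ user-node embeddings of dimension $d$ as columns. $\Pi_1, \dots, \Pi_k$ are diagonal $m \times m$ membership matrices whose $(i,i)$ entry is 1 when user $i$ belongs to cluster $j$:

```math
\begin{aligned}
R(Z,\epsilon) &= \frac{1}{2}\log\det\!\left(I + \frac{d}{m\epsilon^{2}}\, Z Z^{\top}\right),\\
R_c(Z,\epsilon \mid \Pi) &= \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2m}\,\log\det\!\left(I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top}\right),\\
\Delta R(Z,\Pi,\epsilon) &= R(Z,\epsilon) - R_c(Z,\epsilon \mid \Pi).
\end{aligned}
```

Training minimizes $-\Delta R$. This expands the embedding space as a whole while compressing each cluster, which makes the learned representations well suited to the downstream clustering.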
- Python version: 3.10.13
- All package versions are listed in `package_version.txt`
- You can refer to `Data_description.txt` for more information about the `.csv`, `.xlsx`, `.pt`, and `.npy` files in our code.
- Code files include 2 types:
  - Python scripts (`.py`) for collecting Sina Weibo data (`Sina_crawl`) and some model-training-related functions
  - Jupyter files (`.ipynb`) for (a) data processing; (b) all experiments in the paper; and (c) visualization for HG-PD (i.e., exp3)
  - All Jupyter files are provided in 2 language versions, Chinese and English, for better understanding :D
- Python scripts (`.py`)
  - `Sina_crawl`: Used for crawling the data we need from Sina Weibo (you can also use it to crawl other Sina Weibo posts)
  - `userInter`: HomoG-based model framework for exp2
  - `mcr_HGPD`: HG-PD model framework for exp3
  - `mcrLoss`: MCR2 loss function
  - `augment`: Data augmentation
  - `other_func`: Used for constructing the membership matrix $\Pi$
  - `savePara`: Used for saving the loss (`.csv`) and model states (`.pt`)
- Jupyter files (`.ipynb`)
  - `Data_processing`: Includes all data processing steps for the 3 experiments
  - `K-Prototype`: Implementation of exp1 in our paper; results are saved in `Train_record/KPrototype`
  - `Ablation`: Implementation of exp2 with related visualizations in our paper; training results are saved in `Train_record/Ablation` and visualizations in `Visualization`
  - `Model`: Implementation of exp3 in our paper; results are saved in `Train_record/Model`
  - `Analysis_visualize`: Visualizations of exp3 in our paper; figures are saved in `Visualization`
  - `Abnormal_compar`: Model comparison experiments in our paper, including 5 graph anomaly detection (GAD) models for group polarization (GP) detection and the corresponding analysis & visualization (uploaded on 14/09/2024)
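The following minimal NumPy sketch illustrates how a membership matrix $\Pi$ (the role of `other_func`) feeds an MCR² loss (the role of `mcrLoss`). It is not the repository's code. The function names `membership_matrix`, `coding_rate`, `compressed_rate`, and `mcr2_loss`, the `(d, m)` layout of `Z`, and the default `eps=0.5` are all hypothetical choices for illustration. The actual scripts presumably operate on PyTorch tensors so that the loss can be back-propagated through the GNN.

```python
import numpy as np


def membership_matrix(labels, num_clusters):
    """Hypothetical helper: (k, m, m) stack of diagonal 0/1 matrices,
    with Pi[j, i, i] = 1 if sample i is assigned to cluster j."""
    labels = np.asarray(labels)
    m = labels.shape[0]
    pi = np.zeros((num_clusters, m, m))
    for j in range(num_clusters):
        idx = np.flatnonzero(labels == j)
        pi[j, idx, idx] = 1.0  # set the diagonal entries of cluster j's members
    return pi


def coding_rate(Z, eps):
    """R(Z, eps) for Z of shape (d, m), one embedding per column."""
    d, m = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * Z @ Z.T)
    return 0.5 * logdet


def compressed_rate(Z, pi, eps):
    """R_c(Z, eps | Pi): size-weighted sum of per-cluster coding rates."""
    d, m = Z.shape
    total = 0.0
    for pi_j in pi:
        tr = np.trace(pi_j)  # number of samples in cluster j
        if tr == 0:
            continue  # an empty cluster contributes nothing
        _, logdet = np.linalg.slogdet(np.eye(d) + (d / (tr * eps**2)) * Z @ pi_j @ Z.T)
        total += tr / (2 * m) * logdet
    return total


def mcr2_loss(Z, pi, eps=0.5):
    """Negative rate reduction -(R - R_c); minimizing it maximizes Delta R."""
    return -(coding_rate(Z, eps) - compressed_rate(Z, pi, eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((8, 20))               # d = 8 features, m = 20 users
    Z /= np.linalg.norm(Z, axis=0, keepdims=True)  # put each embedding on the unit sphere
    labels = rng.integers(0, 3, size=20)           # illustrative assignments for k = 3 clusters
    print(mcr2_loss(Z, membership_matrix(labels, 3)))
```

`slogdet` is used instead of `np.log(np.linalg.det(...))` to avoid overflow when `d` grows. In a PyTorch version, `torch.logdet` would play the same role while keeping the loss differentiable. With a single cluster, $R_c = R$ and the loss is zero. Any finer partition gives a loss of at most zero, by the concavity of $\log\det$.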
- About `Train_record`
  - We only put the best model state in the `Train_record` folder rather than all `.pt` files (otherwise there would be far too many files).
  - If you are interested in all the files, I will put a Google Drive link here.
- About `Abnormal_result` (uploaded on 14/09/2024)
  - Includes (a) detection results from all 5 GAD models; (b) the `.csv` files used in the analysis of all 5 GAD models
Feel free to open an issue on GitHub.
If you have any questions about the code, feel free to contact ag.wrld.s@gmail.com or zili7472.uni.sydney.edu.au.