This repo contains the implementation of SHMM, the model from A Spherical Hidden Markov Model for Semantics-Rich Human Mobility Modeling [AAAI2018].
SHMM, a multi-modal Spherical Hidden Markov Model, is designed for semantics-rich human mobility modeling. Under the hidden Markov assumption, SHMM models the generation process of a given trace by jointly considering the observed location, time, and text at each step of the trace.
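The generative view above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that location, time, and text are conditionally independent given the hidden state, so their per-state log-likelihoods add. It then scores a trace with the standard HMM forward algorithm. The function names and the input layout (one list of per-state log-likelihoods per step) are hypothetical and are not taken from this repo.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def joint_emission_logp(loc_logp, time_logp, text_logp):
    """Combine per-modality log-likelihoods into one joint emission term.

    Each argument is a T x K nested list: entry [t][k] is the log-likelihood
    of the observation at step t under hidden state k.
    ASSUMPTION: modalities are conditionally independent given the state,
    so the joint log-likelihood is the sum of the three terms.
    """
    return [
        [l + t + w for l, t, w in zip(loc_row, time_row, text_row)]
        for loc_row, time_row, text_row in zip(loc_logp, time_logp, text_logp)
    ]


def forward_loglik(log_pi, log_A, emission_logp):
    """Log-likelihood of one trace under an HMM (forward algorithm).

    log_pi[k]    : log initial probability of state k
    log_A[i][j]  : log transition probability from state i to state j
    emission_logp: T x K joint emission log-likelihoods
    """
    num_states = len(log_pi)
    # Initialization: alpha_0(k) = pi_k * p(obs_0 | k), in log space.
    alpha = [log_pi[k] + emission_logp[0][k] for k in range(num_states)]
    # Recursion: alpha_t(j) = p(obs_t | j) * sum_i alpha_{t-1}(i) * A[i][j].
    for t in range(1, len(emission_logp)):
        alpha = [
            emission_logp[t][j]
            + logsumexp([alpha[i] + log_A[i][j] for i in range(num_states)])
            for j in range(num_states)
        ]
    # Termination: sum over the final states.
    return logsumexp(alpha)
```

Working in log space avoids numerical underflow on long traces, which is why the sketch sums log-likelihoods rather than multiplying probabilities.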
An example model output is shown below:
All code is in the folder ./code.
- To run SHMM on the Twitter dataset, run ./code/run_Twitter.sh.
- Hyperparameters for the Twitter and Synthetic data are in ./code/run/*.yaml.
- Preprocess:
  - Twitter data preprocessing code is in ./code/Twitter/preprocess. It is also included in ./code/run_twitter.sh.
  - Instructions for generating the background data are in ./code/Twitter/Generate_background_tweet.txt.
- Model Training:
  - The main model for SHMM is in ./code/gmove.
- Post-process:
  - The code for plotting figures is in ./code/pre-post-process.
  - The code for analyzing keywords of different states is in ./code/Twitter/find_state_keywords.py.
- To run SHMM on the Synthetic dataset, run ./code/run_Synthetic.sh.
- Synthetic Data Generation:
  - The code is in ./code/Synthetic-Data. The main function is generate_VMF_data.m.
- Synthetic Data Analysis:
  - The code is in ./code/gmove.
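For driving either run script from Python, a small wrapper along the following lines may be convenient. This is only a sketch: the helper names are hypothetical, and it assumes the scripts are meant to be launched with bash from the repo root, which is not confirmed here.

```python
import subprocess
from pathlib import Path

# Script locations as listed above, relative to the repo root.
RUN_SCRIPTS = {
    "twitter": Path("code") / "run_Twitter.sh",
    "synthetic": Path("code") / "run_Synthetic.sh",
}


def build_command(dataset):
    """Return the argv list for the run script of the given dataset."""
    key = dataset.lower()
    if key not in RUN_SCRIPTS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return ["bash", RUN_SCRIPTS[key].as_posix()]


def run_shmm(dataset, repo_root="."):
    """Launch the run script with the repo root as working directory.

    ASSUMPTION: the scripts resolve their paths relative to the repo root.
    check=True raises CalledProcessError if the script exits with an error.
    """
    subprocess.run(build_command(dataset), cwd=repo_root, check=True)
```

For example, `run_shmm("Twitter")` called from the repo root would execute `bash code/run_Twitter.sh`.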
Please download the dataset from this link, unzip it, rename the folder to data, and place it under ./.
All raw and processed data are now in the folder ./data.
- Raw tweets are in ./data/tf-la/raw/raw_tweet.txt.
- Background tweets (for better word embedding) are in ./data/tf-la/raw/background_tweet.txt.
- Processed data for SHMM training is in ./data/tf-la/input/final.txt.
- Processed data for Gmove training is in ./data/tf-la/input/sequences.txt and ./data/tf-la/input/words.txt.
- Results will be saved in ./Results/Twitter-LA/Results_LA.txt.

- It has a data arrangement similar to that of LA.

- Synthetic dataset generation requires a template, which is in ./data/Synthetic/a_template.txt.
- After synthetic data generation, the synthetic data for SHMM training is in ./data/Synthetic/synthetic_data.txt.
- The data parameters used to generate the synthetic data are saved in ./data/Synthetic/data_para.txt for evaluation purposes.
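Before running anything, it can help to confirm that the downloaded data landed in the expected places. The sketch below checks only the input files named above (generated files such as synthetic_data.txt and data_para.txt are left out because they appear after synthetic data generation). The helper name `missing_data_files` is hypothetical.

```python
from pathlib import Path

# Input files listed above, relative to the repo root.
EXPECTED_FILES = [
    "data/tf-la/raw/raw_tweet.txt",
    "data/tf-la/raw/background_tweet.txt",
    "data/tf-la/input/final.txt",
    "data/tf-la/input/sequences.txt",
    "data/tf-la/input/words.txt",
    "data/Synthetic/a_template.txt",
]


def missing_data_files(repo_root="."):
    """Return the expected input files that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_data_files(".")
    if missing:
        for rel in missing:
            print("missing:", rel)
    else:
        print("all expected input files found")
```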
Summarized results for the Twitter and Synthetic datasets are in the folder ./Results.
@inproceedings{zhu2018spherical,
title={A Spherical Hidden Markov Model for Semantics-Rich Human Mobility Modeling.},
author={Zhu, Wanzheng and Zhang, Chao and Yao, Shuochao and Gao, Xiaobin and Han, Jiawei},
booktitle={AAAI},
year={2018}
}