Welcome! This is our fun project of using AI to write couplets (对对联). Please check out our website!
-
What is couplet (对联)
- Couplet is a two-liner poem with strict semantic and phoenetic rules
- See the Wikipedia (English) page for more details
-
What is this project for
- The model will predict the second line (下联) given the first line (上联)
- We are a Chinese couple living in California and we love traditional Chinese poems!
- This project improves from similar ones (see Reference for their great work) on:
- This is the first of its kind written in TensorFlow 2, to the best of our knowledge
- This model improves from the reference models mostly on (i) emotional (意境) matching; (ii) repeated characters treatment (see the Model section below)
Input | Output |
---|---|
欲把心事付瑶琴 | 且将笔墨书诗画 |
半衾幽梦香初散 | 一曲清音韵未央 |
应是飞鸿踏泥雪 | 何如落雁归故乡 |
云破月来花弄影 | 雨停风送柳含烟 |
别后相思空一水 | 愁中寂寞又几回 |
书中自有黄金屋 | 笔下常留白玉簪 |
花谢花飞花满天 | 鸟啼鸟语鸟鸣春 |
杨柳岸晓风残月 | 芦苇春寒雨乱烟 |
This is a sequence-to-sequence model with Encoder + Decoder + Attention, schematically shown as below.
Notable features
- Embedding layer pretrained via Word2Vec
- Beam search decoder
- Format polishing to prevent repeated characters, etc.
- Download the training data (~70k couplets) from this repo
- Set up the model hyperparameters and the file paths in the baseconfig.ini
- Install the package dependencies
- TensorFlow 2.2.0
- Python 3.6.9
- Numpy 1.18.5
- Gensim 3.6.0
- Scikit-Learn 0.22.0
- Train the model
- Run as
python train.py
- The model parameters will be written to the path set up in the config file
- Our model on the website was trained on Google Colab Pro for 12 epochs (~1 day)
- Run as