# Marseille: mining argument structures with expressive inference (with linear and LSTM engines)
Marseille learns to predict argumentative proposition types and the support relations between them, cast as inference in an expressive factor graph.
Read more about it in our paper,
> Vlad Niculae, Joonsuk Park, Claire Cardie. Argument Mining with Structured SVMs and RNNs. In: Proceedings of ACL, 2017.
If you find this project useful, you may cite us using:
```
@inproceedings{niculae17marseille,
  author={Vlad Niculae and Joonsuk Park and Claire Cardie},
  title={{Argument Mining with Structured SVMs and RNNs}},
  booktitle={Proceedings of ACL},
  year=2017
}
```
Requirements:

- numpy
- scipy
- scikit-learn
- pystruct
- nltk
- dill
- docopt
- dynet v1.1
- lightning
- ad3 >= v2.1 (`pip install ad3`)
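As a quick sanity check, you can confirm the dependencies are importable before running anything. This is a minimal sketch; the import names below are assumptions about how these packages expose themselves (e.g. scikit-learn imports as `sklearn`):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` whose top-level module cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# assumed import names for the requirements listed above
deps = ["numpy", "scipy", "sklearn", "pystruct", "nltk",
        "dill", "docopt", "dynet", "lightning", "ad3"]
print("missing:", missing_packages(deps))
```

An empty `missing` list means the environment is ready; anything listed still needs to be installed.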
Usage (replace `$ds` below with `cdcp` or `ukp`):
- download the data from http://joonsuk.org/ and unzip it in the subdirectory
  `data`, i.e., the path `./data/process/erule/train/` is valid.
- extract the relevant subset of GloVe embeddings:
  ```
  python -m marseille.preprocess embeddings $ds --glove-file=/p/glove.840B.300d.txt
  ```
- extract features:

  ```
  python -m marseille.features $ds
  # (for cdcp only:)
  python -m marseille.features cdcp-test
  ```
- generate vectorized train-test split (for baselines only):

  ```
  mkdir data/process/.../
  python -m marseille.vectorize split cdcp
  ```
- run the chosen model, for example:

  ```
  python -m experiments.exp_train_test $ds --method rnn-struct --model strict
  ```

  (for dynet models, set `--dynet-seed=42` for exact reproducibility)
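A fixed seed gives exact reproducibility because all of the model's random draws (initialization, dropout, shuffling) then come from the same deterministic stream. The idea, illustrated here with Python's standard `random` module rather than dynet:

```python
import random

# two runs seeded identically produce identical random streams
random.seed(42)
first_run = [random.random() for _ in range(3)]

random.seed(42)
second_run = [random.random() for _ in range(3)]

assert first_run == second_run  # same seed, same draws
```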
- compare results:

  ```
  python -m experiments.plot_test_results $ds
  ```
To reproduce the cross-validation model selection, you would also need to run:

```
python -m marseille.vectorize folds $ds
```
If you have some documents, e.g. F.txt and G.txt, that you would like to run a pretrained model on, read on.
- download the required preprocessing toolkits: Stanford CoreNLP (tested with version 3.6.0) and the WING-NUS PDTB discourse parser (tested with this commit), and configure their paths:

  ```
  export MARSEILLE_CORENLP_PATH=/home/vlad/corenlp  # path to CoreNLP
  export MARSEILLE_WINGNUS_PATH=/home/vlad/wingnus  # path to WING-NUS parser
  ```
  Note: if you already generated F.txt.json with CoreNLP and F.txt.pipe with the WING-NUS parser (e.g., on a different computer), you may skip this step; marseille will detect those files automatically. Otherwise, these files are generated the first time a `UserDoc` object is instantiated for a given document. In particular, the step below will do this automatically.
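The detection described above amounts to checking for the two preprocessor outputs sitting next to the raw text file. A sketch of that convention (the function name is hypothetical, not marseille's actual API):

```python
import os

def has_preprocessed_outputs(txt_path):
    """True if the CoreNLP (.json) and WING-NUS (.pipe) outputs already
    sit next to the raw text, e.g. F.txt -> F.txt.json and F.txt.pipe."""
    return all(os.path.exists(txt_path + ext) for ext in (".json", ".pipe"))
```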
- extract the features:

  ```
  python -m marseille.features user F G  # raw input must be in F.txt & G.txt
  ```
This is needed for the RNN models too, because the feature files encode some metadata about the document structure.
- predict, e.g., using the model saved by the train-test step above:

  ```
  python -m experiments.predict_pretrained --method=rnn-struct \
      test_results/exact=True_cdcp_rnn-struct_strict F G
  ```