Political Event Coding as Text-to-Text Sequence Generation



We report on the current status of an effort to produce political event data from unstructured text using a Transformer language model. Motivated by the lack of publicly available, up-to-date event coding software, we seek to train a model that produces structured political event records at the sentence level. Our approach differs from previous efforts in that we frame the task as text-to-text sequence generation. We motivate this choice by outlining the properties of text generation models that suit the needs of event coding. To overcome the shortage of training data, we also describe a method for generating synthetic pairs of text and event records, which we use to fit our model.
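
To make the text-to-text framing concrete, the sketch below shows one way an event record could be written out as a target string for a sequence generation model and then parsed back from the model's output. It is only an illustration: the `EventRecord` fields, the `|` and `;` delimiters, and the function names `linearize` and `parse` are assumptions made here, not the record schema or output format used in the paper.

```python
from dataclasses import dataclass

# Hypothetical record layout: these three fields and the delimiters below
# are illustrative assumptions, not the schema used in the paper.
@dataclass(frozen=True)
class EventRecord:
    source: str
    event_type: str
    target: str

FIELD_SEP = " | "   # separates fields within one event (assumed)
EVENT_SEP = " ; "   # separates events within one sentence (assumed)

def linearize(events):
    """Serialize a list of event records into a single target string."""
    return EVENT_SEP.join(
        FIELD_SEP.join((e.source, e.event_type, e.target)) for e in events
    )

def parse(text):
    """Parse a generated string back into records, skipping malformed chunks."""
    records = []
    for chunk in text.split(EVENT_SEP.strip()):
        fields = [f.strip() for f in chunk.split(FIELD_SEP.strip())]
        # Keep only chunks that have exactly three non-empty fields.
        if len(fields) == 3 and all(fields):
            records.append(EventRecord(*fields))
    return records

# One (sentence, target string) training pair under this assumed format.
sentence = "Protesters clashed with police in the capital."
events = [EventRecord("protesters", "clash", "police")]
target = linearize(events)

# A well-formed target string round-trips to the same records.
assert parse(target) == events
```

Because the output is an ordinary string, a sentence with no events, one event, or several events maps to the same kind of target, and malformed generations can be dropped at parse time rather than breaking the pipeline.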

Cite this Paper (BibTeX)
@article{radford:20221208,
    author={Yaoyao Dai and Benjamin J. Radford and Andrew Halterman},
    title={Political Event Coding as Text-to-Text Sequence Generation},
    journal={Proceedings of the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE)},
    year={2022},
    volume={},
    number={},
    pages={117--123},
    DOI={}}