The CODA Corpus (Version 1.0)

The CODA corpus is provided under a Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales Licence. When referring to the corpus, please cite: 

Stoyanchev, Svetlana and Piwek, Paul (2010). Constructing the CODA corpus: A parallel corpus of monologues and expository dialogues. In: The Seventh International Conference on Language Resources and Evaluation (LREC), 18-21 May 2010, Malta.


The creation of the CODA corpus was supported by the UK's Engineering and Physical Sciences Research Council under grant EP/G/020981/1. 

The corpus is provided as is, and no guarantee or warranty is given that the corpus is fit for any particular purpose. The user uses the corpus at their sole risk and liability.

The CODA corpus (version 1.0) is based on the following resources: 


Contents of this folder:

CODA_AnnotationManual_v1.1.pdf (description of the CODA corpus annotation process and file formats) 

SINGLE_FILE_ALIGNED/


BY_SECTION_ALL_VIEWS/

The data set is split into sections, which were annotated separately. The release contains a subdirectory NAME/ for each annotated section. Each subdirectory contains:

(All formats are described in CODA_AnnotationManual_v1.1.pdf.)
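
The per-section layout described above can be walked with a few lines of Python. The following is a minimal sketch and not part of the release: the function name list_section_dirs and the corpus_root argument are illustrative assumptions, and the code relies only on BY_SECTION_ALL_VIEWS/ containing one subdirectory per annotated section. It does not read or parse any corpus file.

```python
from pathlib import Path


def list_section_dirs(corpus_root):
    """Return the sorted names of the per-section subdirectories.

    corpus_root is assumed to be the folder that holds this README,
    i.e. the folder containing BY_SECTION_ALL_VIEWS/.
    """
    sections_dir = Path(corpus_root) / "BY_SECTION_ALL_VIEWS"
    if not sections_dir.is_dir():
        # Folder missing: report no sections rather than raising.
        return []
    # Keep directories only; stray files at this level are ignored.
    return sorted(p.name for p in sections_dir.iterdir() if p.is_dir())
```

Calling list_section_dirs(".") from the release folder should return one name per annotated section; the file formats inside each subdirectory are documented in the annotation manual.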


Detailed contents

Mark Twain, "What Is Man?": Total number of turns: 520


Berkeley, "Three Dialogues between Hylas and Philonous, in Opposition to Sceptics and Atheists": Total number of turns: 172