Research Center for Applied Linguistics (RCAL)
About the RCAL
The Research Center for Applied Linguistics (RCAL) is a project initiated by and associated with the Bonn Applied English Linguistics (BAEL) department. Empirical pragmatics involves the study of language in use through analysis of actual language data. This data can be collected using questionnaires, interviews or experiments, but may also be sampled from large collections of texts such as corpora.

Student workspaces at the RCAL
The RCAL provides students with the opportunity to conduct their own empirical research. At the centre, students have access to:
- a wide range of corpora
- technical equipment
- software for data collection and analysis
- research lab tools for interviews or role-plays
- a quiet room to conduct their own experiments
Available Corpora
Developer: Pam Peters, Peter Collins and David Blair at Macquarie University, Sydney
Sampling period: 1986
Size: 1 million words; 500 text samples of approx. 2,000 words
Contents: written and spoken language; modelled on LOB and BROWN
Variety sampled: Australian English
Annotation: untagged
Availability: available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of ACE
Developer: Nelson Francis and Henry Kucera at Brown University, Providence, Rhode Island
Sampling period: early 1960s
Size: 1 million words
Contents: written language; 500 text samples of approx. 2,000 words; 15 text categories
Variety sampled: American English
Annotation: untagged version and POS-tagged version
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the BROWN Corpus
Developer: Ohio State University: Eric Fosler-Lussier, Elizabeth Hume, Keith Johnson, Mark Pitt
Sampling period: 2000
Size: 300,000
Contents: interviews of 40 people, each approx. one hour
Variety sampled: American English, "long-term residents of Ohio"
Annotation: phonetic/phonemic transcription, word labels
Availability: Online Access through BAEL Licence
Homepage: Website of the Buckeye Corpus
Developer: M. Rissanen, O. Ihalainen and M. Kytö at the Department of English, University of Helsinki
Sampling period: 1418-1680
Size: 450,000
Contents: personal letters
Variety sampled: British English
Annotation: no annotation
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the CEECS Corpus
Developer: University of Bergen, Norway
Sampling period: 1993
Size: 500,000
Contents: transcripts of spoken language of London teenagers (COLT is part of the BNC)
Variety sampled: British English
Annotation: POS tagging
Availability: available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the COLT Corpus
Developer: Christian Mair at the University of Freiburg
Sampling period: 1990s
Size: 1 million words
Contents: written language; 500 text samples of approx. 2,000 words; 15 text categories (matches the original LOB corpus)
Variety sampled: British English
Annotation: untagged
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the FLOB corpus
Developer: Christian Mair at the University of Freiburg
Sampling period: 1990s
Size: 1 million words
Contents: 500 text samples of approx. 2,000 words; 15 text categories (matches the Brown Corpus)
Variety sampled: American English
Annotation: untagged
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the FROWN corpus
Developer: M. Rissanen, O. Ihalainen and M. Kytö at the Department of English, University of Helsinki
Sampling period: ca. 750 to 1700
Size: 1.5 million words
Contents: samples of Old, Middle and Early Modern English texts
Variety sampled: British English
Annotation: untagged
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the Helsinki Corpus
Developer: M. Rissanen, O. Ihalainen and M. Kytö at the Department of English, University of Helsinki
Sampling period: 1450-1700
Size: 830,000 words
Contents: Old, Middle and Early Modern English texts covering 15 prose genres
Variety sampled: Northern British English
Annotation: untagged
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Bibliography of the Helsinki Corpus of Older Scots (no specific manual available online)
Developer: University of Innsbruck
Sampling period: Middle English Prose: 1100 - 1500; Middle/Early Modern English Letters: 1386 - 1688; Middle/Modern English Texts: in progress
Size: Middle English Prose: 182,000; Middle/Early Modern English Letters: 110,000; Middle/Modern English texts: in progress
Variety sampled: Middle English, Early Modern English, Modern English
Annotation: Middle English Prose: untagged; Middle/Early Modern English Letters: untagged; Middle/Modern English texts: mix of tagged/normalized/translated/manipulated texts
Availability: Available for students at the RCEP
Homepage: Manual information for ICAMET
+ SPICE-Ireland - Systems of Pragmatic annotations for the spoken component of ICE-Ireland
Developer: Jeffrey L. Kallen and John M. Kirk
Sampling period: 1990s
Size: 500 texts, each 2,000 words (1 million words)
Contents: 500 texts, spoken and written language (spoken part 60%):
- Spoken (300)
  - Dialogue (180)
    - Private (100)
    - Public (80)
  - Monologue (120)
    - Unscripted (70)
    - Scripted (50)
- Written (200)
  - Non-printed (50)
    - Non-professional writing (20)
    - Correspondence (30)
  - Printed (150)
    - Informational (learned) (40)
    - Informational (popular) (40)
    - Informational (reportage) (20)
    - Instructional (20)
    - Persuasive (10)
    - Creative (20)
(Figures adapted from Kennedy (1998: 55))
SPICE-Ireland provides pragmatic and discourse annotation and a prosodic transcription for 100 of the 300 texts of the spoken component of the ICE-Ireland Corpus.
Variety sampled: Aim is to sample all varieties of English
Annotation: Textual markup, word class tagging, syntactic parsing (+ additional tags in some components)
Hong Kong, East Africa, India, Philippines, Singapore, Jamaica, USA written, Canada, Ireland, SPICE-Ireland, Great Britain, Nigeria, Sri Lanka, Ghana, New Zealand
RCEP: All subcorpora are available at the RCEP
IAAK corpus computer: Great Britain, East Africa are available for students on the corpus computer in the IAAK library
Homepage: Homepage of the ICE corpus
Developer: CECL UCL; Project director: Prof. Sylviane Granger
Sampling period: 1990 - 2000
Size: 3.7 million
Contents: Subcorpora (learners of English):
Variety sampled: Learners of English
Annotation: word form/lemma/POS tagged
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Homepage of the ICLE
Developer: S. K. Verma at University of Lancaster and Shivaji University, Kolhapur
Sampling period: 1978
Size: 1 million words, 500 text samples of approx. 2,000 words
Contents: written language; modelled on BROWN and LOB
Variety sampled: Indian English
Annotation: untagged
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the Kolhapur Corpus
Developer: Josef Schmied, Claudia Claridge and Rainer Siemund at TU Chemnitz
Sampling period: 1640-1740
Size: 1.1 million words
Contents: non-literary prose texts of Early Modern English (various genres)
Variety sampled: British English
Annotation: textual markup
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Homepage of the Lampeter Corpus
Developer: Gaëtanelle Gilquin, Sylvie DeCock & Sylviane Granger [eds]
Sampling period: 1995 - 2010
Size: 1 million words, c. 50 interviews per subcorpus, each interview ~ 2000 words
Contents: spoken language, interviews with learners of English
National subcorpus: Bulgarian, Chinese, Dutch, French, German, Greek, Italian, Japanese, Polish, Spanish, Swedish
Variety sampled: Interlanguage
Annotation: untagged
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Homepage of the LINDSEI Corpus
Developer: Randolph Quirk and Sidney Greenbaum at University College London; Jan Svartvik at Lund University
Sampling period: 1960s, 1975-81, 1985-88
Size: 500,000 words
Contents: spoken language, based on the Survey of English Usage (SEU, 1959, University College London) and on the Survey of Spoken English (SSE, 1975, Lund University)
Variety sampled: British English
Annotation: prosodic and discourse annotation
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the LLC
Developer: Geoffrey Leech, University of Lancaster, and Stig Johansson, University of Oslo, in collaboration with Knut Hofland, Norwegian Computing Centre for the Humanities, Bergen
Sampling period: 1961
Size: 1 million words
Contents: written language; 500 text samples of approx. 2,000 words; 15 text categories; British counterpart of Brown corpus
Variety sampled: British English
Annotation: untagged version and POS-tagged version
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the LOB Corpus
Developer: Philip Hines, Jr., Norfolk, Virginia
Sampling period: 1692
Size: 750,000 words
Contents: a series of more than 2,000 newsletters in the Newdigate series (most of which are addressed to Sir Richard Newdigate, Warwickshire)
Variety sampled: British English
Annotation: untagged
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the Newdigate Corpus
Developer: The Computational Linguistics Unit at the University of Wales College of Cardiff
Sampling period: 1978-1984
Size: 65,000 words
Contents: transcripts of spoken child language
Variety sampled: British English
Annotation: POS tagging, syntactic parsing
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the PoW Corpus
Developer: John W. Du Bois, Wallace L. Chafe, Sandra A. Thompson, Charles Meyer, Robert Englebretson
Sampling period: 1990s
Size: 249,000 words
Contents: transcripts and audio files of naturally occurring interaction from all over the US (mostly face-to-face conversations)
Variety sampled: American English
Annotation: transcripts are time-stamped, overlap indicated; marked-up version on
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library (Parts 1-4)
Homepage: Homepage of the Santa Barbara Corpus of Spoken American English
Developer: University of Lancaster and IBM Scientific Centre
Sampling period: 1984-87
Size: 52,000 words
Contents: spoken language; transcripts from radio-broadcasts, recordings made at University of Lancaster
Variety sampled: British English
Annotation: prosodic markup, POS tagged
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the SEC
Developer: Laurie Bauer at Victoria University, Wellington
Sampling period: 1986-90
Size: 1 million words; 500 text samples of approx. 2,000 words
Contents: written language; modelled on BROWN and LOB
Variety sampled: New Zealand English
Annotation: untagged
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Developer: Janet Holmes, Bernadette Vine and Gary Johnson at Victoria University, Wellington
Sampling period: 1988-94
Size: 1 million words; 500 text samples of approx. 2,000 words
Contents: spoken language; formal, semi-formal and informal speech
Variety sampled: New Zealand English
Annotation: discourse markup
Availability: Available for students at the RCEP and on the corpus computer in the IAAK library
Homepage: Manual of the Wellington Corpus (spoken)
Hardware and software available at the RCAL
At the RCAL, students have access to the following research tools:
Action Cams (2): Use at RCAL, Can be borrowed
Webcam (2): Use at RCAL, Can be borrowed
Digital Voice Recorder (2): Use at RCAL, Can be borrowed
Tabletop Microphone (3): Use at RCAL, Can be borrowed
Headset (2): Use at RCAL, Can be borrowed
USB Footpedal (4): Use at RCAL, Can be borrowed
AntConc: Use at RCAL, Freely downloadable online
Audacity: Use at RCAL, Freely downloadable online
Camtasia: Use at RCAL
f4: Use at RCAL, Can be borrowed
- for Microsoft, etc. (3)
- for MacOS (1)
MAXQDA: Use at RCAL, Can be borrowed
OpenSesame: Use at RCAL, Freely downloadable online
Translog-II: Use at RCAL, Freely downloadable online
WordSmith: Use at RCAL, Can be borrowed
Want to know more?
If you are interested in taking advantage of the resources we have to offer, you can register for the RCAL office hours by emailing the RCAL Mentor Alyson Wong at rcal[at]
During the winter term 2024/25, our office hours are on Wednesdays, 11:00 - 12:00, only with prior notification. Other times, or a Zoom call, can also be arranged via email. If you want to book an appointment or have any questions, please send an email to rcal[at]
Alyson can advise you on which research tools might suit your research question and can help with access to corpora, data analysis tools, and more.
Research Center for Applied Linguistics
Genscherallee 3
53113 Bonn, Germany
Room 3.015 / 3.016 (third floor)
Phone: +49 (0)228 73-4481
Email: rcal[at]