publications

Books & Edited Volumes

2024

  1. Tibetan digital humanities and natural language processing
    Marieke Meelen, Christian Faggionato, and Nathan Hill
    Proceedings of the IATS 2022 panel as a Special Issue of the Revue d’Etudes Tibétaines
  2. Ngi Dzardzongke ki choe gongma [My first Dzardzongke book: early reader to learn the language of South Mustang, Nepal]
    Marieke Meelen, Charles Ramble, and Kemi Tsewang

2022

  1. Creating annotated corpora for historical languages
    Marieke Meelen and David Willis
    Special Issue for Journal of Historical Syntax

2016

  1. Why Jesus and Job spoke bad Welsh: The origin and distribution of V2 orders in Middle Welsh
    Marieke Meelen

Peer-Reviewed Articles & Chapters

2026

  1. From Large and Complex Manuscript Collections to Searchable eTexts: the Case of PaganTibet
    Rachael M. Griffiths and Marieke Meelen
    Revue d’Etudes Tibétaines, 80

2025

  1. Comparing efficacy of IPA vs Pinyin romanisation transcriptions for complex tonal languages: A case study in Baima
    Katia Chirkova, Rolando Coto-Solano, Rachael Griffiths, and Marieke Meelen
    In Eighth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pp. 170–181
  2. Syntactic reconstruction in Celtic
    Marieke Meelen
    In Foundational approaches to Celtic linguistics, pp. 417–467
  3. Collaborative Workflows for Handwritten Text Recognition in Under-Resourced Manuscript Collections
    Marieke Meelen and Rachael M. Griffiths
    Journal of Open Humanities Data, 11, pp. 1–54
  4. How ‘Pagan’ is my text? Information Extraction from untranscribed data
    Rachael M. Griffiths and Marieke Meelen
    In Proceedings of the Computational Humanities Research conference, pp. 1262–1273

2024

  1. End-to-end speech recognition for endangered languages of Nepal
    Marieke Meelen, Alexander O’Neill, and Rolando Coto-Solano
    In Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages, pp. 83–93
  2. Breakthroughs in Tibetan NLP & Digital Humanities
    Marieke Meelen, Sebastian Nehrdich, and Kurt Keutzer
    Revue d’Etudes Tibétaines, 72, pp. 5–25
  3. The Diachronic Annotated Corpus of Newar: From manuscript to morphosyntax
    Alexander James O’Neill and Marieke Meelen
    Cahiers de Linguistique Asie Orientale, 54, pp. 162–191
  4. The diachrony of Welsh subject pronouns
    Marieke Meelen and David Willis
    Studia Celtica Posnaniensia Special Issue: Noun phrase and pronominal syntax in medieval and early modern Celtic languages, 9, pp. 84–111

2022

  1. Towards a historical treebank of Middle and Modern Welsh: Syntactic parsing
    Marieke Meelen and David Willis
    Journal of Historical Syntax, 6
  2. Crosslinguistic semantic textual similarity of Buddhist Chinese and Classical Tibetan
    Rafal Felbur, Marieke Meelen, and Paul Vierthaler
  3. NLP pipeline for annotating (endangered) Tibetan and Newar varieties
    Christian Faggionato, Nathan Hill, and Marieke Meelen
    In Proceedings of the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia within the 13th Language Resources and Evaluation Conference, pp. 1–6
  4. Towards coreference resolution for Early Irish
    Mark Darling, Marieke Meelen, and David Willis
    In Proceedings of the 4th Celtic Language Technology Workshop within LREC2022, pp. 85–93
  5. Creating annotated corpora for historical languages
    Marieke Meelen and David Willis
    Journal of Historical Syntax, 6, pp. 1–5
  6. What are cognates?
    Marieke Meelen, Nathan W Hill, and Hannes Fellner
    Papers in Historical Phonology, 7, pp. 44–80

2021

  1. Optimisation of the largest annotated Tibetan corpus combining rule-based, memory-based, and deep-learning methods
    Marieke Meelen, Élie Roux, and Nathan Hill
    ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 20, pp. 1–11
  2. Towards a historical treebank of Middle and Early Modern Welsh, part I: Workflow and POS tagging
    Marieke Meelen and David Willis
    Journal of Celtic Linguistics, 22, pp. 125–154
  3. Old Catalan Morphosyntax: Developing an Annotated Corpus
    Marieke Meelen and Afra Pujol i Campeny

2020

  1. Annotating Middle Welsh: POS tagging and chunk-parsing a partial corpus of native prose
    Marieke Meelen
    In Corpus-based approaches to morphosyntactic variation and change in medieval Celtic languages, pp. 27–47
  2. Adjectival agreement in Middle and Early Modern Welsh native and translated prose
    Marieke Meelen and Silva Nurmio
    Journal of Celtic Linguistics, 21, pp. 1–28
  3. V3 in Dutch urban varieties
    Lisa L Cheng, Marieke Meelen, and Khalid Mourigh
    Open Generative Syntax, pp. 327–355
  4. Meta-dating the parsed corpus of Tibetan (PACTib)
    Marieke Meelen and Élie Roux
    In Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories, pp. 31–42
  5. (Not so) Great Expectations: Listening to foreign-accented speech reduces the brain’s anticipatory processes
    Niels O Schiller, Bastien P-A Boutonnet, Marianne LS De Heer Kloots, Marieke Meelen, and 2 more authors
    Frontiers in Psychology, 11, pp. 2143
  6. Reconstructing the rise of Verb Second in Welsh
    Marieke Meelen
    In Rethinking Verb Second, pp. 426–454

2019

  1. Developing the Old Tibetan treebank
    Christian Faggionato and Marieke Meelen
    In Proceedings of Recent Advances in Natural Language Processing, pp. 304–312

2017

  1. Object-initial word order in Middle Welsh narrative prose
    Marieke Meelen
    In Referential Properties and Their Impact on the Syntax of Insular Celtic Languages. Studien und Texte zur Keltologie 14, pp. 145–178
  2. Segmenting and POS tagging Classical Tibetan using a memory-based tagger
    Marieke Meelen and Nathan Hill
    Himalayan Linguistics, 16, pp. 64–89

2015

  1. Promoting youth development worldwide: The Duke of Edinburgh’s international award
    Eva van Baren, Marieke Meelen, and Lucas CPM Meijs
    Journal of Youth Development, 10, pp. 1–14

Annotated Corpora & Other Datasets

2025

  1. Ground Truth for PaganTibet Ume models 1 and 2
    Rachael M. Griffiths, Marieke Meelen, Daniel Berounský, Marc Jardins, and 17 more authors
    October 2025
  2. HTR Input and Correction Cheat Sheet: 10 Basic Rules and Protocols for Diplomatic Transcription
    Rachael M. Griffiths and Marieke Meelen
  3. HTR Input and Correction Manual
    Marieke Meelen and Rachael M. Griffiths

2024

  1. Classical Newar Annotation Manual: Part I - Preprocessing & Segmentation
    Alexander O’Neill and Marieke Meelen
  2. Classical Newar Annotation Manual: Part II - Part-of-Speech Tagging
    Alexander O’Neill and Marieke Meelen
  3. Diachronic Annotated Corpus of Newar (DACON)
    Alexander O’Neill and Marieke Meelen

2023

  1. Classical Tibetan Annotation Manual Part II - Segmentation & POS tagging
    Marieke Meelen, Christian Faggionato, and Nathan Hill

2022

  1. An audio-visual archive of Dzardzongke (South Mustang Tibetan)
    Marieke Meelen
  2. The first annotated corpus of Old Catalan
    Marieke Meelen and Afra Pujol i Campeny
  3. Classical Tibetan Word Embeddings
    Marieke Meelen

2020

  1. The Annotated Corpus of Classical Tibetan (ACTib) - Version 2.0 (Segmented & POS-tagged)
    Marieke Meelen and Élie Roux

2019

  1. Kaike - Chaaigo Lapsol
    Marieke Meelen and Jag Bahadur Budha

2018

  1. PARSHCWL – The annotated texts of the Llyfr yr Ancr
    Marieke Meelen, Raphael Sackmann, and Elena Parina
  2. An audio-visual archive and searchable corpus of Kaike, an endangered Tibeto-Burman language of Dolpa, Nepal
    Marieke Meelen