I am a linguist, translator, and overall nerd for all things Arabic. I am interested in AI (especially LLMs), Tunisian Arabic, Arabic NLP, Arabic pedagogy, Arabic linguistics, and literary translation.
Dissertation: "Tunisian Arabic as a written language: Identity and vernacularization"
Thesis Topic: “Fī (‘in’) as a marker of the progressive aspect in Tunisian Arabic: A cognitive and historic approach.”
Support a critical NLP project for Amazon Comprehend by developing and implementing data collection and annotation strategies and ensuring high data quality. Work with stakeholders on science, engineering, and product teams to optimize datasets for model performance and customer needs.
Performed semantic annotation of Arabic NLP training data; created annotation guidelines, managed a remote team, and reviewed annotations for accuracy.
Created an Arabic curriculum for web-based language and cultural training for military foreign area officers. The project involved researching and collecting level-appropriate authentic materials, creating exercises (listening, reading, practical, and supplemental), and developing assessments.
Founded an office and built a program to support military-affiliated students on campus. Also served as a first-year and sophomore academic advisor, a member of the Diversity Advisory Board, a member of the Health Careers Advisory Committee, and a First Readings seminar leader.
Responsible for reviewing and translating entries for the CJK Arabic Learners’ Dictionary. Ensured the accuracy of English translations and the appropriateness of Arabic examples. Also ensured that sense division, headword selection, entry organization, and typography followed established guidelines.
Reviewed entries for the Oxford Arabic Dictionary (2014), verifying the accuracy and naturalness of English translations of Arabic headwords and examples. Used the billion-word Oxford Arabic corpus to expand entries and to identify word senses not previously recorded in existing monolingual or bilingual Arabic dictionaries. Also prepared resource materials for the team and trained other reviewers.
Translated foreign intelligence materials in Standard Arabic and in Syrian, Iraqi, and Libyan Arabic.
Translated foreign intelligence materials; trained and mentored 25 Arabic linguists; provided quality control.
Social and technological changes over the past several decades have led to widespread writing of “spoken” Arabic dialects. In Tunisia, vernacular writing has flourished since the 2011 revolution: although the first novel written entirely in Tunisian dɛ̄rja did not appear until 2013, there are now nine vernacular novels, in addition to several translations, memoirs, and children’s books. This burgeoning print literature is just one part of the expansion of vernacular Tunisian into domains previously reserved for Standard Arabic, such as advertisements (Walters 2003), radio stations (Achour Kallel 2011), classrooms (Bach Baoueb & Toumi 2012), the mosque (Sayahi 2014), and even government (Sayahi 2019; Achour Kallel 2015).

This dissertation examines the expansion of Tunisian Arabic into writing. Encoding an ‘unwritten’ language in writing is not a straightforward, mechanical task but a complex process that balances practical considerations against ideological stances such as autonomy from the standard language (Mühleisen 2005). Practical considerations, such as affinity with an established written language that people are accustomed to reading, may lead writers to prefer more Standard Arabic-like features, for example ⟨blAdh بلاده⟩ for blādu 'his country'. On the other hand, writing Tunisian Arabic as an expression of a Tunisian national identity, distinct from Islamic and pan-Arab identities, may lead writers to prefer forms that hew closer to vernacular pronunciation, such as ⟨blAdw بلادو⟩. Using a quantitative analysis of nine print novels (2013–2021) and a 32-million-word corpus of internet forum posts (2010–2020), this dissertation explores how Tunisians writing in dɛ̄rja make orthographic choices that collectively position them in relation to Standard Arabic, French, and the other Arabic vernaculars.
The study finds that writers who view Tunisian Arabic as an independent “language” and Tunisian as a distinct national identity (in contrast, or even in conflict, with an Islamic pan-Arab identity) are more likely both to write in Tunisian Arabic and to use phonemic rather than etymological spellings. It also finds that even pro-Standard Arabic and pro-Arab partisans often express their arguments in Tunisian Arabic, underlining the extent to which Tunisian Arabic has become normalized as a written language. This analysis thus offers a valuable window into the process of vernacularization in the Arab world.
I created the first large-scale corpus of Tunisian Arabic, available free to the public at tunisiya.org. The corpus contains materials from a wide variety of genres, including novels, folktales, talk radio, blogs, and screenplays, and is accessible through a custom-built search and concordancing tool. This corpus has been used by scholars all over the world, supplying data for journal articles, dissertations, and at least one book. The ultimate goal of the corpus is to provide data for reference materials for Tunisian Arabic (and North African Arabic more broadly), most critically a bilingual Tunisian-English dictionary, grammar, and basic coursebook for Tunisian Arabic.