The CWALM project uses a corpus collected from Facebook to study the Moroccan dialect. The corpus contains posts, interactions, and dialogues written in Moroccan Arabic, and it captures informal, spontaneous language use in real-world digital contexts. It was analyzed with the DiMorp tools and then manually disambiguated by expert linguists.
The fully annotated CWALM corpus is publicly available on GitHub in JSON format. It contains the morphologically analyzed and manually disambiguated Moroccan dialect data.
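The JSON schema of the corpus is not described here. Rather than guess at field names, the Python sketch below loads a JSON file and reports only its top-level shape. You can use it to inspect the structure before you write a dedicated loader. The function name `summarize_json` is hypothetical, and the sketch assumes nothing about the corpus layout except that the file is valid JSON. UTF-8 is set explicitly so that any Arabic-script text decodes correctly regardless of the platform's default encoding.

```python
import json
from typing import Any


def summarize_json(path: str) -> dict[str, Any]:
    """Load a JSON file and describe its top-level shape without assuming a schema.

    Hypothetical helper: it does not rely on any CWALM-specific field names.
    """
    # Decode explicitly as UTF-8 so non-ASCII (e.g. Arabic-script) text is read correctly.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    summary: dict[str, Any] = {"type": type(data).__name__}
    if isinstance(data, list):
        summary["length"] = len(data)
        # If the records are JSON objects, show the keys of the first one.
        if data and isinstance(data[0], dict):
            summary["first_record_keys"] = sorted(data[0].keys())
    elif isinstance(data, dict):
        summary["keys"] = sorted(data.keys())
    return summary
```

For example, `summarize_json("cwalm.json")` would print the top-level type and keys of a downloaded copy of the corpus. Here `cwalm.json` is a placeholder for whatever the file is called in the repository.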