CWALM Corpus: Analyzing the Moroccan Dialect on Facebook

The CWALM project uses a corpus collected from Facebook to analyze and understand the Moroccan dialect. The corpus includes interactions, posts, and dialogues in Moroccan Arabic, providing insight into informal and spontaneous language use in real-world digital contexts. The texts were morphologically analyzed with DiMorp tools and then manually disambiguated by expert linguists.

CWALM Annotated Corpus (JSON Format)

The fully annotated CWALM corpus is publicly available in JSON format. It contains morphologically analyzed and manually disambiguated Moroccan dialect data.

View Corpus on GitHub
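
As a rough illustration of how a morphologically analyzed, manually disambiguated story in JSON might be read, the sketch below parses a tiny made-up record and counts its tokens. The schema is an assumption, not the documented CWALM format: the field names `story`, `tokens`, `form`, `analyses`, and `selected`, and the placeholder values, are hypothetical and should be adapted to the actual files in the repository.

```python
import json

# Hypothetical sample record. The real CWALM schema may differ: every field
# name and value below is a placeholder chosen for illustration only.
SAMPLE = """
{
  "story": 1,
  "tokens": [
    {"form": "w1", "analyses": ["NOUN", "VERB"], "selected": "NOUN"},
    {"form": "w2", "analyses": ["PREP"], "selected": "PREP"}
  ]
}
"""


def summarize(story: dict) -> dict:
    """Count tokens and how many had more than one candidate analysis."""
    tokens = story["tokens"]
    # A token is treated as ambiguous if the analyzer proposed several
    # analyses, so a manual choice ("selected") was needed.
    ambiguous = sum(1 for t in tokens if len(t["analyses"]) > 1)
    return {
        "story": story["story"],
        "total_tokens": len(tokens),
        "ambiguous_tokens": ambiguous,
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

With a real corpus file, `json.loads(SAMPLE)` would be replaced by `json.load` on an opened file, and the key names adjusted to match the published format.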

Story 1

Total Tokens: 964

Story 2

Total Tokens: 1210

Story 3

Total Tokens: 2371

Story 4

Total Tokens: 699

Story 5

Total Tokens: 9074

Story 6

Total Tokens: 5876

Story 7

Total Tokens: 2608

Story 8

Total Tokens: 2694

Story 9

Total Tokens: 2909

Story 10

Total Tokens: 4405

Story 11

Total Tokens: 1251

Story 12

Total Tokens: 2292

Story 13

Total Tokens: 1518
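
The per-story counts above can be combined into corpus-level figures. The following is a minimal sketch that copies the thirteen totals listed on this page into a dictionary and derives the overall token count and the largest and smallest stories; the variable names are illustrative and not part of the CWALM tooling.

```python
# Token counts per story, copied from the list above.
TOKENS_PER_STORY = {
    1: 964, 2: 1210, 3: 2371, 4: 699, 5: 9074, 6: 5876, 7: 2608,
    8: 2694, 9: 2909, 10: 4405, 11: 1251, 12: 2292, 13: 1518,
}

# Corpus-level total across all thirteen stories.
total = sum(TOKENS_PER_STORY.values())

# Stories with the most and the fewest tokens.
largest = max(TOKENS_PER_STORY, key=TOKENS_PER_STORY.get)
smallest = min(TOKENS_PER_STORY, key=TOKENS_PER_STORY.get)

print(f"Total tokens: {total}")      # Total tokens: 37871
print(f"Largest story: {largest}")   # Largest story: 5
print(f"Smallest story: {smallest}") # Smallest story: 4

# Share of the corpus contributed by each story, largest first.
for story, count in sorted(TOKENS_PER_STORY.items(), key=lambda kv: -kv[1]):
    print(f"Story {story}: {count} tokens ({count / total:.1%})")
```

By these figures the thirteen stories total 37,871 tokens, with Story 5 the largest (9,074 tokens) and Story 4 the smallest (699 tokens).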