CWALM Corpus: Analyzing the Moroccan Dialect on Facebook

The CWALM project uses a corpus collected from Facebook to analyze and understand the Moroccan dialect. The corpus includes interactions, posts, and dialogues in Moroccan Arabic, offering insight into informal, spontaneous language use in real-world digital contexts. The texts were analyzed with the DiMorp tools and then manually disambiguated by expert linguists.

CWALM Annotated Corpus (JSON Format)

The fully annotated CWALM corpus is publicly available in JSON format. It contains Moroccan dialect data that has been morphologically analyzed and manually disambiguated.

The corpus is hosted on GitHub.
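The JSON schema itself is not described on this page. As a rough illustration only, the sketch below assumes each token is stored as an object with a surface form (`form`), the candidate analyses produced by the analyzer (`analyses`), and the analysis selected by the annotators (`gold`). These field names and the toy records are hypothetical, not the actual CWALM format. The sketch shows how such a file could be loaded with Python's standard `json` module, how ambiguous tokens could be counted, and how one might check that each manual choice is among the candidates.

```python
import json
from typing import Any

# HYPOTHETICAL layout: the real CWALM schema is not documented here.
# Each record: surface form, candidate analyses, and the annotators' choice.
SAMPLE = """
[
  {"form": "tok1", "analyses": ["A", "B"], "gold": "B"},
  {"form": "tok2", "analyses": ["C"], "gold": "C"},
  {"form": "tok3", "analyses": ["D", "E", "F"], "gold": "D"}
]
"""


def load_tokens(text: str) -> list[dict[str, Any]]:
    """Parse a JSON array of token records."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of token records")
    return data


def summarize(tokens: list[dict[str, Any]]) -> dict[str, Any]:
    """Count tokens, count ambiguous ones, and validate the gold choices."""
    total = len(tokens)
    # A token is ambiguous if the analyzer proposed more than one analysis.
    ambiguous = sum(1 for t in tokens if len(t["analyses"]) > 1)
    # The manually selected analysis should be one of the candidates.
    consistent = all(t["gold"] in t["analyses"] for t in tokens)
    return {"tokens": total, "ambiguous": ambiguous, "gold_in_candidates": consistent}


if __name__ == "__main__":
    print(summarize(load_tokens(SAMPLE)))
```

On the toy sample this prints `{'tokens': 3, 'ambiguous': 2, 'gold_in_candidates': True}`. For the real files, the field names would need to be adapted to the actual schema.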
The corpus consists of 13 stories:

Story     Total tokens
Story 1            964
Story 2           1210
Story 3           2371
Story 4            699
Story 5           9074
Story 6           5876
Story 7           2608
Story 8           2694
Story 9           2909
Story 10          4405
Story 11          1251
Story 12          2292
Story 13          1518
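Assuming the stories do not overlap, the size of the whole corpus is simply the sum of the per-story counts. The short Python sketch below copies the figures from the story table and computes the total and the share of the largest story; the variable and function names are illustrative only.

```python
# Per-story token counts, copied from the story table.
STORY_TOKENS = {
    1: 964, 2: 1210, 3: 2371, 4: 699, 5: 9074, 6: 5876, 7: 2608,
    8: 2694, 9: 2909, 10: 4405, 11: 1251, 12: 2292, 13: 1518,
}


def corpus_stats(counts: dict[int, int]) -> tuple[int, int, float]:
    """Return (total tokens, id of the largest story, its share of the total)."""
    total = sum(counts.values())
    largest = max(counts, key=counts.__getitem__)
    return total, largest, counts[largest] / total


if __name__ == "__main__":
    total, largest, share = corpus_stats(STORY_TOKENS)
    print(f"total={total} largest=Story {largest} share={share:.1%}")
```

This prints `total=37871 largest=Story 5 share=24.0%`: Story 5 alone (9074 tokens) accounts for roughly a quarter of the corpus.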