Description
Clear prefixes first, clear suffixes second
Be greedy when choosing between noun suffixes and verb suffixes: طالبات
الزمان
والشمس
لمعالجة
أفنلزمكموها
س attaches only to imperfect (present-tense) verbs
Detect the است prefix, use it to decide whether the word is a noun or a verb, and raise the size condition by 3: نسنعين
in suffixes, the sound masculine plural rarely has a stem shorter than 4 letters
and imperfect-verb suffixes must leave a size of 4, because the imperfect has a one-letter prefix
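The minimum-stem-length rule for suffixes could be sketched roughly as follows. This is only an illustration: `strip_suffix`, `MIN_STEM_LEN`, and the suffix list in the usage note are hypothetical names and values, not part of the existing stemmer.

```python
# Hypothetical threshold taken from the note above: a suffix is removed
# only if at least 4 letters remain afterwards.
MIN_STEM_LEN = 4


def strip_suffix(word, suffixes, min_stem_len=MIN_STEM_LEN):
    """Remove the longest matching suffix, unless the remainder is too short."""
    # Try longer suffixes first so the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        remaining = len(word) - len(suffix)
        if suffix and word.endswith(suffix) and remaining >= min_stem_len:
            return word[:remaining]
    # No suffix could be removed safely: leave the word untouched.
    return word
```

With the sound masculine plural endings `["ون", "ين"]`, a word such as معلمون keeps a 4-letter remainder and is stripped, while بنون would be left with only 2 letters and is returned unchanged.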
study the case of والأمر
make suffixes set/unset is_noun and is_verb
don't stem if the word contains a number or an English number, or if size = odd
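A minimal sketch of the digit check, assuming "English number" means Western digits (0-9) next to Arabic-Indic ones; the "size = odd" condition is left out because its meaning is not specified here. `contains_digit` and `should_stem` are hypothetical names.

```python
def contains_digit(word):
    """True if any character is a decimal digit (Western or Arabic-Indic)."""
    # str.isdigit() is True for both "2" and the Arabic-Indic digit U+0663.
    return any(ch.isdigit() for ch in word)


def should_stem(word):
    """Skip stemming for tokens that carry digits (assumed reading of the note)."""
    return not contains_digit(word)
```

A guard like this would run before any prefix or suffix removal, so tokens such as dates or mixed codes pass through unchanged.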
define regions before start stemming, test everything then perform stemming
blacklist: ignore some predefined words, or is it worth it?
remove feminine marks and study feminine patterns
remove broken plural infixes: أطفال، كواسر، نُمور
consider vocalization when it exists:
tanween means a noun
detect and process vocalized texts
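The vocalization items could start from something like the sketch below. It relies only on the Unicode code points of the Arabic diacritics (U+064B to U+0652); `is_vocalized`, `has_tanween`, and `guess_is_noun` are hypothetical helpers, not existing routines.

```python
# Tanween marks: fathatan, dammatan, kasratan.
TANWEEN = {"\u064b", "\u064c", "\u064d"}
# Short vowels, shadda and sukun, together with the tanween marks.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def is_vocalized(word):
    """True if the word carries at least one diacritic."""
    return any(ch in HARAKAT for ch in word)


def has_tanween(word):
    """True if the word carries a tanween mark."""
    return any(ch in TANWEEN for ch in word)


def guess_is_noun(word):
    """Return True when tanween marks the word as a noun, else None (unknown)."""
    return True if has_tanween(word) else None
```

Returning `None` rather than `False` keeps the absence of tanween neutral, since an unvocalized or non-tanween word may still be a noun.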
Study patterns and guess it before stemming
Verb conjugation prefixes (a, t, y, n): if the word has a suffix, remove the prefix along with it
Rename routines to better-explaining names
study Alef-tanween
study idgham
Calculate probability of being noun or being verb
Prefix confusion
2 letters words
improve using ideas from ISRI
improve using ideas from Khoja
improve using ideas from Tashaphyne
optimize performance
filter stop words
al, kal, fal, bal, b should be marked first, and set is_noun; aawwff should be marked first