
Ali Çıltık, M.S. Thesis, 2006

Thesis Title

Time Efficient Spam E-mail Filtering for Turkish


In this thesis, we propose spam e-mail filtering methods that combine high accuracy with low time complexity. The methods are based on the n-gram approach and on a heuristic referred to as the first n-words heuristic. Although the main concern of the research is the applicability of these methods to Turkish e-mails, they were also applied to English e-mails. A data set was compiled for each of the two languages, and tests were performed with different parameters. Success rates above 95% for Turkish e-mails and around 98% for English e-mails were obtained. In addition, it is shown that the time complexity can be reduced significantly without sacrificing success rate.
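To make the idea concrete, the following is a minimal sketch of an n-gram classifier that looks only at the first few words of a message. It is not the thesis's implementation: the use of character trigrams, add-one smoothing, the default of 50 words, and all names (`NgramModel`, `first_n_words`, `classify`) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def first_n_words(text, n_words):
    """Keep only the first n_words whitespace-separated tokens, lowercased."""
    return " ".join(text.lower().split()[:n_words])


def char_ngrams(text, n):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramModel:
    """Character n-gram counts for one class (assumed design, not the thesis's)."""

    def __init__(self, n=3, n_words=50):
        self.n = n                # n-gram length (assumption: trigrams)
        self.n_words = n_words    # how many leading words to inspect
        self.counts = Counter()

    def train(self, messages):
        for msg in messages:
            head = first_n_words(msg, self.n_words)
            self.counts.update(char_ngrams(head, self.n))

    def log_score(self, message, vocab_size):
        """Sum of add-one smoothed log-probabilities of the message's n-grams."""
        total = sum(self.counts.values())
        head = first_n_words(message, self.n_words)
        return sum(
            math.log((self.counts[g] + 1) / (total + vocab_size))
            for g in char_ngrams(head, self.n)
        )


def classify(message, spam_model, ham_model):
    """Label a message by comparing its score under the two class models."""
    # Shared vocabulary size, plus one slot for unseen n-grams.
    vocab_size = len(set(spam_model.counts) | set(ham_model.counts)) + 1
    spam = spam_model.log_score(message, vocab_size)
    ham = ham_model.log_score(message, vocab_size)
    return "spam" if spam > ham else "legitimate"


# Toy training data, invented for this example only.
spam_model = NgramModel()
spam_model.train(["win free money now click here",
                  "free money free prize click now"])
ham_model = NgramModel()
ham_model.train(["meeting tomorrow at the department office",
                 "please review the thesis draft tomorrow"])

print(classify("free money click here", spam_model, ham_model))
print(classify("thesis meeting tomorrow", spam_model, ham_model))
```

Because only the first `n_words` tokens are scanned, the work per message is bounded regardless of message length, which is the intuition behind trading a small amount of evidence for a large saving in time.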

We also propose a combined perception refinement (CPR), which improves the baseline success rates by around 2%. A development set is used in the first step of the CPR to determine the parameters used in the second step. Free word order is another characteristic of the Turkish language; we make an attempt to incorporate this aspect of Turkish.
Boğaziçi University Department of Computer Engineering