A New Bio-Inspired Method for Spam Image-Based Emails Filtering

Abdelkrim Latreche, Kadda Benyahia
Copyright: © 2021 | Pages: 22
DOI: 10.4018/IJOCI.2021040102

Abstract

Electronic mail has become one of the most popular and frequently used channels for personal and professional online communication. Despite its benefits, e-mail faces a major security problem: the daily reception of a large number of unsolicited electronic messages, known as “spam emails.” Today, most electronic mail systems have simple spam filtering mechanisms based on text spam filtering technologies. To circumvent these filters, spammers are introducing new techniques that embed the spam message in an image attached to the mail. In this article, the authors propose a new method for spam image filtering. The proposed system distinguishes legitimate images from spam images based on the texture characteristics of the image attached to an email. From each image, around 20 characteristics are extracted from the gray level co-occurrence matrix (GLCM). Then, to classify the images as spam or ham, the authors use a new nature-inspired metaheuristic model for building classifiers, based on social worker bees and an enhanced nearest-centroid classification method.

1. Introduction

Today, electronic mail (e-mail) has become one of the most popular, powerful, and frequently used channels for personal and professional online communication. As an indication, the total number of worldwide email accounts reached about 4.3 billion in 2016, with a yearly growth rate of 6% through 2020 (Radicati and Hoang, 2016). About 293 billion e-mails were sent worldwide every day in 2019, excluding spam (Arobase.org, 2020). The success of email is due in part to its speed, permanence, low cost, and ease of data distribution.

Despite these benefits, electronic mail faces a major security problem: users receive a large number of unsolicited electronic messages every day, known as “spam emails”. Spam is irrelevant, unsolicited, or unwanted text or image mail, often sent by an obscure sender without the user's consent, and it may contain advertisements, adult content, malware, and more. The widespread and massive use of e-mail makes it a preferred target for spammers, and spam has become a major problem for Internet networks (Al-Duwairi et al., 2012; Ketari et al., 2012). According to a recent study by Symantec, spam emails now represent about 91% of all emails. Today, most electronic mail systems have spam filtering mechanisms that can block or quarantine unwanted mail, and most of them are essentially based on text spam filtering technologies. In this context, many classification systems have been developed to detect and filter spam emails according to characteristics such as their header, subject, and content. For example, Lai and Tsai (2004) compare four machine learning algorithms, including KNN, SVM, and Naïve Bayes, for detecting spam using different parts of the email message. For a survey and review of existing and emerging techniques, see (Blanzieri and Bryl, 2008; Caruana and Li, 2012; Attar et al., 2013; Zamel et al., 2018; Khawandi et al., 2019).

To circumvent these strong text-based detection filters, spammers reacted by embedding the spam text inside images attached to the e-mail, a technique known as “spam image” (or image spam). A spam image is a form of email spam in which the textual spam message is embedded into images that are then attached to spam emails. The earliest spam images contained easily readable text, as shown in Figure 1a. Embedding spam text in an image can be an effective way of circumventing text filtering systems (Gao et al., 2008). This type of spamming has developed rapidly in recent years, so the major challenge for new filtering systems is to find effective methods to distinguish a spam image from a legitimate image (ham) contained in an email. To this end, many studies have proposed techniques to filter this type of image. In general, spam image detection techniques fall into three categories (Attar et al., 2013; Hosseini and Rahmati, 2015): i) techniques based on the email header, whose many fields provide a useful range of information for analysis and detection; ii) techniques based on OCR (Optical Character Recognition), which extract the text embedded in the image; iii) content-based techniques, which rely on image content analysis and feature extraction.

OCR-based techniques use optical character recognition to extract the text embedded in spam images and then submit it, along with the text body of the email, to text-based detection techniques (Biggio et al., 2008; Sathiya et al., 2011; Nisha and Gaikwad, 2015). More recently, to circumvent this type of spam filter, spammers have applied obfuscation techniques to spam images to prevent OCR tools from reading the embedded text. Some examples are shown in Figure 1b. This has raised the issue of improving the detection of image spam using other techniques (Aradhye et al., 2005; Fumera et al., 2006; Dredze et al., 2007; Liu et al., 2010; Biggio et al., 2011). In particular, several researchers have investigated the possibility of using generic low-level image features to recognize obfuscated spam images.

Content-based techniques study and analyze the content of the image itself: features such as color, texture, edge, shading, and surface are extracted from the image and used to filter spam images (Attar et al., 2013; Caruana and Li, 2012; Hosseini and Rahmati, 2015; Das and Prasad, 2014; Kamble and Malik, 2012; Mallikka and Balamurugan, 2018; Zamil et al., 2019).
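
As a rough illustration of the texture-based approach described above, the following Python sketch computes a gray-level co-occurrence matrix (GLCM) for one pixel offset, derives four common texture descriptors from it, and classifies a feature vector with a plain nearest-centroid rule. It is not the authors' implementation: the function names, the 8-level quantization, the single horizontal offset, and the choice of four features are all assumptions made for this example, whereas the article extracts around 20 GLCM features and uses a bee-inspired, enhanced nearest-centroid classifier that is not reproduced here.

```python
import numpy as np


def glcm(image, levels=8, dx=1, dy=0):
    """Normalized, symmetric GLCM of an 8-bit grayscale image.

    Assumptions of this sketch: pixel values lie in 0..255, the offset
    (dy, dx) is non-negative, and the image is larger than the offset.
    """
    img = np.asarray(image).astype(np.int64)
    # Quantize gray values into `levels` bins to keep the matrix small.
    q = np.clip((img * levels) // 256, 0, levels - 1)
    h, w = q.shape
    # Pair every pixel with its neighbour at the chosen offset.
    src = q[0:h - dy, 0:w - dx]
    dst = q[dy:h, dx:w]
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (src.ravel(), dst.ravel()), 1.0)
    m = m + m.T  # count each pair in both directions
    return m / m.sum()


def texture_features(image, levels=8):
    """Return [contrast, energy, homogeneity, entropy] from the GLCM."""
    p = glcm(image, levels)
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    nz = p[p > 0]  # skip empty cells to avoid log(0)
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([contrast, energy, homogeneity, entropy])


def fit_centroids(X, y):
    """Compute one mean feature vector (centroid) per class label."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    labels = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def predict(x, labels, centroids):
    """Assign x to the class whose centroid is nearest (Euclidean)."""
    d = np.linalg.norm(centroids - np.asarray(x, dtype=np.float64), axis=1)
    return labels[np.argmin(d)]
```

In use, `texture_features` would be applied to each labeled training image, `fit_centroids` to the resulting feature matrix, and `predict` to the features of a new attachment. Because the four descriptors have different scales, standardizing the features before computing distances is likely to be needed in practice.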
