Analysing Twitter Data for Phishing Tweets Identification

Falah Hassan Ali Al-Akashi
Copyright: © 2021 | Pages: 11
DOI: 10.4018/IJIIT.2021040105

Abstract

Detecting threats such as adult, violent, and phishing tweets on online social networks has become a crucial issue in recent years. The aim of this work is to identify phishing content from the users' perspective in real-time tweets. To outline such content comprehensively, lexicon analysis is combined with sentiment analysis to investigate tweets that yield dynamic phishing keywords, while some features and parameters are tuned to optimize performance. To support the preliminary study, the approach is designed to gather users' opinions on different classes of phishing content. Direct and indirect opinions, as well as recently proposed opinions, are listed to characterize all sorts of phishing content. The authors use word-level analysis with sentiments to build keyword blacklist lexicons. Experimentally, the approach obtains promising results and a high level of accuracy and performance compared with alternative algorithms.

Introduction

In the Web 2.0 era, social media has evolved from simple social communication into an integration of social media functions for all types of services (Pellicer et al., 2013). In the past decade, more social network sites have sprung up and attracted many users. Among them, Facebook, QQ, and Twitter are the most widespread, with 1,590 million, 853 million, and 320 million active users respectively as of May 2016 (Hall et al., 2000). With the rapid growth of social networks, they have become a new target for cyber criminals such as spammers and phishers, as well as many advertisers, which has resulted in worrying problems. Some content is typically designed to lure potential victims to fake or counterfeit services, or is simply outright fraud (Chan et al., 1999). Botnets and malware-infected computers are commonly used to send bulk messages, dangerous and deceptive content, job-hunting advertisements, promotions for free vouchers, testimonials for a few pharmaceutical products, etc. (Chen-Huei and Atich, 2008). Phishing is recognized as a special kind of content that is meant to trick recipients into revealing their personal data, particularly sensitive information such as login and password details. After obtaining the private or account data, the phishers breach the victims’ accounts and commit fraud. According to an analysis by Networked Insights, as of fall 2014, 9.3% of content on Twitter was harmful (Su et al., 2004). Apart from such harmful and phishing content, the social network also suffers from a large amount of low-quality content, including advertisements, content generated automatically by third-party applications, etc. Users are hampered from browsing interesting content by the overwhelming quantity of low-quality content, leading to a significant decrease in the overall experience of using the social network.
In some cases, such content can even affect some vulnerable users with a syndrome known as “Twitter Psychosis” (Churcharoenkrung et al., 2005). Researchers have paid considerable attention to detecting harmful content such as phishing or sexual content, whereas very little attention has been given to distinguishing the large amount of recurring low-quality content that constantly bothers users. Few previous studies were carried out from the users’ perspective; therefore, it is necessary to develop a unified technique that filters such low-quality content to meet users’ needs rather than targeting phishing content alone. The reason is that we aim to describe this content with the term “low-quality content” rather than with the familiar terms typically used to indicate malicious content. In our view, malicious content accounts for only a small share of all low-quality content on social networks. In other words, there are other varieties of low-quality content besides phishing. Hence, to avoid any potential misunderstanding, we use the term phishing content rather than low-quality content. Since there is no general consensus on the definition of low-quality content on social networks, detection becomes more difficult, because the various detection methods need further analysis. One question is whether the features selected for detecting pornographic or phishing content remain effective when detecting other varieties of low-quality content. Additionally, even if these features achieve a high detection rate, can they be extracted in real time? Once a tweet is posted, it is delivered to all of the author's followers at once. Thus, the real-time requirement is essential for protecting users and improving their experience when they use the social network. In fact, much of this detection work is done offline. Graph features such as the “betweenness” centrality scores of network positions adopted by Du et al. (2003) and the redirection information adopted by Polpinij et al. (2006) are time-consuming to compute, which creates difficulty for the idea of studying Internet contexts as a social science. Additionally, the approach of Polpinij et al. (2008) consumed considerable time when computing measures of users’ behaviours. Unlike existing research, whose attention was focused on detecting malicious content such as adult and phishing messages, the objective of our work is the detection of low-quality content on online social networks (OSNs), not simply malicious content. Another point worth emphasizing is that the proposed features used to characterize low-quality content are time-sensitive, which matters for real-time detection.
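
As a rough illustration of the word-level approach with keyword blacklist lexicons mentioned in the abstract, the following sketch scores a tweet in a single pass over its tokens, so it could plausibly run as each tweet arrives. It is a minimal sketch under stated assumptions, not the authors' implementation: the lexicon entries, weights, urgency cues, the scoring formula, and the names `BLACKLIST`, `URGENCY`, `score_tweet`, and `is_suspicious` are all hypothetical.

```python
import re

# Hypothetical blacklist lexicon (keyword -> weight); entries are illustrative only.
BLACKLIST = {"verify": 1.0, "password": 1.5, "login": 1.0, "suspended": 1.0, "free": 0.5}

# Hypothetical sentiment cues expressing urgency or pressure (an assumption,
# standing in for whatever sentiment signal a real system would use).
URGENCY = {"urgent", "immediately", "now", "warning"}

# Lowercase alphabetic tokens, allowing apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def score_tweet(text):
    """Return a phishing score from one linear pass over the tweet's words."""
    tokens = TOKEN_RE.findall(text.lower())
    # Sum the weights of blacklisted keywords found in the tweet.
    keyword_score = sum(BLACKLIST.get(t, 0.0) for t in tokens)
    # Count urgency cues; each one boosts the keyword score by 50%.
    urgency_hits = sum(1 for t in tokens if t in URGENCY)
    return keyword_score * (1.0 + 0.5 * urgency_hits)


def is_suspicious(text, threshold=2.0):
    """Flag a tweet whose score reaches the (assumed) threshold."""
    return score_tweet(text) >= threshold


if __name__ == "__main__":
    for tweet in ("Urgent: verify your password now",
                  "Lovely weather in Ottawa today"):
        print(tweet, "->", score_tweet(tweet), is_suspicious(tweet))
```

Because the score depends only on the words of the tweet itself, it needs no graph or redirection data, which is the kind of property the real-time requirement discussed above calls for.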
