A team of researchers has developed a sophisticated algorithm that cuts through the rabble of millions of tweets to detect harmful and abusive posts against women on Twitter. Online abuse targeting women, including threats of harm or sexual violence, has proliferated across all social media platforms.
Now, researchers from Queensland University of Technology (QUT) have developed a statistical model to help drum such abuse out of the Twittersphere. The team mined a dataset of 1 million tweets, then refined it by keeping only the tweets that contained one of three abusive keywords: whore, slut, and rape.
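The keyword-filtering step can be sketched in a few lines of Python. This is a minimal sketch, not the team's code: the article names only the three keywords, so the case-insensitive whole-word matching (which means plurals such as "sluts" are not caught) and the function names contains_keyword and filter_tweets are assumptions.

```python
import re

# The three keywords named in the article; the matching rules below are assumptions.
KEYWORDS = ("whore", "slut", "rape")

# Case-insensitive, whole-word match: \b stops "grapes" from matching "rape".
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def contains_keyword(text: str) -> bool:
    """Return True if the tweet contains at least one of the keywords."""
    return KEYWORD_RE.search(text) is not None


def filter_tweets(tweets):
    """Keep only the tweets that mention one of the keywords."""
    return [t for t in tweets if contains_keyword(t)]


sample = ["I love grapes", "Rape jokes are never funny", "you are a slut"]
print(filter_tweets(sample))
```

The filter keeps "Rape jokes are never funny", a tweet that mentions a keyword without being abusive. A keyword match is therefore only a first pass; the remaining tweets still need to be judged on context and intent.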
Understanding the Context of the Tweet
The team's model identified misogynistic content with 75 percent accuracy, outperforming other methods that investigate similar aspects of social media language. "At the moment, the onus is on the user to report abuse they receive. We hope our machine-learning solution can be adopted by social media platforms to automatically identify and report this content to protect women and other user groups online," said Associate Professor Richi Nayak.
The key challenge in misogynistic tweet detection is understanding the context of a tweet, which the complex and noisy nature of tweets makes difficult. On top of that, teaching a machine to understand natural language is one of the more complicated areas of data science, because language changes and evolves constantly, and much of its meaning depends on context and tone.
"So, we developed a text mining system where the algorithm learns the language as it goes, first by developing a base-level understanding then augmenting that knowledge with both tweet-specific and abusive language," she noted.
The team implemented a deep learning algorithm called 'Long Short-Term Memory with Transfer Learning', which means that the machine could look back at its previous understanding of terminology and change the model as it goes, developing its contextual and semantic understanding over time.
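The article does not describe the team's network, so the following is only a rough sketch, assuming the standard LSTM cell formulation. It shows how an LSTM carries a memory (the cell state) from one word to the next, which is the "look back" behaviour described above. The class name LSTMCell, the layer sizes, the random weights and the toy input vectors are illustrative assumptions. The sketch does not implement transfer learning, which typically means starting from weights or word embeddings learned on a larger, general corpus and then fine-tuning them on task-specific text such as tweets.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMCell:
    """A single LSTM cell with small random weights (illustrative, untrained)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: input, forget, output, candidate.
        self.params = {
            gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for gate in ("i", "f", "o", "g")
        }

    def _gate(self, name, x, h):
        W, U, b = self.params[name]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h, c):
        i = [sigmoid(z) for z in self._gate("i", x, h)]    # how much new input to write
        f = [sigmoid(z) for z in self._gate("f", x, h)]    # how much old memory to keep
        o = [sigmoid(z) for z in self._gate("o", x, h)]    # how much memory to expose
        g = [math.tanh(z) for z in self._gate("g", x, h)]  # candidate memory content
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        """Feed a sequence of word vectors through the cell; return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in xs:
            h, c = self.step(x, h, c)
        return h


cell = LSTMCell(input_size=3, hidden_size=4)
# Two toy 3-dimensional vectors standing in for word embeddings.
print(cell.run([[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]]))
```

The forget gate f decides how much of the earlier cell state survives each step, which is what lets the final hidden state reflect words seen much earlier in the tweet.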
"Take the phrase 'get back to the kitchen' as an example. Devoid of the context of structural inequality, a machine's literal interpretation could miss the misogynistic meaning," Nayak said. "But seen with the understanding of what constitutes abusive or misogynistic language, it can be identified as a misogynistic tweet."
Scope for Expansion
Other methods based on word distribution or occurrence patterns can identify abusive or misogynistic terminology, but the presence of a word by itself doesn't necessarily correlate with intent, according to the paper, which was published by Springer Nature.
"Once we had refined the 1 million tweets to 5,000, those tweets were then categorized as misogynistic or not based on context and intent, and were fed into the machine learning classifier, which used these labeled samples to begin building its classification model," Nayak explained.
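The supervised step Nayak describes, building a classification model from labeled samples, can be illustrated with a much simpler model than the team's. The sketch below swaps in a multinomial naive Bayes classifier. This is a bag-of-words method of exactly the kind the paper says can miss intent, so it shows only how labeled examples become a model, not how the LSTM captures context. The class name NaiveBayes, the whitespace tokenizer and the four toy training sentences (labeled 1 for misogynistic, 0 for not) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive whitespace tokenizer (an assumption; real tweet tokenizers do far more)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        # samples: list of (text, label) pairs
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in samples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_posterior(self, tokens, label):
        total_words = sum(self.word_counts[label].values())
        n_samples = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_samples)  # log prior
        for w in tokens:
            # Add-one smoothing so unseen words don't drive the probability to zero.
            score += math.log((self.word_counts[label][w] + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda label: self._log_posterior(tokens, label))


# Hypothetical toy training data: 1 = misogynistic, 0 = not.
TRAIN = [
    ("get back to the kitchen", 1),
    ("women belong in the kitchen", 1),
    ("great recipe from the kitchen tonight", 0),
    ("cleaning the kitchen after dinner", 0),
]

model = NaiveBayes().fit(TRAIN)
print(model.predict("women get back"))        # 1
print(model.predict("great recipe tonight"))  # 0
```

"kitchen" appears twice in each class, so on its own it barely shifts the prediction; the surrounding words decide the label. A real system would need thousands of labeled tweets, such as the 5,000 described above, rather than four sentences.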
The team hoped the research could translate into platform-level policy that would see Twitter, for example, remove any tweets identified by the algorithm as misogynistic. "This modeling could also be expanded upon and used in other contexts in the future, such as identifying racism, homophobia, or abuse toward people with disabilities," Nayak said.