Google has quietly updated Gmail with a new spam filter that the company says does a better job of flagging junk messages and phishing emails.
The new spam filter is based on RETVec (short for Resilient & Efficient Text Vectorizer), a newly developed text vectorizer that can map words into vectors, or numerical representations. Developers have long used text vectorization to help computer models interpret and classify human language, including deciding whether an email is spam.
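To make "text vectorization" concrete, here is a deliberately simple dictionary-based vectorizer. It is not RETVec, and the vocabulary and the `vectorize` function are hypothetical. Each known word gets a fixed position, and a message becomes a vector of word counts. The second call shows a weakness spammers exploit: lightly misspelled words fall out of the vocabulary and contribute nothing.

```python
# Toy dictionary-based vectorizer (illustrative only, not RETVec).
# Each known word gets an index; a message becomes a vector of word counts.
VOCAB = ["verify", "your", "account", "now", "password"]
INDEX = {word: i for i, word in enumerate(VOCAB)}


def vectorize(text: str) -> list[int]:
    """Return a bag-of-words count vector; words not in VOCAB are dropped."""
    vec = [0] * len(VOCAB)
    for word in text.lower().split():
        i = INDEX.get(word)
        if i is not None:
            vec[i] += 1
    return vec


print(vectorize("Verify your account now"))  # [1, 1, 1, 1, 0]
print(vectorize("Verfy y0ur acc0unt now"))   # [0, 0, 0, 1, 0]: typos are invisible
```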
The problem is that current text classification models can still struggle to identify scams and phishing attacks. That’s because cybercriminals craft their content specifically to slip past these defenses, for example by using non-Latin characters to create links that mimic reputable brands. In addition, text classification models can require “large dictionaries” and significant computing resources to flag malicious content or make sense of typos, the company’s researchers wrote in a paper.
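The non-Latin character trick is easy to reproduce. In this sketch the domain is only an example: swapping each Latin "a" for the visually similar Cyrillic "а" (U+0430) yields a string that looks the same to a reader but matches no dictionary entry or blocklist built from the genuine spelling. The `suspicious_chars` helper is hypothetical.

```python
import unicodedata

genuine = "paypal.com"
# Hypothetical spoof: both Latin "a" letters replaced with CYRILLIC SMALL LETTER A (U+0430)
spoofed = "p\u0430yp\u0430l.com"


def suspicious_chars(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every non-ASCII character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text if ord(ch) > 127]


print(genuine == spoofed)  # False: the strings differ even though they look alike
# NFKC normalization does not fold Cyrillic letters into their Latin lookalikes
print(unicodedata.normalize("NFKC", spoofed) == genuine)  # False
print(suspicious_chars(spoofed))
# [('U+0430', 'CYRILLIC SMALL LETTER A'), ('U+0430', 'CYRILLIC SMALL LETTER A')]
```

Exact-match dictionaries and even standard Unicode normalization miss the swap; catching it by rule requires extra data, such as the confusable-character tables defined in Unicode Technical Standard #39.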
In response, Google developed RETVec, which is trained to detect and understand character-level manipulations, including typos in a piece of text, while also reducing the computing cost.
“RETVec embeddings are trained using pair-wise metric learning, ensuring that words containing typos are embedded close to the original word,” Google’s researchers wrote.
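Pair-wise metric learning trains a model on pairs of words. It penalizes the model when a word and its typo land far apart, or when unrelated words land too close together. The sketch below uses the classic contrastive loss as one example of this family of objectives; RETVec's paper may use a different pair-wise loss. A toy character-bigram counter stands in for RETVec's learned encoder. The names `embed`, `distance` and `contrastive_loss`, and the margin of 3.0, are hypothetical.

```python
import math
from collections import Counter


def embed(word: str) -> Counter:
    """Toy stand-in for a learned encoder: counts of character bigrams."""
    padded = f"^{word}$"  # mark word boundaries
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def distance(a: Counter, b: Counter) -> float:
    """Euclidean distance between two sparse count vectors."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def contrastive_loss(d: float, same: bool, margin: float = 3.0) -> float:
    """Penalize distance for matching pairs; penalize closeness under `margin` otherwise."""
    return d ** 2 if same else max(0.0, margin - d) ** 2


d_typo = distance(embed("password"), embed("pasword"))   # 1.0 (only "ss" differs)
d_other = distance(embed("password"), embed("invoice"))  # sqrt(17), about 4.12
print(contrastive_loss(d_typo, same=True))    # 1.0: training would pull these closer
print(contrastive_loss(d_other, same=False))  # 0.0: already beyond the margin
```

During real training, the gradient of such a loss adjusts the encoder's weights; here the encoder is fixed, so the sketch only shows what the loss rewards and penalizes.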
Over the past year, Google has also been testing RETVec inside its own systems “to evaluate its usefulness and found it to be highly effective for security and anti-abuse applications,” the company wrote in a blog post. According to Google, RETVec improved the spam detection rate by 38% over Gmail’s previous filter.
At the same time, RETVec reduced the false-positive rate by 19% while using 83% fewer computing resources. This has made the “RETVec deployment one of the largest defense upgrades in recent years,” Google adds. The same system works for over 100 languages, including English.
“Due to its novel architecture, RETVec works out-of-the-box on every language and all UTF-8 characters without the need for text preprocessing, making it the ideal candidate for on-device, web, and large-scale text classification deployments,” the company says.
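One way a character-level vectorizer can handle every script without a vocabulary or preprocessing is to encode each character directly from its Unicode code point. The sketch below assumes that approach. It is an illustration of the idea, not RETVec's actual code. Each character becomes a fixed-width row of bits. The 16-character word length and 24-bit width are assumptions chosen for illustration; 24 bits comfortably covers the largest code point, U+10FFFF. The helpers `encode_word` and `decode_row` are hypothetical.

```python
def encode_word(word: str, max_chars: int = 16, bits: int = 24) -> list[list[int]]:
    """Encode each character as the binary digits of its Unicode code point.

    No vocabulary or preprocessing is needed, so any script works. The word is
    truncated to max_chars characters and zero-padded to max_chars rows
    (padding rows are all zeros, the same bits as U+0000).
    """
    rows = [[(ord(ch) >> i) & 1 for i in range(bits)]  # least significant bit first
            for ch in word[:max_chars]]
    padding = max_chars - len(rows)
    rows.extend([0] * bits for _ in range(padding))
    return rows


def decode_row(row: list[int]) -> int:
    """Recover the code point from one encoded row (used here to check the encoding)."""
    return sum(bit << i for i, bit in enumerate(row))


# Latin, Cyrillic and emoji input all go through the same code path.
for word in ["spam", "спам", "🙂"]:
    rows = encode_word(word)
    print(word, len(rows), [hex(decode_row(r)) for r in rows[:len(word)]])
```

Because the input is just bits, a lookalike Cyrillic letter produces a different but well-defined row instead of an out-of-vocabulary token. A trained model on top of these rows then learns which differences matter.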
In addition, Google has made RETVec open source, allowing other developers to use it in their own text classifiers.