Some of the top 100,000 websites collect everything you type, before you hit submit
When you sign up for a newsletter, make a hotel reservation, or make an online payment, you probably assume that if you mistype your email address three times or change your mind and X out of the page, it doesn’t matter. Nothing actually happens until you hit the submit button, right? Well, maybe not. As with so many assumptions about the web, this isn’t always the case, according to new research: a surprising number of websites collect some or all of your data as you type it into a digital form.
Researchers from KU Leuven, Radboud University and the University of Lausanne crawled and analyzed the top 100,000 websites, examining two scenarios: a user visiting a site from the European Union and a user visiting a site from the United States. They found that 1,844 websites harvested an EU user’s email address without their consent, and a staggering 2,950 logged a US user’s email in some form or another. Many sites apparently do not intend to log the data, but they integrate third-party marketing and analytics services that cause the behavior.
After specifically crawling sites for password leaks in May 2021, the researchers also found 52 websites where third parties, including Russian tech giant Yandex, accidentally collected password data before it was submitted. The group disclosed its findings to those sites, and all 52 cases have since been resolved.
“If there is a submit button on a form, you can reasonably expect it to do something: to submit your data when you click on it,” says Güneş Acar, a professor and researcher at the Radboud University digital security group and one of the leaders of the study. “We were super surprised by these results. We thought we might find a few hundred websites where your email is collected before you submit it, but this far exceeded our expectations.”
The researchers, who will present their findings at the Usenix Security Conference in August, say they were inspired to investigate what they call “leaky forms” by media reports, particularly from Gizmodo, about third parties collecting form data regardless of submission status. They point out that, at its core, the behavior is similar to that of so-called keyloggers, which are typically malicious programs that record everything a target types. But on a mainstream top-1,000 site, users probably wouldn’t expect their information to be keylogged. And in practice, the researchers saw some variation in behavior. Some sites recorded data keystroke by keystroke, but many captured the full contents of one field when users clicked on the next.
“In some cases when you click on the next field they collect the previous one, like you click on the password field and they collect the email, or you just click anywhere and they collect all the information immediately,” says Asuman Senol, a privacy and identity researcher at KU Leuven and one of the co-authors of the study. “We weren’t expecting to find thousands of websites; and in the US the numbers are really high, which is interesting.”
The researchers say the regional differences may be related to companies being more careful about tracking users, and potentially even integrating with fewer third parties, because of the EU’s General Data Protection Regulation. But they stress that this is only a possibility, and the study did not examine explanations for the disparity.
Through a substantial effort to notify the websites and third parties that collect data in this manner, the researchers found that one explanation for some of the unexpected data collection may be the difficulty of differentiating a “submit” action from other user actions on certain web pages. But the researchers point out that from a privacy perspective, this is not an adequate justification.
Since completing the paper, the group also made a discovery about Meta Pixel and TikTok Pixel, invisible marketing trackers that services embed on their websites to track users around the web and show them ads. Both claimed in their documentation that customers could enable “automatic advanced matching,” which would trigger data collection when a user submitted a form. In practice, however, the researchers found that these tracking pixels harvested hashed email addresses, a masked version of email addresses used to identify web users across platforms, prior to submission. For US users, 8,438 sites may have leaked data to Meta, Facebook’s parent company, via the pixel, and 7,379 sites may be affected for EU users. For TikTok Pixel, the group found 154 sites for US users and 147 for EU users.
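To make the idea of a hashed email address concrete, here is a minimal sketch of why a hash still works as a tracking identifier. It assumes SHA-256 over a trimmed, lowercased address, which is a common convention but is not specified in this article; the function name and the example addresses are hypothetical.

```python
import hashlib


def hashed_email(address: str) -> str:
    """Return the SHA-256 hex digest of a trimmed, lowercased email address.

    The normalization and the choice of SHA-256 are assumptions made for
    illustration, not details taken from the research described here.
    """
    normalized = address.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical values typed into forms on two unrelated websites.
seen_on_site_a = hashed_email("Jane.Doe@example.com")
seen_on_site_b = hashed_email("  jane.doe@example.com ")

# The two digests are identical, so a tracker can link both visits to the
# same person without ever storing the readable address.
print(seen_on_site_a == seen_on_site_b)
```

The hash hides the address from a casual reader, but because the same input always produces the same digest, it can serve as a stable key across sites and sessions.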
The researchers filed a bug report with Meta on March 25, and the company quickly assigned an engineer to the case, but the group hasn’t received an update since. The researchers notified TikTok on April 21 (they discovered TikTok’s behavior more recently) and have had no response. Meta and TikTok did not immediately respond to WIRED’s request for comment on the findings.
“The privacy risks for users are that they will be tracked even more effectively; they can be tracked across different websites, across different sessions, on mobile and on desktop,” says Acar. “An email address is a very useful identifier for tracking, because it is global, unique, constant. You cannot delete it like you delete your cookies. It is a very powerful identifier.”
Acar also points out that as tech companies move to phase out cookie-based tracking in a nod to privacy concerns, marketers and other analysts will increasingly rely on static identifiers such as phone numbers and email addresses.
Since the results indicate that deleting data from a form before submitting it may not be enough to protect it from being collected, the researchers created a Firefox extension called LeakInspector to detect malicious form collection. And they say they hope their findings will raise awareness of the problem not only among regular web users but also among website developers and administrators, who can proactively check whether their own systems or any of the third parties they use are collecting data from forms without consent.
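For developers who want to run that kind of check themselves, one possible approach is to type a test address into a form, record the network requests that fire before submission, and search them for the address in a few common encodings. The sketch below illustrates that idea only; it is not how LeakInspector works, and the function names, the list of encodings, and the example URLs are all assumptions.

```python
import base64
import hashlib
import urllib.parse


def email_variants(email: str) -> dict:
    """Return a few forms in which a typed address might appear in a request."""
    typed = email.strip()
    lowered = typed.lower().encode("utf-8")
    return {
        "plain": typed,
        "url-encoded": urllib.parse.quote(typed, safe=""),
        "base64": base64.b64encode(typed.encode("utf-8")).decode("ascii"),
        "sha1": hashlib.sha1(lowered).hexdigest(),
        "sha256": hashlib.sha256(lowered).hexdigest(),
    }


def find_leaks(email: str, requests: list) -> list:
    """Scan (url, body) pairs recorded before submit for any variant.

    Returns a list of (url, variant_name) tuples, one per match.
    """
    variants = email_variants(email)
    leaks = []
    for url, body in requests:
        haystack = url + " " + body
        for name, value in variants.items():
            if value in haystack:
                leaks.append((url, name))
    return leaks


# Hypothetical requests captured after typing an address but before submitting.
captured = [
    ("https://tracker.example/collect?e=jane%40example.com", ""),
    ("https://cdn.example/app.js", ""),
]
for url, variant in find_leaks("jane@example.com", captured):
    print(f"{variant} form of the address sent to {url}")
```

A simple substring search like this will miss compressed, encrypted, or multiply encoded payloads, so an empty result does not prove that a page is leak-free.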
Leaky forms are just one more type of data collection to be wary of in an already hugely crowded online realm.
This story originally appeared on wired.com.