Another week, another data-scraping leak: from Facebook to LinkedIn

Is there anything left to be revealed about the extent and the frequency with which large volumes of personal data leak from Facebook?

A collective yawn seemed to be the appropriate response this month at the latest news. If the information about users’ social networks that leaked out in the Cambridge Analytica scandal was like the plutonium of social media, then this latest slip involved a decidedly low-grade fuel. Details such as names, phone numbers and birth dates of more than 530m people had been scraped from the site, in what amounted to a mass harvesting of data that was already publicly available.

The regulators, on cue, said they would investigate, as regulators must. Irish data protection officials, who take the lead in overseeing Facebook in Europe, now have 15 different reviews going on into the company’s apps.

But while this might look like a misdemeanour without any real victims, it raises more troubling questions. Even public material like this, combined with other data sets to build fuller profiles on people, can be used for malicious ends. And the case touches on a deeper issue: the growing volume of data that people release publicly as part of their digital lives, often after being nudged by the companies which benefit from the disclosures, can later be used in ways that hurt their own interests.

As if to underline the point, the last few days have brought other cases of mass data scraping to light. LinkedIn said public information taken from its service appeared to have been mixed with data from other sources and offered for sale online. And a trove of information from audio social network Clubhouse was discovered on a site used by hackers.

Scraping has been around since the early days of the internet, when potentially valuable information was first left in plain sight on public pages. But recently, the incentives and the opportunities have multiplied.

Social networks have become ever-larger repositories, presenting attractive targets for harvesters operating at scale. And the rise of machine learning has brought new incentives, as AI has turned the raw material into potential gold. The controversial US face-recognition company Clearview AI, for instance, uses a huge database of images scraped as the raw material for its service.

There are also more ways to scrape in volume. Many companies now make their data available through APIs, the digital “hooks” that others can use to connect to their systems. This reflects the creeping automation in the information realm, as well as a common business strategy.

These days, companies often set their sights on becoming platforms, making themselves an indispensable resource for others. Becoming the go-to source for data on any subject is one way to achieve that. This might raise few misgivings for a company such as eBay, which wants to be seen as the definitive source for all product listings. But it is more troubling when personal information is at stake.

It is not only scammers who have seen the opportunities. The commercial value in publicly available data has also led to creative, and unwanted, uses. Data analytics company hiQ trawled LinkedIn, for instance, looking for tell-tale signs of who among the professional network’s users might be looking for a new job, then reported it back to the users’ employers.

Academics have also seen the value. A group at New York University drew a protest from Facebook last year when it scraped large amounts of information to study how political adverts were being targeted on the network. The NYU study may have had valid academic goals, but, as Cambridge Analytica showed, it is not always easy to tell when legitimate research is being used as a cover for something else.

In short, this looks like yet another instance where the design of today’s mass information systems has not always put users first, and where the guardians of the data have allowed their own interests to cloud their decisions.

To be sure, LinkedIn has fought back against hiQ, blocking its ability to harvest data. But it lost in court, and again on appeal, when hiQ sued for access. Others have seemed less troubled. Facebook initially brushed off its latest leak, saying only that it occurred before September 2019 and that it had fixed the issue, without reporting it to regulators or warning users.

Facebook also tacitly put some of the blame on its own users, saying they could protect themselves better by thinking more about what information they share publicly, and doing regular “privacy check-ups” to make sure they are not compromised. This ignores the fact that few people have the time or inclination to indulge in such digital hygiene, and that most are in no position to judge how what they disclose today might be used against them tomorrow.
