Breaking up Facebook? Try data literacy, social engineering, personal knowledge graphs, and developer advocacy

Yes, Facebook is a data-driven monopoly. But the only real way to break it up is by getting hold of its data and functionality, one piece at a time. It will take a combination of tech, data, and social engineering to get there. And graphs -- personal knowledge graphs.
Written by George Anadiotis, Contributor

The relationship with Facebook has gone from infatuation to passive-aggressive. People love to hate Facebook, often by sharing angry posts on Facebook itself, usually to return to it shortly afterward. The truth is that Facebook has made it pretty easy for people to take issue with it, with blunder after blunder exposing its practices on data management, user rights, privacy, transparency, and control. 

Last week it was friendly fire, so to speak, as Facebook's co-founder Chris Hughes called for Facebook to be broken up by the FTC. Hughes pointed out the fact that Facebook is, in effect, a monopoly. He was not the first one to do so. 

The Economist was a forerunner, notable for pointing to the real nature of the monopoly, which is not limited to Facebook: if data is the new oil, Big Tech owns the oil rigs. Data-driven products such as Facebook harvest data, use it to enhance their product, and then harvest more data.

As noted by ZDNet's Larry Dignan, however, it's questionable whether the FTC would consider Facebook a monopoly. It may take a while, if it ever happens. It's also questionable whether a breakup would work, if it ever happened -- as noted by internet pioneer Jaron Lanier, business models based on services in exchange for data are broken, and invariably lead to asymmetry. So, Facebook clones based on the same model would probably not be substantially different.

To fix this, it will take a combination of tech, data, and social engineering.

Data literacy: Paying the price of data

The surveys on people's attitudes toward data collection are telling. Respondents want more privacy, but are not willing to pay for it. Many of them would be willing to give out even more data in exchange for free services. This speaks volumes on the state of data sovereignty and awareness in the world right now. Even though being profiled on an individual level should be alarming in and of itself, it's the bigger picture that matters most here.

Your data is the oil that fuels Facebook. With every scroll, click, like, read, you are feeding its monopoly and voting for its practices with your thumbs. Although to be fair, Facebook tries its best to get your data even if you don't use it, or have an account in the first place.

The amount of data Facebook has amassed via its practices means it has many legs up in the data and AI race. In a world increasingly run by machine learning algorithms, the importance of data to train those algorithms is paramount. Facebook's data trove has also attracted lots of talent, as researchers need data to make algorithms work.

This, however, brings us to a very real issue. As Hughes pointed out, after the Cambridge Analytica scandal, the backlash against Facebook has intensified. Yet #deletefacebook has not led to much change, as the lack of viable alternatives means people are largely left with two options: giving up social networking altogether, or continuing to use Facebook.

The machine learning feedback loop

But is there really no alternative to Facebook?

To answer this question, we must start by breaking up Facebook in terms of functionality. Facebook really is a conglomerate of functionalities all tied together, further expanded via its acquired ecosystem -- WhatsApp and Instagram. Media and form factors aside, it all comes down to a few core things: Contacts, 1-on-1 messaging, open and closed group posts, public and restricted newsfeed posts, and events.

Though few alternatives offer all of those in the seamless way Facebook does, alternatives do exist. WhatsApp used to be one of those, but other messaging apps with contact management and support for groups and channels exist, too. Some of them, like Telegram, Mastodon, or Diaspora, are also open source, and they give users the option to have more control of their data. But they are mostly used by people who have been deplatformed, or fringe groups. So, why have they not caught on?

Facebook alternatives across the Fediverse

Fragmentation is one reason. If some of your contacts use AppA, while others use AppB, how is it possible to communicate across them? But that's no longer an issue -- not if you choose an application from the Fediverse, or the Federation, or the Activity Web, where these applications live. These are applications built on open communication protocols and are able to interoperate with each other.

Though each application comes with its own UI/UX and perks, supporting those open protocols ensures that basic functionality such as following or posting works across all of them. This means you can keep up with people across applications. Pretty cool, I hear you say. It is -- but it's not good enough. Facebook's core functions should not be too hard to replicate, and many alternative social networking applications have done this.
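To make "open protocols" a bit more concrete, here is a minimal sketch of what a follow request and a post can look like as messages. It assumes the ActivityStreams 2.0 vocabulary used by ActivityPub-style applications, which the text above does not name explicitly; the account URLs and helper function names are hypothetical placeholders.

```python
import json

# ActivityStreams 2.0 context URI, as used by ActivityPub-style applications.
AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def follow_activity(actor: str, target: str) -> dict:
    """Build a 'Follow' activity: `actor` asks to follow `target`."""
    return {
        "@context": AS_CONTEXT,
        "type": "Follow",
        "actor": actor,
        "object": target,
    }


def note_activity(actor: str, text: str) -> dict:
    """Wrap a short post (a 'Note') in a 'Create' activity."""
    return {
        "@context": AS_CONTEXT,
        "type": "Create",
        "actor": actor,
        "object": {"type": "Note", "attributedTo": actor, "content": text},
    }


# Hypothetical accounts living on two different applications.
alice = "https://app-a.example.org/users/alice"
bob = "https://app-b.example.org/users/bob"

print(json.dumps(follow_activity(alice, bob), indent=2))
print(json.dumps(note_activity(alice, "Hello from AppA"), indent=2))
```

Because both applications agree on the shape of these messages, neither needs to know anything about the other's UI or internals. A real deployment also needs delivery, signatures, and inbox handling, none of which is shown here.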

But part of the reason these applications have not gone mainstream yet is that, frankly, many of them are not ready for it. The functionality is often limited and wonky, UI/UX is not ideal, and a good number of those applications are based on old programming frameworks that are having trouble keeping up with modern development and attracting contributors. Many of them have no mobile app counterparts to speak of, for example.


There is an expanding ecosystem of alternative, open source social networking platforms. The good news is these platforms give users great control and offer interoperability. The bad news is they are not as easy to use as Facebook. Image: Sean Tilley

Regardless of their (varying) quality of implementation, however, these applications have a different philosophy altogether. They are typically not hosted by commercial entities shouldering the burden of installing, configuring and keeping the back-end running.

This means that potential users can do one of two things. They can rent a machine from a hosting or cloud provider to use as their personal server, and run the software there. Or they can find someone who runs a server node of the application they want to use, and get an account on that node.

The first option is not feasible for the vast majority of users. Although efforts such as the Freedom Box promise to make hosting your own applications easier, they are not yet at a point where they can offer seamless functionality.

The second option may be more realistic. Families, friends, and communities of all sorts could chip in and either have their application of choice run for them as a hosted service, or run it themselves if they are savvy and motivated enough.

Both options require paying a price, however. Not just in terms of paying a fee to cover operational expenses of keeping the software running. Perhaps, more importantly, paying the price of taking ownership and control of your data. Or trusting whoever is running your software of choice with your data. And doing the social engineering required to make something like this work.

Do you like social engineering?

Yes, it will take some social engineering to make this work. Even with marketing budgets in place, getting something off the ground is not easy. There always is a cost of switching, and besides having to advocate to a number of contacts, Facebook sure does not make it easy to just take your data and walk away.

To begin with, until recently, that was not even an option. Now there is data export functionality in Facebook, but the data you get is far from complete, and far from usable to boot. The format is proprietary and undocumented, forcing anyone who wants to use that data to write custom parsers and adapters. And there is crucial information missing from the export. Most prominently, likes.

You may have heard the psychographics, aka "5 likes are enough to profile you" mantra, as well as the fact that Facebook and many of its partners, authorized or otherwise, have access to that data. Users who generated the data, however, do not have access to it. All you get in the data export is the fact that you liked something -- not what it is! And that's without even mentioning all the other information Facebook does not export.
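As a rough illustration of what such a custom parser ends up doing, the sketch below counts likes in an export and checks how many of them say what was actually liked. The record layout and field names (`action`, `target`, and so on) are invented for this example and do not describe Facebook's real export format.

```python
import json

# Hypothetical export snippet: field names are made up for illustration only.
SAMPLE_EXPORT = """
[
  {"timestamp": 1557000000, "action": "like", "target": null},
  {"timestamp": 1557000100, "action": "post", "text": "Hello"},
  {"timestamp": 1557000200, "action": "like", "target": null}
]
"""


def summarize_likes(raw: str) -> dict:
    """Count like records, and how many of them lack the liked item."""
    records = json.loads(raw)
    likes = [r for r in records if r.get("action") == "like"]
    missing = [r for r in likes if not r.get("target")]
    return {"likes": len(likes), "likes_without_target": len(missing)}


print(summarize_likes(SAMPLE_EXPORT))
```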

This isn't just data that could help you bootstrap another application. It's data you, or anyone you authorize, may use to feed machine learning algorithms. Facebook makes sure you don't have it, GDPR or no GDPR. So, there is a role for regulation there: Regulation should force Facebook to give users all the information it has on them at the push of a button. Deleting your account and all of its data should be equally easy, too. (Surprise: it is not.)

To be clear, that would not be the be-all and end-all of moving away from Facebook -- or any other platform, for that matter. But it would be a major step in making it possible. It would give users the option of doing as they please with their data. Facebook displays the same kind of contempt for calls for transparency by the US and UK administrations as it does for calls from users to give them access to their data. Maybe starting with data sovereignty for user data would lead to access to advertising data, too.

Personal knowledge graphs

Data sovereignty is precisely the vision behind Solid, the application development framework championed by Tim Berners-Lee. Berners-Lee, credited as the inventor of the web, is working on Solid with the help of startup Inrupt, and contributors from the open source and research community.

Solid is working on developing so-called pods. You can think of pods as personal data vaults, keeping all your data in one place. From there, you can authorize applications to use the data, giving read or write access to your pod. And pods could be hosted on your own machine, or in the cloud.
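The pod idea can be sketched in a few lines. What follows is a toy, in-memory model of a data vault with per-application read and write grants. It is not Solid's actual API; the `Pod` class, its method names, and the application names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Toy personal data vault. Illustrates the access idea, not Solid itself."""

    owner: str
    data: dict = field(default_factory=dict)
    grants: dict = field(default_factory=dict)  # app name -> set of modes

    def grant(self, app: str, *modes: str) -> None:
        """Allow `app` to use the pod in the given modes ('read', 'write')."""
        self.grants.setdefault(app, set()).update(modes)

    def revoke(self, app: str) -> None:
        """Withdraw every permission previously given to `app`."""
        self.grants.pop(app, None)

    def read(self, app: str, key: str):
        if "read" not in self.grants.get(app, set()):
            raise PermissionError(f"{app} may not read {self.owner}'s pod")
        return self.data.get(key)

    def write(self, app: str, key: str, value) -> None:
        if "write" not in self.grants.get(app, set()):
            raise PermissionError(f"{app} may not write to {self.owner}'s pod")
        self.data[key] = value


pod = Pod(owner="alice")
pod.grant("contacts-app", "read", "write")
pod.grant("feed-app", "read")

pod.write("contacts-app", "contacts", ["bob"])
print(pod.read("feed-app", "contacts"))

try:
    pod.write("feed-app", "contacts", [])
except PermissionError as err:
    print("denied:", err)
```

The point of the design is that the data stays in one place under the owner's control, while applications come and go through grants that can be revoked at any time.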

It sounds great, but it's tricky. Besides all the non-technical reasons, there are technical obstacles, too. If this caught on, it would mean applications would have to fetch data from pods all over the web and the cloud to work. Querying in distributed environments is notoriously hard. But Solid has an ace up its sleeve there: knowledge graphs.


Assembling data for an application from many sources is challenging. Personal knowledge graphs built on the Linked Data stack can help achieve that. Image: Ruben Verborgh

Knowledge graphs are a rebranding of a technology that goes back 20 years. It started out as the Semantic Web, was rebranded as Linked Data, and now goes by the knowledge graph moniker. This technology enables a number of things, including federated querying.

The Linked Data stack (RDF, URIs, and SPARQL) can make any piece of data accessible and queryable on the web. Solid is based on this technology, effectively aiming to build a personal knowledge graph for each of its users. Needless to say, this is quite ambitious, which may help explain why there is still no production-ready software for Solid.
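For readers new to the stack, the core idea of RDF is that every fact is a triple of subject, predicate, and object, with URIs as identifiers. The sketch below models a tiny personal knowledge graph that way and runs a pattern match over it. It uses the FOAF vocabulary for the predicates; the profile URIs and the `match` helper are hypothetical, and a real application would use an RDF library rather than plain tuples.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

FOAF = "http://xmlns.com/foaf/0.1/"  # FOAF vocabulary for describing people

# Hypothetical identifiers; the example.org URIs are placeholders.
ALICE = "https://alice.example.org/profile#me"
BOB = "https://bob.example.org/profile#me"

GRAPH = [
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "knows", BOB),
    (BOB, FOAF + "name", "Bob"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Who does Alice know?
for _, _, friend in match(GRAPH, s=ALICE, p=FOAF + "knows"):
    print(friend)
```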

SPARQL is more than a query language. It also is a protocol, and can be used to execute federated queries across many endpoints on the web. But this is far from trivial, both from a performance and from a usability point of view. Using SPARQL and the Linked Data stack for this type of application has been challenging even for connoisseurs. Will things be different this time around?
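To give a feel for what a federated query looks like, the sketch below builds a SPARQL 1.1 query that asks several endpoints for names using the `SERVICE` keyword. It assumes each pod exposes its own SPARQL endpoint, which is an assumption made for illustration rather than something Solid guarantees; the endpoint URLs and the `build_federated_query` helper are hypothetical, and the query is only constructed here, not executed.

```python
from typing import List

FOAF = "http://xmlns.com/foaf/0.1/"


def build_federated_query(endpoints: List[str]) -> str:
    """Build a SPARQL 1.1 query that unions foaf:name results across endpoints."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # One SERVICE block per endpoint; each is evaluated remotely.
    blocks = [
        f"  {{ SERVICE <{url}> {{ ?person foaf:name ?name }} }}"
        for url in endpoints
    ]
    body = "\n  UNION\n".join(blocks)
    return (
        f"PREFIX foaf: <{FOAF}>\n"
        "SELECT ?person ?name WHERE {\n"
        f"{body}\n"
        "}"
    )


# Hypothetical pod endpoints.
query = build_federated_query(
    [
        "https://alice.example.org/sparql",
        "https://bob.example.org/sparql",
    ]
)
print(query)
```

Even in this toy form, the cost is visible: every extra pod adds another remote call, which is part of why performance is far from trivial.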

Developers, developers, developers

It looks like the Solid team has the mindset and the people to make it work, while the tech is a work in progress. Ruben Verborgh, Semantic Web professor, researcher, and Inrupt technology advocate, has been working with his team on enriching the Linked Data stack with tools aimed at modern developers.

Their tools are based on JavaScript and frameworks such as React, and aim to make programming Solid a seamless experience. Verborgh and team are trying to meet developers where they are. Rather than expecting developers to switch to the Linked Data stack, they are giving them tools to build applications on Solid. This is smart, pragmatic, and a bit ironic, considering React's origins in Facebook. But will it work? 

Berners-Lee's leverage could help, but it will take more than good intentions to succeed. In 2009, Berners-Lee had TED audiences chanting in support of his previous project, Linked Open Data. Linked Open Data is about making open data available on the web as Linked Data. Today, open datasets may be growing, but the way these datasets are made available is different.

Getting to the hearts and minds of developers is key to the success of Solid, as much as it is for any other software project today. Getting Solid in production-ready shape, and building applications on top of it, largely depends on it. Attracting developers is a fine art. Deep pockets, which we don't know whether Solid has, are a good starting point, but they do not necessarily guarantee success.

But just think of the possibilities that open up by putting the Fediverse and Solid together. This could be the key to social networking minus the dictator in the middle, data sovereignty, and a whole new ecosystem of innovation. Let's hope Solid really gets solid soon.
