Ruben Verborgh is a Professor at Ghent University – imec. Ruben’s research is situated in the field of web technology, specifically in Linked Data and the Semantic Web. Rather than focusing on big data, he explores questions of small data, that is, what we can do with the huge number of small datasets on the Web, be they open, public, private, or anything in between.
Can you talk a little bit about the projects you are currently working on?
This all depends on how you define a ‘project’, but I can identify a number of recurring interests. The main driver for my research has been querying the web, so executing structured queries. I am very curious about how an agent can find stuff online or bring things together from different sources. This has been my ongoing research track. A concrete project within that track is Linked Data Fragments, in which we investigate query interfaces, query clients and so forth.
Related to this, I am of course interested in researching how we get data into a linked format in the first place. This basically amounts to exploring the whole life cycle of Linked Data: starting with raw data, converting it into Linked Data, publishing it online in an optimal format, querying it, making edits and tracing the provenance of the data. We looked into these matters when developing RML, or the RDF Mapping Language. Supporting and streamlining the life cycle of Linked Data also plays a prominent role in the projects that we do surrounding Smart Cities and e-governments.
A project that I have recently spent a lot of time working on is the Solid project. This project is about redecentralizing the Web, or, to put it in a more tangible way: it is about giving people back control over their data in order to have a nicer web for everyone, be it individuals, developers, or companies.
Could you describe in more detail what is wrong with today’s Web?
In my opinion, the biggest problem is that a lot of creativity and innovation on the Web have been lost. Or at least, the rate of innovation on the Web is way too low in comparison with what it was in the early days and compared to what it could or should be. Initially, the Web was this big, unbounded space. It was permissionless – you did not need anyone’s permission to do things or to build nice stuff. We lost that.
One of the primary reasons why we lost the ability to create and innovate is that companies have shifted their business strategies to become data harvesters. Rather than focusing on creating something new, many companies have switched to a model where they aim to harvest as much data as possible from different people. This model prevents others from innovating. Suppose for instance that you decide to go against this and start a company that does not rely on data harvesting. This is by no means easy, as a large portion of the market has been shaped around this model, and this makes it almost impossible to compete. I believe that this is a big reason why we simply do not see a lot of innovation or nice things any more.
Another consequence of this focus on data harvesting is that there are some serious problems with privacy on the Web. Now, you may have noticed that the issue of privacy is not the first problem that I have mentioned. That is because I think that the problem with privacy is a consequence rather than a cause. A lack of privacy is of course a huge consequence, and a very annoying consequence, but let us be honest: people have stopped caring about it – and many companies never did. We have been conditioned not to care about who reads our data, or what happens to it. So, the issue of privacy on the Web is in a sense a lost battle, and I think that the important thing now is to salvage creativity and innovation.
In terms of data harvesting, I think it is striking that data are often characterized as ‘the new oil’. Data might indeed be like oil, but the problem is that we do not treat it as such. The data is there, but the only thing we do to it is put it into big barrels and it does not move. So I say that if data is really like oil, it should move around. The problem with the Web today is that it is no longer designed for data to flow around in an interoperable way. On the contrary: it is built to keep data in one place as much as possible. And in fact, if you look at the Web as a whole, there are only a very limited number of such places. So, apart from having no privacy, we have very little choice left of where we put our data.
How does the Solid project solve these problems? How can it again open up possibilities for innovation?
Solid sets out from the principle that data and applications should be separated. On today’s Web, the data and applications are mostly coupled. Twitter, Facebook and Google Drive for instance, are combinations of an application and the data that they own. In order to get that data flowing around freely again, we need to separate it from those apps and put it into a separate place.
What Solid proposes is that data stay in a place chosen by their creator. If you create a piece of data, for instance a photograph, you should have the right to determine where it is stored. This right to determine where you store your data should be independent of how the data is eventually going to be used. This is a very simple principle, but it restores choice on the Web. Because I can store my data anywhere, I can choose any application to read the data, and I can also choose any host to save the data for me. In that way we get back a number of choices that we have been robbed of: I can choose to share my data with the app that guarantees the most privacy, I can choose the cheapest solution, or I can simply choose to use the app that has the fanciest buttons. Giving these choices back to consumers also opens up possibilities for developers to innovate: they get to create new options for people to select.
Having a choice presupposes an ecosystem of storage services and apps. How will such an ecosystem come about and can we already see evidence of this?
This is a very good and difficult question to which I do not claim to have all the answers. But I can give you some circumstantial things, because if we are honest what you are really asking here is: ‘will this actually work?’
With Solid we are on a big mission. It is turning the Web as we know it today upside down. However, it is not turning upside down the Web as it was originally intended. It should also be clear that we are not trying to overthrow anything. That is why you will not hear me say things like Facebook is bad or Google is evil. In order for us to succeed, we have to provide an alternative and focus on delivering those systems or experiences that are currently lacking. Many of today’s platforms for instance are very bad at integration. To give you a simple example: it is impossible to share a Facebook picture with your LinkedIn colleagues. These kinds of ‘over the wall’ integrations are missing, yet at the same time these are the kinds of experiences that are so important to people’s everyday needs. We are thus aiming to create new value and offer possibilities that go beyond the current experiences people have on the Web. As I already mentioned, we are also not going to lure people by proclaiming that we are better at privacy, because no one cares. The reason people switch platforms is that they get better experiences, not privacy.
In order to make these new experiences a reality, we focus on enabling others rather than doing everything ourselves. As developers, we want to make sure that we establish an ecosystem that allows others to build nice things for people. We thus focus on creating tooling to help other developers get started quickly. In the Solid ecosystem developers will not need to harvest data in order to create an app. This can be an additional incentive for programmers and developers, as not being part of the data harvesting field frees up developers to create something entirely new.
To a large extent, Solid is also about agreements and specifications. If you ask me what Solid is, I would say that it is an ecosystem, it is software, it is a community, it is many things. But at the core of everything are specifications, more specifically W3C (World Wide Web Consortium) specifications. These specifications detail how different Solid components work together. They are the glue that binds everything. Our use of specifications also sends a clear message: if you want to join this thing, then implement a specification and you are a part of it.
Could you expand some more on the social aspects of the project? How do you go about this community building?
For a project like Solid to be successful, a lot of different people are needed. There is of course technology, but that only takes us so far. To decentralize the Web, there is an entire socio-economic system to shift. We need the technology to prove that it can in fact change, but once the technology is there, all of the rest still needs to happen. We for instance need economists to think about new business models, and we need legal people to consider questions of data ownership and copyright. We also need designers, because on a decentralized Web, data is going to come from many different places, so we will need to cater to different user experiences as well.
Something I am very serious about, is that we also need artists. I have come across some artists that are very good at illustrating what is wrong with today’s model and making this really tangible. One of those artists is Dries Depoorter, a Belgian artist whose work really makes you think about today’s data world. Similarly we also need writers and other content creators: people who want to publish their work on their own terms and in their own space. Solid could for instance offer an alternative to those creative minds that are limited by copyright terms and other restrictions on YouTube. Artists and educators can also play an important role in explaining Solid, in describing it and creating documentation. Apart from this, an important role is reserved for managers that can get people together and for entrepreneurs as well: people who want to jump into this and create something nice without having to depend on data harvesting. By naming all of these different profiles, I think I have covered about every person on the planet, which I think is correct. Redecentralizing the Web will be a major job for technologists, but I think only about half of the community will actually be concerned with development. I mean, if you just think about the Web today, how much of it is actually technology? A big part of course, there are browsers and standards in play, but in the end, it is all about the people and the content or applications they create. So yes, Solid will need the community, preferably a global one.
What could be the significance of Solid for the world of scholarly communication?
There is actually quite a lot to be said about this. Solid’s ideas about data ownership can be applied to many domains – and there are many details for each domain. Scholarly communication is of course very interesting. That said, you can also see that technology is only a small part of it – a lot is about changing mindsets.
Let me briefly sketch today’s situation. As a researcher myself, I am expected to publish. Now how does that work? I write something, generate data, do experiments and so on. I keep all of that private and then I submit it to a conference or journal (again in private), it is reviewed in private, and if accepted, it might get published in a book, maybe open access, maybe not. If I am a good researcher I will also publish my dataset or resources (which still does not happen very often), and then finally it gets printed. This is the end of the process: we do not see the reviews, the work that I have done in the meantime, nor do we see any comments that I might have received, and so on. Once the publication has happened, that is that.
Now, let me outline what scholarly communication could look like. And I have to be honest here, this is not something Solid invented, but something that people within the scholarly domain already hoped would happen in the nineties, and it is actually really simple. It starts with the fact that I am a researcher and I make something and I publish it. In the nineties, people would have said: I publish it on my website. Today, that publication platform might be a Solid data space; it does not matter. From a user’s perspective you cannot distinguish them anyway. So I put my stuff in my space as I am writing it, along with my data generated from experiments, and so forth. All of this goes into my data pod. And then it is open for the whole world: I am paid by a public institution with public money, so everything I do is in public. And then I invite people to start commenting on my work, so I spread my work as widely as possible and people can start writing reviews, publishing comments, and so forth. Commenters also store their writings in their own data pod, so they keep it. I can then link back to what they write.
And that is basically it. I can publish new versions, do revisions, get feedback: this is all we need to do science. Now, you might be worried about (official) ‘publication’ and so forth. Well, it is published on my website and it is even published better because if people find mistakes after publication, I can fix them or retract certain things. This is not possible in the current process. Once you have the reviews, it is closed. A question you could then ask is who is going to give the stamp of quality. Well, the stamp of quality is provided by the people who read the article and who provide comments. If I still want to have the article published centrally, I can perhaps share it with a journal or share it with a conference. They can have a look at it, they can look at the comments, they can invite other reviewers to also provide comments, and if they are satisfied, they might want me to present my work or have a copy of my work in a journal.
The point I am making is that scholarly communication should be more of a living ecosystem. Sadly, we have all the technology in place to do that, we have had it since the nineties, but a number of socio-economic factors are preventing us from using these systems. Right now my university, like many universities, evaluates me on the types of publications that I have in different journals. The problem with that system is that there is an entire industry surrounding those journals, including predatory journals, that is concerned with turning a profit rather than disseminating sound scholarship.
But it does not have to be that way. We do not have to depend on centralized institutions like publishers to put a stamp on our work and publish it. We all have the means to self-publish and we all have the means to comment on other people’s work. That said, there is quite some work left. For instance, my personal website might disappear, so it is my institution’s obligation to preserve a copy of what I write, in case my website or I disappear. Continuing on that, it might also be the responsibility of a country to have a national library that aggregates all of the institutions’ repositories in case they disappear or merge. Different countries might even have the responsibility to store backups of each other’s research in case something disappears. So the individual researcher is the source, but we still need copies and backups to make everything work fine.
If you had to pick one, which book would you recommend to the readers?
In my opinion, there exists only one perfect book in the world. That book is called Trees, maps and theorems. Effective communication for rational minds by Jean-Luc Doumont. It is a brilliant book with regard to content and a brilliant book with regard to typesetting and getting messages across. It is one of the most beautiful books I have ever seen, and one of the most meaningful books I have ever read as well. So that would be my number one recommendation. If you read anything as a researcher, but actually as a person communicating in general (so this is all of us), this is the one thing you should read. It is really, really good.
Ruben’s blog: https://ruben.verborgh.org/blog/