
The Web as a CMS: How BBC joined Linked Open Data

September 10, 2016 6 mins. read Jarred McGinnis

The Web as a CMS: BBC editorial staff are contributing to MusicBrainz and Wikipedia instead of internal systems

I was looking at the slides from a recent talk by Paul Rissen, Senior Data Architect at the BBC, about the history of Linked Data usage at the organization. One of his slides, number 20 to be exact, reminded me of how quietly revolutionary the work at the BBC has been. The slide was titled ‘The Web as a Content Management System’.

First Successes

Early on, the BBC decided not to mint its own IDs but to use existing URIs for musical artists from MusicBrainz, a freely available database. For the uninitiated, a URI (Uniform Resource Identifier) is a way for a computer to identify a thing, and it is one of the basic concepts in the Linked Data paradigm. You can read more about it in our post The Power of URI or Why Odysseus Called Himself Nobody.

This decision paid off in two ways. Firstly, it instantly gave them a database of 50 million artists, albums and songs, which saved the BBC a huge amount of time and expense. Secondly, each MusicBrainz entry links to yet another data source, DBpedia, which holds the text description of the artist from Wikipedia.

That’s the ‘magic’ of linked data. By magic, I mean, exactly how a Knowledge Graph (of data) works. Everything is connected. Follow the links, gather the content.
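To make "follow the links, gather the content" concrete, here is a minimal sketch in Python. It is not the BBC's implementation: the graph is a hard-coded list of triples, and every identifier and value in it (such as `mb:artist/EXAMPLE` and `dbpedia:Example_Artist`) is a made-up placeholder. Only the general idea comes from the article: a MusicBrainz entry links to a DBpedia entry, which carries the description.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# All identifiers and values are illustrative placeholders, not real records.
TRIPLES = [
    ("mb:artist/EXAMPLE", "rdfs:label", "Example Artist"),
    ("mb:artist/EXAMPLE", "owl:sameAs", "dbpedia:Example_Artist"),
    ("dbpedia:Example_Artist", "dbo:abstract",
     "Example Artist is a band used for illustration."),
]


def objects(subject, predicate):
    """Return every object attached to subject via predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def gather(uri, seen=None):
    """Follow owl:sameAs links from uri, collecting labels and abstracts."""
    seen = set() if seen is None else seen
    if uri in seen:  # guard against link cycles
        return {}
    seen.add(uri)

    content = {}
    for predicate in ("rdfs:label", "dbo:abstract"):
        values = objects(uri, predicate)
        if values:
            content[predicate] = values[0]

    # Content found closer to the starting URI wins over linked content.
    for linked in objects(uri, "owl:sameAs"):
        for key, value in gather(linked, seen).items():
            content.setdefault(key, value)
    return content


profile = gather("mb:artist/EXAMPLE")
```

Starting from the MusicBrainz identifier alone, `profile` ends up holding both the artist's label and the description that lives on the DBpedia side of the link.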


Fast, Cheap and Out of Control

I can imagine the conversations with the heads of editorial when the techies suggested using ‘wild’ data, more often called Open Data. I’ve been involved in similar conversations. There is always a fear of losing control.

Editorial wants to create faultless content, and it is hard for them to imagine that quality coming from anyone but their own team. The dilemma these days is how to maintain that high quality in an era of shrinking editorial budgets and ever-increasing amounts of data.

See what Jem Rayfield, Senior Technical Architect at the BBC at that time, had to say in his post Sports Refresh: Dynamic Semantic Publishing about the complexity of the data the BBC Olympics 2012 site had to manage.

The ability to automatically and reliably make use of information on the web for free must have convinced the skeptics on the editorial side of the BBC. To give you an idea of just how much information is out there: DBpedia has data for 4.58 million things (e.g., people, places, music, film, video games, organizations, species, etc.). Wikidata, another general information data source, has 26 million similar kinds of ‘items’.

The Quiet Revolution of Linked Open Data

Using Linked Open Data at all would have been one battle to fight. The BBC went further and made the strategic decision to also use its resources to help improve the MusicBrainz database. When errors were found, the BBC fixed the mistakes in the external data source and not within the walled garden of the BBC’s ICT infrastructure, where only the BBC could benefit from the organization’s editorial expertise.

Of course, the BBC’s charter requires the organization to provide ‘benefit’ to the public and contributing to the free and open MusicBrainz database fits nicely with that public service remit.

But regardless of its public service remit, this is a strategically smart approach and one of those quietly revolutionary ideas behind ‘the Web as a CMS’. The BBC’s contributions add value to a resource, the MusicBrainz database.

That added value, in turn, makes that resource more attractive to others, who will use it and further improve the data. This virtuous cycle is how Wikipedia became a ubiquitous part of our lives online. The BBC is one of the main beneficiaries of its own altruism.

Today, the list of MusicBrainz contributors includes names like last.fm, Spotify and Universal Music, who inject Linked Open Data into their knowledge management infrastructure to enhance the effectiveness of their catalog metadata.


Ten Years On

Ten years after the BBC started down the Linked Data path, it still makes some editors, and even IT directors, worried.

The lack of control is still a concern. Each time an organization looks at using Open Data, the same conversation has to be had. What about mistakes or deliberate errors introduced into the data sources? What matters is being able to trace the provenance of the error.

Every organization will have a means to trace the source of an error, and that doesn’t change when you are using the web as your CMS. It’s just that you have a few thousand extra pairs of eyes on the content, and they are likely to catch the error and fix it before your relatively small team does.

The choice to use Open Data is not an all-or-nothing proposition. Use what you need, ignore the rest. Of course, you can create a guarantee for the data, create a vetting process, track deltas, etc. You can even pull the data into the walled garden of your organization and never share or play nice with the rest of the community.
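As a rough illustration of what "track deltas" and "a vetting process" could look like in practice, the Python sketch below compares two snapshots of an external data source and only lets approved changes through. Everything here is an assumption made for illustration: the function names `diff_snapshots` and `apply_vetted`, the snapshot format (a dictionary from identifier to value) and the sample approval rule are hypothetical, not something the BBC or MusicBrainz prescribes.

```python
def diff_snapshots(old, new):
    """Return (added, removed, changed) between two snapshots.

    Each snapshot is assumed to be a dict mapping an identifier to a value.
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


def apply_vetted(old, new, approve):
    """Build the snapshot to publish, applying only approved deltas.

    approve(key, before, after) is a caller-supplied rule; before is None
    for additions and after is None for removals.
    """
    added, removed, changed = diff_snapshots(old, new)
    published = dict(old)
    for key, value in added.items():
        if approve(key, None, value):
            published[key] = value
    for key, (before, after) in changed.items():
        if approve(key, before, after):
            published[key] = after
    for key, value in removed.items():
        if approve(key, value, None):
            del published[key]
    return published


# Hypothetical data: yesterday's copy and today's copy of an external source.
yesterday = {"artist:1": "Example Artist", "artist:2": "Another Artist"}
today = {
    "artist:1": "Example Artist (band)",
    "artist:2": "",  # a suspicious blanked-out value
    "artist:3": "New Artist",
}


def reject_blank(key, before, after):
    """Sample rule: accept removals, reject values that are blank."""
    return after is None or bool(after.strip())


published = apply_vetted(yesterday, today, reject_blank)
```

In this run the blanked-out entry keeps yesterday's value while the other two changes are accepted. A real vetting step would more likely queue rejected deltas for an editor than drop them silently.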

Just as there is a concern about the data coming in, people worry that they will lose control of the data going out. Rest assured, you can still make the business decision on what internal data you want to share and what you feel commands a premium.


It’s Getting Better All the Time

Those arguments were as true ten years ago as they are today, but back then it was hard to convince organizations of the advantages of Open Data. Now the use of Open Data is commonplace, and there are numerous examples proving the real value it provides. We have moved from the bold experiments of the BBC to ‘ignore at your peril’.

The scale of content and data that an organization must make sense of long ago went beyond what one organization can handle alone. The data problems that giants like Google, Twitter and Facebook were dealing with ten years ago are the problems that all organizations are dealing with now.

As a result, fewer and fewer organizations can afford to manage data and content without making use of the data that exists openly and freely on the web. The simple but radical idea of ‘The Web as a CMS’ is increasingly the norm.

 


Technical Author at Freelancer

Jarred McGinnis is a managing consultant in Semantic Technologies. Previously he was the Head of Research, Semantic Technologies, at the Press Association, investigating the role of technologies such as natural language processing and Linked Data in the news industry. Dr. McGinnis received his PhD in Informatics from the University of Edinburgh in 2006.
