Oa5678 Stack

The Web's Missing Structure: Why Semantic Markup Matters and How We Can Finally Achieve It

Published 2026-05-04 23:55:10 · Web Development

Since the 1990s, the World Wide Web has served primarily as a platform for publishing documents meant for human eyes. These documents, written in HTML, carry only a minimal layer of structure: a paragraph here, an emphasis there. Add a dash of CSS for visual flair—say, making all paragraphs tiny and gray—and the page becomes “stylish” (though perhaps illegible for older readers). That, in essence, is the extent of the web’s current structural vocabulary.

The Original Vision: Human-Centric Publishing

Imagine you mention a book on a web page. You might write:

Source: www.joelonsoftware.com

Goodnight Moon
by Margaret Wise Brown
Illustrated by Clement Hurd
Harper & Brothers, 1947
ISBN 0-06-443017-0

To a human, this clearly signals a book’s details. But to a naive computer program, it’s just a jumble of text. The only structural hint is that the title is bold—a purely presentational cue. There is no semantic markup indicating that this is a book, let alone what its author, illustrator, publisher, or ISBN are. This is the norm across the web: human-readable but machine-ambiguous.
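Here is a small Python sketch of what a naive program actually recovers from such a listing. The HTML is our guess at how a typical author would publish it, so the exact markup is hypothetical. The parser uses only the standard library's html.parser module and pairs each run of text with the tag that encloses it, which is all the structure the page offers.

```python
from html.parser import HTMLParser

# Hypothetical markup: how a typical author might publish the listing above,
# using <b> purely for visual emphasis.
PAGE = """
<p><b>Goodnight Moon</b><br>
by Margaret Wise Brown<br>
Illustrated by Clement Hurd<br>
Harper &amp; Brothers, 1947<br>
ISBN 0-06-443017-0</p>
"""


class NaiveReader(HTMLParser):
    """Pairs each run of text with the innermost tag enclosing it."""

    VOID_TAGS = {"br", "img", "hr"}  # elements that never get a closing tag

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.runs = []  # (innermost tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            tag = self.open_tags[-1] if self.open_tags else None
            self.runs.append((tag, text))


reader = NaiveReader()
reader.feed(PAGE)
reader.close()
for tag, text in reader.runs:
    # Only the title reports <b>; every other line reports the same <p>.
    print(f"<{tag}> {text}")
```

The title stands out only because of a visual instruction. To tell the author from the illustrator or the ISBN, a program has to guess from phrases like "by" and "ISBN", and such heuristics are fragile as soon as another site words things differently.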

The Dream of a Semantic Web

As early as 1999, Tim Berners-Lee envisioned a web where computers could analyze data, links, and transactions. In his book Weaving the Web, he wrote:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers… The ‘intelligent agents’ people have touted for ages will finally materialize.”

To realize that dream, we need to embed machine-readable metadata into web pages. One common approach is to look up the definition of a “Book” on schema.org and then annotate the HTML using an RDF-based syntax such as RDFa or JSON-LD. For example, you could mark up the book listing above with explicit properties: name, author, illustrator, publisher, isbn, and so on.
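A minimal sketch of that annotation in Python, assuming the JSON-LD route: it describes the Goodnight Moon listing with properties from schema.org's Book type and wraps the result in a <script type="application/ld+json"> element, the conventional way to embed JSON-LD in an HTML page. The helper name json_ld_script is our own, not part of any library.

```python
import json

# name comes from schema.org's Thing; author, publisher and datePublished
# from CreativeWork; illustrator and isbn are defined on Book itself.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Goodnight Moon",
    "author": {"@type": "Person", "name": "Margaret Wise Brown"},
    "illustrator": {"@type": "Person", "name": "Clement Hurd"},
    "publisher": {"@type": "Organization", "name": "Harper & Brothers"},
    "datePublished": "1947",
    "isbn": "0-06-443017-0",
}


def json_ld_script(data: dict) -> str:
    """Wrap structured data in the <script> element JSON-LD consumers look for."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # "\/" is a valid JSON escape; it stops "</" from closing the script early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


snippet = json_ld_script(book)
print(snippet)
```

Even this small example needs a context URL, nested typed objects for people and organizations, and an escaping rule. That ceremony is a large part of why so few authors bother.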

The Challenge: Complexity and Motivation

Herein lies the rub: adding such semantic markup is fiddly and time-consuming. After crafting a beautiful, human-readable blog post, few developers have the energy to dive into the arcane syntax of RDF or JSON-LD. Unless an automated agent is already consuming that data, the incentive is nearly zero. As a result, more than two decades later, semantic markup remains rare in the wild. The web is still a place where “structure” means little more than paragraph breaks and bold text.

This persistence of unstructured content holds back progress. Human knowledge—from bibliographic data to medical records to product catalogs—would be far more accessible if computers could understand it. But the effort-to-reward ratio is skewed. People will only add semantic markup if doing so is effortless and immediately beneficial.

Introducing the Block Protocol

Enter the Block Protocol. While the original joelonsoftware.com article did not delve into specifics, the protocol aims to make semantic markup as easy as writing plain HTML. By providing a standardized way to embed structured data blocks, such as citations, events, or product details, directly into a page, it removes the burden of learning complex formats. Content creators can focus on what they do best: creating engaging text and media. Machines, in turn, can reliably extract structured data without guesswork.

The Block Protocol is not just another specification; it’s a shift in mindset. Instead of asking authors to be both creative writers and data architects, it offers a simpler path from human-readable to machine-readable. Early adopters are already using it to publish structured information about books, scientific papers, and more.
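Since the protocol's actual interfaces are not spelled out here, the following Python sketch only illustrates the idea and is not the Block Protocol's API. It assumes a hypothetical "book block" (render_book_block) that takes the fields an author fills in once and produces both the visible HTML and an embedded machine-readable copy. The copy is serialized as schema.org JSON-LD, which is our choice of format, not something the protocol is stated to require. A hypothetical consumer (JsonLdExtractor) then reads that data directly instead of guessing from the visible text.

```python
import json
from html import escape
from html.parser import HTMLParser


def render_book_block(fields: dict) -> str:
    """Hypothetical block: one set of author-supplied fields yields both views."""
    visible = (
        f"<p><b>{escape(fields['title'])}</b><br>"
        f"by {escape(fields['author'])}<br>"
        f"ISBN {escape(fields['isbn'])}</p>"
    )
    data = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": fields["title"],
        "author": {"@type": "Person", "name": fields["author"]},
        "isbn": fields["isbn"],
    }
    payload = json.dumps(data).replace("</", "<\\/")  # keep the script element intact
    return (f'<div class="book-block">{visible}'
            f'<script type="application/ld+json">{payload}</script></div>')


class JsonLdExtractor(HTMLParser):
    """Collects every JSON-LD payload on a page, ignoring the visible text."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:  # script content may arrive in several chunks
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


html_out = render_book_block({
    "title": "Goodnight Moon",
    "author": "Margaret Wise Brown",
    "isbn": "0-06-443017-0",
})
extractor = JsonLdExtractor()
extractor.feed(html_out)
extractor.close()
print(extractor.items[0]["author"]["name"])  # Margaret Wise Brown
```

The author never touches the structured copy; it falls out of the same fields that drive the visible text, which is the shift in mindset described above.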

What Needs to Change

To truly make the Semantic Web a reality, we must reduce friction. The Block Protocol represents a promising step, but broad adoption will require:

  • Easy authoring tools that integrate with existing content management systems
  • Immediate feedback (e.g., rich previews in search results) to incentivize markup
  • Open standards that avoid vendor lock-in
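The "immediate feedback" item is easy to picture in code. Below is a hypothetical Python sketch of the kind of check an authoring tool might run while the author edits: it inspects a schema.org Book description and returns hints about what is missing. The list of suggested properties is our assumption; schema.org itself does not mark any Book property as required.

```python
# Hypothetical checklist chosen by the tool's designers, not by schema.org.
SUGGESTED_BOOK_PROPERTIES = ("name", "author", "isbn", "publisher", "datePublished")


def markup_feedback(data: dict) -> list[str]:
    """Return hints an editor could display next to the draft as the author types."""
    if data.get("@type") != "Book":
        return ['Set "@type" to "Book" so consumers know what this describes.']
    # Absent keys and empty values are both reported as missing.
    return [f'Add "{prop}" to make this listing more useful to machines.'
            for prop in SUGGESTED_BOOK_PROPERTIES if not data.get(prop)]


draft = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Goodnight Moon",
    "author": {"@type": "Person", "name": "Margaret Wise Brown"},
}
for hint in markup_feedback(draft):
    print(hint)  # hints for isbn, publisher and datePublished
```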

If we succeed, the web will become a place where both humans and machines can thrive—a true web of data. The vision Tim Berners-Lee described in 1999 is within reach, provided we make semantic markup as natural as writing a paragraph. The Block Protocol may be the catalyst we’ve been waiting for.

Let’s stop settling for gray, tiny text and start building a web that understands itself.