About the schema.org Project :: Mateusz Jabłoński - blog, podcast, courses on programming and development

About the schema.org Project

Quality first. High-quality code and high-quality content are two key ingredients of a successful website. Neither code nor content can carry a site on its own - fortunately, there is a project that helps with both. Ladies and gentlemen, meet Schema.org.

Publication date

13 February 2023

Mateusz Jabłoński

Frontend Developer with a passion. Husband in love. Proud father. Gamer by choice.

From this article you will learn:

What is search engine optimization?
What is the Schema.org project, and is it worth implementing?
What is structured data?
What is the difference between RDFa and JSON-LD?

Creating websites is, despite appearances, a fairly complex job. I do not mean only the act of writing code - that part alone is already difficult enough. If we dig a little deeper into why websites are created in the first place, we quickly conclude that they are tools for selling and promoting what we do. From there the next step is simple: we have to promote the project so that people know about it, talk about it, and eventually visit it and convert.

So how should we promote our website? Maybe a billboard at the entrance to my city with the web address on it? We can do that, but for IT products the effectiveness of this solution is relatively low. With this kind of advertising we reach everyone and build brand recognition, but we do not really sell. On top of that, we cannot directly measure visits to the website from such an ad. We would have to connect it with advertising activities on the Internet.

So how do we advertise online? There are many options: banner campaigns on partner websites, sponsored articles, SEM ads, social media ads, and SEO. The last one is probably the most interesting from the perspective of a website creator, because the creator directly influences how well SEO, meaning search engine optimization, works. Of course, it does not depend on them alone, but their influence is significant.

What is search engine optimization?

SEO is an acronym for Search Engine Optimization. The name itself already tells us that we are dealing with internet search engines. In broad terms, search engines work by sending special programs (crawlers) across the Internet and indexing every website they encounter. The index is the resulting collection of websites, annotated by the search engine's mechanisms so that each one can later be matched as accurately as possible to the queries users enter.

So where is the programmer's work in all this? If we take into account that a search engine is a program that analyzes our website, it may turn out that concepts such as semantic HTML, SSR, or performance metrics start to matter. The program should receive data in the most accessible form possible, so that it indexes the page according to the author's intention. How can we achieve that?

Search engines want to help

Meeting users' needs as accurately as possible is in the interest not only of the person managing the website, but also of the search engines. That is why search engines publish guidelines on how to build a website so that it works as well as possible with their mechanisms. A large part of these guidelines concerns high-quality, well-structured code and professional, unique content. I do not want to write here about copywriting or about building code specifically for search engines, but I do want to point to a project that was initiated in 2011.

In 2011, the three largest search engines at the time, Google, Bing, and Yahoo!, decided to create an initiative that would make the use of structured data on websites more consistent, with the goal of extending the ability of those sites to describe content, services, and products more effectively. That is how the schema.org project was born. A few months later Yandex, the largest search engine operating in the Russian Federation, joined the project.

As part of the project, it was proposed that data provided through schema.org would be used together with formats such as Microdata, RDFa, and JSON-LD.

Structured data to the rescue

The schema.org project is not the first of its kind. People had approached this topic earlier: it is worth looking at projects such as FOAF, started in 2000, or OpenCyc, from 2002. All of them relied on structured data to describe information in a way that machines can process. So what is structured data? It is nothing more than data - information, notes, descriptions, additional explanations - passed through the structures of markup languages such as HTML. Most often this is done in one of two ways: by creating new tags or by adding appropriate attributes to existing tags.

RDFa

Before we get to the way data is introduced with schema.org, it is worth going back in time just a little. In 2004, Mark Birbeck proposed RDFa (Resource Description Framework in Attributes), an extension to HTML whose goal was to expand the semantic capabilities of markup languages. He promoted it over the following years at various conferences. As part of the work on XHTML, RDFa was added to version 2.0. Initially, the standard was available only for XML-based languages.

In 2008, RDFa received the status of a W3C recommendation. Version 1.1, which became a recommendation in 2012, made the standard usable in languages that are not XML-based, for example HTML5; revised editions followed until 2015.

So far we have approached the topic from a historical angle, but what is all this really about? The answer is simple: attributes added to HTML tags. For example:

if we want to describe the origin of a given resource, we can use attributes such as src, href, or resource;
if we want to define the relationship a given resource has to our page, we can use the rel attribute;
if we want to describe the properties of our data, we have the property attribute for that.

These three are the most basic uses, although there are a few more possibilities. Advantages? Here they are: our code becomes more semantic, we do not duplicate data when we use XML and HTML within a single project, and HTML and RDFa remain independent of each other.
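The attributes listed above could be combined as in the following minimal sketch. It describes this very article using the schema.org vocabulary; the `vocab` and `typeof` attributes, the `#post` identifier, and the license link (a placeholder URL) are assumptions added for illustration and are not taken from the original text.

```html
<!-- vocab sets the default vocabulary; typeof declares what kind of thing is described -->
<article vocab="https://schema.org/" typeof="BlogPosting" resource="#post">
  <!-- property: a plain property of the resource -->
  <h1 property="headline">About the schema.org Project</h1>
  <time property="datePublished" datetime="2023-02-13">13 February 2023</time>

  <!-- property + typeof: the value is itself a described resource -->
  <p property="author" typeof="Person">
    <span property="name">Mateusz Jabłoński</span>
  </p>

  <!-- rel names the relationship, href points at the related resource (placeholder URL) -->
  <a rel="license" href="https://example.com/license">License</a>
</article>
```

The visible text stays exactly where it was; the attributes only tell a parser how to read it.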

JSON-LD

A few years later, in 2010, Mark Birbeck proposed using the JSON standard for similar purposes to RDFa. JSON-LD stands for JavaScript Object Notation for Linked Data. Linked Data is simply structured data that is properly connected with other data. This kind of data is most often used to prepare organized data sets. The concept was coined in 2006 by Tim Berners-Lee as part of the Semantic Web project. An interesting detail is that part of the "linked data" vision is to turn the Internet into a global database.

JSON-LD was designed to complement RDFa rather than replace it; the two standards do not exclude each other. By using JSON notation, JSON-LD lets us describe relationships between different resources on a page much more precisely. JSON-LD also introduced the concept of a context, which specifies how the data passed in the JSON-LD structure should be interpreted: what each piece of data means and in what form it has been provided.
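To make the idea of a context more concrete, here is a minimal sketch. The short keys `name` and `born` are names chosen for this example only; the `@context` maps each of them to a schema.org term and, for `born`, also states the form of the value (a date). The person described is borrowed from the movie example later in the article.

```html
<script type="application/ld+json">
{
  "@context": {
    "name": "https://schema.org/name",
    "born": {
      "@id": "https://schema.org/birthDate",
      "@type": "https://schema.org/Date"
    }
  },
  "@type": "https://schema.org/Person",
  "name": "James Cameron",
  "born": "1954-08-16"
}
</script>
```

In everyday use the context is usually shortened to a single reference, `"@context": "https://schema.org"`, which makes all schema.org terms available under their own names.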

Schema.org

The schema.org project aims to help search engines match websites to users more effectively. By using the formats described above, we can implement it within our own site. For example, if we want to describe a particular resource, we should first find the appropriate definition in the schema.org vocabulary, and then add the relevant information to the website code through microdata, RDFa, or JSON-LD. Microdata is a standard similar to RDFa, with a slightly different set of attributes. The example below shows a movie definition in HTML code using the microdata format.

```html
<div itemscope itemtype="https://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="https://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span> (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="../movies/avatar-theatrical-trailer.html" itemprop="trailer">Trailer</a>
</div>
```
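For comparison, the same movie could be described in JSON-LD roughly as follows. This is a sketch, not an official equivalent: writing the birth date in ISO 8601 form and wrapping the trailer link in a `VideoObject` are my assumptions about how the values would best be expressed.

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Movie",
  "name": "Avatar",
  "genre": "Science fiction",
  "director": {
    "@type": "Person",
    "name": "James Cameron",
    "birthDate": "1954-08-16"
  },
  "trailer": {
    "@type": "VideoObject",
    "url": "../movies/avatar-theatrical-trailer.html"
  }
}
</script>
```

Unlike microdata, this block lives apart from the visible markup, so the description can be changed without touching the page layout.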

Summary

According to available statistics, only about 30% of websites have implemented schema.org markup. In addition, according to SEO specialists, the implementation itself does not have a major impact on a page's position in search results. However, it has an invaluable impact on the quality of the traffic that reaches the page - there is a much greater chance that the person who lands there is actually looking for what the page offers. Personally, I think it is worth spending a bit of time on implementing it. After all, every website creator should care about quality.