Paywalls and SEO

As more and more publishers make the leap to monetising their content through paywalls, let's dive into the pros and cons

Oct 04, 2024

Unless you live under a rock, I am sure you’ve noticed that more and more publishers are gating their content behind a paywall. Whilst you can enjoy the majority of online news for free, it is only a matter of time before more and more publishers start using a subscription model.

TL;DR

Many publishers, including The Independent and The Guardian, are adopting metered, hard, freemium, or smart paywalls to reduce reliance on traffic volumes
Paywalls don’t inherently harm SEO rankings, as Google can still access and index the content. But your engagement metrics will take a hit…
An effective implementation, using client-side JavaScript or serving different content based on the user-agent, is key
Publishers must use isAccessibleForFree structured data to indicate paywalled content, ensuring transparency to users and search engines

Why are so many publishers moving to a paywalled model?

Why? Because it makes money. Any business that has customers on a contract becomes significantly more valuable.

The reason they work so effectively is that people want to belong. They want to feel part of something, and the publisher or paper you choose to read defines who you are. It’s a part of many people’s personalities. Why do people buy anything from specific brands?

It’s not necessarily because they offer the best deal or shipping service. It’s because the brand speaks to them like others don’t. Branding and SEO have never been more important.

This is especially true of publishers. If you’re paying a publisher, you’re paying for their opinion. Unless you’re a masochist, I find it unlikely you’re going to pay for opinions you violently disagree with. You can do that on X (the artist formerly known as Twitter).

In the last 12 months, both The Independent and The Guardian have launched paywalls, with many more to follow. While I think this is an under-discussed topic in SEO, Barry Adams's Guide to Paywalls is fantastic and I would highly recommend reading it.

The Athletic's paywall — Now part of the NYT, The Athletic has been a real paywall success story

As paywalls have evolved, there is no hard and fast rule for how they should work and how to get the most out of them. Publishers have success with metered paywalls, smart (or dynamic) paywalls and hard paywalls.

The only absolute must is that planning, implementation and optimisation are mapped out carefully, because if something goes awry it can cost a lot.

How do paywalls affect SEO?

There are two schools of thought when it comes to whether paywalls have a negative impact. According to Google, adding a paywall to your content doesn’t cause a drop in rankings. If you look at any ‘ranking factors’ list popularised by SEMRush or Backlinko, you won’t find having a paywall on there in any form. Which in itself is interesting and not something I’d considered before writing this sentence. Surely it should? If not now, when?

However, I think we have moved past having a definitive list of ranking factors. If you believe that Google uses UX and engagement metrics to help understand what pages most effectively answer a query (and you should), then any changes you make to a website or article are ranking factors.

“While paywalled content is not visible to users, it is still visible to web crawlers, so there’s no need to worry about it being perceived as thin.”
John Mueller

Ultimately, Google behaves like a cookieless user when it crawls your site. This means that for any sites with a metered or smart paywall, Google will have access to the entirety of every article for free. If you decide to use a lead-in method, where the headline, standfirst and first 80 or so words are accessible to everyone, you need to ensure that search engines can still access the rest of your article.

Given the prominence of paywalled brands that feature in the Top Stories box, gain a huge amount of Discover traffic and continue to dominate the news SERPs, it doesn’t seem to be a huge hindrance.

What types of paywalls are there?

There are four main types of paywalls:

Metered paywall
Hard paywall
Freemium model
Smart or dynamic paywall

Metered Paywall

A metered (or soft) paywall grants a reader a certain number of articles per day, week or month that can be read for free before a paid overlay is enforced. Anywhere between three and ten free articles a month is the norm.

This has been the go-to model for most publishers until recently, as it gives readers a chance to sample the product before they get pushed out of the VIP zone and forced to stand outside the red rope gazing in with the rest of the hoi polloi.

Hard paywall

The hard paywall does exactly what it says on the tin. No free articles each month. No freemium options for anyone. It is a hard and fast rule associated with only the most prominent brands where the quality and opinion are well understood.

Most hard paywalls keep the homepage and category pages open, but everything else is closed off to all but a select few.

Freemium model

A model associated with brands making their first foray into the premium world. Some articles are left open to the public, whilst other areas of the site are closed off. To me, this is more commonly associated with brands like Which?, where detailed reviews and unique research are cordoned off.

Smart or dynamic paywall

The most contemporary paywall is the smart or dynamic paywall. We have been using this at The Telegraph for some time now and it is based on the user’s characteristics and their likelihood to subscribe. The more likely you are to subscribe, the more likely you are to be hit by a paywall and vice versa.
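As a rough illustration of the propensity idea, the sketch below gates an article with a probability equal to the reader's propensity score. The score itself, the function name and the injectable random source are all hypothetical; this is not a description of how The Telegraph's paywall is built.

```javascript
// Hypothetical sketch: show the paywall with probability equal to the
// reader's propensity to subscribe (a score between 0 and 1).
// `random` is injectable so the decision can be tested deterministically.
function shouldShowPaywall(propensity, random = Math.random) {
  // Clamp the score so out-of-range input cannot skew the probability.
  const p = Math.min(1, Math.max(0, propensity));
  return random() < p;
}
```

A reader with a score of 0.8 would see the paywall on roughly four visits in five, while a reader with a score of 0.1 would rarely see it.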

How do I implement a paywall?

You can implement a paywall in several ways, but the only two you should consider when it comes to maintaining SEO rankings are:

Using client-side JavaScript
Serving paywalled or open HTML based on the user-agent

‘Client side’ refers to everything in a web application that is displayed or takes place on the end-user’s device. The end user is known as the client. This includes what the user sees (text, images, gifs, videos etc) and any actions that an application performs within the user’s browser.

Client-side JavaScript

If you create a paywall using client-side JavaScript, you can show a paywall to users whilst giving bots free access to the entirety of the article HTML: the nuts and bolts of the page.

If you have a metered paywall, you need to store how many articles someone has read. This is much easier and more cost-effective to do client-side than server-side, because you don’t need to store the data yourself. It’s also much more efficient than a server-side approach, resulting in faster response times and less CPU usage.
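To make the client-side meter concrete, here is a minimal sketch of a monthly article counter. It assumes a storage object with the same getItem/setItem shape as the browser's localStorage; the key name, the default limit of five and the helper names are illustrative choices, not a standard.

```javascript
// Minimal metered-paywall counter. `storage` is anything with the
// getItem/setItem shape of window.localStorage; the key name and the
// default limit of 5 are assumptions for illustration.
const METER_KEY = "meter";

function currentMonth(date = new Date()) {
  return date.toISOString().slice(0, 7); // year and month in UTC, "YYYY-MM"
}

function recordView(storage, articleId, limit = 5, date = new Date()) {
  const month = currentMonth(date);
  let state = JSON.parse(storage.getItem(METER_KEY) || "null");
  // Start a fresh meter on the first visit or when the month rolls over.
  if (!state || state.month !== month) state = { month, seen: [] };
  // Re-reading an article that was already counted does not use the meter.
  const alreadySeen = state.seen.includes(articleId);
  // Once the allowance is used up, any new article is gated.
  if (!alreadySeen && state.seen.length >= limit) {
    return { gated: true, remaining: 0 };
  }
  if (!alreadySeen) state.seen.push(articleId);
  storage.setItem(METER_KEY, JSON.stringify(state));
  return { gated: false, remaining: limit - state.seen.length };
}

// In-memory stand-in for localStorage so the sketch runs outside a browser.
function memoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => data.set(key, String(value)),
  };
}
```

In a browser you would pass window.localStorage as the storage argument and show the overlay whenever gated comes back true. Because the count lives on the reader's device, clearing site data resets it, which is the trade-off for not storing anything yourself.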

News is a challenging topic for Google because it has to make a split-second decision based solely on its first crawl of the article, where it picks up the HTML only. In this scenario the paywall may not even be picked up, so the full article is crawled, rendered and indexed.

In your NewsArticle structured data, set "isAccessibleForFree" to "False". You can include the entirety of your content in the articleBody property.
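A minimal sketch of generating that markup is shown below. The input field names (headline, url, body) and both function names are hypothetical; only the schema.org property names come from the markup discussed in this article.

```javascript
// Build NewsArticle JSON-LD for a paywalled article.
// The input field names (headline, url, body) are hypothetical.
function buildPaywalledArticleJsonLd({ headline, url, body }) {
  return {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    mainEntityOfPage: { "@type": "WebPage", "@id": url },
    headline,
    articleBody: body, // the full text, so crawlers can read it
    isAccessibleForFree: "False", // same value as the examples in this article
  };
}

// Serialise for embedding in a <script type="application/ld+json"> tag.
// Escaping "<" stops article text containing "</script>" from closing the tag.
function toJsonLdScript(data) {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

The escaped form is still valid JSON, so parsers recover the original article text unchanged.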

Serving different content based on the user-agent

In this scenario, actual people are served the paywalled version of the article, whereas Google (and other bots) can be served the full article complete with the appropriate structured data. Again, in your NewsArticle structured data, you must set "isAccessibleForFree" to "False" and you can include the entirety of your content in the articleBody property.

Most websites do this using some kind of IP lookup. You need to be able to verify Google’s crawlers, as they crawl from multiple, unique IP addresses. You can find a list of IP addresses Google uses here.
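The IP check can be sketched as a simple range match. The single range in SAMPLE_RANGES is an illustrative placeholder rather than a maintained list, the function names are hypothetical, and only IPv4 is handled; in practice you would load the current ranges from Google's published list and cover IPv6 too.

```javascript
// Sketch: decide which HTML variant to serve based on the requester's IP.
// SAMPLE_RANGES is an illustrative placeholder; load the current ranges
// from Google's published list rather than hard-coding them.
const SAMPLE_RANGES = ["66.249.64.0/27"];

// Convert a dotted IPv4 address to an unsigned 32-bit number, or null if invalid.
function ipv4ToInt(ip) {
  const parts = String(ip).split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const n = Number(part);
    if (n > 255) return null;
    value = value * 256 + n;
  }
  return value;
}

// True when `ip` falls inside the CIDR block, e.g. "66.249.64.0/27".
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null) return false;
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  // JavaScript shift counts wrap at 32, so handle /0 explicitly.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipInt & mask) >>> 0) === ((baseInt & mask) >>> 0);
}

// Verified crawlers get the full article; everyone else gets the paywall.
function chooseVariant(ip, ranges = SAMPLE_RANGES) {
  return ranges.some((range) => inCidr(ip, range)) ? "full" : "paywalled";
}
```

Matching on IP rather than the user-agent string alone matters because the user-agent header is trivial to spoof, whereas the source address is not.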

To ensure ‘savvy’ internet users can’t circumvent the paywall, don’t allow Google to serve a cached version of the page. If you use the noarchive meta tag, Google won’t generate a cached page, which would likely not have a paywall.

What does Google say about paywalls?

If you offer any subscription-bound access to content (or indeed if users need to register to view it), then you must indicate that the content is not accessible for free and/or use a CSS selector to indicate that either the entire page or sections of it are paywalled.

The markup below, which Google uses as an example, indicates that this article is not available for free and that the entirety of the article is behind a paywall, as shown by "isAccessibleForFree": "False" and "cssSelector": ".paywall".

{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://example.org/article"
  },
  (...)
  "isAccessibleForFree": "False",
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": "False",
    "cssSelector": ".paywall"
  }
}

If sections of your article are free but others are paywalled (very typical of publishers where the first 80 words or so are open, known as lead-in content), then you need to use CSS selectors to mark each paywalled section. In the example below, .section1 and .section2 are both gated.

{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://example.org/article"
  },
  (...)
  "isAccessibleForFree": "False",
  "hasPart": [
    {
      "@type": "WebPageElement",
      "isAccessibleForFree": "False",
      "cssSelector": ".section1"
    }, {
      "@type": "WebPageElement",
      "isAccessibleForFree": "False",
      "cssSelector": ".section2"
    }
  ]
}

Google’s not-so-explicit guidelines are:

JSON-LD and microdata formats are accepted methods for specifying structured data for paywalled content
Don't nest content sections
Only use .class selectors for the cssSelector property
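The class-selector guideline is easy to enforce when you generate the markup. The sketch below builds the hasPart array for several paywalled sections and rejects anything that isn't a plain class selector. The function name and the deliberately strict regular expression are assumptions for illustration.

```javascript
// Build the hasPart array for several paywalled sections, accepting only
// simple class selectors such as ".section1".
// The pattern is a deliberately strict assumption about valid class names.
const CLASS_SELECTOR = /^\.[A-Za-z_][\w-]*$/;

function buildHasPart(selectors) {
  return selectors.map((selector) => {
    if (!CLASS_SELECTOR.test(selector)) {
      throw new Error(`Not a plain class selector: ${selector}`);
    }
    return {
      "@type": "WebPageElement",
      isAccessibleForFree: "False",
      cssSelector: selector,
    };
  });
}
```

Calling buildHasPart([".section1", ".section2"]) produces the same two entries as the multi-section example above, while an ID selector or a descendant selector such as ".a .b" throws before any markup is published.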

How do you indicate your site has paywalled content?

The CreativeWork structured data type has a property called isAccessibleForFree, which lets you give a ‘true’ or ‘false’ value indicating whether the content is free or gated. This is vital in helping Google understand when your article is paywalled, making it clear that what you’re doing isn’t cloaking: the practice of serving different content to users and search engines with the intent to manipulate rankings and mislead users.

See the below example of NewsArticle structured data written in JSON-LD from the Google Developer’s page on paywalled content.

<html>
  <head>
    <title>Article headline</title>
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "NewsArticle",
      "headline": "Article headline",
      "image": "https://example.org/thumbnail1.jpg",
      "datePublished": "2025-02-05T08:00:00+08:00",
      "dateModified": "2025-02-05T09:20:00+08:00",
      "author": {
        "@type": "Person",
        "name": "John Doe",
        "url": "https://example.com/profile/johndoe123"
      },
      "description": "A most wonderful article",
      "isAccessibleForFree": "False",
      "hasPart":
        {
        "@type": "WebPageElement",
        "isAccessibleForFree": "False",
        "cssSelector" : ".paywall"
        }
    }
    </script>
  </head>
  <body>
    <div class="non-paywall">
      Non-Paywalled Content
    </div>
    <div class="paywall">
      Paywalled Content
    </div>
  </body>
</html>

The required and recommended properties are as follows:

Google's required and recommended paywall markup properties

Can Google index content behind a paywall?

Yes, Google can index content behind a paywall. As long as you use the correct structured data and indicate to search engines that either the entirety of your content or sections of it are gated, you won’t have any problem.

The most important thing you can do post-implementation is to check how Google renders your content. Google must be able to see the full content behind your paywall, like any user who has access to the gated content.

Top tip: you need to ensure that Googlebot can still crawl and render your paywalled content, so use a combination of the URL Inspection tool and the Wayback Machine to review how the pages are rendered and cached.
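One part of that check can be automated: confirming that the HTML you serve actually carries the paywall markup. The sketch below pulls JSON-LD blocks out of an HTML string and reports whether any of them flags the page as not free. How you obtain the HTML is up to you, the function names are hypothetical, and the regular expression is a simplification, not a substitute for a real HTML parser.

```javascript
// Sketch: extract JSON-LD blocks from an HTML string. The regex suits a
// quick check only; use a proper HTML parser for anything more robust.
function extractJsonLd(html) {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks that are not valid JSON.
    }
  }
  return blocks;
}

// True when any top-level JSON-LD object marks the page as not free.
function isMarkedPaywalled(html) {
  return extractJsonLd(html).some((block) => {
    if (!block || typeof block !== "object") return false;
    const value = block.isAccessibleForFree;
    // Accept the boolean false as well as the string form used in this article.
    return value === false || String(value).toLowerCase() === "false";
  });
}
```

Run against the example page from Google's documentation shown earlier, isMarkedPaywalled returns true. A page with no JSON-LD, or with the property missing, returns false and is worth a closer look.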

A SERP showing the results for the term 'paywalled content' with the option to view the cached result — If you click the three vertical dots by the side of any search result you get the opportunity to view the cached result of any article

How does the SGE impact paywalled content?

As yet unknown. However, Google has recently updated its paywall guidelines to include a note on the SGE. In it, Google notes that the SGE provides generative overviews, reviews and snippets of information within the SERPs, designed to help users (which may or may not be true).

Whilst SGE snippets will still link to paywalled content, SGE while browsing will not surface key points for paywalled content. So it’s reasonable to presume that paywalled content will be utilised less than non-paywalled content in this brave new world. As ever, the most important thing to do is to ensure that crawlers can access all of your content.

What websites have paywalls?

There are now hundreds of websites that have gated their content behind a paywall of some sort, using a range of paywall types and implementations. Paywalls are primarily associated with news publishers, but that isn’t the be-all and end-all of gated content.

If you have a quality product that offers something unique (and want to reduce your reliance on Google’s seemingly worsening algorithm), then paywalling your content could be an excellent option. Any business is more valuable with repeat customers.

The Telegraph

The Telegraph uses a smart, JavaScript-implemented paywall and the "isAccessibleForFree": "False" structured data marker to highlight this to search engines. The smart paywall is based on people’s propensity to subscribe, which means that, depending on their location and demographic data, users will be shown somewhere between 0 and 10 articles for free. It’s worth noting that some areas of the site are accessible for free.

NYT

The New York Times gives you access to at least one article for free when you sign up, before you are hit with another smart paywall.

Below you can see how it is implemented in the markup. The NewsArticle schema type hosts the "isAccessibleForFree": "False" with a cssSelector used to signify the entire article is behind a paywall.

Schema markup from the NYT showing their paywall

Schema markup from the NYT showing their cssSelector

Which?

Interestingly, I think Which? has the most unusual approach to paywalls: it doesn’t seem to signal to search engines that the site is behind a paywall. Which?’s hard paywall completely restricts access to core site content for anyone but signed-in users.

The only markup associated with this specific article is Product-focused. Even articles with gated content, like this Best TVs to Buy in 2024 article, have no paywall-specific markup attributed to them.