Looking Back at Linkscape’s Trillion + URLs (and Announcing our Latest Index Update) – Moz

December 2, 2009


Moz News

As we rapidly approach the end of 2009 and the opening of 2010, we’ve got a much-anticipated index update ready to roll out, gang. Say it with me: “twenty-ten”. Oh yeah, I’m so gonna get a flying car and a cyberpunk android 🙂 …Ahem. I thought this would be a great time to look back at the year and ask, “Where did all those pages go?” Being a data-driven kind of guy, I want to look at some numbers on churn and freshness, what they mean for the size of the web and of web indexes over the last year, and the hundreds of billions (indeed, trillion-plus) URLs we’ve gotten our hands on.
This index update has a lot going on, so I’ve broken things out section by section:

Analysis of the Web’s Churn (or why having ten trillion URLs isn’t very useful)
Canonicalization, De-Duping & Choosing Which Pages to Keep
Statistics on our December Linkscape Update
New Updates to the FREE SEOmoz API (and a nearly 90% price drop on the paid API)

An Analysis of the Web’s Churn Rate
Not too long ago, at SMX East, I heard Joachim Kupke (senior software engineer on Google’s indexing team) say that “a majority of the web is duplicate content”. I made great use of that point at a Jane and Robot meetup shortly after. Now, I’d like to add my own corollary to that statement: “most of the web is short-lived”.

 
After just a single month, a full 25% of the URLs are what we call “unverifiable”. By that I mean that the content was a duplicate, included session parameters, or for some reason could not be retrieved (verified) again (404s, 500s, etc.). After six months, 75% of the tens of billions of URLs we’ve seen are “unverifiable”, and after a year only 20% still qualify for “verified” status. As Rand noted earlier this week, Google’s doing a lot of verifying itself.
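The post doesn’t describe exactly how Linkscape decides that a URL is “unverifiable”. The following is a minimal sketch, assuming each re-crawled URL comes back with an HTTP status plus duplicate and session-parameter flags; the `RecrawlResult` type and both helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RecrawlResult:
    url: str
    status: Optional[int]            # HTTP status on re-fetch; None if the fetch failed outright
    is_duplicate: bool = False       # content matches a page already kept
    has_session_param: bool = False  # URL carries a session identifier


def is_verified(result: RecrawlResult) -> bool:
    """Verified = re-fetched with a 2xx status, not a duplicate, no session parameter."""
    if result.status is None or not 200 <= result.status < 300:
        return False
    return not (result.is_duplicate or result.has_session_param)


def unverifiable_share(results: Iterable[RecrawlResult]) -> float:
    """Fraction of re-crawled URLs that fail verification."""
    results = list(results)
    if not results:
        return 0.0
    failed = sum(1 for r in results if not is_verified(r))
    return failed / len(results)


# Toy re-crawl of four URLs (made-up inputs, not Linkscape data).
sample = [
    RecrawlResult("http://example.com/a", 200),
    RecrawlResult("http://example.com/b", 404),
    RecrawlResult("http://example.com/c?sid=123", 200, has_session_param=True),
    RecrawlResult("http://example.com/d", 200),
]
print(unverifiable_share(sample))  # 0.5
```

Running the same check against snapshots taken one month, six months and a year after a crawl is what produces churn curves like the ones described above.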
To visualize this dramatic churn, imagine the web six months ago…

Using Joachim’s point, plus what we’ve observed, that six-month-old content today looks something like this:

What this means for you as a marketer is that some of the links you build and the content you share across the web are not permanent. If you engage heavily with high-churn portions of the web, the statistics you monitor over time can vary pretty wildly. It’s important to understand the difference between getting links (and republishing content) in places that will make a splash now but fade away, versus engaging in lasting ways. Of course, both are important (as high-churn areas may drive traffic that turns into more permanent value), but the distinction shouldn’t be overlooked.
Canonicalization, De-Duping & Choosing Which Pages to Keep
Regarding Linkscape’s indices, we capture both of these cases:

We’ve got an up-to-date crawl including fresh content that’s making waves right now. Blogscape helps power this, monitoring 10 million+ feeds and sending those back to Linkscape for inclusion in our crawl.
We include the lasting content that will continue to support your SEO efforts by analyzing which sites and pages are “unverifiable” and removing them from each new index. This is why our index growth isn’t cumulative: we re-crawl the web each cycle to make sure that the links and data you’re seeing are fresh and verifiable.
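The non-cumulative cycle in the list above can be sketched as a set rebuild. This assumes a hypothetical `verify(url)` callback that re-fetches a URL and reports whether it still qualifies; how Blogscape actually hands feed URLs to the crawl isn’t described in the post, so the `fresh_feed_urls` input is a simplified stand-in.

```python
from typing import Callable, Iterable, Set


def build_index(
    previous_index: Set[str],
    fresh_feed_urls: Iterable[str],
    verify: Callable[[str], bool],
) -> Set[str]:
    """Rebuild the index from scratch for one cycle.

    Candidates are last cycle's URLs plus newly discovered feed URLs, but only
    URLs that verify *this* cycle are kept, so dead pages drop out instead of
    accumulating from one index to the next.
    """
    candidates = set(previous_index) | set(fresh_feed_urls)
    return {url for url in candidates if verify(url)}


# Toy example: pretend only these URLs are still live this month.
live_now = {"http://example.com/old-but-alive", "http://blog.example.com/new-post"}
previous = {"http://example.com/old-but-alive", "http://example.com/deleted-page"}
feeds = ["http://blog.example.com/new-post"]

index = build_index(previous, feeds, verify=lambda url: url in live_now)
print(sorted(index))
# ['http://blog.example.com/new-post', 'http://example.com/old-but-alive']
```

The design choice is that index size reflects what verified this cycle, not everything ever seen, which is why two consecutive indexes can differ in size in either direction.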

To put it another way, consider the quality of most of the pages on the web, as measured, for instance, by mozRank:

I think the graph speaks for itself. The vast majority of pages have very little “importance” as defined by a measure of link juice. So it doesn’t surprise me (now at least) that most of these junk pages are disappearing after not too long.  Of course, there are still plenty of really important pages that do stick around.
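The post doesn’t give mozRank’s formula. As a rough illustration of why link-based importance piles up on a few pages while most pages score very low, here is plain PageRank computed by power iteration, used as a stand-in on the assumption that mozRank behaves in a broadly similar way. The `pagerank` function and the toy link graph are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Plain PageRank by power iteration (a stand-in for a mozRank-style score)."""
    # Every page that appears as a source or a target is a node.
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Pages with no outlinks ("dangling") spread their rank over every page.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        # Every other page splits its rank evenly across its outlinks.
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy web: 20 obscure pages all link to one popular page,
# which in turn links to an "about" page and a "contact" page.
links = {f"tail{i}": ["popular"] for i in range(20)}
links["popular"] = ["about", "contact"]

ranks = pagerank(links)
mean = 1.0 / len(ranks)
below_mean = sum(1 for r in ranks.values() if r < mean)
print(max(ranks, key=ranks.get))                                   # popular
print(f"{below_mean} of {len(ranks)} pages score below the mean")  # 20 of 23 pages ...
```

Even in this tiny graph, the pages nobody links to all sit below the average score, which is the same lopsided shape the mozRank graph describes at web scale.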
But what does this say about the pages we’re keeping? First off, let’s set aside the pages we saw over a year ago (as we’ve seen above, likely fewer than a fifth of them remain on the web). In just the past 12 months, we’ve seen between 500 billion and well over 1 trillion pages, depending on how you count (via Danny at Search Engine Land).

So in just a year, we’ve provided 500 billion unique URLs through Linkscape and the Linkscape-powered tools (Competitive Link Finder, Visualization, Backlink Analysis, etc.). What’s more, this represents less than half of the URLs we’ve seen in total, since the “scrubbing” we do for each index cuts approx. 50% of the “junk” (through canonicalization, de-duping, and outright tossing for spam and other reasons). There are likely many trillions of URLs out there, but the engines (and Linkscape) certainly don’t want anything close to all of them in an index.
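The “scrubbing” step can be illustrated with a small sketch: normalize URLs so trivial variants collapse into one, then drop pages whose content has already been kept. This is not Linkscape’s actual pipeline. The session-parameter list and function names are hypothetical, exact SHA-1 hashing stands in for whatever near-duplicate detection a production index would use, and spam filtering is not shown.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session-style query parameters to strip during canonicalization.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}
DEFAULT_PORTS = {("http", 80), ("https", 443)}


def canonicalize(url: str) -> str:
    """Collapse trivially different URL variants into one canonical form."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    netloc = host if port is None or (scheme, port) in DEFAULT_PORTS else f"{host}:{port}"
    # Drop session parameters and sort the rest so parameter order doesn't matter.
    query = urlencode(sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ))
    # An empty path becomes "/", and the #fragment is discarded.
    return urlunsplit((scheme, netloc, parts.path or "/", query, ""))


def scrub(pages: dict[str, str]) -> dict[str, str]:
    """Keep one canonical URL per distinct page body (exact-match de-duping)."""
    kept: dict[str, str] = {}
    seen_hashes: set[str] = set()
    for url, body in pages.items():
        canonical = canonicalize(url)
        digest = hashlib.sha1(body.encode("utf-8")).hexdigest()
        if canonical in kept or digest in seen_hashes:
            continue  # a URL variant, or a copy of content we already kept
        kept[canonical] = body
        seen_hashes.add(digest)
    return kept


pages = {
    "HTTP://Example.com:80/shoes?sid=abc&color=red#top": "<html>red shoes</html>",
    "http://example.com/shoes?color=red": "<html>red shoes</html>",
    "http://mirror.example.net/shoes": "<html>red shoes</html>",
    "http://example.com/boots": "<html>boots</html>",
}
print(list(scrub(pages)))
# ['http://example.com/shoes?color=red', 'http://example.com/boots']
```

The first two URLs collapse through canonicalization alone; the mirror is only caught by the content hash, which is why both steps are needed.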
Linkscape’s December Index Update:
From this latest index (compiled over approx. the last 30 days) we’ve included:

47,652,586,788 unique URLs (47.6 billion)
223,007,523 subdomains (223 million)
58,587,013 root domains (58.6 million)
547,465,598,586 links (547 billion)
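Dividing the December figures by one another gives a quick sense of scale. These are simple averages computed from the numbers listed above; they say nothing about how unevenly links and pages are spread across sites.

```python
# Figures from the December 2009 Linkscape update listed above.
unique_urls = 47_652_586_788
subdomains = 223_007_523
root_domains = 58_587_013
links = 547_465_598_586

links_per_url = links / unique_urls
urls_per_root_domain = unique_urls / root_domains
subdomains_per_root_domain = subdomains / root_domains

print(f"links per URL:              {links_per_url:.1f}")               # 11.5
print(f"URLs per root domain:       {urls_per_root_domain:.0f}")        # 813
print(f"subdomains per root domain: {subdomains_per_root_domain:.1f}")  # 3.8
```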

We’ve checked that all of these URLs and links existed within the last month or so.  And I call out this notion of “verified” because we believe that’s what matters for a lot of reasons:

Our own research on how search engines rank documents
Your impact on the web (as in traditional marketing) and ability to compare progress over time
Sharing reliable, trustworthy data with customers, both for self and competitive analysis
Measuring progress and areas for improvement in search acquisition and SEO

I hope you’ll agree. Or, at least, share your thoughts 🙂
New Updates to the Free & Paid Versions of our API
I also want to give a shout-out to Sarah, who’s been hard at work repackaging our site intelligence API suite. She’s got all kinds of great stuff planned for early in the coming year, including tons of data in our free APIs. Plus, she’s dropped the prices on our paid suite by nearly 90%.
Both of these items are great news for some of our many partners, including:

Buzzstream – a tool for social media, PR and link management
Brandwatch – a reputation monitoring tool
Grader.com – Hubspot’s popular site analysis tool
Quirk’s Search Status Bar
And at least three of these top “10 Link Building Tools for Tracking Inbound Links”

Thanks to these partners, we’ve doubled the traffic to our APIs to over 4 million hits per day, more than half of which come from external partners! We’re really excited to be working with so many of you.
