Linkscape’s September Update + Feedback – Moz

By: Rand Fishkin
September 17, 2011

Linkscape’s September Update + Feedback

Moz News

The author’s views are entirely his or her own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

Last week, we launched a new Linkscape update with data crawled and indexed in August. Several folks noticed some significant changes in this index, particularly in link counts and some PA/DA metrics. I wanted to take some time in this post to talk about Linkscape’s data, our process, some of the challenges we’re facing and what you can expect to see with the index over the next several months. Before we do that, here are the stats on the latest update:

- 45,200,112,724 (45.2 billion) URLs
- 425,981,698 (425 million) Subdomains
- 98,785,848 (98.7 million) Root Domains
- 373,046,145,690 (373 billion) Links
- Followed vs. Nofollowed
  - 2.22% of all links found were nofollowed
  - 58.7% of nofollowed links are internal, 41.3% are external
- Rel Canonical – 10.12% of all pages now employ a rel=canonical tag
- The average page has 80.08 links on it
  - 66.71 internal links on average
  - 13.37 external links on average

If you’ve been paying close attention to the stats on the Linkscape index updates, you might have observed that for the past year, domain diversity (the quantity of root domains in the index) and overall size (the number of unique URLs) appear to have an inverse relationship. When we have larger indices, we crawl fewer domains, and when we crawl more domains, we tend to have fewer pages from them. Here’s a graphical comparison starting in August of last year:

[Chart: unique URLs vs. root domains for each index update]

As you can see, when we’ve crawled a larger number of unique domains, we’ve crawled fewer individual URLs. This has long been a frustration and an artifact of some of the systems that we’ve used to build the service. In April of this year, we began testing a new system for crawling that we hope will enable us to reach both depth and breadth, but there are a lot of complex, hard-to-build steps we need to take first to scale processing, fix bugs and streamline Linkscape’s architecture.
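As a quick sanity check on the index statistics listed above, here is a minimal Python sketch that derives a few figures from them. The input numbers are copied from the list; the function name and the derived quantities (absolute nofollow counts, URLs per root domain) are my own illustration and not figures reported by the Linkscape team.

```python
# Figures copied from the index statistics listed in this post.
TOTAL_URLS = 45_200_112_724
ROOT_DOMAINS = 98_785_848
TOTAL_LINKS = 373_046_145_690
NOFOLLOW_SHARE = 0.0222            # 2.22% of all links were nofollowed
NOFOLLOW_INTERNAL_SHARE = 0.587    # 58.7% of nofollowed links are internal
AVG_INTERNAL_LINKS = 66.71
AVG_EXTERNAL_LINKS = 13.37


def derived_stats():
    """Derive a few illustrative figures (not reported in the post itself)."""
    nofollowed = TOTAL_LINKS * NOFOLLOW_SHARE
    return {
        # Internal + external averages should add up to the per-page average.
        "avg_links_per_page": round(AVG_INTERNAL_LINKS + AVG_EXTERNAL_LINKS, 2),
        "nofollowed_links": nofollowed,
        "nofollowed_internal": nofollowed * NOFOLLOW_INTERNAL_SHARE,
        "nofollowed_external": nofollowed * (1 - NOFOLLOW_INTERNAL_SHARE),
        "urls_per_root_domain": TOTAL_URLS / ROOT_DOMAINS,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:,.2f}")
```

The internal and external averages sum to the reported 80.08 links per page, the nofollow share works out to roughly 8.28 billion links, and the index holds on the order of 458 URLs per root domain.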
Our VP Engineering, Kate, recently addressed this in a Q+A on the topic:

Hi everyone!

I just wanted to add a quick response to shed a bit more light on the situation. Last year we started on a project to drastically improve our index. The first part of that was to make our crawler discover more of the web – this included crawling deeper on domains, discovering more links faster (freshness), and containing more links overall.

Background

To understand the changes, it might help if I explain how our crawler used to work and how we changed.

Our crawler used to crawl the web (for 3-4 weeks), then we would compute the link graph and create all the lists of links and metrics you see in Open Site Explorer – this is what we called processing (and it would take 2-3 weeks). As part of processing we would select the top 10 billion URLs to crawl, and then start crawling those.

The problem with this system was that the data could be 7-8 weeks old (crawling time + processing + deployment to the API and OSE). It also wasn’t recursive – meaning that we would only discover new links when we did the processing of that crawl, so it could take us several months before we would see new links that were deeper in domains.

The Changes

We modified our crawler so we were crawling all the time – we crawl sites every day, or week, or month – based on authority. As we crawl those sites, any new links that we find are added to one of the buckets, and will typically be crawled within that same index. This is exciting because we can go deeper, discover more links, and produce a higher quality index. The other benefit is that since we are crawling all the time, we can just take a snapshot of that crawl and run processing – without waiting for the last round of processing to finish – and this means we can update the index more often.

However, in June, we had a problem with the old crawlers, and we had to roll out our new version of the crawl and index with the OSE launch on July 27th.
So even though our testing looked good when we released the new index, and correlations were higher than the old crawl, we got complaints about things that were wrong.

The Issues

Binary files were in the index – There are normally only supposed to be links in the index, but because the new crawler went very deep on some domains we started discovering all sorts of binary files, which, when parsed, produced lots of weird links. So domains had all these links from sites that didn’t link to them. We fixed this issue, and this is the first index with the fix.

We went too deep on big domains – There are a lot of knobs to turn on the new crawlers – from the number of sites we crawl daily/weekly/monthly to how many links we keep for different domains. One of the first things we noticed with this new crawl was that we had fewer domains in our index. So we dialed down how many URLs could come from a domain – and this new index also contains that change.

What We Are Doing

We recognize that all of you depend on this data. And we take the index quality very seriously.

We have already made a lot of other changes, increasing the overall size and adjusting how we crawl. However, since it still takes 2-4 weeks to process an index, some of those changes won’t be seen for another 2-4 weeks yet.

We are also working on an updated, higher correlating Page Authority/Domain Authority that should be out in a month or two – but also may jump around a bit.

What You Can Do

Definitely keep sending us feedback. It really helps us understand where we may have missed in our testing, and what we can do to fix it. And thanks again for your patience – we really want to deliver the best possible Linkscape for you, and I assure you the team is working nights and weekends to address these concerns.
And if anyone has questions you can always email me or our help team (which tends to respond to emails much faster), as all of us care a lot and really want to hear your feedback.

Thanks again,
Kate

On Friday night, I stayed late at the office with a number of folks from the Linkscape team.

[Photo: the Linkscape team during their morning standup. Clockwise from Martin in the center: Alec, Phil, Brandon, Carin, Matt and Walt]

There are big, tough problems around building a web index, particularly on a budget like ours vs. those of Google or Bing. We brainstormed a lot of ideas, but the big challenge comes down to this: Any change we make today won’t be observable for at least 5-6 weeks, making for a very slow iteration process. In software engineering, the faster your iterations and the faster you know the impact of your changes, the faster you can improve. Linkscape is not providing a fast feedback loop today, and we know we need to address that before we invest tons of effort in improvements that “might” have a positive impact.

I can promise, however, that the team of engineers working on this are among the smartest, most capable, diligent and passionate people I’ve ever worked with or met. We know there’s going to be 3-4 more months of hard slogging and indices of only moderately improved quality before we reach the levels we really want (our internal goal is 100 billion URLs in an index while maintaining domain diversity above 110 million root domains). You can definitely help us by providing feedback when you think we’ve missed an important site or page, when metrics look out of whack or when something goes awry in OSE, the mozBar or your web app campaigns. We really appreciate your patience while we improve and your support for the Linkscape dataset.
The team can tell you that I take our struggles personally and hard, but I’m incredibly bullish on what we’ll be producing by the end of the year.

What to Expect in the Next 3 Months

- We’ll have a new index out in just 7-10 days that further addresses some bugs (and has some more freshly crawled pages, too).
- Index sizes – look for between 44-55 billion URLs, probably not achieving much over that until December, possibly later.
- Domain diversity – look for 100 million+ starting in the next index, and likely maintaining near that or above for future indices.
- Index updates may slip past 4-5 weeks as we try to make more fixes ahead of a new crawl or processing cycle (we’ll keep the Linkscape calendar updated to make this a transparent process).
- We’re releasing a new version of PA + DA that is likely to be much better correlated with Google rankings (giving a superior metric to judge the ranking potential of sites/pages). This might, however, result in some sites + pages rising or falling dramatically. My best advice here is to use your competitors and industry cohorts as a bar for comparison rather than just looking at the raw numbers over time (since the metric itself is changing, a “40” in October might not mean what a “40” means today).

Looking forward to hearing from you – the engineering team, along with myself and Kate, will be paying close attention to the comments on the thread and to any private feedback or emails to help@seomoz.org on this topic as well. Thanks again – it’s an honor to have such a great community of folks paying careful attention and deriving value from our products. We promise to live up to the high expectations you’ve got for us.
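For readers who want a concrete picture of the crawl scheduling Kate describes (recrawl frequency chosen by authority, newly discovered links fed back into the buckets, and a cap on how many URLs a single domain may contribute), here is a rough Python sketch. Everything specific in it is an assumption made for illustration: the authority thresholds, the cap value, and the names `recrawl_bucket` and `Frontier` do not come from the post, and the real Linkscape crawler is certainly more involved.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical knobs; the post gives no actual thresholds or cap values.
DAILY_MIN_AUTHORITY = 70
WEEKLY_MIN_AUTHORITY = 40
MAX_URLS_PER_DOMAIN = 3


def recrawl_bucket(authority):
    """Pick a recrawl interval from an authority score (assumed 0-100)."""
    if authority >= DAILY_MIN_AUTHORITY:
        return "daily"
    if authority >= WEEKLY_MIN_AUTHORITY:
        return "weekly"
    return "monthly"


class Frontier:
    """Toy crawl frontier: authority-based buckets plus a per-domain URL cap."""

    def __init__(self, max_urls_per_domain=MAX_URLS_PER_DOMAIN):
        self.max_urls_per_domain = max_urls_per_domain
        self.buckets = {"daily": [], "weekly": [], "monthly": []}
        self.seen = set()
        self.per_domain = defaultdict(int)

    def add(self, url, authority):
        """Queue a newly discovered URL; return False if it is rejected."""
        # Simplification: the hostname stands in for the root domain.
        domain = urlparse(url).netloc.lower()
        if not domain or url in self.seen:
            return False
        if self.per_domain[domain] >= self.max_urls_per_domain:
            return False  # cap reached, so one big site cannot crowd out others
        self.seen.add(url)
        self.per_domain[domain] += 1
        self.buckets[recrawl_bucket(authority)].append(url)
        return True


if __name__ == "__main__":
    frontier = Frontier(max_urls_per_domain=2)
    frontier.add("http://example.com/a", 80)
    frontier.add("http://example.com/b", 10)
    frontier.add("http://example.com/c", 50)  # rejected by the domain cap
    frontier.add("http://other.org/", 50)
    print(frontier.buckets)
```

Lowering the per-domain cap in a sketch like this trades depth on large sites for room to include more distinct domains, which is the depth-versus-breadth tension discussed throughout the post.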
