In my first weeks at Atlassian, I was looking for a quick win to get a stronger foothold within the company and evangelize SEO. So, I audited our main site and noticed it didn’t have an XML sitemap. What an easy win!
I went to the devs and asked them to activate it in the CMS. To my surprise, they told me that it wasn’t possible; I was baffled.
After some thinking, I remembered that Screaming Frog had an XML sitemap function, so I crawled the site and uploaded the crawl export as an XML sitemap. Google ate it within a few seconds, and we saw a noticeable impact on our traffic in the following days.
The moral of the story is that XML sitemaps are important and sometimes underrated.
Here is everything I’m going to cover in this article:
- What XML Sitemaps Are and Why You Need to Have One
- HTML vs. XML Sitemaps
- Different Types of XML Sitemaps
- XML Sitemap Minimum Requirements
- XML Sitemap Tips for Large Sites
- XML Sitemap Best and Worst Practices
- XML Sitemap Tools and Generators
What XML Sitemaps Are and Why You Need to Have One
XML sitemaps are digital maps that help Google discover important pages on your site and how often they are being updated.
Google states on its help center page:
A sitemap tells the crawler which files you think are important in your site, and also provides valuable information about these files: for example, for pages, when the page was last updated, how often the page is changed, and any alternate language versions of a page.
According to Gary Illyes, XML sitemaps are the second most important source of URLs to be crawled by Googlebot after hyperlinks and previously discovered URLs. That’s massive and shouldn’t be underestimated!
Sitemaps are the second Discovery option most relevant for Googlebot @methode #SOB2019
Enrique Hidalgo (@EnriqueStinson) June 15, 2019
Google started using XML sitemaps in 2005 and shortly after was joined by search engines like MSN or Yahoo. Nowadays, they use them for even more than just URL discovery.
Every website should have an XML sitemap. They are especially important for:
- Large sites
- New sites
- Sites with lots of orphaned pages
- Sites that use lots of images and videos
Whereas the robots.txt file helps you exclude parts of your site from being crawled by search engines, XML sitemaps do the opposite. They help search engines discover new pages, even when they are not linked from the main site.
Sitemaps come in XML format that Google can quickly parse to find new URLs. XML — eXtensible Markup Language — is lightweight and portable between devices and was made to store data.
The easiest way for you to check if your site has a sitemap is to look in Google Search Console or in Bing Webmaster Tools under “sitemaps.” Most search engines, such as Google or Bing, look for the “Sitemap: <sitemap_location>” entry (or entries) in your site’s robots.txt file. Alternatively, you can also ping your sitemap directly to Google, Baidu, Bing, and Yandex.
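If you want to check the robots.txt reference yourself, here is a minimal Python sketch that pulls the “Sitemap:” entries out of a robots.txt body with the standard library. The robots.txt content, example.com, and the file paths are placeholders I made up for illustration, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; example.com and the paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
"""


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs of all 'Sitemap:' entries in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap entry is present.
    return parser.site_maps() or []


sitemap_urls = sitemaps_from_robots(ROBOTS_TXT)
print(sitemap_urls)
```

An empty list means the robots.txt does not reference a sitemap, which is a good hint that one is either missing or was only submitted through the webmaster tools.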
XML sitemaps in the Bing Webmaster Tools. 1: Sitemaps report. 2: Adding new sitemap paths. 3: existing sitemaps Bing found.
XML sitemaps in Google Search Console. 1: Sitemaps report. 2: Adding new sitemap paths. 3: existing sitemaps Google found.
HTML vs. XML Sitemaps
There are two types of sitemaps: HTML and XML. What is the difference?
1. You will notice the format.
HTML is obviously different from XML. But that implies even more: while HTML sitemaps are visible to site users, XML sitemaps are feeds for search engines.
You could argue that HTML sitemaps are also created for search engines. The difference is that HTML sitemaps can be valuable to users, while XML sitemaps cannot.
2. They serve the same purpose but in different ways.
Both help search engines discover new URLs, whether pages, videos, or images.
XML sitemaps are custom feeds that help search engines understand the priority of URLs to crawl, how often they change, and which new ones were added to the site. That is especially helpful for search engine schedulers because they can better estimate when and how often to recrawl a URL.
HTML sitemaps also help search engines discover new URLs but through the discovery of links they follow. That means HTML sitemaps can only be an effective URL discovery tool if they are being crawled and if the links are followed. You can understand this by looking at your log files.
3. They have different side-benefits.
XML sitemaps have meta-attributes like <changefreq> or <lastmod> to indicate how the state of a URL changes. They can also carry extensions for videos, images, and news.
HTML sitemaps distribute PageRank throughout a site, and that is what they are nowadays mainly used for, aside from the navigational value for users. Since HTML sitemaps are often linked in the footer of a site, they are usually linked from every page and might distribute that incoming PageRank to other pages with weaker internal linking.
Different Types of XML Sitemaps
Even though sitemaps can also be submitted in RSS, mRSS, Atom 1.0, or text format, the “type” of a sitemap refers to its content or “media type”:
- News
- Video
- Image
As I will further specify below, you can create sitemaps that contain only one specific media type or integrate them into your regular XML sitemap.
XML Sitemap Minimum Requirements
For your XML sitemaps to work optimally, you have to meet the standards. An XML sitemap should:
- Contain only canonical URLs with a 200 status code.
- Include up to 50,000 URLs per sitemap and up to 50,000 sitemaps per index sitemap.
- Be referenced in the robots.txt.
- Be UTF-8 encoded.
- Be compressed in .gz format.
- Be no larger than 50 MB uncompressed and contain no more than 50,000 URLs (whichever limit you hit first).
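To make these requirements a bit more tangible, here is a rough Python sketch that writes a UTF-8 encoded sitemap, enforces the 50,000 URL and 50 MB limits, and compresses the result. The URLs and dates are placeholders, `build_sitemap` is a helper name I made up, and I am assuming the 50 MB limit applies to the uncompressed file.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file (assumption)


def build_sitemap(entries):
    """Serialize (url, lastmod) pairs into UTF-8 encoded sitemap XML bytes."""
    if len(entries) > MAX_URLS:
        raise ValueError(f"{len(entries)} URLs exceed the limit of {MAX_URLS}")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    xml_bytes = ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
    if len(xml_bytes) > MAX_BYTES:
        raise ValueError("sitemap is larger than 50 MB uncompressed")
    return xml_bytes


# Placeholder URLs and dates; only canonical, 200-status pages belong here.
entries = [
    ("https://example.com/", "2019-06-15"),
    ("https://example.com/pricing", "2019-06-01"),
]
sitemap_xml = build_sitemap(entries)
sitemap_gz = gzip.compress(sitemap_xml)  # bytes for a sitemap.xml.gz file
```

The sketch does not check status codes or canonicals. That part still has to come from a crawl of your site.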
But there is more you can and should do to get the most out of XML sitemaps. You can signal to Google which URLs are important by including only important pages in your XML sitemaps and by updating them often.
Most CMSs have a function to automatically update sitemaps when a new URL is created or an existing page changes. For Google, the update frequency of the sitemap itself and the lastmod tag of pages can be a signal of freshness. Whether that is important for its ranking depends on the page and the context.
Think of an XML sitemap as a city map for tourists with the city being your website and the tourist being Google — you want to make sure to only include the important buildings, not every address. That is why unimportant pages shouldn’t be included; examples are pages like your privacy policy or about us page. While these pages should be indexed, they don’t need to be crawled often and are not important when we are talking about SEO.
XML Sitemap Tips for Large Sites
There is more you can do to elevate your sitemap game, beyond meeting the standard requirements.
Large sites like news publishers, for example, should make use of index sitemaps, which contain up to 50,000 normal sitemaps and should also be no larger than 50 MB. They are like the XML sitemap mothership that carries lots of smaller sitemaps. Large sites need them because their URLs can’t fit into a single sitemap. You shouldn’t try to fit everything into a single sitemap, anyway.
You can make the most out of these sitemaps by structuring them either per page type or topic. In practice, you would create dedicated XML sitemaps per subdirectory or page template to get an understanding of technical and indexing problems with your site.
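As an illustration of the split per subdirectory, the following Python sketch groups URLs by their first path segment, cuts each group into sitemaps of at most 50,000 URLs, and lists the resulting files in an index sitemap. The `/sitemaps/<section>-<n>.xml` naming scheme, the helper names, and the example URLs are my own assumptions, not a standard.

```python
from collections import defaultdict
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def plan_sitemaps(urls, base_url, max_urls=MAX_URLS_PER_SITEMAP):
    """Map a (hypothetical) sitemap URL to the page URLs it should contain."""
    sections = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Pages directly under the root share one "root" sitemap.
        section = segments[0] if len(segments) > 1 else "root"
        sections[section].append(url)

    plan = {}
    for section, pages in sections.items():
        for start in range(0, len(pages), max_urls):
            number = start // max_urls + 1
            sitemap_url = f"{base_url}/sitemaps/{section}-{number}.xml"
            plan[sitemap_url] = pages[start:start + max_urls]
    return plan


def build_index(sitemap_urls):
    """Serialize an index sitemap that references the given sitemap URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


urls = [
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/blog/c",
    "https://example.com/products/x",
    "https://example.com/pricing",
]
# max_urls=2 only to make the chunking visible with five URLs.
plan = plan_sitemaps(urls, "https://example.com", max_urls=2)
index_xml = build_index(list(plan))
```

With one sitemap per section, the indexing numbers per sitemap in Google Search Console tell you which part of the site has a problem.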
There are specialized XML sitemaps for specific purposes. Sites that operate heavily around rich media (think: Pinterest or YouTube) benefit a lot from image or video sitemaps. Publishers should have news sitemaps.
Image sitemaps increase your site’s chance to be found in Google image search. You don’t have to have a dedicated image sitemap; you can also use image extensions in your regular sitemap.
This is what image extensions look like (XML specifications):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>http://example.com/sample.html</loc>
<image:image>
<image:loc>http://example.com/image.jpg</image:loc>
</image:image>
<image:image>
<image:loc>http://example.com/photo.jpg</image:loc>
</image:image>
</url>
</urlset>
Video sitemaps work on the same principle: either create a dedicated sitemap or add extensions to your regular one:
<url>
<loc>https://example.com/mypage</loc>
<video:video> ... information about video 1 ... </video:video>
</url>
But be careful with the meta-data you add to video sitemaps or extensions.
Google states, “Google might use text on the video landing page rather than the text you supply in your sitemap if the page text is deemed more useful than the information in the sitemap.” They are speaking about the text delivered through the description. Besides a description, you can feed Google a thumbnail, video length, rating, family-friendliness, and more (full list of video XML sitemap meta-data). For sites that heavily use video, this certainly makes sense. For all others, it is optional.
News sitemaps are different in that you should always have a separate news XML sitemap. Google doesn’t recommend (or offer) extensions in this case. News sitemaps help Google discover and rank new articles, which is especially challenging in the publishing industry because it produces a lot of content. Even though Google states that publishers with news sitemaps are not favored, it does help to get hot news ranking in Google News faster.
News sitemaps have special requirements:
- Include articles not older than 2 days.
- Don’t add more than 1,000 new entries to an existing sitemap at a time.
- Update existing sitemaps for article updates.
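A simple way to respect the first two requirements is to filter articles before they go into the news sitemap. The Python sketch below keeps only articles from the last two days that are not listed yet and caps each batch at 1,000 entries. The function name, the data shape (URL plus timezone-aware publication date), and the example articles are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=2)
MAX_NEW_ENTRIES = 1000


def next_news_batch(articles, already_listed, now):
    """Pick the articles to add to the news sitemap in one update.

    articles: iterable of (url, published_at) with timezone-aware datetimes.
    already_listed: set of URLs that are in the news sitemap already.
    """
    cutoff = now - MAX_AGE
    fresh = [
        (url, published)
        for url, published in articles
        if published >= cutoff and url not in already_listed
    ]
    # Newest first, so the cap drops the oldest candidates.
    fresh.sort(key=lambda item: item[1], reverse=True)
    return fresh[:MAX_NEW_ENTRIES]


now = datetime(2019, 6, 15, 12, 0, tzinfo=timezone.utc)
articles = [
    ("https://example.com/news/a", now - timedelta(hours=1)),
    ("https://example.com/news/b", now - timedelta(days=3)),    # too old
    ("https://example.com/news/c", now - timedelta(hours=5)),   # already listed
    ("https://example.com/news/d", now - timedelta(hours=30)),
]
batch = next_news_batch(articles, {"https://example.com/news/c"}, now)
```

Removing entries that have aged out of the two-day window is a separate step that this sketch leaves out.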
You can also use XML sitemaps to define and indicate certain meta-tags for Google. One example is hreflang, which you can add as an extension to a sitemap (full guidelines):
<url>
<loc>http://www.example.com/english/page.html</loc>
<xhtml:link
rel="alternate"
hreflang="de"
href="http://www.example.com/deutsch/page.html"/>
<xhtml:link
rel="alternate"
hreflang="de-ch"
href="http://www.example.com/schweiz-deutsch/page.html"/>
<xhtml:link
rel="alternate"
hreflang="en"
href="http://www.example.com/english/page.html"/>
</url>
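Writing these entries by hand gets error-prone quickly, so here is a small Python sketch that generates them. It follows my reading of Google’s guidelines: every language version gets its own url entry, and each entry lists all alternates, including itself. The function name is made up, and the URLs are the placeholders from the snippet above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("xhtml", XHTML_NS)  # serialize links as <xhtml:link>


def build_hreflang_sitemap(alternates):
    """alternates maps hreflang codes to the URLs of the same page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in alternates.values():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
        # Each version lists every alternate, including itself.
        for lang, href in alternates.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                rel="alternate",
                hreflang=lang,
                href=href,
            )
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


alternates = {
    "en": "http://www.example.com/english/page.html",
    "de": "http://www.example.com/deutsch/page.html",
    "de-ch": "http://www.example.com/schweiz-deutsch/page.html",
}
hreflang_xml = build_hreflang_sitemap(alternates)
print(hreflang_xml.decode("utf-8"))
```

The serializer declares the xhtml namespace on the urlset element for you, which is easy to forget when you assemble the file manually.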
Google ignores the priority attribute in XML sitemaps but does pay attention to lastmod, according to John Mueller. Google determines the priority of your pages itself, probably by popularity and authority. Lastmod, however, is a tag that indicates when the URL last changed, which is really interesting to Google.
The URL + last modification date is what we care about for websearch.
John (@JohnMu) August 17, 2017
Also, you don’t need to add XML sitemaps for AMP URLs, according to John Mueller.
@Kfowler325 No need for sitemaps for AMP pages — the rel=amphtml link is enough for us.
John (@JohnMu) October 13, 2016
XML Sitemap Best and Worst Practices
At Atlassian, we solved the missing XML sitemap functionality of our CMS with a third-party XML sitemap provider, and it worked just fine.
Even though the format is text-based instead of XML, it works.
The New York Times references its sitemaps in the robots.txt and separates formats like videos or news. It goes even a step further and has sitemaps for specific categories, such as cooking or elections.
As a publisher, it makes sense to have dedicated XML sitemaps for timely events because you need to understand how fast Google picks up the content and whether everything can be indexed without problems.
Walmart has a similar split by categories that makes a lot of sense for ecommerce sites. It has Master XML sitemaps for topics and categories.
As you can see in the screenshot below, the topic split allows Walmart to see how Google indexes different areas of the site like fashion or entertainment.
If you have a site that is split into topics, categories, or both, creating specific XML sitemaps for each is recommended. There is no known disadvantage of having the same URLs in different sitemaps.
Semrush Tip: With the Semrush Site Audit tool, you can audit any website and check for six specific issues related to XML sitemaps. The tool will first check whether a sitemap.xml is present, and then it will look for formatting errors, incorrect pages in the sitemap, and other issues that could be impacting the clarity of your sitemap.
XML Sitemap Tools and Generators
Most content management systems come with prepackaged functions that allow you to create an XML sitemap automatically. But some don’t, and in this case, you need a third-party tool.
These are my personal picks for XML sitemap generators.
| Name | Price | Limit | Features | Free trial |
|  | $8.99/month | n/a |  | 30 days |
|  | $40/month | 200K URLs per crawl |  | 14 days |
|  | $14.99/month | n/a |  | 3 sitemaps free |
|  | Free up to 500 URLs; £149.00/year | n/a |  | n/a |
|  | $49/month | 15K pages |  | 30 days |
|  | $4.99 for 1K pages; $189.99 for 1.5m pages | 1.5m pages |  | Free for 500 pages |
WordPress Plugins
| Name | Price | Limit | Features | Ratings |
|  | free | n/a |  | 4.4/5 (33 reviews) |
|  | free | n/a |  | 4.3/5 (112 reviews) |
|  | free | n/a |  | 4.9/5 (2,090 reviews) |
|  | free (premium available) | n/a |  | 4.9/5 (26,745 reviews) |
|  | free | n/a |  | n/a |
|  | free | n/a |  | 4.4/5 (449 reviews) |
|  | free | n/a |  | 4.4/5 (59 reviews) |