Sitemaps are easy when you’re running a small WordPress site. But when you have hundreds of thousands of pages, sitemaps become their own project. Here’s what you need to know about building sitemaps for very large websites.
Q: How big can my Sitemap be?
Sitemaps should be no larger than 10MB (10,485,760 bytes) and can contain a maximum of 50,000 URLs. These limits help to ensure that your web server does not get bogged down serving very large files. This means that if your site contains more than 50,000 URLs or your Sitemap is bigger than 10MB, you must create multiple Sitemap files and use a Sitemap index file. You should use a Sitemap index file even if you have a small site but plan on growing beyond 50,000 URLs or a file size of 10MB. A Sitemap index file can include up to 1,000 Sitemaps and must not exceed 10MB (10,485,760 bytes). You can also use gzip to compress your Sitemaps; the size limit applies to the uncompressed file. The sitemaps.org protocol has since raised these limits to 50MB (52,428,800 bytes) per file and 50,000 Sitemaps per index file, while the 50,000-URL cap per Sitemap is unchanged.
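As a rough illustration of the splitting rule, here is a minimal Python sketch that cuts a long list of URLs into sitemap documents that each stay under a URL cap and a byte cap. The function name `split_into_sitemaps` and the constants are made up for this example; the defaults are the figures quoted in the answer above, so change them if you are targeting different limits.

```python
import gzip
from xml.sax.saxutils import escape

MAX_URLS = 50_000        # per-file URL cap quoted above
MAX_BYTES = 10_485_760   # per-file size cap quoted above (uncompressed); adjust as needed

HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
).encode("utf-8")
FOOTER = "</urlset>\n".encode("utf-8")


def split_into_sitemaps(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Yield sitemap XML documents (bytes), each within both limits."""
    overhead = len(HEADER) + len(FOOTER)
    entries, size = [], overhead
    for url in urls:
        # escape() turns &, < and > into XML entities, as the protocol requires.
        entry = f"  <url><loc>{escape(url)}</loc></url>\n".encode("utf-8")
        # Start a new file when adding this entry would break either limit.
        if entries and (len(entries) >= max_urls or size + len(entry) > max_bytes):
            yield HEADER + b"".join(entries) + FOOTER
            entries, size = [], overhead
        entries.append(entry)
        size += len(entry)
    if entries:
        yield HEADER + b"".join(entries) + FOOTER


# Hypothetical usage: 120,001 URLs end up in three files.
demo_urls = [f"https://example.com/page/{i}" for i in range(120_001)]
demo_docs = list(split_into_sitemaps(demo_urls))
demo_gzipped = [gzip.compress(doc) for doc in demo_docs]  # optional compression
```

Each document would then be written out under its own file name (for example `sitemap-1.xml.gz`) and listed in a Sitemap index file.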
Q: My site has tens of millions of URLs; can I somehow submit only those that have changed recently?
You can list the URLs that change frequently in a small number of Sitemaps and then use the lastmod tag in your Sitemap index file to identify those Sitemap files. Search engines can then incrementally crawl only the changed Sitemaps.
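To make the lastmod idea concrete, here is a small Python sketch that builds a Sitemap index in which every child Sitemap carries its own lastmod date. The helper name `build_sitemap_index` and the example file names are assumptions for illustration only.

```python
from datetime import date
from xml.sax.saxutils import escape


def build_sitemap_index(children):
    """Build a Sitemap index from (sitemap_url, last_modified_date) pairs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for loc, lastmod in children:
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(loc)}</loc>")
        # date.isoformat() gives YYYY-MM-DD, a valid W3C date for lastmod.
        lines.append(f"    <lastmod>{lastmod.isoformat()}</lastmod>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


# Hypothetical layout: one small, frequently regenerated file for changed
# URLs, plus large archive files whose lastmod rarely moves.
index_xml = build_sitemap_index([
    ("https://example.com/sitemap-recent.xml", date(2024, 1, 15)),
    ("https://example.com/sitemap-archive-1.xml", date(2023, 6, 1)),
])
```

Only the lastmod of `sitemap-recent.xml` needs to change when new or updated URLs are added to it, which is the signal a crawler can use to refetch just that file.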
Q: What data format must the sitemap be in?
Acceptable sitemap formats are XML, RSS, mRSS, Atom 1.0, and line-delimited text.
Source: Google Support
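The line-delimited text format is the simplest of these: one absolute URL per line, UTF-8 encoded, and nothing else in the file. A minimal Python sketch, with `to_text_sitemap` as a made-up helper name:

```python
def to_text_sitemap(urls):
    """Return a text sitemap as bytes: one URL per line, UTF-8 encoded."""
    # Drop blank entries and stray whitespace so every line holds exactly one URL.
    cleaned = [u.strip() for u in urls if u.strip()]
    return ("\n".join(cleaned) + "\n").encode("utf-8")


text_sitemap = to_text_sitemap([
    "https://example.com/",
    "  https://example.com/doll-shoes/  ",
    "",
])
```

The text format cannot carry lastmod or other metadata, so XML is the better fit when you want to signal which pages changed.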
Q: What are nested sitemaps and what is the benefit of using nested sitemaps?
In the example above, the asterisk next to the name indicates that the sitemap file is an index rather than a regular sitemap. Sitemap indices are sitemaps that point to other sitemaps. This is what makes your life easier and more structured. Loading each section of your site as a separate sitemap works, but it is rather annoying to page through in Webmaster Tools. If you use indices, you can drill down and see more detail in specific areas. Let me show you.
How sitemap indices help:
- Indicate where indexation issues are.
- Give you everything from an overview (the numbers for sitemap.xml) down to specific areas. Great for reports!
- Show the search engines what your site structure is supposed to be.
- Identify possible duplicate content. (Have a section for doll shoes and another for doll boots? They might cause duplicate content if they share products under different URLs.)
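One way to get per-section sitemaps is to group URLs by their first path segment and write one sitemap per group, all listed in a single index. The Python sketch below shows only the grouping step; `group_by_section` is a hypothetical helper, and treating the first path segment as the "section" is an assumption that may not match how your site is organized.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_section(urls):
    """Map each URL's first path segment (its section) to the URLs under it."""
    sections = defaultdict(list)
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        # URLs with no path (such as the home page) go into a "root" bucket.
        sections[segments[0] if segments else "root"].append(url)
    return dict(sections)


sections = group_by_section([
    "https://example.com/",
    "https://example.com/doll-shoes/red-pumps",
    "https://example.com/doll-boots/red-boots",
    "https://example.com/doll-shoes/blue-flats",
])
# Per-section URL counts, handy for comparing submitted vs. indexed numbers.
section_counts = {name: len(items) for name, items in sections.items()}
```

Each key could then become its own file, such as `sitemap-doll-shoes.xml`, so that indexation numbers can be read section by section.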