Posted: 6 January 2020
Crawl Budget is a phrase that’s regularly used by SEOs to describe the way Google crawls a website.
Google actually has its own term for this, and that’s ‘Crawl Rate Limit’. This term refers to a number of different factors that reflect how and why Google will crawl a website:
‘Googlebot is designed to be a good citizen of the web. Crawling is its main priority while making sure it doesn’t degrade the experience of users visiting the site. We call this the “crawl rate limit,” which limits the maximum fetching rate for a given site.’ - Google
In its own documentation, Google names two main factors that affect this crawl rate limit: crawl health (a site that responds quickly and without errors can be crawled more) and any crawl limit the site owner sets in Search Console.
Another factor they refer to is the ‘popularity’ of a website. If a website is in demand and the index deems there to be a need for fresh content, they will actively crawl the website more regularly in order to get the most up to date content for search results.
Even for a company of Google's scale, regularly crawling every website is an incredibly resource-heavy job. It therefore needs to prioritise websites that update regularly and are popular with users, and spend less time on websites that remain static for long periods.
As webmasters, particularly in the eCommerce space where websites are often much larger, we need to understand how Google views our site and whether it sees all of our content. Some pages are more important than others (e.g. the homepage and category pages), while others update more regularly (e.g. product pages and blog posts). So when Google revisits our site, it's important we give it clear instructions, so it spends the right amount of time on the pages that matter rather than digging deep into areas of the site architecture that provide less value to the business.
The easiest way to measure crawl rate is in Search Console, where the Crawl Stats report sits under 'Legacy tools and reports'.
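Beyond Search Console, your server access logs show exactly which URLs Googlebot requests and how often. As a rough sketch (assuming logs in the common combined format; the file contents and paths below are hypothetical examples), you could count Googlebot hits per URL like this:

```python
import re
from collections import Counter

# Matches the request path and user-agent string from a combined-format
# access log line. The log format is an assumption; adjust for your server.
LOG_LINE = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

def googlebot_hits(log_lines):
    """Count requests per path where the user agent identifies as Googlebot."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group(2):
            hits[match.group(1)] += 1
    return hits

# Two hypothetical log lines: one Googlebot visit, one regular user.
sample = [
    '66.249.66.1 - - [06/Jan/2020:10:00:00 +0000] "GET /category/shoes HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [06/Jan/2020:10:00:01 +0000] "GET /category/shoes HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
print(googlebot_hits(sample).most_common(5))
```

Run over a day or a week of logs, this gives a quick view of where Googlebot is actually spending its time, which you can compare against the pages you want it to prioritise.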
If Google is having trouble interpreting your website, one negative effect could be fewer updates of key ranking pages. It’s important we keep the crawl budget allocated for our site focussed on the pages that are a) important to us, and b) rank in search results.
If Googlebot is spending a lot of time on smaller pages lower in the hierarchy and missing key pages, this will have a negative impact on performance.
If you have an inventory of products that changes regularly, frequent blog output, or changes in site architecture, there is a danger that poor crawl rates could result in these changes being missed by Google.
For large websites, there's often a lot of change, and by making sure consistent rules are in place, we know that Google will have a set path through the site, preserving crawl budget for the areas that are important and update more regularly.
If Googlebot is spending a lot of time on your site because it believes the site is bigger than it actually is, this could result in wasted bandwidth or unnecessary server upgrades.
This issue is particularly prevalent when faceted navigation is set up incorrectly and new filters create new URLs that are more or less identical to the landing page in content and appearance. Sometimes this can be a quick fix with a canonical tag or a robots.txt rule, but the latter should be used as a last resort where possible.
It pays dividends to take care of your XML sitemap, as it gives the bots a much easier time understanding where your internal links lead.
Include only canonical URLs in your sitemap, and make sure it is consistent with the latest uploaded version of your robots.txt file, so you aren't listing URLs that robots.txt blocks.
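As a minimal sketch of that principle, the snippet below builds a sitemap from a duplicate-to-canonical mapping, so only the canonical version of each page is listed (the URLs are hypothetical examples):

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(canonical_urls):
    """Build a sitemap XML string containing one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=NS)
    for url in canonical_urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical mapping of each crawled URL to its canonical version;
# the parameterised URL and its clean equivalent collapse to one entry.
pages = {
    "https://mystore.com/shop?productId=7": "https://mystore.com/shop/productId7",
    "https://mystore.com/shop/productId7": "https://mystore.com/shop/productId7",
}
sitemap = build_sitemap(sorted(set(pages.values())))
print(sitemap)
```

Deduplicating through the canonical mapping before writing the file keeps the sitemap small and stops Google queueing the same content under several addresses.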
URL parameters that point back to the same page can eat up a significant amount of your crawl budget.
One way to combat this is to reduce or eliminate URL parameters. For example, an eCommerce store using the URL http://mystore.com/shop?productId=7 to display a product can change it to http://mystore.com/shop/productId7, removing at least one request parameter.
Otherwise, if you're using request parameters, be sure to tell Google how to handle them via the URL Parameters tool in Search Console so that Google doesn't index duplicate pages.
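If some parameters (sorting, session IDs, tracking tags) don't change the page content, you can also collapse them yourself when generating internal links or canonical tags. A minimal sketch, assuming the parameter names `sort`, `sessionid` and `utm_source` are the content-neutral ones on your site:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of parameters that never change what the page displays.
IGNORED_PARAMS = {"sort", "sessionid", "utm_source"}

def canonicalise(url):
    """Drop content-neutral query parameters so duplicate URLs collapse."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(canonicalise("https://mystore.com/shop?productId=7&sort=price&utm_source=news"))
# Only productId survives: https://mystore.com/shop?productId=7
```

Linking internally to the canonicalised form means Googlebot sees one URL per page instead of one per filter combination.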
If you have a lot of broken links and 404 pages on your site, you’ll need to reduce those to maximise your crawl budget.
By fixing broken links and redirecting links using tools like Screaming Frog, alongside Google Search Console and Bing Webmaster Tools, you can recover wasted crawl budget and improve a user’s experience.
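A crawler export gives you the raw material for this: a link graph and a list of pages that resolved successfully. As a minimal sketch with hypothetical page names (in practice a tool like Screaming Frog produces this data), flagging internal links that point at missing pages is a set comparison:

```python
# Hypothetical crawl snapshot: which pages link where, and which pages
# actually returned a 200 response.
link_graph = {
    "/": ["/category/shoes", "/blog/old-post"],
    "/category/shoes": ["/product/trainer-7"],
}
live_pages = {"/", "/category/shoes", "/product/trainer-7"}

# Every (source, target) pair where the target no longer exists.
broken = [
    (source, target)
    for source, targets in link_graph.items()
    for target in targets
    if target not in live_pages
]
print(broken)  # [('/', '/blog/old-post')]
```

Each pair tells you both the dead URL and the page whose link needs updating or redirecting.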
Every time one of the pages on your site redirects to another page (with a 301 or 302 redirect), it uses a small part of your crawl budget. If you have a lot of redirects, your crawl budget could get depleted before the Googlebot crawls the page you want to be indexed.
To ensure redirects are effective, Google suggests minimising any unnecessary redirects by never linking to a page that you know has a redirect on it and by never requiring more than one redirect to get to any of your resources.
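Redirect chains (A redirects to B, which redirects to C) are the worst offenders, since each hop costs a fetch. A minimal sketch for spotting them, using a hypothetical redirect map exported from your server config or a crawl:

```python
# Hypothetical redirect map: old URL -> where it redirects.
redirects = {
    "/old-shoes": "/footwear",
    "/footwear": "/category/shoes",  # makes /old-shoes a two-hop chain
}

def final_destination(url, redirects, max_hops=10):
    """Follow redirects to the final URL, counting hops (capped to avoid loops)."""
    hops = 0
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
    return url, hops

for start in redirects:
    end, hops = final_destination(start, redirects)
    if hops > 1:
        print(f"{start} takes {hops} hops to reach {end}; redirect it straight to {end}")
```

Pointing every old URL directly at its final destination keeps each redirect to a single hop, in line with Google's advice above.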
When pages have high load times or they time out, search engines can visit fewer pages on your website within their allocated crawl budget. The ideal page load speed is under one second.
You can monitor your page load time and page timeouts within Google Search Console. It is also essential to consider the page load speed, specifically on mobile, as a significant number of users will be accessing your website via their mobile device.
The faster that your server responds to a page request, the more pages that the Googlebot will crawl.
When selecting a hosting provider, it would be wise to choose a host that responds quickly to server requests to optimise crawl rates.
You don’t want search engines to spend their time on duplicate content pages, so it’s important to make sure that your site’s pages are made up of unique, quality content. Configuring canonical tags can help inform Google of similar pages that exist on the site and where to find the original.
You can also minimise duplicate content in various other ways, including setting up redirects for all domain variants, making internal search result pages inaccessible to search engines using robots.txt, disabling dedicated image pages, and being vigilant with the use of taxonomies.
Aside from minimising any duplicate content, you should also ensure that pages with little content are kept to a minimum or avoided altogether as they are not attractive to search engines.
A typical example of low-quality content is a FAQ section with separate URLs provided for each question and answer.
Orphan pages are pages that have no internal or external links pointing to them. If your website contains a high number of orphan pages that are accessible to search engines, you are, in theory, keeping search engines busy sifting through irrelevant pages.
To get the most out of your crawl budget, make sure that there’s at least one internal or external link pointing to every page on your site.
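One way to audit this is to compare the pages you expect to be indexed (your sitemap) against the pages your internal links actually reach. A minimal sketch with hypothetical page names:

```python
# Hypothetical data: pages listed in the sitemap, and the internal link
# graph from a crawl of the site.
sitemap_pages = {"/", "/category/shoes", "/product/trainer-7", "/sale-2018"}
link_graph = {
    "/": ["/category/shoes"],
    "/category/shoes": ["/product/trainer-7"],
}

# Every page that at least one internal link points to.
linked_to = {target for targets in link_graph.values() for target in targets}

# Pages in the sitemap that nothing links to (the homepage is the crawl
# entry point, so it is excluded).
orphans = sitemap_pages - linked_to - {"/"}
print(sorted(orphans))  # ['/sale-2018']
```

Anything the check surfaces either needs an internal link added or, if the page is genuinely obsolete, removing from the sitemap altogether.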
In summary, if you were wondering whether crawl budget optimisation is still important for SEO performance, the answer is: it depends on the size of your website. For large websites, it's essential you optimise your relationship with Google.
We hope our advice on how to optimise crawl rates helps improve visibility, the number of users visiting your website, and in turn increase conversions.