It has long been understood that Google can crawl web pages up to 15MB in size. Google recently clarified in its updated help documentation that it will crawl the first 64MB of a PDF file and the first 2MB of other supported file types.
While the 64MB and 2MB limits may not be entirely new, the update documents behavior that hadn't been spelled out before. Earlier documentation focused mainly on Google's ability to process disavow files of up to 2MB, with little detail on other file types.
The updated documentation explains that during crawling, Googlebot retrieves the first 2MB of supported file types and the first 64MB of PDFs. For HTML pages, each referenced resource—such as CSS or JavaScript—is fetched separately, with each resource subject to the same size constraints, apart from PDFs. Once the size limit is reached, Googlebot stops downloading additional content and proceeds with indexing the portion retrieved so far. These size limits are based on uncompressed data. Other Google crawlers, like those for images or videos, may operate under different size restrictions.
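To make these limits concrete, here is a minimal Python sketch that checks whether a resource would be fetched in full, assuming the figures quoted above (64MB for PDFs, 2MB for other supported types); the helper names are hypothetical and this is not an official Google API:

```python
import gzip
import urllib.request

# Fetch limits as quoted in the article, applied to UNCOMPRESSED bytes.
# These values mirror the figures above; they are assumptions, not an API.
FETCH_LIMITS = {
    "application/pdf": 64 * 1024 * 1024,  # first 64MB of a PDF
    "default": 2 * 1024 * 1024,           # first 2MB of other supported types
}

def uncompressed_size(url: str) -> tuple[str, int]:
    """Download a resource and return its content type and uncompressed byte count."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        # The documented limits apply to uncompressed data, so inflate first.
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
        content_type = resp.headers.get("Content-Type", "").split(";")[0].strip()
    return content_type, len(body)

def fits_within_limit(url: str) -> bool:
    """Check whether a resource would be fetched in full under the quoted caps."""
    content_type, size = uncompressed_size(url)
    return size <= FETCH_LIMITS.get(content_type, FETCH_LIMITS["default"])
```

Because Googlebot indexes only what it has retrieved before hitting the cap, critical content and structured data are safest near the top of the document.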
Additionally, Google's documentation reiterates its longstanding default 15MB crawl limit: content beyond that boundary is ignored unless a different limit applies. Certain crawler types and file categories carry larger limits, for example permitting larger PDFs than HTML pages.
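One way to see what a size-capped crawler would retain from an oversized page is to truncate the download yourself at the 15MB default and inspect what survives. A rough sketch (the URL is a placeholder):

```python
import urllib.request

DEFAULT_LIMIT = 15 * 1024 * 1024  # default 15MB crawl limit described above

def fetch_truncated(url: str, limit: int = DEFAULT_LIMIT) -> bytes:
    """Read at most `limit` bytes, mimicking a crawler that stops at its cap."""
    # urllib does not request compression by default, so the byte count here
    # approximates the uncompressed size the limit is measured against.
    with urllib.request.urlopen(url) as resp:
        return resp.read(limit)  # anything past this point would be ignored

head = fetch_truncated("https://example.com/very-long-page.html")  # placeholder URL
print(f"A capped crawler would see the first {len(head)} bytes of this page.")
```

Reading with `resp.read(limit)` stops at the cap rather than pulling the whole file, which mirrors the stop-and-index behavior described above.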
Google explained that moving the information about default file size limits into the crawler documentation clarifies the limits that apply across all of its crawlers. The update gives webmasters and SEO professionals a more precise picture of the specific limits that apply to Googlebot.