Sitebulb is a new crawling tool that checks a website against the best practices relevant to Google SEO.
TL;DR: Overall, Sitebulb looks to be a great ‘all-rounder’ in the site audit category, and its flexibility will be useful for experienced SEOs and newbies alike.
QUOTE: “A Better Website Crawler For More Comprehensive Audits And Exceptional Reports.” Sitebulb
The Sitebulb Audit includes analysis, insight and guidance on the technical and content requirements of modern SEO.
The reports are quick to produce, and the results are easy to interpret, with actionable insights. The main features and benefits are summarised below.
Crawling is the core function of the tool, and it performed well on the sites we tested. It is easy to configure, and there is a range of advanced options for including or excluding URLs and tailoring the crawl rate for specific site or server environments.
Crucially, the configuration allows control of the Render Timeout, which is easy to fine-tune and test when required.
Interactive Crawl Map
Sitebulb also produces a unique, interactive ‘Crawl Map’.
Crawl Maps can help you identify, conceptualize and communicate website architecture patterns, giving you a much clearer perspective on both issues and potential solutions.
It’s certainly a nice feature which, at a glance, gives you an indication of site structure, click depth and pagination.
Crawl Large Sites
A key selling point of the tool is its ability to handle very large sites.
BIG sites can often have BIG problems which, without a global view, can be difficult to spot. Getting that global view usually requires considerable investment in cloud platforms. Sitebulb is installed on your own machine, so it’s only constrained by local operating resources.
Sitebulb has provided detailed instructions for successfully crawling and understanding large, complex websites using the tool.
The site reports provide a succinct and insightful overview of the site’s indexable and non-indexable content.
They highlight any potential inconsistencies where there are conflicting instructions in the 3 main locations (robots.txt, the robots meta tag in the <head> and the X-Robots-Tag HTTP header).
Canonical tag issues are also checked, and again it’s easy to dig deeper and export ready-made reports to help fix the errors.
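To make the idea of conflicting indexing instructions concrete, here is a minimal sketch in Python. It is not Sitebulb's implementation; the `IndexSignals` structure, its field names and the two rules shown are assumptions chosen purely for illustration, and a real audit would check many more cases.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexSignals:
    """Hypothetical per-URL record of the three indexing signals."""
    url: str
    robots_txt_allowed: bool      # may crawlers fetch this URL per robots.txt?
    meta_robots: Optional[str]    # content of <meta name="robots">, if present
    x_robots_tag: Optional[str]   # value of the X-Robots-Tag header, if present


def has_noindex(directive: Optional[str]) -> bool:
    """True when a comma-separated robots directive contains 'noindex'."""
    if not directive:
        return False
    return "noindex" in {part.strip().lower() for part in directive.split(",")}


def find_conflicts(page: IndexSignals) -> List[str]:
    """Return human-readable descriptions of conflicting instructions."""
    issues = []
    meta_noindex = has_noindex(page.meta_robots)
    header_noindex = has_noindex(page.x_robots_tag)

    # A noindex on a blocked URL cannot be read, because the page is never fetched.
    if not page.robots_txt_allowed and (meta_noindex or header_noindex):
        issues.append("noindex set on a URL that robots.txt blocks from crawling")

    # Both signals are present but only one of them says noindex.
    if page.meta_robots and page.x_robots_tag and meta_noindex != header_noindex:
        issues.append("meta robots and X-Robots-Tag disagree on indexing")

    return issues


if __name__ == "__main__":
    page = IndexSignals("https://example.com/a", False, "noindex, follow", None)
    for issue in find_conflicts(page):
        print(page.url, "->", issue)
```

Each rule returns a plain description, which keeps the output easy to export alongside the affected URL.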
Vital site speed and mobile rendering analysis are performed on every page on the site during the crawl.
There are over 30 ‘industry standard’ speed checks performed and summarised in the results dashboard. The ‘Site Performance’ view shows the most critical issues and, crucially, lets you see how many URLs are affected by each issue, so you can easily identify the quick wins and where more work is required, either on the site or the host server.
This dashboard lets you view, analyse and export every potential URL on the site, found during the crawl or in the XML Sitemap. You can also join your Google Analytics and Search Console data with Sitebulb to get a fully comprehensive data set.
It gives you a breakdown of the content type distribution as well as HTTP/HTTPS consistency checks.
Analyse, check and understand both internal and external links. The report covers key metrics such as Followed / Nofollowed, Broken or Redirected links along with useful hints which give specific examples of potential issues.
The ‘Anchor Text’ data is particularly useful for spotting potential over- or under-use of primary key phrases in internal links, or other site-wide issues.
Altogether, this data can guide decisions regarding internal/external link equity and focus strategy on anchor text usage.
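As a rough illustration of how anchor text over-use might be spotted in exported link data, the sketch below computes the share of each anchor text pointing at one target URL. The function name `anchor_text_share` and the `(anchor text, target URL)` input format are assumptions for this example, not a Sitebulb export format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def anchor_text_share(links: Iterable[Tuple[str, str]], target: str) -> Dict[str, float]:
    """Share of each (case-insensitive) anchor text among links to `target`."""
    counts = Counter(text.strip().lower() for text, href in links if href == target)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {text: n / total for text, n in counts.items()}


if __name__ == "__main__":
    links = [
        ("Blue Widgets", "/widgets"),
        ("blue widgets", "/widgets"),
        ("click here", "/widgets"),
        ("Home", "/"),
    ]
    for text, share in anchor_text_share(links, "/widgets").items():
        print(f"{text}: {share:.0%}")
```

A single phrase with a very high share may point to over-use, while a target with no descriptive anchors at all may point to under-use; where to draw those lines is a judgement call.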
This section is intended to provide all key data required when performing a site content audit, and it does a pretty good job.
It summarises the common but often neglected on-page best-practice elements, as well as detailing potential duplication across the meta and on-page content.
The report also includes 2 interesting features in the ‘Readability’ and ‘Sentiment’ scores. These are intended to help you improve the ‘engagement’ and the ‘message’ of your content respectively. This is potentially a very strong feature: an additional layer of data that other crawler tools do not offer.
Focus on site resources such as JS, CSS, video and images only. This dashboard gives a detailed breakdown of resource distribution and behaviour. It highlights broken references and tells you exactly where to find them.
Joining the Sitebulb crawl data with Google Analytics and Search Console data is extremely useful. Not only can you identify top-performing pages, but also which URLs are receiving little or no traffic (handy when developing a content consolidation or trimming strategy).
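The kind of join described here can be sketched in a few lines of Python. This is only an illustration of the idea: the function name `low_traffic_urls`, the `sessions_by_url` mapping and the `threshold` parameter are all hypothetical, and real exports would first need their URLs normalised to the same form.

```python
from typing import Dict, Iterable, List


def low_traffic_urls(
    crawled_urls: Iterable[str],
    sessions_by_url: Dict[str, int],
    threshold: int = 0,
) -> List[str]:
    """Crawled URLs whose session count is at or below `threshold`.

    URLs missing from the analytics data are treated as having zero sessions.
    """
    return sorted(
        url for url in set(crawled_urls)
        if sessions_by_url.get(url, 0) <= threshold
    )


if __name__ == "__main__":
    crawled = ["/a", "/b", "/c"]
    sessions = {"/a": 120, "/b": 0}
    print(low_traffic_urls(crawled, sessions))
```

Treating missing URLs as zero traffic is a deliberate choice in this sketch, since pages that never appear in analytics are often the ones worth reviewing.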
Having correctly structured and validated AMP is essential, as any errors will result in the pages being ignored. These URLs also need to be crawlable, indexable and correctly ‘canonicalised’, and this report highlights any issues in this regard.
International / Multilingual
As with AMP, it’s vital that multilingual sites have a valid ‘hreflang’ set-up. Unlike AMP, there are various methods for implementing hreflang markup. The crawler will follow, check and test every potential alternate URL, whether it’s found in the HTTP header, sitemap or inline HTML head, and whether the alternates sit on separate domains or subdomains.
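One common hreflang problem is a missing return link, where page A lists page B as an alternate but B does not list A back. The sketch below shows one way such a check could work, assuming the annotations have already been collected into a dictionary; the `HreflangMap` shape and the function name are assumptions for illustration, not how Sitebulb stores its data.

```python
from typing import Dict, List, Tuple

# Hypothetical structure: page URL -> {language code: alternate URL}
HreflangMap = Dict[str, Dict[str, str]]


def missing_return_links(hreflang: HreflangMap) -> List[Tuple[str, str]]:
    """Return (source, alternate) pairs where the alternate does not link back."""
    missing = []
    for source, alternates in hreflang.items():
        for alt_url in alternates.values():
            if alt_url == source:
                continue  # a self-reference needs no return link
            # Pages with no recorded annotations count as not linking back.
            if source not in hreflang.get(alt_url, {}).values():
                missing.append((source, alt_url))
    return sorted(missing)


if __name__ == "__main__":
    pages = {
        "https://example.com/en/": {
            "en": "https://example.com/en/",
            "de": "https://example.com/de/",
        },
        "https://example.com/de/": {"de": "https://example.com/de/"},
    }
    for source, alt in missing_return_links(pages):
        print(f"{alt} does not link back to {source}")
```

Because the check only looks at URLs, it works the same way whether the annotations came from HTTP headers, the sitemap or the HTML head.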
The crawler’s analysis and interpretation of XML sitemaps, in conjunction with Google Search Console data, should help ensure the contents are accurate and free from errors. This feature will no doubt prove its worth when dealing with large or multiple XML sitemaps, which can often be a real pain to analyse accurately.
How to configure and run the Sitebulb crawler
Create your first Project and enter the target site details in the Start URL (remember to specify whether it’s HTTP or HTTPS).
Hit ‘save and continue’ and let the tool run through its pre-audit checks, which will highlight and provide solutions to any potential issues ahead of the crawler configuration.
Select the type of audit required. In most cases, the ‘Standard Audit’ will be the best option.
The ‘Sample Audit’ is best used initially for testing purposes on large sites, to determine the optimal settings before embarking on a ‘Standard Audit’.
Decide which data reports are required for the target site. If, for example, you already know the site is not multilingual or does not feature AMP, these options can be left unticked.
We would suggest all other options are ticked as standard (remember to add the path to the target site XML sitemap).
To maximise the results, we would recommend adding both Google Analytics and Search Console access (if available). It enhances the audit insights and saves you a lot of time joining this data at a later date using Excel. Simply follow the onscreen instructions to connect your accounts; the process only takes a few minutes.
Specify the Sitebulb crawler settings. In most cases, the type, rate and maximum URLs options can be left as default.
There are ‘Advanced’ options available if required to further control the crawler’s behaviour. However, we would expect that initially these can be left as default.
These settings become more pertinent when there are tight constraints to work with (such as site/server security) or when analysing sites you know well, helping you focus on very specific aspects of the site. This can speed up the crawl process and improve performance.
Hit the ‘Start Audit’ button, then sit back and let the tool do its thing.
The Audit process is easy to configure and reasonably quick to run (depending of course on the size of the site).
It’s easy to re-run crawls at any time and, because the data is stored locally in its own database structure, you have the benefit of easily comparing data over time, which takes analysis to a much deeper level.
Further information & instructions on the audit process can be found in the Resources section of the Sitebulb website.
Overall, Sitebulb looks to be a great ‘all-rounder’ in the site audit category, and its flexibility will be useful for experienced SEOs and newbies alike (some understanding of SEO practices is required).
As a crawler, while it’s perhaps not as fast as Screaming Frog, it has a better UI, and its storage method means it’s less intensive on local resources.
As a site auditor, it does a pretty good job, particularly from a technical point of view. However, its error reporting and resolution advice is not yet as comprehensive as that of SEMRush, for example (with which it shares a number of similarities). It is, however, cheaper and does not share the same restrictions on usage.
It’s unclear if it is as good at handling ‘monster’ sites as Deepcrawl, but it is certainly more accessible and, with unlimited projects, (in theory) cheaper to run.
I would imagine we will see this crawler and auditor become a very popular tool.