We are delighted to announce the release of Screaming Frog SEO Spider version 11.0, codenamed internally as ‘triples’, which is a big hint for those in the know.
In version 10 we introduced many new features all at once, so we wanted to make this update smaller, which also means we can release it quicker. This version includes one significant exciting new feature and a number of smaller updates and improvements. Let’s get to them.
1) Structured Data & Validation
Structured data is becoming increasingly important, providing search engines with explicit clues about the meaning of pages and enabling special search result features and enhancements in Google.
The SEO Spider now allows you to crawl and extract structured data from the three supported formats (JSON-LD, Microdata and RDFa) and validate it against Schema.org specifications and Google’s 25+ search features at scale.
To extract and validate structured data you just need to select the options under ‘Config > Spider > Advanced’.
Structured data itemtypes will then be pulled into the ‘Structured Data’ tab with columns for totals, errors and warnings discovered. You can filter URLs to those containing structured data, missing structured data, the specific format, and by validation errors or warnings.
The structured data details lower window pane provides specifics on the items encountered. The left-hand side of the lower window pane shows property values and icons against them when there are errors or warnings, and the right-hand window provides information on the specific issues discovered.
The right-hand side of the lower window pane will detail the validation type (Schema.org, or a Google Feature), the severity (an error, warning or just info) and a message for the specific issue to fix. It will also provide a link to the specific Schema.org property.
In the random example below from a quick analysis of the ‘car insurance’ SERPs, we can see lv.com have Google Product feature validation errors and warnings. The right-hand window pane lists those required (with an error), and recommended (with a warning).
As ‘product’ is used on these pages, it will be validated against Google product feature guidelines, where an image is required, and there are half a dozen other recommended properties that are missing.
The right-hand window pane explains that this is because the format needs to be two-letter ISO 3166-1 alpha-2 country codes (and the United Kingdom is ‘GB’). If you check the page in Google’s structured data testing tool, this error isn’t picked up. Screaming Frog FTW.
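To make the kind of markup being validated a little more concrete, here’s a minimal, hypothetical JSON-LD Product snippet (not taken from any of the sites above) with the required ‘image’ property in place:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Car Insurance Policy",
  "image": "https://www.example.com/images/policy.jpg",
  "description": "Purely illustrative product markup.",
  "offers": {
    "@type": "Offer",
    "price": "250.00",
    "priceCurrency": "GBP"
  }
}
</script>

Remove the ‘image’ property (or any other property a Google feature requires) and the SEO Spider will flag a validation error against it, in the same way as the examples above.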
The SEO Spider currently validates against 26 of Google’s 28 search features, and you can see the full list in the structured data section of our user guide.
As many of you will be aware, frustratingly Google don’t currently provide an API for their own Structured Data Testing Tool (at least a public one we can legitimately use) and they are slowly rolling out new structured data reporting in Search Console. As useful as the existing SDTT is, our testing found inconsistency in what it validates, and the results sometimes just don’t match Google’s own documented guidelines for search features (it often mixes up required or recommended properties for example).
We researched alternatives, like using the Yandex structured data validator (which does have an API), but again, found plenty of inconsistencies and fundamental differences to Google’s feature requirements – which we wanted to focus upon, due to our core user base.
There are plenty of nuances in structured data and this feature will not be perfect initially, so please do let us know if you spot any issues and we’ll fix them up quickly. We obviously recommend using this new feature in combination with Google’s Structured Data Testing Tool as well.
2) Structured Data Bulk Exporting
As you would expect, you can bulk export all errors and warnings via the ‘reports’ top-level menu.
The ‘Validation Errors & Warnings Summary’ report is a particular favourite, as it aggregates the data to unique issues discovered (rather than reporting every instance) and shows the number of URLs affected by each issue, with a sample URL with the specific issue. An example report can be seen below.
This means the report is highly condensed and ideal for a developer who wants to know the unique validation issues that need to be fixed across the site.
3) Multi-Select Details & Bulk Exporting
You can now select multiple URLs in the top window pane, view specific lower window details for all the selected URLs together, and export them. For example, if you click on three URLs in the top window, then click on the lower window ‘inlinks’ tab, it will display the ‘inlinks’ for those three URLs.
You can also export them via the right click or the new export button available for the lower window pane.
Obviously this scales, so you can do it for thousands, too.
This should provide a nice balance between exporting everything in bulk via the ‘Bulk Export’ menu and then filtering in spreadsheets, and the previous singular option via right click.
4) Tree-View Export
If you didn’t already know, you can switch from the usual ‘list view’ of a crawl to a more traditional directory ‘tree view’ format by clicking the tree icon on the UI.
However, while you were able to view this format within the tool, it hasn’t been possible to export it into a spreadsheet. So, we went back to the drawing board and worked on an export which makes sense in a spreadsheet.
When you export from tree view, you’ll now see the results in tree view form, with columns split by path, but all URL level data still available. Screenshots of spreadsheets generally look terrible, but here’s an export of our own website for example.
This allows you to quickly see the breakdown of a website’s structure.
5) Visualisations Improvements
We have introduced a number of small improvements to our visualisations. First of all, you can now search for URLs, to find specific nodes within the visualisations.
By default, the visualisations have used the last URL component for naming of nodes, which can be unhelpful if this isn’t descriptive. Therefore, you’re now able to adjust this to page title, h1 or h2.
Finally, you can now also save visualisations as HTML, as well as SVGs.
6) Smart Drag & Drop
You can drag and drop any file types supported by the SEO Spider directly into the GUI, and it will intelligently work out what to do. For example, you can drag and drop a saved crawl and it will open it.
You can drag and drop a .txt file with URLs, and it will auto switch to list mode and crawl them.
You can even drop in an XML Sitemap and it will switch to list mode, upload the file and crawl that for you as well.
Nice little time savers for hardcore users.
7) Queued URLs Export
You’re now able to view URLs remaining to be crawled via the ‘Queued URLs’ export available under ‘Bulk Export’ in the top level menu.
This provides an export of URLs discovered and in the queue to be crawled, in the order they are due to be crawled (based upon a breadth-first crawl).
8) Configure Internal CDNs
You can now supply a list of CDNs to be treated as ‘Internal’ URLs by the SEO Spider.
This feature is available under ‘Configuration > CDNs’ and both domains and subfolder combinations can be supplied. URLs will then be treated as internal, meaning they appear under the ‘Internal’ tab, will be used for discovery of new URLs, and will have data extracted like other internal URLs.
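As a quick illustration (the domains here are made up), the list supplied could look something like the following, with the second entry showing a domain and subfolder combination:

cdn.example.com
assets.examplecdn.net/images/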
9) GA Extended URL Matching
Finally, if you have accounts that use extended URL rewrite filters in Google Analytics to view the full page URL (and convert /example/ to www.example.com/example) in the interface, they break what is returned from the API and via shortcuts in the interface (i.e. they return www.example.comwww.example.com/example).
This obviously means URLs won’t match when you perform a crawl. We’ve now introduced an algorithm which takes this into account automatically and matches the data for you, as it was really quite annoying.
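For anyone curious what that clean-up roughly involves, here’s a short sketch in Python. This is very much not the SEO Spider’s actual implementation, and the hostname is an assumed example; it simply shows the idea of stripping the doubled hostname back out so the page path can be matched against crawled URLs:

# Rough illustrative sketch only - not the SEO Spider's matching algorithm.
def normalise_ga_page_path(page_path, hostname="www.example.com"):
    doubled = hostname + hostname
    if page_path.startswith(doubled):
        # Strip the duplicated hostname introduced by the rewrite filter.
        return page_path[len(doubled):] or "/"
    if page_path.startswith(hostname):
        # Strip a single prepended hostname.
        return page_path[len(hostname):] or "/"
    return page_path

print(normalise_ga_page_path("www.example.comwww.example.com/example"))  # prints /example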
Version 11.0 also includes a number of smaller updates and bug fixes, outlined below.
The ‘URL Info’ and ‘Image Info’ lower window tabs have been renamed from ‘Info’ to ‘Details’ respectively.
‘Auto Discover XML Sitemaps via robots.txt’ has been unticked by default for list mode (it was annoyingly ticked by default in version 10.4!).
There’s now a ‘Max Links per URL to Crawl’ configurable limit under ‘Config > Spider > Limits’ set at 10k max.
There’s now a ‘Max Page Size (KB) to Crawl’ configurable limit under ‘Config > Spider > Limits’ set at 50k.
There are new tool tips across the GUI to provide more helpful information on configuration options.
The HTML parser has been updated to fix an error with unquoted canonical URLs.
A bug has been fixed where GA Goal Completions were not showing.
That’s everything. If you experience any problems with the new version, then please do just let us know via support and we can help. Thank you to everyone for all their feature requests, bug reports and general support, Screaming Frog would not be what it is, without you all.
Now, go and download version 11.0 of the Screaming Frog SEO Spider.
From crawl completion notifications to automated reporting: this post may not have Billy the Kid or Butch Cassidy, but instead here are a few of my most useful (and just as exciting) tools to combine with the SEO Spider.
We SEOs are extremely lucky—not just because we’re working in such an engaging and collaborative industry, but because we have access to a plethora of online resources, conferences and SEO-based tools to lend a hand with almost any task you could think up.
My favourite of which is, of course, the SEO Spider—after all, following Minesweeper and Outlook, it’s likely the most used program on my work PC. However, a great programme can only be made even more useful when combined with a gang of other fantastic tools to enhance, complement or adapt the already vast and growing feature set.
While it isn’t quite the ragtag group from John Sturges’ 1960 cult classic, I’ve compiled the Magnificent Seven(ish) SEO tools I find useful to use in conjunction with the SEO Spider:
Debugging in Chrome Developer Tools
Chrome is the definitive king of browsers, and arguably one of the most installed programs on the planet. What’s more, it’s got a full suite of free developer tools built straight in—to load it up, just right-click on any page and hit inspect. Among many aspects, this is particularly handy to confirm or debunk what might be happening in your crawl versus what you see in a browser.
For instance, while the Spider does check response headers during a crawl, maybe you just want to dig a bit deeper and view them as a whole? Well, just go to the Network tab, select a request and open the Headers sub-tab for all the juicy details:
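If you’d rather pull the headers outside of a browser altogether, the same information can be grabbed from the command line with curl (the URL is just a placeholder), where the -I flag requests the headers only:

curl -I https://www.example.com/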
There are honestly far too many options in the Chrome developer toolset to list here, but it’s certainly worth getting your head around.
Page Validation with a Right-Click
Okay, I’m cheating a bit here as this isn’t one tool, rather a collection of several, but have you ever tried right-clicking a URL within the Spider? Well, if not, I’d recommend giving it a go—on top of some handy exports like the crawl path report and visualisations, there’s a ton of options to open that URL into several individual analysis & validation apps:
Google Cache – See how Google is caching and storing your pages’ HTML.
Wayback Machine – Compare URL changes over time.
Other Domains on IP – See all domains registered to that IP Address.
Open Robots.txt – Look at a site’s Robots.
HTML Validation with W3C – Double-check all HTML is valid.
PageSpeed Insights – Any areas to improve site speed?
Structured Data Tester – Check all on-page structured data.
Mobile-Friendly Tester – Are your pages mobile-friendly?
Rich Results Tester – Is the page eligible for rich results?
AMP Validator – Official AMP project validation test.
User Data and Link Metrics via API Access
We SEOs can’t get enough data, it’s genuinely all we crave – whether that’s from user testing, keyword tracking or session information, we want it all and we want it now! After all, creating the perfect website for bots is one thing, but ultimately the aim of almost every site is to get more users to view and convert on the domain, so we need to view it from as many angles as possible.
Starting with users, there’s practically no better insight into user behaviour than the raw data provided by both Google Search Console (GSC) and Google Analytics (GA), both of which help us make informed, data-driven decisions and recommendations.
What’s great about this is you can easily integrate any GA or GSC data straight into your crawl via the API Access menu so it’s front and centre when reviewing any changes to your pages. Just head on over to Configuration > API Access > [your service of choice], connect to your account, configure your settings and you’re good to go.
Another crucial area in SERP rankings is the perceived authority of each page in the eyes of search engines – a major aspect of which is, of course, links, links and more links. Any SEO will know you can’t spend more than 5 minutes at BrightonSEO before someone brings up the subject of links; it’s the lifeblood of our industry. Whether their importance is dying out or not, there’s no denying that they currently still hold much value within our perceptions of Google’s algorithm.
Well, alongside the previous user data you can also use the API Access menu to connect with some of the biggest tools in the industry such as Moz, Ahrefs or Majestic, to analyse your backlink profile for every URL pulled in a crawl.
Understanding Bot Behaviour with the Log File Analyzer
An often-overlooked exercise, nothing gives us quite the same insight into how bots are interacting with a site as the server logs themselves. The trouble is, these files can be messy and hard to analyse on their own, which is where our very own Log File Analyser (LFA) comes into play (they didn’t force me to add this one in, promise!).
I’ll leave @ScreamingFrog to go into all the gritty details on why this tool is so useful, but my personal favourite aspect is the ‘Import URL data’ tab on the far right. This little gem will effectively match any spreadsheet containing URL information with the bot data on those URLs.
So, you can run a crawl in the Spider while connected to GA, GSC and a backlink app of your choice, pulling the respective data from each URL alongside the original crawl information. Then, export this into a spreadsheet before importing into the LFA to get a report combining metadata, session data, backlink data and bot data all in one comprehensive summary, aka the holy quadrilogy of technical SEO statistics.
While the LFA is a paid tool, there’s a free version if you want to give it a go.
Crawl Reporting in Google Data Studio
One of my favourite reports from the Spider is the simple but useful ‘Crawl Overview’ export (Reports > Crawl Overview), and if you mix this with the scheduling feature, you’re able to create a simple crawl report every day, week, month or year. This allows you to monitor for any drastic changes to the domain, alerting you to anything which might be cause for concern between crawls.
However, in its native form it’s not the easiest to compare between dates, which is where Google Sheets & Data Studio can come in to lend a hand. After a bit of setup, you can easily copy over the crawl overview into your master G-Sheet each time your scheduled crawl completes, then Data Studio will automatically update, letting you spend more time analysing changes and less time searching for them.
This will require some fiddling to set up; however, at the end of this section I’ve included links to an example G-Sheet and Data Studio report that you’re welcome to copy. Essentially, you need a G-Sheet with date entries in one column and unique headings from the crawl overview report (or another) in the remaining columns:
Once that’s sorted, take your crawl overview report and copy out all the data in the ‘Number of URI’ column (column B), being sure to copy from the ‘Total URI Encountered’ until the end of the column.
Open your master G-Sheet and create a new date entry in column A (add this in a format of YYYYMMDD). Then in the adjacent cell, Right-click > ‘Paste special’ > ‘Paste transposed’ (Data Studio prefers this to long-form data):
If done correctly with several entries of data, you should have something like this:
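For reference, the layout being described ends up looking roughly like this – the headings and figures are purely illustrative, as yours will follow whichever rows you copy from the crawl overview report:

Date       Total URI Encountered   Total URI Crawled   Total Internal URI
20190301   1250                    1184                1092
20190308   1263                    1197                1101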
Once the data is in a G-Sheet, uploading this to Data Studio is simple, just create a new report > add data source > connect to G-Sheets > [your master sheet] > [sheet page] and make sure all the heading entries are set as a metric (blue) while the date is set as a dimension (green), like this:
You can then build out a report to display your crawl data in whatever format you like. This can include scorecards and tables for individual time periods, or trend graphs to compare crawl stats over the date range provided (your very own Search Console Coverage report).
Here’s an overview report I quickly put together as an example. You can obviously do something much more comprehensive than this should you wish, or perhaps take this concept and combine it with even more reports and exports from the Spider.
Building Functions & Strings with XPath Helper & Regex Search
The Spider is capable of doing some very cool stuff with the extraction feature, a lot of which is listed in our guide to web scraping and extraction. The trouble with much of this is it will require you to build your own XPath or regex string to lift your intended information.
While simply right-clicking > Copy XPath within the inspect window will usually do enough to scrape, it’s not always going to cut it for some types of data. This is where two Chrome extensions, XPath Helper & Regex Search, come in useful.
Unfortunately, these won’t automatically build any strings or functions, but if you combine them with a cheatsheet and some trial and error, you can easily build one out in Chrome before copying into the Spider to run in bulk across all your pages.
If you simply right clicked on one of the highlighted elements in the inspect window and hit Copy > Copy XPath, you would be given something like:
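The exact output depends on the page, but Chrome’s ‘Copy XPath’ typically returns an absolute, indexed path along these lines (this one is hypothetical):

//*[@id="main-blog--posts"]/div[1]/div/p[1]

Because it pins one element down by position, it’s precise but brittle, which is why it only returns a single result.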
While this does the trick, it will only pull the single instance copied (‘16 January, 2019 by Ben Fuller’). Instead, we want all the dates and authors from the /blog subfolder.
By looking at what elements the reference is sitting in we can slowly build out an XPath function directly in XPath Helper and see what it highlights in Chrome. For instance, we can see it sits in a class of ‘main-blog--posts_single-inner--text--inner clearfix’, so pop that as a function into XPath Helper:
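In other words, something along the lines of the following (assuming the class name uses double hyphens in the page source):

//div[@class="main-blog--posts_single-inner--text--inner clearfix"]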
XPath Helper will then highlight the matching results in Chrome:
Close, but this is also pulling the post titles, so not quite what we’re after. It looks like the date and author names are sitting in a sub <p> tag so let’s add that into our function:
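So the function becomes something like this (again, assuming the same class name as above):

//div[@class="main-blog--posts_single-inner--text--inner clearfix"]/p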
Bingo! Stick that in the custom extraction feature of the Spider (Configuration > Custom > Extraction), upload your list of pages, and watch the results pour in!
Regex Search works much in the same way: simply start writing your string, hit next and you can visually see what it’s matching as you’re going. Once you’ve got it, whack it in the Spider, upload your URLs then sit back and relax.
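As a purely illustrative example, a pattern to match the ‘16 January, 2019 by Ben Fuller’ style date and author lines mentioned earlier might look something like:

\d{1,2} \w+, \d{4} by .+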
Notifications & Auto Mailing Exports with Zapier
Zapier brings together all kinds of web apps, letting them communicate and work with one another when they might not otherwise be able to. It works by having an action in one app set as a trigger and another app set to perform an action as a result.
To make things even better, it works natively with a ton of applications such as G-Suite, Dropbox, Slack, and Trello. Unfortunately, as the Spider is a desktop app, we can’t directly connect it with Zapier. However, with a bit of tinkering, we can still make use of its functionality to provide email notifications or auto mailing reports/exports to yourself and a list of predetermined contacts whenever a scheduled crawl completes.
All you need is to have your machine or server set up with an auto cloud sync directory such as those on ‘Dropbox’, ‘OneDrive’ or ‘Google Backup & Sync’. Inside this directory, create a folder to save all your crawl exports & reports. In this instance, I’m using G-drive, but others should work just as well.
You’ll need to set a scheduled crawl in the Spider (file > Schedule) to export any tabs, bulk exports or reports into a timestamped folder within this auto-synced directory:
Log into or create an account for Zapier and make a new ‘zap’ to email yourself or a list of contacts whenever a new folder is generated within the synced directory you selected in the previous step. You’ll have to provide Zapier access to both your G-Drive & Gmail for this to work (do so at your own risk).
If your Google Ad Grants account has been unexpectedly suspended, this article will help you through the process of getting it back up and running in as little time as possible, as well as providing preventative measures that will keep your account suspension free.
Ad Grants account guidelines are renowned for being vague with regards to their disapproval reasons, and it can often be the case of automated checks flagging problems that were potentially misinterpreted by their systems. If you believe this to be the case, then jump along to the section about how to get in contact with Google and dive straight into getting your account re-reviewed.
A more likely reason, however, will be that one of the program terms or policies has been violated somewhere in your campaigns. In this case, you will need to take a closer look at your account in order to resolve the issue.
Diagnosing the Problem
In most cases, if your account has violated any policies during the month, you will receive a gentle nudge via email to check your account’s compliance with programme policies. This email will link through to a policy compliance report explaining which area of the account is causing problems.
When reading this, go through each section to make sure that you are following each policy correctly, check and check again!
Common reasons for suspensions
The most common reasons why a Grants account may have been suspended include:
Not enough ad groups per campaign
Every campaign requires at least 2 ad groups, with at least 2 ads in each and 2 sitelinks across the account.
Low Click Through Rate (CTR)
Not reaching the specified 5% CTR for 2 consecutive months can result in a temporary account deactivation.
No active keywords in an ad group
Having an active ad group with no active keywords in it can often cause an account to be flagged. This ad group will have to be paused or have some new relevant keywords added to comply. This is something to be especially wary of if you have set up any rules to pause low quality score keywords. This is explained further in the ways of preventing future suspensions section below.
Low Quality Score
Every single keyword that is not removed or paused within your account needs to have a quality score of 3 or higher. Reminder: if the keyword is yet to be given a visible quality score and is shown with a dash (-), then this keyword is still compliant; it’s only those with a QS of 2/10 or 1/10 that will cause problems.
Not responding to Programme Survey
All Ad Grants accounts must complete an annual programme survey sent to the login email address. Email accounts are often lost or neglected, however, failure to complete this will result in temporary suspension. Always make sure that your contact details are up to date so that this reminder isn’t missed.
The Next Steps
If you have been through the checklist and can’t identify why you may have been suspended, or believe Google have made an error, then your best option for understanding the suspension reason will be to get in contact with Google and talk to their Grants team.
If however, having looked through the compliance checklist, you have identified what caused your account to be suspended, you will be in a position to request reactivation, which is detailed below.
Once you’ve discovered the reason why your account has been suspended, then the next step is fairly simple.
All you will need to do now (after you’ve made the necessary amendments) is to request reactivation of your Ad Grants account, filling in your Ad Grants account ID, sign-in email address, contact email and a brief explanation covering the steps you went through to get the account compliant in the notes section. Then press ‘Submit’.
Once submitted, Google state you should hear back from a Googler within three business days.
It’s worth keeping in mind, Google won’t be able to re-activate your account if there are still errors within the account, so it is your responsibility to make sure the account is fully compliant before being resubmitted. If you do experience any problems then go through the checklist again or get into contact with Google, as detailed below.
If you are amongst those that do not know why your account has been suspended, then you will need to contact Google directly.
Often the easiest way to do this is by remaining on the Google Ads interface and clicking on the ‘CONTACT US’ button in the red deactivation banner found on your overview. Similar to the one below.
You can get in contact with Google in one of three ways:
Live chat: – Often the best way to get in contact with Google when trying to diagnose the cause of an account suspension is through live support, you normally get connected quickly and it’s an easy way of getting hold of someone who can get the account re-reviewed then and there.
Call: – You can also contact your local Google office by phone (for example, here in the UK their number is 0800 169 0409). One problem with this method though is that they do say that the lines are only answered during office hours, which isn’t helpful if your suspension occurs at 6pm on a Sunday night!
Email Support: – Alternatively if you don’t have time to chat at the point of suspension, you can email the Google Grants help support at any time and should expect a reply within 24 hours.
If you’re still unsure as to what you need to do to get back up and running there is also the ability to use the Official Google Ads Community to view/join discussions about other instances of Ad Grants account suspensions and possible remedies for these.
Ways of preventing future suspensions
In order to avoid having your Ad Grants account suspended again, we’ve listed below some handy tips and tricks that will help ensure the account remains compliant and suspension free.
If you don’t have time to check your account every day, setting up automated rules is a really good way of staying on top of the basic requirements. These rules can be used not only to pause keywords that go below a quality score of 3, but also to turn on any that regain a quality score of 3 or above (or are shown with no score at all).
Note the basic rule can sometimes leave ad groups vacant and so you should also add a script to check and pause any empty ad groups.
If you think your account/campaigns could include any non-brand single keywords, you can easily check by filtering all keywords with a filter that consists of ‘Does not contain’ and then entering a space ( ). All those non-brand keywords should be paused or removed. If the only keywords returned are your brand, then you may have to contact Google and register/whitelist them so that they don’t flag the keyword policy.
Use match types to increase ad groups
If you are struggling to create two distinct ad groups within a campaign, then a simple (and quick) way is to build out your ad groups based on match types. E.g. exact match & broad match ad groups can be run and will abide by the ‘2 ad groups per campaign’ quota.
Always review your compliance reports
If Google do send over a non-compliance report stating an infringement in policy, make sure you always look into this, even if you believe the account to be fully compliant. There is always the possibility that you may have mis-checked something or that something has changed recently which is causing Google to flag your account, so always treat this with the utmost importance in order to avoid suspension.
Hopefully, you won’t find yourself in the position of having your Ad Grants account suspended, but if you do our final (and probably most important) piece of advice is not to panic. Follow the guidance provided and work through each of the policies, checking off each requirement as you go. Once you’re sure you’ve met all the requirements, get that review request in and you’ll soon be back up and running and driving relevant traffic to your site.
If you’re still not sure how best to make sure your campaigns stay online, then drop us a line and we can look into how we could help!
The Screaming Frog SEO Spider has evolved a great deal over the past 8 years since launch, with many advancements, new features and a huge variety of different ways to configure a crawl.
This post covers some of the lesser-known and hidden-away features, that even more experienced users might not be aware exist. Or at least, how they can be best utilised to help improve auditing. Let’s get straight into it.
1) Export A List In The Same Order Uploaded
If you’ve uploaded a list of URLs into the SEO Spider, performed a crawl and want to export them in the same order they were uploaded, then use the ‘Export’ button which appears next to the ‘upload’ and ‘start’ buttons at the top of the user interface.
The standard export buttons on the dashboard will otherwise export URLs in order based upon what’s been crawled first, and how they have been normalised internally (which can appear quite random in a multi-threaded crawler that isn’t in usual breadth-first spider mode).
The data in the export will be in the exact same order and include all of the exact URLs in the original upload, including duplicates, normalisation or any fix-ups performed.
2) Crawl New URLs Discovered In Google Analytics & Search Console
If you connect to Google Analytics or Search Console via the API, by default any new URLs discovered are not automatically added to the queue and crawled. URLs are loaded, data is matched against URLs in the crawl, and any orphan URLs (URLs discovered only in GA or GSC) are available via the ‘Orphan Pages‘ report export.
If you wish to add any URLs discovered automatically to the queue, crawl them and see them in the interface, simply enable the ‘Crawl New URLs Discovered in Google Analytics/Search Console’ configuration.
This is available under ‘Configuration > API Access’ and then either ‘Google Analytics’ or ‘Google Search Console’ and their respective ‘General’ tabs.
This will mean new URLs discovered will appear in the interface, and orphan pages will appear under the respective filter in the Analytics and Search Console tabs (after performing crawl analysis).
3) Switching to Database Storage Mode
The SEO Spider has traditionally used RAM to store data, which has enabled it to crawl lightning-fast and flexibly for virtually all machine specifications. However, it’s not very scalable for crawling large websites. That’s why early last year we introduced the first configurable hybrid storage engine, which enables the SEO Spider to crawl at truly unprecedented scale for any desktop application while retaining the same, familiar real-time reporting and usability.
So if you need to crawl millions of URLs using a desktop crawler, you really can. You don’t need to keep increasing RAM to do it either, switch to database storage instead. Users can select to save to disk by choosing ‘database storage mode’ within the interface (via ‘Configuration > System > Storage’).
This means the SEO Spider will hold as much data as possible within RAM (up to the user allocation), and store the rest to disk. We actually recommend this as the default setting for any users with an SSD (or faster drives), as it’s just as fast and uses much less RAM.
4) Request Google Analytics, Search Console & Link Data After A Crawl
If you’ve already performed a crawl and forgot to connect to Google Analytics, Search Console or an external link metrics provider, then fear not. You can connect to any of them post crawl, then click the beautifully hidden ‘Request API Data’ button at the bottom of the ‘API’ tab.
Alternatively, ‘Request API Data’ is also available in the ‘Configuration > API Access’ main menu.
This will mean data is pulled from the respective APIs and matched against the URLs that have already been crawled.
5) Disable HSTS To See ‘Real’ Redirect Status Codes
HTTP Strict Transport Security (HSTS) is a standard by which a web server can declare to a client that it should only be accessed via HTTPS. By default the SEO Spider will respect HSTS and if declared by a server and an internal HTTP link is discovered during a crawl, a 307 status code will be reported with a status of “HSTS Policy” and redirect type of “HSTS Policy”. Reporting HSTS set-up is useful when auditing security, and the 307 response code provides an easy way to discover insecure links.
Unlike usual redirects, this redirect isn’t actually sent by the web server, it’s turned around internally (by a browser and the SEO Spider) which simply requests the HTTPS version instead of the HTTP URL (as all requests must be HTTPS). A 307 status code is reported however, as you must set an expiry for HSTS. This is why it’s a temporary redirect.
While HSTS declares that all requests should be made over HTTPS, a site wide HTTP -> HTTPS redirect is still needed. This is because the Strict-Transport-Security header is ignored unless it’s sent over HTTPS. So if the first visit to your site is not via HTTPS, you still need that initial redirect to HTTPS to deliver the Strict-Transport-Security header.
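The header itself is just a single line in the HTTPS response, for example with a one-year expiry:

Strict-Transport-Security: max-age=31536000; includeSubDomains

The max-age value is the expiry in seconds (31,536,000 being one year), which is the expiry mentioned above that makes the internal redirect a temporary 307.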
So if you’re auditing an HTTP to HTTPS migration which has HSTS enabled, you’ll want to check the underlying ‘real’ sitewide redirect status code in place (and find out whether it’s a 301 redirect). Therefore, you can choose to disable HSTS policy by unticking the ‘Respect HSTS Policy’ configuration under ‘Configuration > Spider > Advanced’ in the SEO Spider.
This means the SEO Spider will ignore HSTS completely and report upon the underlying redirects and status codes. You can switch back to respecting HSTS when you know they are all set-up correctly, and the SEO Spider will just request the secure versions of URLs again. Check out our SEOs guide to crawling HSTS.
6) Compare & Run Crawls Simultaneously
At the moment you can’t compare crawls directly in the SEO Spider. However, you are able to open up multiple instances of the software, and either run multiple crawls, or compare crawls at the same time.
On Windows, this is as simple as just opening the software again by the shortcut. For macOS, to open additional instances of the SEO Spider open a Terminal and type the following:
open -n /Applications/Screaming\ Frog\ SEO\ Spider.app/
You can now perform multiple crawls, or compare multiple crawls at the same time.
7) Crawl Any Web Forms, Logged In Areas & By-Pass Bot Protection
The SEO Spider has supported basic and digest standards-based authentication for a long time, which are often used for secure access to development servers and staging sites. However, the SEO Spider also has the ability to login to any web form that requires cookies, using its in-built Chromium browser.
This nifty feature can be found under ‘Configuration > Authentication > Forms Based’, where you can load virtually any password-protected website, intranet or web application, login and crawl it. For example you can login and crawl your precious fantasy football if you really wanted to ruin (or perhaps improve) your team.
This feature is super powerful because it provides a way to set cookies in the SEO Spider, so it can also be used for scenarios such as bypassing geo IP redirection, or if a site is using bot protection with reCAPTCHA or the like.
You can just load the page in the in-built browser, confirm you’re not a robot – and crawl away. If you load the page initially pre-crawling, you probably won’t even see a CAPTCHA, and will be issued the required cookies. Obviously you should have permission from the website as well.
However, with great power comes great responsibility, so please be careful with this feature.
During testing we let the SEO Spider loose on our test site while signed in as an ‘Administrator’ for fun. We let it crawl for half an hour; in that time it installed and set a new theme for the site, installed 108 plugins and activated 8 of them, deleted some posts, and generally made a mess of things.
While this can be helpful, the search engines will obviously ignore anything from the fragment and crawl and index the URL without it. Therefore, generally you may wish to switch this behaviour using the ‘Regex replace’ feature in URL Rewriting. Simply include #.* within the ‘regex’ field and leave the ‘replace’ field blank.
This will mean they will be crawled and indexed without fragments in the same way as the default HTML text only mode.
9) Utilise ‘Crawl Analysis’ For Link Score, More Data (& Insight)
While some of the features discussed above have been available for some time, the ‘crawl analysis‘ feature was released more recently in version 10 at the end of September (2018).
The SEO Spider analyses and reports data at run-time, where metrics, tabs and filters are populated during a crawl. However, ‘link score’ which is an internal PageRank calculation, and a small number of filters require calculation at the end of a crawl (or when a crawl has been paused at least).
The full list of 13 items that require ‘crawl analysis’ can be seen under ‘Crawl Analysis > Configure’ in the top level menu of the SEO Spider, and viewed below.
All of the above are filters under their respective tabs, apart from ‘Link Score’, which is a metric and shown as a column in the ‘Internal’ tab.
In the right hand ‘overview’ window pane, filters which require post ‘crawl analysis’ are marked with ‘Crawl Analysis Required’ for further clarity. The ‘Sitemaps’ filters in particular, mostly require post-crawl analysis.
They are also marked as ‘You need to perform crawl analysis for this tab to populate this filter’ within the main window pane.
This analysis can be automatically performed at the end of a crawl by ticking the respective ‘Auto Analyse At End of Crawl’ tickbox under ‘Configure’, or it can be run manually by the user.
To run the crawl analysis, simply click ‘Crawl Analysis > Start’.
When the crawl analysis is running you’ll see the ‘analysis’ progress bar with a percentage complete. The SEO Spider can continue to be used as normal during this period.
When the crawl analysis has finished, the empty filters which are marked with ‘Crawl Analysis Required’, will be populated with lots of lovely insightful data.
The ‘link score’ metric is displayed in the Internal tab and calculates the relative value of a page based upon its internal links.
This uses a relative 0-100 point scale from least to most value for simplicity, which allows you to determine where internal linking might be improved for key pages. It can be particularly powerful when utilised with other internal linking data, such as counts of inlinks, unique inlinks and % of links to a page (from across the website).
10) Saving HTML & Rendered HTML To Help Debugging
We occasionally receive support queries from users reporting a missing page title, description, canonical or on-page content that’s seemingly not being picked up by the SEO Spider, but can be seen to exist in a browser, and when viewing the HTML source.
As the year comes to an end here at Screaming Frog we thought we would add to the inevitable roundup posts with our own. But instead of just bragging about the work we do or making pointless ‘SEO predictions for 2019’, we also want to celebrate the people that make the company what it is (we are also bragging, but we’re not just bragging). So take a look at what your favourite frogs (sorry Kermit) have been up to this year!
Conferences & Community
With so many fantastic conferences available to us in digital marketing, we’re spoiled for choice when deciding which ones to take some time off work for, to learn from the industry’s best and brightest.
For the first time ever Screaming Frog had a stall this year, helping the community at our crawl clinic with Spider conundrums. We had a great time so expect to see us with a stall at more BrightonSEOs in the near future, now we’ve mastered SEO (Stall Engineering Optimisation)! Reminder to self – bring more swag.
We also like to get involved in other ways – our resident Strategist Charlie has also been running the Crawling and Indexing sessions at BrightonSEO (shameless plug – we’re running a similar SEO Spider training workshop in January!), and be sure to check out Oliver Brett’s talk next April! Not to be missed!
SearchLeeds was where Oliver, rumoured to be the mastermind behind Twitter’s favourite SEO meme account Lord of the SERPs, made his speaking debut this year, at a fantastic conference held in a great city. How he managed to keep the topic of “Why SEOs are weirdos” down to just a few slides is testament to his speaking prowess.
Determined to show the SEO team they do more than just “set it and forget it”, the PPC team headed up to London for HeroConf. Despite rave reviews about the conference, the SEO team remain unconvinced by the PPC team.
Information Is Beautiful Workshop
Being the creative force behind almost everything we produce, and after seeing the SEO team heading out for ANOTHER conference, the design team headed up to London for an inspirational masterclass from the man behind some of the best content on the web – Information Is Beautiful. They brought their knowledge back to share with the SEO team, which resulted in a creative boost that can only be compared to a strong coffee on a Monday morning.
Having launched a couple of years ago, the OutREACH conference has fast become one of the must-attend conferences at Screaming Frog. The team returned with fresh ideas from quality speakers for the ever more difficult task of outreach, and we’re looking forward to seeing what next year has to offer!
Another conference held in very high esteem around the Screaming Frog office is Distilled’s SearchLove. A couple of our team headed up to London (we’ll hit the US iteration one of these days!) and were once again blown away by the expertise on offer.
During June we ventured into London for the popular Search Elite conference, which fully delivered on its promise of in depth presentations from expert speakers hailing from both sides of the Atlantic. Look out for the 2019 edition which has been rebranded to Digital Elite Day, combining the best from the worlds of search and CRO.
Perhaps a little less SEO focussed, but no less interesting was Nudgestock. Full of behavioural ideas that can be applied to a whole range of settings, a highlight was the behavioural nudges that can be used to increase conversion rates.
And how could we forget the 5-a-side tournament hosted by BrightonSEO and Deepcrawl. The game on everyone’s lips was “El Crawlico” as we took on Deepcrawl to determine once and for all the superior crawler – a result which has no further implications on the quality of each respective tool.
Without wanting to dwell on the matter 9-2 much, Screaming Frog came away victorious! Looking forward to the battle next year!
UK Search Awards
One of the big events at the end of the year is of course, the UK Search Awards, celebrating the great work that’s been done by our industry. With only 35 awards up for grabs in a hugely competitive industry we had the honour to be shortlisted in 3 categories…
…And took home the prestigious award of “Best SEO Campaign” for our client Spotahome, a testament to the fantastic work put in by everyone at the company.
A fantastic evening to round off a brilliant year.
We’re always looking to support charities that mean a lot to the people that work at Screaming Frog and if they involve eating copious amounts of cakes, muffins, and brownies then that’s just what we’ll have to do!
Macmillan Coffee Morning
Over £200 raised toward a great cause, but some tell-tale signs that not everybody was upfront about what they made themselves…!
Cakes, cakes and more cakes
A further £250 was raised from another cake sale earlier in the year, with Aaron insisting he hasn’t been part-timing as a baker whilst taking the title for the best bake.
Henley Toad Patrol
Despite what our robots.txt might suggest we care about all animals in the Anura order! Which is why we helped to support a local team protecting toads from local traffic.
First Aid Training
To make sure we’re safe both in and out of the office, a number of the team received First Aid training. So, if you spot us at a conference you know you’re in safe hands… despite what the below might suggest!
Throughout the year many of the team have taken part in sports events, particularly of the endurance running kind. A better person would ignore the tempting comparison between a grueling slog with the end never quite coming into sight as your endurance ever more depletes, and that of running a marathon. I am apparently not that person.
Run-Club Takes on Reading (Nearly)
The Screaming Frog team took to the pavements en-masse as a weekly Run Club was set up to train for Reading Half Marathon. Under the expert guidance of Coach Euan Brock the team was fit and raring to go. Unfortunately, the weather was not. With heavy snow the night before the race, the event was cancelled and Run Club never got to put training into action. Some were more savvy about this than others…
And how could we show our faces in our town of residence, Henley-on-Thames, unless a member of the team also took on the hilly half-marathon here! Our Office Manager Ewa took up the mantle and smashed the course with such running prowess that she has become the official new coach of Run-Club.
Undeterred by the Reading setback Coach Brock and I set our sights to warmer months and a longer distance, running the Paris Marathon in April beside a strong support team from a cohort of Screaming Frog employees past and present. The pictures below represent the reality of the event versus how we describe it to other people.
Taking things a step further – Endure24
As if a marathon wasn’t enough, Matt Hopson, our Senior SEM Manager took things one step further (pun certainly intended), running in a team 24 hour ultra-endurance race for the second year running.
In the Octagon
Most people would run for the hills if asked to step into the octagon for an MMA fight, but our Digital Designer Mike is not one of them. He stepped up for 3 months intensive MMA training culminating in a fight night to raise money for charity. He insists this wasn’t just a way to take out his frustrations about the SEO team asking yet again for “just a few more changes”.
Making Sweet Music
We don’t just play to Google’s tune. Among our search specialists, we also have a variety of talented musicians that this year took to the stage. Our Head of SEO Pat Langridge, guitarist for Reading-based Las Nova caught here performing Ricky Gervais’ Free Love Freeway, was so accomplished that they even caught the attention of the man himself on Twitter.
Not to be outdone, James McCrea, formerly touring worldwide on a cruise-ship before switching to the rockstar lifestyle at Screaming Frog, hit up Oakford Social Club as the guitarist of Nobodies Birthday.
But we don’t just limit ourselves to the more traditional instruments. Faisal headed over to Austria to perform his own brand of electronic music which he describes as:
“One Hand Clap, a part generative performance piece that analysed player controller data and the spectromorphology of acousmatic soundtracks to generate its own soundscape.”
An impressive feat, but we’ll stick with techno…
You don’t need an instrument to hit the airwaves. Sam, one of the dev superstars working on the SEO Spider, has been working on Spinnerproof – a podcast dedicated to Robot Wars coverage among other things. If his robot-making skills are half as good as his dev skills, we’re all in serious trouble!
Heading out as an Army
A group of frogs is known as an army, which feels rather appropriate for when the team step out into the wider world on a company social.
Henley Royal Regatta
The normally sleepy, quaint town of Henley-on-Thames in which we are based becomes a bustling hive of boats, visitors, and 7 ft rowers during the regatta season. Never wanting to be left out, our summer social now happily coincides with the regatta. This year was beautifully sunny and hot, so we headed down to the river armed en-masse with suncream and shades.
We embrace all cultures at Screaming Frog, so felt it our duty to support the German tradition of Oktoberfest, which has become a yearly occurrence. Although Bavaria is admittedly a long way from our meeting point in nearby Reading, a great stein was had by all that attended.
Whilst being outside of London has made us very familiar with the Henley train branch line and the Twyford to Paddington route, which stops at every conceivable location south of London, it also offers some lovely views and friendly faces, which we like to take advantage of.
This year we’ve hit the town to enjoy some wine and cheese, and quenched our thirst riverside in the Summer evenings for work socials.
Would any workplace be complete without an office Christmas party? We maintained the yearly tradition of heading into town for some pool, darts and fun, before the office Secret Santa and heading to the local pub for a 3 course dinner – Merry Christmas!
I am delighted to announce the release of the Screaming Frog Log File Analyser 3.0, codenamed ‘Yule Log’.
If you’re not already familiar with the Log File Analyser tool, it allows you to upload your raw server log files, verify search engine bots, and analyse search bot data for invaluable SEO insight and understanding about how search engines are really crawling a site.
While this update is fairly small, it includes some significant new features and upgrades to the Log File Analyser based upon user requests and feedback. Let’s take a look at what’s new in version 3.0.
1) Configurable User-agents
You’re now able to completely configure the user-agents you wish to import into a project. You can choose from our pre-defined list of common search engine bot user-agents, or de-select those that are not relevant to you.
This helps improve performance and reduces disk usage by focusing only on bots of interest. You can also add your own custom user-agents, which are then stored and can be selected for projects.
Previously the Log File Analyser only analysed bots for Google, Bing, Yandex and Baidu, so now this allows users to monitor bots from other popular search engines, such as Seznam in the Czech Republic. It also allows users to analyse and monitor other specific user-agents of interest, such as Google-News, or Adsbot etc.
2) Include Functionality
Similar to the SEO Spider include feature, you’re able to supply a list of regular expressions for URLs to import into a project. So if you only wanted to analyse certain domains, or paths, like the /blog/, or /products/ pages on a huge site, then you can now do that to save time and resources – and for more granular analysis.
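As an illustration (with a made-up domain), an include list limited to blog and product pages might simply be:

https://www.example.com/blog/.*
https://www.example.com/products/.*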
3) New Log File Format Support
The Log File Analyser now supports Application Load Balancing log file format and Verizon Edge Cast format.
All you need to do is drag and drop in log files (or folders) as usual, and the Log File Analyser will automatically detect their format and start to analyse them.
4) JSON Support
You can now upload log files in JSON format to the Log File Analyser. There isn’t a common standard, so we have utilised customer provided JSON formats, and provided support for them all. We’ve found these to cover most cases, but do let us know if you have any issues when importing.
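As a rough illustration of the sort of entry involved (the field names here are hypothetical, which is rather the point, as they vary between providers):

{"timestamp": "2018-12-11T09:15:32Z", "remote_addr": "66.249.66.1", "request_method": "GET", "uri": "/blog/", "status": 200, "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}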
We have also included some other smaller updates and bug fixes in version 3.0 of the Screaming Frog Log File Analyser, which include the following –
The overview graphs are now configurable, so you can easily select the date range and metrics to display.
The workspace storage location (where you store the database with log files) is now configurable.
X-Forwarded-For in W3C logs is now supported.
Time-taken in W3C logs is now supported.
We hope you like the update! As always, please do let us know via support if you experience any issues at all.
Thanks to the SEO community and Log File Analyser users, for all the feedback and suggestions for improving the Log File Analyser.
If you’re looking for inspiration on analysing log files and using the Log File Analyser, then check out our guide on 22 Ways To Analyse Log Files using our Log File Analyser.
Last Thursday evening some of the Screaming Frog team and I headed into London for the annual UK Search Awards. The event, hosted at the Bloomsbury Big Top, recognises the very best in UK search and ended up being a perfect way to sign off what has been a tremendous year for Screaming Frog.
We scooped the prestigious ‘Best SEO Campaign’ award alongside our client Spotahome, for our thorough technical audit, comprehensive content campaign and subsequent results for the client. The competition was fierce, and it was an immensely proud moment for the team.
From the judges:
“The campaign by Screaming Frog & Spotahome was extremely comprehensive, covering all aspects of SEO that included PR and SEO. They commented that it was enjoyable to watch the magic of SEO at work in this campaign.”
It was a great early Christmas gift for us, and we can’t wait to get stuck into 2019! We feel the below gif of Mairead from our PR team perfectly sums up our feelings about this awesome achievement.
We’ve often been asked by customers where we are at conferences, as we don’t ever have a conference stand, or actively try and sell our software or services at events.
While we don’t like to sell in this way at events, our team love to attend conferences. We’re running the “managing search engine crawling and indexing training” course at the legendary brightonSEO this week, and are often there as attendees or speakers (and you’re often more likely to bump into us at the pre or after parties).
If you’ve attended brightonSEO in the past you may have seen our Screaming Frog beer mats, too.
However, we thought we’d try something a little different at brightonSEO on Friday (28th September 2018), where we will have a stand for the first time, and will be running a small crawling clinic.
Come & Chat About Crawling
The idea is that you’ll be able to come and say hello to the Screaming Frog team and chat about any crawling problems you experience, how best to tackle them and any feature requests you’d like to see for the software – or just pilfer some swag.
We’ll be on hand to help with any crawling issues and will have a few machines to run through anything.
The team will be on the main exhibition floor (B14) throughout the day, so if you’d like to meet our team and chat about crawling, log files, SEO in general, new features launched in version 10 or ‘things we really should have included in version 10 and you can’t believe we didn’t’, then please do come over.
Alternatively, you can say hello in the bar before or afterwards! See you all on Friday.
We are delighted to announce the release of Screaming Frog SEO Spider version 10.0, codenamed internally as ‘Liger‘.
In our last release, we announced an extremely powerful hybrid storage engine, and in this update, we have lots of very exciting new features driven entirely by user requests and feedback. So, let’s get straight to them.
1) Scheduling
You can now schedule crawls to run automatically within the SEO Spider, as a one off, or at chosen intervals.
This should be super useful for anyone that runs regular crawls, has clients that only allow crawling at certain less-than-convenient ‘but, I’ll be in bed!’ off-peak times, uses crawl data for their own automatic reporting, or have a developer that needs a broken links report sent to them every Tuesday by 7 am.
The keen-eyed among you may have noticed that the SEO Spider will run in headless mode (meaning without an interface) when scheduled to export data – which leads us to our next point.
2) Full Command Line Interface & --headless Mode
You’re now able to operate the SEO Spider entirely via command line. This includes launching, full configuration, saving and exporting of almost any data and reporting.
It behaves like a typical console application, and you can use --help to view the full arguments available.
You can read the full list of commands that can be supplied and how to use the command line in our updated user guide. This also allows running the SEO Spider completely headless, so you won’t even need to look at the user interface if that’s your preference (how rude!).
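To give a feel for how this might slot into a wider workflow, here’s a minimal sketch of wrapping a headless crawl and export in a script. The binary name, flags and paths below are illustrative rather than definitive, so check --help or the user guide for the exact arguments available in your version and on your platform.

```python
import subprocess
from pathlib import Path

# Assumed output location for this example.
output_dir = Path("/tmp/sf-crawls/example-com")
output_dir.mkdir(parents=True, exist_ok=True)

# Assumed binary name and flags -- verify with `--help` / the user guide.
subprocess.run(
    [
        "screamingfrogseospider",          # assumed to be on PATH
        "--crawl", "https://www.example.com",
        "--headless",                      # run without the user interface
        "--save-crawl",                    # persist the crawl file
        "--output-folder", str(output_dir),
        "--export-tabs", "Internal:All",   # export the Internal tab
    ],
    check=True,                            # raise if the crawl fails
)
```

A wrapper like this can then be dropped into a cron job, CI pipeline or reporting script, which is where headless operation really earns its keep.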
We believe this can be an extremely powerful feature, and we’re excited about the new and unique ways users will utilise this ability within their own tech stacks.
3) Indexability & Indexability Status
This is not the third biggest feature in this release, but it’s important to understand the concept of indexability we have introduced into the SEO Spider, as it’s integrated into many old and new features and data.
Every URL is now classified as either ‘Indexable’ or ‘Non-Indexable’.
These two phrases are now commonplace within SEO, but they don’t have an exact definition. For the SEO Spider, an ‘Indexable’ URL means a page that can be crawled, responds with a ‘200’ status code and is permitted to be indexed.
This might differ slightly from the search engines, which will index URLs that can’t be crawled and content that can’t be seen (such as pages blocked by robots.txt) if they have links pointing to them. The reason for this is simplicity: it helps to bucket and organise URLs into two distinct groups of interest.
Each URL will also have an indexability status associated with it for quick reference. This provides a reason why a URL is ‘Non-Indexable’, for example, whether it’s a ‘Client Error’, ‘Blocked by Robots.txt’, ‘noindex’, ‘Canonicalised’ or something else (and perhaps a combination of those).
This was introduced to make auditing more efficient. It makes it easier, when you export data from the Internal tab, to quickly identify which URLs are canonicalised, for example, rather than having to run a formula in a spreadsheet. It makes it easier to review at a glance whether a URL is indexable when reviewing page titles, rather than scanning columns for canonicals, directives and so on. It also allows the SEO Spider to use a single filter, or two columns, to communicate a potential issue, rather than six or seven.
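To make the bucketing concrete, here’s a minimal sketch of the simplified rules described above. The fields, statuses and ordering are illustrative assumptions rather than the SEO Spider’s actual internal logic, which covers many more cases and combinations.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class URLRecord:
    # Illustrative fields only -- not the SEO Spider's internal model.
    url: str
    status_code: int
    blocked_by_robots: bool
    noindex: bool
    canonical: Optional[str] = None

def indexability(record: URLRecord) -> tuple[str, str]:
    """Return ('Indexable' | 'Non-Indexable', indexability status)."""
    if record.blocked_by_robots:
        return "Non-Indexable", "Blocked by Robots.txt"
    if 400 <= record.status_code < 500:
        return "Non-Indexable", "Client Error"
    if 300 <= record.status_code < 400:
        return "Non-Indexable", "Redirected"
    if record.noindex:
        return "Non-Indexable", "noindex"
    if record.canonical and record.canonical != record.url:
        return "Non-Indexable", "Canonicalised"
    if record.status_code == 200:
        return "Indexable", ""
    return "Non-Indexable", "Non-200 Response"

# Example: a 200 page canonicalised elsewhere is bucketed as Non-Indexable.
print(indexability(URLRecord("https://example.com/a", 200, False, False,
                             canonical="https://example.com/b")))
```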
4) XML Sitemap Crawl Integration
It’s always been possible to crawl XML Sitemaps directly within the SEO Spider (in list mode); however, you’re now able to crawl and integrate them as part of a site crawl.
You can select to crawl XML Sitemaps under ‘Configuration > Spider’, and the SEO Spider will auto-discover them via the Sitemap entry in robots.txt, or the location can be supplied manually.
The new Sitemaps tab and filters allow you to quickly analyse common issues with your XML Sitemap, such as URLs not in the sitemap, orphan pages, non-indexable URLs and more.
You can also now supply the XML Sitemap location into the URL bar at the top, and the SEO Spider will crawl that directly, too (instead of switching to list mode).
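As a rough illustration of the auto-discovery step, a crawler can pick up sitemap locations from the ‘Sitemap:’ lines in robots.txt. The snippet below is a simplified sketch of that idea, not the SEO Spider’s own parser, and the URLs are placeholders.

```python
from urllib.request import urlopen

def discover_sitemaps(site: str) -> list[str]:
    """Collect 'Sitemap:' entries from a site's robots.txt (simplified sketch)."""
    robots_url = site.rstrip("/") + "/robots.txt"
    with urlopen(robots_url) as response:
        robots_txt = response.read().decode("utf-8", errors="replace")
    sitemaps = []
    for line in robots_txt.splitlines():
        if line.strip().lower().startswith("sitemap:"):
            # Keep everything after the first colon, e.g. the sitemap URL.
            sitemaps.append(line.split(":", 1)[1].strip())
    return sitemaps

# Example (hypothetical site):
# discover_sitemaps("https://www.example.com")
# -> ["https://www.example.com/sitemap.xml"]
```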
5) Internal Link Score
A useful way to evaluate and improve internal linking is to calculate internal PageRank of URLs, to help get a clearer understanding about which pages might be seen as more authoritative by the search engines.
The SEO Spider already reports on a number of useful metrics to analyse internal linking, such as crawl depth, the number of inlinks and outlinks, the number of unique inlinks and outlinks, and the percentage of overall URLs that link to a particular URL. To aid this further, we have now introduced an advanced ‘link score’ metric, which calculates the relative value of a page based upon its internal links.
This uses a relative 0-100 point scale from least to most value for simplicity, which allows you to determine where internal linking might be improved.
The link score metric algorithm takes into consideration redirects, canonicals, nofollow and much more, which we will cover in more detail in another post.
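For those curious about the underlying idea, link score is a PageRank-style calculation over the internal link graph. The sketch below is a deliberately simplified illustration of that concept, ignoring redirects, canonicals and nofollow and rescaling the result to 0-100; it is not the SEO Spider’s actual algorithm.

```python
def link_score(graph: dict[str, list[str]], damping: float = 0.85,
               iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank over an internal link graph, rescaled to 0-100."""
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Base probability of landing on a page at random.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in graph.items():
            if not outlinks:
                continue
            # Each page shares its rank equally across its internal outlinks.
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                if target in new_rank:
                    new_rank[target] += share
        rank = new_rank
    top = max(rank.values())
    return {page: round(100 * value / top, 1) for page, value in rank.items()}

# Tiny illustrative site: the homepage receives the most internal links.
graph = {
    "/": ["/services/", "/blog/"],
    "/services/": ["/", "/blog/"],
    "/blog/": ["/"],
}
print(link_score(graph))
```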
This is a relative mathematical calculation, which can only be performed at the end of a crawl when all URLs are known. Previously, every calculation within the SEO Spider had been performed at run-time during a crawl, which leads us on to the next feature.
6) Post Crawl Analysis
The SEO Spider is now able to perform further analysis at the end of a crawl (or when it’s stopped) for more data and insight. This includes the new ‘Link Score’ metric and a number of other new filters that have been introduced.
Crawl analysis can be performed automatically at the end of a crawl, or it can be run manually by the user. This can be configured under ‘Crawl Analysis > Configure’, and the crawl analysis can be started by selecting ‘Crawl Analysis > Start’. While the analysis is running, the SEO Spider can continue to be used as normal.
When the crawl analysis has finished, the empty filters marked with ‘Crawl Analysis Required’ will be populated with data.
Most of these items were already available via reports, but this new feature brings them into the interface to make them more visible, too.
7) Visualisations
We have a confession. We have always loved the idea of crawl visualisations, but have always had a problem with them – they were rarely actionable. They too frequently failed to help diagnose actual problems, hid data, and often didn’t reflect the real-world view of a crawl either. That said, they have always looked pretty, some SEOs are able to read them like a piece of abstract art, and the concept itself is fun and exciting, so due to overwhelming popular demand, we went back to the drawing board.
Portent originally introduced the concept of force-directed diagrams to the SEO industry, and a few providers already offer various useful site visualisations (kudos to them), but we don’t believe any of them were perfect, and we wanted to see if we could challenge our own assumptions about their limits.
We wanted to build a better way of visually understanding a site, its architecture, internal link structure and issues. We wanted to make them scalable, and we didn’t want to have to hide data from users to make them work.
So, we have introduced two types of diagram, and two different perspectives on viewing a site, each with their own benefits, which we believe provide more actionable data and insight.
These include two crawl visualisations, and two directory tree visualisations.
The force-directed crawl diagram and crawl graph visualisations are useful for analysis of internal linking, as they provide a view of how the SEO Spider has crawled the site, by shortest path to a page. Here’s how our own website can be seen with our force-directed crawl diagram.
Indexable pages are represented by the green nodes, the darkest, largest circle in the middle is the start URL (the homepage), and those surrounding it are the next level deep, and they get further away, smaller and lighter with increasing crawl depth (like a heatmap).
One of the problems with crawl visualisations is scale. They are really memory intensive, and force-directed crawl diagrams do not scale very well due to the amount of data involved. The browser will start to grind to a halt at anything above 10k URLs, unless interactivity and other bells and whistles are removed, which would be a shame, as that’s part of their appeal. However, it’s larger sites that need visualisations the most, to really understand them.
So, as site architecture doesn’t start and end at the homepage, our visualisations can be viewed from any URL.
The visualisation will show up to 10k URLs in the browser, but allows you to right-click and ‘focus’ to expand particular areas of a site and show more URLs in that section (up to another 10k URLs at a time). You can use the browser as navigation, typing in a URL directly and moving forwards and backwards with ease.
You can also right-click on any URL in a crawl, and open up a visualisation from that point as a visual URL explorer.
When a visualisation has reached the 10k URL limit, it lets you know that a particular node has children being truncated (due to size limits) by colouring the node grey. You can then right-click and ‘explore’ to see the children. This way, every URL in a crawl can be visualised.
The pastel red highlights URLs that are non-indexable, which makes it quite easy to spot problematic areas of a website. There are valid reasons for non-indexable pages, but visualising their proportion, and where they sit, can be useful in quickly identifying areas of interest to investigate further.
We also took the force-directed diagrams a few steps further, to allow a user to completely configure them visually, in size of nodes, overlap, separation, colour, link length and when to display text.
After all, they are arguably more like works of art.
More significantly, you also have the ability to scale visualisations by other metrics to provide greater insight, such as unique inlinks, word count, GA Sessions, GSC Clicks, Link Score, Moz Page Authority and more.
The size and colour of nodes will scale based upon these metrics, which can help visualise many different things alongside internal linking, such as sections of a site which might have thin content.
Or highest value by link score.
It can be hard to quickly pick out individual pages in a force-directed diagram, as beautiful as they are. So you can also view internal linking in a simpler crawl tree graph, which can be configured to display left to right, or top to bottom (or bottom to top, if you’re slightly weird).
You can right click and ‘focus’ on particular areas of the site. You can also expand or collapse up to a particular crawl depth, and adjust the level and node spacing, to get it just right.
Like the force-directed diagrams, all the colours can also be adjusted for fun (or if you have to be boring and use brand colours).
Directory Tree Visualisations
The ‘Directory Tree’ view in the SEO Spider has been a favourite of users for a long time, and we wanted to introduce this into our visualisations.
The key differentiator is that it helps you understand a site’s URL architecture and the way it’s organised, as opposed to the internal linking shown in the crawl visualisations. This can be useful, as these URL groupings often (but not always) share the same page templates and SEO issues.
The force-directed directory tree diagram is unique to the SEO Spider, and you can see that a crawl of our own site looks very different to the previous crawl diagram, making it easier to visualise potential problems.