PAGES is the quarterly print magazine about SEO. Subscribe to discover where SEO fits into your marketing strategy. Explore expert insights, industry news, and SEO resources for digital marketing professionals.
Measurability is an oft-repeated buzzword in digital marketing. The ability to discretely measure the impact and success of our work, and even ROI, is the differentiator we love to tout over our more traditional marketing brethren.
Everything can and should be measured. Right?
In SEO this takes many forms: tracking website changes, links built, keyword rankings, position fluctuations, clicks, impressions, CTR, organic pageviews, sessions, users. Time on page, pages per session, scroll depth, goal completions…and on and on the list goes.
The truth is, this much data can be more noise than signal. Data overload leads to short-term and short-sighted decisions. It becomes a race to make the data move in the way we predict and want, with goals and KPIs chased fervently instead of investing in long-term, business-impacting change.
I love data-backed decisions just as much as everyone else in the industry, but oftentimes the best changes within a business won’t be the ones that affect KPIs the fastest.
The work I love the most, the work that brings me the most personal satisfaction and joy, is effecting meaningful change within an organization.
As an agency — often perceived as a partner at best, or a vendor at worst — change is rarely done through the work we execute ourselves. It’s achieved through great communication, clear consultation, well-documented work, and ongoing education. My goal is always to help a client internally achieve meaningful, institutional growth and empower them to succeed in search on their own, without oversight.
This is one such story.
In October 2017, the strategy department of Page One Power was retained by a client who had launched six months prior in the finance niche and was struggling to find a foothold.
We were helping build links back to their site, but they were struggling to see the results they expected and needed.
The client had identified organic search as a primary channel for growth and audience development, yet was only somewhat successful in creating content that stuck within a SERP. After analysis, it was clear success was haphazard and a result of quality work accidentally targeting a SERP well, as opposed to strategically targeting specific SERPs with good work. They were producing 50 pages per month, with roughly 250 pages already published. Despite this, virtually all organic search traffic went to a handful of pages, ranking for terms only somewhat relevant to their goals.
The client had thorough — really, really thorough — audience research with a primary topic of focus, complete with well-developed personas across multiple demographics to ensure audience value in every piece they created. And the content was quality. Nothing was thin, off-topic, or lacking in audience value.
The issue was that the content was broad and generalized, with little research to identify keyword themes, searcher intent, ranking competitors, or any of the other typical work that goes into creating optimized content to rank for a desired SERP.
Disclaimer: I often say “rank for a desired SERP”, but what I really mean is an entire category of keywords, both long tail and head term, with the most competitive SERP representing the entire category of keywords. Basically, if you can rank for the most competitive head term, you should rank well for the rest of the keyword variations and themes with decent on-page optimization.
After interviewing the client and reviewing goals, expectations, and work-to-date, the partnership proved promising: they had the budget for a long-term campaign, belief in search as a channel, and a willingness to invest in quality content.
Despite this, long-term success wasn’t certain.
The financial space is extremely, extremely competitive. And not just for the obvious terms surrounding credit cards, personal loans, mortgage calculators, debt consolidation, financial planning, etc. Even outside these head terms, there are many different sites and organizations invested in creating quality content at scale, with the intent to answer nearly any question a person might have. And these are large, established, and dominant brands such as The Balance, NerdWallet, Investopedia, Credit Karma, and more.
The client was aware of the inherent challenge within the niche, and willing to invest the time, energy, and resources necessary. Expectations were reasonable, but they needed a clear strategy to find a foothold in search. Specifically, they needed to demonstrate an ability to rank in relevant SERPs and grow traffic month-over-month, or continued investment would be at risk. The runway to success wasn’t forever.
Their mistake was simple: they failed to consider how the pages they were creating would be found within search, and had only focused on their personas and audience. They needed to better understand searcher behavior, and identify winnable SERPs where existing content ranking for relevant searches could be displaced.
This was the institutional change I sought to effect.
Page One Power specializes in link building through manual promotion and content creation. We also offer keyword research, technical SEO, and content audits. The client contracted Page One Power with two specific goals:
Improve their content strategy to more consistently create pages that would rank for meaningful keywords and drive traffic.
Create a systematic approach to identify which pages were best suited for link building and which pages were most likely to need links in order to earn rankings and traffic.
Content quality wasn’t the issue; they were struggling to produce content that targeted a specific keyword and query with a realistic chance to rank.
I see it often in my work with clients: a team of savvy content creators lacks the SEO insight to successfully create content that ranks. The mistake is the level of detail and granularity of the content: it’s overgeneralized, with the expectation that “understanding an audience” means you simply add a keyword, and you then have content that will rank.
The truth is, search content requires research before creation. It’s important to identify a top-level keyword to target, the natural variation and themes involved, which domains currently rank, how the pages are formatted, the overall quality of the information, what topics they cover, whether or not they miss a piece of searcher intent, and how many links are involved. Basically, the goal is to find a gap in terms of information offered to searchers, create a better page than currently exists, and determine how many links are needed to reasonably expect to rank.
For example, the client had consistently plugged [credit score] into their Yoast SEO plugin, even though they knew it was beyond their means, but hadn’t looked at more granular and modified versions. They were aiming each page at the same head term, rather than answering distinct and exact-match queries.
Despite the validity of the topic, they had no reasonable opportunity to rank for [credit score] in the short term, nor did it make sense to create multiple (20 or so) pages that all theoretically target the head term [credit score]. Sure, in practice they were targeting variations and modified versions of the term — but to them, their page was most relevant to the head term [credit score].
My plan was straightforward:
Perform a content audit to analyze existing content.
Present the content audit to the client’s head of content (HoC) and train him on SEO and keyword research.
Optimize existing pages using the content audit with the HoC.
Work personally with the HoC on the next wave of content planning to identify winnable SERPs.
Continue to run a monthly content audit (maintenance) and work with HoC to ensure continued success and education.
Occasionally research and hand-deliver opportunities to ensure continued growth and education.
The first step was simple: I performed a content audit and worked with the client to analyze which pages were successful, which pages missed the mark, and which pages were close, but used the wrong keywords. I presented the analysis so they understood why certain pages were successful, why others were not, and how to optimize existing pages to improve.
Next, I helped the head of content work through revisions of existing content, with the intent to improve their targeting and optimization for terms they were almost ranking on page one. Although the revisions were only somewhat successful, it was an extremely valuable moment in helping their head of content truly understand keyword research and its role in content designed for search.
I worked through the plan and documentation with the head of content, but I wanted the client to be invested and understand why we were making specific changes so everyone could better understand the difference between targeted, optimized content and well-written, unoptimized content.
They went back through and performed the keyword research necessary to optimize the pages for the correct keyword themes and variations.
Lastly, I put the content audit in maintenance mode, meaning I continued to pull data for content performance month-over-month for new and existing pages, meeting with the head of content to review performance once per month. Within the process I would occasionally uncover an opportunity myself (digging through Search Console and competitor content), and I delivered those opportunities in our monthly meeting.
The client went from struggling in search to consistently making search-focused content able to rank very well for their intended keywords.
Within two months, they evolved from performing the barest of keyword research that grasped at unattainable head terms, to performing deep keyword research which identified topical clusters and gaps in SERPs. Their content consistently answered granular questions immediately and provided deeper, additional context below.
They’ve been a tremendous partner and I’m exceedingly humbled to see their continued growth. We’re working on a new phase of strategy now, and without a doubt the most pride I take isn’t the ROI of our services, the keywords they rank well for, or even the up-and-to-the-right organic traffic growth on the website.
I’m most proud of the internal growth and improvement their own organization attained, and the fact that if I were to walk away now they could easily carry on creating content that drives organic traffic. I’ve done more than deliver excellent work: I’ve helped them create an excellent system to achieve repeatable success.
Optimization is at my very heart, and it’s much more optimal to teach others than to do the work for them. That’s how true growth is achieved.
The organic traffic is fun to look at, though.
Cory Collins works in strategy development at Page One Power. Cory is a writer, runner, SEO strategist, and beer brewer who lives with his wife and dogs in Boise, Idaho. Cory's superpower is eternal curiosity.
Measuring SEO ROI is more difficult than most people understand.
It sounds simple; just tag your pages properly with analytics code, make sure you’re tracking conversions and can associate a dollar value with those conversions, and you’re done, right? Not so fast! There is much more to the challenge of SEO ROI than that.
To illustrate the point, I’m first going to explain how it works with PPC. Here is a simple breakdown:
On any given day, the amount of money I spend in paid search has a one-to-one correspondence to clicks to my website, and some of those clicks result in revenue-generating transactions (or “contact us” requests, or some other desirable action). To a large degree, paid search is a direct response medium.
To be fair, there is definitely the concept of deferred conversions that result from paid search too. Someone clicks on an ad and goes to your site, doesn’t convert in that session, but comes back at some later time — perhaps by some other means than clicking on an ad — and converts. In PPC terms, we refer to that as an “assisted conversion.”
That said, a large part of the realizable revenue comes from those initial clicks, and the cost of the campaign scales in a linear way in relationship to the clicks received. The key deliverable from your PPC spend is clicks from users.
Now, let’s look at SEO. To set the tone, let me share how I often explain what an SEO sales pitch is like: “I don’t know what I’m going to do for you yet, I don’t know how fast you’re going to get results, and I don’t know how big those results are going to be. $10,000 per month, sign here.” Of course, I’m joking when I put it that way, but there is a real element of truth to that being what an SEO pitch sounds like.
The reason the typical pitch sounds so obscure is that the key deliverable from SEO campaigns is increases in organic search rankings. These gains in rankings then deliver clicks from users on an ongoing basis, often for an extended period of time.
It’s like a gift that keeps on giving — but that’s not how most businesses look at ROI from SEO revenue. For example, if I invest $1M in SEO this year, and we get $4M in SEO-related sales, which reflects a lift from $2M the prior year, what do I calculate as the ROI? Here is the way many organizations look at it:
Is it ($2M – $1M) / $1M = 100%, because that’s how much my SEO revenue grew (as opposed to my total SEO revenue) in the same year? I’d argue that this isn’t a good way to look at it at all. Why? Well, one reason is that if I invest nothing in SEO at all, the $2M I got in SEO revenue last year will likely be less in this year, yielding a chart more like this for SEO revenue:
Continue with no SEO investment for multiple years, and you’ll see that number plummet down to near zero. Why? Because your competition is investing in SEO while you’re not. Their SEO gains will become your SEO losses. Part of what you’re accomplishing with your SEO investment is defending your current levels of SEO revenue.
But there is also a forward-looking part to this story too! Let’s go back to my original scenario, where I invested $1M in SEO in the current year and saw SEO revenue go up from $2M to $4M. Now, let’s imagine that I shut off my SEO investment in the next year. What happens in that year?
You guessed it: SEO revenue does not instantly drop back to $2M. In fact, over time, the yield on that $1M SEO investment might look like this:
That’s one heck of a different picture of SEO ROI, is it not? Oh, and don’t forget the fact that in the current year, the one where SEO revenue went from $2M to $4M, I would have lost $500K of SEO revenue if I hadn’t made any investment at all. Now your real SEO ROI looks something like this:
Now that we’ve established some basic concepts, let’s look at a couple of models for SEO campaigns and the ROI you might get.
Model 1: The Self-Defense SEO ROI
Let’s say you have that $2M per year revenue from SEO run rate coming into a year. You know that you could see large-scale growth if you could invest $1M in SEO, but you just can’t — the budget you can afford is only $250K. Let’s say it turns out that $250K is enough SEO investment that your revenue for the year will turn out to be $2M, i.e. no growth.
You did enough to keep from losing ground to your competition. They were doing the best they could to take some of your market share, but failed. Now let’s say you would have lost $500K in revenue if you hadn’t invested the $250K. Your actual ROI in this scenario would be:
Note that this is the story if you look at this on a one-year basis only. Let’s say you invest $250K per year, over two years, and you manage to keep the SEO revenue at $2M for both years. Using the numbers I shared above for the “no SEO investment over multiple years” model, you will have defended $500K of revenue in year one and $1M of revenue in year two. The picture of this ROI scenario looks like this:
Now you’re getting a reasonable model to estimate the SEO investment results when you invest only enough to preserve your revenue, but show no growth.
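The arithmetic behind the self-defense model can be sketched in a few lines of Python. This is only an illustration: it assumes ROI is computed as (return − investment) / investment, the same form used for the 100% growth example earlier, and it counts defended revenue as part of the return. The function name seo_roi and its parameters are hypothetical, and the printed values are derived from the figures quoted in the text rather than taken from the original charts.

```python
def seo_roi(investment, revenue_gain=0.0, defended_revenue=0.0):
    """Return ROI as a fraction: (total return - investment) / investment.

    Assumption: revenue that would have been lost without the investment
    (defended revenue) counts toward the return, alongside any growth.
    """
    if investment <= 0:
        raise ValueError("investment must be positive")
    total_return = revenue_gain + defended_revenue
    return (total_return - investment) / investment


# Growth-only view from earlier in the article: $1M invested, revenue up $2M.
naive = seo_roi(1_000_000, revenue_gain=2_000_000)

# Model 1, one year: $250K invested, no growth, $500K of revenue defended.
defense_one_year = seo_roi(250_000, defended_revenue=500_000)

# Model 1, two years: $250K invested per year, with $500K defended in
# year one and $1M defended in year two.
defense_two_years = seo_roi(2 * 250_000, defended_revenue=500_000 + 1_000_000)

print(f"{naive:.0%} {defense_one_year:.0%} {defense_two_years:.0%}")
```

Under these assumptions, a budget that produces no visible growth still shows a positive return once the revenue it protected is counted.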
Model 2: Growth Mode SEO ROI
Earlier in this article I laid out a model that suggested ROI over a five-year period was 375%. Should I walk into a pitch and tell a client we’re going to get 375% ROI? Frankly, that would make for a challenging conversation.
If you’re an executive, you likely have little interest in a five-year ROI model; what you’re able to achieve in this year is probably most important. You may even have bonus compensation programs based on the ROI you can get with your budget in the current year.
However, I also believe that educating yourself and your team on how it works with SEO is important. If you’re interested in the business for the long haul, then how it will perform next year should be of interest to you — and your team too, even if it’s a secondary interest. Everyone should want to be part of a growing business, not a shrinking one.
For that reason, I’d show a two-year view similar to this one:
This at least gives you and your management team a view of the bigger picture of how SEO ROI works.
Approaching the Budget Conversation
If your business is like most businesses, the focus on the current year is natural. In larger companies, the executive staff has current year goals, and compensation is often tied to those goals. But, if the executive staff are forward-looking, the long-term health of the business is arguably of great interest too.
Learn the mindset of who you are going to be presenting the budget to, whether you’re part of the executive team, or just building a plan to present to them. This includes understanding the overall organizational budget and margin goals, and adjusting your budget proposal accordingly. Start the conversation by making sure that your team understands the difference in the deliverables between PPC and SEO. Here is a simple visualization of it:
Once this concept is clear in everyone’s mind, the rest of the story becomes quite a bit easier to tell. From here, you can lay out the various ROI models using two, three or even five-year time horizons to show the broader strategic picture, and the one-year ROI to outline the impact on company performance in the current year.
The first step in understanding how to measure ROI is to understand a proper definition for the impact of your investment. As you have seen, this is not easily done in the world of SEO.
In other articles on this blog, you’ll see a lot of invaluable information on how to achieve and measure results. But, along the way, remember the enduring aspects of the benefits of an SEO investment. It’s not contained to a single year in the same simple way as PPC and other types of advertising campaigns.
The more you and your team understand how SEO ROI differs from other mediums, the better, as it will help your organization have the right perspective on the role that SEO investments should play in your overall marketing mix.
Eric Enge is General Manager of Perficient Digital, a full-service, award-winning digital agency. Previously Eric was the founder and CEO of Stone Temple, also an award-winning digital marketing agency, which was acquired by Perficient in July 2018. He is the lead co-author of The Art of SEO, a 900+ page book that’s known in the industry as “the bible of SEO.” He is a prolific writer, researcher, teacher and a sought-after keynote speaker and panelist at major industry conferences.
Between the PAGES Episode 2 - Charles Taylor - Verizon Consumer Markets - SEO Testing
In the practice of Search Engine Optimization, there are a lot of unknowns. Sure, Google gives us ideas and points us in a direction. But even then it can be hard to know exactly what does what when we are trying to make an impact with the work we do. Listen in as Charles breaks down his process to test and prove the work of SEO at Verizon.
Charles is a columnist in PAGES, where he shares his findings from SEO tests with the goal of helping digital marketers make informed decisions about SEO best practices.
In this episode, Charles and I discuss SEO testing: the practice of creating tests to prove theories and ideas we have about the work of Search Engine Optimization.
SEO is a field where testing assumptions can reveal truths about optimizing your website. Busting SEO myths helps us make better business decisions and get more from investments in SEO.
In Issue #4 of PAGES, I busted a myth about phone number formats and their impact on search rankings. We determined that the format used for a phone number does impact the way Google recognizes the phone number.
As is often the case with SEO testing, I invariably forget something.
In the last issue, I tested the best telephone formats to use on the web so that Google can understand them. If you did not read that article, I suggest you get your hands on the Q3 issue of PAGES SEO Magazine and review it.
I mentioned that I had not tested the phone number format ###_###_####. I added this format to the three iterations of my test grid. After a couple of weeks, the results were in.
While that format did not perform as well as the “winning formats”, it did deserve an honorable mention. So again, I would not use ###_###_#### on my website but if forced to use that format on another site I would not worry too much – Google will likely figure it out.
The second test variation I had not considered was wrapping the phone numbers in the HTML telephone link code:
I tried two different variations of this test. First, I wrapped this code around the “losing format” of the phone number (the one using periods – ###.###.####). The second variation was to wrap the code around just text. The phone number never appears on the page, only within the telephone code. If Google reads the number in the code and uses it for on-page optimization purposes, the pages should rank in the search results.
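The code sample from the original article is not reproduced above. As a rough illustration of the two variations, the sketch below builds anchor tags that use the standard tel: URI scheme. The helper name tel_link, the placeholder number 555.555.0100, and the "Call us" link text are all assumptions for illustration, not the markup used in the test.

```python
import re
from html import escape


def tel_link(phone_number, visible_text, country_code="1"):
    """Build an HTML anchor whose href uses the tel: URI scheme.

    Only the digits of phone_number are kept for the href, so the number
    can be passed in any of the formats discussed in the test.
    """
    digits = re.sub(r"\D", "", phone_number)
    return f'<a href="tel:+{country_code}{digits}">{escape(visible_text)}</a>'


# Variation 1: the link wraps the period-separated ("losing") format.
variation_one = tel_link("555.555.0100", "555.555.0100")

# Variation 2: the link wraps plain text, so the number appears only in the href.
variation_two = tel_link("555.555.0100", "Call us")

print(variation_one)
print(variation_two)
```

In the second variation the number is present only inside the href attribute, which is what makes it a useful check on whether Google reads the link target as page content.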
In both cases, Google did not seem to use the phone number as an on-page optimization factor. In the first case, the “losing format” never ranked for any of the other phone number format searches, only searches explicitly for ###.###.####. In the second case, (text wrapped in code) the page never appeared for searches for the phone number – even when using the “site:” command. I also tried searching for the full phone number “1-###-###-####” and even “+1-###-###-####” both without and with the “site:” command – again, no results.
I feel this demonstrates that Google does not use the telephone code for on-page optimization purposes.
I hope this test helps you make more informed decisions about the content on your site! This is always the goal of SEO testing. If you have questions or comments about the test, let me know on Twitter: @CharlesHTaylor
Charles has been actively involved in online marketing since 2000. For the past 15 years, he's focused on SEO in a number of B2B and B2C verticals – legal services, eCommerce, information marketing and affiliate marketing. He is currently the SEO Manager for Verizon's Fios division. Charles is always looking for new ways to help new and established companies solve their SEO challenges.
Hello everyone! Welcome to the new year — along with it comes a new issue of PAGES!
This issue of PAGES is our first since we have shifted the format of our content. Issue #5 doesn’t come with a specific theme for the articles inside, but each of those articles will help you improve SEO strategy within your organization.
Each issue of PAGES now covers a range of topics, from case studies, to guidance on best practices, to industry news and updates.
In this issue, we discuss the integration of SEO and social media, bust more myths about what it means to be optimized, discuss local SEO, and share tips for maximizing the reach of your content, just to name a few of the topics covered.
Let’s get to know the contributors who wrote articles for our first issue of the new year!
Charles Taylor has been actively involved in online marketing since 2000. He’s currently the SEO Manager for Verizon’s Fios division. He is always looking for ways to help new and established companies solve their SEO challenges.
Charles is back to bust another SEO myth — this time, it’s title tag lengths!
Joelle is the Director of Marketing and Growth at Bookmark Content and Communications, a full-service global marketing company that brings together content and communications. In addition to digital marketing and SEO, she also loves travel, pop culture, gadgets, and tech.
Joelle’s article on content strategy in this issue of PAGES will set you on the right track for 2019.
Greg is Senior SEO Strategist at Found, where he ensures the team is deploying the latest tactics for their clients, headed in the right direction with strategy, and fully integrated with other digital channels.
Greg shares ways to make the most of your time as it applies to SEO work in his article in our fifth issue.
If you’re not already subscribed to PAGES, you can visit here to get signed up. Subscribing is free and you’ll get PAGES delivered straight to your mailbox.
Don’t forget to share your thoughts on Issue #5 on one of our social media channels using the tag #pagesSEOmagazine, or to share PAGES with your friends who could benefit from it. We love hearing what you think!
You can also give us a follow to stay up-to-date with the release of fresh articles here on our blog.
If you’ve been waiting for the online release of last issue’s articles, blog versions of the articles from Issue #4 will be released starting next week, so stay tuned! The web version of Issue #4 is now available online as well.
We’re also excited to announce the launch of the new PAGES podcast, hosted by our Editor-in-Chief Joe Oliver, and featuring authors from past and coming issues of PAGES magazine. Joe is sitting down with our contributors to dive a little deeper into the concepts they’ve explored inside PAGES.
The first episode will be available on January 15th — listen and subscribe to get updates on new episodes here.
As always, thank you for supporting and reading PAGES. We can’t wait to hear what you think of our fifth issue and to see what you’re able to achieve through SEO in 2019.
Cheers to another great year!
Sloan Roseberry is managing editor of PAGES magazine, and part of the content marketing team at Page One Power. You can follow her for the occasional tweet or connect with her on LinkedIn.
These days, it’s easy to create a website — but it’s not always easy to create a website right. Basic SEO knowledge builds the foundational understanding you need to create a site that achieves your goals. These articles explore the basic search engine optimization best practices and principles you need to understand to build a successful web presence.
All decisions made in SEO are based on information that requires careful collection. Guidance must come from educated guesses, informed by trustworthy data. One of the best tools for retrieving this data is Screaming Frog, a crawler which can be tweaked in many...
So you have this great website. The graphics and design are perfect, the HTML is letter-perfect, and you even have web analytics switched on (and you monitor them obsessively). But still, you’d like to see your site bumped up the search results pages. What else would...
A drop in visitors to your site can be one of the first indicators that something is awry with your optimization. From technical errors to regular updates to the Google algorithm, there can be many possible reasons behind a traffic loss. By discovering the cause of...
Whether you’ve been an SEO for two weeks or two decades, you’re inevitably going to stumble into arguments about Meta tags. It’s sometimes hard to separate myth from history, and the role of Meta tags in SEO has evolved a great deal over time, but the basics are still...
You’ve got a website and you think it’s time for an SEO audit — where do you start? If you’re looking for help on how to conduct a basic SEO audit of your own website, this guide can set you on the right track. Whether it’s your first time trying your hand at an audit...
For all the constant advances in SEO, on-page optimization is often unsophisticated. You’ve optimized the Title tag, Meta description, and H1, and even some alt tags. Time to hit the bar? Truly, there's much more to it. Optimizing your page means more than the...
“Slow and steady” always wins the proverbial race, but Aesop might rewrite his fable if he saw how fast internet speeds are shaping online success. For the fourth consecutive year, mobile devices remain the go-to platform for users to visit their websites of...
Issue #4 is officially here! We can’t wait for you to get your copy. In this issue of PAGES, we’re focused on the ROI of SEO: what you get out of what you invest into SEO. Tracking the ROI of SEO can be tricky — to help, we enlisted some digital marketers with some...
Building your business or blog online is all about getting into your audience’s head. Knowing what and how they think makes your job as a marketer that much easier. Content marketing is about writing information on your website to fit the needs and wants of...
All decisions made in SEO are based on information that requires careful collection. Guidance must come from educated guesses, informed by trustworthy data.
One of the best tools for retrieving this data is Screaming Frog, a crawler which can be tweaked in many different ways to accomplish specific tasks of data gathering. However, ensuring the integrity of your data is rarely a straightforward path.
There are many potential obstacles in the journey towards information enlightenment — here, we’ll discuss some of the more common issues you might run into.
The Site is Too BIG
Screaming Frog runs off of your system’s memory (RAM), not hard disk, which means it is blisteringly fast but can encounter problems crawling on a large scale. You might have 1TB of hard disk space on your computer, but a typical computer won’t have more than 16GB of RAM.
Generally speaking, I run into issues with a plain text crawl after roughly 75k-100k URLs using a machine with 16GB of RAM and a 2.2GHz i7 processor.
It’s possible to run Screaming Frog on a remote server to increase the power available. This is most easily accomplished through Amazon Web Services (AWS) EC2 instances, which scale in processor and RAM to your desired specification.
For a full walkthrough, check out Mike King’s post here on ipullrank.com.
Other cloud services such as Google Compute Engine are available, but from my perspective AWS is the simplest solution.
Another, possibly quicker, solution if you only occasionally deal with large domains is to crawl in segments: use Screaming Frog’s include/exclude features, disable “crawl outside of start folder” in the settings, and run a separate crawl for each main subfolder, presuming each subfolder fits within your RAM’s capacity.
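One way to plan a segmented crawl is to group a list of known URLs (from an XML sitemap export, for example) by top-level subfolder and write one regular-expression pattern per segment. The sketch below is a planning aid only: the function name subfolder_patterns and the example.com URLs are hypothetical, and the exact pattern syntax the include field expects should be checked against your version of Screaming Frog.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def subfolder_patterns(urls):
    """Group URLs by top-level subfolder.

    Returns (regex_pattern, url_count) pairs, largest segment first.
    Pages that sit directly at the site root are skipped and would need
    their own crawl.
    """
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        # Treat the first path segment as a folder only if something
        # follows it or the path ends with a slash.
        if len(segments) > 1 or (segments and parts.path.endswith("/")):
            counts[f"{parts.scheme}://{parts.netloc}/{segments[0]}/"] += 1
    return [(re.escape(prefix) + ".*", n) for prefix, n in counts.most_common()]


known_urls = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/products/widget",
    "https://example.com/about",
]

for pattern, count in subfolder_patterns(known_urls):
    print(count, pattern)
```

The URL counts give a rough sense of which segments are likely to fit within memory and which may need to be split further.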
As a side note, you can increase Screaming Frog’s RAM allocation (the default is 1GB on a 32-bit machine and 2GB on a 64-bit machine). If a crawl uses all of its allocated memory, you’ll receive a warning to increase the allocation; otherwise the crawler will become unstable.
The Crawler is Blocked By the Server
There are many evil crawlers on the internet, full of malice and server load. Large domains tend to take an inexorable approach, allowing entry only to those defined as pure by the royal domain. All others shall receive the forbidden gates of 403.
There are many potential causes to this, and it behooves you to determine exactly which cause is to blame before approaching the domain’s gatekeeper with a request for entry — specificity is appreciated by technical individuals. Below are the most common causes to test for.
User-Agent Whitelist or Blacklist
Try crawling as Chrome, but always use a VPN first to ensure you haven’t already been blacklisted by IP after approaching the server identified as a crawler. I always prefer to identify as a crawler in my first crawl in case user-agent blacklisting is inactive, because then the server logs will be cleaner later. It’s not a huge pain to filter your IP out later, but I’m constantly on VPNs and it can be a mild pain.
Server Rate Limiting
Crawling too quickly can upset a server, as bots can be used for distributed denial of service (DDoS) attacks to crash the system. You may receive 403 messages after a decent number of URLs initially responded 200 OK; this is usually because of rate limiting.
Try slowing your crawl to under 5 URLs per second and limit threads to 1 or 2: this seems to be the most common acceptable crawl rate I’ve found in my work. This method takes much longer for huge domains; do the math and run it overnight, over the weekend, or on a server.
For example, a 1 million page site at an average of 5 URLs per second will take a couple of days to crawl: 86,400 seconds per day x 5 URLs per second = 432,000 URLs crawled per day.
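The same math is easy to script when planning crawls of different sizes. A minimal sketch, assuming the crawl sustains a steady average rate (the function name crawl_days is hypothetical):

```python
SECONDS_PER_DAY = 86_400


def crawl_days(total_urls, urls_per_second):
    """Estimate crawl duration in days at a steady average crawl rate."""
    if urls_per_second <= 0:
        raise ValueError("urls_per_second must be positive")
    return total_urls / (urls_per_second * SECONDS_PER_DAY)


# 1 million URLs at 5 URLs per second (432,000 URLs per day).
days = crawl_days(1_000_000, 5)
print(f"{days:.1f} days")  # prints "2.3 days"
```

Real crawls rarely hold a constant rate, so treat the result as a lower bound when scheduling.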
IP Whitelist or Blacklist
If you’ve already tried changing user-agent, limiting your crawl rate, and changing your IP via VPN, you’re probably dealing with a server that has a whitelist. It’s also possible that you’ve already landed yourself on the site’s blacklist by being too liberal with your crawl testing.
At this point, you can approach the gatekeeper with the steps you’ve attempted, and they will most likely happily put your IP on a whitelist, because you know how to crawl respectfully.
Unless, of course, you did get yourself blacklisted. Then you may be heckled.
The Crawler Can’t See the Desired Content
This is a conundrum when specific data, like a view count per post, needs to be extracted from a large number of URLs on an external domain; for example, a site where you have a guest column, or one you are studying for competitive analysis.
Enter artoo.js, a client-side scraping companion. This is the cleanest solution I have found to extracting rendered DOM quickly and at scale. Artoo automatically injects jQuery and exports in pretty JSON.
GG, WP. Happy crawlings.
For most of the 2010s, Nicholas Chimonas was Head of Technical SEO projects at Page One Power. He recently moved on from the agency life to an in-house role as Director of SEO at WTP Inc, a holding company of investors and tech strategists. You’ll find him in technicalSEO.slack.com or a national forest in the Pacific Northwest.