Justia is an online platform that provides the community with open access to the law, legal information and lawyers. Justia delivers premium marketing solutions to lawyers seeking to expand their online marketing reach.
While search engines have long followed this protocol, neither the Internet Engineering Task Force (IETF) nor the World Wide Web Consortium (W3C) has ever formally adopted it as an internet standard. This left the rather vague protocol open to a lot of interpretation over the years, and not every search engine interpreted it the same way (or even followed it at all in some cases). In addition, while Koster’s original document has remained largely unchanged for 25 years, the internet has changed dramatically in that same period.
Against this backdrop, Google apparently decided the time was right to standardize the Robots Exclusion Protocol so that it can be implemented consistently going forward. On Monday, July 1, in partnership with Koster (the original author of the protocol), Google announced that it had submitted a working draft to the IETF to standardize and extend the Robots Exclusion Protocol with enhancements for the modern web. Here are a few of the proposed enhancements:
Indicating how wildcards and flags should be used, and indicating how to deal with encoded characters.
Indicating that robots.txt can be used on any protocol based on the concept of a Uniform Resource Identifier (URI), rather than working only on HTTP.
This particular working draft refers only to robots.txt; it does not mention meta tags or other methods of robots exclusion. When compared against the original specification, the only new directive that will be standardized is the Allow: directive, which joins the pre-existing User-agent: and Disallow: directives mentioned in the original document. Many search engine spiders, including Google’s own Googlebot, have long supported Allow:, which acts as an inverse to Disallow: by letting you indicate URLs that specifically may be crawled. This is useful when you want to block a pattern of URLs while still allowing a specific URL that would otherwise match the pattern.
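The interplay between Allow: and Disallow: can be sketched with Python's standard-library urllib.robotparser module. This is only an illustration under stated assumptions: the bot name ExampleBot, the example.com host, and the paths are all made up, and Python's parser applies the first matching rule in file order, which is not necessarily how Googlebot or the draft standard resolves conflicts. For that reason the Allow: line is listed first here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everything under /private/ except one page.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/press-kit.html
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path):
    """Return True if the hypothetical ExampleBot may fetch the given path."""
    return parser.can_fetch("ExampleBot", "https://www.example.com" + path)


for path in ("/private/press-kit.html", "/private/notes.html", "/about.html"):
    print(path, allowed(path))
```

In this sketch the press kit page is allowed even though it sits under the blocked /private/ pattern, the other page under /private/ is blocked, and a page that matches no rule is allowed by default.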
Once the protocol is standardized, it will be important that web crawlers implement the standard reliably. This will mean work for every company that makes a spider that crawls the web. To ease this transition, on the same day that Google announced that it was spearheading the effort to standardize the protocol, it also announced that it is open sourcing its own robots.txt parser, which it has used for over 20 years.
This library was written in the C++ programming language, and according to the blog post, some pieces of the code are over 20 years old (though other pieces are much newer). The parser is available under an Apache 2.0 License and published for free on GitHub. The Apache 2.0 License allows both commercial and non-commercial use and allows modification by any party, making for a very permissive license.
Google has opened up the repo to pull requests as well, and third parties have already started submitting a number of changes to the parser. It is unclear from Google’s announcement whether changes made by third parties to the open source project will in turn be adopted by Google’s own crawler, Googlebot, but if they are, keeping track of this open source project may be a good way of knowing just how Google treats the robots.txt file.
Cleaning up the Googlebot Parser’s Use of Unsupported Directives
In the wake of standardizing the protocol and open sourcing its own parser, Google has announced that it is officially removing support for unsupported rules from Googlebot. In particular, Google has indicated that it is retiring the Crawl-delay:, Nofollow:, and Noindex: directives from Googlebot’s robots.txt parser effective September 1.
Keep in mind this is specifically about the robots.txt parser; the noindex and nofollow directives in meta tags and HTTP headers are still supported. Google never documented its support for Noindex: in robots.txt files, as we mentioned in our 2017 post, but until now it was an effective means of instructing Googlebot not to index certain URLs.
Google made these changes to the parser before open sourcing it, and as such the open source code contains no references to the Crawl-delay:, Nofollow:, or Noindex: directives.
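If you want to check whether your own robots.txt still relies on any of these retired rules, a short script can flag them. The following is a minimal sketch, not Google's parser: the function name find_retired_directives and the sample file are hypothetical, and the matching is deliberately simple (directive name before the first colon, case-insensitive).

```python
import re

# Rules Google said Googlebot's robots.txt parser would stop honoring.
RETIRED = ("crawl-delay", "nofollow", "noindex")


def find_retired_directives(robots_txt):
    """Return (line_number, directive) pairs for retired rules in the text."""
    hits = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        # Drop trailing comments, then read the directive name before the colon.
        rule = line.split("#", 1)[0]
        match = re.match(r"\s*([A-Za-z-]+)\s*:", rule)
        if match and match.group(1).lower() in RETIRED:
            hits.append((number, match.group(1).lower()))
    return hits


# Hypothetical robots.txt used only for this illustration.
SAMPLE = """\
User-agent: *
Disallow: /drafts/
Noindex: /thank-you/
Crawl-delay: 10
"""

print(find_retired_directives(SAMPLE))
```

For the sample above, the script reports the Noindex: rule on line 3 and the Crawl-delay: rule on line 4. Any URL flagged this way would need another method, such as a noindex meta tag or HTTP header, to stay out of the index.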
Google’s Webmaster Central team was certainly busy the first week of July. In two days they published three blog posts, an IETF Working Draft, and an open source GitHub repository containing two decades of code. This standardization of the Robots Exclusion Protocol is clearly something Google feels very strongly about, which makes sense. Defining what a crawler can and cannot access on websites is fundamental to your site being indexed. We recommend that you or your webmaster take a look at your robots.txt file and make sure that your directives follow the new draft standard, and that you keep an eye on the draft as it evolves. We here at Justia are following these developments closely, as always.
Google My Business is a free tool for businesses and organizations through which they are able to manage their presence on Google, including both Google Search and Google Maps. Google My Business listings regularly appear in Google search results, along with a map, when users look for local information. Google has announced that they’re rolling out more features on Google My Business. These new features can help businesses make listings as descriptive and unique as possible.
This feature for verified businesses allows you to assign a short name and URL to your Google My Business listing to make it easier for customers to find you. Please note that the short name can be up to 32 characters long. You can remove or edit a custom URL up to 3 times per year. Keep in mind that the short name should be associated with your business name. We recommend including your location for that particular listing.
Google My Business also provides businesses with the opportunity to easily set their preferred cover photo and logo. The businesses that have completed their core information (phone number, website, hours, etc.) will have their logo displayed at the top right-hand side of their profile. Additionally, it will be possible to add photo captions, which let businesses tell the stories behind the pictures.
Formerly known as Small Thanks with Google, this feature allows you to easily download and create custom assets for your Google My Business listing. These may include printable stickers, posters, signs, and more items from reviews and highlights on your Business listing.
At Justia, we can help you with Google My Business and let you know how the tools, products, and services outlined above are useful to your specific business. Contact us today if you are interested in getting help with your Google My Business listing.
Google announced a major update to its Googlebot that would bring the spider up to date with newer web technologies.
Googlebot, the spider that Google uses to index sites on the web, has long been based on Chrome (or more specifically Chromium, the open source base from which Chrome is derived). Being based on Chrome, Googlebot was able to render pages the same way a real browser (specifically Chrome) does, thereby allowing Google to index dynamic sites in addition to simple HTML ones.
Until recently Googlebot was fairly static, based on Chromium Version 41, which was released to the stable channel on March 3, 2015. While Chrome 41 was state of the art in 2015, the web has changed significantly since then, so much so that the list of new and changed features between Chrome 41 and Chrome 74 is massive. For a long time, web developers who used these newer technologies have had to rely on hacks and other workarounds to make sure their sites would still work in Chrome 41 (even though no user actually still used Chrome 41 thanks to automatic updates) so that Googlebot would be able to index their site.
Evergreen Googlebot Announcement Slide at Google I/O 2019
Announcing that Googlebot was updating to Chromium 74 was a big deal for web developers because it meant that things that would work in modern browsers would also work for Googlebot, and thus for Google Indexing. In addition to this major update, Google made the commitment to make Googlebot evergreen, meaning that every time there is a new stable release of Chrome, Google will update Googlebot within a few weeks at the most.
This is important because Google updates Chrome very often, which means it will update Googlebot just as often. In fact, while Chrome 74 was the stable release just last month at Google I/O, Chrome 75 was released to stable this month, so an update to Googlebot should be imminent, if it hasn’t already happened.
Web developers can now use modern web technologies such as ES6, Lazy Loading techniques, and perhaps most importantly, Web Components, without worrying that Googlebot won’t index their site.
Googlebot’s User Agent Will Change
Every browser and bot sends a string called a “User Agent String” to any web server that it visits. This string allows the web server to identify what the browser is and to tailor things to that particular browser if necessary. Googlebot is no exception to this and broadcasts a user agent string so that servers know that it is Googlebot.
When Googlebot went evergreen, Google purposely decided to temporarily lock Googlebot’s user agent string to give web developers time to adjust their code if they were targeting the specific user agent string Googlebot was using. This means that even though Googlebot was running Chrome/74, the user agent string would continue to report Chrome 41 for some time.
User agent string will start reflecting the version of Chromium running (right now they still say 41 in case anyone has hard coded the whole user agent string) #io19
This decision is temporary, however, and Google intends to start showing a user agent string with the correct Chrome version very soon, as soon as enough time has passed for any web developers who wrote code based on the user agent to have updated their code to match.
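Code that compares the whole user agent string against a hard-coded value is exactly what will break when the version changes. A more tolerant approach, sketched below, looks only for the Googlebot token and reads the Chrome version separately. The function names and both sample strings are simplified, made-up illustrations, not Googlebot's actual user agent strings.

```python
import re


def is_googlebot(user_agent):
    """Check for the Googlebot token instead of comparing the whole string."""
    return "Googlebot" in user_agent


def chrome_major_version(user_agent):
    """Return the Chrome major version as an int, or None if it is absent."""
    match = re.search(r"Chrome/(\d+)", user_agent)
    return int(match.group(1)) if match else None


# Simplified, hypothetical user agent strings for illustration only.
LOCKED_UA = "ExampleAgent (compatible; Googlebot/2.1) Chrome/41.0.0.0"
UPDATED_UA = "ExampleAgent (compatible; Googlebot/2.1) Chrome/74.0.0.0"

for ua in (LOCKED_UA, UPDATED_UA):
    print(is_googlebot(ua), chrome_major_version(ua))
```

Both sample strings are recognized as Googlebot even though the reported Chrome version differs, so a check written this way keeps working once the version in the string starts changing.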
Mobile Search Redesign With Improved Site Branding
On May 22, Google announced that it was rolling out a visual redesign of the mobile version of search. This new version makes the source of the content more prominent than ever before. The name of the website now appears at the top of the search result snippet, and for an extra bit of visual branding, the website’s icon (if it has one) also appears on that line.
Ads get a visual refresh to match, but instead of the website’s icon, an icon with the word “Ad” in black in a sans serif font is shown in that place. SEO Roundtable reported shortly after the release of the redesign that some SEO experts had immediately begun experimenting to see whether they could do misleading things using that icon. Google’s Danny Sullivan was quick to recommend that these experts not try to do such things with their icons, as it would violate Google’s Favicon Guidelines. Sure enough, sites that had misleading icons saw their icons removed from search results later the same day.
Overall we believe this new change will be a great branding opportunity for your law firm because of the new prominence of your site name and icon in search results.
Preferred Domain Setting Removed From Search Console
Preferred Domain allowed you to specify which version of your main website’s domain should show up in search (with or without the “www.” prefix). Google now recommends that you follow the guidelines in its “Consolidate duplicate URLs” article to designate the primary URL (known as the canonical URL) for a page.
Of course we’ve long recommended that everyone follow those same guidelines regardless of the existence of the Preferred Domain setting; however the “Preferred Domain” was often seen as the fastest way to prevent both versions of a main domain from getting indexed at the same time.
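One common way to designate a canonical URL is a rel="canonical" link element in the page's head. The sketch below shows how a site might generate that tag so that the "www." and bare versions of a URL both point to a single preferred host. The function names and the example.com URLs are assumptions made for illustration; this is not taken from Google's article.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, preferred_host):
    """Rewrite the URL onto the preferred host and drop any fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, preferred_host, parts.path, parts.query, ""))


def canonical_link_tag(url, preferred_host):
    """Build a rel="canonical" link element for the page's <head>."""
    href = escape(canonical_url(url, preferred_host), quote=True)
    return '<link rel="canonical" href="%s">' % href


# Hypothetical URLs: both variants resolve to the same canonical tag.
print(canonical_link_tag("https://example.com/practice-areas/", "www.example.com"))
print(canonical_link_tag("https://www.example.com/practice-areas/", "www.example.com"))
```

Whichever version of the domain a visitor or crawler requests, the page declares the same primary URL.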
Mobile-First Indexing is Now Default on New Domains
There is Now an Official Google Webmaster Conference
Until now, Google had chosen not to have a conference specifically for webmasters, choosing instead to include SEO-related tracks in their other conferences (like the sessions I blogged about from Google I/O), and to make appearances at other industry-run SEO Conferences like SMX.
Unlike your traditional tech conference, each one of these conferences is actually a roaming conference that will travel throughout the country. Webmaster Conference India 2019 will hold events in 14 different cities throughout India this year, helping to make the conference accessible for anyone in the country who wants to attend.
The More Things Change…
The SEO landscape changes rapidly. Google and others are constantly moving forward, and it can sometimes be difficult to stay on top of new developments. These are just some of the changes we’ve been tracking over the last couple of months. We’ve also been tracking the recent core update to measure what impact it has had on the websites and blogs of our clients, as well as our own. We’ll always keep a weather eye on the horizon.
Google took to Twitter yesterday to pre-announce what they are referring to as a “broad core algorithm update.”
Tomorrow, we are releasing a broad core algorithm update, as we do several times per year. It is called the June 2019 Core Update. Our guidance about such updates remains as we’ve covered before. Please see this tweet for more about that: https://t.co/tmfQkhdjPL
At this time we don’t have any details about this update or what effect it will have on the search ranking of legal websites, but we stand by our recommendation of writing high quality, original, focused legal content for the Google of today, and the Google of tomorrow.
On May 7-9, 2019, Google held the twelfth annual Google I/O Developer Conference (thirteenth if you include the original Google Developer Day in 2007).
In 2016, Google moved the conference from Moscone Center in San Francisco to the Shoreline Amphitheatre in Mountain View (just down the street from both the Googleplex and Justia’s Mountain View Headquarters), and changed the feel of the conference to give it more of a music festival vibe.
This year’s conference continued the theme of Google’s recent focus on Artificial Intelligence and Machine Learning in particular, but also had a strong emphasis on trying to inspire developers.
This Year’s Big Announcements
Google used this year’s conference to announce two new phones, the Pixel 3a and Pixel 3a XL. These cheaper versions of the Pixel 3 and Pixel 3 XL released back in the fall have most of the Pixel functionality at nearly half the price of those flagship devices.
The advancements of Google’s Machine Learning were front and center with features like on-device Live Captioning at the operating system level in Android Q, improvements to the functionality of Google Lens to aid users of different levels of literacy to interact with the world, and the ability to automatically fill in multi-step forms on the web with Duplex for the Web.
Google also used this conference to rebrand Google Home devices under the Nest brand name (while at the same time ending several of the Nest specific projects such as “Works with Nest” in favor of the Google Home integration points).
While it wasn’t announced on the main stage, a big change announced at Google I/O of interest to us at Justia, and to the web marketing industry as a whole, is that Google’s search engine spider, Googlebot, is now “Evergreen.” Until recently, Googlebot was based on an older version of the Chrome rendering engine (Chrome 41). Websites using technologies that didn’t work in Chrome 41 could have issues getting indexed properly in Google. Now Googlebot is based on the latest open source stable version of Chromium (Chrome 74 at the time of the announcement), and when new stable versions of Chromium are released in the future, Googlebot will update to the new version within weeks of release.
Justia Live Blogs
While I attended a large variety of sessions, I’ve continued the tradition I started 2 years ago of live blogging some of the sessions that I thought would be of particular interest to those of us who market on the web. This year’s live blogged sessions were: