About the Google Search Document Leak

Top internship course in Nagercoil

By featherlinks | Published about a month ago | 3 min read


Thousands of Google's internal search ranking documents recently leaked, giving us a rare look into the workings of Google’s ranking algorithms. Here, we’ll dive into what the leaked data was about, how it can help the SEO community, and how Google responded to it.

What the Leak Was About and Who Leaked It

The leaked documents contained information about the data Google collects to rank sites. Erfan Azimi, a search marketer and the founder of EA Eagle Digital, was the first to come across the leak. He wanted the search community to know how the ranking system actually works, so on May 5 he reached out to Rand Fishkin, co-founder of Moz, whom he considered the best person to make this information public.

Rand checked with some friends who are ex-Googlers to verify that the leak was authentic, and then turned to Mike King, CEO of iPullRank, to decode the documents. Mike analyzed the documents and later published an article sharing his insights.

When It Was Leaked

The documents were released on GitHub on March 13 by a bot called yoshi-code-bot. They came from Google’s internal Content API Warehouse, which employees use to store their files, and were not taken down until May 7.

What the Leaked Data Is About

There were 2,596 modules in the API files with 14,014 attributes, and the data seemed to be about:

Clicks: According to Mike King, Google most likely uses clicks and post-click behavior for ranking.

Chrome Data: In his blog post, Rand Fishkin says that Google probably uses the number of clicks on pages in the Chrome browser to identify the most popular URLs on a site.

Links: Erfan Azimi mentioned to Rand that Google has three tiers for classifying its link indexes: low, medium, and high quality.

Whitelists in certain domains: Rand said that Google is whitelisting sites in specific areas such as travel, Covid, and politics.

Website Authority: Mike said that Google has an overall domain authority measure, after he found a module that mentions a feature called “siteAuthority”.

How It Is Useful for the SEO Community

Some key takeaways from Mike’s and Rand’s analyses of the documentation are:

User Experience: According to Mike, earning more clicks on a site with a good user experience signals to Google that your page deserves to rank, and this will probably help you bounce back from the Helpful Content Update.

Authorship matters: Mike also said that Google measures authors, so we can probably assume that authorship is important for ranking.

Links are probably still a big deal: Links from a fresh or top-tier page are more valuable and improve your ranking performance. From Mike’s analysis, Google checks the average weighted font size of terms in documents and in the anchor text of links. It also values a link based on how much it trusts the homepage.

Page Titles are still important: Mike said that Google still considers how well the page title matches what the user is searching for, since there was a mention of a feature called “titlematchScore”.

Video-focused sites are treated differently: If more than 50% of the site has video content, it qualifies as video-focused, and is evaluated differently.

Your Money Your Life is specifically scored: Content that can directly affect people’s well-being, safety and happiness has a separate ranking score.

Locally relevant links might be more valuable: Links from websites in the same country have more weight than those from other locations.

Quality rating: Rand found mentions of quality raters in the documents, including references to ‘Human Ratings’. So we should probably keep in mind how quality raters view our websites too.
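
To get a rough feel for the video-focused takeaway above, here is a minimal Python sketch that checks whether a site would cross the 50% mark. It is only an illustration: the leak does not say how Google decides that content "has video", so counting pages is our assumption, and the names `video_share`, `is_video_focused`, and the sample `site` are hypothetical.

```python
def video_share(pages):
    """Return the fraction of pages that contain video (0.0 for an empty site)."""
    if not pages:
        return 0.0
    return sum(1 for has_video in pages.values() if has_video) / len(pages)


def is_video_focused(pages, threshold=0.5):
    """Return True when strictly more than `threshold` of the pages contain video.

    Assumption: the share is measured per page. The leaked documents only
    mention the 50% figure, not how the share is calculated.
    """
    return video_share(pages) > threshold


# Hypothetical site inventory: URL path -> whether the page embeds a video.
site = {
    "/home": False,
    "/tutorials/intro": True,
    "/tutorials/advanced": True,
    "/about": False,
    "/webinars": True,
}

print(f"{video_share(site):.0%} of pages have video")
print("video-focused:", is_video_focused(site))
```

With three of the five sample pages carrying video, this site lands above the threshold. A site sitting at exactly 50% would not qualify under this sketch, because the takeaway says "more than 50%".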

Google’s Response

Google confirmed that the leaked documents were authentic, but said that we would be wrong to assume that they provide a complete picture of the ranking system. Google spokesperson Davis Thompson said, “We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information.” Google also said that its ranking signals are always changing.


SEO experts are still decoding the documents, and we will probably gain even more insights in the months to come. While the data leak cannot be used to get a quick win in SEO, there is still a lot of information in it that can confirm our best practices and keep us on the right track.


About the Creator


Feather Softwares is a leading IT company which helps you build your presence digitally.

    © 2024 Creatd, Inc. All Rights Reserved.