For most internet regulars, Google is an indispensable service. We use it to look up the silliest anecdotes, questions, and facts, and we probably consult it more often than we consult actual people. But revolutionizing the internet with a simple search bar is no easy feat. So how does Google Search work behind the scenes to maintain such a high standard?
STEP 1: CRAWLING AND INDEXING
A search engine cannot do all of its work at the moment a query is typed. To be as fast and efficient as Google, much of the work has to happen before anyone searches. This pre-search work is called crawling and indexing.
Web crawlers, or spiders, are programs that gather all the data available to them (i.e., billions of web pages) and organize it into something called the Search Index. Gathering this data is an extensive process.
The crawling process begins with the spiders visiting a list of web addresses taken from past crawls and from sitemaps provided by website owners.
What’s a Sitemap?
A sitemap is a file, provided by website owners to Google and other search engines, that lists the pages on a website. Web crawlers read this file to crawl the site more intelligently. A sitemap can also provide metadata, i.e., information about each page, such as when it was last updated and whether alternate language versions of it exist.
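To make the idea concrete, here is a minimal sketch of how a crawler might read a sitemap, using Python's standard XML parser. It assumes the XML format defined by the sitemaps.org protocol (a `<urlset>` of `<url>` entries, each with a `<loc>` and an optional `<lastmod>`); the example URLs are made up.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text):
    """Return a list of (url, lastmod) pairs; lastmod is None when absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        entries.append((loc.strip(), lastmod))
    return entries

# A made-up sitemap for illustration.
EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>"""

for loc, lastmod in parse_sitemap(EXAMPLE):
    print(loc, lastmod)
```

A crawler could use the `lastmod` value to decide which pages are worth revisiting first.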
When spiders visit a website, they follow the links on its pages to discover other pages. The crawling software adapts as it goes: when it finds new links or revisits old ones, it notes whether a link is dead, whether a site has been updated, or whether a new site has appeared. It also decides which sites to crawl, how often to crawl them, and how many pages to fetch from each site.
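The link-following loop described above can be sketched as a breadth-first crawl. This is a simplified illustration, not Google's crawler: the "web" here is a hypothetical in-memory dictionary (`FAKE_WEB`) standing in for real HTTP fetches, the seed list stands in for past crawls and sitemaps, and `max_pages_per_site` is an assumed per-site crawl budget.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would download each page and extract links from its HTML.
FAKE_WEB = {
    "https://a.example/": ["https://a.example/news", "https://b.example/"],
    "https://a.example/news": ["https://a.example/", "https://a.example/old"],
    "https://b.example/": ["https://c.example/"],
    "https://c.example/": [],
}

def crawl(seeds, max_pages_per_site=10):
    """Breadth-first crawl from seed URLs; returns (crawled, dead) lists."""
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # never queue the same URL twice
    pages_per_site = {}
    crawled, dead = [], []
    while frontier:
        url = frontier.popleft()
        site = urlparse(url).netloc
        if pages_per_site.get(site, 0) >= max_pages_per_site:
            continue          # this site's crawl budget is used up
        links = FAKE_WEB.get(url)
        if links is None:
            dead.append(url)  # nothing at this address: a dead link
            continue
        pages_per_site[site] = pages_per_site.get(site, 0) + 1
        crawled.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled, dead

crawled, dead = crawl(["https://a.example/"])
print(crawled)
print(dead)
```

Starting from a single seed, the crawl discovers the other sites purely by following links, and it records `https://a.example/old` as a dead link because nothing exists at that address.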
Website owners can decide how many of their pages are crawled, or whether their site is crawled at all. They can make these decisions using webmaster tools.
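Besides webmaster tools, one long-standing mechanism (not named in the text above) is a `robots.txt` file at the root of a site, which well-behaved crawlers consult before fetching pages. The sketch below uses Python's standard `urllib.robotparser` with a made-up `robots.txt` and a hypothetical user agent, `ExampleBot`, to show how a crawler might check whether it is allowed to fetch a URL.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical crawler name.
print(parser.can_fetch("ExampleBot", "https://example.com/blog/post"))     # True
print(parser.can_fetch("ExampleBot", "https://example.com/private/data"))  # False
```

Note that `robots.txt` governs crawling; it is a request that compliant crawlers honor, not an access control.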
After retrieving information from websites, crawlers store it in the Search Index. This index contains information from billions of web pages and, according to Google, is over 100,000,000 GB (about 100 petabytes) in size.
The index works much like the index at the back of a book: there is an entry for every word found, and when a web page is indexed, it is added to the entries for all the words it contains. To make searches more reliable, Google has also built something called the Knowledge Graph.
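The word-to-pages structure described above is usually called an inverted index. Here is a minimal sketch, assuming a toy set of pages and a very simple tokenizer (lowercase, alphanumeric words only); Google's real index stores far more than this, such as word positions and page metadata.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the IDs of pages containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Toy pages, made up for illustration.
pages = {
    "p1": "Take Me to Church is a song.",
    "p2": "Directions to the nearest church.",
    "p3": "A list of recommended restaurants.",
}
index = build_index(pages)
print(sorted(index["church"]))               # ['p1', 'p2']
print(sorted(search(index, "church song")))  # ['p1']
```

Because the lookup goes from word to pages, answering a query never requires scanning every page; the engine only intersects a few precomputed sets.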
With the Knowledge Graph, Google looks beyond keywords to other kinds of information related to a search. It lets you do things like search for books held in libraries or check local public transport in other countries. It is a connected network of people, places, and things, and of the relationships between them.
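As a rough illustration of a "network of points", the toy graph below stores facts as (subject, relation, object) triples and follows relationships outward from one entity. This is only a sketch of the general idea of an entity graph; the triples and relation names are assumptions chosen for illustration, not Google's data model.

```python
from collections import defaultdict

# Hypothetical facts as (subject, relation, object) triples.
TRIPLES = [
    ("Take Me to Church", "is_a", "song"),
    ("Take Me to Church", "performed_by", "Hozier"),
    ("Hozier", "is_a", "musician"),
    ("Hozier", "born_in", "Ireland"),
]

def build_graph(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph

def connected_entities(graph, start, max_hops=2):
    """Entities reachable from start by following up to max_hops relations."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for entity in frontier:
            for _, neighbor in graph.get(entity, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    seen.discard(start)
    return seen

graph = build_graph(TRIPLES)
print(sorted(connected_entities(graph, "Take Me to Church")))
```

Following two hops from the song reaches the performer and then facts about the performer, which is the kind of linked context a keyword index alone does not capture.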
STEP 2: SEARCH ALGORITHMS
Now, when someone Googles something, they want a definitive answer to their question at the top, not a huge list of web pages they have to sift through. So Google's ranking systems sort through the pages stored in the search index to return the results most relevant to the search.
Google's ranking systems are built on algorithms that break down what you are looking for and then surface the most relevant information. The result is not a haphazard set of web pages but a set of relevant ones. Here are some of the ways Google does this:
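Google does not publish its ranking algorithms, but the basic idea of scoring indexed pages against a query can be sketched with TF-IDF, a classic textbook relevance measure swapped in here purely for illustration. Words that appear often in a page raise its score, while words that appear in many pages count for less. The pages, the tokenizer, and the `+1.0` smoothing on the IDF term are all assumptions of this sketch.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(pages, query):
    """Score pages with TF-IDF and return matching page IDs, best first."""
    docs = {pid: Counter(tokenize(text)) for pid, text in pages.items()}
    n = len(docs)
    scores = {}
    for pid, counts in docs.items():
        length = sum(counts.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for c in docs.values() if term in c)  # pages containing term
            if df == 0 or term not in counts:
                continue
            tf = counts[term] / length        # how prominent the term is here
            idf = math.log(n / df) + 1.0      # rarer terms weigh more
            score += tf * idf
        if score > 0:
            scores[pid] = score
    # Highest score first; ties broken by page ID for a stable order.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

pages = {
    "p1": "take me to church song lyrics",
    "p2": "directions to the nearest church",
    "p3": "best restaurants near me",
}
print(rank(pages, "take me to church song"))
```

Here the song page ranks first because it matches every query word, including the rarer ones, while the directions page only shares the common words "to" and "church".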
1. Analyzing Your Words:
The first and most obvious step is working out which words your search query uses. Along the way, Google can interpret spelling mistakes and search accordingly. It also tries to understand the context of your query as best it can. For example, when you type in "Take Me To Church", it shows results for the song of that name, not the best route to a church.
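Spelling correction can be approximated by replacing each unknown word with the closest word in a known vocabulary. The sketch below uses Python's standard `difflib.get_close_matches`, which scores similarity with `SequenceMatcher`; this is a stand-in technique, not Google's method. The vocabulary and the `cutoff` of 0.75 are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical vocabulary; a real engine would derive one from its index.
VOCABULARY = ["take", "me", "to", "church", "recommended", "restaurants", "song"]

def correct_query(query, vocabulary=VOCABULARY, cutoff=0.75):
    """Replace each unknown word with its closest vocabulary match, if any."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        matches = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)  # keep word if no match
    return " ".join(corrected)

print(correct_query("Take me to chruch"))  # take me to church
```

Words with no sufficiently similar match are left unchanged, so the correction only kicks in when a likely intended word exists.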
The keyword analysis also considers the breadth of your query. Are you looking for something very specific, like a song, or something more general, like a list of recommended restaurants?