Web Crawlers: Spies of the Internet
To find information across the hundreds of millions of webpages on the Internet, a search engine employs special programs called web crawlers, or spiders, which gather information in an automated, orderly way for indexing.
Legitimate services, in particular search engines like Google, crawl the web as a means of keeping their indexes up to date.
Usually, a web crawler takes its instructions from a website's robots.txt file. It then crawls recursively through the site's tree-like link structure, indexing content as it goes, roughly as in the sketch below.
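Here is a minimal, hedged sketch of that idea in Python, using only the standard library. The start URL, page limit, and user agent are illustrative placeholders, not part of any real crawler.

```python
# A toy crawler: reads robots.txt, then follows links within one site.
from urllib import robotparser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects href values from anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=10):
    """Crawl pages on a single site, honoring its robots.txt."""
    root = "{0.scheme}://{0.netloc}".format(urlparse(start_url))
    robots = robotparser.RobotFileParser(urljoin(root, "/robots.txt"))
    robots.read()

    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen or not robots.can_fetch("*", url):
            continue  # skip pages the site has asked crawlers to avoid
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue  # unreachable or non-HTML page; move on
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith(root):  # stay within the same site
                queue.append(absolute)
    return seen


if __name__ == "__main__":
    # Replace with a site you are allowed to crawl.
    print(crawl("https://example.com/"))
```

A real crawler would add politeness delays, deduplicate URLs more carefully, and hand each fetched page to an indexer, but the loop above captures the basic fetch-parse-follow cycle.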
Crawling consumes resources on the systems it visits, often without explicit permission. So mechanisms exist for websites that do not wish to be crawled to make this known to the crawling agent, typically through robots.txt directives like the ones below, though opting out this way can hurt a site's SEO.
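For illustration, a hypothetical robots.txt placed at the site root (e.g. https://example.com/robots.txt) might look like this; the paths and bot name are made up, and Crawl-delay is a widely used but non-standard directive that not every crawler respects.

```
# Ask all crawlers to stay out of /private/ and to slow down site-wide.
User-agent: *
Disallow: /private/
Crawl-delay: 10

# Block one specific crawler entirely.
User-agent: BadBot
Disallow: /
```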
Platforms like Medium are crawled more actively because they offer fresh content and are updated several times a day.