I had been googling Google's alternatives, and here is what I found.

Google provides a great service with its web search engine, and the scale of that task is huge.

If you want to go the open source route, here are some interesting projects and organizations that I came across:

  • Peer-to-peer index: you can create your own index.
  • Open crawl data for the web: you get all the URLs.
  • Chrome seems to have made a lot of data public.
  • Searx is a free internet metasearch engine which aggregates results from more than 70 search services. Users are neither tracked nor profiled. Additionally, searx can be used over Tor for online anonymity.
  • A proposal for an open web index.

All of these options are good for different use cases.

As I understand it, a search engine has two parts: a crawling part, where you identify URLs, and an indexing part, where you analyze the content.
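
To make the indexing half concrete, here is a minimal sketch of an inverted index using only the Python standard library. The `pages` dictionary is made-up data standing in for whatever a crawler would return, and the function names (`tokenize`, `build_index`, `search`) are my own. A real engine would also need ranking, stemming and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Stand-in for crawler output: URL -> extracted page text (made-up data).
pages = {
    "https://example.com/a": "Open source search engine",
    "https://example.com/b": "Open crawl data for the web",
}
index = build_index(pages)
print(search(index, "open search"))  # ['https://example.com/a']
```

The crawler's only job is to fill in `pages`; everything after that is lookup.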

There are many open source ways to crawl, but I decided to give Scrapy a try. It's a Python library.


Below are personal blog websites that I found interesting. I am not sure if they are still active.


900dpi, Amb-1, asocialfolder, Boxfolio, Brace, Calepin, Cloud Cannon, Droppages, Dropplets, Duetto, Fargo, Harp, Kissr, Montaigne, Markbox, Pancake, Scriptogram, Site44, Sitebox, Skrivr, Small Victories, Synkee, Updog