Latest Common Crawl - 2025 - 43

Insights about the latest Common Crawl dataset.

The latest Common Crawl dataset is a large archive of crawled web pages. Researchers and developers use it for tasks ranging from training machine learning models to large-scale web analysis.

---

Top Domains by Number of Pages in the Dataset

The table below ranks domains by the number of pages they contribute to the latest Common Crawl dataset. The first 20 of the top 100 are shown:

Rank       Pages  Domain
   1  17,668,631  blogspot.com
   2   4,367,122  wikipedia.org
   3   1,858,874  wordpress.org
   4   1,700,207  ebay.com
   5   1,480,496  europa.eu
   6   1,323,520  app.link
   7   1,189,629  google.com
   8   1,155,701  wiktionary.org
   9   1,149,496  ning.com
  10   1,140,009  investing.com
  11   1,139,779  rakuten.co.jp
  12   1,071,707  fandom.com
  13   1,059,788  exblog.jp
  14     950,627  medium.com
  15     911,901  ox.ac.uk
  16     874,891  aif.ru
  17     856,566  pixnet.net
  18     823,272  hh.ru
  19     810,382  googlesource.com
  20     790,915  qq.com

(Table truncated for brevity; the full list continues to rank 100.)
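
To put these counts in context, each one can be divided by the total page count reported in this document (2,616,796,857). The following is a minimal Python sketch of that arithmetic. The names `top_domains`, `TOTAL_PAGES`, and `share_of_crawl` are illustrative and not part of any Common Crawl tooling, and only the first five rows of the table are copied in.

```python
# Page counts copied from the first five rows of the table above.
top_domains = {
    "blogspot.com": 17_668_631,
    "wikipedia.org": 4_367_122,
    "wordpress.org": 1_858_874,
    "ebay.com": 1_700_207,
    "europa.eu": 1_480_496,
}

# Total pages in the crawl, as reported in this document.
TOTAL_PAGES = 2_616_796_857


def share_of_crawl(pages: int, total: int = TOTAL_PAGES) -> float:
    """Return the percentage of the crawl contributed by `pages` pages."""
    return 100.0 * pages / total


if __name__ == "__main__":
    # Print one line per domain: name, page count, percentage of the crawl.
    for domain, pages in top_domains.items():
        print(f"{domain:<15} {pages:>12,} {share_of_crawl(pages):6.3f}%")
```

Under these figures, blogspot.com accounts for roughly 0.68% of all pages in the crawl, and the five domains above together for about 1.03%.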

---

Total Pages in the Crawl

2,616,796,857

The number 2,616,796,857 in words is:

Two billion, six hundred sixteen million, seven hundred ninety-six thousand, eight hundred fifty-seven.
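
The spelled-out form can be reproduced by splitting the number into groups of three digits and naming each group. The sketch below assumes American English short-scale names and non-negative integers below one quadrillion; `number_to_words` and its helper are hypothetical names used only for illustration.

```python
ONES = [
    "", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion", "trillion"]


def three_digits(n: int) -> str:
    """Spell out an integer in the range 1..999."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))
    elif rest:
        parts.append(ONES[rest])
    return " ".join(parts)


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer, one comma-separated group per scale."""
    if n == 0:
        return "zero"
    if n < 0 or n >= 1000 ** len(SCALES):
        raise ValueError("number out of supported range")
    groups = []
    scale = 0
    while n:
        # Peel off the lowest three digits and name them with their scale.
        n, chunk = divmod(n, 1000)
        if chunk:
            groups.append(f"{three_digits(chunk)} {SCALES[scale]}".strip())
        scale += 1
    return ", ".join(reversed(groups))


if __name__ == "__main__":
    print(number_to_words(2_616_796_857))
```

For 2,616,796,857 this yields the same wording as above, in lowercase and without the closing period.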
