I have a simple blog. It dates back years. Most posts are about technology, but I also have a popular blog post about finding songs by their lyrics, which gets the lion's share of the traffic.

I have implemented my own analytics of incoming traffic:

  1. Every request that comes to the backend server gets logged in PostgreSQL
  2. When you view any page, an async XHR request is made and that's also logged in PostgreSQL

Most traffic terminates at the CDN. Most likely, the page you're reading right now was never rendered on my server but was served straight from the CDN. It will still send an XHR request to my analytics backend, which in a sense becomes a measure that you're in a real, regular browser that supports JavaScript.

One thing I noticed is that the User-Agent of a lot of the incoming requests appears to be some sort of bot that is not Googlebot, which used to dominate the traffic on my blog.

Bot Agent Requests

Notables:

  • Claude's bot makes a ton of traffic!
  • OpenAI appears to have two bots ("gptbot" and "searchbot") and together they make a lot of traffic
  • What on earth is that Facebook crawler doing? Is it crawling for training Meta's LLMs?
  • What is this Amazonbot and why is it making as much traffic as Googlebot?
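
As a rough sketch of how requests can be grouped by bot like this (it's not the actual code behind the numbers above), you can match known substrings in the User-Agent header. The `BOT_FAMILIES` table and the function names are made up for illustration, and the tokens are my assumption of what these crawlers send; check them against your own logs.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical lookup table: family label -> lowercase substrings that are
# assumed to appear in that crawler's User-Agent header.
BOT_FAMILIES = {
    "Claude": ("claudebot",),
    "OpenAI": ("gptbot", "oai-searchbot"),
    "Meta": ("meta-externalagent", "facebookexternalhit"),
    "Amazon": ("amazonbot",),
    "Google": ("googlebot",),
}


def classify_bot(user_agent: Optional[str]) -> Optional[str]:
    """Return the bot family for a User-Agent, or None if nothing matches."""
    ua = (user_agent or "").lower()
    for family, tokens in BOT_FAMILIES.items():
        if any(token in ua for token in tokens):
            return family
    return None


def count_by_family(user_agents: Iterable[Optional[str]]) -> Counter:
    """Tally requests per bot family; unmatched agents land in 'other'."""
    counts: Counter = Counter()
    for ua in user_agents:
        counts[classify_bot(ua) or "other"] += 1
    return counts
```

Feed `count_by_family` the User-Agent column from the request log and you get one number per family. Substring matching is crude and trusts whatever the client claims to be, so treat the result as an estimate.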

JavaScript or not

At the time of writing, I had only recently started tracking the User-Agent of pageviews, so I can't compare historical numbers. But generally it seems only ~1% of pageviews come from a bot user agent, whereas ~66% of the direct server-side requests come from a bot agent.
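
To make the comparison concrete, here is a minimal sketch of how a "share of bot traffic" number can be computed from a list of User-Agent strings. The regular expression is a generic heuristic I'm assuming for illustration, not the rule my analytics actually uses, and `is_bot` and `bot_share` are hypothetical names.

```python
import re
from typing import Iterable, Optional

# Assumed heuristic: any of these words in the User-Agent marks it as a bot.
# This can misfire on unusual browser strings, so treat it as approximate.
GENERIC_BOT = re.compile(
    r"bot|crawl|spider|slurp|externalhit|externalagent", re.IGNORECASE
)


def is_bot(user_agent: Optional[str]) -> bool:
    """True if the User-Agent looks like a self-identified bot."""
    return bool(user_agent) and bool(GENERIC_BOT.search(user_agent))


def bot_share(user_agents: Iterable[Optional[str]]) -> float:
    """Fraction (0.0 to 1.0) of the given User-Agents that look like bots."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    return sum(is_bot(ua) for ua in agents) / len(agents)
```

Run `bot_share` once over the User-Agents of the logged pageviews and once over those of the logged server requests, and you get the two percentages to compare.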

Is bot in pageviews vs requests

That means that a lot of the bots don't render the page with JavaScript. Or rather, perhaps they do, but they have some provision to avoid triggering XHR requests to my analytics (which is implemented with sendBeacon).

The reason for the "-16.5%" drop is that I recently implemented a fix to redirect traffic that bypassed the CDN and went straight to the backend.
