7 MIN READ · Pedro Thomaz

Privacy-First Analytics, Explained: What It Means and What We Actually Collect

Privacy-first analytics means measuring traffic without tracking people. Here's what it means, how it differs from Google Analytics, and the exact signals we collect.

Privacy-first analytics means measuring how a website is used without identifying, tracking, or profiling the people who use it. No cookies, no device fingerprints, no cross-site identity. You still learn what pages perform, where visitors came from, and whether a change worked — but you never build a dossier on an individual. That single design constraint is the whole discipline.

We build and run our own cookieless analytics for client sites, and this is the explanation we wish existed when we started. It is also the honest version: what we collect, what we deliberately throw away, and the tradeoffs we accepted to stay on the right side of GDPR without a consent banner.

What "privacy-first analytics" actually means

The phrase gets used loosely, so here is a working definition. Privacy-first analytics is measurement designed so that the data collected cannot, on its own or in combination, single out a natural person. The privacy guarantee is a property of the system, not a promise in a policy document. You cannot leak what you never stored.

Three commitments fall out of that definition:

No persistent identifiers. No cookies, no localStorage IDs, no device fingerprint hashed from screen size, fonts, and user agent. Nothing that survives across visits or follows a person between sites.
Aggregate by default. Numbers are counted, not joined. We store "this page got 412 views today," not "visitor X viewed page A then page B then converted."
Data minimisation as the starting point. Under GDPR Article 5(1)(c), personal data must be adequate, relevant, and limited to what is necessary for the purpose. Privacy-first inverts the usual habit: instead of capturing everything and pruning later, you capture the minimum and justify any addition.
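
The "aggregate by default" commitment can be sketched in a few lines. This is a minimal illustration in Python, not a copy of the pipeline described later; the names `page_views` and `record_view` and the in-memory counter are assumptions made for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical in-memory store: one running total per (day, path) pair.
page_views: Counter = Counter()

def record_view(day: date, path: str) -> None:
    """Increment the aggregate. Nothing about the visitor is passed in or kept."""
    page_views[(day, path)] += 1

today = date(2026, 6, 3)
for path in ["/journal/a", "/journal/a", "/journal/b"]:
    record_view(today, path)

print(page_views[(today, "/journal/a")])  # 2
```

Because the function signature has no visitor argument, there is nothing to join on later: the order of views and who made them is never representable in the store.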

The practical payoff is that analytics built this way generally does not count as "tracking" for cookie-consent purposes. Article 5(3) of the ePrivacy Directive hinges on storing information on the user's device or accessing information already stored there. If you do neither, the cookie banner requirement largely evaporates, which is exactly why we don't show one.

How it differs from Google Analytics

Google Analytics 4 is the default, and it is the clearest contrast. GA4 is built to connect behaviour to identity over time, because its parent business is advertising. Even with consent mode and IP anonymisation, the model is fundamentally event-and-identity: a stream of timestamped events bound to a client ID, designed to be stitched into journeys and, where Google's terms allow, into Google's broader graph.

That creates three problems a privacy-first tool doesn't have:

Consent overhead. Because GA4 reads and writes on the device, it needs explicit, opt-in consent in the EU. The banner you click "Reject all" on exists largely to make tools like GA4 lawful. Consent banners themselves add page weight and can hurt conversion.
Data leaves your control. GA4 sends data to Google's infrastructure. After the Schrems II ruling and a string of European DPA decisions (the Austrian, French, and Italian regulators all found specific GA deployments unlawful), EU-to-US transfer of analytics data has been a live legal risk for years.
Sampling and modelling. To handle scale and consent gaps, GA4 can sample data (for example in large explorations) and fills in for visitors who decline consent with modelled behaviour. You are often looking at an estimate dressed as a count.

Privacy-first analytics trades reach for honesty. We cannot tell you that the same person came back four times this month, because we made it impossible to know. What we can tell you is true, complete for the traffic we see, and yours.

The exact signals we collect

Concrete is better than abstract, so here is the actual shape of what hits our pipeline on a pageview. We run this server-side on PHP 8.3, behind Cloudflare, with no client-side tracking script doing identity work.

{
  "path": "/journal/privacy-first-analytics-explained",
  "locale": "pt",
  "referrer_host": "duckduckgo.com",
  "country": "PT",
  "device_class": "mobile",
  "ts_hour": "2026-06-03T14:00:00Z"
}

Note what is and isn't there:

Page path — which page, so we know what content earns attention.
Locale — EN, PT, or ES, since we serve three via URL prefixes (/pt/, /es/) and want to know which audiences read what.
Referrer host only — we keep duckduckgo.com, not the full referring URL with its query string, which can carry search terms or session tokens.
Country — derived from Cloudflare's edge geolocation, never stored alongside anything that could re-identify. Country, not city, not coordinates.
Device class — the bucket mobile / desktop / tablet, not the full user-agent string and never a fingerprint assembled from it.
Hour bucket — the timestamp is rounded to the hour. Minute-and-second precision is a surprisingly strong identifier when combined with other fields, so we drop it.
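
As a rough sketch of how a row like this could be derived from a request, here is an illustrative version in Python (the pipeline described above runs on PHP, so this is not the production code). The function names, the plain `headers` dict, the simple user-agent checks, stripping the locale prefix from the stored path, and the `"XX"` fallback country are all assumptions made for the example. `CF-IPCountry` is the header Cloudflare adds when IP geolocation is enabled.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit

LOCALE_PREFIXES = ("pt", "es")  # assumed from the /pt/ and /es/ prefixes; anything else is "en"

def device_class(user_agent: str) -> str:
    """Return a coarse bucket only; the user-agent string itself is never stored."""
    ua = user_agent.lower()
    if "ipad" in ua or "tablet" in ua or ("android" in ua and "mobi" not in ua):
        return "tablet"
    if "mobi" in ua:
        return "mobile"
    return "desktop"

def build_row(path: str, headers: dict, now: datetime) -> dict:
    """Reduce a request to the six stored fields. No IP address is ever read."""
    first, _, rest = path.lstrip("/").partition("/")
    if first in LOCALE_PREFIXES:
        locale, path = first, "/" + rest
    else:
        locale = "en"
    # Round down to the hour so minute and second precision is discarded.
    hour = now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    return {
        "path": path,
        "locale": locale,
        # Host only: the referring URL's path and query string are dropped here.
        "referrer_host": urlsplit(headers.get("Referer", "")).hostname or "",
        "country": headers.get("CF-IPCountry", "XX"),
        "device_class": device_class(headers.get("User-Agent", "")),
        "ts_hour": hour.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

row = build_row(
    "/pt/journal/privacy-first-analytics-explained",
    {
        "Referer": "https://duckduckgo.com/?q=private+search+terms",
        "CF-IPCountry": "PT",
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148",
    },
    datetime(2026, 6, 3, 14, 37, 12, tzinfo=timezone.utc),
)
print(row["referrer_host"], row["ts_hour"])  # duckduckgo.com 2026-06-03T14:00:00Z
```

The reduction happens before anything is written, so the search terms in the referrer and the exact request time never exist in storage in the first place.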

The IP address is the load-bearing decision. We use it transiently to derive the country at the edge and then we do not store it: not hashed, not truncated, not "anonymised." An unsalted hash of an IPv4 address is still personal data, because the whole IPv4 space is only about 4.3 billion addresses and can be hashed exhaustively to reverse any stored value. So it never lands in the database. The row above is what persists, and it describes a pageview, not a visitor.
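
To make the point about hashed IPs concrete, here is a small Python demonstration of reversing an unsalted hash by enumeration. It is a sketch under stated assumptions: SHA-256 stands in for whatever hash a system might use, the helper names are hypothetical, and the search is limited to a /24 from the documentation range (203.0.113.0/24) so it runs instantly. The same loop over all 2^32 IPv4 addresses is only a matter of compute time.

```python
import hashlib
import ipaddress
from typing import Optional

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("ascii")).hexdigest()

def recover_ip(target_hash: str, network: str) -> Optional[str]:
    """Hash every address in the network and return the one that matches."""
    for addr in ipaddress.ip_network(network):
        if sha256_hex(str(addr)) == target_hash:
            return str(addr)
    return None

# What a "we only store the hash" system would have in its database.
stored = sha256_hex("203.0.113.57")

print(recover_ip(stored, "203.0.113.0/24"))  # 203.0.113.57
```

A secret salt raises the cost for an outsider, but whoever holds the salt can still run the same search, which is why not storing the address at all is the simpler guarantee.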

What we refuse to collect

The discipline is mostly defined by the no list. We do not collect, and have no mechanism to collect:

Cookies or any device-stored identifier. There is no first-party analytics cookie and no localStorage key. This is why there is no consent banner.
Device fingerprints. No canvas hashing, no font enumeration, no concatenating screen resolution + timezone + GPU to manufacture a pseudo-ID. Fingerprinting is tracking without consent by other means, and we treat it as off-limits.
Cross-site or cross-session identity. We cannot link a visit on a client site to a visit anywhere else, including our own. There is no shared graph.
Individual user journeys. We store aggregate counts per page, not ordered event streams tied to one person.
Raw IP addresses, full user-agent strings, or precise geolocation. Each is either discarded at the edge or never requested.

If a piece of data could be used to recognise the same person twice, our default answer is that we don't keep it. Exceptions get a written justification and a retention limit, not a shrug.
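
One way to turn that default into something enforceable is an allowlist at the point of storage. The following Python sketch is illustrative only; `ALLOWED_FIELDS` and `validate_row` are hypothetical names, and the field set is taken from the sample row shown earlier.

```python
# Fields taken from the sample pageview row; anything else is rejected.
ALLOWED_FIELDS = frozenset(
    {"path", "locale", "referrer_host", "country", "device_class", "ts_hour"}
)

def validate_row(row: dict) -> dict:
    """Return the row unchanged, or raise if it carries any unapproved field."""
    extra = set(row) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"refusing to store unapproved fields: {sorted(extra)}")
    return row

ok = validate_row({"path": "/journal/a", "country": "PT"})

try:
    validate_row({"path": "/journal/a", "ip": "203.0.113.57"})
except ValueError as err:
    print(err)  # refusing to store unapproved fields: ['ip']
```

With a check like this, adding a field means editing the allowlist, which is a natural place to attach the written justification and retention limit.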

The tradeoffs, stated plainly

This is not free. We learned the costs the hard way and think you should know them before adopting the approach.

You lose per-user funnels and cohort retention. If your product genuinely needs to know that the same account did A then B then churned, cookieless aggregate analytics will not give you that, and you should reach for consented, first-party product analytics instead. For our own site and most marketing sites, the question is "which content and channels work," and aggregates answer that cleanly.

You lose deduplicated unique visitors in the strict sense. We report a privacy-preserving estimate of uniques per page per day, derived without storing identifiers, and we are upfront that it is an estimate. Honest fuzziness beats precise surveillance.

What you gain: no banner, faster pages (no heavy third-party tag), data residency you control, and a measurement story you can defend to a regulator or a client's legal team in one paragraph. For a project like Delicious Diamonds, where the brand promise is taste and trust, shipping a site that doesn't quietly tax visitors with tracking is part of the craft, not a compliance afterthought.

The short version

Privacy-first analytics answers "is the site working?" without answering "who is this person?" It counts pages, not people. The data is aggregate, cookieless, and fingerprint-free by construction, which is why it usually falls outside cookie-consent law. You give up individual-level tracking; you keep speed, sovereignty, and a clear conscience.

FAQ

Is privacy-first analytics GDPR compliant?

Done properly, it minimises or eliminates the processing of personal data. We don't store IPs, cookies, or fingerprints, so for typical site measurement there is little or no personal data being processed. Compliance always depends on your full setup, but starting from "collect nothing identifying" makes the rest far easier.

Does it need a cookie consent banner?

If it stores nothing on the visitor's device and reads nothing from it — no cookies, no localStorage — then the ePrivacy cookie-consent trigger generally doesn't apply. Ours stores nothing, so we run no banner.

How is it different from Google Analytics?

GA4 ties events to a persistent client identity, requires consent in the EU, and sends data to Google. Privacy-first analytics counts in aggregate, uses no persistent identifier, and keeps the data with you. You give up individual tracking and reach in exchange for legality, speed, and control.

Can you still see where traffic comes from?

Yes. We keep the referrer host (e.g. duckduckgo.com) and a country derived at the edge, so channel and geography reporting work well. We just don't keep the precise identifiers that would turn that into surveillance.

What about unique visitors?

We report a privacy-preserving estimate derived without storing identifiers, and we label it as an estimate. If you need exact, deduplicated, per-user data, that requires consented product analytics — a deliberate, separate choice.