Pitfall #8: user_pseudo_id Is Not a Stable User Identity

By Timo Dechau · Last updated March 25, 2026

You start building your user-level analysis — attribution, lifetime value, retention — and you anchor it all on user_pseudo_id. It’s right there in every row of the export, it looks like a user identifier, and it feels like the obvious primary key for “a person.” So you count distinct user_pseudo_id values and call that your user count. You join tables on it. You build cohorts with it. And then one day someone asks why your BigQuery user count is 3x higher than what GA4 reports, or why a single customer appears to have five separate journeys that never connect.

The user_pseudo_id is a client ID — specifically, the value from the _ga cookie in a browser, or the app instance ID on mobile. That’s it. It identifies a browser on a device, not a person.

Here’s where the gap opens up. The same person using Chrome on their laptop, Safari on their phone, and the company iPad for weekend browsing? That’s three user_pseudo_id values. One person, three “users” in your data. Now multiply that across your entire user base.

It gets worse. Even on a single browser, the user_pseudo_id resets when someone clears their cookies, opens an incognito window, or when the browser itself expires the cookie. Safari’s ITP, for example, caps script-set first-party cookies such as _ga at 7 days by default, so a visitor who stays away longer than that comes back with a fresh cookie. Every reset mints a brand-new user_pseudo_id, and there’s nothing in the raw export that connects the old one to the new one.

People build entire attribution models on this identifier without realizing they’re actually measuring browser instances, not people. Your “new user” count is quietly inflated by returning visitors on fresh cookies.
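
One rough way to put a number on that inflation is to count how many separate pseudo IDs fired a first_visit (web) or first_open (app) event for the same logged-in customer. The sketch below is illustrative only: the inline events CTE is hypothetical sample data standing in for the export, and it assumes at least some visitors log in so that the export’s user_id column (covered further down) is populated.

```sql
-- Hypothetical sample rows; replace this CTE with your own events_* table.
WITH events AS (
  SELECT * FROM UNNEST(
    ARRAY<STRUCT<user_pseudo_id STRING, user_id STRING, event_name STRING>>[
      ('cookie_a', NULL,         'first_visit'),
      ('cookie_a', 'customer_1', 'login'),
      ('cookie_b', NULL,         'first_visit'),
      ('cookie_b', 'customer_1', 'login'),
      ('cookie_c', NULL,         'first_visit'),
      ('cookie_c', 'customer_2', 'login')
    ]
  )
),
-- Every pseudo ID that was ever seen together with a user_id
identity_map AS (
  SELECT DISTINCT user_pseudo_id, user_id
  FROM events
  WHERE user_id IS NOT NULL
),
-- Every pseudo ID that GA4 treated as a brand-new user
first_visits AS (
  SELECT DISTINCT user_pseudo_id
  FROM events
  WHERE event_name IN ('first_visit', 'first_open')
)
SELECT
  m.user_id,
  COUNT(DISTINCT f.user_pseudo_id) AS times_counted_as_new
FROM first_visits AS f
JOIN identity_map AS m
  ON m.user_pseudo_id = f.user_pseudo_id
GROUP BY m.user_id
ORDER BY times_counted_as_new DESC, m.user_id;
```

On the sample rows, customer_1 comes back with times_counted_as_new = 2 and customer_2 with 1. Any value above 1 is a returning person who was counted as new more than once. This only covers visitors who log in at some point, so treat it as a lower bound.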

Treat user_pseudo_id for what it actually is: a device-and-browser identifier. It’s useful — it’s the most granular identifier GA4 gives you out of the box — but it’s not a person.

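For actual user identity, you need your own user_id. When someone logs in, GA4 lets you set this via gtag('set', 'user_id', 'your-id') or as a user_id parameter on the gtag('config', ...) call. That value shows up in the export as user_id. It’s your system’s identifier, so it follows the person across devices and browsers. It is only populated on events sent while the ID is set, so everything a visitor does before logging in still arrives with a NULL user_id.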

The practical approach is to build an identity resolution layer:

-- Build a device-to-user mapping from login events
WITH identity_map AS (
  SELECT
    user_pseudo_id,
    user_id,
    MIN(event_timestamp) AS first_seen,
    MAX(event_timestamp) AS last_seen
  FROM
    `your_project.analytics_123456789.events_*`
  WHERE
    user_id IS NOT NULL
  GROUP BY
    user_pseudo_id, user_id
)
SELECT * FROM identity_map
ORDER BY user_id, first_seen;

This gives you a mapping table: which user_pseudo_id values belong to which actual user_id. Join this into your analyses and you’ll start seeing real people instead of cookie fragments.
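
As a minimal sketch of what that join could look like, the query below backfills a person_key onto every event: the event’s own user_id if present, otherwise the user_id mapped to its pseudo ID, otherwise the pseudo ID itself. The events CTE is hypothetical sample data, person_key is an illustrative name, and resolving a pseudo ID to the user_id seen most recently on it is an assumption you may want to change (shared devices, for instance, can carry several user_ids).

```sql
-- Hypothetical sample rows; replace this CTE with your own events_* table.
WITH events AS (
  SELECT * FROM UNNEST(
    ARRAY<STRUCT<user_pseudo_id STRING, user_id STRING, event_timestamp INT64>>[
      ('laptop_chrome', NULL,         100),
      ('laptop_chrome', 'customer_1', 200),
      ('phone_safari',  'customer_1', 300),
      ('phone_safari',  NULL,         400),
      ('anon_browser',  NULL,         500)
    ]
  )
),
-- One row per pseudo ID: keep the user_id seen most recently on it (assumption)
identity_map AS (
  SELECT
    user_pseudo_id,
    ARRAY_AGG(user_id ORDER BY event_timestamp DESC LIMIT 1)[OFFSET(0)] AS user_id
  FROM events
  WHERE user_id IS NOT NULL
  GROUP BY user_pseudo_id
),
-- Attach the best available identifier to every event
resolved AS (
  SELECT
    e.user_pseudo_id,
    e.event_timestamp,
    COALESCE(e.user_id, m.user_id, e.user_pseudo_id) AS person_key
  FROM events AS e
  LEFT JOIN identity_map AS m
    ON m.user_pseudo_id = e.user_pseudo_id
)
SELECT
  COUNT(DISTINCT user_pseudo_id) AS pseudo_ids,
  COUNT(DISTINCT person_key) AS resolved_people
FROM resolved;
```

On the sample rows this returns pseudo_ids = 3 and resolved_people = 2: the laptop and the phone collapse into customer_1, while the anonymous browser stays on its pseudo ID. The pre-login events on the laptop are pulled into the same person as well, which is the part a plain user_id filter would miss.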

For the portion of your traffic that never logs in, user_pseudo_id is the best you’ve got. Accept that limitation — just don’t confuse it with knowing who someone is.

Run this to see how much identity coverage you actually have:

SELECT
  COUNT(DISTINCT user_pseudo_id) AS total_pseudo_ids,
  COUNT(DISTINCT user_id) AS total_user_ids,
  -- Pseudo IDs that carried a user_id on at least one event
  COUNT(DISTINCT CASE
    WHEN user_id IS NOT NULL
    THEN user_pseudo_id
  END) AS pseudo_ids_with_user_id,
  -- SAFE_DIVIDE returns NULL instead of failing when the range has no rows
  ROUND(
    SAFE_DIVIDE(
      COUNT(DISTINCT CASE
        WHEN user_id IS NOT NULL
        THEN user_pseudo_id
      END),
      COUNT(DISTINCT user_pseudo_id)
    ) * 100, 1
  ) AS identity_coverage_pct
FROM
  `your_project.analytics_123456789.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE());

Look at that identity_coverage_pct number. If it’s below 20%, your user-level analysis is mostly fiction — you’re counting cookies, not people. Even at 50%, you should be treating any “per-user” metric with serious caveats.

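The gap between total_pseudo_ids and total_user_ids tells you how fragmented your identity picture really is, although it blends two effects: visitors who never log in, and known users who are spread across several browsers and devices. If you see 100k pseudo IDs but only 8k user IDs, that ratio should make you pause before you present any “unique users” number to a stakeholder.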
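
To look at fragmentation among known users only, one option is to count how many pseudo IDs each user_id is attached to. This is a sketch under stated assumptions: the events CTE is hypothetical sample data standing in for the export, and the output column names are illustrative.

```sql
-- Hypothetical sample rows; replace this CTE with your own events_* table.
WITH events AS (
  SELECT * FROM UNNEST(
    ARRAY<STRUCT<user_pseudo_id STRING, user_id STRING>>[
      ('p1', 'customer_1'),
      ('p2', 'customer_1'),
      ('p3', 'customer_1'),
      ('p4', 'customer_2'),
      ('p5', NULL)
    ]
  )
),
-- Number of distinct browsers or app instances seen per logged-in user
pseudo_ids_per_user AS (
  SELECT
    user_id,
    COUNT(DISTINCT user_pseudo_id) AS pseudo_id_count
  FROM events
  WHERE user_id IS NOT NULL
  GROUP BY user_id
)
SELECT
  COUNT(*) AS known_users,
  ROUND(AVG(pseudo_id_count), 2) AS avg_pseudo_ids_per_user,
  MAX(pseudo_id_count) AS max_pseudo_ids_per_user,
  COUNTIF(pseudo_id_count > 1) AS users_with_multiple_pseudo_ids
FROM pseudo_ids_per_user;
```

On the sample rows this returns 2 known users, an average of 2.0 pseudo IDs per user, a maximum of 3, and 1 user with more than one pseudo ID. The average is a rough multiplier for how much a pseudo-ID-based “unique users” count overstates your logged-in audience. It says nothing about anonymous visitors, who may be more or less fragmented.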