Pitfall #8: user_pseudo_id Is Not a Stable User Identity
The Trap
You start building your user-level analysis (attribution, lifetime value, retention) and anchor it all on user_pseudo_id. It’s right there in every row of the export, it looks like a user identifier, and it feels like the obvious primary key for “a person.” So you count distinct user_pseudo_id values and call that your user count. You join tables on it. You build cohorts with it. Then one day someone asks why your BigQuery user count is 3x higher than what GA4 reports, or why a single customer appears to have five separate journeys that never connect.
Why It Happens
The user_pseudo_id is a client ID: on the web, the client ID stored in the _ga cookie; in apps, the app instance ID. That’s it. It identifies a browser (or an app install) on a device, not a person.
Here’s where the gap opens up. The same person using Chrome on their laptop, Safari on their phone, and the company iPad for weekend browsing? That’s three user_pseudo_id values. One person, three “users” in your data. Now multiply that across your entire user base.
It gets worse. Even on a single browser, the user_pseudo_id resets whenever someone clears their cookies or browses in incognito mode, and whenever the browser itself expires first-party cookies. Safari’s ITP, for example, caps the lifetime of cookies set by JavaScript (which is how the _ga cookie is normally set) at 7 days, so a visitor who doesn’t come back within that window can return with a new ID. Every reset mints a brand-new user_pseudo_id, and there’s nothing in the raw export that connects the old one to the new one.
People build entire attribution models on this identifier without realizing they’re actually measuring browser instances, not people. Your “new user” count is quietly inflated by returning visitors on fresh cookies.
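To see this inflation in your own data, one option is to count how many freshly minted IDs each identifiable person has produced. The sketch below assumes that GA4 logs a first_visit event (web) or first_open event (app) when it sees a new user_pseudo_id, and it only looks at people who logged in at least once during the window, so it says nothing about purely anonymous traffic. The 90-day window is an arbitrary choice; widen or narrow it as needed.

```sql
-- Sketch: how many "new user" IDs has each identifiable person generated?
-- Assumes first_visit (web) / first_open (app) marks a newly minted user_pseudo_id.
WITH events AS (
  SELECT user_pseudo_id, user_id, event_name
  FROM `your_project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
),
-- Devices that carried a user_id on at least one event in the window
identity_map AS (
  SELECT DISTINCT user_pseudo_id, user_id
  FROM events
  WHERE user_id IS NOT NULL
),
-- Devices whose first visit/open happened inside the window
new_ids AS (
  SELECT DISTINCT user_pseudo_id
  FROM events
  WHERE event_name IN ('first_visit', 'first_open')
)
SELECT
  COUNT(DISTINCT n.user_pseudo_id) AS new_pseudo_ids_of_known_people,
  COUNT(DISTINCT m.user_id) AS known_people_with_a_new_id,
  ROUND(
    SAFE_DIVIDE(COUNT(DISTINCT n.user_pseudo_id), COUNT(DISTINCT m.user_id)),
    2
  ) AS new_ids_per_person
FROM new_ids AS n
JOIN identity_map AS m
  ON n.user_pseudo_id = m.user_pseudo_id;
```

A new_ids_per_person value well above 1 means the same people were counted as “new users” more than once in that window.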
The Fix
Treat user_pseudo_id for what it actually is: a device-and-browser identifier. It’s useful (it’s the most granular identifier GA4 gives you out of the box), but it’s not a person.
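For actual user identity, you need your own user_id. When someone logs in, GA4 lets you set it with gtag('set', 'user_id', 'your-id') or by passing user_id in the config command. That value shows up in the export as user_id. Because it’s your system’s identifier, it follows the person across devices and browsers. It is only attached to events sent after you set it, though, so anything the same device did before login still has a null user_id.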
The practical approach is to build an identity resolution layer:
```sql
-- Build a device-to-user mapping from every event that carries a user_id
WITH identity_map AS (
  SELECT
    user_pseudo_id,
    user_id,
    MIN(event_timestamp) AS first_seen,
    MAX(event_timestamp) AS last_seen
  FROM `your_project.analytics_123456789.events_*`
  WHERE user_id IS NOT NULL
  GROUP BY user_pseudo_id, user_id
)

SELECT *
FROM identity_map
ORDER BY user_id, first_seen;
```

This gives you a mapping table: which user_pseudo_id values belong to which actual user_id. Join this into your analyses and you’ll start seeing real people instead of cookie fragments.
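Joining on user_pseudo_id is what lets a device’s pre-login events inherit the user_id that GA4 only attached to its later events. The sketch below is one way to do it, under two assumptions of our own: when a device maps to several user_ids (a shared family laptop, say), it keeps the one seen most recently, and devices that never logged in fall back to their own user_pseudo_id. The 'uid:' and 'pid:' prefixes and the resolved_user_key column name are hypothetical conventions that keep the two ID spaces from colliding.

```sql
-- Sketch: give every event a resolved person key.
-- Assumption: a device linked to several user_ids keeps only the most recent one.
WITH identity_map AS (
  SELECT user_pseudo_id, user_id
  FROM (
    SELECT
      user_pseudo_id,
      user_id,
      ROW_NUMBER() OVER (
        PARTITION BY user_pseudo_id
        ORDER BY MAX(event_timestamp) DESC
      ) AS rn
    FROM `your_project.analytics_123456789.events_*`
    WHERE user_id IS NOT NULL
    GROUP BY user_pseudo_id, user_id
  )
  WHERE rn = 1
)
SELECT
  e.event_date,
  e.event_name,
  e.user_pseudo_id,
  -- The event's own user_id wins, then the device's mapped user_id,
  -- then the device ID itself. CONCAT returns NULL when an input is NULL,
  -- so COALESCE falls through to the next option.
  COALESCE(
    CONCAT('uid:', COALESCE(e.user_id, m.user_id)),
    CONCAT('pid:', e.user_pseudo_id)
  ) AS resolved_user_key
FROM `your_project.analytics_123456789.events_*` AS e
LEFT JOIN identity_map AS m
  ON e.user_pseudo_id = m.user_pseudo_id;
```

Comparing COUNT(DISTINCT resolved_user_key) with COUNT(DISTINCT user_pseudo_id) over the same window gives a rough sense of how many “users” were really the same people. Keeping only the most recent user_id is a simplification; on heavily shared devices you may prefer a time-bounded mapping built from first_seen and last_seen.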
For the portion of your traffic that never logs in, user_pseudo_id is the best you’ve got. Accept that limitation — just don’t confuse it with knowing who someone is.
How to Check If You Have This Problem
Run this to see how much identity coverage you actually have:
```sql
SELECT
  COUNT(DISTINCT user_pseudo_id) AS total_pseudo_ids,
  COUNT(DISTINCT user_id) AS total_user_ids,
  COUNT(DISTINCT CASE WHEN user_id IS NOT NULL THEN user_pseudo_id END) AS pseudo_ids_with_user_id,
  ROUND(
    -- SAFE_DIVIDE returns NULL instead of erroring if the window has no rows
    SAFE_DIVIDE(
      COUNT(DISTINCT CASE WHEN user_id IS NOT NULL THEN user_pseudo_id END),
      COUNT(DISTINCT user_pseudo_id)
    ) * 100,
    1
  ) AS identity_coverage_pct
FROM `your_project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  AND FORMAT_DATE('%Y%m%d', CURRENT_DATE());
```

Look at that identity_coverage_pct number: it is the share of pseudo IDs that carried a user_id on at least one event in the window. If it’s below 20%, your user-level analysis is mostly fiction; you’re counting cookies, not people. Even at 50%, treat any “per-user” metric with serious caveats.
The gap between total_pseudo_ids and total_user_ids tells you how fragmented your identity picture really is. If you see 100k pseudo IDs but only 8k user IDs, that ratio should make you pause before you present any “unique users” number to a stakeholder.
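That raw ratio mixes anonymous and logged-in traffic, so part of it simply reflects visitors you can never identify. To isolate fragmentation among the people you can identify, a sketch like the one below counts how many pseudo IDs each user_id touched over the same 30-day window. It only sees devices that logged in at least once, so treat the result as a lower bound on real fragmentation.

```sql
-- Sketch: distribution of pseudo IDs per logged-in person (last 30 days)
WITH per_user AS (
  SELECT
    user_id,
    COUNT(DISTINCT user_pseudo_id) AS pseudo_ids
  FROM `your_project.analytics_123456789.events_*`
  WHERE user_id IS NOT NULL
    AND _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
      AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
  GROUP BY user_id
)
SELECT
  pseudo_ids,
  COUNT(*) AS people,
  -- Share of identified people with exactly this many pseudo IDs
  ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_of_people
FROM per_user
GROUP BY pseudo_ids
ORDER BY pseudo_ids;
```

Every row above pseudo_ids = 1 is the “one person, several users” effect from the Why It Happens section, measured directly.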