Pitfall #9: ga_session_id Is Not Globally Unique
The Trap
Section titled “The Trap”You need to do session-level analysis, so you pull ga_session_id from the event parameters and use it as your session key. You GROUP BY it, you COUNT(DISTINCT) it, you join on it. The numbers look reasonable enough, so you move on and build more logic on top. Weeks later, something weird surfaces — sessions with impossibly diverse page paths, or conversion rates that don’t add up. You dig in and realize that two completely different users, on different continents, sometimes share the same ga_session_id. Your “sessions” have been quietly merging unrelated visits.
Why It Happens
Section titled “Why It Happens”The ga_session_id is a timestamp — specifically, the Unix timestamp (in seconds) of when the session started, generated on the client side. It’s scoped to a single user_pseudo_id, not globally unique across your entire dataset.
Think about it: if two users happen to start a session in the same second, they’ll get the same ga_session_id. On a busy site, this isn’t a rare coincidence — it’s a statistical certainty.
Google designed it this way because on the client, it only needs to be unique within one browser’s cookie jar. It was never meant to be a global session identifier. But when you pull all your data into BigQuery and treat it like one, that assumption breaks.
There’s another wrinkle. The ga_session_id can behave oddly around midnight boundaries and session timeout edges. If a user is active across a session timeout (30 minutes of inactivity by default), the new session gets a new timestamp — but I’ve seen cases where clock skew, timezone handling, or late-arriving events create unexpected overlaps. It’s rare, but it’s real.
The Fix
Section titled “The Fix”Always use the composite key: user_pseudo_id + ga_session_id. Together, they form a genuinely unique session identifier.
Here’s the pattern I use in every GA4 BigQuery project:
SELECT CONCAT( user_pseudo_id, '.', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') ) AS session_id, user_pseudo_id, event_name, event_timestampFROM `your_project.analytics_123456789.events_*`WHERE _TABLE_SUFFIX = FORMAT_DATE( '%Y%m%d', CURRENT_DATE() );That CONCAT gives you a single string you can safely GROUP BY, join on, and count. Make it a convention in your project — define it once in a view or a common CTE and reference it everywhere.
For extra robustness, especially on high-traffic properties, consider adding the ga_session_number (the sequential session count for that user) as a third component. This handles the rare edge case where a single user somehow gets two sessions with identical start timestamps:
CONCAT( user_pseudo_id, '.', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'), '.', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number')) AS session_idIs the three-part key necessary for most teams? Probably not. But it costs nothing extra and closes the last remaining edge case. I’d rather over-engineer a primary key than debug phantom sessions six months from now.
How to Check If You Have This Problem
Section titled “How to Check If You Have This Problem”This query finds ga_session_id values that appear under multiple user_pseudo_id values — direct evidence of collision:
WITH sessions AS ( SELECT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id' ) AS ga_session_id FROM `your_project.analytics_123456789.events_*` WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE()))
SELECT ga_session_id, COUNT(DISTINCT user_pseudo_id) AS user_countFROM sessionsWHERE ga_session_id IS NOT NULLGROUP BY ga_session_idHAVING COUNT(DISTINCT user_pseudo_id) > 1ORDER BY user_count DESCLIMIT 20;If this returns rows, you have collisions. On any reasonably busy property, it will. The user_count column shows you how many different users share each ga_session_id value. If you’ve been using ga_session_id alone as your session key, every one of those rows represents silently merged sessions in your analysis.