Skip to content

Pitfall #9: ga_session_id Is Not Globally Unique

By Timo Dechau · Last updated March 25, 2026

You need to do session-level analysis, so you pull ga_session_id from the event parameters and use it as your session key. You GROUP BY it, you COUNT(DISTINCT) it, you join on it. The numbers look reasonable enough, so you move on and build more logic on top. Weeks later, something weird surfaces — sessions with impossibly diverse page paths, or conversion rates that don’t add up. You dig in and realize that two completely different users, on different continents, sometimes share the same ga_session_id. Your “sessions” have been quietly merging unrelated visits.

The ga_session_id is a timestamp — specifically, the Unix timestamp (in seconds) of when the session started, generated on the client side. It’s scoped to a single user_pseudo_id, not globally unique across your entire dataset.

Think about it: if two users happen to start a session in the same second, they’ll get the same ga_session_id. On a busy site, this isn’t a rare coincidence — it’s a statistical certainty.

Google designed it this way because on the client, it only needs to be unique within one browser’s cookie jar. It was never meant to be a global session identifier. But when you pull all your data into BigQuery and treat it like one, that assumption breaks.

There’s another wrinkle. The ga_session_id can behave oddly around midnight boundaries and session timeout edges. If a user is active across a session timeout (30 minutes of inactivity by default), the new session gets a new timestamp — but I’ve seen cases where clock skew, timezone handling, or late-arriving events create unexpected overlaps. It’s rare, but it’s real.

Always use the composite key: user_pseudo_id + ga_session_id. Together, they form a genuinely unique session identifier.

Here’s the pattern I use in every GA4 BigQuery project:

SELECT
CONCAT(
user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id')
) AS session_id,
user_pseudo_id,
event_name,
event_timestamp
FROM
`your_project.analytics_123456789.events_*`
WHERE
_TABLE_SUFFIX = FORMAT_DATE(
'%Y%m%d', CURRENT_DATE()
);

That CONCAT gives you a single string you can safely GROUP BY, join on, and count. Make it a convention in your project — define it once in a view or a common CTE and reference it everywhere.

For extra robustness, especially on high-traffic properties, consider adding the ga_session_number (the sequential session count for that user) as a third component. This handles the rare edge case where a single user somehow gets two sessions with identical start timestamps:

CONCAT(
user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'),
'.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_number')
) AS session_id

Is the three-part key necessary for most teams? Probably not. But it costs nothing extra and closes the last remaining edge case. I’d rather over-engineer a primary key than debug phantom sessions six months from now.

This query finds ga_session_id values that appear under multiple user_pseudo_id values — direct evidence of collision:

WITH sessions AS (
SELECT
user_pseudo_id,
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'
) AS ga_session_id
FROM
`your_project.analytics_123456789.events_*`
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d',
DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
)
SELECT
ga_session_id,
COUNT(DISTINCT user_pseudo_id) AS user_count
FROM sessions
WHERE ga_session_id IS NOT NULL
GROUP BY ga_session_id
HAVING COUNT(DISTINCT user_pseudo_id) > 1
ORDER BY user_count DESC
LIMIT 20;

If this returns rows, you have collisions. On any reasonably busy property, it will. The user_count column shows you how many different users share each ga_session_id value. If you’ve been using ga_session_id alone as your session key, every one of those rows represents silently merged sessions in your analysis.