Skip to content

Pitfall #11: Trying to Match BigQuery Numbers to GA4 Reports

By Timo Dechau · Last updated March 25, 2026

You’ve done the work. You’ve written careful SQL, handled the nested schema, filtered by _TABLE_SUFFIX, and built a clean query for active users or sessions over a specific date range. Then you open the GA4 interface to validate your numbers — and they don’t match. Not even close. So you spend the next two weeks tweaking your query, adjusting timezone logic, trying different session definitions, convinced there’s a bug in your SQL. There isn’t. The numbers were never going to match.

Google’s own developer advocates have said publicly that BigQuery exports and GA4 UI reports are “not expected to be reconcilable.” That’s not a caveat buried in fine print — it’s the design.

Here’s why. The GA4 interface applies a stack of processing that simply doesn’t exist in the raw export:

  • Thresholding. When Google Signals is enabled, the UI applies thresholding — suppressing rows where user counts are too low to prevent re-identification. This never happens in BigQuery. You get all the data; the UI hides some of it. That alone can create gaps in date-level or dimension-level comparisons.
  • Behavioral and conversion modeling. The GA4 UI includes modeled data — estimated conversions and behavioral signals for users who declined analytics cookies or whose data was otherwise incomplete. BigQuery contains only observed, raw data. If Consent Mode is active and a chunk of your users denied consent, the UI fills in estimates where BigQuery shows nothing (see also Pitfall #10).
  • HyperLogLog++ for unique counts. The GA4 UI uses HyperLogLog++ (HLL++), a probabilistic algorithm that estimates unique counts extremely fast but with a small margin of error (~1-2%). A COUNT(DISTINCT) in BigQuery is exact. On large datasets, this difference compounds — your BigQuery number will be precise, the UI’s will be an approximation. Neither is “wrong,” but they won’t agree.
  • Attribution black boxes. GA4’s reports use their own attribution logic — data-driven attribution, channel groupings, conversion counting windows — none of which are transparent or reproducible in SQL.
  • Traffic source scoping. The top-level traffic_source field in BigQuery is user-scoped — it represents the first source/medium that acquired the user. This is not the same as session source/medium in the GA4 UI. If you’re comparing “where did this session come from?” using traffic_source, you’re looking at a fundamentally different thing. The newer collected_traffic_source field in the export schema helps bridge this gap — it captures the session-level attribution that’s closer to what the UI reports.
  • Scoping differences. The UI applies its own session scoping logic, engagement thresholds, and active user definitions. These don’t map cleanly to anything in the raw event stream.
  • gclid misattribution. Google Ads click identifiers get special treatment in the GA4 interface. Sessions with a gclid are attributed in ways that can differ from what the raw traffic_source fields in your export suggest.
  • Timezone discrepancies. Your BigQuery export stores event_timestamp in UTC microseconds. GA4 reports use the timezone configured in your property settings. A session that straddles midnight in your local timezone might land on different dates in the UI vs. your query.

Each of these differences is small-ish on its own. Combined, they produce discrepancies of 10-30% on metrics like active users, sessions, and conversions. And there’s no way to reverse-engineer the exact combination, because Google doesn’t publish the full methodology.

I’ve seen teams burn weeks — sometimes longer — chasing exact reconciliation. Pulling in consultants, filing support tickets, rewriting queries from scratch. The answer was always the same: the numbers diverge by design.

Stop trying to reconcile. Seriously. Treat GA4 UI and BigQuery as two different instruments measuring overlapping but non-identical things.

  • GA4 UI is your trend-spotting tool. Good for: “Are sessions trending up or down this month?” “Is this campaign performing roughly in line with last quarter?” It’s pre-processed, modeled, and convenient. Use it for directional monitoring.
  • BigQuery is your source for custom analysis. Good for: anything the UI can’t do. Cross-channel attribution modeling, user-level journey analysis, connecting GA4 events with your own backend data, building audiences from raw behavioral signals.

The moment you accept this, you stop wasting time on reconciliation and start spending it on the analysis that only BigQuery can give you.

If stakeholders need a single “official” number, pick one source and commit to it. For most teams doing serious analytics, that source should be BigQuery — because you control the definitions, the logic is transparent, and you can audit every step. Just don’t expect the GA4 UI to agree, and educate your stakeholders on why.

Run this query for a recent 7-day window:

SELECT
FORMAT_DATE('%Y-%m-%d',
DATE(TIMESTAMP_MICROS(event_timestamp),
'America/New_York') -- your GA4 property timezone
) AS event_date,
COUNT(DISTINCT user_pseudo_id) AS bq_active_users,
COUNT(DISTINCT
CONCAT(user_pseudo_id, '.',
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'))
) AS bq_sessions
FROM
`your_project.analytics_123456789.events_*`
WHERE
_TABLE_SUFFIX BETWEEN
FORMAT_DATE('%Y%m%d',
DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY event_date
ORDER BY event_date;

Now open the GA4 interface, navigate to Reports > Engagement > Overview (or the standard Users report), set the same date range, and compare day by day.

You’ll see discrepancies. Probably 10-30%, sometimes more. The direction of the discrepancy may not even be consistent day to day — some days BigQuery will be higher, some days the UI will be.

That’s the point. Seeing this gap with your own data is what converts the intellectual understanding (“they’re not reconcilable”) into a lived experience that changes how you work. Once you’ve seen it, you stop chasing the match and start building things only BigQuery can do.