Pitfall #11: Trying to Match BigQuery Numbers to GA4 Reports
The Trap
Section titled “The Trap”You’ve done the work. You’ve written careful SQL, handled the nested schema, filtered by _TABLE_SUFFIX, and built a clean query for active users or sessions over a specific date range. Then you open the GA4 interface to validate your numbers — and they don’t match. Not even close. So you spend the next two weeks tweaking your query, adjusting timezone logic, trying different session definitions, convinced there’s a bug in your SQL. There isn’t. The numbers were never going to match.
Why It Happens
Section titled “Why It Happens”Google’s own developer advocates have said publicly that BigQuery exports and GA4 UI reports are “not expected to be reconcilable.” That’s not a caveat buried in fine print — it’s the design.
Here’s why. The GA4 interface applies a stack of processing that simply doesn’t exist in the raw export:
- Thresholding. When Google Signals is enabled, the UI applies thresholding — suppressing rows where user counts are too low to prevent re-identification. This never happens in BigQuery. You get all the data; the UI hides some of it. That alone can create gaps in date-level or dimension-level comparisons.
- Behavioral and conversion modeling. The GA4 UI includes modeled data — estimated conversions and behavioral signals for users who declined analytics cookies or whose data was otherwise incomplete. BigQuery contains only observed, raw data. If Consent Mode is active and a chunk of your users denied consent, the UI fills in estimates where BigQuery shows nothing (see also Pitfall #10).
- HyperLogLog++ for unique counts. The GA4 UI uses HyperLogLog++ (HLL++), a probabilistic algorithm that estimates unique counts extremely fast but with a small margin of error (~1-2%). A
COUNT(DISTINCT)in BigQuery is exact. On large datasets, this difference compounds — your BigQuery number will be precise, the UI’s will be an approximation. Neither is “wrong,” but they won’t agree. - Attribution black boxes. GA4’s reports use their own attribution logic — data-driven attribution, channel groupings, conversion counting windows — none of which are transparent or reproducible in SQL.
- Traffic source scoping. The top-level
traffic_sourcefield in BigQuery is user-scoped — it represents the first source/medium that acquired the user. This is not the same as session source/medium in the GA4 UI. If you’re comparing “where did this session come from?” usingtraffic_source, you’re looking at a fundamentally different thing. The newercollected_traffic_sourcefield in the export schema helps bridge this gap — it captures the session-level attribution that’s closer to what the UI reports. - Scoping differences. The UI applies its own session scoping logic, engagement thresholds, and active user definitions. These don’t map cleanly to anything in the raw event stream.
gclidmisattribution. Google Ads click identifiers get special treatment in the GA4 interface. Sessions with agclidare attributed in ways that can differ from what the rawtraffic_sourcefields in your export suggest.- Timezone discrepancies. Your BigQuery export stores
event_timestampin UTC microseconds. GA4 reports use the timezone configured in your property settings. A session that straddles midnight in your local timezone might land on different dates in the UI vs. your query.
Each of these differences is small-ish on its own. Combined, they produce discrepancies of 10-30% on metrics like active users, sessions, and conversions. And there’s no way to reverse-engineer the exact combination, because Google doesn’t publish the full methodology.
I’ve seen teams burn weeks — sometimes longer — chasing exact reconciliation. Pulling in consultants, filing support tickets, rewriting queries from scratch. The answer was always the same: the numbers diverge by design.
The Fix
Section titled “The Fix”Stop trying to reconcile. Seriously. Treat GA4 UI and BigQuery as two different instruments measuring overlapping but non-identical things.
- GA4 UI is your trend-spotting tool. Good for: “Are sessions trending up or down this month?” “Is this campaign performing roughly in line with last quarter?” It’s pre-processed, modeled, and convenient. Use it for directional monitoring.
- BigQuery is your source for custom analysis. Good for: anything the UI can’t do. Cross-channel attribution modeling, user-level journey analysis, connecting GA4 events with your own backend data, building audiences from raw behavioral signals.
The moment you accept this, you stop wasting time on reconciliation and start spending it on the analysis that only BigQuery can give you.
If stakeholders need a single “official” number, pick one source and commit to it. For most teams doing serious analytics, that source should be BigQuery — because you control the definitions, the logic is transparent, and you can audit every step. Just don’t expect the GA4 UI to agree, and educate your stakeholders on why.
How to Check If You Have This Problem
Section titled “How to Check If You Have This Problem”Run this query for a recent 7-day window:
SELECT FORMAT_DATE('%Y-%m-%d', DATE(TIMESTAMP_MICROS(event_timestamp), 'America/New_York') -- your GA4 property timezone ) AS event_date, COUNT(DISTINCT user_pseudo_id) AS bq_active_users, COUNT(DISTINCT CONCAT(user_pseudo_id, '.', (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) ) AS bq_sessionsFROM `your_project.analytics_123456789.events_*`WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())GROUP BY event_dateORDER BY event_date;Now open the GA4 interface, navigate to Reports > Engagement > Overview (or the standard Users report), set the same date range, and compare day by day.
You’ll see discrepancies. Probably 10-30%, sometimes more. The direction of the discrepancy may not even be consistent day to day — some days BigQuery will be higher, some days the UI will be.
That’s the point. Seeing this gap with your own data is what converts the intellectual understanding (“they’re not reconcilable”) into a lived experience that changes how you work. Once you’ve seen it, you stop chasing the match and start building things only BigQuery can do.