Pitfall #10: Consent Mode Creates Invisible Data Gaps
The Trap
You’ve implemented Google Consent Mode v2, the right thing to do for privacy compliance. Your data keeps flowing into BigQuery, your event counts look healthy, and everything seems fine. Then you try to do session analysis and the numbers are off. Or you notice your user count is lower than expected. Or, and this is the sneaky one, your conversion rate looks higher than it should, and you can’t figure out why.
What’s happening is that a chunk of your events are arriving without the identifiers you need. Consent mode is working exactly as designed, and it’s quietly making parts of your dataset unreliable for user-level and session-level analysis.
Why It Happens
When a user doesn’t grant consent for analytics cookies, Google Consent Mode does something that’s easy to miss: it still sends events, but it strips out the identifying information.
In Advanced Consent Mode (which is what most implementations use), non-consented hits arrive in your BigQuery export as what I call “cookieless pings.” They have event names, timestamps, page locations — but the user_pseudo_id is either missing or replaced with a temporary value, and ga_session_id is absent. These events count toward your totals but can’t be stitched into sessions or attributed to users.
Here’s the thing — your event count stays high, which makes everything feel normal. But when you try to aggregate at the session or user level, you’re working with an incomplete picture. And the incompleteness isn’t random. Users who decline consent tend to be more privacy-conscious, often more technically sophisticated, and depending on your market, may represent a specific demographic segment.
So when teams filter to consented-only data — the natural instinct, since it’s the only data with usable identifiers — they’re not just shrinking the sample. They’re introducing a systematic bias toward users who click “Accept All.” In some European markets, I’ve seen consent rates as low as 30-40%. Filtering to consented-only means you’re basing your analysis on the minority of your traffic.
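To see whether the gap is concentrated in particular markets, a rough sketch like the one below breaks the consented share down by country. It assumes the standard GA4 export schema (`geo.country`, `privacy_info.analytics_storage`); the table name is the same placeholder used elsewhere in this article, and the 1,000-event floor is an arbitrary cutoff I picked to hide tiny markets. Note that this measures the share of events, which is only a proxy for the share of users.

```sql
-- Sketch: share of events sent with analytics consent, by country (last 7 days).
-- Event share is a proxy; it is not the same as the share of users who consented.
SELECT
  geo.country AS country,
  COUNT(*) AS total_events,
  COUNTIF(privacy_info.analytics_storage = 'Yes') AS consented_events,
  ROUND(
    SAFE_DIVIDE(COUNTIF(privacy_info.analytics_storage = 'Yes'), COUNT(*)) * 100,
    1
  ) AS pct_consented_events
FROM `your_project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY country
HAVING COUNT(*) >= 1000  -- arbitrary floor, adjust to your traffic
ORDER BY pct_consented_events;
```

Countries at the top of the result are the ones where a consented-only filter throws away the most traffic.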
The Fix
There’s no magic query that makes this go away. The fix is a conscious strategy, and it starts with understanding your actual consent breakdown.
Step one: know your consent rate. Run the diagnostic below. This tells you what percentage of your data is fully consented, partially consented, and non-consented.
Step two: decide on a strategy based on that number:
- High consent rate (>80%): You can mostly filter to consented data and note the limitation. The bias exists but is manageable.
- Medium consent rate (50-80%): Use consented data for user/session analysis but use all events (including non-consented) for aggregate metrics like page views and event counts. Be explicit about which metrics use which dataset.
- Low consent rate (<50%): Your consented-only data is a minority sample. Aggregate event-level analysis on the full dataset. For user/session work, consider statistical adjustment or server-side collection as a complement.
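If you want the tier decided by a query rather than by eyeballing, the thresholds above can be sketched as a single statement. This is an illustration, not a rule: it uses the share of events with `analytics_storage = 'Yes'` as a stand-in for the consent rate, and the strategy labels are just shorthand for the three bullets.

```sql
-- Sketch: map the consented event share onto the three tiers described above.
WITH consent AS (
  SELECT
    SAFE_DIVIDE(
      COUNTIF(privacy_info.analytics_storage = 'Yes'),
      COUNT(*)
    ) AS consented_share
  FROM `your_project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
      AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
)
SELECT
  ROUND(consented_share * 100, 1) AS pct_consented_events,
  CASE
    WHEN consented_share IS NULL THEN 'no data in range'
    WHEN consented_share > 0.8 THEN 'high: filter to consented, note the limitation'
    WHEN consented_share >= 0.5 THEN 'medium: consented for user/session, all events for aggregates'
    ELSE 'low: aggregate on all events, adjust or supplement user/session work'
  END AS suggested_strategy
FROM consent;
```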
The key principle: never mix consented and non-consented data in session or user analysis without knowing you’re doing it. And never present consented-only analysis as “total” numbers without disclosing the gap.
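Here is one way the accidental mix can show up, sketched as a query. It computes purchases per session twice: once with the numerator taken from all events and the denominator from consented sessions only (the mixed version), and once with both sides restricted to consented traffic. The `purchase` event name is an assumption; swap in whatever your conversion event is called.

```sql
-- Sketch: compare a mixed ratio against a consistent one.
WITH base AS (
  SELECT
    event_name,
    privacy_info.analytics_storage AS analytics_storage,
    user_pseudo_id,
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id
  FROM `your_project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
      AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
),
agg AS (
  SELECT
    COUNTIF(event_name = 'purchase') AS purchases_all_events,
    COUNTIF(event_name = 'purchase' AND analytics_storage = 'Yes') AS purchases_consented,
    -- CONCAT returns NULL if either identifier is NULL, and COUNT(DISTINCT) skips NULLs
    COUNT(DISTINCT IF(
      analytics_storage = 'Yes',
      CONCAT(user_pseudo_id, '.', CAST(ga_session_id AS STRING)),
      NULL
    )) AS sessions_consented
  FROM base
)
SELECT
  purchases_all_events,
  purchases_consented,
  sessions_consented,
  -- Numerator includes non-consented events, denominator does not
  ROUND(SAFE_DIVIDE(purchases_all_events, sessions_consented) * 100, 2) AS mixed_rate_pct,
  -- Both sides restricted to consented traffic
  ROUND(SAFE_DIVIDE(purchases_consented, sessions_consented) * 100, 2) AS consistent_rate_pct
FROM agg;
```

If `mixed_rate_pct` comes out noticeably higher than `consistent_rate_pct`, that difference is the kind of inflation described in The Trap.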
For page-level or content analysis where you don’t need user identity, use all events regardless of consent status:
```sql
-- Aggregate analysis: use all events
SELECT
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location'
  ) AS page,
  COUNT(*) AS pageviews
FROM `your_project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
  AND event_name = 'page_view'
GROUP BY page
ORDER BY pageviews DESC
LIMIT 50;
```

For session-level analysis, filter explicitly and document it:
```sql
-- Session analysis: consented traffic only
SELECT
  CONCAT(user_pseudo_id, '.',
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id')
  ) AS session_id,
  COUNT(*) AS events_in_session
FROM `your_project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
  AND privacy_info.analytics_storage = 'Yes'
GROUP BY session_id
HAVING session_id IS NOT NULL;
```

How to Check If You Have This Problem
Run this to see your consent breakdown right now:
```sql
SELECT
  privacy_info.analytics_storage,
  privacy_info.ads_storage,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS unique_pseudo_ids,
  ROUND(
    COUNT(*) / SUM(COUNT(*)) OVER () * 100, 1
  ) AS pct_of_events
FROM `your_project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN
    FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
    AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY
  privacy_info.analytics_storage,
  privacy_info.ads_storage
ORDER BY event_count DESC;
```

Look at the rows where analytics_storage is 'No' or NULL. That’s your invisible data gap. If those rows represent more than 20% of your events, you need a deliberate strategy, not just a WHERE clause.
Also pay attention to the unique_pseudo_ids column for the non-consented rows. COUNT(DISTINCT ...) ignores NULLs, so you won’t see NULL values in that column directly; what you’ll see is a count that is suspiciously low relative to event_count. When that happens, most of those events carry no user_pseudo_id at all: they are truly anonymous and can’t be sessionized or attributed. That’s the portion of your data that’s essentially event-level only. Knowing the size of that segment is the first step to not being surprised by it.
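To size that anonymous segment directly instead of inferring it from a distinct count, a sketch along these lines counts the events that arrive without each identifier, per consent status. Same assumptions as before: standard GA4 export schema and a placeholder table name.

```sql
-- Sketch: how many events lack user_pseudo_id or ga_session_id, by consent status.
WITH base AS (
  SELECT
    privacy_info.analytics_storage AS analytics_storage,
    user_pseudo_id,
    (SELECT value.int_value
     FROM UNNEST(event_params)
     WHERE key = 'ga_session_id') AS ga_session_id
  FROM `your_project.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
      AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
)
SELECT
  analytics_storage,
  COUNT(*) AS event_count,
  COUNTIF(user_pseudo_id IS NULL) AS events_without_pseudo_id,
  COUNTIF(ga_session_id IS NULL) AS events_without_session_id,
  ROUND(
    SAFE_DIVIDE(COUNTIF(user_pseudo_id IS NULL), COUNT(*)) * 100, 1
  ) AS pct_without_pseudo_id
FROM base
GROUP BY analytics_storage
ORDER BY event_count DESC;
```

Rows with a high `pct_without_pseudo_id` are the ones you should treat as event-level only.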