Pitfall #2: LIMIT Doesn't Limit Your Bill

By Timo Dechau · Last updated March 25, 2026

You’re new to BigQuery — or maybe just new to GA4’s export — and you want to explore the data. So you do the sensible thing: SELECT * FROM events_* LIMIT 10. Just ten rows, right? How expensive could that be? Very. That query just scanned your entire dataset. Every column, every table, every day of data you’ve ever collected. The LIMIT 10 only limits the output. BigQuery already read everything to get there.

BigQuery is a columnar, full-scan engine. When you run a query, it reads every referenced column of every table your FROM clause matches before applying LIMIT — and SELECT * references every column. That’s just how the architecture works: there’s no early-exit optimization that says “I found 10 rows, I can stop now.”
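The pricing consequence is easy to model. Here's a minimal sketch, assuming on-demand pricing of roughly $6.25 per TiB scanned (check the current rate for your region) — the point is that LIMIT never enters the cost formula:

```python
# Assumption: ~$6.25 per TiB on-demand; verify against current BigQuery pricing.
PRICE_PER_TIB_USD = 6.25
TIB = 1024**4

def query_cost_usd(bytes_scanned: int) -> float:
    """On-demand cost depends only on bytes scanned; LIMIT plays no part."""
    return bytes_scanned / TIB * PRICE_PER_TIB_USD

# A `SELECT * ... LIMIT 10` over a 1 TiB events_* dataset scans the full 1 TiB:
print(round(query_cost_usd(1 * TIB), 2))   # 6.25
print(round(query_cost_usd(10 * TIB), 2))  # 62.5
```

Ten rows of output, full-dataset price.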

Now here’s where it gets interesting with GA4 specifically. Your GA4 export doesn’t use a single partitioned table. It creates date-sharded tables: events_20250101, events_20250102, and so on. When you query events_* with a wildcard, BigQuery unions all of those tables together. Without a _TABLE_SUFFIX filter to constrain which date-shards get included, you’re scanning everything. Every single day.

And this compounds. On day one, your dataset is small. By month six, it might be hundreds of gigabytes. By year two, a terabyte or more. The exact same SELECT * query gets more expensive every single day, even if you never change it. Scheduled queries and dashboards that seemed cheap in January can quietly become budget problems by October.
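A back-of-envelope sketch makes the compounding concrete. The numbers here are assumptions (2 GiB of new export data per day, one unfiltered scheduled query per day, ~$6.25 per TiB), but the shape of the curve is what matters:

```python
# Assumed inputs: 2 GiB/day of new GA4 export data, one unfiltered daily query,
# ~$6.25 per TiB on-demand pricing. All numbers are illustrative.
GIB, TIB = 1024**3, 1024**4
DAILY_GROWTH_GIB = 2
PRICE_PER_TIB_USD = 6.25

def month_cost_usd(start_day: int, days: int = 30) -> float:
    # Each run scans everything accumulated so far, so every run costs
    # a little more than the one before it.
    scanned = sum((start_day + d) * DAILY_GROWTH_GIB * GIB for d in range(days))
    return scanned * PRICE_PER_TIB_USD / TIB

print(round(month_cost_usd(1), 2))    # month one: 5.68
print(round(month_cost_usd(331), 2))  # the same query a year in: 126.53
```

Same query, same schedule, roughly 20x the bill — with nothing changed but the calendar.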

Three habits that will save you real money:

1. Always filter by _TABLE_SUFFIX for date ranges:

SELECT
event_name,
event_timestamp,
user_pseudo_id
FROM
`your_project.analytics_123456789.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20260301' AND '20260325'

This tells BigQuery which date-sharded tables to actually read. Everything outside the range is skipped entirely — zero bytes scanned.
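If you generate queries from code, you can compute the suffix bounds for a rolling window instead of hardcoding them. A small sketch (the dataset name in your final query template would be your own):

```python
# Sketch: compute _TABLE_SUFFIX bounds for "the last N days" to splice into
# a query template. Dates are illustrative.
from datetime import date, timedelta

def suffix_range(end: date, days: int) -> tuple[str, str]:
    """Return (start, end) suffixes in YYYYMMDD form, inclusive."""
    start = end - timedelta(days=days - 1)
    return start.strftime("%Y%m%d"), end.strftime("%Y%m%d")

lo, hi = suffix_range(date(2026, 3, 25), days=7)
print(lo, hi)  # 20260319 20260325
```

In pure SQL you can get the same effect with FORMAT_DATE over CURRENT_DATE inside the _TABLE_SUFFIX predicate.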

2. Use TABLESAMPLE for exploration:

SELECT *
FROM
`your_project.analytics_123456789.events_20260325`
TABLESAMPLE SYSTEM (1 PERCENT)

This physically reads only ~1% of the table’s storage blocks. It’s perfect for “I just want to see what the data looks like” moments. Not statistically rigorous, but that’s not the point — you’re exploring, not analyzing.
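The mechanism is worth internalizing: SYSTEM sampling selects storage blocks, not rows, so bytes read scale with the sampled fraction. A toy model of that behavior (deterministic selection stands in for BigQuery's actual block choice, which is not something you control):

```python
# Toy model: SYSTEM sampling reads whole storage blocks, so bytes scanned
# track the sampled fraction of blocks. Block selection here is deterministic
# purely for illustration; real sampling is not.
def sample_blocks(blocks: list[bytes], percent: float) -> list[bytes]:
    step = max(1, round(100 / percent))
    return blocks[::step]

blocks = [b"x" * 1024] * 1000          # 1000 equal-sized "blocks"
picked = sample_blocks(blocks, 1)      # ~1% of blocks, hence ~1% of bytes
print(len(picked), "of", len(blocks))  # 10 of 1000
```

This is also why the sample isn't statistically rigorous: rows that landed in the same block get picked together.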

3. Always check dry-run bytes before executing:

In the BigQuery console, look at the green validator message in the top-right corner before you hit Run. It tells you exactly how many bytes the query will scan. Get in the habit of glancing at it. If a “quick exploration” query says it’ll scan 800 GB, that’s your cue to add a date filter.

You can also dry-run from the command line:

bq query --dry_run --use_legacy_sql=false \
'SELECT event_name
FROM `your_project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = "20260325"'

Run the same simple query with and without a date filter, using dry-run mode, and compare the bytes:

-- Without date filter (check dry-run bytes):
SELECT event_name
FROM `your_project.analytics_123456789.events_*`
LIMIT 10;
-- With date filter (check dry-run bytes):
SELECT event_name
FROM `your_project.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = '20260325'
LIMIT 10;

Don’t actually run the first one — just look at the dry-run estimate. If the difference is 100x or more, you now understand why every query needs a date filter. That gap is money.

Also check your query history for past damage:

SELECT
user_email,
query,
total_bytes_processed,
ROUND(
total_bytes_processed / POW(1024, 3), 2
) AS gb_processed,
creation_time
FROM
`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
creation_time > TIMESTAMP_SUB(
CURRENT_TIMESTAMP(), INTERVAL 7 DAY
)
AND total_bytes_processed > 10 * POW(1024, 3)
ORDER BY total_bytes_processed DESC
LIMIT 20;

This surfaces any queries from the last week that scanned more than 10 GB. You might be surprised what you find.