GA4 says (not set), thresholded, or sampled: why your data looks incomplete
Three different GA4 quirks make data look missing or wrong — thresholding hides rows, (not set) means no value, sampling estimates from a subset. What each one is, why it happens, and how to get the full picture.
TL;DR
Three distinct GA4 behaviours get blamed on "GA4 is broken" when they're actually working as designed, and telling them apart is the whole fix. Data thresholding: GA4 withholds rows when low numbers plus Google Signals could identify individuals (you see "a threshold has been applied"). (not set): a dimension simply has no value for those rows, and the cause differs by dimension (lost source, processing timing, unregistered parameter). Sampling: large explorations estimate from a subset of the data rather than all of it. None of them means your tracking is broken. Each has a different mitigation, and the closest thing to a universal escape hatch is the BigQuery export: raw event data, unsampled, unthresholded, and carrying every event parameter whether or not it was registered. It can't recover a value that was never collected, such as a source lost to a redirect. Below: what each one is, why it happens, and what to do.
A client asks why their GA4 shows "(not set)" for a third of conversions, or why a row vanished, or why two reports of the "same" thing disagree. The instinct is that tracking is broken — but usually it's one of three built-in GA4 behaviours, each with its own cause. Diagnose which, and the panic goes away.
1. Data thresholding — GA4 is hiding rows on purpose
When a report combines low volumes with signals that could identify individuals (demographics, granular dimensions, Google Signals on), GA4 withholds the data and shows a notice ("a threshold has been applied"). It's a privacy protection, not a bug — but it makes reports look incomplete, and it's maddening on smaller sites.
Why it happens: Google Signals enabled + small numbers + a granular/demographic report.
What to do: for the affected report, reduce what triggers it. View a less granular breakdown, or, where the demographic data isn't essential, switch the property's reporting identity to Device-based or turn off Google Signals. For full fidelity with no thresholding, use the BigQuery export: raw event data isn't thresholded.
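To make the mechanism concrete, here is a toy model of thresholding. Google doesn't publish the exact rule or cutoff GA4 uses, so `Row`, `apply_toy_threshold`, and the 50-user `MIN_USERS` cutoff are hypothetical, chosen only for illustration. The point is that rows disappear from the report view while the underlying data still exists.

```python
from dataclasses import dataclass

# HYPOTHETICAL cutoff for illustration only; GA4's real threshold is not published.
MIN_USERS = 50


@dataclass
class Row:
    dimension_value: str  # e.g. an age bracket in a demographic report
    users: int


def apply_toy_threshold(rows, signals_enabled, demographic_report, min_users=MIN_USERS):
    """Return (visible_rows, withheld_count) under a toy thresholding rule.

    Toy rule (an assumption, not GA4's algorithm): when Google Signals is on
    and the report is granular/demographic, hide rows with fewer than
    min_users users. Nothing is deleted; rows are only withheld from the view.
    """
    if not (signals_enabled and demographic_report):
        return list(rows), 0
    visible = [row for row in rows if row.users >= min_users]
    return visible, len(rows) - len(visible)


if __name__ == "__main__":
    rows = [Row("18-24", 1200), Row("25-34", 40), Row("65+", 7)]
    # Signals on: the small brackets are withheld from the report.
    print(apply_toy_threshold(rows, signals_enabled=True, demographic_report=True))
    # Signals off (or raw export rows): every bracket is visible.
    print(apply_toy_threshold(rows, signals_enabled=False, demographic_report=True))
```

With Signals on, the two small brackets vanish; with Signals off, all three come back, which mirrors why the raw export shows rows the interface hides.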
2. (not set) — the dimension has no value (and why varies)
(not set) means GA4 had no value for that dimension on those rows. The cause is dimension-specific, so don't treat it as one problem:
- Source / medium (not set) → attribution was lost: params stripped by a redirect, broken cross-domain, or a source-less server event. This is the Unassigned/traffic-attribution story.
- Landing page (not set) → often a processing/timing artifact (the page_view and the session start didn't line up), or a session with no qualifying page_view.
- A custom dimension (not set) → the parameter wasn't sent on that event, or wasn't registered in time (custom dimensions aren't retroactive).
What to do: identify which dimension is (not set) and fix that specific cause — attribution capture for source, event timing for landing page, parameter registration for custom dimensions. There's no single fix because there's no single cause.
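One way to do the identification step is to measure the (not set) share per dimension before fixing anything. This sketch assumes you have report or export rows as plain dicts. The dimension keys (`source_medium`, `landing_page`, `custom_dimension`) and the `not_set_report` helper are hypothetical, and the cause strings just restate the list above.

```python
from collections import Counter

NOT_SET = "(not set)"

# Likely causes per dimension, paraphrased from the list above.
# Keys are illustrative names, not GA4 API dimension identifiers.
LIKELY_CAUSE = {
    "source_medium": "attribution lost: redirect stripped params, broken cross-domain, or source-less server event",
    "landing_page": "processing/timing artifact, or a session with no qualifying page_view",
    "custom_dimension": "parameter not sent on that event, or registered after the data arrived",
}


def not_set_report(rows, dimensions):
    """Return (dimension, not_set_share, likely_cause) tuples, worst first."""
    total = len(rows)
    if total == 0:
        return []
    counts = Counter()
    for row in rows:
        for dim in dimensions:
            # A missing key or empty value counts the same as a literal (not set).
            if row.get(dim) in (None, "", NOT_SET):
                counts[dim] += 1
    report = [
        (dim, counts[dim] / total, LIKELY_CAUSE.get(dim, "unknown: check how this dimension is populated"))
        for dim in dimensions
    ]
    # Highest (not set) share first; ties keep the input order.
    report.sort(key=lambda item: item[1], reverse=True)
    return report
```

Start with whichever dimension tops the list, and apply the fix that matches its cause rather than a generic one.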
3. Sampling — estimated from a subset
GA4's standard reports are largely unsampled, but explorations can apply sampling when a query (typically a long date range or a high-cardinality breakdown) exceeds the property's quota, which is 10 million events per query on a standard property. GA4 then computes from a representative subset and estimates the rest. You'll see a sampling indicator. It's usually fine directionally, but it's why two views can disagree slightly.
What to do: shorten the date range or reduce complexity to drop below the sampling limit; use a standard report where one exists; or go to BigQuery for exact, unsampled numbers. (GA4 360 raises the sampling limits, but BigQuery is the real answer.)
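Why do two sampled views disagree slightly? A simplified model: count conversions exactly, then estimate the same count from a uniform random subset and scale it back up. This is plain random sampling, assumed for illustration; it is not GA4's actual sampling algorithm, and `exact_count` and `sampled_estimate` are hypothetical helpers.

```python
import random


def exact_count(rows, predicate):
    """Count matching rows over the full data set (the BigQuery-style answer)."""
    return sum(1 for row in rows if predicate(row))


def sampled_estimate(rows, predicate, sample_rate, seed=0):
    """Estimate the same count from a uniform random subset, scaled back up.

    Simplified model for illustration; not GA4's actual sampling method.
    """
    rng = random.Random(seed)
    k = max(1, round(len(rows) * sample_rate))
    subset = rng.sample(rows, k)  # k rows drawn without replacement
    hits = sum(1 for row in subset if predicate(row))
    return hits * len(rows) / k  # scale the subset count up to the full size


def is_conversion(event_id):
    """Toy predicate: every 20th event is a conversion (5% rate)."""
    return event_id % 20 == 0


if __name__ == "__main__":
    events = list(range(100_000))
    print("exact:", exact_count(events, is_conversion))
    for seed in (1, 2):
        print("10% sample, seed", seed, "->", sampled_estimate(events, is_conversion, 0.1, seed))
```

Each seed draws a different subset, so the estimates land near the true count without matching it or each other exactly, the same effect behind two sampled explorations disagreeing. At `sample_rate=1.0` the estimate equals the exact count.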
The related gotcha: (other) from high cardinality
Adjacent to these: a dimension with too many unique values (a raw URL with query strings, a timestamp) gets its long tail bucketed into an (other) row, hiding exactly the detail you wanted to analyze. The fix is upstream: keep dimension values categorical and bounded, and don't shove high-cardinality data into a custom dimension.
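Here is what "categorical and bounded" can look like for page URLs, alongside a simplified model of the (other) bucketing. Both helpers are hypothetical: `normalize_page` drops the host, query string, fragment, and trailing slash before the value is sent as a dimension, and `bucket_long_tail` keeps the top values and lumps the rest together, roughly how a row limit produces (other). GA4's real row limits vary, so `max_rows` is an assumption.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize_page(url):
    """Reduce a full URL to a bounded path value (no host, query, or fragment)."""
    path = urlsplit(url).path or "/"
    # Treat "/pricing" and "/pricing/" as the same page.
    return path.rstrip("/") or "/"


def bucket_long_tail(values, max_rows):
    """Keep the top (max_rows - 1) values and lump the rest into "(other)".

    Simplified model of the (other) row; real limits differ by property
    and table. Assumes max_rows >= 1.
    """
    counts = Counter(values)
    if len(counts) <= max_rows:
        return dict(counts)
    kept = dict(counts.most_common(max_rows - 1))
    kept["(other)"] = sum(counts.values()) - sum(kept.values())
    return kept
```

Fed raw URLs with per-session query parameters, `bucket_long_tail` pushes most of the tail into (other); fed the normalized paths, the same hits collapse into a handful of pages that fit under the limit.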
How to explain it to a client
The client-facing version is short: "GA4 protects privacy by hiding small numbers (thresholding), labels missing values as (not set), and estimates very large queries (sampling) — none of it means we're not tracking; it means GA4's interface has limits. For the complete, exact picture we use the BigQuery export." Framing it as known platform behaviour (with a fix) rather than "something's wrong" is the difference between a confident answer and a worried client.
Where this fits
These quirks are why "is the data wrong, or is it GA4 being GA4?" is a real, recurring question, and answering it confidently requires knowing the client's setup (Google Signals, BigQuery link, custom dimensions) and what's expected. Phloz keeps each client's GA4 configuration and its data pipeline (including the BigQuery export, the escape hatch for all three) modeled on the tracking-infrastructure map, so you can tell "platform behaviour" from "broken tracking" at a glance. The CRM for SEO agencies and pricing pages cover the workflow. The takeaway: thresholding, (not set), and sampling are three different things with three different causes, and the BigQuery export is the most reliable way to see past all of them.