tracking infrastructure5 min readBy Phloz team

Exporting ad platform data to BigQuery: the agency hub

The agency architecture for exporting ad platform data to BigQuery: build-vs-buy math, one cross-platform spend schema, and the restatement MERGE pattern.

TL;DR

The agency case for ad-spend data in BigQuery is the join: blended CAC across platforms, each platform's claimed conversions refereed against the GA4 event log, and cross-client comparisons no platform UI will ever run. The architecture has three decisions. Route: native transfer where it exists (Google Ads has a first-party BigQuery Data Transfer Service connector; the DTS third-party list is short but growing), connector vendors (fast, priced per connection — which multiplies brutally across an agency roster), or DIY API pipelines (a day each to build, nearly free to run, you own the maintenance). Schema: one cross-platform spend table at daily ad grain, platform differences carried as columns, never as table forks. Freshness discipline: every platform restates metrics for days after the fact, so every pipeline re-fetches a trailing 28-day window nightly and MERGEs — blind-append tables drift from the UIs and die of distrust. Platform specifics live in the spokes: TikTok and Meta so far.


Every ad platform answers one question well: "how did my ads do, by my counting?" Agencies live on the questions platforms can't answer: what's blended CAC across four platforms this month, per client? How much does each platform exaggerate against the client's actual order table? Which clients' creative fatigues fastest? Those are SQL questions, they need every platform's numbers in one place, and the place is BigQuery — next to the GA4 export that serves as the independent referee.

Decision 1: the route per platform

Native transfers first. Google Ads → BigQuery is a solved problem: the Data Transfer Service connector is first-party, mature, and free of per-connection vendor pricing. If Google Ads is the client's dominant platform, start there — it's the highest-volume data at the lowest operational cost. DTS has also been adding third-party connectors (Facebook Ads among them); the catalog shifts, so check it before building anything by hand.

Connector vendors (Fivetran, Airbyte, Supermetrics, Windsor.ai, Funnel) cover everything with maintained schemas and someone else absorbing API churn. The agency-shaped problem is pricing: per-connection or per-account fees that read as trivial for one brand become a real margin line across thirty clients and four platforms each. The break-even is roughly: if connector spend across the roster exceeds a few engineering days per quarter, the DIY pipelines pay for themselves — and unlike the fees, pipelines don't scale linearly with client count, because one app token typically reports across every account the agency's business-manager structure can access.

DIY pipelines are one pattern repeated per platform: scheduled cloud job → platform reporting API → date-partitioned BigQuery table. The per-platform personality differences (auth model, rate-limit style, async report flows) are exactly what the spoke posts cover — TikTok and Meta — but the architecture never changes.

CSV is for one-off audits and for validating any pipeline against the UI before a dashboard ships on it. A recurring manual export is a pipeline with a single point of failure who takes vacations.

Decision 2: one schema, platform differences as columns

The failure mode that makes multi-platform spend data useless is letting each platform keep its own table shape, so every blended query is a fresh archaeology project. Land everything — whatever the route — into one canonical table at the finest common grain: one row per day, per ad, per platform.

CREATE TABLE `phloz_agency.ad_spend_daily` (
  stat_date            DATE NOT NULL,
  platform             STRING NOT NULL,   -- 'google_ads' | 'meta' | 'tiktok' | ...
  phloz_client_id      STRING NOT NULL,   -- same key the GA4 events_unified view uses
  account_id           STRING NOT NULL,
  campaign_id          STRING,
  adgroup_id           STRING,
  ad_id                STRING,
  currency             STRING NOT NULL,
  spend                NUMERIC,
  impressions          INT64,
  clicks               INT64,
  conversions          NUMERIC,
  conversion_value     NUMERIC,
  attribution_setting  STRING,            -- what 'conversions' means on this row
  fetched_at           TIMESTAMP NOT NULL
)
PARTITION BY stat_date
CLUSTER BY platform, phloz_client_id;

Three deliberate choices. phloz_client_id matches the client key in the GA4 canonical view, so referee joins are one line. attribution_setting rides along because two accounts with different windows produce conversions columns that are different metrics — a blended dashboard that averages them without knowing is lying politely. And platform-specific extras (placement breakdowns, view-through splits) go in optional side tables keyed back to this one — the canonical table stays narrow and every cross-platform query stays trivial.

Decision 3: the restatement MERGE

The single most common way agency spend tables die: the nightly job appends yesterday's rows and never looks back, but every platform restates recent metrics as attributed conversions land — Meta documents up to 28 days of updates, TikTok revises within its attribution windows, Google Ads adjusts for conversion lag. Within weeks the table disagrees with every platform UI, the team learns to distrust the dashboard, and the pipeline is dead regardless of how well it runs.

The pattern that survives, identical on every platform:

MERGE `phloz_agency.ad_spend_daily` t
USING staged_fetch s   -- trailing 28 days, re-fetched nightly
ON  t.stat_date = s.stat_date
AND t.platform  = s.platform
AND t.ad_id     = s.ad_id
WHEN MATCHED THEN UPDATE SET
  spend = s.spend, impressions = s.impressions, clicks = s.clicks,
  conversions = s.conversions, conversion_value = s.conversion_value,
  fetched_at = s.fetched_at
WHEN NOT MATCHED THEN INSERT ROW;

Re-fetching 28 days of daily ad-level rows is cheap on every platform's API budget; freezing wrong numbers is expensive in the only currency that matters, which is whether anyone believes the table.

What you build on top

The spend table earns its keep in three joins. Blended CAC per client: spend summed across platforms over conversions from the client's source of truth. The exaggeration report: each platform's claimed conversions versus the GA4/CRM referee — the artifact that reallocates budgets in quarterly reviews, per the GA4 export post's attribution-diff pattern. Cross-client benchmarks: the per-vertical CPM and CVR baselines that make new-business pitches honest. The reading layer is the same one as always — Looker Studio on materialized report tables, never dashboards querying raw tables per page-load.

Pipelines are tracking infrastructure

A spend pipeline fails like a pixel fails: silently, off-screen, discovered by a client. Tokens expire, API versions sunset, an account permission gets pruned. Each pipeline needs an owner, a freshness check ("is the newest partition within 48 hours?"), and a recurring verification cadence. In Phloz, each pipeline is a node on the client's tracking infrastructure map with that recurring task attached and a last-verified date on the node — the same treatment every tracking asset gets, because that's what it is.

The series continues platform by platform — TikTok and Meta are live; Google Ads' DTS setup and the smaller platforms are next as the data justifies them.