How to evaluate agency software: a trial that decides
How to evaluate agency software with a one-week trial that decides: real data, the messiest client, the break test, the exit test, month-18 math.
TL;DR
Most agency software trials are vibes plus demo data, which is why most agency software regret arrives around month 18. The trial that actually decides runs one structured week: before signup, write down the three jobs the tool must do and the incident that triggered the search. Day one, import real data — your actual clients, including the messy one. Days two and three, run real work through it (not alongside it). Day four, try to break it: client-confidentiality walls, permissions, the weird edge your agency actually has. Day five, run the exit test — export everything and look at what comes back, because the cost of leaving is part of the price of staying. Staff the trial with the team skeptic, not just the champion; score against your incident list, not the feature tour; run the month-18 pricing math at the team size you plan to be; and end with a decision meeting that has exactly three outputs — buy, no, or one extension with a named reason. Anything fuzzier than that is how agencies end up paying for three tools at 40% adoption each.
This is the method post behind the whole review series. The buyer's guide covers what to look for in a CRM specifically; the best-tools list covers the field; and the agency project management software guide covers that category in depth. This one covers the part everyone skips: how to run the trial so it produces a decision instead of a feeling.
Before signup: the incident list and the three jobs
Software searches start with an incident — the report that took a weekend, the client email nobody answered, the spreadsheet that lied. Write the incidents down before you see any product, and distill them into three jobs the tool must do. Three, written, specific: "client email threads to the client record automatically," not "better communication."
This list is the whole defense against the demo. Demos are optimized to show what's impressive; your incidents define what's relevant. Every tool in our review series looks great doing its happy path — the evaluation question is always whether your unhappy paths are covered.
Day one: real data, including the messy client
Demo data is a lie of omission — every record complete, every name spelled once. Import your actual book, and make sure it includes the messiest client: the one with three businesses under one retainer, the duplicate contacts, the tracking setup nobody fully understands. How the tool handles your mess is the product; how it handles its own sample data is theater.
Note the import experience itself — it's a preview of the migration you'll run if you buy, and friction here multiplies by every client you have.
Days two and three: run work through it, not alongside it
The trial fails the moment it becomes a parallel universe — real work continuing in the old tools while two people click around the new one. For two days, route a real slice through the candidate: one client's tasks, this week's actual client emails, the Tuesday report. You're testing the operating rhythm fit — does the daily loop (what's due, what came in, what's blocked) feel like less friction than today, with real volume and real interruptions?
Adoption is the multiplier on every feature. Two days of real use predicts it better than any feature matrix.
Day four: the break test
Spend a day trying to make the tool fail in the ways that would actually hurt:
- Client walls. Add a contractor with access to exactly one client; verify they genuinely can't see — or infer — the others. The review series found this is a default in agency-shaped tools and a maintained discipline in horizontal ones (Airtable, Notion); day four is where you find out which you're holding.
- Your weird edge. Every agency has one — the client with two brands, the department that works differently. Force it through.
- Quota and limit behavior. What happens at the automation cap, the storage line, the seat boundary? The Monday TCO (total cost of ownership) layers are findable in a trial if you go looking.
Day five: the exit test
Export everything. Look at what comes back: complete records or summaries? Relationships intact or flattened? A format another tool could import, or a souvenir? Vendors publish entrance paths and bury exits — but the cost of leaving is part of the price of staying, and it compounds with every month of data you accumulate. A tool that holds your data hostage has told you what the renewal negotiation will feel like. (This is why data export is a one-click, full-fidelity feature in Phloz — disclosure as ever, but the test is vendor-neutral: run it on us too.)
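If the export comes back as rows you can load (CSV or JSON), the "look at what comes back" step can be made mechanical. The sketch below is one possible way to do it, not a vendor feature: it assumes each record carries an `id` field and that you kept a copy of what you imported on day one. The function name `audit_export`, the field names, and the sample records are all hypothetical.

```python
def audit_export(original, exported, required_fields, id_field="id"):
    """Compare what went in on day one with what the export gives back.

    original / exported: lists of dicts, one per record (assumed shape).
    required_fields: fields that must survive the round trip, including
    relationship fields such as a contact's client_id.
    """
    exported_by_id = {row[id_field]: row for row in exported if id_field in row}

    # Records that went in but never came back out.
    missing_records = [
        row[id_field] for row in original if row[id_field] not in exported_by_id
    ]

    # Records that came back with required fields empty (flattened or summarized).
    incomplete = {}
    for record_id, row in exported_by_id.items():
        gaps = [f for f in required_fields if row.get(f) in (None, "")]
        if gaps:
            incomplete[record_id] = gaps

    return {"missing_records": missing_records, "incomplete": incomplete}


# Hypothetical sample: two contacts imported, one returned with its client link dropped.
imported_contacts = [
    {"id": "c1", "name": "Ana", "client_id": "acme"},
    {"id": "c2", "name": "Ben", "client_id": "acme"},
]
exported_contacts = [
    {"id": "c1", "name": "Ana", "client_id": ""},
]

report = audit_export(imported_contacts, exported_contacts, ["name", "client_id"])
print(report)
```

An empty report on both keys is the result you want. A dropped relationship field like the `client_id` above is the "flattened" case: the contact survived, but the thing that made it useful did not.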
Who runs the trial, and how it ends
Staff the skeptic. Trials run by the tool's internal champion always pass. Add the person who'll complain loudest — if the tool wins them, adoption is real; if it can't, you've learned it cheaply.
Score against the incident list. End-of-week question per incident: "would this have prevented or shrunk it?" Yes/no/partially, written down. Features that don't map to an incident score zero, however shiny.
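The written scorecard can be as small as a few lines. This is a minimal sketch, assuming a weighting of yes = 1, partially = 0.5, no = 0; that weighting is our illustration, not a rule, and the incidents shown are placeholders for your own list.

```python
from dataclasses import dataclass

# Assumed weights for the end-of-week verdicts; adjust to taste.
VERDICT_POINTS = {"yes": 1.0, "partially": 0.5, "no": 0.0}


@dataclass
class IncidentScore:
    incident: str   # written down before signup
    verdict: str    # "yes", "partially", or "no"
    note: str = ""  # why, in one line


def score_trial(scores):
    """Return the share of incidents the tool would have prevented or shrunk."""
    if not scores:
        raise ValueError("write the incident list before scoring")
    return sum(VERDICT_POINTS[s.verdict] for s in scores) / len(scores)


# Placeholder incidents; features with no incident never enter the list, so they score zero.
scorecard = [
    IncidentScore("Monthly report took a weekend", "yes"),
    IncidentScore("Client email went unanswered", "partially", "threads link, no alert"),
    IncidentScore("Spreadsheet showed stale numbers", "no"),
]

print(f"{score_trial(scorecard):.0%} of incidents covered")
```

The number matters less than the fact that every row was argued about and written down, skeptic included.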
Run the month-18 math. Price at the team size and client count you plan to have, with the tier you'll actually need (the trial told you which features forced it) — the TCO discipline in one line.
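The arithmetic is short enough to sketch. The example below assumes simple per-seat pricing plus an optional flat fee, which is a simplification: real vendor pricing may add client-count, automation, or storage charges. Every number in it is hypothetical.

```python
def monthly_cost(seats, price_per_seat, flat_fee=0.0):
    """Monthly bill under an assumed per-seat model with an optional flat fee."""
    return seats * price_per_seat + flat_fee


def month_18_comparison(seats_now, seats_planned, entry_tier_price, needed_tier_price):
    """Compare today's sticker price with the bill at planned size on the tier you need."""
    today = monthly_cost(seats_now, entry_tier_price)
    month_18 = monthly_cost(seats_planned, needed_tier_price)
    multiple = month_18 / today if today else None
    return {"today": today, "month_18": month_18, "multiple": multiple}


# Hypothetical agency: 5 seats today on a $10 entry tier,
# 10 seats planned on the $20 tier the trial showed you actually need.
result = month_18_comparison(
    seats_now=5, seats_planned=10, entry_tier_price=10.0, needed_tier_price=20.0
)
print(result)
```

In this made-up case the month-18 bill is four times the number on the pricing page you looked at during signup. The point of running it is to see that multiple before the contract, not after.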
Decide with three outputs. Buy (with a migration date), no (with the reason logged for next year's search), or one extension with a named open question. "Let's keep playing with it" is the fourth output, and it's how trials become un-cancelled subscriptions.
One week, one messy client, one skeptic, one written scorecard. It's a fraction of the diligence agencies apply to a $5k client proposal — applied, for once, to a decision that compounds for years.