
What 90 days to production actually looks like

Everyone promises fast delivery. Here's the honest breakdown of what happens in the first 90 days of an enterprise AI engagement — the decisions, the blockers, and the inflection points that determine whether you ship something real.

ThriveArk Editorial · May 14, 2026 · 8 min read

Day three. The integration architect is on a call with the client's data engineering lead, walking through the pipeline architecture. The diagram on screen looks clean — clearly prepared for the scoping conversations. Partway through, almost as an aside, the lead mentions that the inventory feed the team has been counting on to power the first pricing agent runs every two hours, not continuously, and fails roughly fifteen percent of the time on weekends.

This wasn't in the scoping documents. Nobody had mentioned it in three weeks of pre-engagement calls. It changes where the first agent goes.

That's day three. The rest of the first thirty days is a series of variations on that conversation.

Days 1–30: What's actually true

The audit phase is named wrong. "Audit" implies verification — checking that documented architecture reflects reality. What actually happens is closer to excavation. Organizations have a mental model of their own infrastructure that diverges from how it behaves under real operational conditions. The divergence accumulates over years of patches, migrations, workarounds, and tribal knowledge that never made it into documentation. Nobody is hiding anything. The gap just grows.

The first thirty days are about finding where that gap matters for the specific system you're building.

Data freshness is almost always worse than represented. The feed described as "near-real-time" in the architecture diagram runs on a fifteen-minute cadence with a queue that backs up under load. For an agent making decisions where the competitive window is hours, that's acceptable. For one where the window is minutes, it's the reason to start the streaming migration before anything else. The right answer depends on the specific agent and the specific decision — which is why the audit has to precede the build, not run alongside it.
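
As a rough illustration of why the audit has to pin this down before the build, here is a minimal Python sketch of the freshness check. The feed names, cadences, and decision windows are hypothetical, invented for the example rather than drawn from any specific engagement.

```python
from dataclasses import dataclass


@dataclass
class FeedProfile:
    """What the audit actually measured about a data feed."""
    name: str
    cadence_minutes: float          # how often fresh data really arrives
    worst_case_lag_minutes: float   # cadence plus observed queue backlog under load


def can_support(feed: FeedProfile, decision_window_minutes: float) -> bool:
    """A feed is usable only if its worst-case staleness fits inside
    the window in which the agent's decision still matters."""
    return feed.worst_case_lag_minutes <= decision_window_minutes


# Hypothetical numbers: a "near-real-time" feed that really runs on a
# fifteen-minute cadence, with another twenty minutes of backlog under load.
inventory = FeedProfile("inventory", cadence_minutes=15, worst_case_lag_minutes=35)

print(can_support(inventory, decision_window_minutes=240))  # hours-scale window: True
print(can_support(inventory, decision_window_minutes=10))   # minutes-scale window: False
```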

The organizational map matters as much as the technical one. An agent that writes prices to a commerce layer or triggers interventions in a CRM needs someone to grant it that access. That person is almost never in the kickoff meeting. Finding who owns the decision — not who owns the system, but who has the authority to say "yes, this agent can act without human review on this class of decision" — is audit work. The teams that treat it as build work discover the dependency in week seven, when the access request hits a security review queue with a three-week SLA.

The output of day thirty is a decision log, not a requirements document: here's what we found, here are the four facts about this environment that constrain what we build first, here are the decisions that follow. This agent, this data source, this integration point, this governance path. The specificity of that document determines how much the build phase costs and how far it gets in thirty days.

One other thing the audit produces that doesn't appear in any document: a clear picture of which parts of the organization are ready for an agent that makes autonomous decisions, and which parts aren't. That picture shapes the scope of the first deployment more than any technical constraint.

Days 31–60: What ships and what it means

The first thing that ships is almost never the most compelling use case. It's the use case where the data is clean enough, the integration access is granted, and the governance path is clear. That intersection is usually narrow. One product category instead of the full catalog. One customer segment instead of the whole base. Human approval required on decisions above a certain value threshold.
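
One way to picture that narrow intersection is as an explicit scope the agent checks before it acts. The sketch below is illustrative only, assuming hypothetical category names, segments, and thresholds; it is not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class FirstDeploymentScope:
    """Hypothetical scope for a first production deployment."""
    allowed_categories: set[str] = field(default_factory=lambda: {"small-appliances"})
    allowed_segments: set[str] = field(default_factory=lambda: {"returning-customers"})
    auto_approve_limit: float = 500.0  # decisions above this value go to a human


def requires_human_approval(scope: FirstDeploymentScope,
                            category: str, segment: str, value: float) -> bool:
    """Anything outside the narrow intersection, or above the value
    threshold, waits for human review before it takes effect."""
    out_of_scope = (category not in scope.allowed_categories
                    or segment not in scope.allowed_segments)
    return out_of_scope or value > scope.auto_approve_limit


scope = FirstDeploymentScope()
print(requires_human_approval(scope, "small-appliances", "returning-customers", 120.0))  # False
print(requires_human_approval(scope, "furniture", "returning-customers", 120.0))         # True
```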

This tends to look underwhelming compared to the scoping conversations. That's expected.

The production environment — real data quality issues, real load patterns, edge cases that never appeared in evaluation — reveals behaviors that the development environment didn't show. Running with human oversight for two weeks in production tells you more about the agent's actual behavior than six weeks of offline testing. The gap between demo performance and production performance on day one is a feature of honest deployments, not a sign that something went wrong.

During this phase, the value accumulates in the decision log more than in the scope of what's running. The agent makes a decision. The decision is logged with the signal that triggered it, the options it considered, and the guardrail conditions that applied. Someone reviews the log each day. That review process calibrates the guardrails, builds organizational confidence, and produces the evidence that makes the day-sixty conversation possible.
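
A decision log entry in this spirit can be very small. The sketch below shows one shape it might take; every field name is illustrative, not a required format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DecisionLogEntry:
    """One autonomous decision, recorded with enough context to review it."""
    timestamp: datetime
    agent: str
    trigger: str                      # the signal that started the decision
    options_considered: list[str]
    chosen_action: str
    guardrails_applied: list[str]     # which guardrail conditions were in force
    needed_human_approval: bool
    reviewer_note: str = ""           # filled in during the daily review


entry = DecisionLogEntry(
    timestamp=datetime.now(timezone.utc),
    agent="pricing-agent-v1",
    trigger="competitor price drop on SKU-1042",
    options_considered=["hold price", "match -3%", "match -5%"],
    chosen_action="match -3%",
    guardrails_applied=["category: small-appliances", "max discount 5%"],
    needed_human_approval=False,
)

# The daily review is mostly reading entries like this one and asking whether
# the guardrails produced the behavior the organization expected.
print(entry)
```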

By day sixty, the agent has been running in production for two to three weeks. The decision log shows its behavior across the range of real conditions it's encountered. The next question — what does it take to expand to the full catalog, drop the approval requirement, extend the same pattern to a second decision type — now has a concrete answer, grounded in observed production behavior rather than projected performance from an evaluation set. That answer is where the engagement either accelerates or stalls.

Days 61–90: The direction things go

The engagements that accelerate at this stage share a specific characteristic: the infrastructure built in days one through sixty was scoped slightly wider than the first use case required.

The streaming pipeline built for the pricing agent can serve a second agent without rebuilding. The integration pattern used for the first operational system replicates to a second in days. The decision log structure already supports the governance review the next expansion will need. None of this happens automatically — it requires a deliberate choice during the build phase to invest a small amount of additional scope in generality. The teams that make that choice find each subsequent agent takes meaningfully less time than the first. The teams that build tighter find themselves starting over each time.

The engagements that stall here almost always stall for an organizational reason. The agent is performing. The case for expansion is clear. What's missing is sign-off from stakeholders who weren't involved in the first sixty days and are encountering the system for the first time. Their questions are reasonable. The process to answer them takes time that wasn't in the schedule.

The organizations that navigate this start the governance conversation in week two, not week nine. The stakeholders who will need to approve autonomous decision-making at scale are identified during the audit and brought into the process while there's still time to address their concerns before they become blockers. Governance moves faster when it's treated as a design constraint from the start rather than a checkpoint at the end.

What actually slips the timeline

This is the honest section, because the real causes of slippage are more useful to know about than the aspirational ones.

Integration access takes longer than anyone expects. The agent needs write access to a production system. That access requires a security review. The security review requires documentation that doesn't exist, or approval from a team with a multi-week queue, or a policy exception that needs VP sign-off. Sixty-day engagements turn into hundred-day engagements waiting for access provisioning that started in week five. The organizations that hit ninety days identified the target integrations in week one and started the access request process immediately.

Data quality surprises emerge after the build starts. Sample data used during scoping looked clean. The full production dataset has anomalies, gaps, and edge cases that weren't visible in the sample — a product category with missing metadata, a customer segment with incomplete behavioral history, a system that generates valid-looking but semantically incorrect values under certain load conditions. This is the nature of production data at enterprise scale, and no scoping process catches all of it. The teams that absorb these surprises without derailing the schedule built enough slack into the first two weeks of the build phase that a week of unexpected data work doesn't cascade.

Key dependencies go unavailable. The data engineering lead who owns the pipeline architecture goes on leave in week five. The platform team that controls the integration environment is in a code freeze because of an unrelated release. An AI engagement doesn't automatically get priority in the queue. The audit phase is supposed to surface these risks; sometimes it does and sometimes it surfaces them too late. The organizations that handle this best have a clear picture of which delays are recoverable with scope adjustment and which require a timeline conversation.

Governance takes longer when the question is genuinely novel. Most enterprises have a process for approving software releases. They don't have one for approving a system that makes autonomous operational decisions. Building that process mid-engagement, while the clock is running, is expensive. The organizations that move fast on governance have framed the answer clearly from the start: here is what the agent decides, here are the boundaries it operates within, here is the decision log you can read at any time. That framing reduces the novelty, which reduces the deliberation time.

These are the predictable friction points of deploying autonomous systems inside organizations built around human decision-making. Knowing they're coming doesn't eliminate them. It does make them cheaper to navigate.

What ninety days is actually measuring

Enterprise software implementations routinely take two years. What they produce is a configured system that does what it was told — requirements gathered, use cases covered, tested and deployed. It behaves the same way every time it runs.

A system that makes autonomous decisions at scale is a different category. It learns from its outputs. It encounters conditions the original deployment didn't anticipate. Three years after launch, it's making different decisions than it made on day one — because the environment has changed, because the model has been updated, because the guardrails have been tuned against real production behavior.

The work that takes two years in a traditional enterprise transformation is largely work this category doesn't need: the sequential approval gates, the exhaustive requirements documentation, the proof-of-concept that lives in a sandbox until someone decides it's production-ready. What it needs instead is a real foundation — signal that's fresh enough to act on, integration access that lets the agent affect something, guardrails encoded in the system rather than in a process document, and a decision log from day one that makes the system's behavior legible to everyone who needs to understand it.

Ninety days builds that foundation and gets the first agent making real decisions in production. Whether the investment compounds from there depends entirely on whether what was built can support what comes next — whether the next agent builds on something solid or gets rebuilt from scratch.

That's a different kind of work. It's also the only kind that compounds.

ThriveArk runs 90-day engagements that end with an agent in production, not in a sandbox. If you want to understand what the first thirty days would surface in your environment, start a conversation →
