Flagship whitepaper

ValueMaxx Agents in Production

Production AI that's private, cost-effective at scale, and deterministic. Eight agents, one repeatable pattern.

12 min read · June 2026
  • Private
  • Operationally cost-effective at scale
  • Deterministic & controlled

Figure (illustrative): Cost to run as volume grows. Small task-specific models on your infrastructure vs a general model billed per token.
  • TokenMaxx: general model, per-token. Cost rises with volume.
  • ValueMaxx: small models, your infrastructure. Cost stays roughly flat, about 10× cheaper to run than per-token.

  • 8 agents: one repeatable pattern
  • Private: runs in your environment
  • Cited: every answer traceable
01

Executive summary

Attentions builds enterprise AI agents that do real work in production: they read the documents, calls, claims, and reports a workflow already depends on, apply your rules, and post a clean, cited result into the systems you already run. Eight are live today on customers' own infrastructure, across finance, healthcare, automotive, and engineering. Every one is built to the same three guarantees: it runs privately inside your environment, it is operationally cost-effective at scale, and it is deterministic and controlled so it doesn't make things up.

This paper sets out the thesis behind that approach, how the three guarantees are actually enforced, how the agents are assembled from a shared catalogue of building blocks, and the engagement model we put in writing. Each of the eight agents has its own deep-dive paper covering the workflow it runs.

For the reader in a hurry: "The thesis" and "How we build and measure" are the business case; "The three guarantees" and "How the agents are assembled" are written for your technical and security teams.

02

The thesis: ValueMaxx, not TokenMaxx

Most enterprise AI gets two things wrong, and they are the two things that decide whether an agent survives contact with production: the running cost, and the hallucination.

The running cost is the number that never shows up in the demo. Big, general-purpose models are priced by the token. Point one of those models at a high-volume workflow, say thousands of invoices, claims, calls, or tender lines, and the bill scales with the volume of work, not with the value it creates. The busier the agent gets, the worse the maths. A pilot that looked brilliant becomes a workflow that costs more to run than the manual work it replaced. That is the quiet failure of enterprise AI, and it is the default outcome of optimising for the most capable, most expensive model rather than for the most work per dollar.

The hallucination is the other half. A large general-purpose model is built to always produce a fluent answer, even when it doesn't actually know. That tendency to fill the gap with something plausible is where hallucinations come from, and when the answer lands on an invoice, a warranty verdict, or a patient record, a confident wrong answer is worse than no answer at all.

We optimise for a different goal. Most enterprise AI maximises tokens. We build agents that maximise value: the same intelligence, run privately, for a fraction of the cost, engineered to be right and able to prove it. We call it ValueMaxx, not TokenMaxx. It is a simple test you can apply to any AI project: is it engineered to do the most work per dollar, or just to use the most capable model available?

Everything that follows is the same three guarantees, expressed eight different ways. Every agent below is in production today, on a customer's own infrastructure.

  • Private: runs inside your own environment; data never leaves your walls.
  • Cost-effective at scale: small job-specific models keep the running cost nearly flat.
  • Deterministic & controlled: rules and citations mean it doesn't make things up.

03

The three guarantees

  • Private: runs inside your environment, your servers or your own cloud account. Documents, index, and (at the highest tier) the model itself never leave; permissions are inherited and every action is audit-logged.
  • Operationally cost-effective at scale: small, job-specific models on your infrastructure instead of a general model billed per token. Running cost stays roughly flat as volume grows, about 10× cheaper to run than the general-purpose approach, and is watched from day one.
  • Deterministic & controlled: fills set fields instead of free text, checks output against your rules, cites every answer to its source, and routes low-confidence cases to a person. Same input, same output.

Every Attentions agent is assembled from a shared catalogue of single-job capability blocks. The three guarantees are not marketing posture; they are specific blocks doing specific work in every agent's pipeline. Here is how each one is actually enforced.

Private (sovereign)

The fastest way to get AI working is also the riskiest: ship your documents to someone else's cloud and let a public API read them. Our agents do the opposite. They run inside your own environment, on your servers or in your own cloud account on AWS, Google Cloud, or Azure. We bring the AI to your stack instead of asking you to move your data to ours. At the highest tier of data sovereignty, the model itself runs on dedicated hardware inside your walls, so documents, index, and processing never leave at all.

Two blocks enforce it concretely. Access & Permission Inheritance carries over the RBAC and SSO your systems already enforce: if a person can't see a document in SharePoint or a record in the EMR, they can't see it through the agent either, so there is no new place for sensitive data to leak. Source Citation & Audit Trail logs every query, every source, and every action in a tamper-evident record, so when compliance asks what the AI saw and did, the answer is one audit trail away. Sovereign Deployment is the design default behind both. This is what lets us work to the standards your auditors care about (GDPR, HIPAA where patient data is involved, as in Clinical Scribe and Patient Front Desk, SOC 2 Type 1, and ISO 27001) rather than bolting security on at the end. It is the posture described in "Sovereign by design".
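
One common way to make an audit record tamper-evident is a hash chain, in which each entry stores the hash of the entry before it. The sketch below is illustrative only: the `AuditLog` class and its field names (`user`, `query`, `sources`) are assumptions made for this example, not the actual schema of the Source Citation & Audit Trail block.

```python
import hashlib
import json


class AuditLog:
    """Append-only log where each entry commits to the previous one (hypothetical sketch)."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        # Canonical JSON so the same content always produces the same hash.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, user, query, sources):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"user": user, "query": query, "sources": sources, "prev": prev_hash}
        self.entries.append({**body, "hash": self._digest(body)})

    def verify(self):
        """True only if every entry still matches its hash and links to its predecessor."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("a.khan", "Q3 supplier terms?", ["contracts/acme.pdf#p4"])
log.append("a.khan", "Late-fee clause?", ["contracts/acme.pdf#p9"])
print(log.verify())  # chain is intact

log.entries[0]["query"] = "something else"  # simulate an after-the-fact edit
print(log.verify())  # the edit breaks the chain
```

A chain like this detects edits to, or reordering of, recorded entries. Detecting truncation of the most recent entries would additionally need the latest hash to be anchored somewhere outside the log.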

Operationally cost-effective at scale

A model fine-tuned for one job, like reading an invoice, matching a voucher, or writing up a clinical note, needs far less computing power than a giant model that also has to be able to write poetry. So instead of a general-purpose model billed per token, every agent runs small, job-specific models on your own infrastructure, with the hardware sized to the workload, not to the brochure. The result is a running cost that stays nearly flat as volume grows, rather than scaling with every token, and that is roughly ten times cheaper to run in production than the general-purpose approach.
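
The shape of that claim is simple arithmetic. The sketch below uses placeholder prices to show why a per-token bill grows linearly with volume while a fixed-capacity deployment stays flat until its capacity is used up. Every constant is a hypothetical value chosen for illustration, not a quoted rate or a customer figure, and the ratio between the two columns depends entirely on those inputs.

```python
def per_token_cost(items, tokens_per_item, price_per_1k_tokens):
    """Monthly bill for a model billed per token: scales linearly with volume."""
    return items * tokens_per_item / 1000 * price_per_1k_tokens


def self_hosted_cost(items, monthly_infra_cost, capacity_items):
    """Monthly bill for fixed infrastructure: flat until another unit of capacity is needed."""
    units = max(1, -(-items // capacity_items))  # ceiling division
    return units * monthly_infra_cost


# Hypothetical placeholder values, chosen only to illustrate the shape of the curves.
TOKENS_PER_ITEM = 4_000
PRICE_PER_1K = 0.01
INFRA_PER_MONTH = 2_000.0
CAPACITY = 1_000_000

for items in (10_000, 100_000, 1_000_000):
    api = per_token_cost(items, TOKENS_PER_ITEM, PRICE_PER_1K)
    own = self_hosted_cost(items, INFRA_PER_MONTH, CAPACITY)
    print(f"{items:>9} items/month  per-token: {api:>9.2f}  self-hosted: {own:>8.2f}")
```

With these placeholder inputs the per-token option is cheaper at the lowest volume and the fixed deployment is cheaper at the higher ones. Locating that crossover for a real workflow requires real prices and real volumes.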

We also treat cost as an operations discipline, not a quarterly surprise: the running cost is watched from the first day in production and managed as volumes grow. That is FinOps for AI, handled for you, and it is the difference between an agent that pays for itself and one that quietly runs up the bill. The full argument is in "Why your AI bill is bigger than the work it replaced".

Deterministic and controlled (minimal hallucination)

We don't hand your work to one giant know-it-all model and hope. We build for "right, and able to prove it," and four mechanisms enforce it on every agent.

  • Fields, not free text. Where we can, the agent fills set fields rather than writing loose paragraphs. Structured output leaves far less room for mistakes to hide and makes every value easy to check.
  • Rule & Tolerance Checks gate every output. Business rules check the agent's work before anything is final, so nothing reaches your system without clearing them. This block is the engine that keeps agents deterministic and the reason they don't make things up.
  • No source, no answer. Through Source Citation & Audit Trail, every answer points back to the exact document and line it came from. If the agent can't ground a value, it says so instead of inventing one.
  • Confidence Scoring routes the uncertain to a human. Anything shaky is labelled and handed to a person rather than slipped through as fact.
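
The four mechanisms above can be read as a single gate in front of the target system. The following is a minimal sketch under stated assumptions: the `FieldValue` structure, the `gate` function, the rule predicates, and the 0.9 confidence threshold are all hypothetical names and values introduced for illustration, not the production blocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldValue:
    name: str
    value: Optional[float]
    source: Optional[str]  # e.g. "inv.pdf#p1:l22"; None means the value is ungrounded
    confidence: float      # 0.0 to 1.0


def gate(fields, rules, min_confidence=0.9):
    """Return ("post", []) only if every field is cited, confident, and passes its rule."""
    reasons = []
    for f in fields:
        if f.source is None or f.value is None:
            reasons.append(f"{f.name}: no source, no answer")
        elif f.confidence < min_confidence:
            reasons.append(f"{f.name}: low confidence ({f.confidence:.2f})")
        elif f.name in rules and not rules[f.name](f.value):
            reasons.append(f"{f.name}: failed rule check")
    return ("post", []) if not reasons else ("route_to_human", reasons)


# Hypothetical business rules, one predicate per structured field.
rules = {"total": lambda v: v >= 0, "vat_rate": lambda v: 0 <= v <= 0.3}

clean = [FieldValue("total", 1180.0, "inv.pdf#p1:l22", 0.98),
         FieldValue("vat_rate", 0.18, "inv.pdf#p1:l21", 0.97)]
shaky = [FieldValue("total", 1180.0, "inv.pdf#p1:l22", 0.98),
         FieldValue("vat_rate", None, None, 0.0)]

print(gate(clean, rules))
print(gate(shaky, rules))
```

Because `gate` is a pure function of its inputs, the same fields and rules always produce the same decision, which is the sense of "same input, same output" used here.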

The result is an agent that behaves the same way every time and can always show you why it did what it did: deterministic, and explainable. The full mechanism is in "How we stop agents from making things up".

04

How the agents are assembled

There is no eight-times-the-work secret here. Every agent is composed from one shared catalogue of single-job capability blocks, grouped into six families. A new workflow reuses what is already in production rather than starting from scratch, which is why the three guarantees hold uniformly across all eight rather than being re-earned each time.

  • Read & ingest: turn messy inputs (PDFs, scans, email, spreadsheets, handwriting, images, video frames, speech) into clean structured data. The starting point for almost every agent.
  • Classify & organise: recognise what each input is and route it: documents to the right type, a conversation into a structured note, a long report into a faithful summary.
  • Match & correlate: compare documents field by field, find reused or duplicate assets, and link entities, claims, parts, and people across a whole portfolio.
  • Reason & decide: apply policy before an answer becomes an action: rule and tolerance checks, anomaly and fraud detection, confidence scoring, ranked evidence.
  • Ask & act: let people ask in plain English, then move the work forward by posting results and taking the next step in the systems you already run.
  • Govern & prove: make every answer private, permission-aware, and auditable: source citation, permission inheritance, sovereign deployment.

An agent is a recipe: a named combination of these blocks for one job. Invoice Intelligence is Any-format Extraction → Classification → N-way Matching → Rule & Tolerance Checks → System Posting → Source Citation & Audit Trail. Swap the reading block for voice transcription and the matching block for a six-way check and you have a different agent on the same foundation. The full catalogue is browsable at the building blocks. Crucially, the three guarantees live inside the Govern & prove and Reason & decide families, so they are present in every recipe by construction rather than promised on the side.
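
The recipe idea can be sketched as plain function composition. The example below is an assumption-laden illustration: the three toy blocks (`extract`, `classify`, `rule_check`) and the `make_agent` helper are hypothetical stand-ins, far simpler than the catalogue blocks they are named after.

```python
from functools import reduce


def extract(doc):
    # Stand-in for Any-format Extraction: here the "document" is already a dict.
    return {**doc, "steps": ["extract"]}


def classify(doc):
    kind = "invoice" if "invoice_no" in doc else "unknown"
    return {**doc, "type": kind, "steps": doc["steps"] + ["classify"]}


def rule_check(doc):
    # Toy rule: an amount must be present and positive.
    ok = doc.get("amount", 0) > 0
    return {**doc, "passed": ok, "steps": doc["steps"] + ["rule_check"]}


def make_agent(*blocks):
    """Compose single-job blocks, in order, into one callable agent."""
    return lambda doc: reduce(lambda acc, block: block(acc), blocks, doc)


invoice_agent = make_agent(extract, classify, rule_check)
result = invoice_agent({"invoice_no": "INV-001", "amount": 250.0})
print(result["type"], result["passed"], result["steps"])
```

Swapping one block for another changes the agent without touching the rest of the chain, which is the reuse property the recipe framing relies on.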

05

Eight agents, one pattern

  • 01 Invoice Intelligence: Reads any invoice, runs the three-way match
  • 02 Enterprise Document Intelligence: Private chat across your own documents
  • 03 Voucher Matching: Six-document cross-border payment check
  • 04 Defect Intelligence: Every defect report, useful to every team
  • 05 Clinical Scribe: Writes the SOAP note during the visit
  • 06 Patient Front Desk: Answers every call, books every appointment
  • 07 Warranty & Fraud Intelligence: One audit-ready verdict per claim
  • 08 BOQ Intelligence: Reads and compares huge tenders in minutes

Illustrative. Eight different jobs, one pattern: each is private, cost-effective at scale, and deterministic by construction.

Every agent below reads the messy inputs a workflow already depends on, applies your rules, posts a clean and cited result into the systems you already run, and routes the genuine exceptions to a person. Same pattern, eight jobs.

Invoice Intelligence: accounts payable, in production at a multi-business-unit conglomerate. It reads any invoice (PDF, scan, email, spreadsheet, even handwriting), runs the three-way match against the PO and goods-received note using your own per-supplier, per-category tolerances, and posts the clean ones straight into your finance system. The most distinctive guarantee here is determinism in action: N-way Document Matching plus Rule & Tolerance Checks mean a fuel bill and a fixed service contract are judged by different rules, and nothing posts until it clears them. (Illustrative) 60-85% of invoices auto-matched, with people handling exceptions only.
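
A three-way match with per-category tolerances can be sketched in a few lines. This is an illustration under assumptions: the `TOLERANCES` table, its percentages, and the record fields (`unit_price`, `qty`) are hypothetical, chosen only to show a fuel bill and a fixed service contract being judged by different rules.

```python
from decimal import Decimal

# Hypothetical per-category tolerances: allowed relative variance in price and quantity.
TOLERANCES = {
    "fuel": {"price_pct": Decimal("0.05"), "qty_pct": Decimal("0.02")},
    "fixed_service": {"price_pct": Decimal("0"), "qty_pct": Decimal("0")},
}


def within(actual, expected, pct):
    """True if actual is within pct (relative) of expected."""
    return abs(actual - expected) <= abs(expected) * pct


def three_way_match(invoice, po, grn, category):
    """Check invoice price against the PO and invoice quantity against the goods-received note."""
    tol = TOLERANCES[category]
    issues = []
    if not within(invoice["unit_price"], po["unit_price"], tol["price_pct"]):
        issues.append("price outside tolerance vs PO")
    if not within(invoice["qty"], grn["qty"], tol["qty_pct"]):
        issues.append("quantity outside tolerance vs GRN")
    return issues  # an empty list means the invoice clears the match


inv = {"unit_price": Decimal("1.03"), "qty": Decimal("1000")}
po = {"unit_price": Decimal("1.00")}
grn = {"qty": Decimal("1000")}

print(three_way_match(inv, po, grn, "fuel"))           # 3% variance is inside the 5% band
print(three_way_match(inv, po, grn, "fixed_service"))  # same variance fails a zero-tolerance rule
```

`Decimal` is used rather than floating point so that monetary comparisons at the tolerance boundary behave predictably.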

Enterprise Document Intelligence (artiGen): for any team in any industry, in production, and the platform every other agent is built on. A private ChatGPT over your own corpus, running inside your environment, where every answer links back to the exact file and page, and if the answer isn't in your documents it says so. The most distinctive expression of the pillars is sovereignty at corpus scale: Large-Volume Extraction & Q&A answers across millions of private documents on your own infrastructure, with Access & Permission Inheritance so answers respect who's allowed to see what. (Illustrative) a 20-40 minute hunt, or an interrupted expert, becomes a few seconds.
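
Permission-aware answering comes down to filtering by access before searching, then citing whatever was used. The sketch below assumes a toy in-memory corpus and naive keyword matching as a stand-in for real retrieval; the `ask` function and the document fields (`file`, `page`, `allowed_groups`) are hypothetical.

```python
def ask(question_terms, user_groups, corpus):
    """Answer only from documents the user may already see; cite them, or report nothing found."""
    # Filter by permission first, so restricted text never enters the search at all.
    visible = [d for d in corpus if d["allowed_groups"] & user_groups]
    hits = [d for d in visible
            if all(t.lower() in d["text"].lower() for t in question_terms)]
    if not hits:
        return {"answer": None, "citations": []}  # not found in visible documents: say so
    return {"answer": hits[0]["text"],
            "citations": [f"{d['file']}#p{d['page']}" for d in hits]}


corpus = [
    {"file": "hr/pay_bands.pdf", "page": 3,
     "text": "Band C salary range is reviewed annually.", "allowed_groups": {"hr"}},
    {"file": "ops/handbook.pdf", "page": 12,
     "text": "Forklift inspection is due every 30 days.", "allowed_groups": {"ops", "hr"}},
]

print(ask(["forklift", "inspection"], {"ops"}, corpus))
print(ask(["salary"], {"ops"}, corpus))  # the document exists but is not visible to this user
```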

Voucher Matching: cross-border payments, in production at a multi-business-unit trading group. It reads all six documents of a payment pack at once (sales order, delivery note, invoice, customs declaration, payment instruction, approval), cross-checks them line by line, pays the clean packs automatically, and routes risky ones to a named approver with the mismatch highlighted and a fix drafted. Its distinctive control is Confidence Scoring: a low-confidence match is never auto-paid. (Illustrative) catching a single material currency or payee mismatch before the money moves can be worth more than a year of running cost.

Defect Intelligence: commercial vehicles, in production at a vehicle maker. It reads thousands of defect reports a month, including the photos and sensor screenshots, and lets any team ask in plain English. The standout is correlation at scale: Cross-Modal Correlation makes a pattern invisible inside one report (a part failing under a specific load, a defect cascading into a neighbour) visible across the whole fleet, while Confidence Scoring keeps soft numbers off the sales slide. (Illustrative) time from a new complaint to the right diagnosis falls from days of digging to seconds of query.

Clinical Scribe: clinical documentation, in production at a multi-country hospital chain. It listens in the consultation room, writes the SOAP note in real time with ICD-10 codes filled in, and posts it the moment the clinician approves. The sharpest expression of the pillars is sovereignty plus human control: the audio is captured, transcribed, and turned into a note inside your network and the recording is deleted after sync, while Rule & Tolerance Checks validate every ICD-10 code and a clinician signs every note. (Illustrative) most of a doctor's 1.5-2 hours a day of after-hours typing comes back; recovering 3-8% of previously dropped coded revenue is meaningful at clinic scale.

Patient Front Desk: patient access, in production at a multi-site clinic and hospital group. It answers every call around the clock in the caller's own language, verifies the patient against the record, checks insurance eligibility in real time, books or refills, and logs the call. Its distinctive guarantee is check-before-you-act determinism: Real-time Lookup & Eligibility confirms identity and coverage before any action, and a defined action set (book, reschedule, refill, remind, route) means it never improvises a medical opinion. (Illustrative) answered calls move toward ~100% including evenings and weekends, with reminders pulling the no-show rate down 20-40%.
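
A defined action set is naturally expressed as an enumeration with a single decision function in front of it. In this sketch the five actions come from the paragraph above; the `decide` function and the specific policy that booking and refills require active coverage are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    BOOK = "book"
    RESCHEDULE = "reschedule"
    REFILL = "refill"
    REMIND = "remind"
    ROUTE = "route"


def decide(requested, identity_verified, coverage_active):
    """Map a caller's request to an allowed action; anything else is routed to a person."""
    try:
        action = Action(requested)
    except ValueError:
        return Action.ROUTE  # outside the defined set, e.g. a request for medical advice
    if not identity_verified:
        return Action.ROUTE  # check before you act
    # Assumed policy for this example: booking and refills need active coverage.
    if action in (Action.BOOK, Action.REFILL) and not coverage_active:
        return Action.ROUTE
    return action


print(decide("book", True, True).value)
print(decide("diagnose", True, True).value)   # not in the action set
print(decide("refill", True, False).value)    # coverage check fails
```

Since every return value is a member of `Action`, the agent cannot produce an action that was not enumerated in advance.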

Warranty & Fraud Intelligence: warranty claims, in production at a commercial-vehicle maker. It reads every claim from the dealer network, checks it against the rulebook, the gallery of excluded damage, and the full history of every claim ever filed, then returns one verdict: pay, refer, or reject, with a fraud score, the payable amount, and the evidence quoted and cited. Its distinctive power is the network view: Image Similarity Matching and Cross-Modal Correlation catch the same damage photo reused across six dealers over three months, which is invisible inside any one folder. (Illustrative) roughly 30-60% of clean claims auto-clear, while recoverable exposure is quantified per dealer instead of written off at year end.
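
The network view amounts to grouping claims by a photo fingerprint across dealers rather than within one. The sketch below is deliberately simplified: it uses an exact SHA-256 hash, which only catches byte-identical copies, as a stand-in for the similarity matching described above. The `reused_photos` function and the claim fields are hypothetical.

```python
import hashlib
from collections import defaultdict


def reused_photos(claims, min_dealers=2):
    """Group claims by photo fingerprint and return fingerprints seen at several dealers."""
    dealers_by_photo = defaultdict(set)
    for claim in claims:
        # Exact hash as a placeholder; resized or re-encoded copies would need a similarity measure.
        fingerprint = hashlib.sha256(claim["photo_bytes"]).hexdigest()
        dealers_by_photo[fingerprint].add(claim["dealer"])
    return {fp: sorted(dealers) for fp, dealers in dealers_by_photo.items()
            if len(dealers) >= min_dealers}


claims = [
    {"dealer": "D1", "photo_bytes": b"scratch-left-door"},
    {"dealer": "D4", "photo_bytes": b"scratch-left-door"},  # same image, different dealer
    {"dealer": "D2", "photo_bytes": b"cracked-housing"},
]

print(list(reused_photos(claims).values()))
```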

BOQ Intelligence: engineering and tenders, in production at a global engineering firm. It reads bills of quantities, tenders, and specs in any format, pulls every line item (material, quantity, rate) into a clean structured list, and compares versions, quotes, and specs so deviations surface before they become change orders. Its distinctive control is traceability under deadline: Spec & BOQ Parsing plus Source Citation & Audit Trail mean an estimator can follow any flagged deviation straight back to the line that produced it, and Confidence Scoring keeps a shaky comparison off the bid. (Illustrative) a thousand-line comparison that took a senior engineer two to four days becomes minutes to a few hours of review.
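
Once line items are structured, comparing two versions is a keyed diff that carries the source reference along with each deviation. The sketch below assumes line items keyed by an item code, with hypothetical fields `qty`, `rate`, and `source`; `compare_boq` is an illustrative name.

```python
def compare_boq(old, new, rate_tolerance=0.0):
    """Return (item code, deviation, source reference) tuples between two BOQ versions."""
    deviations = []
    for code in sorted(old.keys() | new.keys()):
        a, b = old.get(code), new.get(code)
        if a is None:
            deviations.append((code, "added", b["source"]))
        elif b is None:
            deviations.append((code, "removed", a["source"]))
        else:
            if a["qty"] != b["qty"]:
                deviations.append((code, f"qty {a['qty']} -> {b['qty']}", b["source"]))
            if abs(a["rate"] - b["rate"]) > rate_tolerance:
                deviations.append((code, f"rate {a['rate']} -> {b['rate']}", b["source"]))
    return deviations


v1 = {
    "C-101": {"qty": 120, "rate": 45.0, "source": "tender_v1.xlsx#row12"},
    "S-200": {"qty": 8, "rate": 900.0, "source": "tender_v1.xlsx#row40"},
}
v2 = {
    "C-101": {"qty": 150, "rate": 45.0, "source": "tender_v2.xlsx#row12"},
    "P-310": {"qty": 30, "rate": 12.5, "source": "tender_v2.xlsx#row41"},
}

for deviation in compare_boq(v1, v2):
    print(deviation)
```

Each tuple ends with the row it came from, so a flagged deviation can be followed back to its source line.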

06

How we build and measure

The same engagement model runs behind all eight agents, and we put the numbers in writing.

It starts with a short, fixed-fee assessment from $10K. We pick the right first workflow with you, scope it on your own data, and give you an honest go or no-go before you commit to anything bigger. The agent build itself starts from $15K, runs on your servers or your own cloud account, and goes live in production in four to six weeks. Not a pilot: real work, every answer cited, the running cost watched from day one. And the line that matters: most agents return their cost within six months, at a running cost roughly ten times cheaper than the general-purpose approach.

The honesty clause is part of the model, not a footnote. Not every workflow pays back. If the volume is low, the process changes every quarter, or the data the agent needs is locked somewhere we can't responsibly reach, the ROI isn't there, and we will say so at the assessment. That is what the go-or-no-go is for: a no costs you a short assessment, a bad yes would cost you a year of quiet running costs, and we would rather give you the no.

We baseline the return on your own data, not ours. Before anything is built, we capture four numbers: items per month (invoices, calls, claims, documents), minutes of handling per item, the exception or error rate, and what a single error costs you downstream. After go-live, we compare the same four, with the agent's running cost on the other side of the ledger. Because that cost is nearly flat, the arithmetic gets better every month your volume grows. The full method is set out in "The ROI of ValueMaxx agents". Every ROI figure in this paper is illustrative, based on typical workflow mechanics rather than specific customer results; the real numbers come from your baseline.
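
The four baseline numbers combine into a simple before-and-after calculation. The sketch below is one plausible way to write it down, not the method from the ROI paper: the `Baseline` structure, the hourly rate, and every input value are hypothetical placeholders, and only the four measured quantities come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    items_per_month: int
    minutes_per_item: float
    error_rate: float      # fraction of items with an error or exception
    cost_per_error: float  # downstream cost of one error


def monthly_cost(b, hourly_rate):
    """Handling labour plus the downstream cost of errors for one month."""
    labour = b.items_per_month * b.minutes_per_item / 60 * hourly_rate
    errors = b.items_per_month * b.error_rate * b.cost_per_error
    return labour + errors


def payback_months(before, after, hourly_rate, agent_running_cost, build_cost):
    """Months to recover the build cost, or None if the workflow never pays back."""
    saving = (monthly_cost(before, hourly_rate)
              - monthly_cost(after, hourly_rate)
              - agent_running_cost)
    return build_cost / saving if saving > 0 else None


# Hypothetical inputs for illustration only.
before = Baseline(items_per_month=2_000, minutes_per_item=6.0, error_rate=0.03, cost_per_error=40.0)
after = Baseline(items_per_month=2_000, minutes_per_item=1.5, error_rate=0.01, cost_per_error=40.0)

months = payback_months(before, after, hourly_rate=30.0,
                        agent_running_cost=2_000.0, build_cost=15_000.0)
print(f"payback in {months:.1f} months")

# A low-volume workflow with the same per-item numbers does not pay back.
low_before = Baseline(200, 6.0, 0.03, 40.0)
low_after = Baseline(200, 1.5, 0.01, 40.0)
print(payback_months(low_before, low_after, 30.0, 2_000.0, 15_000.0))
```

The second case returns `None`, which corresponds to a no at the assessment stage: at that volume the monthly saving does not cover the running cost.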

Assessment-first engagement: before you commit
  • Step 1: Scope one workflow on your data
  • Step 2: Honest go / no-go
  • Step 3: Measure accuracy & cost from day one

Illustrative. A short, fixed-fee assessment scopes the right first workflow before any build begins.

Start with an assessment.

We scope the right first workflow on your own data and give you an honest go or no-go before you commit to anything bigger.