FOR ANY TEAM, ANY INDUSTRY

Enterprise Document Intelligence (artiGen)

A private ChatGPT over your own corpus, and the sovereign platform every other agent is built on.

9 min read · June 2026

Private
Operationally cost-effective at scale
Deterministic & controlled

Which of our contracts auto-renew in Q3, and what notice period applies?

Three contracts auto-renew in Q3. Each requires 60 days written notice before the renewal date.

MSA-Acme.pdf · p.12 | Vendor-register.xlsx | Renewals-2026.docx · p.3

Illustrative example. Every answer links to the exact file and page; if it is not in your documents, it says so.

Millions: documents indexed (illustrative)

Seconds: to a cited answer (illustrative)

100%: answers linked to source

On-prem: model can run inside your walls

Executive summary

Your company knows an enormous amount, but most of it is locked inside documents no one has time to read: contracts, manuals, policies, reports, tenders, and past projects. A public AI tool could answer across all of it in seconds, but sending private documents to someone else's cloud puts your IP and confidential data outside your control, so most enterprises can't use one.

artiGen is the same idea, made private. It runs entirely inside your environment, your documents never leave, and every answer links back to the exact file and page it came from. Your teams find answers in seconds instead of hours, new joiners ramp in days instead of months, and your experts stop answering the same question over and over. It is in production today, and it is also the platform every other Attentions agent is built on, so when you start here, the rest get easier.

Understanding the problem: what is enterprise document sprawl?

Every enterprise runs on documents, and over the years those documents pile up faster than anyone can organise them. By "enterprise document sprawl" we mean the everyday reality that the knowledge your organisation has already paid to create is scattered across dozens of places, in dozens of formats, with no single way to ask a question of all of it at once.

A typical organisation's document estate is built up, one file at a time, over many years. It usually includes:

Contracts and agreements: master agreements, NDAs, supplier and customer contracts, amendments, and the clauses buried in their appendices.
Policies and procedures: HR handbooks, compliance manuals, standard operating procedures, and safety and quality documents, each with its own revision history.
Manuals and technical documentation: product manuals, equipment guides, installation and service instructions, and specifications.
Reports: financial reports, project reports, audit findings, board papers, and field and incident reports.
Project records: proposals, tenders, bids, drawings, meeting notes, and the lessons learned that never quite made it into a system.
Correspondence: emails and attachments where, in practice, a great many real decisions and much of the context actually live.

These files do not sit in one tidy place. They are spread across the systems each team happened to adopt:

Shared drives and network folders, organised by whoever set them up.
SharePoint sites, Google Drive, and other collaboration tools.
A legacy document management system (DMS) nobody fully trusts anymore.
ERP and line-of-business systems where documents are attached to records.
Email inboxes and personal folders.
Filing cabinets and scanners: paper that was digitised as image-only PDFs, with no searchable text inside.

And the documents arrive, and survive, in every possible format. Born-digital PDFs sit next to scanned faxes from the 2000s, Word and Excel files next to images and handwritten notes, English documents next to ones in other languages. Some are a single page; some are thousand-page tenders or contracts.

How people search for an answer today. When someone needs a specific fact (a clause in a contract, the current version of a policy, a spec from a past project), the process is slow and manual:

They guess which system the document lives in, then guess which folder.
They run a filename or keyword search that only matches text the system happens to have indexed (so scanned documents return nothing).
They open file after file and skim for the relevant section.
When that fails, they message a colleague, or the one expert who "knows where everything is."
That expert stops their own work to answer, often re-answering a question they have answered many times before.
If none of that works, the person guesses, or acts without the document that already had the answer.

Why doing this by hand is slow, costly, and error-prone. The knowledge already exists; the cost is that nobody can get to it quickly. Concretely:

Your knowledge is buried in thousands of documents that are hard to search.
Public AI tools, the obvious thing that would unlock all of it, are off-limits, because you can't send private data outside your walls.
New staff take months to find what they need, because the map of "where things are" lives in people's heads.
Experts answer the same questions over and over, which is both a tax on your best people and a single point of failure when they leave.

So the obvious tool that would unlock all of this is exactly the one your security team can't approve. The knowledge stays buried, and the cost shows up as slow onboarding, repeated questions, decisions made on a half-remembered or out-of-date document, and institutional memory that walks out the door when a key person does.

SharePoint & drives

Sites and network folders organised by whoever set them up.

Email & PDFs

Born-digital files next to scanned faxes from the 2000s.

Wikis & tickets

Decisions and context spread across the tools each team adopted.

ERP exports

Documents attached to records in line-of-business systems.

indexed privately, in your environment ↓

One private, permission-aware index

Searchable across everything at once, including scanned paper, with access inherited from your systems.

What artiGen does

artiGen applies private, on-premises AI across your entire document corpus, from a few thousand files to many millions, and turns a scattered collection of contracts, manuals, policies, reports, and project records into a single source you can simply ask. Instead of knowing which system, which folder, and which file to open, anyone authorised asks a question in plain English and gets a clear, sourced answer drawn from across everything at once.

Across that corpus, artiGen produces understanding such as:

What your documents actually say on a given topic, in plain language, pulled from wherever it lives.
The exact file and page each answer came from, so the answer can be trusted and checked.
A faithful summary of a long document: a thousand-page contract or tender condensed into something a person can use.
Answers that respect who is allowed to see what: if you can't see a file in the source system, you can't see it through artiGen.
An honest "it isn't in your documents" when the answer genuinely isn't there, instead of a confident guess.
Answers across languages, where documents (or questions) are not all in one language.
A searchable view of scanned and legacy paper, not just born-digital files, because it reads images and handwriting through OCR.

In short, it makes the knowledge you have already paid to create finally usable, privately, at enterprise scale, and with the source attached to every answer.

Answer (cited)

Question: …notice period for Acme MSA?
Answer: 60 days before renewal
Sources: MSA-Acme.pdf · p.12
Confidence: High
Access: Inherited from your systems

The rule: no source, no answer

GROUNDED: backed by your documents

NO SOURCE: not in your documents, so it says so and never guesses

Illustrative example. Every answer points back to the exact file and page; when the answer genuinely is not in your documents, it says so instead of inventing one.

Questions it can answer

The whole point is that people ask in normal words and get an answer they can trust. Grounded in how teams actually use artiGen, the kinds of questions it answers include:

What does our contract with this supplier say about termination, renewal, or liability?
Which of our policies covers this situation, and what is the current version?
What's the spec we used on the last project like this one?
Summarise this thousand-page tender: what are the key obligations and deadlines?
Where in these documents is this clause, figure, or requirement?
What did we decide about this, and which document records that decision?
Does the answer to this exist anywhere in our documents, and if not, will it just tell me so?
Our documents live in SharePoint and a legacy DMS. Can it answer across both at once?
Half our contracts are scanned PDFs from the 2000s. Can it read and search those too?
Can it keep answers restricted by department, so HR files only surface for HR?

How it works

Point it at your approved document repositories

Wherever they live and in whatever format, including scanned paper. You decide which sources are in scope.

It reads and indexes them privately, inside your environment

It can read scans (OCR), work across languages, and summarise long documents. Nothing is sent out to do this.

Anyone asks a question in plain English and gets a clear answer

Each answer comes with a link to the exact file and page behind it. If the answer isn't in your documents, it says so instead of guessing.
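For technical readers, the three steps above can be sketched in a few lines of Python. This is a toy illustration only, not artiGen's implementation: it uses plain word overlap where a production system would use OCR and a proper search index, and the names `Page`, `Answer`, `build_index`, `ask`, and the `min_overlap` threshold are all hypothetical. What it shows is the shape of the contract: every answer carries its file and page, and a question with no supporting page gets an explicit "not found" rather than a guess.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Page:
    file: str
    page: int
    text: str


@dataclass(frozen=True)
class Answer:
    text: str
    sources: tuple  # (file, page) pairs; empty when nothing was found


def tokenize(text: str) -> set:
    """Lowercase word set; a stand-in for real indexing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(pages):
    """Step 2: index the approved pages (here, precompute each page's word set)."""
    return [(page, tokenize(page.text)) for page in pages]


def ask(index, question: str, min_overlap: int = 2) -> Answer:
    """Step 3: return the best-matching page with its citation, or say so."""
    q = tokenize(question)
    scored = [(len(q & words), page) for page, words in index]
    best_score, best_page = max(scored, key=lambda s: s[0], default=(0, None))
    if best_page is None or best_score < min_overlap:
        # No sufficiently supported page: refuse rather than guess.
        return Answer("Not found in your documents.", ())
    return Answer(best_page.text, ((best_page.file, best_page.page),))


if __name__ == "__main__":
    # Step 1: point it at the documents that are in scope (hypothetical content).
    index = build_index([
        Page("MSA-Acme.pdf", 12,
             "Either party must give 60 days written notice before the renewal date."),
        Page("Handbook.docx", 3, "Annual leave requests need manager approval."),
    ])
    print(ask(index, "What notice period applies before renewal?"))
    print(ask(index, "Who won the 1998 world cup?"))
```

The important design choice is that the refusal path is part of the return type: a caller cannot receive answer text without also receiving the (possibly empty) list of sources.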

Under the hood (for your technical team)

Private Document Chat

A private ChatGPT over your own corpus, on your own infrastructure.

Any-format Extraction

Reads PDFs, scans, emails, spreadsheets, handwriting, images, even video frames, and turns them into clean structured data (OCR included).

Large-Volume Extraction & Q&A

Pulls answers out of millions of documents and lets anyone ask across all of them at once.

Document Summarisation

Turns long documents into short, faithful summaries.

Plain-English Q&A

Ask any agent a question in normal words and get a sourced answer.

Source Citation & Audit Trail

Links every answer to its exact source, with a tamper-evident log.

Access & Permission Inheritance

Respects who's allowed to see what. If you can't see it in the source system, you can't see it through the agent.

The building blocks it's composed from. artiGen is assembled from proven, reusable capability agents from the Attentions catalogue, not built bespoke from scratch:

Large-Volume Extraction & Q&A: pulls answers out of millions of documents and lets anyone ask across all of them at once. This is the engine that makes the corpus searchable at enterprise scale.
Private Document Chat: a private ChatGPT over your own corpus, on your own infrastructure. It is the heart of artiGen, and the same block also powers Patient Front Desk.
Any-format Extraction (OCR): reads PDFs, scans, emails, spreadsheets, handwriting, and images and turns them into clean structured data, OCR included. This is how it ingests scanned paper and legacy files alongside born-digital documents.
Document Summarisation: turns long documents into short, faithful summaries, so a thousand-page contract or tender becomes something a person can actually use.
Plain-English Q&A: ask in normal words and get a sourced answer. The friendly front door to the whole platform.
Connector Framework: plugs into the tools you already run, with new sources added in days, not quarters. This is what lets a corpus split across SharePoint and a legacy DMS be queried as one.
Source Citation & Audit Trail: links every answer to its exact source, with a tamper-evident log. Regulator-ready by default.
Access & Permission Inheritance: respects who's allowed to see what. If you can't see a document in the source system, you can't see it through the agent. Supports RBAC and SSO.
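To make "tamper-evident log" concrete for a technical reader, here is one common way such a log can be built: a hash chain, where each entry's hash covers the previous entry's hash. This is a generic sketch under that assumption; the source does not say which mechanism artiGen uses, and the names `append_entry`, `verify`, and `GENESIS` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """SHA-256 over the record (canonical JSON) plus the previous entry's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> None:
    """Append a record whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash,
                "hash": _digest(prev_hash, record)})


def verify(log: list) -> bool:
    """Recompute the chain; editing or reordering any entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"user": "a.lee", "query": "notice period for Acme MSA?",
                       "sources": ["MSA-Acme.pdf p.12"]})
    append_entry(log, {"user": "b.khan", "query": "current travel policy?",
                       "sources": []})
    print(verify(log))                    # chain intact
    log[0]["record"]["user"] = "someone-else"
    print(verify(log))                    # chain broken by the edit
```

A chain like this detects edits and reordering, but not removal of the most recent entries; catching that would additionally require storing the latest hash somewhere the log writer cannot change.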

Inputs, formats, and modalities. It handles born-digital and scanned documents through OCR, reads images and handwriting, works across languages (translation and multilingual understanding), and summarises long documents. The corpus can run to millions of files spread across many source systems.

What it integrates with. The Connector Framework plugs into the tools you already run (Drive, SharePoint, ERP, EMR, Slack, email, databases), with new sources added in days, not quarters. So a corpus split across SharePoint and a legacy DMS can be queried as one, without first migrating everything into a single repository.

Data-flow and deployment topology. You point the agent at your approved repositories; it ingests and indexes them inside your environment. Queries are answered against that private index, and every response is returned with a citation to the source file and page. There are two deployment shapes:

Your server, with AI running through an approved endpoint. Your documents and your index stay on your machine.
Everything inside your walls. At the highest tier of data sovereignty, the whole platform, including the AI model itself, runs on dedicated hardware inside your environment, so documents, indexes, and processing never leave at all.

Built for production

Private

The whole platform, including the model, can run on dedicated hardware inside your walls; permissions are inherited from your systems.

Operationally cost-effective at scale

Fast across millions of documents, with running cost roughly flat as the corpus and user count grow.

Deterministic & controlled

Every answer links to the exact file and page; if it isn't in your documents, it says so. No source, no answer.

Private

artiGen runs inside your environment: on your own servers, or in your own AWS, Google Cloud, or Azure account. Your documents and your index stay in that environment. At the highest tier of data sovereignty, the whole platform, including the AI model itself, runs on dedicated hardware inside your walls, so documents, indexes, and processing never leave at all.

How it's enforced. Access & Permission Inheritance means access follows the permissions your systems already enforce: if a person can't see a file in SharePoint, they can't see it through the agent either, so there's no new place for sensitive data to leak. Source Citation & Audit Trail logs every query, every source, and every action, so when compliance asks what the AI saw and did, the answer is one audit trail away. This is why we can work to the standards your auditors care about (GDPR, HIPAA, SOC 2 Type 1, and ISO 27001) rather than bolting security on at the end.
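As an illustration of what permission inheritance means at query time, the sketch below filters candidate documents by the groups recorded from the source system before anything is used to build an answer. It is a simplified assumption of how such a check could look, not a description of artiGen's internals; `IndexedDoc`, `allowed_groups`, and `visible_docs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    file: str
    allowed_groups: frozenset  # assumed to mirror the source system's access list


def visible_docs(candidates, user_groups):
    """Keep only documents the user could already open in the source system."""
    groups = frozenset(user_groups)
    return [doc for doc in candidates if doc.allowed_groups & groups]


if __name__ == "__main__":
    corpus = [
        IndexedDoc("Salary-bands-2026.xlsx", frozenset({"hr"})),
        IndexedDoc("MSA-Acme.pdf", frozenset({"legal", "finance"})),
    ]
    # A finance user sees the contract but never the HR file.
    print([d.file for d in visible_docs(corpus, {"finance"})])
    # A user with no matching group sees nothing.
    print([d.file for d in visible_docs(corpus, set())])
```

Applying the filter before retrieval results reach the answering step is what keeps restricted text out of any answer, rather than relying on the answer being redacted afterwards.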

Operationally cost-effective at scale

Big general-purpose models are priced by the token, so a high-volume document workflow scales its bill with the volume of work, not the value created: the busier the agent gets, the worse the maths. artiGen runs small, job-specific models on your own infrastructure instead. The hardware is sized to the workload, not to the brochure, which makes it roughly ten times cheaper to run in production than the general-purpose approach, and keeps the running cost nearly flat as your documents and users grow.

How it's enforced. Cost is treated as an operations discipline, not a quarterly surprise: the running cost is watched from the first day in production and managed as volumes grow. FinOps for AI, handled for you. Large-Volume Extraction & Q&A is what lets a fine-tuned model answer across millions of documents without reaching for the most expensive model available for every question.
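The difference between per-token pricing and a fixed running cost can be written as simple arithmetic. The sketch below is only a way to reason about the two cost shapes: every number you feed it (queries, tokens per query, price per thousand tokens, monthly fixed cost) is your own assumption, none of them come from this paper or from any vendor's price list, and the function names are hypothetical.

```python
def per_token_monthly_cost(queries: int, tokens_per_query: int,
                           price_per_1k_tokens: float) -> float:
    """Metered model: the bill grows linearly with the volume of work."""
    return queries * tokens_per_query / 1000 * price_per_1k_tokens


def fixed_monthly_cost(hardware: float, operations: float) -> float:
    """Self-hosted model: cost is set by the sized hardware, not by query count."""
    return hardware + operations


def break_even_queries(tokens_per_query: int, price_per_1k_tokens: float,
                       monthly_fixed: float) -> float:
    """Monthly query volume above which the fixed-cost setup is cheaper."""
    cost_per_query = tokens_per_query / 1000 * price_per_1k_tokens
    return monthly_fixed / cost_per_query


if __name__ == "__main__":
    # Placeholder inputs purely for illustration; substitute your own figures.
    fixed = fixed_monthly_cost(hardware=4000.0, operations=1000.0)
    for queries in (50_000, 250_000, 1_000_000):
        metered = per_token_monthly_cost(queries, 2000, 0.01)
        print(queries, round(metered, 2), fixed)
    print(break_even_queries(2000, 0.01, fixed))
```

The point of the exercise is the shape, not the values: the metered line keeps climbing with usage while the fixed line stays level until the hardware itself has to be resized.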

Deterministic & controlled

For an enterprise workflow, a confident wrong answer is worse than no answer at all. A large general-purpose model is built to always produce a fluent answer, even when it doesn't actually know, and that is where hallucinations come from. artiGen is built the other way around.

How it's enforced. Every answer points back to the exact document and page it came from. If the agent can't find a source, it says so instead of inventing one: no source, no answer, a rule delivered by Source Citation & Audit Trail. Where the platform powers structured workflows, it fills set fields rather than writing loose paragraphs, which leaves far less room for mistakes to hide. The result is an agent that behaves the same way every time and can always show you why it gave the answer it did: deterministic, and explainable.
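"Fills set fields rather than writing loose paragraphs" can be pictured as validation against a fixed schema. The sketch below is an assumption about how such a check might look, with a made-up three-field schema (`SCHEMA`) and a hypothetical `validate_extraction` helper: unknown fields are rejected outright, and any field that arrives without a source is blanked rather than kept.

```python
SCHEMA = ("counterparty", "notice_period_days", "renewal_date")  # hypothetical field set


def validate_extraction(fields: dict) -> dict:
    """Accept only the fixed fields; keep a value only if it carries a source."""
    unknown = set(fields) - set(SCHEMA)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    result = {}
    for name in SCHEMA:
        entry = fields.get(name) or {}
        if entry.get("value") is not None and entry.get("source"):
            result[name] = {"value": entry["value"], "source": entry["source"]}
        else:
            # No source, no answer: leave the field empty instead of guessing.
            result[name] = {"value": None, "source": None}
    return result


if __name__ == "__main__":
    raw = {
        "counterparty": {"value": "Acme", "source": "MSA-Acme.pdf p.12"},
        "notice_period_days": {"value": 60, "source": None},  # uncited, so dropped
    }
    print(validate_extraction(raw))
```

Because the output always has the same keys in the same order, downstream systems and reviewers see an empty field where evidence is missing, not a plausible-looking sentence.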

Who benefits

Everyone who works with documents

Anyone whose job involves finding or checking a fact in a document, which, in most organisations, is nearly everyone, gets instant answers they can trust, with the source attached. Instead of guessing which system holds the file, opening one after another, or messaging a colleague, they ask a question in plain words and get a clear answer with a link to the exact file and page. There is nothing to learn: it works the way people already wish search worked. When the answer genuinely isn't in the documents, it says so, so people stop acting on a half-remembered or out-of-date version.

IT and security

IT and security get a way to say yes to AI without giving anything up. Nothing leaves your environment, and you stay in control of access. Because artiGen inherits the permissions your systems already enforce, it creates no new place for sensitive data to leak: if a person can't see a file in the source system, they can't see it through the agent. There is no rip-and-replace: it sits on top of the repositories you already have, through connectors, and leaves your systems and your permissions exactly as they are. And because every query, source, and action is logged, IT can show exactly what the AI saw and did.

Leadership

Leadership finally gets the whole company's knowledge made usable. The contracts, manuals, policies, reports, and project records the organisation has spent years and budget creating stop being dead weight in a file store and become something the business can actually draw on, in seconds, at any scale. It reduces the dependence on a handful of people who "know where everything is," and it preserves institutional memory so it doesn't walk out the door when a key person leaves. And because artiGen is the platform every other Attentions agent is built on, standing it up is also the groundwork for everything that comes next.

New joiners

New joiners become productive in days rather than months. Instead of spending their first weeks learning the unwritten map of where documents live and who to ask, they can ask the corpus directly and get a sourced answer immediately. That ramps them faster, and it does so without interrupting your experts, who stop losing their own time to the same onboarding questions, over and over.

Core business value

artiGen transforms disconnected contracts, manuals, policies, reports, project records, and the systems they're scattered across into a central, private, ask-anything knowledge platform. It helps organisations: find answers in seconds instead of hours; trust those answers because each one cites its exact source; unlock knowledge that public AI tools can't be allowed to touch; ramp new staff in days rather than months; stop overloading experts with repeated questions; preserve institutional memory so it survives staff turnover; keep every document, index, and answer inside their own walls; and lay the foundation that every other Attentions agent is built on.

In simple terms, artiGen lets your whole organisation ask a question of everything it has ever written down, privately, instantly, and with the source attached.

The return (illustrative)

The return on artiGen stacks the same four sources we model with every customer:

Hours returned

A question that took a person 20-40 minutes of hunting (illustrative), or a Slack message to an expert who then loses 15 minutes of their own time (illustrative), becomes a matter of seconds. Across a team that asks dozens of these a day, that is hours back every week.

Error cost avoided

Decisions made on the right clause, the current policy, or the actual spec, instead of a half-remembered or out-of-date one, avoid the downstream cost of acting on the wrong document.

Speed

New joiners become productive in days rather than months (illustrative), because they can ask the corpus directly instead of waiting on a colleague.

Scale without headcount

The corpus can double, and the number of people asking can double, without your experts' answering load doubling with it, because the agent's capacity isn't tied to hiring.
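The "hours returned" source above can be turned into a back-of-envelope calculation. The sketch below uses the paper's illustrative 20-40 minutes per question; the other inputs (two dozen questions a day, 30 seconds per answer, a five-day week) are assumptions added here for illustration, and `hours_returned_per_week` is a hypothetical helper, not part of any product.

```python
def hours_returned_per_week(questions_per_day: int, minutes_before: float,
                            seconds_after: float, workdays: int = 5) -> float:
    """Hours saved per week when each question drops from minutes to seconds."""
    saved_minutes = minutes_before - seconds_after / 60
    return questions_per_day * workdays * saved_minutes / 60


if __name__ == "__main__":
    # Illustrative range from the text (20-40 minutes); other inputs are assumed.
    low = hours_returned_per_week(24, minutes_before=20, seconds_after=30)
    high = hours_returned_per_week(24, minutes_before=40, seconds_after=30)
    print(f"{low:.1f} to {high:.1f} hours per week")
```

With those assumed inputs the range works out to 39 to 79 hours a week across the team; the figure is only as good as the inputs, which is why the assessment measures them on your own data first.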

Figure (illustrative): time to a sourced answer across the corpus. Hunting across systems, today: ~20–40 min. With artiGen: ~seconds.

Why teams adopt it

It works the way people already wish search worked. You ask in plain words and get an answer you can trust, because it shows its source. There's nothing to learn. And because it's private, the security team can say yes. There's no rip-and-replace: it sits on top of the repositories you already have, through connectors, and leaves your systems and your permissions exactly as they are.

Start with an assessment.

We scope the right first workflow on your own data and give you an honest go or no-go before you commit to anything bigger.
