Before you connect an AI tool to your business, ask yourself a simpler question: if a new employee needed to understand how your business operates, where would they find that information? In most SMEs, the honest answer is: in about fifteen different places, across four platforms, half of it outdated, a quarter of it in someone's inbox.

That answer matters for AI for the same reason it matters for onboarding. AI works on the information you give it. If that information is scattered, inconsistently formatted, and riddled with outdated versions sitting alongside current ones, every AI output inherits that chaos. Garbage in, garbage out is not a cliché — it is a precise description of what happens.

Document centralisation is the infrastructure work that makes AI useful rather than unreliable. It is also, almost universally, the work that implementation guides skip over in favour of getting to the interesting part faster.

What "centralised" actually means

Centralisation does not mean having everything in one folder on SharePoint. It means having a single, authoritative location for each type of document, with a consistent structure, a clear naming convention, and a known owner responsible for keeping it current.

The distinction matters because many businesses believe they have centralised their documents when in fact they have replicated their chaos on a different platform. Moving scattered files from email threads into a shared drive is not centralisation; it is migration. The structural problems travel with the files.

Genuine centralisation requires three things working together: a location, a structure, and a discipline. The location is the easy part. The structure and the discipline are where most implementations fail.

The 20-document principle

Every business, regardless of size or sector, has a set of documents that it actually runs on. These are not all the documents you have — they are the ones that get referenced when decisions are made, when client work is produced, when new staff are onboarded, or when something goes wrong and someone needs to find out how it should have been handled.

In our experience working with UK SMEs, this set typically contains between 15 and 25 documents. We call this the 20-document core. Identifying it is the starting point for any meaningful AI readiness work, because these are the documents that AI will interact with most frequently and where the cost of inaccuracy is highest.

Operations

  • Client onboarding process
  • Service delivery SOPs
  • Quality review checklist
  • Supplier agreement template
  • Incident response procedure

Governance

  • AI Acceptable Use Policy
  • Data classification framework
  • GDPR/data processing record
  • Staff handbook
  • Business continuity plan

Commercial

  • Master service agreement
  • Standard proposal template
  • Pricing and rate card
  • Client contract terms
  • NDA template

Finance and People

  • Budget template and actuals
  • Payroll and benefits summary
  • Job description templates
  • Performance review framework
  • Expense policy

The exercise of identifying your 20-document core is itself valuable. Businesses that have never done it typically discover two things: there are critical documents that do not yet exist in written form, and there are documents they assumed were current that have not been touched in three years.

The naming convention problem

A naming convention sounds like a minor administrative detail. It is not. It is the mechanism by which a human — or an AI — can reliably locate, identify, and trust the currency of a document without opening it.

A document named Final_Client_Proposal_v2_EDIT_USE THIS ONE.docx is structurally useless to an AI tool. It contains no information about when it was created, who owns it, whether it is current, or what version hierarchy it sits within. An AI working from a library of files named in this way will either guess or hallucinate the hierarchy, neither of which is acceptable.

A workable naming convention for an SME has four components: document type, subject or client name, version number, and date. A proposal becomes PROPOSAL_ClientName_v1_2025-03.docx. A policy becomes POLICY_AISecurity_v2_2024-11.docx. The format is less important than the consistency — whatever you choose, it must be applied to every document without exception.
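
To make the four-component convention concrete, here is a minimal Python sketch that checks whether a filename follows the TYPE_Subject_vN_YYYY-MM pattern used in the examples above. The regular expression, the DocName type, and the parse_name function are illustrative assumptions, not a standard; adapt them to whatever convention you settle on.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: TYPE_Subject_vN_YYYY-MM.ext (hypothetical; adjust to your convention).
NAME_PATTERN = re.compile(
    r"^(?P<doc_type>[A-Z]+)"              # document type, e.g. PROPOSAL
    r"_(?P<subject>[A-Za-z0-9]+)"         # subject or client name
    r"_v(?P<version>\d+)"                 # version number
    r"_(?P<date>\d{4}-(?:0[1-9]|1[0-2]))" # year and month
    r"\.(?P<ext>[A-Za-z0-9]+)$"           # file extension
)


class DocName(NamedTuple):
    doc_type: str
    subject: str
    version: int
    date: str
    ext: str


def parse_name(filename: str) -> Optional[DocName]:
    """Return the four components if the name follows the convention, else None."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return DocName(
        match["doc_type"],
        match["subject"],
        int(match["version"]),
        match["date"],
        match["ext"],
    )


print(parse_name("PROPOSAL_ClientName_v1_2025-03.docx"))
print(parse_name("Final_Client_Proposal_v2_EDIT_USE THIS ONE.docx"))  # None
```

Run over a shared drive listing, a check like this gives a quick list of every file that does not yet follow the convention.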

The version trap: The most common failure in document management is maintaining multiple versions of the same document without a clear mechanism for identifying the current one. The live version should always be a single file in a single location. Archived versions should be in a clearly labelled archive folder — not sitting alongside the current version with a slightly different filename.
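
The version trap can be detected mechanically once filenames are consistent. The sketch below assumes the same TYPE_Subject_vN_YYYY-MM naming pattern as the examples above and takes a plain list of filenames from a single live folder; the function name find_version_clashes and the sample filenames are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming pattern: TYPE_Subject_vN_YYYY-MM.ext (hypothetical).
PATTERN = re.compile(r"^([A-Z]+)_([A-Za-z0-9]+)_v(\d+)_(\d{4}-\d{2})\.\w+$")


def find_version_clashes(filenames):
    """Group files by (type, subject) and return groups with more than one file.

    Each returned group is a document that has several versions sitting
    side by side in the live folder instead of one live file plus an archive.
    """
    groups = defaultdict(list)
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            groups[(match.group(1), match.group(2))].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


live_folder = [
    "PROPOSAL_Acme_v1_2025-01.docx",
    "PROPOSAL_Acme_v2_2025-03.docx",
    "POLICY_AISecurity_v2_2024-11.docx",
]
print(find_version_clashes(live_folder))
```

Any group the function returns is a candidate for clean-up: keep the current file where it is and move the others into the archive folder.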

Data classification — the step most businesses skip entirely

Once your documents are centralised and consistently named, the next layer is classification. Not every document in your business should be accessible to every AI tool you use, and the mechanism for making that distinction is a data classification framework.

For most SMEs, three tiers are sufficient: Public, Internal, and Restricted.

Classification needs to be applied at the document level, not the folder level. A folder called "Clients" may contain both internal reference documents and restricted client data — the distinction needs to be at the file, not the container.

Once classified, the rule is simple: Restricted documents do not enter AI tools. Internal documents may enter approved AI tools with appropriate enterprise settings. Public documents can be used freely. The classification does the work of the judgement call, which means staff do not have to make that call individually every time.
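
The rule is simple enough to write down as a small lookup, which is one way to see why classification removes the individual judgement call. The following Python sketch is illustrative only: the Tier enum, the function names, and the tool_is_approved flag (standing in for "an approved AI tool with appropriate enterprise settings") are assumptions, as is the choice to treat an unclassified file as Restricted.

```python
from enum import Enum
from typing import Dict


class Tier(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"


def may_enter_ai_tool(tier: Tier, tool_is_approved: bool) -> bool:
    """Apply the three-tier rule to a single document."""
    if tier is Tier.RESTRICTED:
        return False             # Restricted documents never enter AI tools
    if tier is Tier.INTERNAL:
        return tool_is_approved  # Internal documents need an approved tool
    return True                  # Public documents can be used freely


def check_file(filename: str, register: Dict[str, Tier], tool_is_approved: bool) -> bool:
    """Look up classification per file, not per folder.

    Assumption: a file missing from the register is treated as Restricted.
    """
    tier = register.get(filename, Tier.RESTRICTED)
    return may_enter_ai_tool(tier, tool_is_approved)


# Hypothetical register with one entry per document.
register = {
    "POLICY_AISecurity_v2_2024-11.docx": Tier.INTERNAL,
    "PROPOSAL_ClientName_v1_2025-03.docx": Tier.RESTRICTED,
}
print(check_file("POLICY_AISecurity_v2_2024-11.docx", register, tool_is_approved=True))
print(check_file("PROPOSAL_ClientName_v1_2025-03.docx", register, tool_is_approved=True))
```

Keying the register by filename rather than by folder mirrors the point above: two files in the same "Clients" folder can carry different tiers.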

The ownership question

Every document in your 20-document core needs a named owner. Not a team, not a department — a specific person who is responsible for ensuring it is current, accessible, and correctly classified.

Ownership without accountability is decoration. The owner needs to know they are the owner, understand what that means in practice, and have a review schedule — typically quarterly for operational documents and annually for governance documents — built into their working calendar.
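
A review schedule only works if someone can see when a document falls due. As a rough sketch, the Python below computes the next review date from the last review, using the quarterly and annual intervals mentioned above. The category labels "operational" and "governance" and the function names are assumptions made for illustration.

```python
import calendar
from datetime import date

# Review intervals in months: quarterly for operational, annually for governance.
REVIEW_MONTHS = {"operational": 3, "governance": 12}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_review(last_reviewed: date, kind: str) -> date:
    """Return the date by which the owner should next review the document."""
    return add_months(last_reviewed, REVIEW_MONTHS[kind])


def is_overdue(last_reviewed: date, kind: str, today: date) -> bool:
    """True if the next review date has already passed."""
    return today > next_review(last_reviewed, kind)


print(next_review(date(2024, 11, 30), "operational"))  # 2025-02-28
print(next_review(date(2024, 11, 15), "governance"))   # 2025-11-15
```

The resulting dates can be dropped into each owner's calendar as recurring reminders, which is what turns nominal ownership into accountability.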

A document with no owner is a document that will become outdated, be stored in multiple versions simultaneously, and eventually be avoided by staff who have learned they cannot trust it. By the time AI comes to interact with it, it will be structural noise rather than structural signal.

Why this is the work that AI makes essential, not optional

Businesses have operated with fragmented document infrastructure for decades and survived. The costs were real — slower onboarding, knowledge loss when staff left, inconsistent client work — but they were diffuse and hard to attribute directly.

AI changes the calculus in one specific way: it makes the quality of your document infrastructure directly visible in your outputs. When AI works from clean, centralised, consistently named, correctly classified documents, it produces reliable, usable results. When it works from the alternative, the failures are immediate and concrete — wrong information in a client document, outdated process applied to a current situation, restricted data surfaced inappropriately.

The businesses that will get the most from AI over the next five years are not necessarily the ones with the most sophisticated tools. They are the ones who did the unglamorous infrastructure work first. Document centralisation is the first item on that list.

It takes two to four weeks to do properly. It requires no technology beyond a shared drive you probably already have. And it is the prerequisite that every AI implementation guide assumes is already in place.

It almost never is.