Editorial must own AI implementation with engineering support

By Dr. Dietmar Schantin

IFMS Media Ltd

London, United Kingdom


News organisations are building AI systems as technical projects with editorial consultation. Perhaps they have it backwards.

The implementations that succeed will treat AI as an editorial capability requiring engineering support, not the reverse. This isn’t a radical position. Across industries, leading organisations have recognised AI transformation demands deep collaboration across functions. BlackRock’s AI governance emphasises tight collaboration between technical and talent leadership. Moderna created a chief people and digital technology officer role, explicitly combining HR and information technology (IT) leadership to accelerate AI integration. These companies understand AI touches every aspect of how work gets done, and no single function can own it alone.

Though AI is a tech tool, how media organisations use and implement it should be led by editorial teams.

News organisations face an even stronger case for editorial ownership. In most business functions, AI operates as a knowledge tool, drawing on its training data to generate responses, draft communications, or analyse information. In editorial contexts, this approach is dangerous. When a large language model (LLM) draws from its training data to create or modify journalistic content, the risks of hallucination, factual error, and inadvertent plagiarism become unacceptable. Editorial applications require AI to function as a language tool, processing and transforming content journalists provide rather than generating knowledge from its own training.

This distinction sounds simple. Implementing it is not. And the judgment required at every stage is editorial judgment.

Current approach and its limits

Retrieval-augmented generation (RAG) represents the current best practice for constraining AI to work only with verified content. Rather than letting a model access its training data, RAG systems provide specific documents — typically the archive, verified sources, and other journalistically vetted material — for the AI to draw upon.

The results have often been disappointing. Experiments where users can “talk to the archive” produce mixed outcomes at best. Publishers in Europe and the United States report that users try these systems once and don’t return. The question is whether this reflects fundamental limitations of conversational interfaces with news content, or whether the implementations themselves are inadequate.

Evidence suggests the latter. Most RAG implementations still follow a naive pattern: Content feeds into a vector database, semantic search retrieves relevant chunks, and an LLM assembles them into responses. This architecture fails because it treats AI implementation as a purely technical problem. Three areas determine success: input, retrieval, and output. Each requires editorial judgment.
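The naive pattern can be sketched in a few lines. This is a toy illustration, not a production pipeline: the bag-of-words "embedding," the fixed-size chunking, and the sample archive are all stand-ins for the neural embedding models and vector databases real systems use.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; real systems use neural embedding models.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(article, size=2):
    # Split into fixed-size sentence windows, as naive pipelines do.
    sentences = [s.strip() for s in article.split(".") if s.strip()]
    return [". ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]

def retrieve(query, chunks, k=2):
    # Rank chunks purely by surface similarity to the query.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

archive = [
    "The minister announced the policy on 2024-03-01. Implementation is expected by June.",
    "Apple announced a privacy feature in 2024. The feature limits app tracking.",
]
chunks = [c for article in archive for c in chunk(article)]
top = retrieve("What did Apple announce about privacy?", chunks)
# The top chunk is then handed to an LLM to synthesise an answer; nothing in
# this pipeline asks editorial questions about intent, context, or currency.
```

Every step here is a purely technical operation on text; the editorial questions that determine whether the answer is any good appear nowhere in the pipeline.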

The input problem

Software follows a simple rule: Poor input produces poor output. But with traditional systems, errors can be traced and reproduced. Large language models are stochastic. When something goes wrong, diagnosing the cause within the processing chain becomes extremely difficult. This makes input quality paramount.

Articles written for human readers are not optimised for machine processing. Consider how journalists naturally write: “The minister announced the policy yesterday. She expects implementation by March.” For a human reader, context makes the referent clear. For a RAG system that chunks content into segments, the second sentence may end up separated from the first, and suddenly the system has no idea who “she” refers to.
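The severed-referent problem is easy to reproduce. A minimal sketch, using one-sentence-per-chunk splitting on the article's own example:

```python
article = "The minister announced the policy yesterday. She expects implementation by March."

# Naive chunking: one sentence per chunk.
chunks = [s.strip() + "." for s in article.split(".") if s.strip()]

# chunks[1] is "She expects implementation by March." --
# the chunk carrying the key fact no longer names its subject,
# so no retrieval step can connect "She" back to the minister.
```

Whatever window size a real chunker uses, some sentence boundary will eventually separate a pronoun from its referent; only rewriting the source text prevents it.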

Semantic search compounds this problem. It finds content with similar patterns, not necessarily similar meaning. A user searching for “Minister Schmidt” won’t surface chunks containing only “she,” even if they contain the most relevant information about the minister’s actions.

Then there’s metadata, which is consistently undervalued in every news organisation I’ve encountered. Without robust metadata, systems cannot filter or contextualise effectively. Semantic similarity alone cannot distinguish between a 2019 article about “the current president” and a 2024 article using the same phrase. Metadata provides the context that makes intelligent retrieval possible.
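The “current president” problem disappears the moment retrieval can filter on a publication date. A hedged sketch, with invented documents and a deliberately simple substring match standing in for semantic search:

```python
archive = [
    {"text": "The current president signed the trade bill.", "published": 2019},
    {"text": "The current president vetoed the trade bill.", "published": 2024},
]

def retrieve(phrase, docs, after=None):
    # Text matching alone cannot tell these two articles apart;
    # the publication-date metadata can.
    hits = [d for d in docs if phrase.lower() in d["text"].lower()]
    if after is not None:
        hits = [d for d in hits if d["published"] >= after]
    return hits

recent = retrieve("current president", archive, after=2020)
# Only the 2024 article survives the metadata filter.
```

The same structure extends to topic, user need, and entity metadata; without those fields, every filter in this sketch is impossible.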

The retrieval problem

When a user asks “What did Apple announce about privacy?” the query is fundamentally ambiguous. Which announcement? Last week’s feature update, last year’s policy change, or the keynote from three years ago that established their current positioning? The user may have a specific event in mind or want a comprehensive overview.

Current RAG systems don’t ask. They assume. They retrieve semantically similar chunks and synthesise them, potentially conflating announcements from different eras into a response that misrepresents the company’s evolving position.

The best approach when intent is unclear is to ask. But determining what constitutes a clear question versus an ambiguous one requires linguistic expertise. What synonyms might users employ? What different phrasings express the same intent? These are questions language professionals are trained to answer.
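One way to implement “ask instead of assume” is to check whether the retrieved candidates span more than one distinct event before synthesising. This is a sketch under assumptions: the `event` labels would come from editorial metadata, and the threshold for “ambiguous” is a judgment call, not a constant.

```python
def disambiguate(query, candidates):
    # If the retrieved material spans several distinct events,
    # return a clarifying question instead of an answer.
    events = {c["event"] for c in candidates}
    if len(events) > 1:
        options = ", ".join(sorted(events))
        return f"Which announcement do you mean: {options}?"
    return None  # intent is clear enough to answer directly

candidates = [
    {"event": "2022 keynote", "text": "..."},
    {"event": "2024 feature update", "text": "..."},
]
question = disambiguate("What did Apple announce about privacy?", candidates)
```

Deciding which metadata field defines a “distinct event,” and how the clarifying question should be phrased, is exactly the linguistic judgment the surrounding text describes.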

Retrieval also involves choosing appropriate search methods. Semantic search alone is often insufficient. Some queries need full-text search; others benefit from knowledge graphs mapping relationships between entities. Knowing which approach serves which query type requires understanding both the content and how users seek information about it.
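The routing decision can be made concrete. In this sketch, quoted queries signal exact-match intent and go to full-text search, while everything else falls back to a word-overlap stand-in for semantic search; a real router would use far richer signals, and both search functions are simplified assumptions.

```python
def full_text(query, docs):
    # Exact-phrase lookup: right for names, quotes, and identifiers.
    return [d for d in docs if query.lower() in d.lower()]

def semantic(query, docs):
    # Word-overlap stand-in for embedding similarity: right for topical queries.
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)

def route(query, docs):
    # Quoted queries signal exact-match intent; otherwise search by meaning.
    if query.startswith('"') and query.endswith('"'):
        return full_text(query.strip('"'), docs)
    return semantic(query, docs)

docs = ["Minister Schmidt opened the session.", "The budget debate continued all day."]
exact = route('"Minister Schmidt"', docs)
topical = route("budget debate session", docs)
```

Which query shapes deserve which route is not a technical question: it depends on knowing how readers actually phrase requests about this content.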

The output problem

Even with sound input and retrieval, the final synthesis step presents risks. The model may inadvertently editorialise, introducing framing or emphasis absent from the source material. It may blend facts from different time periods, creating technically accurate but contextually misleading responses. It may adopt a tone inconsistent with editorial standards.

Identifying these problems requires precisely the skills journalists possess: sensitivity to language, understanding of how framing shapes meaning, and judgment about what constitutes appropriate editorial voice.

Four editorial responsibilities

If AI implementation requires editorial judgment at every stage, then editorial must be embedded in the development process, not consulted after the fact. This translates to four specific responsibilities.

Content optimisation

Editorial teams must own the process of making content machine-readable while preserving accuracy. This means developing standards for language model optimisation, such as eliminating ambiguous references, using explicit dates rather than “yesterday,” and ensuring entity names appear consistently.

This is translation work, converting journalism written for humans into formats serving both human and machine readers. AI can assist with this task, but editorial must verify the results.
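Two of those standards, explicit dates and consistent entity names, can be expressed as mechanical rewrite rules. A minimal sketch: the function name, the alias table, and the sample sentence are all hypothetical, and in practice editorial would verify every substitution.

```python
from datetime import date, timedelta

def optimise_for_machines(text, published, entities):
    # Replace relative dates with absolute ones so chunks stay self-contained.
    text = text.replace("yesterday", (published - timedelta(days=1)).isoformat())
    # Normalise entity mentions to one canonical form (alias list is editorial).
    for canonical, aliases in entities.items():
        for alias in aliases:
            text = text.replace(alias, canonical)
    return text

raw = "The minister announced the policy yesterday."
clean = optimise_for_machines(
    raw,
    date(2024, 3, 2),
    {"Minister Schmidt": ["The minister", "the minister"]},
)
# clean: "Minister Schmidt announced the policy 2024-03-01."
```

The mechanics are trivial; deciding which referent is ambiguous and which name is canonical is the editorial work the rules encode.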

Metadata governance

Metadata is as important as content itself, and editorial must treat it accordingly. This goes beyond basic tagging to include structured information about user needs addressed, topics covered, temporal context, and relationships to other content.

Systems can assist with metadata generation and management, but responsibility for accuracy belongs with editorial.

Development participation

Editorial staff must be embedded in AI development teams. They bring essential expertise including understanding how questions can be phrased differently while seeking the same information, anticipating user intent, and ensuring systems reflect journalistic judgment about what matters and why.

Aftonbladet in Sweden demonstrated this approach with its EU Buddy project, where editorial teams created comprehensive question-and-answer datasets that dramatically improved response quality.

Testing and feedback

Reinforcement learning from human feedback requires humans who understand what constitutes an adequate response in journalistic terms: Is this answer accurate? Appropriately nuanced? Consistent with editorial standards?

Engineers cannot make these judgments, not because they lack intelligence, but because this isn’t their domain of expertise.

The value proposition

General-purpose AI will always be more powerful in certain dimensions than purpose-built editorial AI. The scale of systems like ChatGPT is unmatchable. But scale is not the value proposition of news organisations.

What distinguishes journalistic AI from general-purpose AI is precisely what distinguishes journalism from the undifferentiated mass of online information: editorial judgment, verified content, and linguistic precision. These qualities cannot be engineered in after the fact. They must be present from the beginning, embedded in how systems are designed, built, and refined.

The organisations that succeed with AI will be those that recognise this technology as inseparable from the editorial function: not a tool that editorial uses, but an extension of editorial judgment itself. That recognition must translate into organisational structure, development processes, and accountability.

AI implementation is not an engineering problem with editorial consultation. It is an editorial problem requiring engineering capability.

About Dr. Dietmar Schantin
