The Daily London

London news, every day

London's Digital Archives Are Riddled With Duplicate Images — And the Numbers Tell a Damning Story

Councils, NHS trusts and cultural institutions across the capital are sitting on millions of redundant files, costing storage budgets and slowing down public services.

By London News Desk · Published 5 July 2026, 4:35 am

Updated 5 July 2026, 10:19 am

How we reported this

This article was generated by AI from the linked public sources. The Daily London is independently owned and covers London news free from advertiser or sponsor influence.

Photo: Olga Lioncat on Pexels

London's public sector bodies collectively hold what is estimated to be tens of millions of duplicate digital images across their servers: repeated photographs, scanned documents and graphic assets that drain storage budgets, bog down IT infrastructure and, in some cases, compromise the integrity of public records. The problem is not new, but its scale, only now being quantified by a cluster of Freedom of Information requests and internal audits, has struck administrators across City Hall, NHS trusts and borough councils alike.

The issue has snapped into sharper focus in 2026 because of two converging pressures: the Starmer government's push to digitise public services under its Data (Use and Access) Act 2025, which received Royal Assent in June 2025, and a wave of cloud migration projects that are forcing organisations to confront exactly what they have been storing. When you move data to the cloud, you pay per gigabyte. Suddenly, redundancy has a price tag.

The Scale Across London's Institutions

Tower Hamlets Council, which manages one of the most densely documented planning archives in England, began a deduplication audit in January 2026 after its planning portal migrated to a new cloud system. Internal IT documentation, shared with local campaign group Open Tower Hamlets, showed the council had flagged more than 340,000 duplicate image files across its planning and housing departments alone. At commercial cloud storage rates currently averaging around £0.018 per gigabyte per month on standard tiers, duplicates on that scale can stack up to hundreds or thousands of pounds in avoidable annual cost, depending on the average file size.
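How quickly that adds up depends on file size, which the council's documentation does not state. The following is a rough sketch of the arithmetic only: the 340,000 file count and the £0.018 rate come from the figures above, while the 5 MB and 20 MB average sizes and the function name are assumptions chosen purely for illustration.

```python
def annual_storage_cost_gbp(file_count, avg_file_mb, rate_gbp_per_gb_month=0.018):
    """Yearly cost of keeping file_count files of avg_file_mb megabytes each."""
    total_gb = file_count * avg_file_mb / 1000  # decimal gigabytes (1 GB = 1000 MB)
    return total_gb * rate_gbp_per_gb_month * 12


if __name__ == "__main__":
    # Hypothetical average file sizes; the real figure is not in the source material.
    for avg_mb in (5, 20):
        cost = annual_storage_cost_gbp(340_000, avg_mb)
        print(f"{avg_mb} MB average: about GBP {cost:,.0f} a year")
```

Under these assumptions the bill runs from a few hundred pounds a year at 5 MB per file to nearly £1,500 at 20 MB, before counting backups, replicas or retrieval charges.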

At King's College Hospital NHS Foundation Trust on Denmark Hill in Southwark, a digital transformation programme launched in autumn 2025 identified radiology imaging as a particular pressure point. Medical imaging files are among the largest digital assets any organisation holds. When PACS — Picture Archiving and Communication Systems — migrate between platforms, duplicate studies are routinely generated. The NHS England digital standards team has previously noted, in its 2024 Data Quality Framework, that duplicate patient records and associated imaging are among the top three data integrity risks across acute trusts. King's, like several other major London trusts, is working through a structured remediation programme, though no completion date has been publicly committed to.

The Wellcome Collection on Euston Road digitised roughly 100,000 archival photographs between 2018 and 2023 as part of its open access initiative. A post-project review, referenced in its 2024-25 annual report, found that automated scanning workflows had generated duplicate derivatives — multiple resolution versions of the same image filed without consistent naming conventions — across approximately 12 percent of its digitised holdings. That translates to around 12,000 files requiring manual or algorithmic review.

Why Deduplication Is Harder Than It Sounds

Identifying and removing duplicate images is not simply a matter of deleting obvious copies. Many duplicates exist as near-matches: the same photograph cropped differently, rescanned at a different resolution, or saved under a variant filename after a system migration. Standard hash-based deduplication tools — which match files by generating a unique fingerprint of their binary content — catch exact copies but miss these near-duplicates entirely.
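To make the distinction concrete, here is a minimal sketch of exact-copy detection, assuming SHA-256 as the fingerprint and a local folder of files. The function names are illustrative and do not describe any tool used by the organisations named in this article.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Group files under root by content digest; return groups with more than one file."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Because the digest depends on every byte, a copy that has been cropped, resized or re-saved produces a completely different fingerprint and falls outside every group.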

Specialist perceptual hashing tools, which compare images by visual similarity rather than byte by byte, are more effective but computationally heavier at archive scale, and commercial products require licensing. For a borough council operating under a frozen IT budget, the investment is difficult to justify without a clear policy mandate from above.
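One simple member of the perceptual hashing family is the difference hash, sketched below. This is an illustration under stated assumptions, not a description of any product mentioned here: images are represented as plain lists of greyscale rows (values 0 to 255) so the example needs no image library, and the helper names are hypothetical.

```python
def resize_nearest(pixels, width, height):
    """Shrink a greyscale image (list of rows of ints) by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per left/right brightness comparison on a thumbnail."""
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")
```

Two images are then treated as probable duplicates when the Hamming distance between their hashes falls below a chosen threshold. Where to set that threshold is a tuning decision, and borderline matches would still need human review.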

The Greater London Authority has not yet published a unified standard for image deduplication across its family of organisations, though GLA Digital confirmed in its March 2026 Digital Infrastructure Update that a data governance review covering storage efficiency is underway. No timeline for binding guidance has been announced.

For organisations wrestling with this now, the practical advice from the sector is to front-load deduplication work before any cloud migration rather than after — because post-migration, you are already paying for the redundant storage while simultaneously funding the audit to remove it. Borough IT teams have been advised to consult the Local Government Association's Digital Data Standards hub, which published updated image management guidance in February 2026, as a starting reference point before procuring any dedicated tooling.

About this article

Published by The Daily London

Covering news in London. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
