London's public sector bodies are collectively estimated to hold tens of millions of duplicate digital images on their servers: photographs, scanned documents and graphic assets that drain storage budgets, bog down IT infrastructure and, in some cases, compromise the integrity of public records. The problem is not new, but its scale, only now being quantified through a cluster of Freedom of Information requests and internal audits, has struck administrators across City Hall, NHS trusts and borough councils alike.
The issue has come into sharper focus in 2026 because of two converging pressures: the Starmer government's push to digitise public services under the Data (Use and Access) Act 2025, which received Royal Assent in June 2025, and a wave of cloud migration projects that are forcing organisations to confront exactly what they have been storing. When you move data to the cloud, you pay per gigabyte. Suddenly, redundancy has a price tag.
The Scale Across London's Institutions
Tower Hamlets Council, which manages one of the most densely documented planning archives in England, began a deduplication audit in January 2026 after its planning portal migrated to a new cloud system. Internal IT documentation, shared with local campaign group Open Tower Hamlets, showed the council had flagged more than 340,000 duplicate image files across its planning and housing departments alone. At commercial cloud storage rates currently averaging around £0.018 per gigabyte per month on standard tiers, duplicates on that scale carry an avoidable annual cost that runs from hundreds of pounds for moderate-resolution images to thousands for large, high-resolution scans.
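The storage arithmetic behind that estimate can be sketched in a few lines. The file count and the £0.018 rate come from the figures above; the average file sizes of 5 MB and 50 MB are illustrative assumptions, since the audit figures quoted here do not state one, and the function name is hypothetical.

```python
def annual_duplicate_cost(file_count, avg_size_mb, rate_gbp_per_gb_month=0.018):
    """Estimate the yearly storage cost, in pounds, of holding duplicate files.

    avg_size_mb is an assumed average file size; the article does not give one.
    Uses decimal units (1 GB = 1,000 MB), as cloud providers typically bill.
    """
    total_gb = file_count * avg_size_mb / 1000
    return total_gb * rate_gbp_per_gb_month * 12


# 340,000 duplicates at two assumed average sizes
for size_mb in (5, 50):
    cost = annual_duplicate_cost(340_000, size_mb)
    print(f"{size_mb} MB average: about £{cost:,.0f} per year")
```

On these assumptions the same 340,000 files cost roughly £370 a year at 5 MB each and roughly £3,700 a year at 50 MB each, which is why average file size matters as much as file count when sizing the problem.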
At King's College Hospital NHS Foundation Trust on Denmark Hill in Southwark, a digital transformation programme launched in autumn 2025 identified radiology imaging as a particular pressure point. Medical imaging files are among the largest digital assets any organisation holds. When Picture Archiving and Communication Systems (PACS) migrate between platforms, duplicate studies are routinely generated. The NHS England digital standards team has previously noted, in its 2024 Data Quality Framework, that duplicate patient records and associated imaging are among the top three data integrity risks across acute trusts. King's, like several other major London trusts, is working through a structured remediation programme, though it has not publicly committed to a completion date.
The Wellcome Collection on Euston Road digitised roughly 100,000 archival photographs between 2018 and 2023 as part of its open access initiative. A post-project review, referenced in its 2024-25 annual report, found that automated scanning workflows had generated duplicate derivatives — multiple resolution versions of the same image filed without consistent naming conventions — across approximately 12 percent of its digitised holdings. That translates to around 12,000 files requiring manual or algorithmic review.
Why Deduplication Is Harder Than It Sounds
Identifying and removing duplicate images is not simply a matter of deleting obvious copies. Many duplicates exist as near-matches: the same photograph cropped differently, rescanned at a different resolution, or saved under a variant filename after a system migration. Standard hash-based deduplication tools — which match files by generating a unique fingerprint of their binary content — catch exact copies but miss these near-duplicates entirely.
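A minimal sketch of the hash-based approach described above, using SHA-256 from Python's standard library. The function names and the directory layout are hypothetical, and real deduplication tools may use a different hash or add size pre-filtering.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file's binary content in chunks so large images fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under root by content hash; return groups with 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Because the fingerprint is computed from raw bytes, a file that differs by even one byte, such as a re-saved or re-scanned copy, lands in a different group. That is the limitation the near-duplicate problem exposes.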
Specialist perceptual hashing tools, which compare images by visual similarity rather than byte-by-byte, are more effective but computationally more expensive, and commercial products require licensing. For a borough council operating under a frozen IT budget, the investment is difficult to justify without a clear policy mandate from above.
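To illustrate the idea, here is a toy version of one simple perceptual hash, the "average hash". It is not the method of any particular product. It assumes each image has already been decoded into a grid of grayscale values from 0 to 255; decoding real image files would need an imaging library, which is left out here. All function names are hypothetical.

```python
def downscale(pixels, size=8):
    """Shrink a grayscale grid to size x size by averaging blocks of pixels.

    Assumes the input is at least size pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    grid = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(pixels, size=8):
    """Return size*size bits: 1 where a cell is brighter than the mean."""
    flat = [value for row in downscale(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count differing bits; a small distance suggests a near-duplicate."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must be the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))
```

Two scans of the same photograph at different resolutions should produce hashes that differ in few or no bits, while unrelated images should differ in many. The threshold that separates "near-duplicate" from "different" is a tuning decision that would need testing against each archive's own material.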
The Greater London Authority has not yet published a unified standard for image deduplication across its family of organisations, though GLA Digital confirmed in its March 2026 Digital Infrastructure Update that a data governance review covering storage efficiency is underway. No timeline for binding guidance has been announced.
For organisations wrestling with this now, the practical advice from the sector is to front-load deduplication work before any cloud migration rather than after — because post-migration, you are already paying for the redundant storage while simultaneously funding the audit to remove it. Borough IT teams have been advised to consult the Local Government Association's Digital Data Standards hub, which published updated image management guidance in February 2026, as a starting reference point before procuring any dedicated tooling.