London's public bodies are under mounting pressure to clean up their digital archives after a surge in duplicate image files has begun choking storage systems and delaying access to planning records, historical photographs and council documents. The problem, which has been building for several years, came into sharper focus this week as multiple institutions pushed forward with remediation projects ahead of a government-wide data efficiency review expected later this summer.
The timing matters. Keir Starmer's government has made digitisation of public services a central plank of its agenda, and the Cabinet Office has been pressing local authorities to demonstrate measurable improvements in how they store and retrieve digital assets. For councils already stretched by housing reform obligations and NHS referral backlogs, bloated image libraries represent a cost they can no longer quietly absorb.
Where the Problem Is Concentrated
At the London Metropolitan Archives on Northampton Road in Clerkenwell, staff have been working through a cataloguing project that began in January 2026 to identify and remove redundant scans from its photographic collections. The archive holds more than four million images covering London's history from the medieval period onward, and the digital migration process that accelerated during the pandemic created significant duplication — in some collections, the same photograph exists in three or more versions at different resolutions, under different file names, indexed separately.
Tower Hamlets Council's planning department, which processes applications covering some of the highest-value development land in the country along the Whitechapel Road corridor and around Canary Wharf, has faced similar problems. Officers working on major applications have had to manually cross-reference image files to avoid presenting duplicate evidence in planning committee reports — a process that planning support teams describe as time-consuming and error-prone.
The Museum of London Archaeology, based at Mortimer Wheeler House on Eagle Wharf Road in Hackney, has also been grappling with the issue. Its site recording photography, generated from excavations across Greater London, has accumulated over decades into a digital library where deduplication tools have identified significant redundancy. Resolving it is not simply a matter of deleting files; archival standards require provenance checks before any image is removed from the record.
What the Data Shows
A 2025 report from Jisc, the UK higher education and research technology body, found that digital duplication across public sector archives in England costs an estimated £47 million annually in unnecessary storage and administration. While that figure covers all file types, image data accounts for the largest single share of storage overhead. Cloud storage costs for public bodies in London — where procurement is typically handled through frameworks like the Crown Commercial Service's G-Cloud — have risen sharply since 2022 as image resolution standards have increased.
Automated deduplication software has improved substantially. Tools using perceptual hashing, a technique that fingerprints an image's visual content so that identical or near-identical pictures can be matched regardless of file name, resolution or metadata, can now process large libraries far faster than manual review. Several London councils piloted these tools in late 2025, with early results suggesting that between 15 and 30 percent of raw storage volume could be reclaimed, though actual deletions lag behind identification because of sign-off requirements.
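For readers curious how perceptual hashing works in principle, the sketch below shows one of the simplest variants, often called an average hash. It is an illustration only: nothing here is confirmed as the method used by any council or archive, and the function names, the 8x8 grid and the distance threshold of 5 are all assumptions chosen for the example. Images are represented as plain grids of grayscale values so the code runs without any imaging library.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> List[List[float]]:
    """Downscale to size x size by averaging the pixels in each block."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for r in range(size):
        r0 = r * height // size
        r1 = max((r + 1) * height // size, r0 + 1)  # never an empty block
        row = []
        for c in range(size):
            c0 = c * width // size
            c1 = max((c + 1) * width // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Flag a pair for human review; the threshold is an illustrative guess."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Synthetic test images (hypothetical data, not real archive scans).
original = [[(x + y) * 2 for x in range(64)] for y in range(64)]
half_res = [[(x + y) * 4 + 2 for x in range(32)] for y in range(32)]  # same picture at half resolution
inverted = [[252 - p for p in row] for row in original]               # a visibly different picture

print(likely_duplicates(original, half_res))  # True
print(likely_duplicates(original, inverted))  # False
```

Because the fingerprint is built from relative brightness rather than raw bytes, the half-resolution copy matches the original even though the two files share no file name, size or metadata. A match of this kind only flags candidates; the provenance checks and sign-off that archives require would still sit with staff.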
The practical stakes extend beyond cost. Planning applications in London are subject to public inspection rights under the Town and Country Planning Act 1990, and duplicated or mislabelled images in publicly accessible portals have caused confusion for residents trying to scrutinise proposals in their neighbourhoods.
What happens next depends largely on whether the government's forthcoming data efficiency review sets binding targets for local authorities or simply issues guidance. Councils waiting on clarity have paused some deletion work to avoid having to redo it under new standards. For institutions like the London Metropolitan Archives, the more pressing deadline is internal: a target to complete the first phase of its deduplication project by September 2026 before a new batch of physical collection digitisation begins. If the backlog is not cleared by then, the problem compounds.