London's public sector is sitting on a digital clutter problem that has now been quantified. Across borough council planning portals, NHS trust document systems, and Transport for London's infrastructure databases, duplicate image files (the same photograph or scan stored two, three, sometimes a dozen times) account for an estimated 18 to 23 percent of total file storage, according to figures compiled by the Local Government Association in its 2025 digital efficiency review. That is not a rounding error. At scale, across 33 London boroughs, it translates to tens of thousands of gigabytes of redundant data that taxpayers are paying to store, back up, and maintain.
The timing matters. Keir Starmer's government has made planning reform a centrepiece of its agenda, with new digital requirements for local planning authorities coming into force under the Levelling Up and Regeneration Act's secondary legislation. Councils are under pressure to digitise records faster, open up data to third parties, and run public-facing portals that actually function at speed. Duplicate images sitting undetected inside those systems directly undermine all three goals, inflating costs at exactly the moment councils are being told to find savings elsewhere.
Where the Problem Is Concentrated
Tower Hamlets Council's planning portal, which handles some of the highest application volumes in London due to ongoing development along the Whitechapel Road and Canary Wharf corridors, processes thousands of document submissions a month. Planning officers and digital teams working across east London have flagged for years that applicant-uploaded image files — site photographs, elevation drawings, heritage images — frequently appear multiple times within a single application, uploaded in error or re-submitted after portal timeouts. The Greater London Authority's Planning London Datahub, launched in 2020 to centralise data from all 33 boroughs, has similarly had to build deduplication logic into its ingestion pipeline precisely because source data arrives with significant image duplication already baked in.
Brent Council, which is managing a substantial regeneration programme around Wembley Park, reported in its 2024-25 annual digital transformation update that storage costs for planning-related documents had risen by 31 percent in two years — partly attributed to increases in application volume, but also to unmanaged file duplication. The council subsequently piloted an automated deduplication tool developed in partnership with Socitm, the professional body for public sector technology leaders, across a subset of its archives. Results from that pilot, presented at a Socitm conference in Birmingham in March 2025, showed a 14 percent reduction in storage requirements within the scanned document archive after a single deduplication pass.
The Cost in Hard Numbers
Storage is not free. Public sector cloud storage contracts, typically negotiated through Crown Commercial Service frameworks, vary in price by volume and provider, but independent analysis published by Public Technology magazine in February 2026 put average annual per-terabyte costs for mid-size London councils at between £180 and £340. Take the conservative end of the LGA's estimate, an 18 percent duplication rate, and apply it to a borough holding, say, 200 terabytes of planning and administrative image data: that is 36 terabytes of redundant files, or roughly £6,480 to £12,240 wasted annually per council, before accounting for backup, disaster recovery, and staff time spent managing bloated systems.
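For readers who want to rerun that estimate with their own figures, the arithmetic can be sketched in a few lines of Python. The function name and parameters are illustrative only, and the 200-terabyte holding is the hypothetical borough used above, not a measured value.

```python
def duplicate_storage_cost(total_tb, duplication_percent, cost_per_tb):
    """Annual cost, in pounds, of storing the duplicated share of an archive."""
    redundant_tb = total_tb * duplication_percent / 100
    return redundant_tb * cost_per_tb


# Figures from the article: 200 TB held, 18 percent duplicated,
# £180 to £340 per terabyte per year.
low = duplicate_storage_cost(200, 18, 180)
high = duplicate_storage_cost(200, 18, 340)
print(f"£{low:,.0f} to £{high:,.0f} per year")  # £6,480 to £12,240 per year
```

Swapping in 23 for the duplication percentage gives the equivalent range at the upper end of the LGA's estimate.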
NHS trusts face a parallel version of the problem. Barts Health NHS Trust, which runs the Royal London Hospital in Whitechapel and several other major east London sites, operates medical imaging archives that are subject to strict retention rules under NHS England's Records Management Code of Practice. Duplicate non-clinical images — scanned consent forms, administrative photographs, equipment documentation — fall outside the clinical PACS systems that already have deduplication built in, and sit instead in general document management systems where redundancy can accumulate unchecked over years.
The practical path forward is now fairly well established, even if the political will to follow it varies. Councils and NHS bodies that have invested in hash-based deduplication tools, software that computes a digital fingerprint from each file's contents and flags files whose fingerprints match, have seen measurable results quickly. The Socitm pilot at Brent is the best-documented London example, but Southwark Council has also been quietly running a file audit programme across its housing and planning archives since January 2025. For boroughs yet to act, the LGA's digital efficiency guidance recommends starting with a read-only audit pass before any deletion, building in legal hold checks against the Records Management Code, and scheduling quarterly automated sweeps rather than treating deduplication as a one-off exercise.
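What a read-only audit pass looks like in practice can be illustrated with a minimal Python sketch. It is not the tool used at Brent or Southwark, whose internals have not been published. The choice of SHA-256 as the fingerprint and the function names below are assumptions made for illustration. The script only reads and reports, it never deletes, and it does not attempt the legal hold checks the guidance calls for.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash, keeping groups of two or more.

    Read-only: files are opened for reading and never moved or deleted.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}


def redundant_bytes(duplicates: dict[str, list[Path]]) -> int:
    """Bytes occupied by every copy beyond the first in each duplicate group."""
    return sum(
        path.stat().st_size
        for paths in duplicates.values()
        for path in paths[1:]
    )
```

Calling `find_duplicates(Path("/some/archive"))` on a hypothetical archive folder returns the groups of identical files, and `redundant_bytes` totals the space the extra copies occupy. Exact hashing of this kind catches only byte-for-byte copies: a photograph that has been resized or re-saved at a different quality will produce a different fingerprint and go unflagged.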