London's public sector digital teams are confronting a messy, expensive, and long-deferred problem: tens of thousands of duplicate images clogging archives, planning portals, and public-facing databases, inflating storage bills and creating legal headaches over licensing. Several institutions moved to address the issue in earnest this week, triggering a quiet but significant scramble across borough councils and heritage organisations.
The timing matters because the Starmer government's planning reform agenda has driven nearly every London borough to digitise its application records faster than originally planned. That acceleration surfaced a pre-existing problem: systems built at speed rarely checked for duplication before ingesting images. The result, according to digital records specialists working across multiple boroughs, is archives where the same planning photograph, heritage asset image, or street-level shot may exist in three, four, or even a dozen separate folders under different file names, each treated as a distinct record and each consuming paid-for cloud storage.
Where the Problem Is Hitting Hardest
Two projects are particularly exposed. The Greater London Authority's London Infrastructure Mapping Application, which tracks live planning and development data across all 32 boroughs, flagged internal duplication concerns as far back as late 2024. This week, the GLA's digital team began a structured deduplication pass using perceptual hashing tools: software that compares what images look like rather than their raw file contents, so it can identify visually identical or near-identical images even when file names or metadata differ. The work is expected to run through September.
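How perceptual hashing catches near-identical images can be shown with a minimal sketch of one common variant, the "average hash". This illustrates the general technique only; the article does not say which tools or algorithms the GLA is using. The pixel grids, the function names, and the `max_distance` threshold are all assumptions made for the example, and a real tool would first shrink each image to a small grayscale grid.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255 (assumed already downscaled)


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel into an int: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Treat two images as duplicates if their hashes differ in few bits.

    max_distance is a hypothetical threshold chosen for this example.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# A tiny 4x4 "image", a uniformly brightened copy, and a different layout.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [10, 20, 200, 210],
    [15, 25, 205, 215],
]
brightened = [[min(255, value + 10) for value in row] for row in original]
different = [
    [10, 15, 10, 15],
    [20, 25, 20, 25],
    [200, 205, 200, 205],
    [210, 215, 210, 215],
]

print(is_near_duplicate(original, brightened))  # True: same light/dark pattern
print(is_near_duplicate(original, different))   # False: pattern differs in 8 bits
```

Because the hash records only which pixels are lighter or darker than the image's own average, a copy that has been re-exported slightly brighter still produces the same bits, whereas a byte-for-byte file comparison would treat it as a different record.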
Meanwhile, the London Metropolitan Archives, based on Northampton Road in Clerkenwell, has been quietly running a parallel review of its digitised photographic collections since May. The archive holds more than 400,000 images spanning centuries of London history. Librarians there discovered during routine quality checks that batch-scanning contracts carried out between 2019 and 2023 had introduced duplicate entries at a rate of roughly one in every 14 scanned items in certain collections. If that rate held across the full digitised catalogue, it would imply more than 28,000 potentially redundant records. None of those figures has been officially published yet, but they indicate the scale of the work underway.
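The projection in that estimate is simple division, sketched below using only the two figures quoted above. It rests on the assumption that the one-in-14 rate seen in certain collections applies to the whole catalogue, which the archive has not confirmed; the function name is illustrative.

```python
def projected_duplicates(catalogue_size: int, one_in: int) -> int:
    """Estimate redundant records if one item in every `one_in` is a duplicate."""
    return catalogue_size // one_in


# 400,000 is a lower bound: the archive holds "more than" that many images.
estimate = projected_duplicates(400_000, 14)
print(estimate)  # 28571
```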
Tower Hamlets Council, whose planning department processed a record volume of applications during the post-pandemic development surge along the Blackwall Reach and Poplar riverside corridors, is also understood to be mid-review. The council moved its planning image repository to a new cloud platform in March 2026, and the migration exposed duplication rates that were reportedly higher than anticipated, though the council has not published specific data.
The Practical and Financial Stakes
Storage costs are not trivial. Enterprise cloud storage for public sector bodies in the UK typically runs between £18 and £35 per terabyte per month depending on contract tier, and large image files — particularly uncompressed heritage scans — accumulate fast. For a mid-size borough sitting on 50 terabytes of planning and asset images, even a 15 percent duplication rate represents meaningful ongoing expenditure for records that add no informational value.
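To put rough numbers on that example, the sketch below applies the quoted price range to the hypothetical 50-terabyte borough. It assumes a flat per-terabyte price with no tiering, egress, or backup charges, which real contracts may well include; the function name is illustrative.

```python
def duplicate_storage_cost(total_tb: float, duplication_percent: float,
                           price_per_tb_month: float) -> float:
    """Monthly cost in pounds of storing the duplicated share of an archive."""
    duplicate_tb = total_tb * duplication_percent / 100
    return duplicate_tb * price_per_tb_month


low = duplicate_storage_cost(50, 15, 18)   # cheapest quoted tier
high = duplicate_storage_cost(50, 15, 35)  # most expensive quoted tier

print(f"Monthly: £{low:,.2f} to £{high:,.2f}")           # £135.00 to £262.50
print(f"Yearly:  £{low * 12:,.2f} to £{high * 12:,.2f}")  # £1,620.00 to £3,150.00
```

On these assumptions the duplicated share comes to 7.5 terabytes, and the cost recurs every month for as long as the redundant files stay in place.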
There is also a copyright dimension. Some images ingested during rapid digitisation programmes between 2020 and 2023 were sourced from external contributors under one-time or limited licences. When duplicates exist across multiple internal systems, councils risk holding and serving images that technically breach the terms of those original agreements. Several boroughs have sought advice from the Information Commissioner's Office on this point, though no enforcement action has been reported.
The Guildhall Library in the City of London, which holds substantial photographic collections alongside the London Metropolitan Archives, completed its own deduplication exercise in April and reported a clean outcome — partly because its digitisation programme used stricter intake protocols from the outset.
For anyone dealing with London's public image archives — researchers, planners, journalists, heritage consultants — the practical advice is to re-check citations and download links on older digitised records through the summer. Institutions actively restructuring their databases may temporarily break or redirect URLs. The GLA has flagged that parts of the London Infrastructure Mapping Application may show intermittent gaps in imagery between now and September as the deduplication work progresses. The London Metropolitan Archives asks users to report any records that appear to be exact duplicates through its online catalogue feedback form.