London holds more than 70 million digitised images across its network of public archives, borough councils and NHS trusts — and a significant portion of them are duplicates. That is the operational reality confronting the Greater London Authority's digital asset teams as they push through a major rationalisation effort in 2026, one that has quietly become a test case for how large civic administrations manage sprawling visual records in the post-digitisation era.
The issue matters now because storage is not cheap, and the government's broader data reform agenda under Keir Starmer has put pressure on public bodies to demonstrate efficiency. Cabinet Office guidance issued earlier this year directed central and local government bodies to reduce unnecessary data redundancy by the end of the 2026-27 financial year. For a city like London — where Transport for London, the NHS North East London Integrated Care Board, the Metropolitan Police, and dozens of borough councils each maintain separate image libraries — the duplication problem is structural, not incidental.
What London Is Actually Doing
The most concrete effort so far has come from the London Metropolitan Archives on Northampton Road in Clerkenwell, which since January has been running a deduplication audit across roughly 4.2 million scanned heritage photographs. Staff there are using open-source perceptual hashing software — the same class of tool used by media libraries — to flag near-identical images before human reviewers make final deletion decisions. The archive has so far flagged approximately 340,000 potential duplicates, though confirmed removals remain in the tens of thousands pending sign-off.
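To make the flag-then-review step concrete, here is a minimal sketch of one common perceptual hashing technique, the average hash. It is an illustration only: the archive has not named its software, and the function names, the 8x8 grayscale input and the `max_distance` threshold are all assumptions, not details of the actual audit. Each image is reduced to 64 bits, and pairs whose hashes differ in only a few bits are listed for a human reviewer. Nothing is deleted.

```python
from itertools import combinations


def average_hash(pixels):
    """Return a 64-bit average hash for an 8x8 grid of grayscale values (0-255).

    Assumes the image has already been downscaled and converted to grayscale.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # Each bit records whether a pixel is brighter than the image mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def flag_candidates(images, max_distance=5):
    """Return (name_a, name_b, distance) for pairs that look near-identical.

    `images` maps a name to an 8x8 grid. `max_distance` is a hypothetical
    threshold; flagged pairs are meant for human review, not deletion.
    """
    hashes = {name: average_hash(grid) for name, grid in images.items()}
    flagged = []
    for (name_a, hash_a), (name_b, hash_b) in combinations(sorted(hashes.items()), 2):
        distance = hamming_distance(hash_a, hash_b)
        if distance <= max_distance:
            flagged.append((name_a, name_b, distance))
    return flagged
```

Because the hash compares each pixel with the image's own mean, a uniformly brightened copy of a scan produces the same hash as the original, while a genuinely different image differs in many bits. A real pipeline would also need an image library to do the downscaling, which this sketch leaves out.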
Separately, the Wellcome Collection on Euston Road began a parallel review of its 100,000-image digital collection in March, focusing specifically on medical photography records shared across NHS partner databases. The Wellcome process is more cautious: duplicates are quarantined rather than deleted, and a retention review panel meets monthly. It is a slower model, but one designed to avoid the kind of irreversible loss reported after New York's 2024 exercise.
Transport for London's image library — which covers everything from engineering schematics to press photography — has taken a third route, outsourcing the deduplication work to a contracted digital asset management firm under a deal worth £1.8 million over three years, according to procurement records published on the GLA's contracts register in April 2026.
How London Compares to New York, Amsterdam and Tokyo
New York City's Department of Records and Information Services completed a comparable exercise in 2024, working through the Municipal Archives' holdings of around 2.5 million images. The New York process relied heavily on automated deletion with minimal manual review — faster, but it drew criticism from archivists after several historically significant photographs were reportedly lost in error. The city has since revised its policy to require human sign-off on any image predating 1980.
Amsterdam took the most aggressive centralisation approach. The Stadsarchief Amsterdam consolidated image holdings from 14 municipal departments into a single platform in 2023, cutting its overall storage footprint by roughly 28 percent in the first year alone. That figure, cited in the Stadsarchief's own annual report, has been held up as a benchmark by GLA digital officers, though London's fragmented governance structure — with 33 borough councils, each retaining independent data controls — makes a direct Amsterdam-style merger politically and logistically difficult.
Tokyo, through the Tokyo Metropolitan Archives, has taken the most conservative approach. The city opted to tag and cross-reference duplicates rather than remove them, prioritising discoverability over storage savings. The result is a larger but better-mapped dataset: useful for researchers, expensive for server budgets.
London currently sits somewhere between Amsterdam's efficiency drive and Tokyo's preservation instinct. That middle path has costs: the GLA's own digital infrastructure budget has grown by 12 percent year-on-year since 2023, partly because unresolved duplication inflates storage overhead.
Public bodies across London have until March 2027 to submit compliance reports against the Cabinet Office redundancy-reduction targets. Organisations that have not begun formal deduplication audits by October 2026 risk being flagged in a cross-government review. For borough councils still running legacy image systems — several in outer east London have yet to migrate to cloud-based asset management — that deadline is already uncomfortably close.