The Daily London

London news, every day

London Leads the Push to Purge Duplicate Images From Public Records — But Other Cities Are Catching Up Fast

As councils and heritage bodies across the capital tackle a growing backlog of repeated and misfiled digital images in public archives, London's approach is being watched — and quietly challenged — by rivals in Amsterdam, New York and Tokyo.

By London News Desk · Published 5 July 2026, 5:00 am

Updated 5 July 2026, 1:17 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily London is independently owned and covers London news free from advertiser or sponsor influence. Read our editorial standards →

Photo: SevenStorm JUHASZIMRUS on Pexels

Transport for London has more than 4 million digitised archive images on its internal asset management system. A significant share of them, according to ongoing internal audits the organisation began in early 2025, are duplicates — the same photograph stored two, three, sometimes five times under different file names. It is a problem that sounds mundane. The cost of not fixing it is not.

Duplicate image management — the systematic identification, deduplication and replacement of redundant visual files across public digital infrastructure — has quietly become one of the more pressing data-governance headaches for major cities. London's institutions are now deep into that work, driven partly by the Starmer government's broader push to cut waste from public-sector IT spending ahead of the October 2026 spending review deadline.

What London Is Actually Doing

The Museum of London Docklands, based at West India Quay in Tower Hamlets, completed a deduplication audit of its 280,000-image digital collection in March 2026. The project, run in partnership with the cultural technology firm Axiell, reduced active storage requirements by roughly 18 percent and eliminated more than 40,000 redundant files. The museum worked from a framework developed jointly with the London Metropolitan Archives on Northampton Road in Clerkenwell, which manages records stretching back to 1067 and has been digitising at scale since 2019.

The London Metropolitan Archives project is larger and slower. Staff there have been using automated perceptual hashing, a technique that generates a short digital fingerprint for each image so that near-identical files can be flagged by comparing fingerprints, to work through a backlog that runs to several million files. The process, which began formally in January 2025, is expected to take until late 2027 to complete. Identified duplicates are not simply deleted; they are flagged, reviewed by a human archivist, and either consolidated into a single canonical record or retained if contextual metadata differs meaningfully between copies.
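To make the fingerprinting idea concrete, here is a minimal sketch of one simple perceptual hash, the "average hash". It is an illustration only: the source does not say which algorithm or tooling the archive uses. The function names and the `max_distance` tolerance are assumptions, and the input is assumed to be a small grayscale thumbnail already scaled to a fixed grid.

```python
from typing import List

Pixels = List[List[int]]  # grayscale values 0-255, row by row, fixed small grid


def average_hash(pixels: Pixels) -> int:
    """Build a fingerprint with one bit per pixel: 1 if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Pixels, b: Pixels, max_distance: int = 2) -> bool:
    """Flag two thumbnails whose fingerprints differ in at most max_distance bits.

    max_distance is an assumed tolerance, not a value from the source.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# A 4x4 thumbnail, a slightly brightened copy of it, and an unrelated image.
original = [[10, 10, 200, 200] for _ in range(4)]
brightened = [[value + 5 for value in row] for row in original]
unrelated = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]

print(is_near_duplicate(original, brightened))  # True
print(is_near_duplicate(original, unrelated))   # False
```

Because the fingerprint depends on each pixel's brightness relative to the image's own mean, a uniformly brightened copy produces the same bits, which is why renamed or re-exported copies can still be matched. A flagged pair would then go to an archivist rather than being deleted outright.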

That human-review layer is what distinguishes London's approach from cheaper, faster alternatives. It adds cost. It also adds accuracy.

How Amsterdam, New York and Tokyo Compare

Amsterdam's Stadsarchief — the city's main municipal archive, housed in a converted banking complex on Vijzelstraat — went further, faster. It completed a full deduplication of its 750,000-image digital collection in 2024 using a fully automated pipeline with no mandatory human review for files below a defined risk threshold. Reported storage savings ran to 23 percent. Critics inside the archival community raised concerns about accidental data loss, though the Stadsarchief has not publicly reported any confirmed losses of unique material.
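The risk-threshold routing described for the Stadsarchief could look something like the sketch below. The source does not say how risk is scored or where the threshold sits, so the `risk_score` field, the 0.2 default and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class DuplicateCandidate:
    file_name: str
    risk_score: float  # hypothetical: 0.0 (safe to merge) to 1.0 (likely unique)


def route_candidates(
    candidates: Iterable[DuplicateCandidate],
    risk_threshold: float = 0.2,  # assumed value, not from the source
) -> Tuple[List[DuplicateCandidate], List[DuplicateCandidate]]:
    """Split candidates into (deduplicate automatically, send to human review)."""
    automatic: List[DuplicateCandidate] = []
    needs_review: List[DuplicateCandidate] = []
    for candidate in candidates:
        if candidate.risk_score < risk_threshold:
            automatic.append(candidate)
        else:
            needs_review.append(candidate)
    return automatic, needs_review
```

With a threshold of 0.0 nothing is handled automatically and every candidate goes to review, which is roughly the London model; raising the threshold trades review cost against the risk of merging files that were not true duplicates.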

New York City's approach has been fragmented. The Department of Records and Information Services on Chambers Street in Lower Manhattan runs its deduplication work separately from the New York Public Library system on Fifth Avenue, meaning duplicates that span both collections, common for mid-20th-century documentary photography, have not been systematically addressed. The city's own audit office flagged the lack of a unified image-governance policy in a report published in February 2026.

Tokyo's National Archives digitisation programme, launched under a 2023 government mandate, is the largest of any city reviewed here: the target is 10 million digitised public records by 2030. Deduplication is built into the ingestion pipeline from the start, rather than retrofitted to existing collections. That architecture is widely regarded as the most efficient model, though it is only available to institutions building their digital collections from scratch.
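Deduplication at ingestion can be illustrated with a small sketch. This is not the Tokyo programme's actual design, which the source does not describe. It swaps in an exact SHA-256 content hash, so it only catches byte-identical copies. The `IngestStore` class and its methods are hypothetical, and file names are assumed to be unique.

```python
import hashlib
from typing import Dict, Tuple


class IngestStore:
    """Toy store that refuses to keep a second copy of identical bytes."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}  # content digest -> canonical file name
        self._files: Dict[str, bytes] = {}    # canonical file name -> stored bytes

    def ingest(self, file_name: str, data: bytes) -> Tuple[str, bool]:
        """Return (canonical file name, whether new bytes were stored)."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            # Same bytes already held: point the caller at the existing record.
            return existing, False
        self._by_digest[digest] = file_name
        self._files[file_name] = data
        return file_name, True

    def stored_count(self) -> int:
        return len(self._files)


store = IngestStore()
print(store.ingest("tram_1952.tif", b"pixels"))  # ('tram_1952.tif', True)
print(store.ingest("tram_copy.tif", b"pixels"))  # ('tram_1952.tif', False)
```

Because the check runs before anything is written, a duplicate never reaches storage, which is the advantage an institution retrofitting an existing collection does not have.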

London, working with legacy systems and legacy collections, cannot simply copy Tokyo. What it can do is tighten coordination between institutions. The Greater London Authority has been in discussions with the London Metropolitan Archives and the Guildhall Library since April 2026 about a shared image-registry standard — a common identifier system that would let archives across the capital check a new acquisition against a central index before storing it. No formal agreement has been signed.

If that registry comes together, institutions filing new images after an agreed cut-off date — likely January 2027, according to documents circulated among the working group — would run automatic checks before committing files to storage. The backlog before that date remains each archive's own problem to solve. For now, that means archivists on Northampton Road and West India Quay are still working through boxes of digital ghosts, one fingerprint at a time.
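As a rough sketch of how a check against a shared registry with a cut-off date might work: no standard has been agreed, so the class, the method, the string fingerprints and the exact cut-off of 1 January 2027 are all assumptions made for illustration.

```python
from datetime import date
from typing import Dict, Optional

# Assumed date: the source only says the cut-off is "likely January 2027".
ASSUMED_CUTOFF = date(2027, 1, 1)


class SharedImageRegistry:
    """Hypothetical central index: image fingerprint -> archive holding it."""

    def __init__(self) -> None:
        self._holders: Dict[str, str] = {}

    def check_and_register(
        self,
        fingerprint: str,
        archive: str,
        filed_on: date,
        cutoff: date = ASSUMED_CUTOFF,
    ) -> Optional[str]:
        """Return the archive already holding the image, or None if the caller should store it."""
        if filed_on < cutoff:
            # Backlog material stays outside the registry's scope.
            return None
        holder = self._holders.get(fingerprint)
        if holder is not None:
            return holder
        self._holders[fingerprint] = archive
        return None
```

In this sketch the first archive to file an image after the cut-off becomes its registered holder, and any later filing of the same fingerprint is told where the existing copy lives instead of storing another.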

About this article

Published by The Daily London

Covering news in London. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
