The Daily London

London news, every day

Duplicate Image Replacement: What Happened This Week in London's Digital Archive Push

City institutions and media organisations accelerated efforts this week to tackle the growing problem of duplicate imagery cluttering digital archives and public-facing platforms.

By London News Desk · Published 5 July 2026, 4:57 am

Updated 5 July 2026, 1:57 pm

How we reported this

This article was generated by AI from the linked public sources. The Daily London is independently owned and covers London news free from advertiser or sponsor influence. Read our editorial standards →

Photo: Manzoni Studios on Pexels

London's cultural and media sector moved on several fronts this week to address the proliferation of duplicate images across digital collections, with the British Library in St Pancras and the National Portrait Gallery on St Martin's Place both advancing internal audits of their digitised holdings. The drive follows months of pressure from archivists and digital rights groups, who have flagged how duplicate entries inflate storage costs, clutter public search results, and in some cases create copyright ambiguity when the same image appears under multiple catalogue entries.

The timing is not accidental. The UK government's Digital Information and Smart Data Act, which passed earlier this year, placed new obligations on publicly funded institutions to maintain accurate, deduplicated digital records by January 2027. That deadline has focused minds at organisations that have spent the past decade uploading physical collections at pace without always cleaning the data behind them.

What Changed This Week

On Wednesday, the British Library confirmed it had contracted with a London-based data integrity firm to run deduplication passes across its digitised newspaper archive, which holds tens of millions of page scans dating back to the 17th century. The project, budgeted at a figure the Library said it would disclose in its next quarterly report, is expected to run through October. The archive is accessible to the public via the Library's reading rooms at 96 Euston Road as well as through its online portal.

Separately, the Museum of London, now operating from its new Smithfield site following its move from London Wall, announced on Tuesday that its photographic collection review had so far identified more than 12,000 duplicate image entries across its online database. Staff are working through a prioritised list, starting with items in the highest-traffic search categories: the Great Fire of London, the Blitz, and the 1966 World Cup.

For smaller organisations, the technical and financial challenge is sharper. Several independent London galleries and local authority heritage services contacted by The Daily London described a patchwork of manual processes still in use, with some staff comparing metadata entries by hand across spreadsheets. Hackney Archives, based in the Dalston area, has applied for a grant from the National Lottery Heritage Fund's Digital Skills programme to automate parts of the process, though a decision on that application is not expected until September.

The Data Behind the Problem

A January 2026 report from the Digital Preservation Coalition, based in York, estimated that 30 to 40 percent of the image files held in UK cultural institutions' digital storage are redundant. The figure is driven partly by legacy migration projects in which collections were uploaded multiple times across different systems without cross-referencing. The report did not break out London-specific figures, but sector insiders say the capital's density of institutions makes it a particular concentration point.

Storage is not cheap. Commercial cloud rates for large image files, particularly uncompressed TIFF scans common in archival work, have remained high through 2025 and into 2026. For context, the British Library alone holds an estimated 170 terabytes of digitised content, according to figures published in its 2024-25 annual report.

The problem also bleeds into the commercial press. Several UK picture desks have faced internal audits after duplicate licensing errors — cases where the same photograph was billed twice to editorial clients under different catalogue codes — created billing disputes.

For members of the public using digital archive tools, the practical upshot this week is that search results on some platforms may look different as deduplication work removes redundant entries. Anyone researching via the British Library's online newspaper archive or the Museum of London's collections portal should expect some catalogue numbers to be retired and merged in the coming months. Institutions have advised users to save permalinks to specific items rather than relying on search result positions, which may shift as the clean-up progresses.

About this article

Published by The Daily London

Covering news in London. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
