London's cultural and media sector moved on several fronts this week to address the proliferation of duplicate images across digital collections, with the British Library in St Pancras and the National Portrait Gallery on St Martin's Place both advancing internal audits of their digitised holdings. The drive follows months of pressure from archivists and digital rights groups, who have flagged how duplicate entries inflate storage costs, clutter public search tools and, in some cases, create uncertainty over copyright when the same image appears under multiple catalogue entries.
The timing is not accidental. The UK government's Digital Information and Smart Data Act, which passed earlier this year, placed new obligations on publicly funded institutions to maintain accurate, deduplicated digital records by January 2027. That deadline has focused minds at organisations that have spent the past decade uploading physical collections at pace without always cleaning the data behind them.
What Changed This Week
On Wednesday, the British Library confirmed it had contracted a London-based data integrity firm to run deduplication passes across its digitised newspaper archive, which holds tens of millions of page scans dating back to the 17th century. The project is expected to run through October; the Library said it would disclose the budget in its next quarterly report. The archive is accessible to the public via the Library's reading rooms at 96 Euston Road as well as through its online portal.
Separately, the Museum of London, now operating from its new Smithfield site following its move from London Wall, announced on Tuesday that its photographic collection review had so far identified more than 12,000 duplicate image entries across its online database. Staff are working through a prioritised list, starting with items in the highest-traffic search categories: the Great Fire of London, the Blitz, and the 1966 World Cup.
For smaller organisations, the technical and financial challenge is sharper. Several independent London galleries and local authority heritage services contacted by The Daily London described a patchwork of manual processes still in use, with some staff comparing metadata entries by hand across spreadsheets. Hackney Archives, based in the Dalston area, has applied for a grant from the National Lottery Heritage Fund's Digital Skills programme to automate parts of the process, though a decision on that application is not expected until September.
The Data Behind the Problem
A January 2026 report from the Digital Preservation Coalition, based in York, estimated that 30 to 40 percent of the image files UK cultural institutions collectively hold in their digital storage environments are redundant. The report attributed the figure partly to legacy migration projects in which collections were uploaded multiple times across different systems without cross-referencing. It did not break out London-specific figures, but sector insiders say the capital's density of institutions makes it a particular concentration point.
Storage is not cheap. Commercial cloud rates for large image files, particularly uncompressed TIFF scans common in archival work, have remained high through 2025 and into 2026. For context, the British Library alone holds an estimated 170 terabytes of digitised content, according to figures published in its 2024-25 annual report.
The problem also bleeds into the commercial press. Several UK picture desks have faced internal audits after duplicate licensing errors, in which the same photograph was billed twice to editorial clients under different catalogue codes, led to billing disputes.
For members of the public using digital archive tools, the practical upshot this week is that search results on some platforms may look different as deduplication work removes redundant entries. Anyone researching via the British Library's online newspaper archive or the Museum of London's collections portal should expect some catalogue numbers to be retired and merged in the coming months. Institutions have advised users to save permalinks to specific items rather than relying on search result positions, which may shift as the clean-up progresses.