New audio article available: “Finding Light in Dark Archives”

Would you like one of your articles made available as an audio version? Send us a message! The article needs to be Open Access (OA) and formatted as a Word document designed to be read aloud. Get in touch for more information.

This week, we are making another audio version of an Open Access article available as a podcast. This article appeared in “AI & Society”, 2022, volume 37.

Finding light in dark archives: Using AI to connect context and content in email


  • Stephanie Decker, University of Bristol, UK
  • David A. Kirsch, Robert H. Smith School of Business, University of Maryland, USA
  • Santhilata Kuppili Venkata, The National Archives, UK
  • Adam Nix, University of Birmingham, UK


Email archives are important historical resources, but access to such data poses a unique archival challenge and many born-digital collections remain dark, while questions of how they should be effectively made available are answered. This paper contributes to the growing interest in preserving access to email by addressing the needs of users, in readiness for when such collections become more widely available. We argue that for the content of email to be meaningfully accessed, the context of email must form part of this access. In exploring this idea, we focus on discovery within large, multi-custodian archives of organisational email, where emails’ network features are particularly apparent. We introduce our prototype search tool, which uses AI-based methods to support user-driven exploration of email. Specifically, we integrate two distinct AI models that generate systematically different types of results, one based upon simple, phrase-matching and the other upon more complex, BERT embeddings. Together, these provide a new pathway to contextual discovery that accounts for the diversity of future archival users, their interests and level of experience.

Keywords: email archives; born-digital collections; computational archival studies; contextual email discovery


We gratefully acknowledge funding support by the Arts & Humanities Research Council (UK) and National Endowment for the Humanities (USA) as part of the US-UK Partnership Development Grants, grant AH/T013060/1.

This article is available #OpenAccess here: