As you know, Archives New Zealand is now accepting digital transfers. We have updated our guidance for digital transfer as below:
This blog provides some further information and technical detail around the process of transfer and ingest and highlights common issues encountered in recent transfers from organisations. This learning supports our common goal of preserving the digital memory of government.
· 33,389 born-digital records in the Government Digital Archive (GDA)
· 39 unique file formats for digital records
· 7 legacy transfers (2015-2016), mainly ministerial papers
· 3 transfers currently being processed
· 2 new transfers from agencies (2017-2018)
Some points to start with:
· before transferring digital records to Archives, an organisation must have a valid disposal authority
· as with all transfers, detailed discussion is necessary throughout the process, covering which records will be transferred, how they will be described and how they will be accessed
· sentencing digital records can be challenging and resource intensive, whether or not they are being transferred. It is crucial to have a solid working relationship with your IT staff.
Once a discrete set of records has been identified for transfer, we require the organisation to export the files and metadata from their systems. The metadata must contain a checksum for each file. See our guidance on digital transfers and checksums.
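As an illustration only, the sketch below shows how checksums supplied in exported metadata could be checked against the files after they have been copied. It assumes SHA-256 and a simple mapping of relative path to expected checksum; the algorithm, the function names and the manifest layout are assumptions for this example and are not taken from our guidance.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str],
                    algorithm: str = "sha256") -> list[str]:
    """Return the relative paths whose file is missing or whose checksum differs.

    `manifest` is a hypothetical mapping of relative path -> expected hex digest.
    """
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or file_checksum(target, algorithm) != expected.lower():
            failures.append(relative_path)
    return failures
```

An empty list from `verify_manifest` means every listed file arrived with the checksum recorded at export.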
In addition to checksums, all digital transfers must arrive with descriptive metadata. In some cases there might be no metadata available, for example when the records come from shared drives rather than an EDRMS. However, we assume organisations follow the recommendations about what metadata they need to generate and maintain in their recordkeeping systems (as described, for example, in the Metadata for information and records factsheet and the Minimum requirements for metadata guide).
We ask organisations to give us the digital records and metadata “as-is”, e.g. a complete set of unfiltered metadata exported from an EDRMS. This can be in any structured form (CSV, TXT, XML etc.) - see our blog on File format for digital transfers and the Digital transfer initiation – characteristics guidance. We ask for all metadata fields; however, there must be a way to ‘connect’ the metadata provided with the metadata used to describe the records in Archway. Often a discussion is necessary between Archives and organisation representatives about what metadata the organisation wants to transfer and how that will work with our systems. We then do the work of mapping the metadata into Archway and the long-term preservation system.
The actual transfer of the digital records can use any method which suits the needs of the transferring organisation: portable hard drives, encrypted hard drives and online shared workspaces all work well.
We begin with an initial test set of digital records and perform a file-by-file analysis using a set of tools and scripts, some developed in-house. Reporting from this helps us understand the files from both technical and archival points of view. For those interested in technical detail we have described the process from testing to completing the ingest into Archway and the GDA in the sections below.
A first assessment is performed via our SQLite Analysis Engine, which uses reports from the DROID and/or Siegfried file format identification tools. This gives a human-readable summary and statistics about the test set in an HTML file. It includes (not an exhaustive list):
· a count of files and directories
· an overview of identified and unidentified file formats
· file extension mismatches and similar issues
· zero-byte files
· empty folders
· duplicate files.
It indicates the frequency of file formats, date ranges and non-ASCII characters in file names. The tool contains a configurable “black list”, identifying potentially sensitive information.
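To give a feel for this kind of analysis, here is a minimal sketch that loads a few rows into an in-memory SQLite database and runs some of the checks listed above. It is not the SQLite Analysis Engine itself: the table layout, the column names and the sample rows are hypothetical simplifications, and a real DROID or Siegfried report contains more fields.

```python
import sqlite3

# Hypothetical, simplified rows:
# (path, name, type, size, extension, format_id, checksum)
ROWS = [
    ("/t", "t", "Folder", None, None, None, None),
    ("/t/a.doc", "a.doc", "File", 1200, "doc", "fmt/40", "aa11"),
    ("/t/b.doc", "b.doc", "File", 1200, "doc", "fmt/40", "aa11"),
    ("/t/c.pdf", "c.pdf", "File", 0, "pdf", None, "d41d"),
    ("/t/Māori.txt", "Māori.txt", "File", 50, "txt", "x-fmt/111", "bb22"),
    ("/t/empty", "empty", "Folder", None, None, None, None),
]


def build_database(rows):
    """Load the report rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (path TEXT, name TEXT, type TEXT, size INTEGER,"
        " ext TEXT, format_id TEXT, checksum TEXT)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def summarise(conn):
    """Run a handful of the checks described in the list above."""
    def one(sql):
        return conn.execute(sql).fetchone()[0]

    def column(sql):
        return [row[0] for row in conn.execute(sql)]

    return {
        "files": one("SELECT COUNT(*) FROM files WHERE type = 'File'"),
        "folders": one("SELECT COUNT(*) FROM files WHERE type = 'Folder'"),
        "unidentified": column(
            "SELECT path FROM files WHERE type = 'File'"
            " AND format_id IS NULL ORDER BY path"),
        "zero_byte": column(
            "SELECT path FROM files WHERE type = 'File' AND size = 0 ORDER BY path"),
        # A folder is empty when no other row's path sits underneath it.
        "empty_folders": column(
            "SELECT f.path FROM files f WHERE f.type = 'Folder' AND NOT EXISTS ("
            " SELECT 1 FROM files c"
            " WHERE substr(c.path, 1, length(f.path) + 1) = f.path || '/')"
            " ORDER BY f.path"),
        "duplicate_checksums": column(
            "SELECT checksum FROM files WHERE type = 'File'"
            " GROUP BY checksum HAVING COUNT(*) > 1 ORDER BY checksum"),
        "non_ascii_names": [
            name for name in column("SELECT name FROM files ORDER BY path")
            if not name.isascii()],
    }


summary = summarise(build_database(ROWS))
```

Keeping the report in a database means each new check is one more query rather than another pass over the files.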
Next is a test ingest into the test environment of the GDA using the Rosetta digital preservation system developed by Ex Libris. This test ingest might highlight other issues not identified by the SQLite tool.
After identifying all issues in the transferred set, there is further conversation with the transferring organisation on individual files. Some common issues with files are:
· metadata extraction issues
· wrong file extensions
· unknown file formats
· invalid file formats.
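The idea behind a "wrong file extension" finding can be illustrated with a small sketch that compares a file's extension with the first bytes of its content. This is a deliberately simplified stand-in: DROID and Siegfried rely on the much richer PRONOM signature data, and the short signature table and the function name here are assumptions made for the example.

```python
# A few well-known leading byte sequences and the extensions usually paired with them.
SIGNATURES = [
    (b"%PDF-", {"pdf"}),
    (b"\x89PNG\r\n\x1a\n", {"png"}),
    (b"\xff\xd8\xff", {"jpg", "jpeg"}),
    (b"PK\x03\x04", {"zip", "docx", "xlsx", "pptx"}),
]


def check_extension(name: str, header: bytes) -> str:
    """Classify a file as 'ok', 'mismatch' or 'unknown' from its first bytes."""
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    for magic, extensions in SIGNATURES:
        if header.startswith(magic):
            return "ok" if extension in extensions else "mismatch"
    # No signature in this small table matched the content.
    return "unknown"
```

For example, `check_extension("report.doc", b"%PDF-1.4")` reports a mismatch because the content starts like a PDF while the name claims a Word document.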
Solutions to issues are discussed with organisations before possible ingest to the GDA. The decision to progress the transfer is based on the number of issues identified, their nature and the time needed to fix them.
After agreement on the distilled set of digital archives ready for ingest, we begin the ingest process by importing metadata to Archway, as only items with existing descriptions in Archway can be ingested to the GDA. The metadata is imported via a CSV file which is created internally using a script called the Archway Import Generator. It combines the original DROID report (exported as a CSV), the recordkeeping metadata from the organisation and the technical metadata extracted at Archives, along with a unique identifier.
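The general shape of this step can be sketched as a join between two sets of rows on a shared key. The example below is not the Archway Import Generator: the `FILE_PATH` key, the other column names and the use of a random UUID as the identifier are all assumptions chosen for illustration.

```python
import csv
import io
import uuid


def build_import_rows(droid_rows, agency_rows, key="FILE_PATH"):
    """Merge format-identification rows with agency metadata rows on a shared key.

    Returns (merged_rows, unmatched_keys). Files with no agency metadata are
    reported rather than silently dropped.
    """
    agency_by_key = {row[key]: row for row in agency_rows}
    merged, unmatched = [], []
    for row in droid_rows:
        agency = agency_by_key.get(row[key])
        if agency is None:
            unmatched.append(row[key])
            continue
        # Hypothetical identifier: a random UUID assigned to each merged row.
        merged.append({"identifier": str(uuid.uuid4()), **agency, **row})
    return merged, unmatched


def to_csv(rows):
    """Serialise merged rows to CSV text, using the union of all column names."""
    fieldnames = list(dict.fromkeys(name for row in rows for name in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Returning the unmatched keys separately makes it easy to raise those files with the transferring organisation before anything is imported.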
The final step is to prepare a submission package with the files and metadata for the GDA. For this we use a tool called the Rosetta CSV generator. This tool uses the original DROID report combined with the metadata previously imported to Archway. The result is a CSV import sheet which can be submitted, together with the files, to the GDA for ingest.
It can take a lot of time to process and analyse the collection, depending on the size (number of files) and the complexity (number of file formats). Some examples of issues we may encounter before the analysis include:
· checksums failing on transfer
· difficulties with metadata mapping
· metadata arriving in an unexpected encoding (e.g. UTF-16), causing issues with Māori macrons.
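The encoding issue in the last point can be illustrated with a short sketch that uses the byte order mark (BOM), when present, to choose a decoder before normalising to UTF-8. It assumes the export is either UTF-8 or UTF-16 with a BOM; the function name is hypothetical, and files without a BOM in some other encoding would need a different approach.

```python
import codecs


def decode_metadata(raw: bytes) -> str:
    """Decode exported metadata, using a byte order mark (BOM) when present.

    Assumption: the input is UTF-8, or UTF-16 with a BOM. UTF-32 and legacy
    code pages are not handled in this sketch.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")  # strips the UTF-8 BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # the utf-16 codec consumes the BOM
    return raw.decode("utf-8")  # raises UnicodeDecodeError if the guess is wrong


# A UTF-16 export containing a macron, normalised to UTF-8 bytes.
exported = "Māori".encode("utf-16")
normalised = decode_metadata(exported).encode("utf-8")
```

Decoding explicitly, rather than letting a tool guess, is what keeps characters such as "ā" intact through later processing.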
A certain amount of pre-conditioning before ingest should be expected with any transfer, i.e. fixing the issues found (replacing file extensions, replacing odd characters etc.). For every digital file which has had pre-conditioning work done to it, a provenance note is created and stored with the file and metadata in the system.
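As a rough illustration of pairing a fix with its provenance note, the sketch below replaces a file extension in a file name and records what was changed and why. The note fields and function name are hypothetical and do not reflect the structure of provenance notes held in our systems.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath


@dataclass
class ProvenanceNote:
    """Hypothetical record of one pre-conditioning action on one file."""
    original_name: str
    new_name: str
    action: str
    reason: str
    timestamp: str


def replace_extension(name: str, new_extension: str, reason: str):
    """Return the corrected file name together with a note describing the change."""
    suffix = "." + new_extension.lstrip(".")
    new_name = str(PurePosixPath(name).with_suffix(suffix))
    note = ProvenanceNote(
        original_name=name,
        new_name=new_name,
        action="file extension replacement",
        reason=reason,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return new_name, note
```

Producing the note in the same step as the change means a fix cannot be applied without leaving a record.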
It is important for the organisation to understand that once a transfer of digital records is completed and the Transfer Agreement and Access Authority are both signed, we expect the copies of the records held by the organisation to be deleted. This is to ensure duplicate transfers do not occur in the future. The records transferred into the custody of the Chief Archivist become the authoritative copy.
If the transfer process is permanently halted for any reason before final sign off, the copies of the digital files in Archives New Zealand’s possession will be destroyed.
All archived digital items are accessible via Archway (allowing for organisation restrictions). See example here. The link in Archway takes the user to the Rosetta viewer. If the Rosetta viewer supports the file format, the file is rendered (opens) automatically. If the Rosetta viewer does not support the file format, the file is offered as a download.
If you think your organisation has born-digital records which are suitable for transfer as archives under a current disposal authority, please get in touch with us via email@example.com