Exporting data in RIMMF6

This page applies to RIMMF6 release 20230630 and later.

Export records options

In RIMMF, an option to export records is available:

  • When viewing the EI
  • When viewing an R-Tree
  • When viewing a Manifestation 1)

This page describes

  1. how the program stores your data, and
  2. the data options available for an export.

Your RIMMF data

Data format

When an entity record is created and saved in RIMMF, the data is stored in a disk file using the N-Triples format: 'a line-based, plain-text serialisation format for RDF graphs' (Wikipedia). These N-Triples files use the Windows file extension '.nt'. Note: although the official internet media type for an .nt file is “application/n-triples”, these are in essence “text/plain” files. The .nt files produced by RIMMF can be opened, viewed, and edited in any text editor.

As to the actual data within these files, we apply the following conventions.

Unicode characters

Unicode characters are stored in escaped format. This means that the characters which comprise the displayed string

John Le Carré

will be stored as

John Le Carr\u00E9

In the example above, “\u00E9” represents the hexadecimal Unicode code point for “é”. The hexadecimal number must be exactly four digits long 2). When converting between Unicode characters and their escaped representations, we use Normalization Form C.
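The escaping convention above can be sketched in a few lines of Python. This is an illustration only, not RIMMF's own code: the function name escape_non_ascii is hypothetical, and the 8-digit '\U' form for characters beyond U+FFFF follows the N-Triples specification rather than anything stated on this page. Quotes and backslashes inside a value would need their own escapes, which this sketch does not handle.

```python
import unicodedata


def escape_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character replaced by an escape.

    Hypothetical helper for illustration; not part of RIMMF.
    """
    # Normalize first, so that 'e' + combining acute becomes the single
    # precomposed character U+00E9 (Normalization Form C).
    normalized = unicodedata.normalize("NFC", text)
    pieces = []
    for ch in normalized:
        code_point = ord(ch)
        if code_point < 0x80:
            pieces.append(ch)  # plain US-ASCII is kept as-is
        elif code_point <= 0xFFFF:
            pieces.append(f"\\u{code_point:04X}")  # exactly four hex digits
        else:
            # Assumption: 8-digit form from the N-Triples specification.
            pieces.append(f"\\U{code_point:08X}")
    return "".join(pieces)


print(escape_non_ascii("John Le Carré"))  # John Le Carr\u00E9
```

Because the string is normalized first, a decomposed input (a plain 'e' followed by a combining acute accent) produces the same escaped output as the precomposed 'é'.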

RDA Elements

RIMMF stores RDA Elements, or properties, using opaque URIs. For example, when storing a triple for the RDA element 'Title of work', the URI

http://rdaregistry.info/Elements/w/P10088

will be used.

In addition, RIMMF stores only canonical RDA elements. We do not store triples using the object or datatype subclasses typically defined for each RDA entity. (Data that is imported to RIMMF using these subclasses will be mapped to the corresponding canonical class.)

Statements about statements

As you know, a triple is a statement composed of a subject, a predicate (or property), and an object (value). In order to support provenance, we need a way to uniquely identify each statement. In RIMMF, the N-Quads format is used to assign unique identifiers to statements. N-Quads are an extension of N-Triples; the fourth part, called a graphLabel, is appended after the object. The 'quad' assigned by RIMMF is always a unique IRI. N-Quads should be compatible with all applications that support N-Triples.
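To make the four-part layout concrete, here is a minimal Python sketch that splits one statement line into subject, predicate, object, and the optional graphLabel. It is a simplified reader for illustration, not a full N-Quads parser and not RIMMF's implementation; the function name parse_statement and the 'http://rimmfdata.com/example/...' IRIs are made up for the example.

```python
import re

# One RDF term: an <IRI>, a blank node, or a "literal" with an optional
# language tag or datatype. Simplified for illustration.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
STATEMENT = re.compile(
    rf'^\s*{TERM}\s+{TERM}\s+{TERM}(?:\s+(<[^>]*>|_:\S+))?\s*\.\s*$'
)


def parse_statement(line: str) -> dict:
    """Split an N-Triples or N-Quads line into its parts (hypothetical helper)."""
    match = STATEMENT.match(line)
    if match is None:
        raise ValueError(f"not a recognisable statement: {line!r}")
    subject, predicate, obj, graph = match.groups()
    # 'graph' is None for a plain triple, and the statement's IRI for a quad.
    return {"subject": subject, "predicate": predicate,
            "object": obj, "graph": graph}


# Hypothetical subject and statement identifiers, for illustration only.
quad = ('<http://rimmfdata.com/example/w1> '
        '<http://rdaregistry.info/Elements/w/P10088> '
        '"John Le Carr\\u00E9" '
        '<http://rimmfdata.com/example/stmt1> .')
print(parse_statement(quad)["graph"])
```

An application that only understands triples could read the first three parts and discard the fourth.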

The term used in RDF for saying 'something about' a statement is reification. More on this later.

RIMMF-specific data and metadata

The subject of every RIMMF statement is assigned the namespace:

http://rimmfdata.com/

Subdomains are used to categorise statements, as follows:

The program metadata assigned to the '/m' namespace is data that RIMMF uses internally: the version of RIMMF used to create the record, the Windows filename, the entity template used to display it, and so on. An application on the receiving end of this data can safely ignore triples in the '/m' namespace.

Export options

The next three options on the form refer to various combinations of two processing features:

  • LexicalAlias properties, and
  • RDF reification vocabulary.

Select 'LexicalAlias properties' to export the selected records using a human-readable string instead of an opaqueId for RDA elements. For example, without this option, the property used for “Title of Work” would be:

http://rdaregistry.info/Elements/w/P10088

Whereas if this option is selected, the same property would be rendered as:

http://rdaregistry.info/Elements/w/titleOfWork.en
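The substitution amounts to swapping the predicate IRI of each statement. The Python sketch below illustrates the idea and is not how RIMMF itself performs the export: the function name to_lexical_alias and the subject IRI are hypothetical, and the lookup table holds only the one pair shown above (a real table would have to come from the RDA Registry).

```python
# Only this pair is taken from the example above; any further entries
# would need to be looked up in the RDA Registry.
ALIASES = {
    "http://rdaregistry.info/Elements/w/P10088":
        "http://rdaregistry.info/Elements/w/titleOfWork.en",
}


def to_lexical_alias(line: str, aliases: dict = ALIASES) -> str:
    """Replace an opaque predicate IRI with its alias (hypothetical helper)."""
    # Subjects and predicates never contain whitespace, so splitting on the
    # first two runs of whitespace isolates them from the rest of the line.
    parts = line.split(None, 2)
    if len(parts) < 3:
        return line  # blank or malformed line: leave untouched
    subject, predicate, rest = parts
    iri = predicate[1:-1] if predicate.startswith("<") and predicate.endswith(">") else None
    if iri in aliases:
        predicate = f"<{aliases[iri]}>"
    return f"{subject} {predicate} {rest}"


# Hypothetical subject IRI, for illustration only.
opaque = ('<http://rimmfdata.com/example/w1> '
          '<http://rdaregistry.info/Elements/w/P10088> "Example title" .')
print(to_lexical_alias(opaque))
```

Predicates that are not in the table are passed through unchanged.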

The main reason to use this option is to facilitate data analysis–since an NTriples

When a '.txt' file is dragged and dropped on RIMMF's 'Import records' form, a new folder will be created, and the contents of the file will be added as individual records in that folder, and appear in a new EI.

If 'Format for RIMMF' is not checked, the selected records will be exported to a file with the '.nt' file extension, and a spacing line will not be output after each record. This exports the data as a single batch of N-Triples.

When an '.nt' file is dragged and dropped on RIMMF's 'Import records' form, a new folder will be created, as above, and the contents of the file will be added as a single record.

'.nt' might be the more useful format to use when sharing records with a non-RIMMF application.

Notes

Both the '.txt' and '.nt' formats use the N-Triples syntax. RIMMF exports N-Triples as 'text/plain' (any character outside US-ASCII will be escaped); thus, either format can be opened in any text editor.

The 'spacing' line mentioned above may contain an arbitrary string, such as '0000', used as a marker for the program to determine when one 'record' ends and the next begins. Other than this, the '.txt' file is exactly the same as the '.nt' file.
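For a non-RIMMF application that receives a '.txt' export, the record boundaries can be recovered (or discarded) with a few lines of Python. This is a rough sketch under stated assumptions: the function names are hypothetical, the subject IRIs are made up, and the marker is assumed to be '0000' as in the example above, so check an actual export before relying on that value.

```python
def split_records(text: str, marker: str = "0000") -> list:
    """Group the lines of a '.txt' export into records (hypothetical helper).

    Assumes each record is followed by a line containing only the marker.
    """
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == marker:
            if current:
                records.append(current)  # the marker closes the current record
            current = []
        elif line.strip():
            current.append(line)  # an ordinary N-Triples line
    if current:
        records.append(current)  # tolerate a missing final marker
    return records


def to_single_batch(text: str, marker: str = "0000") -> str:
    """Drop the marker lines, leaving one stream of N-Triples."""
    lines = [line for record in split_records(text, marker) for line in record]
    return "\n".join(lines) + "\n"


# Hypothetical two-record export, for illustration only.
sample = (
    '<http://rimmfdata.com/example/w1> <http://rdaregistry.info/Elements/w/P10088> "First" .\n'
    "0000\n"
    '<http://rimmfdata.com/example/w2> <http://rdaregistry.info/Elements/w/P10088> "Second" .\n'
    "0000\n"
)
print(len(split_records(sample)))  # 2
```

Passing the result of to_single_batch to a file with the '.nt' extension gives the same content as the single-batch export described above.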

About '.zip' files

In the past, RIMMF supported '.zip' versions of an export. We haven't entirely ruled this out as a future option; but at present, given the relatively small file sizes involved, and the blocking of '.zip' email attachments by many institutions, there's not a dire need to support this in R4.

If there is a need to export a large EI, the user can manually generate a '.zip' from either of the two export formats above.

The RIMMF4 Import process still supports dragging and dropping a '.zip' file onto it.

+ When viewing a Manifestation (export all records in set)

If 'RIMMF' is checked (which is the default), the selected records will be exported as N-Triples; the output file will be assigned a '.txt' file extension, and a spacing line will be output after each record. This extra line maintains the RIMMF distinction between 'records'; otherwise the exported data would simply be a stream of N-Triples. '.txt' is a convenient format for sharing records with other RIMMF users (via email, etc.).

If 'N-Triples' is checked, the selected records will be exported as N-Triples and the output file will be assigned a '.nt' file extension. There is no other difference between this option and the default: except for the file extension and the empty line between records, the exported data is exactly the same in both cases.

, although there are as many as four other different conventions for reification 3).

1)
and selecting 'Export all records in set' from the 'Rda record set' submenu
2)
More information is available from the original N-Triples specification: https://www.w3.org/TR/rdf-testcases/#ntrip_strings
3)
My research identified the following: N-Quads (aka 'named graphs'), RDF's built-in reification vocabulary, singleton properties, RDF-star, and something called “association nodes”.
rimmf4/r6export.txt · Last modified: 2023/07/01 22:47 by Rick
CC Attribution-Share Alike 4.0 International