Exporting and importing data in RIMMF6

This page is still in progress

This page describes

  1. how the program stores your data
  2. the data options available for an export
  3. the options available for an import

Your RIMMF data

Data format

When an entity record is created and saved in RIMMF, the data is stored in a disk file using the N-Triples format: 'a line-based, plain-text serialisation format for RDF graphs' (Wikipedia). These N-Triples files use the Windows file extension '.nt'. Note: although the 'official' internet media type for an .nt file is “application/n-triples”, these are in essence “text/plain” files. The .nt files produced by RIMMF can be opened, viewed, and edited in any text editor.

As to the actual data within these files, we apply the following conventions.

Unicode characters

Unicode characters are stored in escaped format. For example, the characters which comprise the displayed string

John Le Carré

will be stored as

John Le Carr\u00E9

Here “\u00E9” represents the hexadecimal Unicode code point for “é” (e acute). The hexadecimal number must be exactly four digits long 1). When converting between Unicode characters and their escaped representations, RIMMF uses Normalization Form C.

Data exported in RIMMF6 is always Unicode-escaped.
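
The escaping convention can be sketched in a few lines of Python. This is only an illustration, not RIMMF's own code: the function names are hypothetical, quotes and backslashes inside literals are not handled, and the eight-digit \U form for code points above U+FFFF is taken from the N-Triples grammar rather than from this page (whether RIMMF emits it is not stated here).

```python
import re
import unicodedata

# Matches the four-digit \uXXXX form and the eight-digit \UXXXXXXXX form.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def escape_ntriples(text: str) -> str:
    """Normalize to NFC, then escape every non-ASCII character."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII is kept as-is
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # exactly four hex digits
        else:
            out.append(f"\\U{cp:08X}")    # assumption: N-Triples long form
    return "".join(out)


def unescape_ntriples(text: str) -> str:
    """Turn escapes back into characters and normalize to NFC."""
    def repl(match: re.Match) -> str:
        return chr(int(match.group(1) or match.group(2), 16))
    return unicodedata.normalize("NFC", _ESCAPE.sub(repl, text))


print(escape_ntriples("John Le Carré"))
print(unescape_ntriples("John Le Carr\\u00E9"))
```

Normalizing before escaping means that a decomposed “e” followed by a combining acute accent produces the same escaped output as the precomposed “é”.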

RDA Elements

RIMMF stores RDA Elements, or properties, using opaque identifiers. For example, when storing a triple for the RDA element 'Title of work', the URI

http://rdaregistry.info/Elements/w/P10088

will be used.

The creators of the RDA Registry developed an alternate way of identifying elements called a lexical alias. Using this property, a triple for the RDA element 'Title of work' would be represented as

http://rdaregistry.info/Elements/w/titleOfWork.en

This is a convenient naming convention to use during debugging, as opaque Ids are not easy for humans to read. Unfortunately, despite the implied support for translation 2), a lexical alias value is not included in any of the RDA translations 3).

In addition, RIMMF stores only canonical RDA elements. We do not store triples using the object or datatype subclasses typically defined for each RDA entity. (Data that is imported to RIMMF using these subclasses will be mapped to the corresponding canonical class.)
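
As a rough illustration of switching between the two identifier styles, the sketch below uses a small lookup table. The three URI pairs are the ones that appear on this page; the table and function names are hypothetical, and a real conversion would need the full element list from the RDA Registry.

```python
# Opaque URI -> lexical alias URI, limited to the pairs shown on this page.
OPAQUE_TO_ALIAS = {
    "http://rdaregistry.info/Elements/w/P10088":
        "http://rdaregistry.info/Elements/w/titleOfWork.en",
    "http://rdaregistry.info/Elements/m/P30156":
        "http://rdaregistry.info/Elements/m/titleProper.en",
    "http://rdaregistry.info/Elements/w/P10219":
        "http://rdaregistry.info/Elements/w/dateOfWork.en",
}
# Reverse lookup, built once from the table above.
ALIAS_TO_OPAQUE = {alias: opaque for opaque, alias in OPAQUE_TO_ALIAS.items()}


def to_alias(uri: str) -> str:
    """Return the lexical alias for an opaque URI, or the URI unchanged."""
    return OPAQUE_TO_ALIAS.get(uri, uri)


def to_opaque(uri: str) -> str:
    """Return the opaque URI for a lexical alias, or the URI unchanged."""
    return ALIAS_TO_OPAQUE.get(uri, uri)


print(to_alias("http://rdaregistry.info/Elements/w/P10088"))
```

URIs that are not in the table pass through unchanged, so non-RDA properties are left alone.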

Statements about statements

A triple is a simple statement composed of three terms:

  1. subject,
  2. predicate (or property), and
  3. object (value).

In order to support provenance, applications need to uniquely identify each statement. Given a unique identifier, a statement can be treated as a resource about which additional statements can be made. The term used in RDF for saying 'something about' a statement, or statements, is reification.

There are several ways to reify a statement in RDF. In RIMMF, the N-Quads format is used to assign unique identifiers to statements. N-Quads are an extension of N-Triples in which an optional fourth part, called a graph label, is appended after the object. The graphLabel assigned by RIMMF is always a unique IRI.
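
To make the four-part structure concrete, here is a minimal sketch of a line parser that accepts both plain triples and lines carrying a graph label. It is an assumption-laden illustration rather than a full N-Quads parser: the names are hypothetical, blank nodes are not supported, and the regular expression only covers IRIs and quoted literals of the kind shown on this page.

```python
import re
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str                 # kept with its <...> or "..." delimiters
    graph: Optional[str]     # None when the line is a plain triple


_LINE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'                                # subject, predicate
    r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'  # object
    r'(?:\s+<([^>]*)>)?\s*\.\s*$'                                  # optional graph label
)


def parse_line(line: str) -> Optional[Statement]:
    """Parse one line; return None for comments, blanks, or unsupported input."""
    match = _LINE.match(line)
    if match is None:
        return None
    return Statement(*match.groups())


quad = parse_line(
    '<http://rimmfdata.com/r/rks425> <http://rdaregistry.info/Elements/m/P30156> '
    '"Love Me Do" <http://rimmfdata.com/r/rks425/15> .'
)
print(quad.graph)
```

The graph label returned here is the statement identifier that later provenance statements use as their subject.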

Another way to reify a statement is to use the built-in RDF reification vocabulary; find out more about that here. Note that at present there is no standard, universally accepted, means of reification.

RIMMF-specific data and metadata

The subject of every RIMMF statement is assigned the namespace:

http://rimmfdata.com/

Subdomains may be used to categorise statements, for example:

Note that the application metadata assigned to the '/m' namespace is data that RIMMF uses internally: the version of RIMMF used to create an entity record, the Windows filename in the local storage, the entity template used to create it, timestamps, and so on. An application on the receiving end of this data can safely ignore triples in the '/m' namespace.

Export options

In RIMMF6, an option to export records is available:

  • When viewing the EI
  • When viewing an R-Tree
  • When viewing a Manifestation 4)

When exporting data in RIMMF6, the following options are available:

By Default is meant:

  • RDA Opaque Ids
  • Graph labels for reification

The other options on the form refer to processing options described above:

  • LexicalAlias identifiers
  • RDF reification vocabulary.

In RIMMF6, data is always exported as N-Triples and the output file is assigned a '.nt' file extension. In the past, RIMMF supported '.zip' versions of an export. Given the relatively small file sizes involved when using RIMMF, and the blocking of '.zip' email attachments by many institutions, this option has been removed. Users can easily zip an exported data file themselves if needed.

Brief example

An example based on the Title proper of a Manifestation follows, for each of the four export options. A single provenance statement is included. (Note: in RDA, every statement, taken on its own, is considered an RDA Work.)

Default export

<http://rimmfdata.com/r/rks425> <http://rdaregistry.info/Elements/m/P30156> "Love Me Do" <http://rimmfdata.com/r/rks425/15> .
<http://rimmfdata.com/r/rks425/15> <http://rdaregistry.info/Elements/w/P10219> "20220511T160408" .

Export with Lexical alias Ids

<http://rimmfdata.com/r/rks425> <http://rdaregistry.info/Elements/m/titleProper.en> "Love Me Do" <http://rimmfdata.com/r/rks425/15> .
<http://rimmfdata.com/r/rks425/15> <http://rdaregistry.info/Elements/w/dateOfWork.en> "20220511T160408" .

Export with RDF reification vocabulary

<http://rimmfdata.com/r/rks425> <http://rdaregistry.info/Elements/m/P30156> "Love Me Do" .
<http://rimmfdata.com/r/rks425/15> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<http://rimmfdata.com/r/rks425/15> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://rimmfdata.com/r/rks425> .
<http://rimmfdata.com/r/rks425/15> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://rdaregistry.info/Elements/m/P30156> .
<http://rimmfdata.com/r/rks425/15> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> "Love Me Do" .
<http://rimmfdata.com/r/rks425/15> <http://rdaregistry.info/Elements/w/P10219> "20220511T160408" .

Export with Lexical alias Ids and RDF reification vocabulary

<http://rimmfdata.com/r/rks425> <http://rdaregistry.info/Elements/m/titleProper.en> "Love Me Do" .
<http://rimmfdata.com/r/rks425/15> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .
<http://rimmfdata.com/r/rks425/15> <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://rimmfdata.com/r/rks425> .
<http://rimmfdata.com/r/rks425/15> <http://www.w3.org/1999/02/22-rdf-syntax-ns#predicate> <http://rdaregistry.info/Elements/m/titleProper.en> .
<http://rimmfdata.com/r/rks425/15> <http://www.w3.org/1999/02/22-rdf-syntax-ns#object> "Love Me Do" .
<http://rimmfdata.com/r/rks425/15> <http://rdaregistry.info/Elements/w/dateOfWork.en> "20220511T160408" .
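
The relationship between the Default export and the RDF reification vocabulary export can be sketched as a line-by-line rewrite. The code below is an illustration under stated assumptions, not RIMMF's implementation: the function name is hypothetical, and the regular expression only handles IRIs and quoted literals like those in the examples above.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A line with four terms: subject, predicate, object, graph label.
_QUAD = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'
    r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s+<([^>]*)>\s*\.\s*$'
)


def graph_labels_to_reification(lines):
    """Rewrite graph-labelled statements using rdf:Statement triples."""
    out = []
    for line in lines:
        match = _QUAD.match(line)
        if match is None:
            out.append(line.strip())      # plain triples and comments pass through
            continue
        s, p, o, g = match.groups()
        out.append(f"<{s}> <{p}> {o} .")  # the original statement, label removed
        out.append(f"<{g}> <{RDF}type> <{RDF}Statement> .")
        out.append(f"<{g}> <{RDF}subject> <{s}> .")
        out.append(f"<{g}> <{RDF}predicate> <{p}> .")
        out.append(f"<{g}> <{RDF}object> {o} .")
    return out


default_export = [
    '<http://rimmfdata.com/r/rks425> <http://rdaregistry.info/Elements/m/P30156> '
    '"Love Me Do" <http://rimmfdata.com/r/rks425/15> .',
    '<http://rimmfdata.com/r/rks425/15> <http://rdaregistry.info/Elements/w/P10219> '
    '"20220511T160408" .',
]
for row in graph_labels_to_reification(default_export):
    print(row)
```

Applied to the two lines of the Default export, this yields the six lines shown under 'Export with RDF reification vocabulary': the graph label becomes the subject of the four rdf:Statement triples, and the provenance statement is carried over unchanged.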

Importing data

Importing refers to adding RDA entity records to your RIMMF6 environment.

The data being imported must use the RDA vocabularies and the N-Triples format. If working from a different serialization, such as RDF/XML, convert it to N-Triples first. The character encoding must be escaped Unicode. (Some utilities that convert RDF/XML to N-Triples, such as raptor, also convert UTF-8 to escaped Unicode.)

The interface to the “Import records” utility is located on the main menu under the “Tools” option; when selected, the following form is displayed:

To import a file of RDA entity records, drag and drop the file onto this form.

All of the options supported during an export are automatically supported during an import. When the file is first dropped onto the import form, the program parses the file with the goal of determining whether:

  • the file was produced by a supported version of RIMMF (RIMMF4- )
  • the RDA elements use LexicalAlias or Opaque Ids
  • the reification method is N-Quads or RDF vocabulary

In the case of the latter two items, any needed conversions–from LexicalAlias to Opaque, from RDF vocabulary to N-Quads–will be performed during the initial parse to render the triples into the program's Default format.
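
The kind of check described above can be approximated as follows. This is a guess at one workable heuristic, not a description of RIMMF's actual parser: the function name and return values are hypothetical, an RDA predicate whose local name is not 'P' followed by digits is assumed to be a lexical alias, and detection of the producing RIMMF version is left out because this page does not say how it is recorded.

```python
import re

RDA_ELEMENTS = "http://rdaregistry.info/Elements/"
RDF_STATEMENT = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement>"

_LINE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+'
    r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'(?:\s+<([^>]*)>)?\s*\.\s*$'
)
_OPAQUE_ID = re.compile(r"^P\d+$")   # e.g. P30156


def sniff_export(lines):
    """Return (identifier style, reification method) for a list of lines."""
    identifiers = "unknown"
    reification = "none"
    for line in lines:
        match = _LINE.match(line)
        if match is None:
            continue                              # comment or blank line
        _subject, predicate, obj, graph = match.groups()
        if predicate.startswith(RDA_ELEMENTS):
            local_name = predicate.rsplit("/", 1)[-1]
            if not _OPAQUE_ID.match(local_name):
                identifiers = "lexical-alias"
            elif identifiers == "unknown":
                identifiers = "opaque"
        if graph is not None:
            reification = "n-quads"               # a fourth term is present
        elif obj == RDF_STATEMENT and reification == "none":
            reification = "rdf-vocabulary"
    return identifiers, reification


sample = [
    '<http://rimmfdata.com/r/rks425> <http://rdaregistry.info/Elements/m/titleProper.en> "Love Me Do" .',
    '<http://rimmfdata.com/r/rks425/15> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
    '<http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement> .',
]
print(sniff_export(sample))
```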

If the import process is successful, the user is prompted to enter a folder name, and the imported records are added to the new folder. A subsequent option automates the “Change data folder” action and opens an Entity Index on the new folder.

Notes and Exceptions

For the most part, the import tool expects the incoming data to have been generated by RIMMF6.

RIMMF4 data folders can be dropped onto the RIMMF6 import form with good results, but data from any earlier RIMMF will fail.

There is an attempt to support non-RIMMF data provided it uses N-Triples and RDA Elements. This support is activated by selecting the External data box before dropping the file onto the form. If the import succeeds and the file contains non-RDA properties, they will be added to the respective entity record in a raw format: instead of displaying a human-readable label for a statement, the “Element Label” column in the RIMMF display will contain the property URI used in the imported triple.

Note that, at least in the current release, the External data box must be unchecked when importing RIMMF6 data 5).

When the import tool prompts for a folder name, take care that the new folder name does not already exist. If it does, the import process will fail and will need to be restarted.

When RIMMF6 (or in some cases, RIMMF4) produces an export file (using the Export option in the Entity Index menu), a comment line will appear at the beginning of each entity record.

Something like this:

# BEGIN http://rimmfdata.com/r/rks13637
<http://rimmfdata.com/r/rks13637> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdaregistry.info/Elements/c/C10004> .

This comment may be useful when parsing the file but it has no semantic purpose.
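
For instance, a receiving application could use the comment to group lines by entity record. The following is a minimal sketch under the assumption that every record starts with a '# BEGIN' line followed by the record's URI, as in the example above; the function name is hypothetical.

```python
def split_records(lines):
    """Group statement lines by the URI given in each '# BEGIN' comment."""
    marker = "# BEGIN "
    records = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(marker):
            current = line[len(marker):].strip()
            records[current] = []          # start a new entity record
        elif line and not line.startswith("#") and current is not None:
            records[current].append(line)  # statement belonging to the record
    return records


export = [
    "# BEGIN http://rimmfdata.com/r/rks13637",
    "<http://rimmfdata.com/r/rks13637> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://rdaregistry.info/Elements/c/C10004> .",
]
for uri, statements in split_records(export).items():
    print(uri, len(statements))
```

Lines that appear before the first '# BEGIN' comment are ignored by this sketch; a stricter reader might instead group by subject URI.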

1) More information is available from the original N-Triples specification: https://www.w3.org/TR/rdf-testcases/#ntrip_strings
2) evidenced in the .en suffix
3) i.e. only English values are available
4) and selecting 'Export all records in set' from the 'Rda record set' submenu
5) but not RIMMF4, strangely enough; this anomaly is something to be resolved in a future update
rimmf4/r6export.txt · Last modified: 2024/01/09 19:54 by 127.0.0.1
CC Attribution-Share Alike 4.0 International