Linkability: Mapping Examples

Focusing on Nine Common Groups of Bibliographic Metadata Elements

When mapping the non-MARC elements to MARC fields, the research team realized the need to find common ground for bibliographic data. In the non-MARC sample, the 20 digital music collections used a variety of labels for their elements. While we understood the limits of an analysis based on these displayed labels (rather than on the underlying metadata records), this approach was still useful for identifying the types of information considered important for describing these resources. The following list provides a quick overview of the elements collected as part of the non-MARC sample. The term before each colon represents a generalized common element group; the actual element labels found are then listed, with the most common element labels listed first. The number in parentheses represents the number of times each label was found in the sample:


Example: Elements collected from non-MARC sample

Title: Title (13); Alternate Title(s) (3); Other Title (4); Item Title (1); Recording title (1); Also titled (1); Title from tape (1); Uniform title (1); Display title (1)

Composer: Composer (10)

Creator: Creator (5)

Lyricist: Lyricist (4)

Performer: Performer (6); Other performers (1); Primary Performer (1)

Publisher: Publisher (3); Digital publisher (1); Original publisher (2)

Date: Date (4); Recording Date (2); Date of Recording (1); Date text (1); Date original (1); Date digital (1); Metadata entry date (1); Date of digitization (1); Date of publication (1); Manuscript date (1)

Language: Language (7)

Repository: Repository (3)

Subject: Subject (8); Subject heading(s) (2)

Description: Description (11); Physical Description (5); Cover description (1); Description of the work (1)

Note: Note (11); Event Note (1); Performer note (1); Other notes (1); General notes (1)

Type: Type (4); Object type (2); Type of resource (1); Type of recording (1)

The variations in the element labels were handled by clustering them under a common element group name; for example, the “Title” group name covers many differences in how title elements were labeled. Another issue to address was that similar elements can be put into the same “bucket” for the purpose of mapping, for example “Composer”, “Creator”, “Lyricist”, and “Performer”; all of these elements correspond to MARC fields describing responsible bodies. To align both the MARC data structure for describing music resources and the non-MARC descriptions used by the digital collections, the research team created a unified crosswalk that closely follows the metadata element groupings recommended by a related project: “Meaningful Bibliographic Metadata (M2B): Recommendations of a set of metadata properties and encoding vocabularies” (Subirats & Zeng, 2012). In that report, the suggested element groupings are: Title Information, Responsible Body, Physical Characteristics, Location, Subject, Description of Content, Intellectual Property, Usage, and Relation.
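The two-step clustering described above (displayed label, then common element group, then mapping "bucket") can be sketched as a pair of lookup tables. This is only an illustration: the table names, the function names, and the assignment of the "Title" group to "Title Information" are assumptions made for the example, and only a handful of labels from the sample list are included.

```python
from collections import Counter

# Hypothetical lookup: displayed label (lower-cased) -> common element group.
# Only a few labels from the sample list are shown.
LABEL_TO_GROUP = {
    "title": "Title",
    "alternate title(s)": "Title",
    "other title": "Title",
    "uniform title": "Title",
    "composer": "Composer",
    "creator": "Creator",
    "lyricist": "Lyricist",
    "performer": "Performer",
    "primary performer": "Performer",
}

# Hypothetical lookup: common element group -> bucket used for mapping.
GROUP_TO_BUCKET = {
    "Title": "Title Information",
    "Composer": "Responsible Body",
    "Creator": "Responsible Body",
    "Lyricist": "Responsible Body",
    "Performer": "Responsible Body",
}


def bucket_for(label):
    """Return the mapping bucket for a displayed label, or None if unknown."""
    group = LABEL_TO_GROUP.get(label.strip().lower())
    return GROUP_TO_BUCKET.get(group)


def tally(labels):
    """Count how many labels fall into each bucket; unknown labels are 'Unmapped'."""
    return Counter(bucket_for(label) or "Unmapped" for label in labels)


print(tally(["Title", "Lyricist", "Performer", "Repository"]))
```

Labels missing from the first table fall through to "Unmapped", which makes it easy to see which displayed labels still need a human decision.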

In the unified crosswalk for this research project (available from http://lod-lam.slis.kent.edu/about/default.html#music), all of the MARC fields and subfields, the elements from those digital collections, and their matching MO elements were aligned to correspond to the nine common groups identified by the M2B Recommendations. This crosswalk also included the elements from other metadata vocabularies (e.g., Dublin Core and FOAF) that were useful. The mapping relationships were categorized by the following SKOS (Simple Knowledge Organization System) mapping properties to indicate the degree of alignment: narrowMatch, broadMatch, relatedMatch, or closeMatch (see also Table 1, which illustrates how the alignments were presented in the unified crosswalk).

The Unified Crosswalk
Digging into the Most-Linkable Metadata Elements

Although the research team mapped elements from all nine groups of data found in the sample MARC records to the properties of MO and related schemas (such as Dublin Core properties used in music datasets), we realized that particular groups of elements tend to be more useful for interlinking library data with LD resources. Thus five groups were included in the Unified Crosswalk: Title Information, Responsible Body, Subject and Genre, Physical Characteristics, and Location. Considering the use scenarios discussed at the beginning of the paper, and judging by both linkability and usefulness, the team identified the following groups of bibliographic data as the most likely candidates for linking to other data sources.

The linkability of Title
Title information

In the case of music description, the relationship between a title and a sound recording or a sheet music volume is not simply one-to-one, unlike the majority of library collections. A MARC record of a sound recording as described in a library catalog can have several musical works contained in it, such as two or more works by the same person(s) or body (bodies), or works by different persons or bodies. Let’s take an example of a classical music album that has works from several different composers that are brought together under one album title (collective title). Where can the titles (collective and individual) be found?

• In subfield a ($a) of field 245, the title of the album is typically recorded as it appears on the resource to be cataloged, and additional title information such as a subtitle will be placed in 245 subfield b ($b).

Yet titles of individual musical works contained in the album may appear in various other places in the MARC record. To understand where this title information might be found, the research team explored other possible sources of title information in both structured and semi-structured data. By semi-structured data, we mean information that is provided within a field or subfield but without a clear indication of what is a title (i.e., no content designator is used). These include:

• The 246 field (Varying Form of Title) is for other title information that is deemed important enough to receive its own access point. It would be an example of structured data because it is specifically designed to contain a type of title information.

• 700 $t (Title of a Work), 730 $a (Uniform Title), or 740 $a (Related Title), all for additional entries provided in structured data, often refer to the individual components of an album, such as its tracks.

• The 505 field (Formatted Contents Note), on the other hand, may contain structured or semi-structured title information, depending on the practice (examples can be found in Table 2). To count as structured data, the appropriate content designator must be used to indicate that the information is title-related (in this case, $t). Otherwise it is considered semi-structured.

• There is a small chance that some records may contain free-text (no structure), describing the contents in a 500 (General Note) field.
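To make the list of locations above concrete, the following minimal sketch gathers candidate titles from the structured locations named in this section. The record layout is a hypothetical simplification (a list of tag and subfield pairs) rather than the output of any real MARC parser, the sample record is invented for illustration, and the trailing-punctuation cleanup is a rough heuristic.

```python
# Structured title locations named in the text: (field tag, subfield code).
TITLE_LOCATIONS = [
    ("245", "a"), ("245", "b"), ("246", "a"),
    ("700", "t"), ("730", "a"), ("740", "a"), ("505", "t"),
]


def collect_titles(record):
    """Return (tag, code, value) for every subfield at a known title location.

    `record` is a hypothetical simplified structure: a list of
    (tag, [(subfield_code, value), ...]) pairs.
    """
    wanted = set(TITLE_LOCATIONS)
    found = []
    for tag, subfields in record:
        for code, value in subfields:
            if (tag, code) in wanted:
                # Heuristic: drop trailing separator punctuation such as " /".
                found.append((tag, code, value.rstrip(" /:;")))
    return found


# Invented sample record, for illustration only.
record = [
    ("245", [("a", "Piano favorites /"), ("c", "various composers.")]),
    ("505", [("t", "Clair de lune /"), ("r", "Debussy --"),
             ("t", "Liebestraum no. 3 /"), ("r", "Liszt.")]),
    ("700", [("a", "Debussy, Claude."), ("t", "Clair de lune.")]),
    ("500", [("a", "Includes Fur Elise.")]),
]

for hit in collect_titles(record):
    print(hit)
```

The title mentioned only in the free-text 500 note is not picked up, which mirrors the point made above: without a content designator, a title cannot be identified by field and subfield alone.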

The large number of places in the record where music title information might be found presents a conundrum about whether these fields should be included in the Title Information section. We decided to include them here (Table 1) because people can easily search and access music by track instead of just by album these days, and it would behoove libraries to provide access in these ways.

Table 1. Mapping of Elements for Title Information

| MARC Field | Sub-field | Explanation | Music-related Classes | Class Mapping Situation | Music-related Properties | Property Mapping Situation |
|---|---|---|---|---|---|---|
| 240 | $a | Uniform Title | mo:MusicalWork | broadMatch | dct:title | closeMatch |
| 245 | $a | Title Proper | mo:Record | broadMatch | dct:title | closeMatch |
| 245 | $b | Subtitle | mo:Record | broadMatch | dct:title | closeMatch |
| 246 | $a | Variant Title | mo:Record | broadMatch | dct:title | closeMatch |
|  |  |  |  |  | dct:alternative |  |
| 700 | $t | Title of a work | mo:MusicalWork | broadMatch | dct:title | closeMatch |
| 700 | $m | Performance Medium |  |  | mo:instrument | closeMatch |
| 700 | $n | Part/Section |  |  | mo:opus | closeMatch |
| 700 | $r | Music Key |  |  | mo:key | closeMatch |
| 730 | $a | Uniform Title | mo:MusicalWork | broadMatch | dct:title | closeMatch |
| 740 | $a | Related Title | mo:MusicalWork, mo:Track | broadMatch, broadMatch | mo:track, dct:title | broadMatch, closeMatch |
| 505 | $a | Content Note | mo:MusicalWork, mo:Track | broadMatch, broadMatch | mo:track, dc:description | broadMatch, narrowMatch |
| 505 | $t | Title | mo:MusicalWork, mo:Track | broadMatch, broadMatch | mo:track, dc:title | broadMatch, closeMatch |

Note: Abbreviations: mo = Music Ontology; dct = Dublin Core Terms
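A few rows of Table 1 can also be expressed as data, which makes the crosswalk queryable by field and subfield. The sketch below is illustrative only: the `Mapping` type, the `kind` column, and the function names are assumptions introduced for the example and do not describe how the published crosswalk is stored.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    field: str      # MARC field tag, e.g. "245"
    subfield: str   # MARC subfield code, e.g. "a"
    target: str     # class or property, e.g. "mo:Record" or "dct:title"
    kind: str       # "class" or "property" (hypothetical helper column)
    match: str      # SKOS mapping property name from Table 1


# A subset of Table 1, transcribed row by row.
CROSSWALK = [
    Mapping("245", "a", "mo:Record", "class", "broadMatch"),
    Mapping("245", "a", "dct:title", "property", "closeMatch"),
    Mapping("700", "t", "mo:MusicalWork", "class", "broadMatch"),
    Mapping("700", "t", "dct:title", "property", "closeMatch"),
    Mapping("505", "a", "mo:MusicalWork", "class", "broadMatch"),
    Mapping("505", "a", "mo:Track", "class", "broadMatch"),
    Mapping("505", "a", "mo:track", "property", "broadMatch"),
    Mapping("505", "a", "dc:description", "property", "narrowMatch"),
]


def lookup(field, subfield, kind=None):
    """Return all crosswalk rows for a MARC field/subfield, optionally by kind."""
    return [
        m for m in CROSSWALK
        if m.field == field and m.subfield == subfield
        and (kind is None or m.kind == kind)
    ]


def close_properties(field, subfield):
    """Return the properties recorded as closeMatch for a field/subfield."""
    return [
        m.target for m in lookup(field, subfield, "property")
        if m.match == "closeMatch"
    ]


print(close_properties("245", "a"))
print(close_properties("505", "a"))
```

Filtering on the mapping situation in this way is one possible use of the degree-of-alignment labels: a conversion routine might, for instance, treat closeMatch rows differently from broadMatch or narrowMatch rows.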

While 7xx fields bring very useful linkable values to bibliographic data, their application may be inconsistent from one cataloger to another, or from one institution to another. Another issue is current practice, which is based on AACR2. For example, for sound recordings, added entries are not recommended when there are four or more principal persons or bodies involved (AACR2, 21.23). In sound recordings, however, there are often more than three principal responsible bodies involved. Therefore one cannot rely on the 7xx added entries alone when trying to provide useful linked information. With the implementation of RDA, though, this situation can be dramatically improved.

The 505 (Formatted Contents Note) field is important for all music forms and genres. If added entries in 7xx fields are not used consistently and regularly, or, again, if more than three principal persons or bodies are involved but are not assigned access points in the 7xx fields, there will be no indexable access points to the individual musical pieces found on an album. In such cases, a 505 field may be the only place where these titles are recorded. Ideally, the data in the 505 will still be structured, especially when $t is provided (see examples for Scenario 1 and Scenario 2 in Table 2), and the titles will become linkable nodes. However, catalogers might not use those content designators to define the data in the 505, and the titles then cannot be recognized and indexed automatically. In other words, although all tracks might be listed in a 505 field (see examples for Scenario 3 in Table 2), without subfield codes identifying the titles it would be very difficult to use them as link nodes without additional processing effort.
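The difference between a 505 with $t and one without can be sketched as follows. When $t subfields are present, the titles are read directly; otherwise the code falls back to splitting the note text on the conventional " -- " and " / " punctuation. The fallback is an assumption about how such "additional processing" might begin, not the project's method, and it will fail whenever a note departs from that punctuation. The subfield layout and sample values are hypothetical.

```python
def titles_from_505(subfields):
    """Return (titles, structured) for one 505 field.

    `subfields` is a hypothetical simplified structure: a list of
    (subfield_code, value) pairs for a single 505 field.
    """
    # Structured case: titles are explicitly coded in $t.
    tagged = [value.rstrip(" /-") for code, value in subfields if code == "t"]
    if tagged:
        return tagged, True

    # Fallback heuristic: items separated by " -- ", and the title assumed
    # to come before " / " (the statement of responsibility).
    text = " ".join(value for code, value in subfields if code == "a")
    titles = []
    for item in text.split(" -- "):
        title = item.split(" / ")[0].strip().rstrip(".")
        if title:
            titles.append(title)
    return titles, False


# Invented sample values, for illustration only.
with_t = [("t", "Clair de lune /"), ("r", "Debussy --"),
          ("t", "Liebestraum no. 3 /"), ("r", "Liszt.")]
without_t = [("a", "Clair de lune / Debussy -- Liebestraum no. 3 / Liszt.")]

print(titles_from_505(with_t))
print(titles_from_505(without_t))
```

Returning the `structured` flag alongside the titles lets a downstream step treat heuristically extracted titles with more caution than titles that a cataloger coded explicitly.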



This work was supported by a grant from IMLS. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.