Linkability: Data Structure

The Relationship Between Data Structure and Linkability
Linkability’s Dependency on Suitably Structured and Semi-structured Data

The paper shows that linkability depends on suitably structured and semi-structured data, and on how the library community has produced that data. The researchers also share the results of their alignment work, which aims to incorporate linked data (LD) music information sources into bibliographic records by identifying the metadata elements that show the most promise for interlinking.

One of the key tasks of the project was to find linkable elements from library bibliographic data that will allow library data to be linked with those available music information sources. The research team mapped the fields, subfields, and relator codes of the MARC format to the classes and properties of the Music Ontology, a vocabulary that has come to serve as the basis for many other application profiles for music information sources. Relevant elements in MARC records were identified and tested using a representative sample of MARC records that encompasses various genres of music. The team also studied metadata structures used by 20 digital music collections and mapped them to those MARC fields. Key fields in MARC music records that contain resources with value for LD applications were identified.
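A crosswalk of the kind described above can be sketched as a simple lookup table. The sketch below is only an illustration: the MARC tags, subfields, and relator codes are real, but the specific pairings with Music Ontology and Dublin Core terms are assumptions made for this example and are not the project's published mapping. The names `CROSSWALK` and `lookup` are hypothetical.

```python
from typing import Optional

# Namespaces of the vocabularies used in this illustration.
MO = "http://purl.org/ontology/mo/"
DCT = "http://purl.org/dc/terms/"

# Key: (MARC tag, subfield code, relator code or None).
# Value: URI of the RDF term the element is mapped to.
# ASSUMPTION: these pairings are illustrative, not the authors' results.
CROSSWALK = {
    ("245", "a", None): DCT + "title",         # title statement
    ("100", "a", None): DCT + "creator",       # main entry, no relator given
    ("100", "a", "cmp"): MO + "composer",      # main entry, relator "composer"
    ("700", "a", None): DCT + "contributor",   # added entry, no relator given
    ("700", "a", "prf"): MO + "performer",     # added entry, relator "performer"
}


def lookup(tag: str, subfield: str, relator: Optional[str] = None) -> Optional[str]:
    """Return the mapped term, falling back to the relator-free entry."""
    exact = CROSSWALK.get((tag, subfield, relator))
    if exact is not None:
        return exact
    return CROSSWALK.get((tag, subfield, None))


if __name__ == "__main__":
    print(lookup("700", "a", "prf"))  # relator-specific mapping
    print(lookup("100", "a", "arr"))  # unknown relator, generic fallback
    print(lookup("999", "z"))         # unmapped element
```

Keying the table on the relator code as well as the field and subfield reflects the point made above: the same MARC field can map to different properties depending on the role the named agent plays.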


The Need for Greater Granularity in Resource Description

Users are increasingly interested in directly finding, identifying, selecting, and obtaining individual pieces of music (e.g., songs) rather than sifting through album-level descriptive data. Leazer (1992) described the challenge of finding a work within a container such as a compiled collection of works. For users to find those smaller units by title, cataloging records for such works must either list each individual song, piece, or movement in a table of contents or summary note, or provide an access point (i.e., added entry) for each title (Leazer, 1992; King, 2007). A good example of such a catalog is Iris (http://iris.banq.qc.ca/), the Online Public Access Catalogue (OPAC) of the Bibliothèque et Archives nationales du Québec, which indexes to the level of individual pieces in albums (e.g., printed sheet music books and CDs). Still, much of the world’s library data for music consists of cataloging records created to describe the larger units, not individual pieces. Even where records do reveal more detailed contents in the note fields, those listings might not be directly accessible to users in an online catalog. We discuss this problem further in the results and discussion section of this paper.
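To make the granularity problem concrete, the following minimal sketch shows how piece-level titles might be recovered from a contents note. It assumes the common convention in which items in a basic contents note are separated by " -- " and an optional statement of responsibility follows " / ". The function name `split_contents_note` and the sample note are hypothetical, and real notes often deviate from this pattern (durations, nested works, inconsistent punctuation), so this is not a general parser.

```python
import re
from typing import List


def split_contents_note(note: str) -> List[str]:
    """Split a basic contents note into individual titles.

    ASSUMPTIONS: items are separated by ' -- ', and anything after ' / '
    within an item is a statement of responsibility to be dropped.
    """
    titles = []
    for part in re.split(r"\s+--\s+", note.strip()):
        # Keep only the title portion before the statement of responsibility.
        title = part.split(" / ")[0]
        # Remove surrounding whitespace and the note's closing full stop.
        # (This also strips a final period from titles ending in an abbreviation.)
        title = title.strip().rstrip(".").strip()
        if title:
            titles.append(title)
    return titles


if __name__ == "__main__":
    sample = "Song one -- Song two / A. Composer -- Song three."
    for title in split_contents_note(sample):
        print(title)
```

Each recovered title could then serve as an access point in its own right, which is what album-level records with unindexed notes fail to provide.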


Bridging the Divide Between Library Data Structures and Online Metadata

The challenge of enhancing library bibliographic data for music discovery has recently been a topic of discussion within the Music Library Association (MLA). In 2012, the MLA Board of Directors endorsed “Music Discovery Requirements,” a report from the Association’s Emerging Technologies and Services Committee (subsequently published in 2013 by Newcomer et al. as “Music Discovery Requirements: A Guide to Optimizing Interfaces” in Notes). The guide addresses practical issues arising from the legacy data generated in the MARC and AACR2 (Anglo-American Cataloguing Rules, Second Edition) environment. It provides recommendations for each bibliographic description field required for music works, expressions, and manifestations (the FRBR entities) in the context of indexing and display in discovery interfaces (Newcomer et al., 2013). Glennan (2012) has provided another timely document, which discusses the development of Resource Description & Access (RDA) and its impact on the description of, and access to, music materials.


From Document-centric to Data-centric Metadata: Moving from MARC to RDF-based Description

The W3C Library Linked Data Incubator Group Report pointed out that today’s library data is not integrated with Web resources: “Library data today resides in databases which, while they may have Web-facing search interfaces, are not deeply integrated with other data sources on the Web. There is a considerable amount of bibliographic data and other kinds of resources on the Web that share data points such as dates, geographic information, persons, and organizations. In a future Linked Data environment, all these dots could be connected” (W3C Library Linked Data Incubator Group, 2011, Section 3.1.1). Noting the significant increase in activity over the past few years to integrate library metadata with the Semantic Web, Dunsire and Willer (2011a) laid out detailed examples of how a traditional library bibliographic record can be disaggregated into catalog records consisting of RDF triples, and described the benefits of such data. They also explained the required components, including URIs and controlled vocabularies. In another paper, they presented recommendations for representing UNIMARC formats in RDF for bibliographic and authority data (Dunsire and Willer, 2011b). Similarly, Alemu et al. (2012) called for a conceptual shift from the current document-centric metadata to data-centric metadata, moving from MARC to RDF-based description. Their paper presented methods of achieving this goal without disrupting current library metadata operations.
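The idea of disaggregating a record into RDF triples can be illustrated with a small sketch. It is not the method of Dunsire and Willer or of Alemu et al.: the record layout, the `example.org` URIs, and the function names are all hypothetical, and Dublin Core terms are used only because they are widely known. The sketch uses plain Python rather than an RDF library, and emits N-Triples-style lines.

```python
from typing import Dict, List, Tuple

DCT = "http://purl.org/dc/terms/"

Triple = Tuple[str, str, str]


def uri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def record_to_triples(record_uri: str, record: Dict) -> List[Triple]:
    """Turn one flat record (hypothetical layout) into separate statements."""
    subject = uri(record_uri)
    triples = [(subject, uri(DCT + "title"), literal(record["title"]))]
    # Agents are referenced by URI rather than by name string, so that other
    # datasets describing the same agent can be connected to this record.
    for agent_uri in record.get("creators", []):
        triples.append((subject, uri(DCT + "creator"), uri(agent_uri)))
    if "issued" in record:
        triples.append((subject, uri(DCT + "issued"), literal(record["issued"])))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize the statements one per line."""
    return "\n".join(s + " " + p + " " + o + " ." for s, p, o in triples)


if __name__ == "__main__":
    sample = {
        "title": "Sample album",
        "creators": ["http://example.org/agent/1"],  # hypothetical URI
        "issued": "1999",
    }
    print(to_ntriples(record_to_triples("http://example.org/record/42", sample)))
```

Each statement stands alone and can be reused or linked independently of the record it came from, which is the practical difference between the document-centric and data-centric views.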

Outside of the library community, many studies, projects, and implementations have taken place in music-related communities. A number of music datasets published in the last decade, discussed below in the data collection and analysis section of this paper, can be considered pioneers in Linked Open Data (LOD). Projects that utilize LOD datasets have resulted in promising products. The Linked Jazz (http://linkedjazz.org/) project used an LD approach to reveal, from digital archives of jazz history, the relationships between musicians and their community networks. The project demonstrated the considerable potential for exploring and linking resources such as agents and events in the musical world (Pattuelli, 2012). Yet, despite these advances and the new resources available, few libraries have tried to bridge bibliographic data to linked music data.

How can library bibliographic data be enhanced using relevant LD, and thus satisfy today’s user needs? For example, could library bibliographic data be connected with LD music datasets such as BBC Music, DBTune, and MusicBrainz, and thus enable users to mash up those useful data points, as in the example of OpenAgris (see Figure 1)? One of the key tasks is to find linkable elements in library bibliographic data that will allow library data to be linked, through their vocabularies, with the music information sources that are available as LOD datasets. This will pave the way to successful interlinking. The following sections report the methodology used by the authors that led to the representative mapping results and discuss issues relating to the elements in MARC records identified as most linkable. The article aims to show how libraries may begin to overcome the barriers between library data and linked music data.



This work was supported by a grant from IMLS. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.