Metadata Research Focuses: Music Information

Brief Overview
Connecting to Music Through the Semantic Web Music Information as Linked Open Data

We looked closely at this model before and while doing our vocabulary alignment work. Notice that distinctions are made between various events: Composition, Performance, Sound, Recording, Signal, and Record are all classes of events that relate to the Musical Work itself.

Literature Review

Music information as linked open data. To describe a data source, using the term LD implies that the data have been structured in a way so that they can be interlinked with other data on the web, using a particular set of best practices. The four principles, or best practices, of LD, as first defined by Tim Berners-Lee (2007), are stated as follows:

1. Use URIs [uniform resource identifiers] as names of things.
2. Use HTTP URIs, so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things (Heath & Bizer, 2011, chapter 2).
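
The four principles can be illustrated with a minimal in-memory sketch. The URIs under example.org, the DESCRIPTIONS table, and the look_up and discover functions are hypothetical names introduced only for illustration; a real service would answer HTTP requests with RDF rather than return Python tuples.

```python
# Hypothetical data: each HTTP URI names one thing (principles 1 and 2).
WORK = "http://example.org/work/magic-flute"
COMPOSER = "http://example.org/person/mozart"

# Looking up a URI yields triples about it (principle 3); some objects
# are themselves URIs that lead to further descriptions (principle 4).
DESCRIPTIONS = {
    WORK: [
        (WORK, "http://purl.org/dc/terms/title", "The Magic Flute"),
        (WORK, "http://purl.org/dc/terms/creator", COMPOSER),
    ],
    COMPOSER: [
        (COMPOSER, "http://xmlns.com/foaf/0.1/name", "Wolfgang Amadeus Mozart"),
    ],
}


def look_up(uri):
    """Return the triples published for a URI (empty list if unknown)."""
    return DESCRIPTIONS.get(uri, [])


def discover(start_uri):
    """Follow links between descriptions and collect every URI reached."""
    seen, queue = set(), [start_uri]
    while queue:
        uri = queue.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for _, _, obj in look_up(uri):
            if obj in DESCRIPTIONS:  # the object is a URI we can look up
                queue.append(obj)
    return seen
```

Starting from the work's URI, discover reaches the composer's description only because the work's own description links to it, which is the point of the fourth principle.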

Data sets that have published their data as LD using the RDF standard are often referred to as “RDFized.” Creating or converting data sets to RDF is an important first step, but to be truly useful to a broad spectrum of potential users they must also be open data. A data set can be described as “open” when it is also easily accessible and can be modified or reused without restriction other than requiring users to honor attribution and integrity requirements (Open Definition, n.d.).

At the time of this writing, the CKAN Data Hub contained 41 data sets tagged with the term “music,” 25 of which were labeled as open data. Data sets identified as being music-related include a variety of information about musical works, musicians, and events such as live performances and broadcasts. Some sets contain metadata relating to specific collections, whereas others store data about music uses, such as playlists. Other data sets not exclusively related to musical works or artists that are also valuable sources of music-related information include DBpedia, Freebase, the Library of Congress Subject Headings (LCSH), and the Virtual International Authority File (VIAF). Most recently, the OCLC (Online Computer Library Center) announced the “first step toward adding linked data to WorldCat by appending schema.org descriptive markup to WorldCat.org pages” (OCLC, 2012). Schema.org provides a set of vocabularies for on-page markup, allowing search engines to understand the information on web pages, especially the structures of the original data source, before they are published in HTML. As a result, richer and more meaningful search results can be provided to users, leading to more relevant information on the web. WorldCat is the largest multilingual bibliographic catalog in the world, containing thousands of records relating to musical works (either musical scores or recordings). Along with millions of publicly available WorldCat records, these music-related bibliographic records are now ready for use by intelligent web crawlers, which can make use of them in search indexes and other applications.
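
As a rough illustration of schema.org markup for a music-related record, the sketch below serializes a description as JSON-LD. The choice of JSON-LD is an assumption made for brevity (the OCLC announcement quoted above does not specify the syntax shown here), the record values are invented, and to_script_tag is a hypothetical helper. The types MusicRecording and MusicComposition and the properties recordingOf and composer come from the schema.org vocabulary.

```python
import json

# Illustrative values only; this is not an actual WorldCat record.
record = {
    "@context": "https://schema.org",
    "@type": "MusicRecording",
    "name": "Die Zauberflöte (recording)",
    "recordingOf": {
        "@type": "MusicComposition",
        "name": "Die Zauberflöte",
        "composer": {"@type": "Person", "name": "Wolfgang Amadeus Mozart"},
    },
}


def to_script_tag(data):
    """Serialize a description as a JSON-LD script element for an HTML page."""
    body = json.dumps(data, ensure_ascii=False, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


markup = to_script_tag(record)
```

A crawler that understands schema.org can read from such markup that the page describes a recording of a composition by a named person, rather than inferring this from the page's visible text.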

Integrating Structured Data Into Information Services.

Although information services must have access to LOD in order to enrich their records with other pertinent information sources, it is equally important to develop ways of querying those sources and aggregating structured data found in those external data sets within the service. An example of an information service that provides easy access to LOD about persons, organizations, places, concepts, etc., is DBpedia, which is an RDFized version of the structured data found on Wikipedia and was developed by the Free University of Berlin, the University of Leipzig, and OpenLink Software (DBpedia, 2011). It defines LD URIs for millions of concepts; many data providers set RDF links from their data sets to DBpedia, making DBpedia one of the central interlinking-hubs of the data sets in the Linking Open Data group as evidenced by its prominence in the visualization of the LOD Cloud (Linking Open Data Cloud, 2011).
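
Querying such a source can be sketched as follows. The code only builds a SPARQL query and the corresponding request URL; it makes no network call. The endpoint address, the format parameter, and the resource URI are assumptions about DBpedia's public service, and describe_query and request_url are hypothetical helper names.

```python
from urllib.parse import urlencode

# Assumed endpoint: DBpedia's public SPARQL service.
ENDPOINT = "https://dbpedia.org/sparql"


def describe_query(resource_uri, limit=25):
    """Build a SPARQL query listing the properties and values of one resource."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<{resource_uri}> ?property ?value . "
        f"}} LIMIT {limit}"
    )


def request_url(resource_uri, limit=25):
    """Encode the query as a GET URL; no network request is made here."""
    params = {
        "query": describe_query(resource_uri, limit),
        # Assumed parameter for requesting JSON results from the endpoint.
        "format": "application/sparql-results+json",
    }
    return ENDPOINT + "?" + urlencode(params)


url = request_url("http://dbpedia.org/resource/The_Magic_Flute")
```

An information service could fetch this URL and merge the returned property-value pairs into its own record for the work, which is the kind of aggregation described above.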

More recently, Google’s Knowledge Graph has been implemented (see Fig. 1). Although it is not labeled as an implementation of LOD, Google’s official blog indicates that it displays data about things, people, and places drawn from diverse sources such as Wikipedia, Freebase (which is owned and funded by Google, with an approach similar to that of DBpedia), other subject-specific sources, and Google’s own data stores (Singhal, 2012).


FIG. 1. Knowledge Graph generated on June 30, 2012, through a Google Search on the name “Mozart”; then from the Mozart graph to The Magic Flute graph. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

This example from Google Knowledge Graph shows how data can successfully be integrated into information services. This functionality relies, however, on a robust mapping of metadata between the data sets and on the alignment of ontological classes. To retrieve relevant information, the relationships between the ontological structure of the source data set and that of the target data sets must be defined so that queries will be directed to the data most closely corresponding to the users’ needs.

How Users Search For and Use Musical Works and Music-Related Information

Since the early 1990s, increased interest in music information-seeking behavior and music information retrieval (IR) methods has resulted in numerous studies of the search activities of users seeking manifestations of musical works and music-related information as well as analysis of the performance of music IR systems. In the following section, a review of relevant results from these studies is presented.

How Users Search for Music Scores and Recordings in Catalogs and Other Databases

In 2005, King reviewed five studies of users searching for music scores and recordings in manual and automated catalogs in order to determine what types of queries were most often used, what access points are most helpful to music searchers, and what sorts of information would be most important to include in the bibliographic record. King found several common findings in the areas of known-item and subject searching among the studies he analyzed. Known-item searching may be defined as search activity in which the user knows that a particular work exists and has certain information about it, such as composer, title, or performer. In subject searching, the user does not have a particular work in mind but instead may be looking for works drawn from particular types or genres of music.

The first key finding shared among all the studies reviewed was that known-item searches for music materials in catalogs are much more common than subject searches. This result was also corroborated by Lee in his 2010 study of natural language queries for music material. An important caveat to this phenomenon relates to the type of users studied; in the studies reviewed by King, searchers came from college and university environments. The predominance of known-item searching may not be generalizable to the general population of music searchers (2005, p. 7).

Author searches were by far the most common type of known-item searches, constituting over 50% of searches in all studies King reviewed. Title searching is less commonly used, most likely because well-known works (particularly classical pieces) may have numerous titles; for example, Beethoven’s Symphony Number 9 has many nicknames, such as the Ode to Joy, and the title may be translated into many different languages. The uniform title, familiar to catalogers, attempts to solve this problem. A related problem for title searching is the challenge of finding a work when it is compiled into a collection of works; only if the work is noted in the cataloging record through a contents listing, given a name-title or title added entry, or cataloged as an analytic will the user easily be able to find the music (Leazer, 1992). Leazer, cited by King, found that the fewer music pieces found on a sound recording, the more likely a user was to be able to locate the recording in the catalog (King, 2005, p. 14). Subject searching, which most often translates into a search by genre, form, or instrumentation, tends to occur more often when performers are looking for music written for their particular instrument, or when composers and theorists wish to study particular types of music (King, 2005, p. 8).

Gardinier (2004), who studied the search behavior of music faculty to determine which access points they considered most useful, found that there were six data elements that participants employed 75% of the time: composer, title, performer, genre, opus number/thematic number, and instrumentation. Over two thirds of this group also found the following elements to be useful: edition (various types), format, publisher, manuscript, and source of lyric (Gardinier, 2004, p. xiv). Gardinier suggests that catalogs increase the number of access points for music, making information such as genre, instrumentation, format, and publisher indexed and searchable. She also suggests that the authors of liner notes be brought under name authority control.

Changes in User Search and Retrieval Behavior for Music Recordings

Searching for music recordings has been transformed in the wake of the introduction of services such as iTunes and Amazon. As music search and purchase activities migrated from record stores and other retail facilities to Internet retailers and streaming services, the ways in which music users looked for and consumed music recordings began to shift as well. This searching behavior may be triggered by the documented phenomenon of users searching for particular works after being exposed to music recordings throughout the day in a variety of environments in the home, workplace, and retail establishments and via commercials and entertainment media such as music videos, movies, and television shows (Cunningham, Bainbridge, & McKay, 2007). In the new model of searching, users often look first for individual songs rather than seeking albums containing collections of songs or pieces. By and large, library catalogs still reflect the older model of music description, taking the album as the primary focus for the cataloging record. Although titles of individual pieces may be recorded in the cataloging record, they are often not indexed individually, as will be shown in the section on mapping library data to other external information sources.

How Users Search for Music Information (information relating to works, artists, and particular instantiations of works)

Users searching for musical works may also be interested in additional information about the work, its creator(s), and circumstances of its creation and distribution. LD offers the possibility to provide this information immediately, rather than having the user perform an additional search using other information services. What types of music information are users most interested in obtaining? Using content analysis techniques, Lee examined 1,705 Google Answers queries relating to music and found that the top three types of queries related to help in identifying musical works (43.8%), identifying artists (36.4%), and locating particular music recordings (16.7%; Lee, 2010, p. 1035). Although one may be tempted to characterize these queries as “known-item searches,” Lee distinguishes them from those conducted in the catalog (as discussed earlier), because they are really queries for information about an object, not a search for the items themselves as one may assume is the given objective of most catalog searches. Users could access such information via cataloging records, however, if key access points such as creator and title were linked to external data sets.

Content-based Music Retrieval

Another relevant area of research in music information retrieval employs analysis of aspects of the recording’s signal such as the melody, timbre, tempo, mood, or other musical characteristics that can be used for queries and filters. Although musicians may long have been familiar with thematic catalogues found in the table of contents of a composer’s collected works, content-based information retrieval systems for music can allow searching by sequences of notes (“query by humming”) or by other features.
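
Searching by a sequence of notes can be sketched with a simple interval comparison. This is a minimal illustration, not the method of any system cited here: it assumes melodies are already available as MIDI note numbers and it requires an exact match, whereas practical query-by-humming systems must tolerate pitch and rhythm errors. The function names are hypothetical.

```python
def intervals(pitches):
    """Convert a pitch sequence (MIDI note numbers) to successive intervals."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def find_theme(query, melody):
    """Return start positions where the query's interval pattern occurs.

    Comparing intervals rather than absolute pitches makes the match
    independent of the key in which the query was sung or played.
    """
    q, m = intervals(query), intervals(melody)
    if not q:
        return []
    return [i for i in range(len(m) - len(q) + 1) if m[i:i + len(q)] == q]


# Opening of the "Ode to Joy" theme as MIDI note numbers (E E F G G F E D).
ode = [64, 64, 65, 67, 67, 65, 64, 62]
# The fragment F G G F, hummed a whole tone lower than written.
hummed = [63, 65, 65, 63]

positions = find_theme(hummed, ode)
```

Because only intervals are compared, the hummed fragment is found even though none of its absolute pitches matches the stored melody.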

Probabilistic information retrieval built into such systems also provides the engine for music recommender systems such as Pandora and Last.fm, which will serve up new musical selections to users based on analysis of various criteria found in music tracks that the user has already heard and indicated that he or she likes. The data driving Pandora’s recommendation system was gathered through the Music Genome Project, a taxonomy of musical information that provides a structure for analyzing hundreds of distinct musical characteristics (Pandora, 2012).

Content-based retrieval may also aid in known-item searches when the listener hears a musical work on the radio, in a television show or film, or at a live performance that he or she wishes to find, but for which the creator(s), title, or performer(s) is unknown. Lee suggests that musical similarity can be helpful in information retrieval systems to generate potential answers to music queries, based on his study of the Google Answers data (Lee, 2010, p. 1041). However, Lee indicates that the concept of musical similarity may be poorly defined at this point and should not use only analysis of content features of the musical work. Contextual information about creators, their musical influences, and their origins may also impact user judgment of musical similarity among a group of works.

Context as a potential source to expand retrieval of music information

A musical work is located within a larger web of information, including data about the composer, the circumstances of the work’s creation, the various performers of the work, the details about performance events, and the work’s relationships to other works, as traced by scholars and other consumers of the work. These bits of information together constitute a history of the production, distribution, and use of the work. Smiraglia notes that “Musical works exist in time as a sort of continuum along which all instantiations lie—even a manuscript, which might lie near one end, is only one instantiation, and likely the most authoritative. Rather, works derive much of their meaning from their reception and continuous reinterpretation in evolving cultures” (Smiraglia, 2002, p. 753). The next generation of information retrieval systems must provide ways for music seekers to locate not just the work itself but the larger body of contextual information about the work. In their study of music information needs and uses, Lee and Downie (2004) found that users ranked descriptive metadata and extramusical information highly; after the title of the work(s), users valued lyrics and artist information as the second and third most valued data they were seeking. They also found that users prefer online resources for extramusical information.

More and more of the contextual information about works’ creation and reception has begun to be compiled and distributed online; an increasing amount of this information is available in the form of LOD. In the past, this information was accessible only via the liner notes found with the items themselves, or in popular or research publications, but, now, it is increasingly available in online databases. Waters and Allen urge new research and development of tools for connecting traditional library catalog resources to these new sources of information (2010, pp. 251–252). This additional contextual information may also help searchers differentiate among the many instantiations of a popular musical work that may proliferate through multiple derivatives and mutations, such as interpretations or translations of the work over time (Smiraglia, 2002, p. 761).

Ontologies for Music

Within the context of this research, the Music Ontology (MO) and related ontologies and applications are an important part of the literature review. In building the MO, various existing ontologies were reused and integrated, including the Timeline Ontology, the Event Ontology, the Friend Of A Friend (FOAF) Ontology, and the Functional Requirements for Bibliographic Records (FRBR) Ontology as a foundation (Raimond, Abdallah, Sandler, & Giasson, 2007). The latter two ontologies give the MO the ability to describe music-specific concepts with great flexibility. The most prominent use of the MO is by the British Broadcasting Corporation (BBC), which uses LD technologies in both its BBC Music and BBC Programmes services. The MO and Programmes Ontology form the basis for how the BBC uses LD (Kobilarov et al., 2009). The MO was used in several influential music-related projects, such as the MusicBrainz metadata repository, the DBTune projects, the Magnatune repository, and the Jamendo repository. Corthaut, Govaerts, Verbert, and Duval (2008) conducted research on eight music metadata standards, including the MO. The standards were compared “to determine how good the metadata clusters are represented in the different metadata standards” (p. 252). The MO was represented in all the metadata property clusters except for the meta-metadata cluster. Then, the standards were compared with the application domains and given a score from 1 to 100 based on how well they perform in that application domain. The MO performed very well in all eight of the application domains. Music library/encyclopedia, music recommendation, and music retrieval were the three areas that were most applicable to libraries.

Using Semantically Interlinked Online Communities (SIOC) and MO together, Web 2.0 music content such as user-generated music playlists can be represented (Passant & Raimond, 2008). Last.fm uses the MO and Event Ontology to represent the listening habits of users, while FOAF can be used for this purpose as well by using the foaf:topic_interest property. These services are using social networks to create music recommender systems in which tagged content is available to use as information for music recommendations. By using the Meaning Of A Tag (MOAT) framework, URIs can be created for tagged content, enabling users to make recommendations through the relationships established between the URIs. Meanwhile, information on the LOD cloud, such as information from DBpedia, can be used to help find relevant musical content. These strategies “can be combined together for advanced querying and suggesting data” because “the underlying models are also interlinked” (Passant & Raimond, 2008, p. 10).

On the other hand, the MO also allows for many extensions such as the Key Ontology, Instrument Taxonomy, Genre Taxonomy, and others (Raimond et al., 2007). Jacobson, Raimond, and Sandler (2009) presented a Similarity Ontology because the MO has only a limited ability (using the mo:similar_to property) to indicate the similarity between two different music artists or songs. The Similarity Ontology was created to provide a rich and flexible way to describe similar relationships between different concepts in the LOD environment. The creation of the Similarity Ontology was necessary because of a lack in the MO, but the Similarity Ontology would not be usable without the MO framework and “the technology and infrastructure provided by the Linked Data community” (p. 37). An ontology for a music recommendation system that uses mood and other features in the music was created to fill the gaps that the MO does not cover. It was developed using concepts from MO and other music-related ontology projects such as MusicBrainz (Rho, Han, & Eenjun, 2009, p. 715).

Extended use of MO can also be found in related endeavors such as the Foafing the Music project (Celma, 2006). The goal of the project is to create a music recommendation system that uses information about things like listening habits to help recommend music recordings for listeners. The project created its own ontology that is mapped with the MusicBrainz Ontology, but it uses the MPEG-7 standard for description instead of MO. Though not related to the MO directly, the project does fill a gap that the MO does not cover. This system uses the information from the FOAF profiles, such as interests and listening habits, to detect the music artists and bands. Once this information is found, the system computes any similar artists and ranks them by relevance.
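
The last two steps described above (finding similar artists and ranking them by relevance) can be sketched as follows. This is not the Foafing the Music implementation: the artist names and similarity scores are invented, the scores are treated as symmetric, and ranking by the best available score is one simple choice among many.

```python
# Hypothetical similarity scores between artists (0 = unrelated, 1 = identical).
SIMILARITY = {
    ("Artist A", "Artist B"): 0.9,
    ("Artist A", "Artist C"): 0.4,
    ("Artist D", "Artist C"): 0.7,
}


def similar_to(artist):
    """Yield (other_artist, score) pairs, treating similarity as symmetric."""
    for (left, right), score in SIMILARITY.items():
        if left == artist:
            yield right, score
        elif right == artist:
            yield left, score


def recommend(profile_artists):
    """Rank artists not already in the profile by their best similarity score."""
    best = {}
    for artist in profile_artists:
        for other, score in similar_to(artist):
            if other not in profile_artists:
                best[other] = max(score, best.get(other, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Artists detected in a (hypothetical) FOAF profile's interests.
ranking = recommend({"Artist A", "Artist D"})
```

Here Artist B ranks first because of its strong similarity to Artist A, and Artist C follows with the higher of its two scores.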

Approaches to Alignment

The efforts of contributing to and consuming LD cannot proceed without researchers making connections between different ontologies and metadata schemas that have been developed, used and reused, or derived by different communities for meeting different functional requirements. “Schema Mapping and Data Fusion” were listed as research challenges by Bizer, Heath, and Berners-Lee (2009) in their paper “Linked data—the story so far” (pp. 16–17). Technologically, RDF Schema and OWL Web Ontology Language have defined terms regarding relationships between ontological classes (e.g., owl:equivalentClass and rdfs:subClassOf) and properties (e.g., owl:equivalentProperty and rdfs:subPropertyOf) to support the connection among different schemas. Many researchers have tried different approaches to establish such connections.
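
A minimal sketch of how such mapping terms support queries across schemas is given below. The equivalence between mo:MusicArtist and dbpedia:MusicalArtist is an assumption made for illustration, not a mapping reported in the literature reviewed here, and classes_covering is a hypothetical helper. Statements are held as plain tuples rather than in an RDF library.

```python
# Illustrative alignment statements; the equivalence mapping is an assumption
# made for this sketch.
ALIGNMENTS = [
    ("mo:MusicArtist", "owl:equivalentClass", "dbpedia:MusicalArtist"),
    ("mo:SoloMusicArtist", "rdfs:subClassOf", "mo:MusicArtist"),
    ("mo:MusicGroup", "rdfs:subClassOf", "mo:MusicArtist"),
]


def classes_covering(target, alignments=ALIGNMENTS):
    """Return every class whose instances also belong to `target`.

    Equivalence is followed in both directions; rdfs:subClassOf is followed
    from the superclass down to its subclasses.
    """
    found, queue = set(), [target]
    while queue:
        current = queue.pop()
        if current in found:
            continue
        found.add(current)
        for subject, predicate, obj in alignments:
            if predicate == "owl:equivalentClass":
                if subject == current:
                    queue.append(obj)
                elif obj == current:
                    queue.append(subject)
            elif predicate == "rdfs:subClassOf" and obj == current:
                queue.append(subject)
    return found
```

Under these statements, a query for instances of dbpedia:MusicalArtist could be widened to the three MO classes as well, whereas a query for mo:MusicGroup would not be widened at all.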

Alignment focusing on ontological classes. The LOD cloud has been the most obvious achievement of the LD movement since its appearance in May, 2007. By the end of 2011, nearly 300 data sets had been included. It was noticed, however, that the interlinks of data sets are mainly on the instance level through owl:sameAs and are clustered within a number of separate major thematic domains (Jain, Hitzler, Sheth, Verma, & Yeh, 2010). The importance of class-level linkages was addressed by the Upper Mapping and Binding Exchange Layer (UMBEL) project when the LOD cloud started to expand (Bergman, 2008). This resulted in the development of an LOD constellation diagram for the class-level linkages. The definition of class-level linkages was based on one of four possible predicates: rdfs:subClassOf, owl:equivalentClass, umbel:superClassOf, or umbel:isAligned (Bergman, 2008).

Class-level matching systems involving music ontologies have been reported in various conferences. For example, a schema-level alignment system called BLOOMS uses the idea of bootstrapping information already present on the LOD cloud (Jain et al., 2010). It generates links between class hierarchies that are rdfs:subClassOf relations. The research provided details of the class-level alignment between the MO and BBC Program Ontology as well as between MO and the DBpedia Ontology, among other alignment analyses.

In contrast to the top-down approach of UMBEL, Nikolov and Motta (2010) took the bottom-up approach to construct a network of class-level mappings. The authors emphasized that, although the general metadata description about a data set may provide basic information of its subject domain and coverage, revealing schema-level correspondences that reflect the coverage of topics by available ontologies would be helpful for the publishers to decide the data set with which to connect. For example, by looking at the ontologies that contain such classes as mo:MusicArtist and dbpedia:MusicalArtist, the facts about whether the data are overlapping can be revealed. For this project, we limited alignment to the class level, however, and did not establish mappings between properties.

Alignment Focusing on Properties

Bernstein, Madhavan, and Rahm (2011) summarized generic schema-matching techniques. Here, schema matching is defined as a process of generating correspondences between elements (i.e., properties) of two schemas. Several techniques have been used in schema matching, including linguistic, instance-based, structure-based, constraint-based, and rule-based matching, as well as using auxiliary information. Emerging techniques include graph matching, usage-based matching, and matching based on document content similarity or document link similarity. The authors indicated the trend of increasing convergence of schema matching and entity resolution approaches, that is, matching at the metadata level and matching at the instance level. In the instance-based matchers, the similarity of schema elements is derived from the similarity or overlap of element instances. Nikolov, Uren, Motta, and de Roeck (2009) used an approach that captured schema-level relations between LD sets based on available instance data and reused these relations to facilitate generation of new coreference links. Through the interlinking, they established relations between classes and properties (e.g., between movie:music_contributor and dbpedia:Artist and between movie:actor and dbpedia:starring). An interesting fuzzy relationship, #overlapsWith, was created (similar to umbel:isAligned) to handle the situation outside of owl:equivalentClass or rdfs:subClassOf, mainly stating that two classes share a subset of their individuals. A quantitative measure, called an “instance set similarity,” was used to indicate the strength of correlation. The authors also utilized information on the third-party data set when there were existing instance-level coreference links with third-party repositories, that is, when instances in a data set were linked to a schema used by a third-party data set.
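
The idea of deriving a class relation from shared instances can be sketched as follows. The Jaccard index used here is a stand-in chosen for simplicity; it is not claimed to be the measure defined by Nikolov et al. (2009). The sketch also assumes that coreferent instances have already been resolved to common identifiers, and the threshold and the suggest_relation heuristic are hypothetical.

```python
def instance_set_similarity(instances_a, instances_b):
    """Jaccard index of two instance sets: shared instances over all instances."""
    a, b = set(instances_a), set(instances_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_relation(instances_a, instances_b, threshold=0.5):
    """Suggest a relation from class A to class B based on their instances.

    A heuristic only: identical sets suggest equivalence, a proper subset
    suggests subsumption, and sufficient overlap suggests the fuzzy
    #overlapsWith relation. Anything weaker yields no suggestion.
    """
    a, b = set(instances_a), set(instances_b)
    if a and a == b:
        return "owl:equivalentClass"
    if a and a < b:
        return "rdfs:subClassOf"
    if instance_set_similarity(a, b) >= threshold:
        return "#overlapsWith"
    return None
```

For example, two classes with instances {x1, x2, x3} and {x2, x3, x4} share two of four distinct individuals, giving a similarity of 0.5 and a suggested #overlapsWith relation under the default threshold.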

Tejo-Alonso, Berrueta, Polo, and Fernández (2011) analyzed properties used in popular OWL ontologies and other web vocabularies. The team also developed a tool, Parrot, which generates documentation for ontologies, rules, and combinations of rules and ontologies (Parrot, 2012). Parrot’s internal memory records the artifacts that a semantic asset describes, mainly ontologies, classes, properties, instances, rules, and rule sets. The system also establishes direct and inverse relationships as well as indexes of the artifacts.

Mapping Ontological Classes to Instances

In a study of linking ontological concepts and relations to their realizations in the texts, researchers at the University of Heidelberg experimented with a method to extract Wikipedia articles corresponding to ontology classes of MO (Reiter, Hartung, & Frank, 2008). The goal was to detect the most appropriate Wikipedia article for a given ontology class. To ensure that the article represents a class rather than an instance of a class, the researchers used a number of features (e.g., translation distance and infobox templates) to detect the appropriateness of the link targets.

Tools for Interlinking

Researchers have also invested in developing tools that reduce manual data set interlinking, either by semiautomating the task or by having users complete it as part of a game. Wölger et al. (2011) reported the findings of a survey of data interlinking methods and tools, ranging from manual and game-based to semi- and fully automated interlinking. The aspects related to the field of interlinking include the following: (a) manual interlinking (or user contributed interlinking), for which the interlinking of data sets is based on information provided by a user; (b) game-based interlinking that provides incentives for users to interlink data by playing games; (c) semiautomatic interlinking, which combines analysis techniques and human judgment; (d) collaborative interlinking based on semantic wikis; and (e) automatic interlinking. The authors concluded that, even with a tool that automatically interlinks data sets by selecting appropriate matching techniques or selecting the types of links, human review of the results is needed in most cases simply because complicated decisions must be made in subject domains.

Reconceptualizing the Life Cycle of Musical Works

Before considering the methodology and results of the research presented here, it is critical to describe the framework that guided our investigation. The study required a representation of the life cycle of a musical work that encompassed many different types of activities, including creation, performance, publication, and the many types of uses of music. It also had to be inclusive of all types of music information, not just bibliographic data, encompassing the many types of data relating to music beyond basic descriptions of objects, including information about creators, related works, musical styles and genres, performances, and recording events. In addition, we wished to incorporate not only information relating to the creation and interpretation of musical works but also other types of use that may result in musical derivatives and other products that have close relations to the original work. Thus, we created a representation of the music life cycle that embraces three spheres of activity: composition, production, and use.

In the life of a musical work, several key agents and events contribute to its content and context. The sequence of each event, the agents involved at each stage, and the resulting product may differ based on musical traditions in a particular culture. We wished to create a representation of the life cycle that took into account cultural differences in music composition, production, and use, so we attempt in the following discussion to distinguish between Western and other traditions when relevant.

In the Western music tradition, creation occurs at the beginning of the life cycle, when an artist, or artists, composes the musical work. This process may generate multiple instantiations of the work, as the work is refined through multiple versions or arranged for different instruments. The work may be fixed initially as a written or printed musical score and may also be interpreted in a performance by the composer or other musical artists. These performances generate sounds, called a signal, which can then be fixed in a tangible form as an audio recording (either analog or digital). Users engage with the musical work in numerous ways: as listeners, as performers of the work, through scholarly analysis and interpretation of the work, and through social activities such as sharing the work with others, describing the work using tags or other metadata generation methods, or writing a review of the work. It is important to note that the same individual can embody more than one role (composer, performer, user), sometimes simultaneously.

At all stages of the musical work’s life cycle, one can associate many types of metadata with the work; these data elements document the events and agents that contribute to the work, in addition to the work itself. The Music Ontology (2010) aimed to provide a structure for the gathering and recording of such data by establishing a set of concepts and properties that can be used to document composition and production activities in the Music Creation Workflow. At the simplest level of expressivity, this workflow defined four concepts of music creation.

1. The musical work itself, including basic data about the title, creator(s), publisher/distributor, etc.
2. Performance, which is the event corresponding to an actual performance of the musical work.
3. The signal, which is the event of recording the performance as either an analog or a digital signal.
4. The musical manifestation, which is the release of the signal in a particular recording.

To supplement these four concepts, the MO added three entities.

5. Composition, which is the event leading to the creation of a musical work.
6. Sound, which corresponds to the physical sound produced by a performance.
7. Recording, the event that represents the transduction of the work from a physical sound to a signal, through the use of a device such as a microphone.
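
The seven concepts listed above can be read as a chain of linked statements. The following is a minimal Python sketch of that chain, not the MO's own serialization: the identifiers and the link names (`produces`, `interprets`, `captures`, `releases`) are assumptions chosen for illustration, and the MO specification should be consulted for the actual class and property names.

```python
# Hypothetical identifiers and link names, chosen for illustration only;
# they are not taken from the Music Ontology specification.
WORKFLOW = [
    ("composition", "produces", "musical_work"),
    ("performance", "interprets", "musical_work"),
    ("performance", "produces", "sound"),
    ("recording", "captures", "sound"),
    ("recording", "produces", "signal"),
    ("musical_manifestation", "releases", "signal"),
]


def linked_from(node, triples):
    """Return every (link, target) pair whose subject is `node`."""
    return [(p, o) for s, p, o in triples if s == node]


def concepts(triples):
    """Collect every distinct concept that appears as a subject or object."""
    found = set()
    for s, _, o in triples:
        found.update((s, o))
    return found
```

Calling `linked_from("recording", WORKFLOW)` lists what the recording event is connected to, which is the kind of question a linked data source built on such a model could answer.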

In designing the Music Creation Workflow, the MO expanded the universe of data to include many contextual details about the history of a work’s composition and performances. It also encompasses the important steps of signal production and recording, which many experts would argue contribute as significantly to the nature of the musical work as composition and performance do.

The authors of this study suggest that this model is incomplete, however, insofar as it does not consider the third dimension of use, which is essential to the life cycle of musical works. In the MO model, composers and performers are the primary agents. In our suggested revision of this universe, the types of users/consumers are more varied and could potentially include listeners, artists, scholars, fans, and others who may influence the reception of the work. These users may embellish or modify the work in some way, such as through the creation of a new musical work or works, through works of scholarship, through criticism, or through other derivatives, which may involve activities such as ordering/reordering, tagging, or annotation. More discussions of this dimension will be presented in the Consumption section.

To reflect the addition of use to the representation of the music life cycle, we offer Figure 2 as an illustration. This model adds consumption, a third sphere of activity, to the creation and production activities that the MO workflow already depicts. This depiction now reflects three spheres of activity, all of which contribute to the musical work: composition, production, and use (Fig. 3). In the following sections, we interpret the nature of each sphere, including agents, events, and products that may result from music-related activities. We also explore the overlapping relationships among these spheres, which are indicative of the nature of the creative process that encompasses music creation, production, distribution, and engagement.

Reconceptualizing the Life Cycle of Musical Works
Composition

Agents and activities in the composition sphere. In the composition sphere of activity, one finds the agents and events that contribute to the creation of a musical work. Depending on the musical tradition in which the music artist is engaged, that work may take tangible form or may exist as part of an intangible heritage that is preserved through memory and oral repetition. Composition activities may also include alterations to the musical work such as transcription, arrangement, instrumentation, and orchestration that result in additional versions of the work.

As one might imagine, the music artist is a primary player in the composition sphere. The music artist, as originally envisioned in the Music Ontology, is an individual or group of individuals who may participate in acts of composition or may also be performers responsible for the interpretation of a work, which is another type of creative act. In the Western tradition, a distinction is often made between the role of composer and the role of performer. In other musical traditions in which improvisation is common, as in jazz, or in which the lack of a notation system means that a work is reconceived during each performance, such as in the Indian Raga style, performers often use acts of creation in the process of their performance. Even in the Western tradition of composition, composers are often heavily involved in production activities, such as when a composer also conducts his or her own works or performs producing or sound engineering duties in the process of preparing a recording of the work.

To reflect this dual nature of the musical artist, our representation of creation purposefully overlaps with production, in order to communicate this idea of creation through composition and creation through performance. The overlap of composition and production also represents the collaborative relationship between music artists and production personnel involved in producing and recording the sounds of a performance or in creating the published version of musical works. These individuals, such as editors, producers, and recording engineers, may have significant creative influence over the composition and performance processes. Finally, one may note that we have also overlapped the use sphere with composition and will further elaborate the ways in which use of existing music and information about music may affect activities in the composition sphere.

Mapping Ontological Classes to Instances

Data generated during composition. Most information systems found in libraries will record the names of the creator(s) and the title of the work but do not normally include additional data on a work’s creation, such as the time and place in which the work was composed or related artists and works that influenced the composition process. This type of origin information is more likely to be included in archives or museum information systems, because provenance is so critical to those description traditions. We consider biographical information a particularly important type of contextual information to collect and manage. Such information may include facts about the creator(s) of a work and also the relationships between the creator(s) and other music artists and entities with whom the creator(s) has worked, including other composers, arrangers, lyricists, and librettists; performers and conductors; recording personnel, such as sound engineers; and record labels and publishing companies with whom the creator(s) has a relationship. It is worth noting that the MO specification provides a variety of classes and properties that allow the recording of metadata relating to the background of the creator(s), such as biographical data and the numerous agents contributing to composition of a musical work, such as the musical artist, compiler, composer, conductor, engineer, etc.

Production

Agents and activities in the production sphere. In the production sphere, we have placed three types of events: performance, recording, and publication. In the MO creation workflow, from which we took initial inspiration for our depiction of the music life cycle, the model distinguishes between a performance by a musical artist, which results in a sound (that may or may not be captured by a recording device), and a recording, which results in an audio signal, or record, of a performed musical work once that signal is fixed in an analog or digital representation. This distinction is helpful, particularly when considering the different types of metadata associated with these activities and products. We also suggest that the publication of musical scores resides in the production sphere, insofar as this activity results in an object that may be consulted by users for further study and other activities and also may be incorporated in new works. As noted earlier, activities in the production sphere may overlap with composition and use activities and agents, particularly when agents normally associated with production activities perform creative acts that impact the composition or interpretation processes and affect the nature or structure of the resulting musical work.

Data generated during production. The various activities relating to music production and publishing generate significant metadata that constitute important contextual information about the musical work and the artists and other personnel involved in the process. Although some of these metadata may already be present in the records generated in cultural institutions such as libraries, particularly because they are often found and transcribed from information recorded on an object (such as liner notes), they are usually included in general notes or description fields rather than being labeled separately. This lack of granularity in the data structure found in most library catalogs and many digital collections data sets makes it difficult to use this information in searching for or linking to other data sources. Linked data sources may actually be better sources for this information, in that the structure of these databases may be more likely to distinguish among different types of data. Data sets that use the MO classes and properties are examples of sources that may provide more useful data than library sources are able to provide.
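
The granularity problem described above can be illustrated with a small contrast. This is a minimal Python sketch under stated assumptions: both records, the names in them, and the field labels (`general_note`, `contributions`, `role`) are invented for illustration and do not reproduce any actual catalog schema or MO property.

```python
# Hypothetical records: names, roles, and field labels are invented for illustration.
catalog_record = {
    "title": "Example Work",
    # Production credits are buried in free text, as in a general notes field.
    "general_note": "Produced by A. Producer; engineered by B. Engineer.",
}

structured_record = {
    "title": "Example Work",
    # Each contribution is labeled separately, so it can be searched or linked.
    "contributions": [
        {"agent": "A. Producer", "role": "producer"},
        {"agent": "B. Engineer", "role": "engineer"},
    ],
}


def agents_with_role(record, role):
    """Return agents holding `role`, using only separately labeled fields."""
    return [c["agent"] for c in record.get("contributions", []) if c["role"] == role]
```

A lookup by role succeeds against the structured record but finds nothing in the catalog record, even though the same credit is present there as unlabeled text.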

Agents and activities in the use sphere. Use of musical works, sometimes referred to as consumption in certain contexts, generates a considerable amount of useful data that has thus far been underutilized in traditional library catalogs but holds significant promise for enriching information systems. The inclusion of use agents, events, and products in the music life cycle reflects a shift in thinking about the relationship between a work and its description. In the next generation of music information systems, the object-focused approach to generating descriptive data—where vocabulary for description is taken from the objects themselves and their contents—will shift to an approach that is more focused on metadata drawn from the descriptions that users themselves attach to the objects.

The final sphere of activity, use, thus completes our depiction of the musical work life cycle. It expands upon the two spheres of activity suggested by the MO’s Music Creation Workflow, composition and production, by establishing an additional set of activities centered around use of the musical work in various activities and contexts, particularly those that might not easily be characterized as either composition or production. Users are the primary type of agent in this sphere, and they engage in a multitude of actions in relation to the musical work. These actions often create or establish the following:

• New musical works that incorporate some aspect or part of the original musical work (e.g., mash-ups that combine two or more works into a new work).

• Musicological works of scholarship that use annotation and commentary to create scholarly editions of musical works.

• Sources of information or metadata about a work, agents, and events associated with the work (such as Wikipedia pages devoted to a work, music artist, or performance).

• Networks of relationships among works, agents, and events (such as when users construct depictions of musical influences on artists).

The types of users are numerous and may include individuals playing the roles of listener, artist, reviewer, fan, or licenser, to name just a few. In the area of overlap between composition and use, we may find agents assuming versioning roles not associated with initial composition, such as arranger or transcriber. Although the MO defines listener as a property to allow for tracking people who have listened to a record or performance of a musical work, it has not further defined other types of user roles that would be required to exhaust this category. It would be worthwhile to explore various user roles and activities to understand this sphere more fully.
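
The overlapping spheres, and the idea that one individual may hold several roles at once, can be sketched as a simple lookup. This is an illustrative Python sketch only: the assignment of each role to one or more spheres is an assumption made here for the example, loosely following the roles named in the text, and is not a classification defined by the MO.

```python
# Assumed role-to-sphere assignments, for illustration only.
ROLE_SPHERES = {
    "composer": {"composition"},
    "performer": {"production"},
    "recording_engineer": {"production"},
    "listener": {"use"},
    "reviewer": {"use"},
    # Versioning roles placed in the overlap between composition and use.
    "arranger": {"composition", "use"},
    "transcriber": {"composition", "use"},
}


def spheres_for(roles):
    """Return the set of spheres touched by an agent holding `roles`."""
    spheres = set()
    for role in roles:
        spheres |= ROLE_SPHERES.get(role, set())
    return spheres
```

An agent who is at once composer, performer, and listener touches all three spheres, while an arranger alone sits in the overlap of composition and use.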

Data generated during use. The use sphere includes a number of activities in which users of musical works commonly engage, and these activities often generate useful data. Some of the information generated from these activities might already be gathered in some systems, but if it were available as LOD, libraries could then share and aggregate data across many systems. Activities, and their resulting products, may include the following:

• Listening (commentary during and after performances, such as tweets, including associated hashtags, or blog posts, with accompanying tags).

• Description (records generated through cataloging or metadata generation, tagging, and other forms of description and categorization).

• Scholarly analysis (commentary, reviews, and other forms of critical analysis).

• Editing or reorganization of the work (new versions of works).

• Reuse of the work in new works (mash-ups or other derivatives of the musical work).

Listening is an activity that is already defined by many music information systems, particularly through listening statistics and playlists in use by music recommender systems, but such metadata could also be used to generate networks of related artists and genres. Tagging and categorization activities create additional entry points for works; combining tags drawn from multiple sources will increase the number of phrases as potential entry points to the work and can also provide more data to rank tags in order of popularity. Comments and reviews could be drawn from a wider variety of sources than is currently possible, expanding beyond those proprietary sources now governed by agreements between the repository and data owners (such as the agreement between OCLC and Goodreads to display reviews from the latter in WorldCat records). We noted several relevant properties in MO that may help link to annotations about works and musical artists, such as biographical and discographical information, fan pages, and reviews.
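
The idea of combining tags from multiple sources and ranking them by popularity can be sketched briefly. This is a minimal Python illustration, assuming three made-up sources and tag lists; the normalization step (trimming and lowercasing) is also an assumption, since real tag reconciliation would likely need more than that.

```python
from collections import Counter

# Hypothetical tag lists from three made-up sources for a single work.
tags_by_source = {
    "source_a": ["jazz", "live", "piano"],
    "source_b": ["jazz", "piano", "trio"],
    "source_c": ["Jazz", "live"],
}


def rank_tags(tags_by_source):
    """Merge tags from all sources and rank them by how often they occur."""
    counts = Counter()
    for tags in tags_by_source.values():
        # Normalize lightly so that "Jazz" and "jazz" count as one tag.
        counts.update(tag.strip().lower() for tag in tags)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

The merged list contains more distinct entry points than any single source offers, and the counts give a simple basis for ordering them.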

Finally, information systems could expand the network of objects related to the original work to include any derivatives that incorporate the original work in new works (for example, new editions of musical scores, performances, new arrangements and remixes, mash-ups, or new works and critical analyses that sample from the work). Although the MO provides a number of stable properties to record “other versions” and relationships involving compilations, sampling and mash-ups, remixes, remastering and rereleases, translations, and musical tributes, we suggest that this list is not exhaustive and that it focuses primarily on version relationships for signals and recordings rather than musical scores.
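
Such a network of derivatives can be sketched as a set of relationships that is followed transitively. The following Python sketch is illustrative only: the work identifiers and the relation labels (`remix_of`, `samples`, `edition_of`) are hypothetical and are not the MO's property names.

```python
# Hypothetical works and relation labels, invented for illustration.
RELATIONS = [
    ("remix_1", "remix_of", "original"),
    ("mashup_1", "samples", "remix_1"),
    ("mashup_1", "samples", "other_work"),
    ("score_edition_2", "edition_of", "original"),
]


def derivatives_of(work, relations):
    """Return all works that derive from `work`, directly or indirectly."""
    found = set()
    frontier = [work]
    while frontier:
        current = frontier.pop()
        for source, _, target in relations:
            # Follow each relation backward from the target to its derivative.
            if target == current and source not in found:
                found.add(source)
                frontier.append(source)
    return found
```

Here the mash-up is reached from the original only through the remix, which shows why indirect relationships matter when expanding the network around a work.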

In sum, the addition of a use sphere to the musical work life cycle results in a more complete illustration of the phenomenon of music creation by acknowledging the impact of use on the musical work itself and the interpretation and reception of that work. In its expanded form, the model also provides a better understanding of and connection to the varying data sources that have developed around various music use activities. Finally, it provides guidance in the analysis and interpretation of the overlaps and gaps between library data structures and various data structures of music-related data sets, which was a primary objective of this research project.



This work was supported by a grant from IMLS. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.