Songverse: a digital musical instrument based on Virtual Reality

We can define Digital Musical Instruments (DMIs) as interactive hardware-software solutions crafted to output sound according to users’ input. DMIs are known to unleash users’ creativity and to enable different and innovative approaches to the creative process, for example, by smoothing the learning curve for musical concepts such as rhythm and composition. Virtual Reality (VR), in turn, allows users to explore spatial interfaces in a natural and limitless way, suggesting a potential synergy for the rise of new DMIs. In this paper, we introduce Songverse, an immersive DMI set in a Virtual Reality scenario that allows users to create music by interacting with an environment designed to resemble outer space. By adding systems, planets, and satellites to the virtual environment, the user shapes the produced sound through interactions that were extensively tested during the development phase. We then evaluated the instrument with musicians and non-musicians through interviews and the System Usability Scale (SUS) to assess how easily people can create music using Songverse. As a result, users reported the use of the DMI as intuitive and easy to use, also highlighting the produced song as


Introduction
Acoustic instruments are musical instruments that allow the user to act directly on the sound-producing mechanism in a tangible way. This type of interaction is the main difference from Digital Musical Instruments (DMIs), which are based on control systems that capture the user's intention through input sensors and translate it into parameters for a digital synthesizer (Marshall, 2009).
Tangibility can be defined as the capability of being touched. Tangible objects are real objects that humans can feel through the sensation of touch, and tangible tools guide intuition, leading towards a natural interaction with the object (Valli, 2005).
In the particular case of intangible DMIs, which usually require mid-air gestures to produce sound, building DMIs that appeal to users becomes a complex task due to the loss of intimacy between the user and the instrument. This loss of intimacy can be explained by the lack of haptic feedback from the instrument, such as touch, vibration, and pressure, and by the absence of the sense that the sound is coming from the controller (Cook, 2004). This shows how important it is to focus on user experience when developing DMIs.
To achieve a better user experience, we chose a Virtual Reality (VR) scenario, which allows complete immersion through head-mounted displays. Thanks to the maturity of modern hardware and software, VR applications run stably and offer high-definition rendering, enabling immersive experiences. DMIs implemented in a VR scenario are often called Virtual Music Instruments (VMIs).
In this paper, we propose Songverse, a VMI that works as a music sequencer in which the user creates music while interacting with a universe-like environment in a VR experience. The objective of this work is to create a musical experience useful to both musicians and non-musicians, with a focus on interaction. Interaction in Songverse consists of combining backing tracks made of different instrument samples by adding, moving, and removing them to create new music. The Songverse environment uses systems, planets, and satellites as musical platforms, a theme chosen to symbolize the endless possibilities of music creation. The paper is structured as follows: first, we present related works on the creation of interfaces and DMIs in eXtended Reality (XR). Then, we present our application and its features, with a focus on user experience and interaction design. Next, we describe the evaluation we conducted and its results for both musicians and non-musicians. Finally, we conclude the paper and discuss the future work we have planned for the application.

Related Works
Several research works address the design and development of DMIs for music creation based on natural interaction principles. In particular, some publications propose DMIs for Augmented Reality (AR) and Virtual Reality (VR) and analyze how users interact with these environments.

AR-based music creation tools
In this subsection, we review the current literature on Augmented Reality (AR)-based music creation tools. One example is the reacTable (Jordà et al., 2007), a musical instrument that proposes a tangible tabletop interface for music creation. By augmenting a tabletop, the authors allowed both casual and professional users to recreate the instrument using simple and accessible objects. Interaction occurs through the manipulation of physical artifacts, which are rotated and moved over the table, directly changing the music that the system produces. Each object represents a single synthesizer, which is tracked by a camera and has its parameters changed according to the object's position and rotation. By relying on computer vision, the reacTable provides a compact system, discarding the need for multiple sensors and additional hardware.
Another example of an AR-based DMI is Illusio (Barbosa et al., 2013), which is based on an augmented multitouch interface combined with a physical pedal. The system allows its users to draw shapes and associate them with recorded loops, working as a sequencer. This interaction was designed to place the user in a playful environment that encourages the exploration of ideas. We brought this inspiration to our system: the user interacts with an environment detached from reality, which enables supernatural interactions.
Also based on tangible tabletops, Scrapple (Levin, 2006) is an instrument that uses spectrography as an input device to generate music as a loop. Using a camera, the system scans the table for drawings or for variously shaped objects. The scan runs from left to right in a periodic loop whose tempo is controlled through an external knob, and the detected objects are interpreted to synthesize sound in real time.

VR-based music creation tools
Even though the literature has examined DMIs for a long time, applications for immersive Virtual Reality have only recently gained attention. This delay is mainly due to recent advances in head-mounted display (HMD) technologies, the availability of such devices, and the processing capabilities of new graphics cards, which can render complex, well-textured scenes in real time. In this subsection, we review the literature on music creation tools based on Virtual Reality.
VRMin (Johnson and Tzanetakis, 2017) is a system focused on tutoring for the theremin, an electronic musical instrument played without physical contact. The system tracks the user's hands, maps them to a 3D hand model in the virtual environment, and uses the hands' positions to guide the user to the correct position relative to the instrument's antenna. However, due to this mapping, the user's attention is drawn away from the auditory feedback toward the visual feedback. Also, because it uses Google Daydream as its head-mounted display, the system is hindered by a narrow field of view, which makes it difficult to see events happening in the periphery.
Another experience based on Virtual Reality is EXA, a music creation experience that allows users to create and interact with virtual DMIs. In EXA, the user plays the instruments with several virtual tools resembling drumsticks or even violin bows. The user can also create several backing tracks and keep them looping while interacting with other instruments in the scene. Since the application offers endless musical possibilities, some level of musical knowledge is required to make the most of it.
In Crosscale (Cabral et al., 2015), the authors proposed an interface for a virtual instrument with a particular mapping of touchable virtual spheres, which serve as metaphors for note keys and chords of a given musical scale. The spheres are arranged in the scene to form a playable spatial instrument, and the player performs sequences of notes by interacting with them. The spheres are placed so that the distance between notes is short, allowing users to execute note combinations quickly.
ChromaChord (Fillwalk, 2015) proposes hand-based interaction using the Leap Motion, a device that tracks and estimates the 3D pose of the hands. Three panels are placed in the user's view, allowing them to change the synthesizer configuration and to interact with the VMI. The ChromaChord interface is composed of two rows of colored rectangles representing musical notes. Melodic sequences can be performed with a single hand, but the action is susceptible to Leap Motion tracking errors. Users reported false positives, which were found to be related to tracking stability, primarily during fast movements, resulting in actions the user did not intend.
Another example of a VMI is the Cirque des Bouteilles (Zielasko et al., 2015), an interface for playing songs on virtual bottles. This VMI addresses the difficulty of playing physical bottles, which stems from how they must be arranged, by moving the instrument to a virtual environment where parameters can be tuned to provide a better experience. The user interacts with the system by blowing into a microphone, and the application plays a note that the user selects with the hand through raycasting.

General music creation tools
Outside the context of XR, there are also other music creation tools. For example, Incredibox is a musical application that allows the user to quickly create music by assigning beatbox samples to a "crew" of musicians. This type of interaction allows users to combine different sound samples and create new songs interactively. The application is a reference for providing a good musical experience to newcomers, since it restricts users to pre-made samples during the creation process, guiding them through it.
Besides the easy interactions and smooth learning curve of Incredibox, an important feature is the narrative built into its gameplay. As the user experiments with different permutations of the crew, events occur that congratulate the user for permutations that work particularly well, rewarding them by unlocking unique cutscenes.
Although our work draws inspiration from the interactions and interfaces proposed and analyzed in these publications, immersion is the central difference between our proposal and the research works in the current literature.

User experience
We began the development process by focusing on the user experience. In the first phase of the conception stage, after establishing non-musicians interested in music creation as our target audience, we defined the features and elements that would compose the experience. Our objective was for the application to be enjoyable and easy to use, without demanding any technical musical knowledge from the user.
After defining the basis for our application, we applied Brainstorming, a technique in which participants aim to find a common conclusion for a specific problem by spontaneously gathering a list of ideas and discussing them. We defined various concepts for our application in this phase of the conception stage, such as the outer space theme and application name.
A design technique called Storyboarding (Truong et al., 2006), commonly used in filmmaking, was applied to branch out different possibilities for the application flow and interactions. In the Storyboarding technique, each participant comes up with a sequence of sketched frames showing how they think the user should use the application. The goal of this technique is to generate different ideas and compile the best of each into a single storyboard. Figure 1 shows some of the ideas sketched for our system. This technique proved useful for consolidating interaction ideas and prompting discussion about them. It also served as the starting point for most of our base interactions and concepts, such as planets orbiting around the user and satellites as elements added to the overall music.

Music composing
The experience begins with the user placed in a blank, dark universe canvas, surrounded by emptiness and the light of a few stars. As the user inserts new elements into the environment, the universe comes to life, with new stars, asteroids, and small visual elements appearing. To compose the universe, the user inserts planets orbiting around them by using the application's tools menu. Figure 2 shows this menu, from which the user selects the planet to be inserted in a system.
The menu uses the metaphor of a painter's color palette, adapted from a simple 2D menu, a design that is popular among 3D system control techniques (Gebhardt et al., 2013). This metaphor has proven useful in applications focused on free-form interaction in a canvas, such as Google Blocks and Microsoft Maquette. Since the palette is attached to the user's left hand, it is always accessible, and the user can simply drag celestial bodies from the palette to the canvas.
Users create the base track composition by adding systems, planets, and satellites to the universe. Systems act as effects applied to the samples coming out of them, transforming the sound with effects such as reverb and chorus. Planets are placed inside each system, as shown in Figure 3, and each planet represents a specific type of sound: drum beats for rhythmic planets, guitar and piano chords for harmonic planets, and guitar riffs for melodic planets. Planets serve as visual cues to the user; they do not produce sound by themselves but serve as anchors for satellites.
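The hierarchy just described (systems apply an effect, planets group samples of one kind and stay silent, satellites carry the sound) can be summarized as a small data model. The following Python sketch is only an illustration under our own assumptions; Songverse itself was built in Unity, and the class names, the `PlanetKind` values, and the sample file names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class PlanetKind(Enum):
    """Planet categories mentioned in the text (hypothetical enum)."""
    RHYTHM = "rhythm"    # e.g. drum beats
    HARMONY = "harmony"  # e.g. guitar and piano chords
    MELODY = "melody"    # e.g. guitar riffs


@dataclass
class Satellite:
    """The only sound-producing element: one looping sample."""
    sample: str  # hypothetical sample identifier


@dataclass
class Planet:
    """A silent visual anchor grouping satellites of one kind."""
    kind: PlanetKind
    satellites: list[Satellite] = field(default_factory=list)


@dataclass
class System:
    """Applies one effect (e.g. reverb or chorus) to every sample inside it."""
    effect: str
    planets: list[Planet] = field(default_factory=list)

    def voices(self) -> list[tuple[str, str]]:
        """(sample, effect) pairs currently audible in this system."""
        return [(sat.sample, self.effect)
                for planet in self.planets
                for sat in planet.satellites]


@dataclass
class Universe:
    systems: list[System] = field(default_factory=list)

    def mix(self) -> list[tuple[str, str]]:
        """Everything that is playing, across all systems."""
        return [v for system in self.systems for v in system.voices()]


drums = Planet(PlanetKind.RHYTHM, [Satellite("drums_loop_a.ogg")])
chords = Planet(PlanetKind.HARMONY)  # no satellites yet: visible but silent
universe = Universe([System("reverb", [drums, chords])])
print(universe.mix())  # [('drums_loop_a.ogg', 'reverb')]
```

In this model a planet without satellites contributes nothing to the mix, which mirrors the statement that planets only act as visual anchors.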
Placing a planet in the scene creates an orbit around it, on which the user places satellites, the elements that generate sound. When the user drags a satellite to a planet with the controller's raycast, it starts orbiting the planet and, from the next beat onward, plays one of the samples associated with the planet. The system plays these samples in a loop, and the user manipulates the satellites from a distance using raycasting (Jerald, 2015).
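Starting a newly attached satellite on the next beat amounts to quantizing its start time to the beat grid. A minimal sketch, assuming a fixed tempo in beats per minute and a beat grid that begins at time `start`; the functions `next_beat_time` and `loop_position` are hypothetical and not taken from the Songverse code base.

```python
import math
from typing import Optional


def next_beat_time(now: float, bpm: float, start: float = 0.0) -> float:
    """Time (s) of the first beat at or after `now`.

    Assumes beats fall at start + k * 60 / bpm for integers k >= 0.
    """
    beat = 60.0 / bpm
    if now <= start:
        return start
    return start + math.ceil((now - start) / beat) * beat


def loop_position(now: float, started_at: float,
                  loop_seconds: float) -> Optional[float]:
    """Offset (s) into a looping sample, or None before it starts."""
    if now < started_at:
        return None  # scheduled, not yet audible
    return (now - started_at) % loop_seconds


# At 120 BPM, a satellite dropped at t = 1.2 s waits for the beat at 1.5 s.
t0 = next_beat_time(now=1.2, bpm=120)
print(t0)                                        # 1.5
print(loop_position(5.5, t0, loop_seconds=2.0))  # 0.0
```

Quantizing the start keeps every loop aligned with the others, which is what allows satellites to be added at any moment without drifting out of time.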

Early prototypes
In early versions of the experience, each satellite added to a planet required the user to select on which beats each sound would be played, analogous to an electronic music sequencer, as shown in Figure 4. However, user testing revealed that this required technical knowledge and restricted the playfulness of the application, confusing our target audience, who generally lack such musical skills, and breaking their immersion. This interaction also did not translate well to the VR environment and the outer space metaphor, since creating and editing each planet to achieve the intended musical result was time-consuming. Therefore, inspired by Incredibox, we changed the design so that satellites represent predefined samples that stack musically on top of each other.
To create the music's rhythm, the user can select a drum planet and attach different kinds of samples (satellites) to it. By design, all available presets can be freely combined without friction and without losing the musical coherence of the planet or of the universe as a whole.
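The text does not detail how the presets were made mutually compatible. One common way to guarantee free combination is to record every loop at the same tempo and in the same key, with loop lengths that divide one another so that all loops restart together. The sketch below checks a hypothetical preset library against these assumed rules; `SampleInfo`, its fields, and the sample names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleInfo:
    """Hypothetical metadata for one preset loop."""
    name: str
    bpm: int
    key: str
    bars: int  # loop length in bars (> 0)


def coherence_problems(samples: list[SampleInfo]) -> list[str]:
    """Report presets that would break free combination.

    Assumed rules: every loop shares the first loop's tempo and key, and
    every loop length divides the longest one, so all loops restart together.
    """
    if not samples:
        return []
    ref = samples[0]
    longest = max(s.bars for s in samples)
    problems = []
    for s in samples:
        if s.bpm != ref.bpm:
            problems.append(f"{s.name}: tempo {s.bpm} != {ref.bpm}")
        if s.key != ref.key:
            problems.append(f"{s.name}: key {s.key} != {ref.key}")
        if longest % s.bars != 0:
            problems.append(f"{s.name}: {s.bars} bars does not divide {longest}")
    return problems


library = [
    SampleInfo("drums_a", 100, "G", 1),
    SampleInfo("chords_a", 100, "G", 4),
    SampleInfo("riff_a", 100, "G", 3),
]
print(coherence_problems(library))  # ['riff_a: 3 bars does not divide 4']
```

A check like this would run offline when curating samples, so that any combination the user builds at runtime is coherent by construction.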
Also, in early versions, to make the application more interactive, the user could create the song's melody on the fly while the planets and their satellites played the backing track. We prototyped different approaches for this interaction, described below. However, these interactions were discarded from the final version of the experience, mainly because of the difference in sound quality between the played samples and the MIDI synthesizer.
The first prototyped interaction was a virtual xylophone: the user would tap virtual spheres located around them using the wands, as shown in Figure 5. Because the hand could pass through the spheres, the lack of tactile feedback would break the plausibility illusion (Slater, 2009), but the approach proved inadequate mainly because of the poor musical experience caused by a noticeable delay in the attack phase of the notes. Moreover, since we also wanted users to be able to sustain notes and swipe across the instrument to simulate a pitch bend, which this design did not support, the prototype was discarded.
A later prototype was a theremin-based instrument, shown in Figure 6, in which the user could slide the controllers along a horizontal monolith placed in the scene. This action generated different continuous sounds based on the coordinates of each controller on the target object. Achieving the desired sonority was difficult because every movement changed the synthesized note: even the smallest movement generated unwanted sounds, breaking the note's sustain and starting another attack phase.
Figure 6. A theremin-based instrument that was also discarded earlier in the development stage.
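The sustain problem of the theremin-based prototype can be illustrated with a simple position-to-pitch mapping. In the sketch below, the monolith length, the note range, and the rule that every change of note retriggers the sound are assumptions made for illustration; only the equal-tempered conversion from MIDI note to frequency is standard. A hand jitter of a few millimetres near a note boundary is enough to produce new attacks.

```python
def position_to_note(x: float, length: float = 1.0,
                     low_note: int = 60, span: int = 24) -> int:
    """Map a coordinate along the monolith to a discrete MIDI note.

    Hypothetical layout: `span` semitones spread evenly over `length` metres,
    starting at `low_note` (60 = middle C).
    """
    x = min(max(x, 0.0), length)  # clamp to the monolith
    return low_note + min(int(x / length * span), span - 1)


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def attacks(positions: list[float]) -> list[int]:
    """Notes started when every change of mapped note retriggers the sound."""
    started, current = [], None
    for x in positions:
        note = position_to_note(x)
        if note != current:
            started.append(note)
            current = note
    return started


# Millimetre-scale jitter around the 69/70 boundary (each note is ~4.2 cm wide).
jitter = [0.415, 0.420, 0.416, 0.421]
print(attacks(jitter))  # [69, 70, 69, 70]: four attacks, no sustained note
print(midi_to_hz(69))   # 440.0
```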
The last prototyped idea was to place six particle emitters side by side, pointing upwards, in front of the player. Each emitter represents a note that plays when a hand crosses its flow of particles, and the note can be sustained by keeping the hand in the particles' path. Since no object is meant to be touched, the lack of tactile feedback is not an issue, and the emitters also leave the user's field of view free. However, the difference between the sound coming from the synthesizer and the background samples was significant. Because of this difference in sound quality, we chose to use only samples in the final application, discarding any interaction that would generate sound on the fly.
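The emitter interaction reduces to a small note-on/note-off state machine: a note starts when a hand enters a column of particles, keeps sounding while the column stays blocked, and stops when the column is cleared. A minimal sketch, assuming the six columns are laid out along a single horizontal axis and only the hands' coordinate on that axis matters; the layout values and note numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EmitterBank:
    """Six side-by-side particle columns, one note per column.

    Hypothetical layout: columns of equal `width` placed along one horizontal
    axis from `left`; a column counts as blocked while a hand is inside it.
    """
    notes: tuple[int, ...] = (60, 62, 64, 65, 67, 69)  # C major, C4 to A4
    left: float = 0.0
    width: float = 0.1
    active: set[int] = field(default_factory=set)  # blocked column indices

    def column_at(self, x: float) -> Optional[int]:
        i = int((x - self.left) // self.width)
        return i if 0 <= i < len(self.notes) else None

    def update(self, hand_xs: list[float]) -> list[tuple[str, int]]:
        """Note events caused by the hands' current positions."""
        blocked = {c for c in map(self.column_at, hand_xs) if c is not None}
        events = [("note_off", self.notes[c]) for c in sorted(self.active - blocked)]
        events += [("note_on", self.notes[c]) for c in sorted(blocked - self.active)]
        self.active = blocked
        return events


bank = EmitterBank()
print(bank.update([0.05]))  # [('note_on', 60)]
print(bank.update([0.05]))  # []: the note sustains while the column stays blocked
print(bank.update([0.25]))  # [('note_off', 60), ('note_on', 64)]
print(bank.update([]))      # [('note_off', 64)]
```

Because events are emitted only on state changes, holding a hand still produces no new attacks, which is the sustain behavior the xylophone and theremin prototypes lacked.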
As users create multiple systems, planets, and satellites, the universe grows and its farther parts become difficult to reach. We therefore allowed users to shrink or enlarge the scene, bringing planets closer, facilitating editing, and letting the user play with the sound's spatiality.

Design
At the beginning of the development process, we decided not to replicate the real universe, abstracting away its physical laws and appearance. Our idea was to create a playful and entertaining experience that exploits the inherent strengths of VR, such as immersion and spatiality, through the metaphor of creating a musical universe.

Planet's design
To improve the ludic factor and help users dissociate the experience from expectations about the real universe, we gave the planets a cartoonish visual style. Figure 7 shows an example of a planet designed to resemble a city. This design choice accelerates prototyping and helps improve the overall application performance.
Figure 7. An example of a planet that was created with a cartoonish appeal.
Initially, low-polygon planets were modeled using Maya (available at https://www.autodesk.com/products/maya/overview) and Cinema4D (available at https://www.maxon.net/en-us/products/cinema-4d/overview/). Although they were good-looking and pleasing, as shown in Figure 8, these planets looked static within our universe, and it was not trivial to animate specific parts of a planet in real time according to the music's beat. Finally, we decided to use Google's Tilt Brush (available at https://www.tiltbrush.com/), a room-scale virtual canvas application, which allowed us to create the planets in a more immersive and direct way. Using Tilt Brush, we designed the planets by painting them in 3D directly in a virtual environment, which allowed anyone without technical knowledge to work on them and sped up the process. The modeled assets were exported to Unity3D (available at https://unity.com/) using the Tilt Brush Toolkit (available at https://github.com/googlevr/tilt-brush-toolkit), which accelerated prototyping, since we could quickly import the designed planets into the scene and check whether they fit the visual context. Also, thanks to a feature of this toolkit called music synchronization, we could easily sync the model animations with the chosen music samples, creating a more synesthetic experience. To create a more diverse and vibrant universe, each planet has its own design and characteristics, matching the variety of instruments and sounds available for music composition.

Development
This experience was developed to run on high-end computers with a Windows Mixed Reality-compatible headset and a pair of 6DoF-tracked wand controllers. This decision allowed us to focus on the user experience rather than on extensively optimizing the application for devices with low computing capability. Using a 6DoF-tracked head-mounted display and wands enables more natural interactions, giving the user a more immersive experience.
We used Unity, a cross-platform game engine, for development. This engine has extensive support for developing Mixed Reality applications, as well as a variety of development kits and toolkits, such as the Microsoft Mixed Reality Toolkit (MRTK, https://github.com/microsoft/MixedRealityToolkit-Unity), the Virtual Reality Toolkit (VRTK, https://vrtoolkit.readme.io/), and SteamVR (https://store.steampowered.com/steamvr). For this project, we chose MRTK, which provides many ready-made interaction features that we used extensively, such as controller button mapping, interactable buttons, grabbable objects, and raycasting interactions.
Unity also offers different third-party sound-generation libraries, which we used heavily in the prototyping phase. In the final experience, we only use sound samples in the Ogg format, chosen for its permissive license and its efficient streaming and manipulation capabilities.
During the prototyping of the melodic interactions, we used different MIDI synthesizer libraries, but, as explained in previous sections, they are not present in the final experience. We made this decision because we judged that the synthesized sound did not blend well with the chosen sound samples, which had a natural, acoustic character.

Evaluation
To evaluate our application, we performed a system usability evaluation with 15 participants: 7 classified as musicians due to musical expertise or participation in musical projects, and 8 classified as non-musicians. Participants classified themselves during the interview phase. The participants were also grouped by experience with VR applications: 6 had experience developing or prototyping VR applications, and 9 had no experience or had used only a few applications. Dividing the participants into these groups proved useful, since it allowed us to obtain more specific information about the system. Of the 15 participants, 14 were undergraduate or graduate students at the Federal University of Pernambuco, in Brazil, and one was a university employee.
We asked the participants to use the system freely, allowing free musical experimentation and interaction with the system's mechanics, and recorded each user's total usage time. The longest session, 20 minutes and 9 seconds, was by a female non-musician without any VR experience. The shortest session, 3 minutes and 54 seconds, was also by a non-musician without VR experience, in this case a male. The average usage time across the 15 participants was 10 minutes and 21 seconds. All participants tested the application with the Samsung Odyssey HMD in a spacious room. Figure 9 shows participants wearing the HMD during the tests.
Figure 9. Users participating in the evaluation could choose to use the application while standing or sitting down. On the left, a male user without musical knowledge tries the application while standing; on the right, a female user with musical knowledge tries it while sitting down.

Interviews
After experimenting with the application, all participants were interviewed so that we could better understand how they perceived key concepts of the implementation and interaction. The interview covered the application's concept and usability through the following questions:
1. What was your opinion about the metaphor of using planets as musical instruments?
2. What was the most interesting feature in the application?
3. What did not perform well?
4. Have you ever used an application with the same proposal in 2D or in VR? If yes, could you compare the application with ours?
5. Did you miss something in the application? Something that you would expect the system to have, and it did not.
6. Was the scene visually pleasing?
7. Did you feel like you could create music?
8. Did you think that the interaction was difficult?
9. Suggestions and general feedback.

Lessons learned
In this subsection, we discuss the users' opinions about the system. All answers come from the interview questions introduced in the previous subsection, and we analyze them by familiarity with music and with VR.

General analysis
The interview questions yielded useful information about the system's usability and attractiveness. We began by asking users their opinion about the metaphor of using planets as musical instruments. Most participants answered that the metaphor was exciting and different, and only a few stated that they were unable to link the planets to the musical instruments. Some participants also described the metaphor as synesthetic, because the planets pulse to the beat of the music and the environment is colorful.
According to the participants, the most exciting feature was the application's design. Users stated that they felt immersed, as if they were in outer space. Immersion is the degree to which an application projects stimuli onto the sensory receptors of users in a way that creates a vivid illusion of reality (Slater and Wilbur, 1997). This characteristic has the potential to engage users in the experience (Jerald, 2015).
We also wanted to compare Songverse with similar experiences users may have had in the past. Once again, immersion was a differentiator for our experience: since most users had only used this kind of application in 2D, the immersiveness of the system gave them a sense of boundlessness. As general feedback, users stated that the system was well integrated and suggested that the application could be expanded with more options for samples and genres. None of the users reported nausea or headaches while using the application.

Analysis by musical experience
The main feature highlighted by both musicians and non-musicians was the ease of creating enjoyable music and the variety of samples available, which allowed them to create different and creative music. During the evaluation sessions, we noticed that the pieces created by the users were very different from one another, suggesting that Songverse offers many different possibilities for music creation.
The participants were happy with their creations and suggested a feature to save the created music to their computer and to share the moment and their creations with friends on social networks. Even though users stated that they felt able to create music, a few asked for more sample options. The samples currently implemented in the system are based on the soft-rock genre, and these users asked for samples of genres popular in the Northeast region of Brazil (where we conducted the evaluation), such as brega and forró.

Analysis by VR experience
A common difficulty among users without VR experience involved the scaling and rotating interactions. These interactions used the grip buttons on the wands, and the difficulty with them created problems for other interactions in the system, such as deleting planets and inserting new satellites on planets far from the user, since some of these planets were hard to reach. Because using the grip buttons to scale and rotate objects is a common interaction in VR applications, we attributed this problem to the group's lack of experience. Figure 10 shows the position of the grip button on a wand, and Figure 11 sketches the scaling interaction. Users in this group also suggested a tutorial, as they had difficulty memorizing the actions and interactions in the system.
Figure 10. Location of the grip button on the Samsung Odyssey HMD's wand (in red). The button is positioned so that it is easily reachable by the middle finger of either hand.
Figure 11. Sketch of the scaling interaction. While pressing the grip buttons on both wands, the user drags the wands in diverging directions to enlarge the scene, or in converging directions, as in this example, to shrink the scene.
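The two-handed scaling gesture sketched in Figure 11 is commonly implemented by scaling the scene with the ratio between the current and the initial distance between the controllers. The sketch below follows that common approach; it is an assumption rather than the Songverse implementation (which relied on Unity and MRTK), and the function name and clamp limits are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def two_hand_scale(start_left: Vec3, start_right: Vec3,
                   left: Vec3, right: Vec3, start_scale: float,
                   min_scale: float = 0.1, max_scale: float = 10.0) -> float:
    """Scene scale while both grip buttons are held.

    The scale follows the ratio between the current and the initial distance
    between the wands: diverging wands enlarge the scene, converging wands
    shrink it. The clamp limits are hypothetical.
    """
    d0 = math.dist(start_left, start_right)
    if d0 == 0.0:
        return start_scale  # degenerate grab: keep the current scale
    ratio = math.dist(left, right) / d0
    return min(max(start_scale * ratio, min_scale), max_scale)


# Wands start 0.5 m apart and converge to 0.25 m, so the ratio is one half.
print(two_hand_scale((0.0, 1.0, 0.3), (0.5, 1.0, 0.3),
                     (0.0, 1.0, 0.3), (0.25, 1.0, 0.3), start_scale=1.0))
```

Using a distance ratio makes the gesture independent of where in the room it is performed, and clamping keeps users from shrinking the universe to nothing by accident.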
Another common request was the possibility of previewing samples. In the experience, users can only hear a sample once it is placed in the environment, and hearing the selected sample may be difficult when other samples are already playing. For users who had difficulty with the scaling and rotating interactions, being able to hear a sample before placing the satellite in the scene would help, since in a few cases they had to scale the scene just to delete a sample they did not like.

System Usability Scale
After the interview, users were asked to complete the System Usability Scale (SUS) questionnaire (Brooke et al., 1996). This procedure helps us understand the quality of the system's usability by quantifying how well the system is integrated and how easy its interactions are.
The questionnaire is composed of 10 statements: 5 positive statements about the user's experience with the system (the odd-numbered items) and 5 negative statements (the even-numbered items). Each statement is rated from 1 (strongly disagree) to 5 (strongly agree), and the answers are combined into a final score between 0 and 100 that grades the usability of the system.
The questions are distributed as follows:
• I think that I would like to use this system frequently;
• I found the system unnecessarily complex;
• I thought the system was easy to use;
• I think that I would need the support of a technical person to be able to use this system;
• I found the various functions in this system were well integrated;
• I thought there was too much inconsistency in this system;
• I would imagine that most people would learn to use this system very quickly;
• I found the system very cumbersome to use;
• I felt very confident using the system;
• I needed to learn a lot of things before I could get going with this system.
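For reference, the standard SUS scoring procedure (Brooke et al., 1996) converts each participant's ten answers into a 0-100 score: odd-numbered (positive) items contribute the answer minus 1, even-numbered (negative) items contribute 5 minus the answer, and the sum is multiplied by 2.5. A system's SUS score is then usually reported as the mean over participants. The function names and the answers in the example are invented for illustration.

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS score (0-100) for one participant.

    responses: ten answers on the 1-5 scale, in questionnaire order.
    """
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("expected ten answers between 1 and 5")
    # Index 0 is item 1 (positive), index 1 is item 2 (negative), and so on.
    total = sum(r - 1 if i % 2 == 0 else 5 - r
                for i, r in enumerate(responses))
    return total * 2.5  # scale the 0-40 sum to 0-100


def mean_sus(all_responses: list[list[int]]) -> float:
    """System-level score: the mean of the participants' scores."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


best = [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]  # agrees with every positive item
good = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(best))            # 100.0
print(sus_score(good))            # 75.0
print(mean_sus([best, good]))     # 87.5
```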
The answers to this questionnaire also gave us useful information about the application's usability. On the statement that the application was easy to use, 86.7% of the participants (13 of 15) agreed, indicating that the application can be used easily by most people. On the statement that the user felt confident using the system, 93.3% (14 of 15) agreed, suggesting that users had an intuitive experience and knew what to do even with few interaction cues from the evaluation conductor.
Songverse obtained a SUS score of 82.83 across all 15 participants, which translates to "A" (excellent); in this framework, the average score for general systems is 68. Since our objective was a musical experience usable by both musicians and non-musicians, we also calculated the SUS score for each group. The musicians' group scored 78.92, which translates to "B" (good), and the non-musicians' group scored 86.25, which translates to "A" (excellent). These results suggest that non-musicians received the application better, mostly due to the ease of composing they mentioned in the interviews.
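Because the overall SUS score is the mean of the individual scores, it should equal the participant-weighted mean of the two group scores. A quick check with the reported values (7 musicians, 8 non-musicians) shows that the figures are consistent up to the rounding of the group scores:

```python
# Group sizes and SUS scores as reported in the evaluation.
groups = {"musicians": (7, 78.92), "non-musicians": (8, 86.25)}

total = sum(n for n, _ in groups.values())
weighted = sum(n * score for n, score in groups.values()) / total
print(total, round(weighted, 2))  # 15 82.83, matching the overall score
```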

Conclusion
This paper proposed a Virtual Reality experience in which the user is immersed in a space-like environment and creates real music while manipulating a virtual universe. A variety of approaches, techniques, and concepts were prototyped to reach the final result, which was evaluated with musicians and non-musicians and found to be user-friendly. The interviews indicated that the final application is an engaging, pleasant experience, both visually and musically.
For future work, we plan to refine the interactions based on the feedback received in the interviews and to add more features to the application, including haptics support and hand-based interaction. We also intend to explore more editing features and to create a richer composition environment. Finally, more genres will be added to the application, allowing users to choose among various possibilities.