Using Interviews to Evaluate Location-Based Games: Lessons and

do Ceará | windson@virtual.ufc.br ]
Abstract
Games User Research (GUR) is an interdisciplinary field of study that aims to measure, analyze, and understand players' interaction and experience with digital games. Building on advances in GUR, this work focuses on the evaluation of player interaction with location-based games (LBGs), seeking to understand which aspects of this interaction can be explored through interviews and how researchers and practitioners have been using this method. We analyzed 23 studies that applied interviews to this end and conducted an expert opinion survey with these studies' authors. As a result, we present lessons and research challenges for the use of interviews in this type of evaluation to encourage the conscious and systematic application of this method and guide students, practitioners,


Introduction
Games User Research (GUR) is an interdisciplinary field of research and practice that ties together Human-Computer Interaction (HCI) and Game Development (Drachen et al., 2018; Abeele et al., 2020). GUR focuses on measuring, analyzing, and understanding players' interaction and experiences with digital games to optimize game designs (Abeele et al., 2020). Thus, GUR experts seek to understand how players experience specific game design choices and how these choices lead to specific emotional responses, aiming to design games that meet players' expectations and to produce actionable insights that guide game development activities (Abeele et al., 2020). To accomplish this mission, the GUR community also works on adapting and improving methods from other areas to compose a toolbox that helps practitioners answer relevant questions about players' behavior and attitudes in different types of games (Nacke et al., 2016; Drachen et al., 2018). In recent years, GUR has made many advances in this endeavor.
Building on these advances, this work applies GUR to the study of location-based games (LBGs), a subtype of pervasive games that use location technologies to integrate the position of one or more players into their rules as a central element of the game (Kiefer et al., 2006; Ahlqvist, 2018). This information modifies the game state at run time, creating a meaningful connection between the real and the virtual worlds and transforming the physical space into the game scenery (De Souza e Silva and Sutko, 2011). For this reason, LBGs have specific characteristics that distinguish them from conventional games, such as mobility, spatial expansion, and pervasiveness (Kiefer et al., 2006; Ahlqvist, 2018).
In the last decade, LBGs have not only achieved great success among the public, especially with Pokémon Go (2016) and, more recently, Harry Potter: Wizards Unite (2019), but have also gained ground in industry and academia (Kasapakis and Gavalas, 2015). Researchers have used these games in studies in different domains, such as health (Chittaro and Sioni, 2012), education (Oppermann et al., 2017), and tourism (Ballagas et al., 2008), and consequently have faced new challenges, for instance, related to the evaluation of the player-game interaction and the resulting player experience (PX) (Paavilainen et al., 2017; Carneiro et al., 2019b).
In this context, we conducted a three-phase research project with the ultimate goal of proposing a guide for the qualitative evaluation of PX in LBGs, aiming to provide directions to researchers and practitioners on this task. The first phase consisted of a systematic mapping (SM) study (Kitchenham et al., 2010) in which we investigated how researchers and practitioners have been evaluating player interaction with LBGs, as reported in Carneiro et al. (2019b). The results showed that PX had been the most commonly evaluated quality of the interaction and that most studies applied evaluation strategies heavily based on surveys and questionnaires, mainly ad hoc instruments.
Surveys and questionnaires are extremely useful, as stated in the literature. However, they offer only an overview of the studied phenomenon and provide limited feedback (Lazar et al., 2017;Drachen et al., 2018). Thus, their isolated application should be avoided in studies that seek in-depth investigations, since an approach based on multiple methods is the most desirable path to follow (Lazar et al., 2017;Creswell and Poth, 2018). One of GUR experts' methods to investigate PX deeply and obtain rich feedback from players is interviewing (Drachen et al., 2018). This method offers several benefits for user research, especially in the context of PX evaluation, being useful to compose an understanding of the needs, preferences, motivations, and attitudes of players (Lazar et al., 2017). In GUR studies, interviews are an essential part of a qualitative testing session with users (Drachen et al., 2018).
Thus, this paper reports the second phase of our research, whose goal was to understand how interviews can be used to evaluate player interaction with LBGs and how our community has been using them. For that purpose, we performed an additional analysis of the 23 works identified in the SM that reported using interviews in their evaluations, and conducted an opinion survey with the authors of these papers to investigate how they have used the method in that context and to extract lessons from their experiences. This work's contribution lies in identifying lessons and challenges to foster discussion and reflection on the use of interviews to evaluate player interaction with LBGs, thus encouraging their conscious and systematic application. These lessons can help practitioners and researchers, especially novices, plan and conduct LBG evaluations using qualitative interviews. It is worth noting that the present paper is an extension of a paper (Carneiro et al., 2019a)

Location-Based Games (LBGs)
Location-based games are digital games that use location technologies to integrate one or more players' positions into their rules as a central element of their dynamics (Kiefer et al., 2006; Ahlqvist, 2018). This information about the player's location is used to modify the game state during execution and creates a meaningful connection between the real and the virtual world, turning the physical space into the game scenario (De Souza e Silva and Sutko, 2011; Alha et al., 2019). Therefore, the most distinctive trait separating LBGs from most digital games is that, in LBGs, players' actions predominantly occur in a physical environment, usually outdoor public spaces (Leorke, 2018; Alha et al., 2019).
Thus, the idea of LBGs has two key elements: (i) players' position and movement in real-world spaces, as the game requires considerable locomotion, and (ii) a game dynamic mediated through some position-tracking technology (Ahlqvist, 2018). LBGs use location-aware technologies, wireless networks (e.g., Wi-Fi and 3G/4G), and portable devices (e.g., smartphones) to allow the communication between physical and digital spaces and between players (De Souza e Silva and Sutko, 2009;Leorke, 2018). Therefore, game interactions usually occur while players use their devices to explore the world, as they physically move around and, at the same time, visit points of interest inside the game.
Despite these elements that distinguish LBGs from conventional digital games, any survey of LBG history shows that, in the last two decades, LBGs have assumed many and sometimes ambiguous forms (see Leorke (2018) for an in-depth discussion of this history). Although some definitions of these games exist, previous studies have noted that it can still be tricky to define LBGs precisely, as a diverse range of games fits into this category (Kiefer et al., 2006; De Souza e Silva and Sutko, 2011; Ahlqvist, 2018; Leorke, 2018). As Leorke states, there is no simple definition that applies to LBGs, and the task of attempting to define these games is also "not helped by the fact that there are almost as many terms used to define them as there are types of games" (Leorke, 2018).
Given this issue, several researchers have attempted to identify a set of core characteristics of LBGs. For instance, De Souza e Silva and Sutko (2009) indicate three main characteristics of LBGs: mobility, spatiality, and sociability. The authors point to spatiality as an essential characteristic of LBGs, since these games expand the magic circle 1 by creating a unique way of connecting players with one another and with space, thus defining a new logic for the game space. Another example is Ahlqvist's (2018) proposal of five key dimensions to identify and characterize LBGs: location, spatial and temporal expansion, representation, and pervasiveness. The author also emphasizes that, to be considered an LBG, a game should require considerable physical movement from its players.
These studies indicate what can be considered the primary (or most typically addressed) characteristics of LBGs. Although this discussion is relevant, we do not intend to exhaust it in this paper. As mentioned above, these studies are just a few examples of how LBGs have been subject to many interpretations by scholars and designers (Leorke, 2018). However, this brief overview helps us clarify the distinction between LBGs and other digital games, an essential notion for our reader, and highlight these games' specificities 2 . Instead of merely focusing on LBGs' technological features, we approach these games by emphasizing meaningful aspects of the player-game relation that can be affected by LBG characteristics, such as the perception of spaces and the connection between life and game spaces. Building on Leorke's suggestion, we gathered this brief discussion to lay the ground for an analysis that encompasses the diverse and multifaceted dimensions of these games.

Interviews in Games User Research
Interviews offer several benefits for user research and are valuable to encourage in-depth investigation of specific topics and answer questions based on the feedback provided by the interviewees (Lazar et al., 2017). In addition to being one of the most used research methods in HCI (Lazar et al., 2017;Shneiderman et al., 2016), interviews are useful in the context of games studies, especially when investigating the PX, because they help to build an understanding of the needs, concerns, preferences, and attitudes of players (Isbister and Schaffer, 2008). They also allow researchers to explore more complex issues than those addressed in other methods, for example, surveys (Schell, 2014).
In GUR, interviews are an essential part of a qualitative testing session with users (Drachen et al., 2018), as they offer one of the only ways to validate observations, discover problems, collect opinions and find causes for difficulties faced by players (Drachen et al., 2018). Interviews can be combined with other methods to enrich the collected data and create a holistic view of the user's thinking and behavior, being a primary part of the discovery and understanding of usability problems and setbacks in the player's experience. Thus, interviews may be the most appropriate choice for certain research objectives and types of knowledge desired (Isbister and Schaffer, 2008). For example, Nacke, Drachen, and Göbel indicate using interviews to evaluate the PX and capture the context and its social impact on the individual player's experience with serious games (Nacke et al., 2010).
Since this method has the potential to reveal emotional dimensions of experience that are not always evident in an individual's behavior (Lamont and Swidler, 2014), several researchers have explored it, devising new techniques and approaches to apply it in different areas and contexts. For example, El-Nasr et al. (2015) proposed a formative evaluation method that uses retrospective interviews to investigate how players accept and integrate a game into their lives, posing it as particularly suitable for research on pervasive games in naturalistic settings. Crawford, Monks, and Wells (Crawford et al., 2018) developed a virtual reality-based interview technique to assess candidates for medical emergency residency and identify their communication, problem-solving, and teamwork skills. More recently, Holmes (2019) explored the play-based interview method, which allows observing and interviewing young children, taking into account their cognitive and language limitations, through playful activities that promote engagement and direct communication between interviewers and interviewees. These studies are just a few examples of the versatility and potential of interviews.
However, planning interviews, conducting them, and analyzing the resulting data are arduous tasks that require preparation and rigor; applying qualitative interviews is not trivial. The content of the questions and the manner in which the interviewer poses them, for instance, can make the difference between new insights and a waste of time. Applying this method brings real challenges to the interviewer, requiring specific skills, study, practice, and experience. Nevertheless, it can result in invaluable data. Hence, appropriate and specific instructions can aid the interview preparation process. The literature is filled with materials to help researchers and professionals (especially novices) plan and conduct qualitative interviews in different areas. Although several authors offer orientations and guidelines for using interviews (e.g., Turner III (2010); Blandford et al. (2016); Kvale (2008); Rowley (2012)), more specific issues will still require careful consideration.
Even though the difficulties associated with applying qualitative interviews are widely known, ironically, many researchers and practitioners see this method as an effortless way to obtain data quickly. As identified by Myers and Newman (2007), several studies report using interviews inadequately and superficially. Their study analyzed papers in Information Systems (IS), but this deficiency is not exclusive to their area. As we observed in our systematic mapping study (Carneiro et al., 2019b), LBG interaction evaluation still struggles with similar issues, such as the suitability of the chosen methods to the research goals, inadequate application of methods and techniques, and lack of rigor. Nevertheless, to the best of our knowledge, there are no studies focused on the issues of using qualitative interviews to evaluate LBGs. Thus, this work investigates these topics as an initial step towards addressing such problematic practices.

Methodology
In the first phase of this research, we conducted a systematic mapping (SM) of the literature to investigate how researchers and practitioners have evaluated the qualities of the interaction between players and LBGs. An SM is a secondary study that reviews existing primary studies and is suited to constructing an overview of a research area (Petersen et al., 2015). The SM followed the approach proposed by Kitchenham et al. (2010) and aimed to answer three questions regarding LBG interaction evaluation:
• What methods do researchers and practitioners use in LBG interaction evaluation?
• What qualities of the interaction do they evaluate?
• What strategies and approaches do they use?
We searched for studies in five sources (Scopus, ACM Digital Library, Web of Science, IEEE Xplore, and Science Direct) using the search string shown in Figure 1. As a result, we obtained an initial set of 437 papers, which we filtered in a three-step process by applying the inclusion and exclusion criteria summarized in Table 1. The filtering resulted in a final set of 51 articles, which we analyzed to extract information to answer our research questions.

Table 1. Inclusion and exclusion criteria.
Inclusion: The study focuses on LBGs AND it reports an evaluation process of qualities of the player-game interaction AND it is a primary study.
Exclusion: The study does not focus on LBGs OR the reported evaluation does not assess qualities of interaction OR the paper is not written in English or Portuguese.

According to our results, PX is the most frequently evaluated quality of the interaction. Regarding methods, surveys and questionnaires were the most used, followed by interviews and interaction log recording. Although many papers do not provide detailed information about the specifics of the evaluation process, it was possible to notice that most studies use multi-method approaches and ad hoc evaluation instruments. The complete report of the SM and its results is available in Carneiro et al. (2019b).
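The inclusion and exclusion criteria above can be read as a boolean filter: inclusion criteria are joined by AND, exclusion criteria by OR. The following is a minimal sketch of that reading, not the screening procedure actually used in the SM. The `Paper` record, its field names, and the example titles are hypothetical and were invented for illustration.

```python
from dataclasses import dataclass


# Hypothetical record for one candidate paper; the field names are
# illustrative and do not come from the original study protocol.
@dataclass
class Paper:
    title: str
    focuses_on_lbgs: bool
    evaluates_interaction_qualities: bool
    is_primary_study: bool
    language: str


ACCEPTED_LANGUAGES = {"English", "Portuguese"}


def is_included(p: Paper) -> bool:
    # Inclusion criteria are joined by AND: all of them must hold.
    return (
        p.focuses_on_lbgs
        and p.evaluates_interaction_qualities
        and p.is_primary_study
    )


def is_excluded(p: Paper) -> bool:
    # Exclusion criteria are joined by OR: one match is enough.
    return (
        not p.focuses_on_lbgs
        or not p.evaluates_interaction_qualities
        or p.language not in ACCEPTED_LANGUAGES
    )


def keep(p: Paper) -> bool:
    return is_included(p) and not is_excluded(p)


# Invented candidates, used only to exercise the filter.
candidates = [
    Paper("Hypothetical LBG field study", True, True, True, "English"),
    Paper("Hypothetical console game study", False, True, True, "English"),
    Paper("Hypothetical LBG literature review", True, True, False, "English"),
    Paper("Hypothetical LBG study in German", True, True, True, "German"),
]

selected = [p.title for p in candidates if keep(p)]
print(selected)
```

Keeping the two predicates separate mirrors the two rows of criteria and makes it easy to log which rule removed a paper.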
The findings obtained with the SM directed our focus towards a more in-depth examination of the use of interviews to evaluate LBGs, since our initial analysis indicated a gap regarding the systematic exploration of the method in this context. We also observed that many studies reported interviews with less rigor than other methods, reducing their relevance. Thus, in the second phase of our research, we performed additional quantitative analysis on 23 papers (extracted from the final set of 51) that reported using interviews and conducted an opinion survey with the authors of those articles to understand their use of interviews.

Expert Opinion Survey
Opinion surveys aim to determine what the participants think about certain concepts (Ozok, 2009). When conducted with experts, they can serve various purposes, being useful for identifying problems, predicting changes, and clarifying relevant issues on a specific topic (Rowe and Wright, 2001;Darin et al., 2019), for example. Therefore, we created an online survey to deepen our understanding of how the authors of the 23 papers applied interviews to evaluate the player interaction with LBGs in their studies, aiming to identify gaps and extract lessons from their experiences with the method.
We invited the authors to participate in the survey mainly by e-mail. When we could not identify an author's e-mail address, we contacted them through the social network ResearchGate 3 , when possible. We could not find contact information for six authors, so we excluded them from the mailing list, resulting in 81 recipients.
Each author received a personalized message, containing a brief description of our research, the title of her/his article identified through our SM, and a hyperlink to access the survey. Some participants authored more than one paper, so they received a slightly different questionnaire, which mentioned their identified works, but asked them to answer our questions considering their experience as a whole. Ten days after the invitation, we sent reminders to the authors who had not answered the survey yet. The questionnaire was available for approximately two months, and we closed it after the number of responses stagnated.
The survey had 17 questions (eight of them were open-ended questions) divided into three sections: (i) Researcher/practitioner profile; (ii) Your experience applying interviews -questions about the use of interview reported in the identified paper; and (iii) Your Opinion -a poll regarding the author's openness to a proposal of a guide to using interviews to evaluate LBGs. Appendix A presents the survey questions.
Fourteen authors, responsible for 11 papers, participated in the survey, representing a response rate of 17.28% (N = 14). Despite the relatively small number of responses, the literature supports the adequacy of this number, since it advises the use of groups of 5 to 20 experts (Rowe and Wright, 2001). Besides, the fact that the participants have different backgrounds and responded to the survey independently reduces the potential for biases in answers (Darin et al., 2019).
3 https://www.researchgate.net/
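The reported response rate follows directly from the 81 recipients and 14 respondents. The short sketch below reproduces that arithmetic and checks the respondent count against the 5 to 20 expert range cited from Rowe and Wright (2001). The variable names are illustrative.

```python
recipients = 81   # authors on the mailing list
respondents = 14  # authors who answered the survey

# Response rate as a percentage, rounded to two decimals.
response_rate = round(100 * respondents / recipients, 2)

# Group size advised for expert surveys, as cited in the text.
within_advised_range = 5 <= respondents <= 20

print(f"Response rate: {response_rate}%")
print(f"Within advised range of 5 to 20 experts: {within_advised_range}")
```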

Analysis of the survey answers
We treated the answers to close-ended questions as quantitative data and performed basic quantitative analysis, while the answers to open-ended questions were analyzed qualitatively. The results were then cross-referenced and combined to address relevant topics, as presented in Section 5.2, in light of the survey objective: to understand how authors had used interviews to evaluate player interaction with an LBG.
The goal of qualitative analysis is to turn the unstructured data into descriptions about important aspects of the situation or problem under consideration (Lazar et al., 2017). To this end, we analyzed the responses to the eight open-ended questions by combining two basic qualitative analysis approaches: thematic analysis and data categorizing (Preece et al., 2019). Thematic analysis is an analytical technique that aims to identify, analyze, and report patterns in the data, in which a theme is something important to the study goals. In our case, data categorizing involved inductive analysis to allow themes to emerge from the data itself and use the results to answer the study goals.
Thus, we tabulated the data, grouped it (initially guided by the questions' themes), and analyzed it iteratively to extract descriptions that answer each topic. In a second step, we coded the data according to topics that emerged from the data itself. From the codes, we created categories that were systematically analyzed and combined with knowledge and perceptions gained through literature reviews to translate them into the lessons in Section 6. In the words of Lazar et al. (2017), this application of experience and contextual knowledge is critical for the appropriate interpretation of qualitative data.
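The step from codes to categories described above can be sketched as a simple grouping. This is only an illustration of the bookkeeping involved, not the authors' actual coding scheme: the respondent identifiers, codes, and categories below are invented, and the interpretive work of assigning codes remains a manual, iterative task.

```python
from collections import Counter, defaultdict

# Hypothetical coded excerpts as (respondent id, code) pairs.
# All values are invented for illustration.
coded_answers = [
    ("R1", "weather"),
    ("R2", "distraction outdoors"),
    ("R3", "script balance"),
    ("R2", "weather"),
    ("R4", "technical vocabulary"),
]

# Hypothetical mapping from codes to broader categories.
code_to_category = {
    "weather": "evaluation context",
    "distraction outdoors": "evaluation context",
    "script balance": "script design",
    "technical vocabulary": "communication",
}


def group_by_category(answers, mapping):
    """Return {category: Counter of codes} for the coded answers."""
    grouped = defaultdict(Counter)
    for _respondent, code in answers:
        category = mapping.get(code, "uncategorized")
        grouped[category][code] += 1
    return grouped


categories = group_by_category(coded_answers, code_to_category)
for category, codes in sorted(categories.items()):
    print(category, dict(codes))
```

Codes without a mapping fall into "uncategorized", which flags them for another pass of analysis.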

Results and Discussion
This section presents and discusses the results of the second phase of this research, summarizing the quantitative analyses performed on the 23 papers (identified in the SM study) that used interviews in their evaluations and the expert opinion survey with the authors.

Use of interviews as identified in the SM Study
The 23 studies reported evaluating 21 LBGs.
[Table of evaluated games; surviving entries: Benford et al. (2006); Obscura (Quek and See, 2015)]

Number of participants
The sample sizes used in the studies ranged from 6 to 96 users. The average number of participants was 24.56 users per study (SD = 19.72), and the most frequently used samples consisted of 10 or 24 participants (reported in 3 studies each). The works with the largest samples brought together 36 (Nilsson et al., 2016), 60 (Blum et al., 2012) and 96 (Sandham et al., 2011) participants.

Qualities of interaction under evaluation
We identified twenty qualities of interaction under evaluation, with an average of 1.87 qualities per study (SD = 1.14). PX was the most frequently evaluated quality (indicated in 14 works), followed by usability, immersion, and presence (evaluated in four studies each). Other less frequently evaluated qualities were engagement, playability, and spatial presence, as shown in Figure 2. Most studies (12, i.e., 52.17%) focused on evaluating only one quality; in seven of them, that quality was PX. The work of McCall and Braun (2008) evaluated the highest number of qualities (five), namely: usability, PX, presence, sense of place, and social presence. Of the 14 works that evaluated PX, seven also reported evaluating other qualities: usability, engagement, immersion, enjoyment, and presence.
It is worth mentioning that, frequently in the literature, some of these qualities (e.g., presence and immersion) are addressed as PX components, or even part of the set of properties that describe this experience, as stated by Sánchez et al. (Sánchez et al., 2012), for example. However, these works seem to treat PX and the referred qualities as distinct aspects of interaction without discussing this issue.

Evaluation methods and strategies
In addition to interviews, we identified ten other methods applied in the studies, as shown in Figure 3. Three (13.04%) studies used interviews as their single method (Quek and See, 2015; Linehan et al., 2013; Ekman, 2007), while the other 20 (86.96%) followed multiple-method approaches (Lazar et al., 2017; Creswell and Poth, 2018), combining interviews with at least one more method. Surveys and questionnaires were the most common method in these combinations, being used in 18 (78.26%) studies. Observation of use was the second most applied (14 studies, 60.87%), following a common tendency to link interviews to observation (Lamont and Swidler, 2014). It is important to note that only two (8.70%) studies (Chatzidimitris et al., 2016; Verdejo et al., 2010) used methods to monitor physiological data; both also applied interviews, observation, and interaction log recording. A possible explanation is that the evaluation of an LBG is ideally performed outdoors or in environments that simulate mobility contexts, making it challenging to use some types of equipment that capture these measures. It is also possible that context-related difficulties explain why only one work (Nilsson et al., 2016) used the Think Aloud protocol (Fonteyn et al., 1993). These cases illustrate some of the challenges that accompany interaction evaluation in LBGs.
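The percentages in this paragraph are shares of the 23 analyzed studies. As a quick consistency check, the sketch below recomputes them from the counts reported in the text. The dictionary labels are shorthand chosen for this example.

```python
TOTAL_STUDIES = 23

# Counts as reported in the paragraph above.
method_counts = {
    "interviews only": 3,
    "multiple methods": 20,
    "surveys and questionnaires": 18,
    "observation of use": 14,
    "physiological data": 2,
}


def share(count, total=TOTAL_STUDIES):
    """Percentage of studies, rounded to two decimals."""
    return round(100 * count / total, 2)


for method, count in method_counts.items():
    print(f"{method}: {count}/{TOTAL_STUDIES} = {share(count):.2f}%")
```

The same helper can be reused for the other per-study percentages in this section by passing the corresponding count.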

Data Analysis
Even though all the studies conducted interviews, fourteen (60.87%) did not report how (or whether) they analyzed the qualitative data. Three of these (13.04%) did not mention performing any analysis, be it quantitative or qualitative; two of them applied interviews exclusively (Quek and See, 2015; Ekman, 2007). The remaining nine (39.13%) studies described the analysis process with varying levels of detail and, in some cases, superficially. Only three papers mentioned both qualitative and quantitative analysis, while the other six mentioned only qualitative or only quantitative analysis (three studies each). Some of the techniques mentioned were inductive thematic analysis, coding with an ad hoc scheme, and affinity diagrams. Two papers (Linehan et al., 2013; Diamantaki et al., 2011) stood out for providing more details on the qualitative analysis procedures applied to the interview data.

Survey Results Overview
In the remainder of this section, we provide an overview of the expert opinion survey results performed with the authors of the 23 papers that reported using interviews to evaluate player interaction with LBGs.

Participants profile
The survey received responses from 14 authors from eight countries: Germany, Austria, Canada, Denmark, Greece, Indonesia, Mexico, and the United Kingdom. Twelve (85.71%) of them work in the HCI area, seven (50%) in Game Design, and six (42.86%) also work with LBGs. Only three participants (21.43%) listed GUR among their areas of activity. Seven (50%) participants have worked with game evaluation for ten years or more, five (35.71%) have between 1 and 6 years of experience, and two (14.29%) use this type of evaluation in their research only sporadically. Six (42.86%) authors mentioned being familiar with guidelines in the literature for interviews (the works of Steinar Kvale (2008) and Larry Wood (1997) were the most indicated); however, no papers mentioned using them. Table 3 summarizes the participants' profiles.

Authors' motivations for using interviews
The main reasons mentioned were the search for a deeper understanding of the qualities evaluated and for rich and detailed feedback. Another common reason was the awareness that interviews could complement the data obtained with quantitative methods, especially when the objective was to investigate users' perceptions, behaviors, and emotions. For participant A13, the interviews allowed them to scrutinize potentially interesting behavior patterns observed during usability tests. Other reasons listed were flexibility, direct contact with users, systematic feedback collection, and the use of interview data as a guide for interpreting quantitative data. Author A11 said that he chose semi-structured interviews because they are quick and easy to apply outdoors. Similarly, A6 pointed out that interviews allowed them to investigate critical issues observed in the tests in more detail, because they offered more freedom to the participants, which generated valuable information that other methods could not provide. Since critical incidents can have real consequences for LBG players (for example, GPS errors or a misplaced point of interest can put the player in dangerous situations), it is crucial to investigate adverse conditions in this type of evaluation.

Elaboration of the interview script
All authors said that they applied semi-structured interviews in the referred studies. The primary approach for composing the script was authors elaborating it themselves, based on the evaluation and the study's objectives. Other common practices were to create the script during discussions between co-authors and to have more experienced colleagues responsible for this activity. Two authors adapted scripts from other studies, and one participant claimed to have used usability consultancy to create the script since game evaluation is not his area of expertise. Only three authors followed guidelines for applying interviews proposed in the literature -however, only one study (Kasapakis and Gavalas, 2017) reported this.
Most authors showed confidence in their previous experiences with the method, in discussions with colleagues, and in using the research objectives as a parameter to create the questions. Only two participants (A11 and A14) reported difficulties in preparing the script. They found it challenging to gather and address all relevant issues for the study. They also mentioned the difficulty of composing a solid script with the right balance between structure and flexibility, so that the interviewer can follow the user's train of thought (which often produces unexpected insights) but still obtain comparable data.

Difficulties in conducting interviews
Five authors reported difficulties in conducting the interviews, highlighting the adversities inherent to the evaluation context (for example, low temperatures during tests performed outdoors). A common difficulty is that participants are more subject to distractions when the interview is conducted outdoors. The interviewer often has to call them back to the interview questions. In addition to that, A6 reported that, although interviewers get in-depth feedback on PX, users tend to talk a lot more about their views on the game aesthetic, making it more challenging to deal with questions. One of the most experienced authors, A14, pointed out the challenge of defining clear and effective communication between interviewer and interviewee, mainly when the evaluator aims to collect "technical" information from users who have little understanding of the subject, and highlighted the importance of a trained interviewer for situations like this. None of the presented options

Particularities of interviews to evaluate LBGs
We asked authors if, in their interviews, they had explored questions related to any LBG characteristics or specificities. Ten (71.43%) of them considered questions related to at least one specificity of the LBG when elaborating the interview script. In general, they indicated location as the characteristic most explored in studies, followed by mobility and social interaction in the game. These choices are in line with the characteristics most commonly attributed to LBGs in the literature: as we mentioned in Section 2, De Souza e Silva and Sutko (2011) point to location, mobility, and sociability as the main characteristics of these games. Other specificities considered were context sensitivity, connectivity, pervasiveness, security, privacy, and GPS accuracy. The least explored were level of physical effort and temporal and spatial expansion, although the latter two appear in the literature as essential aspects of LBGs (Ahlqvist, 2018).
Regarding the reuse of the interview script applied in the studies, most authors (85.7%) believe that they could not reuse it to evaluate other types of games since the interview's focus depends heavily on the research objectives. They also mentioned that the scripts used in these studies were very specific, applying only to the game and context in question. Two authors (A1 and A6) said they would reuse their scripts, making minor adjustments to include contextual aspects of the LBG and issues particular to the quality of interaction under assessment. A6 pointed out that although "playing an LBG" is arguably different from "playing a conventional game on a smartphone", it is possible to use similar instruments to evaluate certain aspects in both games (e.g., the degree of realism of the graphics). However, when the evaluation focuses on subjective issues, such as PX, it is essential to consider each game's particularities, as they will dictate the necessary changes in the interview scripts for each case. In general, experts have attested to the benefits of using interviews in their studies.

Lessons to Use Interviews to Evaluate Player Interaction with LBGs
The results obtained in the first and second phases of our research allowed us to extract some lessons for conducting interviews to evaluate player interaction in LBGs. Together, these lessons form an initial set of guidelines that aims to provide researchers and practitioners, especially novices, with resources for the conscious and adequate execution of this activity. Some of them can be applied to the evaluation of games and to interviews in general, while others are specific to LBGs. It is worth noting that the adequacy of the information presented here should be judged based on the particularities of each study and of the stage of the LBG design process, and that it does not apply equally to all cases.

Divide to conquer: in search of better strategies to evaluate LBGs
The survey's results highlighted the importance of combining different methods, as already recommended in the literature (Creswell and Poth, 2018; Lazar et al., 2017). As expected, the respondents reported adopting this approach to evaluate LBGs in their studies and recognized its value, since the isolated application of a single method does not usually provide the desired data. It was also clear that not all methods apply equally in this type of evaluation, at least not without some adaptation, since the characteristics of an LBG and its context of use can impose several restrictions on the evaluators. It is necessary to consider these issues when planning the evaluation strategies, keeping the focus on the study's objectives, the qualities of the interaction to be evaluated, and the characteristics of the LBG. Some lessons related to these issues are:

L1: Remember that the specificities of LBGs impact and distinguish the PX in these games. Identify and understand the main characteristics of the LBG under evaluation and use that information to outline strategies that consider and explore them. Choose and adapt the data collection methods keeping in mind that the LBG specificities will influence how the evaluation is conducted and the results obtained.
L2: Combine quantitative and qualitative methods so that each compensates for the other's limitations, but avoid applying methods or equipment that mischaracterize or limit the player's experience with the specificities of the evaluated LBG. For instance, be cautious about using equipment that limits the player's mobility, since mobility plays a crucial role in LBG gameplay.

L3: Prioritize evaluation in outdoor environments, including and exploring the inherent difficulties of that context (e.g., weather conditions, distractions, GPS inaccuracy). These factors are part of the actual LBG gaming experience and should be considered in an assessment that seeks a realistic view of PX.
L4: When possible, combine laboratory research with real-context research to gain a deeper understanding of the PX, if that is what you are aiming for (Drachen et al., 2018).

L5: When planning an evaluation, consider the drastic changes that can occur in the context of mobile games (especially concerning LBGs) (Drachen et al., 2018).

Know to apply: how to enhance the planning and use of interviews to evaluate LBGs?
The use of interviews usually presents different challenges to the evaluators. In addition, since an often unpredictable context permeates LBGs, common issues can gain new nuances (for instance, a "minor problem" such as a delay in updating the map can take the player to an unwanted place, which can be a big problem). This increases the importance of developing precise and efficient interview scripts that offer both interviewer and interviewee the flexibility and freedom to explore unpredicted situations. Thus, to make the best use of this method and ensure that the evaluation objectives are achieved, it is necessary to know its potential and limitations in order to plan its proper use and execute it in a truly beneficial way.
L6: Semi-structured interviews are a good choice to evaluate PX in LBGs. They offer flexibility and freedom for users to express themselves while allowing the interviewer to take advantage of opportunities and further investigate specific points, maintaining some structure to make systematic data collection feasible.
L7: When preparing the script for the interview, focus on the objectives of the evaluation, but also prioritize the examination of issues specific to LBGs (such as mobility, spatiality, and temporal expansion) and their influences on PX (e.g., How does physical effort affect the player's fun?).
L8: Identify which evaluation goals can be achieved through other methods and explore in the interview those that require further examination (for example, subjective questions and clarification of doubts). Some of the advantages of interviews for evaluating LBGs are practicality and speed, so be objective and simplify their application.
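One way to put L7 into practice is a simple coverage check on the interview script. The sketch below is illustrative only: the tags, the example questions, and the function name `uncovered_characteristics` are hypothetical and do not come from the surveyed studies. The characteristic names are a subset of those the surveyed authors reported exploring.

```python
# Illustrative subset of the LBG characteristics mentioned in the survey results.
LBG_CHARACTERISTICS = [
    "location", "mobility", "social interaction",
    "temporal expansion", "spatial expansion", "physical effort",
]

# Hypothetical script: each question is tagged with the characteristic it probes.
SCRIPT = [
    ("location", "Which places did the game lead you to, and how did you feel there?"),
    ("mobility", "How did moving around affect your experience?"),
    ("physical effort", "How did the physical effort affect your fun?"),
]


def uncovered_characteristics(script, characteristics):
    """Return the characteristics that no question in the script is tagged with."""
    covered = {tag for tag, _question in script}
    return [c for c in characteristics if c not in covered]


# Lists the characteristics the draft script does not yet address.
print(uncovered_characteristics(SCRIPT, LBG_CHARACTERISTICS))
```

A list of this kind only flags gaps. Whether a missing characteristic deserves a question still depends on the objectives of the evaluation.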

Interviews serve multiple purposes: the importance of taking advantage of this when evaluating LBGs
The authors pointed out several motivations for applying interviews in their studies. Versatility, flexibility, and (relative) speed of application, for instance, make this method valuable for evaluations carried out outdoors. Interviews have proved to be a useful tool for LBG evaluation, especially when investigating PX, as they serve various purposes and provide rich data that can be used in different ways (for example, to guide the interpretation of quantitative data). Some lessons learned are:
L9: Use interviews to investigate how the specificities of the LBG affect PX from the player's point of view. Seek to collect detailed information about how the player perceives and deals with the characteristics of the game and its specificities (e.g., security, effort, social image). This practice can help obtain data that, among other uses, can inform design guidelines for these games.

L10: Combine interviews with user observation, for example, to investigate critical events (including those caused by external factors) and understand how players perceive, interpret, and react to them. These events can modify how the player interacts with the LBG, creating game dynamics that were not foreseen by game designers.

L11: Use interviews to investigate how players integrate an LBG into their lives and its impact on the player's perspective and experience. Since LBGs are played amid daily activities, this kind of information is essential to improve PX and game dynamics.

L12: In LBGs, the mix between real everyday social norms and the game world's rules needs to fit perfectly to support integration between the physical and virtual worlds. Use the interview to investigate issues connected to this relation (Jegers, 2007).
L13: LBGs should allow a smooth and fluid transition between different playing contexts and should not imply or require actions from the player that could result in a violation of social norms in everyday contexts. These games should also allow players to shift their focus between the physical and virtual parts of the game world without losing full immersion in the game. Interviews are a powerful tool to investigate whether a game has these characteristics (Jegers, 2007).

With great power comes great responsibility: interviews can be expensive, but worth the price
Interviews are an effective method for assessing the interaction between player and LBG: 92.86% of the experts confirmed the method's effectiveness in achieving their goals. The use of interviews can result in several benefits for a study, such as direct contact with users, feedback that would not be captured with other methods, a greater understanding of the players' perceptions, and support in interpreting and verifying the data collected with quantitative methods. However, interviews should not be seen as an effortless way to obtain good results. Their proper use requires effort and rigor, and it comes with some limitations and pitfalls, like any other method. As attested by the experts, interviews require planning and preparation, and they can result in poor feedback if conducted carelessly. In addition, data analysis is often time-consuming and laborious.

L14: The interviewers must be familiar with the characteristics and specificities of the evaluated LBG so they can recognize and explore opportunities in the users' speech and actions, allowing them to better investigate the PX offered by the game.
L15: A good question should contribute to the production of knowledge and promote good interaction in the interview. In this way, it is crucial to choose the questions to be asked in the interview carefully (Spradley, 2016).

L16: LBGs give players great freedom, which increases the variability of PX. Despite this, data must be collected systematically to ensure useful parameters for comparison. When interviewing a user, ask questions about aspects of the experience that are common to all players; the characteristics of the evaluated LBG can generate topics to be consistently explored, for example.

L17: When conducting an interview and taking notes, try to categorize your notes into sections during the session (this also works for observation). This practice provides a useful shortcut for analyzing the collected data and can reduce the time spent reporting the results (Drachen et al., 2018).

L18: Combine interviews with methods that capture data during the game and explore gameplay highlights in the interview questions to get more meaningful results. During the analysis, these data can be cross-referenced and used to check the users' statements against their actions.
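The cross-referencing suggested in L18 can be sketched as a time-window match between coded interview statements and logged gameplay events. This is a minimal sketch under stated assumptions: the `GameEvent` and `Statement` structures, the event names, and the 60-second window are hypothetical choices for illustration, not a procedure reported by the surveyed authors.

```python
from dataclasses import dataclass


@dataclass
class GameEvent:
    t: float   # seconds since the start of the play session
    kind: str  # hypothetical event label, e.g. "gps_drift"


@dataclass
class Statement:
    refers_to: float  # approximate session time the player is talking about
    text: str         # what the player said in the interview


def events_near(statement, events, window=60.0):
    """Return the logged events within `window` seconds of the moment the statement refers to."""
    return [e for e in events if abs(e.t - statement.refers_to) <= window]


def cross_reference(statements, events, window=60.0):
    """Map each statement text to the kinds of events logged around that moment."""
    return {s.text: [e.kind for e in events_near(s, events, window)] for s in statements}


# Hypothetical session data.
log = [GameEvent(100.0, "gps_drift"), GameEvent(400.0, "checkpoint_reached")]
said = [Statement(120.0, "The map put me on the wrong street.")]
print(cross_reference(said, log))
```

A statement with no nearby events is not necessarily wrong. It may simply point to something the logging did not capture, which is itself worth following up in the interview.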

Practice only makes for improvement: master the craft
Qualitative research can be a tricky task depending on the level of experience a researcher has with a particular type of methodology (Turner III, 2010). This also applies to interviews. Contrary to what many people may think, an interview is not a regular conversation with a predefined topic, and a poorly conducted one can result in frustration and wasted effort and time. To avoid this risk, the interviewer must be properly trained in interviewing techniques to obtain the most detailed and rich data from users (Boyce and Neale, 2006).

L19: Interviewing techniques include avoiding yes/no and leading questions, using appropriate body language, and keeping the interviewer's personal opinions in check (Spradley, 2016; Boyce and Neale, 2006).

L20: While one of the most important rules about asking questions is to keep quiet and give the interviewee room to talk, it is equally important to show appreciation and interest in what they say. The interviewer must make the interviewee comfortable and appear interested in what they are saying. This means that the interviewer should be sensitive enough to guide the user through the planned topics without controlling them too much, so as not to lose important and unexpected points. Beyond technical and academic knowledge, only practice will give an interviewer the perception and sensitivity necessary to perform this task.

Challenges in Using Interviews to Evaluate the Player Interaction with LBGs
During the compilation of the survey's results, we also identified some challenges that reflect research opportunities, especially for the HCI and GUR communities. We present some of them as insights to be discussed and further expanded in hopes of maturing the practice of GUR to study and comprehend player interactions and experiences with LBGs.
• How can we plan evaluation strategies that adequately and systematically explore the characteristics and specificities of the evaluated LBG?
• How to structure the systematic application of interviews for the evaluation of PX in LBGs?
• How can an interviewer ensure the systematic collection of meaningful information without compromising the flexibility of the interview?
• How to make the analysis process more practical and straightforward so evaluators can apply interviews in contexts that require higher speed, such as in the industry?
• How to guide novice researchers and practitioners to plan and conduct qualitative interviews to evaluate LBGs considering their specificities?

Conclusion and Future Work
Evaluating the player's interaction with an LBG is still a complex task, given these games' peculiar characteristics. In this context, this paper reported the analysis of 23 studies that used interviews to evaluate LBGs and an expert opinion survey conducted with the authors of these studies. We aimed to identify common practices and translate them into actionable recommendations for interested researchers and practitioners. Based on the knowledge obtained, we presented lessons and challenges, with emphasis on the use of interviews, which point to the importance of considering LBG specificities and investigating their effects on PX when evaluating the interaction. The information presented here is useful for assisting students, researchers, and practitioners, especially novices, in this task and for encouraging them to reflect and act on these issues. Concerning the use of interviews, we observed that, although there is plenty of work on this subject, only a small portion of the experts used such guidance in the studies we analyzed. This fact alone does not call into question the quality of the interviews conducted in their studies, but it opens space for reflection about LBG evaluation practice. Since the scientific community deals with issues beyond practice, it seems relevant to investigate the reasons behind this modus operandi. It is possible, for instance, that practitioners judge the existing guidelines inadequate to the actual LBG context or that HCI evaluation is treated as something secondary in these studies. Either way, it is necessary to investigate these issues to promote the area's maturity and offer tools and resources capable of supporting more accurate and expressive evaluations in this domain.
Therefore, we intend to evolve and expand the lessons presented here to compose a guide for qualitative evaluation of PX in LBGs, to provide directions for planning and conducting these evaluations in a consistent and structured manner. In our survey, we asked participants about such a proposal, and they considered it positive, especially for the benefit of students and beginning practitioners. Thus, we will continue with our efforts in this direction.