Characterization of software testing practices: A replicated survey in Costa Rica

Software testing is an essential activity in software development projects for delivering high-quality products. In a previous study, we reported the results of a survey of software engineering practices in the Costa Rican industry. To analyze in more depth the specific software testing practices among practitioners, we replicated a previous survey conducted in South America. Our objective was to characterize the state of the practice based on practitioners’ use and perceived importance of those practices. This survey evaluated 42 testing practices grouped in three categories: processes, activities, and tools. A total of 92 practitioners responded to the survey. The participants indicated that (1) tasks for recording the results of tests, documentation of test procedures and cases, and re-execution of tests when the software is modified are useful and important for software testing practitioners; (2) acceptance and system testing are the two most useful and important testing types; and (3) tools for recording defects and the effort to fix them (bug tracking) and the availability of a test database for reuse are useful and important. Regarding the use and implementation of practices, the participants stated that (4) planning and designing of software testing before coding and evaluating the quality of test artifacts are not regular practices; (5) there is a lack of measurement of defect density and test coverage in the industry; and (6) tools for automatic generation of test cases and for estimating testing effort are rarely used. This study gave us a first glance at the state of the practice in software testing in a thriving and very dynamic industry that currently employs most of our computer science professionals. The benefits are twofold: for academia, it provides us with a road map to revise our academic offering, and for practitioners, it provides them with a first set of data to benchmark their practices.


Introduction
Software testing is an essential activity in software development projects for delivering high-quality products, but it is also a costly activity in the software development life cycle (Garousi and Zhi, 2013). Software testing represents, on average, around 35% of the total budget of a development project (Dias-Neto et al., 2017). Testing practices play a significant role in the development process: they represent a quality assurance strategy for identifying defects in software applications before their deployment (Juristo et al., 2004).
Software testing has been a focus of attention for the industry. For example, the International Software Testing Qualifications Board (ISTQB, https://www.istqb.org/) aims to continually improve and advance the software testing profession by defining and maintaining a Body of Knowledge that allows testers to be certified based on best practices, connecting the international software testing community, and encouraging research. ISTQB promotes the value of software testing as a profession to individuals and organizations and has performed studies to observe the perception of practitioners on testing. After the "2013 ISTQB Effectiveness Survey", in which it collected feedback on the impacts of testing certifications, ISTQB conducted in 2015 a worldwide survey on Software Testing Practices with 3,281 responses from testing practitioners from 89 countries. The ISTQB survey reveals significant findings for professional practice:
• The budgets assigned to testing are large and keep growing, ranging between 11% and 40%.
• Agile methodologies are being adopted ahead of traditional ones, which emphasizes the need for testing processes and techniques appropriate for Agile.
• The segregation of duties has become a standard practice: in 84% of the cases, the test team does not report to development.
• Test tools for defect tracking, test execution, test automation, test management, performance testing, and test design are widely adopted.
• Some level of test automation is a trending topic, with 72% adoption.
• Testing requires a wide range of skills and competencies.
• There are important career paths available for testers and test managers.
• The decision of when to stop testing is mainly based on requirements coverage.
• Exploratory testing is the most adopted test technique.
• Performance, usability, and security are the top three non-functional testing activities.
• There are several improvement opportunities in testing practices, such as test automation, test process, communication, and test techniques.
Afterward, the 2017-2018 ISTQB Worldwide Software Testing Practices Report collected more than 2,000 responses from 92 countries. It reported findings mostly in line with the previous survey and revealed the following: (1) the main improvement areas in software testing were test automation, knowledge about test processes, and communication between development and testing; (2) the top five test design techniques are use case testing, exploratory testing, boundary value analysis, checklist-based testing, and error guessing; (3) trending topics will be test automation, agile testing, and security testing; (4) new technologies that could affect testing are security, artificial intelligence, and big data; and (5) the non-testing skills expected are soft skills, business and domain knowledge, and business analysis skills.
Currently, there is a gap between knowledge in academia and the software testing practices used in industry (Dias-Neto et al., 2017). Moreover, there is a knowledge deficiency for testing topics in practice activities (Scatalon et al., 2018). The level of joint industry-academia collaboration in software engineering has been reported to be very low compared to the number of activities in each of the two communities. A comparison of the focus areas of industry and academia in software testing shows that the two groups are talking about quite different things. For example, practitioners use "test automation" to refer to automating the test execution phase, whereas academics use it to refer to automated approaches (mostly focused on test-case generation and test oracles). Moreover, researchers tend to be more interested in theoretically challenging issues, whereas test engineers in practice are looking for options to improve the effectiveness and efficiency of testing.
Besides, there is a wide spectrum of testing practices conducted by different software teams (Garousi and Zhi, 2013) and little evidence in the literature regarding the use and importance of such practices in industry (Dias-Neto et al., 2017). The characterization of testing practices used in industry can help professionals, researchers, and academics to better understand the challenges faced by the software engineering profession (Garousi and Zhi, 2013).
To characterize testing practices in the software industry, a large number of surveys have been conducted in different countries. Garousi and Zhi (2013), and Dias-Neto et al. (2017) summarized previous surveys on software testing practices. In particular, Dias-Neto et al. (2017) identified surveys conducted to characterize the adoption of software testing practices, tools, and methods.
In Costa Rica, previous surveys have been conducted to characterize software engineering practices. In our previous work ([...] Jenkins, 2017, 2018), we replicated a survey based on (Garousi et al., 2015, 2016), in which we identified the most common practices, methods, and tools in professional practice and their related challenges. Moreover, we conducted a cross-factor correlation analysis of development and testing engineering practices versus practitioner demographics. Aymerich et al. (2018) conducted a survey on development practices based on the HELENA study, examining the development approaches, practices, and methods used in the industry. To analyze the specific software testing practices among practitioners in our country, we replicated previous surveys conducted in South America (Dias-Neto et al., 2006; De Greca et al., 2015; Dias-Neto et al., 2017; Robiolo et al., 2017).
Further replications in different countries are still needed to allow the comparison of industry trends in software testing practices (Garousi and Zhi, 2013; Dias-Neto et al., 2017). The results of these surveys can support evidence on testing practices in the software engineering community (Garousi and Zhi, 2013).
The objective of our study was to characterize a set of software testing practices with respect to their use and importance from the point of view of practitioners of software organizations in Costa Rica. In this work, we replicated the previous surveys in (Dias-Neto et al., 2006; De Greca et al., 2015; Dias-Neto et al., 2017; Robiolo et al., 2017) with 92 practitioners from our country. As stated in (Dias-Neto et al., 2017), we were interested in understanding the testing practitioners' use and perceived importance of software testing practices. In addition, we wanted to compare the results of our study with the results of the previous surveys. Thus, to facilitate the comparison between previous studies and this replication, we used the same questionnaire as (Dias-Neto et al., 2017).
Previously, we had researched the software engineering practices of the industry in Costa Rica ([...] Jenkins, 2017, 2018). In this paper, we extend the analysis of our previous study on software testing practices (Quesada-López et al., 2019). In addition, we conducted a literature search to identify past surveys on software testing practices in the industry. We describe the survey's planning, design, and execution, the analysis of the collected data, and the comparison with previous surveys conducted in Brazil, Uruguay, and Argentina to discuss the use and importance of software testing practices. Finally, to get feedback on the significance and usefulness of the survey results from the practitioners' perspective, we made two presentations of the study to different groups of professionals.
This study gave us a first glance at the state of the practice in software testing in a thriving and very dynamic industry that currently employs most of our computer science professionals. The benefits are twofold: for academia, it provides us with a road map to revise our academic offering, and for practitioners, it provides a baseline to benchmark their current practices.
The paper is structured as follows: Section 2 presents the related work. Section 3 describes the survey replication process. Section 4 analyzes the results of the survey. Finally, Section 6 outlines our conclusions and future work.

Related work
Several survey studies have been conducted on the subject of software testing practices in different countries and scales (Garousi and Zhi, 2013). This section summarizes identified past surveys on software testing practices in the industry. These studies mainly aim to characterize the state of the practice in the software testing industry, identifying trends and opportunities for improvement and training (Dias-Neto et al., 2017).
To identify past surveys on software testing practices in the industry, we conducted a literature search. First, we conducted an exploratory search using Scopus and using the search string "TITLE-ABS-KEY(("software") AND ("testing practices" OR "quality assurance practices") AND ("survey" OR "questionnaire"))".
We included only papers describing software testing surveys, selected on the basis of their titles, keywords, abstracts, and analysis. The list includes papers on software engineering practices that report results on specific software testing practices. Table 1 briefly summarizes the surveys on testing practices. It lists the paper reference, scale and region (or target community), target audience, number of respondents, and survey goal and focus area. This table was based on Garousi and Zhi (2013) and Dias-Neto et al. (2017) and updated with the surveys identified in our search. In Table 1, papers reported in Garousi and Zhi (2013) are marked with (*) and papers reported in Dias-Neto et al. (2017) are marked with (**). Papers in both studies are marked with (***). The following reports were excluded because their research goal and method were not comparable to those of the other surveys (Andersson and Runeson, 2002; Runeson et al., 2003).

Replication process
In the following subsections, we provide details about the methodology for conducting the replication.
Replication studies are beneficial to evaluate the validity of prior study findings. Successful replications increase the validity and reliability of the outcomes observed in the original study and are an essential part of the experimental paradigm to produce generalizable knowledge (Carver et al., 2014). Combined results from a family of replications are interesting because all studies are related and could investigate related questions. The aggregation of replication results will be useful for software engineers to draw conclusions and consolidate the findings (Carver, 2010; Juristo and Gómez, 2010; Carver et al., 2014). A close replication study attempts to recreate the known conditions of the original study and is very similar to the original study. Close replications are often used to establish whether the original outcomes are repeatable (Lindsay and Ehrenberg, 1993).
Our study is an external replication of four surveys previously conducted in South America (Dias-Neto et al., 2006; De Greca et al., 2015; Dias-Neto et al., 2017; Robiolo et al., 2017). Dias-Neto et al. (2006) analyzed the answers of 36 practitioners from 13 Brazilian organizations to identify the software testing practices used by the organizations and their importance. De Greca et al. (2015) replicated the original survey with 18 practitioners in Argentina. Dias-Neto et al. (2017) conducted the same survey in Brazil and Uruguay with 150 practitioners. They surveyed different companies from Southern Brazil (56 participants), Northern Brazil (50 participants), and Uruguay (44 participants). Robiolo et al. (2017) surveyed 25 practitioners from 25 organizations of the public sector.
In this study, we report the responses of 92 practitioners from Costa Rica. The study includes a detailed analysis of the data collected and its comparison with previous studies, in accordance with the recommendations and guidelines in (Carver, 2010; Carver et al., 2014). This study is descriptive (Linåker et al., 2015) and is intended to compare and extend previous results (Carver et al., 2014), highlighting the similarities and differences in the use and importance of testing practices in different countries. The authors of the original study did not take part in the replication process. However, in our replication, we reused the survey goal, research questions, questionnaire, and analysis procedure reported in (Dias-Neto et al., 2017; Robiolo et al., 2017).

Goal and research questions
The objective of the study, formulated using the Goal, Question, Metric (GQM) approach (Basili et al., 1994), was to characterize testing practices based on the practitioners' use and perceived importance in the context of software organizations in Costa Rica. The survey evaluated 42 testing practices grouped in three categories: processes, activities, and tools. We studied the following research questions:
• RQ1: What are the software testing practices used by practitioners in their organizations?
• RQ2: What are the most important software testing practices according to the opinion of testing practitioners?

Survey design
To address the study's goal and research questions, we conducted a survey to gather the opinions from practitioners.

Target population and sampling
The target population is the practitioners applying testing practices in software organizations in Costa Rica. The practitioners were sampled by convenience. They were contacted through the University of Costa Rica and the State Distance University, two of the most important public universities in our country. E-mail distribution lists were used to support the recruitment of participants.

Instruments used to collect data
We applied the questionnaire designed in (Dias-Neto et al., 2017) to collect the information. The instrument was divided into three parts: (1) profile and demographics; (2) the use of testing processes, activities, and tools; and (3) the perceived importance of testing processes, activities, and tools. The instrument evaluated 42 testing practices grouped in three categories: testing processes (practices related to the test processes adopted in the software organization), testing activities (practices concerned with the procedures performed during software testing), and testing tools (practices concerned with tools supporting software testing). We used the Spanish version of the instrument. To validate the questionnaire (concepts, language, and practices), we conducted five survey pilots. Table 2 details the list of questions of the instrument. The participants were asked to report their job position, experience in software testing, academic degree, certifications in testing, development methodology, programming language expertise, software platform used for development, company size, and quality team configuration.
Participants were asked to complete the entire questionnaire, rating each of the 42 testing practices according to its level of use in their current organization and its perceived importance. Dias-Neto et al. (2006, 2017) defined a five-point Likert scale to express the gradual increase in the level of use and importance of a testing practice, as shown in Table 3. As in the previous study, each practitioner selected only one option for the level of use and one for the level of importance of each software testing practice.

Table 1 (excerpt): surveys on software testing practices
Reference | Scale and region | Target audience | Respondents | Survey goal
[...] | [...] | [...] | [...] | To determine whether there is a gap between the current state-of-the-practice and state-of-the-art in software engineering (*).
[...] | Norway | Computing students | 33 | To identify the interest and desire to work in software testing among engineering and computer science students (**).
Deak and Stålhane (2013) | Norway | Not reported | 23 | To characterize the factors that can influence the creation of a software testing department or the investment in software testing personnel (**).
Garousi and Zhi (2013) | Canada | Software developers | 246 | To characterize Canadian testing practices (***).
Pham et al. [...] | Not reported | Software developers of GitHub | 569 | To characterize how the testing behavior is influenced by the peculiarities of social coding environments (**).
Pérez et al. (2013) | Belgium | Development professionals | 63 | To assess the state of the practice in software quality with respect to software quality, and how these practices vary across companies.
Pfahl et al. (2014) | Finland and Estonia | Software developers | 61 | To study how software engineers understand and apply the principles of exploratory testing, as well as the specific advantages and difficulties they experience (***).
Daka and Fraser (2014) | [...] | [...] | [...] | [...]
[...] | [...] | [...] | [...] | To understand the perception of practitioners regarding the use and importance of software testing practices, a replication of Dias-Neto et al. (2006).
[...] | [...] | [...] | [...] | To characterize challenges and research topics that industry wants to suggest to software testing researchers.
[...] | Finland | Industry practitioners | 33 | To explore industry practices concerning software testing and to assess how they test their products and what process models they follow, a continuation study of Taipale et al. (2006); Kasurinen et al. (2010).
Kassab (2018) | Not reported | Software professionals | 72 | To discover the actual practices for software testing and quality assurance activities for software in safety-critical systems.
Bhuiyan et al. [...] | [...] | [...] | [...] | [...]

Data analysis
For each testing practice, we collected the use and importance level based on the opinions of the professionals. The equations were based on Dias-Neto et al. (2017). First, the responses of the professionals were differentiated by assigning a weight W(i) to each participant i according to their experience, academic degree, and certifications in testing (Eq. 1). Second, we multiplied each answer by the weight of the participant and computed the total value for a testing practice (Eq. 2). Finally, we obtained a normalized value for the levels of use and importance that ranges between 0% and 100% (Eq. 3). We applied the following formulas:

$$T(j) = \sum_{i=1}^{N} W(i) \times \mathrm{Answer}(i, j) \tag{2}$$

Where: T(j) is the total value obtained for use and importance regarding the testing practice j, W(i) is the weight of participant i (Eq. 1), N is the number of participants, and Answer(i, j) is the answer value (1-5) relating to the use and importance by participant i for the testing practice j.

$$N(j) = \frac{T(j)}{\sum_{i=1}^{N} W(i) \times 5} \tag{3}$$

Where: N(j) is the normalized value for use and importance of testing practice j, expressed as a percentage, and $\sum_{i=1}^{N} W(i) \times 5$ is the maximum possible value for testing practice j.
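As an illustration of Eq. 2 and Eq. 3, the following minimal Python sketch computes the total and the normalized value of one testing practice from a list of participant weights and Likert answers. The function names and the sample data are hypothetical and are not taken from the survey; the weights are assumed to have been computed beforehand with Eq. 1, which is not reproduced here.

```python
from typing import Sequence

LIKERT_MAX = 5  # highest answer value on the five-point scale


def total_value(weights: Sequence[float], answers: Sequence[int]) -> float:
    """Eq. 2: sum of each answer multiplied by the participant's weight."""
    if len(weights) != len(answers):
        raise ValueError("one answer per participant is required")
    return sum(w * a for w, a in zip(weights, answers))


def normalized_value(weights: Sequence[float], answers: Sequence[int]) -> float:
    """Eq. 3: total divided by the maximum possible total, as a percentage."""
    max_total = sum(weights) * LIKERT_MAX
    if max_total == 0:
        raise ValueError("the sum of the weights must be positive")
    return 100.0 * total_value(weights, answers) / max_total


# Hypothetical example: three participants rating the use of one practice.
weights = [1.2, 6.5, 15.0]   # assumed outputs of Eq. 1
use_answers = [3, 4, 5]      # Likert answers (1-5)

print(f"T(j) = {total_value(weights, use_answers):.2f}")
print(f"N(j) = {normalized_value(weights, use_answers):.1f}%")
```

Because the lowest answer value is 1, a practice rated 1 by every participant obtains 20% under this formula, regardless of the weights.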
For each testing practice, the use and importance were analyzed and compared with previous studies, and the correlation between use and importance perceived was evaluated. For this study, we replicated the analysis proposed in (Dias-Neto et al., 2017). The most used/important software testing practices, the differences between regions, and the difference between the levels of use and importance perceived by practitioners were analyzed. Finally, the existence of a significant correlation between the levels of use and importance for each evaluated practice was tested.
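The text does not state which correlation coefficient was used. As a sketch only, and assuming a rank-based coefficient is appropriate for ordinal Likert data, the following Python code computes Spearman's rank correlation between the use and importance answers of one practice. All names and the sample data are hypothetical, and the significance test is omitted.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("two sequences of equal length (>= 2) are required")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical answers of six participants for one testing practice.
use = [2, 3, 3, 4, 5, 5]
importance = [3, 3, 4, 4, 5, 5]
print(f"rho = {spearman(use, importance):.3f}")
```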

Survey execution
The electronic questionnaire was implemented using LimeSurvey (www.limesurvey.org) and it was available in a Survey Server at the University of Costa Rica for a period of two months, from September to October 2018. Participants were asked to complete the survey online. All participants were invited to participate anonymously and voluntarily by email. We sent e-mail invitations directly to the professionals through contact lists of the universities.
Practitioners could withdraw at any time, and only summarized and aggregated information was published. As in previous studies ([...] Jenkins, 2017, 2018), some participants left questions unanswered and others abandoned the questionnaire without completing it. Only completed responses were considered in the analysis of results. After data pre-processing, the responses of 92 professionals were analyzed.

Threats to Validity
This work is subject to the threats to validity reported for this type of study, including previous replications, and the results must be interpreted carefully. We discuss the validity concerns based on the classification of Wohlin et al. (2012).

Internal validity
This threat is related to the quantity and representativeness of the sample. The practitioners were sampled by convenience, which is reported as common practice for survey studies in software engineering (Molléri et al., 2016; Ghazi et al., 2017) and in the previous surveys listed in Section 2. Besides, the survey may not represent the entire Costa Rican industry. Although we achieved a relatively high number of respondents compared with previous surveys (Dias-Neto et al., 2017; Robiolo et al., 2017), it was not possible to evaluate the representativeness of the sample. We were not able to obtain a reliable estimation of the total number of practitioners in the software industry of Costa Rica. Our participants were mainly invited through the networks of the Universidad Estatal a Distancia and the Universidad de Costa Rica and through partners in Costa Rican software development organizations. Many practitioners outside our contact network were probably not properly represented in the survey sample. Moreover, we were informed that some practitioners working in transnational software companies could not answer the questionnaire because of confidentiality issues with their companies. The lists of testing practices in the original study were not modified, to allow the replication. The original practices could be outdated with respect to the current state of the art and practice. Moreover, some testing practices relevant in Costa Rica's context could have been missed or omitted. We addressed these concerns in two ways. First, we believe that the set of practices is still representative in the testing research field (Dias-Neto et al., 2017). Second, we conducted five survey pilots with professionals in Costa Rica to validate the questionnaire (concepts, language, and practices).

Construct validity
The lists of testing practices were based on a previous survey instrument (Dias-Neto et al., 2006, 2017). The analysis of the levels of use and importance has already been used in the evaluation of the performance of organizations. We counted the votes for each question and then performed a statistical analysis. We used the weight function based on Dias-Neto et al. (2017) to compare the results across studies. The weight function should be carefully analyzed to interpret the results. The analysis showed differences in the levels of use and importance of software testing practices. The characteristics of the organizations could affect these results. We informed the survey participants that we would not collect any personal information, so that professionals would remain anonymous.

Table 2 (excerpt): testing practices of the instrument
ID | Testing practice
[...] | Use of methodology or process
P06 | Analysis of identified defects
P07 | Identification and use of risks for planning and executing software tests
P08 | Planning/Designing of testing before coding
P09 | Monitoring adherence to the test process
P10 | Re-execution of tests when the software is modified
P11 | Evaluation of the quality of test artifacts
P12 | Setting a priori criteria to stop the testing
P13 | Reporting evaluation of a test round
A01 | Definition of a responsible professional or team
A02 | Application of unit tests
A03 | Application of integration tests
A04 | Application [...]

Table 3 (excerpt): Likert scale for the levels of use and importance
L | Level of use | Level of importance
2 | Not used: the practice is within the scope of the organization, but it is not used in any software project. | Low value: the practice has low importance to use in software projects.
3 | Infrequent use: the practice is not frequently used in the organization's software projects. | Limited value: the practice can be adequate to use in software projects.
4 | Common use: the practice is used in most of the organization's software projects. | Significant value: the practice is recommended to use in software projects.
5 | Standard use: the practice is used in all organization's software projects. | Essential value: the practice must be used in all software projects.
L: Likert Scale.

Conclusion validity
The analysis procedure to obtain the level of use and importance according to the characteristics of each participant was based on previous surveys (Dias-Neto et al., 2006, 2017). The analysis procedure is a weighted average, where the weight function is based on qualitative aspects representing each subject (Dias-Neto et al., 2017). The model of use and importance was based on a previous empirical evaluation of the software practices (Dias-Neto et al., 2006). The trade-off of using this type of analysis is that the information from the extremes can be lost (Dias-Neto et al., 2017). All conclusions in this study are traceable to data.

External validity
The survey reflects the practitioners' interpretation of importance and use. The answers may not necessarily represent the reality of testing practices and could reflect subjectivity. Aspects such as self-awareness and differences in the training of the participants could influence responses. The results show a correlation between the levels of use and importance. This could indicate that practitioners find those practices usable and important, but it could also mean that they do not distinguish between use and importance or that they see no value in the difference (Dias-Neto et al., 2017). In this study, we analyzed correlations between testing practices and did not intend to establish any causal relationship.
With respect to organization size, 50% (46) of participants work in organizations with more than 100 employees, 16% (15) in organizations with 50-100 employees, 22% (20) in organizations with 10-49 employees, and 12% (11) in organizations with fewer than 10 employees. Participants reported, on average, 11.5 years of experience in the software industry and 5.5 years of experience in software quality and testing. Only 20% (18) of the participants hold a software testing certification. Some 15% (14) of practitioners are ISTQB Certified Testers, 3% (3) are Certified Test
In total, 59% (54) of the practitioners claim to apply agile methodologies, 26% (24) traditional methodologies, and 15% (14) a hybrid development methodology. The most used programming languages are .NET with C# and Visual Basic (35%), Java (24%), C/C++ (11%), PHP (9%), and Python (9%).
Dias-Neto et al. (2017) observed that some participants could influence the results of the testing practices with their answers (through their experience and academic degree, as defined in Eq. 1). In this section, we analyzed the influence of each participant in this survey. The distribution of participants' weights ranges from 1.20 to 15.00 (M = 6.63, Md = 6.50, S.D. = 2.92). The 25th percentile was 4.80, the 50th percentile was 6.50, and the 75th percentile was 8.17. The Shapiro-Wilk test indicates that the values representing the influence (weight) of the participants were normally distributed (p > 0.05). Figure 1 shows the weight distribution through a dispersion and box-plot graph. Two outliers (experts) were identified, with weights of 14.00 and 15.00, respectively. Both of them are project managers with 30 years of experience in the IT industry and 20 years of experience in testing. Their highest academic degree is a Master's degree, and the first one is a Certified Test Manager (CTM). In our analysis, we used the answers of all participants.
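The two reported outliers are consistent with the common box-plot convention that flags values lying more than 1.5 times the interquartile range beyond the quartiles. The short Python sketch below checks this using only the quartiles and extreme weights reported above. That the box plot in Figure 1 used this 1.5 x IQR rule is an assumption, and the function name is hypothetical.

```python
from typing import Tuple

# Values reported in the text for the distribution of participants' weights.
Q1, Q3 = 4.80, 8.17
REPORTED_WEIGHTS = [1.20, 14.00, 15.00]  # minimum and the two expert weights

FENCE_FACTOR = 1.5  # assumption: conventional Tukey box-plot rule


def tukey_fences(q1: float, q3: float, factor: float = FENCE_FACTOR) -> Tuple[float, float]:
    """Return the (lower, upper) fences beyond which a value is an outlier."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


lower, upper = tukey_fences(Q1, Q3)
outliers = [w for w in REPORTED_WEIGHTS if w < lower or w > upper]

print(f"fences: [{lower:.3f}, {upper:.3f}]")
print(f"outliers among the reported weights: {outliers}")
```

Under this assumption, the upper fence is 8.17 + 1.5 x 3.37 = 13.225, so the weights 14.00 and 15.00 fall outside it, while the minimum weight of 1.20 lies above the lower fence.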

Participants among surveys
In this study, we compare the results of surveys conducted in Argentina, Brazil, Uruguay, and Costa Rica. Table 5 presents the percentages of the positions reported in each previous survey (Dias-Neto et al., 2017; Robiolo et al., 2017) and in this study, per country/region. In Brazil and Uruguay, 66% of the respondents work in quality/testing (Quality Manager, Test Leader, Test Analyst, and Tester) and 34% in development activities (Analyst, Architect, Developer, and Project Manager). In the Northern Brazil region, 84% work in quality/testing, in the Southern Brazil region 59%, and in Uruguay 57% (Dias-Neto et al., 2017). In contrast, Argentina reported only 16% of the respondents working in quality/testing and 84% in other development activities (16% were not reported). In Costa Rica, 36% of the respondents work in quality/testing, including 6% reported as quality engineers. In the same way, Table 6 presents the percentage of respondents by company size. The company sizes are: fewer than 10 (S1), 10-49 (S2), 50-99 (S3), and more than 100 (S4) employees. We can observe that, with the exception of Argentina (AR), most of the answers come from professionals in organizations with more than 100 employees.
In the next sections, we present the analysis of the results of the use and importance of the evaluated software testing practices. First, we present the analysis of the use and perceived importance of testing practices. Second, we analyze the correlation between use and perceived importance. Third, we discuss the results for use and perceived importance based on the combinations "more used" and "more important", "less used" and "less important", "more used" and "less important", and "less used" and "more important". Finally, we compare the results among replications. Table 7 presents a heat map with the results of the use and importance of software testing practices. The first column contains the results of our study and the other four columns the results of the previous studies. The most used and perceived important (P. I.) testing practices in processes (P), activities (A), and tools (T) are marked in green, and the least used and least important ones are marked in red. A greener color means the practice is deemed more useful and/or important; a redder color means the practice is not considered important or is not implemented. We present the results of Costa Rica alongside those of the previous studies.
For each testing practice, we could observe some trends by analyzing the use and importance across the replications. In all five countries/regions, there is a set of used and important practices: P02: Documentation of test procedures and cases; P03: Recording the results of test execution; P10: Re-execution of tests when the software is modified; A01: Definition of a responsible professional or team; A03: Application of integration tests; A04: Application of system tests; A05: Application of acceptance tests; T01: Availability of a test database for reuse; and T07: Use of tools for recording defects and the effort to fix them (bug tracking). There is also a set of less used and considered less important practices: P08: Planning/Designing of testing before coding; A10: Registration of the time spent on testing; A11: Measurement of the effort/cost of testing; A13: Measurement of the defect density; A14: Conducting training on software testing; and A17: Analysis of fault patterns (trends).

Use of testing practices
The results of the use of software testing practices per country/region are presented in Table 7. By analyzing the green patterns in Table 7, we can conclude that the three most used testing processes were the recording of test case results (P03), the documentation of test procedures and cases (P02), and the re-execution of tests when the software is modified (P10). In the case of testing activities, the three most used were the application of acceptance testing (A05), the application of system testing (A04), and the definition of a responsible professional or team (A01). Finally, the three most used testing tools were tools for recording defects and the effort to fix them (bug tracking, T07), a test database for reuse (T01), and management tools to track and record the results (T04).
On the other hand, the processes for planning/designing of testing before coding (P08), the evaluation of the quality of test artifacts (P11), and the measurement and analysis of the test coverage (P04) were reported as the three least used. The measurement of the defect density (A13), the analysis of fault patterns and trends (A17), and the registration of the time spent on testing (A10) were reported as the three least used activities. Finally, the three least used tools were the tools for automatic generation of test procedures or cases (T03), coverage measurement tools (T08), and tools to estimate test effort and/or schedule (T05).

Importance of testing practices
The importance that the participants attributed to the software testing practices per country/region is presented in Table 7. By observing the green patterns, we can conclude that the three testing processes perceived as most important were the recording of test case results (P03), the documentation of test procedures and cases (P02), and the re-execution of tests when the software is modified (P10). These processes were also the most used by practitioners. In the case of testing activities, the three perceived as most important were the application of acceptance testing (A05), the application of integration tests (A03), and the storage of records (logs) of the executed tests (A12). In addition, system testing (A04) and the definition of a responsible professional or team (A01) were perceived as important. Finally, the three most important testing tools were tools for recording defects and the effort to fix them (bug tracking, T07), tools for automatic execution of test procedures or cases (T02), and a test database for reuse (T01). The management tools to track and record the results (T04) were also perceived as important.
Likewise, the processes for evaluating the quality of test artifacts (P11), for planning/designing of testing before coding (P08), and for reporting the evaluation of a test round (P13) were perceived as the three least important. The measurement of the defect density (A13), the application of exploratory tests (A07), and the analysis of fault patterns and trends (A17) were perceived as the three least important activities. The three tools perceived as least important were the tools to estimate test effort and/or schedule (T05), coverage measurement tools (T08), and tools for automatic generation of test procedures or cases (T03).

Analysis of correlation between use and perceived importance
Table 8 presents the Spearman's rho correlation coefficient between the use and perceived importance of each testing practice (two-tailed test with p<0.01). There was a positive correlation between use and perceived importance for every practice, and all correlations were statistically significant. Values above 0.5 were considered highly correlated and are marked in bold. A high correlation means that the participants either (1) deemed the practice useful and important, or (2) deemed the practice not useful and not important.
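As a rough illustration of how a Spearman's rho such as those reported in Table 8 can be obtained for a single practice, the following sketch ranks the paired answers (averaging tied ranks) and computes the Pearson correlation of the ranks. The answer values and the 1-5 ordinal coding are hypothetical and are not the survey data; the function names are ours, and the significance test (p<0.01) is not reproduced here.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical answers of eight respondents for one practice (assumed 1-5 coding).
use = [1, 2, 2, 3, 4, 4, 5, 5]
importance = [2, 2, 3, 3, 4, 5, 5, 4]

rho = spearman_rho(use, importance)
is_high = rho > 0.5  # threshold used in the text for "highly correlated"
print(f"rho = {rho:.2f}, highly correlated: {is_high}")
```

Computing rho on ranks rather than on raw values is what makes the coefficient suitable for ordinal survey answers, where only the ordering of the response options is meaningful.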
Our results show that although there is a correlation between the values of use and perceived importance, only 18 of the 42 practices are highly correlated (P01: Documentation of test plan, P02: Documentation of test procedures and cases, P03: Recording the results of test execution, P09: Monitoring adherence to the test process, P12: Setting a priori criteria to stop testing, P13: Reporting evaluation of a test round, A01: Definition of a responsible professional or team, A04: Application of system tests, A06: Application of regression tests, A07: Application of exploratory tests, A10: Registration of the time spent on testing, A11: Measurement of the effort/cost of testing, A12: Storage of records (log) of the executed tests, A13: Measurement of the defect density, T01: Availability of a test database for reuse, T05: Use of tools to estimate test effort and/or schedule, T06: Use of test management tools to enact activities and artifacts, T07: Use of tools for recording defects and the effort to fix them-bug tracking). In the following section, we compare the relation between use and importance.
Dias-Neto et al. (2017) analyze the level of use and perceived importance by dividing the 42 test practices into two equal groups. Table 9 presents the "More used" and "More important", and the "Less used" and "Less important" testing practices according to the answers of Costa Rican practitioners. To classify the practices, we selected the top 21 most used practices and the top 21 practices perceived as most important. The set of "most used, most important" practices represents the good testing practices performed by Costa Rican practitioners. The set of "least used, least important" testing practices represents those that do not seem to be relevant in the context of these organizations. Furthermore, these practices could reflect gaps in knowledge about their benefits or simply a lack of organizational resources to put them into practice.
Table 10 presents the "More used" and "Less important", and the "Less used" and "More important" testing practices. The set of "most used, least important" testing practices includes the practices that software practitioners use but consider less important than other practices. In this case, other practices in use could generate more value in supporting testing activities. The set of "least used, most important" testing practices includes those that practitioners do not use in their software organizations but perceive as important for their professional practice.
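The grouping behind Tables 9 and 10 can be sketched as follows: each practice is placed in the top or bottom half by use and, independently, in the top or bottom half by perceived importance, which yields four groups. This is a minimal sketch under our own assumptions: the six practice ids and percentages are made up, the function names are hypothetical, and breaking ties by practice id is an arbitrary choice that the studies do not specify.

```python
def top_half(scores):
    """Return the ids in the top half by score (ties broken by id; an assumption)."""
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return set(ranked[: len(ranked) // 2])


def classify(use, importance):
    """Map each practice id to its (use group, importance group) pair."""
    more_used = top_half(use)
    more_important = top_half(importance)
    groups = {}
    for pid in use:
        use_label = "More used" if pid in more_used else "Less used"
        imp_label = "More important" if pid in more_important else "Less important"
        groups[pid] = (use_label, imp_label)
    return groups


# Hypothetical percentages for six made-up practices (not the survey results).
use = {"X1": 90, "X2": 85, "X3": 80, "X4": 60, "X5": 55, "X6": 40}
importance = {"X1": 95, "X2": 70, "X3": 88, "X4": 90, "X5": 60, "X6": 50}

groups = classify(use, importance)
for pid, labels in sorted(groups.items()):
    print(pid, *labels)
```

With 42 practices the same split yields the top 21 and bottom 21 on each dimension. Because the two rankings are cut independently, the two mixed groups always contain the same number of practices.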

Discussion
The results of the use of software testing practices show that practitioners in our industry are currently implementing basic processes and tools for performing software testing, but at the same time, they are not using key metrics for assessing testing results or the quality of the testing products. This clearly represents an important area for improvement in our industry and a challenge for universities for teaching these concepts.
Second, although not perceived as important by practitioners, we believe that metrics (such as defect density) and processes such as analysis of fault patterns are key for software organizations that aspire to improve their processes and reach higher maturity levels. They may not be deemed important now, but they will gain more importance as the industry matures.
On the other hand, based on the analysis of the correlation between use and perceived importance, we agree with Dias-Neto et al. (2017) when they state that practitioners may find the practices they use daily to be important and that, therefore, they either cannot distinguish between use and importance or do not see value in the distinction.
Finally, based on the analysis of the relation between use and perceived importance, the set of "least used, least important" testing practices could reflect gaps in knowledge about their benefits or simply a lack of organizational resources to put them into practice. These practices may point to gaps between academia and industry that could be addressed, for example, through practitioners' training courses and software process improvement plans that show the benefits of their application. The practices in the "least used, most important" set may be complex or expensive to implement, may have considerable training needs, or may require tools that these organizations do not have.

Comparing the results among replications
To compare the results of this survey with previous studies (Dias-Neto et al., 2017), we analyzed the "More used" and "More important" testing practices and the "Less used" and "Less important" testing practices. Table 11 presents the "More used" and "More important" testing practices for each replication. Five testing practices are common to all surveys (P03: Recording the results of test execution, A01: Definition of a responsible professional or team, A03: Application of integration tests, A04: Application of system tests, A05: Application of acceptance tests), and four practices are common to four surveys (P02: Documentation of test procedures and cases, P10: Re-execution of tests when the software is modified, A15: Separation of testing and development activities, A18: Availability of human resources allocated full time for testing).

Table 10. "More used" and "Less important", and "Less used" and "More important" testing practices.

| Id | "More used" and "Less important" | Id | "Less used" and "More important" |
|---|---|---|---|
| P01 | Documentation of test plan | A08 | Application of performance tests |
| P12 | Setting a priori criteria to stop testing | A09 | Application of security tests |
| A16 | Storage of test data for future use | T02 | Automatic execution of test procedures or cases |

Table 12 presents the "Less used" and "Less important" testing practices for each replication. Six testing practices are reported in four surveys (P07: Identification and use of risks for planning and executing software tests, P09: Monitoring adherence to the test process, A11: Measurement of the effort/cost of testing, T03: Use of tools for automatic generation of test procedures or cases, T05: Use of tools to estimate test effort and/or schedule, T08: Use of coverage measurement tools). These practices represent a gap between the software testing state of the art (academia) and the state of the practice (practitioners), given that the list of practices in the survey was derived from the academic literature. In De Greca et al. (2015), no practices were classified as less used and less important.
In Table 11 and Table 12, we only included practices of our survey, and practices with more than three occurrences across replications. We found no significant differences in the perceived usefulness and importance of practices between our survey and previous surveys. As in other countries, important practices are not being used in our software industry. This opens an interesting line of research to find out why they are not being used.
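The occurrence filter used for Table 11 and Table 12 amounts to counting, for each practice, the number of replications whose group contains it and keeping those that reach a threshold. The sketch below illustrates the idea under stated assumptions: the survey labels and practice ids are hypothetical placeholders, not the contents of the actual tables, and the function names are ours.

```python
from collections import Counter


def occurrences(sets_by_survey):
    """Count in how many surveys each practice id appears."""
    counts = Counter()
    for practices in sets_by_survey.values():
        counts.update(set(practices))  # set() guards against duplicate ids
    return counts


def practices_in_at_least(sets_by_survey, k):
    """Return the sorted practice ids that appear in at least k surveys."""
    counts = occurrences(sets_by_survey)
    return sorted(pid for pid, n in counts.items() if n >= k)


# Hypothetical "More used, More important" sets for four made-up replications.
replications = {
    "survey_a": {"X1", "X2", "X3"},
    "survey_b": {"X1", "X2"},
    "survey_c": {"X1", "X3"},
    "survey_d": {"X1", "X2", "X4"},
}

common_to_all = practices_in_at_least(replications, len(replications))
in_three_or_more = practices_in_at_least(replications, 3)
print(common_to_all, in_three_or_more)
```

The threshold `k` is left as a parameter so that the same helper covers both "common to all surveys" and "reported in four surveys".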
Our survey aggregated evidence previously reported and presented new evidence on the use and perceived importance of testing practices in the industry:
• There is a gap between the software testing state of the art and the state of the practice. This study identified a set of testing practices classified as "Less important" and "Less used" (Table 9), and the set of these "Less important" and "Less used" testing practices reported in multiple replications (Table 12).
• The findings support that organizations mainly use ad hoc criteria to stop testing. In Dias-Neto et al. (2017) and Robiolo et al. (2017), the practice P12: Setting a priori criteria to stop testing is ranked low (the level of use ranked in the bottom 10th (65%), 10th (63%), 12th (64%), and 7th (50%) positions, respectively). In the case of Costa Rica, P12 was ranked 23rd (72%). The perceived importance received a total of 77% (8th), 73% (10th), and 74% (11th) in Dias-Neto et al. (2017), 73% (13th) in Robiolo et al. (2017), and 87% (17th) in Costa Rica.
• The application of unit tests (A02) is not among the three most used (71%, 79%, 78%) and most important (81%, 88%, 86%) practices in any of the regions reported in Dias-Neto et al. (2017). However, in Robiolo et al. (2017), unit tests were reported as the most important (93%) and most used (79%) practice. In this study, unit testing was reported as used (79%) and important (92%). Based on these findings, we cannot draw a conclusion about the level of use and importance of unit tests. Other testing practices, such as A03: Application of integration tests, A04: Application of system tests, A05: Application of acceptance tests, and A06: Application of regression tests, were reported as used and important in multiple replications (Table 11).
• The findings indicated some level of use and importance of automated testing. However, T03: Use of tools for automatic generation of test procedures or cases was reported as "Less used" and "Less important" in Dias-Neto et al. (2017), Robiolo et al. (2017), and this study. In addition, the testing practices T02: Use of tools for automatic execution of test procedures or cases, and T09: Use of continuous integration tools for automated tests were categorized as "Less used". We cannot infer whether the level of use is lower or higher than that of manual testing.

Table 9. "More used" and "More important", and "Less used" and "Less important" testing practices.

| Id | "More used" and "More important" | Id | "Less used" and "Less important" |
|---|---|---|---|
| P03 | Recording the results of test execution | P07 | Identification and use of risks |
| P05 | Use of methodology or process | P08 | Planning/Designing of testing before coding |
| P06 | Analysis of identified defects | P09 | Monitoring adherence to the test process |
| P10 | Re-execution of tests when the software is modified | P11 | Evaluation of the quality of test artifacts |
| A01 | Definition of a responsible professional or team | P13 | Reporting evaluation of a test round |
| A02 | Application of unit tests | A07 | Application of exploratory tests |
| A03 | Application of integration tests | A10 | Registration of the time spent on testing |
| A04 | Application of system tests | A11 | Measurement of the effort/cost of testing |
| A05 | Application of acceptance tests | A13 | Measurement of the defect density |
| A06 | Application of regression tests | A14 | Conducting training on software testing |
| A12 | Storage of records (log) of the executed tests | A17 | Analysis of faults patterns (trend) |
| A15 | Separation of testing and dev activities | A19 | Selection of test techniques based on features |
| A16 | Storage of test data for future use | T03 | Use of tools for automatic generation of test procedures or cases |
| A18 | Availability of human resources allocated full time for testing | T05 | Use of tools to estimate test effort and/or schedule |
| T01 | Availability of a test database for reuse | T08 | Use of coverage measurement tools |
| T04 | Test management tools to track and record | T09 | Use of continuous integration tools for automated tests |
| T06 | Test management tools to enact activities and artifacts | T10 | Selection of test tools according to project characteristics |
| T07 | Tools for recording defects and the effort to fix them (tracking) | | |
Finally, we confirmed some similarities highlighted by Dias-Neto et al. (2017) regarding industrial surveys: (1) testing automation is a concern, but it has not reached full adoption in industry, (2) ad hoc criteria have been reported as among the most commonly used criteria to stop testing, (3) tools for recording defects (bug tracking) are the most widely adopted, and (4) the most used testing levels are acceptance, integration, system, and unit testing.

Getting Feedback from Practitioners
To get some feedback about the significance and usefulness of this research from the practitioners' perspective, we made two presentations of our study results to different groups of professionals. After the presentations, we asked them the following two questions:
(1) Do you think that the data on this presentation provides value for your professional practice?
(2) What would you like to see in future presentations?
For the first question, everyone who answered responded in the affirmative. They considered the results from the survey useful for keeping up to date with industry trends and improving their own software processes. One person mentioned the importance of doing an informal benchmark with this initial data. A couple of them also mentioned that these data are important for academia to keep curricula up to date and to better define the exit profile of graduates.
For the second question, the answers varied substantially. Some people would like to see presentations with specific examples or case studies on how to implement software testing practices in organizations. Others would like to have a presentation on guidelines for implementing some of those practices in their own organizations. Others suggested having presentations about software testing metrics and tools (including the measurement of testing effectiveness), and how to implement them in small and medium organizations. Finally, one person suggested holding an entire workshop on software testing with software security testing as the main topic.

Conclusions
This paper reported a survey study of software testing practices in the Costa Rican software industry and compared the results with previous studies conducted in South America. We characterized a set of testing practices with respect to their use and perceived importance from the point of view of 92 practitioners.
The main software testing practices reported in this survey were the recording of the results of tests, documentation of test procedures and cases, and re-execution of tests when the software is modified. Acceptance and system testing were the two most useful and important testing types. The tools for recording defects and the effort to fix them (bug tracking) and the availability of a test database for reuse were reported useful and important. In contrast, the planning and designing of software testing before coding and evaluating the quality of test artifacts were not a regular practice. Finally, there is a lack of measurement of defect density and test coverage in the industry; and tools for automatic generation of test cases and for estimating testing effort are rarely used.
A set of testing practices was common across different countries: the application of integration, system, and acceptance tests, the recording of test execution results, and the definition of a responsible professional or team for testing. In contrast, our results confirm that the main testing limitations are the monitoring and measurement of tests and defects, the automatic generation of test cases and procedures, and the management of test coverage and effort. These last three are clear areas for process improvement.
Further studies in different countries and regions should be conducted to compare industrial trends in software testing practices. We believe this work could be used by organizations, practitioners, and academics to improve the state of the practice in our software industry. For future work, it could be interesting to make a comparison using the demographic data of the participants (such as types of projects, organizations' characteristics, and others) to find out if different demographics influence the results by country.