Extraction of test cases procedures from textual use cases: is it worth it?

Software testing plays a major role in software quality, since it helps ensure that the software complies with its expected behavior. However, it is an expensive activity and, consequently, companies often do not perform testing activities in software projects because of the time required. These costs may be even higher in testing processes that rely on manual test execution only, which is both time-consuming and error-prone. One strategy commonly used to mitigate these costs is to use tools to automate testing activities such as test execution, test documentation, and test case generation. This paper presents an experience report, in the context of a Test Factory, on the use of a tool that partially automates the specification of test case procedures from textual use cases. The tool automatically retrieves use cases from the requirement management system, generates the test case procedures, requests inputs from the tester, and then sends the test cases to the test management system. This paper details how the tool was used in releases of an industrial software project through a proof of concept. We also performed a feasibility study with four test analysts from different projects to gather more data on how efficiently the tool supports test case documentation. The results indicate that the tool reduces the test specification time, and that the integration with both requirements and test management systems made our tool feasible in practice.


Introduction
Software testing has an essential role in software quality assurance, allowing bugs to be discovered early in the product life cycle (Myers et al., 2004). However, performing manual testing activities can be time-consuming and error-prone. Beyond that, mistakes in these activities (e.g., bad test coverage or errors in testing effort estimation) may contribute to the appearance of test debts, i.e., technical debts related to software testing activities (Samarthyam et al., 2017; Aragão et al., 2019).
Aiming to reduce the mistakes and costs related to software testing, many companies have dedicated efforts to automating testing activities, such as the generation of test cases, test execution, and test reports (Garousi and Mäntylä, 2016). Despite the advanced academic research on the automation of testing activities, the main concern in industry is to improve the effectiveness and efficiency of tests through automation and through techniques that are easy to use (Garousi and Felderer, 2017).
Besides that, many software development companies have hired test factory services. One of the advantages of a test factory is that it performs software testing externally and independently from the development team. Test factories can help to improve the quality of software by reducing the effort that the development team spends on testing activities. Test factories also have teams that work on several system domains, which can be allocated to different testing projects on demand. Software development organizations also have the benefit of outsourcing the selection of the testing team.
On the other hand, test factories have to cope with challenges related to the definition of testing processes (Aragão et al., 2017) and the automation of test case execution (Vieira et al., 2018a). Additionally, the tight deadlines of software projects can hinder the process of an external company that offers the testing services. Thus, it is necessary to research the automation of testing activities.
Regarding the automation of activities, which is the focus of this paper, the literature still has few experience reports, especially in the context of a test factory. This sort of study is important since it provides evidence that knowledge from the literature can support practitioners. This paper's main objective is to report the experience of generating tests from use cases with an automated tool. In our previous work, we presented our first experience report on using a tool for the semi-automatic generation of test procedures based on use cases. The development of this tool was based on existing work in the software testing literature. We also conducted a proof of concept in the context of a test factory to assess the benefits of the tool during the testing process and reported five lessons learned from this experience. In this paper, we expand the previous report, also focusing on the experience acquired in the automatic generation of tests. This extension consists of improving the tool's functionalities, allowing users to define their own templates to extract data from the textual use cases. Another improvement in our tool was the inclusion of business rules in the test case generation process to increase the test coverage.
In this context, we plan to answer the following question: "Is it feasible to use a tool to generate test cases from textual use cases in the test process within a test factory?".
To answer this question, we expanded the efficiency proof of concept with more data from real releases of an industrial software project. In this proof of concept, test specification with the tool required 65.38% less time than the manual activity. We also conducted a feasibility study with test analysts from different projects and collected their feedback. All users needed less time to complete the specification task using the tool, but they also reported the need to improve its usability. For instance, the solution generates test procedures with unnecessary extra characters.
This paper is organized as follows: Section 2 discusses related work. Section 3 presents the methodology used for the development and proof of concept of our solution. Section 4 describes the environment of the test factory, its team profile, tools, and internal processes. Section 5 details the tool developed. Section 6 details the proof of concept conducted with our solution in an industrial context. Section 7 describes the feasibility study conducted with users. Section 8 summarizes the lessons learned during this study. Finally, Section 9 concludes the paper.

Related Work
In the literature, several works deal with the generation of test cases and procedures from use cases. For example, some approaches (Nogueira et al., 2019; Sneed, 2018) are based on Natural Language Processing (NLP) for the extraction of test cases, and others on the generation of intermediate models to extract the necessary information (Some and Cheng, 2008; Massollar et al., 2012). Furthermore, some studies (Gutiérrez et al., 2015; Jorge et al., 2018; Massollar et al., 2012; Yue et al., 2015) have presented evaluations and experience reports in industry about test generation tools and approaches.
Some and Cheng (2008) offer an approach for generating test scenarios based on textual use cases, using a restricted language with tokens for preconditions, flows, steps, and conditional expressions. The first step in the approach consists of extracting information from structured texts to create a state machine called the Control Flow-Based State Machine (CFSM), in which transitions represent the steps, and states represent the actions and outputs. Use cases included in another use case compose the same CFSM. At the end of the generation process, a global CFSM is generated to link all use cases, and it is traversed to generate the test scenarios. The paths in the model represent scenarios that can be generated with different coverage criteria. We use a similar concept in this paper to generate the test procedures, also requiring manual intervention to create the final tests. Another similarity is that we generate the scenarios from paths in the flows, but without generating intermediate models and with simplified selection criteria.
Massollar et al. (2012) present an automated model-based approach for generating test cases. The approach consists of specifying the use cases using specific patterns so that they can be converted into UML activity diagrams that represent the system's behavior. The goal of the activity diagram is twofold: to check whether the use cases have been specified correctly and to assist the generation of test models. This test model is the basis for the generation of procedures and test cases, and the test analyst must manually identify and insert the data necessary to generate the test cases. The authors also present an evaluation of the tool carried out with two software engineers and a group of students. They discuss the data related to specification time and model verification, but with little emphasis on test generation.
Gutiérrez et al. (2015) present a model-based approach for test case generation that focuses on the use of metamodels to increase the generalization of the solution to different approaches. Their solution uses metamodels to model use cases and test elements, and applies transformations to the models until the test cases can be obtained. This work presents three industrial use cases, one of them in an agile context, and it also summarizes the lessons learned. Even with the introduction of extra models and their respective transformations, the authors reported effort reductions with the use of the proposed tools. However, not much information is provided about the approach's effort in the agile environment.
Yue et al. (2015) present RTCM, an approach for the textual specification of test cases through elements similar to those of use cases. This approach provides some predefined patterns for test specification and a tool called aToucan4Test, whose primary goal is to assist the whole generation process of manual and automated test cases. To analyze the feasibility of the solution, the authors present the use of the tool in two industrial case studies in the domain of cyber-physical systems. The assessment focuses on the generation of automated test scripts, for which the authors report a significant reduction in the implementation effort. Finally, the authors present the lessons learned from the process.
Jorge et al. (2018) propose CLARET, a domain-specific language that allows the creation of use cases and test cases using structured natural language. They also present a supporting tool that allows the specification and validation of use cases and converts them to Labeled Transition Systems to create the test cases. This work describes industrial case studies in an agile environment, where the software engineers write use cases using CLARET and generate test cases with the developed tool. They also present the lessons learned and results on the effectiveness of the solution.
Sneed (2018) reports his experience in industry with the semi-automatic generation of tests. The generation approach consists of extracting information through Natural Language Processing, using either requirements in plain text or use cases enriched by keywords. The expressions of the text are compared with grammars to identify actions, states, and business rules that serve as the basis for test conditions. Finally, the tester must change the test conditions to insert inputs and outputs. The author reports experience in four industrial projects, summarizing data related to effort. This work is similar to ours, mainly in the use of keywords to ease the extraction of information, though little information is provided on how these tests can be changed through the tool. Additionally, the integration of the tool with other systems that support the testing process is not presented.
Nogueira et al. (2019) propose an approach for the automatic generation of tests from use cases. For this purpose, the authors propose the use of a controlled natural language. The first step consists of modeling use cases through the language, which allows the declaration of system interactions, entries, and conditional expressions. After that, the specification is converted into CSP models, so that the variables and data types are converted to the formalism. In the third step, the analyst specifies the test purposes that will guide test generation. Finally, the generation is performed using an LTS model, where the traces represent test scenarios and the specified domain is used to create the tests. The authors reported the implementation of a tool that abstracts the formalism of the approach for testers. Among the similarities with our work, we highlight the use of use cases from industry partners. However, the tool usage by test analysts is not presented.
The main goal of our paper is to report the experience of the automatic generation of test procedures from textual use cases. To accomplish this, we implemented a tool that fulfills the needs of a particular agile project. Table 1 summarizes a comparison of our work with the related work presented in this section. It is possible to verify that most studies have focused on the generation of test cases rather than test procedures. However, these approaches impose the additional cost of formal models to increase efficacy (Massollar et al., 2012; Yue et al., 2015; Gutiérrez et al., 2015). To analyze and extract the use cases, we used predefined structures in the use cases without restricting their specification with a syntax grammar (Nogueira et al., 2019; Sneed, 2018). The latter would require changes in all use cases of the project's documentation. The works most related to ours are those by Some and Cheng (2008) and Massollar et al. (2012). In our tool, we use a concept similar to the scenarios presented by the aforementioned authors to create the procedures, linking the use case flows through references in steps. The main difference is that we use a simpler representation of use cases that does not rely on models or formal languages. This impacts the efficacy of the test generation. On the other hand, there is the benefit of offering a more practical solution with less effort related to specification. Thus, the solution can be considered an initial approach for test automation, which integrates with other systems of testing projects. Moreover, our paper discusses the results of effort metrics collected in the context of a test factory and presents feasibility study results with users.

Research Method
To guide the execution of the original study, we used a methodology with steps based on the technology transfer model proposed by Gorschek et al. (2006). This model favors cooperation between academia and industry and can be beneficial for both. It allows researchers to study relevant industry issues and validate their results in a real environment. The methodology used in this paper has five steps. In this paper, we improved the solution of Step 2 and the proof of concept of Step 3. We also added Step 5 to perform the evaluation with users. The steps are described below.
Step 1 - Identifying potential improvement areas based on industry requirements: In this step, we observed the test activities of a real test project (see Section 4). To assist information gathering, the researchers involved asked the test team about needs regarding the testing process. We identified improvement issues related to the execution of the test specification process. After that, test analysts of the project were interviewed to gather more details about how the test activities could be executed to reduce the effort. As a result, we identified the requirements of an automation tool for supporting test analysis and specification.
The requirement for the solution is that it should not cause too many modifications in the artifacts (e.g., use case and test case templates) of the process. This solution must also be as practical as possible to reduce the effort related to formal specifications. Most of the solutions presented in Section 2 require the introduction of additional models or the modification of the use case templates. We also could not find an automated solution that fulfills the project's needs, so a customized solution had to be implemented.
Step 2 - Solution design: After the previous step, we started the elaboration of a solution. The goal was to use practices and concepts from the literature that would best fit the requirements of the industrial project. To do so, we reviewed some solutions presented in the literature that could help in the development of a customized solution. As described in Section 2, many approaches are supported by additional models to increase the effectiveness of the generated tests. However, we chose to avoid model-dependent approaches, since the objective was an easy-to-implement solution that does not require the manipulation of an additional formalism and does not affect the Sprints of the project. As the use cases of the industrial project were specified in Portuguese, we also chose not to use NLP-based approaches, since the solutions found are designed for use with the English language. Additionally, we did not intend to use a fixed syntax, to avoid impacts on the specification of use cases. The latter was necessary because the project employs a use case template in the requirements elicitation that should not be changed.
Among the tools in the literature that propose test generation, Specmate (Freudenstein et al., 2018) is one of the current tools with more features. Although it still depends on additional models for test generation, its procedure specification and test data insertion process is straightforward and does not depend on additional models. We follow similar interactions to build the user interface of our solution. Our steps to generate the test scenarios draw similarities with the work of Some and Cheng (2008), but with simpler coverage criteria.
Given the considerations mentioned above, we decided to develop a tool that would partially automate the specification process of test procedures. Thus, test analysts could have more control over the test specification. This tool should also receive input data to create test cases.
Step 3 - Performing a proof of concept: To perform an initial evaluation of the tool, a proof of concept was carried out in one Sprint of the same project used as the context to build the solution, which started in June 2019 and finished in July 2019. This proof of concept was conducted by one of the researchers and assisted by the test leader of the software project. Aiming to analyze the impact of the tool on the testing team's work, metrics related to the number of test cases, requirements coverage, effort, and variance were collected. These metrics are part of the Test Factory process (de Castro) and are based on articles available in white literature (Seela and Yackel) and academic work (Lazic and Mastorakis, 2008). Table 2 summarizes the metrics used and their respective formulas. Based on the pilot results, which were presented in our previous paper, we obtained initial data about the feasibility of the tool and its benefits. We also identified some failures in the tool, which were fixed before the tool was deployed. The results of the proof of concept are presented in Section 6.
Step 4 - Solution deployment: In this step, the elaborated solution is deployed in the project for use. In our case, we deployed the tool in our industrial project and then used it during some releases of that project. The data collected during this usage is presented in Section 6.
Step 5 - Carrying out a feasibility study with other professionals: After the deployment of the solution, we performed a feasibility study with testing professionals from other software projects. The main objective of this evaluation was to obtain data about the efficiency of the tool with professionals from different contexts who have experience in the specification of tests based on use cases. The results of this feasibility study are presented in Section 7.

Proof of Concept Environment
In this section, we present the environment in which the solution was created and in which the proof of concept described in Section 6 was conducted. We performed the proof of concept in a Research, Development, and Innovation project concerned with requirement elicitation and software testing of Software A. This software aims to manage a passive optical network.
This project can be considered distributed, since the client, development, and requirement/test teams belong to different institutions and work in different locations. The team responsible for Software A's requirements and tests uses the Scrum framework (Schwaber and Beedle, 2002) with Sprints that last one month.
This environment was the basis to build the tool presented in Section 5. It was also used to conduct the proof of concept to be presented in Section 6.
Subsection 4.1 presents the testing team's profile. Subsection 4.2 describes the tools and patterns adopted in the project. Subsections 4.3 and 4.4 detail, respectively, the requirement and the test processes used. These processes were elaborated based on the previous experiences (Aragão et al., 2017; Vieira et al., 2018b) of GREat's test factory in test projects.

Team profile
The test factory team involved in the Software A project is composed of a test manager, one requirement analyst, two test analysts, one trainee (tester), and one researcher. Among the members, only one of the analysts and the trainee executed test cases. The analyst has fourteen months of experience in requirement elicitation and one year and six months in test execution. The trainee has one year and four months of experience in requirement elicitation and test execution. Both of them have fourteen months of experience in requirement elicitation and test activities in the Software A project.
The requirement/test team performed both the requirement and test activities, in which the use cases are the basis for the test case specification. In addition to the tests based on use cases, the team also conducted exploratory tests during the execution of the Sprints. The deep knowledge of Software A's requirements allowed the test analysts to generate more concise test documentation, thus providing more agility during the process. Therefore, the analysts executed the tests based on the documents and their own experience. However, concise test documentation can also be costly to create and maintain, especially in a project with many requirement changes and a fixed release date.

Tools and Patterns of the Internal Processes
To guide the activities of the requirement and test processes (see Sections 4.3 and 4.4), the testing team used the following tools: JIRA, for use case and task management; Confluence, for business rules and general documentation; TestLink, for test plan and test case management; and the browsers Google Chrome and Firefox, for test case execution.
Since the beginning of the Software A project, the stakeholders decided to perform the requirement specification using well-defined templates, aiming to improve understandability for all stakeholders. Therefore, we used special symbols that ease the identification of elements in the use case steps. Figure 1 shows a fictitious example of a use case to edit a registered user, where the Basic Flow starts at the tag [Basic Flow]. Likewise, in step 3 of the Basic Flow, the input fields are identified by double quotation marks. In step 4, the clickable visual elements are written between <> symbols. The use case also has information about its goal, related mockups, preconditions, and the acceptance criteria. The latter refers to the flows that should not have any critical bug so that the use case implementation can be considered "done".
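As an illustration of how such symbols can be recognized, the following minimal sketch extracts input fields and clickable elements from a single step. The two regular expressions and the example step texts are assumptions made for this illustration; they are not the patterns or the content of Figure 1.

```python
import re

# Hypothetical patterns: double-quoted text marks an input field,
# text between < and > marks a clickable visual element.
INPUT_FIELD = re.compile(r'"([^"]+)"')
CLICKABLE = re.compile(r"<([^<>]+)>")


def parse_step(step):
    """Return the input fields and clickable elements named in one step."""
    return {
        "inputs": INPUT_FIELD.findall(step),
        "clickables": CLICKABLE.findall(step),
    }


# Invented step texts in the spirit of the "edit a registered user" example.
step3 = 'The Admin fills in the fields "Name" and "E-mail".'
step4 = "The Admin clicks on <Save>."

print(parse_step(step3))
print(parse_step(step4))
```

Keeping one pattern per kind of element makes it straightforward to replace an individual expression when a project adopts different symbols.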

Requirements Process
To organize the requirement engineering tasks, we followed a requirement process with the following activities: Elicitation, Analysis, Specification, and Validation. According to Wiegers and Beatty (2013), these steps are essential to requirement engineering in a software project. During the implementation of the project, the analysts performed the requirements activities so that their outputs could be used as input to the Sprint's backlog.
Although the project had agile characteristics, the client requested detailed documentation for Software A because of its complex features. Thus, we specified the requirements through textual use cases. Each activity is described as follows.
1. Elicitation: This step aims to identify the system requirements by consulting the stakeholders. In this process, the team interviews stakeholders and elaborates usage scenarios with interface prototyping using the Balsamiq tool.
2. Analysis: This step is responsible for verifying the consistency, completeness, and viability of the previously elicited requirements. The stakeholders then prioritize the requirements, aiming to identify which ones have a faster and higher return on investment for the client and final customer.
3. Documentation: In this step, the analysts specify the requirements as use cases and communicate them to the team. The system business rules are also documented.
4. Validation: In this step, the analysts ensure that the requirements have an acceptable description and can be sent to development. In this project, this step also involves the creation and validation of high-fidelity prototypes.

Testing Process
The testing process provides real feedback on the behavior of the software (Bertolino, 2007), and the organization of its activities allows it to be communicated, monitored, and improved (Mette and Hass, 2008). Processes can vary depending on the institution. Still, there are generic processes (Mette and Hass, 2008; ISO/IEC 29119-2, 2013) that can be adapted for the organization's purposes. GREat's Test Factory process (de Castro) is based on MPS.BR (Montoni et al., 2009) and has three steps: (i) Planning; (ii) Elaboration; and (iii) Execution. In the context of this project, the requirement analysts forward the documents, specified as described in Section 4.3, to the testing activities so that the specification and execution of the tests are performed before the system is released. We present a brief description of the main activities of this process as follows:
1. Planning: This step consists of verifying the test goals and performing the actions required to transform the test strategy into an operational plan.
2. Specification: This step aims to elaborate tests that meet the demands of the test plan. It also includes the specification of automated test scripts, when necessary.
3. Execution: The final step consists of executing the tests and storing the results. In this step, the test analysts must verify the test incidents. The analysts also generate a test report and send it to the client with the lessons learned.
It is worth noting that, during the whole process, the test team controlled and kept track of the activities, allowing them to make improvements in the next process execution.

Tool for Semi-Automatic Generation of Test Procedures
In our previous work, we introduced the tool used to generate tests from use cases in the context of the Software A project. In this paper, we refer to this tool as UC2Proc. The tool was mainly developed by one researcher and one test analyst. Regarding the improvements of the tool (see Section 5.1), another test analyst was responsible for implementing additional features. The features of the UC2Proc tool comprise processing structured textual use cases from JIRA, generating test procedures from the use case flows, editing the extracted test procedures, adding input data to create test cases, and, finally, sending the generated test cases to TestLink. For the current study, our tool was improved based on the pilot results. All the tool features and its improvements are detailed in Subsection 5.1. In Subsection 5.2, we present its package diagram, and in Subsection 5.3 we present some interfaces and how the tool works.

Tool Features
The features of UC2Proc are described as follows:
(1) Integration with JIRA. In our tool, the test analyst can search for the identifier of a use case issue in the JIRA system. Next, regular expressions extract information from the textual use cases, covering the following elements: objective, preconditions, flows, steps, references to other flows, and data entries in the steps. This operation only works correctly if the use cases are specified strictly according to the patterns configured in the tool.
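The extraction step can be pictured with the sketch below, which splits a use case text into flows and records the flow references found in each step. The tag names (AF01, EF01), the step numbering format, and the sample text are hypothetical; the expressions actually configured in the tool are not given in the text.

```python
import re

# Hypothetical, configurable patterns; not the tool's actual expressions.
PATTERNS = {
    "flow_tag": re.compile(r"^\[(Basic Flow|AF\d+|EF\d+)\]\s*$"),
    "step": re.compile(r"^(\d+)\.\s+(.*)$"),
    "flow_ref": re.compile(r"\[(AF\d+|EF\d+)\]"),
}


def extract_flows(text):
    """Map each flow name to a list of (step number, step text, referenced flows)."""
    flows = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        tag = PATTERNS["flow_tag"].match(line)
        if tag:
            # A tag line opens a new flow.
            current = tag.group(1)
            flows[current] = []
            continue
        step = PATTERNS["step"].match(line)
        if step and current is not None:
            refs = PATTERNS["flow_ref"].findall(step.group(2))
            flows[current].append((int(step.group(1)), step.group(2), refs))
        # Lines that match no pattern are silently skipped.
    return flows


SAMPLE = """[Basic Flow]
1. The Admin selects a user.
2. System shows the edit form.
3. The Admin fills in "Name". [AF01]
4. The Admin clicks on <Save>. [EF01]
[AF01]
1. The Admin clicks on <Cancel>.
[EF01]
1. System shows an error message.
"""

flows = extract_flows(SAMPLE)
print(list(flows))
```

Because unmatched lines are skipped rather than reported, a use case that deviates from the configured patterns yields incomplete flows, which is consistent with the restriction stated above.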
(2) Testing Procedures Generation. The information extracted from the textual use cases is used to generate test procedures. To do so, UC2Proc first creates test scenarios that are composed of the sequences of flows to be visited in the use cases. To accomplish this task, we used an approach similar to the one presented by Some and Cheng (2008). We implemented an algorithm that generates state sequences starting in the 'Basic Flow' and visits all alternative/exception flows that depart from it. Then, starting from each of the alternative/exception flows, all their respective steps are analyzed and the paths to other flows are visited. These scenarios of flow sequences are used to compile the input steps and variables that will compose the test procedures.
In the current version of our tool, we also added a new function that identifies the business rules referenced in the use case. The tool then creates two tests for each rule: one to validate the rule and the other to verify its violation. The user then sees the reference and manually fills in the steps of the test procedure. The coverage criteria, although simple, allow generating scenarios that go through all flows and some transitions. However, the tool does not perform a deep search over all state machine paths, since that could generate scenarios that span several flows. The intention of this functionality is to generate test procedures similar to those that the test analysts manually generate in the project.
Algorithm 1 explains the process to generate the scenarios for each use case. Lines 1 to 5 declare the necessary variables, where testScenerarios is the list of scenarios with the use case flows, currentPath is an auxiliary variable, and testProcedures is the final list of test procedures. The first step is to create a scenario for the basic flow. Then, the algorithm iterates over each step of the basic flow. From line 8 to line 11, a new scenario is created from the basic flow to each flow that is called in the basic flow steps. Next, the lines of each alternative/exception flow are analyzed, and one scenario is created starting from them to each new flow reference. In summary, scenarios are generated by exploring a maximum of one level from each flow. After creating the test scenarios, the algorithm iterates over each scenario, creating a procedure containing the title, goal, and preconditions from the scenario. Then, the algorithm iterates over each business rule of the use case and creates a test procedure containing the title of the rule and blank space in the steps and output, which the user must fill in manually. At last, the algorithm returns the list of test procedures generated.
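The description of Algorithm 1 can be read as the following sketch. It is one interpretation of the prose, not the tool's implementation: the dictionary layout, the function names, the title format, and the choice of starting alternative-flow scenarios at the alternative flow itself are all assumptions.

```python
def generate_scenarios(flow_refs):
    """flow_refs maps a flow name to the flows referenced in its steps, in step order."""
    scenarios = [["Basic Flow"]]
    # One scenario from the basic flow to each flow it references.
    for ref in flow_refs.get("Basic Flow", []):
        scenarios.append(["Basic Flow", ref])
    # One level only from each alternative/exception flow.
    for name, refs in flow_refs.items():
        if name == "Basic Flow":
            continue
        for ref in refs:
            path = [name, ref]
            if path not in scenarios:
                scenarios.append(path)
    return scenarios


def generate_procedures(use_case):
    """Create one procedure per scenario and two blank ones per business rule."""
    procedures = []
    for path in generate_scenarios(use_case["flows"]):
        procedures.append({
            "title": " -> ".join(path),
            "goal": use_case["goal"],
            "preconditions": use_case["preconditions"],
            "scenario": path,
        })
    for rule in use_case["business_rules"]:
        # Steps and output are left blank for the analyst to fill in.
        for purpose in ("validate", "violate"):
            procedures.append({"title": f"{rule} ({purpose})", "steps": "", "output": ""})
    return procedures


# Hypothetical use case with three flows leaving the basic flow and one business rule.
use_case = {
    "goal": "Edit a registered user",
    "preconditions": "The Admin is logged in",
    "flows": {"Basic Flow": ["AF01", "AF02", "EF01"], "AF01": [], "AF02": [], "EF01": []},
    "business_rules": ["BR01"],
}

for procedure in generate_procedures(use_case):
    print(procedure["title"])
```

Limiting the exploration to one level keeps the number of scenarios linear in the number of flow references, at the cost of not covering longer chains of flows.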
(3) Testing Procedures Management. The developed tool allows the test analyst to add, edit, and delete a test proce dure, as well as to manage the steps within a procedure. Our tool also extracts and displays to the user the inputs listed in the steps of each procedure. These inputs can be added, edited, or removed when editing the steps, but it is not pos (5) Integration with TestLink. After the generation of the test procedures and addition of the data to generate test cases, the tool sends the test suite to TestLink. For this, the test an alyst must configure the test project name in which the gen erated test cases must be uploaded.
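As a rough illustration of the TestLink integration, the sketch below only builds the argument structure for one test case and does not contact any server. The field names follow our understanding of TestLink's XML-RPC method tl.createTestCase and should be checked against the installed TestLink version; the function name, the procedure layout, and all sample values are hypothetical.

```python
def build_testlink_payload(procedure, dev_key, project_id, suite_id, author):
    """Build the argument struct for one test case (nothing is sent here)."""
    steps = [
        {
            "step_number": number,
            "actions": action,
            "expected_results": expected,
            "execution_type": 1,  # assumed to mean manual execution
        }
        for number, (action, expected) in enumerate(procedure["steps"], start=1)
    ]
    return {
        "devKey": dev_key,
        "testprojectid": project_id,
        "testsuiteid": suite_id,
        "testcasename": procedure["title"],
        "authorlogin": author,
        "summary": procedure["goal"],
        "preconditions": procedure["preconditions"],
        "steps": steps,
    }


# Invented procedure: each step pairs an action with its expected result.
procedure = {
    "title": "Basic Flow",
    "goal": "Edit a registered user",
    "preconditions": "The Admin is logged in",
    "steps": [
        ("The Admin selects a user.", "System shows the edit form."),
        ("The Admin clicks on <Save>.", "System confirms the update."),
    ],
}

payload = build_testlink_payload(procedure, "my-api-key", 1, 10, "analyst")
print(payload["testcasename"], len(payload["steps"]))

# Sending would then look roughly like this (not executed in this sketch):
#   import xmlrpc.client
#   server = xmlrpc.client.ServerProxy(testlink_api_url)
#   server.tl.createTestCase(payload)
```

Note that this interface identifies projects and suites by numeric id, so a project name configured by the analyst would first have to be resolved to an id.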
(6) Template Customization. In our previous work (San tos et al., 2019), the tool was limited to a fixed use case tem plate. In the present work, we added new functionality that allows the customization of the patterns to detect elements of textual use cases. It is worth noting that the general struc ture of the use cases is fixed and must be followed. However, the user can create its own regular expressions using a form in the tool. The major advantage of this functionality is that it makes the tool more customizable, allowing it to adapt to patterns of different projects or organizations.
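As a rough illustration of pattern customization, the Python sketch below merges user-defined regular expressions with a set of defaults and applies them to a step of a use case. The pattern names, the default expressions, and the tag formats are hypothetical; the actual patterns used by the tool are not listed in this paper.

```python
import re

# Hypothetical default patterns; each has one capturing group for the identifier.
DEFAULT_PATTERNS = {
    "flow_reference": r"\[((?:AF|EF)\d{2})\]",
    "business_rule": r"\[(BR\d{2})\]",
}


def build_patterns(custom=None):
    """Compile the default patterns, overridden by any user-defined ones."""
    merged = {**DEFAULT_PATTERNS, **(custom or {})}
    return {name: re.compile(expr) for name, expr in merged.items()}


def extract(patterns, text):
    """Return every identifier found in the text, grouped by pattern name."""
    return {name: regex.findall(text) for name, regex in patterns.items()}


step = "The System validates the data [BR01] or shows an error [EF01]."
print(extract(build_patterns(), step))

# A project that tags flows as [FA01]/[FE01] only overrides one pattern.
custom = build_patterns({"flow_reference": r"\[((?:FA|FE)\d{2})\]"})
print(extract(custom, "The Admin cancels the operation [FA01]."))
```

Keeping the element names fixed while letting the expressions vary mirrors the constraint stated above: the general structure of the use case is fixed, and only the textual patterns are adjustable.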

Package Diagram
The UC2Proc tool was developed as a Web App using the Ionic framework (footnote 11) and the JavaScript programming language (footnote 12). We also used the JIRA and TestLink APIs to allow communication with these services. Figure 2 presents a UML package diagram of the tool.
The Issue module manages the issues received from the JIRA API and uses the defined regular expressions to extract the necessary elements from each issue. These elements are used to instantiate the objects (e.g., Basic Flow, Business Rules) used in the test generation process.
The Use Case module receives the issue extracted from the previous module. Next, it instantiates a use case object based on the information received in such a way that each flow contains steps, the flow/event that triggers it, and the flows possible to reach it.
After that, the Scenario module generates the usage scenarios from the flows labeled in the Use Case module. The generation process follows the algorithm presented in Subsection 5.1. The Test Data module handles the manual input of test data in the test scenarios.
Finally, the Test Case module generates the test cases that will compose the test suite and uses the Testlink module to send them to the test management tool.
11 https://ionicframework.com/
12 https://www.javascript.com/

Tool's Usage
This section presents an example of the use of our tool, given the use case presented in Figure 1. To use the tool, the user must first configure the integration with the external management systems, as follows:
• JIRA: The test analyst must provide the URL of the JIRA API, the username, and the API key of an account. This information can be obtained from the security page of the JIRA user account; and
• TestLink: The test analyst must provide the URL of the TestLink API, which can be obtained from the analyst responsible for the maintenance of TestLink in the organization, and the API key, which can be found on the user page in TestLink.
Once the authentication information is configured, the user must start by searching for the use cases. To do so, the user types the ID of the use case issue from JIRA in the "search" field. The system then displays the search results and allows the found issues to be added with the "+" button, as shown in Figure 3. Then, the user must click on the right arrow button, and the system runs Algorithm 1 and displays the generated scenarios for each flow and business rule of the use case. Assuming the structure of the use case presented in Figure 1, the expected result is one test procedure for each flow (Basic Flow, AF01, AF02, EF01) and two additional procedures for the business rules.
Table 3 presents the generated test procedures with the following fields: (i) Scenario, the test scenario generated; (ii) Title, which is the test procedure title; and (iii) Actions and Outputs, which represent the steps and expected results through the use case flow (between "[]") and the step numbers. For instance, the first row of Table 3 represents the basic flow scenario, which takes steps 1, 3, 4, and 6 of the use case (the steps that contain "The Admin") as test actions, and steps 2 and 5 (the steps that contain "System") as the expected results.
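The separation between actions and expected results described for Table 3 can be sketched in Python as follows. The step texts are hypothetical and are not the content of Figure 1; only the rule (steps mentioning the actor are actions, steps mentioning the system are expected results) comes from the description above. Checking the actor before the system is an assumption made here to handle steps that mention both.

```python
def split_steps(steps, actor="The Admin", system="System"):
    """Return the 1-based numbers of action steps and of expected-result steps."""
    actions, outputs = [], []
    for number, text in enumerate(steps, start=1):
        if actor in text:       # steps performed by the actor become test actions
            actions.append(number)
        elif system in text:    # system responses become expected results
            outputs.append(number)
    return actions, outputs


# Hypothetical basic flow with six steps.
basic_flow = [
    "The Admin opens the registration screen.",
    "The System displays the registration form.",
    "The Admin fills in the form.",
    "The Admin confirms the registration.",
    "The System saves the new record.",
    "The Admin closes the screen.",
]
actions, outputs = split_steps(basic_flow)
print(actions, outputs)  # [1, 3, 4, 6] [2, 5]
```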
Then, the user can add, remove, or modify test cases or test data using the fields shown in Figure 4.
After editing the test procedures and generating the test cases, the user must click on the "Send to Testlink" button and choose the name of the project and test suite in TestLink, as shown in Figure 6. The tool then creates a new test suite in the chosen project and exports all current test procedures as test cases to TestLink.
An example of the test cases generated from the use case shown in Figure 1 is depicted in Figure 5.

Proof of Concept
In this section, we detail how the study was executed after the deployment of the solution described in Section 3. We performed a proof of concept in the environment presented in Section 4. Subsection 6.1 describes the steps of the evaluation. Subsection 6.2 summarizes and discusses the results related to the test effort.

Proof of Concept Steps
We evaluated the tool in three steps: (1) selection of use cases, which produces a list of requirements without previous test cases; (2) effort estimation, where the analyst estimates the time to complete the task based on their experience; and (3) automatic generation of tests, comprising the actual use of the generated solution. These steps were performed by one of the researchers and the test analyst of the project.
These steps are described next.

Use Case Selection.
The first step of the proof of concept was the selection of the use cases. To prevent the bias associated with the analyst's knowledge, we selected use cases that had not been analyzed and specified before. Since the textual patterns of the project had already been applied to the documents, the analysts did not edit the use cases.
To present results as close as possible to the original context of the Software A project, we selected use cases from a real release of the software under development. Considering the aforementioned conditions, the team used all the artifacts produced during the proof of concept in the real release.

Effort Estimation.
The test analyst estimated the manual effort to specify the tests in minutes. The analyst calculated the effort with a metric defined by the following expression: $3 \times (N_{Flows} + N_{BusinessRules}) + W$. The number three is a multiplier representing the analyst's effort, in minutes, per flow or business rule of the use case. Additionally, the metric includes a weight W that adds extra time based on the analyst's perception. Cases with W equal to 1 refer to use cases whose flows have many repeated steps, which required lower test specification effort. The greater the inexperience of the analyst with the functionality under test, the greater the value of the weight W.
The team of the Software A project created the metric to meet its need for manual effort estimation, considering margins of error based on the analyst's opinion. The whole equation is based on recent experience gained in the project. Although the metric is ad hoc and has not been validated in controlled experiments or case studies, the results were accurate enough in our previous evaluations.
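As a worked example of the metric, the Python sketch below computes the estimate for a hypothetical use case; the function name, parameter names, and input values are illustrative assumptions and do not come from Table 4.

```python
def estimate_manual_effort(flows, business_rules, weight, minutes_per_item=3):
    """Estimated manual specification effort in minutes: 3 * (flows + rules) + W."""
    return minutes_per_item * (flows + business_rules) + weight


# Hypothetical use case with 4 flows, 2 business rules, and weight W = 1.
print(estimate_manual_effort(flows=4, business_rules=2, weight=1))  # 3 * 6 + 1 = 19
```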

Generation of Test Procedures.
The test analyst generated the test cases from the use cases using the proposed tool and TestLink, aiming to compare the time required to complete the task with the estimated effort. It is worth noting that the test analyst did not specify the test cases manually during the proof of concept.
To use the tool and generate the test cases, the test analyst performed the activities following the process illustrated in Figures 7 and 8. The details of each activity are presented as follows.
1. Select Sprint's use cases from JIRA: This activity receives as input the list of use cases from the backlog, which were selected at the beginning of the proof of concept. At this point, the test analyst must select the use cases for the automatic generation of test procedures.
2. Verify compliance of use cases with the organization's patterns: In this activity, the test analyst checks whether the use cases from the previous activity comply with the organization's patterns. The focus of this activity is also to analyze whether the use cases follow the patterns configured in the automated tool. The test analyst must record any nonconformity in the corresponding JIRA task.
3. Use case update: In this subprocess, use cases that do not follow the pattern configured in the tool must be updated by the requirements analyst. This activity receives as input the list of incorrect use cases. In the end, the requirements analyst must produce updated use cases according to the established patterns.
4. Perform automatic specification of test procedures: This activity covers the use of the tool to perform the automatic generation of test procedures. To do so, the test analyst must have access to verified use cases. At the end of the process, a set of test procedures must be generated.
5. Analyze the generated test procedures: After generating the procedures, the test analyst should analyze the generated steps. In this activity, the test analyst ensures that the steps and outputs were correctly extracted from the use cases. If errors are found in the procedures, the team records the occurrences in JIRA.
6. Adjust the set of test procedures: This subprocess consists of adjusting the set of test procedures when there are nonconformities, and the test analyst is responsible for performing the adjustments. The subprocess receives the list of test procedures to adjust and produces the adjusted set as output.
7. Generation of test cases: After analyzing the procedures and performing the necessary adjustments, the test analyst should provide the test data to generate the test cases. The test analyst repeats this action until the desired coverage is obtained. The activity receives as input a set of test procedures and must produce as output a suite of test cases.
8. Send test cases to TestLink: The last activity of the process is to send the test cases to TestLink. The activity receives the test case suite as input and automatically sends it to the TestLink tool. After sending the test cases to TestLink, the test analyst must perform any additional updates directly in TestLink. If more test cases are generated for a test suite already present in TestLink, the test analyst must also add them manually.

Metrics Results and Discussion
During the processing of use cases, the test analyst manually collected the times obtained at the end of each activity. Hence, the only instrument used in this step was a spreadsheet with the same fields as Table 4.
Table 4 shows the results of the tool usage. In this table, we detail the main use case information, such as the number of flows, steps, business rules, and the estimated modeling time of the use case. The table also presents the time spent to correct the nonconformity of patterns in the use case, the time required to edit the test suite (e.g., the addition of the business rules details), and, finally, the time necessary to adjust the test cases to be exported to TestLink. In total, 24 use cases were used, distributed among five Sprints, totaling 100 flows and 555 steps. As shown in Table 4, the use of the tool in the Sprints yielded 154 test cases and took, in total, 2 hours and 34 minutes. This represents a reduction of approximately 65.38% in effort compared to the 7 hours and 27 minutes of estimated manual effort.
From the results of the columns "Business rules", "UC steps number", and "Test Adjust Time", it is possible to verify that, most of the time, the use cases with a higher number of business rules and steps demanded a greater effort for correction. This complements the results of our previous work, which shows that the complexity of the use cases impacts the generation of tests, since the manual intervention of the analysts is necessary.
In the previous paper, the result of the pilot study was presented using nine use cases belonging to one Sprint of the System A project, which account for 41 flows and 186 steps. In this proof of concept, the test analyst estimated the time needed for the modeling of each use case and compared it with the time spent during automation. The reduction of effort in the pilot was approximately 65%.
Thus, analyzing the results obtained, it is possible to identify an effort below the estimate for the generation of test cases, especially in use cases with a larger number of flows and steps. Regarding the question raised in the title of the paper, it can be said that this generation was worthwhile in the particular context of the project. However, more research must be carried out to generalize the findings.

Threats to Validity
After the proof of concept execution, we identified some limitations that must be discussed. We present these limitations as threats to validity, following Wohlin et al. (2012).
Regarding external validity, which determines the generalization of results, the metric we used to estimate the manual effort could be a threat to the proof of concept. The metric estimates the time required to complete the specification of the test cases, and it was created based on the recent experience of the test analysts. However, the resulting value takes into account specific characteristics related to the context of Software A, e.g., extra time to adjust the test cases after the specification. Nevertheless, we believe that the metric is simple enough to be adapted for other projects.
Also regarding the generalization of the study, the use of patterns in the tool for data extraction can hinder the use of the tool by other teams. To mitigate this, we tried to make the tool read patterns that are as general as possible but still fit the needs of the current project. Additionally, we improved the tool to allow users to create custom patterns with regular expressions.
Finally, the proof of concept was conducted by one of the authors, who is also the test analyst in the project. To reduce the possible bias of the evaluation, the test analyst used the tool in a setting as similar as possible to the real context. Additionally, the proof of concept was a pilot evaluation to analyze the feasibility of the tool.

Feasibility Study
The proof of concept was performed in a real Sprint with users trained in and used to the context of the Software A project. To analyze the viability of using the developed solution in different contexts and to obtain more data about the use of the UC2Proc tool, we performed a feasibility study with test analysts from another industry project in partnership with the GREat laboratory (footnote 13). This feasibility study focuses on measuring the tool's efficiency from the perspective of users without previous experience with the developed solution. Additionally, we collected users' opinions about the positive and negative points of the solution.
Subsection 7.1 details the methodology adopted to conduct the feasibility study. Subsection 7.2 discusses the results obtained after the feasibility study with the users. Subsection 7.3 explains some of the limitations in the conduct of the work.

Methodology
We decided to plan the feasibility study using the DECIDE framework proposed by Preece et al. (2004). This framework aims to guide evaluations with users through a checklist with well-defined activities ranging from defining objectives to evaluating data. We selected DECIDE because it is easy to apply in practice, allowing the assessment to be conducted by inexperienced assessors (Preece et al., 2004). The following paragraphs describe the six activities related to planning and execution.
(1) Define objectives. The first item on the DECIDE checklist concerns the definition of the objectives that should guide the feasibility study. Since our focus was to analyze whether the solution generated was capable of reducing the testing specification time, we defined the following objectives: (i) to evaluate the efficiency of using the tool to specify tests by the test analyst and (ii) to discover the positive and negative perceptions of the test analyst about the use of the tool.
(2) Explore questions. The second item of DECIDE is the definition of the questions that must be answered at the end of the study. Considering our goal of analyzing the user performance with the automated tool, we formulated the following question: was the tool able to increase the testing specification speed compared to the manual specification? Regarding the objective of identifying positive and negative aspects, we formulated the following question: what are the positive and negative perceptions of the analyst during the tool usage?
(3) Choose the evaluation paradigm. The third item corresponds to choosing the evaluation paradigm and techniques to answer the questions of item two. In order to identify the efficiency of the generated solution, the participants of the feasibility study performed a manual and an automatic task. The details of the activities are described as follows:
i. Questionnaire for profile identification: the evaluators applied this questionnaire to identify some general characteristics of the test analysts, such as their experience with use cases and automated test specification tools. The form filled in by the users is presented in Table 6 of Appendix A with the name Professional Profile.
ii. Manual specification of tests: the first task performed by the test analyst concerns the manual specification of tests related to a use case. To do so, the participant received a document describing the use case and was instructed to make the specification in TestLink. The use case for this activity is a fictional system for creating movie schedules in theaters, with a total of five flows and 24 steps. During the activity, two evaluators made notes about the comments and doubts of the participants and other considerations about the execution of the task.
iii. Tool presentation: after that, the researcher presented the tool and showed an example of how to use it. The use case employed in the example is the same as the one in the manual task.
iv. Semiautomatic specification: in the second task, the test analyst received a use case document of music software that allows the creation of playlists. Even though it is a different system, the use case has the same complexity (number of flows, steps, and references between flows) as the one used in the manual task. Then, the participants were instructed to process it with the tool and send it to TestLink. The two evaluators also observed and made notes just as in the manual task.
v. Open questionnaire: finally, analysts needed to answer an open questionnaire with questions about the positive and negative aspects of the tool. The final form is presented in Table 6 of Appendix A with the name User Tool Evaluation.
(4) Identify practical questions. The fourth item of DECIDE corresponds to identifying issues related to the selection of users and materials to be used. The study population was professionals working on software testing projects at the Test Factory of GREat Lab. We selected two subjects for the pilot study and four for the final evaluation. Besides that, all tasks were performed in a controlled environment with the aid of a computer.
(5) Decide how to deal with ethical issues. The fifth item concerns how to protect the privacy of the participants of the feasibility study and handle other issues related to them. At this point, test analysts were asked to sign a consent form to participate in the research and were informed about the purpose of the research, the data anonymization, and how it would be conducted.
(6) Data evaluation. The last item of DECIDE is about evaluating, interpreting, and presenting the data obtained during the evaluation. The performance of the users was evaluated by comparing the execution times of the tasks performed manually and with the tool's support. To answer the question regarding the positive and negative aspects of using the tool, the two evaluators analyzed and discussed the notes collected during the execution of the tasks and the answers from the form with open questions. Both results were combined to provide the final topics related to positive/negative points of the solution.
Given the context of this project, it was not possible to carry out evaluations with many users, so we did not use statistical tests. Instead, we presented the data and discussed the times obtained and the possible reasons that led to the results. Given our focus on assessing efficiency, we did not assess the correctness of the test cases produced, as coverage requirements and specification type may change for different projects.

Results
This subsection presents the results obtained after conducting the feasibility study. Before the final evaluation, we conducted two pilot tests based on the planning of Subsection 7.1 to make possible improvements. The tests were performed with a test analyst and a trainee in test analysis. After performing the tests, the evaluators detected inconsistencies in the use cases of the tasks, which were promptly corrected. We also decided to reduce the size of the test cases to take less time from the professionals' work. Finally, we made a few adjustments to the forms and other evaluation materials.
Figure 9. Participants' experience in test case specification with use cases.
After the pilot tests were executed, we selected four participants from a different project. All of those participants work as test analysts, but three of them were more experienced, and one of them was an intern. Table 5 summarizes the profile of the four professionals, detailing their experience with software testing and with the specification of tests based on use cases. As illustrated in the graph in Figure 9, most of the participants had some experience with specifying test cases in the industry, but none of them works with use cases in their current projects.
During the performance of the tasks with and without the UC2Proc tool, we collected the total times per execution. Figure 10 presents a chart comparing the total times in minutes obtained by each of the four participants, all of whom achieved a lower specification time using the tool. While the average execution time of the manual task was 28.25 minutes, the average obtained with the use of the tool was 14.50 minutes. Given the small sample of participants, we chose not to analyze the statistical significance of the differences. Therefore, the answer to the first question of the feasibility study (Was the tool able to increase the speed of tests specification when compared to the manual activity?) gives more indications that the developed solution can increase the specification speed. However, more evaluations are needed to gather more data about its actual effectiveness.
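For reference, the relative difference between the two reported averages can be computed as in the short Python sketch below. The resulting percentage is derived here from the averages above and is not a figure reported in the study; with four participants it should be read only as a descriptive value.

```python
def relative_reduction(manual_minutes, tool_minutes):
    """Fraction of the manual time saved when using the tool."""
    return (manual_minutes - tool_minutes) / manual_minutes


reduction = relative_reduction(28.25, 14.50)
print(f"{reduction:.1%}")  # 48.7%
```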
After the activities were carried out, the participants were instructed to fill out an open questionnaire to point out the UC2Proc tool's positive and negative points. It is worth mentioning that the evaluators wrote down the participants' comments during the execution of the tasks. The form used by the evaluators is available in Table 6 of Appendix A with the name Evaluator Tool Evaluation.
The two researchers that conducted the feasibility study performed a qualitative analysis of the questionnaire answers and users' comments during the evaluation. The evaluators used the notes to complement the answers of the questionnaire about positive/negative points, so these results are presented together. To accomplish this, we grouped the most repeated and contrasting topics into the following categories about the tool: efficiency, utility, understanding, and visual acceptance. The content of these topics was then used to compose the list of positive and negative points presented below.
As positive points, all participants mentioned that they felt a reduction in time when using the UC2Proc tool compared to the manual activity. This means that the analyst's work could be streamlined. Half of the participants also considered the integration of the tool with systems like TestLink and JIRA to be positive. This last comment may be related to the participants' working context, since they are used to working with this suite of systems to manage the testing activities.
The main negative point, indicated by three of the four participants, concerned the titles generated by the tool. They were not very intuitive and could be difficult to understand during execution. This may happen because the UC2Proc tool creates the test case's title based on the title of the last flow in the generated sequence. For instance, a test procedure with a flow sequence composed of Basic Flow, Alternative Flow, and Basic Flow will have the title of the Basic Flow. This leads to test procedures with repeated titles.
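The repeated titles can be reproduced with a small Python sketch. The first function mimics the behavior described above (title taken from the last flow in the sequence); the second, which joins the titles of all flows in the sequence, is only a hypothetical alternative shown for contrast and is not part of the UC2Proc tool. Flow names and titles are invented for the example.

```python
def title_from_last_flow(path, flow_titles):
    """Title taken from the last flow in the sequence, as described above."""
    return flow_titles[path[-1]]


def title_from_sequence(path, flow_titles):
    """Hypothetical alternative: join the titles of every flow in the sequence."""
    return " > ".join(flow_titles[name] for name in path)


flow_titles = {"Basic Flow": "Register user", "AF01": "Edit user"}
paths = [["Basic Flow"], ["Basic Flow", "AF01", "Basic Flow"]]

last_titles = [title_from_last_flow(p, flow_titles) for p in paths]
full_titles = [title_from_sequence(p, flow_titles) for p in paths]
print(last_titles)  # both procedures end in the Basic Flow, so the titles repeat
print(full_titles)  # titles built from the whole sequence stay distinct
```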
Another negative point was the repetition of steps when the tool moves between different flows more than once. We observed it mainly from the Basic Flow to Alternative/Exception Flows. Still regarding the steps, two participants reported that it was necessary to correct some of them because there were system responses without user actions. This occurs because each system response has its own step in the generation process, even when the responses are in sequence. Finally, most of the users seemed confused when using the tool, since the buttons were not very intuitive and the tool offered little feedback on the actions. They also reported that some of the test procedures contained outcomes without steps, but this is how it is supposed to work, taking into account that the tool produces steps with only one outcome.

Limitations
Subsections 7.1 and 7.2 presented the methodology used to conduct the feasibility study and the results obtained, respectively. However, we identified some limitations in the evaluation that deserve to be discussed. The first limitation concerns the small number of participants, which makes it difficult to apply statistical tests to state the real difference between manual and automatic task times. Nevertheless, participants had varied experience with software testing in the industry, having already worked with different types of systems, tools for test process support, and types of requirements documents. Therefore, the selected group can be considered suitable for an initial evaluation outside the Software A project context.
Moreover, the participants used only fictitious systems documentation because of the confidentiality of Software A. However, the use cases used are similar to the original documentation of the software. We also tried to create new use cases with complexities and interactions similar to those of the use cases from Software A.
Regarding the execution of tasks, all participants executed the manual task before the automated one. To mitigate this threat and reduce bias in this approach, the analysts used different use cases in both tasks.
Finally, during the feasibility study, some participants pointed out problems in the tool procedures. The main concern was about the repetition of some steps, mainly during the transition from the Basic Flow to Alternative and Exception Flows. In its current version, the tool can incorrectly repeat some steps from the Basic Flow. Even so, we believe that this did not have a high impact on the execution of the evaluation, considering that the repeated steps could be easily excluded.

Lessons Learned
As explained in Section 3, the performed study had activities related to the specification of the procedure. While using the tool during the Sprints, it was possible to obtain lessons regarding the solution and the circumstances of its use. They are based on the researcher's observations, the opinions of the members of the requirements/tests team, and the analysis of the collected metrics. These lessons represent some of the challenges faced with the use of the solution, and the actions taken during the Sprints were integrated into the process of using the tool. The main lessons learned during the process are listed as follows:
LL1: The efficiency of a test case generator tool based on use cases is strongly related to adherence to the writing pattern. Use case modeling is a task that demands a high degree of instruction, communication, and knowledge about the software product. Since it is a manual activity, textual documents are often susceptible to attention errors in the writing pattern. These errors range from problems with spelling, plurals, and blanks in the markup characters (such as writing "[FA 01]" instead of "[FA01]") to hidden logical loops. Such errors caused failures in the tool and needed to be corrected, implying additional time in the semiautomated generation process. Therefore, one action taken was to perform a detailed inspection to determine whether the use cases follow the template of the project, making any necessary corrections. Having a use case with the correct pattern as an example helped analysts to identify inconsistencies more quickly and, consequently, enabled the use of the tool. For teams working with poorly detailed documentation, the tool may not generate good results. However, future work could address more specific scripts for different types of projects and documentation.
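The markup errors mentioned in LL1 lend themselves to a simple automated check before generation. The Python sketch below flags tags that look like flow references but deviate from a strict format and suggests a normalized form. The strict format (two uppercase letters followed by two digits, without blanks) is an assumption based on the "[FA01]" example, and the function is not part of the UC2Proc tool.

```python
import re

# Assumed strict format: "[", two uppercase letters, two digits, "]".
STRICT_TAG = re.compile(r"\[[A-Z]{2}\d{2}\]")
# Loose format: tolerates blanks, lowercase letters, and one-digit numbers.
LOOSE_TAG = re.compile(r"\[\s*([A-Za-z]{2})\s*(\d{1,2})\s*\]")


def find_malformed_tags(text):
    """Return (found, suggested) pairs for tags that break the strict format."""
    issues = []
    for match in LOOSE_TAG.finditer(text):
        found = match.group(0)
        if not STRICT_TAG.fullmatch(found):
            letters, digits = match.groups()
            issues.append((found, f"[{letters.upper()}{int(digits):02d}]"))
    return issues


sample = "If the data is invalid, go to [FA 01]; otherwise see [FA01] or [fe2]."
print(find_malformed_tags(sample))
```

A check of this kind covers only the markup characters; spelling problems and hidden logical loops between flows would still require inspection or a separate analysis.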
LL2: The deployment of an automation tool may not be worth the effort reduction. Throughout the design of Software A, the number of flows and steps was used to esti mate the time needed to generate the tests. In some cases, this calculation inaccuracy is observed in use cases with a large number of flows, but that could be easily modeled manually or in an automated way. In this case, the tool has allowed a negligible reduction in efficiency gains, so the effort to adapt the use cases of a project to the presented template may not compensate, especially if the use case is straightforward. In these scenarios, the implementation of a semiautomated tool may require a great deal of manual work to adjust the use cases to a template and the generated procedures, which may impact the project activities. Nonetheless, for most of the use cases of Software A, there were indications that the time re duction for the test specification compensated the implemen tation of the tool in the Test Factory's context. LL3: Textual use cases do not express all the informa tion necessary for good test coverage. In some use cases of Software A, it was not possible to automatically obtain all the necessary information to generate more test cases from the use case documentation. The reason for this is that busi ness rules were expressed in unstructured natural language and screen prototypes were images. It prevented the extrac tion of some input variables for the procedures; the used use case patterns gave analysts freedom to specify. During the use of the tool, the test analyst needed to continue consulting the other documents during the analysis process and by that ensure the desired coverage.
LL4: The integration of a solution with specific pro cess tools is an important factor for efficiency gain. Dur ing the execution of a requirements/test process, the analysts may need to interact with different support tools to facilitate the activities' performance. Therefore, to facilitate the prac tical application of an automation tool, it becomes essential that the developed solution integrates with the other systems. For example, in this report, the analysts originally cloned the tests on TestLink, but the task generated many errors due to the lack of options for the test data and interface problems. In this sense, using the tool for specification activities and then submitting the tests to TestLink helped to decrease the errors.
LL5: Generating additional tests for business rules was not advantageous in all cases. According to user reports, the implementation of the new functionality was advantageous since it flagged the business rules referenced in the use case. On the other hand, three negative points were reported about the functionality. The first is that a good part of the business rules could be covered with only one test case, so the generated tests were less useful. The second is related to use cases that were too specific and had flows detailed down to the business rules; in these cases, it was necessary to remove the duplicate tests. Finally, the users needed to spend some effort complementing the test cases based on business rules, since only the title was generated.

Conclusion
This paper presented an experience report about the generation of test procedures in an industrial context. This paper is an extension of our previous work, whose main goal was to analyze the feasibility of introducing a tool to automate the generation of tests based on use cases.
We implemented the solution in partnership with the industry, thus enabling the generation of a product that better suits the needs of the requirements/testing team, which leads to the question of this paper: "Is it feasible to use a tool to generate test cases from textual use cases in the test process at a test factory?".
Our previous results showed that the proposed solution positively contributed to the analysts' activities. Therefore, we have extended that work through the following contributions: (i) improvements in the tool's generation of test procedures; (ii) data related to more testing cycles of Software A; and (iii) a feasibility study with test analysts.
Regarding the tool's improvement, we realized that only indicating the procedures of business rules to be tested might not be sufficient, since the resulting gain in effort was low. Therefore, the results reinforced the need to specify business rules in a structured manner.
When it comes to the tool usage in more releases, we concluded that the effort reduction in the test generation was maintained, as well as the relationship between the complexity of the use cases and the time spent in manual intervention during the specification process. The reduction in effort equaled 65.38% in the context of the industrial software project. Furthermore, the majority of the effort required was spent adjusting the test procedures generated by the tool.
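As a minimal sketch of how such a relative effort reduction can be computed, assuming it is defined as the difference between the manual specification time and the tool-assisted time divided by the manual time, consider the snippet below. The function name `effort_reduction` and the input values are hypothetical and serve only to illustrate the formula; they are not measurements from Software A.

```python
def effort_reduction(manual_minutes: float, tool_minutes: float) -> float:
    """Return the relative effort reduction as a percentage.

    Assumed definition: (manual - tool) / manual * 100.
    """
    if manual_minutes <= 0:
        raise ValueError("manual_minutes must be positive")
    return (manual_minutes - tool_minutes) / manual_minutes * 100


# Hypothetical figures chosen only to illustrate the formula;
# they are not data from the study.
print(f"{effort_reduction(120, 45):.2f}%")  # prints 62.50%
```

Under this definition, a value close to zero corresponds to the situation described in LL2, in which adapting the use cases to the template may not be worth the effort.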
In addition to the proof of concept, the feasibility study has provided further insight into the efficiency of the solution. Although all users completed the task more quickly using the tool, they pointed out interface issues that can make the software hard to use.
These evaluations also yielded one additional lesson learned regarding the generation of tests for business rules, which demanded additional effort to remove unnecessary tests. This set of lessons learned can give more information about the introduction of an automated tool in a testing process.
Considering the characteristics of the Software A project, the team decided on the development of a simple custom solution. Nonetheless, finding the right degree to which the testing process had to adapt to the insertion of new tools was challenging. Regardless of the decision about the usage of custom solutions or other available solutions, we believe that more work is needed to provide practical insights in the context of test factories, which could benefit projects in distributed scenarios.
Concerning future work, we plan to research how to document business rules to increase the efficiency of generating test procedures. In the current work, the need for this improvement has become even more evident, considering the perceived effort necessary to update procedures with partial descriptions of business rules. The test analysts of Software A also pointed out that several use cases needed corrections in their patterns. Since this hinders the usage of the tool, we also plan to apply static analysis techniques to the requirements documentation. Finally, we intend to make improvements in the tool based on user comments and analyze how the test procedures can assist the generation of automated scripts for functional tests.

A Instruments of the Feasibility Study
Table 6 presents the following forms used in the evaluation: (1) Professional Profile, used to collect the professional profile; (2) User Tool Evaluation, filled in by the participants to report the positive and negative points of the solution; and (3) Evaluator Tool Evaluation, used by the researcher to collect the time and general notes during the tasks.