Understanding the Impact of Introducing Lambda Expressions in Java Programs

Background: The Java programming language version eight introduced several features that encourage the functional style of programming, including support for lambda expressions and the Stream API. Currently, it is common wisdom that refactoring legacy code to introduce lambda expressions, besides other potential benefits, simplifies the code and improves program comprehension.
Aims: The purpose of this work is to investigate this belief by conducting an in-depth study to evaluate the effect of introducing lambda expressions on program comprehension.
Method: We conducted this research using a mixed-method approach. For the quantitative method, we analyzed 158 pairs of code snippets extracted either directly from GitHub or from recommendations from three tools (RJTL, NetBeans, and IntelliJ). We also surveyed practitioners to collect their perceptions about the benefits for program comprehension when introducing lambda expressions. We asked practitioners to evaluate and rate sets of pairs of code snippets.
Results: We found contradictory results in our research. Based on the quantitative assessment, we could not find evidence that the introduction of lambda expressions improves software readability, one of the components of program comprehension. Our results suggest that the transformations recommended by the aforementioned tools decrease program comprehension when assessed by two state-of-the-art models to estimate readability. In contrast, the findings of our qualitative assessment suggest that the introduction of lambda expressions improves program comprehension in three scenarios: when we convert anonymous inner classes to lambda expressions, when we replace structural loops with an inner conditional by an anyMatch operator, and when we replace structural loops by a filter operator combined with a collect method.
Implications: We argue in this paper that one can improve program comprehension by applying particular transformations to introduce lambda expressions (e.g., replacing anonymous inner classes with lambda expressions). Also, the opinions of the participants highlight which kinds of transformation for introducing lambda expressions might be advantageous. This might support the implementation of effective tools for automatic program transformations.


Introduction
Software evolves to adapt to social and technical needs (Godfrey and German, 2008): users might request new features, or performance constraints must be met. Indeed, the success of a system depends on how easily it can evolve. If a system does not change to reflect the needs of its users (Lehman and Ramil, 2001), it is doomed to failure. In the same vein, successful programming languages change over time (Overbey and Johnson, 2009): programmers require more features and more expressivity from language constructs.
Mainstream programming languages (e.g., Python and C++) also evolve to support new programming styles, such as the recent trend of imperative languages to adhere to the functional style. Since version 2.0, the Python language supports list comprehensions (Lott, 2018), a feature originally found in functional languages (like Erlang and Haskell). Similarly, C++ introduced lambda expressions in C++11 (Stroustrup, 2013).
Recently, Java has adopted a faster release cycle to frequently deploy new features. Some of these releases did not significantly change the language semantics. In contrast, other releases present remarkable changes in language constructs. This is the case for Java 8, which introduces new features to facilitate functional programming and behavior parameterization. Using these features, developers can pass (anonymous) functions as arguments to other functions (Urma et al., 2014).
However, as languages evolve, programs' source code usually lags behind. When a language releases a new version, source code that was up to date suddenly becomes legacy code, and older constructs often persist in the system while developers add new ones (Overbey and Johnson, 2009). The coexistence of old and new constructs takes a toll on programmers, requiring them to be familiar with different idioms that implement a similar behavior. To mitigate this problem, Overbey and Johnson (2009) recommended using refactoring tools that help developers introduce new language constructs into legacy programs automatically.
For instance, Gyori et al. (2013) proposed a tool to rejuvenate Java programs that replaces legacy constructs, such as anonymous inner classes, with lambda expressions. The authors claim that the adoption of lambda expressions in Java improves program comprehension, though without presenting empirical evidence (Gyori et al., 2013). However, Dantas et al. (2018) report that this kind of transformation might not always improve the quality of the code, and that developers often reject patches applying it. Moreover, Mazinanian et al. (2017) found that developers often perform this kind of transformation without any tool support.
In previous work (Lucas et al., 2019), we investigated how the introduction of lambda expressions impacts source code comprehension. We found that state-of-the-art metrics to measure code readability fail to capture the benefits of introducing lambda expressions. Nonetheless, based on the findings of a survey with practitioners, we found that the introduction of lambda expressions improves program comprehension only in a few specific scenarios, such as using lambda expressions as a substitute for anonymous inner classes.
In this paper, we extend our previous work, mitigating two threats of that research: (a) the use of a small number of pairs of code snippets (each pair comprising the code before and after the introduction of a lambda expression) during the qualitative assessment; and (b) the use of real-world code snippets collected from open-source projects, whose versions after introducing lambda expressions could also have additional modifications (such as a bug fix). Therefore, we report the results of an extensive empirical investigation on the benefits of introducing lambda expressions in legacy code, considering 92 pairs of code snippets as suggested by automated tools. We review some aspects of our previous work and present new evidence about:
• Scenarios that benefit from introducing lambda expressions: We identified scenarios where the introduction of lambda expressions improves program comprehension. Tool developers might use this information to customize techniques that find opportunities to refactor legacy code to use lambda expressions.
• Lambda expressions make the code more succinct: Our findings provide evidence that the introduction of lambda expressions makes the code more succinct (in more than 80% of the scenarios, the total number of lines of code decreased after introducing lambda expressions), even though this does not necessarily lead to an improvement in code comprehension.
• Lambda expressions make debugging difficult: Our results suggest that the introduction of lambda expressions can lead to pieces of code that are harder to debug. We consider this a possible negative side effect of introducing lambda expressions.
• Relevance of tooling support for rejuvenating Java code: We also found that developers consider tooling support to be important for performing transformations that introduce lambda expressions in Java legacy code. Nonetheless, existing tools also recommend transformations that need manual improvements, lead to small benefits, or make the code harder to understand.

Background and Related Work
Program comprehension is a fundamental software attribute that facilitates maintenance and supports evolution (von Mayrhauser and Vans, 1995). Understanding existing software enables maintainers to successfully evolve functionality and/or integrate improvements for every type of change commonly associated with software maintenance and evolution, including adaptive, perfective, and corrective modifications (von Mayrhauser and Vans, 1995). Understanding software is challenging due to several factors, one of which is that large programs are often maintained by developers with different skills and using different practices (Storey et al., 2000). Moreover, in many cases, the source code may be the only available and up-to-date reference for a software system (Storey et al., 2000), though poor design and lack of good programming practices might compromise program comprehension (Tilley et al., 1996).
The practices developers use to understand software are diverse and are often task-related (e.g., documenting part of a system, fixing a bug, and implementing a new feature). Indeed, "programmers use domain knowledge, programming knowledge, and comprehension strategies when attempting to understand a program" (Tilley et al., 1996). Program comprehension uses existing knowledge to acquire new knowledge and build a mental model of the software that might help developers accomplish a specific task (von Mayrhauser and Vans, 1995). While the skills and experience of a developer are relevant when he/she wants to understand software, it has been reported that a set of recommended practices (such as the use of programming idioms and code formatting tools, design patterns, and refactoring) might also support program comprehension, in particular when using a bottom-up strategy as defined by Pennington (1987).
Conversely, the use of some obscure programming constructs (e.g., atoms of confusion) increases the rate of source code misunderstandings (Gopstein et al., 2017). For instance, the atoms of confusion "conditional operator" and "logical as control flow" involve fundamental language constructs such as math operators and if statements.
Although many software characteristics might impact program comprehension (e.g., variable names (Avidan and Feitelson, 2017) and atoms of confusion (Gopstein et al., 2017)), in this paper we are particularly interested in aspects related to source code quality that might either facilitate or hinder program understanding (Storey et al., 2000). Several research studies (Buse and Weimer, 2010; Posnett et al., 2011; Scalabrino et al., 2016) have explored the use of models for estimating the readability of source code, which directly affects program comprehension. Additionally, previous research has already investigated the impact of coding practices on software readability (Gopstein et al., 2017; dos Santos and Gerosa, 2018). Our work builds upon these previous efforts, using existing models for estimating software readability (Buse and Weimer, 2010; Posnett et al., 2011) and procedures to qualitatively assess the preference of developers when considering sets of code snippets (dos Santos and Gerosa, 2018). We apply these models in a different and particular scenario: the introduction of lambda expressions into Java legacy code.
Lambda expressions were introduced in Java 8 to support functional programming, lifting function definitions to values and thus allowing developers to pass a lambda function definition as an argument to a method (Alqaimi et al., 2019). Developers can also use lambda expressions in Java to abstract parallelism and remove the boilerplate code necessary to write anonymous inner classes (Alqaimi et al., 2019). Moreover, lambda expressions enable chaining functional recursive patterns (e.g., map and filter) using the stream API methods as an alternative way to iterate, filter, and collect data from a collection (Mazinanian et al., 2017). For instance, consider the code snippets in Figure 1 (filter 1 and filter 2), based on an implementation of the 101Companies problem domain (Favre et al., 2012). In this example, the goal is to filter a department's employees that have a salary greater than a given value. In the first snippet, the code uses an implementation without the language features of Java 8. In the second, the implementation uses a lambda expression as an argument to the filter method of the Java 8 stream API.
Previous research on Java lambda expressions focused on their introduction via automatic techniques for refactoring legacy code to "make the code more succinct and readable" (Gyori et al., 2013; Dantas et al., 2018), in particular in situations where one can, for instance, replace either an anonymous inner class or a loop over a collection by statements involving lambda expressions. Other approaches recommend transformations that introduce lambda expressions to remove duplicated code and to use the parallel features of Java 8 properly (Khatchadourian et al., 2019). Also, Mazinanian et al. (2017) present a comprehensive study on the adoption of Java lambda expressions to understand the motivations that lead Java developers to adopt the functional style of thinking in Java.
The authors published a large dataset with more than 100,000 real usage scenarios. We use this dataset to understand the program comprehension benefits of adopting Java lambda expressions.
At first glance, the use of lambda expressions, due to their conciseness, yields more succinct and readable code (Gyori et al., 2013; Dantas et al., 2018). However, this is not always the case, as Dantas et al. (2018) produced automated refactorings for iterating over collections that developers judged less comprehensible. We aim to investigate further which scenarios benefit from the introduction of lambda expressions. To the best of our knowledge, previous research did not investigate the assumption that the use of lambda expressions actually leads to benefits for program comprehension.
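To make the filtering scenario described in this section concrete, the following sketch contrasts a loop-based implementation with a lambda-based one. It is not the code of Figure 1: the Employee class, the method names, and the threshold parameter are hypothetical and chosen only for illustration. The anyAbove method additionally illustrates, under the same assumptions, the anyMatch operator of the stream API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical domain class, loosely inspired by the 101Companies domain;
// its name and fields are assumptions made for illustration only.
final class Employee {
    private final String name;
    private final double salary;

    Employee(String name, double salary) {
        this.name = name;
        this.salary = salary;
    }

    String getName() { return name; }
    double getSalary() { return salary; }
}

final class SalaryFilter {
    // Style without Java 8 features: explicit loop with an inner conditional.
    static List<Employee> filterWithLoop(List<Employee> employees, double threshold) {
        List<Employee> result = new ArrayList<>();
        for (Employee e : employees) {
            if (e.getSalary() > threshold) {
                result.add(e);
            }
        }
        return result;
    }

    // Java 8 style: a lambda expression passed to Stream.filter, followed by collect.
    static List<Employee> filterWithLambda(List<Employee> employees, double threshold) {
        return employees.stream()
                .filter(e -> e.getSalary() > threshold)
                .collect(Collectors.toList());
    }

    // A loop with an inner conditional that only answers yes/no maps to anyMatch.
    static boolean anyAbove(List<Employee> employees, double threshold) {
        return employees.stream().anyMatch(e -> e.getSalary() > threshold);
    }
}
```

Both filter methods return the same employees in the same order; they differ only in how the iteration and the condition are expressed.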

Study Settings
The general goal of this research is to investigate the benefits for code comprehension of refactoring a Java method to introduce a lambda expression, thus answering the research questions we present in Section 3.1. To this end, we conducted our research in two phases, both using a mixed-methods approach.
In the first phase, whose results we presented in previous work (Lucas et al., 2019), we carried out a quantitative assessment of 66 pairs of code snippets, using state-of-the-art models for measuring software comprehension (see Section 3.2). Each pair corresponds to a method body before and after introducing lambda expressions. We also conducted a qualitative investigation (survey) considering the opinion of 28 practitioners, who answered questions that also aim to compare the code before and after the introduction of lambda expressions in nine pairs of code snippets.
In the second phase, we mitigated some possible threats that we identified in the first study: the small number of code snippets used in the survey of the first phase, and the assessment of code snippets that might contain not only a manual program transformation but also an additional contribution to the program (e.g., a bug fix). As such, in the second phase we leveraged the existing support of program transformation tools to refactor legacy code of open-source systems to introduce lambda expressions. Considering the outcomes of these program transformation tools, we again conducted a quantitative assessment (using state-of-the-art models for measuring software comprehension) of a random sample of 92 pairs of code snippets and a survey with 182 practitioners, who evaluated at least five code snippets from this sample of 92 pairs.

Research Questions
We investigated the following research questions in our study.
(Q1) Does the use of lambda expressions improve program comprehension?
(Q2) Does the introduction of lambda expressions reduce source code complexity?
We conducted this research using an iterative approach, and after investigating a given question, new subquestions and hypotheses emerged. For instance, we investigated whether or not the reduction in the size of a code snippet, after introducing a lambda expression, has an influence on the perception of the participants about the quality of the transformation.

Metrics of the Quantitative Study
We measured the complexity of a code snippet using two metrics: number of source lines of code (SLOC) and cyclomatic complexity (CC). Both metrics have been used in a number of studies (Riaz et al., 2009; Baggen et al., 2012; Landman et al., 2016). In addition, we used two models to estimate and compare the readability of each pair of code snippets considered in our research. Readability is one of the aspects used for assessing program comprehension, and hereafter both terms (readability and program comprehension) are used interchangeably. The first model we used to estimate program comprehension is based on the work of Buse and Weimer (2010). It estimates the comprehensibility of a code snippet using a regression model that takes as input several characteristics, including the length of each line of code in a code snippet, the number of identifiers in a code snippet, and the length of the identifiers present in a code snippet (Buse and Weimer, 2010).
The second model was proposed by Posnett et al. (2011). It builds upon the Buse and Weimer model, though considering a smaller number of characteristics. Based on this model, we can estimate the readability of a code snippet using Eq. (1) and Eq. (2), with the constant C = 8.87.
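For reference, the Posnett et al. (2011) model is usually stated as a logistic function over a linear combination of lines, volume, and entropy. The sketch below follows that form. The symbols E(X) and Z(X) and the three weights are assumptions with respect to this text, which only states the constant C; they are taken from Posnett et al. (2011) and should be checked against that paper.

```latex
\begin{align}
E(X) &= \frac{1}{1 + e^{-Z(X)}} \\
Z(X) &= C + 0.40\,L(X) - 0.033\,V(X) - 1.5\,H(X), \qquad C = 8.87
\end{align}
```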
That is, in the Posnett et al. model, we calculated program comprehension using three main components: the number of lines of a code snippet (L(X)), the volume of a code snippet (V(X)), and the entropy (H(X)) of a code snippet. The volume of a code snippet X is given by V(X) = N(X) log2 n(X), where N(X) is the program length of the code snippet and n(X) is the program vocabulary. These measures are defined as follows:
• Program Length (N(X)) is given by N(X) = N1(X) + N2(X), where N1(X) is the number of operators and N2(X) is the number of operands of a code snippet.
• Program Vocabulary (n(X)) is computed using the formula n(X) = n1(X) + n2(X), where n1(X) is the number of unique operators and n2(X) is the number of unique operands of a code snippet.
The entropy of a document X (in our case a code snippet) is given by Eq. (3), where x_i is a token in X, count(x_i) is the number of occurrences of x_i in the document X, and p(x_i) is given by Eq. (4). The entropy (H(X)) in our context estimates the degree of disorder of the source code.
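As an illustration of how these components combine, the following minimal sketch computes the entropy, the volume, and a logistic readability score. It is not the tool used in this study: the class and method names are hypothetical, the token list and the operator/operand counts are assumed to come from an external tokenizer, the entropy is assumed to be the Shannon entropy over token frequencies (p(x_i) taken as the relative frequency of x_i), and the three weights are assumptions taken from Posnett et al. (2011), since only C = 8.87 is stated here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PosnettSketch {
    // Intercept stated in the text.
    static final double C = 8.87;
    // Assumed weights (from Posnett et al., 2011; not stated in this section).
    static final double W_LINES = 0.40;
    static final double W_VOLUME = -0.033;
    static final double W_ENTROPY = -1.5;

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // H(X) = -sum_i p(x_i) * log2 p(x_i), with p(x_i) = count(x_i) / total tokens.
    static double entropy(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        double total = tokens.size();
        double h = 0.0;
        for (int count : counts.values()) {
            double p = count / total;
            h -= p * log2(p);
        }
        return h;
    }

    // V(X) = N(X) * log2 n(X), with N = N1 + N2 and n = n1 + n2.
    static double volume(int operators, int operands, int uniqueOperators, int uniqueOperands) {
        int length = operators + operands;
        int vocabulary = uniqueOperators + uniqueOperands;
        if (vocabulary == 0) {
            return 0.0; // empty snippet: avoid log2(0)
        }
        return length * log2(vocabulary);
    }

    // Logistic score in (0, 1); higher values mean "more readable" under this model.
    static double readability(int lines, double volume, double entropy) {
        double z = C + W_LINES * lines + W_VOLUME * volume + W_ENTROPY * entropy;
        return 1.0 / (1.0 + Math.exp(-z));
    }
}
```

Applying readability to the snippet before and after a transformation and subtracting the two scores gives the per-pair difference that the quantitative assessment compares.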
We used an existing tool to estimate the comprehensibility of the code snippets using the Buse and Weimer (2010) model. We developed our own tool to automate the computation of the comprehensibility model by Posnett et al. (2011). We executed these computations for all pairs of code snippets that we collected either from real scenarios (first phase) or from the outcomes of the program transformation tools (second phase).

Code Snippets' Datasets
In the first phase of this research, we used an existing tool (MinerWebApp) and a dataset from previous work (Mazinanian et al., 2017) to identify code snippet candidates for our research. MinerWebApp monitors the adoption of Java lambda expressions in open-source projects hosted on GitHub, and has been used in previous research on the adoption of lambda expressions. The goal of MinerWebApp is to identify and classify the uses of lambda expressions in code snippets. MinerWebApp classifies the occurrences of lambda expressions into three categories:
• New method: When a new method containing lambda expressions is added to an existing class;
• New class: When a new class is added to the project, and this class contains methods with lambda expressions;
• Existing method: When a lambda expression is introduced into an existing method.
The decision to use an existing tool and dataset simplified our process of collecting real usage scenarios of lambda expressions. We randomly selected 59 code snippets from the MinerWebApp dataset, considering exclusively the code snippets of the third category (Existing method). We also collected 29 code snippets of refactoring scenarios that we generated using RJTL (Dantas et al., 2018) and submitted via pull requests to open-source projects. In total, we selected 88 code snippets from 22 projects, including code snippets from the Elastic Search, Spring Framework, and Eclipse Foundation projects. We manually reviewed these code snippets and removed 22 pairs that clearly do not correspond to a refactoring or that already had a lambda expression in the first version of the code. This cleanup led to a final dataset with 66 pairs of code snippets from 19 projects that we considered in the first phase of the research. In Table 1 we show the number of pairs of code snippets we collected from the GitHub repositories, coming either from MinerWebApp or from RJTL transformations.
All procedures to collect and characterize the code snippets from GitHub pages have been automated, using a crawler and additional scripts for computing source code metrics (Figure 2 shows an overview of the approach). The crawler expects as input a CSV file, where each line specifies the project, the URL of the commit, the start and end lines of the code snippet, and the type of the refactoring (e.g., anonymous inner class to lambda expression, foreach statements to a recursive pattern using lambda expressions, and so on).
In the second phase, we used three automated refactoring tools (the RJTL tool, NetBeans IDE, and IntelliJ IDE) to find opportunities and then introduced lambda expressions globally into the methods of five open-source systems (see Table 2). We chose these systems because they have been used to assess the performance of Lambdaficator (Gyori et al., 2013), which was later integrated into NetBeans to assist developers in migrating legacy systems towards Java 8. We were also able to build and execute the test cases of these systems, before and after applying the transformations. After executing the three tools on the five systems, we generated a dataset of 1987 transformations recommending refactorings to introduce lambda expressions (Table 2 shows the details).
We followed a set of steps in order to validate and create our second dataset of transformations. We first downloaded and built the last (stable) version of the systems, before executing the refactoring tools. After that, for each program transformation tool, we created a specific Git branch, executed the program transformation tool, and built the system again, looking for either a compilation or a test execution failure. We checked out the files that, after applying a transformation, introduced a failure, removing spurious transformations. Accordingly, we built a dataset with 1987 transformations. We then randomly selected 92 pairs of code snippets to explore in the second phase of our research.
We classified this final set of 92 transformations (Appendix A details the taxonomy) and computed the source code metrics and readability models. We stored the code snippets and the results of the metric calculations in a database. Table 3 summarizes this final set of 92 transformations.
Finally, we investigated the situations where at least two tools recommended a refactoring in the same code snippet. Considering the initial set of 1987 transformations, we found 357 cases (17.96%) of code snippets having recommendations from more than one tool. Nonetheless, the recommendations are not exactly the same. For instance, the code snippets of Figure 3 present transformations recommended for the same original code (Figure 3(a)), but suggested by NetBeans IDE, IntelliJ IDE, and RJTL. In this example, one can see that the IntelliJ IDE leverages the mechanism of type inference, while NetBeans IDE and RJTL do not. Moreover, there is a slight difference in the indentation of the resulting code from the NetBeans IDE and RJTL recommendations. We removed this kind of duplication from our dataset of 100 code snippets, leading to a final dataset of 92 pairs of code snippets that we used in the second phase of our research.
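The difference between explicitly typed and inferred lambda parameters can be illustrated with a small, hypothetical example. It is not the code of Figure 3: the Comparator scenario and all names are assumptions made for illustration. The first method shows a legacy anonymous inner class, the second a lambda expression with explicit parameter types (the style the text attributes to NetBeans IDE and RJTL), and the third a lambda expression that relies on type inference (the style attributed to the IntelliJ IDE).

```java
import java.util.Comparator;

final class LambdaStyles {
    // Legacy form: anonymous inner class implementing Comparator.
    static Comparator<String> legacy() {
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        };
    }

    // Lambda expression with explicitly declared parameter types.
    static Comparator<String> explicitTypes() {
        return (String a, String b) -> Integer.compare(a.length(), b.length());
    }

    // Lambda expression whose parameter types are inferred from the target type.
    static Comparator<String> inferredTypes() {
        return (a, b) -> Integer.compare(a.length(), b.length());
    }
}
```

All three comparators order strings by length in the same way, which is why such recommendations count as duplicates of one another even though their text differs.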

Procedures of the Qualitative Study
Regarding the qualitative study, we conducted the research using an approach based on previous work (dos Santos and Gerosa, 2018). That is, we designed an online survey that allowed the participants to evaluate pairs of code snippets. In the first phase, we only invited professional developers with some background in Java programming, from a convenience population of developers in our own professional network. Table 7 details the characteristics of the survey participants from the first phase of our research. The survey was organized in two sections. The first section aimed to characterize the experience of the participants, while the second one aimed to investigate the benefits (or drawbacks) of introducing lambda expressions into legacy code. This second section comprised the following (survey) questions.
• S1Q1: Do you agree that the adoption of lambda expressions on the right code snippet improves the readability of the left code snippet? This is a Likert scale question, with (1) meaning Strongly disagree and (5) meaning Strongly agree, which focuses on the readability aspect.
• S1Q2: Which code do you prefer? This is a yes or no question, which aims to understand if the new code improves general quality attributes. The same question has been explored in previous work (dos Santos and Gerosa, 2018).
• S1Q3: Would you like to include any additional comment to your answers? This is an open question that allowed the participants to optionally present further details about their answers.
We first conducted a pilot with five students, to evaluate whether our online survey tool would be able to properly capture the opinion of the developers. After conducting this pilot, we implemented several adjustments in the layout and in the functionalities of the tool, in order to increase our confidence in the tool for the next executions of the survey. The pilot also revealed that answering all pairs of code snippets was a time-consuming activity. For this reason, we split the pairs of code snippets into two groups, and then randomly assigned the participants to answer the survey questions considering code snippets either from the first or from the second group. The participants were asked to answer the survey's questions for a set of a minimum of three and a maximum of six pairs of code snippets, randomly selected from the first or second group of code snippets.
Considering the second phase of our study, we used the set of 92 randomly selected pairs of code snippets whose transformed code corresponds to a recommendation from RJTL, NetBeans IDE, or IntelliJ. In this phase, the participants answered the following questions. Respondents presented their opinion about these sentences using a Likert scale, with (1) meaning Strongly disagree and (5) meaning Strongly agree. The first three sentences are claims that motivate the adoption of lambda expressions in Java programs (Gyori et al., 2013). The fourth sentence came from our own experience in debugging pieces of code that use Java lambda expressions.
• S2Q2: How often would you perform this type of transformation? This is a Likert scale question, with (1) meaning Never and (5) meaning Always. The goal was to evaluate how often developers would perform a specific transformation to introduce lambda expressions.
• S2Q3: How important is the automated support for this kind of transformation? This is a Likert scale question, with (1) meaning Not important at all and (5) meaning Extremely important. The goal of this question was to evaluate how important the use of tools to support a specific transformation is.
• S2Q4: Would you perform this transformation? Why? This is an open question that allowed the participants to optionally present further details about their opinion.
In the second phase of our research, we used a set of social media tools to invite developers to answer the survey. That is, we sent a message to specific communities of Java developers, including communities on Facebook, Reddit, and Telegram, and mailing lists of Java developers (e.g., NetBeans Developers, JDK Developers). We presumed that these developers have good experience with Java programming. This phase had 182 participants located in 32 different countries (see Table 4). The developers needed 04:23 minutes (on average) to complete the questionnaire, where they evaluated a maximum of 5 transformations and answered a set of 7 questions regarding each pair of code snippets. In this phase, we generated a survey by randomly selecting five pairs of code snippets for each participant. Tables 5 and 6 summarize the number of participants considering the level of education and professional experience of the respondents, respectively.

Table 6. Characterization of the survey's participants in the second phase by developer experience.

Developer experience          Number of participants   Percentage (%)
Less than one year            14                       7.69
Between one and four years    52                       28.57
Between five and ten years    48                       26.37
More than ten years           68                       37.36

We cross-validated the results of the qualitative assessment with the results of the quantitative assessments, by correlating the results of the estimates for program comprehension from the two models discussed in the previous section with the results of the surveys. We also explored the results of the survey considering the measurements of SLOC and CC, for all pairs of code snippets in the survey.

Data Analysis
We used exploratory data analysis (EDA) to answer our first two research questions. EDA is a method that allows researchers to build a broad understanding of the data, using descriptive statistics (e.g., median and mean) and graphical methods (e.g., histograms and boxplots). We also leveraged hypothesis testing to further explore the first two research questions.
Regarding the remaining research questions, which we addressed using surveys as the main method for data collection, we also relied on EDA to consolidate the answers to the Likert scale questions (in terms of descriptive statistics and plots), while the answers to the survey's open-ended questions were quoted literally. Since we collected more substantial feedback for the open-ended questions in the second phase of the research (177 answers in total), we also consolidated the answers to the second phase's open-ended question using Thematic Analysis (Silva et al., 2016; Shrestha et al., 2020). We conducted our thematic analysis in four steps. In the first step, we carried out an initial reading of the answers to the fourth question of our survey (S2Q4), setting the scene before starting the coding stage. In the second step, we performed an initial coding for each answer. Next, in the third step, we analyzed the codes with the goal of finding themes (that is, groupings of related codes). Finally, in the fourth step, we reviewed and merged the themes, generating a new, more comprehensive list of topics. We included a small cross-validation phase, in which two authors gave feedback on the assignments. These two authors did not contribute to the initial assignment of codes and themes to the answers.

Results of the First Phase
In this section we present the results from the first phase of our research. Initially we discuss the outcomes of the quantitative assessment, which considers the models of Buse and Weimer (2010) and Posnett et al. (2011) (Section 4.1). After that, we present the results of the qualitative assessments and compare the findings of the two studies (Section 4.2).

Quantitative Assessment
We considered the 66 pairs of selected code snippets during the quantitative assessment. For each pair, we calculated the number of lines of code (SLOC), the cyclomatic complexity (CC), and the estimated comprehensibility using the Buse and Weimer and the Posnett et al. models. We addressed two main hypotheses in order to answer our research questions.

H1:
The introduction of lambda expressions improves program comprehension, according to the state-of-the-art readability models.
Conversely, our first null hypothesis (H1₀) states that the introduction of lambda expressions does not change program comprehension, according to state-of-the-art readability models. We used a signed-rank test (the Wilcoxon Signed-Rank Test; Wilcoxon, 1945) to investigate this hypothesis, considering the comprehensibility assessments obtained with the models of Buse and Weimer and Posnett et al. For each pair of code snippets, the introduction of lambda expressions might have increased, decreased, or left unchanged the comprehensibility, according to each model. As such, the Wilcoxon Signed-Rank Test tested the null hypothesis that the comprehensibility of the source code before and after the introduction of lambda expressions is identical (Wilcoxon, 1945). Table 8 summarizes the results, considering all pairs of code snippets.
Although the Posnett et al. method builds upon the model of Buse and Weimer, our analysis revealed a lack of agreement between the results from the two models. The outcomes of the test revealed that the introduction of lambda expressions actually decreases program comprehension (p-value < 0.0001) when considering the Buse and Weimer model. Nonetheless, when we considered the Posnett et al. model, we could not reject the null hypothesis, and this result suggested that the introduction of lambda expressions does not affect the comprehension of the code snippets (p-value = 0.668). Due to these conflicting results, we compared both

H2:
SLOC and CC can be used to predict the benefits (or drawbacks) on program comprehension, according to the readability models considered in this research.
We investigated this hypothesis using a regression model. First, we calculated the differences in the SLOC (∆s) and CC (∆cc) metrics, considering the code snippets before and after the introduction of lambda expressions. We then built two regression models, one considering as response variable the difference in the Buse and Weimer model (∆bw) and one considering as response variable the difference in the Posnett et al. model (∆p).
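To make the setup of the two regression models concrete, the block below sketches their general form. It assumes that Eq. (5) and Eq. (6) are ordinary multiple linear regressions with an intercept; the coefficient symbols (β, γ) and the error terms are illustrative names, not notation taken from the paper.

```latex
\begin{align}
\Delta_{bw} &= \beta_0 + \beta_1\,\Delta_{s} + \beta_2\,\Delta_{cc} + \varepsilon_{bw} \\
\Delta_{p}  &= \gamma_0 + \gamma_1\,\Delta_{s} + \gamma_2\,\Delta_{cc} + \varepsilon_{p}
\end{align}
```

Under this reading, stating that there is no relationship between a response variable and the predictors amounts to the slope coefficients being zero (β₁ = β₂ = 0 for the first model, γ₁ = γ₂ = 0 for the second).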
Accordingly, we unfolded H2 into two alternative hypotheses, one for each readability model. That is, the null hypotheses for H2 are as follows.
• H2.1₀: There is no relationship between ∆bw and the predictors ∆s and ∆cc.
• H2.2₀: There is no relationship between ∆p and the predictors ∆s and ∆cc.

Table 9 and Table 10 show the results of the regression analysis, considering the first and second models of Eq. (5) and Eq. (6). Considering a significance level of 0.05, we could not predict the benefits or drawbacks of introducing lambda expressions, according to the Buse and Weimer model to estimate readability, in terms of lines of code (p-value = 0.08) and cyclomatic complexity (p-value = 0.98). This result suggested that we should not reject the null hypothesis H2.1₀, and that there is a negligible relationship between the predictors (∆s and ∆cc) and the response variable ∆bw. Indeed, only 2% of the variability in ∆bw was explained by the linear regression of Eq. (5) (Adjusted R-squared: 0.02). Similarly, the variables ∆s and ∆cc did not explain the variability in ∆p (Adjusted R-squared: 0.05). Nonetheless, considering the second regression model (Eq. (6)), the result suggested that there is a relationship between SLOC and ∆p (p-value = 0.01), though the correlation is small (ρ = −0.188 using the Spearman correlation method).
In summary, the results of the regression analysis refuted our hypothesis H2: ∆s and ∆cc presented a negligible relationship with ∆bw and ∆p, and thus they could not adequately predict the variability in the response variables of Eq. (5) and Eq. (6).

Qualitative Assessment
Considering the qualitative assessment, 28 participants with substantial experience in Java programming each evaluated between three and six pairs of code snippets. For each pair of code snippets, these participants answered the survey questions S1Q1, S1Q2, and S1Q3. Recall that we split the code snippets into two groups, and thus each code snippet was evaluated by 14 participants. The data collection lasted 16 days and, on average, each participant spent 2 minutes and 30 seconds evaluating each pair of code snippets. We used two forms of data analysis in this assessment. First, we summarized the responses to S1Q1 and S1Q2 using tables and plots, which allowed us to build a broad view of the answers to the closed questions. In the second analysis, we considered the answers to the open questions literally (some of them are quoted here), to draw a broader understanding of the implications of refactoring Java legacy code to introduce lambda expressions.

Improvements on Readability
The goal of the first question of our survey (Do you agree that the adoption of lambda expressions on the right code snippet improves the readability of the left code snippet?) was to evaluate whether, according to the perception of Java developers, the introduction of lambda expressions improves the comprehension of the code snippets. We used a Likert scale to investigate this. Considering the answers to all pairs of code snippets, 11.1% and 39.7% of the responses either strongly agree or agree, respectively, that the introduction of lambda expressions improves the readability of the code, while 24.6% of the responses were neutral, 21.4% disagree, and 3.2% strongly disagree with the S1Q1 statement (see Table 11). Therefore, we found developers leaning towards a readability improvement after the introduction of lambda expressions.
To better understand this result, we analyzed the answers for each pair of code snippets (see Figure 4). Transformations 1035, 1052, and 1180 received more than 60% of positive answers (i.e., introducing lambda expressions improves the readability of these code snippets). Differently, the pair of code snippets 1182 in Figure 5 received 79% of answers that were either neutral or negative (i.e., the introduction of lambda expressions seems to reduce the readability of this code snippet). In this particular case, a for(obj: collection) {...} statement is replaced by a collection.forEach(obj -> {...}) call, which includes a lambda expression. Most of the participants did not agree that the introduction of a lambda expression improved the readability of the source code in this situation. One of the participants stated:

"(considering the code snippet 1182) I think that replacing a normal for each by a collection.forEach() would only bring benefits when there are additional calls either to the map or filter methods, or perhaps calls to some other list processing method."

Figure 6 shows the pair of code snippets 1180. In this example, an instance attribute (duplicate) was first initialized using an anonymous inner class (Figure 6(a)). This anonymous inner class was later replaced by a lambda expression (Figure 6(b)), and 64% of the participants either agree or strongly agree that this transformation improves the readability of the code snippet. Regarding this pair of code snippets, one of the participants stated that:

Considering all pairs of code snippets we used in the survey, only two pairs (1166 and 1182) showed a tendency towards either a neutral or a disagreeing opinion about whether the introduction of lambda expressions improves the readability of the code. More specifically, in these two cases, the percentage of agree and strongly agree answers was under 50%. Both are examples of transformations that replace a regular for each statement with a collection.forEach(...) call using a lambda expression.
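For readers unfamiliar with the pattern the participants judged least favorably, the following minimal sketch contrasts the two forms. It is not one of the surveyed snippets: the class name, method names, and data are hypothetical and only illustrate the shape of the transformation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not taken from the study's dataset.
class ForEachSketch {

    // Before: a regular for each statement.
    static List<String> upperWithForLoop(List<String> names) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            out.add(name.toUpperCase());
        }
        return out;
    }

    // After: collection.forEach(...) with a lambda expression.
    // The body is unchanged; only the iteration construct differs.
    static List<String> upperWithForEach(List<String> names) {
        List<String> out = new ArrayList<>();
        names.forEach(name -> {
            out.add(name.toUpperCase());
        });
        return out;
    }
}
```

Both methods return the same list for the same input, which is consistent with the participants' remark that this rewrite, on its own, changes little beyond the syntax.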

Source Code Preference
The goal of the second question of our survey (Which code do you prefer?) was to understand whether the practitioners had a preference for the code before or after the introduction of lambda expressions. Considering the nine pairs of code snippets of the survey (which we randomly selected from the initial population), only the pair of code snippets 1166 received more selections for the first version of the code (i.e., before the introduction of lambda expressions). Therefore, we found some evidence in this survey that the participants identify the introduction of lambda expressions as a transformation that improves the quality of the source code. Naturally, this preference depends on the experience of the developers, as one of the participants stated:

"It depends on the practical knowledge on functional programming, since programmers of the 1980s and 1990s are likely to consider easier to understand code where loops, control variables, and pointers are explicit."

We used the Spearman correlation test to verify whether the reduction in lines of code and the reduction in cyclomatic complexity could explain the preference of the participants for the pieces of code after the introduction of lambda expressions. We found a moderate to high correlation (0.67) between the reduction in the lines of code and the number of votes in favor of the code after the introduction of lambda expressions. Therefore, in the cases where a source code transformation to introduce lambda expressions reduced the number of lines of code, it might also have improved the general quality of the code, according to the perceptions of the participants. Differently, we found a weak correlation between the reduction in cyclomatic complexity and the number of choices in favor of (or against) the code snippets using lambda expressions. This result is understandable because the introduction of lambda expressions did not reduce the cyclomatic complexity in several cases.

Results of the Second Phase
In this section, we replicate the process executed in the first phase, but only considering transformations suggested by automated tools. Section 5.1 presents the results of the quantitative assessment, taking into account the models of Buse and Weimer (2010) and Posnett et al. (2011). After that, we present the results of the qualitative assessment and compare them to the results of the quantitative study (Section 5.2).

Quantitative Assessment
We considered the 92 pairs of code snippets randomly selected from the set of recommendations to introduce lambda expressions suggested by RJTL, NetBeans, and IntelliJ. For each pair, we estimated the code comprehension of the versions before and after applying the suggested transformations, using both the Buse and Weimer (2010) and Posnett et al. (2011) models. We also calculated the SLOC and CC metrics for both versions of the code snippets.
To investigate H1 (The introduction of lambda expressions improves program comprehension, according to state-of-the-art readability models), we executed the Wilcoxon Signed-Rank Test considering the two models to measure code comprehension. First, we evaluated the situations where a transformation increased, decreased, or left unchanged code comprehension according to the models. After that, we executed the Wilcoxon Signed-Rank Test. Table 12 summarizes the results, showing that, in most of the cases, the introduction of lambda expressions suggested by automated tools actually reduces code comprehension, according to both state-of-the-art readability models. The results of the Wilcoxon Signed-Rank Test suggested that the introduction of lambda expressions decreases the comprehensibility of the pairs of code snippets (p-value < 0.0001).

For instance, Figures 7 and 8 show pairs of snippets that have been evaluated using the readability metrics. The transformation of an anonymous inner class led to an improvement according to the Buse and Weimer (2010) metric: the readability of the code according to this model is 0.29 before the transformation and 0.50 after introducing a lambda expression. However, considering a transformation that replaces a for loop by a lambda expression, the metric's result worsened significantly, dropping from 0.72 to 0.13 after the source code transformation.

To investigate the H2 hypothesis (SLOC and CC can be used to predict the benefits (or drawbacks) on program comprehension, according to the readability models considered in this research), we calculated the differences in the SLOC (∆s) and CC (∆cc) metrics, considering the code snippets before and after the introduction of lambda expressions. Accordingly, we explored the null hypotheses H2.1₀ and H2.2₀ (Section 4). Tables 13 and 14 summarize the results of the regression analysis considering a significance level of 0.05.
After performing the regression analysis, both models led to a p-value > 0.05 with respect to the SLOC metric. However, differently from the results of the first phase, the analyses led to a p-value < 0.05 when considering the CC metric. Such results suggested that cyclomatic complexity can be used to estimate the impact on code comprehension after the introduction of lambda expressions. Therefore, the results confirmed our second hypothesis with respect to the cyclomatic complexity metric: it is possible to estimate the effect on the readability metrics using the difference in the CC metric. We further detail these results in Section 6.

Qualitative Assessment
In the qualitative assessment, we report the results of a second survey with practitioners, to capture the perception of the developers about the impact on the readability of the code after applying transformations that introduce lambda expressions. These transformations had been recommended by automated tools only. We present the distribution of responses in the form of plots to build a broad perspective of the opinion of the respondents on every closed question. We then show the insights we got after conducting a thematic analysis of the open-ended questions, highlighting the participants' opinions with quotations and code examples.

The Impact of Introducing Lambda Expressions
In our second survey, our first question asked the opinion of the respondents about four sentences, which we use to understand the impact of introducing lambda expressions in the pairs of code snippets. We organize this section according to the sentences of the first question of the second survey.
The new code is easier to comprehend. The purpose of this sentence was to evaluate whether the transformations to introduce lambda expressions (recommended by automated tools) improve program comprehension. Contrasting with the overall claims about the benefits of introducing lambda expressions (Gyori et al., 2013), we found that almost all types of transformations the automated tools suggest do not improve the readability of the programs. Interestingly, except for three types of transformations (Anonymous Inner Class to Lambda, For loop to Any Match, and For loop to Filter), the respondents most often did not agree that the introduction of lambda expressions makes the code easier to comprehend. Actually, according to Figure 9, 68% of the respondents stated that they did not agree that transformations involving the chaining of different stream operations improve program comprehension, and we observed the same trend for other typical recursive patterns (e.g., map, reduce, and for each).

It is worth linking these results to the answers to the open-ended question. That is, according to the participants, replacing an anonymous inner class by a lambda expression often improves program readability. Figure 10 shows an example of this particular type of transformation. After introducing the lambda expression, the code is more succinct because it removes some of the boilerplate code necessary to implement anonymous inner classes. Regarding the code snippet of Figure 10, one participant stated:

"(the code on the right is…) easier to read, usually lambda also makes the code cleaner and compact."

This comment suggests that this is a situation where the introduction of a lambda expression improves program comprehension.
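The boilerplate reduction mentioned above can be seen in a small sketch. This is not the code of Figure 10; the comparator and its names are hypothetical and chosen only because Comparator is a familiar functional interface.

```java
import java.util.Comparator;

// Hypothetical illustration of "Anonymous Inner Class to Lambda".
class AnonymousToLambdaSketch {

    // Before: anonymous inner class. The class instantiation, the
    // @Override annotation, and the method signature are boilerplate.
    static final Comparator<String> BY_LENGTH_ANONYMOUS = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(a.length(), b.length());
        }
    };

    // After: lambda expression. Only the parameters and the body remain;
    // the target type is inferred from the declared field type.
    static final Comparator<String> BY_LENGTH_LAMBDA =
            (a, b) -> Integer.compare(a.length(), b.length());
}
```

The two comparators order strings identically; the lambda version simply drops the lines that carry no information beyond what the declared type already states.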
Differently, transformations involving chaining of the stream API methods received 68% of responses as either

With respect to a transformation involving chaining, one of the respondents stated the following about the example of Figure 11:

"It's a bad example ... although I use lambdas a lot, I would never use them in exactly this way."

Considering the same example of code in Figure 11, another participant argued that:

"(I would) almost never (execute this transformation). Transforming for loops into forEach statements with lambda expressions provides little benefit other than using a maybe slightly more concise syntax. "Readability" in my mind is such a subjective criterion that it is close to useless as a metric for making any decisions: someone coming from a functional language will find a map/filter/reduce pipeline easier to "read", and someone coming from a structured programming language will naturally tend towards nested loops."

This is an example of a transformation that replaces for each statements by lambda expressions. According to the respondents, it does not improve program comprehension. Based on these results, we conclude that the transformations Replacing anonymous inner class with lambda expressions, Replacing a for loop with the filter pattern, and Replacing a for loop with the AnyMatch method improve code comprehension, while the transformations Replacing a for loop with a foreach statement, Replacing a for loop with the reduce pattern, Replacing a for loop with the map pattern, and Replacing a for loop with a Chaining of operators often do not improve program comprehension, according to the developers' opinion.

        .stream()
        .filter((resolver) -> (resolver instanceof BasicResolver))
        .forEachOrdered((resolver) -> {
            ((BasicResolver) resolver).setEventManager(eventManager);
        });
}

(b)
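As a hedged illustration of two of the transformations that respondents considered beneficial (a for loop with an inner conditional replaced by anyMatch, and a for loop replaced by filter combined with collect), the sketch below shows each pattern before and after. The class, method names, and data are hypothetical and do not come from the surveyed snippets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; not taken from the study's dataset.
class StreamPatternsSketch {

    // Before: loop with an inner conditional and an early return.
    static boolean hasNegativeLoop(List<Integer> values) {
        for (Integer v : values) {
            if (v < 0) {
                return true;
            }
        }
        return false;
    }

    // After: anyMatch states the intent (does any element satisfy
    // the condition?) directly.
    static boolean hasNegativeStream(List<Integer> values) {
        return values.stream().anyMatch(v -> v < 0);
    }

    // Before: loop that copies matching elements into a new list.
    static List<Integer> evensLoop(List<Integer> values) {
        List<Integer> result = new ArrayList<>();
        for (Integer v : values) {
            if (v % 2 == 0) {
                result.add(v);
            }
        }
        return result;
    }

    // After: filter selects the elements and collect builds the list.
    static List<Integer> evensStream(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 2 == 0)
                .collect(Collectors.toList());
    }
}
```

In both cases the stream version names the operation being performed (anyMatch, filter), which may be one reason the respondents found the intention of these transformations clearer than that of a plain forEach.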
The new code is more succinct and readable. The purpose of this sentence was to assess whether or not the introduction of lambda expressions makes the code more succinct and improves its readability. Figure 12 summarizes the developers' responses to this particular sentence. In this case, we found a more positive tendency: the transformations from anonymous inner classes into lambda expressions and the transformations resulting in the map, reduce, filter, and anyMatch patterns lean towards positive answers (Agree or Strongly agree). However, the assessment revealed that two types of transformations do not improve readability: transformations involving forEach and chaining of the stream API methods received more than 49% of negative responses (Strongly Disagree and Disagree).

The transformation in Figure 13 shows a scenario that replaces a for each statement by a call to the forEach method of the stream API. Although this is a straightforward situation where a developer might use a forEach, it does not improve the quality of the code, and most of the respondents considered that this particular scenario does not make the code more succinct and readable (more than 80% of the respondents are either neutral or do not agree that this transformation brings these benefits). Regarding this pair of code snippets, one of the respondents clearly stated this perception:

"(this) transformation does not improve readability and makes debugging more difficult."

Differently, Figure 14 shows an example of a transformation that makes the code more succinct and readable, according to the opinion of the respondents. In this case, more than 80% of the answers were either neutral or leaned towards agreement that the resulting code is more succinct and readable.
Altogether, from these observations, we argue that transformations replacing for loops by a forEach method call and the composition of stream operations (chaining) do not improve readability or make the code more succinct. On the other hand, the other types of transformations have shown benefits regarding code readability.

The intention of using a lambda expression in the new code is clear. The purpose of this question was to investigate whether or not developers are able to understand the motivation for using the lambda expressions introduced in the new code. Figure 15 summarizes the developers' responses to this question. Similarly to the previous sentence, we found a more negative leaning when we considered the transformations that replace a for loop by a call to the forEach method and transformations that introduce a chaining of stream operations. The remaining types of transformations seemed to make clear the intention of using either a lambda expression instead of an anonymous inner class or a recursive pattern (e.g., filter, anyMatch, map, or reduce) instead of a for loop.

Transformations introducing a call to the forEach method received 44% of negative (Strongly Disagree or Disagree) responses. This suggests a neutral opinion regarding the clear intention of introducing a lambda expression. Figure 16 shows an example of code that replaces a for loop by the forEach pattern, where 66% of the respondents considered the intention of the code unclear. In particular, a participant stated that:

"(I would never) perform this transformation. The for loop makes it clear and explicit that we are iterating over the elements in the collection; it is a fundamental part of the language that we all understand. The (use of) lambda expression does not."

Figure 17 shows an example of a transformation that makes the intention of the code clearer. This transformation replaces a for loop by a call to the anyMatch method, and 88% of the respondents assigned either a neutral or a positive answer (Agree or Strongly agree) with respect to the clear intention of using a lambda expression in this example. A respondent also claimed that:

"…The new code is more elegant and makes the intention of finding some occurrence where the condition is true clearer."

Altogether, from these observations, we argue that transformations replacing for loops by calls to the forEach method and the composition of stream operators (chaining) do not make clear the intention of introducing lambda expressions. On the other hand, the other types of transformations have shown benefits, making clear the intention of replacing anonymous inner classes with lambda expressions and of using other recursive patterns (filter, anyMatch, map, and reduce).

The new code is harder to debug. The goal of this sentence was to assess whether or not the introduction of lambda expressions makes the code more difficult to debug. The results in Figure 18 show that practically all types of transformations present the side effect of hindering the task of debugging, apart from the transformations that replace anonymous inner classes by lambda expressions.
Transformations involving calls to the filter method and chaining of the stream API methods received more than 70% of negative responses; that is, respondents either Agree or Strongly agree that the transformations make the code harder to debug. Differently, transformations that replace anonymous inner classes by lambda expressions received 53% of positive answers (respondents consider that this kind of transformation does not hinder debugging activities).

Figure 19 shows an example of a transformation that introduces a forEach statement. In this case, 88.33% of the respondents were either neutral or positive that this transformation does not hinder debugging tasks. Interestingly, one participant claimed that this transformation made the code harder to debug (because it obfuscates the types of variables), although he/she was still leaning towards considering the transformation beneficial:

"Obfuscating the types of the variables used makes the code easier to change, but at the same time may make it harder to debug. I would still perform the transformation though."

Figure 19. Pair of code snippet 510. Replacing loop to forEach pattern.

Figure 20 shows an example of a transformation that also makes the code harder to debug (more than 85% of the respondents either Agree or Strongly agree that this transformation hinders debugging tasks). However, in the opinion of a developer, an improvement in the transformation could actually make the resulting code easier to debug:

"Yes (I would perform this transformation), in a hurry, but with a minute more time I'd extract the filter into its own function. However, the suggested refactoring is in itself valuable because it does bring out the important part. If an automated tool did this to a whole codebase, it would make debugging easier, especially for junior developers."

In summary, from these observations, we argue that evolving legacy code to use the stream API and lambda expressions often makes the resulting code harder to debug. This undesired side effect does not happen in the case of transformations from anonymous inner classes into lambda expressions.

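The improvement suggested in the last quote (extracting the filter into its own function) can be sketched as follows. This is a hypothetical example rather than the code of Figure 20; the predicate and all names are assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of extracting a filter predicate.
class ExtractedFilterSketch {

    // Inline lambda: the condition lives inside the pipeline and runs
    // in a compiler-generated synthetic method.
    static List<String> nonBlankInline(List<String> lines) {
        return lines.stream()
                .filter(line -> line != null && !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    // Extracted predicate: a named method that can be tested on its own
    // and that offers an ordinary place to set a breakpoint.
    static boolean isNonBlank(String line) {
        return line != null && !line.trim().isEmpty();
    }

    // The pipeline now refers to the predicate by name.
    static List<String> nonBlankExtracted(List<String> lines) {
        return lines.stream()
                .filter(ExtractedFilterSketch::isNonBlank)
                .collect(Collectors.toList());
    }
}
```

Both variants produce the same result. Whether the extracted form is actually easier to debug depends on the tooling and the developer, so this should be read as one possible reading of the participant's suggestion.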
How often would you perform this type of transformation?
The purpose of this question was to assess how often developers would perform the set of 98 transformations we explore during the survey. Interestingly, despite the possible side effect of hindering debugging activities, respondents presented a positive tendency to accept 72% of the transformations in our dataset; respondents rejected 22% of the transformations and were neutral with respect to 6% of the transformations. Nonetheless, when we discarded the transformations involving anonymous inner classes, the number of transformations that the respondents would accept dropped from 72% to 44.44%, and the respondents would reject 50% of the transformations.

Figure 21 summarizes the responses to this question, which presents options related to frequency (from Never to Always). It is possible to observe that the respondents would not perform some of the transformations. For instance, the respondents would never or rarely replace a for loop by a call to the forEach method in 50% of the scenarios. We found a similar result when considering transformations that introduce the map recursive pattern. Differently, the respondents stated they would either Often or Always perform transformations replacing for loops by a call to the anyMatch method (61%) and inner classes by lambda expressions (60%). Table 15 presents a different perspective on the answers to this question, without splitting them by the type of the transformations.

How important is the automated support for this kind of transformation?
The purpose of this question was to assess the importance of using tools to perform transformations that introduce lambda expressions. Figure 22 summarizes the results for this question, where the options range from Not important at all to Very Important. We can observe in the figure that respondents considered the support of automated tools either Moderately Important or Very Important to apply the transformation in more than 50% of the cases. This might indicate that developers prefer to perform these transformations using some code refactoring tool. However, transformations introducing the forEach recursive pattern received most of the responses between Not important at all and Low important, which perhaps supports the view that this particular kind of transformation does not improve the source code. Finally, the transformation classified as Replacing a for loop with a Chaining of operators received most responses in Neutral (38%).

Based on these results, we can argue that developers consider it worthwhile to use refactoring tools to introduce lambda expressions and rejuvenate Java programs. However, there is some room for improving these tools, and we discuss possible scenarios in the next section.

Synthesis of the Responses to the Open-ended Question
In this section, we present a synthesis of the answers to the open-ended question of our second survey, using the thematic analysis procedures we detailed in Section 3. We found three recurrent themes that might explain the reasons for accepting a transformation: More Succinct Code, Easier to Understand, and Clear Code Intention. We also identified three recurrent themes that might justify why a given transformation was rejected.

Finally, several answers claim that the transformations could be improved (the Need Improvements theme, which appears in transformations marked either as accepted or rejected). Several answers provided an alternative to the modified version of the code (often using a textual description, but in a few cases the participants also shared a code example using a Gist⁴). Most recommendations to improve the resulting code (i.e., the code after applying a transformation) relate to the source code format, e.g.: "No need of curly braces and semicolon on the second statement" and "I would always perform this transformation, but I would use line breaks and filters to make the code more readable". Perhaps refactoring engines that introduce lambda expressions could benefit from advanced code formatting tools (e.g., the approach by Parr and Vinju (2016)). Other possible improvements are trickier, which might indicate the need to follow a careful code review process after applying code transformations (Carvalho et al., 2020). For instance, one of the participants argued that:

"[…] streams should produce collections as results, not populate them as side effects. If we fixed that, and broke to a new line before each transformation or filter, then I think it would be OK."

Other possible improvements stress the use of the type inference mechanism:

"I don't think you need to specify (File file), do you? You could just say "file" and let the type get inferred [, right]? Unless CollectionUtils.select is overloaded and takes multiple different functional types."
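The two suggestions quoted above (producing a collection as the result of the stream instead of populating one as a side effect, and letting the compiler infer the lambda parameter type) can be illustrated with a short sketch. The code is hypothetical: it does not reproduce the snippet the participants commented on, and the file-name filter is an assumption made only to have something to select.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; not the snippet discussed by the participants.
class CollectAndInferenceSketch {

    // Style criticized in the quotes: explicit (File file) parameter types
    // and a list populated as a side effect of forEach.
    static List<String> javaFileNamesSideEffect(List<File> files) {
        List<String> names = new ArrayList<>();
        files.stream()
                .filter((File file) -> file.getName().endsWith(".java"))
                .forEach((File file) -> names.add(file.getName()));
        return names;
    }

    // Suggested style: inferred parameter type, one operation per line,
    // and the list produced by collect as the result of the pipeline.
    static List<String> javaFileNamesCollected(List<File> files) {
        return files.stream()
                .filter(file -> file.getName().endsWith(".java"))
                .map(File::getName)
                .collect(Collectors.toList());
    }
}
```

Both methods return the same names for the same input. The second form keeps the pipeline free of external mutation, which is the property the first participant asked for.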
We found that the transformation engines of NetBeans IDE and RJTL do not explore the type inference mechanism in their refactoring recommendations. Participants also suggested that the introduction of lambda expressions brings small benefits and, as such, they would rarely change working legacy code just to introduce new language constructs or idioms:

"I would not rewrite legacy code to introduce a lambda expression in this way, unless the inner code itself would have to be rewritten."

Theme: More Succinct Code
Description: Transformations to introduce lambda expressions that make the code more succinct.
Examples: "yes, perfect case for lambda, short, clear"; "Yes, I would because nowadays languages have improved their syntax to provide a better and easy code to developers make their softwares, Java 8 introduced Lambda, where you can write less code and do more."; "I would sometimes make this change, but not always because it is only making the code more succinct"
Participants: P203, P285, P749

Theme: Easier to understand
Frequency: 15
Description: Transformations to introduce lambda expressions that make the code more comprehensible.
Examples: "Yes. The new code, besides looking cleaner, is also really easier to read and comprehend."; "Yes, code readability was a factor"; "Easier to read, usually lambda also makes the code cleaner and compact"
Participants: P803, P334, P337

Theme: Clear Code Intention
Frequency: 14
Description: Transformations to introduce lambda expressions that make the code clearer.
Examples: "Yes, since it looks more "straight forward", and it makes the code itself cleaner"; "I would do it because it's easier to write and the code gets cleaner."; "Yes, absolutely, clearer intent, more expressive, easier to read and comprehend."
Participants: P803, P635, P229

Theme: Harder to understand
Frequency: 5
Description: Transformations to introduce lambda expressions that make the code less comprehensible.
Examples: "This is still pretty hard to read and understand on account of a) the hard cast of the lambda to Callable<Object>, which seems weird - is this necessary? Isn't it at least a Callable<T>? b) Why a "checkThat" method is calling "checkSucceeds" which seems a little like jumping to a conclusion."; "Maybe not a complex return on one line"; "I would never perform this transformation. The for loop makes it clear and explicit that we are iterating over the elements in the collection - it is a fundamental part of the language that we all understand."
Participants: P229, P203, P583

Theme: Wrong scenario
Frequency: 5
Description: Transformations to introduce lambda expressions that should not be done.
Examples: "Since this is a void method it will, by definition, never be truly functional. Splitting the original code into a map - with a side effect, no less! - and a terminal operation with forEach construct does not really improve anything in my mind."; "I tend to avoid try-catch in lambda expressions. I don't think it's bad to do so, but I personally don't do it, even if it means using an anonymous inner class."
Participants: P694, P547

Table 16 and Table 17 summarize the frequency of the recurrent themes. As future work, our goal is to consider the answers to this open-ended question to improve the RJTL implementation. All code snippets and datasets we used in our research are available in the paper's companion website⁵.

Discussion
As explained in the previous section, we found conflicting results in our research. In the first phase, the models for estimating readability diverge from one another. That is, the Buse and Weimer (2010) model suggests that when a developer introduces a lambda expression into a legacy Java method, the readability of the method decreases. In contrast, the model of Posnett et al. (2011) suggests that the introduction of lambda expressions does not affect program comprehension in the first phase. In the second phase, however, both models suggest that the introduction of lambda expressions decreases program comprehension. The main difference between the two phases is that the second one only considers transformations suggested by automated tools. Perhaps manual transformations fix some problems related to readability.
Nonetheless, the results of the qualitative assessments with practitioners suggest that the introduction of lambda expressions improves program comprehension in particular cases. For instance, replacing anonymous inner classes with lambda expressions often improves readability, according to the results of our surveys. Other scenarios in which the introduction of lambda expressions might be positive are the replacement of for loops with simple recursive patterns such as filter and anyMatch. We believe that these conflicting results are partially due to the limitations of both models in identifying improvements in finer-grained transformations. Considering the results of both the quantitative and qualitative studies, we answer our research questions in Section 6.1 and present some lessons learned in Section 6.2. Finally, we present some threats to the validity of our study in Section 6.3.

4 Gist is a GitHub feature that allows developers to share code.
5 https://waltim.github.io/jserd.html

Answers to The Research Questions
When using a mixed-methods approach, the best scenario occurs when the results of the quantitative study support and explain the findings of the qualitative one (or vice versa). Table 18 combines the results of the quantitative and qualitative assessments for the transformations that replace anonymous inner classes with lambda expressions. It shows differences between the outcomes of both readability models and the developers' perceptions of code comprehension.
We favor the results of the qualitative study. Therefore, considering our first research question (Does the use of lambda expressions improve program comprehension?), our findings reveal that refactoring legacy code to introduce lambda expressions improves program comprehension in the specific scenarios we discussed earlier.
After obtaining these results, we investigated whether the code complexity metrics (SLOC and CC) could, independently, predict whether a transformation of legacy code to introduce lambda expressions improves the readability of the code. To perform this investigation, we calculated the differences in the SLOC (∆s) and CC (∆cc) metrics between the code snippets before and after the introduction of lambda expressions. We then ran Pearson's correlation test (Mukaka, 2012) to assess whether these differences correlate with possible improvements in program comprehension according to the survey respondents. We found that ∆cc has no relation to the developers' answers about comprehension. On the other hand, ∆s presents a moderate correlation (ρ = 0.5324 and p-value < 0.05). These results indicate that the greater the reduction in lines after the introduction of lambda expressions, the better the comprehension of the code according to the developers' opinion, regardless of whether the cyclomatic complexity is reduced. Therefore, tool developers could use SLOC to automatically learn good situations in which to suggest transformations that introduce lambda expressions.
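The correlation step described above can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the class name `PearsonSketch`, the method `pearson`, and the sample values in `main` are hypothetical, and the sketch computes only the correlation coefficient, not the p-value.

```java
// Minimal sketch: Pearson's correlation between the SLOC reduction of each
// snippet pair and a survey rating. All names and numbers are hypothetical.
final class PearsonSketch {

    // Returns Pearson's coefficient for two samples of equal length.
    // The result is NaN when either sample has zero variance.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("samples must have equal length >= 2");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical data: lines removed by each transformation (delta SLOC)
        // and the mean comprehension rating given by respondents.
        double[] deltaSloc = {0, 1, 2, 4, 6};
        double[] rating = {2.5, 3.0, 3.0, 4.0, 4.5};
        System.out.println("r = " + pearson(deltaSloc, rating));
    }
}
```

A coefficient close to 1 would indicate that larger SLOC reductions go together with higher ratings, which is the direction of the moderate correlation reported for ∆s.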
Regarding the second research question (Does the introduction of lambda expressions reduce source code complexity?), after assessing the impact of introducing lambda expressions in 158 pairs of code snippets (66 from the first phase and 92 from the second phase of this research), we found that introducing lambda expressions (a) reduces the size of the code (SLOC) in 70% of the cases and (b) reduces the cyclomatic complexity in 40% of the cases. Only in a few cases did the introduction of lambda expressions increase SLOC. We did not find any case in which a transformation increases cyclomatic complexity.
Considering our third research question (What are the most suitable situations to refactor code to introduce lambda expressions?), we found that replacing anonymous inner classes with lambda expressions might be considered the killer application for introducing lambda expressions in legacy Java code. In addition, replacing for loops that have an internal conditional with an anyMatch operator often improved the readability of the code and made the intention of using the lambda expression clearer. In contrast, just replacing a simple for statement over a collection with a collections.forEach() did not bring any benefit, according to the participants of our surveys. We also found that the chaining of stream methods and the introduction of recursive patterns (e.g., filter and map) hinder debugging activities, according to the developers.
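To make the anyMatch scenario concrete, the sketch below shows a loop with an inner conditional next to its stream counterpart. It is an illustrative example written for this discussion, not one of the snippet pairs evaluated in the study; the class and method names are hypothetical.

```java
import java.util.List;

// Illustrative sketch (hypothetical names): a for loop with an inner
// conditional and the equivalent anyMatch expression.
final class AnyMatchSketch {

    // Before: the loop returns as soon as one element satisfies the condition.
    static boolean hasEmptyNameLoop(List<String> names) {
        for (String name : names) {
            if (name.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // After: anyMatch states the same intention in a single expression
    // and also stops at the first matching element.
    static boolean hasEmptyNameStream(List<String> names) {
        return names.stream().anyMatch(String::isEmpty);
    }
}
```

Both versions have the same short-circuit behavior; the second one removes the explicit control flow, which is in line with the SLOC reduction that respondents tended to favor.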
Regarding our fourth research question (How do practitioners evaluate the effect of introducing lambda expressions into a legacy code?), developers agreed that the introduction of lambda expressions improves the quality of the code (in particular when removing the boilerplate code related to anonymous inner classes), though it might introduce some challenges to debugging activities in general. Developers would actually accept most of the RJTL, NetBeans, and IntelliJ transformations (72%), and they considered automated support for introducing lambda expressions, and thus rejuvenating legacy Java code, to be worthwhile.
Finally, with respect to our last research question (What is the practitioners' opinion about the recommendations from automated tools to introduce lambda expressions?), the results suggest that the use of automated tools to rejuvenate Java programs is promising. Again, considering only recommendations from the NetBeans IDE, RJTL, and the IntelliJ IDE, developers agreed that transformations replacing anonymous inner classes with lambda expressions improve program comprehension. Still, the feedback from the participants revealed several weaknesses of these tools, and thus we found room to improve these refactoring engines, as we discuss in the next section.

Lessons Learned
Need for reviewing comprehensibility models. The state-of-the-art models for estimating code readability could not capture the benefits of introducing lambda expressions that the participants of our survey reported. We believe that further investigation is necessary to understand whether these models fail to capture the benefits of fine-grained transformations similar to the introduction of lambda expressions, or whether they also fail when evaluating more general transformations, such as the more popular refactorings. In addition, both models are sensitive to code formatting decisions, including the number of blank characters. Similar conclusions have been reported recently by Fakhoury et al. (2019).
Recommendations for Refactoring Tools. We found that transforming anonymous inner classes into lambda expressions is the scenario that brings the most benefit for code comprehension. We also found that replacing for loops that have an internal conditional with the anyMatch and filter patterns improves code readability. Nonetheless, we do not recommend blindly applying automatic transformations from simple for loop statements into a collections.forEach() statement. This kind of transformation does not improve code readability. Several features might also help to identify the situations where introducing a lambda expression does not improve the code. For example, according to the participants, we should avoid combining the functional and imperative styles in the same method. Similarly, several transformations led to pieces of code with wrong indentation (e.g., comprising long lines or unnecessary curly braces). According to the practitioners, some recommendations decreased the readability of the code due to indentation issues.
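One possible reading of the advice against combining the functional and imperative styles is sketched below. The example is an assumption made for illustration, not a snippet taken from the surveys: the first method builds its result through a side effect inside a stream pipeline, while the second keeps the whole computation in the functional style. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch (hypothetical names) of mixed versus consistent style.
final class MixedStyleSketch {

    // Mixed: a stream pipeline that mutates an outer list from inside forEach.
    static List<String> upperCaseMixed(List<String> words) {
        List<String> result = new ArrayList<>();
        words.stream()
             .filter(w -> !w.isEmpty())
             .forEach(w -> result.add(w.toUpperCase()));
        return result;
    }

    // Consistent: the pipeline itself produces the result, with no side effect.
    static List<String> upperCaseFunctional(List<String> words) {
        return words.stream()
                    .filter(w -> !w.isEmpty())
                    .map(String::toUpperCase)
                    .collect(Collectors.toList());
    }
}
```

Both methods return the same list for sequential streams; the difference lies only in how clearly the data flow can be read.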

Threats to Validity
There are two main threats to our work. First, our results depend on the representativeness of the code snippets used in the investigation. Although we used a sample from real scenarios that introduce lambda expressions in legacy code, this sample might not be representative enough to support general conclusions from our quantitative assessment. We evaluated nine pairs of code snippets in the first survey. To mitigate this threat, we replicated the study and evaluated 92 pairs of code snippets. This number is similar to the number of code snippets evaluated in a previous study (Posnett et al., 2011).
The second threat is related to external validity. Initially, our research participants belonged to a relatively small group of professional developers from our own circle, although they had extensive experience in Java. During the replication of the study, we were able to significantly increase the number of participants, who came from different locations around the world. We believe that, with this variety of participants, our results became more reliable, allowing us to generalize our findings to this population.
Finally, we could have used other models to estimate readability that have been previously discussed in the literature (Scalabrino et al., 2016). However, we only found an implementation of one of these models, the one by Buse and Weimer (2010). We also implemented the computation for an additional model by Posnett et al. (2011), but it would be difficult to provide implementations for all models available in the literature.

Final Remarks
In this paper, we presented the results of a mixed-method investigation (i.e., using quantitative and qualitative methods) into the impact on code comprehension of adopting lambda expressions in legacy Java systems. We used two state-of-the-art models for estimating code comprehension (Buse and Weimer, 2010; Posnett et al., 2011), and found conflicting results. Both models (Posnett et al., 2011) and (Buse and Weimer, 2010) suggested that the introduction of lambda expressions does not improve the comprehensibility of the source code. In contrast, the results of the qualitative studies (surveys with practitioners) indicated that the introduction of lambda expressions in legacy code improves code comprehension in particular cases (particularly when replacing anonymous inner classes with lambda expressions). After considering these conflicting results, we argue that (a) this kind of source code transformation improves software readability for specific scenarios and (b) we need more advanced models to understand the benefits for program comprehension of applying finer-grained program transformations.

A Taxonomy of Lambda Expression Transformations
This appendix introduces a simple taxonomy used to classify the lambda expression transformations. For each member of the taxonomy, we present a brief description and an example.

Replacing anonymous inner classes with lambda expressions
A developer might use this transformation to convert an anonymous inner class into a lambda expression. Figure 23 shows an example of this transformation.
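As a hedged illustration of this category (it is not the example from the figure), the sketch below sorts a list with a Comparator written first as an anonymous inner class and then as a lambda expression. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch (hypothetical names): anonymous inner class versus lambda.
final class AnonymousClassSketch {

    // Before: the comparator is an anonymous inner class.
    static List<String> sortByLengthLegacy(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    // After: the same comparator as a lambda expression.
    static List<String> sortByLengthLambda(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return copy;
    }
}
```

The lambda version drops the class declaration, the @Override annotation, and the method signature, which is the boilerplate that survey participants mentioned.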

Replacing a for loop with the map pattern
A developer might use this transformation to convert a for loop into a map recursive pattern of the Stream API. Figure 24 shows an example of this transformation.
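The following sketch illustrates this category with a small example written for this appendix (it does not reproduce the figure). The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch (hypothetical names): for loop versus the map pattern.
final class MapSketch {

    // Before: the loop applies the same function to each element.
    static List<Integer> lengthsLoop(List<String> words) {
        List<Integer> lengths = new ArrayList<>();
        for (String word : words) {
            lengths.add(word.length());
        }
        return lengths;
    }

    // After: map expresses the element-wise transformation directly.
    static List<Integer> lengthsStream(List<String> words) {
        return words.stream()
                    .map(String::length)
                    .collect(Collectors.toList());
    }
}
```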

Replacing a for loop with the reduce pattern
A developer might use this transformation to convert a for loop into a reduce pattern of the Stream API. Figure 25 shows an example of this transformation. In this example, there is a composition between a map and a reduce, though the goal is to reduce a collection of test classes into the number of test methods.
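A comparable composition of map and reduce can be sketched as follows. This is not the code from the figure: as a simplifying assumption, each test class is modeled here as a plain list of test method names, and all identifiers are hypothetical.

```java
import java.util.List;

// Illustrative sketch (hypothetical names): for loop versus map plus reduce.
// Assumption: a "test class" is represented as a list of its test method names.
final class ReduceSketch {

    // Before: the loop accumulates the number of test methods.
    static int countTestMethodsLoop(List<List<String>> testClasses) {
        int total = 0;
        for (List<String> methods : testClasses) {
            total += methods.size();
        }
        return total;
    }

    // After: map each class to its method count, then reduce by summing.
    static int countTestMethodsStream(List<List<String>> testClasses) {
        return testClasses.stream()
                          .map(List::size)
                          .reduce(0, Integer::sum);
    }
}
```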

Replacing a for loop with a forEach statement
A developer might use this transformation to convert a for loop into a forEach statement. Figure 26 shows an example of this transformation. Respondents in our survey do not consider that this kind of transformation improves the quality of the code.
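For comparison, a minimal sketch of this category is shown below (again an illustration with hypothetical names, not the figure's code). The two methods are nearly identical in size and structure, which may help explain why respondents saw little benefit in this transformation.

```java
import java.util.List;

// Illustrative sketch (hypothetical names): for loop versus forEach.
final class ForEachSketch {

    // Before: an enhanced for loop over the collection.
    static String joinLoop(List<String> items) {
        StringBuilder out = new StringBuilder();
        for (String item : items) {
            out.append(item).append(';');
        }
        return out.toString();
    }

    // After: forEach with a lambda; the body is unchanged.
    static String joinForEach(List<String> items) {
        StringBuilder out = new StringBuilder();
        items.forEach(item -> out.append(item).append(';'));
        return out.toString();
    }
}
```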

Replacing a for loop with the filter pattern
A developer might use this transformation to convert a for loop into the filter recursive pattern of the Stream API. Figure 27 shows an example of this transformation. Respondents in our survey consider that this type of transformation improves the quality of the code.
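A minimal sketch of this category, combining filter with a collect method, is given below. It is an illustration with hypothetical names rather than the example from the figure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch (hypothetical names): for loop versus filter plus collect.
final class FilterSketch {

    // Before: the loop keeps only the elements that satisfy the condition.
    static List<Integer> evensLoop(List<Integer> numbers) {
        List<Integer> evens = new ArrayList<>();
        for (Integer n : numbers) {
            if (n % 2 == 0) {
                evens.add(n);
            }
        }
        return evens;
    }

    // After: filter states the selection criterion, collect builds the list.
    static List<Integer> evensStream(List<Integer> numbers) {
        return numbers.stream()
                      .filter(n -> n % 2 == 0)
                      .collect(Collectors.toList());
    }
}
```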