A Systematic Mapping of Accessibility Problems Encountered on Websites and Mobile Apps: A Comparison Between Automated Tests, Manual Inspections and User Evaluations

The use of websites and mobile applications has become essential for numerous daily activities. However, not everyone has full access to such services and content, since many websites and applications are inaccessible to people with disabilities, such as people with visual impairments. In this context, even though developers may put effort into creating more accessible content, there is limited information about the characteristics of different accessibility assessment methods applied to websites and mobile applications. The present study therefore performed a meta-analysis of 38 types of accessibility problems on websites and mobile applications, extracted from 38 studies in the literature selected from an initial search of 304 articles. The studies carried out automated assessments using tools, expert-based inspections, and user testing involving people with disabilities. The results confirm other considerations made in the literature, showing that automated evaluation methods have significant limitations in adequately covering accessibility problems: they covered less than 40% of the types of problems found on websites and less than 20% on mobile apps. A significant percentage of problems on both mobile and web platforms were encountered only by studies involving users. Expert inspection showed a higher coverage of the problems encountered by users, both on mobile apps and on websites, despite not covering all of them. The article concludes with a consolidation of literature data reinforcing that effective accessibility evaluations of web and mobile applications should include expert-based inspections and user tests involving people with disabilities.


Introduction
With the popularization of digital resources, companies from different sectors started to offer digital services such as e-commerce, communication, banking, geolocation, and social networks. Using web and mobile platforms, people have gained greater mobility to carry out their daily tasks. However, not everyone can use these features: many mobile and web applications lack accessibility features, preventing people with disabilities from taking full advantage of them. In this context, people with visual impairments have encountered several accessibility barriers. The World Wide Web Consortium (W3C) defines web accessibility as "the possibility and the condition of reach, perception, and understanding for the use, in equal opportunities, with security and autonomy, of sites and services available on the web" (W3C, 2016). We can also expand that concept to the context of mobile apps.
Usability and accessibility are fundamental requirements for any software to be of high quality, whether a website or an application. According to the ISO 9241-171 standard (ISO, 2018), accessibility is the "usability of a product, service, environment or facility by people with the widest range of capabilities". That standard also defines usability as "the extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use".
Providing accessible interaction is fundamental for users with disabilities. Accessible websites and mobile apps should provide usable interaction for all users, including those with different sensory and motor abilities and users of assistive technologies. Designers and developers need appropriate techniques to design and evaluate accessible technology. Appropriate techniques are essential to learn user needs and uncover accessibility problems during the development process. Considering the aim of technology managers and developers to reach as many users as possible, knowing the main advantages and disadvantages of different evaluation methods is essential to planning the development process.
The literature on Human-Computer Interaction presents different methods to evaluate accessibility to reveal problems for people with disabilities, including people with visual impairments. Practitioners can employ accessibility evaluations through tests with users, manual inspections by specialists, and automated tests. However, more research is needed to understand the trade-off of using different methods for assessing web and mobile accessibility, focusing on visually impaired users. There is still limited knowledge on the coverage of the methods on the different types of accessibility problems encountered by people with visual impairments in these systems (Stephanidis, 2009).
In our previous study (Silva et al., 2019), we conducted a systematic literature mapping covering 19 studies that performed evaluations of web accessibility. We analyzed 38 types of accessibility problems and compared the types of problems covered by automated tools, manual inspections, and user evaluations. However, that study did not cover the problems encountered in mobile applications. Therefore, in the context of web and mobile accessibility for visually impaired users, this study aimed to compare the types of problems found by different ways of assessing accessibility reported in the literature, exposing their benefits and disadvantages. Thus, we proposed the following main research question: When performing automated assessments, inspections by experts, and tests with users in web and mobile applications focusing on people with visual impairments, what are the problems identified?
We defined a set of specific questions to answer the main research question.

The remainder of this article is organized as follows. Section 2 presents web and mobile accessibility concepts, methods for finding accessibility problems, and related work. Section 3 describes the methodology used to conduct the study. Section 4 presents the results obtained with the analysis of the studies selected from the literature. Section 5 discusses the advantages and limitations of the evaluation methods to assess web and mobile accessibility. Section 6 presents the final remarks.

Background and related work
This section presents the main concepts of web and mobile accessibility, accessibility evaluation methods, and related work.

Web and mobile accessibility
Like everyone else, users with disabilities use web and mobile applications for a variety of activities. For that use to occur with autonomy, blind or low-vision people use screen reader software, an assistive technology that provides access to systems through non-visual means, mainly speech synthesis (Stephanidis, 2009), or use content magnification features. We can understand assistive technology as "hardware or software added or incorporated into a system that increases accessibility for an individual" (ISO, 2018). This technology brings more accessibility for people with different needs.
Although assistive technologies can provide more autonomy for people with disabilities, people with different types of disabilities (e.g., visual impairment) cannot have complete autonomy when accessing web pages or mobile applications with accessibility issues. In this context, web accessibility "deals with the possibility and the condition of reach, perception, and understanding for use, in equal opportunities, with security and autonomy, of websites and services available on the web" (W3C, 2019). Therefore, accessibility deals with digital environments that facilitate interaction, information access, and manipulation by people with disabilities.
People with visual impairments can use different types of assistive technologies to use web and mobile applications. On desktop computers and smartphones, people with little or no residual vision typically use screen reader software. That software synthesizes the content presented on the screen, provided that developers supply appropriate textual descriptions and semantic information. On desktop computers, blind people commonly use the keyboard to interact with the screen; on mobile devices, they use special gestures on touch screens as commands. People with low vision use different adaptations and specialized software to enlarge content, change colour schemes, and enhance the display.

Accessibility evaluation
We can find different evaluation methods in the literature to encounter accessibility problems in websites and mobile applications. Most methods involve evaluations with users with disabilities or inspections that review accessibility guidelines. These methods usually involve the following characteristics (Brajnik, 2008):
• To prescribe which steps, decisions, and criteria should be used and under which conditions accessibility problems are detected;
• To prescribe how to classify and rate problems (e.g., in terms of severity or priority);
• To prescribe how to aggregate, describe, and report data on accessibility;
• To prescribe how to select web pages or screens for evaluation.
The main approaches typically used to evaluate the accessibility of websites and mobile applications are automated evaluation, manual inspection by experts, and user evaluation.

Automated evaluation
Automated evaluation involves an evaluator using an automatic accessibility assessment tool to check the compliance of a web page or mobile application screen with accessibility recommendations coded into the tool (Brajnik et al., 2011). The resources available in automated tools can help professionals verify a subset of guidelines, such as the WCAG (Web Content Accessibility Guidelines), in a less time-consuming way (Ivory, 2013). Those tools can be useful for checking accessibility problems that would be tedious to verify manually, for example, the lack of features like alternative texts and headings, and colour contrast values predefined by sets of guidelines (Freire, 2012). Despite its benefits, automated evaluation is limited and cannot identify all web accessibility issues. For example, a tool can determine whether there is an alternative text for an image but cannot judge whether the text is appropriate to the context (Brajnik et al., 2011).
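The kind of check such tools perform can be illustrated with a short sketch. The following Python example (a minimal illustration written for this discussion, not code from any of the tools mentioned here) scans HTML for `img` elements lacking an `alt` attribute, a typical automatable WCAG check. As noted above, a check of this kind can only detect the *absence* of a text alternative; it cannot judge whether an existing description is meaningful.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Flags <img> tags that have no alt attribute at all."""
    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            # WCAG 1.1.1: images need a text alternative (alt="" is
            # acceptable only for purely decorative images).
            if "alt" not in attr_map:
                src = attr_map.get("src", "<unknown>")
                self.violations.append(f"img missing alt attribute: {src}")

# A toy page: one image with alt text, one without.
page = ('<html><body><img src="logo.png" alt="Company logo">'
        '<img src="chart.png"></body></html>')
checker = AltTextChecker()
checker.feed(page)
print(checker.violations)  # one violation, for chart.png
```

Running the sketch reports only `chart.png`; deciding whether "Company logo" adequately describes the first image still requires a human evaluator.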

Inspection by specialists
Along with automated evaluation, manual accessibility checking by experts plays a relevant role in the evaluation process of web applications (Brajnik, 2008; Freire, 2012). The most used inspection by specialists is the Conformance Assessment. In this method, the evaluator uses guidelines such as the WCAG (W3C, 2020), Section 508 (Jaeger, 2006), and e-MAG (Electronic Government Accessibility Model) (Governo Brasileiro, 2014) to assess whether a web page complies with the accessibility recommendations in those guides (Brajnik et al., 2011; Abou-Zahra, 2008). Inspection by specialists finds problems on mobile and web platforms that cannot be verified automatically. Despite not identifying all possible problems encountered by users, it is a way to reveal them in earlier development stages (Freire, 2012; Zaphiris, 2007).

Tests with users
In user tests, the goal is to involve users with disabilities in verifying the accessibility of web pages. Users are individually invited to browse web pages attempting to perform a task, and their behaviour is observed by the evaluators (Brajnik et al., 2011).
Evaluation with users with disabilities is critical because it provides evidence of the accessibility (or lack thereof) of a web page or mobile application in actual use by the target audience. However, recruiting users with different types of disabilities is not an easy task (Petrie and Bevan, 2009).

Related work
In the literature, we found studies that use accessibility assessment methods to understand the problems present on websites and mobile applications, their compliance with existing accessibility guidelines, and the relationship among problems found by different methods. Harrison and Petrie (2007) used assessment methods with users and experts to assign degrees of priority to accessibility and usability problems and compared them with the priorities proposed by the WCAG 1.0 guidelines for accessibility and by Health and Human Services for usability (Harrison and Petrie, 2007). The researchers selected six e-commerce and e-government websites, evaluated by six users with different disabilities and one specialist. They concluded that the severities attributed by the users and by the specialist were similar. However, both differed from the severity ratings in the guidelines, which is a problem, since a developer using the guidelines would prioritize what the guidelines rate as the most critical problems. The research concluded that experts perform better at foretelling the severity attributed by users than the priorities defined by the guidelines.
Vigo et al. (2013) aimed to understand the effectiveness of automated tools for analyzing web accessibility with respect to the WCAG 2.0 guidelines. The authors carried out an empirical evaluation of three websites using six different tools and compared the results with their own manual WCAG 2.0 inspection of the same websites. They concluded that relying only upon automated tools is not the best option, since the tools covered from 23% to 50% of the violations identified in the manual inspection. Besides, each tool flagged, on average, 4 to 10 success criteria, with the possibility of false positives.
Jaeger (2006) investigated how accessible US government websites were according to the Section 508 accessibility guidelines. The study pointed out the importance of involving users in the website testing process, stressed the importance of experts in ensuring accessibility during development and maintenance, and noted the role of automated tools in supporting the testing process (but not as the sole means of evaluation). It also pointed out the importance of improving feedback channels to understand the difficulties that people with disabilities can have with a site. For this purpose, ten government websites were evaluated through a set of methods: i) an analysis to understand whether meeting the Section 508 standards would result in an accessible website; ii) expert inspection to understand whether the websites met the Section 508 standards; iii) tests with users to provide a detailed picture from the perspective of users with visual and motor disabilities; iv) automated tests to assess whether they are capable of providing an overview of the problems; and v) a questionnaire with webmasters to understand how they decided to implement the Section 508 standards and what types of evaluation were carried out on the websites. In conclusion, the study provided guidance for government agencies to meet the Section 508 requirements.
To investigate accessibility barriers in mobile applications, da Silva et al. (2018a) found gaps in the WCAG 2.0 technical guidelines, making them insufficient to meet all the needs of users with disabilities. The work involved the Mercado Livre application, a mobile e-commerce system, where the interaction of five visually impaired people was observed. All users involved in the study used a screen reader and were asked to perform some tasks. All but one participant had residual vision, but it was not enough to enable them to use other resources to interact with mobile applications, such as content magnification. After completing the tasks, a questionnaire was applied so that participants could recall their thoughts during task execution, characterizing the retrospective verbalization protocol (da Silva et al., 2018a). As a result, the authors presented several difficulties reported by users in the interaction with the selected application, along with violations of the WCAG 2.0 guidelines. For example, the application had icons whose alternative text did not adequately describe the features, violating one of the WCAG 2.0 guidelines.
Another study examining mobile accessibility was performed by Carvalho et al. (Carvalho and Freire, 2017; Carvalho et al., 2018a). In those studies, the authors investigated the adequacy of interface components when developing mobile systems. The research involved analyzing three prototypes of mobile systems, focusing on accessibility problems for visually impaired people. They performed an accessibility assessment on a sample of thirty Android interface components. The selection of components was based on documents describing the standard components of HTML and the Android system. Based on the WCAG 2.0 success criteria, an expert appraiser audited all of the sampled interface components present in the three prototypes. The prototypes were implemented using three methods: the first included standard Android Studio components, generating a native application; the second was developed using HTML components, resulting in a system with web resources; and the third was a hybrid application.
As a result, web applications proved to be more accessible when using TalkBack, although more complex web components, such as audio and video, violated some WCAG 2.0 success criteria. The authors therefore recommended that developers build applications with web resources or hybrid applications in which the content is divided into several sections; the study results show these to be superior to the native application without web resources (Carvalho and Freire, 2017).
da Silva et al. (2016a) also conducted an empirical study focused on mobile systems, aiming to identify accessibility barriers in the WhatsApp application. The barriers were related to the WCAG 2.0 success criteria. Their study involved five blind users who were asked to perform some tasks and verbalise their interaction experiences afterwards. Users performed eleven tasks on WhatsApp, and their interaction with the application allowed researchers to observe existing accessibility barriers; for example, when performing the tasks, there was an absence of feedback and there were buttons without labels. Most of the tasks were completed by the users, but the researchers had to help in some situations. Accessibility barriers were grouped according to the WCAG 2.0 principles, and the researchers noticed guideline violations in almost all principles, except for the robust principle.

Ghidini et al. (2016) assessed the types of interaction facilities used in mobile systems by visually impaired people. The study involved the development of an electronic agenda prototype and tests with participants. Interviews were conducted with six visually impaired people to understand the most common means of interacting with smartphones and the facilities and barriers encountered in this interaction. Concerning the interaction with that application, the authors identified positive aspects and functionalities that could improve usability. With the results obtained from the interviewees, they developed a prototype to replace the native calendar. After creating the prototype, the researchers conducted tests with users, involving both the native calendar application and the prototype. The study comprised tasks in both applications and involved four visually impaired users who used screen readers during the testing. Afterwards, the participants were asked about their interaction with the two applications.
Regarding the native calendar, the authors state that the participants' general opinion was that it had poor usability. On the other hand, when asked about the interaction with the researchers' prototype, the participants considered it easier to use. Considering the results obtained, the researchers applied changes to the prototype, although they did not observe all the users again. After making these changes, the researchers performed another test with a user with low vision, again involving the native application and the prototype. In the native application, at various times, the researchers noticed that the participant had difficulty finding what they were looking for, such as buttons. Unlike the test with the native application, the prototype results were better.

Hanson and Richards (2013) investigated web accessibility indicators across a wide range of websites. Over fourteen years, the authors sought to observe improvements in accessibility and the possible reasons for changes related to web accessibility. The study involved one hundred and eight sites, analyzing whether they complied with the WCAG 2.0 recommendations using automated inspections. Before the launch of the WCAG 2.0 guidelines, the researchers observed the application and impact of the WCAG 1.0 recommendations. As a criterion for selecting the sites to be tested, the researchers included only sites from English-speaking countries with a web accessibility policy, which facilitated the understanding of the sites and the developers' intentions. According to the researchers' analyses, there was low adherence to the WCAG 2.0 guidelines in many cases. They attribute the low adherence partly to developers' lack of knowledge. Besides, the complexity of the guidelines makes them difficult to understand for developers, who are often not experts in web accessibility, resulting in necessary resources not being implemented.
However, the study also pointed out that the web has become more accessible in recent years, as there are fewer violations related to alternative means for images and greater use of headers, among other changes. The researchers concluded that improvements in web accessibility result from good coding practices, the desire to improve the design, and the search for better results in web searches.
Rømen and Svanaes (2012) performed empirical tests on desktop technology to verify the coverage level of the WCAG 1.0 and WCAG 2.0 standards. The study involved three visually impaired people, two users with dyslexia, two with motor disabilities, and six people without disabilities. Two Norwegian websites were evaluated using identical tasks, since the web content of the two platforms was similar. The results showed that, on average, people without disabilities identified fewer problems than users with disabilities. The study also showed that, with reference to the WCAG 1.0 standard, less than 42% of the identified barriers would be covered by the technical guidelines; concerning the WCAG 2.0 standard, less than 49% of the accessibility problems would be covered.
It is also important to highlight Power et al.'s work (Power et al., 2011, 2012), in which the authors investigated web accessibility problems involving desktop technology. Thirty-two visually impaired users performed tasks on sixteen websites, analyzed on the basis of the WCAG 2.0 success criteria. The results of this study showed that: i) the WCAG 2.0 technical guidelines covered only 50.4% of the problems identified by users; ii) many developers do not usually implement websites following technical guidelines; and iii) there is little evidence of a decrease in the number of accessibility problems when web systems are developed based on WCAG 2.0. The study also presented the severity of the problems, the average number of problems found on each evaluated site, and a categorization of accessibility barriers.
The analysis of related studies reported in this section showed that many studies in the literature have investigated accessibility problems encountered by different evaluation methods on websites and mobile applications. For web applications, more studies have delved into the coverage analysis of different methods; fewer such studies have focused on mobile accessibility. Finally, it is worth noting that most of the analyses were performed on single datasets derived from individual studies. Hence, the analysis in the present paper enables a deeper meta-analysis of the results of different evaluation methods from various studies in the literature, focused on both web and mobile platforms.

Methodology
This study carried out a comparative analysis of the different accessibility problems encountered in web and mobile applications reported in the literature, focused on people with visual disabilities. The results allow practitioners and researchers to learn more about accessibility evaluations on web and mobile platforms, since the study presents the benefits and limitations of each method. Developers and appraisers can thus understand how to combine the different techniques in their evaluation processes.
For this purpose, the study encompassed a systematic literature mapping covering the last seven years, looking for problems found by different evaluation methods: automated tests, expert inspections, and tests with users.

Search strategy
The following search string was designed to find studies in which some accessibility assessment was applied to web or mobile applications, using automated tests, user tests, and expert inspections:

TITLE-ABS-KEY ( ( accessibility OR accessible ) AND ( mobile OR android OR apps OR ios OR talkback OR "talk back" OR "voice over" OR voiceover OR web OR website OR "web site" ) AND ( "visual impairment" OR blind OR blindness OR "visual disability" OR "low vision" OR "partially sighted" ) AND ( evaluation OR assessment OR testing OR test OR inspection OR audit ) AND ( specialist OR expert OR appraiser OR estimator OR evaluator OR assayer OR manual OR automatic OR automated OR tool OR tools OR user OR users ) )

This string was used in the Scopus scientific article repository, which contains the most relevant Computer Science and Human-Computer Interaction publications. 267 studies were found with this string; the search was performed from November 14, 2020 to June 13, 2021.

Inclusion and Exclusion Criteria
For including the studies in the systematic literature mapping, the following inclusion criteria were defined:
• Studies should report assessing the accessibility of websites or mobile applications using automated tests, inspections by experts, or user tests;
• Studies should focus on or address evaluations targeted at visually impaired users;
• Studies must explicitly report the types of accessibility problems encountered;
• The studies' full text must be available through the Brazilian Capes Portal;
• Studies should report in detail the methods used and the procedures for evaluation;
• Studies must be published up to February 2021.
In addition, the following exclusion criteria were defined:
• Short papers with a non-detailed presentation of the methods used;
• Studies that only report the number of problems encountered, without qualifying the types of problems;
• Articles not written in English or Portuguese.

Study Selection
Below are the main steps of the systematic mapping. The first step was the execution of the search string in the Scopus database, which returned 267 potential studies. The second stage consisted of reading the titles, excluding only studies whose title descriptions were clearly out of scope, moving 190 studies to the next stage. The third step was to read all the abstracts; in this step, we excluded articles that did not meet the inclusion criteria, for example regarding the evaluation methods used or the barriers encountered. In the fourth stage, the remaining 108 studies were read in full, following the inclusion criteria, and data were extracted from each article read. Seventy studies were discarded because they did not present accessibility problems identified by some evaluation method. Two studies were also discarded after repeated attempts to obtain their full text failed due to problems with the publisher's server.
Thus, the final selection resulted in 38 studies. Some of the selected studies assessed web or mobile accessibility using more than one method. Table 1 presents all the studies selected in the complete reading phase.
The entire process of analyzing the studies and extracting the data was performed manually; no tools were used.
• Step 1: The selected database was searched using the previously defined search string, returning 267 potential studies. After reading all the titles and applying the inclusion and exclusion criteria, 190 studies with potential for the systematic mapping were selected for the next step.
• Step 2: The abstracts of the 190 studies identified in Step 1 were read, again applying the inclusion and exclusion criteria; 108 studies were accepted for the next stage.
• Step 3: Of the 108 studies consolidated in Step 2, 38 presented information relevant to the topic. From these studies, data were extracted to answer the research questions and gathered in a spreadsheet for analysis.

Data extraction
After selecting the studies, the extracted data were analyzed and consolidated. The following data were extracted from each study:
• Instances of accessibility problems encountered;
• Type of method used to find each problem;
• Automated tool used.
From the extracted data, we carried out an analysis to consolidate the types of methods used and the types of problems encountered. Each type of problem was analyzed and assigned a unique category, making it possible to compare the types of problems encountered by the different methods.
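The consolidation described above amounts to a set-based coverage analysis. The following Python sketch uses invented example records (the study IDs, methods, and problem types are illustrative placeholders, not the data actually extracted in this study) to show how, once each problem instance is tagged with a platform, method, and category, the problems found only by one method can be identified and per-method coverage computed:

```python
# Hypothetical extracted records: (study, platform, method, problem type).
# All values below are illustrative, not the data extracted in this mapping.
records = [
    ("S1", "web", "automated", "missing alternative text"),
    ("S1", "web", "user", "confusing navigation order"),
    ("S2", "web", "expert", "missing alternative text"),
    ("S2", "web", "user", "unlabelled button"),
    ("S3", "mobile", "expert", "insufficient contrast"),
]

def coverage_by_method(records, platform):
    """Map each method to the set of problem types it found on a platform."""
    by_method = {}
    for _, plat, method, problem in records:
        if plat == platform:
            by_method.setdefault(method, set()).add(problem)
    return by_method

web = coverage_by_method(records, "web")
all_problems = set().union(*web.values())

# Problem types on the web found only by user tests:
user_only = web["user"] - web.get("automated", set()) - web.get("expert", set())
print(sorted(user_only))  # ['confusing navigation order', 'unlabelled button']

# Share of web problem types covered by the automated tools (percentage):
print(round(100 * len(web["automated"]) / len(all_problems)))  # 33
```

In this toy dataset, the automated tools cover one of three web problem types, and two types surface only in user tests, mirroring (in miniature) the kind of coverage comparison performed in this study.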

Results
This section presents the results obtained in the mapping study. We present the accessibility evaluation methods examined and the accessibility problems identified. Data identified by the three main methods are shown separately. We also summarize the methodological approaches used and their characteristics; for example, we discuss the tools involved and the participants' profiles in user studies. Besides, the accessibility problems are made explicit, relating them to the studies that identified them. Some accessibility problems were identified by more than one study, while others, even within a single evaluation method, were identified by one study only; some accessibility issues were found on both the web and mobile platforms.

Automated Evaluation
This section presents the results obtained with the analysis of nine studies that carried out automated accessibility evaluations.

Automated Web Evaluation
Altogether, seven studies involved automated testing on the web, with a set of twelve tools. Table 2 presents the tools used, the number of tools, and the number of evaluated sites. The WAVE tool was used in three studies. Three studies performed the tests with more than one tool, although one of them did not state which tools were used.
The problems found by these studies totalled twenty types of barriers, with violations related to the absence of labels, lack of headers, absence of alternative text, empty links, and duplicate information. Table 10 (in the Appendix) shows all the accessibility problems encountered and the number of studies that found them.

Automated Evaluation of Mobile Apps
Two studies involved automated tests on the mobile platform, with a set of three tools. Table 3 presents the tools used, the number of tools, and the number of apps evaluated; only one study used more than one tool. All tests were performed on the Android system.
The problems encountered by these studies comprised eleven types of barriers, with violations related to inappropriate descriptions in controls, target size, insufficient contrast, spacing, and inappropriate titles, among others. Table 11 (in the Appendix) shows all the accessibility problems found and the number of studies that found them.

Common Problems
Some violations were the same for both platforms: absence of labels, inadequate description of controls, duplicate information, insufficient contrast, incompatibility with technologies, inappropriate navigation sequence, visible focus, and spacing. A total of nine types of problems were found in both mobile apps and web apps.
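Insufficient contrast, one of the problems common to both platforms, is also one of the few barriers with a fully mechanical definition, which is why automated tools detect it reliably. WCAG 2.0 defines the contrast ratio between two colours as (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colour. The sketch below (an illustration of the published formula, not code from any tool in the mapped studies) computes it in Python:

```python
def srgb_channel(c8):
    # Linearize an 8-bit sRGB channel per the WCAG relative-luminance formula.
    c = c8 / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    r, g, b = (srgb_channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    # WCAG 2.0: (L_lighter + 0.05) / (L_darker + 0.05); ranges from 1:1 to 21:1.
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# Black on white: the maximum contrast of 21:1.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0
# Mid grey (#777777) on white falls just below the WCAG AA threshold
# of 4.5:1 for normal-size text.
print(round(contrast_ratio((119, 119, 119), (255, 255, 255)), 2))  # 4.48
```

Because this computation is deterministic, a tool can flag contrast failures without human judgement, unlike, say, deciding whether a control's description is adequate.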

Inspection by Experts
This section presents the results from the analysis of thirteen studies in which accessibility evaluations were carried out through expert inspections.

Inspection by Experts on Web
Of the ten studies that performed inspections by experts, six employed three or more experts. Only three studies provided the number and names of the tools used to aid in the inspections. Table 4 shows characteristics related to the methodology used by these studies, including the number of sites, the experts' profiles, the number of experts, and the number of tools involved.
The studies involving expert inspections yielded thirty-five barriers. Table 12 (in the Appendix) lists the types of accessibility problems (or barriers) identified in the studies, such as inappropriate description in controls, visible focus, too much information, useless elements, insufficient contrast, content inaccessible to keyboard interaction, sensory characteristics, and others.

Inspection by Experts on Mobile Applications
Of the two studies that performed expert inspections on mobile applications, both used three or more evaluators. The two studies employed the TalkBack screen reader. Table 5 shows characteristics related to the methodology used by these studies, including the number of applications, the profile of the specialists, the number of specialists, and the number of tools involved. The studies involving expert inspections in mobile applications yielded sixteen barriers. Table 13 (in the Appendix) lists the types of accessibility issues identified in the studies, such as visible focus, keyboard inaccessibility, insufficient contrast, images, error identification, and others.

Common Problems
Altogether, twelve common problems were found on both the mobile and web platforms: absence of shortcuts, absence of headers, absence of resources for expansion, lack of labels, absence of alternative text, absence of titles, insufficient contrast, visible focus, error identification, language not set, keyboard inaccessibility, and inadequate navigation sequence. Even though there are fewer studies on mobile apps, it is worth noting that many of the same problem types were encountered as on the web platform.

User tests
This section presents the results of the twenty-six studies that conducted tests with users on the mobile and web platforms.

User Tests on Web Sites
A total of seventeen studies carried out evaluations with blind, low-vision, and normal-vision users. Ten of these studies used ten or more users, seven used more than one tool, and five did not provide the number and identification of the tools used. Table 6 presents the characteristics of the methodological approach used in the studies on websites, listing the types of sites evaluated, the number of participants and their profiles, and the assistive technologies used.
Users encountered issues such as keyboard inaccessibility, inappropriate title, inappropriate textual content, inappropriate alternative text, absence of shortcuts, inappropriate feedback, text resizing, images, spacing, and others. Table 14 (in the Appendix) presents the accessibility problems identified.

User Tests on Mobile Platforms
Eight studies performed evaluations with blind, low-vision, and normal-vision users; two of these studies recruited ten or more users, and only two carried out tests on both Android and iOS. Table 7 presents the characteristics of the methodological approach used in the studies involving mobile platforms, showing the types of systems evaluated, the number of participants and their profiles, and the assistive technologies used.
Users encountered problems such as the absence of resources for enlarging content and unreachable help links, among others. Table 15 (in the Appendix) presents the accessibility problems identified.

Common Problems
Seventeen barriers were found on both the mobile and web platforms: lack of labels, inappropriate link destination, too much information, absence of alternative text, empty links, insufficient contrast, incompatibility of technologies, absence of titles, inappropriate description in controls, absence of resources for expansion, unreachable help link, absence of feedback, inappropriate textual content, inappropriate title, inconsistent content organization, images, resize text and spacing, and pause-stop-hide. This overlap makes clear that both mobile and website developers still need to adopt better accessibility practices.

Problems encountered by different methods
In the analysis of the selected studies, several accessibility problems were collected, resulting from the use of different types of accessibility evaluations. In some cases an accessibility problem was identified by a single method; in others, two or three methods identified it. These unique problems do not characterize problem instances as in the studies of Carvalho et al. (2018b) or Power et al. (2012), as such information was not available in all studies. Thus, considering the accessibility problems identified, there are thirty-eight types of unique problems. For a better understanding of these results, Table 8 and Table 9 show the number of problems identified by the three methods, the number of unique problems encountered by only one method, the number of problems identified by two methods, and the number of problems identified by each method. Figures 1 and 2 show, in Venn diagrams, the types of problems found on the web and mobile platforms. Across the studies, 311 problem instances were found, of which 200 occurrences are on the web platform and 111 on the mobile platform.

Discussion
The results obtained show characteristics of each type of accessibility evaluation method for the web and mobile platforms. The benefits and limitations of inspections and tests are presented, providing more knowledge on visually impaired people's use of applications.

Problems identified by different methods
Question RQ1 was defined as "Among the problems identified in accessibility evaluations, what are the problems found by any combinations of methods?". To answer this question, it is necessary to observe the results presented in Table 8 for websites and Table 9 for mobile. In evaluations of websites, eleven accessibility problems were identified by all of automated assessments, expert inspections, and user assessments. We were surprised to find that the most commonly encountered problems in Tables 10, 12 and 14 still included common issues reported since early studies on accessibility, such as the absence of (i) alternative text, (ii) labels and contrast, and (iii) headings. Accessibility problems identified by the three methods are relevant in terms of operationalizing accessibility evaluations. Many of these problems may prevent users from completing their tasks and are serious issues. It is positive that even automated evaluations can identify those problems, meaning that they can be caught early in the development process.
These problems can be easily solved in many cases. For example, the lack of alternative text may be fixed by adding an alternative text attribute to the content (alt="description of information"). The violation of such simple principles shows that deeper issues need to be investigated to bring accessibility into the development process of both web and mobile applications.
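Checks of this kind are exactly what automated tools excel at. As an illustration, a minimal sketch of such a check in Python, using only the standard library (the checker class and the sample markup are our own illustration, not code from any surveyed tool):

```python
from html.parser import HTMLParser

class MissingAltChecker(HTMLParser):
    """Collects <img> elements that have no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # alt="" (present but empty) marks decorative images and is acceptable;
        # only a completely missing attribute is flagged as a violation.
        if tag == "img" and "alt" not in attributes:
            self.violations.append(attributes.get("src", "<no src>"))

checker = MissingAltChecker()
checker.feed('<img src="logo.png">'          # violation: no alt at all
             '<img src="deco.png" alt="">'   # decorative image: acceptable
             '<img src="chart.png" alt="Monthly sales chart">')
print(checker.violations)  # ['logo.png']
```

Note that the sketch deliberately accepts alt="", since an empty but present attribute is the standard way to mark purely decorative images; this is also why automated tools can check for a missing attribute but cannot judge whether a present description is actually adequate.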
The results show that the use of automated tools and manual inspections can optimize the performance of assessments by users, enabling such problems to be found and addressed even before testing with users.
Answering RQ1 specifically in the case of mobile apps, the results presented in Table 9 provide insight into the types of problems encountered. Four accessibility problems were identified by automated assessments, expert inspections and user evaluations.
The problems presented in Tables 11, 13 and 15 show that the most common violations were insufficient contrast, inadequate navigation sequence, and visible focus. It is important to note that even with few studies using automated tools in the mobile context, the tools have identified relevant problems, considering they have had a shorter evolution time than automated web accessibility evaluation tools.
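Insufficient contrast stands out among these violations because, unlike most barriers, WCAG defines it numerically, which is what allows automated tools to detect it reliably. A minimal sketch of the WCAG 2.x relative luminance and contrast-ratio computation (the formulas are from the standard; the function names are ours):

```python
def srgb_to_linear(channel: float) -> float:
    # sRGB channel value in [0, 1] to linear light, per the WCAG 2.x
    # definition of relative luminance
    return channel / 12.92 if channel <= 0.03928 else ((channel + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple) -> float:
    r, g, b = (srgb_to_linear(v / 255) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(color_a: tuple, color_b: tuple) -> float:
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# Black text on a white background gives the maximum possible ratio, 21:1;
# WCAG 2.1 success criterion 1.4.3 (level AA) requires at least 4.5:1
# for normal-size text.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0
```

For instance, mid-gray text (#767676) on a white background yields roughly 4.54:1, just above the 4.5:1 threshold of success criterion 1.4.3, whereas the slightly lighter #777777 fails it.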
We can see that the number of problems encountered by all methods on the mobile platform is lower than on the web platform. Perhaps this difference is due to the disparity in the number of studies using automated tools on the web and on mobile devices; there are still few studies on the coverage of automated tools for mobile.

Problems identified by two methods
Research question 2 (RQ2) was stated as "What are the benefits and limitations of each method for evaluating accessibility on the web?". According to the results presented in Table 8, (i) five problems were found by both automated evaluations and tests with users, and (ii) fourteen problems were identified by both expert inspections and user tests. Therefore, the discussion of this research question focuses on the problems found by experts and users.
The problems encountered by users and experts were: absence of contrast feature, too much information, inaccessible help link, absence of feedback, unexpected changes, inappropriate alternative text, inconsistent content organization, inappropriate feedback, keyboard, location, error identification, color usage, pause-stop-hide, and description of audio or alternative media (pre-recorded), along with structural issues in the analysis of interactive elements.
Although the accessibility problems identified by all methods are relevant, it is important to highlight that the problems encountered by expert inspections and user tests have particular relevance. Many such problems are related to the inadequacy of interface components rather than to the absence of specific accessibility features, as is normally the case for problems encountered by automated tools. In this sense, an inadequacy in one element can do more damage to usability than the mere absence of a feature. For example, inappropriate alternative text prevents access to non-textual content but does not cause errors in automated evaluations; unexpected changes can cause further damage to the interaction if the user does not know the reason for the change; and the absence of feedback can lead the user to perform the same activity over and over again.
Research question 3 (RQ3) was stated as "What are the benefits and limitations of each method for evaluating accessibility on mobile platforms?". According to the results presented in Table 9, (i) four problems were found by automated inspections and tests with users, and (ii) seven problems were identified by expert inspections and user tests.
The problems encountered by users and experts were: too much information, absence of resources for expansion, location, keyboard, images, inadequate navigation sequence, and absence of titles.
Thus, the problems found only by expert inspections and user tests corroborate other results in the literature (Vigo et al., 2013) that highlighted the drawbacks of using only automated assessments and the relevance of the problems found only with the involvement of users and experts.

Mobile and Web platforms
When we analyze Tables 10, 12 and 14, it is possible to verify that most problems in the study were identified on the web platform, totaling 64.3%. Tables 11, 13 and 15 show that problems encountered on mobile platforms totaled 35.7%.
It was noteworthy that fewer studies have conducted large-scale evaluations of mobile apps using automated evaluation tools, whereas this type of study has been widespread in the literature on web accessibility. This might be one possible explanation for the limited number of problems identified in mobile apps.
There has been an increasing number of studies focusing on the accessibility of mobile apps; however, over the last seven years they have remained fewer than web accessibility studies.
After analyzing the data, we verified that the barriers found on the web and mobile platforms have common aspects, even when implemented with different technologies. This shows that more recent endeavours to promote mobile accessibility can count on many lessons already learnt in web accessibility research, while issues particular to the platform are investigated.

Benefits and Limitations of Different Methods
From the results obtained in our analysis, we can confirm that the main advantage of evaluations using automated tools is the agility of identifying problems early in the development process, whereas inspections by specialists and user tests demand more time and planning to conduct and analyze. This result is in line with previous studies: Ivory (2013) pointed out that, in web accessibility evaluations, automated evaluation tools can speed up the process of verifying a subset of WCAG success criteria. Automated evaluations may also have a lower cost and be easily applied even by less experienced developers and designers (Ivory, 2013; Jaeger, 2006). Automated assessment tools for mobile platforms, because they verify components dynamically, can encounter a more significant number of problems than web tools that perform the verification statically (Quispe and Eler, 2018; Eler et al., 2018); mobile accessibility evaluation tools performed better at encountering a larger number of instances of violations (Eler et al., 2018).
Inspections by specialists are very relevant to help identify accessibility problems that could go unnoticed in user evaluations, which may not explore particular parts of large systems, as well as problems that automated tools cannot identify. In the analysis in this study, for example, inspections by specialists identified duplicated links and difficulties in finding the "help" pages. Further to this, inspections by specialists can also be applied earlier in the development process (Lazar, 2005; Freire, 2012), as organizations can arrange consultancy or in-house inspections with less difficulty than the logistics of user evaluations. Inspections by specialists may also help to identify problems that users with visual disabilities might not be able to identify due to lack of accessibility (e.g. a vital image with a null textual description that would be ignored by a screen reader). However, effective accessibility inspections require well-trained professionals, who might not always be readily available.
Finally, the main benefit of user evaluations is the ability to identify critical problems that cannot be identified by other methods and that have an essential impact on users with disabilities. For example, in the results of this mapping study, user evaluations revealed problems with inconsistent content organization and too much information on a page or screen. These problems may severely impact the performance of people with visual disabilities, affecting their interaction and navigation on websites and mobile applications. However, these methods may be costly to apply and require recruiting a wide range of participants. As pointed out by Gonçalves et al. (2018), the experience participants have with screen readers varies significantly, and this may impact the results obtained in user evaluations. In the case of mobile evaluation, studies must involve participants who use different mobile devices, as accessibility resources and assistive technologies also vary across platforms.

Most effective method
This study analyzed the different contributions that accessibility assessment methods have to identify problems that affect visually impaired users on websites and mobile applications.
As shown in Table 8, for the web platform, of all types of unique problems, ten were identified only by expert inspections, eleven were found only by user evaluations, and automated tools found three types of problems. In Table 9, for the mobile platform, nine were identified only by expert inspections, fourteen were found only by user evaluations, and automated tools found three types of problems. Despite being important productivity aids, automated tools cannot identify a broad range of accessibility problems; expert inspections and user evaluations are the most suitable methods for identifying more problems.
Along with using automated evaluation tools in earlier phases of the development process to identify more obvious problems more effectively, evaluations should incorporate inspections by specialists and user evaluations. This result is in line with the findings from Harrison and Petrie (2007), who showed that the severity of accessibility problems assigned by specialists and users were more in agreement than the priorities assigned by guidelines, for example.

Most cost-effective method
The fastest and least costly method is automated testing. It identified ten problems on the web platform (Table 8) and seven on the mobile platform (Table 9), and its nature allows repetitive tests to be run in a few seconds (Eler et al., 2018; Mateus et al., 2020; Brajnik et al., 2011). Furthermore, the tools were able to find unique issues on both platforms: three for the web and three for mobile. This shows that developers should use such tools on both web and mobile platforms to incorporate basic accessibility resources early in the development process.
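To make the repeatability concrete: once a rule set is encoded, it can be re-run over every page on every build in seconds. The sketch below is hypothetical; its two regex rules are toy stand-ins, far simpler than the heuristics implemented by real tools such as Wave:

```python
import re

# Toy stand-ins for the kinds of rules automated tools apply; the surveyed
# tools implement far richer and more precise rule sets than these regexes.
RULES = {
    "missing alt text": re.compile(r"<img\b(?![^>]*\balt=)[^>]*>", re.IGNORECASE),
    "empty link": re.compile(r"<a\b[^>]*>\s*</a>", re.IGNORECASE),
}

def audit(pages: dict) -> dict:
    """Re-runnable audit: maps each page name to the list of rules it violates."""
    return {
        name: [rule for rule, pattern in RULES.items() if pattern.search(html)]
        for name, html in pages.items()
    }

pages = {
    "home": '<img src="logo.png" alt="ACME logo"><a href="/about">About</a>',
    "news": '<img src="banner.png"><a href="/archive"></a>',
}
print(audit(pages))  # {'home': [], 'news': ['missing alt text', 'empty link']}
```

Because the whole audit is a pure function of the page source, it costs nothing to run it on every commit, which is precisely the early, repetitive use of automated checking the studies above recommend.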

Impact of the issues encountered
Several studies point out that accessibility guidelines are not able to cover all violations (Power et al., 2012; Carvalho et al., 2018b). The WCAG is constantly being updated to broaden the range of accessibility violations it covers.
To reduce the gaps found on both platforms, it is necessary to carry out evaluations using the three methods discussed in this study, even though user tests demand more resources. It is also necessary to understand the difficulties developers face in applying good accessibility practices.
The violations found can seriously affect users, especially during the COVID-19 pandemic (Agarwal et al., 2020), when everyone was forced to follow social distancing protocols. Barriers such as non-text content without a textual description prevent screen reader users from accessing the information, and contrast problems can cause information to go unnoticed by users with low vision or color blindness. Studies show that these barriers have a high degree of severity, indicating that users must exert excessive effort to fulfil their tasks (Carvalho et al., 2018b; Rømen and Svanaes, 2012).

Conclusion
This study aimed at characterizing the main benefits and limitations of different accessibility evaluation methods focused on people with visual impairments on Web and mobile platforms, based on a mapping of the literature of the past seven years. The study analyzed thirty-eight papers that evaluated websites and mobile applications that involved evaluation by automated tools, inspections by specialists and user evaluations. The results build upon a previous study (Silva et al., 2019) covering web accessibility problems that analyzed nineteen studies.
The study discussed the main benefits of each type of method. Evaluations with automated tools are faster and can help find, through repetitive tests, problems that would be difficult to find manually (Ivory, 2013). Inspections by specialists can identify problems that could go unnoticed by other methods and may help predict more common problems that could be fixed before user evaluations. User evaluations are the "gold standard" in accessibility evaluations, as they can encounter the most relevant problems that impact users and that specialists and tools might not identify.
However, the study also identified limitations. Automated evaluations with tools are not able to judge the adequacy of accessibility resources in the context in which they are used (Brajnik et al., 2011). Inspections by specialists may take time to carry out and still cannot reveal all the problems that real users may encounter. User evaluations may be impacted by the experience participants have with different assistive technologies and take significant time to perform (Gonçalves et al., 2018). Further to this, some problems are difficult for people with visual disabilities to identify due to their nature and require inspection by a specialist.
It is important to emphasize that studies on accessibility problems in mobile applications have great relevance, since the tools can find relevant and significant problems (Mateus et al., 2020).
Therefore, when conducting accessibility assessments, the ideal is to combine user tests and inspections by experts, as these two methods together can identify a greater number of absences and inadequacies of accessibility features. Automated tests are useful but should not be the only evaluation method used to check the accessibility of a website, nor be left until the end of its development.
Future work could examine the differences in the outcomes of evaluations performed in different countries. The present study focused on the types of problems encountered by different methods, and many included studies did not focus on a single country or place. However, considering specific cultural issues would bring important findings to understand accessibility problems.