Using visual-interactive properties to support data quality visual assessment on abstract and timeless data
Keywords: Data Quality Assessment, Information Visualization, Structured Data Defects, Visual Assessment
Visualization systems are supervised tools that can reveal the intrinsic structures of data defects. However, although a significant number of these systems assist Data Quality Assessment, few provide resources to examine these structures in depth. This limitation prevents data quality appraisers from using their contextual knowledge to confirm or refute a suspected data defect. This article explores the additional features and design characteristics of a visualization system, named Vis4DD, that uses visual-interactive properties to support data quality visual assessment on abstract and timeless data (e.g., Customer, Billing). Additionally, we conduct a full review of state-of-the-art visualization systems related to data quality assessment and position Vis4DD among them.