A Comparison Between Hierarchical and Non-Hierarchical Software Clustering

  • Léo C. R. Antunes UNIRIO
  • Márcio de O. Barros UNIRIO

Resumo


Heuristic search algorithms have been applied in various areas of Software Engineering, such as requirements prioritization, software architecture improvement, and time and cost planning. The research field of Search-based Software Engineering (SBSE) proposes the use of heuristic search techniques to solve Software Engineering challenges, which are described as optimization problems. One of the challenges frequently analyzed from the SBSE perspective is the Software Module Clustering (SMC) problem. This problem involves distributing a software project’s component modules (or classes) among larger structures called clusters (or packages). Practice and experimental studies have shown a strong correlation between poor distributions and the presence of component failures. The main factors used to assess the quality of distributions are coupling and cohesion. However, research shows that optimization based on metrics that capture structural characteristics does not result in an adequate distribution of classes in packages from the point of view of software developers. One of the reasons that might have led to such limited results is that the algorithms produce non-hierarchical distributions of classes into packages. In this study, we executed the same greedy algorithm using two different fitness functions: one that does not consider hierarchy and the other that does. We wanted to measure whether they produced different results regarding authoritativeness. We found indications that a hierarchy-based approach produces solutions closer to those software developers propose.

Palavras-chave: Software Engineering, Software Clustering, Hierarchical Design

Referências

Jesús S. Aguilar-Ruiz, Isabel Ramos, José Cristóbal Riquelme Santos, and Miguel Toro. 2001. An evolutionary approach to estimating software development projects. Information and Software Technology 43, 14 (2001), 875–882. DOI: 10.1016/S0950-5849(01)00193-8 Publisher: Elsevier.

Giuliano Antoniol, Massimiliano Di Penta, and Mark Harman. 2005. Search-Based Techniques Applied to Optimization of Project Planning for a Massive Maintenance Project. In Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM 2005), 25-30 September 2005, Budapest, Hungary. IEEE Computer Society, 240–249. DOI: 10.1109/ICSM.2005.79

Andrea Arcuri and Lionel C. Briand. 2011. A practical guide for using statistical tests to assess randomized algorithms in software engineering. In Proc. of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu , HI, USA, May 21-28, 2011. ACM, 1–10. DOI: 10.1145/1985793.1985795

Anthony J. Bagnall, Victor J. Rayward-Smith, and Ian M. Whittley. 2001. The next release problem. Information and Software Technology 43, 14 (2001), 883–890. DOI: 10.1016/S0950-5849(01)00194-X

Márcio Barros. 2014. An experimental evaluation of the importance of randomness in hill climbing searches applied to software engineering problems. Empir. Softw. Eng. 19, 5 (2014), 1423–1465. DOI: 10.1007/S10664-013-9294-4

Márcio Barros, Fábio Farzat, and Guilherme Travassos. 2015. Learning from optimization: A case study with Apache Ant. Inf. Softw. Technol. 57 (2015), 684–704. DOI: 10.1016/J.INFSOF.2014.07.015

Iain Bate and Simon M. Poulding. 2011. Editorial for the special issue on searchbased software engineering. Software: Practice and Experience 41, 5 (2011), 467–468. DOI: 10.1002/SPE.1056

Michael Bowman, Lionel C. Briand, and Yvan Labiche. 2010. Solving the Class Responsibility Assignment Problem in Object-Oriented Analysis with Multi-Objective Genetic Algorithms. IEEE Transactions on Software Engineering 36, 6 (2010), 817–837. DOI: 10.1109/TSE.2010.70

Lionel Claude Briand, Sandro Morasca, and Victor R. Basili. 1999. Defining and Validating Measures for Object-Based High-Level Design. IEEE Trans. Software Eng. 25 (1999), 722–743. [link]

Colin James Burgess and Martin Lefley. 2001. Can genetic programming improve software effort estimation? A comparative evaluation. Inf. Softw. Technol. 43 (2001), 863–873. [link]

Aaron Clauset, M. E. J. Newman, and Cristopher Moore. 2004. Finding community structure in very large networks. Physical Review E 70, 6 (Dec. 2004), 066111. DOI: 10.1103/PhysRevE.70.066111

J. Cohen. 1992. A power primer. Psychological Bulletin 112, 1 (July 1992), 155–159. DOI: 10.1037//0033-2909.112.1.155

R. M. Cormack. 1971. A Review of Classification. Journal of the Royal Statistical Society. Series A (General) 134, 3 (1971), 321–367. DOI: 10.2307/2344237 Publisher: [Royal Statistical Society, Wiley].

D. Doval, S. Mancoridis, and B.S. Mitchell. 1999. Automatic clustering of software systems using a genetic algorithm. In STEP ’99. Proceedings Ninth International Workshop Software Technology and Engineering Practice. IEEE Comput. Soc, Pittsburgh, PA, USA, 73–81. DOI: 10.1109/STEP.1999.798481

Juan J. Durillo, Yuanyuan Zhang, Enrique Alba, Mark Harman, and Antonio J. Nebro. 2011. A Study of the Bi-Objective next Release Problem. Empirical Softw. Engg. 16, 1 (Feb. 2011), 29–60. DOI: 10.1007/s10664-010-9147-3 Place: USA Publisher: Kluwer Academic Publishers.

Gordon Fraser and Jerffeson Teixeira de Souza. 2018. Guest editorial: searchbased software engineering. Empirical Software Engineering (2018), 1–3. [link]

Simon J. Gibbs, Dennis Tsichritzis, Eduardo Casais, Oscar Nierstrasz, and Xavier Pintado. 1990. Class management for software communities. Commun. ACM 33 (1990), 90–103. [link]

A. D. Gordon. 1987. A Review of Hierarchical Classification. Journal of the Royal Statistical Society. Series A (General) 150, 2 (1987), 119–137. DOI: 10.2307/2981629 Publisher: [Royal Statistical Society, Wiley].

MathewHall and Phil McMinn. 2012. An analysis of the performance of the bunch modularisation algorithm’s hierarchy generation approach. In 4 th Symposium on Search Based-Software Engineering. 19.

Mark Harman, Robert Hierons, and Mark Proctor. 2002. A new representation and crossover operator for search-based optimization of software modularization. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation (GECCO’02). Morgan Kaufmann Publishers, San Francisco, CA, USA, 1351–1358.

Mark Harman and Bryan F. Jones. 2001. Search-based software engineering. Inf. Softw. Technol. 43, 14 (2001), 833–839. DOI: 10.1016/S0950-5849(01)00189-6

Jon Kleinberg and Eva Tardos. 2006. Algorithm design. Pearson Addison Wesley.

Craig Larman. 2001. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process. [link]

Eugene L. Lawler and D. E. Wood. 1966. Branch-and-Bound Methods: A Survey. Oper. Res. 14 (1966), 699–719. [link]

Ming Li and Paul M.B. Vitányi. 1990. Kolmogorov Complexity and its Applications. In Algorithms and Complexity. Elsevier, 187–254. DOI: 10.1016/B978-0-444-88071-0.50009-6

Rudi Lutz. 2001. Evolving good hierarchical decompositions of complex systems. J. Syst. Archit. 47 (2001), 613–634. [link]

Spiros Mancoridis, Brian Mitchell, Chris Rorres, Yih-Farn Chen, and Emden Gansner. 1998. Using Automatic Clustering to Produce High-Level System Organizations of Source Code. In 6th International Workshop on Program Comprehension (IWPC ’98), June 24-26, 1998, Ischia, Italy. IEEE Computer Society, Ischia, Italy, 45–52. DOI: 10.1109/WPC.1998.693283

Steve McConnell. 2004. Code Complete, Second Edition. [link]

Leandro L. Minku and Xin Yao. 2013. Ensembles and locality: Insight on improving software effort estimation. Inf. Softw. Technol. 55 (2013), 1512–1528. [link]

Brian Mitchell and Spiros Mancoridis. 2002. Using heuristic search techniques to extract design abstractions from source code. In Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation (GECCO’02). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1375–1382.

Brian Mitchell and Spiros Mancoridis. 2006. On the Automatic Modularization of Software Systems Using the Bunch Tool. IEEE Transactions on Software Engineering 32, 3 (March 2006), 193–208. DOI: 10.1109/TSE.2006.31

Marlon C. Monçores, Adriana C. F. Alvim, and Márcio O. Barros. 2018. Large Neighborhood Search applied to the Software Module Clustering problem. Computers & Operations Research 91 (March 2018), 92–111. DOI: 10.1016/j.cor.2017.10.004

Marlon da Costa Monçores. 2015. Busca em vizinhança grande aplicada ao problema de clusterização de módulos de software. Master’s thesis. [link] Accepted: 2018-06-25T22:06:23Z.

Alexandre Fernandes Pinto. 2014. Uma heurística baseada em busca local iterada para o problema de clusterização de módulos de software. Master’s thesis. [link] Accepted: 2018-07-10T22:00:52Z.

Kata Praditwong, Mark Harman, and Xin Yao. 2011. Software Module Clustering as a Multi-Objective Search Problem. IEEE Transactions on Software Engineering 37, 2 (2011), 264–282. DOI: 10.1109/TSE.2010.26

J. Rissanen. 1978. Modeling by shortest data description. Automatica 14, 5 (Sept. 1978), 465–471. DOI: 10.1016/0005-1098(78)90005-5

C E Shannon. 1948. A Mathematical Theory of Communication. The Bell System Technical Journal 27 (Oct. 1948), 623–656.

V. Tzerpos and R.C. Holt. 2000. ACCD: an algorithm for comprehension-driven clustering. In Proceedings Seventh Working Conference on Reverse Engineering. 258–267. DOI: 10.1109/WCRE.2000.891477 ISSN: 1095-1350.

Claes Wohlin. 2012. Experimentation in software engineering. Springer, NY.

Joseph Arthur Wood. 1998. Improving software designs via the minimum description length principle. PhD Thesis. [link]

Jifeng Xuan, He Jiang, Zhilei Ren, and Zhongxuan Luo. 2012. Solving the Large Scale Next Release Problem with a Backbone-Based Multilevel Algorithm. IEEE Transactions on Software Engineering 38, 5 (2012), 1195–1212. DOI: 10.1109/TSE.2011.92

Edward Yourdon and Larry L. Constantine. 1979. Structured design. Fundamentals of a discipline of computer program and systems design. Englewood Cliffs: Yourdon Press (1979). [link]
Publicado
30/09/2024
ANTUNES, Léo C. R.; BARROS, Márcio de O.. A Comparison Between Hierarchical and Non-Hierarchical Software Clustering. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 38. , 2024, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024 . p. 69-79. DOI: https://doi.org/10.5753/sbes.2024.3279.