Unveiling the Secrets: Reconstruction of Shredded Documents using Deep Learning

  • Thiago M. Paixão (IFES)
  • Maria C. S. Boeres (UFES)
  • Thiago Oliveira-Santos (UFES)

Abstract


This work addresses the reconstruction of mechanically shredded documents, a task with potential application in forensic investigation. Our first major contribution consists of two novel deep learning approaches for fully automatic reconstruction, tested on real-world shredded data, which achieved state-of-the-art accuracy in more realistic scenarios. As a second major contribution, we introduce a novel framework for semi-automatic reconstruction inspired by the principles of active learning. Its core is a recommendation module that flags potential errors in the reconstruction output (a permutation of shreds) for human review, enabling further improvement of the reconstruction. These contributions, together with additional outcomes (datasets and experimental protocols), resulted in five publications: three journal articles and two papers at international conferences, including the premier IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
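
To make the idea of "a permutation of shreds with flagged potential errors" concrete, the following is a minimal sketch, not the authors' method. It assumes a precomputed pairwise cost matrix (here hand-written; in the work such scores would come from a learned model), uses a simple greedy ordering as a stand-in solver, and flags the weakest adjacent joins for human review. The names `greedy_order`, `flag_weak_joins`, and `COST` are hypothetical.

```python
from typing import List, Sequence, Tuple


def greedy_order(cost: Sequence[Sequence[float]], start: int = 0) -> List[int]:
    """Order shreds left to right by repeatedly appending the cheapest unused shred.

    cost[i][j] is the (assumed) cost of placing shred j immediately to the
    right of shred i; lower means a more plausible join.
    """
    order = [start]
    unused = set(range(len(cost))) - {start}
    while unused:
        last = order[-1]
        # Ties are broken by shred index so the result is deterministic.
        nxt = min(unused, key=lambda j: (cost[last][j], j))
        order.append(nxt)
        unused.remove(nxt)
    return order


def flag_weak_joins(
    order: Sequence[int], cost: Sequence[Sequence[float]], k: int = 1
) -> List[Tuple[int, int, int]]:
    """Return the k most costly adjacent joins as (position, left, right)."""
    joins = [
        (cost[a][b], pos, a, b)
        for pos, (a, b) in enumerate(zip(order, order[1:]))
    ]
    # Highest cost first; earlier positions win ties.
    joins.sort(key=lambda t: (-t[0], t[1]))
    return [(pos, a, b) for _, pos, a, b in joins[:k]]


# Toy, hand-written costs for four shreds (illustrative values only).
COST = [
    [9.0, 0.9, 0.8, 0.2],
    [0.9, 9.0, 0.9, 0.9],
    [0.1, 0.9, 9.0, 0.7],
    [0.9, 0.6, 0.9, 9.0],
]

if __name__ == "__main__":
    order = greedy_order(COST, start=2)
    print("order:", order)
    print("review first:", flag_weak_joins(order, COST, k=1))
```

In this toy case the join between shreds 3 and 1 has the highest cost in the proposed order, so it is the one a reviewer would be asked to check first. A real system would replace both the hand-written costs and the greedy solver.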

Published
6 November 2023
PAIXÃO, Thiago M.; BOERES, Maria C. S.; OLIVEIRA-SANTOS, Thiago. Unveiling the Secrets: Reconstruction of Shredded Documents using Deep Learning. In: WORKSHOP DE TESES E DISSERTAÇÕES - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 36., 2023, Rio Grande/RS. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 49-55. DOI: https://doi.org/10.5753/sibgrapi.est.2023.27451.