A Deep Learning Approach to Mobile Camera Image Signal Processing
Abstract
Image quality is an important feature of modern smartphone cameras. The camera Image Signal Processing (ISP) pipeline plays a central role in producing high-quality images. However, the existing algorithms in the ISP pipeline must be tuned to the physical characteristics of the capture hardware, which limits the final image quality. This work aims to replace the camera ISP pipeline with a deep learning model that generalizes the procedure better. A deep neural network based on the UNet architecture was employed to convert RAW images into RGB. Pre-processing stages were applied, and additional training components were added incrementally. The results showed that the model produced the test images efficiently, indicating that replacing traditional algorithms with deep models is a promising path.
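The abstract does not say which pre-processing stages were applied. As an illustration only, the sketch below shows one step that is common in learned RAW-to-RGB pipelines: normalizing a Bayer mosaic and packing each 2x2 tile into four half-resolution channels before it is fed to a UNet-style network. The function name `pack_bayer_rggb`, the RGGB layout, and the black and white levels are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence


def pack_bayer_rggb(
    raw: Sequence[Sequence[float]],
    black_level: float = 0.0,      # assumed sensor black level (hypothetical default)
    white_level: float = 1023.0,   # assumed 10-bit white level (hypothetical default)
) -> List[List[List[float]]]:
    """Pack an H x W RGGB mosaic into an (H/2) x (W/2) x 4 nested list in [0, 1]."""
    height = len(raw)
    width = len(raw[0]) if height else 0
    if height == 0 or width == 0 or height % 2 or width % 2:
        raise ValueError("expected a non-empty mosaic with even height and width")

    scale = white_level - black_level

    def normalize(value: float) -> float:
        # Subtract the black level, rescale, and clamp to [0, 1].
        return min(max((value - black_level) / scale, 0.0), 1.0)

    packed = []
    for r in range(0, height, 2):
        row = []
        for c in range(0, width, 2):
            # Assumed RGGB tile: R at (r, c), G at (r, c+1) and (r+1, c), B at (r+1, c+1).
            row.append([
                normalize(raw[r][c]),
                normalize(raw[r][c + 1]),
                normalize(raw[r + 1][c]),
                normalize(raw[r + 1][c + 1]),
            ])
        packed.append(row)
    return packed
```

For a 4x4 mosaic this returns a 2x2 grid of four-value pixels. Plain lists keep the sketch free of dependencies; a practical pipeline would more likely use array slicing in a numerical library.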