Multimodal social scenario perception model for initial human-robot interaction

  • Diego Cardoso Alves, University of Campinas
  • Paula Dornhofer Paro Costa, University of Campinas

Abstract


Human-robot interaction poses many challenges, demanding that artificial intelligence researchers improve scene perception, social navigation, and engagement. Considerable attention has been devoted to computer vision and multimodal sensing approaches that advance social robotic systems and improve the accuracy of social models. Most recent work in social robotics addresses the engagement process, focusing on maintaining a previously established conversation. This work instead studies initial human-robot interaction contexts, proposing a system that analyzes a social scenario by detecting and analyzing people and surrounding features in a scene. RGB and depth frames, as well as audio data, are combined to achieve better performance in indoor scene monitoring and human behavior analysis.
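As a rough illustration of the multimodal sensing layer described above, the sketch below shows how synchronized RGB and depth frames could be captured and aligned with an Intel RealSense-class camera through pyrealsense2 (the Python wrapper of librealsense 2.x). The camera choice, stream settings, and downstream processing hooks are assumptions for illustration only, not the authors' implementation.

    import numpy as np
    import pyrealsense2 as rs

    # Configure depth and color streams (resolutions/formats are illustrative).
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
    profile = pipeline.start(config)
    depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()

    # Align depth to the color frame so person detections in the RGB image
    # can be looked up directly in the depth map.
    align = rs.align(rs.stream.color)

    try:
        while True:
            frames = align.process(pipeline.wait_for_frames())
            depth_frame = frames.get_depth_frame()
            color_frame = frames.get_color_frame()
            if not depth_frame or not color_frame:
                continue
            depth_m = np.asanyarray(depth_frame.get_data()) * depth_scale  # depth in meters
            color = np.asanyarray(color_frame.get_data())                  # BGR color image
            # Person detection, pose/face analysis, and audio fusion would run here.
    finally:
        pipeline.stop()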

Published
28/10/2019
How to Cite

ALVES, Diego Cardoso; COSTA, Paula Dornhofer Paro. Multimodal social scenario perception model for initial human-robot interaction. In: WORKSHOP DE TESES E DISSERTAÇÕES - CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 32., 2019, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 105-111. DOI: https://doi.org/10.5753/sibgrapi.est.2019.8309.