Ⅰ. Introduction
According to data released by the Ministry of Health and Welfare, the number of reported missing children in South Korea was approximately 20,000 cases per year from 2018 to 2021[1], and about 7,500 cases had already been reported as of April 2022. While the annual recovery rate of missing children stands at 99.5%, indicating that most are located, cases of children missing for over a year continue to accumulate, totaling 871 as of April 2022. This underscores the significant scale of the missing-children problem. Early detection when a child goes missing is therefore crucial to prevent long-term disappearances, and various technologies and systems have been developed to detect missing children early.
One such example is multi-complex signal processing technology using mobile devices[2]. This technology relies on devices worn by the missing child, such as smartphones, necklaces, or wristbands, to receive a variety of signals, including Wi-Fi, LTE, and GPS, and thereby pinpoint the missing child's location. However, it is limited by the possibility that the missing child is not carrying the device, and it may not function correctly in areas with signal interference or weak reception. For these reasons, camera-based technologies, which can be applied without the missing child's cooperation or any worn device, are preferred over mobile device tracking technologies.
Camera-based technologies rely on computer vision systems. These are particularly prevalent in CCTV (Closed-Circuit Television) systems, where large amounts of data can be collected relatively easily from video footage and wide areas can be surveyed.
However, CCTV systems can have blind spots where certain areas cannot be observed, and they may have difficulty identifying objects that are far away or small. Therefore, in this paper, we propose deploying autonomous robots in multi-use facilities to address these limitations.
In addition, a deep learning based missing person recognition system is applied. Deep learning has shown strong performance in pattern recognition and classification from data. However, deep-learning based facial recognition cannot identify a face when the person is facing away from the camera or wearing a hat. Therefore, training deep learning models on external features such as a person's clothing or height can be useful for identifying and searching for missing individuals[3].
Considering the constraints mentioned above and the limitations of existing missing-child prevention systems, this paper adopts the following deep learning and fundamental autonomous driving technologies. First, to extract height information, a height estimation algorithm is applied using a depth camera, which provides pixel-level depth information[4]. Next, to obtain clothing information, multi-label classification is used instead of conventional single-label image classification, since clothing must be described by multiple attributes such as color and type[5]. In addition, for autonomous operation of the robot indoors, Simultaneous Localization and Mapping (SLAM) is used to acquire spatial information (a map) and real-time location information[6]. Finally, based on this map, navigation to a destination is performed through path planning[7].
Accordingly, this paper proposes and implements a missing child search system using deep learning and autonomous robots, based on the four key technologies mentioned above. In the event of a missing child, robots with rapid information processing and decision-making capabilities can respond swiftly and help prevent accidents while minimizing blind spots.
In Chapter II of this paper, the overall system is explained, along with the detailed processes of height estimation, multi-label classification, SLAM, and navigation. Chapter III discusses the implementation methods and results of these processes. Finally, Chapter IV summarizes the paper and presents its conclusions.
Ⅱ. System
2.1 System Model
The overall system algorithm of this study is implemented in the ROS (Robot Operating System) environment, which handles hardware control, communication between the detailed processes (multi-label classification, height estimation, SLAM, and navigation), and 3D visualization tools.
Fig. 1 illustrates the overall system model. The system initiates the search upon receiving information about the missing child, including height and clothing details. Robot navigation for the search is performed on a 2D map pre-built with the 2D LiDAR. Additionally, visual SLAM for environmental perception is conducted based on feature points extracted from gray images obtained from the depth camera. Simultaneously, height estimation and multi-label classification for finding the missing child are executed sequentially, with color images from the depth camera being processed through their respective deep learning models. All three functionalities (deep learning, navigation, SLAM) communicate through ROS on the master PC. The motor control on the slave PC receives ROS messages via a TCPROS connection with the navigation node on the master PC, driving the motors connected to the robot controller firmware.
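As a rough illustration of how the search can be triggered over ROS, the following sketch publishes the missing child's description on the 'input' topic used by the deep learning nodes in Sections 3.1 and 3.2. The comma-separated string format and the std_msgs/String message type are assumptions for illustration; the actual system may use custom message types.

```python
# Minimal sketch: publish the missing child's description so that the height
# estimation and multi-label classification nodes can subscribe to it.
# (Topic name follows Sections 3.1/3.2; message format is an assumption.)
import rospy
from std_msgs.msg import String

def publish_missing_child_info():
    rospy.init_node("missing_child_input")
    pub = rospy.Publisher("input", String, queue_size=1, latch=True)
    rospy.sleep(1.0)  # give subscribers time to connect
    # height in cm, top (color, type), bottom (color, type)
    pub.publish(String(data="167,stripe,longsleeve,black,pants"))
    rospy.loginfo("Published missing child description")

if __name__ == "__main__":
    publish_missing_child_info()
```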
2.2 Height estimation
To conduct height estimation(Represented as 1) in Fig. 1), the process begins by detecting individuals in the color data from the depth camera. A 3D coordinate array is then created using the depth data. For each detected person, the individual 3D coordinates are examined, and Y-values that deviate significantly from the Z-median are removed as outliers. The resulting length in the Y direction is then used to estimate the person's height.
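A minimal sketch of this step is shown below, assuming the detected person's pixels have already been converted into an (N, 3) array of camera-frame 3D points. The 0.3 m tolerance for the Z-median outlier test is an illustrative value, not one taken from the paper.

```python
# Minimal sketch of height estimation from a person's 3D points
# (X horizontal, Y vertical, Z depth from the camera).
import numpy as np

def estimate_height(points_xyz: np.ndarray, z_tol: float = 0.3) -> float:
    z = points_xyz[:, 2]
    z_med = np.median(z)
    inliers = points_xyz[np.abs(z - z_med) < z_tol]   # drop depth outliers
    y = inliers[:, 1]
    return float(y.max() - y.min())                   # vertical extent ~ height (m)
```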
2.3 Multi-label classification
To obtain clothing information(Represented as 2) in Fig. 1), multi-label classification is applied to the color images from the depth camera. Unlike single-label classification, which assigns exactly one class to an image, multi-label classification assigns several labels at once, so a detected person's clothing can be described simultaneously by its color and its type. The predicted labels for the top and bottom are then compared with the clothing information provided for the missing child.
Classification comparison: (a) single-label, (b) multi-label
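For reference, the difference between the two schemes can be summarized in a few lines, assuming a typical multi-label setup in which each label gets an independent sigmoid output (the paper does not state the exact output layer):

```python
# Minimal sketch: single-label vs. multi-label prediction from the same logits.
import torch

logits = torch.tensor([2.1, -0.5, 1.3, -2.0])   # e.g. [stripe, red, longsleeve, shirt]

# (a) single-label: exactly one class via softmax/argmax
single_label = torch.argmax(torch.softmax(logits, dim=0)).item()

# (b) multi-label: every label whose sigmoid probability exceeds a threshold
multi_labels = (torch.sigmoid(logits) > 0.5).nonzero(as_tuple=True)[0].tolist()

print(single_label)   # -> 0
print(multi_labels)   # -> [0, 2]  (both "stripe" and "longsleeve")
```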
2.4 SLAM and Navigation
In the robot's movement section for the missing child search, navigation(Represented as 3) in Fig. 1) and SLAM(Represented as 4) in Fig. 1) are utilized. The robot navigates based on a pre-drawn 2D map[9]. For path planning, the DWA (Dynamic Window Approach) algorithm, suited to dynamic environments, and the A* algorithm, suited to static environments, are employed together[10]. Additionally, a TSP (Travelling Salesman Problem) approach is used: after the number and coordinates of the waypoints are determined, the optimal path is planned by considering the cost between each pair of points, as sketched below. After visiting all waypoints, the robot returns to the starting point and repeats this process until the missing child is located.
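A minimal sketch of the waypoint ordering is given below, assuming straight-line distance as the inter-point cost (the paper does not specify the cost metric). With only a few waypoints, enumerating all visiting orders is sufficient; the tour starts and ends at the robot's starting point.

```python
# Minimal sketch of TSP-style waypoint ordering by brute-force enumeration.
from itertools import permutations
import math

def tour_cost(order, start, waypoints):
    pts = [start] + [waypoints[i] for i in order] + [start]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def plan_waypoint_order(start, waypoints):
    best = min(permutations(range(len(waypoints))),
               key=lambda order: tour_cost(order, start, waypoints))
    return list(best)

# Example: four waypoints, as in the implementation in Section 3.3
order = plan_waypoint_order((0.0, 0.0), [(4, 0), (4, 3), (0, 3), (2, 5)])
```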
For assessing the missing child's location and the surrounding environment, the robot employs the ORB SLAM3 algorithm, a visual SLAM(Represented as 4) in Fig. 1) technique using cameras[11]. While navigating the environment along its planned path, the robot uses ORB SLAM3 to construct a 3D map. This real-time 3D map allows the robot to continually determine its current position and assess its surroundings within the multi-purpose facility. Ultimately, once the algorithms have successfully matched both the child's height and clothing information, motor control is initiated to stop the robot. Subsequently, using the SLAM localization capability, the child's approximate position is transmitted to the Ground Control Center (GCS).
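A hedged sketch of this final reporting step is shown below; the 'detect', 'slam/pose', and 'gcs/child_position' topic names and message types are assumptions made for illustration.

```python
# Minimal sketch: on detection, stop the robot and forward its current
# SLAM-localized pose to the GCS as the child's approximate position.
import rospy
from std_msgs.msg import String
from geometry_msgs.msg import Twist, PoseStamped

current_pose = None

def on_pose(msg):                        # latest localized pose from SLAM
    global current_pose
    current_pose = msg

def on_detect(_msg):
    cmd_vel_pub.publish(Twist())         # zero velocity: stop the robot
    if current_pose is not None:
        gcs_pub.publish(current_pose)    # report approximate child position

rospy.init_node("detection_reporter")
cmd_vel_pub = rospy.Publisher("cmd_vel", Twist, queue_size=1)
gcs_pub = rospy.Publisher("gcs/child_position", PoseStamped, queue_size=1)
rospy.Subscriber("slam/pose", PoseStamped, on_pose)
rospy.Subscriber("detect", String, on_detect)
rospy.spin()
```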
Ⅲ. Implementation
The hardware configuration for the system implementation is depicted in Fig. 3. A laptop serves as the master PC, controlling devices such as the 2D LiDAR and depth camera and managing the detailed processes. A companion board functions as the slave PC, equipped with firmware for motor control and connected so that robot movement can be controlled as instructed by the master PC.
The entire algorithm has been implemented within a Docker container running on Ubuntu 18.04, with ROS Melodic installed. Within this environment, all processes, including multi-label classification, height estimation, SLAM (Simultaneous Localization and Mapping), and Navigation, communicate with each other and execute as part of the integrated system.
The experiment was conducted with a female subject 167 cm in height, wearing a striped top and black pants. The robot searched for a person matching this description while moving around, and when it detected the target, it reported the discovery to the GCS. The following provides detailed explanations of each process.
3.1 Height estimation
For height estimation, the RealSense D455 camera, capable of measuring distances between objects and the camera, was utilized. The height estimation algorithm, depicted in Fig. 4, is implemented as a single node, the smallest executable unit in ROS. The node starts when it receives a message containing the missing person's height through the 'input' topic; this message is subscribed to in the RECEIVE_HEIGHT function and is later compared with the heights estimated from the camera images in the ESTIMATION function. In the ESTIMATION function, before performing height estimation, a confidence score threshold of 0.9 (90%) was set for the person class to ensure accurate person extraction. Then, assuming that the camera and the person are horizontally aligned, experiments were conducted with four different individuals. The results showed that the highest accuracy was achieved at a distance of about 2.4 meters. Therefore, allowing for a slight margin of error, height estimation is only performed when the distance between the detected person and the camera falls within the range of 2.4 (±0.1) meters.
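The following sketch outlines this node. The 0.9 confidence threshold, the 2.4 (±0.1) m distance gate, and the 'input' topic come from the text; the Detection structure, the 5 cm matching tolerance, and the 'height_match' topic are illustrative assumptions.

```python
# Minimal sketch of the height estimation node (RECEIVE_HEIGHT / ESTIMATION).
import rospy
import numpy as np
from dataclasses import dataclass
from std_msgs.msg import String

@dataclass
class Detection:                 # placeholder for one person-detector output
    label: str
    score: float
    distance_m: float
    points_xyz: np.ndarray       # (N, 3) camera-frame points of the person

target_height_cm = None

def receive_height(msg):                         # RECEIVE_HEIGHT
    global target_height_cm
    target_height_cm = float(msg.data)

def estimation(detections, tolerance_cm=5.0):    # ESTIMATION (tolerance assumed)
    for det in detections:
        if det.label != "person" or det.score < 0.9:
            continue                             # confidence threshold of 0.9
        if not (2.3 <= det.distance_m <= 2.5):
            continue                             # only within 2.4 (+/-0.1) m
        # vertical extent of the person's points (outlier removal as in Sec. 2.2 omitted)
        est_cm = (det.points_xyz[:, 1].max() - det.points_xyz[:, 1].min()) * 100.0
        if target_height_cm is not None and abs(est_cm - target_height_cm) <= tolerance_cm:
            match_pub.publish(String(data="height_match"))

rospy.init_node("height_estimation")
rospy.Subscriber("input", String, receive_height)
match_pub = rospy.Publisher("height_match", String, queue_size=1)
```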
Height estimation algorithm
Table 2 shows the height estimation results against actual measurements for 10 individuals, along with the average error. The measurements were taken when the person was detected within 2.4 (±0.1) meters of the camera; the average detection distance was 2.389 meters, and the average error was 1.3 centimeters.
Height estimation accuracy
3.2 Multi-label classification
For pretraining the multi-label classification model, the Clothing Dataset available on Kaggle was used. The dataset consisted of over 2,000 top images and 900 bottom images, each classified into 9 classes based on two criteria: color and type. The top and bottom image sets were each split into 85% for training and 15% for validation. The model used for training is the pre-trained ResNet50 deep learning model from PyTorch, trained for 50 epochs with a batch size of 32.
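A minimal training sketch under this setup is shown below. The pretrained ResNet50, 50 epochs, and batch size of 32 come from the text; the 9-label output head, the sigmoid/BCE loss, and the data loading details are assumptions about particulars the paper does not spell out.

```python
# Minimal multi-label training sketch with a pretrained ResNet50 (PyTorch).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

NUM_LABELS = 9                                   # color + type labels per garment

model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_LABELS)   # multi-label head

criterion = nn.BCEWithLogitsLoss()               # independent per-label loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train(train_set, val_set, epochs=50, batch_size=32):
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:      # labels: multi-hot float tensors
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
```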
The multi-label classification algorithm, depicted in Fig. 6, is a single node that subscribes to the 'Input' topic, receiving messages in the RECEIVE_CLOTHES function. The messages received through this topic contain the missing person's clothing color and type. This information is used in the LIVESTREAM function, which obtains frame images one by one from the video stream and runs them through the multi-label classification model for real-time inference. The received information is compared with the inference results; if they match five consecutive times within a 15-second window, it is treated as a detection event and a message is published on the 'detect' topic.
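The matching rule can be sketched as follows, with the model inference abstracted away as a set of predicted labels per frame; the window-reset behaviour and the published message content are assumptions.

```python
# Minimal sketch: five consecutive attire matches within 15 s trigger 'detect'.
import time
import rospy
from std_msgs.msg import String

rospy.init_node("multi_label_classification")
detect_pub = rospy.Publisher("detect", String, queue_size=1)

TARGET = {"stripe", "longsleeve", "black", "pants"}   # received via the 'Input' topic

consecutive = 0
window_start = None

def check_frame(predicted_labels):
    """predicted_labels: set of labels inferred by the model for one frame."""
    global consecutive, window_start
    now = time.time()
    if TARGET.issubset(predicted_labels):
        if window_start is None or now - window_start > 15.0:
            window_start, consecutive = now, 0        # open a new 15 s window
        consecutive += 1
        if consecutive >= 5:                          # five consecutive matches
            detect_pub.publish(String(data="clothes_match"))
            window_start, consecutive = None, 0
    else:
        window_start, consecutive = None, 0           # a miss breaks the streak
```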
Fig. 5 depicts the training and validation loss for the top and bottom models. As the number of epochs increases, the loss in Fig. 5(a) decreases gradually, while the loss in Fig. 5(b) drops sharply in the early stages.
Multi-label classification loss: (a) Top, (b) Bottom
Table 7 presents the results of testing different combinations of colors and types for tops and bottoms. Among the 14 combinations for tops, 10 were successfully detected, a success rate of 71.4%. For bottoms, 11 of the 18 combinations were successfully detected, a success rate of 61.1%.
Multi-label classification accuracy
Multi-label classification algorithm
3.3 SLAM and Navigation
Before conducting navigation, the coordinates of four waypoints were predefined. A 2D map created with LiDAR SLAM was then loaded, and the TSP (Traveling Salesman Problem) route planning algorithm was applied for navigation. The results of this operation are depicted in Fig. 7, where the robot navigates through the four waypoints in the order shown. Simultaneously, the robot used ORB SLAM3, implemented with the camera, to maintain real-time situational awareness of its surroundings. The results are illustrated in Fig. 8.
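A minimal sketch of visiting the planned waypoints is given below, assuming the standard move_base action interface of the ROS navigation stack (which provides the DWA local planner and an A*/Dijkstra global planner); the waypoint coordinates are illustrative.

```python
# Minimal sketch: send TSP-ordered waypoints to move_base one by one,
# then return to the starting point.
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def goto(client, x, y):
    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = "map"
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = 1.0
    client.send_goal(goal)
    client.wait_for_result()

rospy.init_node("waypoint_patrol")
client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
client.wait_for_server()

waypoints = [(4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 5.0)]   # TSP-ordered
for x, y in waypoints + [(0.0, 0.0)]:            # return to the starting point
    goto(client, x, y)
```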
TSP algorithm implementation
ORB SLAM3: (a) Current Frame, (b) Map Viewer
3.4 System implementation
The entire system initiates exploration upon receiving input consisting of height data (167 cm) and clothing details, including top (stripe, longsleeve) and bottom (black, pants) attributes. During the exploration, the robot performs two stages of processing, height estimation and multi-label classification, based on the provided information. Starting from the top-left corner and proceeding counterclockwise, Fig. 9 shows the path planning, deep learning, real-time 3D map generation using ORB SLAM3, and the exterior view; these processes run concurrently. While executing the TSP path planning, the robot discovers, at the second of the four waypoints, a target whose height matches the input. Multi-label classification is then performed, and the detection results for the height and attire information are shown in Fig. 10 and Fig. 11, respectively.
Entire algorithm implementation
Multi-label classification result
Ⅳ. Conclusion
In this paper, the objective is to deploy robots in facilities with high population density, such as indoor multi-purpose complexes, where the probability of child abduction is high, and to use these robots to assist in rapidly locating missing children. This approach can help prevent cases of children remaining missing over the long term.
During the development of the robot in this study, there were errors in the multi-label classification process where the inferred values differed from the actual values. This is believed to have been caused by learning biased toward specific data during the data preprocessing stage. Therefore, to achieve higher AI performance, we intend to collect data more evenly across the entire dataset to ensure accurate results.
Additionally, in this study, as path planning is conducted in a dynamic environment, errors can arise in robot localization and mapping. To address this, in future research, the plan is to explore the integration of IMU sensors with ORB SLAM3.
Finally, experiments are planned to objectively demonstrate the performance of the proposed system by examining how quickly and accurately missing children can be located with and without the proposed system.