
Sang-Chul Kim and Yeong-min Jang

Lightweight LiDAR—Camera Online Extrinsic Calibration with Multi-Dilation Encoder Blocks

Abstract: As multi-sensor systems combining cameras and LiDAR become increasingly common across many fields, efficient and accurate extrinsic calibration techniques are emerging as a critical need. This article presents a novel lightweight deep-learning network for targetless LiDAR-camera extrinsic calibration that consists of only 4 million parameters. The proposed method uses CNN-based multi-dilation encoder blocks that extract multi-scale features, which is especially effective for sparse LiDAR depth images. The proposed block keeps the network lightweight while maintaining strong calibration performance. The proposed method achieved translation errors of 1.08 cm, 0.18 cm, and 0.56 cm along the X, Y, and Z axes, respectively, and rotation errors of 0.182°, 0.139°, and 0.141° for roll, pitch, and yaw, respectively. The method also performs calibration in a one-shot manner, making it suitable for real-time applications. These results highlight the ability of the proposed method to enable reliable fusion of LiDAR and camera data, enhancing the perception capabilities of autonomous vehicles.

Keywords: Camera, LiDAR, Extrinsic Calibration, Lightweight Deep Learning, KITTI Odometry Dataset, Sensor Fusion

Ⅰ. Introduction

Research into drone and autonomous vehicle technology has advanced rapidly in the past few years. The environments in which drones and autonomous vehicles operate are complex and dynamic, and a single sensor cannot guarantee reliable perception in all scenarios.

LiDAR-camera systems are widely used in robotic vision applications such as 3D object detection and navigation. In these applications, the LiDAR sensor generates a sparse point cloud of the surroundings based on distances measured with light. The main advantage of LiDAR is its active illumination, which allows it to operate independently of ambient light. However, its disadvantages are high cost, limited resolution, and a low refresh rate; the Velodyne-64 LiDAR, for example, measures only 64 channels. In addition, LiDAR sensors cannot measure RGB information. RGB cameras are relatively inexpensive and can produce high-resolution color images at high frame rates, but they cannot directly measure depth. In other words, LiDAR sensors generate sparse 3D information, while cameras capture dense 2D information. Fortunately, by fusing LiDAR and camera measurements, most of the shortcomings of LiDAR sensors can be compensated for by RGB cameras, and vice versa. LiDAR-camera systems can then detect and analyze target objects with more advanced and intelligent vision. However, sensor fusion requires the extrinsic parameters of the LiDAR and the camera, so it is essential to calibrate the extrinsic parameters of these sensors[1].

Extrinsic calibration is necessary to align the relative spatial position between the LiDAR and the camera. Since LiDAR and cameras collect data in different ways, it is essential to determine their relative relationship accurately. Extrinsic calibration determines the extrinsic parameters, which are represented as a transformation matrix. This transformation matrix encodes the relative translation and rotation between the LiDAR and the camera and is used to convert the data into a common coordinate system.

One approach to extrinsic calibration is to use a specified target, such as a chessboard pattern or other straight-line features, to extract 2D-3D correspondences between the sensors and estimate the extrinsic parameters of the LiDAR and camera. These methods are called target-based calibration methods. They compare feature points from LiDAR point clouds and camera images to estimate the relative transformation between the two sensors. Most target-based calibration algorithms are time-consuming, tedious, and offline, making them unsuitable for drones or autonomous driving. During drone operation, the positions of the sensors vary slightly over the flight time, and after a period of operation the sensors must be recalibrated to remove errors accumulated due to drift. This necessitates online and targetless calibration methods.

Several targetless methods have been developed over the years that rely on local feature extraction, such as edges and planes. However, these methods still lack robustness in diverse environments. Deep learning has been introduced to enhance the targetless scheme, but deep learning methods may incur high computational costs. One measure of computational cost is the number of trainable parameters in the network, which determines the size of the deep learning model. Many current methods use architectures comprising more than 10 million parameters, such as PSNet[2] and CalibDepth[3]. These methods typically use well-known backbones such as ResNet18, which consists of approximately 11 million parameters. Reducing the model size lowers memory requirements, enabling deployment on the onboard computers of autonomous vehicles.

Therefore, in this study, we propose a new online camera-LiDAR extrinsic calibration method that can calibrate the extrinsic parameters between a camera and a LiDAR sensor without using any specific pattern or calibration object. We focus on developing a lightweight model with a reduced number of parameters. More specifically, our contributions can be summarized as follows:

1) We propose a one-shot extrinsic calibration method comprising multi-dilation feature extraction blocks capable of extracting sparse and fine-grained features of both the RGB camera image and the LiDAR depth image.

2) The proposed method comprises only ~4 million trainable parameters, making it lightweight for autonomous vehicle applications. The proposed architecture consists of two branches of a feature extraction network, each comprising only about 330,000 parameters.

3) By combining one-shot calibration with a lightweight model architecture, excellent real-time performance can be achieved. The method was trained and tested on the KITTI odometry dataset, which covers diverse environments.

Ⅱ. Related Research

Research on extrinsic calibration has primarily concentrated on three categories: target-based methods, targetless methods, and learning-based methods, each with its own approach and advantages.

The first category, target-based methods, relies on objects with specified structures or patterns as multi-sensor co-shooting targets to obtain the extrinsic parameters between point clouds and RGB images[4]. Giacomini et al.[5] developed a simple yet robust method for LiDAR and RGB camera extrinsic calibration that leverages small, widely available markers, such as A3/A4-sized checkerboard patterns, and requires minimal human intervention, reducing dependency on complex or custom calibration targets that often require specialized equipment and manufacturing. Jeong et al.[6] proposed a method that uses perspective projection instead of orthogonal projection onto a checkerboard target to address the problem of noisy LiDAR points while obtaining the extrinsic parameters.

The second category, targetless methods, eliminates the need for predefined targets by estimating the extrinsic parameters from useful information extracted automatically from the surrounding environment[4]. Muñoz-Bañón et al.[7] introduced a method for calibrating the extrinsic parameters of a camera and LiDAR by utilizing local edge features in arbitrary environments, allowing the calculation of 3D-to-2D errors between the data from both sensors; to minimize these errors, they employed the perspective-three-point (P3P) algorithm. Song et al.[8] developed a method called Galibr, which uses ground planes and edge information to calibrate the extrinsic parameters of the camera and LiDAR.

Lastly, learning-based methods leverage advances in machine learning and require no hand-crafted features, instead learning useful information with neural networks to automate and enhance the calibration process[9]. Schneider et al.[10] introduced RegNet as the first calibration method employing a deep convolutional neural network (CNN), extracting and matching features with a network before regressing the calibration parameters. CalibRCNN[11] and CalibDepth[3] incorporate both CNNs and recurrent networks, such as LSTMs, to perform calibration using not only spatial features but also temporal features. Xiao et al.[12] incorporate a transformer architecture into the deep learning model to extract and leverage correlation features with higher contributions. Together, these methodologies represent a comprehensive framework for addressing the challenges of LiDAR-camera calibration.

In contrast to these methods, our approach emphasizes developing a deep learning model tailored for practical, on-the-field implementation. To achieve this, we focus on designing a lightweight deep learning model that meets the requirements for efficiency and real-world usability for calibrating the extrinsic parameters in real time.

Ⅲ. Proposed Methods

In this approach, a deep neural network architecture is proposed to estimate the extrinsic calibration parameters between LiDAR and camera sensors. Unlike conventional methods, the proposed framework introduces a significantly more lightweight prediction model with only ~4 million parameters, making it well suited for real-world applications where computational efficiency is a critical consideration. In this section, the problem definition, the architecture and functionality of each part, and the loss function are elaborated.

3.1 Problem Definition

Extrinsic calibration in a camera-LiDAR system is the determination of the extrinsic parameters, which represent the relative pose or rigid transformation between a LiDAR and a camera placed on the same platform or vehicle. The extrinsic parameters are used to transform the coordinate frame of one sensor to the other, which is essential for sensor fusion. To obtain a 3D LiDAR point's coordinates relative to the camera, the following equation is used:

(1)
$$P_C=[R \mid t] \cdot P_L=T \cdot P_L$$

where $P_C: (X_C, Y_C, Z_C)$ is the camera coordinate system, $P_L: (X_L, Y_L, Z_L)$ is the LiDAR coordinate system, T is the extrinsic parameter matrix, R is the rotation matrix, and t is the translation vector of the LiDAR-camera system.
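As a concrete illustration of Eq. (1), the sketch below applies a rotation and translation to LiDAR points in Python with NumPy; the rotation, translation, and point values are hypothetical examples, not calibration results from this work.

```python
import numpy as np

def lidar_to_camera(points_lidar, R, t):
    """Apply Eq. (1): P_C = R * P_L + t, for an (N, 3) array of LiDAR points."""
    return points_lidar @ R.T + t

# Hypothetical extrinsics: identity rotation, 10 cm offset along the camera X axis.
R = np.eye(3)
t = np.array([0.10, 0.0, 0.0])
P_L = np.array([[5.0, 1.0, 0.2]])      # one LiDAR point, in metres
print(lidar_to_camera(P_L, R, t))      # [[5.1 1.  0.2]]
```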

In this study, the proposed extrinsic calibration method aims to predict the amount of misalignment in the extrinsic parameters. This misalignment is defined as the difference, or shift, from the known initial extrinsic parameters $T_{known}$ to the actual extrinsic parameters $T_{actual}$. Drift in the extrinsic parameters occurs during vehicle operation, mostly caused by vibrations, vehicle motion, and temperature changes, and accumulates over time. The accumulated drift in the extrinsic parameters can be expressed as:

(2)
$$T_{actual}=\Delta T \cdot T_{known}$$

where $\Delta T$ is the extrinsic parameter misalignment caused by this accumulated drift, denoted as the rigid transformation between $T_{known}$ and $T_{actual}$. The proposed method estimates $\Delta T_{predicted}$ to cancel the misalignment term $\Delta T$ and calibrate the extrinsic parameters using the following equation:

(3)
$$T_{actual}=\Delta T_{predicted}^{-1} \cdot \Delta T \cdot T_{known}$$

If the prediction is accurate, $\Delta T_{predicted}^{-1}$ and $\Delta T$ cancel each other, making $T_{known}$ and $T_{actual}$ equal.
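The correction described by Eqs. (2) and (3) can be sketched with 4x4 homogeneous transforms as below; the drift values are hypothetical and, in practice, $\Delta T_{predicted}$ comes from the network output.

```python
import numpy as np

def to_homogeneous(R, t):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 rigid transform."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

T_known = np.eye(4)                                     # last known extrinsics
Delta_T = to_homogeneous(np.eye(3), [0.05, 0.0, 0.01])  # hypothetical accumulated drift
T_actual = Delta_T @ T_known                            # Eq. (2)

Delta_T_pred = Delta_T.copy()                           # a perfect prediction, for illustration
T_corrected = np.linalg.inv(Delta_T_pred) @ T_actual    # Eq. (3)
print(np.allclose(T_corrected, T_known))                # True when the prediction is exact
```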

3.2 Network Architecture

The proposed calibration network is composed of three main components: two branches of a feature extraction network, a feature matching layer, and fully connected layers. As all the parameters within these components are differentiable, the overall network can be trained end-to-end. The overall scheme of the proposed method is depicted in Figure 1.

Fig. 1. The overall architecture of the proposed method.

The first part is the feature extraction network, which consists of two branches designed to extract features from the RGB images and from the depth images obtained by projecting the point cloud data. The feature extraction for each branch is shown in Figure 2. The feature extraction comprises the multi-dilation block, which consists of several depthwise convolution layers that use various dilation rates to allow more efficient multiscale feature extraction.

Fig. 2. The architecture of the feature extraction.

Dilated convolution offers several advantages over regular convolution. By introducing gaps between kernel elements, it expands the receptive field without increasing the number of parameters, allowing the network to capture wide-range dependencies efficiently. Additionally, dilated convolutions help preserve spatial resolution by avoiding excessive downsampling. However, dilated convolution with a large dilation rate may cause gridding artifacts, where information is lost at some pixels or overlaps across pixels[13]. Thus, multiple dilation rates are applied to overcome the gridding artifact problem. Multiple dilation rates also provide multiple resolutions in the feature extraction, allowing a richer representation in the feature maps, so that the sparse features of depth images can be extracted more effectively. The proposed feature extraction also produces three feature-map resolutions that are used for coarse-to-fine feature matching in the next part of the network. These multiresolution feature maps are denoted as $f_1(X)$, $f_2(X)$, and $f_3(X)$, from the smallest resolution to the largest, where X is the input image (RGB image $X_{RGB}$ or depth image $X_{depth}$). The other parts of the feature extraction, the transition block and the stem convolution, are used to reduce the resolution of the feature maps.
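A minimal PyTorch sketch of the multi-dilation idea is shown below: parallel depthwise convolutions at several dilation rates whose outputs are fused by a pointwise convolution. The dilation rates (1, 2, 3), channel width, and fusion layer are illustrative assumptions, not the exact block configuration used in the proposed network.

```python
import torch
import torch.nn as nn

class MultiDilationBlock(nn.Module):
    """Depthwise convolutions at several dilation rates, fused by a 1x1 pointwise conv.
    The rates (1, 2, 3) and channel widths are illustrative assumptions."""
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d,
                      groups=channels, bias=False)          # depthwise, keeps H x W
            for d in dilations
        ])
        self.fuse = nn.Sequential(
            nn.Conv2d(channels * len(dilations), channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# Example: a feature map from a sparse depth image expanded to 32 channels upstream.
feat = torch.randn(1, 32, 64, 256)
print(MultiDilationBlock(32)(feat).shape)   # torch.Size([1, 32, 64, 256])
```

Because each branch is depthwise, adding more dilation rates enlarges the receptive field while keeping the parameter count small, which is what keeps the feature extractor lightweight.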

The second part, the feature matching layer, compares the feature maps obtained from both branches by calculating the 3D correlation cost of corresponding pixels in the RGB and depth feature maps ($x_{rgb}$ and $x_{depth}$). The feature matching part is shown in Figure 3. This part was inspired by the optical flow estimation scheme of PWC-Net[14]. A correlation layer computes the cost volume $C(x_{rgb}, x_{depth})$ as the correlation between the flattened feature vectors of $x_{rgb}$ and $x_{depth}$, which is expressed as follows:

(4)
$$C\left(x_{rgb}\left(p_i\right), x_{depth}\left(p_j\right)\right)=\bigcup_{i, j \in D}\left(x_{rgb}\left(p_i\right)\right)^{T} x_{depth}\left(p_j\right)$$

To reduce complexity, a local cost volume is computed within a small disparity range (d = 9). The resulting cost volume has dimensions $d^2 \times H \times W$, where H and W are the height and width of the feature maps. The cost volume calculation is performed in three stages, one for each of $f_1(X)$, $f_2(X)$, and $f_3(X)$, where feature matching is performed from fine features to coarse features. At every stage, the pixel optical flow is predicted using a CNN layer, which determines the misalignment between the two input feature maps. Feature warping is also performed to propagate the predicted flow from the previous stage to the next one.
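The following sketch illustrates a local correlation layer of the kind described above, assuming a PWC-Net-style cost volume computed over a d × d search window (d = 9); the feature shapes and the per-channel normalization are placeholders rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def local_cost_volume(f_rgb, f_depth, d=9):
    """Correlate each RGB feature pixel with depth features inside a d x d window,
    producing a cost volume of shape (B, d*d, H, W) as described around Eq. (4)."""
    B, C, H, W = f_rgb.shape
    pad = d // 2
    f_depth_pad = F.pad(f_depth, (pad, pad, pad, pad))
    costs = []
    for dy in range(d):
        for dx in range(d):
            shifted = f_depth_pad[:, :, dy:dy + H, dx:dx + W]
            # Per-pixel dot product between the two feature vectors, averaged over channels.
            costs.append((f_rgb * shifted).sum(dim=1, keepdim=True) / C)
    return torch.cat(costs, dim=1)

f_rgb = torch.randn(1, 64, 32, 128)
f_depth = torch.randn(1, 64, 32, 128)
print(local_cost_volume(f_rgb, f_depth).shape)   # torch.Size([1, 81, 32, 128])
```

Restricting the search to a small window keeps the cost volume at $d^2$ channels instead of a full global correlation, which is what keeps this stage inexpensive.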

Fig. 3. Feature matching architecture.

The third part, the fully connected layers, predicts the transformation between the LiDAR and camera sensors from the feature matching result of the RGB and depth branches. The network includes three fully connected layers, followed by two separate branches, each comprising stacked fully connected layers for estimating rotation and translation. The network outputs a 1 × 3 translation vector and a 1 × 4 rotation quaternion.
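A possible shape of this regression head is sketched below: shared fully connected layers followed by separate translation and rotation branches, with the quaternion normalized to unit length. The layer widths and input feature size are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegressionHead(nn.Module):
    """Shared fully connected layers followed by separate translation and rotation
    branches; layer widths are illustrative assumptions."""
    def __init__(self, in_features, hidden=256):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
        )
        self.fc_trans = nn.Linear(hidden, 3)   # 1 x 3 translation vector
        self.fc_rot = nn.Linear(hidden, 4)     # 1 x 4 rotation quaternion

    def forward(self, x):
        h = self.shared(torch.flatten(x, 1))
        t = self.fc_trans(h)
        q = F.normalize(self.fc_rot(h), dim=1)  # unit quaternion
        return t, q

head = RegressionHead(in_features=1024)
t, q = head(torch.randn(2, 1024))
print(t.shape, q.shape)   # torch.Size([2, 3]) torch.Size([2, 4])
```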

3.3 Loss Function

For training, the model takes an input pair consisting of an RGB image and a misaligned depth image. Three types of loss functions are employed: a translation loss ($L_T$), a rotation loss ($L_R$), and a point cloud distance loss ($L_P$).

(5)
$$L=\lambda_T L_T+\lambda_R L_R+\lambda_P L_P$$

where $\lambda_T$, $\lambda_R$, and $\lambda_P$ denote the respective loss weights.

The smooth L1 loss is applied to the translation vector $t_{pred}$ because it provides a smoother gradient near zero owing to its squared term. For the rotation loss $L_R$, the quaternion angular distance is used instead of the Euclidean distance, as quaternions represent directional information and the Euclidean distance fails to capture their differences accurately. The angular distance is defined as follows:

(6)
$$L_R=D_a\left(q_{gt}, q_{pred}\right)$$

where $q_{gt}$ is the ground-truth quaternion, $q_{pred}$ is the predicted quaternion, and $D_a$ is the angular distance between the two quaternions.

Besides the regression losses, we also include a point cloud distance term in the loss function, computed as the L2 distance between the predicted point cloud and the ground-truth point cloud for each batch, averaged over all points. The point cloud loss is expressed as follows:

(7)
$$L_P\left(P_{pred}, P_{gt}\right)=\frac{1}{N} \sum_{i=1}^N\left\|P_{pred, i}-P_{gt, i}\right\|_2$$
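A compact sketch of the combined loss in Eqs. (5)-(7) is given below, using smooth L1 for translation, a quaternion angular distance for rotation, and a mean point-to-point L2 distance for the point clouds; the loss weights shown are placeholders, not the values used in training.

```python
import torch
import torch.nn.functional as F

def quaternion_angular_distance(q_gt, q_pred):
    """Angular distance between unit quaternions, as in Eq. (6)."""
    dot = (q_gt * q_pred).sum(dim=1).abs().clamp(max=1.0)
    return 2.0 * torch.acos(dot)

def calibration_loss(t_pred, t_gt, q_pred, q_gt, P_pred, P_gt,
                     w_t=1.0, w_r=1.0, w_p=1.0):
    """Combined loss of Eq. (5); the weights here are placeholders."""
    L_T = F.smooth_l1_loss(t_pred, t_gt)                 # translation loss
    L_R = quaternion_angular_distance(q_gt, q_pred).mean()  # rotation loss
    L_P = (P_pred - P_gt).norm(dim=-1).mean()            # mean point-to-point L2, Eq. (7)
    return w_t * L_T + w_r * L_R + w_p * L_P
```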

This type of loss function, which combines translation loss, rotation loss, and point cloud distance loss, is widely used in 3D computer vision, including extrinsic calibration. Related works such as RegNet[10], CalibDNN[15], CalibNet[16], and PSNet[2] employ similar combined loss functions. Inspired by these methods, we adopt this approach.

The concept of this loss function originates from the PoseNet[17] paper (for the rotation and translation losses) and the work of Fan et al.[18] (for the point cloud distance loss). These papers discuss the impact of these loss components in detail. According to the PoseNet paper, combining translation and rotation loss is essential, as regressing position and orientation separately leads to poor performance. Additionally, the CalibNet paper highlights that incorporating a point cloud distance loss significantly reduces translation error by ensuring proper alignment in 3D space.

Ⅳ. Experiment and Result

4.1 Dataset

The KITTI odometry dataset[19] was used to train the proposed model. The KITTI dataset comprises diverse driving environments across 22 driving sequences. The dataset is split into a training set, consisting of sequences 01-21 with 39,011 frames, and a validation and test set, consisting of the 4,541 frames of sequence 00. To simulate misalignment in the LiDAR-camera system, the extrinsic parameters are perturbed within a range of 0.25 m and 10°.
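The misalignment simulation can be sketched as below: a random rigid perturbation with translation within the 0.25 m range and per-axis rotation within 10° is applied to the known extrinsics. The uniform per-axis sampling scheme is an assumption about how such a perturbation might be drawn.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def random_misalignment(max_trans=0.25, max_rot_deg=10.0, rng=None):
    """Sample a rigid perturbation Delta_T with translation in [-0.25, 0.25] m and
    rotation in [-10, 10] degrees per axis (sampling scheme is an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    T = np.eye(4)
    T[:3, :3] = R.from_euler("xyz", rng.uniform(-max_rot_deg, max_rot_deg, 3),
                             degrees=True).as_matrix()
    T[:3, 3] = rng.uniform(-max_trans, max_trans, 3)
    return T

T_known = np.eye(4)             # the ground-truth KITTI extrinsics would be used here
T_misaligned = random_misalignment() @ T_known
print(T_misaligned[:3, 3])      # the injected translation offset
```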

4.2 Training Environment

The training process was conducted using an Intel Xeon Silver 4215R CPU and an NVIDIA Titan XP GPU. The model was implemented with Python v3.10 and the PyTorch v2.2 library, utilizing CUDA v11.8 and cuDNN v11.8. The training dataset was processed in batches of 16 over 200 epochs. At the start of training, the learning rate is set to $10^{-4}$ and is reduced by a factor of 0.1 if the validation loss stagnates for 10 consecutive epochs. The Adam optimizer is used to update the model weights during training.
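The optimizer and learning-rate schedule described above can be configured as in the sketch below; the stand-in model and placeholder validation loss are only there to make the snippet self-contained.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)   # stand-in for the calibration network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10)   # reduce LR after 10 stagnant epochs

for epoch in range(200):
    # ... one training pass over batches of 16 would go here ...
    val_loss = 1.0            # placeholder; the real validation loss drives the schedule
    scheduler.step(val_loss)
```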

4.3 Evaluation Metrics

The performance of the proposed method is evaluated by measuring the rotation and translation errors of the predicted extrinsic parameters. The absolute translation error is evaluated as the absolute difference between the predicted and ground-truth translation vectors along each axis, expressed as follows:

(8)
$$E_t=\left|t_{pred}-t_{gt}\right|$$

Absolute rotation errors are evaluated by measuring the Euler angle differences ($E_{yaw}$, $E_{pitch}$, and $E_{roll}$) between the predicted and ground-truth rotations.
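The two metrics can be computed as sketched below, assuming quaternion outputs and SciPy for the Euler-angle conversion; the numeric values in the example are hypothetical.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def translation_error(t_pred, t_gt):
    """Per-axis absolute translation error, Eq. (8)."""
    return np.abs(np.asarray(t_pred) - np.asarray(t_gt))

def rotation_error_euler(q_pred, q_gt):
    """Per-axis Euler-angle (roll, pitch, yaw) error in degrees between two
    quaternions given in (x, y, z, w) order."""
    r_err = R.from_quat(q_pred) * R.from_quat(q_gt).inv()
    return np.abs(r_err.as_euler("xyz", degrees=True))

print(translation_error([0.51, 0.02, 0.10], [0.50, 0.00, 0.10]))          # [0.01 0.02 0.  ]
print(rotation_error_euler([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]))   # [0. 0. 0.]
```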

4.4 Results and Discussions

The performance of the proposed method was assessed using the KITTI odometry dataset with a misalignment range of 0.25 m and 10°. The method employs a one-shot approach, running the model only once per frame, which ensures good real-time performance and efficient data processing and inference. The performance of the proposed method is then compared to several existing methods.

Table 1 presents the performance comparison in terms of translation errors, and Table 2 presents the comparison in terms of rotation errors. All existing methods except CalibDepth[3] were evaluated on the same misalignment range as the proposed method (0.25 m and 10°), whereas CalibDepth was evaluated on a misalignment range of 1.5 m and 20°. For CalibRCNN[11], CALNet[20], and PSNet[2], we include the results reported in the CalibFormer paper[12].

Based on Table 1, the proposed method achieved the best accuracy in all translation error metrics, whereas, based on Table 2, it achieved the best accuracy in one rotation error metric, namely the pitch error. The proposed method shows excellent performance, particularly in translation accuracy, whereas its rotation accuracy still leaves room for improvement while remaining comparable to the existing methods.

Table 1. Translation performance comparison with existing methods (errors in cm).

Method           | Mean | X    | Y    | Z
CalibRCNN [11]   | 5.3  | 6.2  | 4.3  | 5.4
CalibDNN [15]    | 5.07 | 3.8  | 1.8  | 9.6
CALNet [20]      | 3.03 | 3.65 | 1.63 | 3.80
PSNet [2]        | 3.07 | 3.8  | 2.8  | 2.6
CalibFormer [12] | 1.19 | 1.10 | 0.90 | 1.56
CalibDepth [3]   | 1.17 | 1.31 | 1.02 | 1.17
Proposed         | 0.61 | 1.08 | 0.18 | 0.56

Table 2. Rotation performance comparison with existing methods (errors in degrees).

Method           | Mean  | Roll  | Pitch | Yaw
CalibRCNN [11]   | 0.428 | 0.199 | 0.64  | 0.446
CalibDNN [15]    | 0.3   | 0.11  | 0.35  | 0.44
CALNet [20]      | 0.20  | 0.10  | 0.38  | 0.12
PSNet [2]        | 0.15  | 0.06  | 0.26  | 0.12
CalibFormer [12] | 0.141 | 0.076 | 0.259 | 0.087
CalibDepth [3]   | 0.123 | 0.064 | 0.226 | 0.080
Proposed         | 0.154 | 0.182 | 0.139 | 0.141

The slightly lower rotation accuracy might be caused by the smaller model size compared to the existing methods and by the use of a feature extractor trained from scratch, unlike CalibFormer[12], which uses a pre-trained ResNet18. However, the rotation accuracy of the proposed method is still sufficient for practical applications. Additionally, we focus on a small model size, which allows extrinsic calibration to be performed on smaller onboard computers.

The performance of the proposed method is also evaluated qualitatively by comparing the depth image projected onto the RGB camera image before and after calibration. The visual result is presented in Figure 4, which shows the misaligned input, the calibrated image, and the ground truth. Figure 4 shows that the proposed method is capable of aligning the point cloud to match the original position and orientation shown in the ground truth. The point cloud projections of the surrounding objects and landmarks (such as trees, cars, and poles) are well aligned with their corresponding camera images.

Fig. 4. The calibration results visualization of the proposed method.

We also compare the number of parameters with the existing methods, as presented in Table 3. For this comparison, only PSNet[2] and CalibDepth[3] explicitly report the number of parameters in their articles, and PSNet only specifies the size of one branch of its feature extraction part. Other methods, such as CalibRCNN[11], CalibDNN[15], CALNet[20], and CalibFormer[12], mention the use of ResNet-18, which has around ~11 M parameters; these methods are not included in Table 3 because the exact numbers are not stated explicitly in their articles. Comparing the proposed method with both the methods included in Table 3 and the aforementioned excluded methods shows that the proposed method uses a much smaller number of parameters while still outperforming these existing methods, especially in the translation parameters.

Table 3. Number of parameters comparison with existing methods.

Method                                            | Number of parameters
PSNet (one branch of feature extraction only) [2] | 13.5 M
CalibDepth [3]                                    | 43 M
Proposed                                          | 4.03 M

Table 4. Computational cost analysis of the proposed method.

Parameter count | Average inference time (ms) | GPU memory (MB) | Storage (MB)
4.03 M          | 30                          | 900             | 48

In terms of real-time performance, the proposed method completes calibration for each frame in approximately 30 milliseconds, making it well-suited for real-time applications. The computational analysis of the proposed model is included in Table 4.

Ⅴ. Conclusion

This study proposes a deep learning-based targetless LiDAR-camera extrinsic calibration method for autonomous vehicle applications. The proposed method utilizes a novel feature extraction scheme comprising multi-dilation blocks that can extract sparse and fine-grained features of both the RGB camera image and the LiDAR depth image. The proposed method also consists of only 4 million parameters, making it lightweight. The model is executed in a one-shot approach, allowing for efficient real-time applications.

The performance of the proposed method is evaluated under initial misalignments of up to 0.25 m and 10.0°. It outperforms several existing methods, especially in translation errors, achieving translation errors of 1.08 cm, 0.18 cm, and 0.56 cm along the X, Y, and Z axes, respectively, and rotation errors of 0.182°, 0.139°, and 0.141° for roll, pitch, and yaw, respectively. In terms of rotation accuracy, the proposed method still requires further improvement, for example by training on more datasets, refining the training procedure, or incorporating advanced deep-learning architectures such as transformers, for which a lightweight self-attention mechanism would need to be developed.

For future directions, we plan to extend this deep learning-based extrinsic calibration to adverse driving conditions, such as rain, snow, fog, or night driving, by training the model on other available datasets that cover these scenarios. Additionally, this calibration method can be implemented on other types of autonomous vehicles, such as unmanned aerial vehicles (UAVs), where lightweight deep learning models are crucial.

Biography

Sang-Chul Kim

1994 : B.S. degree, Kyungpook National University

2005 : Ph.D. degree (MS-Ph.D Integrated), Oklahoma State University

2006~Present : Professor, School of Computer Science, Kookmin University

<Research Interests> Real-time operating systems, wireless communication, artificial intelligence

[ORCID:0000-0003-2622-0426]

Biography

Yeong-min Jang

1985 : B.S. degree, Kyungpook National University

1987 : M.S. degree, Kyungpook National University

1999 : Ph.D. degree, University of Massachusetts

2002~Present : Professor, School of Electrical Engineering, Kookmin University

<Research Interests> Artificial intelligence, optical wireless communication, visible light communication, internet of things

[ORCID:0000-0002-9963-303X]

References

• 1 P. An, et al., "Geometric calibration for LiDAR-camera system fusing 3D-2D and 3D-3D point correspondences," Opt. Express, vol. 28, no. 2, p. 2122, Jan. 2020. (https://doi.org/10.1364/OE.381176)
• 2 Y. Wu, M. Zhu, and J. Liang, "PSNet: LiDAR and camera registration using parallel subnetworks," IEEE Access, vol. 10, pp. 70553-70561, 2022. (https://doi.org/10.1109/ACCESS.2022.3186974)
• 3 J. Zhu, J. Xue, and P. Zhang, "CalibDepth: Unifying depth map representation for iterative LiDAR-camera online calibration," in 2023 IEEE ICRA, pp. 726-733, May 2023. (https://doi.org/10.1109/ICRA48891.2023.10161575)
• 4 Z. Tan, X. Zhang, S. Teng, L. Wang, and F. Gao, "A review of deep learning-based LiDAR and camera extrinsic calibration," Sensors, vol. 24, no. 12, p. 3878, Jun. 2024. (https://doi.org/10.3390/s24123878)
• 5 E. Giacomini, L. Brizi, L. D. Giammarino, O. Salem, P. Perugini, and G. Grisetti, "Ca2Lib: Simple and accurate LiDAR-RGB calibration using small common markers," Sensors, vol. 24, no. 3, p. 956, Feb. 2024. (https://doi.org/10.3390/s24030956)
• 6 S. Jeong, S. Kim, J. Kim, and M. Kim, "O3 LiDAR-camera calibration: One-shot, one-target and overcoming LiDAR limitations," IEEE Sensors J., vol. 24, no. 11, pp. 18659-18671, 2024. (https://doi.org/10.1109/JSEN.2024.3390170)
• 7 M. A. Muñoz-Bañón, F. A. Candelas, and F. Torres, "Targetless camera-LiDAR calibration in unstructured environments," IEEE Access, vol. 8, pp. 143692-143705, 2020. (https://doi.org/10.1109/ACCESS.2020.3014121)
• 8 W. Song, M. Oh, J. Lee, and H. Myung, "Galibr: Targetless LiDAR-camera extrinsic calibration method via ground plane initialization," in 2024 IEEE Intell. Veh. Symp. (IV), pp. 217-223, 2024. (https://doi.org/10.1109/IV55156.2024.10588429)
• 9 X. Li, Y. Xiao, B. Wang, H. Ren, Y. Zhang, and J. Ji, "Automatic targetless LiDAR-camera calibration: A survey," Artificial Intell. Rev., vol. 56, no. 9, pp. 9949-9987, Nov. 2022. (https://doi.org/10.1007/s10462-022-10317-y)
• 10 N. Schneider, F. Piewak, C. Stiller, and U. Franke, "RegNet: Multimodal sensor registration using deep neural networks," arXiv preprint arXiv:1707.03167, 2017. (https://doi.org/10.48550/ARXIV.1707.03167)
• 11 J. Shi, et al., "CalibRCNN: Calibrating camera and LiDAR by recurrent convolutional neural network and geometric constraints," in 2020 IEEE/RSJ Int. Conf. IROS, pp. 10197-10202, Oct. 2020. (https://doi.org/10.1109/IROS45743.2020.9341147)
• 12 Y. Xiao, Y. Li, C. Meng, X. Li, J. Ji, and Y. Zhang, "CalibFormer: A transformer-based automatic LiDAR-camera calibration network," in 2024 IEEE ICRA, pp. 16714-16720, May 2024. (https://doi.org/10.1109/ICRA57147.2024.10610018)
• 13 F. Yu, V. Koltun, and T. Funkhouser, "Dilated residual networks," in 2017 IEEE CVPR, pp. 636-644, Jul. 2017. (https://doi.org/10.1109/CVPR.2017.75)
• 14 D. Sun, X. Yang, M.-Y. Liu, and J. Kautz, "PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume," arXiv preprint arXiv:1709.02371, Sep. 2017. (https://doi.org/10.1109/CVPR.2018.00780)
• 15 G. Zhao, J. Hu, S. You, and C.-C. J. Kuo, "CalibDNN: Multimodal sensor calibration for perception using deep neural networks," arXiv preprint arXiv:2103.14793, 2021. (https://doi.org/10.48550/ARXIV.2103.14793)
• 16 G. Iyer, R. K. Ram, J. K. Murthy, and K. M. Krishna, "CalibNet: Geometrically supervised extrinsic calibration using 3D spatial transformer networks," in 2018 IEEE/RSJ Int. Conf. IROS, pp. 1110-1117, 2018. (https://doi.org/10.1109/IROS.2018.8593693)
• 17 A. Kendall, M. Grimes, and R. Cipolla, "PoseNet: A convolutional network for real-time 6-DOF camera relocalization," in 2015 IEEE ICCV, pp. 2938-2946, Dec. 2015. (https://doi.org/10.1109/ICCV.2015.336)
• 18 H. Fan, H. Su, and L. Guibas, "A point set generation network for 3D object reconstruction from a single image," arXiv preprint arXiv:1612.00603, Dec. 2016. (https://doi.org/10.1109/CVPR.2017.623)
• 19 A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in CVPR, 2012. (https://doi.org/10.1109/CVPR.2012.6248074)
• 20 H. Shang and B.-J. Hu, "CALNet: LiDAR-camera online calibration with channel attention and liquid time-constant network," in 2022 26th ICPR, pp. 5147-5154, Aug. 2022. (https://doi.org/10.1109/ICPR56361.2022.9956145)

