A Line-Based Unmanned Aerial Vehicle Localization Framework Using Mid-Wave Infrared Observations

Jungwoo Huh♦; Jaekyung Kim*; Kyungjune Lee*; Ingu Park**; Junhyeong Bak**; Byungjin Kang**; Sanghoon Lee°

doi:10.7840/kics.2026.51.1.169

ISSN: 1226-4717

Volume 51, No 1 (2026), pp. 169 - 183

Abstract: With the growing demand for unmanned aerial vehicle (UAV) localization systems that remain robust under challenging weather and lighting conditions, mid-wave infrared (MWIR) imagery has emerged as a promising alternative to conventional visible light (VL) imagery. However, existing VL-based localization methods are not directly applicable to MWIR images due to their fundamentally different visual characteristics. In this paper, we propose a novel UAV localization framework specifically designed for MWIR imagery. Our approach begins by detecting line features from MWIR frames, which are then matched to a predefined 3D line map of geologically meaningful structures. To establish robust correspondences under significant viewpoint changes, we introduce a 2D-3D line matching model trained using a synthetic dataset generated via a novel data augmentation strategy. The UAV’s pose is subsequently refined by minimizing the reprojection error between matched 2D and 3D lines. We validate our method using simulated MWIR flight sequences rendered from a 3D model of Sinjin Island. Unlike state-of-the-art VL-based baselines, which suffer from degraded performance in the MWIR domain, our method achieves more accurate and robust localization. In addition, the proposed framework runs in real time, making it suitable for practical deployment in onboard UAV systems.

Keywords: UAV Localization, Mid-wave Infrared, Line Detection, Line Matching, Data Augmentation, Pose Refinement

Ⅰ. Introduction

The development of low-cost motors and gyroscopic sensors has led to the widespread production of unmanned aerial vehicles (UAVs), which have become essential tools for capturing aerial imagery^[1,2]. A key requirement for such tasks is accurate UAV localization, which involves estimating the vehicle’s pose by integrating various sensor inputs, including GPS data, IMU measurements, and visual features^[3]. While GPS and IMU provide initial pose estimates, visual features are commonly used to refine the UAV’s position and orientation. Many localization methods rely on visual features extracted from visible light (VL) imagery^[4,5]. However, VL sensors are highly sensitive to lighting conditions and have poor transmittance in adverse weather, limiting their effectiveness. To address these limitations, remote sensing imagery, particularly mid-wave infrared (MWIR), offers a promising alternative, as MWIR is less affected by lighting and weather conditions. Despite these advantages, UAV localization using MWIR imagery remains underexplored, and effective methodologies for applying localization techniques in this domain are not well established.

However, existing localization methods designed for VL images^[4,5] cannot be directly applied to UAV localization using MWIR imagery. The primary reason is that MWIR images exhibit significantly different characteristics from VL images, leading to a mismatch in the behavior of conventional visual descriptors. As shown in Fig. 1(a), keypoints extracted using the SIFT algorithm^[6] are much sparser in MWIR images due to their predominantly texture-less regions, whereas VL images yield a dense set of keypoint descriptors. In contrast, Fig. 1(b) shows that line features detected by the EDLines algorithm^[7] are comparable in MWIR and VL images. This is because line detection relies primarily on strong image gradients rather than complex local descriptors, making it more robust to the modality gap between VL and MWIR.

Fig. 1.

Feature detection results between (a) SIFT descriptor and (b) EDLines.

However, lines without associated descriptors are unsuitable for reliable feature matching, which is essential for UAV pose refinement. Most UAV localization methods rely on matching visual features detected in the current frame with those from previous frames. Since the UAV’s viewpoint changes over time, descriptors that are invariant to geometric transformations, such as SIFT, are commonly used to represent visual features. These descriptors allow for robust comparison of features despite changes in viewpoint or scale. In contrast, measuring the similarity between lines is inherently ambiguous without descriptive representations^[8,9]. As illustrated in Fig. 2, existing line distance metrics often result in mismatches under significant rotation or translation. Therefore, for line-based localization to be viable, a more sophisticated matching strategy must be employed.

Fig. 2.

Common mismatches between two sets of lines using the angle distance of ^[9].

Motivated by these challenges, we propose a novel line-based UAV localization framework targeted for MWIR observations, which performs feature matching and pose refinement without relying on descriptors. An overview of the proposed framework is shown in Fig. 3. The UAV’s initial pose, estimated using GPS and IMU measurements, is refined using visual cues from MWIR imagery. We adopt lines as the primary visual features, as line detection proves more reliable than keypoint detection in MWIR images. Given a predefined 3D line map and the initial pose estimate, 3D lines are transformed into the UAV’s local coordinate system. These transformed lines are then matched with the detected 2D lines. Finally, the UAV pose is refined by minimizing the reprojection error between the 3D lines and their corresponding matched 2D observations.

Fig. 3.

Overview of the proposed MWIR imaging-based UAV localization method.

To address the inherent ambiguity in line matching, we design a 2D-3D line matching model capable of establishing correspondences under large rotation and translation changes. Both 2D and 3D lines are represented using Plücker coordinates^[10], which encode line geometry using direction and moment vectors rather than endpoints. This representation is significantly more stable, as endpoint-based coordinates are highly susceptible to detection noise, complicating model training. Our matching model learns the geometric structure of the 3D line map and the relationships between detected 2D lines through self- and cross-attention mechanisms^[11]. The final line correspondences are determined by solving an optimal transport problem using the matching scores predicted by the model.

Training this matching model requires a sufficiently large dataset that captures the underlying geometry of the 3D map and the relationship to its 2D projections. However, collecting labeled data for a specific 3D line map is impractical, as it requires manual annotation of 2D-3D line correspondences. To overcome this limitation, we propose a novel data augmentation strategy that generates synthetic training data in a fully automated manner. Given a 3D line map, our method samples a reference UAV pose and a randomly perturbed pose to simulate realistic viewpoint changes. It then renders 2D projections of the 3D lines from the augmented pose and adds noise to the projected lines to mimic real-world 2D line detections. By training on this synthetic dataset, our model learns to generalize effectively to real-world MWIR scenarios without requiring manually labeled ground-truth matches.

Our main contributions are summarized as follows:

· We propose a novel UAV localization framework targeted for MWIR imagery, using line-based representations for enhanced robustness under challenging sensing conditions.

· We introduce a 2D-3D line matching model that is significantly more robust to large rotational and translational variations than conventional matching methods.

· We develop a data augmentation strategy that automatically generates realistic synthetic training data, enabling effective model training without the need for manually annotated real-world data.

Ⅱ. Methodology

2.1 Notations

Let [TeX:] $$\mathscr{L}_{3 D}=\left\{L_i \mid L_i \in \mathbb{R}^{3 \times 2}, i=1, \ldots, M\right\}$$ denote the set of predefined 3D lines, where each line L_i is represented by its two endpoints in world coordinates, typically corresponding to geological landmarks. Similarly, let [TeX:] $$\mathscr{L}_{2 D}=\left\{l_i \mid l_i \in \mathbb{R}^{2 \times 2}, i=1, \ldots, N\right\}$$ denote the set of 2D lines detected from the current MWIR image. The UAV’s initial pose, estimated from sensor data, is denoted by [TeX:] $$P_i=\left[R_i \mid t_i\right]$$, where [TeX:] $$R_i \in \mathrm{SO}(3)$$ is the 3D rotation matrix and [TeX:] $$t_i \in \mathbb{R}^{3 \times 1}$$ is the translation vector. The refined pose, obtained after line matching and pose optimization, is denoted by [TeX:] $$P_r=\left[R_r \mid t_r\right]$$, with analogous definitions for R_r and t_r. The transformed 3D line map [TeX:] $$\tilde{\mathscr{L}}_{3 D}$$, expressed in the UAV coordinate system defined by P_i, is given by:

(1)

[TeX:] $$\tilde{\mathscr{L}}_{3 D}=\left\{R_i L_i+T_i \mid L_i \in \mathscr{L}_{3 D}, T_i=\left[t_i, t_i\right]\right\} .$$
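Eq. (1) can be sketched in a few lines of NumPy. This is a minimal illustration, assuming lines are stored as an array of shape (M, 3, 2) whose columns are the two endpoints; the function name `transform_lines` and the toy pose are hypothetical and not part of the original framework.

```python
import numpy as np

def transform_lines(R, t, lines):
    """Apply Eq. (1): map each 3x2 endpoint matrix L to R @ L + [t, t].

    lines: array of shape (M, 3, 2); the two columns are the endpoints.
    """
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    lines = np.asarray(lines, dtype=float)
    # matmul broadcasts R over the M lines; t broadcasts over both endpoint columns.
    return R @ lines + t

# Toy pose (illustrative values): 90-degree rotation about z, then a shift along x.
R_i = np.array([[0.0, -1.0, 0.0],
                [1.0,  0.0, 0.0],
                [0.0,  0.0, 1.0]])
t_i = np.array([10.0, 0.0, 0.0])
L = np.array([[[1.0, 2.0],
               [0.0, 0.0],
               [0.0, 0.0]]])  # one line from (1, 0, 0) to (2, 0, 0)
L_tilde = transform_lines(R_i, t_i, L)
```

Here the two endpoints map to (10, 1, 0) and (10, 2, 0), so line identities and endpoint order are preserved by the transform.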

2.2 Line Detection

As various line detection methods exist, it is important to select the approach most suitable for MWIR-based UAV localization. Broadly, line detection methods can be categorized into two groups: (1) image processing-based methods and (2) learning-based methods. We compare two widely used methods from each category: LSD^[12] and EDLines^[7] for image processing-based approaches, and AFM^[13] and M-LSD^[14] for learning-based approaches. The comparison results are shown in Fig. 4.

Fig. 4.

Lines detected in MWIR images using (a) LSD, (b) EDLines, (c) AFM, and (d) M-LSD.

As illustrated in Fig. 4(a) and Fig. 4(b), image processing-based methods such as LSD and EDLines reliably detect prominent lines in MWIR images by leveraging strong image gradients. In contrast, as shown in Fig. 4(c) and Fig. 4(d), learning-based methods such as AFM and M-LSD perform poorly in this domain, exhibiting significant noise and failing to extract meaningful lines. This degradation stems from the fact that these models are trained on VL images and thus learn feature distributions specific to the VL domain. Adapting them to MWIR imagery would require substantial retraining with annotated MWIR data, which is expensive and difficult to obtain. Given these observations, we conclude that image processing-based line detection methods are more suitable for MWIR-based UAV localization, as they require no training and are robust to domain shifts. Among them, we adopt EDLines over LSD, as it produces fewer but more salient lines, reducing clutter and improving subsequent line matching performance.

2.3 Line Matching

As previously shown in Fig. 2, conventional distance-based line matching methods^[8,9] often result in mismatches, as the distance between two lines is inherently ambiguous without known correspondences. However, meaningful matches can still be inferred by leveraging the geometric relationships among neighboring lines. To exploit this relational information, we design a 2D-3D line matching model inspired by the graph attentional architecture^[11], as illustrated in Fig. 5. Our model begins by encoding the detected 2D lines [TeX:] $$\mathscr{L}_{2 D}$$ and the transformed 3D lines [TeX:] $$\tilde{\mathscr{L}}_{3 D}$$ using a line encoder and a midpoint encoder. To match the dimensionality between 2D and 3D lines, we represent the 2D lines in homogeneous coordinates. We then represent each line using Plücker coordinates^[10], which are composed of direction and moment vectors. Given a 3D line [TeX:] $$\tilde{L}_i=\left[\tilde{x}_{1, i}, \tilde{x}_{2, i}\right]$$, where [TeX:] $$\tilde{x}_{1, i}$$ and [TeX:] $$\tilde{x}_{2, i}$$ are its endpoints, the Plücker coordinates of [TeX:] $$\tilde{L}_i$$ are computed as:

(2)

[TeX:] $$\begin{aligned} d & =\tilde{x}_{1, i}-\tilde{x}_{2, i} \\ m & =\tilde{x}_{1, i} \times \tilde{x}_{2, i} \end{aligned}$$

Fig. 5.

Architecture of the proposed 2D-3D line matching model.

where d is the direction vector and m is the moment vector. The moment vector is the normal of the plane formed by the two endpoints and the origin.

For 2D lines, the direction vector is constrained by the image plane, and the moment vector m uniquely defines the line up to scale. Thus, we normalize the moment vector to obtain the final representation [TeX:] $$\bar{m}=m /\|m\|$$. Each 2D or 3D line is then encoded into a D-dimensional feature vector using a multi-layer perceptron (MLP), referred to as the line encoder. To differentiate between collinear lines, which cannot be distinguished solely by the moment vector, we also encode the midpoint [TeX:] $$\tilde{x}_{m, i}=\left(\tilde{x}_{1, i}+\tilde{x}_{2, i}\right) / 2$$ of each line using a separate MLP, referred to as the midpoint encoder. The outputs of the line encoder and midpoint encoder are summed to produce the final line feature [TeX:] $$f_i \in \mathbb{R}^D$$, which is then passed to the graph attentional module.
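The Plücker representation of Eq. (2), the moment normalization for 2D lines, and the midpoint can be illustrated as follows. This is a minimal sketch; the helper names (`plucker`, `line_2d_moment`, `midpoint`) are hypothetical, and the MLP encoders themselves are not shown.

```python
import numpy as np

def plucker(x1, x2):
    """Direction d = x1 - x2 and moment m = x1 x x2, as in Eq. (2)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return x1 - x2, np.cross(x1, x2)

def line_2d_moment(p1, p2):
    """Unit moment of a 2D segment whose endpoints are lifted to homogeneous form."""
    h1 = np.append(np.asarray(p1, dtype=float), 1.0)
    h2 = np.append(np.asarray(p2, dtype=float), 1.0)
    _, m = plucker(h1, h2)
    return m / np.linalg.norm(m)

def midpoint(x1, x2):
    """Midpoint used to tell collinear segments apart."""
    return (np.asarray(x1, dtype=float) + np.asarray(x2, dtype=float)) / 2.0

# Two collinear segments on the image line y = 1 (illustrative values).
m_a = line_2d_moment([0.0, 1.0], [2.0, 1.0])
m_b = line_2d_moment([5.0, 1.0], [9.0, 1.0])
mid_a = midpoint([0.0, 1.0], [2.0, 1.0])
mid_b = midpoint([5.0, 1.0], [9.0, 1.0])

# A 3D example: the moment is orthogonal to the direction.
d3, m3 = plucker([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

The two segments share the same normalized moment but have different midpoints, which is the ambiguity the midpoint encoder is meant to resolve. Note that swapping the endpoint order flips the sign of both vectors, so a consistent endpoint ordering is assumed here.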

The graph attentional module and matching mechanism follow the formulation of ^[11]. Given the encoded 2D and 3D line features [TeX:] $$F^{2 D}=\left[f_1^{2 D}, \ldots, f_{N_{2 D}}^{2 D}\right]$$ and [TeX:] $$F^{3 D}=\left[f_1^{3 D}, \ldots, f_{N_{3 D}}^{3 D}\right]$$, the module computes both self-attention (within each set) and cross-attention (between sets), allowing contextual interactions among and across line features. This attentional block is repeated N_A times to progressively refine the features. To obtain the final matching, we first compute a score matrix which quantifies the similarity between refined 2D and 3D line features. The score matrix [TeX:] $$S \in \mathbb{R}^{\left(N_{2 D}+1\right) \times\left(N_{3 D}+1\right)}$$ is computed using the inner product between the refined features from the final attentional block, augmented with a learnable dustbin score z for unmatched lines:

(3)

[TeX:] $$s_{i, j}= \begin{cases}f_i^{2 D} \cdot f_j^{3 D}, & i \in \mathscr{I}_{2 D}, j \in \mathscr{I}_{3 D} \\ z, & i=N_{2 D}+1 \text { or } j=N_{3 D}+1,\end{cases}$$

where [TeX:] $$\mathscr{I}_{3 D}=\left\{1,2, \ldots, N_{3 D}\right\}$$ and [TeX:] $$\mathscr{I}_{2 D}=\left\{1,2, \ldots, N_{2 D}\right\}$$. To obtain the assignment, we solve an optimal transport problem over S using the Sinkhorn algorithm^[15], which provides a differentiable approximation of the Hungarian algorithm^[16]. This results in a soft assignment matrix [TeX:] $$\tilde{A}$$, which represents the model’s confidence in each 2D-3D line pair.
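The optimal transport step can be sketched with a log-domain Sinkhorn iteration over the dustbin-augmented score matrix. This is an illustration under stated assumptions, not the authors' implementation: rows index 2D lines and columns index 3D lines, each real line carries unit mass, each dustbin may absorb an entire set, and the function names and iteration count are hypothetical.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))

def sinkhorn_with_dustbin(scores, z, n_iters=100):
    """Soft assignment of shape (N2D+1, N3D+1) from an (N2D, N3D) score matrix."""
    n2, n3 = scores.shape
    S = np.full((n2 + 1, n3 + 1), float(z))  # last row/column hold the dustbin score z
    S[:n2, :n3] = scores
    # Assumed marginals: unit mass per real line, dustbins can absorb a whole set.
    log_mu = np.log(np.concatenate([np.ones(n2), [n3]]))
    log_nu = np.log(np.concatenate([np.ones(n3), [n2]]))
    u = np.zeros(n2 + 1)
    v = np.zeros(n3 + 1)
    for _ in range(n_iters):
        u = log_mu - logsumexp(S + v[None, :], axis=1)  # rescale rows
        v = log_nu - logsumexp(S + u[:, None], axis=0)  # rescale columns
    return np.exp(S + u[:, None] + v[None, :])

# Two detected 2D lines, three 3D lines; the third 3D line has no good partner.
scores = np.array([[5.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0]])
A = sinkhorn_with_dustbin(scores, z=1.0)
```

In this toy case the first two 3D lines are assigned to their high-scoring 2D lines, while most of the mass of the third 3D line flows to the dustbin row.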

The entire network is trained end-to-end using a loss function that supervises both matched and unmatched primitives:

(4)

[TeX:] $$\begin{aligned} & L=-\sum_{(i, j) \in \hat{\mathscr{M}}} \log \tilde{A}_{i, j} \\ & -\sum_{i \in \mathscr{I}_{2 D} \backslash \hat{\mathscr{M}}_i} \log \tilde{A}_{i, N_{3 D}+1}-\sum_{j \in \mathscr{I}_{3 D} \backslash \hat{\mathscr{M}}_j} \log \tilde{A}_{N_{2 D}+1, j}, \end{aligned}$$

where [TeX:] $$\hat{\mathscr{M}}$$ denotes the set of ground-truth 2D-3D line correspondences, and [TeX:] $$\hat{\mathscr{M}}_i$$, [TeX:] $$\hat{\mathscr{M}}_j$$ represent the sets of matched indices in 2D and 3D lines, respectively. The first term maximizes the confidence of correct matches, while the remaining terms guide the model to assign unmatched lines to the dustbin entry. Finally, the set of predicted matches is obtained by selecting, for each 3D line, the 2D line with the highest assignment score:

(5)

[TeX:] $$\mathscr{M}=\left\{(i, j) \mid i=\underset{i^{\prime} \in \mathscr{I}_{2 D}}{\arg \max } \tilde{A}_{i^{\prime}, j}, j \in \mathscr{I}_{3 D}\right\} .$$
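The loss of Eq. (4) and the match selection of Eq. (5) can be written directly against a soft assignment matrix. The sketch below assumes 0-based indices, rows for 2D lines, columns for 3D lines, and the dustbins in the last row and column; the function names and the toy matrix are hypothetical.

```python
import numpy as np

def matching_loss(A, gt_matches):
    """Negative log-likelihood of Eq. (4) for an (N2D+1, N3D+1) assignment matrix."""
    n2, n3 = A.shape[0] - 1, A.shape[1] - 1
    matched_2d = {i for i, _ in gt_matches}
    matched_3d = {j for _, j in gt_matches}
    # Ground-truth pairs should receive high confidence.
    loss = -sum(np.log(A[i, j]) for i, j in gt_matches)
    # Unmatched 2D lines should go to the 3D-side dustbin (last column).
    loss -= sum(np.log(A[i, n3]) for i in range(n2) if i not in matched_2d)
    # Unmatched 3D lines should go to the 2D-side dustbin (last row).
    loss -= sum(np.log(A[n2, j]) for j in range(n3) if j not in matched_3d)
    return float(loss)

def predict_matches(A):
    """Eq. (5): for every 3D line j, pick the 2D line i with the highest score."""
    n2, n3 = A.shape[0] - 1, A.shape[1] - 1
    best_i = np.argmax(A[:n2, :n3], axis=0)
    return [(int(i), j) for j, i in enumerate(best_i)]

# Toy assignment with two 2D lines and two 3D lines (illustrative values).
A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.2, 0.7],
              [0.1, 0.7, 0.2]])
loss = matching_loss(A, gt_matches=[(0, 0)])
matches = predict_matches(A)
```

As written, Eq. (5) returns a 2D partner for every 3D line, including the second 3D line here even though its dustbin entry is larger. Filtering such pairs by a confidence threshold would be an additional design choice that the text does not specify.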

2.4 Synthetic Data Augmentation

While careful model design is important, its effectiveness ultimately depends on the availability of appropriate training data. This is particularly critical for our 2D-3D line matching model, which must learn the geometric relationships specific to a given 3D line map. However, collecting such training data is labor-intensive, requiring MWIR image sequences, 2D line detection, and manual annotation of correspondences between 2D and 3D lines. To address this challenge, we propose a novel synthetic data augmentation method that generates realistic 2D-3D line pairs with minimal domain discrepancy, enabling effective model training without manual labeling. The overall augmentation process is shown in Fig. 6.

Fig. 6.

Line matching data augmentation process.

First, a point-of-focus (PoF) [TeX:] $$x_{\mathrm{PoF}}$$ is sampled near the 3D line map [TeX:] $$\mathscr{L}_{3 D}$$. This is achieved by drawing a 3D point from a Gaussian distribution defined by the mean and standard deviation of the midpoints of the lines in [TeX:] $$\mathscr{L}_{3 D}$$. Next, we sample a random distance [TeX:] $$r \in[n, f]$$, azimuth angle [TeX:] $$\theta \in[0,2 \pi]$$, elevation angle [TeX:] $$\phi \in[0, \pi / 2]$$, and twist angle [TeX:] $$\psi \in[-\pi / 3, \pi / 3]$$, where n and f denote the near and far bounds of the sampling radius. The initial UAV pose P_i is then defined as:

(6)

[TeX:] $$\begin{aligned} P_i & =\left[R_i \mid t_i\right] \\ R_i & =R_z(\psi) R_y(\phi) R_x(\theta) \\ t_i & =x_{\mathrm{PoF}}+r R_y(\phi) R_x(\theta) \hat{e}_x, \end{aligned}$$

where [TeX:] $$R_x(\cdot)$$, [TeX:] $$R_y(\cdot)$$, [TeX:] $$R_z(\cdot)$$ are rotation matrices around the respective axes, and [TeX:] $$\hat{e}_x$$ is the unit vector along the x-axis. We can now obtain the transformed 3D line map [TeX:] $$\tilde{\mathscr{L}}_{3 D}$$ given the initial UAV pose using Eq.(1). The next step is to generate synthetic 2D lines to pair with the transformed 3D lines. We first transform the initial pose to a new pose P_r, which will be the target refined pose. The refined pose is derived by adding random rotation and translation as follows:

(7)

[TeX:] $$\begin{aligned} P_r & =\left[R_r \mid t_r\right] \\ R_r & =R_z(\delta \psi) R_y(\delta \phi) R_x(\delta \theta) R_i \\ t_r & =t_i+\delta r_x \hat{e}_x+\delta r_y \hat{e}_y+\delta r_z \hat{e}_z, \end{aligned}$$

where [TeX:] $$\delta \psi, \delta \phi, \delta \theta$$ are random rotation angles about the z, y, and x axes, [TeX:] $$\delta r_x, \delta r_y, \delta r_z$$ are random translation distances along the x, y, and z axes, and [TeX:] $$\hat{e}_y, \hat{e}_z$$ are the unit vectors along the y- and z-axes, respectively.
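The pose sampling of Eq. (6) and the perturbation of Eq. (7) can be sketched as below. The equations are implemented literally; the default ranges are the ones reported in the implementation details of this paper, while the function names, the use of uniform sampling through a NumPy generator, and the zero point-of-focus in the example are assumptions for illustration.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def sample_pose_pair(x_pof, rng, near=10.0, far=200.0, d_ang=np.pi / 12, d_trans=10.0):
    """Sample an initial pose (Eq. 6) and a perturbed target pose (Eq. 7)."""
    e_x = np.array([1.0, 0.0, 0.0])
    r = rng.uniform(near, far)                 # distance to the point of focus
    theta = rng.uniform(0.0, 2.0 * np.pi)      # azimuth
    phi = rng.uniform(0.0, np.pi / 2.0)        # elevation
    psi = rng.uniform(-np.pi / 3.0, np.pi / 3.0)  # twist
    view = rot_y(phi) @ rot_x(theta)
    R_i = rot_z(psi) @ view
    t_i = x_pof + r * (view @ e_x)
    # Random rotational and translational perturbation of the initial pose.
    d_psi, d_phi, d_theta = rng.uniform(-d_ang, d_ang, size=3)
    R_r = rot_z(d_psi) @ rot_y(d_phi) @ rot_x(d_theta) @ R_i
    t_r = t_i + rng.uniform(-d_trans, d_trans, size=3)
    return (R_i, t_i), (R_r, t_r)

rng = np.random.default_rng(0)
(R_i, t_i), (R_r, t_r) = sample_pose_pair(np.zeros(3), rng)
```

Both rotations remain valid elements of SO(3) because they are products of elementary rotations, and the sampled translation stays within the near and far bounds of the point of focus.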

Given the refined pose P_r, we project the 3D line map [TeX:] $$\mathscr{L}_{3 D}$$ onto the image plane at P_r to obtain the projected 2D line map [TeX:] $$\mathscr{L}_{2 D}^{\prime}=\left\{l_i^{\prime} \mid l_i^{\prime} \in \mathbb{R}^{2 \times 2}, i=1, \ldots, N_{3 D}\right\}$$. Each line [TeX:] $$L_i=\left[x_{1, i}, x_{2, i}\right] \in \mathscr{L}_{3 D}$$ is projected by transforming its endpoints and applying the pinhole camera model:

(8)

[TeX:] $$l_i^{\prime}=\left[\pi\left(K\left(R_r x_{1, i}+t_r\right)\right), \pi\left(K\left(R_r x_{2, i}+t_r\right)\right)\right],$$

where K is the camera intrinsics and [TeX:] $$\pi(\cdot)$$ denotes the perspective projection function:

(9)

[TeX:] $$\pi\left(\left[\begin{array}{l} x \\ y \\ z \end{array}\right]\right)=\left[\begin{array}{l} x / z \\ y / z \end{array}\right].$$
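A minimal sketch of the projection in Eq. (8) and Eq. (9) follows. The intrinsic matrix uses the focal length and principal point reported for the simulated sensor in Section 3.1; the function name `project_line`, the identity pose, and the example endpoints are hypothetical.

```python
import numpy as np

def project_line(K, R, t, line_3d):
    """Eq. (8)-(9): project a 3x2 endpoint matrix to a 2x2 pixel-endpoint matrix."""
    cam = K @ (R @ line_3d + t.reshape(3, 1))  # 3x2, one homogeneous pixel per column
    return cam[:2] / cam[2]                    # perspective division by depth

# Intrinsics from Section 3.1: f_x = f_y = 1140, principal point (320, 256).
K = np.array([[1140.0,    0.0, 320.0],
              [   0.0, 1140.0, 256.0],
              [   0.0,    0.0,   1.0]])
R_r = np.eye(3)
t_r = np.zeros(3)
# One 3D line with endpoints (0, 0, 100) and (10, 5, 100), stored column-wise.
L = np.array([[0.0, 10.0],
              [0.0,  5.0],
              [100.0, 100.0]])
l_proj = project_line(K, R_r, t_r, L)
```

The first endpoint lies on the optical axis and lands on the principal point (320, 256); the second lands at (434, 313). The sketch does not clip segments that fall behind the camera or outside the image; how such cases are handled is not stated in the text.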

To the projected 2D line map [TeX:] $$\mathscr{L}_{2 D}^{\prime}$$, we apply the kinds of line noise that are frequently observed in real 2D line detection, as in Fig. 4. Each 2D line [TeX:] $$l_{2 D}^{\prime} \in \mathscr{L}_{2 D}^{\prime}$$ is randomly eliminated, split, or has its endpoints perturbed.
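The three noise operations (elimination, splitting, endpoint perturbation) might be sketched as follows. The text does not give the probabilities or noise magnitudes, so `p_drop`, `p_split`, `sigma`, the split-point range, and the function name are all assumed values chosen only for illustration.

```python
import numpy as np

def augment_segments(segs, rng, p_drop=0.2, p_split=0.2, sigma=1.0):
    """Randomly drop, split, and jitter 2x2 pixel segments (columns are endpoints)."""
    out = []
    for seg in segs:
        if rng.random() < p_drop:          # simulate a missed detection
            continue
        pieces = [seg]
        if rng.random() < p_split:         # simulate a fragmented detection
            s = rng.uniform(0.2, 0.8)      # assumed range for the cut position
            cut = (1.0 - s) * seg[:, 0] + s * seg[:, 1]
            pieces = [np.column_stack([seg[:, 0], cut]),
                      np.column_stack([cut, seg[:, 1]])]
        for piece in pieces:               # simulate endpoint localization noise
            out.append(piece + rng.normal(0.0, sigma, size=(2, 2)))
    return out

rng = np.random.default_rng(0)
segs = [np.array([[0.0, 10.0],
                  [0.0,  0.0]])]
clean = augment_segments(segs, rng, p_drop=0.0, p_split=0.0, sigma=0.0)
split = augment_segments(segs, rng, p_drop=0.0, p_split=1.0, sigma=0.0)
dropped = augment_segments(segs, rng, p_drop=1.0)
```

Because every output segment originates from a known projected 3D line, the ground-truth correspondence can be carried along with each piece; tracking that index is omitted here for brevity.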

After the target lines are augmented, additional noise lines are introduced to simulate realistic detection artifacts. Specifically, lines are sampled from two sources: a random noise map [TeX:] $$\mathscr{L}_R$$ and a parallel noise map [TeX:] $$\mathscr{L}_P$$. The random noise map contains lines generated near the 3D line map [TeX:] $$\mathscr{L}_{3 D}$$, while the parallel noise map includes lines that are spatially aligned and oriented parallel to lines in [TeX:] $$\mathscr{L}_{3 D}$$. Using the same projection process applied to [TeX:] $$\mathscr{L}_{3 D}$$, both [TeX:] $$\mathscr{L}_R$$ and [TeX:] $$\mathscr{L}_P$$ are projected to the refined pose P_r using Eq.(8), producing the projected sets [TeX:] $$\mathscr{L}_R^{\prime}$$ and [TeX:] $$\mathscr{L}_P^{\prime}$$. These are then added to the augmented target lines [TeX:] $$\mathscr{L}_{2 D}^{\prime}$$ to generate the final synthetic 2D line detection set [TeX:] $$\tilde{\mathscr{L}}_{2 D}$$:

(10)

[TeX:] $$\tilde{\mathscr{L}}_{2 D}=\mathscr{L}_{2 D}^{\prime} \cup \mathscr{L}_R^{\prime} \cup \mathscr{L}_P^{\prime}$$

Since both [TeX:] $$\tilde{\mathscr{L}}_{3 D}$$ and [TeX:] $$\tilde{\mathscr{L}}_{2 D}$$ are generated from the same underlying [TeX:] $$\mathscr{L}_{3 D}$$, the ground-truth correspondences can be automatically derived without the need for manual labeling. An example of generated synthetic lines is shown in Fig. 7.

Fig. 7.

Example visualization of the synthetic line detection generation process.

2.5 Pose Refinement

Finally, given the predicted matches [TeX:] $$\mathscr{M}$$ between the detected 2D lines [TeX:] $$\tilde{\mathscr{L}}_{2 D}$$ and transformed 3D lines [TeX:] $$\tilde{\mathscr{L}}_{3 D}$$, we can now estimate the refined pose P_r of the UAV from the initial pose P_i. Since the transformed lines [TeX:] $$\tilde{\mathscr{L}}_{3 D}$$ originate from the 3D line map [TeX:] $$\mathscr{L}_{3 D}$$, their identities are preserved, allowing us to directly associate [TeX:] $$\tilde{\mathscr{L}}_{2 D}$$ with the corresponding lines in [TeX:] $$\mathscr{L}_{3 D}$$. The refined pose P_r is obtained by minimizing the following energy function:

(11)

[TeX:] $$E\left(P_r\right)=\sum_{(i, j) \in \mathscr{M}}\left\|l_i^{\prime}-\tilde{l}_j\right\|^2+w\left\|m_i^{\prime}-\tilde{m}_j\right\|^2$$

where [TeX:] $$l_i^{\prime}$$ and [TeX:] $$m_i^{\prime}$$ represent the direction and moment vectors of the detected 2D line i, and [TeX:] $$\tilde{l}_j$$ and [TeX:] $$\tilde{m}_j$$ are the corresponding direction and moment vectors of the projected 3D line j. The scalar w is a weighting factor that controls the relative contribution of the moment term. It is evident from Eq.(8) that [TeX:] $$l_i^{\prime}$$ is a function of the refined pose P_r. Given the endpoints of [TeX:] $$l_i^{\prime}$$, the corresponding moment vector [TeX:] $$m_i^{\prime}$$ can be computed using Eq.(2). Since [TeX:] $$l_i^{\prime}$$ and [TeX:] $$m_i^{\prime}$$ are differentiable with respect to P_r, the energy function E(P_r) can be efficiently minimized using gradient-based non-linear optimization methods.
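The energy of Eq. (11) can be evaluated as the squared norm of a stacked residual vector, which is also the form a Levenberg-Marquardt solver expects. The sketch below is an illustration under assumptions: 2D segments are lifted to homogeneous coordinates before computing direction and moment as in Eq. (2), matched segments are assumed to share a consistent endpoint order, and all function names and example values are hypothetical. The optimizer itself is not shown.

```python
import numpy as np

def project_line(K, R, t, line_3d):
    """Pinhole projection of a 3x2 endpoint matrix, as in Eq. (8)-(9)."""
    cam = K @ (R @ line_3d + t.reshape(3, 1))
    return cam[:2] / cam[2]

def plucker_2d(seg):
    """Direction and moment of a 2x2 pixel segment lifted to homogeneous coordinates."""
    h = np.vstack([seg, np.ones((1, 2))])
    return h[:, 0] - h[:, 1], np.cross(h[:, 0], h[:, 1])

def residuals(K, R, t, lines_3d, segs_2d, matches, w=1e-5):
    """Stacked residuals whose squared norm equals the energy of Eq. (11)."""
    res = []
    for i, j in matches:  # i indexes 2D detections, j indexes 3D map lines
        d_obs, m_obs = plucker_2d(segs_2d[i])
        d_prj, m_prj = plucker_2d(project_line(K, R, t, lines_3d[j]))
        res.append(d_prj - d_obs)
        res.append(np.sqrt(w) * (m_prj - m_obs))  # sqrt(w) so that squaring yields w
    return np.concatenate(res)

def energy(K, R, t, lines_3d, segs_2d, matches, w=1e-5):
    r = residuals(K, R, t, lines_3d, segs_2d, matches, w)
    return float(r @ r)

K = np.array([[1140.0, 0.0, 320.0], [0.0, 1140.0, 256.0], [0.0, 0.0, 1.0]])
lines_3d = [np.array([[-10.0, 10.0], [0.0, 5.0], [100.0, 120.0]]),
            np.array([[5.0, 5.0], [-8.0, 8.0], [90.0, 110.0]])]
R_true, t_true = np.eye(3), np.zeros(3)
segs_2d = [project_line(K, R_true, t_true, L) for L in lines_3d]  # noise-free detections
matches = [(0, 0), (1, 1)]
e_true = energy(K, R_true, t_true, lines_3d, segs_2d, matches)
e_shifted = energy(K, R_true, np.array([1.0, 0.0, 0.0]), lines_3d, segs_2d, matches)
```

With noise-free detections the energy vanishes at the true pose and grows once the translation is shifted. The small weight w compensates for the moment components being products of pixel coordinates and therefore much larger in magnitude than the direction components.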

2.6 Implementation Details

For the 2D-3D line matching model, we use a feature dimension of D = 128 and apply a total of N_A = 3 graph attention blocks. To generate the synthetic training dataset, we sample 10,000 distinct pairs of initial and refined UAV poses (P_i, P_r) following the formulations in Eq.(6) and Eq.(7). The random distance r used to determine the initial pose is sampled uniformly from the range [n, f] with bounds n = 10 and f = 200. Azimuth θ, elevation ϕ, and twist ψ angles are also sampled uniformly within their respective domains. For generating the refined pose P_r, random rotational perturbations δψ, δϕ, δθ are sampled from the range [−π/12, π/12], and translational perturbations δr_x, δr_y, δr_z are sampled from [−10, 10]. To simulate realistic noise in 2D line detection, we generate 400 random and parallel lines around the 3D line map, ensuring no intersection with the existing lines. The 2D-3D line matching model is trained for 200 epochs using the Adam optimizer^[17], with a learning rate of 5×10^-4 and a batch size of 64. For pose refinement, the weighting factor in the energy function is set to w = 10^-5, and the optimization is performed using the Levenberg-Marquardt algorithm^[18].

Ⅲ. Experiments

3.1 Dataset

Fig. 8.

Our simulated UAV flight dataset: (a) simulated flight sequences, (b) annotated 3D line map. Rendered images of Dive (Top) and Around (Bottom) sequences in (c) MWIR, (d) VL, (e) night-time VL, (f) foggy VL.

To evaluate the effectiveness of our MWIR-based UAV localization framework, we constructed a simulated UAV flight dataset using the physics-based sensor simulator provided by OKTAL-SE^[19], which is widely adopted for generating realistic optical imagery^[20,21]. The simulation was performed over a detailed 3D model of Sinjin Island, with rendered images generated in both the MWIR and VL bands. We defined two distinct UAV flight patterns, Around and Dive, as illustrated in Fig. 8(a). These patterns are inspired by widely adopted flight configurations in benchmarks such as the EuRoC MAV Dataset^[22], which provide both easy and hard trajectories to evaluate robustness. In our dataset, the Dive sequence represents a relatively straightforward trajectory with limited viewpoint variation, whereas the Around sequence introduces significant angular and positional diversity, simulating more challenging conditions for localization.

The sensors used in both simulations have a resolution of 640×512 pixels. The intrinsic parameters are set to f_x = f_y = 1140, with a principal point at (p_x, p_y) = (320, 256), while assuming negligible camera distortion. Each sequence lasts 15 seconds and is rendered at 30 frames per second, resulting in a total of 450 frames per sequence. All flight paths were simulated around the annotated 3D line map of Sinjin Island, which consists of geologically meaningful structural lines as shown in Fig. 8(b). To simulate realistic flight conditions, we added random rotational and translational perturbations to the smooth UAV trajectories, following the formulation in Eq.(7). During each simulation, both the UAV pose and the corresponding images were recorded for ground-truth reference. Importantly, our 2D-3D line matching model is trained exclusively on synthetic data generated from the 3D line map of Sinjin Island, without using any annotated MWIR or VL image sequences. Sample MWIR and VL renderings from the Dive and Around sequences are shown in Fig. 8(c) and (d), respectively.

To further validate the advantages of MWIR imagery over VL imagery under adverse conditions, we additionally rendered VL versions of both the Dive and Around sequences under two challenging scenarios. The first scenario simulates night-time conditions, where low-light settings were applied to reduce overall image brightness and contrast, as shown in Fig. 8(e). The second scenario depicts foggy weather, in which visibility was degraded by overlaying a semitransparent gray veil across the images, as illustrated in Fig. 8(f). These conditions are designed to reflect real-world challenges where VL-based localization methods often fail, thereby emphasizing the robustness of MWIR-based localization.

3.2 Evaluation Metrics

To assess the performance of our MWIR-based UAV localization framework, we adopt standard evaluation metrics that are widely used in the visual localization and SLAM literature. Specifically, we evaluate translation and rotation error following established protocols in ^[4,5], and 2D-3D line matching precision, as commonly used in the feature matching literature^[11]. The translation error E_t measures the accuracy of the estimated UAV position relative to the ground-truth trajectory. Given the predicted translation [TeX:] $$\hat{t}_i$$ and ground-truth translation t_i at frame i, the error is defined as:

(12)

[TeX:] $$E_t=\frac{1}{N} \sum_{i=1}^N\left\|\hat{t}_i-t_i\right\|,$$

where N is the total number of frames, and [TeX:] $$\|\cdot\|$$ denotes the Euclidean norm. The rotation error E_R quantifies the angular deviation between the estimated rotation matrix [TeX:] $$\hat{R}_i$$ and the ground-truth rotation matrix R_i, and is computed as:

(13)

[TeX:] $$E_R=\frac{1}{N} \sum_{i=1}^N \arccos \left(\frac{\operatorname{Tr}\left(R_i^{\top} \hat{R}_i\right)-1}{2}\right) \cdot \frac{180}{\pi},$$

where Tr(·) denotes the matrix trace. This yields the mean angular error in degrees across all frames. Finally, precision is used to evaluate the accuracy of the predicted 2D-3D line correspondences. A predicted match is considered a true positive (TP) if it correctly matches the ground-truth line pair, and a false positive (FP) otherwise. The precision is then defined as:

(14)

[TeX:] $$\text { Precision }=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}},$$

which reflects the proportion of correct matches among all predicted matches.
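The three metrics of Eq. (12) to Eq. (14) can be computed as in the following sketch. The function names are hypothetical, and clipping the cosine to [-1, 1] is a numerical safeguard added here rather than part of Eq. (13). The precision example uses the TP and FP counts reported for our method on MWIR in Table 2.

```python
import numpy as np

def translation_error(t_est, t_gt):
    """Eq. (12): mean Euclidean distance between (N, 3) position arrays."""
    diff = np.asarray(t_est, dtype=float) - np.asarray(t_gt, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

def rotation_error_deg(R_est, R_gt):
    """Eq. (13): mean geodesic angle in degrees between (N, 3, 3) rotation arrays."""
    R_est = np.asarray(R_est, dtype=float)
    R_gt = np.asarray(R_gt, dtype=float)
    trace = np.einsum("nij,nij->n", R_gt, R_est)  # Tr(R_gt^T R_est) per frame
    cos_angle = np.clip((trace - 1.0) / 2.0, -1.0, 1.0)  # guard against round-off
    return float(np.mean(np.degrees(np.arccos(cos_angle))))

def precision(tp, fp):
    """Eq. (14): fraction of predicted matches that are correct."""
    return tp / (tp + fp)

# Two frames: a perfect estimate and one that is off by 5 m and 90 degrees.
rot_z_90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
e_t = translation_error([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]], np.zeros((2, 3)))
e_r = rotation_error_deg([np.eye(3), rot_z_90], [np.eye(3), np.eye(3)])

p_total = precision(1679, 375)  # MWIR, Total: approximately 0.817
p_dive = precision(798, 266)    # MWIR, Dive: 0.75
```

In the toy example the mean translation error is 2.5 m and the mean rotation error is 45°, since one of the two frames is exact.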

3.3 Quantitative Evaluation

3.3.1 Effect of Visual Refinement

We first compared our method against baseline sensor configurations: raw GPS, raw IMU, GPS+IMU fusion, and our full pipeline, GPS+IMU with visual refinement. GPS is modeled as ground-truth positions corrupted with zero-mean Gaussian noise to approximate consumer-grade accuracy (20 m mean position error)^[23]. Note that raw GPS provides position only.

The IMU baseline synthesizes gyro/accel signals with white noise and bias random walks and recovers state via strapdown mechanization^[24]. The GPS+IMU baseline uses a loosely coupled EKF with position-only updates^[25], while our method further refines the fused pose via 2D-3D line correspondences. As shown in Table 1, GPS is noisy and provides no orientation, while IMU alone drifts and yields large translation errors. The GPS+IMU fusion approach reduces these errors but remains noise-limited. With visual refinement, our method achieves the lowest translation and rotation errors, demonstrating the effectiveness of the proposed framework.

Table 1.

Quantitative localization comparison between baseline sensor configurations.

Method	Dive		Around		Total
Method	E_t(m)	E_R(°)	E_t(m)	E_R(°)	E_t(m)	E_R(°)
GPS	15.6	-	21.3	-	18.5	-
IMU	88.0	13.8	51.4	9.0	69.7	11.4
GPS+IMU	57.0	13.8	25.6	9.0	41.3	11.4
Ours	2.8	1.7	7.0	3.3	4.9	2.5

3.3.2 Comparison with VL-based Methods

We evaluated our method on both the Around and Dive sequences by measuring trajectory estimation accuracy and 2D-3D line matching precision. To assess the effectiveness of our MWIR-based localization framework, we compared it against two recent state-of-the-art methods developed for VL imagery, ORB-SLAM3^[4] and DPVO^[5], on both VL and MWIR image sequences. ORB-SLAM3 is a classical simultaneous localization and mapping (SLAM) system that relies on handcrafted feature descriptors for keypoint detection and matching. DPVO is a deep learning-based visual odometry method that estimates camera motion by learning to match local image patches across frames for consistent frame-to-frame tracking. The trajectories for both ORB-SLAM3 and DPVO were estimated directly from the simulated image sequences. Since the estimated trajectories may differ from the groundtruth in scale, orientation, and translation, we aligned them using the Umeyama similarity transformation^[26] to enable fair comparison. As both ORB-SLAM3 and DPVO do not perform 2D-3D line matching, we report only pose estimation results for these methods. The full comparison is summarized in Table 2.
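The similarity alignment used before computing errors can be sketched with the closed-form solution commonly attributed to Umeyama. This is an illustrative NumPy version with a hypothetical function name and synthetic data, not the evaluation code used for the reported results.

```python
import numpy as np

def umeyama(src, dst):
    """Least-squares similarity transform with dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding trajectory positions.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)               # cross-covariance between the two sets
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                       # avoid returning a reflection
    R = U @ S @ Vt
    var_s = np.sum(xs ** 2) / len(src)       # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * (R @ mu_s)
    return s, R, t

# Synthetic check: recover a known scale, rotation, and translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))
c, sn = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
dst = 2.0 * src @ R_true.T + t_true
s, R, t = umeyama(src, dst)
aligned = s * src @ R.T + t
```

Applying the recovered transform to an estimated trajectory removes the global scale, rotation, and offset ambiguity of monocular SLAM and visual odometry outputs, so the remaining error reflects trajectory shape rather than the choice of reference frame.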

Table 2.

Quantitative localization comparison between VL-based methods on our simulated UAV flight sequences.

Method	Input	Dive					Around					Total
Method	Input	E_t(m)	E_R(°)	TP	FP	Precision	E_t(m)	E_R(°)	TP	FP	Precision	E_t(m)	E_R(°)	TP	FP	Precision
ORBSLAM3[4]	VL	8.6	2.1	-	-	-	15.8	6.2	-	-	-	12.2	4.1	-	-	-
	VL-Night	94.7	11.8	-	-	-	178.9	37.7	-	-	-	136.8	24.8	-	-	-
	VL_Foggy	59.3	9.1	-	-	-	138.4	29.4	-	-	-	98.9	19.3	-	-	-
	MWIR	54.2	3.6	-	-	-	87.4	26.6	-	-	-	70.8	15.1	-	-	-
DPVO[5]	VL	6.5	2.2	-	-	-	13.7	5.5	-	-	-	10.1	3.9	-	-	-
	VL-Night	79.1	9.8	-	-	-	158.3	31.1	-	-	-	118.7	20.5	-	-	-
	VL-Foggy	54.3	8.1	-	-	-	129.1	25.6	-	-	-	91.7	16.9	-	-	-
	MWIR	22.7	4.1	-	-	-	81.5	18.2	-	-	-	52.1	11.3	-	-	-
Ours	VL	2.7	1.6	823	233	0.779	6.6	3.1	903	113	0.889	4.7	2.4	1726	346	0.833
	VL-Night	75.2	9.0	437	555	0.441	150.3	29.0	482	516	0.483	112.7	19.6	919	1071	0.462
	VL-Foggy	52.8	7.8	517	596	0.464	120.5	24.0	561	579	0.492	86.6	16.1	1078	1175	0.479
	MWIR	2.8	1.7	798	266	0.750	7.0	3.3	881	129	0.872	4.9	2.5	1679	375	0.817

As summarized in Table 2, the VL-based baselines behave as expected: they are reasonable on clean VL (total E_t = 12.2 m for ORB-SLAM3 and 10.1 m for DPVO) but degrade severely under VL-Night/Foggy (total E_t ≈ 99-137 m for ORB-SLAM3; ≈ 92-119 m for DPVO). When run on MWIR, both also deteriorate (ORB-SLAM3: 70.8 m, 15.1°; DPVO: 52.1 m, 11.3°), reflecting sparse keypoint detections in MWIR and the modality gap faced by VL-trained patch trackers. In contrast, our method maintains low error on both clean modalities, 4.7 m/2.4° on VL and 4.9 m/2.5° on MWIR. Under VL-Night/Foggy, our method also degrades, with total errors of 86.6-112.7 m and 16.1°-19.6° in Table 2, although these remain slightly below those of both baselines. The gains on MWIR come from robust 2D-3D line matching, which is less sensitive to illumination and texture; the model achieves approximately 0.82 (82%) precision on MWIR overall (0.87 on Around), enabling accurate pose refinement.

3.4 Qualitative Evaluation

3.4.1 Trajectory Visualization

We visualize the trajectories estimated by the baseline sensor configurations and by our framework in Fig. 9(a) and (b). As expected, the IMU-only baseline suffers pronounced drift, while GPS is reasonably accurate in position but does not provide orientation information. The loosely coupled GPS+IMU baseline still drifts as inertial errors accumulate, highlighting the limitation of localization in the absence of visual refinement. We also visualize the trajectories estimated by the VL-based methods on MWIR input in Fig. 9(c) and (d). In Fig. 9(c), the Dive sequence involves less camera rotation than Around, resulting in relatively better performance for both ORB-SLAM3 and DPVO. However, the estimated trajectories still exhibit instability due to poor descriptor and patch matching in the MWIR domain. As shown in Fig. 9(d), during the Around sequence, both ORB-SLAM3 and DPVO deviate significantly from the ground-truth trajectory from the beginning of the sequence. This degradation is again primarily caused by the modality gap between VL and MWIR imagery, which leads to unreliable descriptor and patch matching. In contrast, our method produces a more accurate and stable trajectory in both sequences, demonstrating its robustness and reliability under MWIR imaging conditions.

Fig. 9.

Qualitative visualizations of trajectories. Comparison with baseline sensor configurations on (a) Dive and (b) Around sequences. Comparison with conventional methods using MWIR input on (c) Dive and (d) Around sequences.

3.4.2 Matching and Refinement Visualization

We also visualize the 2D-3D line matching results between the transformed line map $\tilde{\mathscr{L}}_{3D}$ and the detected lines $\tilde{\mathscr{L}}_{2D}$. Fig. 10(a) presents qualitative results for both the Around and Dive sequences. In both cases, our matching model successfully establishes correct correspondences, even under substantial viewpoint changes involving significant rotation and translation. Additionally, as shown in the rightmost matching visualization, the model remains robust to various forms of line detection noise such as line elimination, splitting, and endpoint perturbation, which frequently occur when the UAV approaches the target lines. Despite these challenges, our model consistently identifies correct matches, demonstrating strong robustness to detection artifacts. To further assess the effectiveness of pose refinement, we visualize the projections of $\tilde{\mathscr{L}}_{3D}$ before and after refinement in Fig. 10(b). In these visualizations, the red lines represent the projected 3D lines, while the green lines indicate the matched 2D detections. As shown in the second row of Fig. 10(b), the alignment between projected and detected lines significantly improves after pose refinement, confirming the accuracy of the estimated pose.
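To make the before/after comparison in Fig. 10(b) concrete, the sketch below projects the endpoints of a 3D line with a pinhole camera and measures their perpendicular pixel distance to a detected 2D line. The intrinsics, the pose convention (x_cam = R x_world + t), and this particular endpoint-to-line residual are assumptions chosen for illustration; the reprojection error actually minimized in the paper is defined in Section 2.5 and may differ.

```python
import numpy as np

def project(points_w, R, t, K):
    """Project Nx3 world points to Nx2 pixels (pinhole model, x_cam = R x_w + t).
    Points are assumed to lie in front of the camera."""
    pts_c = np.asarray(points_w, dtype=float) @ R.T + t
    uv = pts_c @ K.T
    return uv[:, :2] / uv[:, 2:3]

def line_reprojection_error(line3d, line2d, R, t, K):
    """Mean perpendicular pixel distance from the projected 3D endpoints
    to the infinite 2D line through the detected segment."""
    p, q = np.asarray(line2d, dtype=float)
    direction = q - p
    normal = np.array([-direction[1], direction[0]]) / np.linalg.norm(direction)
    proj = project(line3d, R, t, K)
    return float(np.abs((proj - p) @ normal).mean())

# Hypothetical camera and a single matched line pair.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
line3d = [[-1.0, 0.0, 5.0], [1.0, 0.0, 5.0]]
line2d = [[200.0, 242.0], [440.0, 242.0]]

coarse = line_reprojection_error(line3d, line2d, np.eye(3), np.array([0.0, 0.1, 0.0]), K)
refined = line_reprojection_error(line3d, line2d, np.eye(3), np.zeros(3), K)
print(coarse, refined)  # the residual shrinks as the pose approaches the true one
```

A refinement step would sum such residuals over all matched pairs and minimize them with respect to the pose, so false matches contribute large, misleading residuals.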

Fig. 10.

Qualitative visualizations of (a) 2D-3D line matching and (b) reprojection of 3D lines after pose refinement.

3.5 Ablation Studies

We performed ablation studies to investigate the impact of both the synthetic data augmentation strategy and key parameters of the 2D-3D line matching model. To evaluate the effectiveness of the augmentation strategy, we compared 2D-3D line matching accuracy under four settings: no augmentation, target line augmentation only, random line augmentation only, and both augmentations combined. The results, summarized in Table 3, show that without random line augmentation the model achieves only 4-7% precision, whereas adding random line augmentation raises precision to about 70%. Target line augmentation alone does not significantly improve performance, but when combined with random line augmentation it reduces false positives from 618 to 375 and improves precision by about 11 percentage points (from 0.7054 to 0.8174). This is particularly beneficial because reducing false matches directly contributes to more accurate pose refinement.

Table 3.

Ablation on data augmentation strategy.

Method	TP	FP	Precision
No Aug.	90	1921	0.0448
w/ Target Aug.	143	1824	0.0727
w/ Random Aug.	1480	618	0.7054
w/ Both	1679	375	0.8174
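The two augmentation types compared in Table 3 can be pictured with a small sketch. This is not the authors' procedure, which is formulated in Section 2.4. It assumes that target line augmentation perturbs the projected map lines with the detection noises named in Section 3.4.2 (elimination, splitting, endpoint perturbation) and that random line augmentation adds distractor segments with no 3D counterpart. All function names, probabilities, and noise magnitudes are hypothetical.

```python
import random

def augment_target_lines(lines, rng, p_drop=0.1, p_split=0.2, jitter=2.0):
    """Perturb projected target lines; returns (segment, source_index) pairs."""
    out = []
    for idx, ((x1, y1), (x2, y2)) in enumerate(lines):
        if rng.random() < p_drop:              # line elimination
            continue
        # Endpoint perturbation with Gaussian pixel noise.
        a = (x1 + rng.gauss(0, jitter), y1 + rng.gauss(0, jitter))
        b = (x2 + rng.gauss(0, jitter), y2 + rng.gauss(0, jitter))
        if rng.random() < p_split:             # split one line into two pieces
            u = rng.uniform(0.3, 0.7)
            m = (a[0] + u * (b[0] - a[0]), a[1] + u * (b[1] - a[1]))
            out += [((a, m), idx), ((m, b), idx)]
        else:
            out.append(((a, b), idx))
    return out

def add_random_lines(n, width, height, rng):
    """Distractor segments that match no map line (label -1)."""
    def pt():
        return (rng.uniform(0, width), rng.uniform(0, height))
    return [((pt(), pt()), -1) for _ in range(n)]

rng = random.Random(0)
targets = [((10.0, 10.0), (100.0, 10.0)), ((50.0, 20.0), (50.0, 200.0))]
sample = augment_target_lines(targets, rng) + add_random_lines(5, 640, 512, rng)
print(len(sample), "segments,", sum(lbl == -1 for _, lbl in sample), "distractors")
```

Under these assumptions, the distractors teach a matcher to reject detections that correspond to nothing in the map, which fits the large drop in false positives once random line augmentation is enabled.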

We also searched for the optimal parameter settings of our matching model by varying the feature dimension size D and the number of attention blocks N_A. As shown in Table 4, the model achieved the highest matching precision when the feature dimension was set to D = 128. A lower feature dimension resulted in insufficient model capacity, while higher dimensions led to overfitting and degraded performance. Similarly, the best results for the number of attention blocks were observed at N_A = 3. Increasing the number of attention blocks beyond this did not yield further improvements and instead introduced more false positives, which negatively affects pose refinement.

Table 4.

Ablation on model parameters.

Model Parameters	TP	FP	Precision
D=64	1322	375	0.7790
D=128	1679	375	0.8174
D=256	1549	464	0.7695
N_A=1	1189	387	0.7544
N_A=2	1443	336	0.8111
N_A=3	1679	375	0.8174
N_A=4	1647	387	0.8097
N_A=5	1650	393	0.8076
N_A=6	1492	498	0.7497
N_A=7	1517	414	0.7856
N_A=8	1214	586	0.6744
N_A=9	1538	430	0.7815

3.6 Runtime Evaluation

As efficient onboard processing is crucial for real-time UAV localization, we evaluate the computational performance of our proposed framework on two hardware platforms: a standard PC and an embedded system. The evaluation setup is shown in Fig. 11(a). The PC is equipped with an Intel Core i7-6700 CPU and an NVIDIA GeForce GTX 1050 Ti GPU. The embedded platform, shown in Fig. 11(b), is an NXP i.MX 8M Plus^[27], a low-power board suitable for UAV deployment. The software pipeline of our framework is illustrated in Fig. 11(c), where multiprocessing is employed to parallelize the key modules.

Fig. 11.

Runtime analysis hardware and software settings. (a) Evaluation setup (b) NXP i.MX 8M Plus board (c) Software pipeline with multiprocessing.

The overall runtime evaluation is conducted as follows. The 2D-3D line matching model is first initialized in the main process and converted to ONNX^[28] format for lightweight and fast inference. The main process then simulates frame acquisition by reading the MWIR image sequence frame by frame from a local directory. Three subprocesses are spawned to handle the core tasks: Process #1 performs 2D line detection on the current MWIR frame, Process #2 conducts 2D-3D line matching using the ONNX model, and Process #3 performs pose refinement and visualization. Each acquired frame is passed to Process #1, and the detected lines are forwarded sequentially to Process #2 and then to Process #3, completing the full localization loop. Table 5 summarizes the runtime of each module and the overall throughput. The framework runs at an average of 61.0 FPS on the standard PC and 20.3 FPS on the embedded board, confirming its real-time capability even on resource-constrained hardware.
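The three-stage hand-off described above can be sketched with queues. To keep the example runnable in any environment, it uses `threading` as a stand-in for the subprocesses; the framework itself uses multiprocessing, for which `multiprocessing.Process` and `multiprocessing.Queue` would play the same roles. The stage functions `detect_lines`, `match_lines`, and `refine_pose` are hypothetical placeholders for the real modules.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the frame stream

def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)   # propagate shutdown to the next stage
            break
        outbox.put(fn(item))

# Hypothetical placeholders for the real modules.
def detect_lines(frame):
    return {"frame": frame, "lines2d": []}

def match_lines(data):
    return {**data, "matches": []}

def refine_pose(data):
    return {**data, "pose": None}

def run_pipeline(frames):
    queues = [queue.Queue() for _ in range(4)]
    stages = [detect_lines, match_lines, refine_pose]
    workers = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for w in workers:
        w.start()
    for frame in frames:           # main loop: simulated frame acquisition
        queues[0].put(frame)
    queues[0].put(SENTINEL)
    results = []
    while (item := queues[-1].get()) is not SENTINEL:
        results.append(item)
    for w in workers:
        w.join()
    return results

print(len(run_pipeline(range(5))), "frames processed")
```

Because each stage only waits on its own input queue, a new frame can enter detection while earlier frames are still being matched or refined.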

Table 5.

Runtime performance on PC and embedded board.

Hardware	Main Proc.	Proc. #1	Proc. #2	Proc. #3	FPS
PC	0.0121	0.0022	0.0154	0.0164	61.0
Embedded Board	0.0239	0.0213	0.0492	0.0475	20.3
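The FPS column of Table 5 is consistent with a pipelined execution model in which throughput is bounded by the slowest stage rather than by the sum of all stages. The sketch below checks this, assuming the per-module entries are per-frame processing times in seconds; the function names are illustrative.

```python
def pipelined_fps(stage_times):
    """Throughput when stages run concurrently: limited by the slowest stage."""
    return 1.0 / max(stage_times)

def sequential_fps(stage_times):
    """Throughput if the same stages ran one after another for each frame."""
    return 1.0 / sum(stage_times)

# Main Proc., Proc. #1, Proc. #2, Proc. #3 from Table 5.
table5 = {
    "PC": [0.0121, 0.0022, 0.0154, 0.0164],
    "Embedded Board": [0.0239, 0.0213, 0.0492, 0.0475],
}

for name, times in table5.items():
    print(f"{name}: pipelined {pipelined_fps(times):.1f} FPS, "
          f"sequential {sequential_fps(times):.1f} FPS")
```

Under this assumption the pipelined estimate reproduces the reported 61.0 and 20.3 FPS, while running the same stages sequentially would yield only about 21.7 and 7.0 FPS, which illustrates the benefit of parallelizing the modules.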

Ⅳ. Conclusion

We proposed a novel UAV localization framework for MWIR imagery, addressing the limitations of VL-based methods under challenging visibility conditions. Our approach leverages 2D-3D line matching between MWIR-detected lines and a predefined 3D line map, enabled by a robust matching model trained entirely on a synthetically augmented dataset. The framework achieves high matching precision and accurate pose refinement, demonstrating strong performance compared to VL-based baselines on simulated MWIR flights. It also runs in real-time on both desktop and embedded platforms, supporting practical deployment. A current limitation is the reliance on a predefined 3D line map. In future work, we aim to extend our framework to handle unseen environments by generalizing the matching model for online or map-free localization.

Biography

Jungwoo Huh

2017 : B.S. degree, Yonsei University

2017~Present : Integrated M.S & Ph.D. Program, Yonsei University

<Research Interest> Human Motion Estimation, Camera Pose Estimation, Intelligent Surveillance

[ORCID:0000-0002-1103-8309]

Biography

Jaekyung Kim

2017 : B.S. degree, Yonsei University

2017~Present : Integrated M.S & Ph.D. Program, Yonsei University

<Research Interest> Light Simulation, Photo-realistic Rendering, Object Detection

[ORCID:0000-0001-5142-8814]

Biography

Kyungjune Lee

2018 : B.S. degree, Yonsei University

2018~Present : Integrated M.S & Ph.D. Program, Yonsei University

<Research Interest> Camera Path Planning, Dynamic 3D Registration, Multi-modal Content Generation

[ORCID:0009-0003-9309-5722]

Biography

Ingu Park

2006 : B.S. degree, Pusan National University

2008 : M.S. degree, Pusan National University

2007~Present : Chief Research Engineer, Missile System Seeker R&D, LIG Nex1 Co., Ltd.

<Research Interest> Embedded System, Image Processing, Visual Object Detection and Tracking

[ORCID:0009-0009-1849-1067]

Biography

Junhyeong Bak

2020 : B.S. degree, Hannam University

2022 : M.S. degree, Inha University

2022~Present : Research Engineer, Missile System Seeker R&D, LIG Nex1 Co., Ltd.

<Research Interest> Fast Visual Object Detection and Tracking

[ORCID:0009-0009-4006-2357]

Biography

Byungjin Kang

2020 : B.S. degree, Yeungnam University

2022 : M.S. degree, Yeungnam University

2022~Present : Research Engineer, Missile System Seeker R&D, LIG Nex1 Co., Ltd.

<Research Interest> Infrared Image Object Detection and Tracking

[ORCID:0000-0002-1527-7884]

Biography

Sanghoon Lee

1989 : B.S. degree, Yonsei University

1991 : M.S. degree, Korea Advanced Institute of Science and Technology (KAIST)

1991~1996 : Member of Technical Staff, Korea Telecom (KT)

2000 : Ph.D. degree, University of Texas at Austin

2000~2002 : Member of Technical Staff, Lucent Technologies

2003~2007 : Assistant Professor, Department of Electrical and Electronic Engineering, Yonsei University

2007~2012 : Associate Professor, Department of Electrical and Electronic Engineering, Yonsei University

2012~Present : Full Professor, Department of Electrical and Electronic Engineering, Yonsei University

<Research Interest> Quality Assessment, 3D Character Animation, Multi-modal Content Generation

[ORCID:0000-0001-9895-5347]

References

1 U. Niethammer, M. James, S. Rothmund, J. Travelletti, and M. Joswig, "UAV-based remote sensing of the Super-Sauze landslide: Evaluation and results," Eng. Geology, vol. 128, pp. 2-11, Mar. 2012. (https://doi.org/10.1016/j.enggeo.2011.03.012)
2 S. H. Oh, S. W. Park, and J. G. Kim, "UAV-based coastal drowning detection method," Korea Artificial Intell. Conf., pp. 129-130, Jeju Island, Korea, Sep. 2024.
3 J. Wendel, O. Meister, C. Schlaile, and G. F. Trommer, "An integrated GPS/MEMS-IMU navigation system for an autonomous helicopter," Aerospace Sci. and Technol., vol. 10, no. 6, pp. 527-533, Sep. 2006. (https://doi.org/10.1016/j.ast.2006.04.002)
4 C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. Montiel, and J. D. Tardós, "ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM," IEEE Trans. Robotics, vol. 37, no. 6, pp. 1874-1890, Dec. 2021. (https://doi.org/10.1109/TRO.2021.3075644)
5 Z. Teed, L. Lipson, and J. Deng, "Deep patch visual odometry," Advances in NeurIPS, vol. 36, pp. 39033-39051, New Orleans, LA, USA, Dec. 2023.
6 D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004. (https://doi.org/10.1023/B:VISI.0000029664.99615.94)
7 C. Akinlar and C. Topal, "EDLines: A real-time line segment detector with a false detection control," Pattern Recognition Lett., vol. 32, no. 13, pp. 1633-1642, Oct. 2011. (https://doi.org/10.1016/j.patrec.2011.06.001)
8 H. Yu, W. Zhen, W. Yang, and S. Scherer, "Line-based 2-D-3-D registration and camera localization in structured environments," IEEE Trans. Instrumentation and Measurement, vol. 69, no. 11, pp. 8962-8972, Nov. 2020. (https://doi.org/10.1109/TIM.2020.2999137)
9 H. Yu, W. Zhen, W. Yang, J. Zhang, and S. Scherer, "Monocular camera localization in prior LiDAR maps with 2D-3D line correspondences," in 2020 IEEE/RSJ Int. Conf. IROS, pp. 4588-4594, Las Vegas, NV, USA, Oct. 2020. (https://doi.org/10.1109/IROS45743.2020.9341690)
10 S. Teller and M. Hohmeyer, "Determining the lines through four lines," J. Graphics Tools, vol. 4, no. 3, pp. 11-22, Nov. 1999. (https://doi.org/10.1080/10867651.1999.10487506)
11 P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, "SuperGlue: Learning feature matching with graph neural networks," in Proc. IEEE/CVF Conf. CVPR, pp. 4938-4947, Seattle, WA, USA, Jun. 2020. (https://doi.org/10.1109/CVPR42600.2020.00499)
12 R. G. Von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, "LSD: A line segment detector," Image Process. On Line, vol. 2, pp. 35-55, Mar. 2012. (https://doi.org/10.5201/ipol.2012.gjmr-lsd)
13 N. Xue, S. Bai, F. Wang, G.-S. Xia, T. Wu, and L. Zhang, "Learning attraction field representation for robust line segment detection," in Proc. IEEE/CVF Conf. CVPR, pp. 1595-1603, Long Beach, CA, USA, Jun. 2019. (https://doi.org/10.1109/CVPR.2019.00169)
14 G. Gu, B. Ko, S. Go, S.-H. Lee, J. Lee, and M. Shin, "Towards light-weight and real-time line segment detection," in Proc. AAAI Conf. Artificial Intell., vol. 36, pp. 726-734, Virtual, Feb. 2022. (https://doi.org/10.1609/aaai.v36i1.19953)
15 M. Cuturi, "Sinkhorn distances: Lightspeed computation of optimal transport," Advances in NeurIPS, vol. 26, pp. 2292-2300, Lake Tahoe, NV, USA, Dec. 2013.
16 J. Munkres, "Algorithms for the assignment and transportation problems," J. Soc. Ind. and Appli. Math., vol. 5, no. 1, pp. 32-38, 1957. (https://doi.org/10.1137/0105003)
17 D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in ICLR (Poster), San Diego, CA, USA, May 2015. (https://doi.org/10.48550/arXiv.1412.6980)
18 J. J. Moré, "The Levenberg-Marquardt algorithm: Implementation and theory," in Numerical Analysis, G. A. Watson, Ed., Lecture Notes in Mathematics, vol. 630, pp. 105-116, Aug. 2006. (https://doi.org/10.1007/BFb0067700)
19 OKTAL-SE, Oktal-se-simulation environment for infrared and radar applications, Retrieved Jun. 4, 2025, from https://www.oktal-se.fr/
20 S. Kim, W.-J. Song, and S.-H. Kim, "Robust ground target detection by SAR and IR sensor fusion using AdaBoost-based feature selection," Sensors, vol. 16, no. 7, p. 1117, Jul. 2016. (https://doi.org/10.3390/s16071117)
21 T. Kim and H. Bang, "Fractal texture enhancement of simulated infrared images using a CNN-based neural style transfer algorithm with a histogram matching technique," Sensors, vol. 23, no. 1, p. 422, Dec. 2022. (https://doi.org/10.3390/s23010422)
22 M. Burri, J. Nikolic, P. Gohl, et al., "The EuRoC micro aerial vehicle datasets," Int. J. Robotics Res., vol. 35, no. 10, pp. 1157-1163, Sep. 2016. (https://doi.org/10.1177/0278364915620033)
23 K. Merry and P. Bettinger, "Smartphone GPS accuracy study in an urban environment," PloS one, vol. 14, no. 7, e0219890, Jul. 2019. (https://doi.org/10.1371/journal.pone.0219890)
24 D. H. Titterton and J. L. Weston, Strapdown Inertial Navigation Technol., 2nd ed., The Institution of Eng. and Technol. (IET), 2004. (https://doi.org/10.1049/PBRA017E)
25 P. D. Groves, Principles of GNSS, Inertial, and Multisensor Integrated Navigation Syst., 2nd ed., Artech House, 2013. (https://doi.org/10.1017/S03734633313000672)
26 S. Umeyama, "Least-squares estimation of transformation parameters between two point patterns," IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 4, pp. 376-380, Apr. 1991. (https://doi.org/10.1109/34.88573)
27 NXP Semiconductors, i.MX 8M Plus Evaluation Kit (8MPLUSLPD4-EVK), Retrieved Jun. 4, 2025, from https://www.nxp.com/design/design-center/development-boards-and-designs/8MPLUSLPD4-EVK
28 Microsoft Corporation, ONNX Runtime: Accelerating Machine Learning Inferencing, Retrieved Jun. 4, 2025, from https://onnxruntime.ai/

Received: July 10 2025

Revision received: August 20 2025

Accepted: August 21 2025

Published (Electronic): January 31 2026

Corresponding Author: Sanghoon Lee , slee@yonsei.ac.kr

Jungwoo Huh, Yonsei University, Department of Electrical and Electronic Engineering, gjwjddn9@yonsei.ac.kr

Jaekyung Kim, Yonsei University, Department of Electrical and Electronic Engineering, jkkproject@yonsei.ac.kr

Kyungjune Lee, Yonsei University, Department of Electrical and Electronic Engineering, naive2kj90@yonsei.ac.kr

Ingu Park, LIG Nex1 Co., Ltd, Missile System Seeker R&D, ingu.park2@lignex1.com

Junhyeong Bak, LIG Nex1 Co., Ltd, Missile System Seeker R&D, junhyeong.bak@lignex1.com

Byungjin Kang, LIG Nex1 Co., Ltd, Missile System Seeker R&D, byungjin.kang@lignex1.com

Sanghoon Lee, Yonsei University, Department of Electrical and Electronic Engineering, slee@yonsei.ac.kr

Cite this article

IEEE Style

J. Huh, J. Kim, K. Lee, I. Park, J. Bak, B. Kang, S. Lee, "A Line-Based Unmanned Aerial Vehicle Localization Framework Using Mid-Wave Infrared Observations," The Journal of Korean Institute of Communications and Information Sciences, vol. 51, no. 1, pp. 169-183, 2026. DOI: 10.7840/kics.2026.51.1.169.

ACM Style

Jungwoo Huh, Jaekyung Kim, Kyungjune Lee, Ingu Park, Junhyeong Bak, Byungjin Kang, and Sanghoon Lee. 2026. A Line-Based Unmanned Aerial Vehicle Localization Framework Using Mid-Wave Infrared Observations. The Journal of Korean Institute of Communications and Information Sciences, 51, 1, (2026), 169-183. DOI: 10.7840/kics.2026.51.1.169.

KICS Style

Jungwoo Huh, Jaekyung Kim, Kyungjune Lee, Ingu Park, Junhyeong Bak, Byungjin Kang, Sanghoon Lee, "A Line-Based Unmanned Aerial Vehicle Localization Framework Using Mid-Wave Infrared Observations," The Journal of Korean Institute of Communications and Information Sciences, vol. 51, no. 1, pp. 169-183, 1. 2026. (https://doi.org/10.7840/kics.2026.51.1.169)
