Research and Implementation of a Hearing Aid Based on a Mel-Phase-Spectrum-Preprocessed GAN Model

Zujie Fan and Jaesoo Kim

Abstract: This paper introduces the MPSP (Mel-Phase-Spectrum-Preprocessed) algorithm for optimizing audio preprocessing, combined with a GAN (Generative Adversarial Network) model to enhance audio output. The MPSP algorithm replaces the phase estimation method of the Griffin-Lim algorithm with a phase pre-storing technique, thereby improving noise reduction for audio output from a low-cost, custom-built hearing aid. Experimental results demonstrate that MPSP improves SDR (source-to-distortion ratio) performance by 2.31 times compared to the Griffin-Lim algorithm. The GAN model trained on data preprocessed with the MPSP algorithm was tested in three different environments, showing superior MSE (Mean Square Error) performance over the spectral gating-based noise reduction method and the Denoising Autoencoder model. In PESQ (Perceptual Evaluation of Speech Quality) evaluations, the GAN model maintained high performance in complex environments such as classrooms and workplaces, except in extremely noisy settings like restaurants. The hearing aid employs a deep neural network model to achieve cost-effective audio noise reduction, significantly improving the quality of life for individuals with hearing impairments. By pre-training the model for deployment on embedded systems, this solution can be widely applied across various industries.

Keywords: Hearing Aid, Speech processing, Deep learning, Autoencoder, GAN

Ⅰ. Introduction

A portion of the global population consists of individuals with hearing impairments. According to the literature[1], this group often experiences social isolation, depression, and cognitive decline, which contributes to hearing loss being recognized as the fourth most common disability worldwide. Currently, among advanced technologies, cochlear implants are the most effective solution for improving the hearing of individuals with hearing loss; however, due to their high cost and the surgical risks associated with elderly patients, hearing aids have become a viable alternative. Most commercially available hearing aids primarily use traditional filter-based noise reduction methods[2], which require extensive calibration and incur high labor costs. Although numerous improved algorithms, such as the spectral gating-based noise reduction method[3], demonstrate good performance, they are not suitable for small hearing aids. Thanks to the rapid development of artificial intelligence, deep learning has gradually become mainstream in the field of audio noise reduction, and existing neural network models have shown significant effectiveness in this area. However, due to the varying environments of individuals with hearing impairments, neural network models face several challenges. For instance, autoencoders[4] are typically trained only on clean data, leading to a scarcity of labeled data, while conventional convolutional neural network models exhibit poor generalization ability and usually require large datasets for training[5]. To address these challenges, we propose a phase preprocessing algorithm for audio data preprocessing, combined with a deep learning GAN model[6] for data training. This approach aims to optimize the audio quality of hearing aids and improve the quality of life for individuals with hearing impairments. Experimental results demonstrate that the proposed method exhibits superior audio noise reduction performance across most environments compared to other approaches.

Ⅱ. Related works

This section presents the principles of hearing aids along with traditional noise reduction algorithms and neural network-based noise reduction methods. It provides a detailed analysis of their advantages and disadvantages, serving as a foundational theoretical reference for the phase preprocessing algorithm combined with the GAN model proposed in this paper to optimize audio processing.

2.1 Principles of hearing aid

The history of hearing aids dates back to the 18th century, when devices made of metal or wood were used to enhance sound transmission. The introduction of transistor technology in the 1940s and 1950s led to more compact and lightweight hearing aids. With the rapid advancement of modern technology, there is an increasing emphasis on sound quality and the adaptation of hearing aids to meet diverse auditory needs. This has been achieved through the incorporation of DSPs (Digital Signal Processors)[7] for enhancements.

The operational principles of hearing aids can be divided into four steps:

Sound Wave Capture: The microphone captures ambient sound waves and converts them into electrical signals.

Signal Amplification: An operational amplifier is used to increase the signal strength for further processing. The amplification of the audio signal can be represented by Equation (1), where the gain determines the device's ability to amplify sound and can be calculated using Equation (2).

(1)
$$V_{\text{out}} = A \times V_{\text{in}}$$

(2)
$$\text{Gain (dB)} = 10 \log_{10}\left(\frac{P_{\text{out}}}{P_{\text{in}}}\right)$$
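As a quick illustration of Equations (1) and (2), the following sketch computes the amplifier output voltage and the power gain in decibels; the numeric values are purely illustrative and are not measurements from the circuit described in Section 3.1.

```python
import numpy as np

def amplifier_output(v_in: float, a: float) -> float:
    """Equation (1): V_out = A * V_in."""
    return a * v_in

def gain_db(p_out: float, p_in: float) -> float:
    """Equation (2): Gain (dB) = 10 * log10(P_out / P_in)."""
    return 10.0 * np.log10(p_out / p_in)

# Illustrative example: a 10 mV microphone signal amplified by A = 100,
# and a power gain of 1000x, which corresponds to 30 dB.
print(amplifier_output(0.01, 100.0))  # 1.0 (V)
print(gain_db(1.0, 0.001))            # 30.0 (dB)
```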

Digital Signal Processing: Since conventional audio amplifiers do not typically implement noise reduction, other technologies must be integrated for effective digital signal processing.

Output: The processed signals are directly transmitted to the listener through headphones.

These principles and equations provide a theoretical foundation for the design and performance optimization of hearing aids. The advanced hearing aid processor technologies presented in Reference[8] highlight significant improvements in the field. However, challenges such as cost, power consumption, and size continue to pose bottlenecks in the hardware development of hearing aids. Therefore, this study aims to integrate hardware solutions with artificial intelligence models to achieve enhanced audio quality.

2.2 Conventional noise reduction algorithms

Traditional noise reduction algorithms include filters, spectral subtraction, and adaptive filters, which primarily suppress noise by modeling it or assuming the distribution characteristics of noise and signals. The principle of filter-based noise reduction is to attenuate specific frequency bands (high or low frequency) to achieve audio noise reduction; however, its effectiveness is limited for complex and variable noise, which may lead to signal distortion. Reference[9] proposes a method that combines filters with recurrent neural networks to effectively suppress echo noise, highlighting the advantages of neural networks in audio processing. Additionally, the noise_reduce algorithm is based on spectral gating, determining noise gates by calculating statistical data for each frequency channel. The IGSS algorithm presented in reference[10] improves speech enhancement performance when the SNR is greater than -10 dB but is unsuitable for more complex hearing aid applications.
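To make the spectral gating idea concrete, the following is a minimal sketch (assuming Python with librosa and NumPy) that estimates a per-frequency noise gate from magnitude statistics and masks STFT bins below it; the statistics window, threshold rule, and parameter values are illustrative assumptions rather than the exact noise_reduce implementation.

```python
import numpy as np
import librosa

def spectral_gate(noisy: np.ndarray, sr: int, n_fft: int = 1024,
                  n_std: float = 1.5) -> np.ndarray:
    """Illustrative spectral gating: estimate a per-frequency noise floor
    from magnitude statistics and zero out bins that fall below the gate."""
    stft = librosa.stft(noisy, n_fft=n_fft)          # complex spectrogram
    mag = np.abs(stft)
    # Per-frequency noise statistics; here taken over the whole clip,
    # whereas a full implementation would use a noise-only segment.
    noise_mean = mag.mean(axis=1, keepdims=True)
    noise_std = mag.std(axis=1, keepdims=True)
    gate = noise_mean + n_std * noise_std             # noise gate per channel
    mask = mag > gate                                 # keep bins above the gate
    return librosa.istft(stft * mask, length=len(noisy))
```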

2.3 Deep learning noise reduction algorithm

The introduction of neural network models for audio noise reduction has significantly improved the tuning efficiency of hearing aids. Common neural network models include Convolutional Neural Networks (CNNs), Autoencoders, and Generative Adversarial Networks (GAN). Additionally, Long Short-Term Memory (LSTM), a special type of Recurrent Neural Network (RNN), can be incorporated to better handle long time-series data, thus enhancing training accuracy.

In references[11,12], a method using Convolutional Autoencoders showed improved performance compared to Wave-U-Net, with the average difference between the restored audio and the original being less than 2%. However, since this model directly utilizes waveforms, its effectiveness in handling nonlinear distortions may be limited, and the training requires a large amount of high-quality labeled data, which is often not available from hearing aids. Reference[13] presents a Variational Autoencoder (VAE) with 2D convolutional filters for background noise suppression, applying this model to practical applications in real-time audio streaming.

Reference[14] presented a method utilizing CNN adaptive predictive filters, which achieved a 17.2% improvement compared to a fully connected neural network (FNN). However, due to the need for immediate feedback in hearing aids, the extensive use of CNN layers may introduce high latency, and the computational requirements of CNN models can lead to excessive power consumption in hearing aids.

Reference[15] introduced the Denoising Autoencoder with Generative Adversarial Network (DNAE-GAN) model to address the issues of insufficient variability and excessive noise in traditional Linear Predictive Coding when generating speech. This approach, combined with autoencoder models, demonstrated excellent performance in audio noise reduction. Nonetheless, the method still has limitations, as a large number of speech samples may adversely affect the original GAN performance.

Overall, the GAN model shows significant advantages in the complex usage environments of hearing aids, effectively capturing complex data distributions to reconstruct audio signals that are disturbed in noisy conditions. Compared to traditional noise reduction algorithms, GANs excel in handling complex background noises, such as crowd and traffic sounds. Therefore, this paper aims to enhance audio quality in hearing aids by exploring more suitable algorithmic integrations based on the GAN model to overcome existing limitations.

Ⅲ. Mel-Phase-Spectrum-Preprocessed GAN Hearing Aid

3.1 Design and realization of a simple hearing aid

Fig. 1 shows the circuit diagram of a simple hearing aid. To meet the requirements for low power consumption, compact size, and high audio quality, this circuit employs the TDA2320A operational amplifier from STMicroelectronics as the primary amplification component, powered by a +3.7 V DC source from an 18650 lithium battery (BAT1). The microphone input signal is isolated through capacitors C3 and C4 to eliminate any DC offset interference. The circuit allows for adjustment of the output audio level via potentiometer J1. The amplification performance of the audio signal using this circuit has achieved the expected standards.

Fig. 1. The simple hearing aid circuit diagram.
3.2 Mel-Phase-Spectrum-Preprocessed algorithm

The Griffin-Lim algorithm is an iterative method used to reconstruct audio signals from the magnitude spectrum. In the absence of phase information, the algorithm begins with a random initial phase estimate, repeatedly performs time-frequency transformations, and constrains the spectrum using the input magnitude. Through this iterative process, the phase estimate is gradually refined, resulting in an audio signal that closely approximates the original. The MPSP algorithm is inspired by the work in reference[16]. It consists of two main steps, implemented in two functions, to convert the original audio signal into a Mel spectrogram and subsequently restore the audio signal. The first function extracts the Mel spectrogram using the librosa.stft function, which transforms the audio signal into its spectrogram representation via the STFT (Short-Time Fourier Transform), while preserving the phase information needed for reconstruction. The second function restores the audio from the Mel spectrogram by utilizing the pre-stored phase values and performing an inverse STFT[17] (ISTFT) through librosa.istft, thereby converting the reconstructed STFT back into the time-domain signal, effectively recovering the original audio. By replacing iterative phase estimation with the pre-stored phase, the MPSP algorithm achieves a 2.31-fold improvement in SDR performance compared to the Griffin-Lim algorithm.
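The following is a minimal sketch of the two MPSP functions described above, assuming Python with librosa; the analysis parameters (FFT size, hop length, number of Mel bands) are assumptions, and the Mel-to-magnitude inversion relies on librosa's approximate NNLS-based routine.

```python
import numpy as np
import librosa

N_FFT, HOP, N_MELS = 1024, 256, 128  # assumed analysis parameters

def mpsp_encode(y: np.ndarray, sr: int):
    """Step 1: extract the Mel spectrogram while pre-storing the STFT phase."""
    stft = librosa.stft(y, n_fft=N_FFT, hop_length=HOP)
    phase = np.angle(stft)                      # phase kept for reconstruction
    mel = librosa.feature.melspectrogram(S=np.abs(stft) ** 2, sr=sr,
                                         n_fft=N_FFT, n_mels=N_MELS)
    return mel, phase

def mpsp_decode(mel: np.ndarray, phase: np.ndarray, sr: int) -> np.ndarray:
    """Step 2: restore audio from the Mel spectrogram using the stored phase
    instead of Griffin-Lim's iterative phase estimation."""
    mag = librosa.feature.inverse.mel_to_stft(mel, sr=sr, n_fft=N_FFT, power=2.0)
    stft = mag * np.exp(1j * phase)             # re-attach the pre-stored phase
    return librosa.istft(stft, hop_length=HOP)
```

For the Griffin-Lim baseline, librosa.griffinlim can be applied to the same recovered magnitude spectrogram, which makes the contribution of the pre-stored phase directly comparable.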

3.3 Construction and training of the GAN Model

Fig. 2 shows the Mel-Phase-Spectrum-Preprocessed GAN model. The Hearing Aid Noisy Signal Dataset provides noisy audio input, while the Clean Signal Dataset/REAL serves as a reference to train the discriminator with clean, noise-free signals. The Generator (G) converts noisy audio into “fake” denoised signals, while the Mel Spectrogram Discriminator (D) uses Mel spectrograms to distinguish between real clean signals (REAL) and generated fake signals (FAKE). The generator, comprising an input layer, hidden layers, and an output layer, processes noisy Mel spectrograms and outputs clean versions. The following is a detailed description of the work done at each layer:

Fig. 2. The Mel-Phase-Spectrum-Preprocessed GAN model.

Input Layer: The input layer of the generator accepts Mel spectrogram data shaped as (time steps, frequency channels), where the frequency channels correspond to the frequency feature dimensions of the Mel spectrogram.

Hidden Layers: The first hidden layer is a fully connected layer with 128 nodes, utilizing the ReLU (Rectified Linear Unit) activation function to learn the latent features of the noisy Mel spectrogram. The second hidden layer is also a fully connected layer with 64 nodes, continuing to use the ReLU activation function to further extract deeper features.

Output Layer: The output layer is a fully connected layer with the same number of nodes as the frequency channels of the Mel spectrogram, employing the tanh activation function to generate normalized Mel spectrogram outputs. The purpose of the generator in the GAN model is to simulate the distribution of the target clean Mel spectrogram, effectively removing noise from the input data.

Discriminator: The discriminator also consists of an input layer, hidden layers, and an output layer.

Input Layer: The input layer of the discriminator similarly accepts Mel spectrogram data shaped as (time steps, frequency channels).

Hidden Layers: The first hidden layer is a fully connected layer with 64 nodes, using the ReLU activation function to extract features from the input data. The second hidden layer is a fully connected layer with 128 nodes, continuing with the ReLU activation function to enhance the depth of feature extraction.

Output Layer: The output layer is a fully connected layer with a single node, utilizing the sigmoid activation function to output a probability value indicating whether the input signal is “real” or “fake.” The discriminator classifies both real and generated Mel spectrograms, providing feedback that guides the optimization process of the generator.
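A minimal sketch of the generator and discriminator with the layer sizes listed above, assuming a Keras implementation; the framework choice, the frequency-channel count, and the temporal pooling step before the discriminator's single-node output (added here so each spectrogram yields one probability) are assumptions not specified in the text.

```python
import tensorflow as tf

N_MELS = 128  # frequency channels of the Mel spectrogram (assumed)

def build_generator() -> tf.keras.Model:
    """Generator: noisy Mel frames (time steps, freq channels) -> denoised frames."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(None, N_MELS)),              # variable time steps
        tf.keras.layers.Dense(128, activation="relu"),     # hidden layer 1
        tf.keras.layers.Dense(64, activation="relu"),      # hidden layer 2
        tf.keras.layers.Dense(N_MELS, activation="tanh"),  # normalized Mel output
    ])

def build_discriminator() -> tf.keras.Model:
    """Discriminator: Mel spectrogram -> probability that the input is real."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(None, N_MELS)),
        tf.keras.layers.Dense(64, activation="relu"),      # hidden layer 1
        tf.keras.layers.Dense(128, activation="relu"),     # hidden layer 2
        tf.keras.layers.GlobalAveragePooling1D(),          # pool over time frames
        tf.keras.layers.Dense(1, activation="sigmoid"),    # real/fake probability
    ])
```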

Equation (5) represents the loss function of the generator, where $D(G(z_i))$ is the output of the discriminator for generated samples. The generator aims to maximize this output, making the discriminator believe that the generated samples are real.

(5)
$$L_G = -\frac{1}{N} \sum_{i=1}^{N} \log\left(D\left(G\left(z_i\right)\right)\right)$$

Equation (6) represents the loss function of the discriminator, where $N$ is the number of samples in a batch, $y_i$ is the label (with 1 for real samples and 0 for fake samples), $D(x_i)$ is the discriminator's output for the real sample $x_i$, $G(z_i)$ is the generator's output for random noise $z_i$, and $D(G(z_i))$ is the discriminator's output for the generated samples.

(6)
$$L_D = -\frac{1}{N} \sum_{i=1}^{N}\left[y_i \log\left(D\left(x_i\right)\right)+\left(1-y_i\right) \log\left(1-D\left(G\left(z_i\right)\right)\right)\right]$$
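A minimal NumPy sketch of Equations (5) and (6), assuming the discriminator outputs probabilities in (0, 1); the small epsilon guarding the logarithm and the way real and fake samples are combined within one batch are implementation assumptions.

```python
import numpy as np

EPS = 1e-8  # numerical guard for log(0)

def generator_loss(d_fake: np.ndarray) -> float:
    """Equation (5): L_G = -(1/N) * sum(log D(G(z_i)))."""
    return float(-np.mean(np.log(d_fake + EPS)))

def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Equation (6) with y_i = 1 for real samples and y_i = 0 for fakes."""
    real_term = np.log(d_real + EPS)         # y_i = 1 branch
    fake_term = np.log(1.0 - d_fake + EPS)   # y_i = 0 branch
    return float(-np.mean(np.concatenate([real_term, fake_term])))
```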

Ⅳ. Performance comparison

4.1 The environment of performance evaluation

In this experiment, the dataset was obtained from audio recordings produced by a basic hearing aid in three different environments: a restaurant, a workplace, and a classroom. For each environment, 30 samples of clean audio recorded via mobile phone and 30 samples of noisy audio recorded through the hearing aid were collected, each in WAV format with a duration of 10 seconds. The test data were collected in the same manner as the training data. The dataset has been uploaded to Kaggle under the name “HearingAid_voice”.

The performance of the hearing aid was evaluated through practical testing, with the results summarized as follows:

The hearing aid presented in this study is powered by a 903440X Li-ion battery with a capacity of 1500 mAh. Performance evaluation through practical usage tests revealed that the device operates continuously for 30 hours at standard volume settings. For typical usage scenarios, with an average daily use of 3 hours, the battery sustains operation for approximately one week.

In practical evaluations, a simulated experiment was conducted by deploying the model on a Raspberry Pi Zero (power consumption: 1.7 W). The results demonstrated that the latency of denoised audio feedback was less than 0.5 seconds, effectively meeting the daily usage requirements of individuals with hearing impairments.

4.2 Performance comparison between noise reduction algorithms

The experiment involved applying three different noise reduction methods to the collected dataset. The first method applied the spectral gating-based noise reduction algorithm directly to the audio. The second method utilized a Denoising Autoencoder based on unsupervised learning. The third method used a GAN-based model, where the model was trained first, followed by validation using the test data. After applying the noise reduction methods, the Mel spectrograms of the denoised audio were compared with those of the clean audio. Evaluation metrics including SDR, PSNR, PESQ, and MSE were calculated to assess the performance of each approach. Fig. 3 shows a comparison of SDR performance between the original audio, the Griffin-Lim algorithm, and the MPSP algorithm in restoring Mel spectrogram data. The proposed MPSP method improves the SDR performance by 2.31 times compared to the Griffin-Lim algorithm. Fig. 4 compares the restored audio signals of the original audio, the Griffin-Lim algorithm, and the MPSP algorithm. The results demonstrate that the audio restored using the MPSP method, which incorporates the original phase, closely resembles the original audio signal, while the audio restored by the iterative phase estimation of the Griffin-Lim algorithm suffers noticeable quality degradation.
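The evaluation metrics can be computed as in the following sketch, assuming Python with NumPy and the pesq package; the PSNR peak convention and the epsilon guards are assumptions, and PESQ requires the signals to be resampled to 8 kHz (narrowband) or 16 kHz (wideband).

```python
import numpy as np
from pesq import pesq  # pip install pesq

def mse(clean: np.ndarray, denoised: np.ndarray) -> float:
    """Mean square error between clean and denoised signals."""
    return float(np.mean((clean - denoised) ** 2))

def sdr(clean: np.ndarray, denoised: np.ndarray) -> float:
    """Source-to-distortion ratio in dB."""
    distortion = clean - denoised
    return float(10 * np.log10(np.sum(clean ** 2) / (np.sum(distortion ** 2) + 1e-12)))

def psnr(clean: np.ndarray, denoised: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB, using the clean signal's peak as reference."""
    peak = np.max(np.abs(clean)) ** 2
    return float(10 * np.log10(peak / (mse(clean, denoised) + 1e-12)))

# PESQ example (wideband mode expects 16 kHz signals):
# score = pesq(16000, clean_16k, denoised_16k, 'wb')
```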

Fig. 3. Comparison of Mel spectrograms after restoration: original audio, Griffin-Lim algorithm, and MPSP algorithm.

Fig. 4. Comparison of audio signals after restoration: original audio, Griffin-Lim algorithm, and MPSP algorithm.

This study conducted experiments in these three different environments and performed a comparative analysis of the resulting data.

Fig. 5. Performance evaluation of MSE, SDR, PESQ, and PSNR for different noise reduction algorithms in the work environment.

Fig. 5 shows that in the “Work Environment,” where noise primarily comes from devices like keyboards, printers, and air conditioners, the SDR values of the three algorithms are quite similar. However, the GAN algorithm achieves the lowest MSE, indicating its ability to more accurately reconstruct signals in this type of noisy environment.

Fig. 6. Performance evaluation of MSE, SDR, PESQ, and PSNR for different noise reduction algorithms in the classroom environment.

Fig. 6 shows that in the “Classroom Environment,” the GAN algorithm significantly outperforms the other algorithms in terms of SDR and achieves the highest PESQ score. This suggests that the GAN algorithm is effective at handling background noise while preserving audio quality in quieter environments.

Fig. 7. Performance evaluation of MSE, SDR, PESQ, and PSNR for different noise reduction algorithms in the restaurant environment.

Fig. 7 shows that in the “Restaurant Environment,” although the GAN algorithm shows relatively poor performance in SDR (-24.31), it achieves the highest PESQ score and the lowest MSE. This indicates that, despite the high level of background noise, it still maintains good perceived audio quality. Comparison experiments across the different environments demonstrate that the hearing aid based on the phase preprocessing GAN model exhibits significant performance advantages in most scenarios.

Ⅴ. Conclusions

This study utilizes a phase preprocessing method for audio data in conjunction with a GAN model to optimize the audio quality of a self-made hearing aid, to achieve low-cost, high-quality hearing aids, and to assist more individuals with hearing impairments. The experimental results indicate that the proposed phase preprocessing method effectively enhances the restoration of Mel spectrogram data, achieving a 2.31-fold improvement in SDR compared to the Griffin-Lim algorithm.

In terms of audio denoising, while autoencoders and conventional denoising methods perform well in certain cases, they generally exhibit higher MSE. In contrast, the GAN model can capture the complexity of audio signals more accurately, resulting in superior MSE performance. The distinctive feature of the GAN model is its ability to capture complex data distributions, making it suitable for noise cancellation tasks across various environments. In this research, comparative results from real application scenarios, such as restaurant and office environments, demonstrate that the GAN model maintains good PESQ scores in the “Restaurant” and “Classroom” environments, despite the presence of complex noise interference. The results indicate that the GAN model retains better PESQ scores even in more complex environments, and the algorithm not only suits intricate scenarios but also exhibits significant advantages in preserving signal audio quality.

Biography

판주지에 (Zujie Fan)

2020 : Bachelor's, Computer Engineering, Kyungpook National University

2022 : M.Sc, Computer Science and Engineering, Kyungpook National University

2022~Current : Ph. D., Computer Science and Engineering, Kyungpook National University

<Research Interest> Machine Learning, Artificial Intelligence, Internet of Things

[ORCID:0000-0002-7102-6713]

Biography

김재수 (Jaesoo Kim)

1985 : Bachelor's, Electronic Engineering, Kyungpook National University

1987 : M.Sc, Computer Science, Joong-Ang University

1999 : Ph. D., Computer Engineering, Kyungnam University

1987~1996 : Senior Researcher, Korea Electrical Research Institute

2003~2004 : Visiting Professor, The University of Cincinnati, OH, USA

1996~Current : Professor, School of Computer Science and Engineering, Kyungpook National University

<Research Interest> Mobile Computing, Sensor Network, Internet of Things, UAV Network

[ORCID:0000-0003-2541-1669]

References

  • 1 P. R. Dixon, D. Feeny, G. Tomlinson, S. Cushing, J. M. Chen, and M. D. Krahn, "Health-related quality of life changes associated with hearing loss," JAMA Otolaryngology - Head and Neck Surgery, vol. 146, no. 7, pp. 630-638, 2020. (https://doi.org/10.1001/jamaoto.2020.0674)
  • 2 D. Shi, W. S. Gan, B. Lam, and S. Wen, "Feedforward selective fixed-filter active noise control: Algorithm and implementation," IEEE/ACM Trans. Audio Speech and Lang. Process., vol. 28, pp. 1479-1492, 2020. (https://doi.org/10.1109/TASLP.2020.2989582)
  • 3 Y. Tsao, S. Matsuda, X. Lu, and C. Hori, "Speech enhancement based on deep denoising autoencoder," Interspeech, vol. 2013, pp. 436-440, 2013.
  • 4 A. Karthik and J. L. M. Iqbal, "Performance estimation based recurrent-convolutional encoder decoder for speech enhancement," Int. J. Advanced Sci. and Technol., vol. 29, no. 5, pp. 772-777, 2020. (https://www.researchgate.net/publication/340769086)
  • 5 R. Y. L. AL-Taai, W. Xiaojun, and Z. Y, "Targeted voice enhancement by bandpass filter and composite deep denoising autoencoder," 2020 14th ICSPCS, pp. 1-6, 2020. (https://doi.org/10.1109/ICSPCS50536.2020.9310026)
  • 6 A. Wali, Z. Alamgir, S. Karim, A. Fawaz, M. B. Ali, M. Adan, and M. Mujtaba, "Generative adversarial networks for speech processing: A review," Computer Speech and Lang., vol. 72, no. 101308, 2022. (https://doi.org/10.1016/j.csl.2021.101308)
  • 7 B. Hayes, J. Shier, G. Fazekas, A. McPherson, and C. Saitis, "A review of differentiable digital signal processing for music and speech synthesis," Frontiers in Signal Process., vol. 3, no. 1284100, pp. 102-112, 2024. (https://doi.org/10.3389/frsip.2023.1284100)
  • 8 L. Gerlach, G. P. Vayá, and H. Blume, "A survey on application specific processor architectures for digital hearing aids," J. Signal Process. Syst., vol. 94, no. 11, pp. 1293-1308, 2022. (https://doi.org/10.1007/s11265-021-01648-0)
  • 9 L. Ma, H. Huang, P. Zhao, and T. Su, "Acoustic echo cancellation by combining adaptive digital filter and recurrent neural network," arXiv preprint arXiv:2005.09237, 2022. (https://doi.org/10.48550/arXiv.2005.09237)
  • 10 X. Yan, Z. Yang, T. Wang, and H. Guo, "An iterative graph spectral subtraction method for speech enhancement," Speech Commun., vol. 123, pp. 35-42, 2020. (https://doi.org/10.1016/j.specom.2020.06.005)
  • 11 A. Nogales, S. Donaher, and Á. García-Tejedor, "A deep learning framework for audio restoration using convolutional/deconvolutional deep autoencoders," Expert Syst. with Appl., vol. 230, 2023. (https://doi.org/10.1016/j.eswa.2023.120586)
  • 12 R. Buragohain, G. Ashishkumar, and C. V. R. Rao, "Single channel speech enhancement system using convolutional neural network based autoencoder for noisy environments," 2022 IEEE 19th India Council Int. Conf., pp. 1-6, Kochi, India, Nov. 2022. (https://doi.org/10.1109/INDICON56171.2022.10039862)
  • 13 A. Nogales, J. C. Cayuela, and Á. J. García-Tejedor, "Analyzing the influence of diverse background noises on voice transmission: A deep learning approach to noise suppression," Applied Sci., vol. 14, no. 2, p. 740, 2024. (https://doi.org/10.3390/app14020740)
  • 14 H. Lee, "Noise reduction system by using CNN deep learning model," Indonesian J. Electr. Eng. and Inf., vol. 9, no. 1, pp. 84-90, 2021. (https://doi.org/10.11591/ijeei.v9i1.2494)
  • 15 P. H. Kuo, S. T. Lin, and J. Hu, "DNAEGAN: Noise-free acoustic signal generator by integrating autoencoder and generative adversarial network," Int. J. Distrib. Sensor Netw., vol. 16, no. 5, 2020. (https://doi.org/10.1177/1550147720923529)
  • 16 D. Kitahara, "Frequency-undersampled short-time Fourier transform," arXiv preprint arXiv:2010.15029, 2020.
  • 17 S. Hwang, J. Byun, J. Heo, J. Cha, and Y. Park, "Performance comparative evaluation of two-level skip connection for nested U-Net-based noise cancellation," Proc. of the Korean Institute of Broadcasting and Media Eng. Annual Conf., pp. 192-194, Jeju, 2022.