An Improved Green Coffee Bean Classification Method Based on Image Concatenation and Deep Learning
Date
2024
Abstract
To address the difficulty of classifying green coffee beans through image recognition and to improve accuracy, this thesis proposes a method that raises classification accuracy by concatenating the outputs of different image enhancement techniques, thereby fusing features extracted by different algorithms. To obtain a variety of key features from the original images, we chose nine common image enhancement methods: adaptive thresholding, bit-plane slicing, black-hat transform, Canny edge detection, grayscale conversion, histogram equalization, Laplacian filtering, top-hat transform, and unsharp masking. From these nine methods, we first selected those whose outputs showed the highest correlation with the ground truth, and then replaced the RGB planes of the original image with the outputs of the selected methods, giving the model multiple features to improve its recognition capability. In this study, we conducted experiments using MobileViT: the most highly correlated processing methods were chosen as the material for feature fusion, and the image dataset generated through this concatenation was used as new input for retraining. The classifier trained without any image enhancement served as the baseline. In the two-class (dichotomy) setting, the combination of bit-plane slicing, histogram equalization, and unsharp masking achieved an accuracy of 96.9%, an improvement of approximately 5.5% over the baseline; on the same dataset with the background removed, the same combination reached 97.0%. In the three-class (trichotomy) setting, the same combination achieved accuracies of 96.8% and 97.4% on the images without and with background removal, improvements of 6.7% and 4.9% over the baseline, respectively. Finally, we validated the results using MobileNetV3. In the two-class setting, the same enhancement combination achieved the highest accuracies of 99.12% and 99.21% on the images without and with background removal, improvements of 0.39% and 0.44% over the baseline, respectively. In the three-class setting, it achieved accuracies of 98.73% and 99.25%, improvements of approximately 0.92% and 0.79% over the baseline, respectively.
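The channel-replacement step described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical Python example assuming OpenCV and NumPy; the choice of the most significant bit plane, the Gaussian kernel size, and the unsharp-masking weights are illustrative assumptions rather than parameters taken from this work.

# Hypothetical sketch of the channel-replacement / concatenation step; names and
# parameter choices are illustrative, not taken from the thesis.
import cv2
import numpy as np

def enhance_and_concatenate(bgr_image):
    """Build a 3-plane input whose channels are the three enhancement results
    (bit-plane slicing, histogram equalization, unsharp masking) instead of B, G, R."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)

    # Bit-plane slicing: keep the most significant bit plane (assumed choice).
    bit_plane = ((gray >> 7) & 1) * 255

    # Histogram equalization on the grayscale image.
    hist_eq = cv2.equalizeHist(gray)

    # Unsharp masking: original plus a weighted high-frequency residual (assumed weights).
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    unsharp = cv2.addWeighted(gray, 1.5, blurred, -0.5, 0)

    # Stack the three enhanced planes as the new three-channel input for the classifier.
    return np.dstack([bit_plane, hist_eq, unsharp]).astype(np.uint8)

Stacking the three single-channel results yields a three-plane image of the same shape as the original RGB input, so it can be fed to MobileViT or MobileNetV3 without changing the network architecture.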
Keywords
Deep Learning, Image Recognition, Unsharp Masking, Edge Detection, MobileViT, MobileNetV3