Automatic Music Score Recognition and Percussion Robot System

Date

2020

Abstract

Optical music recognition (OMR) is the task of recognizing musical notation in images of music scores, where notes record pitch and duration. After many years of research and experimentation, recognition of high-resolution score images has reached a mature state. Camera-based recognition, however, is affected by ambient illumination, viewing angle (perspective distortion), and blur, so further research is still needed. This work is our first attempt to apply deep learning architectures to camera-based music score recognition. First, we apply the Hough line detection algorithm to the live camera image to detect the score automatically, locate the staff areas, and obtain the boundary of each staff, because only the note information inside the staves is of interest. Next, to detect, segment, and recognize each note, we feed each individual staff row into a YOLOv3 detection model, which is based on Darknet-53. The model currently classifies symbols into six categories: whole notes, half notes, quarter notes, eighth notes, quarter rests, and half rests. The detected notes are then sorted by their position in the score and passed to a convolutional neural network (CNN) that recognizes pitch; eleven pitch classes, from C3 to F4, are currently supported. Finally, the system communicates with a Delta robot over an RS232 serial port, and the robot arm plays the instrument. Restricting recognition to the staff areas found by Hough line detection avoids noise from lyrics and drawings and thereby reduces recognition errors. The proposed system needs only a webcam picture of a music score as input: it automatically detects the staff areas, recognizes the notes with deep learning architectures, and displays the pitch and duration of each note on the interface, after which the robot arm performs the music.
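
The step between detection and performance, where detected symbols are ordered by position and turned into pitch and duration values, can be illustrated with a minimal sketch. This is not the thesis code. The `Detection` record, the label strings, the beat values, and the index order of `PITCH_NAMES` are all assumptions made for illustration; the list of natural notes from C3 to F4 is chosen only because it matches the stated count of eleven pitch classes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed class order: the eleven natural notes from C3 to F4.
PITCH_NAMES = ["C3", "D3", "E3", "F3", "G3", "A3", "B3", "C4", "D4", "E4", "F4"]

# Hypothetical labels for the six symbol categories, with durations
# expressed in quarter-note beats.
BEATS = {
    "whole": 4.0,
    "half": 2.0,
    "quarter": 1.0,
    "eighth": 0.5,
    "quarter_rest": 1.0,
    "half_rest": 2.0,
}


@dataclass
class Detection:
    """One detected symbol (hypothetical record layout)."""
    staff_index: int            # which staff row the symbol belongs to (top = 0)
    x_center: float             # horizontal center of the bounding box
    symbol: str                 # one of the keys in BEATS
    pitch_class: Optional[int]  # index into PITCH_NAMES, None for rests


def to_sequence(detections: List[Detection]) -> List[Tuple[Optional[str], float]]:
    """Order symbols in reading order and map them to (pitch, beats) pairs."""
    # Reading order: staff rows top to bottom, then left to right within a row.
    ordered = sorted(detections, key=lambda d: (d.staff_index, d.x_center))
    events = []
    for d in ordered:
        is_rest = d.symbol.endswith("_rest")
        pitch = None if is_rest else PITCH_NAMES[d.pitch_class]
        events.append((pitch, BEATS[d.symbol]))
    return events
```

In this sketch a rest is represented by a pitch of `None`, so a downstream player could treat it as a pause of the given length. How the resulting sequence is encoded for the RS232 link to the Delta robot is not described in the abstract and is left out here.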

Keywords

music score recognition, delta robot, deep learning, digital image processing
