A Study on How Artificial Intelligence Automatically Identifies Computer-Generated News (人工智慧如何自動辨識電腦生成新聞之研究)

Date

2022

Abstract

In an era of rapid advances in artificial intelligence, machines can now generate news automatically. Because machine-generated news is not always accurate, examining the source and content of information has become essential. Machines can also help humans classify articles, which raises the question of what makes them so capable. Within the domain of Chinese economic news, this study investigates whether computer-generated articles have the same characteristics as the computer-generated English articles described in related literature. It also examines whether BERT can still accurately determine whether an article was computer-generated or human-written after the Chinese articles are modified in five experiments designed around the linguistic dimensions of semantics, pragmatics, and syntax, and it seeks to identify the key factors behind BERT's judgments. The conclusions are as follows:
1. Computer-generated articles have essentially the same characteristics whether they are written in English or in Chinese.
2. When BERT determines whether a Chinese news article was written by a human or generated by a computer, its judgment appears to rely mainly on semantic and syntactic features.
3. For a Chinese news article of roughly 300 to 350 characters, changing only the semantic level, for example by shortening sentences or randomly swapping the positions of the clauses between commas, slightly reduces BERT's accuracy; further changing the syntactic level, for example by using Google Translate to scramble the article's lexical structure, reduces BERT's accuracy substantially.
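The two semantic-level perturbations named in conclusion 3 (shortening sentences and randomly swapping the clauses between commas) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's actual procedure: it assumes sentences end with the full-width period 「。」 and clauses are separated by the full-width comma 「，」, and the helper names `split_sentences`, `shuffle_clauses`, and `shorten_sentences` and the `max_clauses` parameter are hypothetical. How the study actually shortened sentences is not stated, so cutting a sentence after a fixed number of clauses is only one possible interpretation.

```python
import random


def split_sentences(text: str) -> list[str]:
    """Split text on the full-width period 「。」, dropping empty pieces.

    Assumption: every sentence ends with 「。」; other terminators
    such as 「！」 or 「？」 are not handled in this sketch.
    """
    return [s for s in text.split("。") if s.strip()]


def shuffle_clauses(text: str, seed: int = 0) -> str:
    """Randomly reorder the comma-separated clauses inside each sentence.

    Sentence order is preserved; only the clauses between full-width
    commas 「，」 move. A seeded generator makes the result reproducible.
    """
    rng = random.Random(seed)
    perturbed = []
    for sentence in split_sentences(text):
        clauses = sentence.split("，")
        rng.shuffle(clauses)  # in-place random permutation of the clauses
        perturbed.append("，".join(clauses) + "。")
    return "".join(perturbed)


def shorten_sentences(text: str, max_clauses: int = 2) -> str:
    """Hypothetical shortening: end a sentence after every `max_clauses` clauses."""
    if max_clauses < 1:
        raise ValueError("max_clauses must be at least 1")
    shortened = []
    for sentence in split_sentences(text):
        clauses = sentence.split("，")
        # Regroup the clauses into chunks and close each chunk with 「。」.
        for start in range(0, len(clauses), max_clauses):
            shortened.append("，".join(clauses[start:start + max_clauses]) + "。")
    return "".join(shortened)


if __name__ == "__main__":
    sample = "甲，乙，丙。丁，戊。"
    print(shorten_sentences(sample, max_clauses=2))
    print(shuffle_clauses(sample, seed=42))
```

For example, `shorten_sentences("甲，乙，丙。", max_clauses=2)` yields 「甲，乙。丙。」, while `shuffle_clauses` keeps every clause and every sentence boundary and changes only the clause order within each sentence, so the article's characters are unchanged.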

Keywords

Artificial Intelligence, Text Generation, Natural Language Processing, Linguistics