Development and Research of Scientific Constructed-Response Assessments
Date
2016
Abstract
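This study develops the Scientific Constructed-Response Assessment (SCRA) together with a Scientific Constructed-Response Assessment Rubric for scoring scientific competence. The full assessment comprises four subtests, "remembering and understanding scientific knowledge", "applying and analyzing scientific procedures", "arguing and expressing scientific logic", and "evaluating and creating in problem solving", with 32 constructed-response items in total. The empirical data were examined through item analysis, construct validity, and reliability analyses to evaluate the reliability and validity of the assessment. The intra-rater consistency values all exceeded .9, indicating highly stable intra-rater consistency. The inter-rater Kendall's W coefficient of concordance exceeded .9 (p < .05), indicating that the raters' scores were highly consistent with one another. In the many-facet Rasch measurement (MFRM) analysis, the chi-square test of rater severity was not significant (χ² = 5.01, df = 3, p = .171) and the separation reliability was .57, indicating consistency among raters, in line with the classical test theory results. The chi-square test comparing the RSM and PCM, by contrast, was significant, indicating that the rating thresholds differ across raters, which is not ideal. However, when the deviances estimated under the RSM and PCM were converted to BIC, the RSM fit better, suggesting that the raters share the same rating thresholds. Further data should therefore be collected to confirm the consistency of rater threshold severity. In addition, the internal consistency of every test booklet exceeded .8 and the α of the full assessment was .90 or higher, placing the Cronbach's α of the SCRA in a good range. Finally, the CFA results showed that the empirical data support the theoretical model of the SCRA: the proportions of variance in the four first-order latent factors ("remembering and understanding scientific knowledge", "applying and analyzing scientific procedures", "arguing and expressing scientific logic", and "evaluating and creating in problem solving") explained by the second-order factor "scientific competence" were .92, .56, .46, and .46, respectively.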
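The two rater-consistency statistics reported above can be checked with a short, standard-library Python sketch. `kendalls_w` computes Kendall's coefficient of concordance from a raters-by-responses score matrix, and `chi2_sf` gives the upper-tail chi-square probability for an integer df. Both function names are hypothetical helpers, not the study's own code; the source does not say which software was used, and the tie correction for W is deliberately omitted here.

```python
import math


def kendalls_w(ratings):
    """Kendall's coefficient of concordance W.

    ratings: m lists (one per rater), each scoring the same n responses.
    Tied scores receive average ranks; the tie-correction term in the
    denominator is omitted to keep the sketch short.
    """
    m, n = len(ratings), len(ratings[0])
    rank_sums = [0.0] * n
    for scores in ratings:
        order = sorted(range(n), key=lambda i: scores[i])
        i = 0
        while i < n:
            j = i
            # extend j over a run of tied scores
            while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # ranks are 1-based
            for pos in range(i, j + 1):
                rank_sums[order[pos]] += avg_rank
            i = j + 1
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square with integer df >= 1."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    # recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


# Reported MFRM rater-severity test: chi-square = 5.01 with df = 3
p = chi2_sf(5.01, 3)
print(f"p = {p:.3f}")
```

Running it prints `p = 0.171`, matching the reported value. A W close to 1 means the raters rank the responses in nearly the same order; W = 0 means their rankings cancel out.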
This study aims to develop the Scientific Constructed-Response Assessments (SCRA), with a focus on the Rubric of Scientific Constructed-Response Assessments (RSCRA), which is designed to evaluate the extent of scientific comprehension. Optics served as the science unit of the assessment, which consists of 32 open-ended items in four subscales: the memory and understanding of scientific knowledge, the application and analysis of scientific procedure, the demonstration and expression of scientific logic, and the evaluation and creation of problem solving. In the item analysis, the intra-rater Cronbach's α exceeded .9, indicating good intra-rater consistency. The inter-rater Kendall's coefficient of concordance (W) exceeded .9 (p < .05), indicating good inter-rater consistency. In addition, the many-facet Rasch measurement (MFRM) analysis showed that the chi-square test of rater severity was not significant, meaning that the raters were consistent in severity; this agrees with the classical test theory results. The chi-square test comparing the rating scale model (RSM) and the partial credit model (PCM), however, was significant, indicating that the rating thresholds differ across raters. Yet the Bayesian Information Criterion (BIC) values of the RSM and PCM showed that the RSM fit better, so the raters' thresholds can be considered the same. Further data should therefore be collected to confirm the consistency of the inter-rater thresholds. Furthermore, the Cronbach's α of each test booklet exceeded .8, within an acceptable range. Finally, a second-order confirmatory factor analysis showed an acceptable goodness of fit for the SCRA. The second-order factor accounted for .92, .56, .46, and .46 of the variance in the four first-order subscale factors, respectively.
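The RSM-versus-PCM decision described above pairs a likelihood-ratio chi-square test with a BIC comparison, and the two can disagree. The sketch below assumes that the RSM is nested in the PCM, that BIC is computed as deviance + k·ln(N), and that N is the number of examinees (the source does not state which N was used). The function name and every number in the usage line are hypothetical, not values from the study.

```python
import math


def compare_rsm_pcm(dev_rsm, k_rsm, dev_pcm, k_pcm, n_obs):
    """Likelihood-ratio statistic and BIC for nested RSM vs PCM fits.

    dev_*: deviance (-2 log-likelihood) of each fitted model
    k_*:   number of estimated parameters in each model
    n_obs: sample size in the BIC penalty (assumption: number of examinees)
    """
    # RSM is nested in PCM, so the deviance drop is referred to a
    # chi-square distribution with df = k_pcm - k_rsm
    lr_stat = dev_rsm - dev_pcm
    lr_df = k_pcm - k_rsm
    # BIC = deviance + k * ln(N); the smaller value is preferred
    bic_rsm = dev_rsm + k_rsm * math.log(n_obs)
    bic_pcm = dev_pcm + k_pcm * math.log(n_obs)
    preferred = "RSM" if bic_rsm < bic_pcm else "PCM"
    return {"lr_stat": lr_stat, "lr_df": lr_df,
            "bic_rsm": bic_rsm, "bic_pcm": bic_pcm, "preferred": preferred}


# Hypothetical deviances and parameter counts, for illustration only
result = compare_rsm_pcm(dev_rsm=1000.0, k_rsm=10, dev_pcm=980.0, k_pcm=19, n_obs=300)
print(result["lr_stat"], result["lr_df"], result["preferred"])
```

With these made-up inputs the deviance drop of 20 on 9 df exceeds the .05 critical value of about 16.92, so the chi-square test favors the PCM, while the ln(N) penalty on the nine extra parameters makes BIC favor the RSM. This is the same kind of disagreement the abstract reports.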
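Cronbach's α, used above for the test booklets and the full assessment, is α = k/(k − 1) · (1 − Σσᵢ² / σₓ²) for k items with item variances σᵢ² and total-score variance σₓ². A minimal sketch follows; the function name and the examinees-by-items score matrix are hypothetical illustrations, not the study's data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows of k item scores.

    Population variances are used throughout; the n/(n-1) factor would
    cancel in the ratio, so sample variances give the same alpha.
    Raises ZeroDivisionError if every examinee has the same total score.
    """
    k = len(scores[0])
    items = list(zip(*scores))  # transpose rows into item columns
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical 3 examinees x 2 items
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
print(f"alpha = {alpha:.3f}")
```

For this toy matrix α is 2/3; items that rise and fall together push α toward 1.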
Keywords
Scientific Constructed-Response Assessments, rater consistency, model comparison of RSM and PCM, many-facet Rasch measurement, confirmatory factor analysis