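Does the Quality of Primary Research Matter? A Meta-Analysis of the Effects of Analogy Instruction on Students' Scientific Conceptual Change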
Date
2021-06-??
Publisher
國立臺灣師範大學教育心理學系
Department of Educational Psychology, NTNU
Abstract
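With the rise of evidence-based education, synthesizing quantitative research to provide scientific empirical evidence has received growing attention, which in turn has made judging the quality of individual primary studies an important issue. This study used a systematic literature review and meta-analysis to synthesize research on analogy instruction. Beyond estimating the overall effect of analogy instruction on students' scientific conceptual change, it focused on how the indicator that researchers conventionally use (publication type) and other study-quality indicators affect the synthesis results, in order to identify alternative indicators of primary-study quality that are both grounded in validity theory and feasible in practice. The results were as follows: 1. Analogy instruction had a medium overall instructional effect. 2. When primary-study quality was not considered, single-analogy instruction tended to be less effective than non-single-analogy instruction, such as bridging analogies and multiple analogies, and non-single-analogy instruction with elementary school participants had the best instructional effect; however, when the different study-quality indicators were considered at the same time, these significant moderating effects became nonsignificant, showing that quality assessment has considerable influence. 3. Using publication type as the quality indicator tends to confound quality with publication bias; this article recommends an alternative indicator, namely meeting all three criteria of baseline comparability of pretest performance, reliability and validity of the assessment tools, and a basic number of participants, which appears to better balance validity theory, empirical results, and practical feasibility. The article closes with several conclusions and suggestions for future researchers.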
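The combined three-indicator screen recommended above can be sketched roughly as follows. This is an illustrative sketch, not the authors' coding scheme: the class and field names are hypothetical, the attrition (10%) and pretest-difference (0.5 standard deviations) cutoffs are the ones stated in the English abstract, and the minimum sample size is left as a caller-supplied parameter because the abstract does not give a number.

```python
from dataclasses import dataclass


@dataclass
class PrimaryStudy:
    # All field names are hypothetical and chosen for illustration only.
    attrition_rate: float               # proportion lost to attrition, e.g. 0.08
    pretest_diff_sd: float              # pretest group difference in SD units
    reports_reliability_validity: bool  # instrument reliability/validity addressed
    n_participants: int                 # total number of participants


def meets_baseline_equivalence(study: PrimaryStudy) -> bool:
    # Cutoffs as stated in the abstract: attrition above 10% or a pretest
    # difference above 0.5 SD counts as not meeting the indicator.
    return study.attrition_rate <= 0.10 and abs(study.pretest_diff_sd) <= 0.5


def meets_all_three(study: PrimaryStudy, min_participants: int) -> bool:
    # min_participants is an assumption supplied by the caller; the abstract
    # only refers to a "basic number of participants".
    return (
        meets_baseline_equivalence(study)
        and study.reports_reliability_validity
        and study.n_participants >= min_participants
    )
```

A study is flagged as meeting the combined indicator only when all three checks pass, so failing any single check is enough to place it in the lower-quality group.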
As evidence-based education evolves, a growing body of work has highlighted the problem of constructing sound quality indicators for the synthesis of quantitative research. Which indicators are optimal remains unclear and debated. Many researchers rely on a single indicator of primary-research quality, such as publication type, pretest performance, reliability and validity, or sample size, and this approach can introduce bias. This study therefore proposed alternative quality indicators to replace the single indicator in evidence synthesis. Research on analogy instruction was integrated through a systematic review and meta-analysis, and the effects of analogy instruction were examined through quantitative research synthesis.

For the literature search, retrieval targeted journal articles, master's theses, and doctoral dissertations, drawing on the reference lists of relevant earlier reviews. Literature on analogy instruction was collected and located systematically to avoid missing relevant articles and introducing sampling bias. Participants in the included studies ranged from preschoolers to elementary, junior high school, vocational high school, and university students. The analogy instruction approaches covered were single analogies, bridging analogies, and multiple analogies. Commentaries on or introductions to analogy instruction and scientific concept learning were excluded, as were articles on development at different stages and other irrelevant topics. Because the synthesis was a quantitative meta-analysis, qualitative studies were also excluded, as were publications that did not report the information required to calculate effect sizes.

In meta-analyses, Hedges' g is used as the effect-size measure for standardized mean differences (Hedges, 1982).
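As a rough illustration of the measure named above, Hedges' g for two independent groups can be computed from group means, standard deviations, and sample sizes. The sketch below uses the common approximation of the small-sample correction factor, J = 1 - 3/(4·df - 1); the function and parameter names are illustrative, and the article's exact computation may differ.

```python
import math


def hedges_g(mean_t: float, mean_c: float,
             sd_t: float, sd_c: float,
             n_t: int, n_c: int) -> float:
    """Standardized mean difference with a small-sample bias correction."""
    df = n_t + n_c - 2
    # Pooled standard deviation of the treatment and comparison groups.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Approximate correction factor for small samples.
    j = 1 - 3 / (4 * df - 1)
    return j * d
```

With hypothetical inputs such as `hedges_g(12, 10, 4, 4, 20, 20)`, the uncorrected difference is 0.5 standard deviations and the correction shrinks it slightly, which is the intended behavior for small samples.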
Because the pretest–posttest data reported differ across sources, effect sizes can be roughly classified as those based on raw mean scores and those based on covariate-adjusted mean scores (Morris & DeShon, 2002; WWC, 2020b). When pretest-adjustment information was reported (statistics from analysis of covariance, adjusted means, and standard deviations), the effect size based on covariate-adjusted means was used. Following WWC (2020b) recommendations, when this information was not reported but other data (e.g., pretest and posttest means and standard deviations) were, and the same assessment tool was used at pretest and posttest, a difference-in-differences adjustment was applied to account for pretest–posttest correlations. When no pretest data were reported, or only posttest statistics (e.g., from analysis of variance) were available, the effect size was calculated from raw mean scores.

For the pooled effect of analogy instruction on students' scientific concept learning, the immediate effect size was medium (0.59), and the delayed effect size was smaller (0.39, also medium).

Regarding moderators, the regression for single analogies did not reach significance, although the p-value of 0.08 approached it. The effect size for elementary school students was 0.88, whereas that for students in junior high school and above was 0.52, a significant difference.

To assess the quality of primary research more appropriately, three indicators are proposed here: baseline equivalence at pretest, reliability and validity of assessment tools, and sample size. For baseline equivalence, effect sizes did not differ significantly between studies that met the criterion and those that did not.
Studies with an excessively high attrition rate (more than 10%) or a pretest difference greater than 0.5 standard deviations were classified as not meeting this indicator. In selecting the effect-size calculation, however, whenever statistics from covariate analysis or pretest and posttest data were reported, the effect size was estimated from covariate-adjusted means. This appeared to compensate for both kinds of difference, which in turn led to the nonsignificant result.

For reliability and validity of assessment tools, the regression coefficient β was −0.57 (pβ < 0.0001). The corresponding effect size of 1.06 was large, indicating that a large effect size is easy to obtain when the reliability and validity of assessment tools are not addressed.

For sample size, the regression coefficient β for the basic number of participants was −0.81 (pβ = 0.003), showing that a favorable effect size of 1.36 could easily be obtained in small-scale studies of analogy instruction. As Slavin (2008) noted, small-scale studies are likely to obtain extremely positive or extremely negative results.

Primary studies that met all three indicators had a medium effect size of 0.49. The pattern was similar whichever quality assessment method was used: studies that did not meet the quality standard had larger effect sizes than studies that did, indicating that the effects of analogy instruction in primary research may be overestimated if quality assessment is not considered.

Significant effects were observed for both analogy instruction approach and learning stage, with regression coefficients β of 0.27 (pβ = 0.03) and −0.41 (pβ = 0.01), respectively. For elementary school learners, the effect size for single analogies was large (0.77), and that for approaches other than single analogies was larger still (1.04).
For single analogies and non-single analogies among students in junior high school and above, the effect sizes were 0.36 and 0.63, respectively, roughly the lower and upper ends of the medium range. Different analogy instruction approaches therefore had different effect sizes for students at different learning stages, with better results for approaches other than single analogies.

Regarding publication type, a multiple moderator analysis was performed with analogy instruction approach and learning stage as moderators. A tendency toward positive effects was observed. When publication type was controlled and the two moderators were examined simultaneously, publication type had little influence on the effects of analogy instruction.

Regarding baseline equivalence at pretest, the corresponding regression coefficients β for analogy instruction approach and learning stage were 0.07 (pβ = 0.72) and 0.03 (pβ = 0.89), respectively; neither was significant. Choosing an appropriate effect-size metric at the outset, according to each study's design and the information it reports, can help adjust for differences in baseline equivalence at pretest.

When all three indicators (i.e., baseline equivalence, reliability and validity of assessment tools, and sample size) were met, these conditions affected the results for analogy instruction approaches. When they were not met, the influence of other moderators could be misestimated and the effects of analogy instruction overestimated.

The conclusions are as follows: First, analogy instruction had an overall positive effect.
Second, when the impacts of quality indicators were not controlled, single analogies were less effective than other analogy instruction approaches, such as multiple and bridging analogies. Furthermore, elementary school students showed larger effects, especially when they received instruction using approaches other than single analogies. Third, publication type is not a suitable indicator for assessing the quality of primary research because it is confounded with publication bias. Alternatives can, however, be well defined both theoretically and empirically by examining baseline equivalence, the reliability and validity of assessment tools, and basic sample size requirements.
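The rule for choosing the effect-size basis described in the abstract (covariate-adjusted means first, then difference-in-differences, then raw scores) can be sketched as a simple decision function. The function name, flags, and return labels are hypothetical; the fallback to raw scores when pretest and posttest data exist but the assessment tools differ is an assumption, since the abstract does not cover that case.

```python
def choose_effect_size_basis(has_adjusted_stats: bool,
                             has_pre_post_stats: bool,
                             same_instrument: bool) -> str:
    """Pick the basis for the effect size from what a primary study reports."""
    if has_adjusted_stats:
        # ANCOVA statistics, adjusted means, and standard deviations reported.
        return "covariate_adjusted"
    if has_pre_post_stats and same_instrument:
        # Pretest and posttest means and SDs reported with a consistent tool.
        return "difference_in_differences"
    # No usable pretest data, or only posttest statistics available.
    # Assumption: also used when the pretest and posttest tools differ.
    return "raw_posttest"
```

Ordering the checks this way mirrors the priority stated in the abstract, so a study reporting both adjusted statistics and raw pretest and posttest data is handled with the covariate-adjusted basis.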