Recurrent Event Analysis of Injuries in Baseball Pitchers
Date
2024
Abstract
Baseball data analysis has traditionally focused on evaluating player performance, using metrics such as Wins Above Replacement (WAR), which measures the number of wins a player contributes relative to a replacement-level player. Such metrics are also used to predict future player performance and to forecast a team's wins in upcoming seasons. This work falls within sabermetrics, which emphasizes the mathematical and statistical models behind the data. However, past analyses have overlooked the importance of player health during games. What factors make a player prone to injury? Is it the use of breaking balls, the pitch types thrown, or perhaps age, pitch velocity, number of appearances, and related fatigue indicators? The data cover 2015 to 2023 and record Major League Baseball (MLB) pitchers who reached a certain innings threshold, including their pitch types, velocity, age, and more. A pitcher may be injured several times over a career, so a single pitcher can contribute multiple survival events. To address this, we apply recurrent event marginal models commonly used in survival analysis: the Andersen-Gill (AG) model, the Prentice-Williams-Peterson (PWP) model, and the Wei-Lin-Weissfeld (WLW) model. These models allow us to analyze covariates for pitchers with multiple injury events and thereby gain a fuller understanding of the factors affecting pitcher health and injury risk. We chose among the three models based on their characteristics and those of the data, and compared their performance in detail. For variable selection, the variables were first categorized, and the most appropriate combinations were then identified to build the optimal model for predicting injury risk.
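As an illustration of how one pitcher's repeated injuries become multiple survival records, the sketch below builds the data layouts commonly associated with the AG, PWP, and WLW models. It is a minimal sketch under stated assumptions, not the thesis's own code: the function names, the field names, the day-based time scale, and the example injury times are all hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def ag_rows(pitcher_id: str, injury_times: List[int], end_time: int) -> List[Row]:
    """AG layout: counting-process intervals (start, stop] on one time scale.

    Assumes every injury time is positive and earlier than end_time.
    """
    rows: List[Row] = []
    start = 0
    for t in sorted(injury_times):
        rows.append({"id": pitcher_id, "start": start, "stop": t, "event": 1})
        start = t
    if start < end_time:
        # Final interval is censored at the end of follow-up.
        rows.append({"id": pitcher_id, "start": start, "stop": end_time, "event": 0})
    return rows


def pwp_rows(pitcher_id: str, injury_times: List[int], end_time: int) -> List[Row]:
    """PWP layout: same intervals as AG, stratified by event order.

    A pitcher is at risk for injury k only after injury k-1 has occurred.
    The gap column supports the gap-time variant of the model.
    """
    rows = ag_rows(pitcher_id, injury_times, end_time)
    for k, row in enumerate(rows, start=1):
        row["stratum"] = k
        row["gap"] = row["stop"] - row["start"]
    return rows


def wlw_rows(pitcher_id: str, injury_times: List[int], end_time: int,
             max_events: int) -> List[Row]:
    """WLW layout: one row per event order, each measured from time zero.

    The pitcher is treated as at risk for every event order from the start.
    """
    times = sorted(injury_times)
    rows: List[Row] = []
    for k in range(1, max_events + 1):
        if k <= len(times):
            stop, event = times[k - 1], 1
        else:
            stop, event = end_time, 0  # k-th injury never observed
        rows.append({"id": pitcher_id, "start": 0, "stop": stop,
                     "event": event, "stratum": k})
    return rows


# Hypothetical pitcher: injured on days 120 and 400, followed for 900 days.
ag = ag_rows("p1", [120, 400], 900)
pwp = pwp_rows("p1", [120, 400], 900)
wlw = wlw_rows("p1", [120, 400], 900, max_events=3)
```

The three layouts differ only in who counts as at risk and from when, which is one way to see why the models can give different estimates on the same injury history.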
Keywords
survival analysis, recurrent event, marginal model, baseball, pitcher