Modeling the Ability to Return Serve in ATP Tennis

Jul 27, 2022

An older and more scientific article I had written a couple of years ago

Abstract

Which aspects of a player’s physical build and in-game play are significant in modeling that player’s success at returning serve? We examined 102 male professional tennis players from the 2019 season, using their height, weight, ace against percentage, and return in-play points won percentage to determine which of these factors predict the percentage of points won while returning and the percentage of games won while returning. We used a multiple regression model with backward elimination to assess the significance of each individual predictor. We found that ace against percentage and return in-play points won percentage were significant in predicting both the rate at which a player wins a game while returning and the rate at which they win a point while returning. Height and weight were mostly insignificant in our models. We discuss how these findings affect how height and weight are viewed in the tennis world and what we can learn about the importance of groundstrokes.

Introduction

In men’s professional tennis, a few players always stand out from the rest of the pack through their ability to return serve. The serve and the return occur at the most crucial part of each point, the very beginning. Many of the sport’s greats, such as Novak Djokovic and Rafael Nadal, are considered among the best returners of all time. Other players, such as Ivo Karlovic and John Isner, have been unable to reach that level of greatness in large part because they do not return serve well. Because the return is such a key factor in the success of a player’s career, it is worth studying to better understand the game of tennis.

This study attempts to shed light on which aspects of a men’s professional player’s physical build and in-game play contribute to their success, or lack thereof, at returning serve. Success will be measured by the percentage of return games won and the percentage of points won while returning serve. Together, these two responses should give an accurate picture of a returner’s general ability. The predictors are the player’s height and weight, their ace against percentage, and their return in-play points won percentage. Through these predictors, we hope to identify the aspects of a player that matter most for success at returning serve.

We are undertaking this study to identify the in-game and physical tendencies that can project future success at returning serve in men’s professional tennis. The results could help explain what works for returners and which traits make the difference. This could lead to coaching changes and to new strategies and tactics that fundamentally change how the return of serve is approached. It could also inform further analysis of how a player’s return stacks up against particular opponents’ serves, which would benefit players and coaches everywhere.

Literature Review

Growing up in what many would consider the “Golden Age of Men’s Tennis”, I watched the three greatest men’s players of all time (in my opinion) square off year after year. Because I am inclined to push back against the general consensus, my favorite of the three became Novak Djokovic, who was never as popular with the tennis community as Federer or Nadal. Djokovic separated himself from the rest of the pack through his movement, his speed, and, notably, his ability to return serve like no other player in the history of tennis. That ability always made me wonder: what makes someone a great returner? My goal for this research is to determine which factors, such as speed, height, racquet head speed, and spin rate, contribute to a player winning a higher percentage of points when returning serve than their counterparts.

Summaries

One study used qualitative analysis to examine how prior knowledge, in-game awareness, and anticipatory skills influence a player’s aptitude for returning serve (Vernon, Farrow, Reid, 2018). The authors interviewed six former and two current professional ATP (Association of Tennis Professionals) players about the anticipation, predictions, and feel for the game that players rely on when returning serve (Vernon, Farrow, Reid, 2018). From the interviews, they identified nine overarching themes in the anticipatory behavior of returning serve, along with various smaller themes that support them (Vernon, Farrow, Reid, 2018). These overarching themes (which the study calls higher-order themes) include mentality/confidence, returning characteristics, and kinematic information sources, among others (Vernon, Farrow, Reid, 2018). Overall, the study concluded that its qualitative observations identified areas that can be studied further quantitatively, such as “kinematic and contextual information sources” (Vernon, Farrow, Reid, 2018).

Another study dealt with movement on hard courts, with an emphasis on groundstrokes and where they were directed on the court (Martinez-Gallego, Guzman, James, Pers, Ramon-Llin, Vuckovic, 2013). It was conducted at the 2011 ATP Valencia tournament, where the researchers used a computer tracking system called SAGIT to track the players’ movements in eight matches (Martinez-Gallego, Guzman, James, Pers, Ramon-Llin, Vuckovic, 2013). Specifically, they looked at time spent in the “offensive” and “defensive” parts of the court, distance covered, and foot speed (Martinez-Gallego, Guzman, James, Pers, Ramon-Llin, Vuckovic, 2013). The researchers used descriptive statistics and non-parametric tests, including Wilcoxon signed-rank tests, to determine differences between winners and losers (Martinez-Gallego, Guzman, James, Pers, Ramon-Llin, Vuckovic, 2013). They concluded that the winners of individual games covered more distance in the “offensive zones” than the losers, while the losers covered more ground in the “defensive zones” (Martinez-Gallego, Guzman, James, Pers, Ramon-Llin, Vuckovic, 2013). However, they also concluded that the study needs refining, because the zones are simplistic and are defined with hard boundaries rather than gradients (Martinez-Gallego, Guzman, James, Pers, Ramon-Llin, Vuckovic, 2013).

A third study concerned “efficiency” and used seven indicators that the researchers consider important for the effectiveness of the server (offense) and five indicators for the effectiveness of the returner (defense) (Glass, Kenjegalieva, Taylor, 2015). The researchers used statistics from the 2009 ATP season as the numerical basis for the study. They used a VRS non-parametric model, to which they added further factors such as height and speed, to produce their results (Glass, Kenjegalieva, Taylor, 2015). They concluded that players fall into four distinct groups: efficient on offense but not defense, efficient on defense but not offense, efficient on both, or inefficient on both (Glass, Kenjegalieva, Taylor, 2015). They also found that neither being tall nor being left-handed gives a significant advantage on offense, which defies conventional wisdom (Glass, Kenjegalieva, Taylor, 2015).

Comparing

The qualitative study of anticipation and what makes a great returner is a useful foundation for some of my data and findings. Because it is purely qualitative, it lends itself to quantitative follow-up. Although effective, the study is limited to only eight players, which is too small a sample to support firm conclusions. Unlike the efficiency study, its small sample leaves it open to being skewed in one way or another. Like the anticipation study, the movement study also lends itself to further quantitative testing. I find the theory behind the movement study somewhat flawed, as it paints the “offensive” and “defensive” zones as much more black and white than they are in real life. The zones are rather arbitrary and do not reflect other important factors, such as ball speed off the racquet and foot speed, which change where a particular player hits the ball from. The efficiency study does a great job of identifying ways to measure proficiency in tennis, such as first serves in and break points saved; however, it does not go into the root causes of why those things occur. That is what I intend to do in my study.

Methodology

Discussion of Data

For this study, we look at data from the 2019 ATP (Association of Tennis Professionals) tour, whose players we refer to as men’s professional tennis players. To be included, a player had to play at least 20 matches at the ATP level during the 2019 season. A total of 102 players met that threshold, so they are the players counted in this study. This number of ATP professionals should give a reasonably accurate picture of returning as a whole, as it covers many different styles of play and body types. By excluding players who played fewer than 20 matches, we omit those who missed significant time due to injury as well as players who are not full-time members of the ATP tour. These players could skew the data, so it is better not to include them. One limitation is that we are looking at only a single year of data. If one player has a “down” year or another has an “up” year, the analysis does not take the totality of their careers into account.
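
The inclusion rule above is simple enough to sketch in a few lines. This is only an illustration: the `PlayerSeason` record, its field names, and the sample players are hypothetical and do not come from the study's data set.

```python
from dataclasses import dataclass

MIN_MATCHES = 20  # inclusion threshold stated in the text


@dataclass
class PlayerSeason:
    """Hypothetical record for one player's season; field names are assumptions."""
    name: str
    matches_played: int


def eligible_players(seasons: list[PlayerSeason], min_matches: int = MIN_MATCHES) -> list[PlayerSeason]:
    """Keep only players who reached the minimum number of tour-level matches."""
    return [s for s in seasons if s.matches_played >= min_matches]


# Invented records, purely for illustration.
sample = [
    PlayerSeason("Player A", 58),
    PlayerSeason("Player B", 20),
    PlayerSeason("Player C", 19),
    PlayerSeason("Player D", 7),
]

kept = eligible_players(sample)
print([p.name for p in kept])
```

A player with exactly 20 matches is kept, which matches the "at least 20 matches" wording.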

Data Collection

The data collected are quantitative measurements for each individual player: height, weight, ace against percentage, and return in-play points won percentage (each is defined in the Variables section). The results of the study could be immediately generalizable to other ATP professionals and to future generations of professionals. I would also expect the results to transfer in some capacity to women’s professional tennis players, who are likewise at the pinnacle of their sport. Some broader generalizations could also help casual players see where they can improve their game.

Data will be obtained primarily from the internet database Ultimate Tennis Statistics and the ATP tour’s database on their website. The ATP tour website draws on data from every professional event and player which makes it a great tool for collecting my data. It is run by the ATP tour, so the data can be considered reliable. Ultimate Tennis Statistics is another comprehensive database that uses data from each individual match and player. The datasets so far have not had any missing values or unattainable numbers. If missing data exists for a particular match, that particular match will be discarded and not included within the data.

Variables

When deciding what variables to choose for the study, it came down to some natural physical traits that are considered undesirable for returning serves and in-game mechanics that are considered advantageous for returning. Taller players are generally considered to be at a disadvantage for returning compared to players of smaller stature. We wanted to put this notion to the test and decide if that can be proven accurate. Players that are aced less have a higher chance of getting balls back into play, which improves their ability to win a point greatly. Therefore, we thought that could be a contributing factor to the success of a returner. Those are a few examples of the reasoning we had when choosing the particular variables.

Response Variables

Percentage of points won on return: percent, quantitative, between 0 and 1
Percentage of return games won: percent, quantitative, between 0 and 1

Predictor Variables

Height: centimeters, quantitative
Weight: kilograms, quantitative
Ace Against %: quantitative
Return In-Play Points won %: quantitative

Research Hypothesis

For the first response, our null hypothesis is that height, weight, the rate at which an individual returner is aced, and the rate at which the returner wins the point when the return is in play have no significant linear relationship with the percentage of points won while returning serve. For the second response, the null hypothesis is that the same four predictors have no significant linear relationship with the percentage of games won by an individual returner. The alternative to the first hypothesis is that these predictors do have a significant linear relationship with the percentage of points won while returning serve. The alternative to the second is that they have a significant linear relationship with the percentage of games won by an individual returner. Starting from the full regression, we will use backward elimination to find the model that best suits the data.
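
The hypotheses can also be written compactly. The sketch below assumes the usual multiple linear regression notation, in which β_j is the coefficient on the j-th predictor (height, weight, ace against percentage, return in-play points won percentage); the β symbols and the "at least one" form of the alternative are the standard overall-test convention, introduced here for illustration. The same pair is tested twice, once for each response.

```latex
H_0:\ \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0
\qquad \text{versus} \qquad
H_a:\ \beta_j \neq 0 \ \text{for at least one } j \in \{1, 2, 3, 4\}
```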

Description of the Analyses

To determine the impact of the predictor variables on the percentage of return points won and the percentage of return games won, we use a multiple regression model. For each response Y, the model has the general linear form, with an intercept and one coefficient per predictor:

Y = β0 + β1·X1 + β2·X2 + β3·X3 + β4·X4 + ε

The variables are defined as follows:

· X1 is the height of an individual player in centimeters

· X2 is the weight of an individual player in kilograms

· X3 is the rate at which the individual player is aced while returning serve

· X4 is the rate at which the individual player wins the point if they return the serve in the field of play
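
To make the model concrete, here is a minimal sketch of fitting a four-predictor linear regression by least squares. The study itself used Minitab; NumPy is swapped in here for illustration, and every number below is synthetic (the means, spreads, and coefficients are rough guesses, not the study's values).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 102  # number of players in the study

# Synthetic predictors (NOT the study's data); scales are rough guesses.
height_cm = rng.normal(188, 7.6, n)
weight_kg = 82 + 0.9 * (height_cm - 188) + rng.normal(0, 4, n)
ace_against_pct = rng.normal(8, 2, n)
in_play_won_pct = rng.normal(47, 3, n)

# Synthetic response built from made-up coefficients plus noise.
y = 5.0 - 0.9 * ace_against_pct + 0.85 * in_play_won_pct + rng.normal(0, 0.3, n)

# Design matrix: a column of ones for the intercept, then X1..X4.
X = np.column_stack([np.ones(n), height_cm, weight_kg, ace_against_pct, in_play_won_pct])

# Least-squares estimates of the intercept and the four slopes.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
residuals = y - fitted

for name, b in zip(["intercept", "height", "weight", "ace against %", "in-play won %"], coef):
    print(f"{name:>15}: {b: .4f}")
```

Because the synthetic response ignores height and weight, their fitted coefficients should land near zero, while the other two should land near the made-up values used to generate the data.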

Plan to Analyze Data

To analyze the data, we will use Minitab software. Alongside the regression output, we will use residual plots and normal probability plots. The residual plots show how the residuals are distributed around the regression line and help locate outliers in the data. The normal probability plot lets us check that the residuals are approximately normal and look for any skew.
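
As a rough illustration of what a normal probability plot checks, the sketch below pairs sorted residuals with standard normal quantiles and measures how close the pairs are to a straight line. The residuals are simulated, the plotting position is one common convention (statistical packages differ slightly), and the helper names are our own.

```python
import random
from statistics import NormalDist, mean


def normal_probability_points(residuals: list[float]) -> list[tuple[float, float]]:
    """Pair each sorted residual with the standard normal quantile for its rank."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    # (i + 0.5) / n is one common plotting position; software packages differ slightly.
    return [(std_normal.inv_cdf((i + 0.5) / n), r) for i, r in enumerate(ordered)]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Simulated residuals standing in for the real ones (102 players).
random.seed(1)
residuals = [random.gauss(0, 0.5) for _ in range(102)]

points = normal_probability_points(residuals)
quantiles = [q for q, _ in points]
ordered_residuals = [r for _, r in points]

# A correlation close to 1 means the points fall near a straight line.
straightness = pearson(quantiles, ordered_residuals)
print(f"normal-plot correlation: {straightness:.3f}")
```

Plotting `ordered_residuals` against `quantiles` gives the picture itself; strong curvature or a few points far off the line would suggest skew or outliers.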

To answer our research question, we will use R-squared to determine whether a high percentage of the variation can be explained by our linear model. Since we will also be performing partial F-tests, we will look at the Type III sums of squares as well as the p-value of each individual predictor. This lets us judge the significance of each predictor and determine whether the model is predictive of our response variables.
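
The two quantities named above can be sketched as follows. This is an illustrative NumPy version on synthetic data, not the Minitab output used in the study; the function names are our own. Dropping a single predictor from the full model gives the partial F statistic that corresponds to that predictor's Type III sum of squares.

```python
import numpy as np


def sse(X: np.ndarray, y: np.ndarray) -> float:
    """Residual (error) sum of squares from a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Share of the variation in y explained by the fitted model."""
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - sse(X, y) / sst


def partial_f(X_full: np.ndarray, X_reduced: np.ndarray, y: np.ndarray) -> float:
    """F statistic for testing whether the columns dropped from X_full add nothing."""
    n, p_full = X_full.shape
    dropped = p_full - X_reduced.shape[1]
    sse_full = sse(X_full, y)
    sse_reduced = sse(X_reduced, y)
    return ((sse_reduced - sse_full) / dropped) / (sse_full / (n - p_full))


# Synthetic data (not the study's): y depends on x3 and x4 only.
rng = np.random.default_rng(42)
n = 102
x1, x2, x3, x4 = rng.normal(size=(4, n))
y = 1.0 - 0.8 * x3 + 1.5 * x4 + rng.normal(0, 0.3, n)

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2, x3, x4])
X_no_x1_x2 = np.column_stack([ones, x3, x4])   # drops the irrelevant pair
X_no_x4 = np.column_stack([ones, x1, x2, x3])  # drops a real predictor

print(f"R^2 (full model):   {r_squared(X_full, y):.4f}")
print(f"F, dropping x1, x2: {partial_f(X_full, X_no_x1_x2, y):.2f}")
print(f"F, dropping x4:     {partial_f(X_full, X_no_x4, y):.2f}")
```

A large F statistic means the dropped predictors explained a meaningful share of the variation; a small one means the reduced model loses little.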

Results

For our study, we wanted to identify factors that correlate with success at returning serve in men’s professional tennis. Specifically, which factors determine an individual returner’s ability to win points and games when returning? We collected data on 102 professional men’s tennis players (ATP professionals) and compiled it into a data set. The purpose of this study is to identify the in-game and physical tendencies that can project future success at returning serve at the ATP tour level.

Descriptive Statistics

The compiled data gave us some insight into each variable in the study. Height was an interesting factor to look at, as the tallest and shortest players on tour differ by 41 cm. The standard deviation of 7.63 cm helps show the level of variation within the sport. There was also a vast difference in weight: the heaviest and lightest players differ by 44 kg, which shows the large variation in body type among ATP players. A full summary of these two variables is listed below in Table 1.

The next set of predictors is the rate at which a player is aced and the rate at which the player wins the point when the return is in play. In both, there are a few notable outliers that lie a good distance from the rest of the data, as the boxplots in Figures 1 and 2 show. In particular, I want to take a closer look at the two very high ace against rates to see why they sit so far from the rest of the data. The summary of these predictors is listed in Table 2.

Boxplot of Percentage at Which Returner Wins Point When Return is In-play

Summary of Ace Against % and Returner Wins Point When return In-play %
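
For readers who want to reproduce the outlier check behind a boxplot, here is a minimal sketch. It assumes the common 1.5 × IQR convention for flagging points; the ace against percentages are invented, with the last two chosen to mimic the unusually high rates discussed above.

```python
from statistics import quantiles


def iqr_outliers(values: list[float]) -> list[float]:
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Invented ace-against percentages; the last two mimic unusually high rates.
ace_against_pct = [6.1, 6.8, 7.0, 7.3, 7.5, 7.9, 8.2, 8.4, 8.8, 9.1, 9.5, 14.2, 15.0]

print(iqr_outliers(ace_against_pct))
```

Quartile definitions vary slightly between packages, so borderline points may be flagged in one tool and not in another.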

As we compare these predictors, one of the most important things to discuss is the correlation between them. The correlation matrix shows a strong positive correlation of .812 between height and weight. This makes sense, as the two are often intertwined. Return in-play points won percentage also correlates moderately and negatively with height, which is interesting to note, and it correlates moderately and negatively with weight as well. See the correlations below in Figure 3.

Correlation Matrix of Predictor Variables
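
A correlation matrix like the one described can be computed in a few lines. The sketch below uses NumPy on synthetic data in which weight is built to track height and in-play success to fall slightly with height; none of the values are the study's.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 102

# Synthetic stand-ins: weight is built to track height, as it does in the study.
height_cm = rng.normal(188, 7.6, n)
weight_kg = 82 + 0.9 * (height_cm - 188) + rng.normal(0, 4, n)
ace_against_pct = rng.normal(8, 2, n)
in_play_won_pct = 47 - 0.15 * (height_cm - 188) + rng.normal(0, 2.5, n)

labels = ["height", "weight", "ace against %", "in-play won %"]
data = np.vstack([height_cm, weight_kg, ace_against_pct, in_play_won_pct])

# np.corrcoef treats each row as one variable and returns Pearson correlations.
corr = np.corrcoef(data)

for label, row in zip(labels, corr):
    print(f"{label:>14}", " ".join(f"{value:6.3f}" for value in row))
```

The matrix is symmetric with ones on the diagonal, so only the entries above (or below) the diagonal need to be read.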

Our two response variables are the percentage of points won and the percentage of games won while returning serve. We expect the two to move together, since you need to win points in tennis to win games, so the correlation between them should be very high. The data confirm this: the two are correlated at .962 (see Table 2). Their standard deviations, however, differ a fair amount: 2.767 for the percentage of points won and 4.781 for the percentage of games won. Hopefully the regression will help account for this difference and explain why the numbers line up like that. See Table 3 for a full summary. Boxplots of the two response variables look very similar, which further shows how closely they track each other (Figures 4 and 5).

Correlation of Percentage of Points and Games Won on Return

Summary of Percentage of Points and Games Won on Return

Boxplot of Percentage of Points Won on Return

Boxplot of Percentage of Games Won on Return
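
One possible reason the games-won percentage spreads out more than the points-won percentage is that a game amplifies small edges in point-winning. The sketch below is not part of the original analysis: it uses the textbook simplification that every point in a return game is won independently with the same probability p, and the three values of p are illustrative.

```python
def game_win_probability(p: float) -> float:
    """Chance of winning a game when each point is won independently with probability p."""
    q = 1.0 - p
    win_before_deuce = p**4 * (1 + 4 * q + 10 * q**2)  # win to love, 15, or 30
    reach_deuce = 20 * p**3 * q**3                     # three points each
    win_from_deuce = p**2 / (1 - 2 * p * q)            # two clear points from deuce
    return win_before_deuce + reach_deuce * win_from_deuce


for p in (0.34, 0.38, 0.42):
    print(f"points won {p:.0%} -> games won {game_win_probability(p):.1%}")
```

Under this simplification, a gap of a few percentage points in points won turns into a noticeably larger gap in games won, which is at least consistent with the larger standard deviation reported for games.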

Inferential Results

The multiple regression model we defined was statistically significant when Y was defined as the rate at which an individual player wins a game while returning. The regression produced a p-value of less than .001 and R² = .9237. From these numbers, we can conclude that the regression is statistically significant and helps explain Y. To find whether each predictor variable is significant, we used backward elimination at a significance level of 0.05 to weed out insignificant variables. Table 1 shows the overall regression. Among the individual predictors, weight has the highest p-value, 0.295, so we remove it from the equation. That gives us Table 2. With weight removed, height still has a p-value above our threshold, at p = .347, so we remove it as well. That gives us Table 3.

Test for Overall Regression Significance for Rate at Which an Individual Player Wins a Game Returning

Test for Overall Regression Significance Without Weight(kg)

Test for Overall Regression Significance Without Weight(kg) and Height(cm)
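
The elimination procedure described above can be sketched as a loop: fit, find the predictor with the largest p-value, drop it if that p-value is at or above 0.05, and repeat. The version below is illustrative only. It runs on synthetic data, and it approximates each coefficient's t-test with a normal distribution, which is close for about 100 observations but is not what Minitab reports.

```python
import numpy as np
from statistics import NormalDist


def coefficient_p_values(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Two-sided p-values for each coefficient (normal approximation to the t-test)."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = float(resid @ resid) / (n - p)
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t_stats = coef / std_err
    std_normal = NormalDist()
    return np.array([2.0 * (1.0 - std_normal.cdf(abs(t))) for t in t_stats])


def backward_eliminate(X: np.ndarray, y: np.ndarray, names: list[str], alpha: float = 0.05) -> list[str]:
    """Drop the least significant predictor until every remaining p-value is below alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        p_values = coefficient_p_values(design, y)[1:]  # skip the intercept
        worst = int(np.argmax(p_values))
        if p_values[worst] < alpha:
            break
        print(f"dropping {names[keep[worst]]} (p = {p_values[worst]:.3f})")
        del keep[worst]
    return [names[i] for i in keep]


# Synthetic data (not the study's): only the last two predictors matter.
rng = np.random.default_rng(3)
n = 102
X = rng.normal(size=(n, 4))
y = 2.0 - 0.8 * X[:, 2] + 1.5 * X[:, 3] + rng.normal(0, 0.3, n)

names = ["height", "weight", "ace against %", "in-play won %"]
selected = backward_eliminate(X, y, names)
print("kept:", selected)
```

Only one predictor is removed per pass because p-values change once a correlated predictor leaves the model, which is why height is re-examined after weight is dropped.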

What all this tells us is that weight and height are insignificant in predicting the rate at which an individual player wins a game while returning. However, ace against percentage and return in-play points won percentage are both strong predictors of that rate. Return in-play points won percentage is the stronger of the two, as its sum of squares is extremely large. This relationship is also shown by the scatterplots in Figures 1 and 2.

The multiple regression model we defined was also statistically significant when Y was defined as the percentage of points won while returning. The regression produced a p-value of less than .001 and R² = .9886. From these numbers, we can conclude that the regression is statistically significant. Its R² value is around 6 percentage points higher than that of the regression with Y as the rate at which a player wins a game while returning serve. To determine which predictors to use in our final model, we again used backward elimination with a significance threshold of 0.05. The overall regression for this model is shown in Table 4. Height had a p-value above 0.05 in the overall regression, so we removed it for the next model, which gives us Table 5. In Table 5, all of our predictors are significant, with p-values below 0.05.

Test for Overall Regression Significance for Rate at Which an Individual Player Wins a Point Returning

Test for Overall Regression Without Height

Although all of the remaining predictor variables are significant at this point, weight has such a small sum of squares that it can be taken out of the regression equation with minimal effect on the significance and predictive ability of the model. With weight removed, the R² value decreases by only 0.11 percentage points, from 98.84% to 98.73%, as shown in Tables 6 and 7. This can also be seen in the scatterplots in Figures 3, 4, and 5.

R values Summarized for Regression Without Height

R values Summarized for Regression Without Height and Weight

Conclusion

From the results of the regressions, we can reject the original null hypotheses and conclude that each of our overall regressions is significant. At a significance level of 0.05, we can conclude from the data that our four predictors, taken together, are effective at explaining the rate at which an individual player wins a game when returning and the rate at which a player wins a point when returning. The most informative predictors in both regressions are the ace against percentage and the return in-play points won percentage.

Interpretation of Results

Although the overall regression was significant in both models, we discovered that height was not individually significant in either of them. This is really interesting to us, as we would expect taller players to be poorer returners than their shorter counterparts. The shortest player we looked at was Diego Schwartzman, who ranks number 3 in both return games won percentage and return in-play points won percentage. At the other end of the spectrum is Ivo Karlovic, the tallest player we looked at, who ranks dead last in both return games won percentage and return in-play points won percentage. Here is why we believe this occurred.

At the extreme ends of the height spectrum, there seems to be some correlation between height and the response variables we looked at. But among players whose height falls within the interquartile range, there seems to be little or no difference in overall returning ability. As we move away from either extreme, height becomes little to no factor in a player’s ability to return serve.

Weight was deemed insignificant in the regression that modeled the rate at which a player wins a game when returning, but significant (according to its p-value) in the regression that modeled the rate at which a player wins a point when returning. Weight accounted for only a very small amount of the variability in the rate at which a player wins a point while returning serve, and we do not know why it would be significant there. We speculate that weight might become a factor in longer rallies, when a player might tire more easily. Beyond that, we would like to do more research on the subject to understand it further.

It was to be expected that ace against percentage and return in-play points won percentage would be significant predictors in both overall models. Return in-play points won percentage can essentially be treated as a measure of how good a player’s groundstrokes are. Being adept at groundstrokes is the most important factor in being a good returner, as returning is inherently a defensive position on the court. This matches our findings, as it is the most significant predictor of both the rate at which a player wins a game when returning and the rate at which a player wins a point when returning. Ace against percentage is also a good predictor in our models and helps explain some of the remaining variation. A low ace against percentage reflects an ability to move quickly and react well from a defensive, reactive position on the court. In the regressions, it accounts for an aspect of the game that return in-play points won percentage does not.

Implications

The research we have done can help further the study of what makes a great returner. We found that physical attributes such as height and weight are not strongly predictive of whether a player returns serve well. This contrasts with conventional tennis wisdom, in which taller players are often seen as worse returners than their shorter counterparts. We can see that convention challenged by two of the premier ATP players in the world, Daniil Medvedev and Dominic Thiem. Although Medvedev is 13 cm taller than Thiem, he wins over 2% more points when returning and over 4% more games when returning, a direct challenge to the conventional wisdom of tennis commentators and experts.

The most predictive variable in both overall regressions was the return in-play points won percentage. As stated earlier, that value is very indicative of how good a player’s groundstrokes are. This furthers the argument that groundstrokes such as the forehand and backhand are the most important shots for a player to be successful. In recent years, the style of play in men’s professional tennis has changed from a hyper-aggressive serve-and-volley game to a more conservative and precise groundstroke game. Players such as Novak Djokovic and Rafael Nadal have revolutionized the game with their precision, power, and touch on groundstrokes, which has vaulted them into the ranks of the best tennis players we have ever seen. This study helps to further show the importance of groundstrokes when returning serve and how a superior groundstroke game lends itself to a superior return game.

Limitations

In this study, we would have liked to gather more information on physical attributes other than the height and weight of the ATP players. Two variables we would have liked to add to the regressions are arm length and the spin rate that a player generates on the ball. Arm length would have been an interesting factor to look at, as a longer arm could help with reaching returns when the serve is almost out of reach and with getting “solid racquet” on the ball. Spin rate would also have been interesting, as conventional tennis wisdom holds that a higher topspin rate on a shot is conducive to better results when returning serve. However, we could not find complete data on either variable. During a calendar year, only a select few tournaments measure spin rates for individual players, so the data would have been incomplete, which unfortunately meant that we could not use any of it. Unlike the NFL (National Football League), the ATP has no combine where every player is measured for things such as arm length and vertical and lateral jump, to name a few. As a result, we could not find arm length data that we felt was reliable enough to use in our study.

Future Research

While analyzing the data, we realized that differences in court surface might help explain why some players are adept at returning. Each of the four surfaces played on the ATP tour (indoor hardcourt, outdoor hardcourt, grass, and clay) gives the ball a different bounce, which some players deal with better than others. Roger Federer is known for his prowess on grass courts, and Rafael Nadal is known for his remarkable ability on clay courts. Further studies could examine the differences between surfaces and bring to light some of their effects on the returner. We could then add those findings to this study to further our understanding of the game.

From this study, we found that ace against percentage and return in-play points won percentage are highly indicative of the rate at which a player wins a game or a point while returning. Although we found height and weight to be largely insignificant, that result is still informative and can help change the conventional view. These findings can be used to further the study of returning serve and of the game overall. From our work, we now better understand the game of tennis and what makes a player successful.

Fin

Bibliography

Glass, A. J., Kenjegalieva, K., & Taylor, J. (2015). Game, set and match: evaluating the efficiency of male professional tennis players. Journal of Productivity Analysis, 43(2), 119–131. https://doi.org/10.1007/s11123-014-0401-3

Martínez-Gallego, R., Guzmán, J. F., James, N., Pers, J., Ramón-Llin, J., & Vuckovic, G. (2013). Movement characteristics of elite tennis players on hard courts with respect to the direction of ground strokes. Journal of Sports Science & Medicine, 12(2), 275–281. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3761832/

Vernon, G., Farrow, D., & Reid, M. (2018). Returning Serve in Tennis: A Qualitative Examination of the Interaction of Anticipatory Information Sources Used by Professional Tennis Players. Frontiers in Psychology, 9, 895. https://doi.org/10.3389/fpsyg.2018.00895

Written by 3rd Quarter Analytics