Sunday, April 10, 2016

The Thrill of Analytics in IPL - Part I: What is a Par Score in an IPL Match?

The Indian Premier League (IPL) started today. When the inaugural edition was launched in 2008, the razzle-dazzle of this shorter format of the game caught the attention of cricket lovers. Some even argue that it brought a completely new generation of fans into the fold, particularly younger fans who would rather indulge in the excitement of the shortest format of the game. Since its inauguration, the IPL has had its share of problems, the most notorious being the spot-fixing and betting scandal of 2013, which subsequently led to the suspension of two franchises, Chennai Super Kings and Rajasthan Royals. In spite of such hiccups, the games must go on, and so did the IPL. The 2016 edition has two new franchises, Rising Pune Super Giants and Gujarat Lions, and cricket lovers are awaiting another fun-filled tournament.
 
As for me, I have always loved watching all forms of cricket, but T-20 and the IPL have always intrigued me because they offer the excitement of some phenomenal analytics. In 2011, while visiting London, I remember telling a British friend that T-20 cricket offers incredible opportunities for sophisticated analytics. He was so amused (or rather shocked) that he choked on his red wine! He asked me if I also watch WWE, which, I have to confess, I sometimes do late at night in my hotel rooms to keep myself awake. Skepticism aside, I am sticking to my point, and over the next 6 weeks, while the IPL is in progress, I will write a few blog posts on the analytics surrounding this tournament. This one is the first in that series.
 
An important question that comes to my mind is: what is the par score in an IPL match? To answer this question in a scientific manner, I collected data on all the IPL games since 2008 from www.espncricinfo.com. As a first step, I looked at the descriptive statistics of the first innings scores of all the IPL matches played since 2008 (see Chart A). One can see that the average first innings score has gone up since 2011, which is a welcome sign: it means that scoring efficiency is increasing in this format of the sport. The average score across all seasons is 159, and naïvely we could infer that this is the par score. By the way, I define the par score as the lowest first innings score that gives the team batting first at least a 50% chance of winning the game.
 
Chart A: Descriptive Statistics of First Innings Scores in IPL
 
Now, we know from statistics that the average does not always lie at the center. If the distribution is skewed, the average is pulled toward the long tail, and the median is a more reliable measure of central tendency. In this case, the median first innings score is 160, which is close to the average of 159. This suggests that the first innings scores are roughly symmetrically distributed (see Chart B).
 
Chart B: Frequency Distribution of Runs Scored in First Innings - Closely Follows a Bell Curve
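
The mean-versus-median check described above can be tried on any list of innings totals. The snippet below is a minimal sketch using Python's standard library; the two score lists are made up for illustration and are not the IPL data, and the function name `summarize` is hypothetical.

```python
from statistics import mean, median

def summarize(scores):
    """Return the mean, the median, and the gap between them."""
    avg = mean(scores)
    mid = median(scores)
    return {"mean": avg, "median": mid, "gap": avg - mid}

# Made-up totals for illustration only; these are NOT the actual IPL data.
symmetric = [120, 140, 160, 180, 200]
skewed = [120, 140, 160, 180, 260]  # one unusually large total

print(summarize(symmetric))  # mean and median coincide
print(summarize(skewed))     # the large total pulls the mean above the median
```

A small gap between the mean and the median, as with 159 and 160 in the IPL data, is consistent with a roughly symmetric distribution, although it does not prove one on its own.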
 
The bell curve of first innings scores means that a team is about equally likely to score above or below the mean. This curve, however, does not tell us the probability of winning given a first innings score. To get that answer, I bucketed the scores into intervals of 10 and, for each interval, counted the number of times the team batting first had won the match. I then calculated the probability of winning for each interval (see Chart C). A few things are quite apparent from this chart:
  • No team batting first has ever won an IPL match after scoring fewer than 100 runs.
  • Every team that has scored 220 runs or more batting first has won the match.
  • The probability of winning increases with the first innings score and follows the shape of an S-curve.

Chart C: First Innings Score and Probability of Winning
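
The bucketing step behind Chart C can be sketched in a few lines of Python. This is only an illustration: the match results are invented, the function name `win_rate_by_bucket` is hypothetical, and the bucket boundaries (160-169, 170-179, and so on) are an assumption, since the post does not say exactly where each interval starts.

```python
from collections import defaultdict

def win_rate_by_bucket(matches, width=10):
    """matches: iterable of (first_innings_score, batting_first_won) pairs.
    Returns {bucket_start: (wins, games, win_rate)}, ordered by bucket."""
    tally = defaultdict(lambda: [0, 0])  # bucket_start -> [wins, games]
    for score, won in matches:
        start = (score // width) * width  # e.g. 163 falls in the 160-169 bucket
        tally[start][0] += 1 if won else 0
        tally[start][1] += 1
    return {b: (w, n, w / n) for b, (w, n) in sorted(tally.items())}

# Invented results for illustration only; NOT the actual IPL data.
sample = [(158, False), (163, True), (167, False),
          (171, True), (176, True), (179, False)]

for start, (wins, games, rate) in win_rate_by_bucket(sample).items():
    print(f"{start}-{start + 9}: {wins}/{games} = {rate:.2f}")
```

With real data, buckets that contain only a handful of matches will give noisy win rates, which is one reason to fit a smooth curve through the points rather than read each bucket literally.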

I fitted the S-curve using the equation shown in the inset. By statistical measures (p-value and the like), the fit of this S-curve to the underlying data is extremely good. Using this equation, one can verify that the score at which the probability of winning is 50% is 163. In other words, the par score is 163. The probability of winning rises quite sharply as the score increases (see the inset in Chart C): it is 50% at 163 and rises to almost 60% with another 10 runs. This explains why the runs scored in the last 2 or 3 overs are so important.
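
The fitted equation itself appears only in the chart inset, so it cannot be reproduced here. As a rough sketch, assume the S-curve is a standard logistic function with its midpoint at the par score of 163, and pick the steepness so that the curve passes through 60% at 173, approximating the "almost 60%" quoted above. Both the logistic form and the steepness value are assumptions for illustration, not the fitted coefficients.

```python
import math

PAR_SCORE = 163  # from the post: the score at which P(win) = 50%

# Assumed steepness: chosen so that P(win) = 60% at 173, approximating
# the post's "almost 60%" ten runs above par. Not the fitted coefficient.
STEEPNESS = math.log(0.6 / 0.4) / 10

def win_probability(score, midpoint=PAR_SCORE, k=STEEPNESS):
    """Logistic S-curve: P(win) = 1 / (1 + exp(-k * (score - midpoint)))."""
    return 1.0 / (1.0 + math.exp(-k * (score - midpoint)))

def score_for_probability(p, midpoint=PAR_SCORE, k=STEEPNESS):
    """Invert the curve: the score at which P(win) equals p (0 < p < 1)."""
    return midpoint + math.log(p / (1.0 - p)) / k

for s in (143, 163, 173, 183):
    print(s, round(win_probability(s), 3))

print(score_for_probability(0.5))  # recovers the par score
```

In a logistic curve, the par score is simply the midpoint, because the exponent is zero there and the probability is exactly one half. Note that a logistic curve only approaches 0 and 1 without reaching them, so this assumed curve need not match the observed extremes below 100 and above 220.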

I also validated the S-curve by calculating the average first innings score for each team and then comparing the predicted winning ratio (based on the S-curve formula) with the actual winning ratio (see Chart D). As expected, there are some deviations, but in general the predicted values are quite close to the actual ratios.
 
Chart D: Comparison of Predicted and Actual Winning Ratios
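
The validation step can be sketched as follows. The curve used here is an assumed logistic stand-in (midpoint 163, steepness chosen so that 173 maps to 60%), not the equation from the Chart C inset, and the team names, averages, and winning ratios are hypothetical placeholders rather than the values plotted in Chart D.

```python
import math

def win_probability(score, midpoint=163.0, k=math.log(1.5) / 10):
    """Assumed logistic stand-in for the S-curve; not the fitted equation."""
    return 1.0 / (1.0 + math.exp(-k * (score - midpoint)))

def compare(teams):
    """teams: {name: (average_first_innings_score, actual_win_ratio)}.
    Returns rows of (name, predicted, actual, actual - predicted)."""
    rows = []
    for name, (avg_score, actual) in teams.items():
        predicted = win_probability(avg_score)
        rows.append((name, predicted, actual, actual - predicted))
    return rows

def mean_absolute_deviation(rows):
    """Average size of the gap between actual and predicted ratios."""
    return sum(abs(dev) for *_, dev in rows) / len(rows)

# Hypothetical teams and ratios, for illustration only.
teams = {"Team A": (163, 0.50), "Team B": (173, 0.55), "Team C": (150, 0.40)}
rows = compare(teams)
for name, predicted, actual, dev in rows:
    print(f"{name}: predicted {predicted:.2f}, actual {actual:.2f}, deviation {dev:+.2f}")
print(f"mean absolute deviation: {mean_absolute_deviation(rows):.3f}")
```

A single summary number such as the mean absolute deviation makes "quite close" easier to judge, though teams with few matches batting first will naturally show larger deviations.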
 
Readers now know how to calculate the probability of a win for the team batting first. I encourage you to use this formula during this edition of the IPL and compare the predicted results with the actual ones. In the next post, I will write about how the winning probability of the team that batted first changes once the other team starts batting.

2 comments:

  1. It is indeed encouraging and very interesting to see the analysis as an application of theoretical constructs to real, on-the-ground cricket.
    A few observations: 1. Among the teams with average scores above the par score (163), such as CSK (165), MI (164) and RCB (164), RCB in particular has an actual winning ratio well below 50% (41%). Thus, though scoring high themselves, they seem to let the other team outscore them 100 - 41 = 59% of the time. Is this indicative of a team composition with good batsmen but comparatively much poorer bowlers/fielders? If so, these are the problem areas that keep them from winning despite scoring high, and they need to strategize on improving these areas to turn things around.

    2. Kochi Tuskers Kerala, with just 7 matches batting first, seems to have a comparatively very low share of the sample. Could that affect the analysis?

    3. Could environmental variables such as pitch/stadium, season and time also affect the results, and could there be a mechanism to factor them in?

    4. Being the highest average scorers, CSK and MI evidently seem to have batted first most of the time compared to others. Does this suggest that they are good at winning tosses? And does the data substantiate that batting first gives a team an advantage? (RCB, though, seems to be an exception and needs to identify and address its grey areas.)

  2. Thanks for providing good information, and thanks for sharing.
