On Friday, news broke of one of the more anti-climactic 1-for-1 trades of former NHL stars when the Oilers and Flames swapped slumping players Milan Lucic and James Neal. While a near-obvious win for the Oilers with respect to contract terms alone, the trade also offers an alluring second pathway to create additional value in the form of actual goal-scoring.
James Neal is one of the more consistent goal-scorers of his era — he’s the 17th leading goal scorer since he entered the league in 2008-09 with 270 goals, wedged between Jamie Benn and Thomas Vanek over that time period. That’s been good for 0.35 goals per game, or 28.7 goals per 82 games.
Whether he’s worth the 5-year deal he signed in Calgary last offseason isn’t the focus of this blog post (or even a question most Oilers fans should care about given the gift horse Neal’s trade represents). What I am fascinated with is how many goals to expect out of Neal in 2019-20 given two dynamics at play:
- The inexorable crawl of time — Neal is 31, soon to be 32 entering next season. There’s no doubt that goal scoring ability diminishes with each passing year over 30 (and potentially much earlier than that). But how severe is the expected drop-off between 31 and 32?
- An almost unbelievable one-year shooting slump that saw his shooting percentage (5.0%) drop to less than half of his career average (11.6%). Do players who experience such drop-offs usually see a comeback the next year or do they follow the trend into oblivion?
A simple methodology
For this analysis I’m going to include any NHL forward who scored at least 0.08 goals per game in each season between the ages of 27 and 31 since 1995-96. If any age in that span was lost to the 2004-05 lockout, I linearly interpolated the lost season using the average of the two surrounding seasons. I’ve also stripped out anyone who did not play in their Age 32 season (including those who lost that season to the lockout), since it’s quite obvious Neal will play this upcoming season. It turns out 249 players fit this description, and they constitute our sample.
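As a rough illustration of the inclusion rules above, here is a minimal sketch in Python. The data layout (a dict mapping age to goals per game, with `None` for a season that was not played) and the function names are assumptions made for this example, not the actual code behind the analysis.

```python
from typing import Dict, Optional

PREDICTOR_AGES = [27, 28, 29, 30, 31]
MIN_GPG = 0.08  # minimum goals per game required at every predictor age


def fill_lockout_gap(gpg_by_age: Dict[int, Optional[float]]) -> Dict[int, Optional[float]]:
    """Replace a missing season with the average of the two surrounding seasons."""
    filled = dict(gpg_by_age)
    for age, value in gpg_by_age.items():
        if value is None:
            before = gpg_by_age.get(age - 1)
            after = gpg_by_age.get(age + 1)
            # Only interpolate when both neighbouring seasons were played.
            if before is not None and after is not None:
                filled[age] = (before + after) / 2
    return filled


def qualifies(gpg_by_age: Dict[int, Optional[float]]) -> bool:
    """True if the player meets the threshold at Ages 27-31 and actually played at Age 32."""
    filled = fill_lockout_gap(gpg_by_age)
    predictors_ok = all(
        filled.get(age) is not None and filled[age] >= MIN_GPG
        for age in PREDICTOR_AGES
    )
    # Age 32 is checked on the raw data: an interpolated Age 32 season does not count.
    played_at_32 = gpg_by_age.get(32) is not None
    return predictors_ok and played_at_32


# Hypothetical player who lost his Age 28 season to the lockout.
player = {26: 0.30, 27: 0.30, 28: None, 29: 0.20, 30: 0.25, 31: 0.20, 32: 0.10}
print(fill_lockout_gap(player)[28], qualifies(player))
```

The Age 32 check deliberately looks at the raw data rather than the interpolated values, which mirrors the rule that a player who lost that season to the lockout is excluded.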
For our response (Y) variable, I’ll be using their Age 32 season goals per game. This is what I’m trying to estimate for Neal.
For our predicting (X) variables, I’ll be using each player’s goals per game in each of their Age 27 through 31 seasons, which are the 5 seasons preceding the Age 32 season.
I’ll also be using a predicting variable that reflects the deviation of each player’s Age 31 shooting percentage from their average shooting percentage over their Age 27-30 seasons. So if you averaged 10% from Age 27-30 but then posted 5% at Age 31, you received -5% for this variable. I then multiplied this by their average goals per game from Ages 27-30 to allow the scoring slump to be magnified by their previous goal-scoring prowess. A shooting slump should theoretically impact a very high-scoring player much more than a usually low-scoring player.
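The engineered slump variable can be sketched in a few lines. This is only an illustration: the function name is made up, and shooting percentages are expressed as fractions (0.05 for 5%), which is an assumption, since the units used in the actual model are not stated.

```python
from typing import Sequence


def slump_feature(sh_pct_27_to_30: Sequence[float],
                  sh_pct_31: float,
                  gpg_27_to_30: Sequence[float]) -> float:
    """Age 31 SH% deviation from the Age 27-30 average, scaled by Age 27-30 goals per game."""
    baseline_sh = sum(sh_pct_27_to_30) / len(sh_pct_27_to_30)
    baseline_gpg = sum(gpg_27_to_30) / len(gpg_27_to_30)
    deviation = sh_pct_31 - baseline_sh  # negative means a slump
    return deviation * baseline_gpg


# The example from the text: 10% average, then 5% at Age 31, for a
# hypothetical player who averaged 0.30 goals per game from Ages 27-30.
print(slump_feature([0.10] * 4, 0.05, [0.30] * 4))
```

Two players with the same 5-point slump get different values if their prior scoring rates differ, which is the magnifying effect described above.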
Eventually I’ll be running a straight-up linear regression to reach my conclusions. I could throw a lot more complicated models at this to reach a goal prediction for 2019-20, but regression will offer the added benefit of having very accessible interpretation of the predicting variables, which is probably more interesting to me than the actual forecast I’ll generate. You’ll see why once I get there…
Exploratory Data Analysis
The above table outlines the average goals per game of our 249 players by Age — Age 27 is the highest-scoring at 0.283 goals per game while age 32 is the lowest-scoring at 0.220 goals per game. The goals per game drop-off by year is actually pretty tame between Ages 27-29, but hits a significant clip every year after Age 30. In fact, the Age 32 season sees goal scoring drop by 10.7% from the Age 31 level. In total, goal scoring pace drops 22.2% between Ages 27 and 32.
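The percentages quoted above are simple relative drops. A quick check using the rounded averages from the text (so the last digit can differ slightly from figures computed on unrounded data):

```python
def pct_drop(start: float, end: float) -> float:
    """Percentage decline from start to end."""
    return (start - end) / start * 100


age27_gpg = 0.283  # average goals per game at Age 27 (from the text)
age32_gpg = 0.220  # average goals per game at Age 32 (from the text)

total_drop = pct_drop(age27_gpg, age32_gpg)

# The Age 31 average is not quoted in the text; this backs it out of the
# stated 10.7% drop from Age 31 to Age 32, so treat it as an implied value.
implied_age31_gpg = age32_gpg / (1 - 0.107)

print(round(total_drop, 1), round(implied_age31_gpg, 3))
```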
The above box plot diagram shows the Age 32 scoring improvement (over Age 31) between players whose shooting percentages either declined or improved at Age 31. You can see that players whose shooting percentages declined at Age 31 overwhelmingly saw a goals/game output increase at Age 32 versus those whose Age 31 seasons saw an increase in shooting percentage. This effect was strong enough for average scoring to actually increase between Ages 31 and 32 among those whose Age 31 shooting percentages were below their normal levels.
So, a) Neal’s age should work against him but b) his bad luck at Age 31 should work in his favour for Age 32. Not exactly rocket science.
Closest Neal Comparables
Out of curiosity, who are the 25 closest comparables in our 249 observations to Neal’s scoring output between Ages 27-31, and what did they do at Age 32? To find them, we’ll compute the Euclidean distance between Neal’s Age 27-31 goals/game rates and those of each of the 249 players in our dataset, then keep the 25 shortest (remember the distance formula from grade 10 math? This is the same thing, except using 5 dimensions instead of 2). They are:
The table is sorted in order of most-like James Neal’s Age 27-31 output. So, Matt Moulson’s numbers were most like James Neal’s, Eric Staal’s were 2nd most like his, etc. The average goals per game at Age 32 of this cohort was 0.22 — implying a crude k-Nearest-Neighbours prediction for Neal of 0.22 goals per game next year, or about 18 goals over 82 games.
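For anyone who wants to see the mechanics, here is a small sketch of the nearest-comparables idea: rank players by 5-dimensional Euclidean distance, keep the closest k, and average their Age 32 rates. The players and numbers below are invented for illustration and are not taken from the dataset.

```python
import math
from typing import List, Sequence, Tuple

# (name, goals per game at Ages 27-31, goals per game at Age 32)
Player = Tuple[str, Sequence[float], float]


def knn_predict(target: Sequence[float], players: List[Player], k: int) -> Tuple[List[str], float]:
    """Return the k closest players by Euclidean distance and their mean Age 32 rate."""
    ranked = sorted(players, key=lambda p: math.dist(target, p[1]))
    nearest = ranked[:k]
    prediction = sum(p[2] for p in nearest) / len(nearest)
    return [p[0] for p in nearest], prediction


# Made-up example data.
players = [
    ("Player A", (0.30, 0.30, 0.30, 0.30, 0.30), 0.25),
    ("Player B", (0.31, 0.31, 0.31, 0.31, 0.31), 0.20),
    ("Player C", (0.10, 0.10, 0.10, 0.10, 0.10), 0.05),
]
names, gpg = knn_predict((0.30, 0.30, 0.30, 0.30, 0.30), players, k=2)
print(names, gpg, gpg * 82)  # multiply by 82 for a full-season goal total
```

Note that an unweighted average treats the 1st and 25th comparable equally; weighting by inverse distance would be a reasonable variation.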
But you can see that some of these 25 players’ Age 32 scoring was way up from their Age 31 season (like Vinny Prospal, Patrice Bergeron, or Jason Spezza) while some kept tumbling down (Martin Erat, Alex Burrows, or Marco Sturm). In total, 11 players’ Age 32 scoring went up, 12 went down, and 2 stayed the same. Curiously, all but two of these 25 comparables had an Age 31 shooting percentage below their normal levels. And if anyone can remind me who Brian Savage was I’d really appreciate it.
Get to the Regression please…
Ok, let’s get to the point. As mentioned, I’ll be running a linear regression on this data for two reasons: a) to predict Neal’s 2019-20 goal scoring, but honestly I’m more interested in b) interpreting the relationships between the predicting variables and my response variable. The following table outlines the coefficients obtained by this model; below I’ll go through in detail why I find them so interesting.
As a reminder, these are expressed in Age 32 goals per game. Think of them as (kinda) the weighting of each variable in determining Age 32 scoring. Some observations:
- The intercept of this model is beautifully close to 0 at 0.0018. This implies that if you scored zero goals in the NHL at Age 27, 28, 29, 30, and 31, you’d score 0.0018 goals per game at Age 32, or about one goal every 7 full seasons of play. Your intercept should always make some kind of intuitive sense, but this might be one of my favourite intercepts ever from an interpretation perspective — that any random 32 year old dude could score one goal in the NHL every 552 games played. Sounds about perfect.
- The sum of all of my age-related coefficients is 0.83. What does that loosely mean? That Age 32 scoring is worth about 83% of whatever a player did between Ages 27-31. A 32-year-old is going to score 17%-ish less than some combination of what they did over the previous 5 years. Crudely: a 32-year-old player is 83% of what they used to be.
- Look at the Age 31 coefficient: 0.519. This basically means that if you scored 1 more goal per game at Age 31, you’d expect to score 0.519 more goals per game at Age 32 (holding all other variables constant). This is by faaaaar the most relevant variable in this model and obviously the largest coefficient, suggesting that Age 31 scoring has massive relevance to what you do at Age 32: 5-7 times more impact than what you did at any age between 27 & 30. In fact, take the 0.519, divide by the sum of the age coefficients, and you get 62%. So, of all the weight the model puts on the 5 previous years of scoring before Age 32, 62% is put on just one season: the one that preceded it. This will hurt Neal’s projection.
- The engineered shooting percentage deviation variable I created has a negative coefficient, which makes sense — there is an inverse relationship between Age 31 SH% deviation and Age 32 scoring. So, if you shot below your average SH% at age 31, your scoring at Age 32 is expected to increase. This will help Neal’s projection.
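The arithmetic behind the intercept and Age 31 observations can be reproduced from the rounded coefficients quoted above. Because the inputs are rounded, the results land near, but not exactly on, the figures in the bullets.

```python
intercept = 0.0018     # goals per game at Age 32 with zero prior scoring
age_coef_sum = 0.83    # sum of the Age 27-31 coefficients
age31_coef = 0.519     # Age 31 coefficient

games_per_goal = 1 / intercept        # games needed for one goal at the intercept rate
seasons_per_goal = games_per_goal / 82
age31_share = age31_coef / age_coef_sum  # share of the age weighting carried by Age 31

print(round(games_per_goal), round(seasons_per_goal, 1), round(age31_share, 3))
```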
The p-values for these coefficients are fairly uneven. A couple of them are pretty high (0.3-ish), meaning the data can’t confidently distinguish those coefficients from zero. But leaving them in the model has a minuscule effect on the final prediction for Neal and really does help interpretation of our variables, so I’m opting to leave them in.
The R-squared of the model is 0.534 — meaning that about 53.4% of the variability in Age 32 scoring can be explained by the variability in our predicting variables. 47% is unexplained by this model — leaving all kinds of fun interpretation for what’s missing, such as injury effects, quality of linemates, favour of the coaching staff, unexpected SH% at Age 32, etc etc.
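To make the mechanics concrete, here is a self-contained sketch of fitting an ordinary least squares model with six predictors and computing R-squared. It uses only the standard library (in practice a package such as statsmodels or NumPy would be the usual choice), and it runs on synthetic data with made-up coefficients, so none of the numbers it produces correspond to the model in this post.

```python
import random
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Return (coefficients with the intercept first, R-squared) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend a column of ones for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Synthetic data: 249 rows, 6 predictors, invented "true" coefficients plus noise.
rng = random.Random(0)
true_beta = [0.0, 0.1, 0.1, 0.1, 0.1, 0.5, -0.5]
X = [[rng.uniform(0.08, 0.6) for _ in range(6)] for _ in range(249)]
y = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], x)) + rng.gauss(0, 0.05)
     for x in X]

beta, r_squared = fit_ols(X, y)
print([round(b, 3) for b in beta], round(r_squared, 3))
```

R-squared here is one minus the ratio of residual to total sum of squares, which is the same "share of variability explained" reading used above.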
The predictions
Alright, I suppose I should get around to making a prediction here. In summary:
- The model expects James Neal to score 0.225 goals per game next season
- Over 82 games, this equates to about 18.5 goals
- There is wide variability in this prediction. A 95% prediction interval runs from 6 to 31 goals, meaning that 19 times out of 20 we’d expect Neal to land in that range. A narrower 68% prediction band would be 12-25 goals.
- What’s the probability that Neal scores 21 or more goals next year as the conditional 3rd round draft pick stipulates? I’d estimate something like 34%, assuming he plays 82 games.
- And then what’s the probability of losing the 3rd round pick? Well, it’s fascinatingly conditional not just on Neal scoring 21+ goals but also on him scoring 10+ goals more than Lucic. Even if I’d modeled Lucic today (which I haven’t), that conditional gap would make a closed-form probability solution either hard or impossible to figure out. If I can summon the effort, we may be able to use simulation to estimate how likely that scenario is.
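The figures in this list hang together under a simple normal approximation, and the same approximation gives a way to simulate the conditional pick. The sketch below backs a standard deviation out of the 95% interval, recomputes the 68% band and the chance of 21+ goals, then runs a Monte Carlo for the two-part condition. Several things here are assumptions: goals are treated as continuous and normally distributed, Neal's and Lucic's totals are treated as independent, and the Lucic mean and standard deviation are pure placeholders, since no Lucic model exists in this post.

```python
import random
from statistics import NormalDist

MEAN_GOALS = 18.5            # model forecast over 82 games (from the post)
PI95_LOW, PI95_HIGH = 6.0, 31.0  # 95% prediction interval (from the post)

# Back out the standard deviation implied by the 95% interval.
z95 = NormalDist().inv_cdf(0.975)
sigma = (PI95_HIGH - PI95_LOW) / (2 * z95)
neal = NormalDist(MEAN_GOALS, sigma)

band68 = (MEAN_GOALS - sigma, MEAN_GOALS + sigma)  # one standard deviation either side
p_21_plus = 1 - neal.cdf(21)                       # lands near the ~34% quoted above


def p_pick_conveyed(lucic_mean: float, lucic_sd: float,
                    n: int = 100_000, seed: int = 1) -> float:
    """Estimate P(Neal scores 21+ AND outscores Lucic by 10+) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        neal_goals = rng.gauss(MEAN_GOALS, sigma)
        lucic_goals = rng.gauss(lucic_mean, lucic_sd)  # assumed independent of Neal
        if neal_goals >= 21 and neal_goals - lucic_goals >= 10:
            hits += 1
    return hits / n


# Placeholder Lucic inputs, NOT estimates: swap in a real forecast before trusting this.
example = p_pick_conveyed(lucic_mean=8.0, lucic_sd=4.0)
print(round(sigma, 2), [round(b, 1) for b in band68], round(p_21_plus, 3), example)
```

Because both conditions must hold, the simulated probability can never exceed the standalone chance of 21+ goals; how far below it falls depends entirely on the Lucic inputs.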
Great article.
There are a few key differences I see that, unfortunately, math is not able to show.
1) Comparing to Moulson is difficult because he went to a worse team and was one of the slowest skaters in the NHL. Neither of those factors applies to Neal.
2) Many of the example players had dropped to the 3rd line and did not have nearly as productive linemates. This would include Gagner, Hagman, Antropov, Burrows and Bourque. Your teammates have a significant say as to whether you bounce back or continue producing.
3) Neal had a horrible year, with a shooting % of less than half his career average, playing mostly on the 3rd line and playing 4 minutes less a game. A bounce-back in shooting %, better linemates and more minutes will all improve his chances at a bounce-back season. Not many of your comparables were in that same position.
Lastly, Neal will be playing with the best center he has played with in the past 3 years if he ends up with Nuge, and I would say that Johannson and Nuge are quite equal. A huge improvement at center is a huge advantage as to whether you can bounce back or not.
While using some of your math and also interpreting all of the advantages Neal will have this year over last year, (where he would have scored 21 goals if he shot his career average while on a 3rd line) I would suggest that Neal has a very good shot at scoring 24 goals over an 82 game season. He will likely miss 10 games so he would look to score 21 goals next year.