Larry Mahnken and SG's

Replacement Level Yankees Weblog

"Hey, it's free!"

The Replacement Level Yankees Weblog has moved!  Our new home is:

Larry Mahnken
Sean McNally
Fabian McNally
John Brattain


Disclaimer: If you think this is the official website of the New York Yankees, you're an idiot. Go away.

April 10, 2006

Win Contributions
by Larry Mahnken

The last thing the world needs is a new offensive statistic, but I wanted to share something I've been playing around with for the last couple of days. It's not meant to be predictive; it's just meant to measure value in a different way than other statistics do.

The statistic is based on these ideas, which may be wrong, because I'm not a pro at this:

1) The run value of an event varies with the base-out state in which it occurs. This, I believe, is a fairly non-controversial statement. A grand slam is the same event as a solo homer, but it is obviously worth more runs. A single with a runner on is worth more than one with the bases empty, and a single with the bases empty is worth the same as a walk.

2) All runs in a game have the same value, regardless of when they were scored. In a 15-run game, the first run and the 15th are worth the same, though each is worth less than a run scored in a game where the team scores only six.

3) The ultimate value of an event to a team is dependent on the ultimate outcome of the game. A run scored in a loss is worthless, while a run scored in a win is valuable. (edited to make more sense. I hope the rest of this still makes sense with the edits)

The final two statements, I believe, are the controversial ones. Subscribers to the Game-State theory of value (pioneered by the Mills brothers) believe that a run scored late in a close game is worth more than one scored earlier in the same game, and that tack-on runs are worth progressively less. I don't buy this. If you score 10 runs in the first, it's the same as scoring 10 in the ninth -- the direct impact on the likely outcome of the game at the time is different, but in the end, all other things being equal, they had the same impact on the actual outcome.

In the third statement I am making the point that the goal of a team is to win ballgames, not to score runs, and that a game can be won or lost on offense. If you score 0 runs, you'll never win, and you can always score enough runs to win. This statement holds true with pitching and defense in the opposite direction, and ultimately it can be said that you win because you score enough runs on offense and prevent enough runs on defense -- while you lose because you didn't score enough or prevent enough.

So how's this stat work? It's pretty simple.

First, I find the base-out state for every event on offense in the game (I'll explain at the end of this whole thing why I didn't do pitching and defense -- to simplify, it requires a whole lot more data, which I don't have). Using Tangotiger's Run Expectancy Matrix, I find the expected runs scored (RE) for each state.

OK, here's where I made another decision that I think a lot of people will disagree with. I figured out what the worst possible outcome of each event was, and what the RE was for it. With nobody on and nobody out, the worst possible outcome is one out with nobody on, while with two on and no out, the worst that could happen is a triple play. Obviously there's a greater chance of an out in the first situation than of a triple play in the second, but I made no adjustment for that. I'm not sure if I should, or how to do so if I should.

The reason I did this is so there would be no negative values. I calculated the value of each event as being the difference between the RE of the outcome and the RE of the "worst possible" outcome. I also calculated the difference between the outcome and the "best possible" outcome -- which is, of course, a home run.
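As a rough sketch of that calculation, here's how it might look in Python. A few things here are my assumptions rather than anything spelled out above: the RE numbers are made-up placeholders (NOT Tangotiger's actual matrix), the "RE of the outcome" is taken to be runs scored on the play plus the RE of the resulting state, and the "worst possible" outcome is taken to be the batter and every runner being put out (capped at three outs) with nobody scoring. The function names are hypothetical.

```python
# Illustrative run-expectancy values only; these are placeholders, NOT
# Tangotiger's actual matrix. Keys are (outs, runners), where runners is a
# three-character string for first/second/third ("110" = first and second).
RE = {
    (0, "000"): 0.50,
    (1, "000"): 0.25,
    (2, "000"): 0.10,
    (0, "100"): 0.90,
    (0, "110"): 1.50,
}

def outcome_re(runs_scored, end_state):
    """Runs scored on the play plus the RE of the resulting state (assumption)."""
    outs, _ = end_state
    # Three outs end the inning, so nothing more is expected.
    return runs_scored + (0.0 if outs >= 3 else RE[end_state])

def worst_re(start_state):
    """Assumed worst case: batter and all runners out (max three outs), no runs."""
    outs, runners = start_state
    worst_outs = min(3, outs + 1 + runners.count("1"))
    return outcome_re(0, (worst_outs, "000"))

def best_re(start_state):
    """Best case per the post: a home run scores the batter and every runner."""
    outs, runners = start_state
    return outcome_re(1 + runners.count("1"), (outs, "000"))

def event_value(start_state, runs_scored, end_state):
    """Return (value above worst outcome, shortfall below best outcome)."""
    actual = outcome_re(runs_scored, end_state)
    return actual - worst_re(start_state), best_re(start_state) - actual

# Leadoff single: bases empty, no outs -> runner on first, no outs.
value, shortfall = event_value((0, "000"), 0, (0, "100"))
```

With these placeholder numbers, a triple play with two on and no out gets a value of exactly 0, which is the whole point of using the worst outcome as the baseline.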

OK, so the next step is to add up the "value" of every event for the team, as well as the total of the difference between the value and best possible value. You then add up these totals for each player.

If the team wins, then each player's "Win Contribution" is his share of the total team value (this is why I set the baseline as the worst possible outcome -- so the lowest possible contribution is 0). If they lose, his "Loss Contribution" is his share of the team's total shortfall below the best possible outcomes.
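To make the bookkeeping concrete, here's a minimal sketch of the per-game split. The function name and the (player, value, shortfall) tuple layout are hypothetical, and the event numbers in the example are invented purely for illustration.

```python
from collections import defaultdict

def game_contributions(events, team_won):
    """Split one game among players.

    events: iterable of (player, value, shortfall) tuples, where value is the
    event's RE above the worst outcome and shortfall is its RE below the best.
    Returns each player's share of the team total; shares sum to 1 per game.
    """
    totals = defaultdict(float)
    for player, value, shortfall in events:
        # Wins are divided by value produced, losses by value left on the table.
        totals[player] += value if team_won else shortfall
    team_total = sum(totals.values())
    if team_total == 0:
        return {player: 0.0 for player in totals}
    return {player: t / team_total for player, t in totals.items()}

# Invented example events for two players in one game.
events = [("A", 0.6, 0.4), ("B", 0.2, 1.0), ("A", 0.2, 0.6)]
win_shares = game_contributions(events, team_won=True)
loss_shares = game_contributions(events, team_won=False)
```

Summing these per-game shares over the season gives each player's running Wins and Losses totals, so the team's columns should add up to its actual record.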

It's pretty simple, though I'm not yet sure how well it works. I've only run it for the Yankees for the first six games, and here are the totals:

Player           Wins  Losses
Jorge Posada     .356  .455
Hideki Matsui    .312  .468
Alex Rodriguez   .257  .550
Robinson Cano    .219  .380
Derek Jeter      .218  .367
Johnny Damon     .199  .344
Jason Giambi     .158  .476
Bernie Williams  .116  .352
Gary Sheffield   .086  .537
Miguel Cairo     .048  .000
Bubba Crosby     .021  .000
Andy Phillips    .010  .000
Kelly Stinnett   .000  .072

Hopefully, at the end of the season, this will reflect which players contributed most to victories and which were most responsible for the defeats. As you can see, the player currently most responsible for the Yankees' defeats is Alex Rodriguez, just ahead of Gary Sheffield, because A-Rod has made outs in so many high-RE situations. If you want to convert these numbers to a winning percentage (which is fair), you'll find that no regular has a Pct. over .500 -- which of course isn't surprising for a team that has lost four of its first six. While A-Rod has the most loss contributions, he's also contributed heavily to their wins, and his .319 Pct. is not much different from the team's .333. The regular with the worst Pct. is Gary Sheffield, who has been responsible for only about 4.3% of their wins, but 13.4% of their losses.
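For anyone who wants to check the arithmetic, here's a small sketch that runs the conversion on the table above. The helper name is hypothetical, and because the table values are rounded to three places, the results can differ slightly from figures computed on unrounded totals (A-Rod comes out near .318 here rather than .319).

```python
# (player, win contributions, loss contributions), copied from the table.
TOTALS = [
    ("Jorge Posada", .356, .455),
    ("Hideki Matsui", .312, .468),
    ("Alex Rodriguez", .257, .550),
    ("Robinson Cano", .219, .380),
    ("Derek Jeter", .218, .367),
    ("Johnny Damon", .199, .344),
    ("Jason Giambi", .158, .476),
    ("Bernie Williams", .116, .352),
    ("Gary Sheffield", .086, .537),
    ("Miguel Cairo", .048, .000),
    ("Bubba Crosby", .021, .000),
    ("Andy Phillips", .010, .000),
    ("Kelly Stinnett", .000, .072),
]

def contribution_pct(wins, losses):
    """Winning percentage built from win and loss contributions."""
    total = wins + losses
    return wins / total if total else 0.0

# Each game hands out one full win or loss, so the columns should sum to
# roughly the team's record (2 wins, 4 losses), up to rounding.
team_wins = sum(w for _, w, _ in TOTALS)
team_losses = sum(l for _, _, l in TOTALS)

pcts = {name: contribution_pct(w, l) for name, w, l in TOTALS}

# A player's share of all team wins or losses, e.g. Sheffield's.
sheffield_win_share = .086 / team_wins
sheffield_loss_share = .537 / team_losses

for name, pct in sorted(pcts.items(), key=lambda item: -item[1]):
    print(f"{name:<16}{pct:.3f}")
```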

Now here's why I didn't do pitching and defense: lack of data.

A pitcher's value shouldn't be based on the outcome of the play, except for walks, strikeouts and homers. For any ball in play, the pitcher's value should be the expected run value of where the ball was hit -- the difference between that and the actual outcome goes to the fielder. I suppose I could buy the data from BIS or STATS or something, but that would cost a LOT. I'd then have to parse the data by base-out state to find values for each point on the field. I'd love to have the data and time to do that, but for now let's see how nicely this stat works out, then maybe we'll go more in-depth.