February 20, 2004

The Greatest Pennant Race That Didn't Matter (More stat stuff)
by Larry Mahnken

Okay, a couple of days ago I used Baseball Prospectus' PECOTA to compare the Yankees' current lineup to last year's lineup. I know that Will Carroll and Joe Sheehan read the blog (Joe within the past day, since he sent me a link to this great article he wrote two years ago comparing MLB and the NFL), and I haven't been told to cease and desist yet, so I'm going to work from the assumption that my limited use of the PECOTA projections is okay for now, and I'll finish the job on that study. (The PECOTA spreadsheet is really nice, by the way; if you've got BPro Premium and haven't downloaded it, it's really worth it.)

Anyway, going back to what I wrote in the previous PECOTA column, I had the numbers projecting the Yankees improving by as much as 9 games over last season, which would put them near 110 wins. Really, that's making a lot of assumptions, not the least of which is that they'll stay healthy. It would be safer to go with the direct comparison to the projections for last year's lineup, which says that the lineup is about five games better.

Which is still great, but what people really want to know is how the Yankees match up with their main competition, the Red Sox. After all, Boston's improved this season, too, so the fact that New York has improved doesn't mean that they'll win the AL East again. Especially since some analysts--and some numbers--say that the Red Sox were a better team than the Yankees last season anyway.

So, what we need to know is how the Red Sox look when measured the same way, and how the two pitching staffs compare.

Now, before I proceed, I'd like to respond to this commentary on the Red Sox' pitching staff, which ran in Baseball Prospectus a couple of days ago:
The Yankees offense will need to be good, because PECOTA's 2004 weighted-mean forecasts rank the Boston staff as first in the American League in expected VORP. Not by a little, by a lot:

BOS 383.8
NYA 284.9
TOR 217.4
OAK 292.2
CHA 195.3
Now, about ten runs is equivalent to a win, so what this implies is that the Red Sox' pitching staff is about ten wins better than the Yankees', which is a hell of a lot of ground for the offense to make up. The Yankees would probably have to score about 1100 runs to make up that difference, so it looks like the Yankees are in trouble.
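For anyone who wants to check the "about ten wins" figure, here's a minimal sketch of the arithmetic. It takes the VORP totals exactly as quoted and applies the ten-runs-per-win rule of thumb; the function name `win_gap` is just something I made up for illustration.

```python
# Staff VORP totals as quoted above (runs above replacement level).
staff_vorp = {"BOS": 383.8, "NYA": 284.9, "TOR": 217.4, "OAK": 292.2, "CHA": 195.3}

RUNS_PER_WIN = 10.0  # rule of thumb from the text: about ten runs per win


def win_gap(team_a, team_b, vorp=staff_vorp, runs_per_win=RUNS_PER_WIN):
    """Wins that team_a's staff is worth over team_b's under the rule of thumb."""
    return (vorp[team_a] - vorp[team_b]) / runs_per_win


gap = win_gap("BOS", "NYA")
print(f"Boston over New York: {gap:.1f} wins")
```

The gap is 98.9 runs, which is where "about ten wins" comes from.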

Except the analysis is wrong. The total VORP was calculated by adding up the VORP for all the players listed as Boston pitchers on the PECOTA spreadsheet. Some of the pitchers listed aren't on the Red Sox or Yankees this season, some of them aren't going to pitch in the majors this season, and Keith Foulke isn't listed as a Red Sock. Most importantly, this same method results in the Red Sox pitchers recording 1962 IP (12 IP per game) this season, and the Yankees recording 1891 IP (11.2 IP per game), which is...kinda high.
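The innings check is easy to reproduce. This is a rough sketch, assuming a 162-game schedule and nine innings per game (extra innings would add a little, but nowhere near this much); the helper names are hypothetical.

```python
GAMES = 162
REGULATION_INNINGS = 9

# Innings pitched you get by summing every pitcher listed for each team.
summed_ip = {"Red Sox": 1962, "Yankees": 1891}


def ip_per_game(total_ip, games=GAMES):
    """Average innings pitched per game implied by a season total."""
    return total_ip / games


def excess_innings(total_ip, games=GAMES, innings=REGULATION_INNINGS):
    """Innings beyond what a schedule of regulation-length games allows."""
    return total_ip - games * innings


for team, ip in summed_ip.items():
    print(f"{team}: {ip_per_game(ip):.2f} IP per game, "
          f"{excess_innings(ip)} innings too many")
```

In decimal terms the Yankees come out to about 11.67 innings per game, so the "11.2" above reads as baseball notation for eleven and two-thirds.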

I mentioned this to Joe, and he apologized for the poor analysis. I'm not really one to criticize Baseball Prospectus, though; I make plenty of screw-ups with stats, like the time I park-adjusted ERA+ (I bet most of you don't get that). It's atypical of BPro's work, but it's still incorrect, and I won't let it go without correction.

I'll get back to the pitching later, but first here's Boston's 2003 lineup projected for 2004, and their 2004 lineup (Note: since I couldn't find UZR data for David Ortiz at first base, I put Kevin Millar there):

           2003 Lineup in 2004                             2004 Lineup              
Ps Name             EqMLVR Defense Total    Ps Name             EqMLVR Defense Total

CF Johnny Damon      .001    .105   .106    CF Johnny Damon      .001    .105   .106
2B Todd Walker      -.040   -.068  -.108    3B Bill Mueller      .000   -.031  -.031
SS Nomar Garciaparra .152    .019   .171    SS Nomar Garciaparra .152    .019   .171
LF Manny Ramirez     .334   -.136   .198    LF Manny Ramirez     .334   -.136   .198
DH David Ortiz       .161     ---   .161    DH David Ortiz       .161     ---   .161
1B Kevin Millar      .111    .006   .117    1B Kevin Millar      .111    .006   .117
RF Trot Nixon        .154    .074   .228    RF Trot Nixon        .154    .074   .228
3B Bill Mueller      .000   -.031  -.031    C  Jason Varitek     .021   -.025   .004
C  Jason Varitek     .021   -.025   .004    2B Pokey Reese      -.213    .136  -.077
Total                .894   -.056   .838    Total                .721    .148   .869
162 Game Total      144.8    -9.0  135.8    162 Game Total      116.8    24.0  140.8 
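The two bottom rows of the table can be rebuilt from the player rows. This sketch assumes the "162 Game Total" row is simply the per-game total multiplied by 162, which is what the arithmetic suggests; the numbers are copied from the 2004 lineup columns, and the DH's missing defensive value is entered as zero.

```python
GAMES = 162

# (name, EqMLVR, defense) per game, copied from the 2004 lineup columns above.
# David Ortiz is the DH, so his defensive value is entered as 0.0.
lineup_2004 = [
    ("Johnny Damon", 0.001, 0.105),
    ("Bill Mueller", 0.000, -0.031),
    ("Nomar Garciaparra", 0.152, 0.019),
    ("Manny Ramirez", 0.334, -0.136),
    ("David Ortiz", 0.161, 0.0),
    ("Kevin Millar", 0.111, 0.006),
    ("Trot Nixon", 0.154, 0.074),
    ("Jason Varitek", 0.021, -0.025),
    ("Pokey Reese", -0.213, 0.136),
]


def season_totals(lineup, games=GAMES):
    """Return (offense, defense, combined) runs above average over a season."""
    offense = sum(eqmlvr for _, eqmlvr, _ in lineup)
    defense = sum(fielding for _, _, fielding in lineup)
    return offense * games, defense * games, (offense + defense) * games


offense, defense, combined = season_totals(lineup_2004)
print(f"Offense {offense:.1f}, defense {defense:.1f}, total {combined:.1f}")
```

Swapping in the 2003 lineup rows gives the left-hand totals the same way.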

The first thing that stands out (if you want to scroll down a bit) is that last year's Yankees lineup projects as being a better hitting team than last year's or this year's Red Sox lineup. As wrong as that sounds, it kind of makes sense when you figure that Bernie, Giambi, Nick and Jeter were hurt last season. I still don't necessarily agree with it, so there's an issue you can have with these numbers.

But then, this projects the Red Sox to score about 950 runs this season, so maybe they're not that far off.

So, now back to the pitching. I could try to adjust VORP so that it gives a more realistic result, but that has two problems: first, we haven't been working with replacement level here, we've been comparing to the league average; and second, VORP doesn't separate pitching and defense. Since I'd already listed UZR for each player--and both teams have improved in that area--I wanted to value pitching without defense, especially since I think the ERAs for both teams are probably projected a little high.

This would be a great job for DIPS, but since PECOTA didn't list HBP, IBB or Park Factors, I decided to use TangoTiger's FIP instead, which is really really easy to calculate. I calculated FIP for all the pitchers in PECOTA to find the league average (though it'll probably be a little high that way), then figured out the cumulative FIP for the Yankees and Red Sox. First, the Yankees:

       2003 Rotation in 2004                      2004 Rotation           
Name                FIP  Runs Saved    Name                FIP  Runs Saved
Mike Mussina        .26       26.19    Mike Mussina        .26       26.19
Andy Pettitte       .42       18.87    Kevin Brown         .13       24.12
Roger Clemens       .35       18.98    Javier Vazquez      .15       30.31
David Wells         .80       10.99    Jose Contreras      .50        9.81
Jeff Weaver         .98        6.96    Jon Lieber         1.15        2.64
Jose Contreras      .50        9.81    Gabe White         1.46        -.42
Chris Hammond       .74        3.70    Felix Heredia      2.02       -4.18
Gabe White         1.46        -.42    Steve Karsay        .49        5.85
Felix Heredia      2.02       -4.18    Tom Gordon          .30       10.18
Jeff Nelson         .57        3.94    Paul Quantrill      .78        3.92
Mariano Rivera      .08        8.92    Mariano Rivera      .08        8.92
Total per 162g      .63      120.19    Total per 162g      .48      143.96 

What should stand out is the depth of the rotation. The Yankees' top three starters are fantastic, and much better than Pettitte and Clemens would have been, but the bottom is a bit shakier. That's great for the postseason, not so much for the regular season. However, as we'll see at the end of the article, the Yankees probably don't have to worry so much about the postseason.
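For anyone who wants to build the FIP and Runs Saved columns themselves, here's a rough sketch. It makes a few assumptions that go beyond what's spelled out above: FIP is taken in its raw form, (13*HR + 3*BB - 2*K) / IP, with no league constant added (which fits values as low as Rivera's .08), and Runs Saved is taken as the gap to the league-average FIP prorated over the pitcher's innings. The stat line and the 1.40 league average below are made up for illustration, not pulled from PECOTA.

```python
def raw_fip(hr, bb, k, ip):
    """Raw FIP without a league constant: (13*HR + 3*BB - 2*K) / IP."""
    return (13 * hr + 3 * bb - 2 * k) / ip


def runs_saved(fip, ip, league_fip):
    """Runs saved versus an average pitcher over the same innings.

    Assumes FIP is on a per-nine-innings scale, like ERA.
    """
    return (league_fip - fip) * ip / 9


# Hypothetical projection line, for illustration only.
hr, bb, k, ip = 20, 40, 180, 200.0
LEAGUE_FIP = 1.40  # hypothetical league-average raw FIP

fip = raw_fip(hr, bb, k, ip)
saved = runs_saved(fip, ip, LEAGUE_FIP)
print(f"FIP {fip:.2f}, runs saved {saved:.2f}")
```

Because every pitcher is compared to the same league average, leaving the constant off doesn't change the Runs Saved column.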

Now, for the Red Sox:

       2003 Rotation in 2004                      2004 Rotation           
Name                FIP  Runs Saved    Name                FIP  Runs Saved
Pedro Martinez     -.54       38.71    Pedro Martinez     -.54       38.71
Derek Lowe          .85        9.93    Curt Schilling      .19       24.39
Tim Wakefield       .95        7.06    Derek Lowe          .85        9.93
John Burkett       1.26        1.77    Tim Wakefield       .95        7.06
Jeff Suppan        1.34         .52    Byung-Hyun Kim      .41       14.33
Bronson Arroyo      .88        4.56    Bronson Arroyo      .88        4.56
Byung-Hyun Kim      .41       14.33    Ramiro Mendoza     1.00        2.94
Scott Sauerbeck     .80        3.96    Mike Timlin         .80        3.88
Mike Timlin         .80        3.88    Alan Embree         .49        5.26
Alan Embree         .49        5.26    Scott Williamson    .80        4.20
Scott Williamson    .80        4.20    Keith Foulke        .47        7.69
Total per 162g      .71      107.05    Total per 162g      .47      145.15 

So, PECOTA says the Red Sox staff is better than the Yankees'. Barely.

What stands out here is how crucial the Curt Schilling acquisition was for the Red Sox, how important it is for Byung-Hyun Kim to succeed as a starter, and the distinct advantage the Yankees have in the postseason. PECOTA probably short-changes Schilling a little because of his missed time last season (last I checked, appendicitis wasn't chronic), so he's probably no worse than Brown and Vazquez, and probably a little better.

Boston's bullpen is about the same as New York's overall, but the Yankees have two pitchers that rate better than anyone in Boston's pen. But then, Boston has a lefty who's not only above-average, but considerably so.

Now, we've got the projected runs above average scored for both teams. We have the runs above average prevented for both teams. I think we've got enough data to make a projection.

For the Yankees, 202 runs scored above the AL average is 989 runs. Adding Runs Saved and UZR together gives them 94 runs prevented above average, or 693 runs allowed. The Pythagorean Winning Percentage Formula projects 106 wins for the Yankees (108 if A-Rod plays shortstop).

For the Red Sox, 117 runs scored above the AL average is 904 runs. 169 runs prevented above average is 618 runs allowed. The Pythagorean projection for Boston is... 108 wins.

Yes, that's right, if everything happened just like PECOTA projects, the Red Sox would win the AL East, despite 106 wins by the Yankees.
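Here's a minimal sketch of the projection for anyone who wants to check the math. The league average of 787 runs is implied by the totals above (989 minus 202, or 904 minus 117). The exponent is an assumption: the classic formula squares the run totals, but an exponent of 1.83 is also commonly used, and it's the one that rounds to the 106 and 108 quoted here.

```python
LEAGUE_AVG_RUNS = 787  # implied above: 989 - 202, or 904 - 117


def pythagorean_wins(runs_scored, runs_allowed, exponent=1.83, games=162):
    """Expected wins from runs scored and allowed.

    The 1.83 exponent is an assumption; pass exponent=2 for the classic form.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Runs above average scored and prevented, as given in the text.
yankees = pythagorean_wins(LEAGUE_AVG_RUNS + 202, LEAGUE_AVG_RUNS - 94)
red_sox = pythagorean_wins(LEAGUE_AVG_RUNS + 117, LEAGUE_AVG_RUNS - 169)

print(f"Yankees: {yankees:.1f} wins")
print(f"Red Sox: {red_sox:.1f} wins")
```

With the exponent set to 2, both teams come out a couple of wins higher, roughly 109 and 110, but Boston still finishes on top.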

Maybe we should just assume that I screwed up the numbers somewhere.