If you’re here, you probably understand the basics of Overwatch League statistics, at least from a team point of view: wins/losses, differential, even non-official things like map winning percentage are easy enough to describe.
The primary reason this site exists, however, is to explore other methods of rating teams and players, methods that I’ve borrowed from leagues and websites that cover traditional sports and applied to the OWL, along with some I crafted myself. Each has its strengths, and all can be used to give us additional insight into how well teams have performed.
Stats Definitions Quick links:
W2M
SRS
SAMP
First, here are a few definitions of how I measure and count statistics, some of which are different from official OWL sources. Every stat you’ll find on this site comes from my watching matches – either live or on the VODs. As a result, you’ll find nothing here that a dedicated person couldn’t record themselves, with no outside help, and in a reasonable amount of time per match. For the most part, that includes wins and losses for players and teams. It does not include damage, eliminations, healing, time played, and so forth, because that would be virtually impossible for a person to record simply by watching the matches. I still use Stats Lab data from time to time on my Twitter, but I don’t keep that information on my site.
Draws
I treat draws as half a win and half a loss. So a team that wins 3-0-1 has won 3.5 out of 4 maps, for a .875 winning percentage. Most sports leagues seem to assign some value to a draw/tie, so I thought this to be the best solution, despite the draw technically counting as a zero for both teams in terms of match impact. Still, I thought that it was wrong for the aforementioned 3-0-1 result to count as a “perfect” match for the winning team.
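A minimal sketch of that rule in Python. The function name `map_win_pct` is a hypothetical helper chosen for illustration, not something taken from the site's spreadsheets.

```python
def map_win_pct(wins, losses, draws=0):
    """Winning percentage with each draw counted as half a win and half a loss."""
    maps_played = wins + losses + draws
    if maps_played == 0:
        raise ValueError("no maps played")
    return (wins + 0.5 * draws) / maps_played


print(map_win_pct(3, 0, 1))  # the 3-0-1 match from above -> 0.875
```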
Sweeps/Reverse Sweeps
On a similar note, I count a “sweep” as a 3-0-0 match, with no drawn maps. A reverse sweep, on the other hand, I count as any match where a team is one map away from a loss and comes back to win. This would include any number of draws, whether a part of the losses or the wins. As of this writing, this has yet to occur in OWL, and there has only been one match where a team came back from a 3-0 margin to win 4-3.
Regular Season vs. Playoffs
I count regular-season matches as those that are pre-scheduled, typically at the start of the season, and count in official standings to determine placement in knockout rounds or tournaments. Every other match is counted as a playoff match. This is somewhat contrary to official OWL policy, which usually counts knockouts and other such matches under their own category, while reserving “playoffs” for the “main” parts of tournaments and end-of-season playoffs.
2023 Pro-Am (and other exhibitions)
I don’t include the 2023 Pro-Am in my stats (for team or player wins and losses), as I feel like it was more of an exhibition than any kind of official league matches. The same goes for the 2018 preseason that took place in December 2017. However, I have included the bonus money from the Pro-Am in the totals for each team.
2023 Contenders
I’ve decided to only record the 2023 Contenders teams as they participate in OWL matches. This includes the stage knockouts and playoffs – whether they play against other Contenders teams or OWL teams – but does not include the tournaments that got the individual teams to the knockouts. I will also be counting Contenders teams as a single entity, rather than keeping track of each individual team’s record. Thus, there will be a place in my records for “2023 Contenders” that will aggregate the total results for those teams (win-loss record, map record, SRS, and so on) as if they were a single team. I will probably not include them in my records section (for win/loss streaks and so on), since that should focus on individual teams, but might change that approach at a later time.
Stats Definitions
W2M
For a while now, I’ve been trying to figure out how to compare win-loss records of people who have vastly different numbers of maps played. While we tend to use winning percentage as a key, this only applies if a player has a suitable number of maps played. For instance, you wouldn’t say a player with a 7-3 record is clearly better than one with a 60-40 record, even though the first has the edge in winning percentage, .700 to .600.
OK, but what if that first guy wins his next map? Now he’s 8-3. Probably still not definitively better. What if he goes to 9-3? To 10-3? And so on. Clearly, by the time he gets to 60-3, he’s better than the guy at 60-40, but where between 7-3 and 60-3 does he “pass” that guy?
I actually first took a crack at this a few years ago with NFL coaches. From a purely regular-season win-loss standpoint, was John Madden (103-32-7, .759 winning percentage) better or worse than someone like Chuck Noll (193-148-1, .566)? Madden has the better winning percentage, but in fewer than half the games Noll coached – 142 versus 342.
Perhaps the question isn’t strictly, “Who was better?” because that’s too prone to subjectivity. Rather, I’d like to ask, “Who had the better career?” or perhaps “Who had more of an impact?” Was it a guy who won 76% of his games but only coached 142 or a guy who won just over half of his games but coached 200 more? Which man “did more,” so to speak?
What I’m about to describe is a flawed statistic, I’ll readily admit, because win-loss records are heavily dependent on other factors. For an NFL coach, the players he has are a major factor. In OWL, if a team uses the same six players in every map, then they’ll all have the same record, even though they’re almost certainly of different skill levels. However, I think this statistic still has some value, and I’ll explain why later.
First, how do we compute it? Volume of wins should matter – Noll’s 193.5 is “better” than Madden’s 106.5 (counting ties as half a win). But winning percentage is also important – Madden’s .759 is better than Noll’s .566.
I worked through a few different ways to come up with a number, some more complex than others. At first I tried to work OWL teammates’ or team overall performance into things, but I found that to be unsatisfying because it would result in some players being penalized for a) playing most of their team’s maps and b) being on a good team. It resulted in some players with a .700 winning percentage grading about the same as someone under .500, which just didn’t feel right. Other variations of multiplying and dividing numbers wound up canceling out one factor or the other and leaving me with some ridiculously simple overall stat, like “two times wins.”
Finally, I thought I’d just keep it simple and make the most use of the two stats I wanted to employ. To that end, I just multiplied the two together: wins times winning percentage. Or, since winning percentage is wins divided by maps played, you can also express this as wins squared divided by maps played.
Alright, so let’s go back to our two NFL coaches, substituting “games coached” for “maps played.”
Madden: 106.5^2 / 142 = 79.9
Noll: 193.5^2 / 342 = 109.5
So Noll comes out on top. Does that mean he was “better,” strictly speaking? Maybe not, but it means he had a greater impact or value over his 23 years as a head coach than Madden did in 10. Still, note that Noll had 2.4 times as many games as Madden but still only rates 1.4 times higher with this stat. That reflects pretty well on Madden!
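The two coaching figures can be reproduced with a short Python sketch. The helper name `w2m` and its `ties` parameter are assumptions made for this example. Ties count as half a win, as described above.

```python
def w2m(wins, games, ties=0):
    """Wins squared divided by games (or maps), counting each tie as half a win."""
    effective_wins = wins + 0.5 * ties
    return effective_wins ** 2 / games


madden = w2m(103, 142, ties=7)  # 103-32-7 over 142 games
noll = w2m(193, 342, ties=1)    # 193-148-1 over 342 games
print(round(madden, 1), round(noll, 1))  # 79.9 109.5
```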
Going back to my initial example, the 60-40 guy has a value of 3600/100, or 36. At 7-3, the other guy is at 4.9. To surpass the 60-40 guy, he’d have to win his next 32 maps in a row, going to a record of 39-3, a winning percentage of .929. That’s probably unrealistic, and at that point you’d probably say that he’s definitely better, strictly speaking. However, he’s won 21 fewer maps and played fewer than half as many as the other guy, so my thought is that he’s provided roughly the same value as a player with a greater body of work.
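One way to find that crossover point is a simple loop that keeps adding wins until the 7-3 player's value passes 36. This is only a sketch, and the helper names are made up for the example.

```python
def w2m(wins, maps_played):
    """Wins squared divided by maps played."""
    return wins ** 2 / maps_played


def wins_in_a_row_to_pass(wins, losses, target):
    """Count the consecutive wins needed before W2M exceeds `target`."""
    streak = 0
    while w2m(wins + streak, wins + losses + streak) <= target:
        streak += 1
    return streak


target = w2m(60, 100)  # the 60-40 player: 36.0
streak = wins_in_a_row_to_pass(7, 3, target)
print(streak, f"{7 + streak}-3")  # 32 39-3
```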
The key point is that this stat uses both a counting (wins) and a rate (winning percentage) stat and therefore doesn’t need qualifiers like, “Sure, he’s good, but he’s only played X games/maps” – a statement that prioritizes rate over counting – or “He’s played a ton and has a lot of wins so of course he’s good” – which prioritizes counting over rate. That’s different from most traditional statistics in sports, like batting average, completion percentage, shooting percentage (rate stats), home runs, touchdown passes, goals scored (counting stats). It’s a little more like Wins Above Replacement in baseball, which accounts for both ability above a certain rate and volume of play.
(For what it’s worth, Captain Planet’s Player Impact Rating is a rate stat, requiring players to have appeared in at least 60% of their team’s maps. It’s a good one, for certain, and probably is better at figuring out players’ real value and ability than what I put together, though it would be nice if it could rate players below that 60% threshold.)
As I mentioned earlier, my stat isn’t perfect. It’s heavily team-dependent and does correlate strongly to overall wins, but I think it has two solid applications. The first is that I think it will prove more useful in the long term. Once players have 500 to 1,000 maps under their belts, having played across many seasons with many different teammates, the quality of one’s team will have less of an impact and it will be easier to determine who rates the highest.
The second, and more immediate, benefit, I think, is in comparing players on the same team, especially ones with the same role or that don’t, or rarely, play together. In these cases, they have mostly the same teammates and can be more directly compared. For instance, take a look at Atlanta’s 2019 page and compare Dafran with the man who replaced him in the lineup, Babybay. Dafran scores 10.3 (on 28 maps) while Babybay was 20.0 (on 74 maps). That’s interesting, to see that Babybay played basically three times as many maps while scoring just twice as well.
Those two weren’t on the team together, so maybe you could trace that difference to changes in the meta or just luck of the draw or schedule difficulty between stages. But then there’s this comparison to make: Persia versus Aimgod for the 2019 Boston Uprising. All one has to do is look at Aimgod’s 16.6 and Persia’s 0.9 to ask why Aimgod was replaced in the lineup by Persia in the latter part of the season. In fact, Aimgod led the Uprising in this stat, probably in part because everyone else’s rating was dragged down by Persia.
All right, this explanation has gone on for a while with my calling it “this stat” or something similar. I wrestled for a while with what to call it, but felt that too many names with “value” or “score” in them made it sound like it was doing too much. As I said earlier, it doesn’t mean that a player with a higher number is outright better than a player with a lower one (well, maybe in the Aimgod/Persia case…). It just means that the higher player did more or provided more value without actually being more valuable, if that makes sense.
So at the end of the day, I just decided to be literal with my naming. It’s wins squared divided by maps, or W^2/M, so I just abbreviated that to “W2M.” Maybe it doesn’t roll off the tongue, and maybe I’ll waffle again and rename it later, but then I’d have to change my spreadsheets, and I hate doing that. So W2M it is for now, until I change my mind.
SRS
SRS stands for “Simple Rating System.” It was adapted for the NFL by Doug Drinen for Pro-Football-Reference and is now used by virtually all sites in the Sports-Reference family. Here’s Drinen’s original description of the system and the primer on the SR site.
Briefly, a team’s SRS value tells you how much better (if positive) or worse (if negative) a team is than the average team in the league. In most sports, that’s couched in terms of points, goals, runs, or whatever the league uses to keep score. An NFL team that has an SRS of 5.8 scores, on average, 5.8 points more than an average opponent, while one with an SRS of -7.1 scores 7.1 fewer points than an average opponent.
What makes SRS more than just a simple matter of point differential, however, is how it incorporates a team’s strength of schedule (SOS) into the calculation. SOS is the average SRS of all teams that team has played against. So a team that’s played good teams will have a high SOS and a team that’s played poor teams will have a low/negative SOS. Add a team’s average margin of victory (MOV) to its SOS and you get its SRS. In other words, a team’s SRS is equal to how much it wins/loses each game by plus how much better/worse than average its opponents are.
It’s a little more difficult than that to figure things out; as Drinen points out, once you adjust one team’s SOS, that changes its SRS, so you have to recompute the SOS of every team that one played, and so on, all the way down the list, until the difference in iterations reaches zero. Fortunately, I have a spreadsheet. A very large spreadsheet.
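For readers without the spreadsheet, here is a rough Python sketch of that recompute-until-stable loop. Everything in it is an assumption made for illustration: the function name, the tuple layout of `games`, and the four-team round robin itself, which is made-up data rather than real OWL results. A plain loop like this settles quickly on a schedule such as a round robin, but it is not guaranteed to settle for every possible schedule.

```python
def simple_rating_system(games, tol=1e-12, max_iter=10_000):
    """Iterate SRS = MOV + SOS until the ratings stop changing.

    `games` is a list of (team_a, team_b, maps_a, maps_b) tuples.
    """
    teams = sorted({team for a, b, _, _ in games for team in (a, b)})
    margins = {t: [] for t in teams}
    opponents = {t: [] for t in teams}
    for a, b, maps_a, maps_b in games:
        margins[a].append(maps_a - maps_b)
        margins[b].append(maps_b - maps_a)
        opponents[a].append(b)
        opponents[b].append(a)

    mov = {t: sum(margins[t]) / len(margins[t]) for t in teams}
    srs = dict(mov)  # first guess: every schedule is exactly average
    for _ in range(max_iter):
        # SOS is the average current rating of each team's opponents.
        sos = {t: sum(srs[o] for o in opponents[t]) / len(opponents[t]) for t in teams}
        updated = {t: mov[t] + sos[t] for t in teams}
        change = max(abs(updated[t] - srs[t]) for t in teams)
        srs = updated
        if change < tol:
            break
    return mov, sos, srs


# Made-up four-team round robin, not real OWL results.
games = [
    ("A", "B", 3, 1), ("A", "C", 4, 0), ("A", "D", 3, 2),
    ("B", "C", 3, 1), ("B", "D", 2, 3), ("C", "D", 1, 3),
]
mov, sos, srs = simple_rating_system(games)
for team in sorted(srs, key=srs.get, reverse=True):
    print(team, round(mov[team], 3), round(sos[team], 3), round(srs[team], 3))
```

In this toy league the ratings sum to zero, which fits the idea that SRS measures distance from the league average.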
For SRS in OWL, I use maps as the “points.” So a team that has an SRS of 1.00 tends to score one more map than its opponents. In a four-map match, the average score would be 2.5 to 1.5. And SRS is a vital component of our last statistic …
SAMP
Here’s a team strength ranking/potentially predictive algorithm I’ve come up with for the Overwatch League. For the sake of calling it something other than “my system” or “the system,” I’ve given it a snappy title: Schedule-Adjusted Map Percentage, or SAMP.
Here’s what SAMP tries to answer: If you have two OWL teams, one with, say, a 40% overall map win rate, and one with a 70% overall map win rate, what are the odds that one team or the other will win a map? Or the match? That’s actually fairly easy to figure out, on the surface.
Suppose A = the first team’s map win percentage, or 0.4, and B = the second team’s map win percentage, or 0.7. The formulas to determine the winning percentages (WP) for each team in a single map are:
WP(A) = (A – AB)/(A – 2AB + B)
WP(B) = (B – AB)/(A – 2AB + B)
In this case, you come up with
WP(A) = 0.22 = 22% chance for Team A to win
WP(B) = 0.78 = 78% chance for Team B to win
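The same calculation as a small Python sketch. The function name `head_to_head` is hypothetical. Note that the formula is undefined when both inputs are 0 or both are 1, since the denominator is then zero.

```python
def head_to_head(a, b):
    """Chance that a team with win rate `a` takes a map from a team with win rate `b`."""
    return (a - a * b) / (a - 2 * a * b + b)


wp_a = head_to_head(0.4, 0.7)
wp_b = head_to_head(0.7, 0.4)
print(f"{wp_a:.2f} {wp_b:.2f}")  # 0.22 0.78
```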
The problem with this simple measure (which is briefly explained way down at the end of this post) is that it doesn’t take strength of schedule into account. If Team A has played a tough schedule and Team B has played an easy one, then they should be closer in actual ability. It’s also possible that A’s played an easy schedule and B a tough one, so they should be farther apart. Hence, we need to put the SA in SAMP, and that’s where things get a little tricky.
The very long math part
To start, what is map winning percentage (MWP)? It’s simply maps won divided by maps played (or maps won plus maps lost); ties are ignored in this system. So, we have:
MWP = MW/(MW + ML) = MW/MP
Now I’ll step away and figure something else out: maps played per match or game (MPG). (I know, people usually refer to a full set of maps as a “match,” but since I’m already using M for “map,” I’m going to go with G for “game.”) That’s easy enough, it’s just:
MPG = (MW + ML)/G
Now, here’s a question for you: How many maps does a team win per game? And how many do they lose? That’s MW/G and ML/G. What’s their average differential per game, a.k.a. their margin of victory? That’s
MOV = (MW – ML)/G
Now I’m going to do some math. Hang on!
MOV = (MW – ML)/G
MOV * G = MW – ML
ML = MW – MOV * G
Now, we’ll sub in this formula for ML into the MPG formula:
MPG = (MW + ML)/G
MPG = (MW + MW – MOV * G)/G
MPG = (MW + MW)/G – (MOV * G)/G
MPG = 2MW/G – MOV
MPG + MOV = 2MW/G
(MPG + MOV)/2 = MW/G
Reversed, that’s
MW/G = (MPG + MOV)/2
In other words, a team’s maps won per game is equal to half of the sum of a team’s maps played per game and their margin of victory. Take the simple example of a team that played one match and won it 3-1. It played 4 maps in its one game (MPG) and won by 2 (MOV). 4 + 2 = 6. Divide by 2: 6/2 = 3. And 3 is the number of maps it won (MW/G).
Here’s a more complex example. Through Stage 3 of Season 1 (when I’m writing this), the Philadelphia Fusion had played 30 games (G), with 71 maps won (MW) and 62 lost (ML) (and one tie, but again, we’re ignoring that).
MPG = (MW + ML)/G
MPG = (71 + 62)/30 = 4.433…
MOV = (MW – ML)/G
MOV = (71 – 62)/30 = 0.3
MW/G = (MPG + MOV)/2
(4.4333 + 0.3)/2 = 2.3666…
Or, the basic method, directly dividing maps won by games:
MW/G = 71/30 = 2.3666…
It’s a match!
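The Fusion check can be repeated in a few lines of Python. The variable names are chosen for this sketch only.

```python
games, maps_won, maps_lost = 30, 71, 62  # Fusion through Stage 3, ties ignored

mpg = (maps_won + maps_lost) / games  # maps played per game
mov = (maps_won - maps_lost) / games  # margin of victory
derived = (mpg + mov) / 2             # MW/G from the derived identity
direct = maps_won / games             # MW/G computed directly

print(round(mpg, 4), round(mov, 4), round(derived, 4), round(direct, 4))
# 4.4333 0.3 2.3667 2.3667
```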
Now, go back to the first formula in this section, where we defined map winning percentage, i.e., the number we really want.
MWP = MW/MP
I can divide both the numerator and the denominator on the right side by the same number – G, in this case – and get:
MWP = (MW/G)/(MP/G)
MW/G was the formula I derived a moment ago ( = (MPG + MOV)/2 ) and MP/G is just another way to say MPG.
MWP = ( (MPG + MOV)/2 )/MPG
MWP = (MPG + MOV)/(2 * MPG)
MWP = MPG/(2 * MPG) + MOV/(2 * MPG)
MWP = 1/2 + MOV/(2 * MPG)
Let’s check this again, using the Philadelphia Fusion example. MW = 71, ML = 62, MOV = 0.3, MPG = 4.4333…, so
MWP = 1/2 + 0.3/(2 * 4.4333…) = ~0.5338
Now, the easy way:
MWP = MW/(MW + ML)
MWP = 71/(71 + 62) = ~0.5338
Perfect!
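A short Python sketch of the same comparison, with the 2 * MPG denominator written out explicitly. The function name `mwp_from_mov` is an assumption for this example.

```python
def mwp_from_mov(mov, mpg):
    """Map winning percentage rebuilt from margin of victory and maps per game."""
    return 0.5 + mov / (2 * mpg)


games, maps_won, maps_lost = 30, 71, 62  # Fusion through Stage 3, ties ignored
mov = (maps_won - maps_lost) / games
mpg = (maps_won + maps_lost) / games

rebuilt = mwp_from_mov(mov, mpg)
direct = maps_won / (maps_won + maps_lost)
print(f"{rebuilt:.4f} {direct:.4f}")  # 0.5338 0.5338
```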
Time to get SeRiouS
Now, the final piece to the puzzle, in which we finally introduce the SA part of SAMP. For that, we turn to the Simple Rating System, or SRS, a system developed by Doug Drinen to rate NFL teams and used extensively by the Sports-Reference family of sites.
I’ve been computing SRS for OWL nearly since the start of the league. A team’s SRS is just its MOV plus its SOS (strength of schedule). It basically says how much a team would win each match by if it faced average competition. If SOS is above zero, it means the team has faced a harder-than-average schedule and its SRS will be higher than its MOV. Reverse all that if the SOS is below zero.
(I’m not going to get into all the complexities of how SRS – or primarily SOS, since that’s the tough part – is computed. Here is Drinen’s original post on the subject, and a description of how Sports-Reference uses it for various sports.)
What all that means is simply this: SRS is a modified form of MOV that takes strength of schedule into account. We already use MOV in our formula, so subbing in a team’s SRS for that should be just what we need! So this formula:
MWP = 1/2 + MOV/(2 * MPG)
becomes
SAMP = 1/2 + SRS/(2 * MPG)
Still with me? Good, because we’re done! Whew!
Let’s use this formula to figure out the winning percentages for maps contested between Philadelphia and Florida, using their numbers at the end of Stage 3. Here are Philadelphia’s relevant (extremely precise, taken right from my spreadsheet for maximum accuracy) numbers:
SRS: 0.269476266801931
MPG: 4.4333333333333
That gives us:
SAMP: 0.53039206
This means that, against an average opponent (SAMP = 0.5), the Fusion should win 53.039206% of maps. For reference, the Fusion’s actual map winning percentage (MW/MP) is 53.3834586%, which is slightly higher, as Philadelphia has a very small negative SOS (about -0.03), thus making their SAMP smaller, as we’d expect from the slightly easy schedule.
For Florida, we have:
MW = 35
ML = 86
MP = 121
SRS = -1.43900020296486
MPG = 4.0333333333333
SAMP = 0.321611545
Florida’s actual map winning percentage is 0.302521008. They’ve had a pretty tough schedule (SOS about 0.14), compared to Philadelphia, and their SAMP is higher than their MWP, as expected.
Now, I’ll feed those two SAMPs into that WP formula from waaaaay back in the first section – using them as the A and B values – to get (to one decimal place):
WP(PHI) = 70.4%
WP(FLA) = 29.6%
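Putting the two formulas together, the Philadelphia and Florida figures can be reproduced with the sketch below. The function names are hypothetical. The SRS and MPG inputs are the values quoted above.

```python
def samp(srs, mpg):
    """Schedule-Adjusted Map Percentage: 1/2 + SRS / (2 * MPG)."""
    return 0.5 + srs / (2 * mpg)


def head_to_head(a, b):
    """Single-map win chance for a team rated `a` against a team rated `b`."""
    return (a - a * b) / (a - 2 * a * b + b)


phi = samp(0.269476266801931, 4.4333333333333)
fla = samp(-1.43900020296486, 4.0333333333333)
wp_phi = head_to_head(phi, fla)

print(f"{phi:.6f} {fla:.6f}")             # 0.530392 0.321612
print(f"{wp_phi:.1%} {1 - wp_phi:.1%}")  # 70.4% 29.6%
```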
Now that I have this, I can compute odds for entire matches – my originally stated goal. For instance, this system gives Philadelphia an 84.3% chance of winning the match and a 15.7% chance for Florida. And that’s not all I can do. There’s a 24.6% chance Philadelphia wins 4-0, and a 7.7% chance Florida wins 3-2, with an overall 26.0% chance the series goes to five maps. On average, Philadelphia wins almost exactly 3 maps (3.00065827, to be precise), while Florida wins about 1.26. That’s all relatively basic probability, though I might go into it in a different post.
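The match-level numbers can be approximated with a binomial sketch. It rests on assumptions that are not spelled out above: that all four maps are always played, that a single tiebreaker map follows a 2-2 split, that every map is independent with the same win chance, and that drawn maps never happen. The function and key names are made up for the example.

```python
from math import comb


def head_to_head(a, b):
    """Single-map win chance for a team rated `a` against a team rated `b`."""
    return (a - a * b) / (a - 2 * a * b + b)


def match_odds(p):
    """Match outcome probabilities for a team with per-map win chance `p`.

    Assumes four maps are always played, plus one tiebreaker map after 2-2.
    """
    q = 1 - p
    # Probability of winning exactly k of the first four maps.
    after_four = {k: comb(4, k) * p**k * q ** (4 - k) for k in range(5)}
    tie = after_four[2]
    return {
        "win_4_0": after_four[4],
        "win_3_2": tie * p,
        "lose_2_3": tie * q,
        "five_maps": tie,
        "win_match": after_four[4] + after_four[3] + tie * p,
        "expected_maps_won": 4 * p + tie * p,
        "expected_maps_lost": 4 * q + tie * q,
    }


p_phi = head_to_head(0.53039206, 0.321611545)  # SAMP values quoted above
odds = match_odds(p_phi)
for name, value in odds.items():
    print(f"{name}: {value:.3f}")
```

Under those assumptions the output lines up with the percentages quoted above, including the expected map counts of roughly 3.00 for Philadelphia and 1.26 for Florida.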
For now, we’re done here, though I’ll try to address what I figure will be some of the most basic questions people will have about SAMP.
Why did you create SAMP?
I like numbers. And sports. And spreadsheets. And Overwatch.
OK, the (slightly) longer version is that I’ve been using the Elo matchup formula to try and predict Overwatch matches (and even full stages) for a while now, and while it’s OK, I don’t think it’s a perfect match for what I’m looking for. In particular, it has trouble dealing with “edge cases,” like teams as bad as the Shanghai Dragons, giving them surprisingly good odds (in the 10-15% range) for winning matches, which they’ve been very, very bad at. On the one hand, that might be a good thing, since regression to the mean is a thing. On the other hand, maybe the Dragons are so bad (and the Excelsior so good) compared to the rest of the league that they aren’t going to regress.
Also, the Elo formula uses a number that’s more or less plucked out of the ether (400 in the case of the base formula which starts players off with a 1500 ranking, and 4 for the version of the formula I used for my OWL predictions), and that irks me a bit. SAMP has its flaws, as I mention elsewhere in this section, but at least it doesn’t “make anything up,” so to speak.
You’re not counting tied maps?
No. To be perfectly honest, I don’t know how to. SAMP is binary, counting just wins and losses and I don’t know how to introduce a third, very low-probability outcome into it. For what it’s worth, only about 2% of OWL maps end in a tie, so I don’t feel like excluding them is a big deal.
Are these numbers really valid? Do the Fusion really have a 70% chance of taking a map from the Mayhem? What about if Philadelphia is on a winning streak? What if Florida just got a really good player?
SAMP isn’t perfect. Then again, neither is any predictive system. A 0.000001% chance of winning the jackpot from a slot machine means you probably won’t win – but sometimes you do. Upsets are what make sports, and esports, fun! (Loot boxes, on the other hand…)
That being said, I think SAMP provides the best possible predictive algorithm that relies on pure statistics. If Florida did get a new player, how much better would that make them? 5%? 10%? Maybe he’s a great hitscan player but a bad communicator and will actually make the team worse. Nobody can know this for sure (though analysts, both professional and amateur, will try).
Winning and losing streaks are another thing that sports people like to think mean something, but they’re usually not as important as overall record. Being 20-10 is pretty good, but it generally doesn’t matter if you got there by going WWL 10 times or if you started 10-10 and then won 10 straight.
Like SRS, which it’s based upon, SAMP uses only the results of matches that have happened and makes the – admittedly limited – assumption that this is a team’s true skill level. As such, it’s as (in)accurate as any metric, even the most basic one: wins and losses. Early in a season, would you say, without a doubt, that a 3-1 team is better than one that’s 2-2? Probably not. But you’d likely say that of a team that’s 30-10 versus one that’s 20-20. More information helps us make better assumptions about a team’s talent level, and that’s always going to be the case, no matter what you use to measure a team.
I think SAMP, and SRS, do a pretty good job of predicting how well a team performs and let us crunch some numbers to come up with some reasonably accurate predictions. If you don’t like it, that’s fine.
Shouldn’t you take map types (Assault, Control, etc.) into account, or even maps themselves (Numbani, Junkertown, etc.) into account?
Maybe, but that would greatly ramp up the complexity. What I’ve got now allows me to compute these percentages with relatively little effort using basic, easily obtainable statistics. I also have some question as to how valid it would be to use individual map types or maps, due to the small sample sizes.
This is all pretty complex. Is there a more intuitive way to figure it all out?
Kind of. Here’s my attempt at explaining it (mostly) in English, which is also how I came up with it all. In other words, I reasoned it out by pieces; I didn’t just start making up formulas and plugging stuff in.
The objective is to come up with something like map win percentage (MWP = MW/MP) but using SRS. So I came up with that “simple” example, of the team that wins 3-1 and assumed they beat an average team. Their margin of victory (and also their SRS) is +2, and their basic map win percentage is 0.75. They also average 4 maps played per match. All that should be pretty obvious.
Then I tried fiddling with the numbers to see if I could make something happen, and I realized that (MP/G + MOV)/2 = MW/G. In this case, that’s (4 + 2)/2 = 3. You can try it for a few other cases, like for a negative MOV or using results of two games or a match that goes to five maps, and you’ll see that it works. And I knew that since MWP = MW/MP, I could mash all that together to get a MWP-ish number that had MOV in it. And once I got MOV in that kind of formula, I could sub in SRS. This is closer to what I actually use in my spreadsheet, a kind of simplified version of the “final formula,” above.
Was that better? Probably not, but I tried.
That’s not so bad. So why do the long explanation?
I wanted to show all my work, so people would know how mathematically sound it all was, and to have everything up front and precise so I could direct people here if they had questions.
Also, the first time I tried coming up with something like this, I got it close, but wrong. I’m honestly a little fuzzy on how I came up with what I did come up with, but when I applied one more division, it somehow matched up with the formula above. So, I like to think of writing this up and making sure everything followed properly as a way of checking my work.
That first WP formula, with the A’s and B’s, could you explain that?
This has gone on long enough, so I’ll give the really, really quick explanation, similar to the “I just plugged in the numbers and came up with a formula” method I talked about in the last section.
Pretend you’re in a coin-flip league. You win if you get heads and your opponent gets tails; ties are reflipped until someone wins. Instead of being 50/50, however, all the coins are weighted. Suppose Team A’s coin comes up heads 3 out of 4 times and Team B’s comes up heads 1 out of 3 times. Here are all the possible results:
A | B | Winner |
H | H | Reflip |
H | T | A |
H | T | A |
H | H | Reflip |
H | T | A |
H | T | A |
H | H | Reflip |
H | T | A |
H | T | A |
T | H | B |
T | T | Reflip |
T | T | Reflip |
There are 7 final results (non-reflips). A wins 6, B wins 1. WP(A) = 6/7, WP(B) = 1/7. Feed those into the WP formula (A = 3/4, B = 1/3), and you’ll get the same results.
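The table can also be generated rather than written out by hand. This Python sketch enumerates the twelve equally likely flip pairs and compares the tally with the WP formula using exact fractions. The list-based coin representation and the function name are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def head_to_head(a, b):
    """The WP formula: (A - AB) / (A - 2AB + B)."""
    return (a - a * b) / (a - 2 * a * b + b)


coin_a = ["H", "H", "H", "T"]  # heads 3 times out of 4
coin_b = ["H", "T", "T"]       # heads 1 time out of 3

a_wins = b_wins = 0
for flip_a, flip_b in product(coin_a, coin_b):
    if flip_a == "H" and flip_b == "T":
        a_wins += 1
    elif flip_a == "T" and flip_b == "H":
        b_wins += 1
    # Matching faces are reflips and decide nothing.

decided = a_wins + b_wins
print(Fraction(a_wins, decided), Fraction(b_wins, decided))  # 6/7 1/7
print(head_to_head(Fraction(3, 4), Fraction(1, 3)))          # 6/7
```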
You could build charts like this with binary win-loss cases using any probabilities, but I wouldn’t recommend it with really large numbers.
Bro, do you even play?
I do. I’m vaguely competent, though I don’t play much competitive, since I prefer to have fun. Mystery Heroes FTW!