Ben "Noxville" Steenhuisen
|
September 2, 2021

What's the risk of unofficial esports data?

Esports data has to be treated with care. Top-quality data, paired with expertise in esports data and in the specific title, is required if esports is to reach its full potential.

About the Author
Ben works as a senior software architect at Bayes Esports. His first esports love was Counter-Strike 1.6 but he also picked up Dota and Dota 2 along the way. Ben has worked at multiple international esports events doing statistics for the broadcast, and when he’s not at his keyboard he’s… actually he’s always at his keyboard.

Esports data is, in general terms, very widely accessible. This is great, as the accessibility allows myriad third parties to explore the data and make interesting products with it.

However, every single title has small but significant nuances that can be difficult to understand, and these can lead to confusion for casual fans and experts alike. There are broadly two places where errors can occur: the ‘digitization’ of actions or statistics that occur in the game, and the interpretation of those actions.

In traditional sports, complex and interconnected cameras, computer vision algorithms and audio systems are used to calculate and adjudicate specific events, such as whether a ball was in or out (e.g. Hawk-Eye in tennis), whether a batsman has nicked a ball (e.g. in cricket), or even just where the ball is (in soccer). Despite being custom built, these systems have a significant underlying error rate.

Esports, on the other hand, operate in a digital environment, and as a result there are precise, internally tracked properties for every object in that environment. This allows us to accurately answer any question about that environment: there is enough data to perfectly recreate the path of every single bullet in a match of CS:GO if we wanted to.

Official and unofficial data

Perfect digitization is limited only by the fidelity of the event collection method: unofficial OCR data (for example, recorded off a Twitch stream) might miss specific events because of on-stream replays, unconventional overlays or just because the observer missed the action. As a result, unofficial data reintroduces issues similar to those that plague traditional sports. Mistakes are made and crucial information is missed, leading to inaccurate betting odds, incorrect analyses, and the spread of misinformation.
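To make the idea of missed events concrete, here is a minimal sketch of how one might compare an unofficial feed against an official one. It assumes that both feeds have already been reduced to comparable tuples of (round, killer, victim). The function name and the event shape are hypothetical and not taken from any real data product.

```python
from collections import Counter

def missing_events(official, unofficial):
    """Return events present in the official feed but absent from the unofficial one.

    Both arguments are lists of hashable event tuples (a hypothetical
    (round, killer, victim) shape is used here for illustration).
    Counter subtraction keeps duplicates, so an event that happened twice
    but was captured once is still reported once as missing.
    """
    diff = Counter(official) - Counter(unofficial)
    return sorted(diff.elements())

# Illustrative data: the OCR feed missed the second kill of round 1.
official_feed = [(1, "alpha", "bravo"), (1, "charlie", "delta"), (2, "alpha", "charlie")]
ocr_feed = [(1, "alpha", "bravo"), (2, "alpha", "charlie")]
missed = missing_events(official_feed, ocr_feed)
```

In practice such a comparison is only possible when an official feed exists to check against, which is the point: without it, gaps in scraped data tend to go unnoticed.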


Official data is generally generated by the game server itself and thus avoids the shortcomings of OCR. Because it is, for the most part, an absolute source of truth, using official data avoids the errors introduced by scraped data. This raises the professionalism of esports and improves the experience of casual fans and experts alike.


Interpretation issues with OCR data

Moving on to interpretation issues: in traditional sports, ‘stat corrections’ are often issued which can overrule who was credited for specific actions (both good and bad!). These can take days to be announced. The delay is a side effect of how the leagues are structured: in American football, for instance, the NFL League Office needs to liaise with the official statisticians of the NFL in order to issue corrections.

These adjustments are based on a best-effort process, simply because the underlying data could be wrong. A correction that gives or takes away a single yard or tackle may seem very minor at first glance, but these small errors add up over the course of an entire season. Every single yard, tackle or fumble recovery also matters: to a player's chase for milestones and records, to experts making accurate analyses, and to fans deciding who to rely on for their fantasy team or who to place their bet on.

This happens very rarely in esports because the game itself is the arbiter of who gets credit for each action. That said, bugs, glitches or the game's own limitations do occasionally cause similar issues, and only those with a deep understanding of the game can reliably recognize that corrections need to be made.

An example of this is Dota 2, where heroes are capped at two thousand ‘creeps’, so in games where this is surpassed the figure needs to be adjusted appropriately because the game itself might be wrong. Not doing so, or doing so incorrectly, reintroduces errors that could and should be avoided, so that players, experts and fans alike can use the statistics to properly gauge the performance of a player or team.
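A correction of this kind could be sketched as follows. This is only an illustration under stated assumptions: the record fields are hypothetical, the cap value is simply the one quoted above, and it presumes that an independent recount (for example from individual kill events) is available.

```python
from dataclasses import dataclass
from typing import Tuple

CREEP_CAP = 2000  # cap as quoted in the text above; treated here as an assumption

@dataclass
class PlayerStats:
    player: str
    reported_creeps: int  # value reported by the game (hypothetical field)
    counted_creeps: int   # value recounted from individual events (hypothetical field)

def corrected_creeps(stats: PlayerStats, cap: int = CREEP_CAP) -> Tuple[int, bool]:
    """Return (creep count to publish, whether a correction was applied).

    The reported value is trusted unless it has hit the cap and the
    independent recount is higher, in which case the recount is used.
    """
    if stats.reported_creeps >= cap and stats.counted_creeps > stats.reported_creeps:
        return stats.counted_creeps, True
    return stats.reported_creeps, False

# Illustrative values only, not taken from a real match.
capped = corrected_creeps(PlayerStats("player_a", 2000, 2137))
normal = corrected_creeps(PlayerStats("player_b", 850, 850))
```

The design choice here is deliberately conservative: the game remains the arbiter everywhere except at the known limitation, so values below the cap are never overridden.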

Conclusion: Why you should avoid unofficial esports data

In conclusion, esports can completely bypass many of the issues which plague traditional sports data. However, in order to do so, esports data has to be treated with care and the clear pitfalls associated with unofficial data have to be avoided. Top-quality data, combined with expertise in both esports data and the specific title you wish to explore, is required if esports is to reach its full potential.

