From StatDNA to Arsenal Data Analytics

To many people, the 2003 book Moneyball: The Art of Winning an Unfair Game by Michael Lewis was the beginning of a new era in modern sport. 

The book, detailing the work of baseball general manager Billy Beane at the Oakland Athletics, became a global sensation after it was adapted into a 2011 film directed by Bennett Miller and starring Brad Pitt as Beane.

Since then, every sport, be it football, rugby, cricket or basketball, has gone looking for its own Moneyball moment.

Billy Beane’s expert management of the Oakland Athletics kickstarted a data revolution in sport.

While the idea of stats in football has generally been laughed away as the sort of thing the boffins who couldn’t kick a ball in PE used to justify their interest in the sport, it is fair to say that analytics and data have been at the heart of football’s tactical and recruiting revolution.

Though many books, such as Christoph Biermann’s Football Hackers: The Science and Art of a Data Revolution and Rory Smith’s recent release Expected Goals: The Story of How Data Conquered Football and Changed the Game Forever, have dived deep into how clubs use their analytics, in reality, as detailed as they are, they can barely scratch the surface.

Clubs are jealous custodians of their data and are unwilling, to the point of hostility, to share their findings with anyone else.

Data and analytics have played a huge part in football’s tactical renaissance.

For years, Premier League clubs tended to base their analytical approach on data from the likes of Opta, the official analytics arm of the Premier League, and Prozone, an outside company that quickly wove its way into the vernacular of many of the Premier League’s elite.

While data and analytics may seem relatively ubiquitous in the modern game, it may surprise many that several of the Premier League’s biggest names do not currently have their own in-house analytics department.

Arsenal, on the other hand, boast their own (at one point cutting-edge) analytics department, one that is also the envy of many around the world.

In 2012, Arsenal announced the acquisition of American data and analytics company, StatDNA. 

While the acquisition was recorded in the company’s accounts as “AOH – USA, LLC”, no one understood why Arsenal were investing £2.165m to acquire an American data company, when there were far greater issues affecting the club. 

To paint the scene, the club’s acquisition of the company was announced during the club’s annual general meeting (AGM). 

Ivan Gazidis and Arsène Wenger believed in the potential of StatDNA from the start.

At this time, Stan Kroenke was in the midst of a bitter battle of wills with Uzbek minority shareholder Alisher Usmanov for sole ownership of the club.

At the time of the AGM (October 25th), Arsenal were drifting listlessly in 9th place in the Premier League. Five days earlier, the club had been undone by Norwich at Carrow Road; the club’s captain, Robin van Persie, had been sold in the summer and had scored his 6th goal of the season for Manchester United as Alex Ferguson’s side began to stake a claim on lifting their 20th league title.

The club had been active in the transfer market, with Lukas Podolski, Santi Cazorla and Olivier Giroud added to the squad, but they were not enough to relieve the pain of losing van Persie to a rival, of having allowed his contract to run down in the first place, or of knowing that the club had only spent the money because it had made most of it back. The purse strings were clearly not being loosened where it was needed most.

But, while the naysayers were quick to dismiss the purchase, Arsenal knew differently. To manager Arsène Wenger and chief executive Ivan Gazidis, the potential was invaluable. Hendrik Almstadt, previously an executive assistant at the club, had recommended the company to Arsenal as a means of potentially gaining a foothold in the transfer market.

Almstadt, an avid football fan, had been toying with a number of metrics and analytics as a means of identifying talent in his spare time during his executive role at the club.

Arturo Vidal was identified as a potential replacement for Patrick Vieira.

Though not one of chief scout Steve Rowley’s trusted scouts, Almstadt began using certain parameters to find players similar to club legend Patrick Vieira. The name he found was Arturo Vidal, then of Bayer Leverkusen in the Bundesliga.

Though the recommendation was never made to Wenger, it highlighted to Almstadt the difference that data and analytics could make in football.

The story of how Wenger was convinced to sign StatDNA is fairly well-known at this point. 

Prior to their acquisition of the company, Arsenal had been working with StatDNA on an ad-hoc basis. However, it was a presentation from Almstadt that convinced Wenger to make the move permanent. 

Though not a part of StatDNA, Almstadt delivered a presentation to Wenger showing that StatDNA would never have suggested the signings of Marouane Chamakh and Park Chu-young, owing to their underlying stats.

Some ten years on from the club’s purchase of StatDNA (now called the decidedly less exciting “Arsenal Data Analytics”), the company has played a key part in many of the club’s successes and its failures. 

For every Per Mertesacker, there will be a Shkodran Mustafi. 

Marouane Chamakh was one of the signings StatDNA flagged as a bad purchase made by the club.

What started off as a means of gaining ground in the transfer market has since evolved into the very heart of Arsenal’s footballing model.

Players have been signed (or not signed) on the findings of StatDNA, management decisions have been made and tactical analysis carried out on the back of StatDNA and its data. 

While rumours have abounded for many years that the company advocated against the signing of Real Sociedad winger Antoine Griezmann, for many Arsenal fans, their introduction to StatDNA was the 2016/17 season, specifically the summer transfer window.

Of course, in reality, the company had been highly involved in almost all key football decisions prior to this, but for fans, it was the three signings Arsenal made that summer that coloured the view of the club’s approach to analytics.

That summer, Arsenal, having watched on jealously as Leicester City lifted the Premier League trophy only a few months prior, finally began to loosen the purse strings.

Granit Xhaka joined from Borussia Mönchengladbach, while Lucas Pérez joined from Deportivo La Coruña and Shkodran Mustafi from Valencia.

Lucas Pérez is widely viewed as a StatDNA misfire.

These signings are not looked back on with particular fondness. 

While Xhaka has enjoyed something of a renaissance under Mikel Arteta in the past 18 months, neither Pérez nor Mustafi is remembered especially fondly.

Pérez made 21 appearances for the club, scoring 7 goals, while Mustafi is largely remembered as an extremely expensive, error-prone mistake.

Together, the two players cost Arsenal roughly £55m. Overall, the club made back just £4m on their respective sales, all of which was generated by Pérez – Mustafi was released.

Many derided Arsenal’s statistical approach and the work of Jaeson Rosenfeld, the founder of StatDNA and the club’s foremost data guru.

Fans deduced that Arsène Wenger, notorious for being far too trusting of those at the club, had been duped by “fancy numbers” and had failed to do his due diligence in assessing the pair before signing them. 

The obvious reality is that the data alone would not have been the reason Arsenal signed either player. The data had merely suggested the club take a look at the players – it was up to the scouting department to assess them from there. 

Arsène Wenger was criticised for wanting to play Granit Xhaka as a box-to-box midfielder.

It is also worth attaching context to all three players. 

Xhaka, who was himself criticised on a regular basis, was initially signed by Wenger as a box-to-box midfielder, something many fans at the time criticised the manager for. Spare a thought for them now as Xhaka thrives in the role for Mikel Arteta, culminating in a well-taken goal in the team’s emphatic 3-1 victory over Tottenham in the north London derby, his second goal of the season.

Statistically, Mustafi was one of the highest performing defenders in Europe and was reportedly under the watch of many top clubs around Europe; he was also not the club’s first choice, with Wenger preferring a move for Roma defender, Kostas Manolas. 

Like Mustafi, Pérez too was not the club’s first, ideal choice for the role. 

The club’s lengthy and public pursuit of Leicester’s Jamie Vardy proved unsuccessful, as did their pursuits of Pierre-Emerick Aubameyang, Alexandre Lacazette and Álvaro Morata. The Pérez signing in particular has the air of Wenger being coerced into striking a deal, rather than one that was driven by the Frenchman. Pérez’s lack of game time under Wenger (even after scoring a Champions League hat-trick for the club) seems to support this theory.

Though Rosenfeld and many of his StatDNA cohorts, such as Sarah Rudd and Fran Taylor, were brought to the club on Wenger’s recommendation, they were eventually tasked with finding his replacement.

Former Paris Saint-Germain coach Unai Emery succeeded Wenger.

In the end, Wenger called time on his Arsenal career in April of 2018 and the club began the painstaking process of replacing him. 

The club’s executive structure was, at this point, a far cry from the structure that had acquired the company six years or so prior. 

Now, Sven Mislintat was the club’s head of recruitment (and one who favoured his own analytics company Matchmetrics as a means of recruitment rather than StatDNA) and Raúl Sanllehí was the club’s first-ever head of football relations. 

StatDNA would, of course, play an integral role in the process. The club whittled down its long list of candidates and began internal discussions as to who would succeed the Frenchman. 

It’s difficult to know just how big a part Arsenal’s statistical findings played in the final decision. The decision to appoint Unai Emery certainly wasn’t a unanimous one and it’s difficult to totally eliminate the cynical view that Sanllehí may well have had the ultimate say, based on his cordial relations with super-agent Arturo Canales.

Other candidates for the role, as revealed by The Athletic, included Max Allegri, Mikel Arteta, Thierry Henry, Julen Lopetegui, Ralf Rangnick, Jorge Sampaoli and Patrick Vieira.

Jorge Sampaoli was one of the many other candidates considered as Wenger’s successor.

That process, in many ways, spelled the beginning of the end for Rosenfeld at the club. 

Mislintat’s preference for his own data and analytics tools had effectively sidelined StatDNA from much of the recruitment process and the club’s scouts were essentially left in the dark about the German’s thought process. Mislintat eventually departed the club after a dispute with Sanllehí over the technical director role.

While Mislintat’s departure may have been seen as a chance for Rosenfeld and co to stamp their authority back on the decision-making process, there was more frustration to endure. 

Ivan Gazidis’ departure only a few months prior had caused a mad scramble from all sides as executives tried as hard as they could to get their hands on whatever modicum of power they could find. 

Sanllehí won much of the internal power struggle and assumed most of the recruitment responsibilities in the wake of Mislintat’s departure. 

The following summer proved to be the last straw for Rosenfeld. 

Sven Mislintat’s Matchmetrics analytics company was often used alongside StatDNA.

Arsenal enjoyed an expensive trip to the transfer market, spending nearly £150m, but Rosenfeld’s suggestions were largely ignored by much of the executive team. 

In Wenger’s 22 years with the club, it had taken him 16 years to spend over £30m – signing Mesut Özil from Real Madrid for £42.5m. When Wenger left, his most expensive recruit turned out to be his last, Pierre-Emerick Aubameyang from Borussia Dortmund (Mislintat’s old stomping ground) for around £54m. It took Sanllehí just 18 months to eclipse it. 

The club presented the signing of Nicolas Pépé from Lille for £72m. 

Whatever else can be said of the signing, it was very much not a recommendation that StatDNA made and especially not at the price that the club ended up paying. 

Despite leaving the club in 2020, joining Wenger at FIFA, Rosenfeld did still have one last say in a major executive decision. 

Following the end of the 2018-19 season, Sanllehí was in favour of extending Emery’s contract. 

On the surface, Emery’s season had been fairly successful. The club reached the quarter-final of the League Cup, finished within a point of fourth place and reached the final of the UEFA Europa League, coming within 45 minutes of winning it. That season had also included an impressive, if not wholly convincing, twenty-two match unbeaten run in all competitions. 

Despite reaching the UEFA Europa League final, the underlying data of Emery’s time at Arsenal was not convincing.

While Sanllehí was sufficiently impressed, Rosenfeld was not. 

StatDNA had identified that Emery’s season, while fairly encouraging, had plenty of areas for worry, often relying on other teams underperforming their xG, certain players “bailing the team out” and other factors beyond Emery’s control and many more that he seemed unwilling to discard. 
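
To make the idea of an opponent "underperforming their xG" concrete, here is a minimal Python sketch. It is not StatDNA's model, which is not public; the function name and the shot values are invented purely for illustration.

```python
def xg_difference(goals_scored, shot_xg_values):
    """Goals scored minus total expected goals.

    A negative result means the team scored fewer goals than the
    quality of its chances suggested (it underperformed its xG).
    """
    return goals_scored - sum(shot_xg_values)


# Hypothetical opponent: five shots with made-up xG values, one goal scored.
opponent_shots = [0.5, 0.25, 0.25, 0.5, 0.5]
difference = xg_difference(1, opponent_shots)
print(f"Goals minus xG: {difference:+.2f}")
```

A run of opponents finishing well below their xG flatters a team's results, which is the kind of pattern the paragraph above describes.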

Since then, the club has undergone yet more revitalisation.

Sanllehí was relieved of his duties back in 2020 and Edu Gaspar has since become the club’s technical director. 

Edu’s appointment is likely to have been more positive news for StatDNA. 

The Brazilian is a huge supporter of data and analytics, and even pushed Corinthians to make use of their Central Intelligence centre during his time with the club. The centre is Corinthians’ own in-house analytics department and is still relied on to this day. 

While the decision to make 55 staff members redundant in the wake of the COVID-19 pandemic will certainly have raised some eyebrows, it did give Edu the chance to re-vamp the club’s scouting set-up. 

Edu’s time in Brazil kickstarted a love of data and analytics.

Edu ensured that the scouts who remained behind were those who were most familiar with data and analytics and who would be working closely with StatDNA. 

“I want to work with fewer people. I want to work a lot more with StatDNA, which we have internally here at the club. It is very important.” If ever there was a vote of confidence from the executive structure that Arsenal needed StatDNA, this quote from Edu was it. 

Sarah Rudd, the club’s vice president of software and analytics, left the club in August of 2021 to found Blue Crow Analytics, allowing Tolly Colburn to become data analytics lead, while Rudd’s direct replacement is Chris Dove, who became the head of software and analytics.

In many ways, Arsenal still have much to manage with StatDNA. 

Though Sven Mislintat may have left the club, there are still many analysts that he brought with him to the club who use the Matchmetrics data and analytics. Mikel Arteta had seen a way to join the thinking of the two teams. 

Lee Mooney, formerly the head of data insights and decision technology at Manchester City, left Pep Guardiola’s side in 2019 to start up his own company. Despite this, Mooney has since offered support to Arteta and the analytics department on a consultancy basis.

Long-time Arsenal scout Francis Cagigao was let go as part of a scouting reshuffle.

Today, Arsenal’s recruitment is a far more refined animal than it was before. 

Many of Arsenal’s scouts over the years have worried that the club has been looking to replace them with StatDNA. While Edu’s decision to cull the department will have done little to assuage that fear, the mantra has always been that the stats are there purely to support anything a scout picks up on. To complement, rather than replace.

With Edu’s decision to cull the scouting department, including releasing big-name scouts such as Francis Cagigao, Pete Clark, Brian McDermott, Ty Gooden, Leonardo Scirpoli, Alex Stafford, Julio de Marco and Alessandro Sbrizzo, it seems as though Arsenal will be focusing more on a data-led approach moving forward.

It is important to note that while StatDNA plays a huge part in Arsenal’s recruitment policy, it is also at the forefront of Arsenal’s tactical approach.

The company is able to provide careful and clean analysis of games to Mikel Arteta and his coaching staff. The data is pored over by a team of data analysts and sports scientists and is then interpreted by Arteta and his staff.

Of course, the full selection of stats in Arsenal’s arsenal (pun intended) isn’t fully known, but the club has access to a myriad of advanced in-game metrics.

Mikel Arteta and his coaching staff have a huge variety of advanced metrics at their disposal.

Arsenal can find the expected possession value of a player and their packing stats. Elsewhere, the club can monitor the team’s PPDA (passes allowed per defensive action, a measure of pressing intensity), post-shot xG and more.
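
For readers unfamiliar with PPDA, the sketch below shows how the figure is commonly calculated in public analytics: the passes an opponent completes in its own 60% of the pitch, divided by the defensive actions (tackles, interceptions, challenges and fouls) the pressing team makes in that same zone. This is only an illustration under assumptions; the `Event` structure, the coordinate convention and the sample events are hypothetical and are not taken from Arsenal's or StatDNA's systems.

```python
from dataclasses import dataclass

# Actions that count as pressure in the common public definition of PPDA.
DEFENSIVE_ACTIONS = {"tackle", "interception", "challenge", "foul"}


@dataclass
class Event:
    team: str   # hypothetical label: "pressing" or "opponent"
    kind: str   # e.g. "pass", "tackle", "interception"
    x: float    # 0 = opponent's own goal line, 100 = pressing team's goal line


def ppda(events, pressing_team, zone_limit=60.0):
    """Opponent passes per defensive action inside the pressing zone.

    Lower values indicate a more intense press.
    """
    passes = sum(
        1 for e in events
        if e.team != pressing_team and e.kind == "pass" and e.x <= zone_limit
    )
    actions = sum(
        1 for e in events
        if e.team == pressing_team
        and e.kind in DEFENSIVE_ACTIONS
        and e.x <= zone_limit
    )
    # With no defensive actions the ratio is undefined; report infinity.
    return passes / actions if actions else float("inf")


# Made-up events for illustration only.
sample = [Event("opponent", "pass", x) for x in (10, 25, 30, 40, 45, 55, 80)]
sample += [
    Event("pressing", "tackle", 30),
    Event("pressing", "interception", 50),
    Event("pressing", "foul", 20),
    Event("pressing", "tackle", 90),  # outside the pressing zone, ignored
]
print(ppda(sample, "pressing"))
```

Six of the opponent's passes and three of the pressing team's actions fall inside the zone, so the sample works out to a PPDA of 2.0.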

Other key metrics such as xT (expected threat), xGC (expected goal chains) and defensive coverage per player are essential to building an advanced tactical set-up. 

Moreover, the club are able to pinpoint the areas of the pitch in which their opponents are strongest and those in which they are weakest.

In the Amazon documentary All or Nothing, Mikel Arteta is seen advising his players to keep the ball out of the middle areas of the pitch for a match against Wolves. These are, statistically, the areas of the pitch in which Wolves are at their best.

Players such as João Moutinho and Rúben Neves have been key to Wolves’ build-up play in recent times and Arteta is seen telling his players to keep the ball out wide where possible. It is no surprise that Arsenal’s two goals in that game both started on the wing.

Gabriel Paulista is widely regarded as one of the first StatDNA signings.

Indeed, many of these metrics are as useful in recruitment as they are in general tactical preparation. The signing of Gabriel Paulista from Villarreal is largely considered to be StatDNA’s first official signing for the club. The player was signed on the strength of a combination of defensive statistics, which informed StatDNA’s recommendation to the scouting department.

While Arsène Wenger was characteristically coy about which specific metrics the club investigated in the signing, he did reveal that interceptions, defensive errors, winning tackles and set piece receptions were some of the most considered.

In reality, StatDNA has been at the heart of many more. 

In its ten years with Arsenal, StatDNA has not always enjoyed the most positive press from Arsenal fans. 

While many have derided its recommendations, the software has been at the heart of Arsenal’s footballing operation for over ten years and it’s difficult to see Arsenal looking elsewhere.

In the end, StatDNA, Prozone, Opta, STATS, StatsBomb, Decision Technologies and any other data and analytics tools out there can only present ideas to those in power. How those suggestions and the numbers are interpreted is ultimately down to those who read them.

Of course, the perception of StatDNA may have changed in another ten years’ time, but one thing is certain: StatDNA is Arsenal Data Analytics and, as far as Arsenal are concerned, it is here for the duration.
