In December of 2012, Arsenal quietly made one of their most significant signings to date. The club purchased US-based football data analytics company StatDNA for a reported figure of £2.165m.
There was nothing especially “cloak and dagger” about the purchase. The company’s acquisition was listed in the club’s annual financial reports and was even mentioned by then-Chief Executive Ivan Gazidis in the club’s Annual General Meeting (AGM), though it was somewhat covertly referred to by Gazidis as “AOH-USA LLC”, which stands for Arsenal Overseas Holdings.
But just what exactly is StatDNA?
Well, for one thing, the company is no longer known as “StatDNA”, but instead goes under the catchy name of Arsenal Data Analytics or Arsenal Data Holdings. The company focuses on providing Arsenal with real-time analytics and data to ensure that the club can make the correct decisions moving forward, whether it be scouting a new player or monitoring the training habits of the playing squad.
Jaeson Rosenfeld, the head of StatDNA, eventually became the club’s Head of Analytics and was often at the heart of key recruitment decisions.
Former Arsenal manager Arsène Wenger was a big proponent of data and analytics in football from the moment he joined in 1996. Before the acquisition, the club had outsourced much of its data and analytics to third parties, as was the norm for Premier League clubs, but a presentation from Rosenfeld and the company’s Head of Business Development, Hendrik Almstadt, persuaded Wenger to take the leap and bring the company in-house.
Rosenfeld and Almstadt’s presentation was simple: why the signings of Marouane Chamakh and Park Chu-young were not the right decisions.
Of course, any Arsenal fan who had watched either play (and they would have had very limited opportunities with the latter) could likely have told Wenger themselves, but StatDNA’s pitch was more holistic than that.
“The company is an expert in the field of sports data performance analysis, which is a rapidly developing area and one that I, and others, believe will be critical to Arsenal’s competitive position. The insights produced by the company are widely used across our football operations – in scouting and talent identification, in game preparation, in post-match analysis and in gaining tactical insights.”
Ivan Gazidis, Arsenal 2012 AGM.
The company raised key stats in both players that would have advised against their signing. Wenger was convinced and the company was bought outright; a major coup for Arsenal, who had had multiple rivals sniffing around the company for some time. In fact, Arsenal had paid StatDNA a $250,000 retainer, just to ensure that none of their rivals could use their services.
In Christoph Biermann’s book Football Hackers: The Science and Art of a Data Revolution, more light was shed on the presentation the company made to Wenger.
At the time the Expected Goals (xG) stat was practically unheard of. StatDNA used it as one of their many reasons for flagging up both signings.
Almstadt pointed out that Chamakh’s xG rating was low because he was taking shots from improbable positions. The player’s early form for the club was described in the book as a “hot streak” and highly unlikely to ever be replicated.
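To make the xG argument concrete, here is a minimal sketch in Python. It assumes xG is simply the sum of each shot's estimated scoring probability; the shot values and goal tally are invented for illustration and are not StatDNA's figures or model.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # Estimated probability (0-1) that a shot from this position is scored.
    # In practice this would come from a fitted model; these values are made up.
    xg: float
    scored: bool


def expected_goals(shots: list[Shot]) -> float:
    """Total xG: the sum of the individual shot probabilities."""
    return sum(shot.xg for shot in shots)


def goals_scored(shots: list[Shot]) -> int:
    """Number of shots that actually went in."""
    return sum(1 for shot in shots if shot.scored)


def overperformance(shots: list[Shot]) -> float:
    """Goals minus xG. A large positive gap built on low-quality chances
    is the kind of 'hot streak' that tends not to last."""
    return goals_scored(shots) - expected_goals(shots)


# Hypothetical striker: every shot is from an improbable position, yet three go in.
hypothetical_shots = [
    Shot(0.03, False), Shot(0.05, True), Shot(0.04, False),
    Shot(0.02, True), Shot(0.06, False), Shot(0.05, True),
]

print(f"xG: {expected_goals(hypothetical_shots):.2f}")
print(f"Goals: {goals_scored(hypothetical_shots)}")
print(f"Goals minus xG: {overperformance(hypothetical_shots):.2f}")
```

A forward whose goals run well ahead of a low xG total is scoring from chances that rarely go in, which is roughly the warning the presentation is said to have raised.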
Other categories besides xG have since been added to the software’s repertoire, including Pass Value and a Creation Value grade assigned to players.
Those few Arsenal fans who are familiar with the name “StatDNA” will point to its obvious failings. Kevin de Bruyne was flagged as a potentially bad signing over questions regarding his physical stats. Antoine Griezmann, who was recommended personally by Wenger, was also discounted amid fears over his underperforming metrics. Meanwhile, later Wenger signings such as Shkodran Mustafi, Granit Xhaka and Lucas Pérez were all recommended by StatDNA.
However, the software has also flagged up some very useful players. While not truly exceptional in his time with the club, Brazilian defender Gabriel Paulista proved to be a useful signing. Gonzalo Higuaín was another who was suggested before Real Madrid sold him to Napoli, although Arsenal were unable to complete a deal then.
StatDNA has evolved over time and has been at the heart of all major transfers at the club since its purchase. Defender Per Mertesacker, then of Werder Bremen, was not an especially remarkable player where stats were concerned. However, the software used its newest feature, “Defensive Errors”, and found that Mertesacker, despite his generally average metrics in other areas, almost never seemed to make a mistake.
In 2020, Rosenfeld left Arsenal to join Wenger at FIFA. The reasons are not widely known. The romantic explanation is that Wenger was simply enamoured with Rosenfeld’s ideas and wanted him to help FIFA further. The more pessimistic outlook is that Rosenfeld’s transfer suggestions were repeatedly being overlooked in favour of then-Director of Football Raul Sanllehí’s preferred contacts-driven approach and the appointment of Sven Mislintat as Head of Recruitment, who preferred to use his own data software, MatchMetrics.
However, StatDNA went well beyond simple recruitment advice.
A lot of clubs use external data providers to help them with recruitment and conditioning, the most well-known of which is of course Opta. Rather than simple macro analytics, however, Arsenal wanted a more tailored approach.
StatDNA is used by the club to measure a player’s stats in training and is often used to decode matches, generally within a 14-hour period.
One area that has been of concern to Arsenal over the years has been defence. StatDNA has been used as a means of analysing the club’s defensive metrics, focusing on far more niche measures.
For instance, Arsenal often use the data to assess how often a player notices (or fails to notice) specific types of runs from attackers. They are able to assess how often a player loses a one-on-one duel and which combinations of play suit specific defensive partnerships.
The company was also essential for selecting Wenger’s successor.
When Wenger stepped down, the club began searching for the next manager (or in this instance, head coach). The data led them to several potential candidates.
Juventus boss Max Allegri was interviewed by the selection panel (Gazidis, Sanllehí and Mislintat, along with recommendations by Rosenfeld), but the panel felt that, quite apart from his limited grasp of English, the stats simply weren’t in his favour.
The stats pointed towards Allegri favouring an overly defensive approach, one built on scoring early and then seeing out the game. Arsenal were not impressed by this. The club had, since 1996, played attacking football, and the team’s defence was not suited to Allegri’s style of play without hugely significant investment in the playing staff, potentially on players with little or no resale value.
In the end, Arsenal opted for Unai Emery, a decision that was based on a combination of a presentation Emery made to the selection panel and performance metrics.
A lot has changed since then.
Since Wenger’s departure, there has been a huge turnover of staff at the club.
After Ivan Gazidis left for AC Milan, an internal power struggle began, causing a vacuum as everyone clawed for any type of power they could get their hands on.
Sven Mislintat, erstwhile Head of Recruitment, was the first to go. Fallings-out with Sanllehí over the approach to transfers were one point of contention; being passed over for the promised role of Technical Director was another. Former Arsenal Invincibles midfielder Edu Gaspar was later appointed to the role.
Rosenfeld also left following disagreements with Sanllehí, who was himself eventually shown the exit.
Sanllehí’s departure should be noted as being a turning point in terms of how the club uses data in the current set-up.
As Director of Football, Sanllehí ultimately called the shots when it came to most big decisions. He decided that Arsenal should rely on the huge rolodex of intermediaries at his beck and call and conclude deals through agent recommendations, rather than identifying targets themselves.
Of course, data and analytics were still used. All signings needed to be thoroughly vetted before they were signed, but the thoroughness of the recruitment process was often overridden in favour of Sanllehí wanting to appease specific agents.
When Sanllehí eventually left the role in August of 2020, the direction of recruitment fell to Edu.
Edu is a big advocate of statistical analysis and its uses in the modern game.
During his time with Brazilian side Corinthians, Edu pushed hard for the club’s Central Intelligence sector to be better utilised. The move paid off as Corinthians, to this day, still use the hub as a means of tracking players’ performances and conditioning in training and analysing potential recruitment decisions. This is exactly how StatDNA is used at Arsenal.
Initially, Arsenal had an unofficial Recruitment Committee, not too dissimilar to how Liverpool conduct transfers now.
The committee included:
- Francis Cagigao — Head of Recruitment
- Arsenal manager
- Sarah Rudd — Head of Data
- Ivan Gazidis (later Vinai Venkatesham)
One of Edu’s first decisions was to gut the Arsenal scouting team. Francis Cagigao (responsible for the signings of Cesc Fàbregas, Héctor Bellerín, Alexis Sánchez and Gabriel Martinelli), who had taken over from Sven Mislintat as Head of Recruitment, and Pete Clark, Head of UK Recruitment, were both let go.
“I want to work with fewer people. I want to work a lot more with StatDNA, which we have internally here at the club. It is very important. I want the people I want to work with to be very close to me. I don’t want individual people working in one area or for one country. I want a group working together. Fewer people with much more responsibilities. That is my vision.”
Edu Gaspar on the importance of StatDNA and the redundancies facing the Arsenal scouting team.
Other scouts, Ty Gooden (France and Belgium), Leonardo Scirpoli (Germany), Alex Stafford (Scotland), Julio de Marco (Spain) and Alessandro Sbrizzo (Italy) were also released by the club.
Since then, things have become a lot more streamlined. Those unable to use data and analytics are now surplus to requirements.
Edu now oversees a much smaller, but more focused team. The scouts are now responsible for poring over the data of players and for watching them in person.
A few changes in the make-up of the scouting team have allowed Arsenal a more laser-focused approach to recruitment.
Romain Poirot was poached from Manchester United, and covers France; James Ellis moved to N5 after five years with Fulham and is responsible for UK scouting; while Toni Lima joined from management company Baskonia Alavés.
Daniel Karbassiyoon, once a youth player at the club in Edu’s last playing season, was previously one of Arsenal’s main scouts in North and South America. He is now the Head of IT Products.
Karbassiyoon’s remit is to refine the club’s bespoke in-house recruitment tools and to provide greater improvement to the club’s performance hub for coaches, the medical and strengthening teams and for the club’s sports scientists.
Sarah Rudd was previously the club’s Vice President of Software and Analytics, but her departure in August 2021 led to the hire of Chris Dove as Head of Software and Analytics, with Tolly Colburn promoted to Data Analytics Lead.
Contracts lead Huss Fahmy left the club not long after Edu’s appointment. His role has since been filled by Richard Garlick, who covers the contract side of things and will occasionally lend a hand in negotiations, playing a key part in the capture of Benjamin White from Brighton last summer.
The processes in place are not exclusive to the senior side. The club’s coveted Hale End Academy, dutifully led by former captain and fan favourite Per Mertesacker, has been using the same methodology to make sweeping changes for the youth teams.
The decision to sack Head of Youth Scouting Steve Morrow was brought about as Arsenal looked to move towards a more streamlined approach to data and analytics and its uses in recruitment. Steve Brown, former Head of Coaching at MK Dons, took over from Morrow as the club’s Lead Talent ID Coordinator.
Arsenal’s in-house data findings told the club that many young players currently playing at the top level had been released by other top clubs. As a result, Arsenal began to focus their attention, in earnest, on signing players who have fallen through the cracks at other clubs. Players such as Jonathan Dinzeyi, George Lewis, Salah Eddine Oulad M’Hand and Tim Akinola are clear examples of this policy.
That’s not to say that the club haven’t been investing money in the academy. Nikolaj Möller, Omar Rekik and Mika Biereth were all signed on StatDNA’s recommendations.
This summer has seen a lot of investment into the club’s senior playing staff.
Two positions stood out as being of particular interest to Mikel Arteta and his coaching staff: a striker and a left-back.
Pierre-Emerick Aubameyang’s tumultuous departure in January, coupled with Alexandre Lacazette leaving on a free transfer back to Lyon in the summer, meant Arsenal needed to act.
After months of preparation and work, the club eventually homed in on a list of names, with Manchester City’s Gabriel Jesus appearing at the very top of the list.
To the uncultured eye, Arsenal’s interest in Jesus appeared to be purely superficial. “He played under Pep Guardiola and wants to leave, he must be good”.
Recruitment decisions such as these seldom, if ever, rely on such hyperbole.
Arsenal’s data team had a specific set of parameters with which to work. Arteta wanted a high-pressing forward, one who was positionally and tactically versatile and someone who had the fitness stats to back it all up.
Gabriel Jesus, referred to by Pep Guardiola as “the best pressing forward in world-football”, was the ideal candidate.
As with most football transfers, Arsenal’s groundwork for the deal began months ago. The exact details of the transfer are clandestine, to say the least. However, Arsenal ultimately made contact with Jesus’ agents and began presenting the club to them in a bid to convince the player to join.
One crucial part of Arsenal’s strategy to entice Jesus to join them over other contenders such as Chelsea, Real Madrid or Tottenham Hotspur was the use of data in their presentation.
Arsenal presented key stats and metrics in which they felt that Jesus was underperforming. The club did not simply present the forward’s failings and issues, but rather a plan to help improve them. This is what swung the winds in Arsenal’s favour, just as Rosenfeld had with Arsène Wenger, nearly eleven years prior.
As for a left-back, Arsenal took a similar approach.
For this position, Arteta had a very specific kind of player in mind, once again.
Given the fluid 4-3-3 system Arsenal play under Arteta, the Spaniard wanted a player who could easily slot into midfield and who had high distribution stats.
The Arsenal analytics team raised a number of names, chief among them Ajax’s Lisandro Martínez.
Despite playing as the left-sided central defender in Erik ten Hag’s Ajax team, Arsenal saw the Argentinian defender as the perfect candidate for the fullback role.
Martínez’s distribution and passing metrics were off the charts, statistically placing him in the top one percent of ball-playing defenders. In other words, a must-buy.
There were concerns raised, of course. Chief among them was the idea that Martínez, at 5’9, would be too short to play the role of a central defender. Even so, his low centre of gravity and speed, coupled with the aforementioned passing stats, made him an ideal candidate to compete with Kieran Tierney.
Though Arsenal failed to sign Martínez from Ajax, they were right to raise concerns over his height. Having swapped Amsterdam for Salford, Martínez was repeatedly exposed on aerial balls by both Brighton and Brentford in Manchester United’s opening two games.
Other names were also mentioned such as Aaron Hickey at Bologna, Sergio Gómez at Anderlecht and Oleksandr Zinchenko at Manchester City, the latter of whom joined in a £32m deal. All three were products of labour-intensive statistical analysis, with Zinchenko the obvious candidate owing to his familiarity with the Premier League, Mikel Arteta and the system Arsenal play, which is closely modelled on Pep Guardiola’s Manchester City tactics.
Unusually, data also plays a small, but significant part in Arsenal’s departures.
Much has been made of the club’s “Dragon’s Den style” of sourcing loans, but the science runs much deeper than that.
Ben Knapper, the club’s loan manager, oversees a process in which clubs who wish to sign any of Arsenal’s players must make a formal presentation to the club in order to take the player on loan.
This approach is particularly novel in the world of football.
Loans are used for a myriad of reasons: recovery from injury, allowing youth players a chance at senior football or even giving problem players an outlet to put themselves in the shop window, as was the case with Mattéo Guendouzi.
But for Arsenal, loaning out a youth player needs to mean more than that.
How do you loan a player out and not impede their development? For Arsenal, the key is to find teams that suit the player’s style of football.
While the clubs presenting for players are required to “do their homework” on the player before presenting for them, Arsenal will do the same.
Knapper has carried out intensive scouting into the teams who wish to take any of the players on loan. When Lincoln City took Tyreece John-Jules on loan in 2020, Arsenal carried out extensive statistical analysis of the club as well as in-person scouting before allowing the player to move.
When the loan has ended, Arsenal will study the stats. Knapper will debrief with the player and the loan club to glean pros and cons from the loan.
The stats are key for Arsenal in selecting where the player will end up next. This was the case for Brooke Norton-Cuffy, who spent last season on loan with Lincoln City, and Mika Biereth, currently on loan at RKC Waalwijk.
Despite playing at another club, the in-house data and analysis team will monitor the player as if they were still at the club.
A difficulty that has arisen among most top-flight clubs is the jealousy with which they guard their data. A key part of Arsenal’s loan operation is ensuring that the loan club provide all of the detailed stats and analysis that they ask for.
Marseille needed to provide constant up-to-date information on defender William Saliba during his loan last season. This data helped inform Mikel Arteta’s decision to rely on the young Frenchman in the forthcoming campaign.
The most famous story of Arsenal’s unique loan process is how Eddie Nketiah eventually joined Leeds United on loan.
Bristol City and Fortuna Dusseldorf had both made their interest in the player known, but Leeds won out in the end. All three clubs had to make specific presentations to the club and to the player, including information on how Nketiah would be accommodated in the team’s set-up, as well as information regarding the team’s overall performance metrics.
Though the presentations from the other two clubs’ Directors of Football were impressive, Nketiah opted for Leeds after their strong plan for his development was made clear.
Loan managers are not necessarily anything new, but Arsenal’s approach to the system seems to be unique in its own right.
Stats have also been a huge help to Arsenal’s injury management.
For this, Arsenal do not just use StatDNA. Firstbeat Sports offer heart-rate variability (HRV) data, which is coupled with STATSports GPS metrics to better quantify the issues affecting the squad.
Tom Allen is the club’s Lead Sports Scientist and is essential in helping club doctor Gary O’Driscoll prevent injuries and manage the workload of the players.
When Arsène Wenger was at the club, he saw the importance of data and analytics in reducing injury risks and enhancing a player’s physical performance stats; as a result, Arsenal developed a process that centres around five key areas:
- Skeletal Stress
Each of these categories is then given a score from 1 to 5. A score is then approximated through an algorithm made up of 75 different data variables.
“We need to find a way of collating that information because more technology is being created every single year, there has to be a signal we need to find, we don’t just want to be running through loads of data. We need to know what our objectives are and how we’re going to decide on the information relevant for the manager.”
Tom Allen at Firstbeat’s 2018 HRV Summit.
After this, Arsenal will then plot the player’s performances into specific stages. They will plot the player’s “story” using three stages to see whether the player is overworked, tired, stressed or facing potential fatigue. This information is then fed to the coaches.
Aside from that, the club also operates its own system, referred to as “the PMI System”.
Essentially, this works thus:
- P: Protect — Players who are at high risk of injury or “overloading” need to be protected from excessive workloads and from poorly suited conditioning training.
- M: Maintain — The player has previously overloaded and must be managed according to their injury state.
- I: Increase — The player can now have their workload increased.
This system assigns one of the three letters to each member of the squad. For instance, Kieran Tierney may be assigned a P rating, while Granit Xhaka is awarded an M rating.
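As a rough illustration of how such a rating could feed into training plans, here is a minimal Python sketch. The three categories follow the description above; the load multipliers, the `planned_load` helper and the ratings given to each player are purely hypothetical, since the club's actual thresholds are not public.

```python
from enum import Enum


class PMI(Enum):
    PROTECT = "P"   # high risk of injury or overloading
    MAINTAIN = "M"  # previously overloaded, manage carefully
    INCREASE = "I"  # workload can now go up


# Hypothetical multipliers applied to a player's baseline session length.
LOAD_MULTIPLIER = {
    PMI.PROTECT: 0.6,
    PMI.MAINTAIN: 1.0,
    PMI.INCREASE: 1.15,
}


def planned_load(baseline_minutes: float, rating: PMI) -> float:
    """Scale a player's baseline training minutes by their PMI rating."""
    return baseline_minutes * LOAD_MULTIPLIER[rating]


# Example ratings, mirroring the illustration in the text.
squad = {
    "Kieran Tierney": PMI.PROTECT,
    "Granit Xhaka": PMI.MAINTAIN,
}

for name, rating in squad.items():
    print(f"{name}: {rating.value} -> {planned_load(90, rating):.0f} minutes")
```

The point of the sketch is only that the letter is an input to a decision, not the decision itself; a coach can still choose to override it.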
The PMI System has been very effective for Arsenal in managing injuries, but it is seldom helpful if the head coach or manager ignores the advice of the medical team. In Unai Emery’s time with the club, he ignored the medical team’s PMI ratings for defender Laurent Koscielny and midfielder Aaron Ramsey, and ended up pushing both too far.
The club have, more recently, adopted the use of STATSports GPS performance tracker vests. The vests allow Arsenal to capture highly-detailed metrics including Total Distance, Max Speeds, High Speed Running, Intensity and Strain levels. Arsenal even announced a partnership with STATSports allowing younger footballers to supply their data to the club and potentially train with the club too.
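For readers curious how metrics such as Total Distance, Max Speed and High Speed Running can be derived from raw tracker output, here is a minimal Python sketch. It assumes a simple stream of timestamped speed samples and a made-up high-speed threshold; it is not the STATSports data format or API, and Intensity and Strain are left out because their definitions are vendor-specific.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float      # seconds since the start of the session
    speed: float  # metres per second


# Hypothetical threshold for "high speed running"; vendors and clubs
# define their own speed zones.
HIGH_SPEED_THRESHOLD = 5.5  # m/s


def summarise(samples: list[Sample]) -> dict[str, float]:
    """Derive session totals from consecutive speed samples."""
    total = 0.0
    high_speed = 0.0
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        dist = prev.speed * dt  # distance covered during this interval
        total += dist
        if prev.speed >= HIGH_SPEED_THRESHOLD:
            high_speed += dist
    max_speed = max((s.speed for s in samples), default=0.0)
    return {
        "total_distance_m": total,
        "high_speed_running_m": high_speed,
        "max_speed_mps": max_speed,
    }


# Invented five-second burst, sampled once per second.
session = [
    Sample(0, 2.0), Sample(1, 4.0), Sample(2, 6.0),
    Sample(3, 7.0), Sample(4, 3.0),
]

print(summarise(session))
```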
Statistical analysis is still in its infancy with regards to modern football.
Other sports have used data analysis to a far greater standard than football does. The most famous example is Billy Beane, former General Manager of the Oakland Athletics baseball team and subject of the Michael Lewis book (and subsequent film) Moneyball.
Using data and analytics to uncover potential players or to spot gaps in the market is often described as a club attempting to “Moneyball their way out of trouble”.
“Billy Beane’s idol is Arsène Wenger. You know why? His ability to spend money and extract value. That is what it is all about to be successful in pro sports. If you can do that better than other people, you are always going to be pretty good.”
Stan Kroenke on Billy Beane’s admiration for Arsène Wenger.
While most clubs will rely on stats to a certain point, there are still many detractors of its usage in modern football.
Many will argue that stats do not tell the whole story and that clubs who rely too heavily on them are taking a far higher risk, albeit a more calculated one.
The fabled “eye test” will always reign supreme, especially among football’s old schoolers.
Ultimately, Arsenal have opted for a combination of both. This allows the club to identify potential players through data, while allowing the club’s scouts to see if the player passes the eye test.
Mikel Arteta himself has been known to champion specific signings, just as Wenger and Emery did before him. Indeed, the signing of Fábio Vieira from FC Porto was championed by Arteta. It was the Arsenal scouts who initially brought him to the Spaniard’s attention.
What many fail to realise is that sound judgement is needed to complement the stats. Jaeson Rosenfeld reportedly favoured Arsenal signing Nicolas Pépé over Unai Emery’s requested Wilfried Zaha. However, the data and analytics did not make Arsenal spend £72m on the player.
Data can provide a more holistic view of a team’s performances as well. On the surface, Unai Emery’s first season at Arsenal was enough to convince many that he was the right man for the job. Those sceptical of data will say that a manager who finishes a point outside the top four, reaches the quarter-final of the League Cup and reaches the final of the UEFA Europa League has likely done a good job.
The data said otherwise.
Rosenfeld felt that the underlying data from Emery’s first season showed that the team’s performances were not sustainable. These findings ultimately proved to be true and helped dissuade the club from offering the Spaniard improved terms on his contract. Emery was sacked a few months later.
Stan Kroenke becoming the club’s full owner may also influence how data is used. Kroenke is a big fan of Beane’s Moneyball approach, but it runs deeper than that.
Since becoming a full owner of the club in 2018, the Kroenkes have invested more money into the club, which has afforded Arsenal greater movement in the transfer market, such as when Thomas Partey’s release clause of £45m was activated on deadline day to bring him in from Atlético Madrid.
The data allows Arsenal to provide real-time analysis and perspective on players to present to Stan and his son Josh to push a move through. The appointment of Tim Lewis as non-executive director lends weight to this theory too.
Evidence to back up their wants and needs allows Arsenal to convince the powerbrokers on specific deals.
Raul Sanllehí’s reliance on a web of contacts that included the likes of Arthur Canales, Oleg Smaliychuk, Kia Joorabchian and Jorge Mendes meant that Arsenal were aligning themselves with the wrong kinds of people without any evidence to back the relationship up.
Data cannot tell the whole story, but it does paint a decent picture. Arsenal now seem to have struck the right balance between statistical analysis and human interaction. Still, the old adage rings true: football is played on a football pitch, not on a spreadsheet. But who says the spreadsheets can’t help?