Big Data, Data Science and Moneyball Recruiting

With each passing day, an increasing amount of data is being generated and transmitted by and about more people than ever before.

At Google’s 2010 Atmosphere convention, Google CEO Eric Schmidt stated that “There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days.”

In case you were wondering, an Exabyte is 1,000,000,000 gigabytes, or 1,000,000 terabytes. That’s a lot of information.
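For anyone who wants to check the unit arithmetic, here is a minimal Python sketch. It assumes decimal (SI) units, where each step up is a factor of 1,000; the constant and function names are purely illustrative.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
GIGABYTE = 10**9
TERABYTE = 10**12
EXABYTE = 10**18


def exabytes_to(unit_size: int, exabytes: float = 1.0) -> float:
    """Convert a number of exabytes into a smaller unit (given in bytes)."""
    return exabytes * EXABYTE / unit_size


print(f"{exabytes_to(GIGABYTE):,.0f} gigabytes per exabyte")
print(f"{exabytes_to(TERABYTE):,.0f} terabytes per exabyte")

# Schmidt's figure of 5 exabytes every 2 days, expressed as terabytes per day.
print(f"{exabytes_to(TERABYTE, 5) / 2:,.0f} terabytes per day")
```

Binary units (factors of 1,024) would give slightly different numbers, but the order of magnitude is the same.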

Interestingly, Google’s CEO may have actually underestimated the amount of data being generated at the time. From their research, RJMetrics believes that a more accurate figure would be approximately 6.6 exabytes every 2 days. One thing is for sure – the number is even bigger today.

What does any of this have to do with recruiting? Why should HR, recruiting and sourcing professionals, as well as corporate executives, care about big data?

Well, because a chunk of big data is human capital data, and as I have been ranting about for the better part of 3 years, human capital data can be leveraged to identify and hire more great people more quickly.

If you’re a dinosaur recruiter or sourcer, I don’t recommend you read the rest of this post, because:

  1. I will challenge the way you think and work, and that might make you uncomfortable
  2. You’ll probably think it’s a load of garbage
  3. It might make you aware of your pending extinction (the precise timing of which is debatable)

I have to warn you that this is not a short, quick-hit post – this may be the longest single post I have ever written, which explains why you didn’t see a post from me last week. I wrote this piece to introduce a human capital paradigm shift, to challenge the long-standing conventional wisdom in HR and recruiting, and to (hopefully!) provoke progressive thought from my peers. If that’s not your thing, turn back now.

If you want a glimpse into the future of talent identification and acquisition, you’re always interested in figuring out how your company can gain a competitive advantage, and you’re wondering what the heck my “Moneyball recruiting” reference could possibly be about, then read on. 

What is Big Data?

Wikipedia claims that “Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set.”

Other sources attempting to define big data include “the tools, processes and procedures allowing an organization to create, manipulate, and manage very large data sets…”

Regardless of definition, the big data concept centers around huge amounts of data that are not only increasing in volume, but also in velocity and variety.

The data velocity aspect is the speed at which new data is generated. One example of the increasing velocity of human capital data would be social media posts/updates. For example, Twitter crossed the 200M tweets/day mark in June – that’s 1 billion tweets every five days. How’s that for velocity?

The variety of data sources and types should be obvious, especially when it comes to human capital data – LinkedIn profiles (which can now be converted into resumes/CVs) and updates, Facebook, Google+ and Twitter profiles and updates, recommendations/awards/endorsements, blogs, blog comments, mobile updates, press releases, and much, much more.

The Big Deal about Big Data

According to MGI and McKinsey’s Business Technology Office, “The amount of data in our world has been exploding and analyzing large data sets—so-called big data—will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus…Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.”

MGI’s research into big data offered several key insights, including:

  1. “Data have swept into every industry and business function and are now an important factor of production, alongside labor and capital.”
  2. Big data “can unlock significant value by making information transparent and usable at much higher frequency.”
  3. “As organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance. Leading companies are using data collection and analysis to conduct controlled experiments to make better management decisions; others are using data for basic low-frequency forecasting to high-frequency nowcasting to adjust their business levers just in time.”
  4. “The use of big data will become a key basis of competition and growth for individual firms. From the standpoint of competitiveness and the potential capture of value, all companies need to take big data seriously. In most industries, established competitors and new entrants alike will leverage data-driven strategies to innovate, compete, and capture value from deep and up to real time information. Indeed, we found early examples of such use of data in every sector we examined.”
  5. “There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”

The big deal about big data is that data can be used to make better decisions.

While McKinsey found that some companies are using data collection and analysis to make better management decisions, I think there is a huge opportunity to collect and analyze human capital data, specifically to make better hiring decisions – to gain a holistic advantage over competitors by finding, identifying, and enabling the recruitment of top talent. 

Data Science and Talent Science

If you were really paying attention when you read point #5 above, you would not be surprised to learn that the hottest job you never heard of is based on data science. Yes, the highly skilled, nerdy-cool job that companies are scrambling to fill is that of the data scientist.

While there are many different ways of explaining what data science is, data scientists collect, extract and analyze information from large datasets and deliver actionable intelligence to non-data experts.

In a recent Fortune Magazine article, Michal Lev-Ram explains that “A data scientist helps companies make sense of the massive streams of digital information they collect every day, everything from internally generated sales reports to customer tweets. The gig which requires the specialist to capture, sort, and figure out what data are relevant is one part statistician, one part forensic scientist, and one part hacker.”

The article goes on to explain that while data scientists have been playing key roles at companies like Google and Amazon for quite some time now, a variety of other companies such as Wal-Mart, Foursquare and Bitly are hiring data scientists to analyze their data and provide intelligence that can lead to better business decisions or new products. Furthermore, Deep Nishar, Senior Vice President of Products & User Experience at LinkedIn, was quoted as saying “Data engineers are already harder to find than search engineers, and that’s a sign of the times.”

According to this article, data scientists are already an integral part of competitive intelligence, a field encompassing a number of activities, such as data mining and analysis, that can help businesses gain a competitive edge. Ken Garrison, CEO of the industry group Strategic and Competitive Intelligence Professionals (SCIP), explains, “The field involves collecting data, analyzing it and delivering the data as intelligence that is actionable.”

“The question facing every company today, every startup, every non-profit, every project site that wants to attract a community, is how to use data effectively,” writes Mike Loukides, Vice President of Content Strategy for O’Reilly Media, on the O’Reilly radar website. He adds, “not just their own data, but all the data that’s available and relevant.”

There is already a significant shortage of people with the talent necessary for companies to take advantage of big data of any form, let alone human capital data.

When it comes to taking advantage of the vast and ever-increasing quantities of human capital data available, the sourcers of tomorrow will not be the average sourcers of today – a small subset of them will evolve into data scientists/engineers specializing in human capital data. Perhaps they will be known as Talent Engineers or Talent Scientists?

Regardless of title, data scientists in support of talent identification and acquisition efforts will collect, extract and analyze human capital information from large datasets and deliver actionable intelligence to hiring managers and teams.

In other words, these data scientists will find and identify top talent and enable better data/fact-based sourcing and hiring decisions, empowering their employer with the competitive advantage of consistently hiring the best people.

Moneyball Recruiting

If you’re unfamiliar with Moneyball, the term comes from Moneyball: The Art of Winning an Unfair Game, a book by Michael Lewis about the Oakland Athletics baseball team and its general manager Billy Beane. Its focus is the team’s modernized, analytical, sabermetric approach to assembling a competitive baseball team, despite Oakland’s disadvantaged revenue situation. Simply put, the Oakland A’s didn’t have the money to buy top players, so they had to find another way to be competitive.

The central premise of Moneyball is that the collected wisdom of baseball insiders (including players, managers, coaches, scouts, and the front office) over the past century with regard to player selection is subjective and often flawed.

Through the use of rigorous statistical analysis of baseball player performance and game records and Sabermetrics, the Oakland Athletics picked players based on qualities that flew in the face of conventional baseball wisdom and the beliefs of many baseball scouts and executives. In 2002, with approximately $41 million in salary, the Oakland A’s were competitive with larger market teams such as the New York Yankees, who spent over $125 million in payroll that same season.

When Sabermetrics was introduced into baseball, it was immediately rejected by many simply because it was new, different, leveraged statistics over intuition and experience, and frequently questioned conventional wisdom with regard to traditional measures of baseball skill evaluation. For instance, Sabermetricians doubt that batting average is as useful as conventional wisdom says it is because team batting average provides a relatively poor fit for team runs scored.
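To make the “poor fit” idea concrete: one common way to measure how well a single statistic explains team runs scored is the coefficient of determination (R²) of a simple linear fit, which equals the squared Pearson correlation. The sketch below is only an illustration of that calculation, not the sabermetricians’ actual method. The function name is my own, and the sample numbers are placeholders that show the mechanics, not real baseball data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    For a simple (one-variable) linear fit this equals R^2: the share of
    the variation in ys that a straight line through xs accounts for.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("each sequence must vary")
    return cov * cov / (var_x * var_y)


# Placeholder inputs (not baseball data): a perfectly linear relationship
# scores 1.0, and a relationship with no linear trend scores 0.0.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))    # 1.0
print(r_squared([1, 2, 3, 4], [1, -1, -1, 1]))  # 0.0
```

Run on real season data, you would call it once with team batting averages and once with a competing statistic, each against team runs scored, and compare the two values.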

While baseball traditionalists scoff at the sabermetric revolution and have disparaged Moneyball for emphasizing concepts of sabermetrics over more traditional methods of player evaluation, the impact of Moneyball upon major league front offices is undeniable.

For example, teams such as the New York Mets, New York Yankees, San Diego Padres, St. Louis Cardinals, Boston Red Sox, Washington Nationals, Arizona Diamondbacks, Cleveland Indians, and the Toronto Blue Jays have hired full-time Sabermetric analysts.

Interesting, yes?

Oh, and the Boston Red Sox built their 2004 team with Moneyball in mind (the General Manager at the time was one of Billy Beane’s disciples). They also happened to win the World Series in 2004, ending the “Curse of the Bambino.”


If you’ve seen the Moneyball trailer (or movie), you will have heard one of the quotes that struck a chord with me: “You don’t put a team together with a computer.”

When I first heard that quote in the Moneyball trailer, I immediately thought of all of the people who respond to my articles with “recruiting is about people and not about technology” (e.g., sourcing, information retrieval, databases, analytics, etc.). I also thought about all of the great people I’ve hired and the powerful teams I have put together with a computer. :-)

When I recently wrote about using predictive analytics in recruiting, I received a response on Twitter from a well-known recruiting personality who was highly dubious of the ability to use text and data to predict who might be a good hire for any particular hiring need.

I expected that kind of response, primarily because leveraging predictive analytics in identifying and hiring people contradicts conventional recruiting wisdom (e.g., “recruiting is about people and not about technology”).

Much of what is accepted as sourcing, recruiting and hiring best practices today is largely based upon conventional wisdom – ideas or explanations that are generally accepted as true.

However, the problem with any conventional wisdom is that although the ideas or explanations are widely held, they are also largely unexamined and untested, and thus not necessarily true.

Conventional wisdom can be a significant obstacle to the acceptance of new information or the introduction of new ideas, theories and explanations, in many cases because conventional wisdom is often made of ideas that are convenient, appealing and deeply assumed. At some point, however, these assumptions and beliefs can be violently shaken when they no longer match reality at all.

Some people would call this violent shaking of conventional wisdom disruptive innovation, and I believe it is coming to talent acquisition in the form of Moneyball recruiting.

What Could Moneyball Recruiting Look Like?

Is there an equivalent to Moneyball in recruiting – in challenging conventional HR and recruiting wisdom and identifying and hiring top talent through the use of data, statistics, empirical evidence and objective facts?

Yes, I believe there definitely could be – it just hasn’t been developed yet.

Here are just a few ways we could apply the Moneyball concept/analogy to talent acquisition:

  1. Moving away from using largely subjective means of assessing talent and making hiring decisions to more objective, fact and empirical data-based means
  2. Identifying and acquiring top talent looking for traits, experience, accomplishments and information overlooked by traditional recruiting and assessment methods
  3. Challenging conventional wisdom as to what top talent looks like and where it comes from (e.g., Ivy League schools, high G.P.A., certifications, M.B.A.’s, experience at certain companies, etc.)
  4. Developing objective performance measurements that are relevant across any role, responsibility, company, and industry and that stick with each person as they move through their career, similar to a credit score
  5. Individual companies developing “secret sauces” for sourcing, analyzing and evaluating potential hires based on their own data and factual statistical analysis of the makeup of their ideal hire and employee
  6. Breaking away from the idea that the only way to hire great people is to “buy” and poach them from competitors or specific companies (look at how incestuous Facebook, Google, LinkedIn, Microsoft, Apple and Yahoo are with regard to their talent pool)

I think it would be fascinating to objectively examine the conventional wisdom that referrals are the best hires. I know that might sound like blasphemy to some, as many simply assume referrals are the best hires, but surveys based on people’s subjective opinions of who the best employees are aren’t objective, and they certainly aren’t based on empirical evidence. Besides, referrals may score highly on quality-of-hire metrics based more on a self-fulfilling prophecy than anything else.

It’s one thing to say, think and feel that referrals produce the highest quality of hire, and it’s entirely another to prove it with objective, factual data.

In reality, referrals may in fact simply be the least-worst source of hire. Contemplate that little gem for a bit – I’ll be writing a post on it in the near future.

What about our assumptions on where great people come from?

For example, we know there is a talent war going on, specifically between tech titans such as Facebook, Google, LinkedIn, Microsoft, Apple and the usual suspects. We can easily guess as to why any one of those companies would like to hire someone from one of the others, but is the reasoning behind it and the practice of it based on conventional wisdom and assumption or based on proof that these people make great hires?

Unless someone performs a real study into the matter, I say it’s all based on assumption. If you think I am wrong, show me the unequivocal proof.

Here’s where some real Moneyball recruiting can be implemented – instead of paying top dollar for an already highly paid industry retread, develop and use a structured and proven data and fact-based methodology for identifying the next superstar from a non-obvious company or straight out of school. Wouldn’t it be interesting to see where the real game-changer employees come from, and not just assume they come from the obvious short list of companies? Who’s really to say that the best Facebook engineer isn’t one that came from IBM, GE, or some obscure company?

If we believe what is written and said about Google, the company prefers people who have high G.P.A.’s and targets people who have graduated from specific schools – two of which (so my sources tell me and LinkedIn verifies) are Stanford and Berkeley. Is this based on the perceived value of achieving a high grade point average and graduating from certain schools, or is it based on factual, data-based proof?

Wouldn’t it be interesting if Google went back through all of their hires – every single one of them – and found ways of objectively measuring their actual impact/value, and then crunched the data to find out what schools their fact-based top performers came from and what their G.P.A.’s were? Would the facts support the conventional wisdom and subjective preference for high grade point averages and certain schools? It certainly can’t be overlooked that some very smart, driven, and talented people never get the chance to go to a prestigious school based on a number of uncontrollable factors, not the least of which is socioeconomic status.

Regardless, it would be extremely valuable for Google (and any company!) to find out for a fact any patterns in the backgrounds and makeup of their top talent and then leverage them in their sourcing and recruiting efforts.
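A very rough sketch of what that kind of number crunching could look like is below. Everything in it is an assumption for illustration: the record fields, the school labels, the “impact” scores and the function names are all invented, and the numbers prove nothing about any real company or school. The genuinely hard part, finding an objective measure of impact, is exactly what the code takes for granted.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records. In practice these would come from an HR system, and
# "impact" would be whatever objective performance measure a company trusts.
hires = [
    {"school": "School A", "gpa": 3.9, "impact": 62},
    {"school": "School A", "gpa": 3.7, "impact": 71},
    {"school": "School B", "gpa": 3.1, "impact": 88},
    {"school": "School B", "gpa": 3.4, "impact": 64},
    {"school": "School C", "gpa": 2.9, "impact": 90},
]


def mean_impact_by(records, key):
    """Average impact score for each distinct value of `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["impact"])
    return {value: mean(scores) for value, scores in groups.items()}


def top_performers(records, fraction=0.2):
    """Return the top `fraction` of records ranked by impact (at least one)."""
    ranked = sorted(records, key=lambda r: r["impact"], reverse=True)
    count = max(1, round(len(ranked) * fraction))
    return ranked[:count]


print(mean_impact_by(hires, "school"))
print([r["school"] for r in top_performers(hires, fraction=0.4)])
```

With real data and a defensible impact measure, the same two questions (which backgrounds score highest on average, and where the top slice of performers actually came from) are what would confirm or contradict the conventional wisdom.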

While we’re challenging assumptions, is a college degree really necessary to be a top performer?

Many companies make a college degree a prerequisite for hiring for specific roles or even at all, and many well-respected companies have hiring managers that are degree and university snobs – rejecting resumes based on schools attended and degrees earned.

If you work for such a company, I have a few names for you: Steve Jobs, Bill Gates, Mark Zuckerberg, Michael Dell, and Sean Parker. Do any of them ring a bell?

Outliers you say? Perhaps, but they are certainly proof you don’t need a college degree to be successful, and they just happen to be the most well-known successful college drop-outs – there are no doubt countless other successful people who never went to college or dropped out – they just aren’t in the public eye.

What if you could leverage data to identify the potential in people before they were 18, regardless of whether they were on a path to college or not? And no, we’re not talking about intrinsically flawed data like G.P.A. and SAT scores. How many brilliant, high-potential people could be given the right opportunity to fully realize their potential, regardless of whether or not they were born into the right family, in the right place, at the right time, and the stars aligned for them to be able to attend a prestigious university, let alone any college?

And if you think point #4 above is far-fetched and an impossibility, one company (funded in part by Google CEO Eric Schmidt, by the way) has already tried to come up with a numerical score for individuals based on work history, education history, and social network. It has some serious flaws (the subject of another post), but it shows that there is already a movement to try and represent and rank people based on a single numerical score, and it won’t be the only foray into that space. I believe that there is a significant opportunity for companies to develop their own data-based and statistically driven talent identification and acquisition models.

Final Thoughts

Is there already a way to leverage data and technology to increase the probability of finding and identifying people who are capable of being the next “A” player and significant contributor to your team?

Yes, I know there is. It already exists, albeit in a very crude form using tools and technologies in ways they were not intended nor designed for.

If only companies would take the business intelligence and predictive analytics horsepower that many already use (and spend millions on) for marketing, product development, sentiment analysis, healthcare, etc., and focus it on human capital data to enable better hiring decisions (which always start with talent identification, aka sourcing, by the way), we would begin to see Moneyball-like disruption develop in the HR and recruiting function.

There is no denying that we are well into the Information Age, which is characterized by the ability of individuals to find and transfer information freely, and to have instant access to knowledge that would have been difficult or impossible to find previously. The digital revolution has already begun and we are seeing a shift from traditional industry that the industrial revolution brought through industrialization, to an economy based on the manipulation of data and information, i.e., an information society.

As such, I agree wholeheartedly with Mike Loukides that “The future belongs to the companies who figure out how to collect and use data successfully.”

However, I’d add that the future belongs more accurately to the companies who figure out how to collect and use human capital data successfully.

That’s because the companies that can consistently hire great people, through identifying people and basing hiring decisions on data and not intuition and conventional wisdom, are more likely to develop the best teams.

And the best teams win.

Even if you aren’t a baseball fan (I’m not!), I highly recommend you read Moneyball or at least watch the movie – I am not alone in seeing that the value and underlying message of Moneyball can be applied to HR and recruiting, and certainly to almost anything that has to do with building teams and businesses.

The old guard in baseball thought that using statistics and unconventional measures of performance through the use of Sabermetrics defied everything they knew about baseball. They were right.

If you think the idea of leveraging data and statistics to find and hire top talent defies everything we know about human resources and recruiting, I say you’re right.

I also say it’s a good thing, and that we’re just getting started.