Talent Mining – Unearthing Value in Human Capital Data

There are people in the HR/recruiting industry who believe that searching databases, the Internet, and social networking sites to source talent is relatively easy and that it can be automated through the use of technology.

While those people are actually right (to an extent), I am happy to say that unfortunately for them, it’s not that simple.

While anyone can manually write or automate basic searches and find some people, those searches only return a small percentage of the available talent that can be found and they also exclude qualified people. Moreover, there are actually many different levels of searching human capital data in the form of resumes, social media profiles, etc., most of which cannot be replicated or automated by software solutions available today.

In this post, I’m going to share my original slide deck from my SourceCon presentation on the 5 levels of talent mining that I delivered in DC at the Spy Museum (what an awesome venue for a sourcing conference!) and then I’ll dive deep into each distinct level, including examples.

SourceCon Deck

The 5 Levels of Talent Mining from SourceCon 2010 DC from Glen Cathey

The Advantages of Data: Predictive Control and Speed

Before I dive into each of the 5 levels of talent mining, I want to take a moment to explain precisely why sourcing potential candidates via human capital data (typically text from resumes, social profiles and activity, Internet content, etc.) is so powerful. There are two major advantages talent mining has over all other forms of generating candidates (e.g., cold calling, referrals, applicants, etc.):

  1. More predictive control
    • When searching human capital data, you have a significant ability to predictively control the primary candidate variables: what people have done, what people can do, what people would like to do, where they would like to work, their compensation and their availability.
  2. Speed
    • When searching human capital data, you can identify as many as 60 closely matched (highly likely to be QIA – qualified, interested and available) potential candidates per hour. Compare that to cold calling, trying to elicit referrals, and posting a job…

Has that piqued your interest?

If so, read on to learn more about the 5 levels of talent mining. If not, you’re reading the wrong blog. :)

Level 1 Sourcing / Talent Mining

Level 1 sourcing is essentially “buzzword bingo.” It involves little more than taking job titles and required skill terms from job descriptions, using them as search terms, and then performing straight lexical (word for word, title for title) matching.

As such a superficial level of keyword sourcing and matching, Level 1 sourcing does not require any deep understanding of the roles, skills, responsibilities, or technologies involved in the hiring profiles or the candidates.

This level of basic keyword and title searching and matching will produce results, and this is where some people get the false sense that sourcing is easy. Here’s the catch – the results are limited to only those people who happen to match the titles and keywords searched for, which is never all of the best candidates that you have access to.

A single search cannot find all qualified candidates: it will include some qualified candidates and exclude others, relegating the excluded ones to dark matter.

The danger of Level 1 sourcing lies in the fact that it will not (and can not) find people who are qualified but do not happen to have the exact titles searched for, nor those people who actually do have the right skills and experience but who 1) simply don’t happen to mention all of them in their resume or social media profile, and/or 2) express their matching skills and experience using words that differ from those used in the job description and required skills – and thus those used in the search.

Level 1 sourcing creates huge volumes of dark matter – entire populations of qualified candidates that you actually have access to, but your searches never retrieve them or you never review them. If you didn’t find it, it doesn’t exist, right? :-)

The good news is that level 1 sourcing works, gets results, and can easily be performed by “junior” personnel/researchers, because almost anyone can match titles and keywords. Additionally, level 1 sourcing can be completely automated using software – why pay people to match keywords when matching applications can do it for considerably less than $5 per hour?

The bad news is that in addition to creating huge hidden talent pools of fantastic people who will not and can not be found, level 1 sourcing provides absolutely no competitive advantage. If two companies are performing level 1 searching for the same types of people, they will find the same candidates. Same titles and keywords = same results. Interestingly enough – they will also NOT find the same people.

Think about it.

Level 2 Sourcing / Talent Mining

Level 2 sourcing goes beyond literal lexical matching and takes a step into conceptual search territory. Instead of relying solely on the exact titles and experience keywords provided in a given job description, level 2 sourcing involves the utilization of synonymous terms and concepts.

For example, let’s say you were sourcing for a position with a title of “Safety Physician.” While a level 1 sourcer would search only for the exact title of “Safety Physician” and find people who happen to have that title, a level 2 sourcer would perform research and discover that other organizations use a variety of other titles to describe the same role, such as Associate Director of PVRM, Pharmacovigilance Physician, Senior Drug Safety Associate, Global Safety Senior Medical Scientist, Global Pharmacovigilance (Contract) Physician, and Medical Director, Drug Safety & Pharmacovigilance.

A level 1 sourcer using only the title “Safety Physician” in their search could not find appropriately qualified candidates who use one of the above titles instead of “Safety Physician.” To the level 1 sourcer, those other candidates simply don’t exist – the sourcer is unaware of their existence. However, a level 2 sourcer would find them.

At the skills search level, a level 1 sourcer looking to find software engineers with “Ruby on Rails” experience would search for that exact phrase, and would find only those people who happen to mention it. A level 2 sourcer would perform research and discover that some people with that experience may instead express “Ruby on Rails” as Rails, Ruby, or simply RoR. As such, the level 2 sourcer would be able to find candidates that the level 1 sourcer cannot.
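To make the difference concrete, here is a minimal Python sketch that builds both searches as Boolean query strings. The helper names (`quote`, `or_group`) are made up for illustration, and the exact Boolean syntax a given resume database or site accepts may differ.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join terms into a parenthesized Boolean OR group."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


# Level 1: only the exact phrase taken from the job description.
level_1_query = quote("Ruby on Rails")

# Level 2: the same concept, expressed the different ways candidates write it.
# These variants are the ones named in the paragraph above.
level_2_query = or_group(["Ruby on Rails", "Rails", "Ruby", "RoR"])

print(level_1_query)  # "Ruby on Rails"
print(level_2_query)  # ("Ruby on Rails" OR Rails OR Ruby OR RoR)
```

Every profile the level 1 query returns is also returned by the level 2 query, but not the other way around.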

Level 2 sourcing can be automated – there are many vendors offering solutions that will take basic title and keyword searches and automatically search for synonymous titles, words, and phrases.

However, there are limitations with automated solutions, and there are a few aspects of level 2 sourcing that can only be performed by humans:

  1. It takes a human being to interpret and understand the hiring need, which cannot be effectively conveyed solely by a job description, titles, and required skills, in order to determine which search terms to use (and which ones not to use!).
  2. Only a human sourcer can analyze the relevance of the results from initial searches and adaptively learn from them to creatively refine successive searches to increase both the quantity and the quality of relevant results.
  3. Applications have no awareness of dark matter – at this time, only human sourcers have the ability to be aware that their search criteria may actually eliminate qualified candidates. This awareness enables them to take appropriate action to alter their searches to uncover candidates that previous searches eliminated.

Level 3 Sourcing / Talent Mining

Level 3 sourcing involves searching for and identifying what isn’t explicitly mentioned by candidates – in other words, searching for what isn’t there.

The fact is, most people have skills and experience that they do not directly express in their resumes and social media profiles. This is because:

  • People cannot be reduced to and represented wholly/accurately by a text-based document, form or page
  • Unless you’re recruiting professional resume writers, your potential candidates aren’t ;)
  • 99% of people don’t write their resumes and social profiles in consideration of how people might search for text to find them
  • People simply don’t mention everything they’ve ever done or worked with on a resume, let alone a social networking site
  • People still believe shorter resumes are better, which means they are purposefully limiting the text they use to express their skills and experience (a huge contributor to dark matter)
  • There are a ridiculous number of ways people can express the same skills and experience
  • Companies don’t use the same job titles for the same jobs and responsibilities
  • Let’s not even talk about misspellings… :)

All of the above and more creates HUGE volumes of resumes, candidate records, and social network profiles of people who have skills and experience that cannot be directly searched for because it simply isn’t there. Most sourcers and recruiters are never even aware of these people because they can’t be returned by standard (level 1 and 2) search tactics/strategies.

Level 3 sourcing involves incorporating an understanding of the intrinsic limitations of human capital data (in the form of resumes and social media profiles, as detailed above) into sourcing strategies and tactics. It is a skill that needs to be developed over time from observation and direct experience.

For example, let’s say a manager has an opening for someone with Rational Unified Process experience.

  • A level 1 sourcer would search for “Rational Unified Process.”
  • A level 2 sourcer (human OR otherwise) would/could search for synonymous terms (RUP OR “Rational Unified Process”).
  • A level 3 sourcer would be able to find people with Rational Unified Process experience without actually searching for the terms by researching which companies use RUP and searching specifically for people who have worked for them but who do not say (RUP OR “Rational Unified Process”) by using the NOT operator.
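The level 3 search in the last bullet can be sketched as a query builder. This is only an illustration: the company names below are hypothetical placeholders (the real list comes from researching which employers use RUP), the function name is invented, and some search engines write exclusion as `NOT` or `-` rather than `AND NOT`.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join terms into a parenthesized Boolean OR group."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_level_3_query(companies, unmentioned_terms):
    """Find people from the target companies who do NOT mention the skill terms."""
    return f"{or_group(companies)} AND NOT {or_group(unmentioned_terms)}"


# Hypothetical placeholders, not real research results.
rup_companies = ["Example Corp", "Acme Systems"]
rup_terms = ["RUP", "Rational Unified Process"]

level_3_query = build_level_3_query(rup_companies, rup_terms)
print(level_3_query)
# ("Example Corp" OR "Acme Systems") AND NOT (RUP OR "Rational Unified Process")
```

The results are exactly the people a level 1 or level 2 search on the skill terms would exclude.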

A level 3 sourcer is capable of finding the same candidates as someone who employs only level 1 and 2 sourcing tactics, as well as candidates level 1 and 2 sourcers cannot. Additionally, a level 3 sourcer can find candidates that matching applications employing level 2 concept/semantic search cannot – this is because an application cannot effectively search for words and concepts that simply aren’t there.

Level 4 Sourcing / Talent Mining – Semantic / Natural Language Search

Level 4 sourcing involves searching for responsibilities and capabilities, not just keywords and/or titles.

Moreover, level 4 sourcing takes concept searching beyond synonymous words and phrases (level 2 sourcing) and targets meaning at the sentence level – specifically targeting what people DO, not just what they SAY.

To the best of my knowledge, there are no applications available today that perform dynamic sentence-level (not static phrase-level) semantic search via verb/noun combinations. However, any human sourcer can perform level 4 sourcing manually by searching for verb/noun combinations using a search engine that supports NEAR (e.g., Monster’s “classic” search) or any other proximity search operator.

Search Example 1

Let’s say you’re looking for someone who has had experience performing administrative support for C-level executives. Using Monster, you could run a search something like this:

support* near (CEO or CFO or CTO or CIO or "C-Level" or chief*)

Essentially this search is looking for any permutation of the verb “support” to be mentioned within 10 words (forwards or backwards) of one of the many ways of expressing a C-level title. This can effectively target sentences in which people express the responsibility of supporting C-level executives.
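For readers who want to see the logic spelled out, here is a rough Python sketch of what a NEAR match does with that query. It is an illustration of the idea as described above (a wildcard verb within 10 words of any title variant, in either direction), not Monster’s actual implementation. The function names and the sample resume lines are made up.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphens kept, so 'c-level' is one token)."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def matches(token, pattern):
    """A pattern ending in * is a prefix match; anything else must match exactly."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def near(text, left, right_options, distance=10):
    """True if `left` occurs within `distance` words of any right-hand pattern."""
    tokens = tokenize(text)
    left_pos = [i for i, t in enumerate(tokens) if matches(t, left)]
    right_pos = [i for i, t in enumerate(tokens)
                 if any(matches(t, p) for p in right_options)]
    return any(abs(i - j) <= distance for i in left_pos for j in right_pos)


c_level = ["ceo", "cfo", "cto", "cio", "c-level", "chief*"]

# Made-up resume lines for illustration only.
hit = "Provided administrative support to the CEO and CFO, including calendar management."
miss = "Supported a team of ten sales representatives with travel booking."

print(near(hit, "support*", c_level))   # True
print(near(miss, "support*", c_level))  # False
```

Note that neither line needs an "Executive Assistant" title to match or fail; the match is decided entirely by the responsibility expressed in the sentence.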

Here are snippets from 3 different resumes. Notice that no title search was necessary due to the power of targeting sentence-level meaning:




Search Example 2

If you were looking for someone who had experience configuring Juniper routers, you could run a search like this on Monster:

config* near juniper near router*

This search is essentially looking for people who mention that they have experience configuring Juniper routers, because some permutation of the root “config” has to be mentioned within 10 words of Juniper, which also has to be mentioned within 10 words of router or routers. In most cases, due to the proximity specifications, these 3 word variants will be found in the same sentence – expressing Juniper router configuration responsibility.

Does it work? You decide.



Search Example 3

If you use PCRecruiter (which uses Lucene for text search) and you were looking for people who had experience creating Access databases, you could run this search:

"created access database"~7

That search is asking the database for any result in which the words “created,” “Access,” and “database” are all within 7 words of each other. And it works.

Notice that this is not an exact phrase search – in the relevant phrases, the words are actually in a different order than expressed in the search above, yet the concept is the same.
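The order-independent behavior can be illustrated with a small Python sketch. It implements the simplified reading given above (all of the words appear, in any order, inside a span of 7 word positions) and is not Lucene’s actual slop calculation, which scores term movement differently. The function name and sample sentences are invented.

```python
import re
from itertools import product


def within_window(text, terms, window=7):
    """True if every term occurs and some set of occurrences spans at most `window` positions."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    positions = [[i for i, t in enumerate(tokens) if t == term] for term in terms]
    if any(not p for p in positions):
        return False  # at least one term is missing entirely
    # Try every combination of one occurrence per term (fine for short texts).
    return any(max(combo) - min(combo) <= window for combo in product(*positions))


terms = ["created", "access", "database"]

# Made-up resume lines for illustration only.
hit = "Created a tracking database in Access for the sales team."
miss = ("Created weekly reports. Years later, after several unrelated roles in "
        "retail and logistics, gained access to the corporate database.")

print(within_window(hit, terms))   # True: the words appear in a different order than the query
print(within_window(miss, terms))  # False: all three words appear, but far apart
```

An exact phrase search for "created access database" would miss the first line even though it expresses precisely the responsibility being sought.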


Level 4 sourcing is user-defined, grammatical natural language search.

As complex as that sounds, it’s essentially intelligent keyword search empowered by proximity search capability (extended Boolean) that effectively enables semantic search targeting verb/noun combinations. Best of all, it produces highly relevant results, matched at the responsibility level – what people have done and can do, not just words they happen to mention.

#relevance #win

Level 5 Sourcing / Talent Mining

Level 5 sourcing is a creative use of human capital data in which sourcers deliberately search for the “wrong people” in order to find the “right people.”

This can involve #1 searching for under/overqualified professionals – people who do not have enough years of experience for a specific position, or those who are very experienced and likely to be looking for compensation above what you can offer for a given position, as well as #2 searching for people who likely work with or know the professionals you need to find.

In some ways this isn’t much different than cold calling, yet it has the advantage of specificity and target variable control. For example, let’s say you’re looking for C# software engineers with at least 3 years of SharePoint portal development experience, and you know from experience that people with more than 5 years of applicable experience tend to want a higher level of compensation than you are able to offer.

Once you’ve exhausted all searches/sources for direct matches to your need (C# software engineers with 3 to 5 years of SharePoint portal development experience), you could deliberately search for people with precisely the right experience, but less than 3 years or more than 5.

While you may not be able to immediately assist these people, by identifying them ahead of need you can effectively and proactively build your candidate pipelines for junior and more senior C#/SharePoint portal developers, and you afford yourself the opportunity to network with these individuals to identify people they know who do have 3-5 years of applicable experience.
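The split described above can be sketched as a simple bucketing rule. This assumes you already have a years-of-experience estimate per person, which in practice usually has to be read off the resume by a human. The `Candidate` record, the names, and the bucket labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    sharepoint_years: float  # estimated years of SharePoint portal development


def bucket(candidate, min_years=3, max_years=5):
    """Sort a candidate into a direct match or one of the two 'wrong people' pipelines."""
    if candidate.sharepoint_years < min_years:
        return "junior pipeline"
    if candidate.sharepoint_years > max_years:
        return "senior pipeline"
    return "direct match"


# Invented examples for illustration only.
people = [Candidate("A", 1.5), Candidate("B", 4), Candidate("C", 8)]
for person in people:
    print(person.name, "->", bucket(person))
```

Both pipeline buckets are worth contacting: they feed future junior and senior openings, and they are likely to know people who fall in the 3 to 5 year range.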

Going one step further, you could search specifically for people who would have experience working with your target candidate pool. This could include software testers, business analysts, development/project managers, etc. By searching for, identifying and contacting testers, business analysts, and managers who have experience working on C#/SharePoint portal projects, you can proactively build your pipeline of candidates with these skills, as well as network with them in an effort to identify C# software engineers with SharePoint portal development experience.

Beyond the 5 Levels

I believe that it is all too easy for people to oversimplify the sourcing role and function, as well as suggest that sourcing (finding people) is easy, that it can be effectively mastered and performed by junior personnel, and that it can be fully automated through the use of search and match applications.

All of which is precisely why I took the time to analyze talent mining and share with you the fact that there are at least 5 distinct levels of candidate sourcing.

I say “at least” because I am not satisfied to say that there are only 5 levels – there may be more than 5 distinct levels of talent mining and candidate sourcing. I look forward to the sourcing and recruiting cognoscenti digesting my assessment of sourcing/talent mining and offering their thoughts and feedback.


You’ll notice that in my assessment, only level 1 and to some extent level 2 sourcing can be performed solely by search and match applications without human involvement.

Some aspects of level 2 sourcing can only be accomplished by a living, breathing, thinking person. For example: Interpreting and understanding the hiring need, analyzing the relevance of the results from initial searches and adaptively learning from them to creatively refine successive searches to increase both the quantity and the quality of relevant results, and leveraging an awareness of hidden talent pools to take appropriate action to alter searches to specifically uncover candidates that previous searches eliminated.

Similarly, at this time, only people are capable of interfacing with and searching databases and Internet sites to perform level 3 – 5 sourcing.

I believe that the solution to the talent sourcing challenge lies in:

  1. The ability of people to truly understand the positions being sourced for, an awareness and appreciation of the intrinsic limitations of human capital data, and the ability to employ sound search/data mining tactics and strategies to go beyond these limitations and leverage human capital data to find all of the best candidates, both directly and indirectly.
  2. Companies finally “getting it” by understanding and appreciating the true value of human capital data, which is directly proportional to the ability to quickly retrieve exactly what you want when you want it. This should lead companies to offer their sourcing and recruiting teams better search capability and technology (for both internal databases and external resources).

What say you?

  • Excellent revisit of this topic – I remember when you shared the original deck in DC and it remains great. The contextual explanations here are useful, though if I could suggest only one thing to help readers, it would be to provide an example of the RUP search related to the last bullet in Level 3 (or slide 46 in your deck for SAP) for what else you’d include in the search (besides the name of target company using product and negated terms related to the product) so the sourcer doesn’t have to sift through a laundry list of employees who are clearly not relevant (e.g., the recruiters, the secretaries, etc.).

  • Rob McIntosh

    I remember that deck and have fond memories of that SourceCon. I would be curious to get your POV on how long you think it will take before algorithms can perform all 5 levels. Watson can now beat a human being at the game of Go. If anyone is familiar with the game, it requires totally different computer thinking, with an absurd number of choices way beyond chess. If, as a reader, you are not familiar with the implications of this breakthrough in computing, then check this out and think about what these advancements mean for sourcing.


  • Rob McIntosh

    response edit. I should have said Deep Mind not Watson….sorry Alphabet, I gave undue credit to IBM :-)

  • Take a look!

  • Randy Bailey

    First of all, this is particularly valuable when consumed along with the last blog, The Best Boolean and Semantic Search Tool (http://booleanblackbelt.com/2016/01/the-most-powerful-boolean-search-operator/), especially the link to the LinkedIn Talent Connect 2014 keynote address. This answers some of the unanswered questions from that deck; however, that presentation is by far the best hour of sourcing training I’ve ever taken!

    Glen: Question for you, what is the current state of the Talent Intelligence & Analytics solutions available today? In 2010 you said there were none, how about today?

    In response to Rob’s comments: Last week I stumbled across a presentation about some of the StitchFix algorithms. They are doing some amazing things at selecting fashion for women, based on comments and a questionnaire that customers have to fill out. My GF happens to use them and LOVES them! The deck/presentation is largely over my head technically, but I understood the gist of what they were doing and it’s scary!


  • Steve Levy

    Glen: Very nice taxonomy. My take on Levels above 5 delves into not inferring unwritten/unspoken text but “unthought” cognitions – as in, “it’s like you knew me and what I was thinking.” Look up the LIDA model of cognition – next few years are going to be very interesting…

  • Rob, what Google’s Deep Mind (Go examples) and IBM’s Watson (Chess examples) are doing for computing applications isn’t much different than what Wilbur Wright did for aviation by circling the Statue of Liberty in 1909: capture the consumer’s attention, offer proof of what can be done today, and foster the imagination of what’s possible in the future.

    Although it would be great if these teams shifted their attention to TA (highly dubious), it will be a long time coming before this level of computing is available for our industry. Synthesizing the nuance of unstructured talent data needed to effectively job match will require this type of effort, and I just don’t see this level of brilliance in HRTech today. My biggest takeaway from Glen’s post is that the technology available for TA for sifting through search habits, social data sharing and profiles/resumes to match candidate attitude, interest, motivation and career activity will thankfully continue to require a human to connect the dots to a specific role or company.

    IMO, as data becomes more prevalent, sourcing will become a job of data analysis (the very earliest stage of this is happening now) with little need for isolation and search. Eventually, well after I have moved on to the big CRM in the sky, tech will provide data strong enough for matching with most roles, and data analysis will be needed for only the most complex of open reqs. It would be great fun if this happens sooner, but we’re about 10-15 years behind consumer industries and their use of data analytics is still evolving on the upswing…but I can still hope… :)
