Beyond Boolean: Human Capital Information Retrieval

When I recently spoke at SourceCon in New York, I showed an example Boolean search string that could be used as a challenge or an evaluation of a person’s knowledge and ability.

The search string looked something like this:

(Director or “Project Manage*” or “Program Manage*” or PM*) w/250 xfirstword and (truck* or ship* or rail* or transport* or logistic* or “supply chain*”) w/10 (manag* or project)* and (Deloitte or Ernst or “E&Y” or KPMG or PwC or PricewaterhouseCoopers or “Price Waterhouse*”)

During the presentation, an audience member asked me why there wasn’t any use of site:, inurl:, intitle:, etc. I responded by acknowledging that for many, sourcing and Boolean search seem to be synonymous with Internet search – however, this is definitely not the case.

Boolean Logic is Simply the Simplest Way to Search

Some (but I hope not too many!) sourcing and recruiting professionals may be surprised to learn that Boolean logic significantly predates the Internet as well as computers – by over a century!

I still run into sourcers and recruiters who are not aware that the word “Boolean” comes from the man who invented Boolean Logic in the 19th century – George Boole. Boolean Logic is the basis of modern computer logic, and George Boole is regarded in hindsight as one of the founders of the field of computer science.

With Boolean logic having been created in the 1800s, it’s pretty obvious that it is not just for searching for people and information on the Internet.

Practically any information system from which you need to search and retrieve information “speaks” Boolean.

This is understandable, because using Boolean logic is the simplest way to construct a search. When you want a combination of terms/phrases you use AND, when you want at least one of a group of terms/phrases you use OR, and when you don’t want something you use NOT. It really doesn’t get any easier than that.
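To make the three operators concrete, here is a minimal sketch in Python that treats each term as the set of records containing it. The tiny `records` collection and the `matching` helper are made up for illustration and do not represent how any particular search engine or resume database is implemented.

```python
# Hypothetical mini "resume database": record id -> free text.
records = {
    1: "project manager logistics deloitte",
    2: "java developer supply chain",
    3: "program manager transport kpmg",
    4: "project manager retail",
}

def matching(term):
    """Return the set of record ids whose text contains the term as a whole word."""
    return {rid for rid, text in records.items() if term in text.split()}

# AND -> set intersection: records containing both terms
manager_and_logistics = matching("manager") & matching("logistics")

# OR -> set union: records containing at least one of the terms
logistics_or_transport = matching("logistics") | matching("transport")

# NOT -> set difference: records containing the first term but not the second
manager_not_retail = matching("manager") - matching("retail")

print(manager_and_logistics)   # record 1 only
print(logistics_or_transport)  # records 1 and 3
print(manager_not_retail)      # records 1 and 3
```

The mapping is the same one a Venn diagram shows: AND is the overlap, OR is everything inside either circle, and NOT removes one circle from the other.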

When anyone types more than a single word or phrase into Google, Bing, LinkedIn, Amazon, eBay, etc., they’re performing Boolean search, because spaces are automatically converted to ANDs. Billions of people across the globe are running basic Boolean strings whether they are aware of this or not, which is a testament to how easy Boolean search is.
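The implicit AND can be sketched in a few lines of Python. This is a simplified illustration only: the `implicit_and` function and the sample documents are assumptions, and real search engines layer ranking, stemming, and other processing on top of this basic behavior.

```python
def implicit_and(query, documents):
    """Treat every space in the query as AND: keep documents containing all terms."""
    terms = query.lower().split()
    return [
        doc for doc in documents
        if all(term in doc.lower().split() for term in terms)
    ]

# Made-up documents for illustration.
documents = [
    "Senior Java Developer",
    "Java Architect",
    "Python Developer",
]

# Typing "java developer" behaves like the Boolean string: java AND developer
print(implicit_and("java developer", documents))
```

Only the first document contains both words, so it is the only one returned, even though the searcher never typed an operator.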

Sourcing isn’t about Boolean Search Strings

Sourcing candidates is much more than Boolean search strings – they are but one aspect of sourcing.

Sourcing talent is more accurately and completely defined and described as human capital information retrieval.

Information retrieval (IR) is “the science of searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web.”

Leveraging information systems for talent discovery and identification is about searching documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the Internet for human capital information, including titles, companies, responsibilities, skills, technologies, social network updates, blog posts, resume information, event and association lists, etc.

With IR, an information retrieval process begins when a user enters a query into an interface.

Queries are simply formal statements of information needs. For a sourcer or recruiter, their information need is typically to find information that will lead them to discover and identify people with specific skills, experience, capabilities, education, etc.

While using Boolean operators is arguably the easiest way to construct a query, IR queries do not have to be limited solely to Boolean logic, as can be seen in the various non-Boolean query modifiers of Internet search engines (here are some of Google’s and Bing’s), LinkedIn’s advanced search operators, faceted search (e.g., LinkedIn’s filters), etc.

The “hard” part of creating queries for human capital information retrieval isn’t deciding which Boolean operators to use. AND/OR/NOT is the easy part. In fact, my daughter learned about Boolean logic last year, including constructing Venn diagrams – in her 1st grade public school class!

The hard part of creating queries is intelligently selecting a combination of words and phrases, and in some cases strategically excluding some words and phrases, that will return highly relevant results – people who are not only likely to be qualified for the position being sourced for, but also highly likely to be interested in the opportunity (i.e., “recruitable”).

Yes – you actually have to think in order to create effective queries that return highly relevant results.

Human-Computer Information Retrieval

Human–computer information retrieval (HCIR) is “the study of information retrieval techniques that bring human intelligence into the search process.”

According to Wikipedia, which IBM’s Watson used heavily to compete in Jeopardy, “The fields of human–computer interaction (HCI) and information retrieval (IR) have both developed innovative techniques to address the challenge of navigating complex information spaces…[and] Human–computer information retrieval has emerged in academic research and industry practice to bring together research in the fields of IR and HCI, in order to create new kinds of search systems that depend on continuous human control of the search process.” (emphasis mine)

The term human–computer information retrieval was coined by Gary Marchionini whose main thesis is that “HCIR aims to empower people to explore large-scale information bases but demands that people also take responsibility for this control by expending cognitive and physical energy.” (emphasis mine again)

For those who simply want information systems to magically provide them with the most relevant results at the click of a button, you should take special note of the fact that experts in the field of HCIR do not believe that people should step out of the information retrieval process and let semantic search/NLP algorithms/AI be solely responsible for the search process.

If you’re interested in learning more about HCIR, I suggest you read this blog – you may be surprised and interested to see who the author is, where he’s been, what he’s done, where he is now, and what’s on his mind.

Talent Mining

In my opinion and experience, Boolean search neither adequately describes nor gives proper credit to what sourcers and recruiters are really doing when they leverage the Internet, resume databases, ATS/CRM applications and social networking sites such as LinkedIn to find candidates, and to what some very talented and highly skilled professionals are able to accomplish with human capital information.

At SourceCon 2010, I spoke about a specialized form of HCIR which I call talent mining, which is essentially human capital information retrieval – a specialized form of IR involving querying and analyzing human capital data (resumes, social network profiles and updates, blogs, etc.) for talent discovery, identification, and ultimately acquisition.

I believe there are at least five distinct levels of Talent Mining:

  1. Skill/Title Search
  2. Concept Search
  3. Implicit Search
  4. Semantic/Natural Language Search
  5. Indirect Search

Talent Mining is not defined by nor limited to Boolean search – any and all information retrieval methods that can be leveraged to discover and return human capital data are applicable and should be used.

At the strategic level, talent mining is the process of transforming human capital data into an informational and competitive advantage, which is much more than simply writing Boolean search strings.

Only the simplest and most basic level 1 talent mining can be performed without much thought – slapping titles and keywords taken directly from a job description into a Boolean search string and hitting “search.”

Beyond that, more advanced level 1 and most certainly levels 2 through 5 talent mining require significant “cognitive energy,” as well as involve continual improvement.

In fact, effective sourcing can and should be an iterative process.

Beyond Boolean & Internet Search

I believe that those who equate sourcing with basic Boolean Internet search don’t fully understand or appreciate the power of human capital data, its many forms and sources, and the many ways that it can be leveraged.

While the Internet has a lot of information, it is also full of garbage (others would call it “noise”) and it does not hold as many “findable” resumes as you may have been led to believe.

There is no denying that non-resume human capital data is valuable, but searching the Internet for non-resume information can easily spiral into an exercise in low-ROI, time-consuming garbage-sifting. Many don’t realize (or want to recognize) that non-resume data offers shallow information at best and thus has less qualitative and predictive value.

Additionally, the Internet isn’t a database – it’s a network of networks and the information stored on those networks is largely unstructured.

Structured data is an order of magnitude (it could easily be argued many orders of magnitude) more valuable and searchable than unstructured data, if for no other reason than its intrinsically high predictive value.

LinkedIn offers a good example of the power of structured human capital data, although a large percentage of LinkedIn profiles are information-anemic. Even so, all profiles are required to have employer and title information, and both are structured, fully searchable fields.
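A small Python sketch shows why a structured title field is more predictive than free text. The `Profile` class, its field names, and the two sample profiles are hypothetical; this is not LinkedIn’s data model or API, just an illustration of fielded versus full-text matching.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    # Hypothetical structured fields plus one unstructured free-text field.
    name: str
    title: str
    employer: str
    summary: str

profiles = [
    Profile("A", "Project Manager", "Deloitte", "Led logistics engagements."),
    Profile("B", "Business Analyst", "KPMG",
            "Reported to the project manager at Deloitte client sites."),
]

def full_text_search(term):
    """Match the term anywhere in the profile, as an unstructured search would."""
    term = term.lower()
    return [p.name for p in profiles
            if term in f"{p.title} {p.employer} {p.summary}".lower()]

def field_search(field, term):
    """Match the term only inside one named structured field."""
    term = term.lower()
    return [p.name for p in profiles if term in getattr(p, field).lower()]

print(full_text_search("project manager"))       # both A and B match
print(field_search("title", "project manager"))  # only A matches
```

The full-text search returns profile B as a false positive because the phrase merely appears in the summary, while the title-field search returns only the person who actually holds the title.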

Additionally, corporate ATS’s and major job board resume databases have hundreds of thousands to tens of millions of candidate records – with deep and sometimes well-structured data. I’m perpetually confused as to why there is so much written on Internet sourcing and why I don’t see more people writing and speaking about mining all of the rich human capital data hiding in resume databases and applicant tracking systems.

Perhaps one of the reasons why the sourcing function and role isn’t highly regarded or respected by some is because those people equate sourcing with basic Boolean search. If all they think sourcers and recruiters are doing is directly searching for keywords and titles from job descriptions, then I can actually understand why some people would think of sourcing as an entry level role or function.

However, sourcing isn’t just about Boolean search, it’s about human capital information retrieval.

While Boolean logic is the simplest way to construct an IR query and practically all information systems accept basic Boolean operators, the real “magic” and work of sourcing talent is the iterative, intelligent, and cognitively challenging process of selecting a combination of words and phrases, and in some cases strategically excluding others, analyzing the results returned, making changes to the query based on observed relevance, and repeating the process until an acceptable quantity of highly qualified and matched candidates is identified.

I would personally like to see more sourcing, recruiting and HR conferences and blogs address human capital information retrieval, specifically with regard to focusing on the sourcing process, as well as deep and structured human capital data. If this happens, I don’t think it will be long before companies start to realize that sourcing can offer a serious strategic competitive advantage, and perhaps invest more in technologies and talented people to achieve a competitive advantage based on human capital data for talent discovery, identification, acquisition, and retention.
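The query, review, refine cycle described above can be sketched in Python under some loud assumptions: the sample records are made up, `is_relevant` is a stand-in for a human reviewer’s judgment, and matching is plain substring matching rather than any real system’s search syntax.

```python
def run_query(required, excluded, records):
    """Return records containing every required term and no excluded term."""
    return [
        r for r in records
        if all(t in r for t in required) and not any(t in r for t in excluded)
    ]

def precision(results, is_relevant):
    """Share of returned results that the reviewer judged relevant."""
    if not results:
        return 0.0
    return sum(1 for r in results if is_relevant(r)) / len(results)

# Made-up records for illustration.
records = [
    "project manager supply chain consulting",
    "project manager construction",
    "program manager logistics consulting",
    "supply chain analyst intern",
]

def is_relevant(record):
    # Stand-in for a human reading each result and judging fit.
    return "consulting" in record

# Pass 1: a broad query taken straight from the job title.
required, excluded = ["manager"], []
first = run_query(required, excluded, records)
print(len(first), precision(first, is_relevant))

# Review shows construction profiles are noise, so exclude that term and rerun.
excluded.append("construction")
second = run_query(required, excluded, records)
print(len(second), precision(second, is_relevant))
```

The important step is the one between the two passes: a person looks at what came back, decides what is noise, and changes the query accordingly. The code only records that decision; it does not make it.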

  • Glen, again we are surfing the same brainwave. Last week I read an article that essentially bashed resume databases and paid resources – but why are companies spending money on them if their end goal is to use them as little as possible? We have an issue of Pareto paralyzation in the inner sourcing circles – we focus on the 80% which yields the 20%. For those unfamiliar with Pareto (otherwise known as the 80/20 Rule) it states basically that 80% of your results come from 20% of your resources. And vice versa. We are so enthralled with the 80% (the free resources and the Internet itself) that we forget about the structured resources – that can of course be quite expensive – but also are designed specifically as tools and resources to help us do our jobs better. They’re not evil or garbage – they are resources, just like the free stuff, and if we feel that way, then we are simply not approaching them with a proper sourcing mindset (what can I do to get the most out of this resource?).

  • That was way nerdy & makes complete sense. It will fly over the head of most recruiters out there, even though you clearly establish your background, premise, and theory. Nice article. Regarding why folks don’t search their own databases & maximize them…. I have some ideas. It is not uncommon for me to find tens of thousands of appropriate leads for a client in their own systems. A few key factors are ease of use, functionality, and comfort level with established systems. Boolean & search (at their core) are very simple and easy to get. Mastering them and effecting skilled HCIR results is a whole ‘nother thing.

  • I echo Amybeth’s comments.

    Especially within corporate recruitment, ATS / recruitment databases are resume / CV eating black holes. Loads of external CVs are dumped into the system without any strong process OR search system designed to retrieve this data efficiently. I am sure this is one of the major issues in corporate sourcing / recruitment, and it is given very little emphasis. Information is what it is – whether you find it on Google, an internal database, an intranet, job boards, or social media. It is more important to know how you are extracting it.

    Also – low-hanging fruit like job boards, the inurl:resume Boolean command and obvious LinkedIn profiles are getting more coverage. Very few people know that this is just the tip of the iceberg. This information is very easily available on the net for every other recruiter. How are you making use of information in other ways… which are not so obvious? I am amazed to see so many people asking how to find online resumes. I believe contacts in sales, marketing, etc. are found more easily on company sites, in publicity material, and at conferences than by carrying out an inurl:resume search.

    We as recruiters are so addicted to seeing information in resume format (job boards, LinkedIn) that we never think there are 1000 other ways information is hidden.

    That is real sourcing in my eyes….

    Sometimes you have to find wrong people to find right people… Glen Cathey (SourceCon 2011 DC)

  • Harshali

    This will be possible only if we go beyond the job portals, but the organizations we work in always look at the number of people sourced. If an organization starts thinking about sourcing only well-defined people, I think there will be some challenges and constructive work. But unfortunately it’s a race in which organizations just want people and do not have the patience to look for people with brains.

    Can you elaborate on how we can go ahead with the Talent Mining process?

  • Glen, thanks for the plug for HCIR and for my blog! For folks who want to go deep, I recommend looking at where we have the programs and proceedings for the Workshop on Human-Computer Interaction and Information Retrieval. Given that I’m now at LinkedIn and this year’s workshop will be next door at Google, I’m hoping to see submissions this year that apply HCIR in the context of sourcing and recruiting.
