Beyond Boolean: Human Capital Information Retrieval

When I recently spoke at SourceCon in New York, I showed an example Boolean search string that could be used as a challenge or an evaluation of a person’s knowledge and ability.

The search string looked something like this:

(Director or “Project Manage*” or “Program Manage*” or PM*) w/250 xfirstword and (truck* or ship* or rail* or transport* or logistic* or “supply chain*”) w/10 (manag* or project)* and (Deloitte or Ernst or “E&Y” or KPMG or PwC or PricewaterhouseCoopers or “Price Waterhouse*”)

During the presentation, an audience member asked me why there wasn’t any use of site:, inurl:, intitle:, etc. I responded by acknowledging that, for many, sourcing and Boolean search seem to be synonymous with Internet search. However, this is definitely not the case.

Boolean Logic is Simply the Simplest Way to Search

Some (but I hope not too many!) sourcing and recruiting professionals may be surprised to learn that Boolean logic significantly predates the Internet as well as computers – by over a century!

I still run into sourcers and recruiters who are not aware that the word “Boolean” comes from the man who invented Boolean logic in the 19th century: George Boole. Boolean logic is the basis of modern computer logic, and George Boole is regarded in hindsight as one of the founders of the field of computer science.

Given that Boolean logic was created in the 1800s, it’s pretty obvious that it is not just for searching for people and information on the Internet.

Practically any information system from which you need to search and retrieve information “speaks” Boolean.

This is understandable, because using Boolean logic is the simplest way to construct a search. When you want a combination of terms/phrases you use AND, when you want at least one of a group of terms/phrases you use OR, and when you don’t want something you use NOT. It really doesn’t get any easier than that.
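To make the three operators concrete, here is a minimal Python sketch. It treats AND, OR, and NOT as set intersection, union, and difference over a tiny made-up collection of resume snippets. The candidate ids, the resume text, and the `matching` helper are all hypothetical and exist only for illustration; real search systems index and match text in far more sophisticated ways.

```python
# Toy corpus: invented resume snippets keyed by an invented candidate id.
RESUMES = {
    "c1": "project manager supply chain deloitte",
    "c2": "java developer logistics startup",
    "c3": "program manager transport kpmg",
    "c4": "director marketing retail",
}


def matching(term: str) -> set[str]:
    """Return the ids of resumes that contain the term as a whole word."""
    return {cid for cid, text in RESUMES.items() if term in text.split()}


# AND: both terms must be present (set intersection).
manager_and_deloitte = matching("manager") & matching("deloitte")

# OR: at least one of the terms must be present (set union).
transport_or_logistics = matching("transport") | matching("logistics")

# NOT: drop anything that contains the unwanted term (set difference).
manager_not_kpmg = matching("manager") - matching("kpmg")

print(manager_and_deloitte, transport_or_logistics, manager_not_kpmg)
```

Every Boolean string, however long, is just a nesting of these three operations.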

When anyone types more than a single word or phrase into Google, Bing, LinkedIn, Amazon, eBay, etc., they’re performing Boolean search, because spaces are generally treated as implicit ANDs. Billions of people across the globe are running basic Boolean strings whether they are aware of this or not, which is a testament to how easy Boolean search is.
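As a rough illustration of that implicit AND, the sketch below rewrites a plain query so the operator is spelled out. The function name `to_explicit_and` is made up for this example, and it assumes the query contains only bare words and double-quoted phrases (no existing operators or parentheses). It is not how any particular search engine parses queries.

```python
import re


def to_explicit_and(query: str) -> str:
    """Join the terms of a plain query with AND, keeping quoted phrases whole."""
    # Match either a double-quoted phrase or a run of non-space characters.
    terms = re.findall(r'"[^"]*"|\S+', query)
    return " AND ".join(terms)


print(to_explicit_and('project manager "supply chain"'))
```

Typing three terms into a search box is, in effect, asking for results that contain all three.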

Sourcing isn’t about Boolean Search Strings

Sourcing candidates is much more than Boolean search strings – they are but one aspect of sourcing.

Sourcing talent is more accurately and completely defined and described as human capital information retrieval.

Information retrieval (IR) is “the science of searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web.”

Leveraging information systems for talent discovery and identification is about searching documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the Internet for human capital information, including titles, companies, responsibilities, skills, technologies, social network updates, blog posts, resume information, event and association lists, etc.

With IR, an information retrieval process begins when a user enters a query into an interface.

Queries are simply formal statements of information needs. For a sourcer or recruiter, their information need is typically to find information that will lead them to discover and identify people with specific skills, experience, capabilities, education, etc.

While using Boolean operators is arguably the easiest way to construct a query, IR queries do not have to be limited solely to Boolean logic, as can be seen in the various non-Boolean query modifiers of Internet search engines (here are some of Google’s and Bing’s), LinkedIn’s advanced search operators, faceted search (e.g., LinkedIn’s filters), etc.

The “hard” part of creating queries for human capital information retrieval isn’t deciding which Boolean operators to use. AND/OR/NOT is the easy part. In fact, my daughter learned about Boolean logic last year, including constructing Venn diagrams – in her 1st grade public school class!

The hard part of creating queries is intelligently selecting a combination of words and phrases, and in some cases strategically excluding some words and phrases, that will return highly relevant results – people who are not only likely to be qualified for the position being sourced for, but also highly likely to be interested in the opportunity (i.e., “recruitable”).

Yes – you actually have to think in order to create effective queries that return highly relevant results.

Human-Computer Information Retrieval

Human–computer information retrieval (HCIR) is “the study of information retrieval techniques that bring human intelligence into the search process.”

According to Wikipedia, which IBM’s Watson used heavily to compete in Jeopardy, “The fields of human–computer interaction (HCI) and information retrieval (IR) have both developed innovative techniques to address the challenge of navigating complex information spaces…[and] Human–computer information retrieval has emerged in academic research and industry practice to bring together research in the fields of IR and HCI, in order to create new kinds of search systems that depend on continuous human control of the search process.” (emphasis mine)

The term human–computer information retrieval was coined by Gary Marchionini, whose main thesis is that “HCIR aims to empower people to explore large-scale information bases but demands that people also take responsibility for this control by expending cognitive and physical energy.” (emphasis mine again)

If you simply want information systems to magically provide you with the most relevant results at the click of a button, you should take special note of the fact that experts in the field of HCIR do not believe that people should step out of the information retrieval process and let semantic search/NLP algorithms/AI be solely responsible for the search process.

If you’re interested in learning more about HCIR, I suggest you read this blog – you may be surprised and interested to see who the author is, where he’s been, what he’s done, where he is now, and what’s on his mind.

Talent Mining

In my opinion and experience, Boolean search neither adequately describes nor gives proper credit to what sourcers and recruiters are really doing when they leverage the Internet, resume databases, ATS/CRM applications and social networking sites such as LinkedIn to find candidates, and to what some very talented and highly skilled professionals are able to accomplish with human capital information.

At SourceCon 2010, I spoke about a specialized form of HCIR that I call talent mining. It is essentially human capital information retrieval: a specialized form of IR involving querying and analyzing human capital data (resumes, social network profiles and updates, blogs, etc.) for talent discovery, identification, and ultimately acquisition.

I believe there are at least five distinct levels of Talent Mining:

  1. Skill/Title Search
  2. Concept Search
  3. Implicit Search
  4. Semantic/Natural Language Search
  5. Indirect Search

Talent Mining is not defined by nor limited to Boolean search – any and all information retrieval methods that can be leveraged to discover and return human capital data are applicable and should be used.

At the strategic level, talent mining is the process of transforming human capital data into an informational and competitive advantage, which is much more than simply writing Boolean search strings.

Only the simplest and most basic level 1 talent mining can be performed without much thought – slapping titles and keywords taken directly from a job description into a Boolean search string and hitting “search.”

Beyond that, more advanced level 1 and most certainly levels 2 through 5 talent mining require significant “cognitive energy” and involve continual improvement.

In fact, effective sourcing can and should be an iterative process.

Beyond Boolean & Internet Search

I believe that those who equate sourcing with basic Boolean Internet search don’t fully understand or appreciate the power of human capital data, its many forms and sources, and the many ways that it can be leveraged.

While the Internet has a lot of information, it is also full of garbage (others would call it “noise”) and it does not hold as many “findable” resumes as you may have been led to believe.

There is no denying that non-resume human capital data is valuable, but searching the Internet for non-resume information can easily spiral into an exercise in low-ROI, time-consuming garbage-sifting. Many don’t realize (or want to recognize) that non-resume data offers shallow information at best and thus has less qualitative and predictive value.

Additionally, the Internet isn’t a database – it’s a network of networks and the information stored on those networks is largely unstructured.

Structured data is an order of magnitude (it could easily be argued many orders of magnitude) more valuable and searchable than unstructured data, if for no other reason than its intrinsically high predictive value.

LinkedIn offers a good example of the power of structured human capital data, although a large percentage of LinkedIn profiles are information-anemic. Even so, all profiles are required to have employer and title information, and both are structured, fully searchable fields.
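A small Python sketch can show why structured fields carry more predictive value than free text. Everything in it is invented for illustration: the `Profile` fields, the sample people, and the two search helpers. It does not reflect LinkedIn's actual data model or search behavior.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    title: str
    employer: str
    summary: str


# Invented sample profiles; only the first person actually works at Oracle.
PROFILES = [
    Profile("A. Example", "Project Manager", "Oracle", "Led ERP rollouts."),
    Profile("B. Example", "Database Administrator", "Acme Freight",
            "Tuned Oracle databases for logistics systems."),
    Profile("C. Example", "Recruiter", "Acme Freight",
            "Hired project managers away from Oracle."),
]


def keyword_search(term: str) -> list[str]:
    """Unstructured search: match the term anywhere in the profile text."""
    term = term.lower()
    return [p.name for p in PROFILES
            if term in f"{p.title} {p.employer} {p.summary}".lower()]


def field_search(**criteria: str) -> list[str]:
    """Structured search: every named field must equal the requested value."""
    return [p.name for p in PROFILES
            if all(getattr(p, field).lower() == value.lower()
                   for field, value in criteria.items())]


print(keyword_search("oracle"))         # anyone who mentions Oracle
print(field_search(employer="Oracle"))  # only people employed by Oracle
```

The keyword search returns all three people because each one mentions Oracle somewhere, while the field search returns only the person whose employer is Oracle. That difference in precision is the point of searching structured fields.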

Additionally, corporate ATSs and major job board resume databases have hundreds of thousands to tens of millions of candidate records, with deep and sometimes well-structured data. I’m perpetually confused as to why there is so much written on Internet sourcing and why I don’t see more people writing and speaking about mining all of the rich human capital data hiding in resume databases and applicant tracking systems.

Perhaps one of the reasons the sourcing function and role isn’t highly regarded or respected by some is that those people equate sourcing with basic Boolean search. If all they think sourcers and recruiters are doing is directly searching for keywords and titles from job descriptions, then I can actually understand why some people would think of sourcing as an entry level role or function.

However, sourcing isn’t just about Boolean search; it’s about human capital information retrieval.

While Boolean logic is the simplest way to construct an IR query and practically all information systems accept basic Boolean operators, the real “magic” and work of sourcing talent is the iterative, intelligent, and cognitively challenging process of selecting a combination of words and phrases, and in some cases strategically excluding others, analyzing the results returned, making changes to the query based on observed relevance, and repeating the process until an acceptable quantity of highly qualified and matched candidates is identified.
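The search, review, and revise loop described above can be sketched in a few lines of Python. This is only a toy model under loud assumptions: the resumes, the `RELEVANT` set standing in for a sourcer's judgment, the precision target, and the scripted list of query revisions are all hypothetical. In real sourcing, a person decides each revision after reading the results, which is exactly the cognitive work a script cannot do.

```python
# Invented resume snippets keyed by an invented candidate id.
RESUMES = {
    "c1": "project manager logistics deloitte",
    "c2": "product manager logistics software",
    "c3": "program manager supply chain kpmg",
    "c4": "project manager construction",
    "c5": "recruiter logistics staffing",
}

# Stand-in for human review: the candidates a sourcer would judge relevant.
RELEVANT = {"c1", "c3"}


def search(any_of: set[str], none_of: set[str]) -> set[str]:
    """Return ids with at least one wanted word and no excluded word."""
    hits = set()
    for cid, text in RESUMES.items():
        words = set(text.split())
        if words & any_of and not words & none_of:
            hits.add(cid)
    return hits


def precision(hits: set[str]) -> float:
    """Share of returned candidates that were judged relevant."""
    return len(hits & RELEVANT) / len(hits) if hits else 0.0


def refine(revisions, target=0.9):
    """Run each revised query in turn until precision reaches the target."""
    history = []
    for any_of, none_of in revisions:
        hits = search(any_of, none_of)
        score = precision(hits)
        history.append((sorted(hits), round(score, 2)))
        if score >= target:
            break
    return history


# Each revision excludes a noise term spotted in the previous round.
revisions = [
    ({"manager"}, set()),
    ({"manager"}, {"product"}),
    ({"manager"}, {"product", "construction"}),
]

for hits, score in refine(revisions):
    print(hits, score)
```

Each pass trims irrelevant results by excluding a term noticed in the previous round, and the loop stops once the results are relevant enough.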

I would personally like to see more sourcing, recruiting and HR conferences and blogs address human capital information retrieval, specifically with regard to focusing on the sourcing process, as well as deep and structured human capital data. If this happens, I don’t think it will be long before companies start to realize that sourcing can offer a serious strategic competitive advantage, and perhaps invest more in technologies and talented people to achieve a competitive advantage based on human capital data for talent discovery, identification, acquisition, and retention.