Using Extended Boolean to Achieve Semantic Search in Sourcing

When it comes to sourcing and recruiting, semantic search is perhaps the most powerful way to quickly find people who have experience you’re looking for.

Now, I am not talking about black box semantic search (e.g., Google, Monster’s 6Sense, etc.).

I’m referring to user-defined semantic search, where you tell a search engine exactly what you want with your query, and the search engine doesn’t try to “understand” your search terms or “figure out” what you mean through taxonomies, RDFa, keyword to concept mapping, graph patterns, entity extraction, fuzzy logic, etc.

If you’re not very familiar with semantic search (for sourcing – not search engines), I strongly suggest you read my comprehensive article from January 2012 on the subject: The Guide to Semantic Search for Sourcing and Recruiting.

Of course, at the heart of semantic search is semantics, which is the study of meaning inherent at the levels of words, phrases, and sentences.

In this post, I’m going to review some sites/databases that claim to support proximity search (Monster, Google, Bing, Exalead) and show you how to use proximity search (a form of extended Boolean) to achieve level 3 semantic search – which is grammatical/natural language search using noun/verb combinations in your queries.

If you’d like to learn more about the 5 levels of semantic search, you can view this Slideshare presentation from my 2010 SourceCon keynote (starting on slide 72).

But before I get to explaining extended Boolean search, I am first going to explain the challenges of “standard” Boolean search which will set the stage for an appreciation of the power of level 3 semantic search.

The Problem with “Standard” Boolean Search (AND / OR / NOT)

The vast majority of sourcers and recruiters create Boolean search strings with keywords and titles that simply return a collection of words on a resume, profile or page – not people with specific experience.

As we all know – your search terms can appear on a resume, a LinkedIn profile or web result, but that doesn’t guarantee you that the result is viable or even relevant. That’s precisely why it can take so much time reviewing results – you have to inspect each result to see if it’s relevant.

Relevance can be defined as the extent to which a search result matches the information need based on the intent of the person executing the search.

Highly “relevant” results are those that match exactly what the searcher is looking for.

For sourcing, highly relevant results are essentially people who are highly likely to be qualified and ideally interested in the opportunity you are sourcing/recruiting for.

Most sourcers and recruiters are actually trying to find people who have specific skills and experience, because most hiring managers typically want people who have been paid for very specific responsibilities.

Of course, just because certain words appear in a person’s resume or profile it does not mean that the person has been primarily responsible for working with those words (typically skills, technologies, etc.).

For example, if you were looking for someone who has iOS development experience, searching for [develop* AND iOS], even along with titles and other terms, can and will return many people who do not have iOS development experience, but simply mention those words at various points of their resume or profile. That’s because the search engine doesn’t “know” what you’re looking for – it simply returns results with the keywords you asked for.

If your intent is to find iOS developers and you return results of people who mention iOS and development, but they do not have any iOS development experience – these results are known as false positives. The words you searched for are in the results, but the results do not match your need/intent.

For example:

Semantic Search False Positive

Using standard Boolean operators, you simply cannot control precisely where your keywords and titles appear in results (resumes, LinkedIn profiles, etc.), and without that control, you will suffer lower relevance and a higher percentage of false positive results.
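To make the false positive problem concrete, here is a minimal Python sketch of what a keyword-only AND match effectively does. The tokenizer, the `boolean_and` helper and the sample resume text are all made up for illustration; no real search engine is implemented this way.

```python
import re

def tokens(text):
    # Lowercase the text and keep runs of letters/digits as "words".
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_term(word, term):
    # A trailing * means "starts with" (develop* -> developed, developer, ...).
    term = term.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term

def boolean_and(text, terms):
    # True when every term appears somewhere in the text, in any position.
    words = tokens(text)
    return all(any(matches_term(w, t) for w in words) for t in terms)

# Hypothetical resume: Java back-end work, iOS only mentioned as a hobby.
resume = (
    "Developed Java web services for a payments platform. "
    "Personal interests: photography, cycling, testing new iOS apps on my phone."
)

print(boolean_and(resume, ["develop*", "iOS"]))  # True, yet not an iOS developer
```

The match succeeds because both terms are present, even though they sit in unrelated sentences. Nothing in the query says where they have to appear relative to each other.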

Even so, you can move beyond keyword and title search and search for exact phrases, which is what many people in sourcing and recruiting refer to as natural language search.

Natural Language Search – Exact Phrases

Natural language search is essentially just what it says – searching by natural language using phrases or even full sentences.

Facebook’s Graph Search is a good example of a natural language search interface (it doesn’t even support Boolean queries), and many people try typing in full questions in sentence form into Google and Bing, which could be classified as natural language search. Furthermore, Google and Bing do a pretty good job of figuring out that you’re searching for the answer to a question when you search for sentences such as “How ____ ____ ____…,” “What is ____ ____ ____,” etc.

However, when it comes to sourcing and recruiting, most articles on the subject of natural language search focus on searching for exact phrases to find people with specific experience.

As a basic example, “worked at Google” will return results of people who mention that they have worked at Google.

Searching for “database design” will return results of people who mention that exact phrase, which in theory should be mentioned in resumes, LinkedIn profiles and web results of people who have been responsible for database design.

The problem with exact phrase semantic search is that while it is effective at finding people who express their experience in a single, highly specific way (the exact phrase), it is at the same time highly ineffective, given that there is an almost infinite number of ways a person could write sentences in which they refer to a specific experience/responsibility/skill.

When you search for an exact phrase, you are actually creating dark matter as a result of the elimination of a large number of viable results/potential candidates through exclusion. In other words, an exact phrase search excludes all relevant results of people who actually have the experience you need, but express that experience in ways other than the exact phrase you searched for.

For example, below you will find just a small sample of the many ways people actually refer to the concept of “database design” without actually using the exact phrase of “database design.” These examples are taken from actual resumes and profiles – notice the varying distance between the mentions of database from design and the variation in the order of the terms.

  • Responsible for designing, developing and maintaining the database
  • design, development and tuning Essbase database
  • Responsible for physical and conceptual design, development, testing and maintaining database applications using, SQL server
  • Skilled in database development and design
  • database management, development/design of
  • database system and design relational data model
  • Involved in the design of the Database and Developed Stored Procedures
  • Design and Implement Database using Oracle 10g
  • design DB2 Report database with metadata information
  • Responsibilities included design of the database and subsequent coding
  • database schema design
  • Involved in Architectural design for the databases and database tables
  • database and system design

You should be aware that when you perform exact phrase searching, you are actually excluding and eliminating more viable results than you are including.

The Extended Boolean NEAR Operator

Extended Boolean in the form of proximity search allows you to move beyond the significant limitations of exact phrase search and instead search for words/terms within a specific distance of others, regardless of order.
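As a rough illustration, the Python sketch below runs an exact phrase check and a simple proximity check against the thirteen “database design” samples listed earlier. The `has_phrase` and `near` helpers are hypothetical, and “within N words” is assumed to mean a difference of at most N in word position; real engines may count differently.

```python
import re

SAMPLES = [
    "Responsible for designing, developing and maintaining the database",
    "design, development and tuning Essbase database",
    "Responsible for physical and conceptual design, development, testing "
    "and maintaining database applications using, SQL server",
    "Skilled in database development and design",
    "database management, development/design of",
    "database system and design relational data model",
    "Involved in the design of the Database and Developed Stored Procedures",
    "Design and Implement Database using Oracle 10g",
    "design DB2 Report database with metadata information",
    "Responsibilities included design of the database and subsequent coding",
    "database schema design",
    "Involved in Architectural design for the databases and database tables",
    "database and system design",
]

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def has_phrase(text, phrase):
    # Exact phrase: the phrase's words must appear consecutively and in order.
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def near(text, prefix_a, prefix_b, max_distance=10):
    # Proximity: any word starting with prefix_a within max_distance
    # positions of any word starting with prefix_b, in either order.
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w.startswith(prefix_a)]
    pos_b = [i for i, w in enumerate(words) if w.startswith(prefix_b)]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

phrase_hits = sum(has_phrase(s, "database design") for s in SAMPLES)
near_hits = sum(near(s, "database", "design") for s in SAMPLES)
print(phrase_hits, "of", len(SAMPLES), "match the exact phrase")
print(near_hits, "of", len(SAMPLES), "match database* NEAR design*")
```

On these samples the exact phrase matches none of the lines, while the proximity check matches all thirteen.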

Check out this excerpt from an article by the Stanford University Natural Language Processing Group, which is a team of faculty, research scientists, postdocs, programmers and students who work together on algorithms that allow computers to process and understand human languages, including work with sentence understanding, machine translation, probabilistic parsing and tagging, grammar induction, word sense disambiguation, and automatic question answering.

Extended Boolean Stanford

Proximity search also has the benefit of drastically increasing the relevance of results while significantly decreasing false positives.

When it comes to sourcing and recruiting, this is because you are able to target sentence-level semantics, typically in the form of verb/noun combinations, enabling you to find people based on their responsibilities – what they actually do/have done, as opposed to a collection of keywords (standard free text queries).

Thankfully, some search engines support extended Boolean in the form of proximity search via the NEAR search operator.

Monster Proximity Search

Monster is the only major job board resume database that recognizes and supports the NEAR Boolean operator without limitations (some folks have discovered that CareerBuilder, although undocumented, apparently does support the NEAR operator, albeit with some limitations – hence I am not covering it here).

According to Monster’s documentation, the NEAR operator has a maximum proximity of 10 words. For example, the search string [router* NEAR config*] returns ONLY those resumes that have router/routers and configure/configured/configuring/configurations, etc. within 10 words of each other, forwards or backwards – order does not matter.

As an example of a very basic semantic search – let’s say my intent is to find tax managers.

With standard Boolean, I would be limited to searching for “Tax Manager,” “Manager of Tax,” and other titles I can think of that my target talent pool might use. If I were to not use title searches, I would be searching for tax* AND manager (and other words) as keywords, and I would literally have no control over where the words are returned in results, which would yield a high percentage of false positives (e.g., resumes in which tax* and manager, etc., are mentioned, but the people are not tax managers).

With the NEAR operator, however, we can simply ask Monster to return results in which people mention Tax within 10 words of Manager: [Tax* NEAR Manager]

That may not sound very special or powerful to you, but take a look at just one example result:

Monster NEAR Tax Manager Example

I want you to notice a few things.

First, do you see the objective? The person mentions they would like a “manager position in a corporate tax department.”

How’s that for a match to the intent of our search? That’s the power of sentence-level semantics.

Additionally, notice the last 2 titles, both containing “Manager, Tax” – most people would never have thought of that title when searching for Tax Managers. Yet, thankfully, due to the NEAR operator, we don’t actually have to think of all of the different ways a person might mention the fact that they managed the tax function for a company. As long as the two terms are within 10 words of each other, regardless of order, they will be returned.

This isn’t trivial – it’s quite powerful, and ends up returning Dark Matter results that would otherwise be excluded by standard searches, like this one:

Monster NEAR tax manager example 2

Of course, we can accomplish more than title searching using Monster’s NEAR operator.

Perhaps the most powerful use of NEAR is to search for specific responsibilities, which can be broken down into verbs and nouns.

For example, if we’re looking for people who have developed mobile apps, specifically with iOS – develop* is the verb/responsibility and iOS and mobile are the nouns, which could be expanded conceptually to include iPhone*/iPad*, etc.

On Monster, you could run this in a search with everything else you’d be looking for: [develop* NEAR (mobile OR iOS OR iPhone* OR iPad*)], and you would return results like this one:

Monster NEAR operator iOS mobile developer

While the result may look like any other result you might get from a standard Boolean search, it was returned because we required Monster to only return results in which at least one word beginning with the root develop* appears within 10 words of mobile, iOS, iPhone or iPad.

It doesn’t take a genius to see this person has been primarily responsible for developing mobile apps, including iOS apps – which is actually exactly what we asked for by specifying that Monster only return results with the verb/responsibility of development in close proximity to what we want people to have experience developing (iOS platform).
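The logic behind a verb NEAR (noun OR noun OR noun) query can be sketched in a few lines of Python. This is only an approximation of the behavior described above, not Monster’s implementation; the helper names and both sample texts are invented, and distance is assumed to be the difference in word position.

```python
import re

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(words, group):
    # Indexes of words matching ANY term in the group (the OR part).
    # A trailing * on a term means prefix match.
    hits = []
    for i, word in enumerate(words):
        for term in group:
            t = term.lower()
            if (t.endswith("*") and word.startswith(t[:-1])) or word == t:
                hits.append(i)
                break
    return hits

def near_groups(text, left, right, max_distance=10):
    # True when some term from `left` is within max_distance words
    # of some term from `right`, regardless of order.
    words = tokens(text)
    return any(
        abs(i - j) <= max_distance
        for i in positions(words, left)
        for j in positions(words, right)
    )

VERBS = ["develop*"]
NOUNS = ["mobile", "iOS", "iPhone*", "iPad*"]

# Hypothetical profile text: verb and nouns in the same sentence.
match = "Lead engineer. Designed and developed three iPhone and iPad apps in Objective-C."
# Hypothetical profile text: same keywords, but far apart.
miss = (
    "Developed billing reports in COBOL for twelve years before moving into "
    "project management and vendor relations. Owns an iPhone."
)

print(near_groups(match, VERBS, NOUNS))  # True
print(near_groups(miss, VERBS, NOUNS))   # False
```

Both texts would satisfy a plain develop* AND iPhone* query. Only the first satisfies the proximity condition.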

There is almost no limit to the power of the NEAR operator to achieve semantic search – all you have to do is learn how to translate your needs into responsibilities (verbs – what people have done/are doing) and nouns (what they’ve done/are doing it with).

For example, let’s say that among other things, we need people who have experience with SAP implementations.

That’s as simple as [SAP NEAR implement*]

Monster NEAR SAP implementations

We can even be more precise if we needed to – such as looking for people with SAP implementation and specific module configuration experience (let’s say SD).

No problem – just ask for it: [SAP NEAR implement* AND config* NEAR SD]

Monster NEAR SAP implementation SD configuration

Hopefully the power of semantic search at the sentence level via the extended Boolean NEAR operator is quite clear.

When you can search for specific responsibilities (verbs) within close proximity to specific nouns, you can find people who have the exact experience you’re looking for, rather than hoping to, which is the case with standard Boolean search, where your search terms must be present, but in no particular proximity to any other term(s).

For more information and examples of level 3 semantic search, you can review my Slideshare presentation on semantic search from January 2012.

All is not perfect with the NEAR operator, however. Sometimes, results can be returned where the 10 word distance of the NEAR condition bleeds over from one sentence to the next, breaking the sentence level meaning we’re targeting.

For example, below you can see SD mentioned within 10 words of “configured” but they are in 2 separate sentences, where this person is talking about configuring SAP for ALE/IDoc processing, not configuring SD.

Monster NEAR operator SAP SD example of sentence bleed

It would be magical if we could tell the search engine to only return results where the verbs and nouns we’re looking for are in the same sentence, but to the best of my knowledge, no one has developed that capability yet.
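If you already have the text of a result in hand (copied from a resume, for example), you can approximate a same-sentence check yourself after the search. The Python sketch below is a hypothetical post-filter, not a search engine feature; the sentence splitter is deliberately naive and the sample text is invented to mimic the sentence bleed described above.

```python
import re

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    # Abbreviations and bullet fragments will confuse it.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def hit(word, term):
    # A trailing * means prefix match; otherwise the word must equal the term.
    term = term.lower()
    return word.startswith(term[:-1]) if term.endswith("*") else word == term

def near(text, term_a, term_b, max_distance=10, same_sentence=False):
    # With same_sentence=True, each sentence is checked on its own, so a
    # match can no longer span a sentence boundary.
    chunks = split_sentences(text) if same_sentence else [text]
    for chunk in chunks:
        words = re.findall(r"[a-z0-9]+", chunk.lower())
        pos_a = [i for i, w in enumerate(words) if hit(w, term_a)]
        pos_b = [i for i, w in enumerate(words) if hit(w, term_b)]
        if any(abs(i - j) <= max_distance for i in pos_a for j in pos_b):
            return True
    return False

# Invented example of sentence bleed: "configured" and "SD" are close,
# but they belong to different sentences.
bleed = "Configured SAP for ALE/IDoc processing. Supported the SD team during go-live."

print(near(bleed, "config*", "SD"))                      # True
print(near(bleed, "config*", "SD", same_sentence=True))  # False
```

This only helps with results you have already retrieved. It does nothing to change what the search engine returns in the first place.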

Extended Boolean Search with Exalead, Bing, and Google

I realize that not everyone has access to Monster, so let’s take a look at three Internet search engines that claim to support proximity search – Exalead, Bing and Google.

Exalead Proximity Search

Exalead is a decent-sized Internet search engine that claims to support proximity searching via NEXT/NEAR operators (NEXT not really being all that helpful beyond quasi-phrase searching).

I say “decent-sized” because it’s not a major search engine in my opinion – mostly because it does not appear to index nearly as many pages as Google or Bing, and this is especially and painfully evident when you do back to back searches using the same LinkedIn X-Ray search comparing Exalead to Google or Bing.

While Exalead’s own documentation refers to fixed proximity of 16 words, Exalead does appear to support configurable proximity functionality (NEAR/X).

It can be quite difficult to tell if a search engine is actually adhering to proximity search parameters – this is because many search engines automatically weight results based on the proximity of search terms.

Some search results – especially those ranked highest (page 1, etc.) – will appear on the surface to match the specified proximity (e.g., NEAR/15). As such, you have to inspect quite a few search results – I recommend skipping at least to page 10 (or the last page of results if less than 10 pages) to look for any evidence that the proximity distance isn’t being adhered to.
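When inspecting results by hand gets tedious, a small script can measure the gap for you. The Python sketch below assumes you have pasted the text of each result into a list yourself; `min_distance` and `audit` are hypothetical helpers, the sample texts are invented, and distance is taken as the difference in word position.

```python
import re

def min_distance(text, term_a, term_b):
    # Smallest word gap between any occurrence of the two terms,
    # or None when either term is missing from the text.
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    gaps = [abs(i - j) for i in pos_a for j in pos_b]
    return min(gaps) if gaps else None

def audit(results, term_a, term_b, max_distance):
    # Return the results that never satisfy the proximity the query asked for.
    violations = []
    for text in results:
        gap = min_distance(text, term_a, term_b)
        if gap is None or gap > max_distance:
            violations.append(text)
    return violations

# Invented result snippets for illustration.
results = [
    "OpenGL plugin for Maya",
    "Built shaders in OpenGL and exported models from Maya.",
    "Maya animator",
]

for text in audit(results, "OpenGL", "Maya", max_distance=4):
    print("Outside NEAR/4:", text)
```

Any result this flags is worth a closer look, since it suggests the engine returned a page without honoring the distance you specified.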

Let’s take a look at an Exalead X-Ray of LinkedIn searching for people who mention (iOS OR iPhone OR iPad) NEAR (develop OR developed OR developing OR developer OR development):

(site:www.linkedin.com/in OR site:www.linkedin.com/pub) (iOS OR iPhone OR iPad) NEAR (develop OR developed OR developing OR developer OR development) location NEAR “san francisco bay area” 

Here is an example result – you can see several instances in which there is obvious proximity:

Exalead iOS proximity search results 1

Tying proximity search back to achieving semantic search – the intent of the query was to find people who have iOS/iPhone/iPad development – and as you can see, the people have the target experience due to specifying that the search results must have some mention of development (verb/responsibility) within close proximity of what we wanted people to have development experience with (noun/target skill).

If you have doubts as to whether or not a search engine really supports proximity search, you can look for terms that are not likely to be commonly mentioned in close proximity.

For example:

(site:www.linkedin.com/in OR site:www.linkedin.com/pub) portlet NEAR redesign

There are only 2 results returned by that search, and both have “portlet” less than 10 words from redesign.

If you remove the NEAR command, you get 28 results, which serves as further evidence.

Now, to check to see if Exalead really supports configurable proximity, I’ll use another LinkedIn X-Ray search, first looking for 2 terms without proximity, then trying to search for the two terms just using the NEAR operator, then using the NEAR operator to attempt to search for the 2 terms within 4 terms of each other (NEAR/4):

(site:www.linkedin.com/in OR site:www.linkedin.com/pub) OpenGL Maya – 409 results

(site:www.linkedin.com/in OR site:www.linkedin.com/pub) OpenGL NEAR Maya – 195 results

(site:www.linkedin.com/in OR site:www.linkedin.com/pub) OpenGL NEAR/4 Maya – 71 results

The fact that the result counts get progressively smaller is a good indicator, as it is logical that they should when going from no specified proximity, to broad proximity, to tight proximity.

I also inspected quite a few results, and the ones I checked all appeared to have at least 1 instance in which OpenGL was mentioned within 4 words of Maya. However, I also found 1 result like this result, in which NEAR is actually highlighted, which raises a few questions, doesn’t it?

Bing Proximity Search

Bing used to support configurable proximity search brilliantly.

I’ve written several posts about it in the past, including this one with examples of LinkedIn X-Ray searches, resume searches and Twitter bio searches. In that post, you can see evidence of just how effective Bing’s proximity search used to be.

Bing Proximity Search Used to Work

However, if you click any of the links from that post or try to leverage NEAR:X in your own Bing searches, you will see the functionality is obviously no longer supported by Bing.

Sigh.

To offer a quick comparison to Exalead, look at the differences in the search results for these two very basic proximity searches of the entire Internet:

Exalead: OpenGL NEAR/5 maya – 3,521 results

Bing: OpenGL NEAR:5 maya – 8 results

The fact that the Bing search only returns 8 results suggests something is obviously going on, but I can’t tell exactly what. I have my theories, but I won’t bore you with them.

Google Proximity Search

Google supposedly supports proximity search with the undocumented AROUND(x) operator and functionality.

Some of you might recall the stir this post written by a Google employee caused in the sourcing and recruiting community:

Google Around Proximity Operator

Of course, this got everyone excited – including me – but as soon as I started testing Google’s proximity search functionality using AROUND(x), I found results that did not have any instance in which the specified proximity of terms was met.

Using the same LinkedIn X-Ray search I’ve demonstrated earlier, here is what it would look like on Google:

(site:www.linkedin.com/in OR site:www.linkedin.com/pub) OpenGL AROUND(4) Maya – 714 results

A cursory glance yields the perception that the 4 word proximity between OpenGL and Maya is being adhered to…

Google Around Proximity Operator OpenGL Maya 1

However, examining some of the results from page 1 alone shows that there are some results that do not have any instance in which OpenGL is within 4 words of Maya:

Google Around Proximity Operator OpenGL Maya 2

Finding examples of Google’s AROUND(x) failing to work isn’t limited to LinkedIn X-Ray searching – even the 4th result on page 1 of one of the example searches from the original post claiming Google has always supported AROUND(x) functionality (“jerry brown” AROUND(9) “tea party”) doesn’t have any instance in which “jerry brown” is within 9 words of “tea party.”

I’m quite disappointed.
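If you want to repeat these side-by-side tests yourself, the three proximity syntaxes used in this post (NEAR/x, NEAR:x and AROUND(x)) can be generated from one small helper. The Python sketch below only builds the query strings. The `proximity_query` function and its parameters are hypothetical, and producing a string says nothing about whether a given engine actually honors it.

```python
# LinkedIn X-Ray prefix as used in the searches above.
XRAY = "(site:www.linkedin.com/in OR site:www.linkedin.com/pub)"

# Proximity syntax per engine, exactly as written in this post.
PROXIMITY_SYNTAX = {
    "exalead": "{a} NEAR/{n} {b}",
    "bing": "{a} NEAR:{n} {b}",
    "google": "{a} AROUND({n}) {b}",
}

def proximity_query(engine, a, b, n, prefix=XRAY):
    # Build a query string; pass prefix="" to search without the X-Ray part.
    clause = PROXIMITY_SYNTAX[engine].format(a=a, b=b, n=n)
    return f"{prefix} {clause}".strip()

for engine in PROXIMITY_SYNTAX:
    print(engine, "->", proximity_query(engine, "OpenGL", "Maya", 4))
```

Running the same pair of terms through each engine and comparing result counts is a quick way to spot an operator that is being ignored.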

Final Thoughts

Natural language search in the form of targeting exact phrases definitely works, but also excludes many viable results (real people, btw!) due to the fact that people can express their experience in an almost infinite number of ways at the sentence level.

Being able to control how close words are mentioned to each other via the NEAR operator enables us to achieve a more powerful level of semantic search – tapping into sentence structure (verbs & nouns) and thus the power of meaning in language beyond simple and fixed phrases.

With extended Boolean search, instead of throwing a bunch of keywords together and having to sift through large volumes of irrelevant and false positive results, we can harness semantics to find people based on what they have experience doing, not just based on what words they happen to include somewhere in their resume or LinkedIn profile.

Kudos to Monster for being the only major online job board to properly support the NEAR operator, and props to Exalead for not only supporting NEAR, but going a step further and supporting configurable proximity via NEAR/x.

While I am disappointed that Google and Bing don’t fully support proximity search – Google’s AROUND(x) doesn’t work and the asterisk is poor-man’s proximity, and Bing dropped configurable proximity search that was working well – I also realize both search engines were created for the most common use case/average searcher instead of power users/searchers.

Be aware that some ATSs do support proximity search – iCIMS supports the NEAR operator, and Bullhorn and PCRecruiter both support configurable proximity search (because both use Lucene) – check the documentation and/or ask your support rep for more info on supported search functionality and proper syntax.

As always – happy hunting!