Beyond Boolean Search: Proximity and Weighting

Beyond Basic Boolean

Most sourcing, recruiting, and staffing professionals are familiar with the basic Boolean operators of AND, OR, and NOT. However, I have found that few are familiar with what some refer to as “extended” Boolean functionality, such as proximity search and term weighting.

Proximity and term weighting, where supported, are not actually logical (Boolean) operators – they are more accurately referred to as text or content operators.

Whatever you call them – extended Boolean or text operators – they offer sourcers and recruiters significantly more control, power and precision when executing searches, and in the hands of an expert, they can enable semantic search.

Relevance is Everything!

When it comes to search – relevance rules.

Ultimately, any sourcing or recruiting professional knows that what’s most critical in running Boolean searches on LinkedIn, the Internet, a job board, or in an internal resume database is getting relevant results.

However, few people talk about exactly what determines relevance – and I think I know why.

According to Wikipedia, “relevance” denotes how well a retrieved set of documents (or a single document) meets the information need of the user.

The problem is that no search engine, social networking site, or database can “know” what is relevant to you – only you can determine how relevant results are because only you know what you were looking for in the first place!

For sourcing and recruiting, relevant results are typically defined as resumes or profiles of (or information about) potential candidates whose experience and capabilities closely match the hiring profile or job opening that the sourcer or recruiter is trying to find candidates for.

I’d argue that the value of any source of information (LinkedIn, a resume database, the Internet, etc.) lies less in the information contained within, and more in the ability of a user to extract precisely and completely what they need – finding and retrieving any and all appropriately qualified candidates.

Information has no value to you if you are unable to find it and take action on it.

So how can extended Boolean help sourcers and recruiters find more relevant results?

Let’s take a look at proximity first.

Proximity Search

Proximity search functionality enables a user to search for specific terms that are mentioned within a certain distance of other specific terms.

Being able to control how close search terms are to each other can be especially helpful when leveraging the structure of certain websites and pages – I’ll demonstrate this later in the post using LinkedIn and Twitter as examples.

In my opinion, the more powerful application of proximity search lies in the ability to perform natural language or semantic search.

Semantic search uses the science of meaning in language to produce highly relevant search results rather than have a user sort through a list of loosely related keyword results. Words that are close together are often in the same sentence, and when you can search for meaning at the sentence level, you can target people based on what they actually do/what their responsibilities have been.

Being able to target sentences in which people detail their specific responsibilities and level of responsibility is absurdly more powerful than basic keyword search (Level 1 Talent Mining), which is prone to low levels of relevance and false positives.

There are 3 main types of proximity searching: fixed proximity, variable (configurable) proximity, and adjacency. For the purposes of this post – I will focus only on fixed and variable proximity.

Fixed Proximity Search

Fixed proximity is most commonly represented by the NEAR operator. The search engines that do recognize and support the NEAR operator typically define NEAR proximity as within 1 to 10 words (specific search engines can differ – check their documentation). Monster’s resume database supports the NEAR operator (which doesn’t have to be capitalized, btw) at a fixed distance of up to 10 words.

How could you leverage fixed proximity to find more relevant search results?

If you were looking for a Windows and Exchange administrator, any basic keyword and title search can pull tons of resumes that mention all of the search terms, including a high percentage of false positives. False positives in this example would be resumes that mention all of the search terms and titles, but belong to people who have never been primarily responsible for administering Windows and Exchange servers. A helpdesk professional with 1 year of experience can show up in these results because all they have to do is mention the keywords somewhere in their resume.

Leveraging fixed proximity, you could craft this (purposefully basic) search using the NEAR operator: Windows and Exchange NEAR admin* and server*.

That search will ONLY return results of resumes/profiles that mention Exchange within 1 to 10 words of any word starting with the root of admin (administrator, administration, administer, administered, etc.).

Being able to control the fact that Exchange MUST be mentioned within close proximity to admin* will dramatically affect and improve the relevance of the search results, typically returning results of candidates who either have a title using both terms and/or candidates that talk about being responsible for Exchange administration.

Here are some examples of sentences from results that demonstrate the variety of relevant results that can be retrieved with the above search:
  • Managed & administered more than 300 Exchange Servers
  • Provisioned & administer multiple Exchange 5.5/2003 servers
  • Not only are there administration duties for Exchange and Blackberry…
  • Exchange/RightFax administrator
  • Installing, Configuring, and Administering Microsoft Exchange 2000 Server
  • Administer a Microsoft Exchange 2003/2007 environment
  • 8+ years of expertise as a System Administrator in Windows 2003 family, Windows 2000 family, MS Exchange 5.5, MS Exchange 2000, and Exchange 2003
  • I am proficient with the following skills; planning, installation and administration of Windows Active Directory, Windows Servers, Exchange Server
  • Windows Server Support, Active Directory,Exchange Server 2000, 2003 administration and Blackberry Server administration
  • Administer Exchange 2003 Server for corporate email

As you can see, being able to control the proximity of specific search terms essentially increases the likelihood of returning results of candidates who have had administrative responsibility for Exchange servers, effectively increasing the relevance of the results, because that’s what we were actually trying to find and identify!
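To make the idea concrete, here is a minimal Python sketch of what a NEAR-style match does. It is only an illustration under my own assumptions, not how Monster or any other engine implements it: the function names are made up, distance is measured as the difference in word positions, and a trailing * is treated as a simple "starts with" wildcard. Real engines may count words differently.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(token, term):
    """True if token equals term; a trailing * matches any word with that root."""
    term = term.lower()
    if term.endswith("*"):
        return token.startswith(term[:-1])
    return token == term

def near(text, term_a, term_b, max_distance=10):
    """True if the two terms occur within max_distance words, in either order."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if matches(t, term_a)]
    pos_b = [i for i, t in enumerate(tokens) if matches(t, term_b)]
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)

# One sentence quoted above, plus a made-up helpdesk resume for contrast.
responsible = "Managed & administered more than 300 Exchange Servers"
helpdesk = (
    "Reset passwords for Exchange mailboxes. "
    "Answered phones, logged tickets, and replaced toner for walk-in users "
    "on every floor of the building. Reported to the network administrator."
)

print(near(responsible, "Exchange", "admin*"))  # True: 4 words apart
print(near(helpdesk, "Exchange", "admin*"))     # False: 23 words apart
```

Both texts contain the keywords, so a plain AND search would return both. Only the first one mentions them close enough together to suggest actual responsibility for Exchange administration.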

Configurable Proximity

A search engine that supports configurable proximity affords users the ability to precisely control the distance between specific search terms.

This can produce even more relevant results than the NEAR operator, because the NEAR operator’s maximum range of 10 can allow for some non-relevant results to be returned. The farther words are mentioned apart from each other, the less likely it is that they are semantically related. In fact, at a distance over 10 words, each word could easily be mentioned in separate bullet points or in separate sentences on a resume and be completely unrelated.

However, with configurable proximity, a sourcer or recruiter can choose the maximum distance between search terms.

Instead of being limited to a distance of 10 or fewer words, a search engine that allows for configurable proximity allows you to create searches that force terms to be quite close together – as close as you like.

For example, you could choose to search for only people who mention Exchange within 5 words of any word starting with the root of admin (administrator, administration, administer, administered, etc.), regardless of order. A maximum distance of 5 words will dramatically increase the relevance of the search results because mentioning those 2 search terms at such a close range makes it more likely that they are mentioned in the same bullet point or sentence and thus more likely to be semantically related.

Essentially, this search will only return results of people who specifically mention something about being responsible for administering Exchange at least once in their resume. By employing this kind of search, a sourcer is actually performing a semantic search, targeting sentence-level meaning, as they are looking specifically for people who talk about having a particular responsibility – not just looking for documents that happen to contain the search terms.
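The difference between a fixed range of 10 and a configurable range of 5 can be sketched in a few lines of Python. This is a rough illustration under my own assumptions (hypothetical function names, distance counted as the gap in word positions, a trailing * as a prefix wildcard), and the second resume snippet is invented for the example.

```python
import re

def word_positions(text, term):
    """Positions of words matching term; a trailing * matches any ending."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    term = term.lower()
    if term.endswith("*"):
        return [i for i, w in enumerate(words) if w.startswith(term[:-1])]
    return [i for i, w in enumerate(words) if w == term]

def min_distance(text, term_a, term_b):
    """Smallest gap in words between the two terms, or None if one is missing."""
    pos_a = word_positions(text, term_a)
    pos_b = word_positions(text, term_b)
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)

def within(text, term_a, term_b, max_distance):
    """True if the terms appear within max_distance words of each other."""
    gap = min_distance(text, term_a, term_b)
    return gap is not None and gap <= max_distance

same_sentence = "Administer a Microsoft Exchange 2003/2007 environment"
separate_bullets = (
    "Administered Windows 2003 file servers. "
    "Migrated user mailboxes to Exchange."
)

print(min_distance(same_sentence, "Exchange", "admin*"))     # 3
print(min_distance(separate_bullets, "Exchange", "admin*"))  # 9
print(within(separate_bullets, "Exchange", "admin*", 10))    # True
print(within(separate_bullets, "Exchange", "admin*", 5))     # False
```

At a range of 10, the second snippet matches even though the two terms sit in different sentences. Tightening the range to 5 drops it and keeps the sentence where the terms are actually related.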

Leveraging Website and Page Structure with Proximity Search

Once you have noticed a consistent pattern to the structure of certain websites and pages, you can use Internet search engines that support proximity search to target the distance between search terms to yield highly relevant search results.

Although Google supposedly supports proximity search with their undocumented AROUND(x) search operator, I have found its reliability to be suspect. Perhaps that’s why it’s not officially documented? :-)

The good news is that Bing’s configurable proximity search functionality of NEAR:x seems to work quite well and consistently.

To leverage the structure of certain websites such as LinkedIn, here is a quick example of how you can target current titles and companies when using Bing.

site:linkedin.com current near:3 “engineer at Google” “san francisco bay area”

In this query, all of the results must have the phrase “engineer at Google” within 3 words of the word “Current,” which is on every LinkedIn profile.

If you click on any of the cached results, you can see how Bing happily returned results of people who have the phrase “engineer at Google” in their current title field:


With Bing’s NEAR:x functionality, it is remarkably simple to X-Ray Twitter and target people in specific locations who mention specific titles and/or skill terms in their bios.
For example, let’s say you wanted to find Twitter profiles of user experience professionals who live in the New York area. You could run a search like this on Bing to force the search engine to return only results that mention UX within 15 words of “Bio” and “New York” within 3 words of “Location:”

site:twitter.com bio near:15 UX location near:3 new york

You can see how Bing’s proximity search helps you target terms in Twitter bios and location text:

Viewing a cached result shows how flawlessly Bing’s NEAR:x executes:

How’s that for a relevant result?

Basically as good as it gets – I wanted someone who lives in the NY area who is a User Experience professional, and that’s exactly what I got! That is relevance!

Of course, when searching Twitter, it is especially important to realize that people can be very creative in how they may describe themselves (titles, skills, etc.), their experience, and their location – they can enter whatever they want.

As such, you could not find the above Twitter bio by searching only for “Drupal.”

Performing Semantic Search with Configurable Proximity

You can perform basic semantic search by targeting sentence-level meaning using Bing’s support of configurable proximity.

For example, let’s say you were searching for resumes on the Internet and wanted to find people who have had a specific responsibility, such as configuring juniper routers.

You could run a basic search like this: (inurl:resume OR intitle:resume) configuring near:5 juniper juniper near:5 routers

And see results like this:

Of course, there are many different ways to run that search – I only wanted to demonstrate the power of being able to control how close search terms are to each other, especially when targeting responsibilities, typically stated in verb/noun combinations. This allows you to perform semantic search at the sentence level.
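If you build a lot of these searches, a small helper can assemble the proximity clauses for you. The Python sketch below only formats query strings in the near:x form used in this post; the function names are hypothetical, and it does not call Bing or verify that the operator behaves as described.

```python
def near_clause(left, right, distance):
    """Format one proximity clause in the near:x form used in this post."""
    return f"{left} near:{distance} {right}"

def chained_near(words, distance):
    """Link each word to the next one with a proximity clause."""
    pairs = zip(words, words[1:])
    return " ".join(near_clause(a, b, distance) for a, b in pairs)

def resume_query(words, distance=5):
    """Prefix the chained clauses with a basic resume filter."""
    return f"(inurl:resume OR intitle:resume) {chained_near(words, distance)}"

# Rebuilds the Juniper search shown above.
print(resume_query(["configuring", "juniper", "routers"]))

# The same helper covers the LinkedIn-style search from earlier in the post.
linkedin = " ".join([
    "site:linkedin.com",
    near_clause("current", '"engineer at Google"', 3),
    '"san francisco bay area"',
])
print(linkedin)
```

Swapping in other verb/noun combinations (installing, Cisco, switches, and so on) is then just a matter of changing the word list.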

Now that we’ve played around a bit with proximity search, let’s move on to the other half of extended Boolean – variable term weighting.

Variable Term Weighting

Talented sourcers and recruiters know that not all terms are equally important in a query.

In most queries and searches, certain search terms are more important than others. When running standard Boolean queries, all search terms are considered/weighted equally – and this is the stone that the makers of so-called semantic search applications often throw at Boolean search.

Unfortunately, many search engines and database search interfaces simply assign relevance to results by the number of search term “hits” in each document. In most cases, the simple frequency of search terms does not correlate to relevant results. This is where the derisive description “buzzword bingo” comes from, most often used to suggest that there is little skill involved in running Boolean searches that simply count matched keywords.

Using an information technology hiring profile as an example – if a sourcer were looking for candidates who have significant experience administering Windows servers and Exchange email servers, they might create a simple Boolean query such as this: Windows AND Exchange AND server* AND admin*.

That search is highly likely to rank Windows systems administrators who mention Windows many times in their resume/profile, and happen to mention Exchange once or twice, as highly relevant. The reason is the number of “hits” for Windows, which is by nature a very common term in resumes.

This would leave the sourcer with having to sort through a large volume of false positive results (that contain the keywords, but are not of people who have been primarily responsible for administering Windows and Exchange servers) to find the candidates who actually have been primarily responsible for administering Exchange servers as well as Windows servers.

Search engines that offer users the ability to assign different weights to each search term enable sourcers and recruiters to move beyond simple buzzword matching and take control of the relevance of the results. Essentially, with variable term weighting you can assign a number value to words to increase their weight when ranking retrieved documents – which changes not the total number of results, but the ORDER of the results.

Using the same example as above, a sourcer using a search engine that supports variable term weighting could create a Boolean search string to more heavily weight the term “Exchange.” That Boolean query would pull the same number of results as the first search that had no term weighting – however, it would sort and rank the results heavily favoring resumes/profiles that mention Exchange more often in relation to the other search terms, increasing the likelihood that the sourcer can quickly identify candidates who have had experience being responsible for administering and supporting Exchange servers.

By employing variable term weighting, you can positively affect the relevance of the search results.
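Here is a rough Python sketch of the Windows and Exchange example, ranking the same matches with and without extra weight on “Exchange.” The scoring formula (weight times hit count, summed across terms), the function names, and the three tiny resumes are all my own assumptions for illustration; real search engines use their own ranking formulas and their own weighting syntax.

```python
import re

def count_hits(text, term):
    """Count words matching term; a trailing * matches any ending."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    term = term.lower()
    if term.endswith("*"):
        return sum(w.startswith(term[:-1]) for w in words)
    return sum(w == term for w in words)

def score(text, weights):
    """Sum of (weight x hit count) across all search terms."""
    return sum(w * count_hits(text, term) for term, w in weights.items())

def rank(docs, weights):
    """Keep documents containing every term (AND), best score first."""
    matched = [name for name, text in docs.items()
               if all(count_hits(text, term) > 0 for term in weights)]
    return sorted(matched, key=lambda name: score(docs[name], weights),
                  reverse=True)

# Invented resume snippets, for illustration only.
docs = {
    "windows_admin": ("Windows server administrator. Built Windows images, "
                      "patched Windows servers, scripted Windows installs. "
                      "Occasionally reset Exchange passwords."),
    "exchange_admin": ("Administered Exchange servers on Windows. Planned "
                       "Exchange migrations and tuned Exchange mail flow."),
    "helpdesk": "Helpdesk support for Windows desktops.",
}

equal = {"windows": 1, "exchange": 1, "server*": 1, "admin*": 1}
weighted = {"windows": 1, "exchange": 5, "server*": 1, "admin*": 1}

print(rank(docs, equal))     # ['windows_admin', 'exchange_admin']
print(rank(docs, weighted))  # ['exchange_admin', 'windows_admin']
```

Both searches return the same two resumes. The only thing the weighting changes is which one lands at the top.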

Final Thoughts

Hopefully I’ve shed some light on how being able to control the proximity of two search terms can yield results that are FAR more relevant than results that simply mention the two terms anywhere in a document or form – this is the critical difference between the semantic similarity between a search and its results vs. the lexical similarity between a search and its results.

There are countless ways you can apply extended Boolean functionality such as variable term weighting and proximity searching to nearly any industry/hiring profile to create searches that return highly relevant results – results that are more relevant than those that can be achieved with standard Boolean logic.

Using a search engine that supports both variable proximity and variable term weighting can empower sourcers and recruiters to quickly find large volumes of highly relevant results, increasing productivity and achieving Just-In-Time sourcing and recruiting.

I wish the makers of search engines would seek less to “dumb down” search interfaces and functionality and incorporate more powerful search capability that allows users to take significant control over the relevance of their search results.

There are a few search engines and ATS/CRM systems that support both configurable proximity search and variable term weighting.

Does yours?

About Glen Cathey

Glen Cathey is a sourcing and recruiting thought leader with over 16 years of experience working in large staffing agency and global RPO environments (>1,000 recruiters and nearly 100,000 hires annually). Starting out his career as a top producing recruiter, he quickly advanced into senior management roles and now currently serves as the SVP of Strategic Talent Acquisition and Innovation for Kforce, working out of their renowned National Recruiting Center with over 300 recruiters. Often requested to speak on sourcing and recruiting best practices, trends and strategies, Glen has traveled internationally to present at many talent acquisition conferences (5X LinkedIn Talent Connect - U.S. '10, '11, '12, Toronto '12, London '12, 2X Australasian Talent Conference - Sydney & Melbourne '11, '12, 6X SourceCon, 2X TruLondon, 2X HCI) and is regularly requested to present to companies (e.g., PwC, Deloitte, Intel, Booz Allen Hamilton, Citigroup, etc.). This blog is his personal passion and does not represent the views or opinions of anyone other than himself.

  • Eric Sperry

    I can tell this is going to be very helpful in my sourcing, as you addressed several problems I come up against almost constantly when only using basic Boolean.  Great article!

  • http://twitter.com/JenniferLenifer Jennifer D

    Fantastic article! Thank you!

  • Traci

    Great info! Can you tell me the names of the ATS systems that support this because I know that mine definitely does not!

  • http://amitaigivertz.com Amitai Givertz

    Glen,

    Although Google supposedly supports proximity search with their undocumented AROUND(x) search operator, I have found its reliability to be suspect.

    This is actually a really useful operator whose real value is often misunderstood. Its primary advantage over other ways of proximity search is that it will produce results where the order of the SERP keywords is not affected by the order within the query.

    Look at this…:

    cisco * * * area network

    …compared to:

    cisco AROUND(3) area network

    In the first example the keywords appear in their query sequence: “cisco” then “area” then “network.” But with AROUND(3) the weighting of results doesn’t appear to be based on the keywords themselves but their relative proximity. So “cisco” could be the third or second keyword in order as is the case more often than not.

    It appears that the greater the number used with the AROUND(x) operator the better it works for semantic searching. Consider, the closer keywords appear to one another in their weighted order [as in cisco * * * area network] the greater the accuracy relative to relevance. When keywords occur with a greater number of other words separating them, the greater the accuracy relative to context.

    For example, compare…

    cisco * * * area network router

    …with:

    cisco AROUND(9) area network router

    You need the additional keyword[s] to create the effect we are looking for, otherwise the number of results balloons to a useless number. Also, to see the effect you need to view the cached page. The highlighted keywords are handy to zone in on the actual results. Also, the actual results likely fall outside the range of Google’s snippet. Maybe that’s why the results looked suspect to you.

    If you play with this operator you start to see its potential for producing unique types of result.

    Try this and you’ll see what I mean:

    (cisco AROUND(5) area network) AROUND(5) david

    Now, how would you express that with Google’s “documented” operators?

    Have fun!

  • paulduplantis

    You didn’t provide a sample for variable weighting a keyword string that I see. Could you provide an example? Would Bing and sites such as Dice, CB and Monster support this?
