Boolean Strings, Semantic and Natural Language Search – Oh My!

An entertaining blog post by Matt Charney was recently brought to my attention in which he tells the world to shut up and stop talking about Boolean strings – he argues that Boolean search is a dying art and that “investing time or energy into becoming a master at Boolean is a lot like learning the fine art of calligraphy or opening a Delorean dealership.”

You can read the snippet regarding Boolean Strings below – click the image to be taken to the entire post, in which Matt addresses mobile recruiting and employer branding.

Matt Charney Boolean Strings

I enjoyed Matt’s post and his approach, but I did not find his arguments to be thoroughly sound – although I suspect he wasn’t trying to make them so (after all, his blog is titled “Snark Attack”).

I’m going to take the opportunity to address the points Matt raised. It’s not because I am trying to stay “relevant,” as some might suggest (my blog is a not-for-profit personal passion and I don’t consult/train for a fee), and it’s not because I have a vested interest in “keeping Boolean search alive” (because I really don’t). Rather, it’s because I am still amazed by the fundamental lack of understanding of search and information retrieval, both “manual” Boolean search and “automated” taxonomy-driven and/or AI-powered semantic search. I am constantly trying to help people not only understand both, but also appreciate their intrinsic limitations and separate reality from hype.

So, without further ado:

#1 Searching for anything (people, information, etc.) isn’t about Boolean strings – it’s about information retrieval

Boolean logic just happens to be the easiest way to construct a query, which, when it comes to sourcing talent, is essentially asking a system to return results of people who have a high probability of being qualified for and interested in your current and/or future opportunities, regardless of job seeking status. If you’re not writing queries with the specific intent to predictively control the probability of match beyond keywords, I’d argue you’re not approaching talent sourcing from the right perspective, which will drastically affect the efficacy of your searches.

If you haven’t already read my Search: Beyond Boolean article, I strongly advise you to do so.

I do not believe sourcers and recruiters need to become masters of “Boolean search” – I believe they need to develop a mastery of information retrieval, which involves knowing how to use sites and systems to find people who are highly likely to be excellent matches for a company’s talent needs, as well as people who are highly likely to know and can recommend/refer people from the target talent pool – regardless of the search interface, syntax or solution.

How can you expect to get good answers (results/people) when you don’t know how to ask the right questions (construct effective queries)?

#2 Boolean logic is the easy part of search strings

Boolean logic is this simple:

Boolean Honey Badger

As I am fond of telling folks, my daughter learned Boolean logic in kindergarten – public school, no less!
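For anyone who prefers code to pictures, here is a minimal sketch of the three basic operators expressed as set operations in Python. The profiles and the `matching` helper are invented purely for illustration and do not reflect how any particular site or database implements search.

```python
# Hypothetical mini "database": profile id -> profile text (invented data).
PROFILES = {
    1: "java developer with spring experience",
    2: "python developer building django apps",
    3: "java architect and team manager",
    4: "recruiter sourcing java and python talent",
}


def matching(term):
    """Return the set of profile ids whose text contains the term as a whole word."""
    return {pid for pid, text in PROFILES.items() if term in text.split()}


both = matching("java") & matching("developer")      # java AND developer
either = matching("java") | matching("python")       # java OR python
excluded = matching("java") - matching("recruiter")  # java NOT recruiter

print(both, either, excluded)
```

AND narrows the results, OR widens them, and NOT removes what you don’t want. That really is all there is to the logic itself.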

The “hard” part of any search for talent, regardless of whether or not Boolean operators are used, is thinking and formulating and testing and continually improving searches that are maximally inclusive and that return people who have a high probability of being qualified for, interested in and a great cultural fit for the opportunity being sourced/recruited for, regardless of a person’s job-seeking status.

Surely no one would advocate that sourcers should stop thinking and trying to positively control outcomes and just trust in the search button to do this “work”?

#3 You really need to understand semantic search before you are swayed by anyone’s opinion on the subject, including those who are selling semantic search solutions

Semantic search can sound like a magical solution to all of your problems if you don’t really understand how it works, what the limitations are, and exactly how a particular solution is attempting to make good on any “semantic search” claim.

For a comprehensive primer on semantic search for sourcing and recruiting, read The Guide to Semantic Search for Sourcing and Recruiting.

#4 Google’s semantic search capability is geared towards very basic information retrieval for common queries

However, even this has obviously proved very complex. After all, we’re talking about Google (super smart and talented engineers), it’s 2013, and better results for basic natural phrase searching are JUST being rolled out more effectively.

I must say that Google “understanding” what you mean when you ask Google, “What’s the closest place to buy the iPhone 5s to my home?” and knowing where you live to provide relevant results, as technologically challenging as it obviously is (otherwise it wouldn’t have taken the brilliant team at Google this long to do it), is still child’s play compared to, “Find me people who have at least 3 years of experience performing module integration testing on target PC and mobile platforms with specific Windows applications and associated drivers and libraries, who are familiar with Windows 8 logo requirements, who live within a commutable distance to the work location, who are highly likely to be willing to accept the compensation we can pay and who are highly likely to fit our team’s and corporate culture.”

However, you could easily pull that off with a basic Boolean search 15 years ago.

To gain a better understanding of the limitations of black box semantic search and artificial intelligence-powered matching solutions, check out this presentation I delivered as my first SourceCon keynote:

#5 Monster’s semantic search solution is quite good, but…

Monster’s built and continues to improve upon what I believe is the most comprehensive set of sourcing/recruiting taxonomies in existence.

However, their solution certainly has its limitations, as all semantic search solutions do (see the Slideshare above), and whether or not “the results tend to be far more accurate and relevant than the traditional keyword searches upon which Boolean logic relies” (as Matt suggests) depends on who’s performing the search and their skill level.

I’m speaking from direct experience and also from seeing what junior sourcers (with my training) can accomplish using traditional keyword searches in comparison with taxonomy-driven semantic searches. I am not sure if you’ve had the opportunity to perform that level of side-by-side comparison testing, so I think it’s important for me to share that. Many people have opinions and write/speak about Boolean search, semantic search and matching solutions even though they have little to no experience with any of them, let alone experience using them to fill tens of thousands of job openings annually.

#6 Facebook’s Graph Search, with its “natural language search” interface, is interesting, and I like it – but it has very limited semantic search

While it’s pretty cool to write your searches as full sentences, Graph Search isn’t really doing that much other than parsing your natural language query into its component parts – you can see this after running a search because your search is basically broken down into employer, title, location, etc.

Here’s an example of a simple Graph Search for Google software engineers in New York:

Facebook Graph Search Example Query

Here is how Facebook breaks down that natural language search into employer, title, location, etc.:

Facebook Graph Search Example Boolean Broken Out Fields

Why bother typing in a natural language query into Graph Search when you can simply enter an employer, a title, and a location? It’s certainly less “work.”
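To make the point concrete, here is a rough sketch of what “parsing a sentence into fields” amounts to. This is emphatically not Facebook’s implementation: the sentence template, the `PATTERN` regular expression and the `parse_query` function are all assumptions I made up for illustration, and the sketch only understands one phrasing.

```python
import re

# Hypothetical sentence template (an assumption for this sketch only):
# "<title> who work at <employer> and live in <location>"
PATTERN = re.compile(
    r"^(?P<title>.+?) who work at (?P<employer>.+?) and live in (?P<location>.+)$",
    re.IGNORECASE,
)


def parse_query(sentence):
    """Split a templated natural language query into structured fields."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None  # the sentence does not fit the one template this sketch knows
    return match.groupdict()


fields = parse_query("Software engineers who work at Google and live in New York")
print(fields)
# {'title': 'Software engineers', 'employer': 'Google', 'location': 'New York'}
```

The output is just a title, an employer and a location: the same three boxes you could have filled in directly.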

For example, compare that broken down Facebook Graph Search with an equivalent search in LinkedIn, which doesn’t involve the use of any Boolean search operators, by the way – looks pretty similar, right?

Title, current company, location…

LinkedIn Search Comparison with Facebook Graph Search

With Facebook Graph Search, there is clearly some level of semantic search going on, as I’ve found examples where Graph Search returns results with related titles I did not search for. However, it’s quite limited and certainly not built for anything beyond super-surface-level matching, and in many cases it only returns exact lexical matches (e.g., Graph Search doesn’t seem to “know” that some software engineers could call themselves “programmer,” “coder,” “developer,” etc.).

For deeper insight into how Graph Search works under the hood, check out this very informative post by Xiao Li (engineering manager on the natural language team for Graph Search) and Maxime Boucher (research scientist for Graph Search) – you’ll see that while they’ve built something very cool and quite complex, their use of semantics is primarily focused on “understanding,” interpreting and suggesting queries, and definitely not on searching for and returning results of people who are highly likely to have the skills and experience you’re looking for.

Facebook Graph Search Semantic

Let’s also not forget that not everyone even enters professional information on their Facebook profile, and that there is no way to search for skills and experience beyond searching for people who “like” things (companies, technologies, languages, etc.). “Liking,” “following” or being a member of a group on Facebook doesn’t mean you actually have any experience with the thing in question (e.g., Android).

#7 Semantic search/matching solutions are handicapped by the variety and absence of text generated by the people in your target talent pool

This is a point that few people really seem to appreciate, as all of the taxonomy and/or A.I.-powered semantic search technology in the universe will never:

  1. “Know” all of the ways in which people can represent their experience in text
  2. Be able to search and match on something that isn’t even mentioned and thus isn’t there to be found and matched in the first place

This means that when you rely solely on a semantic matching solution, you are only finding people who happen to mention your search terms and those that the matching solution “thinks” are related (which are not necessarily relevant – only the person performing the search can determine relevance).

Correspondingly, you would also be excluding all of the people who actually DO have the experience you’re looking for, but who do not mention their experience in ways that the matching solution “recognizes” or “understands,” as well as all of the people who DO have the experience you’re looking for, but who do not explicitly mention their experience.
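Here is a small sketch of both failure modes. The taxonomy, the four profiles and the `semantic_search` function are hypothetical and deliberately tiny; a real matching solution is far more sophisticated, but the two blind spots are the same in kind.

```python
# Hypothetical taxonomy: search term -> the related terms the engine "knows".
TAXONOMY = {
    "software engineer": {"software engineer", "software developer", "programmer"},
}

# Invented profiles. Assume all four people actually do the job in question.
PROFILES = {
    "ana": "software engineer writing device drivers",
    "ben": "programmer on a windows integration test team",
    "cho": "coder who ships c++ every day",   # wording the taxonomy doesn't know
    "dev": "team lead at a small startup",    # experience never mentioned at all
}


def semantic_search(term):
    """Return names whose profile text contains the term or a known related term."""
    related = TAXONOMY.get(term, {term})
    return {name for name, text in PROFILES.items() if any(r in text for r in related)}


found = semantic_search("software engineer")
missed = set(PROFILES) - found
print(sorted(found), sorted(missed))
# ['ana', 'ben'] ['cho', 'dev']
```

The search never reports that “cho” and “dev” were excluded. You only discover them if you think to look for them another way.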

I hope people and companies care about the fact that when they rely solely on a semantic search solution, they are not necessarily finding the best people – they are only finding the people who happen to mention the search terms and those that the matching solution “thinks” are related. The same goes for basic Boolean searches – this is a core challenge for information retrieval of any kind.

While it is certainly convenient to type in a few terms and let a semantic matching solution do the “work” for you, there is a cost.

#8 Boolean is eternal

As John Childs pointed out in his excellent comment on Matt’s post, you can cover up, hide and remove Boolean logic search interfaces (type two or more words in Google and you’re using the AND operator), but you can’t actually eliminate Boolean logic from the information retrieval process, including solutions leveraging semantic search.

When a semantic search solution takes a single search term or title and searches for other “equivalent” terms or titles, the solution itself is essentially relying on the Boolean OR operator, and when you use a semantic search solution to search for titles, keywords, location, etc., the solution is utilizing the AND operator to execute the query to include the keywords AND the titles AND the location, etc.
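As a sketch of what that looks like, the snippet below expands a title through a synonym table and assembles the resulting Boolean string. The `SYNONYMS` table and the `build_query` function are assumptions for illustration only; no vendor’s actual taxonomy or query builder is being quoted here.

```python
# Hypothetical synonym table standing in for a vendor's taxonomy.
SYNONYMS = {
    "software engineer": ["software engineer", "software developer", "programmer"],
}


def quote(term):
    """Wrap multi-word phrases in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join equivalent terms with OR, adding parentheses when there is more than one."""
    joined = " OR ".join(quote(t) for t in terms)
    return f"({joined})" if len(terms) > 1 else joined


def build_query(title, keywords, location):
    """Expand the title via SYNONYMS, then AND together title, keywords and location."""
    parts = [or_group(SYNONYMS.get(title, [title]))]
    parts += [quote(k) for k in keywords]
    parts.append(quote(location))
    return " AND ".join(parts)


query = build_query("software engineer", ["java"], "New York")
print(query)
# ("software engineer" OR "software developer" OR programmer) AND java AND "New York"
```

The user typed a title, a keyword and a location, yet what gets executed is a plain old Boolean string with OR and AND in it.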

By the way, Boolean logic is the foundation of nearly all computer programming – check out this video:

Boolean logic in programming YouTube video

That means the software being used to power semantic search solutions has been written leveraging Boolean logic, and it relies on Boolean logic to “work” in the first place.

Oh, the IRONY – at least for those who are trying to argue that Boolean is obsolete.

#9 I don’t make any money from consulting on the topic of search, so I probably don’t know what I’m talking about