Is Boolean Search Boring and Less Effective than Semantic Search?

 

Boolean Search is Boring

Do you think Boolean search is boring, tiresome and ineffective, and that semantic search delivers faster results that count?

I was struck by the image Marc Drees used in his #SOSUEU: the day after post, which you can see above.

I would have loved to sit on that panel discussion and contribute my experience and thoughts on the subject – I was actually supposed to attend and speak but I left the sponsoring company just prior to the event.

Such is life. :)

Regardless, I am happy to weigh in here, and I believe that the majority of people simply aren’t looking at search properly in the first place.

I’ll address the statements from #SOSUEU in order.

Boolean Search is Boring

Let’s hit the reset button first and get a couple of things straight:

If Boolean search is boring, then searching the Internet, Amazon, etc., for anything is boring. Any time you use more than 1 term in your search on Google, Bing, Amazon, eBay, etc., you’re using Boolean search, because the space between terms is typically treated as an implied AND. The same is true with LinkedIn and many other sites you can search to find people. Am I alone in this simple understanding?

This may confuse some people, but “Boolean Search” isn’t about Boolean – it’s about search. Searching is about finding things you need and want, and there are many ways that you can search for and find those things. Do you find it more “exciting” to select from a list or check a box on a LinkedIn facet?

Github list and LinkedIn Facet

Whether you type in keywords, select from a list, check a box, apply a filter, etc., all you’re doing is configuring a query to get results to review.

I don’t think there is anything intrinsically “boring” about Boolean operators. I think the real issue is that some recruiters just don’t enjoy searching for people, and if you don’t enjoy something it’s common to find it boring. The same people who bash Boolean search don’t find typing terms into separate search fields, picking from lists and checking boxes exciting or particularly enjoyable.

Some people really like searching databases, social networks, the Internet, etc., for people to engage and recruit. Others would be happy to post jobs and wait for people to come to them and would rather not ever have to search for potential candidates to engage.

To say that Boolean search is boring is to say that carefully looking for and trying to find people (paraphrasing Merriam-Webster’s definition of search) is boring. Semantic search solutions alleviate the boredom of searching for those who find it tedious, because similar to posting a job and getting responses, semantic search solutions often allow you to enter minimal information and get results.

I’d be willing to bet that those same recruiters who don’t enjoy searching to find people to engage also don’t enjoy reviewing responses to job postings as many are unqualified – they would much rather be given a list of well matched people, which semantic search solutions claim to be able to do.

What do you think?

Boolean Search is Tiresome

If you think Boolean search is tiresome, I say you’re lazy.

Why?

Well, for basic Boolean search, we’re talking 2 operators and 2 modifiers – in many search engines you don’t even need to type AND, as any old space will do.

Is typing in OR, -, ” ” and ( ) really tiresome?

Is filling out multiple search fields (e.g. Twitter below) any less tiresome than typing a couple of Boolean operators? By the way, the common elements of most “advanced search” interfaces are essentially AND’s (All of these words), OR’s (Any of these words), NOT’s (None of these words), and quotation marks (This exact phrase).  Oh wait – that’s basic Boolean, right? Snap!
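To make the point concrete, here is a minimal Python sketch of how the four typical "advanced search" fields collapse into a single basic Boolean string. The function name and field names are my own placeholders, not any site's actual API, and the output syntax (space for AND, a leading minus for NOT) is an assumption that holds for many, but not all, search engines.

```python
def build_boolean_query(all_words=(), any_words=(), none_words=(), exact_phrase=""):
    """Translate typical 'advanced search' form fields into one Boolean string.

    Hypothetical helper for illustration only; real sites differ in syntax.
    """
    parts = []
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')           # This exact phrase -> quotation marks
    parts.extend(all_words)                         # All of these words -> implied AND (space)
    if any_words:
        parts.append("(" + " OR ".join(any_words) + ")")  # Any of these words -> OR group
    parts.extend(f"-{word}" for word in none_words)  # None of these words -> NOT (minus sign)
    return " ".join(parts)


query = build_boolean_query(
    all_words=["java", "spring"],
    any_words=["developer", "engineer"],
    none_words=["recruiter"],
    exact_phrase="open source",
)
print(query)  # "open source" java spring (developer OR engineer) -recruiter
```

Four form fields or one typed string: either way, the same query reaches the search engine.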

Twitter Advanced Search Interface

While I love the concept of natural language queries, I actually find writing them tiresome and limiting (e.g. Facebook Graph Search).

Facebook Graph Search Tiresome

Is it less tiresome to use Facebook’s search fields? They’re all essentially linked by AND’s, btw.

Facebook Graph Search Interface

If there is anything that people really find tiresome about any non-semantic search, Boolean or otherwise, it is that it requires you to think and expend mental energy.

I know – thinking is tough!

If you’ve read a lot of my content over the years, you know I like to bring up the study of information retrieval techniques that bring human intelligence into the search process, otherwise known as Human–computer information retrieval (HCIR).

The term human–computer information retrieval was coined by Gary Marchionini who explained that “HCIR aims to empower people to explore large-scale information bases but demands that people also take responsibility for this control by expending cognitive and physical energy.”

I know the dream is to have computers read our minds so we don’t have to type a single search term – but until that day comes, you should be aware that experts in the field of HCIR do not believe that people should mentally “check out” of the information retrieval process and let semantic search/NLP algorithms/AI be solely responsible for the results.

Having said that, I can see why some people would see the process of building a massive OR string for all of the ways in which a person could possibly reference a certain skill or experience to ensure maximum inclusion as tiresome. It’s exhausting work – especially if you don’t want to exclude great people who simply don’t reference their experience with the most commonly used search terms.

This is one of the main value propositions of semantic search solutions for people sourcing – through taxonomies and/or NLP-powered AI/algorithms, a user can enter in a single search term and effectively search for other related terms without requiring the user to know and search for all of the other related terms.
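Stripped of the marketing, taxonomy-based expansion can be sketched in a few lines of Python. The taxonomy below is a tiny hand-made example I invented for illustration; a real product's taxonomy would be far larger, and I'm not claiming any vendor implements it this way.

```python
# Hypothetical, hand-made taxonomy: one entered term -> related terms.
TAXONOMY = {
    "project manager": ["project manager", "program manager", "PM", "PMP"],
}


def quote(term):
    """Wrap multi-word terms in quotation marks so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand(term, taxonomy=TAXONOMY):
    """Expand a single search term into an OR group of related terms.

    A term missing from the taxonomy expands to nothing but itself.
    """
    related = taxonomy.get(term.lower(), [term])
    return "(" + " OR ".join(quote(t) for t in related) + ")"


print(expand("Project Manager"))  # ("project manager" OR "program manager" OR PM OR PMP)
print(expand("scrum master"))     # ("scrum master")
```

The second call shows the catch: the expansion is only as good as the taxonomy behind it.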

Sounds great, right? Letting taxonomies and/or algorithms do the conceptual search work for you is certainly a lot less tiresome than having to perform research and pay attention to search results, looking for patterns of related and relevant terms to continually refine and improve Boolean searches.

To be sure, there are some solid semantic search offerings for talent sourcing on the market today that make sourcing talent faster and easier for people who find Boolean search boring, tiring, difficult and ineffective. However, to think that semantic search solutions don’t come with their own host of challenges and limitations would be ridiculous.

In case you haven’t seen it before, you may want to quickly flip through my Slideshare on Artificial Intelligence and Black Box Semantic Search vs. Human Cognition and Sourcing, derived from my 2010 SourceCon keynote, to get a high-level overview of some of the challenges faced by semantic search solutions specific to talent sourcing.

If you don’t want to flip through the presentation, here’s a very brief summary:
  • Human capital data/text is often incomplete and widely varied – many people with the same job have different titles, explain their experience using different terms, and in many cases simply do not explicitly mention critical skills and experience
  • Semantic search solutions can only search for what is explicitly stated in resumes and social profiles
  • Taxonomies are difficult, if not impossible to make “complete” and thus they can exclude qualified talent
  • AI/NLP can be useful in determining related terms, but not necessarily relevant terms
  • Many semantic search solutions suffer from “once and done” query execution – there is no way to refine and improve searches or to exclude false positives/irrelevant results

Boolean Search is Ineffective

The effectiveness of Boolean search strings has more to do with the person writing the queries and the sources being searched, and little to nothing to do with Boolean logic.

When used in a search, Boolean operators are essentially being used as a very basic query language, and according to Wikipedia, “an information retrieval query language attempts to find…information that is relevant to an area of inquiry.”

Any search a user conducts, whether they know it or not, is essentially a formal statement of an information need.

How effectively a user can translate their information need into a query/search string largely determines the relevance of the results – Boolean logic itself often has little to nothing to do with search relevance!

Assuming a sourcer/recruiter has a solid understanding of what they’re looking for (a dangerous assumption, by the way – try giving 5 people the same job description and then ask them separately what they’re looking for), the effectiveness of any search they use, whether Boolean, faceted, semantic, etc., is more dependent upon the user’s ability to “explain” their needs to the system/site being searched via an effective query.

For example, let’s say you’re sourcing for a sales leader and you have a military veteran hiring initiative. Regardless of whether you decided to search your ATS (e.g. Taleo), LinkedIn, CareerBuilder, Indeed, etc., you’re essentially asking the same question, “Do you have anyone with experience leading sales teams who is also a veteran?” (among other things – just trying to keep it simple here).

How would you construct a Boolean search for a sales leader who is also a veteran?

How would (and/or could!) a semantic search engine search for a sales leader who is also a veteran?

Ultimately, it comes down to how many ways someone who has sales leader experience could possibly express that experience on their resume or profile, and how many ways someone who is a veteran could possibly reference their veteran status.

Do you know them all?

Does any semantic search engine know them all? Some don’t know any because they simply aren’t included in their taxonomies. Others could use NLP to find some, but definitely not all. However, a person with decent sourcing skills could produce a veteran query like this one in about 5 minutes (not too tiresome) and continuously improve it:

(Army OR USAR OR “U.S.A.R.” OR “Army Reserve” OR “Army Reserves” OR Navy OR USN OR USNR OR “U.S.N.” OR “U.S.N.R.” OR “Naval Reserves” OR “Naval Reserve” OR “Air Force” OR USAF OR “U.S.A.F.” OR USAFR OR “U.S.A.F.R.” OR “Force Reserve” OR “Force Reserves” OR “Forces Reserve” OR “Forces Reserves” OR Marines OR “Marine Corp” OR “Marine Corps” OR USMC OR “U.S.M.C.” OR USMCR OR “U.S.M.C.R.” OR MARFORRES OR “Marine Expeditionary Force” OR MEF OR “Coast Guard” OR USCG OR “U.S.C.G.” OR USCGR OR “National Guard” OR Veteran OR “honorable discharge” OR “honorably discharged”)
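Here is a rough Python sketch of what a query shaped like (sales leader terms) AND (veteran terms) does when it meets real text. The veteran terms are a small subset of the string above; the sales leader terms and the three sample profiles are made up for illustration, and the matching is a simplified whole-word check, not how any particular search engine works.

```python
import re


def matches_any(text, terms):
    """Return True if at least one term appears in text as a whole word or phrase."""
    return any(
        re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", text, re.IGNORECASE)
        for term in terms
    )


# Small subset of the veteran OR string above.
VETERAN_TERMS = ["Army", "Navy", "USMC", "Air Force", "Coast Guard",
                 "Veteran", "honorably discharged"]
# Hypothetical sales leader terms, for illustration only.
SALES_LEADER_TERMS = ["sales manager", "sales director", "VP of sales", "head of sales"]

# Invented sample profiles.
profiles = {
    "a": "Sales Director at Acme. Previously a logistics officer, USMC.",
    "b": "Regional sales manager with 10 years in SaaS.",
    "c": "Led a team of 12 account executives after eight years in the Air Force.",
}

# (sales leader terms) AND (veteran terms)
hits = [
    pid for pid, text in profiles.items()
    if matches_any(text, SALES_LEADER_TERMS) and matches_any(text, VETERAN_TERMS)
]
print(hits)  # ['a']
```

Profile "c" is arguably exactly who you want, but it is missed because the sales leadership is described in words that aren't in the OR group yet. Spotting that in the results and adding the missing terms is the "continuously improve it" part.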

The effectiveness of any search, Boolean or semantic, can be measured by the relevance of the results (e.g., a high percentage of the results are exactly what the searcher is looking for) and the inclusiveness of the results (how many relevant results are retrieved as a percentage of the relevant results available to be retrieved – those available but not retrieved are excluded into the abyss of Dark Matter).
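In information retrieval terms, these two measures are precision and recall. A minimal Python sketch, using invented profile IDs, shows how they are calculated. In practice you rarely know the full set of relevant people, which is exactly why the ones you miss stay hidden.

```python
def precision_recall(retrieved, relevant):
    """Compute precision (relevance) and recall (inclusiveness) for one search.

    retrieved: IDs the search returned
    relevant:  IDs that truly match the need (rarely fully known in practice)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_hits = retrieved & relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: the search returns 4 profiles, 2 of which are relevant,
# while 5 relevant profiles exist in the source being searched.
retrieved = {"p1", "p2", "p3", "p4"}
relevant = {"p1", "p2", "p5", "p6", "p7"}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.5 0.4
```

Here half of the results are relevant, but 60% of the relevant people available were never retrieved.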

Only the person conducting the search can judge the relevance of the results returned by any search, Boolean or semantic, as relevance is defined as the ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user, and only the user truly knows what their needs are.

When it comes to inclusion, I am aware of some folks who are proponents of “good enough” searches and search solutions (e.g., finding some good people quickly is good enough and there is no need to find all of the best people).

Try telling your company’s executives that you really don’t care about finding the best people available to be found and that you believe that the quickest and easiest to find should be good enough for your company’s hiring needs.

Let me know how that works for you.

Okay, but what about Context and Weighting?

Some folks argue that Boolean search is ineffective due to the fact that Boolean searches are not contextual (e.g. you search for a term and it shows up somewhere other than the person’s recent experience, or not in their experience at all) and that all terms in a query are given equal weight (e.g., if you search for 10 terms, some terms are likely to be more important than others, but basic Boolean logic doesn’t allow you to differentiate the value/relevance of specific terms).

Admittedly, some Boolean searches are limited in exactly these ways.

However, if you have well parsed/structured data and a search interface that allows you to exploit that structured data, you can use simple Boolean logic to search contextually. For example, most recent/past employer and title, most recent/past experience, etc.
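As a rough illustration in Python, assume each profile has already been parsed into a list of positions ordered most recent first. The field names and sample data are hypothetical. The same simple OR logic then becomes contextual just by choosing which field it runs against.

```python
# Hypothetical parsed profiles; positions are assumed to be ordered most recent first.
profiles = [
    {"name": "Candidate A",
     "positions": [{"title": "Sales Manager"}, {"title": "Account Executive"}]},
    {"name": "Candidate B",
     "positions": [{"title": "Product Manager"}, {"title": "Sales Manager"}]},
]

TERMS = ["sales manager", "sales director"]  # a simple OR group


def title_matches(title, terms):
    """True if any term appears in the title (case-insensitive OR)."""
    return any(term in title.lower() for term in terms)


# Contextual search: only the most recent title counts.
recent = [
    p["name"] for p in profiles
    if p["positions"] and title_matches(p["positions"][0]["title"], TERMS)
]

# Non-contextual search: a match in any position counts.
anywhere = [
    p["name"] for p in profiles
    if any(title_matches(pos["title"], TERMS) for pos in p["positions"])
]

print(recent)    # ['Candidate A']
print(anywhere)  # ['Candidate A', 'Candidate B']
```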

Some search engines do in fact allow you to assign different weights to terms within a single Boolean query (e.g. Lucene, dtSearch, etc.) – this functionality is sometimes referred to as extended Boolean search. These same search engines allow you to search for terms in or exclude them from specific areas (e.g. top of the resume, bottom of the resume) via proximity search – functionality that also allows you to perform powerful user-specified semantic search at the verb/noun level to target people with specific responsibilities (have goosebumps yet?).
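For a feel of what this looks like, here is a small Python sketch that assembles a query string in Lucene's classic query parser syntax, where `^` boosts a term's weight and `"..."~N` matches words within N positions of each other. The helper functions, terms, weights and distance are my own illustrative choices; dtSearch and other engines use different syntax for the same ideas.

```python
def quote(term):
    """Wrap multi-word terms in quotation marks so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def boosted(term, weight):
    """Lucene classic syntax for term weighting: term^weight."""
    return f"{quote(term)}^{weight}"


def near(first, second, distance):
    """Lucene classic syntax for proximity: "first second"~distance."""
    return f'"{first} {second}"~{distance}'


# Illustrative weights: the first title matters more than the second.
weighted_titles = {"sales manager": 3, "account executive": 1}

title_group = "(" + " OR ".join(
    boosted(term, weight) for term, weight in weighted_titles.items()
) + ")"

# Verb/noun proximity: "managed" within 5 positions of "team".
query = f'{title_group} AND {near("managed", "team", 5)}'
print(query)  # ("sales manager"^3 OR "account executive"^1) AND "managed team"~5
```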

Okay, that was easy to address.

So, Is Boolean Search Boring and Less Effective than Semantic Search?

For some people, yes – Boolean search is boring, tiresome and less effective than semantic search.

For others, Boolean search is exciting, easy, and more effective than semantic search.

What do I know about any of this?

I’ve evaluated, implemented and used extended Boolean search solutions as well as semantic search solutions. In addition to using them myself on a regular basis, I help 100’s of recruiters use them effectively to find the right people for 1,000’s of real positions. From my practical hands-on experience, I can tell you that sometimes semantic search produces very good results – sometimes it doesn’t. Sounds similar to Boolean, yes?

To all of the “Boolean Bashers” out there – you’re missing the point.

The effectiveness of any Boolean search has more to do with the person writing the queries and the sources being searched, and little to nothing to do with Boolean logic and search syntax.

Let’s remember what the goal of sourcing is – to easily find and successfully engage people who are highly likely to be the right match for the roles being sourced/recruited for.

The ultimate sourcing solution would parse resumes and profiles into highly structured data that could be searched via semantic search (autopilot) and extended Boolean (manual control) to ensure that any user could quickly find the right people under any circumstance.

I’m honestly not sure why anyone believes sourcing solutions have to leverage semantic search and exclude Boolean/extended Boolean search capability.