LinkedIn Sourcing Challenge – X-Ray Location False Positives

I was extremely pleased to receive many responses/solutions to the Ruby LinkedIn Sourcing Challenge I posted recently, including some from well-known online sourcing heavyweights, as well as a number from other talented folks who came out of the Internet ether from several continents to show off their skills and take a crack at solving the challenge.

Kudos to those who successfully found people on LinkedIn who have experience with Ruby but do not make explicit mention of it on their profile!

I sincerely hope everyone appreciated seeing the various approaches and methods people utilized to solve the first LinkedIn Sourcing Challenge – that was my primary motivator in posting it.

One thing I noticed from some of the responses is that for a few people, the challenge seemed too easy.

So – if you’re up for another LinkedIn Sourcing Challenge, take a crack at this one – it’s at least a degree more difficult than the last. :-)

X-Ray Location False Positives

Have you ever noticed that when you X-Ray search LinkedIn targeting a specific metro area, some of your results are not actually profiles of people who live in your targeted metro area?

If you haven’t, let me show you.

Here is a basic LinkedIn X-Ray search: “greater Atlanta area” java j2ee weblogic -dir

If you look at the 8th result, you will see that the person does not live in Atlanta. Rather, he lives in the “Dallas/Fort Worth Area.”

So how does this result turn up?

Initially, I was confused when I saw these results popping up. However, a quick click of the cached result shows exactly why these kinds of non-local profiles appear in your searches:

Location false positive results like these are returned because there are positive hits on the “standard” search terms (e.g., Java, J2EE, WebLogic) from the LinkedIn profile itself, and a positive hit on “Greater Atlanta Area” in the “Find a different _____ _____” section of public search results.

I don’t think this phenomenon was intentional on LinkedIn’s part, although it would certainly be interesting if it was. :-)
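
The mechanism described above can be sketched in a few lines of Python. This is only an illustration: the page text, the person's name, and the layout of the “Find a different …” section are hypothetical stand-ins, and real search engines do far more than substring matching. It shows how a whole-page match can succeed even though the profile itself is not in the target area.

```python
# Hypothetical, simplified text of an indexed public profile page.
# The names and field layout are assumptions made for illustration only.
profile_text = (
    "John Doe | Java Developer | Dallas/Fort Worth Area | "
    "Skills: Java, J2EE, WebLogic"
)
sidebar_text = (
    "Find a different John Doe: "
    "John Doe, Greater Atlanta Area; John Doe, Greater Boston Area"
)


def matches_all(text, terms):
    """Return True if every term occurs somewhere in text (case-insensitive)."""
    lowered = text.lower()
    return all(term.lower() in lowered for term in terms)


terms = ["greater atlanta area", "java", "j2ee", "weblogic"]

# The engine indexes the whole page, so the sidebar text counts too.
whole_page = profile_text + " " + sidebar_text
print(matches_all(whole_page, terms))    # True: a location false positive
print(matches_all(profile_text, terms))  # False: the profile is not in Atlanta
```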

The LinkedIn Sourcing Challenge

Find a way to X-Ray LinkedIn for profiles from a specific metro area that:

  1. Reliably eliminates location false positives
  2. Does not eliminate any profiles that actually are from your target metro area

A bit of advice – before you try to whip through this challenge thinking you have it solved, be sure to thoroughly test your proposed solution.

That means checking multiple pages of results to ensure no profiles of people who do not live in your target metro area have leaked in, as well as making sure your solution doesn’t accidentally or unnecessarily reduce the overall number of legitimate results of local profiles. If the number of results from your solution seems low for the types of people you’re looking for, your solution likely eliminated profiles you didn’t want to.

Who can crack this challenge?

  • Gary Cozin

    Glen- I added “location: greater atlanta area” in quotes to the string & was able to check the first 13 pages of results w/o any false locations.

    Corrected string: “location: greater Atlanta area” java j2ee weblogic -dir


  • Gary Cozin

    Forgot to indicate – my string above was for BING. The following seems to work well in Google w/o false positive locations: location AROUND(2) “greater Atlanta area” java j2ee weblogic -inurl:dir


  • Gary Cozin

    Nope – disregard the Google string- found false positives but not the BING string above- sorry!

  • Gary Cozin

    Last comment – (was trying to delete some comments above but wouldn’t allow me:)
    I got the same results in BING w/o location false positives in this string: location NEAR:2 “greater Atlanta area” java j2ee weblogic -dir

  • Glen,

    Here’s what I tried in Google…. (inurl:in OR inurl:pub) java “location * London, United Kingdom” -dir -jobs

    In page code source – there is a variable called location in table format which clubs all locations in LinkedIn page. I also know that location and area goes hand in hand – hence used “*”.

    Now, if I normally run a search – (inurl:in OR inurl:pub) java “London, United Kingdom” -dir -jobs

    “jason davis”
    “Paul McCann”

    I get above guys who are NOT living in London area….. but they are “False” results.

    If I add them in my original search string (inurl:in OR inurl:pub) java “location * London, United Kingdom” -dir -jobs “Jason davis”..

    Results are 0… This means it has negated the results where location is not mentioned near to “location” variable….

    Till now – I have got this tested OK….

  • Interesting approach Gary. Were you trying to use Bing’s location: operator? For those who might be interested – here is a good list of Bing operators:

    While the search results page does show positive hits of the phrase “Location Greater Atlanta Area,” if you open up a result from Gary’s search (cached or otherwise), there actually isn’t any mention of the word “location” anywhere on the profiles.

    Gary – I noticed your search produces fewer than 150 results (if you click through all pages)…not the 300+ that Bing estimates.

    For what it’s worth – there are 336 results from a keyword search inside of LinkedIn, in a 50 mile radius of 30303 (center of ATL).

    So, I did some quick research and used -location to find some people that your solution does not find.

    Here is the search: -location “greater Atlanta area” java j2ee weblogic -dir

    Here is one example result:

    She’s in Atlanta and mentions java, j2ee, and weblogic.

    You cannot find her using your location NEAR:2 “greater Atlanta area” search – you can test it by adding her first name to your search and see no results are returned: margarita location NEAR:2 “greater Atlanta area” java j2ee weblogic -dir

    However, you can find her using this search: margarita java j2ee weblogic -dir “greater Atlanta area”

    Here is another person your original solution does not find:

    For anyone else responding – please take note of how I approached testing the solution to uncover flaws…

    This challenge isn’t as easy as it seems on the surface…it will take some serious hacking. :-)
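
The testing approach in the comment above (run the proposed search, run its complement with the anchor word excluded, then spot-check a known local person by first name) can be sketched as a small query builder. The function name and parameters are hypothetical; it only assembles query strings and does not call any search engine.

```python
def build_test_queries(solution_clause, excluded_term, location_phrase,
                       keywords, first_name):
    """Assemble the queries used to probe a proposed X-Ray solution.

    All names here are illustrative; nothing below talks to a search engine.
    """
    tail = " ".join(keywords) + " -dir"
    quoted_location = f'"{location_phrase}"'
    return {
        # The proposed solution as submitted.
        "proposed": f"{solution_clause} {tail}",
        # Complement: local profiles that do NOT contain the anchor word.
        "complement": f"-{excluded_term} {quoted_location} {tail}",
        # Spot check: can the proposed solution find a known local person?
        "spot_check": f"{first_name} {solution_clause} {tail}",
        # Control: the same person without the anchor word.
        "control": f"{first_name} {tail} {quoted_location}",
    }


queries = build_test_queries(
    solution_clause='location NEAR:2 "greater Atlanta area"',
    excluded_term="location",
    location_phrase="greater Atlanta area",
    keywords=["java", "j2ee", "weblogic"],
    first_name="margarita",
)
for label, query in queries.items():
    print(f"{label}: {query}")
```

If the spot check returns nothing while the control returns the person, the proposed solution is excluding a legitimate local profile.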

  • Luis Cupertino

    I typed the following first: “current=recruitment manager” “manchester, United kingdom” -dir

    The results (8 pages) of this search include someone called David Gittoes who is actually a correct result, but the search includes ‘false’ results.

    I then tried: “current=recruitment manager” “location manchester, United kingdom” -dir

    Although it looks perfect, David and I think a couple of others have gone!

    Any tips on how I can get David back?

  • Gary Cozin

    It seems Sarang’s approach using the * may work correctly eliminating false positives for location in Google: “location * greater Atlanta area” java j2ee weblogic -inurl:dir

  • Gary Cozin

    Thanks for the analysis, Glen – You are correct, it didn’t bring back ‘everyone’ – but there were enough positives to start sourcing them. Very interested to hear the actual ‘correct’ way to do it from the Master!

  • Luis Cupertino

    Having given it a good go with my limited experience, it seems very difficult to get the perfect result through LinkedIn unless you pay or through the search engine. The more you try to perfect the results the more ‘correct’ results you knock out. If you leave it more vague, you get a few false results, but more pages with the correct ones! I’m quite happy with the latter, although there were only 8 pages!

  • Sarang,
    I do recommend using more search keywords so we’re not dealing with such a vast, generic, estimated data set (6,000+).

    Adding a few more terms gets it down to a manageable #. For example: (inurl:in OR inurl:pub) java j2ee apache “location * London, United Kingdom” -dir -jobs

    This search, retaining your “location * London, United Kingdom” tactic returns 269 results:

    Doing exactly what I did with Gary’s approach, excluding the term “location” with -location yields 318 results: (inurl:in OR inurl:pub) java j2ee apache -location “London, United Kingdom” -dir -jobs

    Notice result #4 – first name Teofilis:

    He lives in London and matches the search criteria.

    However, if you try and add his first name to your search format, you get 0 results: teofilis (inurl:in OR inurl:pub) java j2ee apache “location * London, United Kingdom” -dir -jobs

    You CAN find him by adding -location to the search: teofilis (inurl:in OR inurl:pub) java j2ee apache -location “London, United Kingdom” -dir -jobs

    This is but one of many people who do live in London that are excluded by your proposed solution.

    Gary – there is no single correct/right way to solve this challenge. :-)

    I fear that more than a few people took a swing at this challenge but, failing to solve it quickly, gave up.

    Last week’s LinkedIn Sourcing Challenge racked up many responses the same day I posted it – but it was an easier challenge. As of 8:34 EST, only 3 people have (publicly) tried to solve this one.

    Can anyone crack this LinkedIn X-Ray Sourcing Challenge?

  • Leveraging “location” in the search string definitely yields results, but likely the same results anyone/everyone else would find.

    I’ve always been especially interested in finding people that everyone has access to, but cannot and do not find. :-)

    There is no single ‘correct’ way to solve this challenge – likely many different angles, just waiting to be discovered.

  • Luis – that’s the tough question. :-) Using “location” in the search string appears to “work” because local results come up, but using that term eliminates people you don’t actually want to.

    Thanks for being 1 of 3 brave souls to publicly take a crack at this challenge Luis – it’s definitely tougher than last week’s Ruby sourcing challenge!

  • Gary Cozin

    Until someone comes up with a better solution to capture the false positive w/location, although I may miss some results with my solution above- it gives me enough results w/location requested that I can use!

  • No doubt Gary – but the challenge I issued to the world (literally!) to return the maximum # of local profiles while eliminating non-local profiles remains unsolved. :-)

    I sincerely hope someone comes forward with a solution.

    As a hint – it may have nothing to do with trying to manipulate the location phrase under the headline…

  • Jung Kim

    Tough challenge indeed. Below you will find the search string I used via Bing: editor (proofread | “american english style”) “location san francisco bay area” -dir
    – not the first name last name you were looking for?

    See URL:

  • Thanks Glen.

    I’ll try again…. :) (rolling up my sleeves)

  • Gary Cozin

    Glen – it’s that ‘hint’ i/we need!!!! Any idea when that will be coming???? :)

  • Luis Cupertino

    I tried using manchester with a number of different operators: inanchor, intitle, post code, zip code, instreamset, company*m etc, but have literally run out of ideas. I hate feeling defeated, but don’t know what else to use. Very annoying.

  • Jung Kim

    After further testing/review from LinkedIn’s advanced search page, I realize my search string via Bing is not returning the maximum number of results (only 126 results)

  • Jung Kim

    Few questions to consider…..
    1. Bing or Google will not retrieve keywords from the recommendation section of the person’s profile.
    2. How long does it take for Bing and/or Google to index profiles for new members?
    3. How long does it take for Bing and/or Google to account for recent edit/update changes to LinkedIn profiles?

    This still does not rectify the issue for the huge variance/gap of results when comparing Bing vs. LinkedIn’s advanced search. Thoughts?

  • Jung Kim

    I incorporated the following phrase: -“Bing is not responsible for the content of this page” into my string and increased the total # of results from 126 to 160, but that’s about it. editor (proofread OR american english style) “location san francisco bay area” -“Bing is not responsible for the content of this page”

  • Gary – one more thing.

    Comparing internal LI Search to X-ray search may not be a great idea; as LinkedIn’s google page might be very different than internal (database) page.
    This profile shows no sign of j2ee AND apache keywords. He is listed as product consultant.

    However, if you search him by Eytan AND “The Fizzback Group”, his linked profile shows all J2EE, APACHE, Java skill as well as he is now a “Technical Product Manager” at same company.

    Does this give us any clues?

  • Jung Kim

    Sarang brings up a good point of the limitations of indexing web 2.0 websites via Google and Bing; since they mostly crawl the “surface web”

  • Hi Glen, I tried this string on YAHOO & BING and it yielded a 3 page results w/o the Location False Positives: location NEAR:4 “sacramento” java j2ee weblogic -dir

    At the end of the day you will agree that Google is becoming more of a spoiler and I will also add that, it depends on what search engine you use. Sometimes I find that yahoo and bing provides a Yield – It – As – Is results.

  • John Turnberg

    Laiksyde is on to it but the near# needs to be greater:

    “financial services” “to your network” current near:8 “greater new york city area” “san francisco bay area”

    Using 7 decreased my results while searching for CSS HTML people in NYC

  • Hi Glen,

    Thank you for this new challenge!

    I tried this string to find SAP ABAP developers around the Munich area, which gives me 23 profiles from this area. I checked in LinkedIn people search and it yields almost the same result. Let me know your comments.


    SAP “ABAP” ~developer “munich area” (inurl:pub | inurl:in) -inurl:dir -~recruitment -“account manager”



  • Gary’s original suggested Bing search solution is a sound one, with the exception that it actually excludes some results of people who do live in the target area. location NEAR:2 “greater atlanta area” java j2ee weblogic -dir

    Bing claims 322 results, but if you click through to the last page (#13), there are only 123 real results.

    You would not be aware of the results you missed with the above search unless you experiment with -location to see the effect (as I have previously noted).

    I’ve configured this Bing search to specifically target the profiles that Gary’s approach cannot find – it pulls 53 real results, the MAJORITY of which are mutually exclusive of the above search. -location “greater atlanta area” NEAR:25 current java j2ee weblogic -dir

    I emphasize MAJORITY because there are a small # of overlapping results that can be found in each query.

    For these outlier results, I’ve noticed a pattern, and it’s odd. I’ll write a post on my findings soon, as it will be easier to demonstrate with images.

    In the meantime, can anyone pick up on the pattern?
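
The comparison described above (two searches whose results are mostly, but not entirely, mutually exclusive) can be checked with plain set operations once the result URLs from each search have been collected. This is a minimal sketch; the profile identifiers below are placeholders, not real results.

```python
def compare_result_sets(results_a, results_b):
    """Split two lists of result URLs into exclusive and overlapping parts."""
    a, b = set(results_a), set(results_b)
    return {
        "only_a": a - b,     # found only by the first search
        "only_b": b - a,     # found only by the second search
        "overlap": a & b,    # found by both searches
        "combined": a | b,   # everything found by either search
    }


# Placeholder identifiers standing in for collected profile URLs.
with_location = ["profile-1", "profile-2", "profile-3", "profile-4"]
without_location = ["profile-4", "profile-5", "profile-6"]

comparison = compare_result_sets(with_location, without_location)
for label, members in comparison.items():
    print(label, len(members), sorted(members))
```

A small overlap means the two searches are close to complementary, so running both and merging the results recovers more local profiles than either search alone.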


  • Jung Kim

    I find it interesting in your last search that Bing’s crawlers have a snapshot of the phrase: “Public profile powered by” for the 53 profiles (the language is listed on the top right hand side on some profiles while others are listed on the bottom of the page). What is going on here?

  • Glen,

    OK – here I go again. I don’t think this is giving 100% results – but it may be another angle to think about…

    In this case I’ve tried proximity between location and “Current”. I observed that the next best static keyword on LI profile after location is “Current”. Hence, I tried NEAR, AROUND operators.

    Bing: “london, united kingdom” NEAR:12 “current” java j2ee weblogic -dir

    Google “current” AROUND(12) “london, united kingdom” java j2ee weblogic (inurl:pub OR inurl:in) -dir

    For some reason though – this also throws back irrelevant results. Like for Google Search – I got this guy…

    I fail to understand why Google is returning this profile in the search. It does not have the “Current” keyword in proximity to “London, United Kingdom”. I have the same problem with Bing….

    I believe it has a strong logic to fulfill the requirement – however, not sure why execution is not as per expectations. Are NEAR, AROUND operators 100% accurate?


  • Hi John,

    Thanks for the comments. I tried near:25, near:15 and 10 earlier on both BING and Yahoo and they all still yielded me the same results. Yahoo yielded 27 results while Bing came up with 26 results. Still wondering why the results are that few though?

  • Siacampo (“greater Atlanta area” Near:industry) java j2ee weblogic -dir

    I added Near:industry and for the first several pages did not find false positives.

  • Thanks Siacampo!

    When I checked a cached result from your search, it does not appear that the search engine is interpreting/processing your search the way you think it is.

    For example, looking at this result, you will see it did not process your attempt at using Near the way you intended.

    When using Near, you need to follow it by a # for the word distance, such as Near:2.

    Also – eliminating false positives is only half of the challenge, and perhaps the easier half. The very difficult part is not eliminating results of people who do live in the target location. :-)
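
To avoid the malformed-operator problem noted above, the proximity clause can be built by a small helper that insists on a numeric word distance. This is a sketch using the NEAR:n form as it appears in this thread; the helper name is hypothetical and it only formats a string.

```python
def bing_near(left, right, distance):
    """Join two terms with a NEAR:n proximity clause (n = word distance)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(distance, bool) or not isinstance(distance, int) or distance < 1:
        raise ValueError(
            "NEAR needs a positive whole-number word distance, e.g. NEAR:2"
        )
    return f"{left} NEAR:{distance} {right}"


clause = bing_near("location", '"greater Atlanta area"', 2)
print(f"{clause} java j2ee weblogic -dir")

try:
    bing_near('"greater Atlanta area"', "java", "industry")
except ValueError as error:
    print(f"Rejected: {error}")
```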

  • Sarang,
    I’ve found Google’s AROUND operator to be fuzzy at best, and certainly not always adhered to.

    In my searching, I have yet to observe an instance in which Bing’s NEAR operator clearly isn’t operating as intended, but that doesn’t mean we can’t break it and find one if we try.

  • Balazs Paroczay

    Gents – for a few minutes LinkedIn did not work well and thus I could not get the usual visual.

    Thanks to it I could check that in the summary/overview of the profile (usually it is in the blue box) the structure is the following:

    Mark Wilson
    Technical Architect at Chubb Insurance Company of Europe

    Ipswich, United Kingdom

    Interestingly the words ‘Location’ and ‘Industry’ do not appear in the ‘normal visual’ of the public profile but appeared in this version. So it may mean these words are ON the site, however, in the background somewhere. Therefore I tried to put the word Industry into Sarang’s string and I believe it works now.

    Give it a try on Google: “london, united kingdom * industry” java j2ee weblogic -dir

    Please let me know your thoughts,

  • Hi Balazs,

    Your approach looks great and gives accurate results.

    I tried the string changing the location to Munich, which gives me 4 results, all from Munich: “Munich, Germany * industry” java j2ee weblogic -dir

    When I added the keyword “area” next to Munich in the string, it gives me 18 results local to Munich: “Munich area, Germany * industry” java j2ee weblogic -dir

    let me know your thoughts on this



  • Balazs,
    You are definitely on to something by targeting words that are in the search result preview but not in the actual profile.

    However, it appears to artificially limit the # of results.

    Adapting your string to my area, I get 86 local results with Google. “greater atlanta area * industry” java j2ee weblogic -dir

    Using location NEAR:2 on Bing, I get 124 local results – approximately 50% more. location NEAR:2 “greater Atlanta area” java j2ee weblogic -dir

    Any thoughts as to why it does not pull all available local profiles?

  • Glen,

    I think using “Industry” rather than “location” works perfectly. For some reason, the “industry” keyword is available in most of the profiles, as opposed to “location”. “greater Atlanta area” java j2ee weblogic -dir -industry
    Returns 0 results in Bing and similar in Google. (Google does return 7 irrelevant results)

    I think using “greater Atlanta area” and NEAR in Bing works perfectly. NEAR works very well in Bing, as opposed to AROUND in Google: industry NEAR:5 “greater Atlanta area” java j2ee weblogic -dir
    132 results in Bing. I’ve checked all the pages – apart from couple of exceptions it works perfectly. (Exceptions include industry keyword as title with location).

    However, if we apply the same logic in Google: “greater atlanta area * industry” java j2ee weblogic -dir
    It only returns 81 results.

    Let’s compare BING VS Google Result to find out why some profiles have been excluded in Google.

    There is a guy called IGOR

    His profile comes up in Bing but not in Google. His profile structure and page source code are exactly like those of other people who come up in both Google and Bing searches. I fail to understand why Google has excluded this profile from its search. – this guy comes up in both the Google and Bing searches. Compare his page source code to IGOR’s – literally no difference.

    I think * is not working 100% as expected?

    Glen, does this mean we have a solution for BING but Google still remains unsolved?

    Please verify.

    Guys, we are moving closer it seems….

  • Pingback: LinkedIn’s Dark Matter – Undiscovered Profiles

  • I might be wrong – this is just a guess – but I think Gary’s approach using the word “location” works on Google because Google indexes this word if it’s under the “dt” HTML tag. It is *invisible* on the page but is indexed.

    Balazs shows another (unrelated) example of that in the Boolean Strings group (

    Profiles that show up with the “-location” search – such as Margarita’s – have been indexed before a change in LinkedIn’s HTML code. You can find the differences if you compare the current and the cached version of her profile (before it’s revisited by Google, of course).

    In general, searching becomes quite hard in the middle of page rearrangements since it takes some time to re-index them.
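
The theory above, that a word can sit inside a “dt” tag in the page source without being visible on the rendered page, can be illustrated with Python's standard `html.parser`. Both markup snippets below are hypothetical and are not LinkedIn's actual HTML; the sketch only shows that text inside `<dt>` is present in the source for a crawler to pick up, while an older layout without those tags would offer no “Location” label to match.

```python
from html.parser import HTMLParser


class DtTextCollector(HTMLParser):
    """Collect the text found inside <dt> elements."""

    def __init__(self):
        super().__init__()
        self._in_dt = False
        self.dt_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self._in_dt = True

    def handle_endtag(self, tag):
        if tag == "dt":
            self._in_dt = False

    def handle_data(self, data):
        if self._in_dt and data.strip():
            self.dt_text.append(data.strip())


def dt_labels(markup):
    """Return the list of text fragments that appear inside <dt> tags."""
    parser = DtTextCollector()
    parser.feed(markup)
    parser.close()
    return parser.dt_text


# Hypothetical "new" layout: labels exist in the source but are hidden by CSS.
new_markup = (
    '<dl><dt class="hidden">Location</dt><dd>Greater Atlanta Area</dd>'
    '<dt class="hidden">Industry</dt><dd>Computer Software</dd></dl>'
)
# Hypothetical "old" layout: the location appears with no label at all.
old_markup = '<p class="locality">Greater Atlanta Area</p>'

print(dt_labels(new_markup))  # ['Location', 'Industry']
print(dt_labels(old_markup))  # []
```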

  • Irina,
    Are you suggesting that in the near future, all available public LinkedIn profiles can be found using the word “location” via Google?

  • Yes, sort-of. Not necessarily all, since, as we know, Google is not great at indexing common “structural” words and phrases on LinkedIn.
    It would be interesting to see if the string with “-location” still picks Margarita when her profile is crawled again and the cached copy is the same as the current profile.

  • Sarang,
    It appears the suggestions posted by you, Gary and me all work within 1 result on Bing:

    industry NEAR:5 “greater Atlanta area” java j2ee weblogic -dir = 124
    “location: greater Atlanta area” java j2ee weblogic -dir = 124
    “greater atlanta area” NEAR:25 current java j2ee weblogic -dir = 125

    Admittedly, I did not check each result of each page for total results parity, but from clicking through to the last page of each search, while the results are in slightly different order, I recognized most of the same names.

    However, while all 3 Bing approaches exclude location false positives, something is still telling me we’re not finding all of the available results. I can’t find the Dark Matter, and maybe it’s not there to be found, but 124/125 results is well under half of the 300+ results from an internal LinkedIn search. Although it’s obviously not an apples-to-apples comparison between an indexed page and the actual page, that ratio seems quite low. Then again, maybe it’s not – which is scary, because it could mean that anyone relying heavily on X-Ray searching to ID people may only be finding 50% or fewer of the actual profiles that exist in LinkedIn.


    I agree with you that it’s very odd that applying the same search logic (targeting the word location or industry) to Google yields 25% – 50% fewer results than Bing with the same keywords. Honestly don’t know what’s going on there.
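
The ratios discussed above can be worked through with the counts quoted in this thread: 336 results from the internal LinkedIn search, 124 from the Bing X-Ray, and 86 from the equivalent Google search. This is simple arithmetic on those reported figures, with the caveat already noted that indexed pages and live profiles are not directly comparable.

```python
def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


internal_results = 336  # internal LinkedIn keyword search, 50 miles of 30303
bing_results = 124      # location NEAR:2 search on Bing
google_results = 86     # "greater atlanta area * industry" search on Google

coverage = share(bing_results, internal_results)
google_shortfall = share(bing_results - google_results, bing_results)

print(f"Bing X-Ray coverage of internal results: {coverage:.1f}%")
print(f"Google returns {google_shortfall:.1f}% fewer results than Bing")
```

With these inputs the Bing X-Ray covers a little over a third of the internal count, and Google comes in roughly 30% below Bing, which falls inside the 25% to 50% range mentioned above.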


  • I’ll definitely watch that profile, because I am very curious now!

    Perhaps I am simply tired (I need to go to bed – but this is interesting me greatly) – is this what you are seeing structurally different between Margarita’s current profile and her cached profile?

    Current LI profile


    Greater Atlanta Area


    I had to alter the HTML so it would even show up here after saving the comment…

    Greater Atlanta Area

    As of today, the last time her LI profile was cached was 3/1. Do you think once it gets recrawled the cached version will reference location this way?


    Greater Atlanta Area

    Even so – the actual page that comes up in the search results already has that format?

    I thought the cached page was used in ranking results, but did not necessarily prevent current pages to appear or not appear?

  • Glen,


    Cached pages are used for appearing in search results, *not* the actual pages. Search engines don’t check actual pages when they display the results.


  • Irina,
    I just checked through the results of a Bing X-Ray using -location and found several results that had been cached as of 3/13 (one example = and 1 that has been cached as recently as 5 days ago (3/17 =

    That result that was last cached on 3/17 cannot be found using a “location greater atlanta area” search.

    However, you CAN find this profile, last cached nearly a month prior on 2/18, leveraging “location” as with this search: “location greater atlanta area” java j2ee weblogic -dir

    I’m still going to keep an eye on the LinkedIn profiles that can currently only be found using -location to watch for changes, but was curious what you make of a very recently cached profile that still can’t be found via the “location” phrase and an older one that can?

  • Glen,

    Let me clarify; sorry, I was typing in a hurry before. Of course, this is still a theory to be verified.

    The .dt. HTML tag is apparently found by Google; I am not talking about Bing at the moment.
    In Google, the “correct” search would be with one asterisk:

    “location * greater atlanta area” (etc.)

    This string should find everything after Google caches all profiles. It will find the profiles in the correct location and not the other ones that are false positives.

    (I thought Gary said this but it looks like he was trying Bing and Google’s AROUND.)

    Google also will find the word “Industry” under the same .dt. tag.

    Perhaps Bing also picks the .dt. tag from the “new” profiles, but I haven’t played with it much yet. From the first glance, it looks like the cached and the real pages in your examples do differ when the page is not found, in spite of seemingly recent date of the cached page.

  • Balazs,

    I just posted a related comment here; these two words are in the source code under the “dt” tag.

    Looking at your experience, I am wondering whether the words “under” “dt” become *visible* with some type of web page rendering (perhaps more-text-than-image-oriented, for slow connections, etc.). This would be related to the browser mode, not to LinkedIn.
    The same (being visible) might be true about the “alt” tag that helped you discover using the word “logo” to find groups.

  • Pingback: How I Search LinkedIn to Find and Identify Talent