Boolean Search: Will George Boole Rotate in His Grave?
January 12, 2016
George Boole is, for most math wonks, the father of Boolean logic, a nifty way to talk about sets and what they contain. One can perform algebra and differential equations whilst pondering George and his method for thinking about fruits when he went shopping.
In the good old days of search, there was one way to search. One used AND, OR, NOT, and maybe a handful of other logic operators to retrieve information from structured indexes and content. Most folks with a library science degree or a friendly math major can explain Boolean reasonably well. Here’s an example which might even work on CSA ProQuest (née Lockheed Dialog) even today:
CC=77? AND scam?
The systems, when fed the right query, would reply with pretty good precision and recall. Precision meant that what came back was on point and useful. Recall meant that what should be included was in the result set.
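For readers who want to see the moving parts, here is a minimal sketch of Boolean retrieval over a toy inverted index, with precision and recall computed against a set of relevance judgments. The index contents, the document IDs, and the relevance judgments are all invented for illustration, and the trailing "?" is treated as truncation, which is how the sample query above appears to use it. This is not how Dialog or any other commercial system is implemented.

```python
# Toy inverted index: term -> set of document IDs. All data here is invented.
INDEX = {
    "scam": {1, 2, 5},
    "scams": {3},
    "fraud": {2, 4},
    "refund": {5, 6},
}
ALL_DOCS = {1, 2, 3, 4, 5, 6}


def lookup(term):
    """Return the posting set for a term; a trailing '?' means truncation."""
    if term.endswith("?"):
        stem = term[:-1]
        hits = set()
        for key, postings in INDEX.items():
            if key.startswith(stem):
                hits |= postings
        return hits
    return set(INDEX.get(term, set()))


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved docs that are relevant.
    Recall: share of relevant docs that were retrieved."""
    true_hits = retrieved & relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Equivalent of: scam? NOT refund
# AND is set intersection, OR is union, NOT is difference from ALL_DOCS.
result = lookup("scam?") & (ALL_DOCS - lookup("refund"))
relevant = {1, 2, 3, 4}  # hypothetical human relevance judgments
print(sorted(result), precision_recall(result, relevant))
```

In this toy case every retrieved document is relevant, so precision is perfect, but one relevant document (the one indexed only under "fraud") is missed, so recall is not. Adding OR fraud to the query would trade in the other direction.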
I thought about Boole, fruit, and logic when I read “The Best Boolean and Semantic Search Tool.” Was I going to read about SDC’s ORBIT, ESA Quest, or (heaven help me) the original Lexis system?
Nope.
I learned about LinkedIn. Not one word about Palantir’s injecting Boolean logic squarely in the middle of its advanced data management processes. Nope.
LinkedIn. I thought that LinkedIn used open source Lucene, but maybe the company has invested in Exorbyte, Funnelback, or some other information access system.
The write up stated:
If you use any source of human capital data to find and recruit people (e.g., your ATS/CRM, resume databases, LinkedIn, Google, Facebook, Github, etc.) and you really want to understand how to best approach your talent sourcing efforts, I recommend watching this video when you have the time.
Okay, human resource functions. LinkedIn, right.
But there is zero content in the write up. I was pointed to a video called “Become a LinkedIn Search Ninja: Advanced Boolean Search” on YouTube.
Here’s what I learned before I killed the one hour video:
- The speaker is in charge of personnel and responsible for Big Data activities related to human resources
- Search is important to LinkedIn users
- Profiles of people are important
- Use OR. (I found this suggestion amazing.)
- Use iterative, probabilistic, and natural language search, among others. (Yep, that will make sense to personnel professionals.)
Okay. I hit the stop button. Not only will George be rotating, I may have nightmares.
Please, let librarians explicitly trained in online search and retrieval explain methods for obtaining on point results. Failing a friendly librarian, ask someone who has designed a next generation system which provides “helpers” to allow the user to search and get useful outputs.
Entity queries are important. LinkedIn can provide some useful information. The tools to obtain that high value information are a bit more sophisticated than the recommendations in this video.
Stephen E Arnold, January 12, 2016
Search Is Marketing and Lots of Other Stuff Like Semantics
January 12, 2016
I spoke with a person who asked me, “Have you seen the 2013 Dave Amerland video?” The video in question is “Google Semantic Search and its Impact on Business.”
I hadn’t. I watched the five-minute video and formed some impressions / opinions about the information presented. Now I wish I had not invested five minutes in serial content processing.
First, the premise that search is marketing does not match up with my view of search. In short, search is more than marketing, although some view search as essential to making a sale.
Second, the video generates buzzwords. There’s knowledge graph, semantic, reputation, Big Data, and more. If one accepts the premise that search is about sales, I am not sure what these buzzwords contribute. The message is that when a user looks for something, the system should display a message that causes a sale. Objectivity does not have much to do with this, nor do buzzwords.
Third, presentation of the information was difficult for me to understand. My attention was undermined by the wild and wonderful assertions about the buzzwords. I struggled with “from strings to things, from Web sites to people.” What?
The video is ostensibly about the use of “semantics” in content. I am okay with semantic processes. I understand that keeping words and metaphors consistent is helpful to a human and to a Web indexing system.
But the premise. I have a tough time buying in. I want search to return high value, on point content. I want those who create content to include helpful information, details about sources, and markers that make it possible for a reader to figure out what’s sort of accurate and what’s opinion.
I fear that the semantics practiced in this video shriek, “Hire me.” I also note that the video is a commercial for a book which presumably amplifies the viewpoint expressed in the video. That means the video vocalizes, “Buy my book.”
Heck, I am happy if I can get an on point result set when I run a query. No shrieking. No vocalization. No buzzwords. Will objective search be possible?
Stephen E Arnold, January 12, 2016
Omnity: A Worry for the Googlers?
January 8, 2016
A New Year. Another Google challenger. Anyone remember Qwant.com which kept Eric Schmidt awake at night? Yep, right.
I read “Semantic Search Engine Omnity Reckons It Can Beat Google.”
The write up had a great phrase: “Tyranny of the taxonomy.”
This should make the purveyors of Boot Camps and software, and the developers of human controlled term schema, perspire. Well, maybe only a little on the upper lip.
The new sheriff is Omnity, described this way:
Omnity is a new kind of search engine that asks the question: What if, instead of searching for keywords like “baseball scores” or “best-rated Nintendo 64 games,” a search engine let users search across disparate documents, from Wikipedia pages and news articles to patent filings and PDFs, in order to find shared interconnectedness?
The method used? The article reports:
when Omnity searches across documents, it throws out “grammatical glue but semantic noise”—commonly used words like “the,” “he,” “she,” or “it.” Stripped of this “noise,” Omnity is then able to analyze the remaining “rare words” to find common threads that link together different documents.
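The idea in that passage can be sketched in a few lines: throw out the stop words, then link documents through the uncommon words they share. What follows is a rough illustration only. The stop list, the document-frequency cutoff (`max_doc_freq`), and the three sample documents are my assumptions, not anything Omnity has published about its method.

```python
from collections import Counter
import re

# A tiny, hand-picked stop list; a real system would use a much longer one.
STOP_WORDS = {"the", "he", "she", "it", "a", "an", "of", "and", "in", "to", "is", "for"}


def content_words(text):
    """Lowercase, tokenize, and drop the 'grammatical glue'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def shared_rare_words(docs, max_doc_freq=2):
    """Map each pair of document names to the rare words they share.

    A word counts as 'rare' when it appears in at most max_doc_freq documents.
    """
    words = {name: content_words(text) for name, text in docs.items()}
    doc_freq = Counter(w for ws in words.values() for w in ws)
    names = sorted(words)
    links = {}
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            common = {
                w for w in words[left] & words[right]
                if doc_freq[w] <= max_doc_freq
            }
            if common:
                links[(left, right)] = common
    return links


# Invented sample documents standing in for a patent, a news item, and a wiki page.
docs = {
    "patent": "A method for perovskite coating of the solar cell",
    "news": "She said the perovskite cell is cheap",
    "wiki": "The cell is the basic unit of biology",
}
links = shared_rare_words(docs)
print(links)
```

Here "cell" turns up in all three documents, so it is too common to count as a thread. "Perovskite" appears in only two, so it links the patent and the news item. Whether that scales to patent filings and PDFs is another matter.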
Once the company works out the name confusion with the 3D utility product, the system will be easier to find online. Check it out at https://www.omnity.io/.
If Mr. Schmidt is reading this blog post, now you can dream about Qwant and Omnity.
PS. The write up had a wonderful quote from the founder of Omnity, Brian Sager, which I reproduce here:
“I use Google every day and it’s great, but no, we’re more likely to buy Google.”
Worry, Mr. Schmidt. Worry.
Stephen E Arnold, January 8, 2016
Search Online Too Long? Tietze Disease Will Get You
January 8, 2016
I read “Technology Addict Develops Tietze Disease from Spending 23 Hours a Day Online.” I know, gentle reader, that using search engines can be frustrating. I know too that most of my readers spend hours upon hours trying to make Bing, Google, and Yandex point to a specific document which will answer your most pressing business question.
The fix is little more than search systems which return relevant results without ads and fluff.
Be aware. If you find yourself investing hours upon hours in crafting queries, you may succumb to “shooting pains” in your “back and chest.” You may have strained your “costal cartilages.”
The culprit? Tietze disease.
Rest easy. The problem is benign. Go back to searching. Be tough.
Stephen E Arnold, January 8, 2016
The Long Goodbye of Internet Freedom Heralded by CISA
January 8, 2016
The article on MotherBoard titled Internet Freedom Is Actively Dissolving in America paints a bleak picture of our access to the “open internet.” In spite of the net neutrality win this year, broadband adoption is decreasing, and the number of poor Americans forced to choose between broadband and smartphone internet is on the rise. In addition to these unfortunate trends,
“Congress and President Obama made the Cybersecurity Information Sharing Act a law by including it in a massive budget bill (as an extra gift, Congress stripped away some of the few privacy provisions in what many civil liberties groups are calling a “surveillance bill”)… Finally, the FBI and NSA have taken strong stands against encryption, one of the few ways that activists, journalists, regular citizens, and yes, criminals and terrorists can communicate with each other without the government spying.”
What this means for search, and for our access to the Internet in general, is yet to be seen. The effects of security laws and encryption opposition will obviously be far-reaching, but at what point do we stop getting the information that we need to be informed citizens?
And when you search, if it is not findable, does the information exist?
Chelsea Kerwin, January 8, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
In Scientific Study Hierarchy Is Observed and Found Problematic to Cooperation
January 8, 2016
The article titled Hierarchy is Detrimental for Human Cooperation on Nature.Com Scientific Reports discusses the findings of scientists related to social dynamics in human behavior. The abstract explains in no uncertain terms that hierarchies cause problems among human groups. Perhaps surprisingly to many millennials, hierarchies actually forestall cooperation. The article explains the circumstances of the study,
“Participants competed to earn hierarchy positions and then could cooperate with another individual in the hierarchy by investing in a common effort. Cooperation was achieved if the combined investments exceeded a threshold, and the higher ranked individual distributed the spoils unless control was contested by the partner. Compared to a condition lacking hierarchy, cooperation declined in the presence of a hierarchy due to a decrease in investment by lower ranked individuals.”
The study goes on to explain that regardless of whether power or rank was earned or arbitrary (think boss vs. boss’s son), it was “detrimental to cooperation.” It also goes into great detail on how to achieve superior cooperation through partnership and without an underlying hierarchical structure. There are lessons to take away from this study in many fields, and the article is mainly focused on economic metaphors, but what about search vendors? Organization does, after all, have value.
Chelsea Kerwin, January 8, 2016
Oscobo: A Privacy Centric Web Search System
January 7, 2016
Before you get too excited, the Oscobo service uses results from Bing. Yep, that is the search engine which uses Baidu in China and Yandex in Russia for results.
The Oscobo search system is about privacy for its users, not about the dreary precision, recall, and relevance issues. “Oscobo Is An Anonymous Search Engine Targeting Brits” reports that the system reminded the article’s author of DuckDuckGo and Hulbee, both working to ensure the privacy of their users.
The results, it seems, are filtered to cater to the needs of UK online search.
According to the write up, Oscobo’s business model
is simple paid search, based on bare-bones search data (i.e. whatever string a user is searching for) and their location — given the product is serving the U.K. market this is assumed to be the U.K., but whatever search string they input may further flesh out a more specific location.
There is no definition of “paid search”, however. You can check out the system at https://oscobo.co.uk/.
Stephen E Arnold, January 7, 2016
Did Apple Buy Topsy for an Edge over Google?
January 7, 2016
A couple years ago, Apple bought Topsy Labs, a social analytics firm and Twitter partner out of San Francisco. Now, in “Apple Inc. Acquired Topsy to Beat Google Search Capabilities,” BidnessEtc reports on revelations from Topsy’s former director of business development, Aaron Hayes-Roth. Writer Martin Blanc reveals:
“The startup’s tools were considered to be fast and reliable by the customers who used them. The in-depth analysis was smart enough to go back to 2006 and provide users with analytics and data for future forecasts. Mr. Roth and his team always had a curiosity attached to how Apple would use Twitter in its ecosystem. Apple does not make use of Twitter that much; the account was made in 2011 and there aren’t many tweets that come out of the social network. However, Mr. Roth explains that it was not Twitter data that Apple had its eye on; it was the technology that powered it. The architecture of Topsy makes it easier for systems to search large amounts of data extremely fast with impressive indexing capabilities. Subsequently, Apple’s ecosystem has developed quite a lot since Siri was first introduced with the iPhone 4s. The digital assistant and the Spotlight search are testament to how far Apple’s search capabilities have come.”
The article goes on to illustrate some of those advances, then points out the ongoing rivalry between Apple and Google. Are these improvements the result of Topsy’s tech? And will they give Apple the edge they need over their adversary? Stay tuned.
Cynthia Murrell, January 7, 2016
Reverend Bayes: Still Making Headlines
January 6, 2016
Autonomy, now owned by Hewlett Packard Enterprise, was one of the first commercial search and content processing firms to embrace Bayesian methods. The approach in the 1990s was not as well known and as widely used as it is today. Part of the reason was the shroud of secrecy dropped over the method. Another factor was the skepticism some math folks had about the “judgment” factor required to set up Bayesian methods. That skepticism is still evident today even though Bayesian methods are used by many of the information processing outfits making headlines today.
A good example of the attitude appears in “Bayes’s Theorem: What’s the Big Deal?”
Here’s the quote I noted:
Embedded in Bayes’ theorem is a moral message: If you aren’t scrupulous in seeking alternative explanations for your evidence, the evidence will just confirm what you already believe. Scientists often fail to heed this dictum, which helps explain why so many scientific claims turn out to be erroneous. Bayesians claim that their methods can help scientists overcome confirmation bias and produce more reliable results, but I have my doubts.
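The “judgment” issue in that quote can be shown with arithmetic. The sketch below applies Bayes’ theorem to a yes/no hypothesis twice, using the same evidence but different guesses about how likely that evidence would be if the hypothesis were false. Every number is invented for illustration; the function and variable names are mine.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem for a yes/no hypothesis H and evidence E.

    P(H|E) = P(E|H) P(H) / (P(E|H) P(H) + P(E|not H) P(not H))
    """
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator


prior = 0.5  # invented starting belief in the hypothesis

# Alternative explanations ignored: the evidence is assumed to be very
# unlikely unless the hypothesis is true.
careless = posterior(prior, 0.9, 0.1)

# Alternative explanations taken seriously: the evidence is almost as
# likely when the hypothesis is false.
scrupulous = posterior(prior, 0.9, 0.8)

print(round(careless, 3), round(scrupulous, 3))
```

With these made-up inputs, the careless setup pushes belief to roughly 0.9, while the scrupulous one barely moves it off the 0.5 starting point. The formula is the same both times. The difference is the human guess about the alternatives, which is the part the skeptics worry about.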
Bayesian methods are among the most used methods at analytics outfits. Will these folks change methods? Nah.
Stephen E Arnold, January 6, 2016
Google Search and Cultural Representation
January 6, 2016
Google Search has worked its way into our culture as an indispensable, and unquestioned, tool of modern life. However, the algorithms behind the platform have become more sophisticated, allowing Google to tinker more and more with search results. Since so many of us regularly use the search engine to interact with the outside world, Google’s choices (and ours) affect the world’s perception of itself. Researcher Safiya Umoja Noble details some of the adverse effects of this great power in her paper, “Google Search: Hyper-Visibility as a Means of Rendering Black Women and Girls Invisible,” posted at the University of Rochester’s InVisible Culture journal. Not surprisingly, commerce features prominently in the story. Noble writes:
“Google’s algorithmic practices of biasing information toward the interests of the powerful elites in the United States, while at the same time presenting its results as generated from objective factors, has resulted in a provision of information that perpetuates the characterizations of women and girls through misogynist and pornified websites. Stated another way, it can be argued that Google functions in the interests of its most influential (i.e. moneyed) advertisers or through an intersection of popular and commercial interests. Yet Google’s users think of it as a public resource, generally free from commercial interest—this fact likely bolstered by Google’s own posturing as a company for whom the informal mantra, ‘Don’t be evil,’ has functioned as its motivational core. Further complicating the ability to contextualize Google’s results is the power of its social hegemony. At the heart of the public’s general understanding and trust in commercial search engines like Google, is a belief in the neutrality of technology … which only obscures our ability to understand the potency of misrepresentation that further marginalizes and renders the interests of Black women, coded as girls, invisible.”
Noble goes on to note ways we, the users, codify our existing biases through our very interaction with Google Search. To say the paper treats these topics in depth is an understatement. Noble provides enough background on the study of culture’s treatment of Black women and girls to get any non-social-scientist up to speed. Then, she describes the extension of that treatment onto the Web, and how certain commercial enterprises now depend on those damaging representations. Finally, the paper calls for a critical approach to search to address these, and similar, issues. It is an important, and informative, paper; we suggest interested readers give it a gander.
Cynthia Murrell, January 6, 2016