The Ugly Underbelly of Search
February 5, 2013
By now everyone has heard about the major snafu that hit GitHub at the end of January. Search is our favorite topic of discussion, and while we primarily focus on all the good it can do for individuals and organizations, there is another side to search. In the wrong hands, or in incapable hands, search can have serious negative repercussions. The H Open article, “GitHub Search Exposes Uploaded Credentials,” fills us in.
The article gets to the heart of the problem:
“Users of the GitHub project hosting system have been reminded not to upload sensitive information to the system’s Git repositories. The reminder comes after GitHub launched a new search service based on elasticsearch. The launch of the service sent people off searching the code and, as people tend to do, they searched for private information. Various searches for terms such as ‘BEGIN RSA PRIVATE KEY’ were revealing many people had, in fact, been uploading private keys.”
Perhaps as a blessing in disguise, the elasticsearch infrastructure collapsed under the weight of searches as curious readers searched for themselves after hearing the news on Twitter. So the moral of this story is to never upload private keys or similar data into repositories, under any circumstances. A little common sense goes a long way. And, just to be safe, explore a more trusted solution based on Lucene and Solr, which pull from the strength of a large open source community. These solutions, like LucidWorks, are less likely to crack under the pressure.
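One way to act on that moral is to check a working tree for key material before pushing it anywhere public. The sketch below is only an illustration and not anything GitHub provides: the helper name `find_private_keys` is hypothetical, and the marker is simply the search term quoted in the article, so other key formats would slip past it.

```python
import os

# Marker taken from the search term quoted in the article. Other key types
# use different headers and are NOT detected by this sketch.
KEY_MARKER = b"BEGIN RSA PRIVATE KEY"


def find_private_keys(root):
    """Return a sorted list of file paths under root that contain KEY_MARKER.

    Hypothetical helper for illustration only. Each file is read fully into
    memory, which is acceptable for a sketch but not for very large files.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if KEY_MARKER in handle.read():
                        hits.append(path)
            except OSError:
                # Unreadable files are skipped rather than treated as errors.
                continue
    return sorted(hits)
```

Calling `find_private_keys(".")` from the top of a checkout returns the files worth a second look before a push. A plain substring match is used because that is all the searches described in the article were doing.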
Emily Rae Aldridge, February 5, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
A Quote To Note About Search
February 5, 2013
Search is the act of trying to find the answer to a question. Internet users browse the Web searching for answers to their questions. The main tools people use to search the Internet are search engines, but while I was reading Explore, this quote came up:
“Forget search engines. The real revolution will come when we have research engines, intelligent Web helpers that can find out new things, not just what’s already been written. Facebook Graph Search isn’t anywhere near that good, but it’s a nice hint at greater things to come.”
Gary Marcus, a neuroscientist, made this remark about Facebook and how its new Graph Search means big changes for search in the coming years. Explore also mentioned that it echoes Vannevar Bush’s 1945 vision for the future of knowledge. Bush was an engineer, well known for his work on analog computers and a little project called the Manhattan Project. Reflecting on this quote, one can only agree that yes, Graph Search and other search systems are on the brink of something grand. From the science fiction and romantic writing angle, these are the times people will look back on with nostalgia for our infant-like knowledge. All the information in the world can be discovered on a little device someone carries around in a pocket, but people are still clueless about how to use it.
Google is already trying to remedy this with Knowledge Graph, which is the start of a Star Trek-like computer. People need to be taught how to use information and what it can do for them, rather than passively letting it seep through their heads. The time to start is now.
Whitney Grace, February 05, 2013
A Search Death Report a Decade Too Late
February 3, 2013
I wrote a feature for Searcher Magazine in 2003 called “In Search of…the Good Search.” The original title was “Search Is Dead.” I picked up the theme in a number of Beyond Search articles; for example, “The Search Is Dead Question.”
I was interested in the February 2013 write up “The End of the Web, Computers, and Search as We Know It.” The main idea is that search is dead. I am okay with the premise. I did find the following statement interesting in light of the explosion of interest in making information in academic papers free, which is bubbling along with the Google agreement to pay France for links.
Here’s the passage I noted:
But it’s about time: “Bring me what I want” is almost always more useful than “Let me rummage around and see what I can find.” No matter how fast it seems, most search is a waste of time. In a way, we are using time (i.e., the time-based structure) to gain time. Instead of doing an endless series of separate searches, we tune the knobs on our stream-browser to continuously feed us just the information we need. This future doesn’t just kill the operating system, browser, and search as we know it — it changes the meaning of “computer” as we know it, too. Whether large or small (e.g., a smartphone), a computer’s main function in the near future will be tuning in to — as a car radio tunes in a broadcast station — the constantly flowing global cyberflow. We won’t care much about the computer devices themselves since we’ll be more focused on the world of information … and our lives as attached to it.
My thought is that the subtext for this remark rests upon the chronological approach in Scopeware. But when I ran a query for the system, Google had nothing substantive but Bing.com produced a reference to LegalTech.com and a download link on Softpedia.
My view:
- The death of search took place with the rise of pay to play services. Online advertising is the main engine of growth. As pay to play grew, the likelihood that different types of retrieval systems would become the next big thing has dwindled. After Google went public, the old precision and recall model ended up in the morgue.
- Search has been devalued by the systems marketed aggressively by the Big Five in search. These were Autonomy, Convera, Endeca, Fast Search, and Verity. Each installation left licensees with some surprises. None of these outfits exists as a self standing multi billion dollar, absolutely essential solution. Vestiges of the legacy of these breakthrough systems may be seen in the HP Autonomy dust up, in my opinion.
- The stampede to predictive analytics, business intelligence, and personalized systems is little more than a way to get rid of the hassle of making the user craft a query and to use smart software to tell the hapless what he or she needs to know. Do these systems work? In my book, the marketing is better than the technology at this time. Licensing pure search is not what most vendors do. The pitch is for customer support, Big Data, and sentiment. Search was a tough sell in 2003 and is an even tougher sell today.
I am okay with brave new worlds, nifty technology, and total immersion in pay to play. I just want to shift the moment of death back a decade. Reporting about a death long after it occurred is similar to the disappearance of content in a Web centric world. Maybe the Library of Congress will save the day with its archive of Twitter messages.
Stephen E Arnold, February 3, 2013
Embracing Open Source Only To Make Money
February 3, 2013
Open source offers companies many advantages: software tailored specifically to their needs, no licensing fees, and the support of an entire community. IBM was one of the big companies that adopted an open source policy, and others have been following suit. According to the Marketwire article “Expert System Announces Integration With Apache Solr For Enterprise Search,” Expert System is another business adding open source.
Expert System is a semantic software company that provides insights into its clients’ information. For its Cogito semantic platform, Expert System installed Apache Solr, an open source enterprise search platform. The goal is that Apache Solr will give clients more precise search results and access to big data and enterprise content.
“’As more organizations recognize the opportunity presented by their information streams, it is important they understand that there are advanced tools that can improve the performance of their existing enterprise content and search investment,’ said Luca Scagliarini, Vice President of Strategy & Business Development, Expert System. ‘Semantic technology not only excels in making search and information management more accurate, but it also allows organizations to improve the quality of their information for use in the decision making process.’”
Great! Take advantage of open source and use it to deliver a better quality product to customers. That is how open source should be used (as long as Expert System gives something back in return), but there is a problem here. As more companies follow IBM’s open source approach, they seem to be forgetting that IBM is a consulting company and not a software/hardware company. Adopting open source may not build revenue; instead, without the right plan, it will simply create bigger IT headaches.
Whitney Grace, February 3, 2013
Search and Innovation
February 1, 2013
I don’t want to rain on the innovation parade. However, another search lawsuit is upon us. “Microsoft Sued over Search-Related Patents” reports that the alleged infringement relates to advertising. In my March 2013 Cebit Promise talk I comment about the loss of innovation in search. This new legal dust up makes it clear that the focus in search is on the dance among the search system, the user, and the advertiser. In short, innovation is not precision and recall in the manner of dusty equations. Perhaps innovation is the dutiful servant of revenue and legal eagles?
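For readers who have not dusted off those equations lately, here is a minimal sketch of how precision and recall are usually computed for a single query. The function name `precision_recall` and the document ids are made up for illustration; they do not come from any system mentioned in this post.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: iterable of document ids the system returned
    relevant:  iterable of document ids judged relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Documents that were both returned and judged relevant.
    true_positives = len(retrieved_set & relevant_set)
    # Precision: share of returned documents that are relevant.
    precision = true_positives / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant documents that were returned.
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: four results returned, three documents are relevant,
# and two of the returned results are among the relevant ones.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

In this made-up case precision is 2 of 4 and recall is 2 of 3. Nothing in either number says anything about advertisers, which is the point of the complaint above.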
Stephen E Arnold, February 1, 2013
Sponsored by HighGainBlog
Facebook Graph Search No Threat to Google Search
February 1, 2013
Contrary to some early predictions, it looks like Google has nothing to worry about from Facebook’s just-released “graph search” function. The Manila Times reports, “Facebook’s New Search Product Not Threat to Google – Analysts.” The brief write-up reports:
“After Facebook rolled out the friends-based search product on Tuesday, people began thinking about the question of how this new feature could affect Google, the king of search. Facebook CEO Mark Zuckerberg said that ‘graph search’ is different from an all-purpose search engine. His view was agreed by experts, who said that compared with Facebook’s focus on the network of friends, the search function of Google takes a much more holistic approach. Analysts agreed that Facebook’s search tool is unlikely to challenge Google’s leading position in web search at least in the near future.”
The new feature allows users to tap into opinions and recommendations expressed by their “friends” when searching for information. Our own leader, Stephen E. Arnold, has observed that it functions better for some folks than for others, and that the less superficial the search, the less useful it is. Thanks, but no thanks.
If you’re getting a sense of déjà vu, it may be because of similar social-linked moves last year by Microsoft and, yes, Google itself. Microsoft tied recommendations from Foursquare into their Bing results, while Google connected Google+ data with its search (opting out is possible). All three implementations seem like either-love-it-or-hate-it propositions. But, hey, all is well as long as the advertisers are happy.
Cynthia Murrell, February 01, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Exclusive Interview: Miles Kehoe, LucidWorks
January 30, 2013
Miles Kehoe, formerly a senior manager at Verity and then the founder of New Idea Engineering, joined LucidWorks in late 2012. I worked with Miles on a project and found him a top notch resource for search and the tough technical area which was our concern.
I was able to interview Miles Kehoe on January 25, 2013. He was forthcoming and offered me insights which I found fresh and practical. For example, he told me:
You know I come from a ‘platform neutral’ background, and I know many of the folks involved with ElasticSearch. Their product addresses many of the shortcomings in Solr 3.x, and a year or two ago that would have been a coup. But now, Solr 4 completely addresses those shortcomings, and then some, with SolrCloud and Zoo Keeper. ES says it doesn’t require a pesky ‘schema’ to define fields; and when you’re playing with a product for the first time, that is kind of nice. On the other hand, folks I know who have attempted production projects with ES tell me there’s no way you want to go into production without a schema. Apache Lucene and Solr enjoy a much larger community of developers. If you check the Wikipedia page, you’ll see that Lucene and Solr both list the Apache Software Foundation as the developer; Elastic Search lists a single developer, who it turns out, has made the vast majority of updates to date. While it is based on Apache Lucene, Elastic Search is not an Apache project. Both products support RESTful API usage, but Elastic requires all transactions to use JSON. Solr supports JSON as well, but goes beyond to support transactions in many formats including XML, Java, PHP, CSV and Python. This lets you write applications to interact with Solr in any language and with any protocol you want to use. But the most noticeable difference is that Solr has an awesome Web Based Admin UI, ES doesn’t. If you’re only writing code, you might not care, but the second a project is handed over to an Admin group they’re bound to notice! It makes me smile every time somebody says ES and “ease of use” in the same sentence – you remember the MS DOS prompt back in 1990? Although early adopters enjoyed that “simplicity”, business people preferred mouse-based systems like the Mac and Windows. 
We’re seeing this play out all over again – busy IT people want an admin UI – they don’t want to spend all day at what amounts to a “web command line”, stitching together URLs and JSON commands.
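To make the “stitching together URLs and JSON commands” remark concrete, here is a small sketch of what a hand-built request to each system can look like. It only constructs strings and sends nothing over the network. The host names, port numbers, the Solr core name `collection1`, and the Elasticsearch index name `articles` are placeholder assumptions, and the helper functions are hypothetical.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoints: hosts, ports, core name and index name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"
ES_SEARCH = "http://localhost:9200/articles/_search"


def solr_query_url(text, response_format="json"):
    """Build a Solr select URL; the `wt` parameter chooses the response format."""
    params = {"q": text, "wt": response_format}
    return SOLR_SELECT + "?" + urlencode(params)


def es_query_body(field, text):
    """Build the JSON request body for an Elasticsearch match query."""
    return json.dumps({"query": {"match": {field: text}}})


# The same Solr query, asked for in two different response formats.
solr_json_url = solr_query_url("title:search", "json")
solr_csv_url = solr_query_url("title:search", "csv")

# The Elasticsearch request body that would be sent to ES_SEARCH.
es_body = es_query_body("title", "search")
```

Switching `wt` from `json` to `csv` changes only the response format Solr is asked for, which is the flexibility Mr. Kehoe describes, while the Elasticsearch request is expressed as a JSON document.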
I found this comment prescient. I learned about a possible issue triggered by ElasticSearch in “Github Search Exposes Passwords Then Crashes.”
I pressed Mr. Kehoe for key points of differentiation in open source search. I pointed out that every vendor is rushing to embrace open source search. Some do it with lights flashing like IBM and others operate in a lower profile manner like Attivio. He told me:
Just as we have different products and services for our customers, we can customize our engagements to meet our customers’ needs. Some of our customers want to have deep product expertise in-house, and with training, best practice and advisory consulting, and operations/production consulting, we help them come up to speed. We also provide ongoing technical and production support for mission critical applications – just last month an eCommerce site ran into production problems on the Friday afternoon before Christmas. We were able to help them out and have them at full capacity before dinner. Not to dwell on it, but what sets LucidWorks apart is the people. We employ a large number of the team that created and enhances Lucene and Solr including Grant Ingersoll, Steve Rowe and Yonik Seeley. We also have significant expertise on the business side as well. At the top, Paul Doscher grew Exalead from an unknown firm into a major enterprise search player over just a few years; my former business partner Mark Bennett and I have built up deep understanding of search since our Verity days in the early 1990s.
Important information for those analyzing search systems, I believe.
You can read the full text of the interview on the ArnoldIT Search Wizards Speak series at http://goo.gl/31682. Search Wizards Speak is the largest, no cost, freely available collection of interviews with experts in search and content processing. There are more than 60 interviews available. You can find the full series listing at http://www.arnoldit.com/search-wizards-speak/ and http://arnoldit.com/wordpress/wizards-index/.
Stephen E Arnold, January 30, 2013
Sponsored by Dumante.com
Apache Lucene and Solr New Codec
January 30, 2013
Apache Lucene and Solr have announced the new release of version 4.1. Improvements to Solr’s request parsing and support of Internet Explorer are just a few of the new features available. Read about all of the new features and upgrades in The H Open article, “Apache Lucene and Solr Update with New Default Codec.”
The article begins:
“The Apache Lucene project has announced Lucene and Solr 4.1, the latest updates to the Java-based text search library and search platform built around it. Lucene 4.1 has a new default codec “Lucene41Codec” which is based on a previously experimental “Block” indexing format. The new codec includes optimisations around pulsing (where a term only appears in one document) and efficient compressed stored fields to help keep data within the bounds of I/O cache.”
Lucene and Solr serve as the basis for many strong enterprise products. LucidWorks is one company that builds its solutions atop Lucene and Solr, ensuring that they are harnessing the best and most current open source advancements. Check out LucidWorks Big Data and/or LucidWorks Search – both are sure to get even better, benefiting from the improvements in Lucene and Solr’s new codec.
Emily Rae Aldridge, January 30, 2013
Graph Search Makes Facebook Rival Google
January 30, 2013
Facebook’s search application has never been very strong. Yandex’s Wonder application has pushed Facebook to step up its search development and launch the new Graph Search. Steve Cheney’s blog takes an in-depth look at the new Graph Search in the post “Graph Search’s Dirty Promise And The Con Of The Facebook ‘Like.’” Graph Search is supposed to compete with Google and allow users to search all of the content on their social networks. Cheney says that Graph Search is much weaker than Facebook wants to admit and that most of the data it searches is outdated.
Cheney explains that Facebook has convinced companies that they need to buy fans, meaning “likes” on Facebook. Facebook’s users are not its customers; rather, these companies are, and they have spent 50% of their advertising budget on Facebook campaigns. All of this produces a lot of data and connections, but Cheney argues that it will not meet users’ real needs.
“The truth is Graph Search deserves the exact disclaimer FB gave it… it’s a beta product. Through time, iteration, and effort it can and will be a useful tool for FB power users who are well connected, to find people and to sift through memories. But the fact is we’re living in a web where services are unbundling, and social is unbundling too. You simply can’t roll up recommendations for people, places, and interests into a service that’s one size fits all.”
Of course Graph Search is a beta. It will not decide what you do, only try to influence your decision. Facebook, have you failed in search?
Whitney Grace, January 30, 2013
Quote to Note: Craziness about Facebook Search
January 29, 2013
Here’s a quote to note. I don’t want to lose this puppy. I spotted it in the dead tree edition of the New York Times. The location of this notable phrase is the business section, page B 7. The story containing the quote is “Facebook’s Search Had to Go Beyond Robospeak.” The story explains the wonderfulness of Facebook’s beta search system. We love Facebook search. How could the company possibly improve on a graph surfing system which blocks outfits like Yandex from indexing content? No way. Anyway, here’s the quote:
Letting users talk with a computer on their own terms.
Oh, baby. Do I love this type of insightful comment about search and retrieval. I was not aware that I was able to talk with Facebook, but what do I know? Even better, I love the idea of doing the talking on my own terms.
How interesting is this statement about letting users talk with a computer? Beyond interesting. The statement ventures into the fantasyland of every person who watched and confused Star Trek, Star Wars, and Mary Had a Little Lamb.
A keeper.
Stephen E Arnold, January 29, 2013
Check out our sponsor Dumante.com