The Ugly Underbelly of Search

February 5, 2013

By now everyone has heard about the major snafu that hit GitHub at the end of January. Search is our favorite topic of discussion, and while we primarily focus on the good it can do for individuals and organizations, there is another side to search. In the wrong hands, or in incapable hands, search can have serious negative repercussions. The H Open article, “GitHub Search Exposes Uploaded Credentials,” fills us in.

The article gets to the heart of the problem:

“Users of the GitHub project hosting system have been reminded not to upload sensitive information to the system’s Git repositories. The reminder comes after GitHub launched a new search service based on elasticsearch. The launch of the service sent people off searching the code and, as people tend to do, they searched for private information. Various searches for terms such as ‘BEGIN RSA PRIVATE KEY’ were revealing many people had, in fact, been uploading private keys.”

Perhaps as a blessing in disguise, the elasticsearch infrastructure collapsed under the weight of searches as curious readers searched for themselves after hearing the news on Twitter. So the moral of this story is to never upload private keys or similar data into repositories, under any circumstances. A little common sense goes a long way. And, just to be safe, explore a more trusted solution based on Lucene and Solr, which pull from the strength of a large open source community. These solutions, like LucidWorks, are less likely to crack under the pressure.
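For readers who want to check their own projects before pushing, here is a minimal sketch of a local scan for the kind of key header those searches turned up. It is only an illustration: the function name find_private_keys and the pattern are our own assumptions, not anything GitHub or the H Open article provides, and it only catches PEM-style private key headers, not passwords or API tokens.

```python
import re
from pathlib import Path

# Header line of a PEM-encoded private key (RSA, DSA, EC, OPENSSH, or unlabeled).
# The pattern is an illustrative assumption, not an exhaustive secret detector.
PRIVATE_KEY_PATTERN = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def find_private_keys(root):
    """Return a sorted list of files under root that contain a private key header."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        # Skip directories and Git's own metadata folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file, nothing to report
        if PRIVATE_KEY_PATTERN.search(text):
            hits.append(path)
    return sorted(hits)
```

Calling find_private_keys(".") from the top of a working copy returns the offending paths, if any. A check like this is a last line of defense; keeping keys outside the repository directory in the first place is the safer habit.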

Emily Rae Aldridge, February 5, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Written by Stephen E. Arnold · Filed Under News, Search

Comments

4 Responses to “The Ugly Underbelly of Search”

Charlie Hull on February 5th, 2013 4:59 am

Just to be clear, *any* search engine would have revealed passwords and other information stored as plaintext by developers. The GitHub team have also posted an explanation of what caused the downtime and what they did to fix it https://t.co/CBgJWrMG – seems it was a combination of deploying a newer version of Elasticsearch without sufficient testing, some old issues with Lucene and Java 6, and some human error. With help from the Elasticsearch team they had it all fixed in just over a day – not bad.

mister j on February 5th, 2013 12:26 pm

elasticsearch is based on Lucene. It is also open source; on github it has ~650 forks and a rather large list of contributors https://github.com/elasticsearch/elasticsearch/graphs/contributors

“elasticsearch aims to solve all these problems and more. It is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.”

http://www.elasticsearch.org/

LucidWorx on February 8th, 2013 4:19 pm

This article is nothing but garbage! What makes it even more of a joke is knowing that it is paid for by LucidWorks, a direct competitor to ElasticSearch. Why doesn’t your article mention how HORRIBLE GitHub’s previous Solr-based solution was? If LucidWorks and Solr are so great, why didn’t GitHub stick with it? I guess it would have “cracked under the pressure.”

Eric Gaumer on February 8th, 2013 8:01 pm

Judging from the author’s bio, she’s not qualified to give any technical advice regarding search or distributed systems. In fact, a simple Google search leads me to believe she’s likely compensated for promoting LucidWorks.

Please approach any of her rhetoric with complete skepticism. It’s nothing more than seedy marketing tactics.


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.