Apache Lucene Solr Updates

January 25, 2013

The DZone Big Data/BI Zone has let us know that a new version of Apache Lucene Solr has hit the internet. Apache Lucene Solr 3.6.2 has been unveiled, and it will roll into many other products that build upon the open source code. Read the details in, “Apache Lucene Solr 3.6.2.”

The gist of the release is in the first few lines:

“Apache Lucene and Solr PMC recently announced another version of Apache Lucene library and Apache Solr search server numbered 3.6.2. This is a minor bugfix release concentrated mainly on bugfixes in Apache Lucene library.

Apache Lucene 3.6.2 library can be downloaded from the following address: http://lucene.apache.org/core/mirrors-core-3x-redir.html?. Apache Solr 3.6.2 can be downloaded at the following URL address: http://lucene.apache.org/solr/mirrors-solr-3x-redir.html?”

Two products sure to be affected and improved by the update are LucidWorks Search and LucidWorks Big Data. LucidWorks chooses to use Lucene Solr as its foundation because of its dependability, agility, and strong developer and user communities. LucidWorks, like any product that builds on open source, is going to be strong, secure, and continuously updated just by its nature, and is therefore a better choice than a proprietary option.

Emily Rae Aldridge, January 25, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Enterprise Search by White Is Vital Tool in Business Search Management

January 24, 2013

Martin White, information management consultant and Managing Director of Intranet Focus Ltd., is one of the leading experts on enterprise search and information access. White has published seven books on topics surrounding information consultancy and enterprise search applications. His most recent publication, Enterprise Search: Enhancing Business Performance, focuses on how to plan and implement a managed search environment in your corporation. The book explains how to meet both the needs of your business and your employees.

White makes a clear business case for search, emphasizing the need to evaluate current search systems and to create a support team. The book is well organized and easy to read, with a thorough preface giving an overview of chapters and topics as well as simplified summaries at the end of each chapter. This style makes White’s recent book a great tool for the busy professional.

Chapter 12 is a recommended starting place, listing twelve critical success factors. White states that if you don’t meet at least eight of these twelve, which include investing in a search support team, getting the best out of your current investment in search, and providing location-independent search, then you definitely need the contents of this book.

In Chapter 10, titled Managing Search, White expands on the idea of managing a search support team:

“Implementing search should never be ‘a project’. The work of ensuring that users continue to have high levels of search satisfaction will never come to a close. Each week, and perhaps even most days, there will be something that needs attention. The role of the search support team is not just to be reactive but to anticipate when changes to the search application need to be made, or to identify a training requirement that will address an issue that is just starting to show up on the search logs and user satisfaction surveys.”

Most organizations are not prepared for the rate of growth of information that they are experiencing. White does a great job dissecting the need for enterprise search and then giving you the tools to successfully manage your system, based on far more than just available technology. The section on the future of enterprise search, Chapter 11, stood out to me. White makes an excellent case for why this topic can no longer be ignored.

Additional features include a thorough glossary, lists of books and blogs on information retrieval and enterprise search, and resources for further reading. The book is available from O’Reilly Media in eBook and print formats. Highly recommended.

Andrea Hayden, January 24, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Search Technologies Success Lies in Corporate Retreats

January 23, 2013

While many companies may see corporate retreats as an obvious place to cut spending, the co-founder and chief executive of Search Technologies believes retreats are some of the most valuable investments made by the company. In the Washington Post article “Value Added: This Herndon Search Company Found its Perfect Retreat in Costa Rica,” we learn how Kamran Khan of Search Technologies believes corporate retreats are crucial to the success of his growing business. The most recent off-site cost $100,000 in company money and took place in highly educated and tech-savvy Costa Rica.

The article explains the importance:

“Khan, who started Search Technologies in 2005, said it’s the only time when everyone in the company — including the management team — can be in one place. Khan uses the chance to address his 100-person staff, informing them of how the company is doing and outlining the goals for the next year. ‘I prefer to get people together and . . .clarify our strategy, which is very simple: We are going to be experts in the search space.’”

Khan and his team at Search Technologies may be onto something with this plan. Launched in 2005, the company was on track for $18 million in revenue for 2012, and the company’s net profit margin is about 5 percent. The IT services and search implementation software company services the Daily Mail newspaper’s Web site portfolio in Britain and helped Amazon.com launch its new cloud search product. Apparently the secret to success lies in Khan’s philosophy of hiring “good people” and taking beach trips. We have learned that Search Technologies is hiring in anticipation of further growth during 2013.

Andrea Hayden, January 23, 2013

Now You Are Talking: Can a Company Make Money with Enterprise Search?

January 22, 2013

I have better things to do than capture my immediate thoughts about “Inside H-P’s Missed Chance to Avoid a Disastrous Deal.” You can find the article in a dead tree version of the Wall Street Journal on page 1 with a jump to page 16, where the “would not comment” phrase appears with alarming frequency.

The most interesting point in the write up is the quote, allegedly crafted by a Hewlett Packard Big Dog:

Now you’re talking.

Like much of the chatter about search, content processing, and Big Data analytics, on the surface these information retrieval software companies are like Kentucky Derby hopefuls on a crisp spring morning. The big pay day is two minutes away. How can the sleek, groomed, documented thoroughbreds lose?

The reality, documented in the Wall Street Journal, is that some companies with sure fire winning strategies can lose. Now you’re talking.

How did HP get itself into the headline making situation? How can smart folks spend so much money, reverse course, and appear to be so scattered? Beats me.

I have, however, seen this before. As I read the Wall Street Journal’s story, I wrote down some thoughts in the margin of the dead tree instance of the story at the breakfast table.

[Image. A happy quack to Lubrisyn.com]

Herewith are my notes to myself:

First, name one search vendor in the period from 1970 to the present which has generated more than $1 billion in revenue from search. Acquisitions like IBM’s purchase of iPhrase (er, what happened to that outfit), Vivisimo (now a Big Data company!), or SPSS’s Clementine (ah, you don’t know Clementine. Shame on you.) Don’t toss Google and its search appliance into the mix. Google only hints at the great success of the product. When was the last time you searched using a Google Search Appliance?

Second, didn’t Microsoft purchase Fast Search & Transfer for $1.2 billion in January 2008? How is that working out? The legions of search add-in vendors for SharePoint are busy, but the core system has become a little bit like dear old Clementine. Fast Search was the subject of a couple of probes, but the big question which has not yet been answered as far as I know is, “How much revenue did Fast Search generate versus how much revenue Fast Search reported?” I heard that the revenues were, to some degree, inflated. I thought search was a sure fire way to make money.

Third, after more than a decade of top down marketing, why did Endeca need cash infusions from Intel and SAP venture units? How much did Oracle pay for Endeca? Some azure chip consultants have described Endeca as the leading vendor of enterprise search. Endeca added ecommerce and business intelligence to its line up of products. What was the firm’s revenue at the time of its sale to Oracle? I estimated about $150 million.

Fourth, Dassault, the company with the “system”, bought Exalead. What has happened to this promising technology? Is Exalead now a $200 million a year revenue producer for the prestigious French engineering firm? Perhaps the “system” has been so successful that Exalead is now infused into Dassault clients throughout the world? On the other hand, wouldn’t a solution with this type of impact make headlines every week even in the US? Is it more difficult to cultivate information retrieval revenues than other types of software revenue? The good news is that Dassault paid a reasonable price for Exalead, avoiding the Autonomy, Endeca, and Fast Search purchase prices.

These examples reminded me that even if my estimates are wide of the mark by 20 or 30 percent, how could any company generate the astounding growth required to pay the $11 billion acquisition cost, invest in search technology, and market a product which is pretty much available for free as open source software today? Answer: Long shot. Exercise that horse and make sure you have what it takes to pay the jockey, the stable hands, the vet, and the transportation costs. Without that cash cushion, a Derby hopeful will put a person in a financial hole. Similar to search dreams of big acquirers? Yep. Maybe identical?

Two different points occurred to me.

On one hand, search and its bandwagon riders like Big Data analytics must seem to be a combination of the Klondike’s mother lode and a must-have function no matter what a professional does for a living. The reality is that of the 65 search and related vendors I have written about in my books and confidential reports, only three managed to break the $100 million in search revenue ceiling. The companies were Autonomy, Endeca, and Fast Search. Of the three, only Endeca emerged relatively unscathed from the process. The other 62 companies either went out of business (Convera, Delphes, Entopia) or stalled at revenues in the millions of dollars. If one totals the investments in these 65 firms to generate their revenues, search is not a break even investment. Companies like Attivio and Coveo have captured tens of millions of venture dollars. Those investors want a return. What are the odds that these companies can generate more revenues than Autonomy? Interesting question.

On the other hand, search and its child disciplines remain the most complex of modern computing problems. Whether it is voice to text to search and then to predictive analytics for voice call intercepts or just figuring out what Buffy and Trent in the sales department need to understand a new competitor, software is just not up to the task. That means that money pumped into promising companies will pay big dividends. Now the logic may make sense to an MBA, but I have spent more than 35 years explaining that progress in search is tough to achieve, expensive to support, and disappointing to most system users. The notion that a big company could buy software that is essentially customized to each customer’s use cases (notice the plural of “cases”) and make big money is a characteristic of many firms and managers. The reality is that even governments lack the money to make search work.

Don’t get me wrong.

There are small firms which because they focus on quite specific problems can deliver value to a licensee. However, big money assumes that search technology will be a universal, easily applied to many situations. Even Google, with its paid search model, is now facing innovation challenges. With lots of smart people, Google is hiring the aging wizards of search in an attempt to find something that works better than the voting methods in use today.

What do my jottings suggest? Search is a tough business. Assumptions about how much money one can make from search in an era of open source options and cost cutting need to be looked at in a different way. The current approach, as the Wall Street Journal write up makes clear, is not working particularly well. Does this search revenue track record suggest that the azure chip consultants, former middle school teachers, and real journalists miss the larger message of search, content processing, and Big Data analytics? My tentative answer is, “Yep.”

Stephen E Arnold, January 22, 2013

The Question Drives the Search

January 22, 2013

Over at Chiliad, an article called “Search Vs. Correlation Vs. Causality-What Do Your Goals Require?” discusses how different types of questions change search results. Business intelligence and search are different aspects of the same end result and together they can generate more useful results. Correlations provide analytics, thus turning up unexpected and often useful relationships. The value is not in observations, but rather connections between data, which then influences decision making. The “why” factor is also a big part, because it explains how the data will be used and what the end result will be.

It involves more legwork than anything else:

“Iterative Discovery—understanding “why”—requires a different approach. Not only does digging in deliver more information, it suggests new inquiry and allows you to dig deeper. It helps you understand—across all your sources—what matters most. Although Chiliad named this approach Iterative Discovery, we didn’t invent it. Great researchers and analysts did. We simply observed them—and created a tool tuned to figuring out…’What does it mean?’”

If the why question cannot be answered, then search, business intelligence, and everything else are useless. Users conduct these actions to find an answer, and if an answer is not provided the actions are worthless.

Whitney Grace, January 22, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Vaporware Does Not Make You Rich

January 20, 2013

HP bought Autonomy in hopes of turning a profit from the company’s software, but upon delving into Autonomy’s records HP discovered it had invested in vaporware. Read Write focuses on the “Vaporware Allegation Latest HP/Autonomy Twist.” Stanley Morrical is suing HP because he does not believe the software exists; all HP has to do is prove that it bought $10.3 billion worth of marketable software. To cover a possible blunder, HP claims that Autonomy fooled it with creative accounting and information misrepresentation. Morrical states that HP is doing this to cover its own tracks for making a foolish purchase or a nonexistent purchase.

“While claiming to have IDOL 10 ready, HP actually had nothing to sell, Morrical is accusing. Essentially, he claims, IDOL 10 was vaporware.

‘You go out in the market and say it’s available and it’s not,’ Aron Liang, an associate at the San Francisco law firm Cotchett Pitre & McCarthy, which is representing Morrical, said. ‘So either they knew it and they’re lying or they don’t even know what they’re selling, which in some ways may even be worse.’

David Schubmehl, a tech analyst for International Data Group, said he was briefed on IDOL 10 in June. However, Schubmehl says he hasn’t talked to any companies using the software.

‘I can’t confirm that anyone is actually using IDOL 10,’ Schubmehl said. ‘However, I have had briefings about that back in June and it certainly seemed to be part of their big data offerings.’”

Nobody has used IDOL 10, it seems, so how could a company have $900 million in revenues from vaporware? Somebody here is lying, but HP and Autonomy are pointing the finger at each other. Whose nose is really growing?

Whitney Grace, January 20, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Yandex Creates Powerful Facebook Search App

January 19, 2013

We know that Facebook is very protective of its services/products and that its practices concerning user data are questionable. What will Facebook do, however, with Yandex’s new search app? Tech Crunch announced, “Russian Giant Yandex Has Secretly Built A Killer Facebook Search Engine App Codenamed ‘Wonder’.” The search engine app allows users to ask what content and businesses friends visited. Facebook prohibits search engines from using its data without permission. A spokesperson from Yandex was not able to comment on Wonder, but did confirm that the company is interested in mining social data and building social products.

Wonder works by allowing its users to search for information by voice, and it lists whether their friends have searched for it as well. Yandex so far has limited itself to the Russian market, but Google and other competitors have eaten away at its revenue, so it is turning to other areas. Some of those areas are mobile, maps, and app discovery for services/products.

What does Facebook think about this? Facebook tried to allow its users to search friends’ content with Nearby. Also, Wonder might use too much of Facebook’s user data, and it might pass user information to search engines, which Facebook itself does not volunteer. Facebook is taking its own steps to get into search:

“CEO Mark Zuckerberg himself explained at TechCrunch Disrupt SF that Facebook is getting into search:

‘Search is interesting. I think search engines are really evolving to give you a set of answers: “I have this specific question, answer this question for me.” Facebook is pretty uniquely positioned to answer the questions people have. “What sushi restaurants have my friends gone to in New York in the last six months and Liked?” These are questions that you could potentially do at Facebook if we built out this system that you couldn’t do anywhere else. And at some point we’ll do it. We have a team working on search.’”

Facebook has various options with Wonder: buy it, form a joint partnership, grant permission, etc., but we will have to wait and see what happens. We do know that users are demanding Facebook create a better search engine, and Wonder is pushing Facebook to develop it faster.

Whitney Grace, January 19, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Social Search: Don Quixote Is Alive and Well

January 18, 2013

Here I float in Harrod’s Creek, Kentucky, an addled goose. I am interested in other geese in rural Kentucky. I log into Facebook, using a faux human alias (easier than one would imagine) and run a natural language query (human language, of course). I peck with my beak on my iPad using an app, “Geese hook up 40027.” What do I get? Nothing, Zip, zilch, nada.

Intrigued I query, “modern American drama.” What do I get? Nothing, Zip, zilch, nada.

I give up. Social search just does not work under my quite “normal” conditions.

First, I am a goose spoofing the world as a human. Not too many folks like this on Facebook, so my interests and my social graph are useless.

Second, the key words in my natural language query do not match the Facebook patterns, crafted by former Googlers and 20 somethings to deliver hook up heaven and links to the semi infamous Actor’s Theater or the Kentucky Center.

[Image: social outcast]

Social search is not search. Social search is group centric. Social search is an outstanding system for monitoring and surveillance. For information retrieval, social search is a subset of information retrieval. How do semantic methods improve the validity of the information retrieved? I am not exactly sure. Perhaps the vendors will explain and provide documented examples?

Third, without context, my natural language queries shoot through the holes in the Swiss Cheese of the Facebook database.

After I read “The Future of Social Search,” I assumed that information was available at the peck of my beak. How misguided was I? Well, one more “next big thing” in search demonstrated that baloney production is surging in an ailing economy. Optimism is good. Crazy predictions about search are not so good. Look at the sad state of enterprise search, Web search, and email search. Nothing works exactly as I hope. The dust up between Hewlett Packard and Autonomy suggests that “meaning based computing” is a point of contention.

If social search does not work for an addled goose, for whom does it work? According to the wild and crazy write up:

Are social networks (or information networks) the new search engine? Or, as Steve Jobs would argue, is the mobile app the new search engine? Or, is the question-and-answer formula of Quora the real search 2.0? The answer is most likely all of the above, because search is being redefined by all of these factors. Because search is changing, so too is the still maturing notion of social search, and we should certainly think about it as something much grander than socially-enhanced search results.

Yep, Search 2.0.

But the bit of plastic floating in my pond is semantic search. Here’s what the Search 2.0 social crowd asserts:

Let’s embrace the notion that social search should be effortless on the part of the user and exist within a familiar experience — mobile, social or search. What this foretells is a future in which semantic analysis, machine learning, natural language processing and artificial intelligence will digest our every web action and organically spit out a social search experience. This social search future is already unfolding before our very eyes. Foursquare now taps its massive check in database to churn out recommendations personalized by relationships and activities. My6sense prioritizes tweets, RSS feeds and Facebook updates, and it’s working to personalize the web through semantic analysis. Even Flipboard offers a fresh form of social search and helps the user find content through their social relationships. Of course, there’s the obvious implementations of Facebook Instant Personalization: Rotten Tomatoes, Clicker and Yelp offer Facebook-personalized experiences, essentially using your social graph to return better “search” results.

Semantics. Better search results. How does that work on Facebook images and Twitter messages?

My view is that when one looks for information, there are some old fashioned yardsticks; for example, precision, recall, editorial policy, corpus provenance, etc.

When a clueless person asks about pop culture, I am not sure that traditional reference sources will provide an answer. But as information access is trivialized, the need for knowledge about the accuracy and comprehensiveness of content, the metrics of precision and recall, and the editorial policy or degree of manipulation baked into the system decreases.
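For readers who have not met the old fashioned yardsticks, here is a minimal sketch of how precision and recall are computed for a single query. The document IDs and the relevance judgments are made up for illustration, and the function name `precision_recall` is an assumption of this sketch, not part of any search product discussed here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs the search system returned (hypothetical data)
    relevant:  document IDs a human judged relevant (hypothetical data)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned

    # Precision: what fraction of the returned documents are relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant documents were returned?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative query: four results returned, three documents judged relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Both numbers depend on someone having judged which documents are relevant, which is exactly the kind of editorial knowledge a social graph does not supply.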

[Image. See Advantech.com for details of a surveillance system.]

Search has not become better. Search has become subject to self referential mechanisms. That’s why my goose queries disappoint. If I were looking for pizza or Lady Gaga information, I would have hit pay dirt with a social search system. When I look for information based on an idiosyncratic social fingerprint or when I look for hard information to answer difficult questions related to client work, social search is not going to deliver the input which keeps this goose happy.

What is interesting is that so many are embracing a surveillance based system as the next big thing in search. I am glad I am old. I am delighted my old fashioned approach to obtaining information is working just fine without the special advantages a social graph delivers.

Will today’s social search users understand the old fashioned methods of obtaining information? In my opinion, nope. Does it matter? Not to me. I hope some of these social searchers do more than run a Facebook query to study for their electrical engineering certification or to pass board certification for brain surgery.

Stephen E Arnold, January 18, 2013

Fifteen Year Old Invents Information Filter App

January 18, 2013

Useful apps can be made by anyone, as Fast Company reported in “This 15-Year-Old Built An App To Help His High School Debate Team. It Could Do Much More Than That.” Tanay Tandon invented an app he calls Clipped, which extracts information from news articles and other sources and creates a bulleted list. It is being touted as a new tool that could put research assistants, Congressional aides, and judicial clerks out of work. Clipped has received mixed reviews so far, but Tandon is working on an upgrade that should resolve the problems.

Tandon personally created the algorithm for his debate prep. Here is how he uses it:

“I use it to scan over articles, and after using Clipped, if I like an article, I have to go back and read the whole thing. For a typical debate I have about 100 different evidence files about 2-3 pages in length. There might be an article where the title might sound appealing, but after running Clipped, I can see the focus of the article is definitely not what I’m looking for. Last year for a debate on animal rights, I found a paper on animal rights–but it was targeted towards the philosophical side of why to respect animal rights. But for that specific debate, I was looking for evidence from the scientific side, research showing that animals can think as much as humans.”

Tandon does not believe anyone is too young to launch a product as long as the right people are around and ego does not go to a person’s head. Tandon just built a tool to make his life easier and was not looking for fame, but now he has a project that will appeal to college review boards. Google might also be keeping an eye on him for future jobs.

Whitney Grace, January 18, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

The Teflon Coated Google

January 17, 2013

For eighteen months, the Federal Trade Commission investigated Google to see if it was using its corner on the Internet search market to push its own products and services at the expense of its rivals. The Wall Street Journal reports in “Behind Google’s Antitrust Escape” that the FTC decided not to pursue an antitrust suit; instead, it opted to address a series of smaller issues. Google agreed to make some changes in its search business. The FTC could not find any evidence that Google’s customers as well as its rivals were being harmed. All the FTC discovered were customers’ complaints about Google’s actions, which were not enough to make a case.

During the investigation, Google was shoring up its defenses against the antitrust allegations:

“Google also dispatched executive chairman Eric Schmidt and other employees to garner support from lawmakers, adding political pressure to the landscape. In November, for instance, staff members of U.S. Senator Mark Udall, a Democrat from Colorado, spoke with Google representatives. Afterward, Mr. Udall sent a letter to FTC Chairman Jon Leibowitz, encouraging the agency to proceed “cautiously” in its probes of Internet companies, which “have some of the highest consumer satisfaction rates in the country” and have created millions of jobs.”

Udall’s letter was only one of several letters that Congress members sent to the FTC. Many of these letters were leaked, and Congress was concerned about information leaking. It was even suggested that the FTC leaked the info for strategic advantage. Whatever the truth is, Google got off with a slap on the wrist and will continue on with its search dominance.

Whitney Grace, January 17, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search
