Google Does Not Own Enterprise Market
January 23, 2013
Hadoop and the Google Search Appliance are often noted as the big players in Big Data. However, Paul Doscher, a leader in open source enterprise search, has another opinion. He believes in the power and dependability of Apache Lucene. Read more of what he has to say in “Enterprise Search Doesn’t Begin and End With Google.”
The article begins:
“Paul Doscher, CEO of Lucid Imagination, [now LucidWorks] wants you to know that when it comes to enterprise search — and search that can handle the big data wave — open-source Lucene is a contender. Of course, as head of the company that offers both open source and commercial versions of Lucene, Doscher is no neutral observer. At the company’s Lucene Revolution conference . . . Doscher announced an application development stack that knits together Hadoop, Mahout, R and Lucene/Solr for handling search, machine learning, recommendation engines and analytics as a platform for enterprise search. That stack, called LucidWorks Big Data, is in beta and aims to make it faster and easier for developers to deploy enterprise-scale search.”
Doscher may in fact have a vested interest in Lucene, but LucidWorks has adopted Lucene because it cannot be beaten. Other industry leaders say much the same thing. So for reliability and affordability, consider Apache Lucene and the dependable out-of-the-box offerings from LucidWorks.
Emily Rae Aldridge, January 23, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Foundem Officially Brings Suit Against Google
January 23, 2013
That pesky Foundem, unlike the US regulatory agencies, just won’t go away. eWeek reports on a new lawsuit in “Google Being Sued in UK for Bias in Search Results.” The British shopping-comparison site Foundem has been challenging Google since 2010, when it helped spur an antitrust investigation by the European Union. Now, it seems the company is making good on its promise to take its fight to the courts, according to a story from Bloomberg.
This development is bubbling just as Google has reached an accord with the US Federal Trade Commission. Writer Todd R. Weiss reports:
“The FTC ruled that not enough evidence existed to prove allegations from some competitors that Google had manipulated its search algorithms to harm competing Websites and unfairly promote its own competing vertical properties. Instead, the company entered into a voluntary agreement with the FTC to change some of its other business practices. . . .
“The search company voluntarily will end some past business practices that could stifle competition in the markets for popular devices such as smartphones, tablets and gaming consoles, as well as the market for online search advertising, according to the agency. Under a binding settlement with the FTC, Google will allow competitors access ‘on fair, reasonable, and nondiscriminatory terms to patents on critical standardized technologies needed to make popular devices such as smartphones, laptop and tablet computers, and gaming consoles,’ the FTC reported.”
Google has agreed not to seek court injunctions to block the use of such “standards-essential patents,” many of which came from its acquisition of Motorola Mobility last year. Last July, the FTC also resolved charges that Google actively bypassed Apple Safari browser privacy settings with a $22.5 million settlement.
Meanwhile, Foundem is not the only European company bringing antitrust action against Google. So far, the EU seems to be taking a tougher stance against the company than US regulators have. Will Google be able to wriggle a compromise out of this one?
Cynthia Murrell, January 23, 2013
Key Yahoo Traffic in Steady Decline
January 23, 2013
Yahoo’s CEO Marissa Mayer has a big hill to climb. Just as she is working to woo investors, recent statistics present some troublesome numbers, we learn from “Mayer’s 10X Challenge: Yahoo’s Homepage, Mail, and Search Traffic Show Significant Year-Over-Year Declines” at All Things D. Yes, that search thing is one of the issues; email and the customizable homepage are the other primary problem spots. The article informs us:
“Private stats from comScore show that those three areas have continued their longtime decline over the last year, in some cases dropping significantly. In November and December, for example, compared to the same two months a year ago, U.S. search was down 28 percent and 24 percent respectively, while mail was down 16 percent and 12 percent.
“This matters a great deal, since the troika of homepage, mail, and search have been the critical driver of the Yahoo value ecosystem for advertisers.
“The impact of those drops is felt all over Yahoo, whose music, movie, games and travel site have also seen massive drop-offs in traffic year over year in those same months.”
Though the drops in search and email were steep, the home page seemed to stabilize a bit, and Flickr has increased its users by a handsome 37 percent. That could have something to do with the increased support the CEO has been showing for that project.
Flickr, however, is not enough to save Yahoo, insists Swisher. She does give the company credit for other moves it has been making to refresh its offerings, like a new version of Yahoo Mail and a homepage redesign that emphasizes its mobile and social facets. Will it be enough to keep Yahoo near the top of the heap?
Cynthia Murrell, January 23, 2013
Intro to Probability
January 23, 2013
For anyone whose basic understanding of probability theory is incomplete, Math ∩ Programming kindly offers a thorough introduction in “Probability Theory—A Primer.” Blogger Jeremy Kun notes that key concepts in artificial intelligence, machine learning, and statistics are based upon probability theory. Referring to plans for his blog, he goes on:
“A number of our future posts will rely on the ideas and terminology we lay out in this post. Our first formal theory of machine learning will be deeply ingrained in probability theory, we will derive and analyze probabilistic learning algorithms, and our entire treatment of mathematical finance will be framed in terms of random variables.”
Kun simplifies by framing his explanation finitely in terms of naive set theory and without the complications of measure theory. He emphasizes:
“This primer is not meant to connect probability theory to the real world. Indeed, to do so would be decidedly unmathematical. We are primarily concerned with the mathematical formalisms involved in the theory of probability, and we will leave the philosophical concerns and applications to future posts. The point of this primer is simply to lay down the terminology and basic results needed to discuss such topics to begin with.”
The lesson goes on in depth, covering finite probability spaces, random variables, expected value, and variance and covariance. Kun throws in plenty of helpful definitions and formulas along the way. This might be one to review now and tuck away for future reference.
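For readers who like to see the ideas in motion, here is a minimal sketch of those four topics in Python. The two-dice sample space and the function names are our own illustrative choices, not taken from Kun's post; a random variable is simply a function on the outcomes.

```python
from fractions import Fraction
from itertools import product

# Finite probability space (illustrative assumption): two fair six-sided dice.
# Each of the 36 ordered outcomes gets probability 1/36.
outcomes = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in outcomes}

def expectation(X):
    """E[X]: sum of X(w) weighted by the probability of each outcome w."""
    return sum(prob[w] * X(w) for w in outcomes)

def variance(X):
    """Var(X) = E[(X - E[X])^2]."""
    mu = expectation(X)
    return expectation(lambda w: (X(w) - mu) ** 2)

def covariance(X, Y):
    """Cov(X, Y) = E[(X - E[X]) * (Y - E[Y])]."""
    mx, my = expectation(X), expectation(Y)
    return expectation(lambda w: (X(w) - mx) * (Y(w) - my))

# Random variables are functions from outcomes to numbers.
first = lambda w: w[0]          # value of the first die
second = lambda w: w[1]         # value of the second die
total = lambda w: w[0] + w[1]   # sum of both dice

print(expectation(total))         # expected sum of two dice
print(variance(first))            # variance of a single die
print(covariance(first, second))  # independent dice have zero covariance
```

Exact fractions are used instead of floats so the results can be compared directly with hand calculations.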
Cynthia Murrell, January 23, 2013
Now You Are Talking: Can a Company Make Money with Enterprise Search?
January 22, 2013
I have better things to do than capture my immediate thoughts about “Inside H-P’s Missed Chance to Avoid a Disastrous Deal.” You can find the article in a dead tree version of the Wall Street Journal on page 1 with a jump to page 16, where the “would not comment” phrase appears with alarming frequency.
The most interesting point in the write up is the quote, allegedly crafted by a Hewlett Packard Big Dog:
Now you’re talking.
Like much of the chatter about search, content processing, and Big Data analytics, on the surface these information retrieval software companies are like Kentucky Derby hopefuls on a crisp spring morning. The big pay day is two minutes away. How can the sleek, groomed, documented thoroughbreds lose?
The reality, documented in the Wall Street Journal, is that some companies with sure fire winning strategies can lose. Now you’re talking.
How did HP get itself into the headline making situation? How can smart folks spend so much money, reverse course, and appear to be so scattered? Beats me.
I have, however, seen this before. As I read the Wall Street Journal’s story, I wrote down some thoughts in the margin of the dead tree instance of the story at the breakfast table.
A happy quack to Lubrisyn.com
Herewith are my notes to myself:
First, name one search vendor in the period from 1970 to the present which has generated more than $1 billion in revenue from search. Acquisitions like IBM’s purchase of iPhrase (er, what happened to that outfit?), Vivisimo (now a Big Data company!), or SPSS’s Clementine (ah, you don’t know Clementine? Shame on you.) Don’t toss Google and its search appliance into the mix. Google only hints at the great success of the product. When was the last time you searched using a Google Search Appliance?
Second, didn’t Microsoft purchase Fast Search & Transfer for $1.2 billion in January 2008? How is that working out? The legions of search add-in vendors for SharePoint are busy, but the core system has become a little bit like dear old Clementine. Fast Search was the subject of a couple of probes, but the big question which has not yet been answered as far as I know is, “How much revenue did Fast Search generate versus how much revenue Fast Search reported?” I heard that the revenues were, to some degree, inflated. I thought search was a sure fire way to make money.
Third, after more than a decade of top down marketing, why did Endeca need cash infusions from Intel and SAP venture units? How much did Oracle pay for Endeca? Some azure chip consultants have described Endeca as the leading vendor of enterprise search. Endeca added ecommerce and business intelligence to its line up of products. What was the firm’s revenue at the time of its sale to Oracle? I estimated about $150 million.
Fourth, Dassault, the company with the “system”, bought Exalead. What has happened to this promising technology? Is Exalead now a $200 million a year revenue producer for the prestigious French engineering firm? Perhaps the “system” has been so successful that Exalead is now infused into Dassault clients throughout the world? On the other hand, wouldn’t a solution with this type of impact make headlines every week even in the US? Is it more difficult to cultivate information retrieval revenues than other types of software revenue? The good news is that Dassault paid a reasonable price for Exalead, avoiding the Autonomy, Endeca, and Fast Search purchase prices.
These examples reminded me that even if my estimates are wide of the mark by 20 or 30 percent, how could any company generate the astounding growth required to pay the $11 billion acquisition cost, invest in search technology, and market a product which is pretty much available for free as open source software today? Answer: Long shot. Exercise that horse and make sure you have what it takes to pay the jockey, the stable hands, the vet, and the transportation costs. Without that cash cushion, a Derby hopeful will put a person in a financial hole. Similar to search dreams of big acquirers? Yep. Maybe identical?
Two different points occurred to me.
On one hand, search and its bandwagon riders like Big Data analytics must seem to be a combination of the Klondike’s mother lode and a must-have function no matter what a professional does for a living. The reality is that of the 65 search and related vendors I have written about in my books and confidential reports, only three managed to break the $100 million in search revenue ceiling. The companies were Autonomy, Endeca, and Fast Search. Of the three, only Endeca emerged relatively unscathed from the process. The other 62 companies either went out of business (Convera, Delphes, Entopia) or stalled at revenues in the millions of dollars. If one totals the investments in these 65 firms to generate their revenues, search is not a break even investment. Companies like Attivio and Coveo have captured tens of millions of venture dollars. Those investors want a return. What are the odds that these companies can generate more revenues than Autonomy? Interesting question.

On the other hand, search and its child disciplines remain the most complex of modern computing problems. Whether it is voice to text to search and then to predictive analytics for voice call intercepts or just figuring out what Buffy and Trent in the sales department need to understand a new competitor, software is just not up to the task. That means that money pumped into promising companies will pay big dividends. Now the logic may make sense to an MBA, but I have spent more than 35 years explaining that progress in search is tough to achieve, expensive to support, and disappointing to most system users. The notion that a big company could buy software that is essentially customized to each customer’s use cases (notice the plural of “cases”) and make big money is a characteristic of many firms and managers. The reality is that even governments lack the money to make search work.
Don’t get me wrong.
There are small firms which, because they focus on quite specific problems, can deliver value to a licensee. However, big money assumes that search technology will be universal, easily applied to many situations. Even Google, with its paid search model, is now facing innovation challenges. With lots of smart people, Google is hiring the aging wizards of search in an attempt to find something that works better than the voting methods in use today.
What do my jottings suggest? Search is a tough business. Assumptions about how much money one can make from search in an era of open source options and cost cutting need to be looked at in a different way. The current approach, as the Wall Street Journal write up makes clear, is not working particularly well. Does this search revenue track record suggest that the azure chip consultants, former middle school teachers, and real journalists miss the larger message of search, content processing, and Big Data analytics? My tentative answer is, “Yep.”
Stephen E Arnold, January 22, 2013
Connectors Allow for Comprehensive Enterprise Information Delivery
January 22, 2013
Countless studies have shown that while the large majority of companies see great value in big data, they have not deployed the technologies that are poised to help them begin to collect, store and analyze big data. Information Week reports on Information Builders chief marketing officer Michael Corcoran and his participation as a panelist at a Gartner conference in “Big Data Master Plan: Time To Start.”
Corcoran found that many audience members were unsure whether they would purchase technologies to transform big data into meaningful opportunities because they were unsure of the definition of big data and the power it holds.
After detailing the basics, he discusses some of the challenges enterprise organizations face:
One snag that organizations often encounter when setting up a big data initiative is finding ways to ensure that unstructured information from multiple sources is accurate and clean, and that it integrates well with existing data systems. “When they start to think about some of the new opportunities for big data, including social media or third-party industry data, how do they marry that elegantly?” Corcoran asked rhetorically.
Many organizations have found success in big data technologies utilizing connectors that allow users to work with data from multiple apps and information sources – structured and unstructured. These technologies enable true enterprise information delivery.
Megan Feil, January 22, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Facebook Graph Search Finally Arrives
January 22, 2013
Graph Search is finally here: the highly anticipated answer to Facebook’s abysmal user search experience. The announcement leaves users and experts alike wondering about the functionality, but on the surface, hey, you can now search Facebook. ComputerWorld offers a nice write-up in “How Facebook Built Graph Search and What it Means to Social Media.”
After dealing with the user experience side of the coin, the discussion turns to the developer side:
“To create Graph Search, the engineers likely used some combination of open source tools that are available on the market, combined with internally-developed code written specifically for Facebook’s extremely unique use case, predicts Jeffrey Kelly, big data expert at The Wikibon Project. Tools like Apache Lucene Solr and Cassandra- used by Netflix to index its movie library in Amazon Web Service’s cloud. ‘FB doesn’t use straight off the shelf software and hardware,’ he says. ‘They can’t, they either customize open source technology or develops it in-house.’”
Open source continues to make headlines, and Facebook is of course a highly visible open source user. Apache Lucene supports a number of other highly successful products including an open source enterprise leader, LucidWorks. Enterprise search doesn’t always make the big flashy headlines, but it is an important lynchpin of successful, dependable business, and therefore a necessary investment.
Emily Rae Aldridge, January 22, 2013
Social Norms on Social Media
January 22, 2013
It is no secret that social networking is growing rapidly and is here to stay. But now we can sit back and watch as the world tries to figure out the boundaries on conversational topics on social media. Social norms that are widely accepted, such as which topics you can or cannot bring up in certain situations, get foggy when it comes to discussions online. We learn in “Religion, Politics, Sports… What People Around the World Do and Do Not Talk About on Social Media” on Quartz that citizens of different countries are already appearing to set vastly different new rules.
We learn:
“Different countries are writing those rules differently. A recent survey, conducted by the Pew Research Center, shows how common it is in each of 20 countries to talk about politics, religion, sports, and music or movies on social media. There are some surprising differences.
“Europeans generally talk politics online less than people in the Americas. Middle Easterners are the most voluble of all, which is no surprise given the recent Arab Spring.”
It also appears that religious discussions vary greatly all over the world and pop culture is popular everywhere. Now that we know what is popular, perhaps this will be a starting place for new social norms to be set regarding social networking. We wonder what Miss Manners would have to say about this.
Andrea Hayden, January 22, 2013
Complex Facebook Analytics Tool Available from Wolfram Alpha
January 22, 2013
Wolfram Alpha is famous for its knowledgeable tools and widgets that involve highly complex algorithms and computations. However, many may be surprised to hear about the Facebook analytics tool which is available from the systematic knowledge engine. The article “Use Wolfram Alpha to Dig Up Cool Statistics About Your Facebook Account [Weekly Facebook Tips]” on MakeUseOf tells readers how to get detailed Facebook information about their account.
The article shares:
“With the Wolfram Alpha Facebook analytics tool, you can find out a huge amount of information about your Facebook account. It’s quite fun to see which of your posts or photos are the most popular, who your top commenters are, who is sharing your posts the most and more interesting tidbits. Plus, it’s easy to use this tool and completely free. Why not have a go?”
I decided to have a go with the Facebook tool, and was overwhelmed with the amount of detailed information I was provided. Wolfram Alpha told me everything from the moon phase at the time of my birth to statistical data about the top contributors on my page. Of course, all of this information is readily available to anyone with access to my page. This tool is fun, but may encourage others to consider resetting the privacy settings on their accounts.
Andrea Hayden, January 22, 2013
The Question Drives the Search
January 22, 2013
Over at Chiliad, an article called “Search Vs. Correlation Vs. Causality-What Do Your Goals Require?” discusses how different types of questions change search results. Business intelligence and search are different aspects of the same end result and together they can generate more useful results. Correlations provide analytics, thus turning up unexpected and often useful relationships. The value is not in observations, but rather connections between data, which then influences decision making. The “why” factor is also a big part, because it explains how the data will be used and what the end result will be.
It involves more legwork than anything else:
“Iterative Discovery—understanding “why”—requires a different approach. Not only does digging in deliver more information, it suggests new inquiry and allows you to dig deeper. It helps you understand—across all your sources—what matters most. Although Chiliad named this approach Iterative Discovery, we didn’t invent it. Great researchers and analysts did. We simply observed them—and created a tool tuned to figuring out…’What does it mean?’”
If the why question cannot be answered, then search, business intelligence, and everything else are useless. Users conduct these actions to find an answer, and if an answer is not provided, the actions are worthless.
Whitney Grace, January 22, 2013