A Non Search Person Explains Why Search Is a Lost Cause

December 16, 2013

The author of “2013: the Year ‘the Stream’ Crested” is focused on tapping into flows of data. Twitter and real time “Big Data” streams are the subtext for the essay. I liked the analysis. In one 2,500 word write up, the severe weaknesses of enterprise and Web search systems are exposed.

The main point of the article is that “the stream”—that is, flows of information and data—is what people want. The flow is of sufficient volume that making sense of it is difficult. Therefore, an opportunity exists for outfits like The Atlantic to provide curation, perspective, and editorial filtering. The write up’s code for this higher-value type of content process is “the stock.”

The article asserts:

This is the strange circumstance that obtained in 2013, given the volume of the stream. Regular Internet users only had three options: 1) be overwhelmed 2) hire a computer to deploy its logic to help sort things 3) get out of the water.

The take away for me is that the article makes clear that search and retrieval just don’t work. Something “new” is needed. Perhaps this frustration with search is the trigger behind the interest in “artificial intelligence” and “machine learning”? Predictive analytics may have a shot at solving the problem of finding and identifying needed information, but from what I have seen, there is a lot of talk about fancy math and little evidence that it works at low cost in a manner that makes sense to the average person. Data scientists are not a dime a dozen. Average folks are.

Will the search and content processing vendors step forward and provide concrete facts that show a particular system can solve a Big Data problem for Everyman and Everywoman? We know Google is shifting to an approach to search that yields revenue. Money, not precision and recall, is increasingly important. The search and content vendors who toss around the word “all” have not been able to deliver unless the content corpus is tightly defined and constrained.
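For readers who want precision and recall pinned down, here is a minimal sketch in Python. The document identifiers and the function name `precision_recall` are made up for illustration; they are assumptions and do not come from any vendor's system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids the system returned (hypothetical data)
    relevant:  document ids a human judged relevant (hypothetical data)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of returned documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative only: 4 results returned, 2 of them relevant, 5 relevant overall.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d2", "d4", "d7", "d8", "d9"])
print(p, r)
```

The catch is the second argument: someone has to judge what is relevant, and that human work is where the cost sits.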

Isn’t it obvious that processing infinite flows and changes to “old” content is likely to cost a lot of money? Google, Bing, and Yandex search are not particularly “good.” Each is becoming a system designed to support other functions. In fact, looking for information that is only five or six years “old” is an exercise in frustration. Where has that document “gone”? What other data are not in the index? The vendors are not talking.

In the enterprise, the problem is almost as hopeless. Vendors invent new words to describe a function that seems to convey high value. Do you remember this catchphrase: “One step to ROI”? How do you think that company performed? The founders were able to sell the company and some of the technology lives on today, but the limitations of the system remain painfully evident.

Search and retrieval is complex, expensive to implement in an effective manner, and stuck in a rut. Giving away a search system seems to reduce costs? But are license fees the major expense? Embracing fancy math seems to deliver high value answers? But are the outputs accurate? Users just assume these systems work.

Kudos to The Atlantic for helping to make clear that in today’s data world, something new is needed. Changing the words used to describe such out of favor functions as “editorial policy”, controlled terms, scheduled updates, and the like is more popular than innovation.

Stephen E Arnold, December 16, 2013

Business Intelligence: Free Pressures For Fee Solutions

December 14, 2013

I read “KB Crawl sort la tête de l’eau,” published by 01Business. The hook for the article is that KB Crawl, a company harvesting Internet content for business intelligence analyses, has emerged from bankruptcy. Good news for KB Crawl, whose parent company is reported to be KB Intelligence.

The write up contained related interesting information.

First, the article points out that business intelligence services like KB Crawl are perceived as costs, not revenue producers. If this is accurate, the same problem may be holding back once promising US vendors like Digital Reasoning and Ikanow, among others.

Second, the article seems to suggest that for fee business intelligence services are in direct competition with free services like Google. Although Google’s focus on ads continues to have an impact on the relevance of the Google results, users may be comfortable with information provided by free services. Will the same preference for free impact the US business intelligence sector?

Third, the article identifies a vendor (Ixxo) as facing some financial headwinds, writing:

D’autres éditeurs du secteur connaissent des difficultés, comme Ixxo, éditeur de la solution Squido.

But the most useful information in the story is the list of companies that compete with KB Crawl. Some of the firms are:

  • AMI Software. www.amisw.com. This company has roots in enterprise search and touts 1,500 customers.
  • Data Observer. www.data-observer.com. The company is a tie up between Asapspot and Data-Deliver. The firm describes itself as “an all-encompassing Internet monitoring and e-reputation services company.”
  • Digimind. www.digimind.com. The firm makes sense of social media.
  • Eplica. A possible reference to a San Diego employment services firm.
  • iScop. Unknown.
  • Ixxo. www.ixxo.fr. The firm “develops innovative software applications to boost business responsiveness when faced with unstructured data.”
  • Pikko. www.pikko-software.com. A visualization company.
  • Qwam. www.qwamci.com. Another “content intelligence” company.
  • SindUp. www.sindup.fr. The company offers a monitoring platform for strategic and e reputation information.
  • Spotter. www.spotter.com. A company that provides the “power to understand.”
  • Synthesio. www.synthesio.com. The company says, “We help brands and agencies find valuable social insights to drive real business value.”
  • TrendyBuzz. www.trendybuzz.com. The company lets a client measure “Internet visibility units.”

My view is that 01Business may be identifying a fundamental problem in the for fee business intelligence, open source harvesting, and competitive intelligence sector.

Information about business and competitive intelligence that I see in my TRAX Overflight service is mostly of the “power of positive thinking” variety. Companies like Palantir capture attention because the firms are able to raise astounding amounts of funding. Less visible are the financial pressures on the companies trying to generate revenue with systems aimed at commercial enterprises.

If the 01Business article is on the money, what US vendors are likely to have their heads under water in 2014? Use the comments section of this blog to identify the stragglers in the North American market.

Stephen E Arnold, December 14, 2013

Semantria and Diffbot: Clever Way to Forge a Tie Up

December 12, 2013

Short honk. I came across an interesting marketing concept in “Diffbot and Semantria Join to Find and Parse the Important Text on the ‘Net (Exclusive).”

Semantria (a company that offers sentiment analysis as a service) participated in a hackathon in San Francisco. The article explains:

To make the Semantria service work quickly, even for text-mining novices, Rogynskyy’s team decided to build a plugin for Microsoft’s popular Excel spreadsheet program. The data in a spreadsheet goes to the cloud for processing, and Semantria sends back analysis in Excel format.

Semantria sponsored a prize for the best app. Diffbot won:

A Diffbot developer built a simple plugin for Google’s Chrome browser that changes the background color of messages on Facebook and Twitter based on sentiment — red for negative, green for positive. The concept won a prize from Semantria, Rogynskyy said. A Diffbot executive was on hand at the hackathon, and Rogynskyy started talking with him about how the two companies could work together.
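The red and green idea is simple enough to sketch. What follows is a toy illustration in Python, not the Diffbot plugin and not the Semantria API. The word lists, the function names `toy_sentiment` and `background_color`, and the scoring rule are all assumptions made for the example.

```python
# Hypothetical word lists; a real service would use a trained model.
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"hate", "awful", "bad", "angry"}


def toy_sentiment(text):
    """Count positive words minus negative words (a crude stand-in score)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def background_color(text):
    """Map the score to the colors described in the article."""
    score = toy_sentiment(text)
    if score > 0:
        return "green"  # positive message
    if score < 0:
        return "red"    # negative message
    return "none"       # neutral: leave the background alone


for message in ["I love this great phone!", "This update is awful.", "Meeting at noon"]:
    print(message, "->", background_color(message))
```

A word-count score like this misses sarcasm and negation, which is one reason a hosted analysis service is attractive to a plugin developer.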

I like the “sponsor”, “winner” and “team up” approach. The pay off, according to the article, is “While Semantria and Diffbot technologies continue to be available separately, they can now be used together.”

Sentiment analysis is one of the search submarkets that caught fire and then, based on the churning at some firms like Attensity, may be losing some momentum. Marketing innovation may be a goal for other firms offering this functionality in 2014.

Stephen E Arnold, December 12, 2013

Quote to Note: NLP and Recipes for Success and Failure

December 11, 2013

I read “Natural language Processing in the Kitchen.” The post was particularly relevant because I had worked through “The Main Trick in Machine Learning.” The essay does an excellent job of explaining coefficients (what I call, for ease of recall, “thresholds”). The idea is that machine learning requires a human to make certain judgments. Autonomy IDOL uses Bayesian methods and the company has for many years urged licensees to “train” the IDOL system. Not only that, successful Bayesian systems, like a young child, have to be prodded or retrained. How much and how often depends on the child. For Bayesian-like systems, the “how often” and “how much” varies by the licensees’ content contexts.
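To make the “train” and “threshold” points concrete, here is a minimal sketch of a naive Bayes text classifier in Python. It is a textbook technique swapped in for illustration and is not how Autonomy IDOL works internally. The training sentences, the labels, the 0.7 threshold, and every function name are assumptions.

```python
import math
from collections import Counter


def train(examples):
    """examples: (text, label) pairs chosen by a human trainer."""
    word_counts, label_counts = {}, Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def posterior(model, text):
    """Return P(label | text) using add-one (Laplace) smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n / total)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {l: math.exp(s - top) for l, s in log_scores.items()}
    z = sum(weights.values())
    return {l: v / z for l, v in weights.items()}


def classify(model, text, threshold=0.7):
    """The threshold is a human judgment call, not something the math supplies."""
    label, p = max(posterior(model, text).items(), key=lambda kv: kv[1])
    return label if p >= threshold else "needs human review"


# Hypothetical training set; retraining means editing this list and calling train again.
examples = [
    ("chop onion and garlic", "recipe"),
    ("bake the cake at 350", "recipe"),
    ("quarterly revenue rose", "finance"),
    ("shares and revenue fell", "finance"),
]
model = train(examples)
print(classify(model, "chop garlic"))
print(classify(model, "and"))
```

Raise the threshold and more items land in the human review pile; lower it and more mistakes slip through. Someone has to pick the number and revisit it when the content changes.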

Now back to the Los Angeles Times’ excellent article about indexing and classifying a small set of recipes. Here’s the quote to note:

Computers can really only do so much.

When one jots down the programming and tuning work required to index recipes, keep in mind “The Main Trick in Machine Learning.” There are three important lessons I draw from the boundary between these two write ups:

  1. Smart software requires programming and fiddling. At the present time (December 2013), this reality is as it has been for the last 50 years, maybe more.
  2. The humans fiddling with or setting up the content processing system have to be pretty darned clever. The notion of “user friendliness” is strongly disabused by these two articles. Flashy graphics and marketers’ cooing are not going to cut the mustard or the sirloin steak.
  3. The properly set up system with filtered information processed without some human intervention hits 98 percent accuracy. The main point is that relevance is a result of humans, software, and consistent, on point content.

How many enterprise search and content processing vendors explain that a failure to put appropriate resources toward the search or content processing implementation guarantees some interesting issues? Among them, systems will routinely deliver results that are not germane to the user’s query.

The root of dissatisfaction with incumbent search and retrieval systems is not the systems themselves. In my opinion, most are quite similar, differing only in relatively minor details. (For examples of the similarity, review the reports at Xenky’s Vendor Profiles page.)

How many vendors have been excoriated because their customers failed to provide the cash, time, and support necessary to deliver a high-performance system? My hunch is that the vendors are held responsible for failures that are predestined by licensees’ desire to get the best deal possible and their belief that magic just happens without the difficult, human-centric work that is absolutely essential for success.

Stephen E Arnold, December 11, 2013

Palantir: What Is the Main Business of the Company?

December 11, 2013

I read about Palantir and its successful funding campaign in “Palantir’s Latest Round Valuing It at $9B Swells to $107.8M in New Funding.” Compared to the funding for ordinary search and content processing companies, Palantir is obviously able to attract investors better than most of the other companies that make sense out of data.

If you run a query for “Palantir” on Beyond Search, you will get links to articles about the company’s previous funding and to a couple of stories about the company’s interaction with IBM i2 related to an allegation about Palantir’s business methods.


Image from the Louisiana Lottery.

I find Palantir interesting for three reasons.

First, it is able to generate significant buzz in police and intelligence entities in a number of countries. Based on what I have heard at conferences, the Palantir visualizations knock the socks off highly placed officials who want killer graphics in their personal slide presentations.

Second, the company has been nosing into certain financial markets. The idea is that the Palantir methods will give some of the investment outfits a better way to figure out what’s going up and what’s going down. The visuals are good, I have heard, but the Palantir analytics are perceived, if my sources are accurate, as better than those from companies like IBM SPSS, Digital Reasoning, Recorded Future, and similar analytics firms.

Third, the company may have moved into a new business sector. The firm’s success in fund raising begs the question, “Is Palantir becoming a vehicle to raise more and more cash?”

Palantir is worth monitoring. The visualizations and the math are not really a secret sauce. The magic ingredient at Palantir may be its ability to sell its upside to investors. Is Palantir introducing a new approach to search and content processing? The main business of the company could be raising more and more money.

Stephen E Arnold, December 11, 2013

Exclusive Silobreaker Interview: Mats Bjore, Silobreaker

November 25, 2013

With Google becoming more difficult to use, many professionals need a way to locate, filter, and obtain high value information that works. Silobreaker is an online service and system that delivers actionable information.

The co-founder of Silobreaker said in an exclusive interview for Search Wizards Speak:

I learned that in most of the organizations, information was locked in separate silos. The information in those silos was usually kept under close control by the silo manager. My insight was that if software could make available to employees the information in different silos, the organization would reap an enormous gain in productivity. So the idea was to “break” down the information and knowledge silos that exist within companies, organizations and mindsets.

And knock down barriers the system has. Silobreaker’s popularity is surging. The most enthusiastic supporters of the system come from the intelligence community, law enforcement, analysts, and business intelligence professionals. A user’s query retrieves up-to-the-minute information from Web sources, commercial services, and open source content. The results are available as a series of summaries, full text documents, relationship maps among entities, and other report formats. The user does not have to figure out which item is an advertisement. The Silobreaker system delivers muscle, not fatty tissue.

Mr. Bjore, a former intelligence officer, adds:

Silobreaker is an Internet and a technology company that offers products and services which aggregate, analyze, contextualize and bring meaning to the ever-increasing amount of digital information.

Underscoring the difference between Silobreaker and other online systems, Mr. Bjore points out:

What sets us apart is not only the Silobreaker technology and our commitment to constant innovation. Silobreaker embodies the long term and active experience of having a team of users and developers who can understand the end user environment and challenges. Also, I want to emphasize that our technology is one integrated technology that combines access, content, and actionable outputs.

The ArnoldIT team uses Silobreaker in our intelligence-related work. We include a profile of the system in our lectures about next-generation information gathering and processing systems.

You can get more information about Silobreaker at www.silobreaker.com. A 2008 interview with Mr. Bjore is located on the Search Wizards Speak site at http://goo.gl/f7niAH.

Stephen E Arnold, November 25, 2013

Search Boundaries. Explode.

November 14, 2013

I read a quite remarkable news release. The title? Grab your blood pressure medicine because you may “explode.”

Expertmaker: Artificial Intelligence (AI) Explodes the Boundaries of Enterprise Search

I expected a sign to warn me off. Was it safe to read about such a potentially powerful technology?


Straightaway I poked through my information about search vendors. I did not recall the name “Expertmaker.” I think it is catchy, echoing the Italian outfit Expert System.

Expertmaker is located at www.expertmaker.com. The company offers the following products:

  1. Consulting
  2. Products that are “an online solution and/or mobile solution.”
  3. Big Data Anti Churn. I am not exactly sure what this means, and I did not want to contact Expertmaker to learn more.
  4. Flow, a virtual assistant platform.

The technology is positioned as “artificial intelligence.” The description of the company’s technology is located at this link. I scanned the information on the Expertmaker Web site. I noted some points that struck me as interesting, particularly in relation to the news release that triggered my interest. (Who says news releases are irrelevant? Expertmaker has my attention. I suppose that is a good thing, but there are other possible viewpoints too. My attention can be annoying, but, hey, this is a free blog about going “beyond search.”)

First, the label “artificial intelligence” is visible in the description. The AI angle is “machine learning and evolutionary computing.” The point is that the system performs functions that would be difficult using an old fashioned database like DB2, Oracle, or SQL Server. (I assume that the owners of these traditional databases will have some counter arguments to offer.)

Second, the system makes it possible to build search-based applications. (Dassault Exalead has been beating this tom tom for six or seven years. I presume that the Cloud 360 technology is relegated to the used car lot because Expertmaker has rolled into the search dealership.)

Third, a development environment is available, including a “Desktop Artificial Intelligence Toolkit.” There are “solvers.” There are various AI technologies. There is knowledge discovery. There is a “published solution.” And there is this component:

Semantic, value based, meta-data structures allow high precision understanding and value based searches.  With the solution you can create your own semantic structures for handling complex solutions.

Okay, this is pretty standard fare for search start ups. I am not sure what the system does, but I looked at examples, including screenshots.


2013 Text Mining Summit Draws Record Crowd

November 2, 2013

The Linguamatics Blog recently reported on the outcome of the 2013 Text Mining Summit in the post “Pharma and Healthcare Come Together to See the Future of Text Mining.”

According to the article, this year’s event drew a record crowd of over 85 attendees who had the opportunity to listen to industry experts from the pharma and healthcare sector.

The article summarizes a few event highlights:

“Delegates were provided with an excellent opportunity to explore trends in text mining and analytics, natural language processing and knowledge discovery. Delegates discovered how I2E is delivering valuable intelligence from text in a range of applications, including the mining of scientific literature, news feeds, Electronic Health Records (EHRs), clinical trial data, FDA drug labels and more. Customer presentations demonstrated how I2E helps workers in knowledge driven organizations meet the challenge of information overload, maximize the value of their information assets and increase speed to insight.”

Events like the Text Mining Summit are excellent opportunities for members of the data analytics community to gather and share their insights and new advances in the industry.

Jasmine Ashton, November 02, 2013

Sponsored by ArnoldIT.com, developer of Beyond Search

Search Wizards Speak: Oleg Rogynskyy, Semantria

October 28, 2013

Semantria is a company focused on providing text and sentiment analysis to anyone. The company’s approach is to streamline the analysis of content so that in less than three minutes and for a nominal $1,000, the power of content processing can help answer tough business questions.

The firm’s founder is Oleg Rogynskyy, who has worked at Nstein (now part of Open Text) and Lexalytics. The idea for Semantria blossomed from Mr. Rogynskyy’s insight that text analytics technology was sufficiently mature so that it could be useful to almost any organization or business professional.

I interviewed Mr. Rogynskyy on October 24, 2013. He told me:

At Semantria, we want to simplify and democratize access to text analytics technology. We want people to be able to get up and running in no time, with a small budget, and actually derive value from our technology. The classic story is you buy a system worth $100k and don’t deploy it.

Semantria focuses on a class of problems that a few years ago would have been outside the reach of many firms. He said:

We make it simple for our clients to solve the following problems: First, some organizations have too much text to read. For example, a Twitter stream or surveys with many responses. Also, there is the need to move quickly and reduce the time to get to market. Many survey results come with an expiry date before they’re irrelevant. Then there is reporting the information. Anyone can use their Excel smarts to build simple/interesting reports and visuals out of unstructured data. But that can take some time, and Semantria accelerates this step. Finally, users need to analyze text with the same impartiality each time. A human might see a glass as half full or half empty, but Semantria will always see a glass with water.

One of the most interesting aspects of Semantria is that the company delivers its solution as a cloud service. Mr. Rogynskyy observed:

We are happily in the cloud, and in the cloud we trust. We have Android and iOS software development kits in the works, so whoever wants to talk to our API from mobile devices will be doing it with ease very soon.

You can get more information about Semantria at https://semantria.com.

This interview is one of more than 60 full-text interviews with individuals who are deeply involved in search, content processing, and analytics. You can find the full series at www.arnoldit.com/search-wizards-speak.

Stephen E Arnold, October 28, 2013

Bitext Teams Up with Actuate for New Data Analysis Solution

October 4, 2013

Bitext.com recently reported on an exciting new partnership in the news release “Actuate and Bitext Announce Collaboration to Deliver Text Analytics Engines and Sentiment Analysis for Big Data Through BIRT.”

According to the article, Bitext, a leader in sentiment analysis, is teaming up with Actuate, a business intelligence software creator, to produce a new and improved text and semantics analytics solution. Additionally, Bitext has announced that it will also be creating a solution with Salesforce.com.

The article states:

“‘Our collaboration with Bitext – providers of advanced semantic solutions for social media, search, and more – extends the types of analysis that can be performed with Actuate’s commercial BIRT developer and end-user platform or solution, by adding the ability to score sentiment toward products and services,’ said Josep Arroyo, VP of Analytic Solutions at Actuate. ‘Users of Actuate with Bitext can now tap more than just negative or positive sentiment analysis. They can also visualize anticipated risks, opportunities and threats for personalized insights, in a single display on any device.’”

This new solution will be a huge asset to marketing professionals, as well as customer support specialists looking to use predictive analytics techniques to gain valuable insights into their customer base. For more information about Bitext, navigate to www.bitext.com.

Jasmine Ashton, October 4, 2013
