
The Future of Enterprise and Web Search: Worrying about a Very Frail Goose

May 28, 2015

For a moment, I thought search was undergoing a renaissance. But I was wrong. I noted a chart which purports to illustrate that the future is not keyword search. You can find the illustration (for now) at this Twitter location. The idea is that keyword search becomes less and less effective as the volume of data goes up. I don’t want to be a spoilsport, but for certain queries keywords and good old Boolean may be the only way to retrieve certain types of information. Don’t believe me? Log on to your organization’s network or to Google. Now look up the telephone number of a specific person whose name you know, or a tire company with a specific name in a specific city. Would you prefer to browse a directory, a word cloud, or a list of suggestions? I want to zing directly to the specific fact. Yep, keyword search. The old reliable.
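To make the keyword argument concrete, here is a minimal sketch (my own, in Python, with made-up documents) of AND-style Boolean retrieval over an inverted index. It illustrates why an exact-term query goes straight to a known fact instead of a browsing interface:

```python
from collections import defaultdict

# Toy corpus for a known-item lookup, e.g. a tire company in a specific city.
docs = {
    1: "Acme Tire Company Louisville Kentucky phone 502 555 0199",
    2: "Acme Software consulting services Portland Oregon",
    3: "Louisville restaurant guide and directory",
}

# Inverted index: term -> set of document ids containing the term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def boolean_and(*terms):
    """Return the ids of documents containing every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

print(boolean_and("acme", "tire", "louisville"))  # {1} -- straight to the fact
```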

But the chart points out that the future is composed of three “webs”: the Social Web, the Semantic Web, and the Intelligent Web. The date for the Intelligent Web appears to be 2018 (the diagram at which I am looking is fuzzy). We are now perched halfway through 2015. In 30 months, the Intelligent Web will arrive with these characteristics:


  • Web scale reasoning (Don’t we have Watson? Oh, right. I forgot.)
  • Intelligent agents (Why not tap Connotate? Agents ready to roll.)
  • Natural language search (Yep, talk to your phone. How is that working out on a noisy subway train?)
  • Semantics. (Embrace the OWL. Now.)
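A quick aside on that last bullet: OWL is the W3C Web Ontology Language, and “embracing” it means publishing machine-readable statements about things and their relationships. The fragment below is a rough illustration of my own using Python’s rdflib; the class, individual, and property names are invented for the example and come from nothing in the chart.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")  # invented namespace for the example
g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

# Machine-readable statements: a class, an individual, and a relationship.
g.add((EX.TireCompany, RDF.type, OWL.Class))
g.add((EX.AcmeTire, RDF.type, EX.TireCompany))
g.add((EX.AcmeTire, RDFS.label, Literal("Acme Tire Company")))
g.add((EX.AcmeTire, EX.locatedIn, EX.Louisville))

print(g.serialize(format="turtle"))  # emits the graph as Turtle triples
```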

Now these benchmarks will arrive in the next 30 months, which implies a gradual emergence of Web 4.0.

The hitch in the git-along, as with most futuristic predictions about information access, is that reality behaves in unpredictable ways. The assumption behind this graph is that “semantic technology helps to regain productivity in the face of overwhelming information growth.”


Kelsen Enters Legal Search Field

February 23, 2015

A new natural-language search platform out of Berlin, Kelsen, delivers software-as-a-service to law firms. Basic Thinking discusses “The Wolfram Alpha of the Legal Industry.” Writer Jürgen Kroder interviewed Kelsen co-founder Veronica Pratzka. She explains what makes her company’s search service different (quotes translated from the original German):

“Kelsen is not a search engine but a self-learning algorithm, built on pre-existing legal cases, that answers questions automatically. Some 70 to 80 percent of global online data is unstructured. Search engines only look for keywords. Google has many answers, but you have to piece them together yourself from thousands of search results and hope that you entered the right keywords. Kelsen, by contrast, is more like a free online lawyer who understands natural language, is trained in all areas of law, works 24/7, and is always up to date….

“First, unlike Google, Kelsen understands natural language! That is, Kelsen delivers suitable answers even when you enter long sentences and questions, not just keywords. Moreover, Kelsen searches ‘only’ relevant legal data sources and presents the user with a choice of correct answers, which he can also evaluate.
“One could, somewhat effusively, call Kelsen ‘the Wolfram Alpha of the legal industry.’ With Kelsen, we focus on structuring and analyzing legal data in order to eventually make it available. Not only those seeking advice and lawyers can benefit from this structuring and visualization of legal data, but also legislators, courts, and research institutions.”

Pratzka notes that her company received boosts from both the Microsoft Accelerator and the IBM Entrepreneur startup support programs. Kelsen expects to turn a profit on the business-to-consumer side through premium memberships. In business-to-business, though, the company plans to excel by simply outperforming the competition. Pratzka seems very confident. Will the service garner the attention she and her team expect?

Cynthia Murrell, February 23, 2015

Sponsored by ArnoldIT.com, developer of Augmentext

Facebook Gains Natural Language Capacity with Wit.ai Acquisition

February 11, 2015

Facebook is making inroads into the natural language space, we learn from “Facebook Buys Wit.ai, Adds Natural Language Knowhow” at ZDNet. Reporter Larry Dignan tells us the social-media giant gained more than 6,000 developers in the deal with the startup, which has created an open-source natural language platform with an eye to the “Internet of Things.” He writes:

“Wit.ai is an early stage startup that in October raised $3 million in seed financing with Andreessen Horowitz as the lead investor. Wit.ai aims to create a natural language platform that’s open sourced and distributed. Terms of the deal weren’t disclosed, but indicates what Facebook is thinking. As the social network is increasingly mobile, it will need natural language algorithms and knowhow to add key features. Rival Google has built in a bevy of natural language tools into Android and Apple has its Siri personal assistant.”

Though the Wit.ai platform is free for open data projects, it earns its keep through commercial instances and queries-per-day charges. Wit.ai launched in October 2013, and is headquartered in Palo Alto, California.
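For readers wondering what “commercial instances and queries-per-day charges” look like from a developer’s chair, here is a rough Python sketch of calling a hosted natural language intent service of this general kind. The endpoint, parameters, and response fields are assumptions for illustration only, not taken from Wit.ai’s documentation or the ZDNet article.

```python
import os
import requests

# Hypothetical hosted NLP intent/entity service -- endpoint shape assumed.
API_URL = "https://api.example-nlp.invalid/message"
TOKEN = os.environ["NLP_API_TOKEN"]  # per-instance credential

resp = requests.get(
    API_URL,
    params={"q": "turn on the living room lights"},  # free-form utterance
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
data = resp.json()

# Services of this kind typically return a parsed intent plus entities;
# the field names below are illustrative only.
print(data.get("intent"), data.get("entities"))
```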

Cynthia Murrell, February 11, 2015

Sponsored by ArnoldIT.com, developer of Augmentext

Watson Goes Open Source…Not Really

January 2, 2015

IBM’s Watson is becoming a new natural language processing analytical tool. It is doubtful that IBM will ever expose Watson’s guts to the open source community, but parts of its internal software organs were designed around existing open source work. Also, do not doubt the open source community’s resourcefulness. The community is already building its own Watson-like entities. InfoWorld lists these open source projects in “Watson Wannabes: 4 Open Source Projects for Machine Intelligence.”

DARPA DeepDive is an automated system for classifying unstructured data that emulates Watson’s decision-making process with human guidance. Christopher Ré of the University of Wisconsin developed it.

Apache Unstructured Information Management Architecture (UIMA) was actually used in building Watson. It is a standard for performing analysis on textual content. IBM’s UIMA architecture is available as open source through the Apache Software Foundation. It is not a complete machine learning system and only offers the minimum code to build on.

OpenCog’s goal is to build a platform for developers to build and share artificial intelligence programs. OpenCog wants to help create intelligent systems that have humanlike world understanding rather than being focused on one specific area. OpenCog is already using NLP, making it a practical solution similar to Watson.

The Open Advancement of Question Answering Systems (OAQA) is more akin to Watson than the other three. It offers an advanced question answering system using NLP. IBM and Carnegie Mellon University started it. OAQA is only a toolkit, not a downloadable solution.

“The one major drawback to each project, as you can guess, is that they’re not offered in nearly as refined or polished a package as Watson. Whereas Watson is designed to be used immediately in a business context, these are raw toolkits that require heavy lifting. Plus, Watson’s services have already been pre-trained with a curated body of real-world data. With these systems, you’ll have to supply the data sources, which may prove to be a far bigger project than the programming itself.”

All too true.

Whitney Grace, January 02, 2015
Sponsored by ArnoldIT.com, developer of Augmentext

Garbling the Natural Language Processors

December 30, 2014

Natural language processing is becoming a popular analytical tool as well as a quicker way to handle search and customer support. Nuance’s Dragon is at the tip of everyone’s tongue when NLP enters a conversation, but there are other products with their own benefits. Code Project recently reviewed three NLP products in “A Review of Three Natural Language Processors: AlchemyAPI, OpenCalais, and Semantria.”

Rather than sticking readers with plain product reviews, Code Project explains what NLP is used for and how it accomplishes its work. While NLP is used for vocal commands, it can do many other things: improve SEO, knowledge management, text mining, text analytics, content visualization and monetization, decision support, automatic classification, and regulatory compliance. NLP extracts entities (that is, proper nouns) from content, then classifies and tags them and assigns each a relevance or sentiment score to give it meaning.

In layman’s terms:

“…the primary purpose of an NLP is to extract the nouns, determine their types, and provide some “scoring” (relevance or sentiment) of the entity within the text.  Using relevance, one can supposedly filter out entities to those that are most relevant in the document.  Using sentiment analysis, one can determine the overall sentiment of an entity in the document, useful for determining the “tone” of the document with regards to an entity — for example, is the entity “sovereign debt” described negatively, neutrally, or positively in the document?”
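To make that extract-classify-score loop concrete, here is a small, vendor-neutral Python sketch using the open source spaCy library for entity extraction, with a naive frequency-based relevance score bolted on. It is not how AlchemyAPI, OpenCalais, or Semantria work internally; the scoring is my own simplification.

```python
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with named entity recognition

text = (
    "Analysts worry that sovereign debt in several euro-zone countries "
    "will weigh on the European Central Bank and on Greece in particular."
)
doc = nlp(text)

# Step 1: extract the entities and their types (the "nouns" the quote refers to).
entities = [(ent.text, ent.label_) for ent in doc.ents]

# Step 2: a crude relevance score -- how often each entity surfaces,
# normalized by the most frequent one. Real products use far richer signals.
counts = Counter(name for name, _ in entities)
top = max(counts.values(), default=1)
for name, label in sorted(set(entities)):
    print(name, label, round(counts[name] / top, 2))
```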

NLP categorizes the human element in content. Its usefulness will become more apparent in future years, especially as people rely more and more on electronic devices for communication, consumerism, and interaction.

Whitney Grace, December 30, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

A Plan for Achieving ROI via Text Analytics

December 6, 2014

ROI is the end goal for many big data and enterprise projects, so it is refreshing to see some published information on whether companies actually achieve it, as we recently saw in the Smart Data Collective article “Text Analytics, Big Data and the Keys to ROI.” According to a study released last year (discussed further in “Text/Content Analytics 2011: User Perspectives on Solutions and Providers”), the reason many businesses do not see positive returns has to do with the planning phase. Many report that they did not start with a clear plan for getting there.

The author shares an example from his full-time work in text analytics. One of his clients, focused on sifting through masses of social media data and data from government applications in search of suspicious activity, needed a solution for a text-heavy application. The author responded by suggesting a selective cross-lingual process, one which worked with the text in its native language, and only on the text relevant to the topic of interest.

The following happened after the author’s suggestion:

Although he seemed to appreciate the logic of my suggestions and the quality benefits of avoiding translation, he just didn’t want to deal with a new approach. He asked to just translate everything and analyze later – as many people do. But I felt strongly that he’d be spending more and getting weaker results. So, I gave him two quotes. One for translating everything first and analyzing later – his way, and one for the cross-lingual approach that I recommended. When he saw that his own plan was going to cost over a million dollars more, he quickly became very open minded about exploring a new approach.
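The economics behind that anecdote are easy to sketch. The back-of-the-envelope comparison below uses invented figures (corpus size, per-word translation cost, share of relevant documents); none of the numbers come from the article, and the cost of the native-language filtering step is ignored for simplicity.

```python
# Illustrative assumptions only -- none of these figures come from the article.
docs = 500_000          # documents to screen
words_per_doc = 300     # average document length in words
translate_cost = 0.01   # dollars per word to translate
relevant_share = 0.20   # fraction of documents actually on topic

translate_everything = docs * words_per_doc * translate_cost
filter_then_translate = docs * words_per_doc * translate_cost * relevant_share

print(f"Translate everything, analyze later: ${translate_everything:,.0f}")
print(f"Filter in the native language first:  ${filter_then_translate:,.0f}")
print(f"Difference: ${translate_everything - filter_then_translate:,.0f}")
```

Under these invented numbers the gap lands above a million dollars, which is at least consistent with the client’s reaction described above.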

It sounds like the author could have suggested a number of similar semantic processing solutions. For example, Cogito Intelligence API enhances the ability to decipher meaning and insights from a multitude of content sources, including social media and unstructured corporate data. The point is that ROI is out there, and innovative companies like Expert System, among others, are enabling it.

Megan Feil, December 6, 2014

NLP Market Poised for Rapid Growth

October 9, 2014

Here’s a robust prediction. PR Newswire declares, “Natural Language Processing Market to See 21.1% CAGR for 2013-2018.” (For those not aware, CAGR stands for compound annual growth rate.) The forecast comes from a report for sale at the logically named site ReportsnReports. Companies across the NLP spectrum are profiled in the 199-page report. The write-up explains:

“The Natural Language Processing (NLP) market is estimated to grow from $ 3,787.3 million in 2013 to $9,858.4 million in 2018. This represents a Compounded Annual Growth Rate (CAGR) of 21.1% from 2013 to 2018. In the current scenario, web and e-commerce, healthcare, IT and Telecommunication vertical continues to grow and are the largest contributor for Natural Language Processing (NLP) software market. In terms of regional growth, North America is expected to be the biggest market in terms of revenue contribution. European and APAC region is expected to experience increased market traction, due to increasing adoption across various verticals and investment support in research projects from the regional government.”
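For anyone who wants to check the arithmetic, the 21.1 percent figure follows directly from the two revenue numbers in the quote:

```python
# Revenue figures from the report quoted above, in $ millions.
start, end, years = 3787.3, 9858.4, 5  # 2013 -> 2018

cagr = (end / start) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # ~21.1%, matching the headline figure
```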

According to the report, factors like growing smartphone usage, enhanced customer experiences, the big data trend, and machine-to-machine technology are pushing the natural language processing market forward. Unsurprisingly, the adoption of electronic health records in the healthcare industry plays a large role, as well. The report is said to supply comprehensive analysis of global adoption trends, the competitive landscape, and venture-capital funding opportunities. It also examines some of the major vendors that seem to make innovation a priority, giving them the edge in integrating with enterprise platforms. See the write-up for more details.

Cynthia Murrell, October 09, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Watson and Its API

September 24, 2014

Short honk: Attention, Watson fans. Check out the documentation “Example Post for Answers with Evidence.” Put your code hat on.
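As a purely illustrative warm-up, here is what posting a question and asking for supporting evidence could look like from Python. The endpoint path, JSON fields, and response shape below are placeholders of my own, not IBM’s documented API; consult the documentation named above for the real contract.

```python
import requests

# Placeholder endpoint and payload shape -- not IBM's documented API.
url = "https://example-qa-service.invalid/v1/question"
payload = {
    "question": {
        "questionText": "What are the side effects of drug X?",
        "evidenceRequest": {"items": 3},  # ask for supporting evidence passages
    }
}

resp = requests.post(url, json=payload, auth=("user", "password"), timeout=30)
resp.raise_for_status()

for answer in resp.json().get("answers", []):
    print(answer.get("text"), answer.get("confidence"))
```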

Stephen E Arnold, September 25, 2014

The March of IBM Watson: From Kitchen to Executive Suite

August 5, 2014

Watson, fresh from its recipe innovations at Bon Appétit, is on the move…again. From the game show to the hospital, Watson has been demonstrating its expertise in the most interesting venues.

I read “A Room Where Executives Go to Get Help from IBM’s Watson.” The subtitle is an SEO dream: “Researchers at IBM are testing a version of Watson designed to listen and contribute to business meetings.” I know IBM has loads of search and content processing capability. In addition to the gems cranked out by Dr. Jon Kleinberg and Dr. Ramanathan Guha, IBM has oodles of acquisitions in the search and content processing sector. Do you know about Clementine? Are you familiar with iPhrase? Have you explored Cybertap’s indexing and search function with your local IBM representative? What about Vivisimo? What about the search functions in DB2, FileNet, and OmniFind, regardless of its incarnation? Whew. That’s a lot of search and content processing horsepower. I think most of that power remains in the barn.

Watson is not in the barn. Watson is a raging bull. Watson is, I believe, something special. Based on open source technology plus home brew wizardry, Watson is a next-generation information retrieval world beater. The idea is that Watson is trained in a manner similar to the approach used by Autonomy in 1996. Then that indexed content is whipped into a question answering system. Hapless chefs, litigation-wary physicians, and now risk-averse MBAs can use Watson to make better decisions or answer really tough questions.

I know this to be true because Technology Review tells me so. Whatever MIT-tinged Technology Review says is pretty darned solid. Here’s a passage I noted:

Everything said in the room can be instantly transcribed, providing a detailed record of any meeting, and allowing the system to listen out for commands addressed to “Watson.” Those commands can be simple requests for information of the kind you might type into a search box. But Watson can also take a more active role in a discussion. In a live demonstration, it helped researchers role-playing as executives to generate a short list of companies to acquire.

The write up explains that a little bit of preparation is required. There’s the pesky training, which is particularly annoying when the topic of the meeting is, “The DOJ attorneys are here to discuss the depositions” or “We have a LOCA at the reactor. Everyone to my conference room now.” I suppose most business meetings are even more exciting.

Technology Review points out that the technology has a tough time converting executive speech to text. The transcribed text becomes fodder for the indexing and parsing required to pass queries to Watson’s internal subsystems, which then return answers. The natural language query and automatic query refinement functions seem to work well for game show questions and for discerning uses of tamarind. For a LOCA meeting or a discussion of a deposition, Watson may need a bit more work.
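Strip away the marketing and the meeting-room setup described above reduces to a simple control flow: transcribe continuously, watch the transcript for an address word, and hand the rest of the utterance to a question answering backend. The minimal Python sketch below shows that flow; the canned transcript and the answer_question stub are hypothetical stand-ins, not IBM components.

```python
from typing import Iterable

WAKE_WORD = "watson"

def answer_question(query: str) -> str:
    """Hypothetical stand-in for a call to a question answering backend."""
    return f"(answer to: {query!r})"

def meeting_listener(transcript_stream: Iterable[str]) -> None:
    """Scan a stream of transcribed utterances for commands addressed to Watson."""
    for utterance in transcript_stream:
        text = utterance.strip()
        if text.lower().startswith(WAKE_WORD):
            query = text[len(WAKE_WORD):].lstrip(" ,:")
            print(answer_question(query))
        # Everything else simply becomes part of the meeting record.

# Toy usage with canned utterances standing in for live speech-to-text output.
meeting_listener([
    "Let's review the acquisition short list.",
    "Watson, which of these companies had revenue growth above 20 percent?",
])
```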

I find the willingness of major “real” news outlets to describe Watson in juicy write ups an indication of the esteem in which IBM is held. My view is a bit different. I am not sure the Watson group at IBM knows how to generate substantial revenues. The folks have to make some progress toward $1 billion in revenue and then grow that revenue to a modest $10 billion in five or six years.

The fact that outfits in search and content processing have failed to hit more modest benchmarks for decades is irrelevant. The only search company I know of that has generated billions is Google. Keep in mind that those billions come from online advertising. HP bought Autonomy for $11 billion in the hopes of owning a Klondike. IBM wisely went with open source technology and home grown code.

But the eventual effect of both HP’s and IBM’s approach will be more modest revenues. HP makes a name for itself via litigation and IBM is making a name for itself with demonstrations and some recipes.

Search and content processing, whether owned by a large company or a small one, faces some credibility, marketing, revenue, technology, and profit challenges. I am not sure a business triathlete can complete the course at this time. Talk is just so much easier than getting over or around the course intact.

Stephen E Arnold, August 5, 2014

I2E Semantic Enrichment Unveiled by Linguamatics

July 21, 2014

The article titled “Text Analytics Company Linguamatics Boosts Enterprise Search with Semantic Enrichment” on MarketWatch discusses the launch of I2E Semantic Enrichment from Linguamatics. The new release allows for the mining of a variety of texts, from scientific literature to patents to social media. It promises faster, more relevant search for users. The article states:

“Enterprise search engines consume this enriched metadata to provide a faster, more effective search for users. I2E uses natural language processing (NLP) technology to find concepts in the right context, combined with a range of other strategies including application of ontologies, taxonomies, thesauri, rule-based pattern matching and disambiguation based on context. This allows enterprise search engines to gain a better understanding of documents in order to provide a richer search experience and increase findability, which enables users to spend less time on search.”
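In plain terms, the enrichment step attaches concept-level metadata to documents before they reach the search engine’s index. Here is a small, hypothetical Python sketch of dictionary-driven concept tagging of that general sort; the toy ontology and field names are invented and do not describe how I2E works internally.

```python
# A toy "ontology": surface strings mapped to preferred concept identifiers.
ONTOLOGY = {
    "heart attack": "CONCEPT:myocardial_infarction",
    "myocardial infarction": "CONCEPT:myocardial_infarction",
    "aspirin": "CONCEPT:acetylsalicylic_acid",
}

def enrich(doc_text: str) -> dict:
    """Return the document plus concept metadata for a search engine to index."""
    lowered = doc_text.lower()
    concepts = sorted({cid for term, cid in ONTOLOGY.items() if term in lowered})
    return {"text": doc_text, "concepts": concepts}

record = enrich("Low-dose aspirin after a heart attack remains standard advice.")
print(record["concepts"])
# ['CONCEPT:acetylsalicylic_acid', 'CONCEPT:myocardial_infarction']
# A query for "myocardial infarction" can now match this document even though
# the text only says "heart attack" -- the findability gain the quote describes.
```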

Whether it is semantics spun for search or search spun for semantics, Linguamatics has made its technology available to tens of thousands of enterprise search users. Company representative John M. Brimacombe was straightforward in his comments about the disappointment surrounding enterprise search, but optimistic about I2E. The product is already in use at many top organizations, as well as at the Food and Drug Administration.

Chelsea Kerwin, July 21, 2014

Sponsored by ArnoldIT.com, developer of Augmentext
