Microsoft, Text Analytics, and Writing

January 21, 2015

I read the marvelously named “Microsoft Acquires Text Analysis Startup Equivio, Plans to Integrate Machine Learning Tech into Office 365: Equivio Zoom In. Find Out.”

Taking a deep breath, I read the article. Here’s what I deduced: Word and presumably PowerPoint will get some new features:

While Office 365 offers e-discovery and information governance capabilities, Equivio develops machine learning technologies for both, meaning an integration is expected to make them “even more intelligent and easy to use.” Microsoft says the move is in line with helping its customers tackle “the legal and compliance challenges inherent in managing large quantities of email and documents.”

The Fast Search & Transfer technology is not working out?  The dozens of SharePoint content enhancers are not doing their job? The grammar checker is not doing its job?

What is different is that Word is getting more machine learning:

Equivio uses machine learning to let users explore large, unstructured sets of data. The startup’s technology leverages advanced text analytics to perform multi-dimensional analyses of data collections, intelligently sort documents into themes, group near-duplicates, and isolate unique data.

Like Microsoft’s exciting adaptive menus, the new system will learn what the user wants.

Is this a next generation information access system? Is Microsoft nosing into Recorded Future territory?

Nope, but the desire to convert what the user does into metadata seems to percolate in the Microsoft innovation coffee pot.

If Microsoft pulls off this shotgun marriage, I think more pressure will be put on outfits like Content Analyst and Smartlogic.

Stephen E Arnold, January 21, 2015

NGIA Palantir Worth Almost As Much As Uber and Xiaomi

January 18, 2015

Short honk: Value is in the eye of the beholder. I am reminded of this each time I see an oddball automobile sell for six figures at a Barrett-Jackson auction.

Navigate to “Palantir Raising More Money After Tagged With $15 Billion Valuation.” Keep in mind that you may have to pay to view the article, or you can check out the apparently free link to the source data.

The key point is that Palantir is an NGIA system. Obviously it appears on the surface to have more “value” than Hewlett Packard’s Autonomy or the other content processing companies in the hunt for staggering revenues.

Stephen E Arnold, January 18, 2015

Zaizi: Search and Content Consulting

January 13, 2015

I received a call about Zaizi and the company’s search and content services. The firm maintains a Web site. Based on the information in my files, the company appears to be open source centric and an integrator of Lucene/Solr solutions.

What’s interesting is that the company has embraced Mondeca/Smartlogic jargon; for example, content intelligence. I find the phrase interesting and an improvement over the Semantic Web lingo.

The idea is that via indexing, one can find and make use of content objects. I am okay with this concept; however, what’s being sold is indexing, entity extraction, and classification of content.

The issue facing Zaizi and the other content intelligence vendors is that “some” content intelligence and slightly “smarter” information access is not likely to generate the big bucks needed to compete.

Firms like BAE and Leidos as well as the Google/In-Q-Tel backed Recorded Future offer considerably more than indexing. The need is to process automatically, analyze automatically, and generate outputs automatically. The outputs are automatically shaped to meet the needs of one or more human consumers or one or more systems.

Think in terms of taking outputs of a next generation information access system and inputting the “discoveries” or “key items” into another system. The idea is that action can be taken automatically or provided to a human who can make a low risk, high probability decision quickly.
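The routing idea in the paragraph above can be sketched in a few lines of Python. The KeyItem structure, the 0.9 threshold, and the sample entities are my invented illustrations, not any vendor’s actual interface:

```python
# Sketch of NGIA output routing: processed "key items" flow onward
# automatically, or go to a human only when the call is not low risk /
# high probability. All names and thresholds are illustrative.

from dataclasses import dataclass

@dataclass
class KeyItem:
    entity: str
    score: float  # confidence that the item warrants action

def route(items, threshold=0.9):
    """Split discoveries into automatic actions and a human-review queue."""
    automatic, review = [], []
    for item in items:
        (automatic if item.score >= threshold else review).append(item)
    return automatic, review

auto, pending = route([KeyItem("flagged-domain.example", 0.97),
                       KeyItem("new-account-cluster", 0.62)])
print([i.entity for i in auto])     # high-confidence items acted on directly
print([i.entity for i in pending])  # uncertain items go to an analyst
```

The point of the sketch is the split: high-probability items feed another system with no human in the loop, and only the uncertain residue reaches a person for a quick decision.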

The notion that a 20 something is going to slog through facets, keyword search, and the mind numbing scan results-open documents-look for info approach is decidedly old fashioned.

You can learn more about what the next big thing in information access is by perusing CyberOSINT: Next Generation Information Access.

Stephen E Arnold, January 14, 2015

Palantir and 2014 Funding

January 5, 2015

I read an article that confused me. Navigate to “Palantir Secures First $60M Chunk of Projected $400M Round as Market Asks, ‘Who?’”

This sentence suggests that Palantir wants to go public. What do you think?

But although it would clearly find no trouble catching the market’s attention, the company is in rush to take on the pressure of public trading The secretive nature of its clientele and an apparent desire to prioritize long-term strategy over short-term returns are the primary considerations behind that approach, but what facilitates it is the ease with which Palantir has managed to draw private investors so far.

I wonder if this article means “no” rush. I wonder if this article is software generated.

Here’s another interesting passage:

The document [cited by Techcrunch?] doesn’t specify the source of the capital or what Palantir intends to spend it on, but based on the claim in NYT report that it wasn’t profitable as of May, the money will probably go primarily toward fueling operations. The paper also noted that most of the estimated billion dollars that the company raked in this year came from private sector customers, which provides a hint as to the areas where the funding will be invested, namely the development of its enterprise-oriented Gotham offering.

I have my own views about Palantir which are summarized in the forthcoming CyberOSINT: Next Generation Information Access monograph. (If you want to order a copy, write benkent2020 at yahoo dot com. The book is available to law enforcement, security, and intelligence professionals.)

The statement “isn’t profitable” is fascinating if true.

Stephen E Arnold, January 5, 2015

A Big NGIA Year Ahead

January 1, 2015

The New Year is upon us. We will be posting a page on Xenky where you can request a copy of CyberOSINT: Next Generation Information Access, a link to the seminar (which is limited to law enforcement and intelligence professionals only), and some supplementary information that will allow my Beyond Search blog to shift from the dead end of enterprise search to the hottest topics in information access.

If you want information about CyberOSINT: Next Generation Information Access, you can send an email to benkent2020 at yahoo dot com. We will send you a one pager about the study. To purchase the book, you must be an active member of the armed forces, a working law enforcement professional, or an individual working for one of the recognized intelligence agencies we support; for example, a NATO member’s intelligence operation.

Stephen E Arnold, January 1, 2015

SAP Hana Search 2014

December 25, 2014

Years ago I wrote an analysis of TREX. At the time, SAP search asserted a wide range of functionality. I found the system interesting, but primarily of use to die-hard SAP licensees. SAP was and still is focused on structured data. The wild and crazy heterogeneous information generated by social media, intercept systems, geo-centric gizmos, and humans blasting terabytes of digital images cheek by jowl with satellite imagery is not the playground of the SAP technology.

If you want to get a sense of what SAP is delivering, check out “SAP Hana’s Built-In Search Engine.” My take on the explanation is that it is quite similar to what Fast Search & Transfer proposed for the pre-sale changes to ESP. The built-in system is not one thing. The SAP explainer points out:

A standalone “engine” is not enough, however. That’s why SAP HANA also includes the Info Access “InA” toolkit for HTML5. The InA toolkit is a set of HTML5 templates and UI controls which you can use to configure a modern, highly interactive UI running in a browser. No code – just configuration.

To make matters slightly more confusing, I read “Google Like Enterprise Search Powered by SAP Hana.” I am not sure what “Google like” means. Google provides its ageing and expensive Google Search Appliance. But like Google Earth, I am not sure how long the GSA will remain on the Google product punch list. Furthermore, the GSA is a bit of a time capsule. Its features and functions have not kept pace with next generation information access technologies. Google invested in Recorded Future a couple of years ago, and as far as I know, none of the high value Recorded Future functions are part of the GSA. Google also delivers its Web indexing service. Does “Google like” refer to the GSA, Google’s cloud indexing of Web sites, or the forward looking Recorded Future technology?

The Google angle seems to relate to Fiori search. Based on the screenshots, it appears that Fiori presents SAP’s structured data in a report format. Years ago we used a product called Monarch to deliver this type of information to a client.

My hypothesis is that SAP wants to generate more buzz about its search technology. The company has moved on from TREX, positioned Hana search as a Fast Search emulation, and created Fiori to generate reports from SAP’s structured data management system.

For now, I will keep SAP in my “maybe next year” folder. I am not sure what SAP information access systems deliver beyond basic keyword search, some clustering, and report outputs. SAP at some point may have to embrace open source search solutions. If SAP has maintained its commitment to open source, perhaps these technologies are open source. I would find that reassuring.

Regardless of what SAP is providing licensees, it is clear that the basic features and functions of next generation information access systems are not part of the present line up of products. Like other IBM-inspired companies, the future is rushing forward with SAP search receding in tomorrow’s rear view mirror. Calling a system “Google like” is not helpful, nor does it suggest that SAP is aware of NGIA systems. Some of SAP’s customers will be licensing these systems in order to move beyond what is a variation of query, scan results, open documents, read documents, and hunt for useful information. Organizations require more sophisticated information access services. The models crafted in the 1990s are, in my opinion, commoditized. Higher value NGIA operations are the future.

Stephen E Arnold, December 25, 2014

Coveoed Up with End of Week Marketing

December 22, 2014

I am the target of inbound marketing bombardments. I used to look forward to Autonomy’s conceptual inducements. In fact, in my opinion, the all-time champ in enterprise search marketing is Autonomy. HP now owns the company, and the marketing has fizzled. I am in some far-off place, and I sifted through emails, various alerts, and information dumped in my Overflight system.

I must howl, “Uncle.” I have been covered up or Coveo-ed up.

Coveo is the Canadian enterprise search company that began life as a hard drive search program and then morphed into a Microsoft-centric solution. With some timely venture funding, the company has amped up its marketing. The investors have flown to Australia to lecture about search. Australia, as you may know, is the breeding ground for the TeraText system, which is a darned important enterprise application. Out of the Australia research petri dish emerged Funnelback. There was YourAmigo, and some innovations that keep the lights on in the Google offices in the land down under.

Coveo sent me email asking if my Google search appliance was delivering. Well, the GSA does exactly what it was designed to do in the early 2000s. I am not sure I want it to do anything anymore. Here’s part of the Coveo message to me:


Is your Search Appliance failing you? Is it giving you irrelevant search results, or unable to search all of your systems? It’s time you considered upgrading to the only enterprise search platform that:

  • Securely indexes all of your on-premise and cloud-based source systems
  • Provides easy-to-tune relevance and actionable analytics
  • Delivers unified search to any application and device your teams use

If I read this correctly, I don’t need a GSA, an Index Engines, a Maxxcat, or an EPI Thunderstone. I can just pop Coveo into my shop and search my heart out.

How do I know?

Easy. The mid tier consulting firm Gartner has identified Coveo as “the most visionary leader” in enterprise search. I am not sure about the methods of non-blue chip consulting firms. I assume they are objective and on a par with the work of McKinsey, Bain, Booz, Allen, and Boston Consulting Group. I have heard that some mid tier firms take a slightly different approach to their analyses. I know first hand that one mid tier firm recycled my research and sold my work on Amazon without my permission. I don’t recall that happening when I worked at Booz, Allen, though. We paid third parties, entered into signed agreements, and were upfront about who knew what. Times change, of course.

Another message this weekend told me that Coveo had identified five major trends that—wait for it—“increase employee and customer proficiency in 2015.” I don’t mean to be more stupid than the others residing in my hollow in rural Kentucky, but what the heck is “customer proficiency”? What body of evidence supports these fascinating “trends”?

The trends are remarkable for me. I just completed CyberOSINT: Next Generation Information Access. The monograph will be available in early 2015 to active law enforcement, security, and intelligence professionals. If you qualify and want to get a copy, send an email to benkent2020 at yahoo dot com. I was curious to see if the outlook my research team assembled from our 12 months of research into the future of information access matched Coveo’s trends.

The short answer is, “Not even close.”

Coveo focuses on “the ecosystem of record.” CyberOSINT focuses on automated collection and analytics. An “ecosystem of record” sounds like records management. In 2015 organizations need intelligence automatically discovered in third party, proprietary, and open source content, both historical and real time.

Coveo identifies “upskilling the end users.” In our work, the focus is on delivering to either a human or another system outputs that permit informed action. In many organizations, end users are being replaced by increasingly intelligent systems. That trend seems significant in the software delivered by the NGIA vendors whose technology we analyzed. (NGIA is shorthand for next generation information access.)

Coveo is concerned about a “competent customer.” That’s okay, but isn’t that about cost reduction? The idea is to get rid of expensive call center humans and replace them with NGIA systems. Our research suggests that automated systems are the future, or did I just point that out in the “upskilling” comment?

Coveo is mobile first. No disagreement there. The only hitch in the git along is that when one embraces mobile, there are some significant interface issues and predictive operations become more important. Therefore, in the NGIA arena, predictive outputs are where the trend runway lights are leading.

Coveo is confident that cloud indexes and their security will be solved. That is reassuring. However, cloud as well as on-premises solutions, including hybrid solutions, have to adopt predictive technology that automatically deals with certain threats, malware, violations, and internal staff propensities. The trend, therefore, is for OSINT centric systems that hook into operational and intel related functions as well as performing external scans from perimeter security devices.

What I find fascinating is that in the absence of effective marketing from vendors of traditional keyword search, providers of old school information access are embracing some concepts and themes that are orthogonal to a very significant trend in information access.

Coveo is obviously trying hard, experimenting with mid tier consulting firm endorsements, hitting the rubber chicken circuit, and cranking out truly stunning metaphors like the “customer proficiency” assertion.

The challenge for traditional keyword search firms is that NGIA systems have relegated traditional information access approaches to utility and commodity status. If one wants search, Elasticsearch works pretty well. NGIA systems deliver a different class of information access. NGIA vendors’ solutions are not perfect, but they are a welcome advance over the now four decades old approach to finding important items of information without the Model T approach of scanning a results list, opening and browsing possibly relevant documents, and then hunting for the item of information needed to answer an important question.

The trend, therefore, is NGIA. And it is an important shift to solutions whose cost can be measured. I wish Mike Lynch were driving the Autonomy marketing team again. I miss the “Black Hole of Information”, the “Portal in a Box,” and the Digital Reasoning Engine approach. Regardless of what one thinks about Autonomy, the company was a prescient marketer. If the Lynch infused Autonomy were around today, the moniker “NGIA” would be one that might capture Autonomy’s marketing love.

Stephen E Arnold, December 23, 2014


Cyber OSINT Surprise: Digital Reasoning

December 19, 2014

I read “Machine Learning Can Help Sift Open Source Intelligence.” I found one familiar name, Basis Technologies. I found one established vendor, Opera Solutions, and I noted one company that has a content processing system. In the run-up to the February 19, 2014, Cyber OSINT conference, Basis Technologies pointed out that it was not really into cyber OSINT at least on February 19, 2014. Opera Solutions is interesting and was on the list of 20 firms to invite. We filled the 12 slots quickly. Some deserving companies could not be included. Then there is Digital Reasoning, an outfit in Nashville, Tennessee.

The write up says:

The company’s cognitive computing platform, dubbed Synthesys, scans unstructured open source data to highlight relevant people, places, organizations, events and other facts. It relies on natural language processing along with what the company calls “entity and fact extraction.” Applying “key indicators” and a framework, the platform is intended to automate the process of deriving intelligence from open source data, the company claims. The platform then attempts to assemble and organize relevant unstructured data using similarity algorithms, categorization and “entity resolution.”

The idea which unifies these three companies appears to be fancy math; that is, the use of statistical procedures to resolve issues associated with content processing.

The only hitch in the git along is that the companies that appear to be making the quickest strides in cyber OSINT use hybrid approaches. The idea is that statistical systems and methods are used. These are supplemented with various linguistic systems and methods.
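A toy sketch can make the statistical-plus-linguistic distinction concrete. The frequency count below stands in for the statistical methods, and a crude capitalization rule stands in for the linguistic ones; real systems use far richer models, and none of this code reflects any vendor’s actual implementation:

```python
# Hybrid content processing in miniature: a statistical pass (raw term
# frequency) supplemented by a linguistic pass (a simple proper-name
# pattern). Illustrative only.

import re
from collections import Counter

def statistical_terms(text, top_n=3):
    """Statistical pass: rank lowercase tokens by raw frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t, _ in Counter(tokens).most_common(top_n)]

def linguistic_entities(text):
    """Linguistic pass: a toy rule for multi-word proper names."""
    return re.findall(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b", text)

text = "Recorded Future and Digital Reasoning process open source data."
print(statistical_terms(text))
print(linguistic_entities(text))  # ['Recorded Future', 'Digital Reasoning']
```

The statistical pass alone would never surface “Recorded Future” as a unit; the linguistic rule does, which is the point of supplementing one family of methods with the other.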

The distinction is to me important. In the February 2015 seminar, a full picture of the features and functions associated with content processing in English and other languages is explored. There are profiles of appliance vendors tapping OSINT to head off threats. But the focus of the talks is on the use of advanced approaches that provide system users with an integrated approach to open source information.

The article is good public relations/content marketing. The article does not highlight the rapid progress the companies participating in the seminar are making. Yesterday’s leaders are today’s marketing challenge. Tomorrow’s front runners are focused on delivering to their clients solutions that break new ground.

For information about the seminar, which is restricted to working law enforcement and intelligence professionals and to place an order for my new monograph “CyberOSINT: Next Generation Information Access,” write benkent2020 at yahoo dot com.

Stephen E Arnold, December 19, 2014

Bottlenose: Not a Dolphin, Another Intelligence Vendor

December 15, 2014

Last week, maybe two weeks ago, I learned that KPMG invested in Bottlenose. The company says that the cash will “take trend intelligence global.” The company asserts here:

We organize the world’s attention and emotion.

I am, as you may know, interested in what I call NGIA systems. These are next generation information access systems. Instead of dumping a list of Google-style search results in front of me, NGIA systems provide a range of tools to use information in ways that do not require me to formulate a query, open and browse possibly relevant documents, and either output a report or pipe the results into another system. For example, in one application of NGIA system functions, the data from a predictive system can be fed directly into the autonomous component of a drone. The purpose is to eliminate the time delay between an action that triggers a flag in a smart system and taking immediate action to neutralize a threat. NGIA is not your mother’s search engine, although I suppose one could use this type of output input operation to identify a pizza joint.

I scanned the Bottlenose Web site, trying to determine if the claims of global intelligence and organizing the world’s attention and emotion reflected an NGIA technology or another social media monitoring service. The company asserts that it monitors “the stream.” The idea is that real-time information is flowing through the firm’s monitoring nodes. The content is obviously processed. The outputs are made available to those interested in marketing.

The company states:

Our Trend Intelligence solutions will take in all forms of stream data, internal and external, for a master, cross-correlated view of actionable trends in all the real-time forces affecting your business.

The key phrase for me is “all forms” of data, “internal and external.” The result will be “a master, cross-correlated view of actionable trends in all the real time forces affecting your business.” Will Bottlenose deliver this type of output to its customers? See “Leaked Emails Reveal MPAA Plans to Pay Elected Officials to Attack Google.” Sure, but only after the fact. If the information is available via a service like Bottlenose, there may be some legal consequences in my view.

By my count, there are a couple of “alls” in this description. A bit of reflection reveals that if Bottlenose is to deliver, the company has to have collection methods that work like those associated with law enforcement and intelligence agencies. A number of analysts have noted that the UK’s efforts to intercept data flowing through a Belgian telecommunications company’s servers is interesting.

Is it possible that a commercial operation, with or without KPMG’s investment, is about to deliver this type of comprehensive collection to marketers? Based on what the company’s Web site asserts, I come away with the impression that Bottlenose is similar to the governmental services that are leading to political inquiries and aggressive filtering of information on networks. China is one country which is not shy about its efforts to prevent certain information from reaching its citizens.

Bottlenose says:

Bottlenose Nerve Center™ spots real-time trends, tracks interests, measures conversations, analyzes keywords and identifies influencers. As we expand our library of data sources and aggregate the content, people, thinking and emotion of humanity’s connected communications, Bottlenose will map, reflect and explore the evolving global mind. We aim to continuously show what humanity is thinking and feeling, now.

I can interpret this passage as suggesting that a commercial company will deliver “all” information to a customer via its “nerve center.” Relationships between and among entities can be discerned; for example:

[Image: Trend Intelligence - Sonar diagram]

This is the type of diagram that some of the specialized law enforcement and intelligence systems generate for authorized users. The idea is that a connection can be spotted without having to do any of the Google-style querying-scanning-copying-thinking type work.
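The connection spotting such diagrams support can be illustrated with a tiny entity graph and a breadth-first traversal: the link surfaces from the data rather than from a user’s query. The entities and links below are invented examples:

```python
# Toy entity graph: is there any chain of relationships linking two
# entities? A breadth-first search finds it without keyword querying.
# Entities and links are fictitious.

from collections import deque

links = {
    "Person A": ["Shell Co.", "Person B"],
    "Shell Co.": ["Offshore Acct"],
    "Person B": [],
    "Offshore Acct": [],
}

def connected(graph, start, target):
    """Breadth-first search over the entity graph."""
    seen, frontier = {start}, deque([start])
    while frontier:
        node = frontier.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

print(connected(links, "Person A", "Offshore Acct"))  # True
```

Real systems add weights, time stamps, and entity resolution on top of this skeleton, but the basic payoff is the same: the relationship is presented, not hunted for.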

My view is that Bottlenose and other companies rushing to emulate the features and functions of the highly specialized and reasonably tightly controlled systems in use by law enforcement and intelligence agencies may be creating some unrealistically high expectations.

The reality of many commercial services, which may or may not apply to Bottlenose, is that:

  1. The systems use information on RSS feeds, the public information available from Twitter and Facebook, and changes to Web pages. These systems do not, and due to the cost cannot, perform comprehensive collection of high-interest data. The impression is that something is being done which is probably not actually taking place.
  2. The consequence of processing a subset of information is that the outputs may be dead wrong at worst and misleading at best. Numerical processes can identify that Lady Gaga’s popularity is declining relative to Taylor Swift’s. But this is a function that has been widely available from dozens of vendors for many years. Are the users of these systems aware of the potential flaws in the outputs? In my experience, nope.
  3. The same marketing tendencies that have contributed to the implosion of the commercial enterprise search sector are now evident in the explanation of what can be done with historical and predictive math. The hype may attract a great deal of money. But it appears that generating and sustaining revenue is a challenge few companies in this sector have been able to achieve.

My suggestion is that Bottlenose may not be a “first mover.” Bottlenose is a company that is following in the more than 15-year-old footsteps of companies like Autonomy, developers of the DRE, and i2 Ltd. Both of these are Cambridge University alumni innovations. Some researchers push the origins of this type of information analysis back to the early 1970s. For me, the commercialization of the Bayesian and graph methods in the late 1990s is a useful takeoff point.

What is happening is that lower computing costs and cheaper storage have blended with mathematical procedures taught in most universities. Add in the Silicon Valley sauce, and we have a number of start-ups that want to ride the growing interest in systems that do not force Google-style interactions on users.

The problem is that it is far easier to paint a word picture than come to grips with the inherent difficulties in using the word “all.” That kills credibility in my book. For a company to deliver an NGIA solution, a number of software functions must be integrated into a functioning solution. The flame out of Fast Search & Transfer teaches a useful lesson. Will the lessons of Fast Search apply to Bottlenose? It will be interesting to watch the story unfold.

Stephen E Arnold, December 15, 2014

Artificial Intelligence: Duh? What?

December 13, 2014

I have been following the “AI will kill us”, the landscape of machine intelligence craziness, and “Artificial Intelligence Isn’t a Threat—Yet.”

The most recent big thinking on this subject appears in the Wall Street Journal, an organization in need of any type of intelligence: machine, managerial, fiscal, online, and sci-fi.

Harsh? Hmm. The Wall Street Journal has been running full page ads for Factiva. If you are not familiar with this for fee service, think 1981. The system gathers “high value” content and makes it available to humans clever enough to guess the keywords that unlock, not answers, but a list of documents presumably germane to the keyword query. There are wrappers that make Factiva more fetching. But NGIA systems (what I call next generation information access systems) use the Factiva methods perfected 40 years ago as a utility.

These are Cheetos. Nutritious, right? Will your smart kitchen let you eat these when it knows you are 30 pounds overweight, have consumed a quart of alcohol infused beverages, and ate a Snickers for lunch? Duh? What?

NGIA systems are sort of intelligent. The most interesting systems recurse through the previous indexes as the content processing system ingests data from users happily clicking, real time content streaming to the collection service, and threshold adjustments made either by savvy 18-year-olds or some numerical recipes documented by Google’s Dr. Norvig in the standard text Artificial Intelligence: A Modern Approach.

So should we be looking forward to the outputs of a predictive system pumping directly into an autonomous unmanned aerial vehicle? Will a nifty laser weapon find and do whatever the nifty gizmo does to a target? Will the money machine figure out why I need $300 for concrete repairs and decline to give it to me because the ATM “knows” the King of Concrete could not lay down in a feather bed? Forget real concrete.

The Wall Street Journal write up offers up this titbit:
