December 28, 2016
I spotted a tweet about making smart content smarter. It seems that if content is smarter, then intelligence becomes contentier. I loved my logic class in 1962.
Here’s the diagram from this tweet. Hey, if the link is wonky, just attend the conference and imbibe the intelligence directly, gentle reader.
The diagram carries the identifier Data Ninja, which echoes Palantir’s use of the word ninja for some of its Hobbits. Data Ninja’s diagram has three parts. I want to focus on the middle part:
What I found interesting is that instead of a single block labeled “content processing,” the content processing function is broken into several parts. These are:
A Data Ninja API
A third component in the top box is the statement “analyze unstructured text.” This may refer to indexing and such goodies as entity extraction.
The second box performs “text analysis.” Obviously this process is different from the “analyze unstructured text” step; otherwise, why run the same analyses again? The second box appears to cluster content into specific domains. This is important because a “terminal” in transportation may be different from a “terminal” in a cloud hosting facility. Disambiguation is important because the terminal may be part of a diversified transportation company’s computing infrastructure. I assume Data Ninja’s methods handle this parsing of “concepts” without many errors.
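The domain clustering step can be sketched as a context-overlap check. This is a minimal illustration, not Data Ninja’s actual method; the domain keyword lists and the `disambiguate` helper are invented for the example.

```python
# Hypothetical sketch: disambiguating "terminal" by domain using
# context-word overlap. The domain keyword lists are invented for
# illustration; this is not Data Ninja's method.

DOMAIN_KEYWORDS = {
    "transportation": {"bus", "freight", "cargo", "airport", "shipping"},
    "computing": {"server", "cloud", "shell", "login", "hosting"},
}

def disambiguate(ambiguous_term, sentence):
    """Pick the domain whose keywords overlap most with the sentence."""
    words = set(sentence.lower().split())
    scores = {
        domain: len(words & keywords)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(disambiguate("terminal", "The cargo arrived at the bus terminal near the airport"))
# transportation
print(disambiguate("terminal", "Open a terminal and log in to the cloud hosting server"))
# computing
```

A production system would derive the domain vocabularies statistically from a training corpus rather than hand-coding them.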
Once the selection of a domain area has been performed, the system appears to perform four specific types of operations as the Data Ninja practice their katas. These are the smart components:
- Smart sentiment; that is, is the content object weighted “positive” or “negative,” “happy” or “sad,” green light or red light, etc.
- Smart data; that is, I am not sure what this means
- Smart content; that is, maybe a misclassification because the end result should be smart content, but the diagram shows smart content as a subcomponent within the collection of procedures/assertions in the middle part of the diagram
- Smart learning; that is, the Data Ninja system is infused with artificial intelligence, smart software, or machine learning (perhaps the three buzzwords are combined in practice, not just in diagram labeling?)
- The end result is an iPhrase-type representation of data. (Note that this approach also figures in TeraText, MarkLogic, and other systems which transform unstructured data into metadata-tagged structured information.)
The diagram then shows a range of services “plugging” into the box performing the functions referenced in my description of the middle box.
If the system works as depicted, Data Ninja may have the solution to the federation challenge which many organizations face. Smarter content should deliver contentier intelligence or something along that line.
Stephen E Arnold, December 28, 2016
November 11, 2016
What’s next in search? My answer is, “No search at all. The system thinks for you.” Sounds like Utopia for the intellectual couch potato to me.
I read “The Latest in Search: New Services in the Content Discovery Marketplace.” The main point of the write up is to highlight three “discovery” services. A discovery service is one which offers “information users new avenues to the research literature.”
See, no search needed.
The three services highlighted are:
- Yewno, which is powered by an inference engine. (Does anyone remember the Inference search engine from days gone by?). The Yewno system uses “computational analysis and a concept map.” The problem is that it “supplements institutional discovery.” I don’t know what “institutional discovery” means, and my hunch is that folks living outside of rural Kentucky know what “institutional discovery” means. Sorry to be so ignorant.
- ScienceOpen, which delivers a service which “complements open Web discovery.” Okay. I assume that this means I run an old fashioned query and ScienceOpen helps me out.
- TrendMD, which “serves as a classic ‘onward journey tool’ that aims to generate relevant recommendations serendipitously.”
I am okay with the notion of having tools to make it easier to locate information germane to a specific query. I am definitely happy with tools which can illustrate connections via concept maps, link analysis, and similar outputs. I understand that lawyers want to type in a phrase like “Panama deal” and get a set of documents related to this term so the mass of data can be chopped down by sender, recipient, time, etc.
But setting up discovery as a separate operation from keyword or entity based search seems a bit forced to me. The write up spins its lawn mower blades over the TrendMD service. That’s fine, but there are a number of ways to explore scientific, technical, and medical literature. Some are or were delightful like Grateful Med; others are less well known; for example, Mednar and Quertle.
Discovery means one thing to lawyers. It means another thing to me: A search add on.
Stephen E Arnold, November 11, 2016
November 9, 2016
I read “Peter Thiel Explains Why His Company’s Defense Contracts Could Lead to Less War.” I noted that the write up appeared in the Washington Post, a favorite of Jeff Bezos I believe. The write up referenced a refrain which I have heard before:
Washington “insiders” currently leading the government have “squandered” money, time and human lives on international conflicts.
What I highlighted as an interesting passage was this one:
a spokesman for Thiel explained that the technology allows the military to have a more targeted response to threats, which could render unnecessary the wide-scale conflicts that Thiel sharply criticized.
I also put a star by this statement from the write up:
“If we can pinpoint real security threats, we can defend ourselves without resorting to the crude tactic of invading other countries,” Thiel said in a statement sent to The Post.
The write up pointed out that Palantir booked about $350 million in business between 2007 and 2016 and added:
The total value of the contracts awarded to Palantir is actually higher. Many contracts are paid in a series of installments as work is completed or funds are allocated, meaning the total value of the contract may be reflected over several years. In May, for example, Palantir was awarded a contract worth $222.1 million from the Defense Department to provide software and technical support to the U.S. Special Operations Command. The initial amount paid was $5 million with the remainder to come in installments over four years.
I was surprised at the Washington Post’s write up. No ads for Alexa and no Beltway snarkiness. That too was interesting to me. And I don’t have a dog in the fight. For those with dogs in the fight, there may be some billability worries ahead. I wonder if the traffic jam at 355 and Quince Orchard will now abate when IBM folks do their daily commute.
Stephen E Arnold, November 9, 2016
November 7, 2016
There are differences among these three use cases for entity extraction:
- Operatives reviewing content for information about watched entities prior to an operation
- Identifying people, places, and things for a marketing analysis by a PowerPoint ranger
- Indexing Web content to add concepts to keyword indexing.
Regardless of your experience with software which identifies “proper nouns,” events, meaningful digits like license plate numbers, organizations, people, and locations (accepted and colloquial), you will find the information in “Performance Comparison of 10 Linguistic APIs for Entity Recognition” thought provoking.
The write up identifies the systems which perform the best and the worst.
Here are the systems and the number of errors each generated in a test corpus. The “scores” are based on a test which contained 150 targets. The “best” system got more correct than incorrect. I find the results interesting but not definitive.
The five best performing systems on the test corpus were:
- Intellexer API (best)
- Lexalytics (better)
- AlchemyLanguage IBM (good)
- Indico (less good)
- Google Natural Language.
The five worst performing systems on the test corpus were:
- Microsoft Cognitive Services (dead last)
- Hewlett Packard Enterprise Haven (penultimate last)
- Text Razor (antepenultimate)
- Meaning Cloud
- Aylien (apparently misspelled in the source article).
There are some caveats to consider:
- Entity identification works quite well when the entities and their synonyms are included in the training set
- Multi-language entity extraction requires additional training set preparation. “Learn as you go” is often problematic when dealing with social messages, certain intercepted content, and colloquialisms
- Identification of content used as a code—for example, Harrod’s teddy bear for contraband—is difficult even for smart software operating with subject matter experts’ input. (Bad guys are often not stupid and understand the concept of using one word to refer to another thing based on context or previous interactions).
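The error counts the comparison article reports can be sketched as a set comparison between a system’s predicted entities and a gold standard list of targets. The entity lists below are invented, and the `score` helper is my assumption about how such tallies are computed, not the article’s actual methodology.

```python
# Minimal sketch of the kind of scoring the comparison implies:
# count errors (misses plus false alarms) against a gold set of targets.
# The entity lists here are invented for illustration.

def score(predicted, gold):
    """Return (errors, accuracy): errors = missed + spurious entities."""
    predicted, gold = set(predicted), set(gold)
    missed = gold - predicted        # targets the system failed to find
    spurious = predicted - gold      # things the system wrongly tagged
    errors = len(missed) + len(spurious)
    accuracy = len(gold & predicted) / len(gold)
    return errors, accuracy

gold = {"Palantir", "US Army", "Duncan Hunter", "California", "IBM"}
predicted = {"Palantir", "US Army", "IBM", "Gotham"}  # misses two, adds one

errors, accuracy = score(predicted, gold)
print(errors, round(accuracy, 2))  # 3 0.6
```

Real evaluations also have to decide whether a partial match (“Army” for “US Army”) counts, which is one reason published error rates are hard to compare across vendors.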
Net net: Automated systems are essential. The error rates may be fine for some use cases and potentially dangerous for others.
Stephen E Arnold, November 7, 2016
October 21, 2016
Have you ever visited a Web site and then lost the address or could not find a particular section on it? You know that the page exists, but no matter how often you use an advanced search feature or scour through your browser history it cannot be found. If you use Google Chrome as your main browser, then there is a solution, says GHacks in the article, “Falcon: Full-Text History Search For Chrome.”
Falcon is a Google Chrome extension that adds full-text history search to the browser. Chrome usually suggests remembered Web sites when you type into the address bar. The Falcon extension augments the default behavior to match text found on previously visited Web sites.
Falcon is a search option within a search feature:
The main advantage of Falcon over Chrome’s default way of returning results is that it may provide you with better results. If the title or URL of a page don’t contain the keyword you entered in the address bar, it won’t be displayed by Chrome as a suggestion even if the page is full of that keyword. With Falcon, that page may be returned as well in the suggestions.
The new Chrome extension acts as a full-text filter over recorded Web history and improves a user’s search experience so he or she does not have to sift through results individually.
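The mechanism behind such an extension is, in essence, an inverted index over visited-page text. The sketch below is a toy illustration (the URLs and page text are invented), not Falcon’s implementation.

```python
# Toy sketch of what a full-text history extension must do under the hood:
# index page text in an inverted index so later address-bar queries can
# match words that never appear in a page's title or URL.
# The pages and URLs are invented for illustration.

from collections import defaultdict

index = defaultdict(set)  # word -> set of URLs containing it

def record_visit(url, page_text):
    for word in page_text.lower().split():
        index[word].add(url)

def search_history(query):
    """Return URLs whose page text contained every query word."""
    word_sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*word_sets) if word_sets else set()

record_visit("https://example.com/a", "Falcon adds full text history search")
record_visit("https://example.com/b", "Chrome matches titles and URLs only")

print(search_history("history search"))  # {'https://example.com/a'}
```

The trade-off is storage: indexing every word of every visited page costs far more space than Chrome’s default title-and-URL history, which is presumably why the browser does not do this out of the box.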
October 3, 2016
Pharmaceutical companies are a major power in the United States. Their power comes from the medicine they produce and the wealth they generate. In order to maintain both wealth and power, pharmaceutical companies conduct a lot of market research. Market research is a field based on people’s opinions and their reactions, in other words, it contains information that is hard to process into black and white data. Lexalytics is a big data platform built with a sentiment analysis to turn market research into useable data.
Inside Big Data explains how “Lexalytics Radically Simplifies Market Research And Voice Of Customer Programs For The Pharmaceutical Industry” with a new package called the Pharmaceutical Industry Pack. Lexalytics uses a combination of machine learning and natural language processing to understand the meaning and sentiment in text documents. The new pack can help pharmaceutical companies interpret how their customers react to medications, what their symptoms are, and possible side effects of medication.
‘Our customers in the pharmaceutical industry have told us that they’re inundated with unstructured data from social conversations, news media, surveys and other text, and are looking for a way to make sense of it all and act on it,’ said Jeff Catlin, CEO of Lexalytics. ‘With the Pharmaceutical Industry Pack — the latest in our series of industry-specific text analytics packages — we’re excited to dramatically simplify the jobs of CEM and VOC pros, market researchers and social marketers in this field.’
Along with basic natural language processing features, the Lexalytics Pharmaceutical Industry Pack contains 7000 sentiment terms from healthcare content as well as other medical references to understand market research data. Lexalytics makes market research easy and offers invaluable insights that would otherwise go unnoticed.
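The general technique behind such an industry pack is lexicon-based sentiment scoring: a list of weighted terms summed over a document. The tiny lexicon below is invented for illustration; Lexalytics’ actual 7,000 healthcare terms and their weights are proprietary.

```python
# Hedged sketch of lexicon-based sentiment scoring, the general technique
# behind an industry-specific sentiment pack. The lexicon and weights
# here are invented; real packages use thousands of curated terms.

SENTIMENT_LEXICON = {
    "relief": 1.0, "effective": 0.8, "improved": 0.6,
    "nausea": -0.7, "dizziness": -0.6, "worse": -0.9,
}

def score_text(text):
    """Sum lexicon weights for words found in the text."""
    words = text.lower().replace(".", "").replace(",", "").split()
    return sum(SENTIMENT_LEXICON.get(w, 0.0) for w in words)

print(round(score_text("The medication was effective and my symptoms improved."), 2))
# 1.4
print(round(score_text("It made things worse, with constant nausea and dizziness."), 2))
# -2.2
```

The value of a domain pack is in the lexicon itself: a general-purpose list would miss that “positive” in “tested positive” is not good news in a medical context.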
October 1, 2016
Search vendors can save their business by embracing text analytics. Sounds like a wise statement, right? I would point out that our routine check of search and content processing companies turned up an inspiring Web page for Attensity, the Xerox PARC love child and once hot big dog in text analysis.
Attensity joins a long list of search-related companies which have had to reinvent themselves.
The company pulled in $90 million from a “mystery investor” in 2014. A pundit tweeted about the company in 2015.
In February 2016, Attensity morphed into Sematell GmbH, a company with interaction solutions.
I mention this arabesque because it underscores:
- No single add on to enterprise search will “save” an information access company
- Enterprise search has become a utility function. Witness the shift to cloud based services like SearchBlox, appliances like Maxxcat, and open source options. Who will go out on a limb for a proprietary utility when open source variants are available and improving?
- Pundits who champion a company often have skin in the game. Self-appointed experts for cognitive computing, predictive analytics, or semantic link analysis are tooting a horn without other instruments.
Attensity is a candidate to join the enterprise search Hall of Fame. In the shrine are Delphes, Entopia, et al. I anticipate more members, and I have a short list of “who is next” taped on my watch wall.
Stephen E Arnold, October 1, 2016
September 23, 2016
I read “Commentary: The US Army Should Rethink Its Approach to DCGS.” The write up is interesting because it helped me understand the relationships which exist between an elected official (Congressman Duncan Hunter, Republican from California) and a commercial enterprise (Palantir Technologies). Briefly: The Congressman believes the US Army should become more welcoming to Palantir Technologies’ Gotham system.
A representation of the Department of Defense’s integrated defense acquisition, technology, and life cycle management system.
The write up points out that the US Army is pretty good with tangible stuff: Trucks, weapons, and tanks. The US Army, however, is not as adept with the bits and the bytes. As a result, the US Army’s home brew Distributed Common Ground System is not sufficiently agile to keep pace with the real world. DCGS has consumed about $4 billion and is the product of what I call the “traditional government procurement.”
The Congressman (a former Marine) wants the US Army to embrace Palantir Gotham in order to provide a better, faster, and cheaper system for integrating different types of information and getting actionable intelligence.
US Marine Captain Duncan Hunter before becoming a Congressman. Captain Hunter served in Iraq and Afghanistan. Captain Hunter was promoted to major in 2012.
The write up informed me:
Congress, soldiers and the public were consistently misinformed and the high degree of dysfunction within the Army was allowed to continue for too long. At least now there is verification—through Army admittance—of the true dysfunction within the program.
Palantir filed a complaint which was promptly sealed. The Silicon Valley company appears to be on a path to sue the US Army because Palantir is not the preferred way to integrate information and provide actionable intelligence to US Army personnel.
The Congressman criticizes a series of procedures I learned to love when I worked in some of the large government entities. He wrote:
The Army and the rest of government should take note of the fact that the military acquisition system is incapable of conforming to the lightning pace and development targets that are necessary for software. This should be an important lesson learned and cause the Army—especially in light of repeated misleading statements and falsehoods—to rethink its entire approach on DCGS and how it incorporates software for the Army of the future.
The call to action in the write up surprised me:
The Army has quality leaders in Milley and Fanning, who finally understand the problem. Now the Army needs a software acquisition system and strategy to match.
My hunch is that some champions of Palantir Gotham were surprised too. I expected the Congressman to make more direct statements about Palantir Gotham and the problems the Gotham system might solve.
After reading the write up, I jotted down these observations:
- The DCGS system has a number of large defense contractors performing the work. One of them is IBM. IBM bought i2 Group. Before the deal with IBM, i2 sued Palantir Technologies, alleging that Palantir sought to obtain some closely held information about Analyst’s Notebook. The case was settled out of court. My hunch is that some folks at IBM have tucked this Palantir-i2 dust up away and reference it when questions about seamless integration of Gotham and Analyst’s Notebook arise.
- Palantir, like other search and content processing vendors, needs large engagements. The millions, if not billions, associated with DCGS would provide Palantir with cash and a high profile engagement. A DCGS deal would possibly facilitate sales of Gotham to other countries’ law enforcement and intelligence units.
- The complaint may evolve into actual litigation. Because the functions of Gotham are often used for classified activities, the buzz might allow high-value information to leak into the popular press. Companies like Centrifuge Systems, Ikanow, Zoomdata, and others would benefit from a more open discussion of the issues related to the functioning of DCGS and Gotham. From Palantir’s point of view, this type of information in a trade publication would not be a positive. For competitors, the information could be a gold mine filled with high value nuggets.
Net net: The Congressman makes excellent points about the flaws in the US Army procurement system. I was disappointed that a reference to the F-35 was not included. From my vantage point in Harrod’s Creek, the F-35 program is a more spectacular display of procurement goofs.
More to come. That’s not a good thing. A fully functioning system would deliver hardware and software on time and on budget. If you believe in unicorns, you will, like me, have faith in the government bureaucracy.
Stephen E Arnold, September 23, 2016
August 23, 2016
Search and retrieval technology finds a place in a “bot landscape.” The collection of icons appears in “Introducing the Bots Landscape: 170+ Companies, $4 Billion in Funding, Thousands of Bots.” The diagram of the bots landscape in the write up is, for me, impossible to read. I admit it does convey the impression of a lot of bots. The high resolution version was also difficult for me to read. You can download a copy and take a gander yourself at this link. But there is a super high resolution version available for which one must provide a name and an email. Then one goes through a verification step. Clever marketing? Well, annoying to me. The download process required three additional clicks. Here it is. A sight for young eyes.
I was able to discern a reference to search and retrieval technology in the category labeled “AI Tools: Natural Language Processing, Machine Learning, Speech & Voice Recognition.” I was able to identify the logo of Fair Isaac and the mark of Zorro, but the other logos were unreadable by my 72-year-old eyes.
The graphic includes these bot-agories too:
- Bots with traction
- Connectors and shared services
- Bot discovery
- Bot developer frameworks and tools
The bot landscape is rich and varied. MBAs and mavens are resourceful and gifted specialists in classification. The fact that the categories are, well, a little muddled is less important than finding a way to round up so many companies worth so much money.
Stephen E Arnold, August 23, 2016
August 4, 2016
A year ago I read “20+ Text Mining and Text Analysis Tools.” The sale of Recommind to OpenText and the lack of excitement about search gave me an idea. Where are the companies identified by a mid-tier consulting firm today? Let’s take a quick look.
AlchemyAPI. The company now asserts that it powers the “AI economy.” The Web site has been updated since I last looked. There is a demo and a “free API key.” The system is now a platform. Gartner found the company to be a “cool vendor” in 2014. The company offers a webinar called “Building with Watson.”
Angoss. The company allows a customer to “predict, act, perform.” The focus is now on “customer intelligence in a single analytics tool.” The firm offers “knowledge” products and an insight optimizer.
Attensity. The company has undergone some change. The www.attensity.com Web site 404s. Years ago a text analytics cheerleader professed to be a fan. I think portions of the company operate under a different name in Germany. Appears to be in quiet mode.
Basis Technology. The company provided language-related tools to outfits like Fast Search & Transfer. Someone told me that Basis dabbled in enterprise search. One high profile executive jumped to a company in Madrid.
Brainspace. The company’s Web site tells me, “We build brains.” The company offers NLP technology. Gartner “recommends” Brainspace for “advanced text analytics for financial institutions.” That’s good. The company does not list too many financial institutions as customers on its home page, however.
Buzzlogix. This company’s focus appears to be squarely on social media. The idea is that the firm helps its customers “listen, learn, and act.” When I visited the Web site, the most recent “news” appeared in November 2015.
Clarabridge. The company focuses on understanding “customer needs, wants, and feelings.” The company provides the “world’s most comprehensive customer intelligence platform.”
Clustify. The company positions its text analytics tools for eDiscovery. The company’s most recent news release is dated January 2014 and addresses the Recommind championed predictive coding approach to figuring out what was what in text documents.
Connexor. The company offers “machinese” demonstrations of its capabilities. The most recent item on the company’s Web site is the April 2015 announcement of a free NLP Web service.
DatumBox. This company is a “machine learning framework” provider. It makes machine learning “simple.” The Web site offers a free API key, which knocks the local KFC manager out as a potential licensee. The company’s most recent blog post is dated March 16, 2016. The most recent release is 0.7.0.
Eaagle. This is a company focused on the “new frontier of effective customer relationship management, research, and marketing.” Customers include HermanMiller, Chubb, and Suncor Energy. Data sheets, white papers, and documentation are available and no registration is necessary. Eaagle maintains a low profile.
ExpertSystem. The company bought Temis, a firm based on some ideas in the mind of a former IBM wizard. ExpertSystem, a publicly traded company, is pursuing the pharmaceutical industry and performing independent text analyses of Melania Trump’s and Michelle Obama’s speeches. The two ladies exhibit strong linguistic differences. The company’s stock is trading at $1.81 a share, a bit below Alphabet Google, an outfit also in the text analytics game.
FICO (Fair Isaac Corporation). The company gives “you the power to make smarter decisions.” The company has tallied a number of acquisitions since 1992. Its most recent purchase was Quadmetrics, a predictive analytics company. FICO is publicly traded and the stock is trading at $115.60 a share.
Cognitum. The company asserts that one can “improve your business with the innovation leader in semantic technology.” The company’s main product is Fluent Editor, and it offers a flagship platform called Ontorion. The firm’s spelling of “scallable” on its home page caught my attention.
IBM. The focus was not on Watson in the listing. Instead, the write up identified IBM Content Analytics as the product to watch. IBM’s LanguageWare uses a range of techniques to process content. IBM is very much in the content processing game with Watson becoming the umbrella “brand.” IBM just tallied its 16th straight quarter of declining revenue.
Intellexer offers text analytics, information security, media content search, and reputation management. The company’s most recent news release, dated May 13, 2016, announces the new version of Conceptmeister “which analyzes text from a photo, cloud documents, and URL.” Essentially this software creates a summary of the source content.
KBSPortal. This company offers natural language processing as a software as a service or NLP as SAAS. A demonstration of the system processes Wikipedia content. A demo video is available. To view it, I was asked to sign in. I declined. The company provides its prices and explains what each component does. Kudos for that approach.
Keatext. The company focuses on “customer experience management.” The company offers a two week free trial of its system. The system incorporates natural language processing. The company’s explanation of what it does requires a bit of digging.
Lexalytics. Lexalytics is in the sentiment analysis business. The company’s capabilities include categorization and entity extraction. Social media monitoring can be displayed on dashboards. The company posts its prices. When I was involved in a procurement, Lexalytics prices, based on my recollection, were significantly higher than the fees quoted on this page. At one time, Lexalytics engaged in a merger or deal with Infonics. The company acquired Semantria a couple of years ago.
Leximancer. This Australian company’s software turns up in interesting places; for example, the US Social Security Administration in Beltsville, Maryland. The firm’s “text in, insight out” technology emerged from research at the University of Queensland. The company was founded by UniQuest, a technology commercialization company operated by the University of Queensland. The system is quite useful.
Linguamatics. This company has built a following in the pharmaceutical sector. The system does a good job processing academic and research information in ways which can influence certain lines of inquiry. The company now says that it offers the “world’s leading text mining platform.” The company was founded in 2001, and it has been moving along at a steady pace. Quite useful software and capabilities.
Linguasys. Surprised to see an installation profile. The outfit is maintaining a low profile.
Luminoso. The company provides “enterprise feedback and experience analytics.” The company has teamed with another Boston-area outfit, Basis Technologies, to form a marketing partnership. The angle the company seems to be promoting is that if you are using other systems, you can enhance them with text analytics.
MeaningCloud. MeaningCloud asserts that with its system one can “extract valuable information from any text source.” The company’s Text Classification API supports the Interactive Advertising Bureau’s “standard contextual taxonomy.” The focus seems to be on sentiment analysis like Lexalytics.