September 23, 2016
I read “Commentary: The US Army Should Rethink Its Approach to DCGS.” The write up is interesting because it helped me understand the relationships which exist between an elected official (Congressman Duncan Hunter, Republican from California) and a commercial enterprise (Palantir Technologies). Briefly: The Congressman believes the US Army should become more welcoming to Palantir Technologies’ Gotham system.
A representation of the Department of Defense’s integrated defense acquisition, technology, and life cycle management system.
The write up points out that the US Army is pretty good with tangible stuff: Trucks, weapons, and tanks. The US Army, however, is not as adept with the bits and the bytes. As a result, the US Army’s home brew Distributed Common Ground System is not sufficiently agile to keep pace with the real world. DCGS has consumed about $4 billion and is the product of what I call “traditional government procurement.”
The Congressman (a former Marine) wants the US Army to embrace Palantir Gotham in order to provide a better, faster, and cheaper system for integrating different types of information and getting actionable intelligence.
US Marine Captain Duncan Hunter before becoming a Congressman. Captain Hunter served in Iraq and Afghanistan. Captain Hunter was promoted to major in 2012.
The write up informed me:
Congress, soldiers and the public were consistently misinformed and the high degree of dysfunction within the Army was allowed to continue for too long. At least now there is verification—through Army admittance—of the true dysfunction within the program.
Palantir filed a complaint which was promptly sealed. The Silicon Valley company appears to be on a path to sue the US Army because Palantir is not the preferred way to integrate information and provide actionable intelligence to US Army personnel.
The Congressman criticizes a series of procedures I learned to love when I worked in some of the large government entities. He wrote:
The Army and the rest of government should take note of the fact that the military acquisition system is incapable of conforming to the lightening pace and development targets that are necessary for software. This should be an important lesson learned and cause the Army—especially in light of repeated misleading statements and falsehoods—to rethink its entire approach on DCGS and how it incorporates software for the Army of the future.
The call to action in the write up surprised me:
The Army has quality leaders in Milley and Fanning, who finally understand the problem. Now the Army needs a software acquisition system and strategy to match.
My hunch is that some champions of Palantir Gotham were surprised too. I expected the Congressman to make more direct statements about Palantir Gotham and the problems the Gotham system might solve.
After reading the write up, I jotted down these observations:
- The DCGS system has a number of large defense contractors performing the work. One of them is IBM. IBM bought i2 Group. Before the deal with IBM, i2 sued Palantir Technologies, alleging that Palantir sought to obtain some closely held information about Analyst’s Notebook. The case was settled out of court. My hunch is that some folks at IBM have tucked this Palantir-i2 dust up away and reference it when questions about seamless integration of Gotham and Analyst’s Notebook arise.
- Palantir, like other search and content processing vendors, needs large engagements. The millions, if not billions, associated with DCGS would provide Palantir with cash and a high profile engagement. A DCGS deal would possibly facilitate sales of Gotham to other countries’ law enforcement and intelligence units.
- The complaint may evolve into actual litigation. Because the functions of Gotham are often used for classified activities, the buzz might allow high-value information to leak into the popular press. Companies like Centrifuge Systems, Ikanow, Zoomdata, and others would benefit from a more open discussion of the issues related to the functioning of DCGS and Gotham. From Palantir’s point of view, this type of information in a trade publication would not be a positive. For competitors, the information could be a gold mine filled with high value nuggets.
Net net: The Congressman makes excellent points about the flaws in the US Army procurement system. I was disappointed that a reference to the F 35 was not included. From my vantage point in Harrod’s Creek, the F 35 program is a more spectacular display of procurement goofs.
More to come. That’s not a good thing. A fully functioning system would deliver hardware and software on time and on budget. If you believe in unicorns, you will, like me, have faith in the government bureaucracy.
Stephen E Arnold, September 23, 2016
August 23, 2016
Search and retrieval technology finds a place in a “bot landscape.” The collection of icons appears in “Introducing the Bots Landscape: 170+ Companies, $4 Billion in Funding, Thousands of Bots.” The diagram of the bots landscape in the write up is, for me, impossible to read. I admit it does convey the impression of a lot of bots. The high resolution version was also difficult for me to read. You can download a copy and take a gander yourself at this link. But there is a super high resolution version available for which one must provide a name and an email. Then one goes through a verification step. Clever marketing? Well, annoying to me. The download process required three additional clicks. Here it is. A sight for young eyes.
I was able to discern a reference to search and retrieval technology in the category labeled “AI Tools: Natural Language Processing, Machine Learning, Speech & Voice Recognition.” I was able to identify the logo of Fair Isaac and the mark of Zorro, but the other logos were unreadable by my 72 year old eyes.
The graphic includes these bot-agories too:
- Bots with traction
- Connectors and shared services
- Bot discovery
- Bot developer frameworks and tools
The bot landscape is rich and varied. MBAs and mavens are resourceful and gifted specialists in classification. The fact that the categories are, well, a little muddled is less important than finding a way to round up so many companies worth so much money.
Stephen E Arnold, August 23, 2016
August 4, 2016
A year ago I read “20+ Text Mining and Text Analysis Tools.” The sale of Recommind to OpenText and the lack of excitement about search gave me an idea. Where are the companies identified by a mid tier consulting firm today? Let’s take a quick look.
AlchemyAPI. The company now asserts that it powers the “AI economy.” The Web site has been updated since I last looked. There is a demo and a “free API key.” The system is now a platform. Gartner found the company to be a “cool vendor” in 2014. The company offers a webinar called “Building with Watson.”
Angoss. The company allows a customer to “predict, act, perform.” The focus is now on “customer intelligence in a single analytics tool.” The firm offers “knowledge” products and an insight optimizer.
Attensity. The company has undergone some change. The www.attensity.com Web site 404s. Years ago a text analytics cheerleader professed to be a fan. I think portions of the company operate under a different name in Germany. Appears to be in quiet mode.
Basis Technology. The company provided language related tools to outfits like Fast Search & Transfer. Someone told me that Basis dabbled in enterprise search. One high profile executive jumped to a company in Madrid.
Brainspace. The company’s Web site tells me, “We build brains.” The company offers NLP technology. Gartner “recommends” Brainspace for “advanced text analytics for financial institutions.” That’s good. The company does not list too many financial institutions as customers on its home page, however.
Buzzlogix. This company’s focus appears to be squarely on social media. The idea is that the firm helps its customers “listen, learn, and act.” When I visited the Web site, the most recent “news” appeared in November 2015.
Clarabridge. The company focuses on understanding “customer needs, wants, and feelings.” The company provides the “world’s most comprehensive customer intelligence platform.”
Clustify. The company positions its text analytics tools for eDiscovery. The company’s most recent news release is dated January 2014 and addresses the Recommind championed predictive coding approach to figuring out what was what in text documents.
Connexor. The company offers “machinese” demonstrations of its capabilities. The most recent item on the company’s Web site is the April 2015 announcement of a free NLP Web service.
DatumBox. This company is a “machine learning framework” provider. It makes machine learning “simple.” The Web site offers a free API key, which knocks the local KFC manager out as a potential licensee. The company’s most recent blog post is dated March 16, 2016. The most recent release is 0.7.0.
Eaagle. This is a company focused on the “new frontier of effective customer relationship management, research, and marketing.” Customers include HermanMiller, Chubb, and Suncor Energy. Data sheets, white papers, and documentation are available and no registration is necessary. Eaagle maintains a low profile.
ExpertSystem. The company bought Temis, a firm based on some ideas in the mind of a former IBM wizard. ExpertSystem, a publicly traded company, is pursuing the pharmaceutical industry and performing independent text analyses of Melania Trump’s and Michelle Obama’s speeches. The two ladies exhibit strong linguistic differences. The company’s stock is trading at $1.81 a share, a bit below Alphabet Google, an outfit also in the text analytics game.
FICO (Fair Isaac Corporation). The company gives “you the power to make smarter decisions.” The company has tallied a number of acquisitions since 1992. Its most recent purchase was Quadmetrics, a predictive analytics company. FICO is publicly traded and the stock is trading at $115.60 a share.
Cognitum. The company asserts that one can “improve your business with the innovation leader in semantic technology.” The company’s main product is Fluent Editor, and it offers a flagship platform called Ontorion. The firm’s spelling of “scallable” on its home page caught my attention.
IBM. The focus was not on Watson in the listing. Instead, the write up identified IBM Content Analytics as the product to watch. IBM’s LanguageWare uses a range of techniques to process content. IBM is very much in the content processing game with Watson becoming the umbrella “brand.” IBM just tallied its 16th straight quarter of declining revenue.
Intellexer. The company offers text analytics, information security, media content search, and reputation management. The company’s most recent news release, dated May 13, 2016, announces the new version of Conceptmeister “which analyzes text from a photo, cloud documents, and URL.” Essentially this software creates a summary of the source content.
KBSPortal. This company offers natural language processing as a software as a service or NLP as SAAS. A demonstration of the system processes Wikipedia content. A demo video is available. To view it, I was asked to sign in. I declined. The company provides its prices and explains what each component does. Kudos for that approach.
Keatext. The company focuses on “customer experience management.” The company offers a two week free trial of its system. The system incorporates natural language processing. The company’s explanation of what it does requires a bit of digging.
Lexalytics. Lexalytics is in the sentiment analysis business. The company’s capabilities include categorization and entity extraction. Social media monitoring can be displayed on dashboards. The company posts its prices. When I was involved in a procurement, Lexalytics prices, based on my recollection, were significantly higher than the fees quoted on this page. At one time, Lexalytics engaged in a merger or deal with Infonics. The company acquired Semantria a couple of years ago.
Leximancer. This Australian company’s software turns up in interesting places; for example, the US Social Security Administration in Beltsville, Maryland. The firm’s “text in, insight out” technology emerged from research at the University of Queensland. The company was founded by UniQuest, a technology commercialization company operated by the University of Queensland. The system is quite useful.
Linguamatics. This company has built a following in the pharmaceutical sector. The system does a good job processing academic and research information in ways which can influence certain lines of inquiry. The company now says that it offers the “world’s leading text mining platform.” The company was founded in 2001, and it has been moving along at a steady pace. Quite useful software and capabilities.
Linguasys. Surprised to see an installation profile. The outfit is maintaining a low profile.
Luminoso. The company provides “enterprise feedback and experience analytics.” The company has teamed with another Boston-area outfit, Basis Technology, to form a marketing partnership. The angle the company seems to be promoting is that if you are using other systems, you can enhance them with text analytics.
MeaningCloud. MeaningCloud asserts that with its system one can “extract valuable information from any text source.” The company’s Text Classification API supports the Interactive Advertising Bureau’s “standard contextual taxonomy.” The focus, as at Lexalytics, seems to be on sentiment analysis.
July 24, 2016
I love the buzzword game. Move enterprise search to sentiment analysis. Then another player nudges forward “mood mapping.” I expect to hear “checkmate,” but I only notice the odor of baloney.
Navigate to “Move Over Sentiment Analysis, Mood Mapping Is Here.” I learned or was exposed to this passage:
“Sentiment analysis has long been the standard approach for understanding how consumers feel about a brand. While this approach sticks to aggregating positive, negative and neutral sentiments; in a world of colors, that’s like having only black, white and grey. With Mood Mapping, we have introduced a next-generation algorithm that helps marketers understand how consumers feel about a particular brand and how they are likely to act on their feelings, something that brand marketers care for very deeply,” Amarpreet Kalkat, co-founder-CEO, Frrole, told [the author, maybe?].
And the company is an award winner. I learned:
Founded in 2014, Frrole, a Microsoft Ventures Accelerator-incubated company, had raised an angel round of $2,45,000 in 2014. The profitable company expects to hit $1 million in revenue run-rate by the end of this year. The start-up was recognized as ‘Marketing Technologist of the Year’ at the Big Data and Analytics Summit 2016.
I think at the pre school near our home, every child gets an award whether a victor in a competition or not. Perhaps Microsoft will integrate mood mapping with LinkedIn in Office 365. That will be exciting when Word’s numbering function fails to count.
Stephen E Arnold, July 24, 2016
July 8, 2016
Another day, another merger. PR Newswire released a story, “VirtualWorks and Language Tools Announce Merger,” which covers VirtualWorks’ purchase of Language Tools. With Language Tools, VirtualWorks will inherit computational linguistics and natural language processing technologies. VirtualWorks is an enterprise search firm. Erik Baklid, Chief Executive Officer of VirtualWorks, is quoted in the article:
“We are incredibly excited about what this combined merger means to the future of our business. The potential to analyze and make sense of the vast unstructured data that exists for enterprises, both internally and externally, cannot be understated. Our underlying technology offers a sophisticated solution to extract meaning from text in a systematic way without the shortcomings of machine learning. We are well positioned to bring to market applications that provide insight, never before possible, into the vast majority of data that is out there.”
This is another case of a company positioning itself as a leader in enterprise search. Is it anything special? Well, the news release mentions several core technologies that the merger will bolster: text analytics, data management, and discovery techniques. We will have to wait and see what the future holds for the company in the enterprise search and business intelligence sector it seeks to lead.
June 27, 2016
Trainspotting is a collection of short stories or a novel presented as a series of short stories by Irvine Welsh. The fun lovers in the fiction embrace avocations which seem to be addictive. The thrill is the thing. Now I think I have identified Palantir spotting.
Navigate to “Palantir Seeks to Muzzle Former Employees.” I am not too interested in the allegations in the write up. What is interesting is that the article is one of what appears to be a series of stories about Palantir Technologies enriched with non public documents.
The Thingverse muzzle might be just the ticket for employees who want to chatter about proprietary information. I assume the muzzle is sanitary and durable, comes in various sizes, and adapts to the jaw movement of the lucky dog wearing the gizmo.
Why use the phrase “Palantir spotting”? It seems to me that poking into an outfit which provides services and software to government entities is an unusual hobby. I, for example, lecture about the Dark Web, how to recognize recycled analytics algorithms and their assorted “foibles,” and how to find information in the new, super helpful Google Web search system.
Poking the innards of an outfit with interesting software and some wizards who might be a bit testy is okay if done with some Onion type or Colbert like humor. Doing what one of my old employers did in the 1970s to help ensure that company policies remain inside the company is old hat to me.
In the write up, I noted:
The Silicon Valley data-analysis company, which recently said it would buy up to $225 million of its own common stock from current and former staff, has attached some serious strings to the offer. It is requiring former employees who want to sell their shares to renew their non-disclosure agreements, agree not to poach Palantir employees for 12 months, and promise not to sue the company or its executives, a confidential contract reviewed by BuzzFeed News shows. The terms also dictate how former staff can talk to the press. If they get any inquiries about Palantir from reporters, the contract says, they must immediately notify Palantir and then email the company a copy of the inquiry within three business days. These provisions, which haven’t previously been reported, show one way Palantir stands to benefit from the stock purchase offer, known as a “liquidity event.”
Okay, manage information flow. In my experience, money often comes with some caveats. At one time I had lots and lots of @Home goodies which disappeared in a Sillycon Valley minute. The fine print for the deal covered the disappearance. Sigh. That’s life with techno-financial wizards. It seems life has not changed too much since the @Home affair decades ago.
I expect that there will be more Palantir centric stories. I will try to note these when they hit my steam powered radar detector in Harrod’s Creek. My thought is that like the protagonists in Trainspotting, Palantir spotting might have some after effects.
I keep asking myself this question:
How do company confidential documents escape the gravitational field of a comparatively secretive company?
Either the Palantir spotters are great data gatherers, or those with access to the documents are making the material available. No answers yet. Just that question about “how.”
Stephen E Arnold, June 27, 2016
June 1, 2016
A few days ago, I stumbled upon a copy of a letter from the GAO concerning Palantir Technologies dated May 18, 2016. The letter became available to me a few days after the 18th, and the US holiday probably limited circulation of the document. The letter is from the US Government Accountability Office and signed by Susan A. Poling, general counsel. There are eight recipients, some from Palantir, some from the US Army, and two in the GAO.
Has the US Army put Palantir in an untenable spot? Is there a deus ex machina about to resolve the apparent checkmate?
The letter tells Palantir Technologies that its protest of the DCGS Increment 2 award to another contractor is denied. I don’t want to revisit the history or the details as I understand them of the DCGS project. (DCGS, pronounced “dsigs”, is a US government information fusion project associated with the US Army but seemingly applicable to other Department of Defense entities like the Air Force and the Navy.)
The passage in the letter I found interesting was:
While the market research revealed that commercial items were available to meet some of the DCGS-A2 requirements, the agency concluded that there was no commercial solution that could meet all the requirements of DCGS-A2. As the agency explained in its report, the DCGS-A2 contractor will need to do a great deal of development and integration work, which will include importing capabilities from DCGS-A1 and designing mature interfaces for them. Because the agency concluded that significant portions of the anticipated DCSG-A2 scope of work were not available as a commercial product, the agency determined that the DCGS-A2 development effort could not be procured as a commercial product under FAR part 12 procedures. The protester has failed to show that the agency’s determination in this regard was unreasonable.
The “importing” point is a big deal. I find it difficult to imagine that IBM i2 engineers will be eager to permit the Palantir Gotham system to work with i2 like one happy family. The importation and manipulation of i2 data in a third party system is more difficult than opening an RTF file in Word in my experience. My recollection is that the unfortunate i2-Palantir legal matter was, in part, related to figuring out how to deal with ANB files. (ANB is i2 shorthand for the Analyst’s Notebook file format, a somewhat complex and closely-held construct.)
Net net: Palantir Technologies will not be the dog wagging the tail of IBM i2 and a number of other major US government integrators. The good news is that there will be quite a bit of work available for firms able to support the prime contractors and the vendors eligible and selected to provide for-fee products and services.
Was this a shoot-from-the-hip decision to deny Palantir’s objection to the award? No. I believe the FAR procurement guidelines and the content of the statement of work provided the framework for the decision. However, context is important as are past experiences and perceptions of vendors in the running for substantive US government programs.
May 19, 2016
I read “The Real Lesson for Data Science That is Demonstrated by Palantir’s Struggles · Simply Statistics.” I love write ups that plunk the word statistics near simple.
Here’s the passage I highlighted in money green:
… What is the value of data analysis?, and secondarily, how do you communicate that value?
I want to step away from the Palantir Technologies’ example and consider a broader spectrum of outfits tossing around the jargon “big data,” “analytics,” and synonyms for smart software. One doesn’t communicate value. One finds a person who needs a solution and crafts the message to close the deal.
When a company and its perceived technology catch the attention of allegedly informed buyers, a bandwagon effect kicks in. Talk inside an organization leads to mentions in internal meetings. The vendor whose products and services are the subject of these comments begins to hint at bigger and better things at conferences. Then a real journalist may catch a scent of “something happening” and write an article. Technical talks at niche conferences generate wonky articles usually without dates or footnotes which make sense to someone without access to commercial databases. If a social media breeze whips up the smoldering interest, then a fire breaks out.
A start up should be so clever, lucky, or tactically gifted to pull off this type of wildfire. But when it happens, big money chases the outfit. Once money flows, the company and its products and services become real.
The problem with companies processing a range of data is that there are some friction inducing processes that are tough to coat with Teflon. These include:
- Taking different types of data, normalizing it, indexing it in a meaningful manner, and creating metadata which is accurate and timely
- Converting numerical recipes, many with built in threshold settings and chains of calculations, into marching band order able to produce recognizable outputs.
- Figuring out how to provide an infrastructure that can sort of keep pace with the flows of new data and the updates/corrections to the already processed data.
- Generating outputs that people in a hurry or in a hot zone can use to positive effect; for example, in a war zone, not get killed when the visualization is not spot on.
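The first friction point in the list above can be illustrated with a toy. What follows is a minimal sketch, not how any vendor named here does it: the two source types (`email`, `report`), their field names, and the sample records are all hypothetical. It maps two differently shaped records onto one schema, converts timestamps to UTC, and builds a simple inverted index.

```python
from collections import defaultdict
from datetime import datetime, timezone

def normalize(record, source):
    """Map a source-specific record onto one common schema (hypothetical fields)."""
    if source == "email":
        text, when = record["body"], record["sent"]
    elif source == "report":
        text, when = record["summary"], record["filed"]
    else:
        raise ValueError(f"unknown source: {source}")
    return {
        "source": source,
        "text": text.strip().lower(),
        # Inputs are assumed to carry an explicit UTC offset.
        "timestamp": datetime.fromisoformat(when).astimezone(timezone.utc),
    }

def build_index(docs):
    """Build an inverted index: token -> set of document positions."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for token in doc["text"].split():
            index[token].add(doc_id)
    return index

docs = [
    normalize({"body": "Convoy delayed at checkpoint ",
               "sent": "2016-05-19T10:30:00+02:00"}, "email"),
    normalize({"summary": "Checkpoint closed",
               "filed": "2016-05-19T09:00:00+00:00"}, "report"),
]
index = build_index(docs)
print(sorted(index["checkpoint"]))
```

Even this toy needs a hand-written branch for every source and an assumption about time zones. Multiply that by hundreds of formats and a stream of corrections, and the friction becomes obvious.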
The write up focuses on a single company and its alleged problems. That’s okay, but it understates the problem. Most content processing companies run out of revenue steam. The reason is that the licensees or customers want the systems to work better, faster, and more cheaply than predecessor or incumbent systems.
The vast majority of search and content processing systems are flawed, expensive to set up and maintain, and really difficult to use in a way that produces high reliability outputs over time. I would suggest that the problem bedevils a number of companies.
Some of those struggling with these issues are big names. Others are much smaller firms. What’s interesting to me is that the trajectory content processing companies follow is a well worn path. One can read about Autonomy, Convera, Endeca, Fast Search & Transfer, Verity, and dozens of other outfits and discern what’s going to happen. Here’s a summary for those who don’t want to work through the case studies on my Xenky intel site:
Stage 1: Early struggles and wild and crazy efforts to get big name clients
Stage 2: Making promises that are difficult to implement but which are essential to capture customers looking actively for a silver bullet
Stage 3: Frantic building and deployment accompanied with heroic exertions to keep the customers happy
Stage 4: Closing as many deals as possible either for additional financing or for licensing/consulting deals
Stage 5: The early customers start grousing and the momentum slows
Stage 6: Sell off the company or shut down like Delphes, Entopia, Siderean Software and dozens of others.
The problem is not technology, math, or Big Data. The force which undermines these types of outfits is the difficulty of making sense out of words and numbers. In my experience, the task is a very difficult one for humans and for software. Humans want to golf, cruise Facebook, emulate Amazon Echo, or like water find the path of least resistance.
Making sense out of information when someone is lobbing mortars at one is a problem which technology can only solve in a haphazard manner. Hope springs eternal and managers are known to buy or license a solution in the hopes that my view of the content processing world is dead wrong.
So far I am on the beam. Content processing requires time, humans, and a range of flawed tools which must be used by a person with old fashioned human thought processes and procedures.
Value is in the eye of the beholder, not in zeros and ones.
Stephen E Arnold, May 19, 2016
April 19, 2016
I read an article in Jeff Bezos’ newspaper. The title was “We Analyzed the Names of Almost Every Chinese Restaurant in America. This Is What We Learned.” The “almost” is a nifty way of slip sliding around the sampling method, which used restaurants listed in Yelp. Close enough for “real” journalism.
Using the notion of a frequency count, the write up revealed:
- The word appearing most frequently in the names of the sample was “restaurant.”
- The words “China” and “Chinese” appear in about 15,000 of the sample’s restaurant names
- “Express” is a popular word, not far ahead of “panda”.
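For readers who want to see how little machinery a frequency count requires, here is a minimal sketch. The five restaurant names are made up for illustration; the article drew its sample from Yelp listings, and its actual code is not shown in the write up.

```python
from collections import Counter
import re

# Hypothetical sample names, not data from the article.
names = [
    "Panda Express",
    "China Garden Restaurant",
    "Golden Dragon Chinese Restaurant",
    "Wok Express",
    "Great Wall Restaurant",
]

def word_frequencies(names):
    """Count how often each lowercased word appears across all names."""
    counts = Counter()
    for name in names:
        counts.update(re.findall(r"[a-z']+", name.lower()))
    return counts

freq = word_frequencies(names)
for word, count in freq.most_common(3):
    print(word, count)
```

In this toy sample, as in the article, “restaurant” comes out on top.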
The word list and their frequencies were used to generate a word cloud:
To answer the question where Chinese food is most popular in the US, the intrepid data wranglers at Jeff Bezos’ newspaper output a map:
Amazing. I wonder if law enforcement and intelligence entities know that one can map data to discover things like the fact that the word “restaurant” is the most used word in a restaurant’s name.
Stephen E Arnold, April 19, 2016
April 8, 2016
The chatter about smart is loud. I cannot hear the mixes on my Creamfields 2014 CD. Mozart, you are a goner.
If you want to cook up some smart algorithms to pick music or drive your autonomous vehicle without crashing into a passenger carrying bus, navigate to “Top 10 Machine Learning Algorithms.”
The write up points out that just like pop music, there is a top 10 list. More important in my opinion is the concomitant observation that smart software may be based on a limited number of procedures. Hey, this stuff is taught in many universities. Go with what you know maybe?
What are the top 10? The write up asserts:
- Linear regression
- Logistic regression
- Linear discriminant analysis
- Classification and regression trees
- Naive Bayes
- K nearest neighbors
- Learning vector quantization
- Support vector machines
- Bagged decision trees and random forest
- Boosting and AdaBoost.
The article tosses in a bonus too: Gradient descent.
What is interesting is that there is considerable overlap with the list I developed for my lecture on manipulating content processing using shaped or weaponized text strings. How’s that, Ms. Null?
The point is that when systems use the same basic methods, are those systems sufficiently different? If so, in what ways? How are systems using standard procedures configured? What if those configurations or “settings” are incorrect?
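The question about “settings” can be made concrete with one item from the list, k nearest neighbors. The sketch below is a toy under stated assumptions: the one-dimensional scores, the labels, and the function name `knn_predict` are invented for illustration and do not come from the article or from any particular product.

```python
from collections import Counter

def knn_predict(train, point, k):
    """Classify `point` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: abs(item[0] - point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (score, label) pairs.
train = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (4.2, "high"), (6.0, "low")]

# Same method, same data, same query; only the setting k changes.
print(knn_predict(train, 4.0, k=1))
print(knn_predict(train, 4.0, k=3))
```

With k set to 1 the query is labeled “high”; with k set to 3 it is labeled “low.” Two systems built on the same textbook method can disagree because someone picked a different number.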
Stephen E Arnold, April 8, 2016