Exclusive Interview: Tom Reamy, KAPS Group
February 27, 2013
Another Palantir Push: But Little Hard Financial Data. Why Not?
February 23, 2013
I was reading about the TED Conference’s yo-yo presentation. My eye drifted across an expanse of cellulose and landed on “The Humane Way to Crack Terrorists.” (This link will go dead so be aware that you may have to pay to read the item online.) The subtitle was one of those News Corp. Google things: “Big data may make enhanced interrogation obsolete.” The source? Some minor blog from America’s hinterland, Silicon Valley? Nope. The Wall Street Journal, February 23, 2013, page C 12.
What’s the subject, really? The answer, in my opinion, is Palantir. If you monitor the flagship, traditional media, Palantir has a solid track record of getting written about in print magazines. I suppose that the folks who have pumped about $150 million into the “big data” company read those magazines and the Wall Street Journal type publications each day. I know I do, and I am an addled goose in rural Kentucky, the high tech nerve center of the new industrial revolution. After February 28, 2013, I am not sure about the economy, however.
Here’s the passage I noted:
There’s a tellingly brief passage in “The Finish: The Killing of Osama bin Laden” by Mark Bowden. “The hunt for bin Laden and others eventually drew on an unfathomably rich database,” he writes. “Sifting through it required software capable of ranging deep and fast and with keen discernment—a problem the government itself proved less effective at solving than were teams of young software engineers in Silicon Valley. A startup called Palantir, for instance, came up with a program that elegantly accomplished what TIA [Terrorism Information Awareness program, set up in 2002] had set out to do.” When I met the chief executive and co-founder of Palantir, Alex Karp, recently, he was straightforward: “It is my personal belief that flawless data integration at any kind of scale, with a rigorous access control model, allows analysts to perform operations that are only intrusive on the data. They are not intrusive on human beings.” Obviously, Palantir doesn’t comment on classified work. But its technological phalanx—processing countless leads, from flight manifests to tapped phone calls, into one resource for people to interpret—is known to have been key in locating bin Laden. The company, founded in 2004, has large contracts across the intelligence community and is enterprise-wide at the FBI. Its first client was the CIA.
Nifty stuff. Palantir has high profile clients like intelligence and law enforcement outfits. But where is a hedge fund or a consumer products company? Allegedly the fancy math technology can work wonders. The implication is that outfits like Digital Reasoning, Recorded Future, and even Tibco are not in Palantir’s league. Oh, really? What about outfits like IBM and Oracle and SAS? Nah. Palantir seems to be where the good stuff happens in the context of this Wall Street Journal article.
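Mr. Karp’s claim boils down to a simple architecture: pull records from many feeds into one place and let an access control model decide what each analyst can see. Here is a minimal sketch of that idea. It is my own illustration, not Palantir’s code; every source name, field, and clearance label is invented.

```python
# A minimal sketch, not Palantir's code: integrate records from several
# sources into one list and enforce a simple access control check at
# query time. Source names, fields, and clearance labels are invented.

from dataclasses import dataclass, field


@dataclass
class Record:
    source: str                                  # e.g. "flight_manifest", "phone_log"
    content: str                                 # free-text payload to search
    markings: set = field(default_factory=set)   # labels required to view the record


def integrate(*feeds):
    """Merge records from multiple feeds into a single searchable list."""
    merged = []
    for feed in feeds:
        merged.extend(feed)
    return merged


def query(records, term, clearances):
    """Return records matching `term` that the analyst is cleared to see."""
    hits = []
    for rec in records:
        if term.lower() in rec.content.lower() and rec.markings <= clearances:
            hits.append(rec)
    return hits


if __name__ == "__main__":
    manifests = [Record("flight_manifest", "Passenger J. Doe, AMS to IAD", {"secret"})]
    phone_logs = [Record("phone_log", "Call intercept, J. Doe, 14:02 UTC", {"secret", "sigint"})]

    unified = integrate(manifests, phone_logs)

    # An analyst holding only a "secret" clearance sees the manifest, not the intercept.
    for hit in query(unified, "doe", {"secret"}):
        print(hit.source, "->", hit.content)
```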
In my view, the write up triggered several notes on my ubiquitous 4×6 paper note cards, just like the ones I used in high school debate competitions:
First, what about that legal dust up with i2 Group? Here’s a link to refresh one’s memory. I recall that there was some disagreement, a few real media stories, and then a settlement regarding sector leader i2. Note: I did some work years ago for this outfit, which is now owned by IBM. Oh, and after the settlement, silence. Just what was that legal dispute about anyway? The Wall Street Journal story does not touch on that obviously trivial legal matter. Why not? The space in the newspaper was probably needed to cover the yo-yo guy.
Second, can software emulate the motion picture approach to reality? In my experience, numerical recipes can be useful, but they can also produce results which are subject to contention. A recent example is the gentleman’s disagreement about an electric vehicle. Data, analyses, and interpretations—muddled. Not like the motion pictures’ tidy and quite final end point. “The end” solves a lot of fictional problems. Life is less clear, a lot less clear in my experience.
Third, how is Palantir doing as a business? After all, the story ran in the Wall Street Journal, which is about business. I appreciate the references to a motion picture, but I am curious about how Palantir is doing on its march to generate a billion or more in revenues. At some point, the investors are going to look at the money pumped into Palantir, the time spent developing the magical technology which warrants metaphorical juxtaposition to Hollywood outputs, and the profitability of the company’s sales. Why doesn’t the Wall Street Journal do the business thing? Revenue, commercial customers, and case studies which do not flaunt words which Bing and Google love to consume in their indexing systems?
It is Saturday, and I suppose there are lots of 20 somethings working at 0900 Eastern as I write this. They will fill the gap. I will have to wait. I wonder if the predictive algorithms from Palantir can tell me how long it will be before hard facts become available.
One final question: If this Palantir type of system worked, why aren’t the firms in this Palantir-type software sector dominating financial services, marketing, and consumer products? I wonder if the reason is that fancy math generates high expectations and then creates situations in which reality does not work just like a cinema thriller.
Stephen E Arnold, February 23, 2013
IBM and Price Cuts: Is Watson a Factor?
February 17, 2013
I read “IBM Cuts Price of Watson Based Power Servers.” I have no clue if the story is correct, half correct, or incorrect. What’s important is that CIOL.com thought the notion of a Watson related price cut newsworthy.
The Power7 based servers were hot stuff several years ago. CPU performance is no longer the gating factor it was in the days of STAIRS III. Input/output, memory subsystems, and various types of latency make a system fast or not. Heck, careless programming can make Google’s zippy boxes howl with pain when their innards suffer a computational cramp.
The write up asserts:
IBM will roll out eight new Power Systems for entry level starting at $5,947. The new systems include Power Express 710, 720, 730 and 740 family of products…. IBM will also introduce two new PowerLinux Systems – 7R1 and 7R2 – optimized for IBM InfoSphere BigInsights and InfoSphere Streams big data analytics software. The company will also introduce two new Power Systems – 750 and 760 – for midsized and large enterprises.
The hot item in the story in my opinion is this reference:
The new systems are based on IBM’s Watson system and are powered by its Power7+ microprocessor technology. These will enable users to build and deploy infrastructure for private and hybrid clouds, as per a release.
The write up includes the now obligatory baloney about big data, cloud, and caching tactics for performance.
If the story is incorrect, no big deal. Any publicity is good, even for a dog movie like “Heaven’s Gate” and its expensive roller skates. If the story is half correct, why is Watson making an appearance in juxtaposition to “entry level”? Is the vaunted Jeopardy winning technology not generating sufficient revenue to pay back the development time and the sunk marketing costs? If the story is correct, I am interested in the fact that high end information technology has to be bundled at lower prices.
Years ago, I was told by an informed person that IBM knew what it was doing when it came to search and information retrieval. Maybe the company will come to dominate the enterprise market for big data, analytics, and smarter search. On the other hand, hasn’t IBM traveled this road before? And yet the journey continues.
Stay tuned to Jeopardy or monitor the cancer related news stream. Watson is with us along with a Power7 chip which may be experiencing some symptoms of rheumatism.
Stephen E Arnold, February 17, 2013
Sinequa France: Update 2013
February 14, 2013
My research team was winnowing our archive of information about European search vendors. Since Martin White’s article for eContent in 2011, a number of changes have swept through the search and content processing sector. Some changes were significant; for example, HP’s stunning acquisition of Autonomy. Others were more modest; for example, the steady progress of such companies as Sinequa and Spotter, among others.
The European technical grip on search is getting stronger. Google is the dominant player in Web search. But in enterprise content processing, some European firms are moving more rapidly than their North American or Pacific Rim counterparts.
The Sinequa tag cloud. See http://www.sinequa.com/en/page/solutions/category-1.aspx
One interesting example is Sinequa, based in Paris. The company, like other French technology firms, has a staff of capable engineers and managers. However, unlike some other companies, Sinequa has continued to establish a track record as a company innovating in technology and capturing some important accounts; for example, Siemens, the German industrial powerhouse.
Sinequa’s approach is to emphasize that enterprise search has moved to unified information access. A number of companies make similar claims. Sinequa has established that its technology can deliver the type of one-stop access to structured and unstructured content that almost every vendor claims to deliver. You can get a useful overview of the architecture of the Sinequa platform at http://www.sinequa.com/en/page/product/product.aspx.
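For readers who want to picture what “unified information access” means in practice, here is a generic sketch: one query fanned out to a structured store and an unstructured text collection, with the hits merged into a single result list. This is not Sinequa’s architecture; the table, documents, and matching logic are invented for illustration.

```python
# A generic sketch of the "unified information access" pattern: fan one
# query out to a structured store (SQLite) and an unstructured document
# list, then merge the hits. Not Sinequa's architecture; data is invented.

import sqlite3


def build_structured_store():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (name TEXT, country TEXT)")
    db.execute("INSERT INTO customers VALUES ('Siemens', 'Germany')")
    return db


def search_structured(db, term):
    cur = db.execute(
        "SELECT name, country FROM customers WHERE name LIKE ?", (f"%{term}%",)
    )
    return [{"type": "record", "text": f"{name} ({country})"} for name, country in cur.fetchall()]


def search_unstructured(documents, term):
    return [{"type": "document", "text": doc} for doc in documents if term.lower() in doc.lower()]


def unified_search(db, documents, term):
    """One query, two repositories, one merged result list."""
    return search_structured(db, term) + search_unstructured(documents, term)


if __name__ == "__main__":
    docs = ["Siemens selects a new enterprise search platform.",
            "Quarterly report: turbine division."]
    for hit in unified_search(build_structured_store(), docs, "siemens"):
        print(hit["type"], "|", hit["text"])
```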
A relatively recent addition to the Sinequa.com Web site is a set of case analysis videos. I find case examples extremely useful. The presentation of this type of information in rich media format makes it easier for me to get a sense of the value of the solution a vendor delivers. I found the Mercer video particularly interesting. You can find these testimonials at http://www.sinequa.com/en/page/clients/clients-video.aspx.
The trajectory of European search, content processing, and analytics vendors is difficult to plot in today’s uncertain economic climate. Sinequa warrants a close look for organizations seeking an integrated approach to their content assets. For more information about Sinequa’s current activities, tap into the firm’s blog at http://blog.sinequa.com/.
Stephen E Arnold, February 14, 2013
Sponsored by EMRxNow, the information service which tracks automated indexing of electronic medical records
Change Comes to Attensity
February 14, 2013
Just as the demand for analytics is ascending, Attensity makes a management change. We learn the company recently named J. Kirsten Bay their head honcho in “Attensity Names New President/CEO,” posted at Destination CRM. The press release stresses the new CEO’s considerable credentials:
“Bay brings to Attensity nearly 20 years of strategic process and organizational policy experience derived from the information management, finance, and consumer product industries. She is an expert in advising both the public and private sector on the development of econometric policy models. Most recently, as vice president of commercial business with iSIGHT Partners, Bay provided strategic counsel to Fortune 500 companies on managing intelligence requirements and implementing customer and development programs to integrate intelligence into decision programs.”
The company’s flagship product, Attensity Pipeline, collects and semantically annotates data from social media and other online sources. From there, the data passes to Attensity Analyze for text analytics and customer engagement suggestions.
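As a rough picture of that collect, annotate, analyze flow, consider the toy two-stage pipeline below. It is not Attensity’s API; the annotation rules, lexicons, and suggestion logic are invented.

```python
# A toy two-stage flow in the spirit of the collect-annotate-analyze
# description above. Not Attensity's API; lexicons and logic are invented.

NEGATIVE_WORDS = {"broken", "slow", "refund", "awful"}
POSITIVE_WORDS = {"love", "great", "fast", "thanks"}


def pipeline(posts):
    """Stage 1: collect raw posts and attach simple semantic annotations."""
    annotated = []
    for post in posts:
        words = set(post.lower().split())
        sentiment = ("negative" if words & NEGATIVE_WORDS
                     else "positive" if words & POSITIVE_WORDS
                     else "neutral")
        annotated.append({"text": post, "sentiment": sentiment})
    return annotated


def analyze(annotated):
    """Stage 2: turn annotations into a customer engagement suggestion."""
    negatives = [a for a in annotated if a["sentiment"] == "negative"]
    if negatives:
        return f"Route {len(negatives)} negative mention(s) to the support queue."
    return "No action needed; sentiment is neutral or positive."


if __name__ == "__main__":
    feed = ["My new router is broken, I want a refund",
            "Love the fast setup, thanks!"]
    print(analyze(pipeline(feed)))
```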
Headquartered in Palo Alto, California, folks at Attensity pride themselves on the accuracy of their analytic engines and their intuitive reports. Rooted in their development of tools that serve the intelligence community, the company now provides semantic solutions to many Global 2000 companies and government agencies.
Cynthia Murrell, February 14, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
From Jeopardy to Cancer Treatment: An IBM Story
February 10, 2013
I read “IBM Supercomputer Watson to Help in Cancer Treatment.” I am burned out on the assertions of search, content processing, and analytics vendors. The algorithms predict, deliver actionable information, and answer tough questions. Okay, I will just believe these statements. Most of the folks with whom I interact either believe these statements or do not really care.
Watson, as you may know, takes open source goodness, layers on a knowledge base, and wraps the confection in layers of smart software. I am simplifying, but the reality is irrelevant given the marketing need.
Here’s the passage I noted:
A year ago, a team at Memorial Sloan-Kettering started working with an IBM and a WellPoint team to train Watson to help doctors choose therapies for breast and lung cancer patients. They continue to share their knowledge and expertise in oncology and information technology, beginning with hundreds of lung cancers, the aim being to help Watson learn as much as possible about cancer care and how oncologists use medical data, as well as their experiences in personalized cancer therapies. During this period, doctors and technology experts have spent thousands of hours helping Watson learn how to process, analyze and interpret the meaning of sophisticated clinical data using natural language processing; the aim being to achieve better health care quality and efficiency.
There you go. For the dozens of companies working to create next generation information retrieval systems which are affordable, actually work, and can be deployed without legions of engineers—game over. IBM Watson has won the search battle. Now for the optimists who continue to pump money into decade old search companies which have modest revenue growth, kiss those bucks goodbye. For the PhD students working on the revolutionary system which promises to transform findability, get a job at Kentucky Fried Chicken. And Google? Well, IBM knows your limits so stick to selling ads.
IBM is doing it all:
Manoj Saxena, IBM General Manager, Watson Solutions, said:
“IBM’s work with WellPoint and Memorial Sloan-Kettering Cancer Center represents a landmark collaboration in how technology and evidence based medicine can transform the way in which health care is practiced. [These] breakthrough capabilities bring forward the first in a series of Watson-based technologies, which exemplifies the value of applying big data and analytics and cognitive computing to tackle the industry’s most pressing challenges.”
How different is Watson from the HP Autonomy, Recommind, or even the DR LINK technology? Well, maybe the open source angle is the same. But IBM needs to do more than make assertions and buy analytics companies as the company recycles open source technology, in my opinion. I thought IBM was a consulting firm? Here I am wrong again. Watson probably “knew” that after hours of training, tuning, and talking. But in the back of my mind, I ask, “What if those training data are inapplicable to the problem at hand? What if the journal articles are fiddled by tenure seekers, pharmaceutical outfits, or institutions trying to maximize insurance payouts? What about careless record keeping by medical staff?” Nah, irrelevant questions. IBM has this smart system nailed. Search solved. What’s next, IBM?
Stephen E Arnold, February 10, 2013
eDiscovery: A Source of Thrills and Reduced Costs?
February 2, 2013
When I hear the phrase “eDiscovery”, I don’t get chills. I suppose some folks do. I read after dinner last night (February 1, 2013) “Letter From LegalTech: The Thrills of E-Discovery.” The author addresses the use of search and content processing technology to figure out which documents are most germane to a legal matter. Once the subset has been identified, eDiscovery provides outputs which “real” attorneys (whether in Bangalore or Binghamton) can use to develop their “logical” arguments.
One interesting factoid bumps into my rather sharp assessment of the “size” of the enterprise search market generated by an azure chip outfit. The number was about $1.5 billion. In the eDiscovery write up, the author says:
Nobody seems to know how large the e-discovery market is — estimates range from 1.2 to 2.8 billion dollars — but everyone agree it’s not going anywhere. We’re never going back to sorting through those boxes of documents in that proverbial warehouse.
I like the categorical affirmative “nobody.” The point is that sizing any of the search and content processing markets is pretty much like asking Bernie Madoff type professionals, “How much in liquid assets do you have?” The answer is situational, enhanced by marketing, and believed without a moment’s hesitation.
I know the eDiscovery market is out there because I get lots of PR spam about various breakthroughs, revolutions, and inventions which promise to revolutionize figuring out which email will help a legal eagle win a case with his or her “logical” argument. I wanted to use the word “rational” in the manner of John Ralston Saul, but the rational attorneys are leaving the field and looking for work as novelists, bloggers, and fast food workers.
One company—an outfit called Catalyst Repository Systems—flooded me with PR email spam about its products. I called the company on January 31, 2013. I was treated in an offhand, suspicious manner by a tense, somewhat defensive young man named Mark, Monk, Matt, or Mump. At age 69, I have a tough time figuring out Denver accents. Mark, Monk, Matt, or Mump took my name and phone number. He assured me that his boss would call me back to answer my questions about PR spam and the product, which struck me as a “me too.” I did learn that he had six years of marketing experience and that he just “pushes the send button.” When I suggested that he might want to know to whom he is sending messages multiple times, he said, “You are being too aggressive.” I pointed out that I was asking a question just like the lawyers who, one presumes, gobble up the Catalyst products. He took my name, did not ask how to spell it, wrote down my direct line without bothering to repeat it back to me, and left me with the impression that I was out of bounds and annoying. That was amusing because I was trying hard to be a regular type caller.
A happy quack to Bitter Lawyer which has information about the pressures upon some in the legal profession. See http://www.bitterlawyer.com/i%E2%80%99m-unemployed-and-feel-ripped-off-by-my-ttt-law-school/
Mark, Monk, Matt, or Mump may have delivered the message, and the Catalyst top dog was too busy to give me a jingle. Another possibility is that Mark, Monk, Matt, or Mump never took the note. He just wanted to get a person complaining about PR spam off the phone. Either way, Catalyst qualifies as an interesting example of what’s happening in eDiscovery. Desperation marketing has infected other subsectors of the information retrieval market. Maybe this is an attempt to turn that $1.5 billion estimate into real revenue?
Stephen E Arnold, February 2, 2013
Social Search: Don Quixote Is Alive and Well
January 18, 2013
Here I float in Harrod’s Creek, Kentucky, an addled goose. I am interested in other geese in rural Kentucky. I log into Facebook, using a faux human alias (easier than one would imagine), and run a natural language query (human language, of course). I peck with my beak on my iPad using an app: “Geese hook up 40027.” What do I get? Nothing. Zip, zilch, nada.
Intrigued, I query “modern American drama.” What do I get? Nothing. Zip, zilch, nada.
I give up. Social search just does not work under my quite “normal” conditions.
First, I am a goose spoofing the world as a human. Not too many folks like this on Facebook, so my interests and my social graph are useless.
Second, the key words in my natural language query do not match the Facebook patterns, crafted by former Googlers and 20 somethings to deliver hook up heaven and links to the semi infamous Actor’s Theater or the Kentucky Center.
Social search is not search. Social search is group centric. Social search is an outstanding system for monitoring and surveillance. As a retrieval method, social search is a subset of information retrieval. How do semantic methods improve the validity of the information retrieved? I am not exactly sure. Perhaps the vendors will explain and provide documented examples?
Third, without context, my natural language queries shoot through the holes in the Swiss Cheese of the Facebook database.
After I read “The Future of Social Search,” I assumed that information was available at the peck of my beak. How misguided was I? Well, one more “next big thing” in search demonstrated that baloney production is surging in an ailing economy. Optimism is good. Crazy predictions about search are not so good. Look at the sad state of enterprise search, Web search, and email search. Nothing works exactly as I hope. The dust up between Hewlett Packard and Autonomy suggests that “meaning based computing” is a point of contention.
If social search does not work for an addled goose, for whom does it work? According to the wild and crazy write up:
Are social networks (or information networks) the new search engine? Or, as Steve Jobs would argue, is the mobile app the new search engine? Or, is the question-and-answer formula of Quora the real search 2.0? The answer is most likely all of the above, because search is being redefined by all of these factors. Because search is changing, so too is the still maturing notion of social search, and we should certainly think about it as something much grander than socially-enhanced search results.
Yep, Search 2.0.
But the bit of plastic floating in my pond is semantic search. Here’s what the Search 2.0 social crowd asserts:
Let’s embrace the notion that social search should be effortless on the part of the user and exist within a familiar experience — mobile, social or search. What this foretells is a future in which semantic analysis, machine learning, natural language processing and artificial intelligence will digest our every web action and organically spit out a social search experience. This social search future is already unfolding before our very eyes. Foursquare now taps its massive check in database to churn out recommendations personalized by relationships and activities. My6sense prioritizes tweets, RSS feeds and Facebook updates, and it’s working to personalize the web through semantic analysis. Even Flipboard offers a fresh form of social search and helps the user find content through their social relationships. Of course, there’s the obvious implementations of Facebook Instant Personalization: Rotten Tomatoes, Clicker and Yelp offer Facebook-personalized experiences, essentially using your social graph to return better “search” results.
Semantics. Better search results. How does that work on Facebook images and Twitter messages?
My view is that when one looks for information, there are some old fashioned yardsticks; for example, precision, recall, editorial policy, corpus provenance, etc.
When a clueless person asks about pop culture, I am not sure that traditional reference sources will provide an answer. But as information access is trivialized, the need for knowledge about the accuracy and comprehensiveness of content, the metrics of precision and recall, and the editorial policy or degree of manipulation baked into the system decreases.
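Those yardsticks are not fuzzy. Precision and recall have exact definitions, and a few lines of code make the arithmetic concrete. The result set below is invented.

```python
# Precision = fraction of retrieved items that are relevant.
# Recall    = fraction of relevant items that were retrieved.

def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # A query returns four documents, only two of the five relevant ones.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d4", "d7", "d8", "d9"])
    print(f"precision={p:.2f} recall={r:.2f}")   # precision=0.50 recall=0.40
```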
See Advantech.com for details of a surveillance system.
Search has not become better. Search has become subject to self referential mechanisms. That’s why my goose queries disappoint. If I were looking for pizza or Lady Gaga information, I would have hit pay dirt with a social search system. When I look for information based on an idiosyncratic social fingerprint or when I look for hard information to answer difficult questions related to client work, social search is not going to deliver the input which keeps this goose happy.
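The self referential mechanism is easy to caricature: score candidate results only by signals from the user’s social graph, and any query outside that graph starves. The sketch below is my caricature, not any vendor’s ranking algorithm; the names and signal counts are invented.

```python
# A caricature of social re-ranking, not any vendor's algorithm. Results
# are scored only by how many of the user's friends interacted with them,
# so idiosyncratic queries return nothing. All names are invented.

def social_rank(candidates, friend_signals):
    """Order candidate results by friend interactions; drop zero-signal items."""
    scored = [(friend_signals.get(c, 0), c) for c in candidates]
    scored = [(score, c) for score, c in scored if score > 0]
    return [c for score, c in sorted(scored, reverse=True)]


if __name__ == "__main__":
    friend_signals = {"Lady Gaga fan page": 42, "Pizza place on Main St": 17}

    print(social_rank(["Pizza place on Main St", "Lady Gaga fan page"], friend_signals))
    # Popular topics rank fine: ['Lady Gaga fan page', 'Pizza place on Main St']

    print(social_rank(["Geese hook up 40027", "Modern American drama reading group"], friend_signals))
    # Idiosyncratic queries return []: nothing, zip, zilch, nada.
```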
What is interesting is that so many are embracing a surveillance based system as the next big thing in search. I am glad I am old. I am delighted my old fashioned approach to obtaining information is working just fine without the special advantages a social graph delivers.
Will today’s social search users understand the old fashioned methods of obtaining information? In my opinion, nope. Does it matter? Not to me. I hope some of these social searchers do more than run a Facebook query to study for their electrical engineering certification or to pass board certification for brain surgery.
Stephen E Arnold, January 18, 2013
Dr. Jerry Lucas: Exclusive Interview with TeleStrategies ISS Founder
January 14, 2013
Dr. Jerry Lucas, founder of TeleStrategies, is an expert in digital information and founder of the ISS World series of conferences. “ISS” is shorthand for “intelligence support systems.” The scope of Dr. Lucas’ interests ranges from the technical innards of modern communications systems to the exploding sectors for real time content processing. Analytics, fancy math, and online underpin Dr. Lucas’ expertise and form the backbone of the company’s training and conference activities.
What makes Dr. Lucas’ viewpoint of particular value is his deep experience in “lawful interception, criminal investigations, and intelligence gathering.” The perspective of an individual with Dr. Lucas’ professional career offers an important and refreshing alternative to the baloney promulgated by many of the consulting firms explaining online systems.
Dr. Lucas offered a more “internationalized” view of the Big Data trend which is exercising many US marketing and sales professionals. He said:
“Big Data” is an eye catching buzzword that works in the US. But as you go east across the globe, “Big Data” as a buzzword doesn’t get traction in the Middle East, Africa and Asia Pacific Regions if you remove Russia and China. One interesting note is that Russian and Chinese government agencies only buy from vendors based in their countries. The US Intelligence Community (IC) has big data problems because of the obvious massive amount of data gathered that’s now being measured in zettabytes. The data gathered and stored by the US Intelligence Community is growing beyond what typical database software products can handle as well as the tools to capture, store, manage and analyze the data. For the US, Western Europe, Russia and China, “Big Data” is a real problem and not a hyped up buzzword.
Western vendors have been caught in the boundaries between different countries’ requirements. Dr. Lucas observed:
A number of western vendors made a decision because of the negative press attention to abandon the global intelligence gathering market. In the US Congress Representative Chris Smith (R, NJ) sponsored a bill that went nowhere to ban the export of intelligence gathering products period. In France a Bull Group subsidiary, Amesys legally sold intelligence gathering systems to Lybia but received a lot of bad press during Arab Spring. Since Amesys represented only a few percent of Bull Group’s annual revenues, they just sold the division. Amesys is now a UAE company, Advanced Middle East Systems (Ames). My take away here is governments particularly in the Middle East, Africa and Asia have concerns about the long term regional presence of western intelligence gathering vendors who desire to keep a low public profile. For example, choosing not to exhibit at ISS World Programs. The next step by these vendors could be abandoning the regional marketplace and product support.
The desire for federated information access is, based on the vendors’ marketing efforts, high. Dr. Lucas made this comment about the existence of information silos:
Consider the US where you have 16 federal organizations collecting intelligence data plus the oversight of the Office of Director of National Intelligence (ODNI). In addition there are nearly 30,000 local and state police organizations collecting intelligence data as well. Data sharing has been a well identified problem since 9/11. Congress established the ODNI in 2004 and funded the Department of Homeland Security to set up State and Local Data Fusion Centers. To date Congress has not been impressed. DNI James Clapper has come under intelligence gathering fire over Benghazi and the DHS has been criticized in an October Senate report that the $1 Billion spent by DHS on 70 state and local data fusion centers has been an alleged waste of money. The information silo or the information stovepipe problem will not go away quickly in the US for many reasons. Data cannot be shared because one agency doesn’t have the proper security clearances, job security which means “as long as I control access the data I have a job,” and privacy issues, among others.
The full text of the exclusive interview, conducted by Stephen E Arnold on January 10, 2013, is available at http://www.arnoldit.com/search-wizards-speak/telestrategies-2.html on the ArnoldIT.com subsite “Search Wizards Speak.” The full text of the 2011 interview with Dr. Lucas is at this link.
Worth reading.
Donald Anderson, January 14, 2013
Semantria Goes Pentalingual
January 1, 2013
Semantria is a text analytics and sentiment analysis solutions company. In order to reach a new clientele and work with companies that have an international base, the firm has extended its classification and categorization functionality to five languages, as announced in “Semantria Announces Content Classification and Categorization Functionality in 5 Languages.” Semantria now speaks English, Spanish, French, German, and Portuguese.
To power its categorization functionality, Semantria uses the Concept Matrix, a large thesaurus built by processing Wikipedia during its beta phase. After digesting Wikipedia, the Concept Matrix created lexical connections between every concept within it. Semantria developed the technology with Lexalytics, and the Lexalytics Salience 5 engine powers the Concept Matrix. The Concept Matrix is a one-of-a-kind tool that organizes and classifies information:
“Seth Redmore, VP Product Management and Marketing at Lexalytics, explains; ‘Text categorization requires an understanding of how things are alike. Before the Concept Matrix, you’d have to use a massive amount of training data to “teach” your engine, i.e. ‘documents about food’.’ And, he continues, ‘With the Concept Matrix, the training’s already done, and by providing Semantria a few keywords, it drops your content into the correct categories.’ ”
A piece of software that does all the organizing for you; how amazing is that? If it “ate” Wikipedia and made lexical connections, what could it do with Google, Bing, or the entire Internet?
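To make the quoted claim concrete, here is a toy stand-in for the idea that a pre-built lexical resource replaces per-task training data: a few seed keywords per category are enough to file content. This is not the Lexalytics Salience engine; the mini concept matrix below is invented.

```python
# A toy stand-in for the idea in the quotation: a pre-built lexical
# resource stands in for training data, so a few seed keywords suffice
# to categorize text. Not the Lexalytics engine; the lexicon is invented.

CONCEPT_MATRIX = {
    "pizza": {"food", "restaurant", "cheese", "oven"},
    "recipe": {"food", "cooking", "ingredients"},
    "goal": {"sports", "match", "team"},
    "stadium": {"sports", "match", "arena"},
}


def categorize(text, categories):
    """Score each category by overlap between its seed keywords and the
    text's words plus their lexically related concepts."""
    words = set(text.lower().split())
    related = set()
    for word in words:
        related |= CONCEPT_MATRIX.get(word, set())
    scores = {name: len((words | related) & set(seeds)) for name, seeds in categories.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


if __name__ == "__main__":
    categories = {"documents about food": ["food", "cooking", "restaurant"],
                  "documents about sports": ["sports", "team", "arena"]}
    print(categorize("a wood fired pizza recipe", categories))   # documents about food
```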
Whitney Grace, January 01, 2013
Sponsored by ArnoldIT.com, developer of Augmentext