Business Intelligence: Optimism and Palantir

June 28, 2010

Business intelligence is in the news. Memex, the low-profile UK outfit, sold to SAS. Kroll, another low-profile operation, became part of Altegrity, another organization with modest visibility among the vast sea of online experts. Now Palantir has snagged $90 million, which I learned in “Palantir: the Next Billion Dollar Company Raises $90 Million.” In the post-financial-meltdown world, a lot of money is looking for a place that can grow more money. The information systems developed for serious intelligence analysis seem to be a better bet than funding another Web search company.

Palantir has some ardent fans in the US defense and intelligence communities. I like the system as well. What is fascinating to me is that smart money believes that there is gold in them there analytics and visualizations. I don’t doubt for a New York minute that some large commercial organizations can do a better job of figuring out the nuances in their petabytes of data with Palantir-type tools. But Palantir is not exactly Word or Excel.

The system requires an understanding of such nettlesome points as source data, analytic methods, and – yikes – programmatic thinking. The outputs from Palantir are almost good enough for General Stanley McChrystal to get another job. I have seen snippets of some really stunning presentations featuring Palantir outputs. You can see some examples at the Palantir Web site or take a gander (no pun intended by the addled goose) at the image below:

[Image: sample Palantir output]

Palantir is an open platform; that is, a licensee with some hefty coinage in their knapsack can use Palantir to tackle the messy problem of data transformation and federation. The approach features dynamic ontologies, which means that humans don’t have to do as much heavy lifting as required by some of the other vendors’ systems. A licensee will want to have a tame rocket scientist around to deal with the internals of pXML, the XML variant used to make Palantir walk and talk.

You can poke around at these links, which may go dark in a nonce, of course: https://devzone.palantirtech.com/ and https://www.palantirtech.com/.

Several observations:

  • The system is expensive and requires headcount to operate in a way that will deliver satisfactory results under real world conditions
  • Extensibility is excellent, but this work is not for a desk jockey, no matter how confident that person is in his undergraduate history degree and Harvard MBA
  • The approach is industrial strength which means that appropriate resources must be available to deal with data acquisition, system tuning, and programming the nifty little extras that are required to make next generation business intelligence systems smarter than a grizzled sergeant with a purple heart.

Can Palantir become a billion dollar outfit? Well, there is always the opportunity to pump in money, increase the marketing, and sell the company to a larger organization with Stone Age business intelligence systems. If Oracle wanted to get serious about XML, Palantir might be worth a look. I can name some other candidates for making the investors’ day, but I will leave those to your imagination. Will you run your business on a Palantir system in the next month or two? Probably not.

Stephen E Arnold, June 27, 2010

Freebie

Real Time Search Systems, Part 4

June 24, 2010

Editor’s note: In this final snippet from my June 15 and June 17, 2010, lectures, I want to relate the challenge of real-time content to the notion of “aboutness.” “Aboutness” is an old bit of jargon; I have appropriated the term to embrace the semantic methods necessary to add context to information generated by individuals using such systems as blogging software, Facebook, and Twitter. These three content sources are representative only, and you can toss in any other ephemeral editorial engine you wish. The “aboutness” challenge is that a system must process both activity and content. “Activity” refers to who did what, when, and where. The circumstances are useful as well. “Content” refers to the message payload. Appreciate that some message payloads may be rich media, disinformation, or crazy stuff. Figuring out which digital chunk has value for a particular information need is a tough job. No one, to my knowledge, has it right. Heck, people don’t know what “real time” means. The more subtle aspects of the information objects are not on the radar of most of the people in the industry with whom I am acquainted.

Semantics

I hate defining terms. There is always a pedant or a frustrated PhD eager to set me straight. Here’s what I mean when I use the buzzword “semantic”. A numerical recipe figures out what something is about. Other points I try to remember to mention include:

  • Algorithms or humans or both looking at messages, trying to map content to concepts or synonyms
  • Numerical recipes that send content through a digital rendering plant in order to process words, sentences, and documents and add value to the information object
  • Figure out or use probabilities to take a stab at the context for an information object
  • Spit out Related Terms or Use For terms
  • Occupy PhD candidates, Googlers, and 20-something MBAs in search of the next big thing
  • A discussion topic for government committees nailing down the concept before heading out early on a Friday afternoon.
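The “Related Terms” and “Use For terms” bullets describe the classic thesaurus approach to mapping content to concepts or synonyms. Here is a minimal sketch of that mapping, assuming a tiny hand-built vocabulary: the entries in the hypothetical THESAURUS dictionary are illustrative only, and the helper names normalize and expand are my own, not any vendor’s API.

```python
# Hypothetical controlled vocabulary: preferred term -> Use For variants and Related Terms.
THESAURUS = {
    "real time search": {
        "use_for": ["realtime search", "instant search"],
        "related": ["microblogging", "social media monitoring"],
    },
    "business intelligence": {
        "use_for": ["bi", "analytics platform"],
        "related": ["data visualization", "text analytics"],
    },
}

# Invert the Use For lists so any variant points to its preferred term.
USE_FOR_INDEX = {
    variant: preferred
    for preferred, entry in THESAURUS.items()
    for variant in entry["use_for"]
}


def normalize(term):
    """Map a term to its preferred form, or None if the vocabulary lacks it."""
    key = term.strip().lower()
    if key in THESAURUS:
        return key
    return USE_FOR_INDEX.get(key)


def expand(term):
    """Return the preferred term plus its Related Terms, or None if unknown."""
    preferred = normalize(term)
    if preferred is None:
        return None
    return {"preferred": preferred, "related": list(THESAURUS[preferred]["related"])}


print(expand("  BI "))
print(expand("Lady Gaga"))  # not in the vocabulary, so None
```

A real system would derive the vocabulary from content rather than hard-code it; the sketch only shows the lookup direction (variant to preferred term to related terms).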

When semantics is figured out and applied, the meaning of Lady Gaga becomes apprehendable to a goose like me:

[Image: Lady Gaga example]

In order to tackle the semantics of a real time content object, two types of inputs are needed. One is activity, that is, monitoring who does what and when. The other is the information object itself. When the real time system converts digital pork into a high value wiener, the metadata and the content representation become more valuable than the individual content objects. This is an important concept, but I am not going to go into detail here. I will show you the index / content representation diagram I used in my lectures:

[Image: index / content representation diagram]

The nifty thing is that when a system or a human beats on the index / content representation, the amount of real time information increases, because the outputs become inputs to the index / content representation. As users beat on the index / content representation, the value of the metadata goes up.
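To make the feedback loop concrete, here is a toy sketch, assuming a deliberately simple design of my own: each content object carries a list of activity records (who, what, when, where), and every retrieval is logged as a new activity record on the object retrieved. The class and method names are hypothetical; this is an illustration of the idea that outputs become inputs, not a reproduction of the diagram from the lectures.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Activity:
    """Who did what, when, and where (hypothetical record layout)."""
    who: str
    what: str
    when: str
    where: str = ""


@dataclass
class ContentObject:
    """A message payload plus the activity metadata attached to it."""
    object_id: str
    payload: str
    activities: list = field(default_factory=list)


class IndexRepresentation:
    """Toy index / content representation: term postings plus per-object activity."""

    def __init__(self):
        self.objects = {}
        self.postings = defaultdict(set)  # term -> set of object ids

    def add(self, obj, activity):
        obj.activities.append(activity)
        self.objects[obj.object_id] = obj
        for term in obj.payload.lower().split():
            self.postings[term].add(obj.object_id)

    def search(self, term, user, when):
        hits = sorted(self.postings.get(term.lower(), set()))
        # Feedback loop: each retrieval is logged as new activity on the hit,
        # so querying the index enlarges the metadata it holds.
        for object_id in hits:
            self.objects[object_id].activities.append(
                Activity(who=user, what=f"retrieved:{term.lower()}", when=when)
            )
        return hits

    def metadata_count(self, object_id):
        return len(self.objects[object_id].activities)


index = IndexRepresentation()
index.add(ContentObject("t1", "Lady Gaga tour dates"),
          Activity("@fan", "posted", "2010-06-15T14:00:00+00:00"))
print(index.search("Gaga", "user42", "2010-06-15T14:02:00+00:00"))  # ['t1']
print(index.metadata_count("t1"))  # 2: the original post plus one retrieval
```

Each query adds a record, so the metadata count grows with use, which is the point of the paragraph above.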


Real Time Search Systems, Part 3

June 23, 2010

Editor’s note: This is the draft taxonomy of real time systems that I discussed in my June 15 and 17 lectures. It may or may not make sense, but I wanted to make clear that the broad use of the phrase “real time” does not convey much meaning to me. The partial fix, short of incarcerating the marketers who slap “real time” on their brochures, is to come up with “types” of real time information. The type helps make clear the cost and other characteristic features of a system sporting the label “real time”.

Stop and think about the difference in user expectations between an investment firm and a middle school child processing information. The greed mongers want to get the freshest information possible to make the maximum return on each bet or investment. The middle school kid wants to make fun of a teacher.

The greed mongers spend millions for Fancy Systems from Thomson Reuters, Exegy, or a similar specialist. The reason is that if the Morgan Stanley Type As get bond information a few milliseconds after the God-loving folks at Goldman, lots of dough can slip through the clutching paws of the person responsible for a trade. With a great deal at stake, real time is measured in milliseconds.

The middle school wit is happy with whatever happens as long as the teacher remains blissfully ignorant of the message. If the recipient lets out a hoot, then there may be consequences, but the downside is less painful than what happens to the crafty Wall Street wonder.

The figure below presents the draft taxonomy. If you find it silly, no problem. If you rip it off, a back link would be a nice gesture, but I don’t have any illusions about how stateless users conduct themselves.

[Image: draft taxonomy of real time systems]

Where does the latency originate? The diagram below provides the tech sleuth with some places to investigate. The lack of detail is intentional. Free blog, remember?


Real Time Search Systems, Part 1

June 21, 2010

Editor’s note: For those in the New Orleans real time search lecture and the Madrid semantic search talk, I promised to make available some of the information I discussed. Attendees are often hungry for a takeaway, and I want to offer a refrigerator magnet, not the cruise ship gift shop. This post will provide a summary of the real time information services I mentioned. The group focuses on content processed from such services as Facebook, Twitter, blogs, and other geysers of digital confetti. A subsequent blog post will present the basics of my draft taxonomy of real time search. I know that most readers will kick the candy bar wrapper into the gutter. If you are one of the folks who picks up the taxonomy, a credit line would make the addled goose feel less like a down pillow and more like a Marie Antoinette pond ornament.

What’s Real Time Search?

Ah, gentle reader, real time search is marketing baloney. Life has latency. You call me on the phone, and days, maybe weeks go by, and I don’t return the call. In the digital world, you get an SMS and you think it was rocketed to you by the ever vigilant telecommunications companies. Not exactly. In most cases, unless you conduct a laboratory test between mobiles on different systems, capturing the transmit time, the receiving time, and other data points such as time of day and geolocation, you don’t have a clue what the latency between sending and receiving is. Isn’t it easier to assume that the message was sent instantly? When you delve into other types of information, you may discover that what you thought was real time is something quite different. The “check is in the mail” approach applies to digital information, index updating, query processing, system response time, and double talk from organizations too cheap or too disorganized to do much of anything quickly. Thus, real time is a slippery fish.
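For the curious, the arithmetic side of that laboratory test can be sketched in a few lines. This is a minimal sketch, assuming each test message yields a hypothetical pair of ISO 8601 timestamps (sent, received) with explicit UTC offsets, and assuming both handsets’ clocks are synchronized (for example, via NTP). Without synchronized clocks, the numbers are meaningless.

```python
from datetime import datetime
from statistics import median


def latency_seconds(sent_iso, received_iso):
    """Seconds between send and receive; assumes synchronized clocks on both ends."""
    sent = datetime.fromisoformat(sent_iso)
    received = datetime.fromisoformat(received_iso)
    # Both values carry UTC offsets, so the subtraction is timezone-safe.
    return (received - sent).total_seconds()


def summarize(samples):
    """Return (min, median, max) latency in seconds for (sent, received) pairs."""
    values = [latency_seconds(sent, received) for sent, received in samples]
    return min(values), median(values), max(values)


# Hypothetical test log; the third pair was sent from a UTC-5 handset.
samples = [
    ("2010-06-15T14:00:00.000+00:00", "2010-06-15T14:00:02.500+00:00"),
    ("2010-06-15T14:05:00.000+00:00", "2010-06-15T14:05:07.250+00:00"),
    ("2010-06-15T09:10:00.000-05:00", "2010-06-15T14:10:01.000+00:00"),
]
print(summarize(samples))  # (1.0, 2.5, 7.25)
```

Reporting a spread rather than one number is the point: the same network can look “instant” on one message and sluggish on the next.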

Real Time Search Systems

Why do I use the phrase “real time”? I don’t have a better phrase at hand. Vendors yap about real time, and very, very few explain exactly what their use of the phrase means. One outfit that deserves a pat on the head is Exalead. The company explains that in an organization, most information is available to an authorized user no more than 15 minutes after the Exalead system becomes aware of the data. That’s fast, and it beats the gym shorts off many other vendors. I would love to pinpoint the turtles, but my legal eagle cautions me that this type of sportiness will get me a yellow card. The sad consequence: figure it out for yourself.

Here’s the list of the systems I identified in my lectures. I don’t work for any of these outfits, and I use different services depending on my specific information needs. You are, therefore, invited to run sample queries on these services or turn to one of the “real” journalists for their take. If you have spare cash and found yourself in the lower quartile of your math class, you may find that an azure chip consultant is just what you need to make it in the crazy world of online information.

In my lectures I made four points about these types of real-time search services.

First, each of these services did at the time of my talks deliver more useful and comprehensive results than the “real time search” services from the Big Gals in the Web search game; namely, Google, Microsoft Bing, and Yahoo. Yahoo, I pointed out, doesn’t do real time search itself. Yahoo has a deal with the OneRiot.com outfit. The service is useful and I suppose I could stick it in the list above, but I am just cutting and pasting from the PowerPoint decks I used as crutches and dogs in my lecture.


Podcast Interview with Paul Doscher, Part 2: The Exalead Technology

June 21, 2010

Exalead’s Paul Doscher talks about Exalead’s technology on the June 21, 2010, ArnoldIT Beyond Search podcast. Exalead has been growing rapidly, landing blue-chip accounts with the largest technology company in North America, the French postal service, and Canada’s Urbanizer.com. In this podcast, Mr. Doscher talks about Exalead’s technical approach to content processing and the framework that makes search-based applications crack tough problems in information access. You can listen to the podcast on the ArnoldIT.com Web site. More information about Exalead is available from www.exalead.com. The ArnoldIT podcast series extends the Search Wizards Speak series of interviews beyond text into rich media. Watch this blog for announcements about other rich media programs from the professionals who move information retrieval beyond search.

Stephen E Arnold, June 21, 2010

Sponsored by Stephen E. Arnold

A Smidgen of Dataspace Seeps from the Google

June 18, 2010

My favorite dataspace guru and his colleagues have nailed a US patent. I am speaking of Dr. Alon Halevy and his clutch of merry Googlers, Jayant Madhavan and David Ko. I know the azure chip set is excited about Google’s ability to pinpoint gaps in media coverage and its methods for sucking the filling out of Wi-Fi content. I am not. Nope, I keep my eye on the ball. In this case, the ball is the system and method disclosed in US7,739,258, “Facilitating Searches through Content Which Is Accessible through Web-Based Forms.” These are not your mother’s Social Security forms. With some prior patents in the possession of sporty outfits like AT&T and Lucent, the Googlers have to tiptoe through the dataspace. I was going to write a lyric for Tiny Tim to sing, something along the lines of “Tiptoe through the manifold with me.” But I won’t.

Here’s the crystal clear prose from the Google wizard and legal eagle team:

One embodiment of the present invention provides a system that facilitates crawling through web-based forms to gather information to facilitate subsequent searches through content which is accessible though the web-based forms. During operation, the system first obtains web-based forms to be searched. Note that the system can obtain these web-based forms from a number of sources. For example, the system can crawl through web sites to identify web-based forms, the system can receive manually provided web-based forms, or the system can find web-based forms through methods other than crawling. Next, the system creates database entries for the identified forms. This involves obtaining and storing metadata describing the identified forms into database entries and then storing these database entries in a form database to facilitate searches through content which is accessible through the identified forms. Note that this form database can include a web index and associated documents, which can be used to facilitate web search queries that return both ordinary documents and documents that result from form queries.

My view? Important.
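To see what the first two steps in that abstract involve (identify web-based forms, then store metadata describing them as database entries), here is a minimal sketch using Python’s standard html.parser module. It is my own illustration under simple assumptions, not the patented method: the “form database” is just a dictionary, the entry key format is made up, and the helper names FormExtractor and add_forms_to_database are hypothetical. The hard part in the patent, generating queries through the forms and indexing the results, is not attempted here.

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Collect metadata (action, method, field names) for every <form> in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag in ("input", "select", "textarea") and self._current is not None:
            name = attrs.get("name")
            if name:  # unnamed controls (e.g. a bare submit button) carry no query field
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def add_forms_to_database(page_url, html, form_db):
    """Store one metadata entry per form found on the page; return the count."""
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    for i, form in enumerate(parser.forms):
        form_db[f"{page_url}#form{i}"] = dict(form, source=page_url)
    return len(parser.forms)


page = """<html><body>
<form action="/search" method="GET">
  <input type="text" name="q">
  <select name="category"><option>Books</option></select>
  <input type="submit" value="Go">
</form>
</body></html>"""

form_db = {}
print(add_forms_to_database("http://example.com/", page, form_db))  # 1
print(form_db["http://example.com/#form0"])
```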

Stephen E Arnold, June 18, 2010

Freebie

Attensity SAS Staff Shuffle

June 16, 2010

I learned recently that SAS lost Manya Mayes to Attensity. No big deal, but Ms. Mayes had been at SAS for 15 years. Attensity seems to be serious about its text analytics business. You can get more information in the write up “Attensity Group Appoints Manya Mayes as Director of Advanced Analytics.” Here’s what an Attensity officer said about the new hire:

Her SAS expertise and customer and product experience will be a great asset to the Attensity team. Her addition will build on Attensity’s current analytic capabilities, bringing advanced analytics expertise to the team.

A couple of thoughts. First, I wonder why Ms. Mayes is not an officer of Attensity. Second, will SAS push back and make some noise about competition? When Google hired an Endeca expert in eCommerce, I received email suggesting that any connection between the Endeca hire and Google’s aspirations in markets where Endeca has a presence was silly.

That’s a silly goose for you I suppose.

Stephen E Arnold, June 16, 2010

Freebie

Exalead Acquired by Dassault

June 11, 2010

I have done some work for Exalead over the last five years, and I have gone down in history as one of the few people from Kentucky to talk my way into the Exalead offices in Paris without an appointment. L’horreur. I had a bucket of KY Fry in my hand and was guzzling a Coca Lite.

Out of that exciting moment in American courtesy, I met François Bourdoncle, a former AltaVista.com wizard. He watched in horror as I gobbled a crispy leg and asked him about the origins of Exalead, his work with then-Googler Louis Monier, and his vision for 64 bit computing. I wrote up some of the information in the first edition of the Enterprise Search Report, a publication now shaped into a quasi-New Age Cliff’s Notes for the under 30 crowd. I followed up with M. Bourdoncle in February 2008, and published that interview as part of the ArnoldIT.com Search Wizards Speak series. The last time I was in Paris, I dropped by the Exalead offices and had a nice chat. I even made a video. Several Exaleaders took me to dinner, pointing out that McDo was not an option. Rats.


So what’s with the sale of Exalead to Dassault Systèmes?

The azure chip crowd has weighed in, and I will ignore those observations. There is some spectacular baloney being converted into expensive consulting burgers, and I will leave you and them to your intellectual picnic.

Here’s my take:

Differentiator

There are lots of outfits asserting that their search and content processing system will work wonders. I don’t want to list these companies, but you can find them by navigating either to Google.com or Exalead.com/search and running a query for enterprise search. The problem is that most of these outfits come with what I call an “interesting history.” Examples range from natural language processing companies that have been created from the ashes of not-so-successful search vendors to Frankenstein companies created with “no cash mergers.” I know. Wild, right? Other companies have ongoing investigations snapping like cocker spaniels at their heels. A few are giant roll ups, in effect, 21st century Ling-Temco-Vought clones. A few are delivering solid value for specific applications. I can cite examples in XML search, eDiscovery, and enhancements for the Google constructs. (Okay, I will mention my son’s company, Adhere Solutions, a leader in this Google space.)

The point for me is that Exalead combined a number of working functions into a platform. The platform delivers search enabled applications; that is, the licensee has an information problem and doesn’t know how to cope with costs, data flows, and the need for continuous index updating. The Exalead technology makes it easy to suck in information and give different users access to the information they need to do their job. For some Exalead customers, the solution allows people to track packages and shipments. For other licensees, the Exalead technology sucks in information and generates reports in the form of restaurant reviews or competitive profiles. The terminology is less important than solving the problem.

That’s a key differentiator.

Technology

Google and Exalead were two outfits able to learn from the mistakes at AltaVista.com. Early on I learned that the founder of Exalead could have become a Googler. The reason Exalead exists is that M. Bourdoncle wanted to build a French company in France without the wackiness that goes along with tackling this mission in the US of A. Americans don’t fully understand the French, and I can’t do much more than remind you, gentle reader, that French waiters behave a certain way because of the “approach” many Americans make to the task of getting a jambon sandwich and a bottle of water.

I understood that M. Bourdoncle wanted to do the job his way, and he focused on coding for a 64 bit world when there were few 64 bit processors in the paws of enterprise information technology departments. He tackled a number of tough technical problems in order to make possible high performance, low cost scaling, and mostly painless tailoring of the system to information problems, not just search. Sure, search is part of the DNA, but Exalead has connectors, text to voice, image recognition, etc. And, happily, Exalead’s approach plays well with other enterprise systems. Exalead can add value with fewer engineering hassles than some of the firm’s competitors can. Implementation can be done in days or weeks, sometimes months, not the years some vendors require.

So the plumbing is good.

That’s a high value asset.


Exclusive Interview with Seth Grimes, Alta Plana Now Available

June 9, 2010

In April 2010, I spoke with Seth Grimes, the founder of Alta Plana. Mr. Grimes is an analytics strategy consultant. He is founding chair of the Sentiment Analysis Symposium and of the Text Analytics Summit, contributing editor at Intelligent Enterprise magazine, and text analytics channel expert at the Business Intelligence Network. He founded Washington DC-based Alta Plana Corporation in 1997. Mr. Grimes consults, writes, and speaks on information-systems strategy, data management and analysis systems, industry trends, and emerging analytical technologies.

In the interview, he highlighted one of the challenges search and content processing systems face. He said:

I’ve in the past characterized search as evidence of a failure of design. If information were correctly and adequately categorized and organized and made accessible, we wouldn’t need search, would we?  I’ve retreated from that view as I’ve seen search evolve into information access, into technology that not only finds but also organizes results from sources the user likely-as-not didn’t know about.  Yet I’d call my statement still largely true when it comes to the enterprise’s own data holdings: Search is necessitated by a failure of design.  Do a better job organizing information as it’s created or acquired, and also, by the way, stop allowing application vendors to bring in siloed search applications, and the in-organization situation will improve.

To read the full interview, navigate to Search Wizards Speak and click on the Seth Grimes interview or click this link.

Stephen E Arnold, June 9, 2010

Unsponsored post.

Business Intelligence Firm Sells for $1.13 Billion

June 8, 2010

Kroll is not a household name. If your house includes an intelligence or police professional, you may have a Kroll T-shirt somewhere. The company was part of Marsh & McLennan, an outfit that looks like an insurance company. I am not going to sort out what Marsh’s business interests are or explain the Kroll set up. You can get some information in this interview with a Kroll executive. No, don’t ask how I know him.

Kroll is now part of another outfit you have probably never heard of either: Altegrity. Read “Kroll-Altegrity: A Reunion of Sorts.” Why’s this important? I think other outfits in this market sector hope to be acquired. In my addled goose view, I don’t think the Marsh executives were sad to see Kroll say adios. There was the money, and the management effort to be in the Kroll line of work is demanding.

Who else is in the Kroll business? Sorry. Not for free.

Stephen E Arnold, June 8, 2010

Freebie.
