Business Intelligence: Optimism and Palantir

June 28, 2010

Business intelligence is in the news. Memex, the low profile UK outfit, sold to SAS. Kroll, another low profile operation, became part of Altegrity, another organization with modest visibility among the vast sea of online experts. Now Palantir snags $90 million, which I learned about in “Palantir: the Next Billion Dollar Company Raises $90 Million.” In the post financial meltdown world, there is a lot of money looking for a place that can grow more money. The information systems developed for serious intelligence analysis seem to be a better bet than funding another Web search company.

Palantir has some ardent fans in the US defense and intelligence communities. I like the system as well. What is fascinating to me is that smart money believes that there is gold in them there analytics and visualizations. I don’t doubt for a New York minute that some large commercial organizations can do a better job of figuring out the nuances in their petabytes of data with Palantir-type tools. But Palantir is not exactly Word or Excel.

The system requires an understanding of such nettlesome points as source data, analytic methods, and – yikes – programmatic thinking. The outputs from Palantir are almost good enough for General Stanley McChrystal to get another job. I have seen snippets of some really stunning presentations featuring Palantir outputs. You can see some examples at the Palantir Web site or take a gander (no pun intended by the addled goose) at the image below:

[Image: sample Palantir output]

Palantir is an open platform; that is, a licensee with some hefty coinage in their knapsack can use Palantir to tackle the messy problem of data transformation and federation. The approach features dynamic ontologies, which means that humans don’t have to do as much heavy lifting as required by some of the other vendors’ systems. A licensee will want to have a tame rocket scientist around to deal with the internals of pXML, the XML variant used to make Palantir walk and talk.
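Palantir’s pXML is not publicly documented, so don’t take the following as gospel. Here is a minimal Python sketch, with invented names, of the general idea behind a dynamic ontology: object types live in data, not code, so new feeds can be federated without a rewrite:

```python
# Hypothetical sketch of a "dynamic ontology": object types are plain data,
# so an analyst can add a type or property without touching code. Nothing
# here reflects Palantir's actual pXML schema, which is not public.
ONTOLOGY = {
    "Person":  {"properties": ["name", "dob"]},
    "Account": {"properties": ["number", "bank"]},
}

def map_record(obj_type, raw):
    """Project a raw source record onto an ontology type, keeping only
    the properties the ontology declares for that type."""
    wanted = ONTOLOGY[obj_type]["properties"]
    return {"type": obj_type,
            "properties": {key: raw.get(key) for key in wanted}}

# A new feed with extra, undeclared fields federates without code changes:
print(map_record("Person", {"name": "J. Smith", "dob": "1970-01-01", "shoe_size": 11}))
# {'type': 'Person', 'properties': {'name': 'J. Smith', 'dob': '1970-01-01'}}
```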

You can poke around at these links which may go dark in a nonce, of course: https://devzone.palantirtech.com/ and https://www.palantirtech.com/.

Several observations:

  • The system is expensive and requires headcount to operate in a way that will deliver satisfactory results under real world conditions
  • Extensibility is excellent, but this work is not for a desk jockey no matter how confident that person is in his undergraduate history degree and Harvard MBA
  • The approach is industrial strength which means that appropriate resources must be available to deal with data acquisition, system tuning, and programming the nifty little extras that are required to make next generation business intelligence systems smarter than a grizzled sergeant with a purple heart.

Can Palantir become a billion dollar outfit? Well, there is always the opportunity to pump in money, increase the marketing, and sell the company to a larger organization with Stone Age business intelligence systems. If Oracle wanted to get serious about XML, Palantir might be worth a look. I can name some other candidates for making the investors’ day, but I will leave those to your imagination. Will you run your business on a Palantir system in the next month or two? Probably not.

Stephen E Arnold, June 27, 2010

Freebie

Real Time Search Systems, Part 4

June 24, 2010

Editor’s note: In this final snippet from my June 15 and June 17, 2010, lectures, I want to relate the challenge of real-time content to the notion of “aboutness.” “Aboutness” is an old bit of jargon, and I have appropriated the term to embrace the semantic methods necessary to add context to information generated by individuals using such systems as blogging software, Facebook, and Twitter. These three content sources are representative only, and you can toss in any other ephemeral editorial engine you wish. The “aboutness” challenge is that a system must process both activity and content. “Activity” refers to who did what, when, and where. The circumstances are useful as well. “Content” refers to the message payload. Appreciate that some message payloads may be rich media, disinformation, or crazy stuff. Figuring out which digital chunk has value for a particular information need is a tough job. No one, to my knowledge, has it right. Heck, people don’t know what “real time” means. The more subtle aspects of the information objects are not on the radar for most of the people in the industry with whom I am acquainted.
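To make the activity / content split concrete, here is a minimal sketch of a real time item carrying both parts. The structure is my own construction, not any vendor’s format:

```python
from dataclasses import dataclass

@dataclass
class RealTimeItem:
    # "Activity": who did what, when, and where; the circumstances.
    actor: str
    action: str
    timestamp: str          # ISO 8601, e.g. "2010-06-24T14:02:11Z"
    place: str = ""
    # "Content": the message payload, which may be text, rich media,
    # disinformation, or plain crazy stuff.
    payload: str = ""
    payload_kind: str = "text"   # "text", "image", "video", ...

item = RealTimeItem(actor="@user123", action="tweet",
                    timestamp="2010-06-24T14:02:11Z",
                    payload="Lady Gaga at the airport!!")
print(item.actor, item.payload_kind)
```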

Semantics

I hate defining terms. There is always a pedant or a frustrated PhD eager to set me straight. Here’s what I mean when I use the buzzword “semantic”: a numerical recipe figures out what something is about. Other points I try to remember to mention include (a toy sketch follows the list):

  • Algorithms or humans or both looking at messages, trying to map content to concepts or synonyms
  • Numerical recipes that send content through a digital rendering plant in order to process words, sentences, and documents and add value to the information object
  • Methods that figure out or use probabilities to take a stab at the context for an information object
  • Systems that spit out Related Terms or Use For terms
  • A way to occupy PhD candidates, Googlers, and 20-something MBAs in search of the next big thing
  • A discussion topic for government committees nailing down the concept before heading out early on a Friday afternoon.
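Here is the promised toy sketch. It is a deliberately crude illustration of the input/output shape of a semantic tagger, not anyone’s production method:

```python
# Toy sketch only: a crude "semantic" tagger that maps message words to
# concepts via a synonym table and emits related terms. Real systems use
# statistical models; this only illustrates the input/output shape.
SYNONYMS = {"gaga": "lady gaga", "ladygaga": "lady gaga"}   # "Use For" mappings
CONCEPTS = {"lady gaga": {"concept": "pop music",
                          "related": ["celebrity", "fame"]}}

def tag(message):
    tags = []
    for word in message.lower().split():
        canonical = SYNONYMS.get(word, word)   # normalize variant spellings
        if canonical in CONCEPTS:
            entry = CONCEPTS[canonical]
            tags.append({"term": canonical,
                         "concept": entry["concept"],
                         "related": entry["related"]})
    return tags

print(tag("GaGa spotted downtown"))
# [{'term': 'lady gaga', 'concept': 'pop music', 'related': ['celebrity', 'fame']}]
```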

When semantics is figured out and applied, the meaning of Lady Gaga becomes apprehendable to a goose like me:

[Image: Lady Gaga example]

In order to tackle the semantics of a real time content object, two types of inputs are needed. The first is activity data: monitoring who does what and when. The second is the information object itself. When the real time system converts digital pork into a high value wiener, the metadata and the content representation become more valuable than the individual content objects. This is an important concept, and I am not going to go into detail. I will show you the index / content representation diagram I used in my lectures:

[Image: index / content representation diagram]

The nifty thing is that when a system or a human beats on the index / content representation, the amount of real time information increases. The outputs become inputs to the index / content representation, and as users keep hammering away, the value of the metadata goes up.
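A minimal sketch of that loop, my own illustration, not a diagram from the lectures:

```python
# Sketch of the outputs-become-inputs loop: every time a user beats on the
# index, the interaction is folded back in as fresh metadata.
index = {"lady gaga": {"doc_ids": [1, 7], "query_count": 0, "co_queried": []}}

def search(term, session_terms=()):
    entry = index.get(term)
    if entry is None:
        return []
    entry["query_count"] += 1                  # usage signal: output becomes input
    entry["co_queried"].extend(session_terms)  # context gleaned from the session
    return entry["doc_ids"]

search("lady gaga", session_terms=["pop music"])
search("lady gaga")
print(index["lady gaga"])
# {'doc_ids': [1, 7], 'query_count': 2, 'co_queried': ['pop music']}
```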


Real Time Search Systems, Part 3

June 23, 2010

Editor’s note: This is the draft taxonomy of real time systems that I discussed in my June 15 and 17 lectures. It may or may not make sense, but I wanted to make clear that the broad use of the phrase “real time” does not convey much meaning to me. The partial fix, short of incarcerating the marketers who slap “real time” on their brochures, is to come up with “types” of real time information. The type helps make clear the cost and other characteristic features of a system sporting the label “real time”.

Stop and think about the difference in user expectations between an investment firm and a middle school child processing information. The greed mongers want to get the freshest information possible to make the maximum return on each bet or investment. The middle school kid wants to make fun of a teacher.

The greed mongers spend millions for Fancy Systems from Thomson Reuters, Exegy, or a similar specialist. The reason is that if the Morgan Stanley Type As get bond information a few milliseconds after the God loving folks at Goldman, lots of dough can slip through the clutching paws of the person responsible for a trade. With a great deal at stake, real time means in milliseconds.

The middle school wit is happy with whatever happens as long as the teacher remains blissfully ignorant of the message. If the recipient lets out a hoot, then there may be consequences, but the downside is less painful than what happens to the crafty Wall Street wonder.
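The difference can be stated in a few lines of code. The tier names and latency budgets in this sketch are placeholders of my own, not the labels from the taxonomy figure:

```python
# Hypothetical tiers for "real time": the names and latency budgets are my
# own placeholders. The point is that "real time" only means something once
# a latency budget, and a cost, is attached to the label.
REAL_TIME_TIERS = [
    ("machine speed",  0.005,  "trading desk"),        # milliseconds matter
    ("near real time", 60.0,   "monitoring service"),  # about a minute
    ("fresh enough",   3600.0, "middle school wit"),   # within the hour
]

def classify(delay_seconds):
    for label, budget, _buyer in REAL_TIME_TIERS:
        if delay_seconds <= budget:
            return label
    return "not real time by any stretch"

print(classify(0.002))   # machine speed
print(classify(90.0))    # fresh enough
```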

The figure below presents the draft taxonomy. If you find it silly, no problem. If you rip it off, a back link would be a nice gesture, but I don’t have any illusions about how stateless users conduct themselves.

[Image: draft real time taxonomy]

Where does the latency originate? The diagram accompanying the full post gives the tech sleuth some places to investigate. The lack of detail is intentional. Free blog, remember?


Real Time Search Systems, Part 2

June 22, 2010

Editor’s note: This post tiptoes through the tulips. In this instance, tulips is a synonym for industrial strength content processing systems that can be licensed by commercial entities, governmental organizations, or individuals who want to become a baby Fuld or Kroll. Achieving this type of azure chip transcendence means that you will be a hit at the local bingo parlor when you share your insights with your table mates.

Industrial Strength Tools

The free services don’t provide the user with much in the way of post processing horsepower. Another weakness of free services is that the average user deals with what each system spits out in response to a click or a query. The industrial strength systems provide such functions as:

A system or method for “plugging in” different streams of content. Examples range from electronic mail in the wonderful Microsoft Exchange Server to proprietary content stuffed into a clunky content management system. These connectors are a big deal because without different inputs of content, a real time search engine does not have the wood to burn in the fire box.
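In code terms, a connector is just a uniform adapter over a messy source. A generic sketch, with invented class names and no relation to any vendor’s actual API:

```python
# Generic connector sketch, not any vendor's actual API: each source,
# however messy, is adapted to yield plain documents so the downstream
# pipeline has wood for the fire box.
class ExchangeConnector:
    """Hypothetical adapter for a mail store."""
    def fetch(self):
        yield {"source": "exchange", "text": "Q3 numbers attached, see PDF"}

class CMSConnector:
    """Hypothetical adapter for a clunky content management system."""
    def fetch(self):
        yield {"source": "cms", "text": "Press release draft v7"}
```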

Each system provides or supports some type of software circuit board. The idea is that the content moves from the connectors over the circuits on the circuit board to its destination. Acquired content must be processed, so its first destination is a system or systems which extract data, generate metadata, and, in the case of Google, figure out the context of the message. The result is an index that contains index terms, metadata, and often such extras as a representation of the source message, precalculated values, and new information constructs.

Applications or “hooks” that make it possible for another software program to tap into the generated values and processed content to create an output. The outputs can vary widely. One software system may just look up an item. Another might glue together different items from the index and content representation. The user sees a report, a display on a mobile phone, or maybe a mashup which allows the human to “recognize” or “spot” what’s needed. No searching required.
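Putting the three functions together, a skeletal pipeline might look like the sketch below, which reuses the toy connectors from the previous snippet. Again, this is a generic illustration, not a picture of any named vendor’s architecture:

```python
# Skeletal pipeline: content flows from connectors through a processing
# stage into an index, and a "hook" lets other software tap the result.
INDEX = []

def process(doc):
    """Toy processing stage: extract index terms and a little metadata."""
    return {"terms": set(doc["text"].lower().split()),
            "metadata": {"source": doc["source"]},
            "summary": doc["text"][:40]}   # crude content representation

def ingest(connectors):
    for conn in connectors:
        for doc in conn.fetch():
            INDEX.append(process(doc))

def lookup(term):
    """A 'hook': another application taps the index; no search box required."""
    return [entry for entry in INDEX if term in entry["terms"]]

ingest([ExchangeConnector(), CMSConnector()])
print(lookup("press"))   # finds the CMS document
```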

The Vendors

In my lectures I mentioned some different outfits in each of my two talks. I have rolled up the vendors in the list below. My suggestion is to do some research about each of these companies. I provide “additional color” on the technologies each vendor licenses, but that information is not going to find its way into a free blog posting. Problem? Read the About information available from the tab at the top of this page.

  • Exalead http://www.exalead.com Robust system which handles structured and unstructured data. Outputs may be piped to other enterprise software, a report, or a peripatetic worker with a mobile phone in Starbucks.
  • Fetch Technologies http://fetch.com Developed initially for certain interesting government information needs. You can customize Fetch using its graphical programming method and perform some quite useful analyses.
  • JackBe http://www.jackbe.com Developed initially for certain interesting government information needs. You can license JackBe and process a wide range of content.
  • Silobreaker http://www.silobreaker.com Developed initially for certain interesting government information needs. Silobreaker can output reports that are as good as the roll ups crafted by a trained intelligence professional.

What do these systems do in “real time”? Each of them, when properly resourced, can ingest flows of data and unstructured content, assign metadata, and output alerts, reports, or Google-style search results within minutes of the content becoming known to the system.


Endeca and Business Intelligence

June 14, 2010

We are ready for a long vacation here at the goose pond. Most search and content processing companies are plotting their fall marketing campaigns. For Endeca, the time is right for cranking the knob on the firm’s marketing activities. A recent example is “One on One with Paul Sonderegger”. Mr. Sonderegger is an Endeca executive who makes frequent appearances at conferences and in some cases represents the public face of the company.

There were several interesting points that emerged in the interview. Let me highlight several and urge you to read the original interview.

First, Endeca has more than 600 customers, including Boeing, the US Defense Intelligence Agency, Ford Motor Company, Texas Instruments, and Walmart. (I had heard that Walmart was using Google search technology. That’s not surprising since most organizations have five or more search systems each working happily away.)

Second, Endeca, like Attivio and a number of other search vendors, is doing what the 20 somethings call “up leveling”. The idea is that selling search is less lucrative than selling an enterprise solution that the Board of Directors, the CEO, and the CFO perceive as delivering “value” to the organization. Mr. Sonderegger said:

Traditional BI tools are very good at reporting on structured data and answering questions the company knew to ask ahead of time. But today there is a greater need for information discovery so that people can answer the questions they just discovered mattered, in the moment, and make it part of the decision-making process. Endeca’s search capabilities facilitate this type of self-service discovery on both structured and unstructured or jagged data sources – such as documents or e-mails, empowering users to ask and answer questions of all types of data.

Third, the notion of “dashboarding” is now a hook on which Endeca and other vendors hang licensing deals. A “dashboard” displays essential information in one interface. When I hear the word “dashboard” I think of the crazy information system in a big BMW, but I get the idea. Mr. Sonderegger noted:

Our technology complements existing BI systems. Every one of our customers already has at least one reporting platform in place and it reliably publishes valuable reports. However, each of those reports inspires follow-on questions. And those questions change depending on what matters to that person right then. The convergence of BI and search technologies reveals relationships in the data that lead to unanticipated answers or new insights – even if no one knew ahead of time those exact questions would be asked.

The idea is that Endeca does not require a rip-and-replace approach. Endeca’s method adds value and delivers a dynamic dashboard.

When I think of Endeca, I think about eCommerce. The business intelligence capabilities of Endeca, as I seem to recall, have been part of the system for a number of years. In today’s market, Endeca may be reminding prospects that it has more capabilities than powering online shopping.

Stephen E Arnold, June 14, 2010

Freebie

Business Intelligence Firm Sells for $1.13 Billion

June 8, 2010

Kroll is not a household name. If your house includes an intelligence or police professional, you may have a Kroll T shirt somewhere. The company was part of Marsh McLennan, an outfit that looks like an insurance company. I am not going to sort out what Marsh’s business interests are or explain the Kroll set up. You can get some information in this interview with a Kroll executive. No, don’t ask how I know him.

Kroll is now part of another outfit you probably never heard of either: Altegrity. Read “Kroll-Altegrity: A Reunion of Sorts.” Why’s this important? I think other outfits in this market sector hope to be acquired. In my addled goose view, I don’t think the Marsh executives were sad to see Kroll say adios. There was the money, and the management effort required to stay in the Kroll line of work is demanding.

Who else is in the Kroll business? Sorry. Not for free.

Stephen E Arnold, June 8, 2010

Freebie.

Top 1,000 Sites: Interesting and Odd

May 29, 2010

You can get Google’s version of the Top 1,000 Web sites via the DoubleClick Ad Planner. There are some anomalies. I could not spot Google.com or YouTube.com. Microsoft’s sites were not rolled up but presented as individual sites; for example, Live.com at #3, MSN.com at #5, Microsoft.com at #6, and Bing.com at #14. Adobe got the same handling. The approach makes sense. A notable red herring link was Com.com, which points to Cnet.com. It was a surprise that Ca.gov made the list at #565 and that the UK’s Direct.gov.uk came in at #803.

I did spot several of the much-loved US government Web sites. The National Institutes of Health was #176. The IRS was #288, ahead of Hulu.com at #292. NASA was #604. The Department of Education was #762. The USGS turned up at #978. The other US government entities were presumably outside the Top 1,000.

Google’s star crossed social networking service Orkut was #45 with 45 million visitors. Facebook, according to the Google report, has 540 million visitors. To get an idea of the variance between the Google data and the Nielsen data, compare some high profile companies. I looked at Nielsen’s April traffic data for Apple. Nielsen reported 61.158 million uniques. Google reported 72 million. Similar differences pepper traffic league tables. Which is less incorrect? I average the two, which is close enough for horseshoes in my opinion.

The “truth” appears in log files. The problem is that comprehensive log file analysis is a challenge in many organizations. Net net: Some Web site operators may not know the hard count.
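Counting uniques from your own logs is conceptually trivial; doing it comprehensively is the hard part. A minimal sketch for a common log format file, assuming the client address is the first field on each line:

```python
# Minimal sketch: tally unique visitors from a common log format access log.
# Deliberately naive: real analysis must cope with proxies, bots, sessions,
# and consolidating logs across servers, which is why hard counts are rare.
def unique_visitors(log_path):
    seen = set()
    with open(log_path) as handle:
        for line in handle:
            ip = line.split(" ", 1)[0]   # client address is the first field
            seen.add(ip)
    return len(seen)

# Usage: print(unique_visitors("access.log"))
```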

Stephen E Arnold

SAS Text Analytics and Teragram

May 28, 2010

I received a call about Teragram, the text processing company that SAS acquired a couple of years ago. I did a quick Overflight check and realized that I had not documented the absorption of Teragram into SAS. Teragram’s technology is alive and well, but SAS now positions content processing as a component of SAS Text Analytics. The product has its own subsite within SAS.com. You can locate the details at http://www.sas.com/text-analytics/.

Another important point is that SAS Text Analytics includes four components. The first is SAS Enterprise Content Categorization. The system parses content, identifies entities, and creates metadata along with category rules.

The second function is SAS Sentiment Analysis. A number of companies are competing in this sector. The SAS approach sucks in emails, tweets, and other documents. The system identifies various subjective shades in the source content.

The third component, SAS Text Miner, now includes both text and data mining operations. It is not one of those Web 2.0 “it is really easy” solutions. The system is easy to use, but to put “easy” in context, you will need programming and statistical savvy along with solid data set building skills.

The fourth component, SAS Ontology Management, provides a centralized method for keeping index terms and metatags consistent. This sounds easy, but this type of consistency is the difference between useful and useless information. SharePoint lacks this type of functionality. You have been given a gentle reminder about consistent tagging, dear SharePoint user.
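A toy illustration of why centralized term control matters. This sketch is mine, not a peek at SAS internals: without a controlled vocabulary, one concept fragments into three tags; with one, retrieval stays whole:

```python
# Toy sketch: a controlled vocabulary maps variant tags to one preferred
# term, so documents tagged "Cars", "autos", and "automobiles" stay findable
# under a single heading. Nothing here reflects SAS's implementation.
USE_FOR = {"cars": "automobiles", "autos": "automobiles"}

def normalize(tag):
    return USE_FOR.get(tag.lower(), tag.lower())

raw_tags = ["Cars", "autos", "automobiles"]
print({normalize(t) for t in raw_tags})   # {'automobiles'}, one term, not three
```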

SAS has a blog focused on text analytics. You can read “The Text Frontier” but last time I checked, the blog’s most recent update was posted in March 2010.

Bottom line: Teragram is alive and well, just part of SAS Text Analytics.

Stephen E Arnold, May 28, 2010

Freebie

Exalead and Dassault Tie Up, Users Benefit

May 24, 2010

A happy quack to the reader who alerted us to another win by Exalead.

Dassault Systèmes (DS) (Euronext Paris: #13065, DSY.PA), one of the world leaders in 3D and Product Lifecycle Management (PLM) solutions, announced an OEM agreement with Exalead, a global software provider in the enterprise and Web search market. As a result of this partnership, Dassault will deliver discovery and advanced PLM enterprise search capabilities within the Dassault ENOVIA V6 solutions.

The Exalead CloudView OEM edition is dedicated to ISVs and integrators who want to differentiate their solutions with high-performing and highly scalable embedded search capabilities. Built on an open, modular architecture, Exalead CloudView uses minimal hardware but provides high scalability, which helps reduce overall costs. Additionally, Exalead’s CloudView uses advanced semantic technologies to analyze, categorize, enhance and align data automatically. Users benefit from more accurate, precise and relevant search results.

This partnership with Exalead demonstrates the unique capabilities of ENOVIA’s V6 PLM solutions to serve as an open federation, indexing and data warehouse platform for process and user data, for customers across multiple industries. Dassault Systèmes PLM users will benefit from its Exalead-empowered ENOVIA V6 solutions to handle large data volumes thus enabling PLM enterprise data to be easily discovered, indexed and instantaneously available for real-time search and intelligent navigation. Non-experts will have the opportunity to access PLM know-how and knowledge with the simplicity and the performance of the Web in scalable online collaborative environments. Moreover, PLM creators and collaborators will be able to instantly find IP from any generic, business, product and social content and turn it into actionable intelligence.

Stephen E Arnold, May 22, 2010

Freebie.

KB Crawl to Release New Version of KB Crawl

May 23, 2010

The 2010 I-Expo in Paris, June 9-10, will be the forum for the release of the new KB BI Platform. UK Web site here. French Web site here. In an effort to improve the management of strategic information from the Web, the additional KB Crawl modules will allow users to optimize and personalize their Web monitoring systems. Technologically, the new release provides an entirely SaaS (Software as a Service) architecture, which means a licensee does not need to involve its IT department to set up a monitoring project. KB Crawl SAS hosts everything in the cloud and guarantees the confidentiality of data. With the KB Crawl Suite’s integrated software, continuous monitoring of the Web to disseminate information and intelligence reports is at your fingertips. The BI Platform enables you to collect, manage, and disseminate strategic information collaboratively. KB Crawl has become one of the leading French players in market intelligence and Internet monitoring, and this new release is in line with that pattern. For more information, navigate to the KB Crawl Web site and download the white paper “KB Crawl 4 and Specialist Modules”.

Melody K. Smith, May 23, 2010

Note: Post was not sponsored.
