Training Wheels for Business Intelligence?
September 17, 2009
Business intelligence is not like riding a bicycle. In fact, business intelligence requires quite a bit of statistical and mathematical sophistication. Some pundits and marketers believe that visualization will make the outputs of business intelligence systems “actionable”. I don’t agree. There’s another faction in business intelligence who see search as the solution to the brutal costs and complexities of business intelligence. I am on the fence about this “solution” for three reasons. First, if the underlying data are lousy, the outputs are lousy and the user is often none the wiser. Second, the notion of “search” is an interface spin. The user types a query and the system transforms the query into something the system can understand. What if the transformation goes off the tracks? The user is often none the wiser. Third, the notion of visualization combined with search is a typical marketing play: take two undefined notions which sound really good and glue them together. The result is an even more slippery term which, of course, no one defines with mathematical or financial precision.
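To make the "transformation" worry concrete, here is a minimal Python sketch of a naive search layer that turns typed words into a structured query. The field names, the synonym table, and `transform_query` are hypothetical; no BI vendor's actual query parser is implied. The point is only that words the translator cannot interpret can vanish without a trace.

```python
# Hypothetical mapping from words a user might type to fields in a
# structured query. Real BI search layers are far more elaborate.
FIELD_SYNONYMS = {
    "revenue": ("measure", "revenue"),
    "sales": ("measure", "revenue"),
    "region": ("group_by", "region"),
    "product": ("group_by", "product"),
}


def transform_query(query: str) -> tuple[dict[str, str], list[str]]:
    """Translate a free-text query into structured fields.

    Returns the structured query plus the words the translator could
    not interpret. A naive interface shows only the structured result,
    so the user never learns that part of the question was discarded.
    """
    structured: dict[str, str] = {}
    ignored: list[str] = []
    for word in query.lower().split():
        if word.isdigit() and len(word) == 4:
            structured["year"] = word  # assume four-digit numbers are years
        elif word in FIELD_SYNONYMS:
            field, value = FIELD_SYNONYMS[word]
            structured[field] = value
        else:
            ignored.append(word)  # silently dropped by a naive interface
    return structured, ignored
```

With the query "sales by region excluding returns 2008", the sketch produces a structured query for revenue by region for 2008, while "excluding" and "returns" land in the ignored list. An interface that displays only the structured result would hand the user numbers that include returns, and the user would be none the wiser.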
Now read Channel Web’s “Visualization, Search, Among Emerging Trends in BI”, and you will see how the trade press creates a sense of purpose, movement, and innovation without providing any substance. The source of the article is none other than the azure chip consultancy, the Gartner Group. I wrote about the firm’s assertion that no one can “copy” its information. I know at least one reason no one would want to: on this goose’s railroad runs, I find quite a few of the firm’s assertions off the tracks.
Here’s the key passage in the Channel Web write up for me:
Schlegel identified seven emerging trends that will be key drivers for BI implementations, perhaps even down to the consumer level, in the future. The trends are: interactive visualization, in-memory analytics, BI integrated search, Software-as-a-Service, SOA/mash-ups, predictive modeling and social networking software. "A lot of technologies we’ll talk about to help build BI systems don’t even exist today, but some are right around the corner," he said. "Business intelligence can break out of the corporate world. Usually it’s consumer technology moving into the corporate world. I think it could be the other way around."
“Intelligence”, in my opinion, is an art or practice supported by human and machine-centric systems. Business intelligence remains a niche business because the vendors who market business intelligence systems rely on structured data and on statistical routines taught in second and third year stats classes, anchored in the programming tools from SAS and SPSS (now a unit of IBM). (By the way, IBM now owns Cognos and SPSS, which seems to me a market share play, not a technology play.)
The end of enterprise libraries caused a vacuum in some organizations’ information access. The “regular” business intelligence unit focused on structured data and generating reports that look pretty much like the green bar reports I obtained from stats routines in the mid 1960s. To say that business intelligence methods are anchored in tradition is a bit of an understatement.
The surge in end user access to information on the Internet has thrown a curve to the business intelligence establishment. In response, SAS, for example, licensed the Inxight tools to process information and then purchased Teragram to obtain more of the “unstructured text goodness” that was lacking in traditional SAS installations. New vendors such as Attivio and Clarabridge have exploited this gap in the traditional Business Objects (now part of SAP and owner of Inxight), Cognos, SAS, and SPSS product offerings. I am not sure how successful these “crossover” companies will be. Clarabridge seems to have an edge because its technology plays well with MicroStrategy’s Version 9 system. Attivio is in more of a “go it alone” mode.
With Google’s Fusion Tables and WolframAlpha’s “search” service, there is increasing pressure on business intelligence vendors to:
- Cut prices
- Improve return on investment
- Handle transformation and metatagging of unstructured information
- Deliver better for fee outputs than the math folks from Google and Wolfram do for free.
My hunch is that the Gartner position reflects the traditional world of business intelligence and is designed to sell consulting services, maybe a conference or two.
Much can be done to enhance the usability of business intelligence. I think that in certain situations, visualization tools can clarify certain types of data. The notion of a search interface is a more complicated challenge. My research suggests that Google’s research into converting a query into a useful query that works across fact based information is light years ahead of what’s referenced in the trade publications and most consultants’ descriptions of next generation business intelligence.
When structured and unstructured content are processed in a meaningful way, new types of queries become possible. The outputs of these new types of queries deliver useful business intelligence. My view is that much of business intelligence is going to be disrupted when Google makes available some of its innovations.
In the meantime, the comfortable world of business intelligence will cruise along with incremental improvements until the Google disruption, if it takes place, reworks the landscape. Odds are 70 – 30 for Google to surprise the business intelligence world in the next six to nine months. Fusion Tables are baby steps.
Stephen Arnold, September 17, 2009
European Search Vendor Round Up
September 16, 2009
Updated at 8:29 am, September 17, 2009, to 23 vendors
I received a call from a very energetic, quite important investment wizard from a “big” financial firm yesterday. Based in Europe, the caller was having a bad hair day, and he seemed pushy, almost angry. I couldn’t figure out why he was out of sorts and why he was calling me. I asked him. He said, “I read your Web log and you annoy me with your poor coverage of European search vendors.”
I had to admit that I was baffled. I mentioned the companies that I tracked. But he wanted me to do more. I pointed out that the Web log is a marketing vehicle and he can pay me to cover his favorite investment in search. That really set him off. He wanted me to be a journalist (whatever that meant) and provide more detailed information about European vendors. And for free.
Right.
After the call, I took a moment and went through my files to see which European vendors I have mentioned and the general impression I have of each of these companies. The table below summarizes the companies I have either profiled in my for fee studies or mentioned in this diary / marketing Web log. You may disagree with my opinions. I know that the azure chip consultants at Gartner, Ovum, Forrester, and others certainly do. But that’s understandable. The addled geese here in Harrod’s Creek actually install systems and test them, a step most of the azure chip crowd just don’t have time for because of their exciting work generating enough revenue to keep the lights on, advising clients, and conducting social network marketing events. Just my opinion, folks. I am entitled to my opinions despite the widespread belief that I should be in the Happy Geese Retirement Home.
Vendor | Function | Opinion |
Autonomy | Search and eDiscovery | One of the key players in content processing; good marketing |
Bitext | Semantic components | Impressive technology |
Brox | Open source semantic tools | Energetic, marketing centric open source play |
Empolis GmbH | Information management and business intel | No cash tie with Attensity |
Exalead | Next generation application platform | The leader in search and content processing technology |
Expert System | Semantic toolkit | Works; can be tricky to get working the way the goslings want |
Fast ESP | Enterprise search, business intelligence, and everything else | Legacy of a police investigation hangs over the core technology |
InfoFinder | Full featured enterprise search system | My contact in Europe reports that this is a European technology. Listed customers are mostly in Norway. |
Interse Scan Jour | SharePoint enterprise search alternative | Based in Copenhagen, the Interse system adds useful access functions to SharePoint; sold in Dec 2008 |
Intellisearch | Enterprise search; closed US office | Basic search positioned as a one size fits all system |
Lemur Consulting | Flax is a robust enterprise search system | I have written positively about this system. Continues to improve with each release of the open source engine. |
Lexalytics | Sentiment analysis tools | A no cash merger between a US company and UK based Infonics |
Linguamatics | Content processing focused on pharma | Insists that it does not have a price list |
Living-e AG | Information management | No cash tie with Attensity |
Mindbreeze | Another SharePoint snap in for search | Trying hard; interface confusing to some goslings |
Neofonie | Vertical search | Founded in the late 1990s, created Fireball.de |
Ontoprise GmbH | Semantic search | The firm’s semantic Web infrastructure product, OntoBroker, is at Version 5.3 |
Pertimm | Enterprise search | Now positioned as information management |
PolySpot | Enterprise search with workflow | Now at Version 4.8, search, workflow, and faceted navigation |
SAP Trex | Search tool in NetWeaver; works with R/3 content | Works; getting long in the tooth |
Sinequa | Enterprise search with workflow | Now at Version 7, the system includes linguistic tools |
Sowsoft | High speed desktop search | Excellent, lightweight desktop search |
SurfRay | Now focused on SharePoint | Uncertain; emerging from some business uncertainties |
Temis | Content processing and discovery | Original code and integrated components |
Tesuji | Lucene enterprise search | Highly usable and speedy; recommended for open source installations |
Updated at 8:29 am Eastern, September 17, 2009
SurfRay Reloaded
September 14, 2009
A happy quack to the reader who alerted me to the news about the reappearance of SurfRay, a company that dropped off my radar. The firm has announced a new version of Ontolica via PR Newswire. You can read the news release at the PR Newswire Web site. Note that PR Newswire links can go dark, so if this SharePoint compatible product interests you, you may want to do some sleuthing. The release, “SurfRay Announces Availability of Ontolica 4.0 for SharePoint, With New Reporting and Analytics Module”, asserts new reporting and analytics features. Furthermore, existing customers can upgrade for free through October 20, 2009. The Beyond Search team has not had an opportunity to kick the tires of this product, although we did request information when rumors of the release reached us in Harrod’s Creek. You can get more information about the company at its Web site or by running this Devilfinder metasearch string. The product appears to compete in the same sector as Interse (also based in Denmark) and BA Insight (US). Some of the functionality asserted by SurfRay may be found in Coveo’s and Exalead’s SharePoint compatible systems. Adhere Solutions (owned by a Beyond Search gosling) offers software that makes it possible to use the Google Search Appliance to search, slice, and dice SharePoint content. With important announcements about Fast ESP (Microsoft’s enterprise search solution for large scale SharePoint installations), organizations with SharePoint have a large number of options to consider. The question that continues to flap around the goose pond is, “How can an organization determine which SharePoint solution is the appropriate one for that particular organization?” Marketing, not technology, seems to be the knife edge at the present time. Little wonder the geese at Beyond Search are addled. What a cornucopia of choices exists for the 100 million happy SharePoint license holders (if we accept the broad market size rumors bruited at conferences).
Stephen Arnold, September 14, 2009
Very Large Databases – Googzilla Being Coy
August 31, 2009
I read Technofeel’s “VLDB09 Part Two” and noted another Google head fake. Technofeel points out that Google’s paw prints were all over the conference from his point of view. MapReduce and Hadoop (an open source semi MapReduce) presentations caught his attention. In my opinion, the most interesting comment in the write up was:
Finally, I ended my visit at VLDB09 with two presentation of Google Interns about data mining to get structured result sets out of semi unstructured pages with lists and tables.
These two Google papers are important. You can get links to them from Technofeel’s article. Let me make two or three observations:
- The use of “interns” is a way for the Google to reward bright folks while keeping the big guns off the podium. The experience of the Google Books product manager makes this use of interns prudent.
- The content of the papers is not intern grade. When you work through the two documents, you will learn that Google has made significant advances in methods for working out issues in manipulating Google-scale structured data and discerning context.
- The traditional world of relational databases is on a collision course with Googzilla. Big data are part of the Google core competency.
Those are some interns, because their co-authors are among Google’s most sophisticated researchers and academic colleagues. Technofeel’s instincts are good. He may want to check the bios of the secondary and tertiary authors of these Google papers. The interns are not the hubs on these wheels.
Stephen Arnold, August 31, 2009
Data Warehouse Leader to Reinvent Data Warehousing
August 26, 2009
“IBM Announces ‘Smart Analytics System’ Aimed at Reinventing Data Warehousing” reminded me of Einstein’s discomfort with some of the implications of his theory of relativity. Invent one thing, then scramble to find a way to deal with problems that won’t go away. IBM, one might assert, invented data warehousing. It was an IBM researcher who developed our old friend the relational database. The Codd approach has been the big dog in data management for a long time. Options are now becoming more widely available, but when one says, “Data warehousing”, I think IBM. That’s why I am an addled goose I suppose.
Mr. Data Warehouse. Image source: http://en.wikipedia.org/wiki/Edgar_F._Codd
This article-interview makes clear that something is not right in IBM land. For me, the most suggestive comment in the Intelligent Enterprise write up was this passage:
Though IBM is promising better performance, a big part of the appeal seems to be targeted at executives who would favor contract simplicity and a single “throat to choke” over enterprising, but potentially riskier, in-house development, integration and innovation.
The “reinvention” seems to me to be little more than fixing responsibility for a mission critical system on a company big enough to take to court if the data warehouse has a leaking roof. In my experience, these traditional data warehouses have more problems than a fast-build Shanghai apartment building.
My thought is to take a hard look at the assumptions about data warehousing, then poke into some options. Dare I suggest Aster Data? What about a Perfect Search enabled system?
Stephen Arnold, August 26, 2009
Metadata: Not Delivering and Dying
August 26, 2009
I watched a year ago as dozens of people filed into a program called “the drill instructor’s approach to metadata” or something that suggested a Marine Corps physical training session. Yep, I thought, metadata in a day. I flapped my tail feathers and waddled on by the room stuffed with people who paid hundreds of dollars to get a knowledge injection.
Metadata is not exactly a botox injection that worked particularly well.
Lousy metadata produces a result that can be unexpected.
The notion of adding specific index terms to a content object is simple on the surface, but the indexing and tagging are intellectual walnuts. Get the terms wrong and no one can find documents because no one uses those words. Get the categories wrong and the helpful folders are like lumber rooms filled with odds and ends. Try to fix these problems, and the average MBA or art history major falls to the floor with their ankles bound by torn garments.
I quite enjoyed “Resuscitating Your Dying Metadata Strategy.” The title evoked an image of a gasping automated indexing system with three or four consultants poking at an intellectual body lying face down on the content processing vendor’s license agreement. And the word “dying” was a good one. There is a certain urgency to the word. “Sickly” denotes that a recovery may be likely. “Dying” suggests that I flip to Google Local to identify a funeral home.
The key segment of the article in my opinion was this passage:
a large number of IT professionals know intuitively that metadata management is the right thing to do, but have a hard time articulating why they need it. Also they admit a lack of engagement and collaboration with business stakeholders they are aiming to help. They also often have failed attempts to get metadata efforts off the ground in the past and are trying to fast track something…anything! So how can IT reverse this trend? They need to better scope and prioritize their metadata efforts by building a more realistic business case that can demonstrate real value-add.
The touchstones for me are the notion of a disconnect between users and information technology professionals. Then there is the notion that a lack of intellectual rigor and perhaps expertise have created problems. The organization wants a silver bullet.
Yes, this sounds familiar.
Metadata are important. The addled goose has no quick fixes to offer. The controlled terms that were once the strength of commercial databases such as ABI / INFORM are no longer valued. Creating consistent, useful controlled term lists and developing meaningful classification systems takes time and effort. Once these lists are in hand, the terms can be applied via human or “smart” systems. The moment the lists and classification systems are completed, the work begins to keep these lists in step with language. Sci tech terminology drifts less quickly than general business terminology.
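Here is a minimal Python sketch of what "applying" a controlled term list can mean in its simplest form: variant phrasings are mapped to a preferred term, and a document is tagged with every preferred term whose variants it contains. The vocabulary entries and the `assign_terms` function are hypothetical illustrations, not the ABI / INFORM method or any vendor's system; real indexing, human or automated, involves far richer matching and judgment.

```python
import re

# Hypothetical controlled vocabulary: each preferred term maps to the
# variant phrasings that should be indexed under it.
CONTROLLED_VOCABULARY = {
    "Mergers & acquisitions": ["merger", "acquisition", "takeover", "buyout"],
    "Customer service": ["customer support", "help desk", "call center"],
    "Data warehousing": ["data warehouse", "data mart"],
}


def assign_terms(text: str, vocabulary: dict[str, list[str]]) -> list[str]:
    """Return the sorted preferred terms whose variants appear in text.

    Matching is whole-word and case-insensitive, with an optional
    trailing "s" to catch simple plurals. Anything not on the list,
    such as a new spelling, is simply missed.
    """
    found = []
    for preferred, variants in vocabulary.items():
        for variant in variants + [preferred]:
            pattern = r"\b" + re.escape(variant) + r"s?\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.append(preferred)
                break  # one match is enough for this preferred term
    return sorted(found)
```

The sketch tags "The takeover closed after the help desk was outsourced." with both "Customer service" and "Mergers & acquisitions", but it assigns nothing to "The buy-out was approved." because the hyphenated spelling is not on the list. That gap is the maintenance problem in miniature: the list has to keep pace with the way people actually write.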
The message is that an organization must continue to invest in complex, knowledge centric work. In my experience few organizations have the appetite for this activity. Quite a few folks who buy commercial databases in order to create a knowledge monopoly invest too little to keep their information products’ indexing up to snuff. The newcomers spend some money and time but fall into the trap of finding a Hollywood doctor to administer a quick botox injection to hide a wrinkle before an audition.
The folks who work at metadata often find themselves ignored. A good example is the 500,000+ categories generated by the Google. You can see a bit of this system in action if you run this query, verified at 8 am on August 25, 2009: “skin cancer”. Here is the result list I saw:
Based on my research, Google has been plugging away at metadata and making progress. Organizations faced with revivifying their dying metadata systems may want to learn from their errors and their consultants’ silly promises about certain automated systems. Maybe Google will make its metadata systems available someday? Maybe one of the graduates of the drill instructor programs that teach taxonomy will discover a silver bullet that is easy, cheap, and fast?
The addled goose’s team does controlled vocabularies the old-fashioned way, working with partners like Access Innovations, a company with automated systems and the deep experience required to tackle metadata in an informed way. No wonder he is paddling alone and thinking of the good old days when the ABI / INFORM and the Business Dateline teams worked each week to refine their term lists and tweak their classification systems. That was hard work not suitable to the social networking, Tweet sending “experts” selling metadata systems like carnival mountebanks.
Stephen Arnold, August 26, 2009
Silobreaker Update
August 25, 2009
I was exploring usage patterns via Alexa. I wanted to see how Silobreaker, a service developed by some savvy Scandinavians, was performing against the brand name business intelligence companies. Silobreaker is one of the next generation information services that processes a range of content, automatically indexing and filtering the stream, and making the information available in “dossiers”. A number of companies have attempted to deliver usable “at a glance” services. Silobreaker has been one of the systems I have relied upon for a number of client engagements.
I compared the daily reach of LexisNexis (a unit of the Anglo Dutch outfit Reed Elsevier), Factiva (originally a Reuters Dow Jones “joint” effort in content and value added indexing, now rolled back into the Dow Jones mothership), Ebsco (the online arm of the EB Stephens Co. subscription agency), and Dialog (a unit of the privately held database roll up company Cambridge Scientific Abstracts / ProQuest and some investors). Keep in mind that Silobreaker is a next generation system and I was comparing it to the online equivalent of the Smithsonian’s computer exhibit with the Univac and IBM key punch machine sitting side by side:
Silobreaker is the blue line, which is chugging right along despite the challenging financial climate. I ran the same query on Compete.com, and that data showed LexisNexis with a growth uptick and more traffic in June 2009. Your mileage may vary. These types of traffic estimates are indicative, not definitive. But Silobreaker is performing and growing. One could ask, “Why aren’t the big names showing stronger buzz?”
A better question may be, “Why haven’t the museum pieces performed?” I think there are three reasons. First, the commercial online services have not been able to bridge the gap between their older technical roots and the new technologies. When I poked under the hood in Silobreaker’s UK facility, I was impressed with the company’s use of next generation Web services technology. I challenged the R&D team regarding performance, and I was shown a clever architecture that delivers better performance than the museum piece services against which Silobreaker competes. I am quick to admit that performance and scaling remain problems for most online content processing companies, but I came away convinced that Silobreaker’s engineering was among the best I had examined in the real time content sector.
Second, I think the museum pieces – I could mention any of the services against which I compared Silobreaker – have yet to figure out how to deal with the gap between the old business model for online and the newer business models that exist. My hunch is that the museum pieces are reluctant to move quickly to embrace some new approaches because of the fear of [a] cannibalization of their for fee revenues from a handful of deep pocket customers like law firms and government agencies and [b] looking silly when their next generation efforts are compared to newer, slicker services from Yfrog.com, Collecta.com, Surchur.com, and, of course, Silobreaker.com.
Third, I think the established content processing companies are not in step with what users want. For example, when I visit the Dialog Web site here, I don’t have a way to get a relationship map. I like nifty methods of providing me with an overview of information. Who has the time or patience to handcraft a Boolean query and then pay money whether or not the dataset contains useful information? I just won’t play that “pay us to learn there is a null set” game any more. Here’s the Dialog splash page. Not too useful to me because it is brochureware, almost a 1998 approach to an online service. The search function only returns hits from the site itself. There is no compelling reason for me to dig deeper into this service. I don’t want a dialog; I want answers. What’s a ProQuest? Even the name leaves me puzzled.
I wanted to make sure that I was not too harsh on the established “players” in the commercial content processing sector. I tracked down Mats Bjore, one of the founders of Silobreaker. I interviewed him as part of my Search Wizards Speak series in 2008, and you may find that information helpful in understanding the new concepts in the Silobreaker service.
What are some of the changes that have taken place since we spoke in June 2008?
Mats Bjore: There are several new things and plenty more in the pipeline. The layout and design of Silobreaker.com have been reworked to improve usability; we have added an Energy section to provide a more vertically focused service around both fossil fuels and alternative energy; we have released Widgets and an API that enable anyone to embed Silobreaker functionality in their own web sites; and we have improved our enterprise software to offer corporate and government customers “local” customizable Silobreaker installations, as well as a technical platform for publishers who’d like to “silobreak” their existing or new offerings with our technology. Industry-wise, the recent statements by media moguls like Rupert Murdoch make it clear that the big guys want to monetize their information. The problem is that charging for information does not solve the problem of a professional already drowning in information. This is like trying to charge a man who has fallen overboard for water instead of offering him a life jacket. Wrong solution. The marginal loss of a few news sources is really minimal for the reader, as there are thousands to choose from anyway, so unless you are a “must-have” publication, I think you’ll find out very quickly that reader loyalty can be fickle or short-lived or both. Add to that the fact that news reporting itself has changed dramatically. Blogs and other types of social media are already favoured over many newspapers, and we saw Twitter’s role during the election demonstrations in Iran. Citizen journalism of that kind (immediate, straight from the action, and free) is extremely powerful. But whether old or new media, Silobreaker remains focused on providing sense-making tools.
What is it going to be, free information or for fee information?
Mats Bjore: I think there will be free, for fee, and blended information, just like Starbucks coffee. The differentiators will be “smart software” like Silobreaker and some of the Google technology I have heard you describe. However, the future is not just lots of results. The services that generate value for the user will have multiple ways to make money. License fees, customization, and special processing services, to name just three, will differentiate what I can find on your Web log from what I can get from a Silobreaker “report”.
What can the museum pieces like Dialog and Ebsco do to get out of their present financial swamp?
Mats Bjore: That is a tough question. I also run a management consultancy, so let me put on my consultant hat for a moment. If I were Reed Elsevier, Dow Jones/Factiva, Dialog, Ebsco or owned a large publishing house, I would realize that I have to think out of the box. It is clear that these organizations define technology in a way that is different from many of the hot new information companies. Big information companies still define technology in terms of printing, publishing or other traditional processes. The newer companies define technology in terms of solving a user’s problem. The quick fix, therefore, ought to be to start working with new technology firms and see how they can add value for these big dragons today, not tomorrow.
What does Silobreaker offer a museum piece company?
Mats Bjore: The Silobreaker platform delivers access and answers without traditional searching. Users can spot what is hot and relevant. I would seriously look at solutions such as Silobreaker as a front end to reach new customers, capture revenue from ad-sponsored free content, and reach a wider audience that can click through to premium content. (Most of us are unaware of the premium content that is out there, since the legacy contract types only reach big companies and organizations.) I am surprised that Google, Microsoft, and Yahoo have not moved more aggressively to deliver more than a laundry list of results with some pictures.
Is the US intelligence community moving more purposefully with access and analysis?
Mats Bjore: The interest in open source is rising. However, there is quite a bit of inertia when it comes to having one set of smart software pull information from multiple sources. I think there is a significant opportunity to improve the use of information with smart software like Silobreaker’s.
Stephen Arnold, August 25, 2009
Tweets Are Mostly Pointless Babble
August 15, 2009
I enjoy Mashable. The articles come at topics in a way that is youthful, even enthusiastic. I noted Jennifer Van Grove’s “40% of Tweets Are Pointless Babble.” I was surprised that *only* 40 percent of the message traffic was pointless. However, I think Ms. Van Grove reveals that she has not spent much time monitoring traffic for intelligence and law enforcement entities. With that experience in her bag of tricks, she might reach a different conclusion about the “noise” in the Twitterstream. “Pointless” to one person might be evidence to another. Youth has its advantages, but the value of filtering traffic may not be apparent to an avid sender of Tweets.
Stephen Arnold, August 14, 2009
Visualization and Confusion
August 15, 2009
Visualization of search results or other data is a must-have for presentations in the Department of Defense. What’s a good presentation? One that has killer visualizations of complex data. The problem is that sizzle in one colonel’s graphics triggers a graphics escalation. This is a briefing room version of Mixed Martial Arts. The trouble, based on my limited experience with this type of content, is that most of the graphics don’t make much sense. In fact, when I see a graphic I usually have zero idea about where the data originated, the mathematical methods used to generate the visual, or what Photoshop wizardry may have been employed to make that data point explode in my perceptual field. Your mileage may differ, but I find that visualization is useful in small doses.
To prove that what I prefer is out of date and that my views are road kill on the information superhighway, you will want to explore “15 Stunning Examples of Data Visualization”. Stunning is an appropriate word. After looking at these examples, I am not sure what is being communicated in some of these graphics. Example: Big fluctuations.
If you want to add zing to your briefings, you will definitely get some ideas from this article. If I am in the audience, expect questions from the addled goose. Know your data thoroughly because I am not sure some of these examples communicate on the addled goose wave length.
Stephen Arnold, August 14, 2009
Morphing Search Vendor Adventures: Customer Feedback
August 13, 2009
Quite a few search and content processing companies are chasing the supposed honey pot of customer support, customer feedback, customer self help, and just about any way to cut these costs. Forbes ran a cheerleading article that I was going to ignore. “No,” one of the goslings said, “This write up makes some good points.” Okay, the story is “The Upside of Bad Online Customer Reviews” by Mirela Iverac. The core idea is that customers who complain can provide useful information to the company that caused the dust up in the first place. The underlying technical hook is that the outfit mentioned in the story, based on what I have heard, uses the Attensity system to deliver the bag of goodies. If you revel in feedback loops that work, snag the Forbes write up.
Stephen Arnold, August 13, 2009