Data Warehouse Leader to Reinvent Data Warehousing

August 26, 2009

“IBM Announces ‘Smart Analytics System’ Aimed at Reinventing Data Warehousing” reminded me of Einstein’s discomfort with some of the implications of his theory of relativity. Invent one thing, then scramble to find a way to deal with problems that won’t go away. IBM, one might assert, invented data warehousing. It was an IBM researcher who developed our old friend the relational database. The Codd approach has been the big dog in data management for a long time. Options are now becoming more widely available, but when one says, “Data warehousing”, I think IBM. That’s why I am an addled goose I suppose.

image

Mr. Data Warehouse. Image source: http://en.wikipedia.org/wiki/Edgar_F._Codd

This article-interview makes clear that something is not right in IBM land. For me, the most suggestive comment in the Intelligent Enterprise write up was this passage:

Though IBM is promising better performance, a big part of the appeal seems to be targeted at executives who would favor contract simplicity and a single “throat to choke” over enterprising, but potentially riskier, in-house development, integration and innovation.

The “reinvention” seems to me to be little more than fixing responsibility for a mission critical system on a company big enough to take to court if the data warehouse has a leaking roof. In my experience these traditional data warehouses have more problems than a fast-build Shanghai apartment building.

My thought is to take a hard look at the assumptions about data warehousing, then poke into some options. Dare I suggest Aster Data? What about a Perfect Search enabled system?

Stephen Arnold, August 26, 2009

Metadata: Not Delivering and Dying

August 26, 2009

I watched a year ago as dozens of people filed into a program called “the drill instructor’s approach to metadata” or something that suggested a Marine Corps physical training session. Yep, I thought, metadata in a day. I flapped my tail feathers and waddled on by the room stuffed with people who paid hundreds of dollars to get a knowledge injection.

Metadata is not exactly a botox injection that worked particularly well.

botox lips

Lousy metadata produces a result that can be unexpected.

The notion of adding specific index terms to a content object is simple on the surface, but the indexing and tagging are intellectual walnuts. Get the terms wrong and no one can find documents because no one uses those words. Get the categories wrong and the helpful folders are like lumber rooms filled with odds and ends. Try to fix these problems, and the average MBA or art history major falls to the floor with their ankles bound by torn garments.

I quite enjoyed “Resuscitating Your Dying Metadata Strategy.” The title evoked an image of a gasping automated indexing system with three or four consultants poking at an intellectual body lying face down on the content processing vendor’s license agreement. And the word “dying” was a good one. There is a certain urgency to the word. “Sickly” denotes that a recovery may be likely. “Dying” suggests that I flip to Google Local to identify a funeral home.

The key segment of the article in my opinion was this passage:

a large number of IT professionals know intuitively that metadata management is the right thing to do, but have a hard time articulating why they need it. Also they admit a lack of engagement and collaboration with business stakeholders they are aiming to help. They also often have failed attempts to get metadata efforts off the ground in the past and are trying to fast track something…anything! So how can IT reverse this trend? They need to better scope and prioritize their metadata efforts by building a more realistic business case that can demonstrate real value-add.

The touchstones for me are the notion of a disconnect between users and information technology professionals. Then there is the notion that a lack of intellectual rigor and perhaps expertise have created problems. The organization wants a silver bullet.

Yes, this sounds familiar.

Metadata are important. The addled goose has no quick fixes to offer. The type of controlled terms that once were the strength of commercial databases such as ABI / INFORM are no longer valued. Creating consistent, useful controlled term lists and developing meaningful classification systems takes time and effort. Once these lists are in hand, the terms can be applied via human or “smart” systems. The moment the lists and classification systems are completed, the work begins to keep these lists in step with language. Sci tech terminology drifts less quickly than general business terminology.
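For readers who want to see what “applying the terms via a smart system” can look like at its very simplest, here is a minimal sketch. It assumes a tiny, made-up controlled vocabulary that maps each preferred term to the variant phrases people actually type; the term list, the `CONTROLLED_TERMS` name, and the `assign_terms` function are illustrative inventions, not the ABI / INFORM lists or any vendor’s system.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> variant phrases.
# A real list is far larger and is maintained by editors over time.
CONTROLLED_TERMS = {
    "data warehousing": ["data warehouse", "data warehouses", "data warehousing"],
    "metadata": ["metadata", "meta data", "index terms"],
    "software as a service": ["saas", "software as a service"],
}


def assign_terms(text, vocabulary=CONTROLLED_TERMS):
    """Return the sorted preferred terms whose variants appear in the text."""
    # Lowercase and collapse whitespace so "Meta  data" matches "meta data".
    normalized = re.sub(r"\s+", " ", text.lower())
    found = set()
    for preferred, variants in vocabulary.items():
        for variant in variants:
            # Word boundaries keep "saas" from matching inside longer words.
            if re.search(r"\b" + re.escape(variant) + r"\b", normalized):
                found.add(preferred)
                break
    return sorted(found)
```

The code is the easy part. The variant lists are where the time and effort go: when users start calling something by a new name, a person has to notice and add the variant, or the documents stop being findable.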

The message is that an organization must continue to invest in complex, knowledge centric work. In my experience few organizations have the appetite for this activity. Quite a few folks who buy commercial databases in order to create a knowledge monopoly invest too little to keep their information products’ indexing up to snuff. The newcomers spend some money and time but fall into the trap of finding a Hollywood doctor to administer a quick botox injection to hide a wrinkle before an audition.

The folks who work at metadata often find themselves ignored. A good example is the 500,000+ categories generated by the Google. You can see a bit of this system in action if you run this query, verified at 8 am on August 25, 2009: “skin cancer”. Here is the result list I saw:

skin cancer

Based on my research, Google has been plugging away at metadata and making progress. Organizations faced with revivifying their dying metadata systems may want to learn from their errors and their consultants’ silly promises about certain automated systems. Maybe Google will make its metadata systems available someday? Maybe one of the graduates of the drill instructor programs that teach taxonomy will discover a silver bullet that is easy, cheap, and fast?

The addled goose’s team does controlled vocabularies the old-fashioned way, working with partners like Access Innovations, a company with automated systems and the deep experience required to tackle metadata in an informed way. No wonder he is paddling alone and thinking of the good old days when the ABI / INFORM and the Business Dateline teams worked each week to refine their term lists and tweak their classification systems. That was hard work not suitable to the social networking, Tweet sending “experts” selling metadata systems like carnival mountebanks.

Stephen Arnold, August 26, 2009

Software as a Service: More Complexity

August 26, 2009

Short honk: A happy quack to the reader who sent me a link to “Software as a Service Not Yet Ready”. We goslings like SaaS. Sure, there are some Amazon type bugs in the woodwork, but that’s to be expected. A much stronger anti SaaS stance is expressed by Haseet Sanghrajka, a senior manager at ST Consulting. He asserted:

There are moves to develop standards in data hosting, which is good, but until the SaaS model has gone through a few iterations the market will not clearly understand the pitfalls, or work out just how to address them. From bandwidth redundancy to the strengths of service providers, organisations cannot afford to underestimate the complexity of this model.

I am popping this into the Beyond Search quotation file drawer.

Stephen Arnold, August 26, 2009

Google and Yelp: Coffee Shop Table Talk

August 26, 2009

I was sitting in suburban Virginia. My table mates were an MBA from a prestigious university and a math whiz who spoke fluent Arabic. Just behind me were 30 somethings–two fellows and one young woman. The three were chatting excitedly among themselves. A thread of their conversation wafted to me. The gist of the talk was that the Google had evidenced some interest in Yelp.com. Yelp.com is the go-to place for a wide range of information. The service is growing rapidly, according to Compete.com.

yelp usage

Ah, maybe a deal in the works? Possibly the Google showing its normal interest in sites that generate traffic? The addled goose is now watching Overflight to see if any info substantiates or invalidates these three folks’ in-the-know, caffeine-infused yapping. Rumor now. The addled goose is watching.

Stephen Arnold, August 25, 2009

LexisNexis and Recommind Tie Up

August 26, 2009

LexisNexis continues to look for ways to boost its revenues. The efforts are interesting because the company continues to retrace its steps in an effort to crack the code. LexisNexis, like Westlaw, faces a mini revolt. Some law firm clients are asking the firms to do legal work for fixed prices or at reduced rates with ceilings on costs, and to be more creative in reducing the costs of certain legal work. This is bad news for the commercial legal information companies. The reason? The companies delivering US legal information depend to some degree on taxi meter pricing. The idea is that the legal researcher pays for time and other variables for system access. Not surprisingly, for a patent matter, a legal researcher can run up a four or five figure bill. In the good old days, the law firms’ clients would pay up. Today, some clients are balking. Once again the traditional business model runs headlong into the new realities of business.

What’s the fix? LexisNexis has tried a tie up with Microsoft to put research in Microsoft Word. Did you activate the feature? I didn’t. LexisNexis has tried to diversify into fraud, content analysis, and risk. Do you think of LexisNexis when I say these words? I didn’t think so. LexisNexis has tried different angles of attack on search, law firm software services, and Web access.

The financial pressure continues to mount.

I just learned that knowledge management is the next revenue Petri dish. “LexisNexis and Recommind to Deliver New Knowledge Management Capabilities for Law Firms” reported this new venture. The story reported:

The new offering integrates Lexis Search Advantage content and services accessed through lexis.com with MindServer Search, Recommind’s enterprise search platform. It provides a one-stop destination combining access to documents and information from both a firm’s internal sources as well as trusted LexisNexis® content, delivering search results that are more complete, efficient and actionable.

The run down of benefits is pretty much what one would expect: information integration, better research, etc.

The proof of the pudding will be revenue. LexisNexis is straying from its core competency of delivering commercial grade legal information. Will knowledge management generate enough cash to put LexisNexis back on the fast growth track? In my opinion, the company is in a race. Some government entities are making more legal related information available online. Attorneys looking for ways to cut costs are likely to flock to these free services. Another challenge is the interest in lower cost professional information services like FastCase.

Recommind shifted from a legal niche strategy to an enterprise search strategy. Does this tie up mean that Recommind is returning to its legal niche or diversifying courtesy of LexisNexis?

Interesting moves by both companies. Each firm has search technology. Now search and content have morphed into knowledge management. Does anyone know what knowledge means? Does anyone know what management means? The phrase strikes me as “old school”.

Stephen Arnold, August 26, 2009

Silobreaker Update

August 25, 2009

I was exploring usage patterns via Alexa. I wanted to see how Silobreaker, a service developed by some savvy Scandinavians, was performing against the brand name business intelligence companies. Silobreaker is one of the next generation information services that processes a range of content, automatically indexing and filtering the stream, and making the information available in “dossiers”. A number of companies have attempted to deliver usable “at a glance” services. Silobreaker has been one of the systems I have relied upon for a number of client engagements.

I compared the daily reach of LexisNexis (a unit of the Anglo Dutch outfit Reed Elsevier), Factiva (originally a Reuters Dow Jones “joint” effort in content and value added indexing now rolled back into the Dow Jones mothership), Ebsco (the online arm of the Elton B. Stephens Co. subscription agency), and Dialog (a unit of the privately held database roll up company Cambridge Scientific Abstracts / ProQuest and some investors). Keep in mind that Silobreaker is a next generation system and I was comparing it to the online equivalent of the Smithsonian’s computer exhibit with the Univac and IBM key punch machine sitting side by side:

silo usage

Silobreaker is the blue line which is chugging right along despite the challenging financial climate. I ran the same query on Compete.com, and that data showed LexisNexis showing a growth uptick and more traffic in June 2009. Your mileage may vary. These types of traffic estimates are indicative, not definitive. But Silobreaker is performing and growing. One could ask, “Why aren’t the big names showing stronger buzz?”

silo splash

A better question may be, “Why haven’t the museum pieces performed?” I think there are three reasons. First, the commercial online services have not been able to bridge the gap between their older technical roots and the new technologies. When I poked under the hood in Silobreaker’s UK facility, I was impressed with the company’s use of next generation Web services technology. I challenged the R&D team regarding performance, and I was shown a clever architecture that delivers better performance than the museum piece services against which Silobreaker competes. I am quick to admit that performance and scaling remain problems for most online content processing companies, but I came away convinced that Silobreaker’s engineering was among the best I had examined in the real time content sector.

Second, I think the museum pieces – I could mention any of the services against which I compared Silobreaker – have yet to figure out how to deal with the gap between the old business model for online and the newer business models that exist. My hunch is that the museum pieces are reluctant to move quickly to embrace some new approaches because of the fear of [a] cannibalization of their for fee revenues from a handful of deep pocket customers like law firms and government agencies and [b] looking silly when their next generation efforts are compared to newer, slicker services from Yfrog.com, Collecta.com, Surchur.com, and, of course, Silobreaker.com.

Third, I think the established content processing companies are not in step with what users want. For example, when I visit the Dialog Web site here, I don’t have a way to get a relationship map. I like nifty methods of providing me with an overview of information. Who has the time or patience to handcraft a Boolean query and then pay money whether the dataset contains useful information or not? I just won’t play that “pay us to learn there is a null set” game any more. Here’s the Dialog splash page. Not too useful to me because it is brochureware, almost a 1998 approach to an online service. The search function only returns hits from the site itself. There is no compelling reason for me to dig deeper into this service. I don’t want a dialog; I want answers. What’s a ProQuest? Even the name leaves me puzzled.

the dialog page

I wanted to make sure that I was not too harsh on the established “players” in the commercial content processing sector. I tracked down Mats Bjore, one of the founders of Silobreaker. I interviewed him as part of my Search Wizards Speak series in 2008, and you may find that information helpful in understanding the new concepts in the Silobreaker service.

What are some of the changes that have taken place since we spoke in June 2008?

Mats Bjore: There are several new things and plenty more in the pipeline. The layout and design of Silobreaker.com have been redesigned to improve usability; we have added an Energy section to provide a more vertically focused service around both fossil fuels and alternative energy; we have released Widgets and an API that enable anyone to embed Silobreaker functionality in their own web sites; and we have improved our enterprise software to offer corporate and government customers “local” customizable Silobreaker installations, as well as a technical platform for publishers who’d like to “silobreak” their existing or new offerings with our technology. Industry-wise, the recent statements by media moguls like Rupert Murdoch make it clear that the big guys want to monetize their information. The problem is that charging for information does not solve the problem of a professional already drowning in information. This is like trying to charge a man who has fallen overboard for water instead of offering a life jacket. Wrong solution. The marginal loss of losing a few news sources is really minimal for the reader, as there are thousands to choose from anyway, so unless you are a “must-have” publication, I think you’ll find out very quickly that reader loyalty can be fickle or short-lived or both. Add to that that news reporting itself has changed dramatically. Blogs and other types of social media are already favoured over many newspapers, and we saw Twitter’s role during the election demonstrations in Iran. Citizen journalism of that kind, immediate, straight from the action and free, is extremely powerful. But whether old or new media, Silobreaker remains focused on providing sense-making tools.

What is it going to be, free information or for fee information?

Mats Bjore: I think there will be free, for fee, and blended information just like Starbucks coffee. The differentiators will be “smart software” like Silobreaker and some of the Google technology I have heard you describe. However, the future is not just lots of results. The services that generate value for the user will have multiple ways to make money. License fees, customization, and special processing services, to name just three, will differentiate what I can find on your Web log and what I can get from a Silobreaker “report”.

What can the museum pieces like Dialog and Ebsco do to get out of their present financial swamp?

Mats Bjore: That is a tough question. I also run a management consultancy, so let me put on my consultant hat for a moment. If I were Reed Elsevier, Dow Jones/Factiva, Dialog, Ebsco or owned a large publishing house, I would have to realize that I need to think outside the box. It is clear that these organizations define technology in a way that is different from many of the hot new information companies. Big information companies still define technology in terms of printing, publishing or other traditional processes. The newer companies define technology in terms of solving a user’s problem. The quick fix, therefore, ought to be to start working with new technology firms and see how they can add value for these big dragons today, not tomorrow.

What does Silobreaker offer a museum piece company?

Mats Bjore: The Silobreaker platform delivers access and answers without traditional searching. Users can spot what is hot and relevant. I would seriously look at solutions such as Silobreaker as a front end to create a better reach to new customers, capture revenues from ad-sponsored free content, and reach a wider audience that can click through to premium content. (Most of us are unaware of the premium content that is out there, since the legacy contractual types only reach big companies and organizations.) I am surprised that Google, Microsoft, and Yahoo have not moved more aggressively to deliver more than a laundry list of results with some pictures.

Is the US intelligence community moving more purposefully with access and analysis?

Mats Bjore: The interest in open source is rising. However, there is quite a bit of inertia when it comes to having one set of smart software pull information from multiple sources. I think there is a significant opportunity to improve the use of information with smart software like Silobreaker’s.

Stephen Arnold, August 25, 2009

Is Search the Answer to Short Attention Spans

August 25, 2009

I thought about the essay I read that danced around the subject of Google making me more stupid. The current variation on this theme appeared in the UK Daily Mail. The story that caught my attention was “Digital Overload Is Making Us More Easily Distracted.” The premise struck me as odd. Someone’s digital plumbing is altering how a human concentrates. Hmmm. I thought humans made choices about concentration. For example, I decide to read a book. I decide to read and focus my concentration on the act of reading. Ask the goslings. When I concentrate, a person can walk up to me and touch my shoulder. I will jump and sometimes let out a yelp. I concentrate. Digital inputs don’t mean anything to me when I focus. I blot out distractions.

Not for the Daily Mail’s writer David Derbyshire. He wrote:

Some neuroscientists argue that the brain is geared to handle one thing at a time. When asked to juggle several things at once, it is forced to flick frantically between them, like a performer spinning plates. This puts the brain under stress and means it doesn’t perform as well, it is claimed.

Ah, the Daily Mail is not talking about individuals who can concentrate. The Daily Mail is talking about researchers who have studied a sample composed of people who cannot concentrate because these individuals ** choose ** to put themselves in situations where distractions are the norm.

I buy that. The young driver I saw run over a bicycle was not paying attention. I thought I saw a cell phone or an MP3 player. No one was hurt, but that bicycle cannot provide its side of the story. What’s the fix? None. If people ** choose ** to create an environment flush with distractions, those folks will have a tough time concentrating.

I don’t need a university researcher to “prove” that. Obviously the Daily Mail and its editors did.

I had hoped the article would talk about the role of search. It did not. At least the author of the Google-is-making-me-stupid essay challenged my thinking. The Daily Mail’s article did not in my opinion.

What can top this study? How about USA Today’s write up about social networks making students dip into self love?

Stephen Arnold, August 25, 2009

Social Networks and Security

August 25, 2009

I got roasted at a conference last year when I pointed out that controlling security and privacy in social networks was a challenge. One 20 something told me that I was an addled goose. No push back from me. I stuck to my assertion and endured the smarmy remarks and head shaking. I thought of this young person when I read “Social Networks Leak Personal Information”. Sure, it is one write up in a trade magazine, but it contains a statement I find instructive:

The researchers say that social networks leak information through a combination of HTTP header information — the Referrer header and the Request-URI — and cookies sent to third-party aggregators such as Google (NSDQ: GOOG)’s DoubleClick, Google Analytics, and Omniture, among others. As a consequence of this leakage, third-party aggregators can potentially link social network identifiers to past and future Web site visits, thereby identifying a person and his or her online activities.

Right? Wrong? With the young-at-heart going social, old geese like me want to move forward with some caution.
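For the curious, here is a rough sketch of the mechanism the researchers describe. It assumes a made-up social network whose profile URLs carry the member number in an `id` query parameter, and a made-up aggregator that sets one tracking cookie per browser. The domain names, the `id` parameter, and the function names are all hypothetical. (The HTTP header itself is spelled `Referer` on the wire.)

```python
from urllib.parse import urlsplit, parse_qs


def leaked_identifier(referer, id_param="id"):
    """Return the profile id visible in a Referer URL, or None if absent."""
    query = urlsplit(referer).query
    values = parse_qs(query).get(id_param)
    return values[0] if values else None


def link_visits(log):
    """Group what an aggregator sees by tracking cookie.

    Each log entry is a dict with the aggregator's own "cookie" value and
    the "referer" URL that the browser sent along with the request.
    """
    profiles = {}
    for entry in log:
        record = profiles.setdefault(
            entry["cookie"], {"identifiers": set(), "sites": []}
        )
        referer = entry["referer"]
        record["sites"].append(urlsplit(referer).netloc)
        identifier = leaked_identifier(referer)
        if identifier:
            record["identifiers"].add(identifier)
    return profiles


# Hypothetical requests received by one third-party aggregator.
example_log = [
    {"cookie": "abc", "referer": "http://social.example/profile.php?id=12345"},
    {"cookie": "abc", "referer": "http://news.example/story"},
]
profiles = link_visits(example_log)
```

In this toy log, the same cookie turns up on the profile page and on an unrelated news site, so the aggregator can tie member 12345 to both visits. No break-in is needed; the browser volunteers the information.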

Stephen Arnold, August 25, 2009

Washington DC Clunkers

August 25, 2009

I found the Washington Post’s story “‘Clunkers’ Deals Disrupted by Online Malfunction, Sellers Say” quite suggestive. I read the article after attending a meeting in Washington DC. The meeting was about an information system that did not work. I cannot reveal the government entity or the specifics of the problem, but the word “clunker” may be apt. The Post’s angle in this story was that the online system did not work very well. With our Orange line ride sucking up 90 minutes from West Falls Church to Dupont Circle, “clunker” may become a synonym for much in the nation’s nerve center. The reason for slow trains? A computer glitch. Ah, computers, not people. To close, a happy quack to the DCist.com for this example of another Washington, DC clunker:

image

Stephen Arnold, August 25, 2009

Yahoo and User Experience

August 25, 2009

Lots of posts from gurus, azure chip consultants, and real journalists about Yahoo and search. I have plowed through about 15 of the online write ups. A couple jutted above the plain; most were in the lowlands. A good example of the thinking that is not quite up the mountain is the write up “Yahoo: We’re Still in the Search Business”. The main point for me was this passage:

“I fully anticipate that our front-end experience will evolve differently from Bing,” said Prabhakar Raghavan, senior vice president of Yahoo labs and search strategy, during a presentation to journalists at Yahoo’s headquarters in Sunnyvale, Calif. “We collaborate on the back end, but we are competitors on the front end.”

So the plumbing is the plumbing. The differentiator is the user experience. To me that means interface. A search box is a part of the interface. Ergo Yahoo cannot do much with the white rectangle into which people type 2.3 words. Yahoo must add links, facets, and any other gizmo that allows a person to find information without typing 2.3 words.

I just looked at the Yahoo splash page for me:

search splash yahoo

I find this page unhelpful. I can personalize the page, but I see the Excite type of clutter that annoys me when I have to hunt for the specific link I want. Examples: NASCAR news. Three clicks. Email. Log on and extra clicks to get past the news headlines. My account for for-fee email? Good luck finding this page.

I look forward to user experience changes, but I don’t think interface alone will address the issues I have encountered with Yahoo Shopping, locating news stories that have been removed even though links in the wild to the story are available, and finding specific discussion group content quickly.

I want more than punditry and user experience. I want a system that provides information access. Right now, Yahoo has many opportunities to improve, but the key will be the plumbing. If I understand the posts I have examined, Microsoft and Yahoo will collaborate on plumbing. I had a house once with two plumbing contractors. I recall some exciting discussions with the two plumbers. No one had responsibility for the leaky pipes.

Stephen Arnold, August 25, 2009
