Silobreaker: When Intelligence Officers Solve Their Own Info Problems
May 20, 2008
“The Holy Grail”, one former intelligence officer told me, “is to walk in my office and have what I need on my desk, on the computer monitor, and on the screen of my secure telephone.” (You can recognize these whizzy mobile phones because some have an extra light and other features to make it hard for the bad guy to listen in on the call.)
I forget that most people in the online business don’t have experience working in intelligence, the military and law enforcement. When I see an allegedly “hot new semantic search system”, I often take a cursory look and then walk on by. The reason is that the idea of searching is not where the action is for serious intelligence.
If you do a search on Mother Google, you will find more than 300,000 references to the company. To give you a benchmark, if you search for this Web log, you get about 230,000 references with most of them to a search engine optimization company with the same name. The point is that certain services or resources, no matter how useful, are tough to find unless you know exactly what to enter in the search box.
Let me illustrate. Here’s a screen shot of a system that has been available for several years.
The query “semantic search” returned a main story, secondary items in smaller “newspaper” style boxes, an embedded live video from CeBIT, a bar chart about term frequency, and an “In Focus” section that provides the names of people and things the Silobreaker system identified as important. (If you look at the people in the “In Focus” box, you’ll see me (Stephen Arnold) identified despite my <230,000 Web log references in Google.)
Notice that Silobreaker’s default display is a report. The system delivers a synthesis of what’s important. There’s no result list. No single graphic gizmo floating in the browser without meaningful context. Silobreaker looks great but it contains a significant amount of go juice. Navigate here to explore the system yourself.
Silobreaker doesn’t do plain vanilla laundry lists. You can see a list of documents, but you see them in context; that is, a specific knowledge setting. You don’t have to ask, “What the heck does that mean?” Silobreaker presents the meaning of each item in a display.
Most of the search systems I see or get asked to review don’t do what I need done. I want to comment on a basic Silobreaker output and point out a few facts about the system. Once that housekeeping is done, I will make several observations in an effort to spark discussion about the sorry state of enterprise search and commercial business intelligence systems. For a reader who finds my criticism of the best that Silicon Valley has to offer offensive, stop reading now. If you want to see where the rubber meets the race track in the intelligence community, keep reading.
Microsoft to ‘Innovate and Disrupt in Search’–Again
May 19, 2008
My newsreader popped this info tart in front of me this morning: “Kevin Johnson’s Memo On Yahoo & Their Strategy”. The focus of Gigaom’s Web log post is a memo, allegedly by Kevin Johnson. By the time you read this, my pathetic posting will be very old news. You need to read the memo and determine for yourself if it’s the real deal.
I’m commenting because of a series of emails I exchanged this morning about Microsoft’s search strategy. Among the points I made to the eager journalist who was, as my mother used to say, an empty vessel:
- Microsoft is implementing reactions, not a strategy. The cause of these knee-jerk reactions: mostly the Google and a business model challenge. Cloud services are coming round the mountain, and Microsoft can hear the whistle blowing.
- Yahoo has some sharp people and a truck load of search systems–Inktomi, Stata Labs, AllTheWeb.com (provided by Fast Search & Transfer), Flickr’s system, Overture’s search, and more. I’ve been told the company is rushing to be more like Google, which is not perfect, obviously. But Yahoo is grossly heterogeneous, and Google is more homogeneous in architecture.
- Google keeps on grinding forward. In Israel a day ago, Mr. Brin referenced Google’s multi dimensional database progress. My sources tell me that it is not progress; it is a leap frog play.
So “innovate and disrupt in search” is going to boil down to tackling these problems, forcefully, squarely, and well.
First, how many search platforms will Microsoft support? SharePoint, whizzy technology from Microsoft Research, Fast Search & Transfer’s ESP, and the legacy systems that just won’t die. Each search platform is a money hog. Get too many of these critters chomping on the cash, and you will be one poor data farmer.
Second, if–and this is a big if–Microsoft cuts a deal with Yahoo, exactly how will two shot up World War I biplanes contend with Google’s F-35? Time is running out because the GOOG keeps gobbling market and mind share. It is the number one site on the Internet and the world’s top brand. Quite a one-two punch for piston powered aircraft to shoot down.
Third, Google’s business model is based on advertising. Google wants to diversify, and Mr. Brin’s comments in Israel a day ago suggest that he wants to put a rocket booster on Google Apps. Interest in cloud-based services continues to creep up, and Google is in a good position to innovate and disrupt in that sector. The company already is innovating and disrupting in search.
We’re watching a clash of cultures and business models. When Microsoft swizzled IBM in the 1980s, it was clever. Google’s not just clever; Google has the technical platform to redefine search and enterprise applications.
Mr. Johnson’s memo does little to convince me that Microsoft–with or without Yahoo–can do much to stop Googzilla from doing Googzilla-type things.
Stephen Arnold, May 19, 2008
Infobright: The Warsaw Connection to Rough Sets
May 19, 2008
Infobright is an interesting company. The wizard founders have some keen math skills. The company’s management may be just as good at marketing and sales. The combination means that Infobright is a company to watch.
The firm’s core business is selling a data management system that helps break the well-known and increasingly problematic bottlenecks that traditional databases put in front of business analysts. The relational database with its familiar rows and columns requires serious engineering to make work in our world of petabyte data.
In a nutshell (and I am glossing over significant technical details), Infobright pulls data into its system. Then, using some fancy math involving rough sets and other cutting-edge techniques, it builds a data warehouse. When you need to run a query, the Infobright system doesn’t run to the data table. The metadata allow the Infobright system to ignore data that are not germane and snag only that which is appropriate. The Infobright metadata method builds an index (not a very good word for these “views”, “set abstractions”, and probability matrices) that, for many queries, can answer the question. No hitting of the data in the warehouse necessary, thank you.
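To make the idea concrete, here is a minimal sketch in Python of answering a query from metadata. It is not Infobright’s code. I am assuming, purely for illustration, that a column is cut into small “packs” and that the only metadata kept for each pack are the minimum, maximum, and count. The names `Pack`, `build_packs`, and `count_greater_than` are hypothetical. For a “how many values exceed X” query, each pack is either ignored, answered from its metadata, or (only when the metadata cannot decide) actually read.

```python
from dataclasses import dataclass
from typing import List, Tuple

PACK_SIZE = 4  # tiny on purpose; an illustrative value, not a vendor setting


@dataclass
class Pack:
    values: List[int]  # the stored data for this slice of the column
    lo: int            # metadata: smallest value in the pack
    hi: int            # metadata: largest value in the pack
    count: int         # metadata: number of values in the pack


def build_packs(column: List[int], pack_size: int = PACK_SIZE) -> List[Pack]:
    """Split a column into packs and record min/max/count for each one."""
    packs = []
    for i in range(0, len(column), pack_size):
        chunk = column[i:i + pack_size]
        packs.append(Pack(chunk, min(chunk), max(chunk), len(chunk)))
    return packs


def count_greater_than(packs: List[Pack], threshold: int) -> Tuple[int, int]:
    """Answer COUNT(*) WHERE value > threshold.

    Returns (answer, packs_scanned) so the caller can see how often
    the stored data had to be touched.
    """
    total = 0
    scanned = 0
    for p in packs:
        if p.hi <= threshold:
            continue              # not germane: no value can match
        if p.lo > threshold:
            total += p.count      # every value matches: metadata is enough
        else:
            scanned += 1          # metadata cannot decide: read the pack
            total += sum(1 for v in p.values if v > threshold)
    return total, scanned


column = [1, 2, 3, 4, 10, 11, 12, 13, 3, 9, 4, 8, 20, 21, 22, 23]
packs = build_packs(column)
print(count_greater_than(packs, 5))   # one of the four packs must be read
print(count_greater_than(packs, 15))  # answered from metadata alone
```

In the second query no pack is read at all, which is the “no hitting of the data in the warehouse” case. A real system would keep much richer metadata than a minimum and a maximum; the sketch only shows why metadata can let a query skip most of the store.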
Infobright’s system interests me because years ago I did a small job for a Polish math wizard who set up shop in North Carolina. Several of the engineers used rough set and related math to create a search engine called Inferno. The metaphor of the “inferno” was intended to communicate the swarming math techniques that “discovered” relationships. Although that work is not directly analogous to what Infobright is doing, I learned from that company’s founder, Dr. Zbigniew Michalewicz.
In my conversations with Dr. Michalewicz, he communicated the significant potential of rough sets, mereology, and evolutionary computation, and that was enough to force me back to the math books. Infobright appears to be tapping into this mathematical mother lode.
Its founders have some ties to Warsaw, one of the places where these mathematics are valued and made part of the curriculum for students who can deal with the notion of sparse tables, fuzzy sets, and recursive ant equations.
Infobright’s speed has caught the attention of organizations looking for ways to perform analyses quicker. You will want to navigate to the Infobright Web site and read the clear, but economical documentation available. However, to understand the sophistication of the application, you can think in terms of column-oriented data management systems.
The best-known column system is Google’s BigTable. But there are significant differences between Infobright’s approach and Google’s. The key point is that both Infobright and Google use some sophisticated math. Neither company is particularly forthcoming about these methods.
From my research, which may be incomplete, the key point is that the use of “Warsaw math”, a term I coined to refer to rough sets theory and related methods, allows a query to be satisfied without having to fetch data from the data store.
Infobright has implemented a fast-cycle data loader. The architecture of Infobright “sits outside” of the database system. In effect, the Infobright store is refreshed when new data are pumped into the Infobright system.
Google, on the other hand, updates its data store using its “relaxed write” method. So Infobright is a classic data warehouse / data analysis set up. Google is an online operation.
The key point is that some of the mathematical underpinnings are similar, at least to my aging eyeballs.
If you want to know more about the founders of Infobright, you can peruse very sparse biographical information here. The Warsaw connection jumps right out even in a thumbnail sketch. Details about rough sets may be found here. You can also run a Google query and follow the links on the first two pages of results. Most are useful. Information about mereology appears in Wikipedia. Though uneven, the entry is useful and contains additional links to follow. Information about BigTable appears in my 2005 study, The Google Legacy, which is available from Infonortics, Ltd. in Tetbury, Glou.
Stephen Arnold, May 20, 2008
PolySpot: Usability Fuels Growth
May 19, 2008
Olivier Lefassy, an investment professional turned business intelligence executive, is on the fast track. His firm–PolySpot–is growing at a double-digit pace. The company packages a suite of content processing technologies that “snap in” to licensees’ existing infrastructure. You can read this exclusive interview at http://www.arnoldit.com/search-wizards-speak/polyspot.html.
The idea is to provide powerful information access methods without the costly hand coding and months of tedious work that many vendors impose on their customers.
Instead of displaying a laundry list of results, the company delivers answers to system users. One of the system’s most interesting features, he told ArnoldIT.com for its Search Wizards Speak series, is:
…a Document Collaboration module. You are in a research team for a large financial organization. You locate a useful analyst report about a company. You can open the document and add a comment to it, appended to the original document. You can then put this into a public folder and forward it on to a colleague for his or her comments. We think this is like “document blogging” or annotating. These comments or additional information payloads are indexed “on-the-fly”.
He said, “Usability is key today. For too long the ‘large’ vendors ignored user needs at this level and tried to brainwash the market with talk of algorithms.”
To see the sharp contrast between PolySpot and a long-time player in search, take a look at “Up to Speed on Search” by Phil Muncaster. In that article Mr. Muncaster summarizes Autonomy’s view that some systems are “planes” and others “mere bikes”. The comparison underscores PolySpot’s approach to the market: power without undue complexity. PolySpot’s approach stands on one side of the usability argument, and Mr. Muncaster’s essay makes clear that there is another, more complicated side to the argument that appeals to some vendors. In his essay, Mr. Muncaster uses the delightful phrase “particularly keen”, which struck me as quite telling. Could some of the established vendors feel pressured, not just by PolySpot, but by the dozens of up-and-comers in information access who offer options to some organizations?
PolySpot’s managing director states clearly that there is a need for a different way to approach information access. The firm’s strong growth in the first three months of 2008 underscores that some European organizations are eager to put euros to work addressing content challenges. You can read the complete interview at ArnoldIT.com here.
Stephen Arnold, May 19, 2008
Thunderstone Adds Features
May 18, 2008
Search and text processing specialist Thunderstone has added two new features to its highly-regarded search appliance. Licensees can use a new SOAP API to “hook” the Thunderstone appliance into a licensee’s information environment. Thunderstone has provided a wide range of adaptors with its Appliance. The SOAP API adds more flexibility in acquiring content or integrating Thunderstone functionality into an organization.
Thunderstone has also added single sign-on. Users struggle with information systems that ask for passwords and user names unexpectedly. The Thunderstone function supports Active Directory based authentication. You can learn more about the Thunderstone appliance here. You can read the official Thunderstone announcement here. Click quickly; some news items can be tough to find after a few days online. ArnoldIT.com featured an in-depth interview with Thunderstone on April 7, 2008, here.
Stephen Arnold, May 18, 2008
Google Israel: Important Innovation Center
May 18, 2008
News about Google’s research work in Israel has been hard to get–particularly in rural Kentucky. Even identifying Meir Brand as the CEO of Google Israel requires some sleuthing. Over the last 24 hours, information has begun to appear in bite-sized nuggets; for example, on Haaretz.com and in emails sent to me from my contacts in the country. My Hebrew skills are shaky at best, so you will want to track down more stories about Google in Israel here and here. Google’s online translation for Hebrew to English word pairs is not available on Google Translate, at least the public version of the system.
Here’s what I have as of 0900 Eastern time, May 18, 2008:
- Strong interest in sustainable energy, specifically Israel’s interest in “environmentally friendly” transportation
- Google’s technology takes time to shape. There’s a myth that Google can work wonders in a day or two. Brin asserts that significant effort is needed; for example, the ad system took years to refine
- Research in Google Israel includes Google Suggest, Google Maps, and some “green projects”
- Google Israel was involved significantly in Google Trends
- In response to a jibe from Microsoft about search lacking innovations, Mr. Brin pointed to limitations in browser technology.
The most interesting information I saw concerned Google Apps. These will become more widely used in business in the future. The report from Yedioth Ahronoth is only available in newspaper form at this time. I am monitoring my feeds for an electronic version. The story, according to my source, hinted that Google had made some strides in book, video, and multi-dimension search. I will post updates, links, and corrections as I locate them today (May 18, 2008). If you find additional information, please post it in the comments section below this preliminary summary.
Stephen Arnold, May 18, 2008, 0900 Eastern
Enterprise Search: The Good, the Bad, the Downright Ugly
May 17, 2008
Author’s Note: This essay is the basis for my conference end note, which I deliver on May 21, 2008. The venue is Information Today’s Enterprise Search Summit. The program committee has slotted me in the anchor position to provide an overview of what the more than 40 speakers said and to keep some attendees from rushing to the airport. The idea is that I am controversial and some vendors want to hear what I say so they can get the attorneys organized to write me threatening letters. My actual remarks are based on the essay below.
Yes, I am wearing bunny rabbit ears. I was going to put on my bikini, but my lawyer told me, “You can be sued for assault.” I am wearing the ears.
The reason? A big wig at a large, really ethical pharmaceutical company–maybe that’s an oxymoron– told me that my 1980 picture on my Web log here was “unprofessional” and “disturbing”. Well, in 1980, when I talked to a group of executives, online search and data were unknown. Business executives are conservative, like the Roman ruler Caligula’s advisors. The ears broke the ice. Anyway, I’m not the one in hot water with the FDA. Maybe those guys should wear them?
Today (May 21, 2008) everyone in this room plus your friends and your children use online information services. A fat, old guy wearing bunny rabbit ears makes zero difference. You saw more interesting sights in Greenwich Village on your way to dinner, correct?
What is different about search today? It’s ubiquitous. Also, it is essentially unchanged. Here’s a screen shot of a system that displays information in an interesting way. Here’s a Google report. Sorry it’s in black and white, but I am a persona non grata at Google. I have to scour the open source literature to find out that the Googlers also know traditional search isn’t going to cut it moving forward.
Here is what I learned at the conference, and I admit I could not sit through every session. I had to poke my head in and out of sessions. Feel free to push back if you disagree. Even better, I will pay you $2.00 (that’s my usual $1.00 adjusted for inflation).
The Good
The speakers who prepared–Sue Feldman and Martin White–made the conference worthwhile. The speakers who recycled product literature and said, “I’m giving a product review” made the conference useful. I like product reviews. Also, I like the Google Search Appliance, probably because my son, Erik, would make my life miserable if I didn’t effuse Google goodness. I also like the systems I profile in my Beyond Search study, which you can buy from Frank Gilbane, a content impresario.
The Bad
Man, infomercials. I sit in a session. The speaker has an affiliation unrelated to a search vendor. The talk is the vendor’s sales pitch. These are a total waste of time, and the speakers should be sent to Toastmasters International or a remedial speech class. My view is that there are more of these “planted talks” than ever before. It’s a disturbing trend that I have seen at other conferences this year sponsored by other companies and with independent program committees. Not good.
The Ugly
I want to spend the remaining time on five points. Then I will pay $2.00 for a question. You can start thinking about the errors in my analysis now, and I don’t have any reluctance to let you pin me to the wall for my opinions.
- Talking about semantic search, Web 3.0, and text mining does not a business make. In fact, the whole PR blitz about “better search” leaves me cold. It’s not that these buzzwords don’t mean something. They do. The systems aren’t a leap forward.
- Enterprise search is a problem. The vendors can’t and won’t talk about their disasters. The licensees are often prohibited by the license terms from saying negative things about a system. My research, Jane Russell’s in Paris, Sinequa’s, and studies summarized for me by Martin White make one point: 60 to 70 percent of the users of Big Name search systems are dissatisfied. That’s not going to change as long as these companies sell systems that date from the late 1980s. On my blog I posted the tag lines for about two dozen vendors. The average founding year of the companies? 1997. Nothing new, folks. Nothing new.
- Google is a big deal in the enterprise, and I am getting tired of hearing people dismiss the company’s presence as trivial or an aberration. My sources reveal that Google is THE largest enterprise search vendor. The company has more than 9,500 GSA licensees. The company is struggling to deal with inquiries about geo spatial, hosted services, and other cloud-based products. Does Google tell me this? No, Google’s Larry Page remembers that he squabbled with me in 2000 at the Boston Search Engine Meeting, and he wants me put in the Smithsonian’s computing exhibit, locked in with the UNIVAC.
- Costs are not just a problem. The costs associated with enterprise search are going to be a major problem going forward. Data transformation can consume as much as 30 percent of an IT department’s budget. The customization costs of some enterprise search systems are so high that licensees can’t make a system better. Take a look at the pre-acquisition Verity. It was a services firm, not a search vendor. Now other vendors are going for this high margin business. Some enterprise search systems are designed to sell consulting.
- Scaling ain’t us. Most of the vendors whose systems I examine for my various reports and studies don’t scale gracefully. What does this mean? It means that a licensee has to throw hardware at a problem, figure out how to tune a complex system on a complicated Frankenstein infrastructure, and figure out how to make these changes without trashing the index and going back to square one. Some systems scale. Siderean, Exalead, Coveo, ISYS. I can’t name them all. What’s important is that none of the Big Names scale gracefully. Up-and-comers, profiled in Beyond Search–my new study–do a better job of this.
Wrap Up
So what did I learn? The marketing frenzy that infects so much of our information world has reached enterprise search. The vendors and their financial challenges make it tough to get the straight dope on search systems. Finally, people who volunteer to speak at conferences often spend little time creating a presentation that will knock the attendees’ socks off.
What did I like? I like the sector. It’s booming. There’s a lot of interesting stuff out there. Cluuz.com. Silobreaker. Look to the newer systems. Oh, don’t ignore Googzilla. Think surf on Googzilla.
Stephen Arnold, May 18, 2008
Love SharePoint Search, Hate What It Can Index
May 17, 2008
There are upwards of 85 million Microsoft SharePoint licensees. Some of them rely on SharePoint search. A bit of fiddling with the plain vanilla SharePoint search reveals some challenges. You often can’t index Documentum, OpenText, and Exchange stores without coding. SharePointWorks can help you now. This Wakefield, Massachusetts-based company offers a range of connectors that you install and use. The company’s new Documentum connector supports 32- and 64-bit systems. For more information, navigate to SharePointWorks, and order yours today. Take a look; I also like the company’s Web log here.
Stephen Arnold, May 17, 2008
Video Search Bragging Rights: Blinkx Says It Is Bigger Than Google Video
May 16, 2008
Stuck in northbound traffic on the slow-moving river that is Highway 101, I saw a quite large billboard that told me that Blinkx is “the world’s largest video search engine.” In mid-May 2008, a rumor swirled across the Internet that News Corp. was kicking Blinkx’s tires. Was an acquisition in the wind? Was this billboard part of an acquisition campaign? Was it a reminder to Silicon Valley that Google’s span of control did not include video search?
I was sensitive to digitized video for two reasons. The Auto Channel told me that it has thousands of hours of automotive-related video. One interesting aspect of this is that when a video gets “hot”, it gets a great deal of traffic. What’s mystifying, if I understood what The Auto Channel told me, is that it’s very hard to predict what will strike the user’s fancy.
The other reason is that I spoke with a programmer who once did a bit of work for a couple of the large European video services. I can’t reveal the name of the project this person worked on, but it rhymes with “goosed”. The point was that video is flooding the Internet, and it is difficult to generate enough revenue to keep up with the research, development, programming, and bandwidth charges. Video on a metered line is important to many users, but, if I understood his comments, those users don’t pay. Advertisers want “tight” demographics, and the usage data aren’t compelling enough to allow some video sites to generate enough cash to stay alive at this time.
I am not sure how much video Blinkx has indexed. I heard from one of my sources that Google receives more than 1.2 million video uploads per month. I recall reading that the GOOG accounts for more than 60 percent of video search traffic, but since the ComScore traffic flap, it’s tough to know just how much traffic Google has. Could be 70 percent, maybe more. A few days ago, ComScore said Google was the number one Web site on earth. Maybe? Maybe not? Google knows because it does not have to estimate its traffic. My sources tell me that Google just counts traffic, with no sampling to skew the data.
The Blinkx tag line is “Over 26 million hours of video. Search it all.” Their system appears to have a slather of patent documents in place. I tallied more than 100 when I stopped counting. Its conceptual search uses speech recognition, neural networks, and machine learning to create text transcripts. That text is then searched.
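The “transcript, then search” half of that pipeline is easy to sketch. The Python below is my own illustration, not Blinkx’s method. I assume the speech recognition step has already produced timestamped text segments for each video, and the names `tokenize`, `build_index`, and `search`, plus the sample clips, are hypothetical. The sketch builds a plain inverted index over the transcript text and returns the video and time offset where every query word was spoken in the same segment.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(transcripts):
    """Map each term to the (video_id, start_seconds) segments containing it.

    transcripts: {video_id: [(start_seconds, segment_text), ...]}
    The segments stand in for speech recognition output.
    """
    index = defaultdict(set)
    for video_id, segments in transcripts.items():
        for start, text in segments:
            for term in tokenize(text):
                index[term].add((video_id, start))
    return index


def search(index, query):
    """Return segments in which every query term occurs, sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


# Made-up transcripts for two made-up clips.
transcripts = {
    "clip-a": [(0, "Welcome to the auto show"),
               (12, "The new engine is a hybrid")],
    "clip-b": [(5, "Hybrid engine sales are rising")],
}
index = build_index(transcripts)
print(search(index, "hybrid engine"))
```

The point of keeping the start time with each hit is that a user can jump to the moment in the video where the words were spoken. The hard part, which this sketch skips entirely, is producing an accurate transcript in the first place.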
LTU: Challenging the Thomson Reuters Trademark Fortress
May 16, 2008
LTU Technologies in France is putting its well-regarded image search technology to work in a proprietary trademark database. The LTU system compares a submitted digital image against the database to confirm an already extant trademark or industrial design. You can read about that here. Click quickly. Some news stories disappear without warning.
LTU’s image search has been among the most accurate available. Military and intelligence entities have been among LTU’s most eager customers. Now LTU is moving into a service where Thomson Reuters has a strong, if not dominant, position. You can read about Thomson’s Trademarkscan service here.
Thomson also operates Derwent, a patent information service, and the company has dozens of complementary information services for intellectual property.
What are LTU’s chances of running with the big dog? We think that LTU will have to move quickly and be prepared for some Thomson Reuters push back.
If LTU puts out a quality product and focuses its effort, LTU may be able to offer an alternative to Thomson Derwent customers looking for options. But speed and quality are important. Oh, LTU has to be prepared for Thomson Reuters-style competition.
Jessica Bratcher, May 16, 2008