Looking for the Next Killer App: Moving Beyond Search
August 7, 2008
For years, the “next killer app” was email. Email, it turns out, is a headache. Younger folks are happy with instant messaging variants. SMS is okay. More “now”, Twitter-like functions are better. As the giants of software were pumping millions into R&D and the venture crowd was trolling the corridors of universities, the “next killer app” arrived. According to Pew/Internet: Pew Internet & American Life Project, search is the big dog. You can read the Pew story here. For me, the key point in the Pew data was “the number of those using a search engine on a typical day is pulling ever closer to the 60% of internet users who use email, arguably the Internet’s all-time killer app, on a typical day.”
So, how do I find an email a day old or older? Search. Gmail search works pretty well too. Yahoo’s email search is okay, just pokey on my connection.
With Google dominating search, what’s Google’s next killer app?
My research suggests that Google is poking its snout into information access and management. I call it “search on steroids”. Part of this effort is Google’s Programmable Search Engine. Another part is data management. Competitors need to crank up their innovation engines and figure out how to leapfrog Google. What’s “beyond search”? Competitors who want to catch Google may be too late.
Stephen Arnold, August 7, 2008
SearchCloud Updated
August 7, 2008
SearchCloud.net has updated its search system. You can now adjust term weightings from the results page. If you are not familiar with this system, you will want to navigate to http://www.searchcloud.net. The idea is to enter a term and then select a font size for that term. The system puts the term in the cloud and makes an internal notation to weight that term in proportion to its font size. The larger the font, the more significant the term is to your query. Term weighting has been available to search system administrators, but the function is usually excluded from user facing controls. I wrote a profile of the company’s system, and you can read that essay here.
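The font-size weighting described above can be sketched in a few lines. This is a minimal illustration, not SearchCloud's actual method: the function names `weighted_score` and `rank`, the whitespace tokenizer, and the choice to normalize font sizes so the weights sum to 1 are all assumptions made for the example.

```python
from collections import Counter


def weighted_score(document: str, term_sizes: dict[str, int]) -> float:
    """Score a document; each query term counts in proportion to its font size.

    term_sizes maps a query term to the font size the user picked for it
    (a hypothetical input format, assumed for this sketch).
    """
    counts = Counter(document.lower().split())
    total_size = sum(term_sizes.values())
    if total_size == 0:
        return 0.0
    # Normalize font sizes into weights that sum to 1, then weight term counts.
    return sum(
        (size / total_size) * counts[term.lower()]
        for term, size in term_sizes.items()
    )


def rank(documents: list[str], term_sizes: dict[str, int]) -> list[str]:
    """Return documents ordered from highest to lowest weighted score."""
    return sorted(
        documents,
        key=lambda doc: weighted_score(doc, term_sizes),
        reverse=True,
    )


docs = ["search appliance pricing", "enterprise search search"]
# A large font on "pricing" favors the first document;
# a large font on "search" favors the second.
print(rank(docs, {"search": 10, "pricing": 30})[0])
print(rank(docs, {"search": 30, "pricing": 10})[0])
```

Swapping the two font sizes is enough to reorder the results, which is the behavior a user-facing weighting control is meant to expose.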
Other changes in the update include:
- Tweaks to the interface
- A hint box
- Results can be copied.
I will keep you posted about developments. A happy quack to those who support term weighting.
Stephen Arnold, August 7, 2008
Sprylogics’ CTO Zivkovic Talks about Cluuz.com
August 7, 2008
The popular media fawn over search and content processing companies with modest demos that work on limited result sets. Cluuz.com–a company I profiled several weeks ago here–offers more hearty fare. The company uses Yahoo’s index to showcase its technology. You can take Cluuz.com for a test drive here. I was quite interested in the company’s approach because it uses Fancy Dan technology in a way that was immediately useful for me. Cluuz.com is a demonstration of Toronto-based Sprylogics International Inc. The company is traded on the Toronto exchange under the symbol TSXV:SPY.
With roots in the intelligence community, unlocking Sprylogics took some work. Once I established contact with Alex Zivkovic, I was impressed with his responsiveness and his candor.
You can read about the origins of the Cluuz.com service as well as some of the company’s other interesting content processing technology. The company offers a search system, but the real substance of the company is how it processes content, even the Yahoo search index, into a significantly more useful form.
The Cluuz.com system puts on display the firm’s proprietary semantic graph technology. You can see relationships for specific subsets of relevant content. I often use the system to locate information about a topic and then explore the identified experts and their relationships. This feature saves me hours of work trying to find a connection between two people. Cluuz.com makes this a trivial task.
Mr. Zivkovic told me:
So, we have clustering. We have entity extraction. We have a relationship analysis in a graph format. I want to point out that for enterprise applications, the Cluuz.com functions are significantly more rich. For example, a query can be run across internal content and external content. The user sees that the internal information is useful but not exactly on point. Our graph technology makes it easy for the user to spot useful information from an external source such as the Web in conjunction with the internal information. With a single click, the user can be looking into those information objects.
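To make the idea of relationship analysis in a graph format concrete, here is a rough sketch. It is not Sprylogics' proprietary method, which the company does not disclose. The sketch assumes entity extraction has already produced a list of names per document, links any two entities that appear in the same document, and uses a plain breadth-first search to find a chain connecting two people. The names `build_graph` and `find_connection` and the sample data are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(doc_entities):
    """Link every pair of entities that co-occur in the same document."""
    graph = defaultdict(set)
    for entities in doc_entities:
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_connection(graph, start, goal):
    """Return the shortest chain of entities from start to goal, or None."""
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        # Sorted for deterministic output; .get avoids adding unknown keys.
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Hypothetical output of an entity extractor, one list per document.
docs = [["Alice", "Bob"], ["Bob", "Carol"], ["Dave"]]
graph = build_graph(docs)
print(find_connection(graph, "Alice", "Carol"))
print(find_connection(graph, "Alice", "Dave"))
```

Co-occurrence is the crudest possible relationship signal; a production system would presumably weight edges and type the relationships, but the graph search step would look much the same.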
I probed into the “guts” of the system. Mr. Zivkovic revealed:
Our engineers have worked hard to perform multiple text processing operations in an optimized way. Our technology can, in most cases, process content and update the indexes in a minute or less. We keep the details of our method close to the vest. I can say that we use some of the techniques that you and others have identified as characteristic of high-speed systems like those at Google, for example.
You can read the full interview with Mr. Zivkovic in the Search Wizards Speak interview collection on the ArnoldIT.com Web site. The full text of this exclusive interview is here. A complete index of the interviews in this series is here.
Attivio: Active Intelligence Engine Version 1.2 Released
August 7, 2008
Attivio is on my radar. The company demonstrated its next-generation business intelligence system to me at the AIIM Show in April 2008. I liked what I saw. I interviewed the founder of Attivio, and you can read that transcript on the ArnoldIT.com Search Wizards Speak site here.
Now the company has released Version 1.2 of its Active Intelligence Engine, which means the gang in Wellesley, Massachusetts, is on the move. You can read the write up here.
Attivio–like some other newcomers–is not just a search or information access engine. Attivio has rethought the problem of getting information to employees who are under pressure or just in a hurry to get their kids from day care. I will dig into some of the new features later this month.
For now, let me highlight some of the new features of AIE 1.2. (I must admit when I pronounce AIE, I imagine the sound of a scream from competitors. Then after the scream dies down, I hear, “Why didn’t we implement those functions?”).
Four points about Version 1.2 caught my attention:
- AIE is positioned as a platform. The idea is that you can deploy quickly and build on top of the AIE system.
- Rich index that combines ease of use with the type of precision associated with structured query language statements. To me, this means I can get what I need without trying to get a programmer’s attention or spending time flipping through a SQL manual.
- Fast index updates and real time alerting.
- New connectors and support for 50 languages.
Attivio wants to deliver business intelligence without the hassles of most older business intelligence systems. The bottom line for me is that Attivio is focusing on basics: speed, ease of use, quick deployment, and platform extensibility. You can learn more about Attivio here.
Stephen Arnold, August 7, 2008
Hosted SharePoint Info
August 7, 2008
Network World’s Mitchell Ashley scooped most Microsoft watchers with “Microsoft Spills the Beans on Hosted Exchange / SharePoint”. Mr. Ashley tracked down Microsoft’s John Betz, Director Product Management for Microsoft Business Online Services. The conversation–available as a podcast here–provides useful information about hosted SharePoint. Mr. Ashley tossed some high, soft, easy to field questions, but several points jumped out at me. These were:
- The cloud play is “Hosted by Microsoft and sold by partners”. Infrastructure is going to be one important key ingredient in this new service stew.
- Pricing has a ceiling of $15 per user. Most folks will pay less. These prices strike me as “pulled from the clouds.”
- Microsoft “will make it all work together.” Active Directory will communicate with hosted Exchange and SharePoint. A “new tool will be provided”.
- Trade off for hosted Exchange and SharePoint–give up some control. “We make assumptions and settings on your behalf…. If you want customization, you need on premises Exchange and SharePoint.”
- The Service Level Agreement is for uptime, not transit time or any other network function.
- “We absolutely rely on partners. This is a great opportunity to sell an online service today and get paid forever.” Reason: Support comes from partner or local information technology group. Online services are for organizations that have an IT person on staff. “We’re delivering meat and potatoes. Our partners can put an embellishment upon these services.”
This is a very interesting chunk of information. A happy quack to Mr. Ashley.
Stephen Arnold, August 6, 2008
Google Search Appliance: Showing Some Fangs
August 6, 2008
Assorted wizards have hit the replay button for Google’s official description of the Google Search Appliance (GSA).
If you missed the official highlights film, here’s a recap:
- $30,000 starting price, good for two years, “support” and 500,000 document capacity. The bigger gizmos each can handle 10 million documents. These work like Christmas tree lights. When you need more, just buy more GSAs and plug them in. This is the same type of connectivity “big Google” enjoys when it scales.
- Group personalization; for example, marketing wizards see brochure-type information and engineers see documents with equations.
- Metadata extraction so you can search by author, department, and other discovered index points.
If you want to jump right into Google’s official description, just click here. You can even watch a video about Universal Search, which is Google’s way of dancing away from the far more significant semantic functionality that will be described in a forthcoming white paper from a big consulting firm. This forthcoming report–alas–costs money and it even contains my name in very small type as a contributor. Universal Search was the PR flash created for Google’s rush Searchology conference not long after an investment bank published a detailed report of a far larger technical search initiative (Programmable Search Engine) within the Googleplex. For true Google watchers, you will enjoy Google’s analysis of complexity. The title of the video is a bit of Googley humor because when it comes to enterprise or behind the firewall search, complexity is really not that helpful. Somewhere between 50 and 75 percent of the users of a search system are dissatisfied with the search system. Complexity is one of the “problems” that Google wants to resolve with its GSA.
When you buy the upscale versions of the GSA, you can implement fail over to another GSA. GSAs can be distributed geographically as well. The GSA comes with support for various repositories such as EMC Documentum. This means that the GSA can index the Documentum content without custom coding. The GSAs support the OneBox API, which is an important component in Google’s enterprise strategy. With the GSA, a clever programmer can create Vivisimo-style federated search results, display live data from a Microsoft Exchange server so a “hit” on a person shows that person’s calendar, integrate Web and third-party commercial content with the behind-the-firewall information, and perform other important content processing tasks.
Google happily names some of its larger customers, including Adobe Systems, Kimberly-Clark, and Sunnybrook Health. The company does not, however, mention the deep penetration of the GSA into government agencies, police organizations, and universities.
Good “run the game plan” write ups are available from CNet here, my favorite TechCrunch with Erick Schonfeld’s readable touch here, and the “still hanging in there” eWeek write up here.
After registering for the enterprise videos, you will see this splash page. You can get more information about the upgrade to Version 5 of the GSA.
My Take
Now, here’s my take on this upgrade:
First, Google is responding to demands for better connectivity, more administrative control, and better security. With each upgrade to the GSA, Google has added features that have been available for a quarter century from outfits like Verity (now part of the Autonomy holdings). The changes are important because Google is often bad mouthed for offering a poor enterprise search solution. With this release, I am not so sure that the negatives competitors heap on these cheerful yellow boxes are warranted. This version of the GSA is better than most of the enterprise search appliances with which I am familiar and a worthy competitor where administrative and engineering resources are scarce.
IBM: Blurring Lotus Notes and Enterprise Content Management
August 6, 2008
Marketwatch appears to have picked up an IBM news release and posted the write up without any editorial massaging. You can read “IBM eDiscovery Software Helps Organizations Win the Compliance Battle” here. The purpose of the news release was to explain that IBM offers “Enterprise Content Management (ECM) software designed to help clients meet challenging legal discovery requirements.” A couple of years ago, I had to take a look at IBM’s content management services. These ranged from FileNet to applications deployed within WebSphere. In addition, IBM was actively involved in deploying Documentum, albeit in some remarkably interesting situations that feathered the nest of some legal eagles.
This IBM news release and Marketwatch “news” story assert that IBM has “enterprise content management” which complements Lotus Notes. The “new” approach “helps organizations win the compliance battle.” Hmmm. I thought compliance was a requirement designed to assure that certain actions were taken. I don’t think of compliance as a battle, but I’m not IBM. I was just a fellow asked to figure out what IBM offered in the way of content management. The real news in this release is that IBM is pushing into eDiscovery, which is neither enterprise content management nor compliance work. eDiscovery is its own separate thing, but obviously IBM is setting me straight. The news story on Marketwatch said:
IBM’s eDiscovery software is the first to leverage a complete ECM platform to transform the process of eDiscovery by proactively managing electronically stored evidence. The new eDiscovery software integrates with IBM’s auto-classification and records management technology to help IT departments manage information for compliance and electronic discovery requests. IBM eDiscovery software also integrates with IBM’s content-centric business process management (BPM) capabilities to help organizations standardize, control and automate legal discovery workflows and enable third-party components as needed.
Let’s think about this. IBM has “new” software that:
- Is Lotus Notes
- Is Enterprise Content Management
- Performs eDiscovery
- Integrates business process management
- Performs compliance requests
- Manages electronically stored evidence
- Hooks into automatic classification
- Connects to records management.
I am a bit confused. One “new” product is the digital equivalent of the entire stock list of AsSeenOnTV.com? I find this remarkable and pretty close to science fiction. But I am an addled goose in an acid rain soaked hollow in rural Kentucky. Those folks in Armonk and Almaden are sure capable innovators. I wasn’t sure every buzz word in search and content processing could be squeezed into one meaty “news” release.
Stephen Arnold, August 6, 2008
Vadlo: New Bio Search Engine
August 6, 2008
In the last year, I’ve noticed that Google is doing less PowerPoint indexing. The cause may be lousy usage so the expense of dealing with the pick up sticks file formats is not worth the computational effort. Another idea is that more people–including me–are dumping PowerPoint decks to Adobe Portable Document Format files and saving some time while reducing bloated PowerPoint files to a more manageable size. Whatever the reason, PowerPoints do contain useful information. Vadlo, a biological-oriented vertical search engine, indexes PowerPoints, training classes, references to protocols, seminars, databases, and software. Like Dieselpoint, Vadlo is a product of Chicago. I think the city is trying to change its metaphorical association from “meat packer” and “big shoulders” to search and content processing. Vadlo is owned by two biology scientists and its mission is “to locate biology research related information on the Web”. The company may get a boost because Google is not doing a particularly thorough job in this area, and as noted, the Google is doing an even poorer job of keeping up its PowerPoint indexing. Vadlo includes cartoons which are undoubtedly real side splitters for biologically-enriched wizards. You can download them here or wait until these turn up in the New Yorker Magazine.
My recollection is that vádló is a Hungarian word which can be a name or mean “accuser”. My hunch is that clever biologists have unearthed this word to find a five letter domain name. I did a bit of poking around, and I found references to this system in comments appended to Cuil.com articles, in a couple of Google Groups, and on a handful of Web logs. One post was dated 2007, so this Vadlo is not really a newcomer to search like the outfit I will be visiting later this month. Check out the system. Just don’t accuse me of having lousy language translation skills. I live in rural Kentucky, shoot squirrels, and eat burgoo.
Stephen Arnold, August 6, 2008
SiSense: Shows Good Sense
August 6, 2008
I am gathering information about Google’s slow but steady progress in the enterprise software sector. My research indicates that the tasty Gummy Bear that lures people to Googzilla is maps, satellite imagery, and the nifty overlays that a licensee can plop on a Google map. You may want to look at what SiSense–a business intelligence start up–is doing with the oft-reviled Google Spreadsheets. SiSense is in the business intelligence software business. A SiSense customer can navigate to http://www.sisense.com, learn about the tight integration between the SiSense software and Google spreadsheets as a data source, and download a software widget.
SiSense has concluded that some of its customers use Google spreadsheets to hold data which can then be crunched using SiSense’s business intelligence routines. One application is for a distributor with seven sales reps who use Google spreadsheets to hold various data. The SiSense licensee can pull such data from these spreadsheets, roll it up, and crunch away. SiSense hit my radar because it uses the Amazon S3 service as well.
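The roll-up step in the distributor example can be pictured with a short sketch. It does not use SiSense's software or any Google API. It assumes each rep's spreadsheet has already been exported as CSV text with hypothetical `region` and `amount` columns, and the function name `roll_up` is invented for the illustration.

```python
import csv
import io
from collections import defaultdict


def roll_up(csv_sheets, key="region", value="amount"):
    """Sum the value column across all sheets, grouped by the key column."""
    totals = defaultdict(float)
    for text in csv_sheets:
        # Each sheet is CSV text with a header row naming its columns.
        for row in csv.DictReader(io.StringIO(text)):
            totals[row[key]] += float(row[value])
    return dict(totals)


# Two hypothetical sales reps, one exported sheet each.
sheets = [
    "region,amount\nEast,100\nWest,50\n",
    "region,amount\nEast,25.5\n",
]
print(roll_up(sheets))
```

With seven reps the list would simply hold seven exports; the grouping logic does not change.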
Prices are not available. SiSense is gearing up. My hunch is that if Amazon introduces a spreadsheet, SiSense may jump from Google to Amazon. Conversely, if Google makes public its remarkable data management ecosystem, SiSense may say “Sayonara” to Amazon’s pretty darned exciting Web services.
Here’s my take on how Google is attacking the enterprise. In the hospital where my mother is recovering from a heart attack, there’s a sign in the elevator to the cardiology unit. It says, “Don’t stack boxes on the floor. In case of fire, the box may continue to burn. Water will also damage the contents of the cartons.”
Google is entering into deals that work like the fire in a carton. Once the base has been penetrated, top down efforts might not extinguish Google. Once the little Google flame starts to burn the entire box is at risk. IBM, Microsoft, and Oracle are quite happy with their big fire hoses. These hoses can put out any annoying Google fires, or so the assumption goes. I am not so sure. One part of Google’s enterprise strategy is to set a bunch of small boxes on fire with the excitement of Google functionality. Google can then sit back and wait for the heat to build and then cook some goose–hopefully not this Web log’s logo.
Stephen Arnold, August 6, 2008
SharePoint: Nah, Not Complex at All
August 5, 2008
Microsoft has made available additional SharePoint documentation. If you have been wondering what other Microsoft servers you need, Microsoft spells it out. As an added bonus, Microsoft helps you plan for the hardware you will need to get the most out of your SharePoint environment. To get this information, navigate to a “sharepoint archive” here. The information comes from the helpful crowd at the Microsoft SharePoint Team Blog, which is, of course, the official blog of the Microsoft SharePoint Product Group.
Two 30 page white papers were quite helpful to me. The first is “Search and Indexing” here and the second is “Microsoft SharePoint Products and Technologies Server Farm Architecture” here. Note: both are Word files with some code snippets but no diagrams. I had to visualize some of the constructs, and it gave me a headache. You will probably find the Microsoft explanation exactly what you need to build out your SharePoint server farm.
This Web page provides links to more white papers and videos in case you like to dig into SharePoint by kicking back and watching the tube.
The one comment that stuck with me was the reference to Microsoft Forefront, a security product I had forgotten about. Here’s the passage from “Microsoft SharePoint Products and Technologies Server Farm Architecture” that grabbed me:
Microsoft Forefront™ Security for SharePoint is a purpose-built product that you can use to protect your Office SharePoint Server 2007 or Windows SharePoint Services 3.0 deployment from malicious code, undesirable content, and disclosure of confidential information.
Like Oracle, Microsoft is urging licensees to get YASP; that is, yet another server product. One would think with the fleet of servers required to make SharePoint work, security would be baked in. Silly me.
Stephen Arnold, August 5, 2008