Operational Intelligence, the New Enterprise Search
April 14, 2010
Worlds are colliding. Business intelligence, search, analytics, and business process are hurtling toward one another. No collider is needed. The impetus comes from managers who are struggling to keep their firms above water. Make no mistake about it. The economic climate may be improving based on government data and the self serving reports from global financial powerhouses. But just look at the number of empty buildings, the fraying infrastructure, and the desperation in the eyes of most employees in North America.
For those lucky enough to be thriving in a world gone mad for sending ads to individuals, life may be good. For people who are in more traditional jobs, the notion of finding information is an everyday struggle. Without the right information at the moment it is needed, organizations can make costly mistakes. These are not errors of judgment like those of magazine publishers who see the iPad as the font of new revenue or the dew eyed MBA looking for a job with a third string consulting firm. Nope. These are the mistakes of the person who cannot explain to a customer why an order was lost or why an automobile was delivered with a faulty electronic gizmo. In fact, I see the effects of downsizing, the need to squeeze extra money from every transaction, and crazy decisions made by committees everywhere I look, regardless of the country.
What’s the answer? According to a sponsored white paper from the consulting outfit IDC, Teradata has the fix. Now you may not think that even bigger piles of data will help your business. I admit that I don’t believe the premise either. You can get the story in “Real-Time Operational Intelligence Gains Momentum in Europe: Teradata-sponsored business survey shows adoption details for ‘Active Data Warehousing’” and make up your own mind. Big data means big costs in my experience.
What I liked about this write up was the phrase “real time operational intelligence”. True, the acronym RTOI is a bit clumsy, but I think the phrase points to an important shift in search and content processing. RTOI delivers what many of the people with whom I speak perceive enterprise search as delivering. The idea is that the information in an organization is available when needed to help people answer questions and make decisions. Hopefully the decision makers did well in school and have a modicum of common sense.
After thinking about this phrase and the acronym RTOI, I had several thoughts:
- Vendors of enterprise search may want to make this phrase their own. It is a heck of a lot more compelling than “putting information at your fingertips” or “dashboard”.
- Search, in this phrase’s embrace, becomes an enabler. Search becomes like butter in a recipe. Without the ingredient the dish does not work. Many vendors of search see themselves as the fish, vegetables, and spices in the meal. RTOI makes search an essential but supporting ingredient.
- The conceptual outcome of RTOI may be consolidation of what now are marketed as separate systems. For RTOI to work, an organization needs an integrated approach. Data are not enough. The various features and functions of analytics, retrieval, report generation, and business processes must be woven together into one coherent, affordable system.
Is RTOI the future? I am willing to float a tentative, “Yes.” Fragmented information centric systems are now a cost and resource challenge for many organizations. The time is ripe for a new approach. Maybe it will be fueled by open source software like Lucene? Maybe it will be the use of a system like Google’s? Maybe it will be a roll up following the trajectory of Autonomy or OpenText?
The status quo is not delivering and change may be coming. Teradata may not be the winner, but it has contributed a useful catch phrase. Putting the phrase “enterprise search” to rest would be a step forward in my opinion.
Stephen E Arnold, April 14, 2010
Unsponsored post.
Google Aims for Average
April 14, 2010
“Google CEO Eric Schmidt Talks Up Web-Based Enterprise Apps” in Datamation includes a comment that I found startling. According to the article,
“Our applications aren’t full replacements to the incumbents,” he said. “Our goal is 80 percent because then we provide value and the features most users want.” The “value” Google offers over traditional PC-bound apps includes a fast iteration of new features, an emphasis on collaboration, low cost and Web-based access.
My interpretation of this comment, if it is accurate, is that Google is implementing a “good enough” strategy. The company sees its sweet spot as a C average vendor. Perhaps I am off base? Google has made much of its hiring the best people. Now Google is using its talent pool to deliver 80 percent. That may be the way to generate revenue which is the name of the game in the US. Will the C average approach apply outside of Google’s enterprise initiative?
Stephen E Arnold, April 14, 2010
Unsponsored post
Two Acquisitions: Divvyshot and Episodic
April 14, 2010
While on travel on Saturday, I read two separate news items about two competitors’ acquisitions. Facebook purchased a photo sharing outfit called Divvyshot. I had never heard of it. To my addled goose eye, the Divvyshot service looks like Flickr with the requisite search and social functions that make venture capitalists drool. The service makes it easy to create a collection of images, which Divvyshot calls events. This is in line with the type of thinking I heard described years ago when a Microsoft researcher was explaining how people think about information; for example, “the letter I received when I got engaged.” This is the “hook” approach to content organization.
The Google purchase delivered an outfit that is able to stream live video. YouTube.com has its own streaming video technology. Episodic is able to stream and it includes a package of services; that is, instead of an invention, Episodic has a more or less complete service, including a function that makes flash videos work on the Apple iPhone and presumably the iPad. (See “Episodic Makes Flash Videos iPhone Friendly.”)
Several observations:
First, the Facebook acquisition goes into the guts of what Facebook users are now doing. Facebook is one of the largest photo repositories in the social media space. Divvyshot is likely to make existing customers happier because Facebook is not particularly good at certain types of content organization. The company is improving, but there are some constraints that madden users like me. The Google acquisition is more a product and people deal. Google can do specific inventions, but Episodic puts different things together in a reasonably coherent package.
Second, the Facebook deal is about addressing a “now” problem. The Google buy seems to be part of a build out strategy for rich media at Google. What strikes me is that Facebook is chugging along and taking steps to “me too” service functions available elsewhere just not within the Facebook walled garden. Google is trying to short cut product development. Which is the better strategy? I don’t know.
Third, both companies are buying as well as investing in their own technologies. Facebook’s purchase is more of a tactical move. Google seems to be evidencing some impatience with its own line up of video inventions, products, and services. Is Google also buying staff in order to accelerate the company’s role in rich media?
I want to see how these two companies interact. Right now, Facebook seems less pressured in the rich media space than Google. Google, on the other hand, may find itself falling further behind leaders in rich media. Search and text advertising just may be losing their turbo charging capability. Quite a surprise if this assertion is accurate.
You can request a free sample chapter from Google Beyond Text, my new study of Google’s infrastructure, by navigating to http://www.theseed2020.com/gbt/. I explore rich media as an opportunity for Google to grow or for rich media to gum up the Google F 1 race car engine.
Stephen E Arnold, April 14, 2010
No one paid me to write this.
dtSearch Expands
April 14, 2010
dtSearch, the ultra interesting search vendor in Maryland, was the subject of “dtSearch Expands File Parsers and Converters; Content Extraction Only Licenses Available.” dtSearch triggers memories of small blue and white advertisements in trade publications. The angle is that dtSearch can search lots of text quickly and the system is low cost in comparison to some other vendors of search systems. I have tested the system, and it works.
Source: http://www.dtsearch.com
I find some of the interface conventions inappropriate to my style of working but you will have to give the system a test drive and make up your own mind. The article—a news release type write up—points out that dtSearch is getting into the file conversion business. The leaders in this sector offer solutions that some find too expensive. dtSearch may follow its proven marketing approach and put some pressure on the industry leaders like Oracle-Stellent. The write up says:
The file parsers and converters now cover Adobe Framemaker MIF, XFA form templates, and Visio XML, in addition to existing supported file types like HTML, PDF, XSL/XML, ZIP, OpenOffice and MS Office files (through current released versions). The parsers also support popular email formats, along with the full text of attachments. For a complete list of supported file types, see http://support.dtsearch.com/faq/dts0103.htm.
The story also describes a “content extraction” license. The explanation in the write up is:
The dtSearch Engine embeds the file parsers for hit-highlighted WYSIWYG display of web-ready files and HTML conversion (with hit-highlighted display) of other file types. Content extraction only licenses are also available.
We will have to test this system to understand exactly what is permitted. No pricing information was available in the story. My notes about dtSearch show that fees begin in the hundreds of dollars and rise from there. Compared to other Microsoft-centric search systems, dtSearch is definitely a lower cost option.
You can see the dtSearch system in action if you have access to Mimosa. That company, according to my notes, uses dtSearch in its content processing system. As you may know, Iron Mountain acquired Mimosa. It will be interesting to see how the acquisition affects the trajectory of dtSearch in certain indexing situations.
My Overflight system generates a number of links to Softpedia for a “free” download of the dtSearch system. The page here describes dtSearch 7.65.7887 in this way:
The dtSearch is a complex product and includes dtSearch Desktop, Spider, Network, Web, Publish and Text Retrieval Engine. dtSearch products instantly search gigabytes of text across a desktop, network or Internet/Intranet. Products can also publish large document collections to Web sites or to CD/DVD. dtSearch is … the Smart Choice for Text Retrieval since 1991.
The features of the system, according to Softpedia (http://www.softpedia.com/get/System/File-Management/Text-Retrieval-Engine.shtml), include:
- Provides over two dozen indexed and unindexed text search options for all popular file types.
- Supports full-text as well as field searching in all supported file types.
- Has multiple relevancy-ranking and other search sorting options.
- The dtSearch product line displays retrieved files in a browser with highlighted hits and convenient hit and file navigation options: next hit, previous hit, next document, etc.
- For HTML and PDF, the products highlight hits while keeping embedded formatting, links and images intact.
- For all other supported file types (“Office,” XML, ZIP, etc.), the product line has built-in HTML file converters for displaying these files in a browser with highlighted hits
- dtSearch Engine supports SQL, C++, Java, VB.NET, C#, Delphi, ASP.NET
My reaction to Softpedia’s write up is that it is promising a great deal. Considering the converters have just been expanded, I think that the use of “all” is quite interesting and a categorical affirmative. The technology of dtSearch seems to date from 1991. That makes it one of the more chronologically mature systems available today. Search has changed significantly in the last 19 years and the absence of nods to social content, semantic technology, and business intelligence type functions distinguishes dtSearch from some of the other competitors in this market sector. Finally, the software offered by Softpedia carries a $999 price tag, which seems to fit between open source search at essentially zero cost and the six and seven figure systems available from certain vendors.
Bottom line: the 1991 date interests me and raises the question, “How has dtSearch been able to invest in new technology and offer such a compelling price point?” The answer to this question may instruct other content processing vendors so they avoid the financial pressures that companies like Delphes, Nstein, and others have experienced.
Stephen E Arnold, April 14, 2010
No one paid me to write this.
Autonomy Amps Social and Rich Media
April 14, 2010
At the National Association of Broadcasters conference, Autonomy announced enhancements to the Virage MediaBin platform. The latest version of MediaBin “automatically forms a conceptual understanding of all rich media assets located in any internal or external repository, including social media, blogs, and videos.” Autonomy’s “meaning based computing” makes sense out of non text content. The firm said:
Autonomy Virage’s solution overcomes these challenges by enabling businesses to automatically understand the value of all digital assets created both inside and outside an organization, and dynamically deliver the right content to the right customer, every time. At the core of Autonomy’s Virage MediaBin platform is the Intelligent Data Operating Layer (IDOL) which allows businesses to automate the processing of all rich media assets. IDOL forms a conceptual understanding that allows marketers to automatically tag and classify any rich media asset, regardless of format or language. Virage MediaBin applies this intelligence to deliver advanced analytics, automatic categorization, summarization, concept clouds, dynamic content associations, content hyperlinking and automation of business processes and workflow.
In addition, the new release:
provides enhanced innovations to “watch and listen” to video. The product automatically converts video to text and time synchronizes with a preview of the content. Video assets can be quickly and easily found with pinpoint accuracy to the exact location within a video where a word or phrase is spoken. This is dynamically associated with other critical digital assets.
You can get more information about MediaBin at www.autonomy.com/dam.
Stephen E Arnold, April 14, 2010
Unsponsored post.
Funnelback Books London School of Economics
April 14, 2010
The LSE (London School of Economics) has adopted the Funnelback search system. According to a news release issued by Squiz, a content management vendor in Australia:
LSE have set up Funnelback to search all external LSE Web site content as the default. Searches can also be restricted to specific groups or faculties. The public events search facility on the LSE website is also powered by Funnelback. This helps users find out when and where events such as the next concert or public lecture are to be held.
If you are not familiar with Funnelback, it is a commercial search solution from Squiz. The description on the LSE’s Web site says:
LSE uses Funnelback as the search facility for the LSE Web site. Funnelback indexes the LSE website. Submitting a query – word, phrase or question – returns a list of results – pages and documents matching the query – which are sorted by relevance. In addition to this standard search function, Funnelback allows the results to be refined by type, topic, site and file type. This refinement allows results to be more closely associated with the original query thereby aiding discovery of the sought for pages/documents or awareness of their absence from the site.
The service is publicly accessible at http://www.lse.ac.uk. I ran queries on the system and worked through relevance ranked results with different file types clearly marked. The PDF files, for example, are colored blue, which made them easy to spot. Here’s a typical results list:
The search box has a drop down menu that allows the user to limit the results to specific collections.
I find this type of narrowing useful. I was baffled, however, by the link that said, “to improve results, try”. After some poking around, I worked out that this right hand link was a tiny graphic pointing to the left hand links that allow refinement by type (which really means topic category), domain, and file type.
There is an advanced search function. The system invites me to fill out a form. The advanced search functions are used by power users but most users whack two or three words in the search box and give the system a go.
Performance was acceptable, but I don’t have a sense of the size of the corpus processed by the system. You can get more information about Funnelback from the firm’s Web site at www.funnelback.com.
Stephen E Arnold, April 14, 2010
A freebie.
Arnold Column Added to Information Today
April 14, 2010
Stephen E. Arnold, an expert in search, content processing, and online systems, and author of “Google: The Digital Gutenberg” (Infonortics, 2009) and three other significant Google studies, will be writing a column for the information industry’s trade paper, Information Today.
The column will focus on new directions in search and content processing, and themes from “Successful Enterprise Search Marketing,” which Arnold co-authored with Martin White of Intranet Solutions.
“I want to document the rapid changes now taking place in the way users interact with search systems. The era of the desktop PC is ending and new devices with new form factors mean major changes in search and retrieval,” Arnold said. Arnold has worked in the search and content processing field for more than ten years. He also writes columns for the Smart Business Network, Information World Review, and KMWorld.
More information about Arnold and his strategic information consulting business is available at http://www.arnoldit.com/sitemap.html. He also supports two blogs: Beyond Search, http://arnoldit.com/wordpress/, focuses on next generation search issues, and the Strategic Social Networking Blog, http://www.SSNBlog.com, addresses trends and current events in social media for business. His Google studies are available at http://www.infonortics.com/publications/google/google-trilogy.html.
Jessica West Bratcher, April 14, 2010
Tablets and a Puzzle for Publishers
April 13, 2010
Here’s the article I thoroughly enjoyed: “The Dark Side of Steve Jobs.” If you are interested in online information and tablets like the iPad, it is a must read. I even tucked a copy in my “quotes” file. The main point is that Apple’s Steve Jobs is exerting control over publishers, programmers, and for good measure Adobe. The write up makes a number of interesting observations and weaves a compelling story of how some publishers responded to Steve Jobs’s sales pitch.
Now the iPad may not look like the life boat some publishing companies wanted to float by their executive dining rooms and deliver boat loads of cash. Developing for the iPad will have some costs and require that publishers follow Apple’s rules.
The options available to publishers range from Google, which is not a particularly big hit on the partner league table for some, to Amazon, which is embroiled in a pricing tug of war, to the raft of other eBook reader manufacturers. Nook, anyone? JooJoo? WePad?
I think that some publishers will have the expertise to become a software development company that can code in Apple’s mandated programming language. Other publishers will try and be swamped by costs and the brutal life of bug fixes, updates, and enhancements. For publishers with deep pockets, no problem. For other publishers, problems.
Who will join the merry band of Googlers? Who will jump on the Amazon Kindle? Who will embrace the other eBook hardware devices? Which publishers will try to diversify and run conferences, become hardware manufacturers, or buy Web 2.0 properties in the hopes of spinning money?
I have no answers. Data will become available with each quarterly report going forward.
Stephen E Arnold, April 13, 2010
Unsponsored post.
Cpedia Previews the Future of Content Assembly
April 13, 2010
There are two services that anticipate some interesting future search methods. One company is Kosmix and the other is Cuil.com, the much maligned search service from Anna Patterson (former Googler) and Tom Costello (former IBMer). The folks behind Cuil.com have released Cpedia. According to GigaOM’s “Cuil Failed at Search, Now Fails to Copy Wikipedia”:
Cpedia launched last week with a blog post from Cuil co-founder and former IBM staffer Tom Costello, who described a meeting he had with Sun Microsystems co-founder Bill Joy when Costello and his wife Anna Patterson (a former Googler) were trying to raise money for Cuil. Joy told Costello that people didn’t need a new search engine that just returned a list of results, they needed something that would write an article based on a search. A note on Cpedia topic pages reads: “We find everything on the Web about your topic, remove all the duplication and put the information on one page.”
I have documented a couple of Google patent documents that describe somewhat similar ideas, although the Google systems and methods are tailored to the Google platform’s specific requirements for scale, cross processing, and optimizing performance among Google’s many different “flavors” of servers.
My view of Cpedia is somewhat less harsh than this statement in the GigaOM publication:
Unfortunately, being new and different doesn’t necessarily mean that it is either good or useful. Other users who have tried it out describe it as “sentence after sentence of automated nonsense,” and Tumblr and Instapaper developer Marco Arment says that “if this feature is meant to become a serious product, I truly feel bad for them.”
My view is:
- Conceptual slicing and dicing is a particularly interesting content processing problem. The Cuil method does yield some unusual outputs but for topics like “Julius Caesar”, I found the results in line with outputs from other systems we have reviewed. One can argue that the Cuil method does not produce outputs in line with what a college educated person might assemble after scanning six or seven sources, but the Cpedia results were in the ballpark compared to some of the wackiness we have seen in the past.
- The computational load for this type of processing is quite high. Even so, our tests showed that for high frequency queries like prominent topics and major historical figures, results were displayed quickly.
- The inclusion of real time results struck me as one step in providing the much needed context for information pulled from Twitter and Facebook. Too often, real time items are disembodied and make little or no sense. Maybe the Cuil.com approach is not the perfect answer, but I find the inclusion of real time results within a content centric context an improvement over a Collecta box showing items in a stream. (See http://ssnblog.com for an example of the Collecta stream.)
Our tests of Cuil.com continue, and we find that the service has been improving. Cpedia keeps the ball rolling.
Stephen E Arnold, April 13, 2010
Unsponsored post.
IBM and Verizon Team for Search Storage
April 13, 2010
Short honk: I read “IBM and Verizon Look to Draw Large Enterprises to Cloud Data Backup—Search Storage” in File Recovery. The pairing strikes me as one more attempt by IBM to hit a home run in a market sector that is beginning to get some traction. The optimists say an economic recovery is underway. Those in some big companies may be somewhat more cautious. The cloud appears to offer some ways to slash costs, but the idea that a service from two giants like IBM and Verizon will save money strikes me as a proposition that needs some supporting facts. The “search storage” phrase puzzles me. Hosted search works in some situations and it doesn’t in others. More information needed, but the tie up is fascinating.
Stephen E Arnold, April 13, 2010
Nope, a news item written for no dough.