Five Myths of Enterprise Search Marketing

May 12, 2010

The telephone and email flow has spiked. We are working to complete Google Beyond Text, and people seem to be increasingly anxious (maybe desperate?) to know what can be done to sell search, content processing, indexing, and business intelligence.

Sadly there is no Betty White to generate qualified leads and close deals for most search and content processing vendors. See “From Golden Girl To It Girl: Betty White Has Become Marketing Magic.” This passage got my goose brain rolling forward:

On Saturday night, ‘SNL’ had its best ratings since 2008, with an estimated 11 million people tuning in to see Betty talk about her muffin. But more than the ratings boost was the sheer hilarity of the show; for the first time in a long time, ‘SNL’ was at the center of the national conversation this Monday morning. ‘Saturday Night Live’ was good with Betty White. Really good! And that kind of chatter is something you just can’t buy.

The one thing the goose knows is that one-shot or star-centric marketing efforts are not likely to be effective. A few decades ago, I was able to promote newsletters via direct mail. The method was simple: license a list and pay a service bureau to send a four-page letter, an envelope, and a subscription card. Mail 10,000 letters and get 200 subscribers at $100 a pop. If a newsletter took off like Plumb Bulletin Board Systems, which we sold to Alan Meckler, or MLS: Marketing Library Services, which we sold to Information Today, the math was good. Just keep mailing, and when the subscription list hit 1,000 or more, sell out.

Times have changed. The cost of a direct mail program in 1980 was less than $1.00 per delivered item. Today, the costs have risen by a factor of five or more. What’s more important is that snail mail (postal delivered envelopes) is ignored. The indifferent recipient, or the recipient overwhelmed with worries about money, the kids, or getting the lawn mowed, has afflicted radio, television, cable, door knob hangers, fliers under windshield wipers, and almost every other form of marketing I used in 1970.

I had a long call with a search entrepreneur yesterday, and in that conversation, I jotted down five points. None is specific to her business, but the points have a more universal quality in my opinion. Let me highlight each of these “myths”. A “myth”, of course, is a story accepted as having elements of truth.

First, the notion that news releases stuffed with superlatives such as “best,” “fastest,” or “easiest” produce sales. I am not sure I have to explain this. The language of a news release has to enhance credibility. If something is the “fastest” or “easiest,” telling me once will not convince me, and I don’t think it convinces anyone else. One problem is the notion of the single news release. Another is the idea that baloney sells or produces high-value sales leads. A third is that news releases disappear into the digital maw and get spit out in RSS feeds; without substance, I ignore them. PR firms are definitely increasing their reliance on news releases, which strikes me as silly. So the myth that cooking up a news release makes a sale is false. A news release will get into the RSS stream, but will that sell? It is a long shot at best.

Second, Webinars. I don’t know about you, but scheduled Webinars take time. For me to participate in one, I need to know that the program is substantive and that I won’t hear people stumble through impenetrable PowerPoint slides. I have done some Webinars for big-name outfits, but now I am shifting to a different type of rich media. Some companies charge $10,000 or more to set up a Webinar and deliver an audience. The problem is that the audiences delivered for these fees are often small or not prospects at all. A Webinar, like a news release, is a one-shot deal, and one-shot deals are less and less effective. The myth is that a Webinar is a way to make sales now. Maybe, maybe not.

Third, trade show exhibits. Trade show attendance is down. People want to go to conferences, but with the economic climate swinging wildly from day to day, funds for conferences are constrained. Conferences have to address a specific problem; not surprisingly, events that are fuzzy are less likely to produce leads. I attended a user conference last week, and the exhibitors were quite happy. In fact, one vendor sent me an email saying, “I am buried in follow ups.” The myth that all trade shows yield sales is wrong. Some trade shows do; others don’t. Pick wrong and several thousand dollars can fly away in a heartbeat. For big shows, multiply that number by 10.

Fourth, Web sites sell. I don’t know about you, but Web sites are less and less effective as a selling tool. Most Web sites are brochureware unless there is some element of interactivity or stickiness. In the search world, most Web sites are not too helpful. Who reads Web pages? I don’t. Who reads white papers? I don’t. Who reads the baloney in news releases or the broad descriptions of a company’s technology? I don’t. The most effective Web sites are the ones marketers and designers showcase. My hunch is that Web sites will lose effectiveness the way snail mail has, just more quickly. The myth is that Web sites pump money to the bottom line. Hogwash. In most cases a Web site is today’s collateral: a necessary evil.

Fifth, social media. I know that big companies have executives who are in charge of social media. Google lacks this type of manager, but apparently the company is going to hire a “social wrangler” or “social trail boss.” Social media, like any other messaging method, requires work. A one-shot social media push may be somewhat more economical and possibly more effective than a news release or two, but social media is real and hard work. The myth that it is a slam dunk is wrong.

So with these myths, what works?

I have to be candid. In the search and content processing markets, technology is not going to close deals. The companies that I hear are making sales are the companies able to solve problems. In a conflicted market with great uncertainty, marketing methods have to be assembled into a meaningful, consistent series of tactics. But tactics are not enough. The basics of defining a problem, targeting specific prospects, and creating awareness are the keys to success.

I wish I could identify some short cuts. I think consistency and professionalism have to be incorporated into ongoing activities. One shot, one kill may have worked for Buffalo Bill. I am not so sure the idea transfers to closing search deals.

Stephen E Arnold, May 12, 2010

A freebie.

A New Term for Search: Enterprise Mashup

May 12, 2010

I received a copy of “Mashups in the Enterprise IT Environment: The Impact of Enterprise Mashup Platforms on Application Development and Evolving IT Relationships with Business End Users”, written by BizTechReports.com. The white paper is about JackBe.com’s software platform.

Here is the company’s description of its product and services:

Enterprise Mashups solve the quintessential information sharing problem: accessing and combining data from disparate internal and external data sources and software systems for timely decision-making. JackBe delivers trusted mashup software that empowers organizations to create, customize and collaborate through enterprise mashups for faster decisions and better business results. Our innovative Enterprise Mashup platform, Presto®, provides dynamic mashups that leverage internal and external data while meeting the toughest enterprise security and governance requirements. Presto provides enterprise mashups delivered to the user in 3 clicks versus 3 months.

You can get more information from the firm’s Web site at www.jackbe.com. If you want a short cut to demonstrations of the firm’s technology, click here.

The company provides a platform and services to convert disparate data into meaningful information assets. What I find interesting is that the phrase “enterprise mashup” is used to reference a range of content processing activities, including content acquisition and processing, indexing, and information outputting. In short, “enterprise mashup” is a useful way to position functions that some vendors describe as search or findability.
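
To make the term concrete, here is a minimal sketch of what any enterprise mashup does at its core: combine records from disparate sources into a single, decision-ready view. The data, field names, and join logic below are my own inventions for illustration; they say nothing about the internals of JackBe’s Presto.

    # A generic illustration of the core mashup operation: join an internal
    # data source with an external feed on a shared key. All data and field
    # names are invented; this is not JackBe's Presto API.
    internal_orders = [
        {"customer_id": 17, "order_total": 4200},
        {"customer_id": 23, "order_total": 1150},
    ]
    external_ratings = {17: "A+", 23: "B"}  # e.g., from a credit rating feed

    def mashup(orders, ratings):
        """Combine two disparate sources into one decision-ready view."""
        return [dict(order, credit_rating=ratings.get(order["customer_id"], "n/a"))
                for order in orders]

    for row in mashup(internal_orders, external_ratings):
        print(row)

A platform like Presto wraps this kind of join in widgets, security, and governance; the toy version above only shows the shape of the operation.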


JackBe’s interface reminds me of other business intelligence data presentations.

I want to focus on the white paper because it provides important hints about the direction in which some types of content processing are moving.

First, the argument in the white paper hinges on the assertion that there is a “hyper dynamic environment.” To deal with this environment, an organization requires a different approach to information. What is interesting is that the JackBe audience is a blend of developers and business professionals. Some search vendors are trying to reach the senior management of a company; JackBe is interested in two audiences.

Second, the white paper explains the concept of the “mashup.” The term compresses a range of information activities into one word. To implement a mashup, JackBe provides widgets that reduce the time and hassle of building “situation specific” implementations. Some search vendors talk about customization and personalization. The JackBe approach sidesteps these fuzzy notions and focuses on the idea of a “snap in,” lightweight method.

Finally, the JackBe approach uses an interesting metaphor. The phrase I noted was the “Home Depot model of enterprise IT.” Instead of taking disparate components of a typical search engine, JackBe suggests that a licensee can select what’s needed to do a particular information job.

You will want to read the white paper and glean more detailed information. I want to focus on the differences in the JackBe approach. These include:

  1. Avoiding overused and little understood terms such as search, taxonomies, business intelligence, and semantic technology. I am not sure JackBe’s approach is going to eliminate confusion, but it is clear to me that JackBe.com is trying to steer clear of the traditional jargon.
  2. The JackBe approach is trendier than IBM’s explanation of OmniFind. The notion of a mashup itself and the references to the “long tail” concept are examples.
  3. To some enterprise procurement teams, JackBe’s approach may be perceived as quite different from the services of larger, higher profile vendors. In my view, this may be a positive step. Search vendors who follow in the footsteps of STAIRS III or Verity are not likely to have the sales success a more creative positioning permits.

To sum up, I think that companies with search and content processing technology will be working hard to distance themselves from the traditional vendors’ methods. The reason is that search as a stand-alone service is increasingly perceived as an island. Organizations need systems that connect the islands of information into something larger.

Is JackBe a search and content processing vendor? Yes. Will most people recognize the company’s products and services as basic search? Not likely. Will the positioning confuse some potential licensees? Maybe.

Stephen E Arnold, May 12, 2010

Unsponsored post.

Monitoring Google via Patent Documents, Definitely Fun

May 8, 2010

As soon as I returned from San Francisco, it was telephone day. Call after call. One of the callers was a testosterone-charged developer in a far off land. The caller had read my three Google studies and wanted to know why my comments and analyses were at variance with what Googlers said. The caller had examples from Google executives in mobile, enterprise apps, advertising, and general management. His point was that Google says many things, and none of the company’s comments reference any of the technologies I describe.

I get calls like this every couple of months. Let me provide a summary of the points I try to make when I am told that I describe one beastie and the beastie is really a unicorn, a goose, or an eagle.

First, Google is compartmentalized, based on short info streams shot between experts with sometimes quite narrow technical interests. I describe Google as a math club, which has its good points. Breadth of view and broad thinking about other subjects may not be a prerequisite to join. As a result, a Googler working in an area like rich media may not know much or even care about the challenges of scaling a data center, tracking down SEO banditry, or learning about the latest methods in ad injection for YouTube advertisers. This means that a comment by a Google expert is often accurate and shaped for that Googler’s area. Big thinking about corporate tactics may or may not be included.

Second, Google management—the top 25 or 30 executives—are pretty bright and cagey folks. Their comments are often crafted to position the company, reassure those in the audience, or instruct the listener. I have found that these individuals provide rifle shot information. On rare occasions, Google will inform people about what they should do; for example, “embrace technology” or “stand up for what’s right”. On the surface these comments are quotable, but they don’t do much to pin down the specific “potential energy” that Google has to move with agility into a new market. I read these comments, but I don’t depend on them for my information. In fact, verbal interactions with Googlers are often like a fraternity rush meeting, not a discussion of issues, probably for the reasons I mentioned in point one above.

Third, Google’s voluminous publicly available information is tough to put into a framework. I hear from my one, maybe two clients that Google is fragmented, disorganized, chaotic, and tough to engage in discussion. No kidding. The public comments and the huge volume of information scattered across thousands of Google Web pages require a special purpose indexing operation to make them manageable. I provide a free service, in concert with Exalead, so you can search Google’s blog posts. You can see a sample of this service at www.arnoldit.com/overflight. I have a system to track certain types of Google content, and from that avalanche of stuff, I narrow my focus to content that is less subject to PR spin; namely, patent documents and papers published in journals. I check out some Google conference presentations, but these are usually delivered through one of Google’s many graduate interns or junior wizards. When a big manager talks, the presentation is subject to PR spin. Check out comments about Google Books or the decision to play hardball with China for examples.

My work, therefore, is designed to illuminate one aspect of Google that most Googlers and most Google pundits don’t pay much attention to. Nothing is quite so thrilling as reading Google patent applications, checking the references in these applications, figuring out what the disclosed system and method does, and relating the technical puzzle piece to the overall mosaic of “total Google”.

You don’t have to know much about my monographs to understand that I am describing public documents that focus on systems and methods that may or may not be part of the Google anyone can use today. In fact, patent documents may never become a product. What a patent application provides includes:

  1. Names of Google inventors. Example: Anna Patterson, now running Cuil.com. I don’t beat up on Cuil.com because Dr. Patterson is one sharp person and I think her work is important because she is following the research path explained in her Google patent documents, some of which have now become patents. In my experience, knowing who is “inventing” some interesting methods for Google is the equivalent of turning on a light in a dark room.
  2. The disclosed methods. Example: There’s a lot of chatter about how lousy Wave was and is. The reality I inhabit is that Wave makes use of a number of interesting Google methods. Reading the patent applications and checking out Wave makes it possible to calibrate where a particular method is in a roll out. For that reason, I am fascinated by Google “janitors” and other disclosures in these publicly available and allegedly legal documents.
  3. The disclosures through time. I pay attention to the dates on which certain patent documents and technical papers appear. I plot these and then organize the inventions by type and function. (A small sketch of this bookkeeping follows this list.) Over the last eight years I have built a framework of Google capabilities that makes it possible to offer observations based on this particular body of open source information.
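
Here is that bookkeeping in miniature: group documents by function, then list filing years for each group. The three records below are invented placeholders, not actual Google filings.

    # Group invented placeholder records by function; a real pass works
    # over hundreds of patent documents and papers, not three.
    from collections import defaultdict

    documents = [
        {"title": "Method A", "function": "ranking", "year": 2004},
        {"title": "Method B", "function": "context", "year": 2006},
        {"title": "Method C", "function": "context", "year": 2007},
    ]

    by_function = defaultdict(list)
    for doc in documents:
        by_function[doc["function"]].append(doc["year"])

    for function, years in sorted(by_function.items()):
        print(function, sorted(years))  # context [2006, 2007] / ranking [2004]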

When you look at these three points and my monographs, I think it is pretty easy to see why my writings seem to describe a Google that is different from the popular line. To sum up, I focus on a specific domain and present information about Google’s technology that is described in the source documents. I offer my views of the systems and methods. I describe implications of these systems and methods.

I enjoy the emails and the phone calls, but I am greatly entertained by my source documents. My fourth Google monograph, Google Beyond Text, will be available in a month or so. Like my previous three studies, there are some interesting discoveries and hints that Google has reached a pivot point.

Stephen E Arnold, May 8, 2010

Sponsored post. I paid myself to write this article. Such a deal.

Milward from Linguamatics Wins 2010 Evvie Award

April 28, 2010

The Search Engine Meeting, held this year in Boston, is one of the few events that focuses on the substance of information retrieval, not the marketing hyperbole of the sector. Now entering its second decade, the conference features speakers who tackle challenging subjects. This year speakers addressed such topics as “Universal Composable Indexing” by Chris Biow, Mark Logic Corporation; “Innovations in Social Search” by Jeff Fried, Microsoft; “From Structured to Unstructured and Back Again: Database Offloading” by Gregory Grefenstette, Exalead; and a dozen other important topics.


From left to right: Sue Feldman, Vice President, IDC, Dr. David Milward, Liz Diamond, Stephen E. Arnold, and Eric Rogge, Exalead.

Each year, the best paper is recognized with the Evvie Award. The “Evvie” was created in honor of Ev Brenner, one of the pioneers in machine-readable content. After a distinguished career at the American Petroleum Institute, Ev served on the planning committee for the Search Engine Meeting and contributed his insights to many search and content processing companies. One of the questions I asked after each presentation was, “What did Ev think?” I valued Ev Brenner’s viewpoint, as did many others in the field.

The winner of this year’s Evvie award is David R. Milward, Linguamatics, for his paper “From Document Search to Knowledge Discovery: Changing the Paradigm.” Dr. Milward said:

Business success is often dependent on making timely decisions based on the best information available. Typically, for text information, this has meant using document search. However, the process can be accelerated by using agile text mining to provide decision-makers directly with answers rather than sets of documents. This presentation will review the challenges faced in bringing together diverse and extensive information resources to answer business-critical R&D questions in the pharmaceutical domain. In particular, it will outline how an agile NLP-based approach for discovering facts and relationships from free text can be used to leverage scientific knowledge and move beyond search to automated profiling and hypothesis generation from millions of documents in real time.

Dr. Milward has 20 years’ experience of product development, consultancy and research in natural language processing. He is a co-founder of Linguamatics, and designed the I2E text mining system which uses a novel interactive approach to information extraction. He has been involved in applying text mining to applications in the life sciences for the last 10 years, initially as a Senior Computer Scientist at SRI International. David has a PhD from the University of Cambridge, and was a researcher and lecturer at the University of Edinburgh. He is widely published in the areas of information extraction, spoken dialogue, parsing, syntax and semantics.

Presenting this year’s award was Eric Rogge, Exalead, and Liz Diamond, niece of Ev Brenner. The award winner received a recognition award and a check for $500. A special thanks to Exalead for sponsoring this year’s Evvie.

The judges for the 2010 Evvie were Dr. David Evans (Evans Research), Sue Feldman (IDC), and Jill O’Neill (NFAIS).

Congratulations, Dr. Milward.

Stuart Schram IV, April 28, 2010

Sponsored post.

New Search and Old Boundaries

April 28, 2010

Yesterday in my talk at a conference I pointed out that for many people, the Facebook environment will cultivate new species of information retrieval. Understandably the audience listened politely and converted my observations into traditional information retrieval methods. Several of the people with whom I spoke pointed out that the Facebook information was findable only with a programmatic query via the Facebook application programming interfaces or by taking a Facebook feed and processing it. The idea that “search” now spans silos, includes structured and unstructured data, and delivers actionable results describes what some organizations want. There are challenges, of course. These include:

  • Mandated silos of information; for example, in certain situations, mash ups and desiloization are prohibited for legal or practical reasons
  • The costs of shifting from inefficient, expensive methods to more informed methods; for example, the costs of data transformation can be onerous. I have talked with individuals who point out that data transformation can consume significant sums of money, and these expenditures are often inadequately budgeted. One result is a slowdown or cutback in the behind-the-scenes preparatory work
  • Business processes have sometimes emerged based on convention or user behavior, or because the system was refined over time. When “data” are meshed with such a business process, the marriage is a less-than-happy one. Data centric thinking can be blunted when juxtaposed to certain traditional business processes and methods.

In short, the new world can be envisioned, based on speculation, or assembled from fragmentary reports from the field. I can imagine the intrepid 16th century navigators understanding why innovators have to push forward into a new and unknown world. One reminder is the assertion that an estimated 358 million personal data records have been leaked since 2005.

The Guardian article “Facebook Privacy Hole ‘Lets You See Where Strangers Plan to Go’” provides an example of one challenge. The point of the write up is that the Facebook social network has a “privacy hole”. The Guardian says:

Some people report that they are able to see the public “events” that Facebook users have said they will attend – even if the person is not a “friend” on the social network…The implications of being able to find out the movements of any of the 400m people on Facebook are potentially wide-ranging – although the flaw does not seem to apply to every user, or every event. Yee says that the simplest way to prevent your name appearing in such lists is to put “not attending” against any event you are invited to.

As the Facebook approach to finding information captures users, the barriers between new types of information and the uses to which those information objects can be put come down. In a social space, the issue is personal privacy. In an organizational space, the issue is the security of information assets.

As young people enter the workforce, these folks bring a comfort level with Facebook-type systems markedly different from mine. I think organizations are largely unable to control effectively what some employees do with online services. Telework, mobile devices, and smart phones present a management and information challenge.

The lowering of information barriers and the efforts to dissolve silos further reduces an organization’s control of information and the knowledge of the uses to which that information may be put.

Let’s step back.

First, ineffective search and content processing systems exist, so organizations need ways to address the costs and inefficiencies of incumbent systems. Web services and fresh approaches to indexing content seem to be solutions to findability problems in some situations.

Second, employees—particularly those comfortable with pervasive connectivity and social methods of obtaining information—do what works for them. These methods are not necessarily controllable or known to some employers. An employee can use a personal smart phone to ask “friends” a question. After all, what are friends for?

Third, vendors want to describe their systems using words and phrases that connote ways to solve findability problems. Talking about merged data and collaboration may be what’s needed to close a deal.

When these three ingredients are mixed, the result is a security and information control challenge that is only partially understood.

Is it possible to deliver a next generation information experience and minimize the risks from such a system? Sure, but there will be surprises along the route. Whether it is Mr. Zuckerberg’s schedule or insights into the Web browsing habits of government employees, there will be unexpected and important insights about these systems. The ability to use a search interface to obtain reports is increasing. Are the privacy and security controls lagging behind?

Stephen E Arnold, April 28, 2010

Unsponsored post.

Goose and Maverick Agree on the Facebook Thing

April 26, 2010

The addled goose likes the blog maverick, one of a farm yard of folks who look at the machinations of 20-somethings and offer a different perspective. I liked “Is Facebook the New Internet and How Soon before Microsoft Tries to Buy It?” I see information that increasingly suggests that Google has to deal with a digital Billy the Kid, the founder of Facebook. The new Billy, as I shall call Mr. Zuckerberg, figured out that the big text indexing approach was not where the action was going. Google, and to a certain extent Microsoft, missed this insight. Under the able leadership of the News Corporation, MySpace.com found itself marginalized. Google fumbled with Buzz, and Microsoft was sufficiently savvy to take a stake in Facebook and rejig some cloud apps for the Facebook ecosystem.

Now along comes the blog maverick who writes:

Everything that the net was 5 or more years ago, Facebook is today. The interesting thing is that Facebook knows it. Slowly but surely they are extending their tentacles into traditional websites, mobile apps (android/iphone/Ipad) and soon your HDTV.

Let’s assume he is correct.

First, the Google approach is yesterday’s news. The Facebook crowd does text, but within the Facebook ecosystem. The idea of Google creating a Facebook seems unnatural to me. Google is not social in the way Facebook is. It will be easier, and cheaper, for Facebook to offer finding services than for Google to become social.

Second, Facebook may not sell even if the numbers are crazy. The reason is that the new Billy thinks he is the fastest draw in Silicon Valley. The only way to find out is to get into duels, not take the money and retire. Facebook may be the next IBM, Microsoft, or Google. The only way to find out is not to sell out but to go all out. That pedal to the metal approach is what I took away from the f8 info storm.

Third, I heard at lunch yesterday (Friday, April 23, 2010) that Microsoft is now moving into consumer online services via Facebook and Google is chasing the enterprise. These outfits are like yin and yang. The new magnetic center seems to be Facebook; Google, so far, has been ineffectual in social and unable to build services even with Facebook’s benign indifference.

Bottom line: the goose and the maverick are on the same half acre.

Stephen E Arnold, April 26, 2010

Unsponsored post.

OmniFind Does XML

April 25, 2010

I never doubted that OmniFind could handle XML when indexing DB2 tables. The write up “Search XML in OmniFind V1R2” makes this point clearly. If you want to dive into this use of OmniFind, take a peek at the syntax for a query:

    SELECT PRODUCT_ORDER FROM ORDERS WHERE CONTAINS(PRODUCT_ORDER, '@xmlxp:''/ORDER/CUSTOMER/TITLE[. contains("digital indexing operator")]''') = 1;
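
If you would rather run the query from a program than from a command line, here is a minimal sketch using the ibm_db Python driver. It assumes a DB2 system with OmniFind text search enabled; the connection values are placeholders, not a real host.

    # A sketch only: the connection values are placeholders for a DB2
    # system with OmniFind (text search) enabled.
    import ibm_db

    conn = ibm_db.connect(
        "DATABASE=ordersdb;HOSTNAME=localhost;PORT=50000;"
        "PROTOCOL=TCPIP;UID=db2user;PWD=secret", "", "")

    # The @xmlxp clause limits the text search to one XML path.
    sql = ("SELECT PRODUCT_ORDER FROM ORDERS WHERE CONTAINS(PRODUCT_ORDER, "
           "'@xmlxp:''/ORDER/CUSTOMER/TITLE"
           "[. contains(\"digital indexing operator\")]''') = 1")

    stmt = ibm_db.exec_immediate(conn, sql)
    row = ibm_db.fetch_assoc(stmt)
    while row:
        print(row["PRODUCT_ORDER"])
        row = ibm_db.fetch_assoc(stmt)
    ibm_db.close(conn)

The doubled single quotes are standard SQL escaping for a quote mark inside a string literal.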

More information is available from www.ibm.com.

Stephen E Arnold, April 25, 2010

Unsponsored post.

Coveo Surges Forward

April 23, 2010

The Coveo team has surged forward in search, according to the story “Coveo Announce Une Augmentation de 55% des Revenus de License au 1er Trimestre de 2010; Conclut 24 Nouvelle Transactions.” (There’s an English version of the news release at Intelligent Enterprise as well.) If you don’t read French, the title announces a 55 percent increase in first quarter 2010 license revenue and two dozen new deals in 12 weeks. Beyond Search thinks that is pretty darned good in today’s economic climate. Other highlights from the story include:

  • The release of Version 6.1 of the firm’s Enterprise Search Platform with a raft of new features such as an Outlook integrated sidebar, a floating desktop search bar, and complete desktop email indexing. (You can get the full details at www.coveo.com).
  • Deals with Trading Technologies, Netezza, Hewitt, Royal Mail Group, and Allina Hospitals among others
  • A 97 percent reduction in the time needed to find expertise across a top engineering firm and a two-week ROI for a Fortune 100 financial services company

For Fierce Content Management, the publishing company, I moderated a Webinar with Louis Tetu, one of the investors in Coveo; Bill Cavendish (GEICO); and Richard Tessier, Coveo’s Executive Vice President. The company’s approach to enterprise information struck me as focused on chopping the “wait” out of the installation and delivering information that helps employees do their jobs. The “search” function meshes with work processes, so employees can click on a link, fire a query from a mobile device, or use a customized interface. After the Fierce Webinar, I spoke briefly with the firm’s founder, Laurent Simoneau. He pointed out that Coveo’s architecture and “smart” software make it possible to get real payoff from search, not big engineering and consulting bills. My recollection is that he said, “We focus on making search work the way users want in their specific situation. This seems to be working quite well for us.”

With 55 percent license revenue growth in 12 weeks, I am inclined to agree.

Stephen E Arnold, April 23, 2010

This post was not sponsored.

SharePoint Taxonomy Fairy Dust

April 21, 2010

First, navigate to “SharePoint 2010: Using Taxonomy & Controlled Vocabulary for Content Enrichment”. Second, read the article. Now ask yourself these questions:

  1. Who sets up the SharePoint taxonomy magic?
  2. From where does the taxonomy come?
  3. Who maintains the taxonomy?
  4. How are inappropriate terms removed from the index and the correct terms applied?

Got your answers? Here are mine:

  1. A specialist in controlled term lists is needed to figure out the list, and then an industrial strength system like the one available from Access Innovations is needed. Once the system is up and running and the term list generated, you are ready to tackle SharePoint.
  2. The taxonomy comes from a method that involves figuring out the lingo of the organization, available term lists, and then knowledge value work. In short, a taxonomy has to be in touch with the organization and the domain of knowledge to which it is applied. Sound like work? It is, and most taxonomy problems originate with slap dash methods.
  3. The taxonomy must be maintained – note the imperative – by a combination of a human and software. New terms come and old terms go. The indexes and the tagged objects must be kept in sync. Humans with software tools perform this work. A taxonomy left to the devices of automated systems, left unchanged, or tweaked by azure chip experts is essentially useless after a period of time.
  4. Inappropriate terms are removed from the system via a human and software intermediated process. Once the term list is updated, the process of retagging and reindexing takes place. Muff this bunny and no one can find anything. (A small sketch of this term list hygiene follows this list.)
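
To make points three and four concrete, here is a minimal sketch of term list hygiene in Python; the vocabulary, the deprecated-term mapping, and the sample tags are all hypothetical.

    # Hypothetical controlled vocabulary and deprecated-term mapping.
    APPROVED = {"enterprise search", "content processing", "taxonomy"}
    DEPRECATED = {"findability": "enterprise search"}  # old -> replacement

    def retag(document_tags):
        """Return cleaned tags plus terms a human taxonomist must review."""
        cleaned, review = [], []
        for tag in document_tags:
            tag = tag.strip().lower()
            if tag in APPROVED:
                cleaned.append(tag)
            elif tag in DEPRECATED:
                cleaned.append(DEPRECATED[tag])  # automatic replacement
            else:
                review.append(tag)  # unknown term: route to a human
        return cleaned, review

    tags, queue = retag(["Findability", "taxonomy", "folksonomy"])
    print(tags)   # ['enterprise search', 'taxonomy']
    print(queue)  # ['folksonomy']

The point of the sketch: software handles the mechanical replacements, and the review queue is where the human taxonomist earns a salary.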

Now read the article again. Quite a bit is left out or simply not deemed relevant. My suggestion is to do some thinking about the nature of the user, the specific information retrieval needs, and the expertise required to do the job to avoid wasting time and money.

Like most tasks in search, it is more fun to simplify than to deal from the top of the deck. SharePoint is one of the more interesting systems with which to work. Once the short cuts and half-baked approaches go south, you will be ready to do the job correctly. I wonder if the CFO knows what questions to ask to figure out why content processing costs have gone through the roof because of rework, fiddling, and bungee jumping without a cord.

Stephen E Arnold, April 21, 2010

Unsponsored post.

The Seven Forms of Mass Media

April 21, 2010

Last evening, on a pleasant boat ride on the Adriatic, a number of young computer scientists in the making were asking about my Google lecture. A few challenged me, but most seemed to agree with my assertion that Google has a large number of balls in the air. A talented juggler, of course, can deal with five or six balls. The average juggler may struggle to keep two or three in sync.

One of the students shifted the subject to search and “findability.” As you know, I floated the idea that search and content processing is morphing into operational intelligence, preferably real-time operational intelligence, not the somewhat stuffy method of banging two or three words into a search box and taking the most likely hit as the answer.

The question put to me was, “Search has not kept up with printed text, which has been around since the 1500s, maybe earlier. What are we going to do about mobile media?”

The idea is that we still have a difficult time locating the precise segment of text or datum. With mobile devices placing constraints on interfaces, fostering new types of content like short text messages, and producing an increasing flow of pictures and video, finding is harder, not easier.

I remembered reading “Cell Phones: The Seventh Mass Media” and had a copy of this document on my laptop. I did not fully accept the assertion that mobile devices were a mass medium, but I thought the insight had relevance. Mobile information comes with some interesting characteristics. These include:

  • The potential for metadata derived from the user’s mobile number, location, call history, etc.
  • The index terms in content, if the system can parse information objects or unwrap text in an image or video such as converting an image to ASCII and then indexing the name of a restaurant or other message in an object
  • Contextual information, if available, related to content, identified entities, recipients of messages, etc.
  • Log file processing for any other cues about the user, recipient(s), and information objects.

What this line of thinking indicates is that a shift to mobile devices has the potential for increasing the amount of metadata about information objects. A “tweet”, for instance, may be brief, but one could, given the right processing system, impart considerable richness to the information object in the form of metadata of one sort or another.
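
Here is a minimal sketch of that enrichment; every field and processing step is a hypothetical illustration of the categories listed above, not any carrier’s or vendor’s actual pipeline.

    # Hypothetical enrichment of a short mobile message with the metadata
    # categories listed above; no real carrier or vendor pipeline implied.
    def enrich(message, sender_number, location, recent_calls):
        return {
            "text": message,
            "index_terms": [w for w in message.lower().split() if len(w) > 3],
            "user_metadata": {
                "number": sender_number,
                "location": location,  # e.g., (latitude, longitude)
                "frequent_contacts": sorted(set(recent_calls)),
            },
        }

    record = enrich("Great barbecue in Louisville tonight",
                    "+1-502-555-0100", (38.25, -85.76),
                    ["+1-502-555-0101", "+1-502-555-0102"])
    print(record["index_terms"])  # ['great', 'barbecue', 'louisville', 'tonight']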

The previous six forms of media—[I] print (books, magazines, and newspapers); [II] recordings; [III] cinema; [IV] radio; [V] television; and [VI] Internet—fit neatly under the umbrella of [VII] mobile. The idea is that mobile embraces the other six. This type of reasoning is quite useful because it gathers some disparate items and adds some handles and knobs to an otherwise unwieldy assortment.

In the write up referenced above, I found this passage interesting: “Mobile is as different from the Internet as TV is from the radio.”

The challenge that is kicked to the side of the information highway is, “How does one find needed information in this seventh mass media?” Not very well in my experience. In fact, finding and accessing information is clumsy for textual information. After 500 years, the basic approach of hunting, Easter egg style, has been facilitated by information retrieval systems. But I think most people who look for information can point out some obvious deficiencies. For example, most retrieval systems ignore content in various languages. Real time information is more of a marketing ploy than a useful means of figuring out the pulse count for a particular concept. A comprehensive search remains a job for a specialist who would be recognized by an archivist who worked in Ephesus’ library 2500 years ago.


Are you able to locate this video on Ustream or any other video search system? I could not, but I know the video exists. Here is a screen capture. Finding mobile content can be next to impossible in my opinion.

When I toss in radio and other rich media content, finding and accessing pose enormous challenges to researcher and casual user alike. In my keynote speech on April 15, 2010, I referenced some Google patent documents. The clutch of disclosures provides some evidence that Google wants to apply smart software to the editorial job of creating personalized rich media program guides. The approach strikes me as an extension of other personalization approaches, and I am not convinced that explicit personalization is a method that will crack the problem of finding information in the seventh medium, or any other for that matter.

Here’s my reasoning:

  • Search and retrieval methods for text don’t solve problems. More information processed means longer results lists and more work to figure out where the answer is.
  • Smart systems like Google’s or the Cuil Cpedia project are in their infancy. An expert may find fault with smart software that is actually quite stupid from the informed user’s point of view.
  • Making use of context is a challenging problem for research scientists but asking one’s “friends” may be the simplest, most economical, and widely used method. Facebook’s utility as a finding system or Twitter’s vibrating mesh may be the killer app for finding content from mobile devices.
  • As impressive as Google’s achievements have been in the last 11 years, the approach remains largely a modernization of search systems from the 1970s. A new direction may be needed.

The bright young PhDs have the job of figuring out if mobile is indeed the seventh medium. The group with which I was talking or similar engineers elsewhere have the job of cracking the findability problem for the seventh medium. My hope is that on the road to solving the problem of the new seventh medium’s search challenge, a solution to finding information in the other six is discovered as well.

The interest in my use of the phrase “operational intelligence” tells me one thing: search is a devalued and somewhat tired bit of jargon. Unfortunately, substituting operational intelligence for the word search does not address the problem of delivering the right information when it is needed, in a form that the user can easily apprehend and use.

There’s work to be done. A lot of work in my opinion.

Stephen E Arnold, April 20, 2010

No sponsor for this post, gentle reader.
