2012: Enterprise Search Yields to Metadata?

October 30, 2011

Oh, my. The search dragon has been killed by metadata.

You might find yourself on an elevator ready to get off on a specific floor. The rest of your trip will start from that point and that point only. The same is true for learning, conversing, actually just about anything. We all have a particular place we want to enter the conversation. MSDN’s Microsoft Enterprise Content Management (ECM) Team Blog’s recent posting on “Taxonomy: Starting from Scratch” was a breath of fresh air in the way it addressed anyone–no matter what floor they needed.

For the novices to Managed Metadata Service, a service providing tools to foster a rich corporate taxonomy, the article recommends a starting point: Introducing Enterprise Metadata Management

According to the article, more seasoned users are reminded to point their browsers toward import capabilities. Of course, more specific needs, and links to go with them, are addressed too.

The article recommends the following for the clients who need a comprehensive understanding of both common and specific corporate terms. The author Ryan Duguid states:

“The General Business Taxonomy consists of around 500 terms describing common functional areas that exist in most businesses.  The General Business Taxonomy can be imported in to the SharePoint 2010 term store within minutes and provides a great starting point for customers looking to build a corporate vocabulary and take advantage of the Managed Metadata Service.”

Overall, this article is worth keeping tucked away for the day you need information on WAND, SharePoint, or metadata and taxonomy in general, because of its directness and the accessible next steps its variety of links offers.

Megan Feil, October 30, 2011

Sponsored by Pandia.com

Google Amazon Dust Bunnies

October 13, 2011

The addled goose has a bum eye, more air miles than a 30 something IBM sales engineer, and lousy Internet connectivity. T Mobile’s mobile WiFi sharing gizmo is a door stop. Imagine my surprise when I read “Google Engineer: ‘Google+ Example of Our Complete Failure to Understand Platforms.’” In one webby write up, the dust bunnies at Google and Amazon were moved from beneath the bed to the white nylon carpet of a private bed chamber.

I am not sure the information in the article is spot on. Who can be certain about the validity of any information any longer? The goose cannot. But the write up reveals that Amazon is an organization with political “infighting”. What’s new? Nothing. Google, on the other hand, evidences a bit of reflexivity. I will not drag the Motorola Mobility event into this brief write up, but students of business may find that acquisition worth researching.

Here is the snippet which caught my attention:

[A]  high-profile Google engineer … mistakenly posted a long rant about working at Amazon and Google’s own issues with creating platforms on Google+. Apparently, he only wanted to share it internally with everybody at Google, but mistaken shared it publicly. For the most part, [the] post focuses on the horrors of working at Amazon, a company that is notorious for its political infighting. The most interesting part to me, though, is … [the] blunt assessment of what he perceives to be Google’s inability to understand platforms and how this could endanger the company in the long run.

I want to step back. In fact, I want to go into MBA Mbit mode.

First, this apparent management behavior is the norm in many organizations, not just the companies referenced in the post. I worked for many years in the old world of big time consulting. Keep in mind that my experiences date from 1973, but management idiosyncrasies were the rule. The majority of these management gaffes took place in a slower, not digital world. Sure, speed was important. In the physics of information, speed is relative. Today the perceived velocity is great, and the diffusion of information adds a supercharger to routine missteps. Before getting too excited about the insights into one or two companies, keep in mind that most organizations today are perilously close to dysfunction. Nothing special here, but today’s environment gives what is normal some added impact. Consolidation and an absence of competition make the stakes high. Bad decisions add a thrill to the mundane. Big decisions weigh more and can have momentum that does damage more quickly than a bad decision at International Harvester or NBC in the 1970s.

Second, technology invites bad decisions. Today most technologies are “hidden”, not exposed like the guts of a Model T or my mom’s hot wire toaster which produced one type of bagel—burned. Not surprisingly, even technically sophisticated managers struggle to understand the implications of a particular technical decision. To make matters worse, senior managers have to deal with “soft” issues, and technical training, even if limited, provides few beacons for the course to chart. Need some evidence? Check out the Hewlett Packard activities over the last 18 months. I routinely hear such statements as “we cannot locate the invoice” and “tell us what to do.” Right. When small things go wrong, how can the big things go right? My view is that chance is a big factor today.

Third, the rush to make the world social, collaborative, and open means that leaks, flubs, sunshine, and every other type of exposure are part of the territory. I find it distressing that sophisticated organizations fall into big pot holes. As I write this, I am at an intelligence conference, and the rush to openness has an unexpected upside for some information professionals. With info flowing around without controls, the activities of authorities are influenced by the info bonanza. Good and bad guys have unwittingly created a situation that makes it less difficult to find the footprints of an activity. The post referenced in the source article is just one more example of what happens when information policies just don’t work. Forget trust. Even the technically adept cannot manage individual communications. Quite a lesson, I surmise.

In search and content processing, the management situation is dire. Many companies are uncertain about pricing, features, services, and innovation. Some search vendors describe themselves with nonsense and Latinate constructions. Others flip flop from search to customer support to business intelligence without asking themselves, “Does this stuff actually work?” Many firms throw adjectives in front of jargon and rely on snake charming sales people to close deals. Good management or bad management? Neither. We are in status quo management with dollops of guessing and wild bets.

My take on this dust bunny matter is that we have what may be an unmanageable and ungovernable situation. No SharePoint governance conference is going to put the cat back in the bag. No single email, blog post, or news article will make a difference. Barn burned. Horse gone. Wal-Mart is building on the site. The landscape has changed. Now let the “real” consultants explain the fix. Back to the goose pond for me. Collaborate on that.

Stephen E Arnold, October 13, 2011

Sponsored by Pandia.com


Lucid Imagination: Open Source Search Reaches for Big Data

September 30, 2011

We are wrapping up a report about the challenges “big data” pose to organizations. Perhaps the most interesting outcome of our research is that there are very few search and content processing systems which can cope with the digital information required by some organizations. Three examples merit listing before I comment on open source search and “big data”.

The first example is the challenge of filtering the information that organizations require and that is produced within the organization by its staff, contractors, and advisors. We learned in the course of our investigation that the promises of processing updates to Web pages, price lists, contracts, sales and marketing collateral, and other routine information are largely unmet. One of the problems is that the disparate content types have different update and change cycles. The most widely used content management system based on our research results is SharePoint, and SharePoint is not able to deliver a comprehensive listing of content without significant latency. Fixes are available, but these are engineering tasks which consume resources. Cloud solutions do not fare much better, once again due to latency. The bottom line is that for information produced within an organization, employees are mostly unable to locate information without a manual double check. Latency is the problem. We did identify one system which delivered documented latency across disparate content types of 10 to 15 minutes. The solution is available from Exalead, but the other vendors’ systems were not able to match it in putting fresh, timely information produced within an organization in front of system users. Shocked? We were.


Reducing latency in search and content processing systems is a major challenge. Vendors often lack the resources required to solve a “hard problem” so “easy problems” are positioned as the key to improving information access. Is latency a popular topic? A few vendors do address the issue; for example, Digital Reasoning and Exalead.

Second, when organizations tap into content produced by third parties, the latency problem becomes more severe. There is the issue of the inefficiency and scaling of frequent index updates. But the larger problem is that once an organization “goes outside” for information, additional variables are introduced. In order to process the broad range of content available from publicly accessible Web sites or the specialized file types used by certain third party content producers, connectors become a factor. Most search vendors obtain connectors from third parties. These work pretty much as advertised for common file types such as Lotus Notes. However, when one of the targeted Web sites, such as a commercial news service or a third-party research firm, makes a change, the content acquisition system cannot acquire content until the connectors are “fixed”. No problem as long as the company needing the information is prepared to wait. In my experience, broken connectors mean another variable. Again, no problem unless critical information needed to close a deal is overlooked.

Read more

MarkLogic, FAST, Categorical Affirmatives, and a Direction Change

July 5, 2011

I awakened this morning (July 4, 2011) with a marketing Fourth of July boom. I received one of those ever present LinkedIn updates putting a comment from the Enterprise Search Engine Professionals Group in front of me.

image

The MarkLogic positioning exploded on my awareness like a Fourth of July skyrocket’s burst.

Most of the comments on the LinkedIn group are ho hum. One hot topic has been Microsoft’s failure to put much effort in its blogs about Fast Search & Transfer’s technology. Snore. Microsoft put down $1.2 billion for Fast, made some marketing noises, and had a fellow named Mr. Treo-something talk to me about the “new” Fast Search system. Then search turned out to be more like a snap in but without the simplicity of a Web part. Microsoft moved on and search is there, but like Google’s shift to Android, search is not where the action is. I am not sure who “runs” the enterprise search unit at Microsoft. Lots of revolving door action is my impression of Microsoft’s management approach in the last year.

The noise died down and Fast has become another component in the sprawling Shanghai of code known as SharePoint 2010. Making Fast “fast” and tuning it to return results that don’t vary with each update has created a significant amount of business for Microsoft partners “certified” to work on Fast Search. Licensees of the Linux/Unix version of ESP are now like birds pushed from the nest by an impatient mother.

New MarkLogic Market Positioning?

Set Microsoft aside for a moment and look at this post from a MarkLogic professional who once worked at Fast Search and subsequently at Microsoft. I am not sure how to hyperlink to LinkedIn posts without generating a flood of blue and white screens begging for log in, sign up, and money. I will include a link, but you are on your own.

Here’s the alleged MarkLogic professional’s comment:

Many organizations are replacing FAST with MarkLogic. MarkLogic offers a scalable enterprise search engine with all the features of FAST plus more…

Wow.

An XML engine with wrappers is now capable of “all” the Fast features. In my new monograph “The New Landscape of Enterprise Search”, I took some care to review information presented by Fast at CERN, the wizard lair in Europe, about Fast Search’s effort to rewrite Fast ESP, which was originally a Web search engine. The core was wrapped to convert Web search into enterprise search. This was neither quick nor particularly successful. Fast Search & Transfer ran into some tough financial waters, ended up the focus of a government investigation, and was quickly sold for a price that surprised me and the goslings in Harrod’s Creek.

You can get the details of the focus of the planned reinvention of the Fast system and the link to the source document at CERN which I reference in my Landscape study. A rewrite indicates that some functions were not performing in 2007 and 2008 in a manner that was acceptable to someone in Fast Search’s management. Then the acquisition took place. The Linux/Unix support was nuked. Fast under Microsoft’s wing has become a utility in the incredible assemblage of components that comprises SharePoint 2010. I track the SharePoint ecosystem in my information service SharePointSemantics.com. If you haven’t seen the content, you might want to check it out.

Read more

Are Webinars the Backbone of Concept Searching Marketing?

July 5, 2011

On the surface, Concept Searching looks like some of the other analytics companies that assert steady growth. What is interesting is that, whereas for some value adding software companies webinars or online lectures and demos are one component of a broader marketing program, Concept Searching seems to rely heavily on webinars. We find this interesting.

We looked into one search company which was using Twitter to make the text processing service a hot trend. From our vantage point, it seems that Concept Searching is using social media in a more modest way.

Though it sounds like Spiderman should be involved, a webinar is simply an online seminar or workshop. The great thing about a webinar is that it is usually interactive and allows all participants to give, receive and discuss the topics at hand. Additionally, geographical boundaries are not an issue and these presentations are very low in cost.

When perusing Concept Searching’s Web site, you will find an entire events page dedicated to their upcoming exhibitions and a list and description of their current webinars. Some titles include: “Designing Information Architecture for SharePoint: Making Sense in a World of SharePoint Architecture”  and “De-mystifying Content Types: Four Key Content Types of Leverage.” You simply register and voilà, you join in on all the fun. They also have a page dedicated to previously recorded webinars that you can access at your leisure.

I moderate webinars for a couple of outfits, and these are often expensive programs. There is time, often lots of time, required to prepare the text, create the graphics and demos, and then build an audience. I participate in webinars when I am paid to do so. However, I do not otherwise participate in webinars. The reason is that I am receiving inputs, experiencing interruptions even when the door is closed, and working to respond to ad hoc requests from clients.

I do think that webinars are somewhat more useful than attending certain conferences. Over the last couple of years, conferences are more like fraternity and sorority parties. But that perception may be a function of my age and distaste for rock and roll, mixed media events with lots of 20 somethings opining about social media and organic search. Yikes, digital bonsai.

This leads me to the question, “Who has time to participate in webinars?” If these are buyers of high end solutions, great. However, if I were the boss of a company where webinars consumed staff time, I would be asking some questions about the efficacy of the method.

I find reading a Web page and using an online demo or downloading code useful. Webinars may be too zippy for an old goose like me. One thing for sure: lots of companies are using webinars to hold down the cost of on site sales calls and to get individuals “interested” in a product or service to cough up an email address.

Stephen E Arnold, July 5, 2011

Sponsored by Pandia.com, publishers of the New Landscape of Enterprise Search

D4 and RiverGlass Join eDiscovery Forces

June 27, 2011

As announced on PRWeb in “D4, LLC, Partners with RiverGlass, Inc. Enabling Progressive Enhancements to D4’s eDiscovery Service Offerings,” the two companies have signed an agreement to form a strategic partnership for D4 to distribute, install and host the RiverGlass solutions.

D4 focuses on litigation support and eDiscovery services to law firms and corporate law departments. RiverGlass, Inc. is a provider of advanced information collection and analysis solutions focusing on government agencies, as well as eDiscovery and risk management applications to major corporations. The write up said:

D4’s highly technical method to eDiscovery and digital forensics leverages the maximum benefits available from the RiverGlass application.

With the solution:

Customers can harvest from many different types of data stores and ingest ESI in native format without having to have it processed. This includes network stores, SharePoint sites, websites, social media as well as structured databases.

This type of eDiscovery is blurring the lines between search and text analytics, creating a powerful tool for lawyers. It markedly improves the labor-intensive and mistake-prone legal discovery process.

Will eDiscovery go the way of customer support? What looks like a trivial exercise in using traditional search and retrieval for customer support is tough. Some of the vendors chasing customers in this segment are learning that customer support is more difficult than it appears. eDiscovery strikes me as having a higher level of complexity.

It is interesting to watch the shape shifting that is underway in the content processing sector.

Stephen E Arnold, June 27, 2011

You can read more about enterprise search and retrieval in The New Landscape of Enterprise Search, published by Pandia in Oslo, Norway, in June 2011.

Search: An Information Retrieval Fukushima?

May 18, 2011

Information about the scale of the horrific nuclear disaster in Japan at the Fukushima Daiichi nuclear complex is now becoming more widely known.

Expertise and Smoothing

My interest in the event is the engineering of a necklace of old-style reactors and the problems the LOCA (loss of coolant accident) triggered. The nagging thought I had was that today’s nuclear engineers understood the issues with the reactor design, the placement of the spent fuel pool, and the risks posed by an earthquake. After my years in the nuclear industry, I am quite confident that engineers articulated these issues. However, the technical information gets “smoothed” and simplified. The complexities of nuclear power generation are well known at least in engineering schools. The nuclear engineers are often viewed as odd ducks by the civil engineers and mechanical engineers. A nuclear engineer has to do the regular engineering stuff of calculating loads and looking up data in hefty tomes. But the nukes need grounding in chemistry, physics, and math, lots of math. Then the engineer who wants to become a certified, professional nuclear engineer has some other hoops to jump through. I won’t bore you with the details, but the end result of the process produces people who can explain clearly a particular process and its impacts.

image

Does your search experience emit signs of troubles within?

The problem is that art history majors, journalists, failed Web masters, and even Harvard and Wharton MBAs get bored quickly. The details of a particular nuclear process make zero sense to someone more comfortable commenting about the color of Mona Lisa’s gown. So “smoothing” takes place. The ridges and outcrops of scientific and statistical knowledge get simplified. Once a complex situation has been smoothed, the need for hard expertise is diminished. With these simplifications, the liberal arts crowd can “reason” about risks, costs, upsides, and downsides.

image

A nuclear fall out map. The effect of a search meltdown extends far beyond the boundaries of a single user’s actions. Flawed search and retrieval has major consequences, many of which cannot be predicted with high confidence.

Everything works in an acceptable or okay manner until there is a LOCA or some other problem like a stuck valve or a crack in a pipe in a radioactive area of the reactor. Quickly the complexities, risks, and costs of the “smoothed problem” reveal the fissures and crags of reality.

Web search and enterprise search are now experiencing what I call a Fukushima event. After years of contentment with finding information, suddenly the dashboards are blinking yellow and red. Users are unable to find the information needed to do their job or something as basic as locate a colleague’s telephone number or office location. I have separated Web search and enterprise search in my professional work.

I want to depart for a moment and consider the two “species” of search as a single process before the ideas slip away from me. I know that Web search processes publicly accessible content, has the luxury of ignoring servers with high latency, and filters content to create an index that meets the vendors’ needs, not the users’ needs. I know that enterprise search must handle diverse content types, must cope with security and access controls, and must perform more functions than one of those two inch wide Swiss Army knives on sale at the airport in Geneva. I understand. My concern is broader in this write up. Please, bear with me.

Read more

Microsoft and Its Research about Search

May 13, 2011

We loved Microsoft’s use of the “beyond search” phrase to describe some of its earlier efforts to wrest the King of Search crown from the rampaging Googzilla.

Non-techies tend to take the complexities and subtle nuances of search for granted.  I’ll admit that at one point I was also in the dark.  Since the switch has been flipped, I find sites like the one summarizing Microsoft’s Information Retrieval and Mining research incredibly interesting.

The overview explains:

We aim at developing fundamental technologies for general web search and enterprise search. Our main technology areas include machine learning, information retrieval, data mining, and natural language processing. We partner with Microsoft Live Search and SharePoint Search. Currently, we are working on five projects: Learning to Rank, Search Result Ranking, Data Selection in Search, Search Log Data Mining, and Next Generation Enterprise Search.

I recommend scanning the page if the subject piques your interest, but here are some of the highlights. For ranking web pages, they have advanced the common practice of web graph data to large-scale graph data collected from users’ own browsing habits. Complementing this achievement is the work on a search log mining platform, culling search session and click-thru data, enabling the graph modeling mentioned above. They are even delving into what is on the tips of many tongues: enterprise social computing.

There are a lot of critics of Bing, even more of SharePoint.  Regardless, Microsoft refuses to stand down when it comes to search development.  Will these advancements launch Microsoft to the top of the field?  Perhaps, with a little streamlining of their products or more negative PR for Google.  If Apple could rise from the grave with the iPod, I guess anything is possible.

Sarah Rogers, May 13, 2011

Freebie, unlike the technical and engineering support some SharePoint search users experience

Simplexo Search

January 3, 2011

Short honk: I learned about Simplexo earlier this year. The company provides “optimized search for your mobile.” The company has a product that makes it possible for a user of Simplexo to search a desktop computer from a mobile device or a Web browser. Yahoo UK reported in “Simplexo Aims to Simplify Remote Desktop Searches”:

Simplexo said that the software could find emails in Outlook and Exchange Server, as well as documents in SharePoint, spreadsheets and database records, and can scour social networking applications such as Facebook, Twitter and LinkedIn.

The service is to go live early in 2011. If you are interested in this type of product, navigate to this link and sign up.

Stephen E Arnold, January 3, 2011

Freebie

Repositioning 2011: The Mad Scramble

December 15, 2010

Yep, the new year fast approaches. Time to turn one’s thoughts to vendors of search, content processing, data fusion, text mining, and—who could forget?—knowledge management. In the last two weeks, I have done several live-and-in-person briefings about ArnoldIT.com’s views on enterprise search and related disciplines.

Today enterprise search has become what I call an elastic concept. It is stretched over a baker’s dozen of quite divergent information retrieval concepts. Examples range from the old bugaboo of many companies, customer support, to the effervescence of knowledge management. The concept stretches between the hard realities of the costs of supporting actual customers and the frothy topping of “knowledge”.

Several trends are pushing through the fractured landscape of information retrieval. Like earthquakes, the effects can vary significantly depending on one’s position at the time of the event.

image

Source: http://www.sportsnet.ca/gallery/2009/12/30/scramble_gal_640.jpg

Search can be looked at in different ways. One can focus on a particular problem; for example, content management system repositories. The challenge is to find information in these systems. One would think that after years of making Web pages, the problem would be solved. Apparently not. CMS with embedded search stubs trigger some grousing in most of the organizations with which I am familiar. Search works, just not exactly as the users expect. A vendor of search technology can position the search solution as one that makes it easy for users to locate information in a CMS. This is, of course, the pitch of numerous Microsoft Certified Gold resellers of various types of search solutions, utilities, and work arounds. This is an example of a search market defined by the type of enterprise system that creates a retrieval problem.

Other problems for search crop up when specific rules and regulations mandate a particular type of information processing. One example is the eDiscovery market. Anyone can be sued, and eDiscovery systems have to make content findable, but the users of an eDiscovery system have quite particular needs. One example is bookkeeping so that the time and search process can be documented and provided upon request under certain conditions.

Social media has created a new type of problem. One can take a specific industry sector such as the Madison Avenue crowd and apply information technology to the social media problem. The idea is for a search system to “harvest” data from social content sources like Facebook or Twitter, process the text which can be ambiguous, and generate information about how the people creating Facebook messages or tweets perceive a product, person, ad, or some other activity for the advertising team. The idea is that search unlocks hidden information. The Mad Ave crowd thinks in terms of nuggets of information that will allow the ad team to upsell the advertiser. Search is doing search work but the object of the exercise is to make sense out of content streams that are too voluminous for a single person to read. This type of search market—which may not be classic search and retrieval at all—is closer to what various intelligence agencies want software to do to transcribed phone calls, email, and general information from a range of sources.

Let’s stop with the examples of information access problems already. There are more information access problems than at any other time, and I want to move on to the impact of these quite diverse problems upon vendors in 2011.

Now let’s take a vendor that has a search system that can index Word documents, email, and content found in most office environments. Nothing tricky like product specifications, chemical structures, or the data in the R&D department’s lab notebooks. For mainstream search, here is the problem:

Commoditization

Right now (no pun on the vendor of customer support solutions by the way) anyone can download an open source search solution. It helps if the person downloading Lucene, Solr, or one of the other open source solutions has a technical bent. If not, a local university’s computer science department can provide a student to do the installation and get the system up and running. If the part time contracting approach won’t work, you can hire a company specializing in open source to do the work. There are dozens of these outfits bouncing around.

Read more
