Search Silver Bullets, Elixirs, and Magic Potions: Thinking about Findability in 2012
November 10, 2011
I feel expansive today (November 9, 2011), generous even. My left eye seems to be working at 70 percent capacity. No babies are screaming in the airport waiting area. In fact, I am sitting in a not too sticky seat, enjoying the announcements about keeping pets in their cage and reporting suspicious packages to law enforcement by dialing 250.
I wonder if the mother who left a pink and white plastic bag with a small bunny and box of animal crackers is evil. Much in today’s society is crazy marketing hype and fear mongering.
Whilst thinking about pets in cages and animal crackers which may be laced with rat poison, and plump, fabric bunnies, my thoughts turned to the notion of instant fixes for horribly broken search and content processing systems.
I think it was the association of the failure of societal systems that determined passengers at the gate would allow a pet to run wild or that a stuffed bunny was a threat. My thoughts jumped to the world of search, its crazy marketing pitches, and the satraps who have promoted themselves to “expert in search.” I wanted to capture these ideas, conforming to the precepts of the About section of this free blog. Did I say, “Free”?
A happy quack to http://www.alchemywebsite.com/amcl_astronomical_material02.html for this image of the 21st century azure chip consultant, a self appointed expert in search with a degree in English and a minor in home economics with an emphasis on finger sandwiches.
The Silver Bullets, Garlic Balls, and Eyes of Newts
First, let me list the instant fixes, the silver bullets, the magic potions, the faerie dust, and the alchemy which makes “enterprise search” work today. Fasten your alchemist’s robe, lift your chin, and grab your paper cone. I may rain on your magic potion. Here are 14 magic fixes for a lousy search system. Oh, one more caveat. I am not picking on any one company or approach. The key to this essay is the collection of pixie dust, not a single firm’s blend of baloney, owl feathers, and goat horn.
- Analytics (The kind of equations some of us wrangled and struggled with in Statistics 101 or the more complex predictive methods which, if you know how to make the numerical recipes work, will get you a job at Palantir, Recorded Future, SAS, or one of the other purveyors of wisdom based on big data number crunching)
- Cloud (Most companies in the magic elixir business invoke the cloud. Not even Macbeth’s witches do as good a job with the incantation of Hadoop the Loop as Cloudera, but there are many contenders in this pixie concoction. Amazon comes to mind but A9 gives me a headache when I use A9 to locate a book for my trusty e Reeder.)
- Clustering (Which I associate with Clustify and Vivisimo, but Vivisimo has morphed clustering into “information optimization” and gets a happy quack for this leap)
- Connectors (One cannot search unless one can acquire content. I like the Palantir approach which triggered some push back but I find the morphing of ISYS Search Software a useful touchstone in this potion category)
- Discovery systems (My associative thought process offers up Clearwell Systems and Recommind. I like Recommind, however, because it is so similar to Autonomy’s method and it has been the pivot for the company’s flip flop from law firms to enterprise search and back to eDiscovery in the last 12 or 18 months)
- Federation (I like the approach of Deep Web Technologies and for the record, the company does not position its method as a magical solution, but some federating vendors do so I will mention this concept. Think mash up and data fusion too)
- Natural language processing (My candidate for NLP wonder worker is Oracle which acquired InQuira. InQuira is a success story because it was formed from the components of two antecedent search companies, pitched NLP for customer support, and got acquired by Oracle. Happy stakeholders all.)
- Metatagging (Many candidates here. I nominate the Microsoft SharePoint technology as the silver bullet candidate. SharePoint search offers almost flawless implementation of finding a document by virtue of knowing who wrote it, when, and what file type it is. Amazing. A first of sorts because the method has spawned third party solutions from Austria to the United States.)
- Open source (Hands down I think about IBM. From Content Analytics to the wild and crazy Watson, IBM has open source tattooed over large expanses of its corporate hide. Free? Did I mention free? Think again. IBM did not hit $100 billion in revenue by giving software away.)
- Relationship maps (I have to go with the Inxight Software solution. Not only was the live map an inspiration to every business intelligence and social network analysis vendor, it was cool to drag objects around. Now Inxight is part of Business Objects which is part of SAP, which is an interesting company occupied with reinventing itself and ignoring TREX, a search engine)
- Semantics (I have to mention Google as the poster child for making software know what content is about. I stand by my praise of Ramanathan Guha’s programmable search engine and the somewhat complementary work of Dr. Alon Halevy, both happy Googlers as far as I know. Did I mention that Google has oodles of semantic methods, but the focus is on selling ads and Pandas, which are somewhat related?)
- Sentiment analysis (The winner in the sentiment analysis sector is up for grabs. In terms of reinventing and repositioning, I want to acknowledge Attensity. But when it comes to making lemonade from lemons, check out Lexalytics (now a unit of Infonics). I like the Newssift case, but that is not included in my free blog posts and information about this modest multi-vehicle accident on the UK information highway is harder and harder to find. Alas.)
- Taxonomies (I am a traditionalist, so I quite like the pioneering work of Access Innovations. But firms run by individuals who are not experts in controlled vocabularies, machine assisted indexing, and ANSI compliance have captured the attention of the azure chip, home economics, and self appointed expert crowd. Access Innovations knows its stuff. Some of the boot camp crowd, maybe somewhat less? I read a blog post recently that said librarians are not necessary when one creates an enterprise taxonomy. My how interesting. When we did the ABI/INFORM and Business Dateline controlled vocabularies we used “real” experts and quite a few librarians with experience conceptualizing, developing, refining, and ensuring logical consistency of our word lists. It worked because even the shadow of the original ABI/INFORM still uses some of our terms 30 plus years later. There are so many taxonomy vendors, I will not attempt to highlight others. Even Microsoft signed on with Cognition Technologies to beef up its methods.)
- XML (There are Google and MarkLogic again. XML is now a genuine silver bullet. I thought it was a markup language. Well, not any more, pal.)
2012: Enterprise Search Yields to Metadata?
October 30, 2011
Oh, my. The search dragon has been killed by metadata.
You might find yourself on an elevator ready to get off on a specific floor. The rest of your trip will start from that point and that point only. The same is true for learning, conversing, actually just about anything. We all have a particular place we want to enter the conversation. MSDN’s Microsoft Enterprise Content Management (ECM) Team Blog’s recent posting on “Taxonomy: Starting from Scratch” was a breath of fresh air in the way it addressed anyone–no matter what floor they needed.
For the novices to Managed Metadata Service, a service providing tools to foster a rich corporate taxonomy, the article recommends a starting point: Introducing Enterprise Metadata Management.
According to the article, more seasoned users are reminded to point their browsers toward the import capabilities. Of course, more specific needs are addressed too, with links to go with them.
The article recommends the following for the clients who need a comprehensive understanding of both common and specific corporate terms. The author Ryan Duguid states:
“The General Business Taxonomy consists of around 500 terms describing common functional areas that exist in most businesses. The General Business Taxonomy can be imported in to the SharePoint 2010 term store within minutes and provides a great starting point for customers looking to build a corporate vocabulary and take advantage of the Managed Metadata Service.”
Overall, this article is worth keeping tucked away for a day when you might need information on WAND, SharePoint, or metadata and taxonomy in general because of the directness and the accessible next steps the variety of links offer.
Megan Feil, October 30, 2011
Sponsored by Pandia.com
Google Amazon Dust Bunnies
October 13, 2011
The addled goose has a bum eye, more air miles than a 30 something IBM sales engineer, and lousy Internet connectivity. T Mobile’s mobile WiFi sharing gizmo is a door stop. Imagine my surprise when I read “Google Engineer: ‘Google+ Example of Our Complete Failure to Understand Platforms.’” In one webby write up, the dust bunnies at Google and Amazon were moved from beneath the bed to the white nylon carpet of a private bed chamber.
I am not sure the information in the article is spot on. Who can be certain about the validity of any information any longer? The goose cannot. But the write up reveals that Amazon is an organization with political “infighting”. What’s new? Nothing. Google, on the other hand, evidences a bit of reflexivity. I will not drag the Motorola Mobility event into this brief write up, but students of business may find that acquisition worth researching.
Here is the snippet which caught my attention:
[A] high-profile Google engineer … mistakenly posted a long rant about working at Amazon and Google’s own issues with creating platforms on Google+. Apparently, he only wanted to share it internally with everybody at Google, but mistaken shared it publicly. For the most part, [the] post focuses on the horrors of working at Amazon, a company that is notorious for its political infighting. The most interesting part to me, though, is … [the] blunt assessment of what he perceives to be Google’s inability to understand platforms and how this could endanger the company in the long run.
I want to step back. In fact, I want to go into MBA Mbit mode.
First, this apparent management behavior is the norm in many organizations, not just the companies referenced in the post. I worked for many years in the old world of big time consulting. Keep in mind that my experiences date from 1973, but management idiosyncrasies were the rule. The majority of these management gaffes took place in a slower, not digital world. Sure, speed was important. In the physics of information speed is relative. Today the perceived velocity is great and the diffusion of information adds a supercharger to routine missteps. Before getting too excited about the insights into one or two companies, keep in mind that most organizations today are perilously close to dysfunction. Nothing special here, but today’s environment gives what is normal some added impact. Consolidation and an absence of competition make the stakes high. Bad decisions add a thrill to the mundane. Big decisions weigh more and can have momentum that does damage more quickly than a bad decision at International Harvester or NBC in the 1970s.
Second, technology invites bad decisions. Today most technologies are “hidden”, not exposed like the guts of a Model T or my mom’s hot wire toaster which produced one type of bagel—burned. Not surprisingly, even technically sophisticated managers struggle to understand the implications of a particular technical decision. To make matters worse, senior managers have to deal with “soft” issues, and technical training, even if limited, provides few beacons for the course to chart. Need some evidence? Check out the Hewlett Packard activities over the last 18 months. I routinely hear such statements as “we cannot locate the invoice” and “tell us what to do.” Right. When small things go wrong, how can the big things go right? My view is that chance is a big factor today.
Third, the rush to make the world social, collaborative, and open means that leaks, flubs, sunshine, and every other type of exposure are part of the territory. I find it distressing that sophisticated organizations fall into big pot holes. As I write this, I am at an intelligence conference, and the rush to openness has an unexpected upside for some information professionals. With info flowing around without controls, the activities of authorities are influenced by the info bonanza. Good and bad guys have unwittingly created a situation that makes it less difficult to find the footprints of an activity. The post referenced in the source article is just one more example of what happens when information policies just don’t work. Forget trust. Even the technically adept cannot manage individual communications. Quite a lesson I surmise.
In search and content processing, the management situation is dire. Many companies are uncertain about pricing, features, services, and innovation. Some search vendors describe themselves with nonsense and Latinate constructions. Others flip flop from search to customer support to business intelligence without asking themselves, “Does this stuff actually work?” Many firms throw adjectives in front of jargon and rely on snake charming sales people to close deals. Good management or bad management? Neither. We are in status quo management with dollops of guessing and wild bets.
My take on this dust bunny matter is that we have what may be an unmanageable and ungovernable situation. No SharePoint governance conference is going to put the cat back in the bag. No single email, blog post, or news article will make a difference. Barn burned. Horse gone. Wal-Mart is building on the site. The landscape has changed. Now let the “real” consultants explain the fix. Back to the goose pond for me. Collaborate on that.
Stephen E Arnold, October 13, 2011
Sponsored by Pandia.com
Lucid Imagination: Open Source Search Reaches for Big Data
September 30, 2011
We are wrapping up a report about the challenges “big data” pose to organizations. Perhaps the most interesting outcome of our research is that there are very few search and content processing systems which can cope with the digital information required by some organizations. Three examples merit listing before I comment on open source search and “big data”.
The first example is the challenge of filtering the information organizations require when that information is produced within the organization by its staff, contractors, and advisors. We learned in the course of our investigation that the promises of processing updates to Web pages, price lists, contracts, sales and marketing collateral, and other routine information are largely unmet. One of the problems is that the disparate content types have different update and change cycles. The most widely used content management system based on our research results is SharePoint, and SharePoint is not able to deliver a comprehensive listing of content without significant latency. Fixes are available but these are engineering tasks which consume resources. Cloud solutions do not fare much better, once again due to latency. The bottom line is that for information produced within an organization employees are mostly unable to locate information without a manual double check. Latency is the problem. We did identify one system which delivered documented latency across disparate content types of 10 to 15 minutes. The solution is available from Exalead, but the other vendors’ systems were not able to match this performance in putting fresh, timely information produced within an organization in front of system users. Shocked? We were.
Reducing latency in search and content processing systems is a major challenge. Vendors often lack the resources required to solve a “hard problem” so “easy problems” are positioned as the key to improving information access. Is latency a popular topic? A few vendors do address the issue; for example, Digital Reasoning and Exalead.
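For readers who want to put a number on the latency point, here is a minimal sketch in Python. It assumes you can log, for each document, when it was last changed and when it first turned up in search results. The record layout, the sample values, and the function names (latency_by_type, stale_types) are hypothetical and do not come from any vendor's system; the 15 minute budget simply echoes the upper end of the figure mentioned above.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical log records: (content_type, last_modified, first_searchable).
SAMPLES = [
    ("price_list", datetime(2011, 9, 30, 9, 0), datetime(2011, 9, 30, 9, 12)),
    ("price_list", datetime(2011, 9, 30, 10, 0), datetime(2011, 9, 30, 10, 8)),
    ("contract", datetime(2011, 9, 30, 9, 0), datetime(2011, 9, 30, 13, 30)),
    ("web_page", datetime(2011, 9, 30, 9, 0), datetime(2011, 9, 30, 9, 45)),
]


def latency_by_type(samples):
    """Group index latency (first_searchable - last_modified) by content type."""
    grouped = {}
    for content_type, modified, searchable in samples:
        grouped.setdefault(content_type, []).append(searchable - modified)
    return grouped


def stale_types(samples, budget=timedelta(minutes=15)):
    """Return the content types whose median latency exceeds the budget."""
    grouped = latency_by_type(samples)
    return sorted(t for t, lags in grouped.items() if median(lags) > budget)


if __name__ == "__main__":
    for content_type, lags in latency_by_type(SAMPLES).items():
        print(content_type, median(lags))
    print("over budget:", stale_types(SAMPLES))
```

Grouping by content type matters because, as noted above, different content types have different update and change cycles, so a single system-wide average can hide the types that lag the most.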
Second, when organizations tap into content produced by third parties, the latency problem becomes more severe. There is the issue of the inefficiency and scaling of frequent index updates. But the larger problem is that once an organization “goes outside” for information, additional variables are introduced. In order to process the broad range of content available from publicly accessible Web sites or the specialized file types used by certain third party content producers, connectors become a factor. Most search vendors obtain connectors from third parties. These work pretty much as advertised for common file types such as Lotus Notes. However, when one of the targeted Web sites such as a commercial news service or a third-party research firm makes a change, the content acquisition system cannot acquire content until the connectors are “fixed”. No problem as long as the company needing the information is prepared to wait. In my experience, broken connectors mean another variable. Again, no problem unless critical information needed to close a deal is overlooked.
MarkLogic, FAST, Categorical Affirmatives, and a Direction Change
July 5, 2011
I awakened this morning (July 4, 2011) with a marketing Fourth of July boom. I received one of those ever present LinkedIn updates putting a comment from the Enterprise Search Engine Professionals Group in front of me.
The MarkLogic positioning exploded on my awareness like a Fourth of July skyrocket’s burst.
Most of the comments on the LinkedIn group are ho hum. One hot topic has been Microsoft’s failure to put much effort in its blogs about Fast Search & Transfer’s technology. Snore. Microsoft put down $1.2 billion for Fast, made some marketing noises, and had a fellow named Mr. Treo-something talk to me about the “new” Fast Search system. Then search turned out to be more like a snap in but without the simplicity of a Web part. Microsoft moved on and search is there, but like Google’s shift to Android, search is not where the action is. I am not sure who “runs” the enterprise search unit at Microsoft. Lots of revolving door action is my impression of Microsoft’s management approach in the last year.
The noise died down and Fast has become another component in the sprawling Shanghai of code known as SharePoint 2010. Making Fast “fast” and tuning it to return results that don’t vary with each update has created a significant amount of business for Microsoft partners “certified” to work on Fast Search. Licensees of the Linux/Unix version of ESP are now like birds pushed from the nest by an impatient mother.
New MarkLogic Market Positioning?
Set Microsoft aside for a moment and look at this post from a MarkLogic professional who once worked at Fast Search and subsequently at Microsoft. I am not sure how to hyperlink to LinkedIn posts without generating a flood of blue and white screens begging for log in, sign up, and money. I will include a link, but you are on your own.
Here’s the alleged MarkLogic professional’s comment:
Many organizations are replacing FAST with MarkLogic. MarkLogic offers a scalable enterprise search engine with all the features of FAST plus more…
Wow.
An XML engine with wrappers is now capable of “all” the Fast features. In my new monograph “The New Landscape of Enterprise Search”, I took some care to review information presented by Fast at CERN, the wizard lair in Europe, about Fast Search’s effort to rewrite Fast ESP, which was originally a Web search engine. The core was wrapped to convert Web search into enterprise search. This was neither quick nor particularly successful. Fast Search & Transfer ran into some tough financial waters, ended up the focus of a government investigation, and was quickly sold for a price that surprised me and the goslings in Harrod’s Creek.
You can get the details of the focus of the planned reinvention of the Fast system and the link to the source document at CERN which I reference in my Landscape study. A rewrite indicates that some functions were not in 2007 and 2008 performing in a manner that was acceptable to someone in Fast Search’s management. Then the acquisition took place. The Linux/Unix support was nuked. Fast under Microsoft’s wing has become a utility in the incredible assemblage of components that comprises SharePoint 2010. I track the SharePoint ecosystem in my information service SharePointSemantics.com. If you haven’t seen the content, you might want to check it out.
Are Webinars the Backbone of Concept Searching Marketing?
July 5, 2011
On the surface, Concept Searching looks like some of the other analytics companies that assert steady growth. What is interesting is that whereas for some value adding software companies webinars, or online lectures and demos, are one component of a broader marketing program, Concept Searching seems to rely heavily on webinars. We find this interesting.
We looked into one search company which was using Twitter to make the text processing service a hot trend. From our vantage point, it seems that Concept Searching is using social media in a more modest way.
Though it sounds like Spiderman should be involved, a webinar is simply an online seminar or workshop. The great thing about a webinar is that it is usually interactive and allows all participants to give, receive and discuss the topics at hand. Additionally, geographical boundaries are not an issue and these presentations are very low in cost.
When perusing Concept Searching’s Web site, you will find an entire events page dedicated to their upcoming exhibitions and a list and description of their current webinars. Some titles include: “Designing Information Architecture for SharePoint: Making Sense in a World of SharePoint Architecture” and “De-mystifying Content Types: Four Key Content Types of Leverage.” You simply register and voilà, you join in on all the fun. They also have a page dedicated to previously recorded webinars that you can access at your leisure.
I moderate webinars for a couple of outfits, and these are often expensive programs. There is time, often lots of time, required to prepare the text, create the graphics and demos, and then build an audience. I participate in webinars when I am paid to do so. However, I do not attend webinars as a member of the audience. The reason is that I am receiving inputs, experiencing interruptions even when the door is closed, and working to respond to ad hoc requests from clients.
I do think that webinars are somewhat more useful than attending certain conferences. Over the last couple of years, conferences are more like fraternity and sorority parties. But that perception may be a function of my age and distaste for rock and roll, mixed media events with lots of 20 somethings opining about social media and organic search. Yikes, digital bonsai.
This leads me to the question, “Who has time to participate in webinars?” If these are buyers of high end solutions, great. However, if I were the boss of a company where webinars consumed staff time, I would be asking some questions about the efficacy of the method.
I find reading a Web page and using an online demo or downloading code useful. Webinars may be too zippy for an old goose like me. One thing for sure: lots of companies are using webinars to hold down the cost of on site sales calls and getting individuals “interested” in a product or service to cough up an email address.
Stephen E Arnold, July 5, 2011
Sponsored by Pandia.com, publishers of the New Landscape of Enterprise Search
D4 and RiverGlass Join eDiscovery Forces
June 27, 2011
As announced on PRWeb in “D4, LLC, Partners with RiverGlass, Inc. Enabling Progressive Enhancements to D4’s eDiscovery Service Offerings,” the two companies have signed an agreement to form a strategic partnership for D4 to distribute, install and host the RiverGlass solutions.
D4 focuses on litigation support and eDiscovery services to law firms and corporate law departments. RiverGlass, Inc. is a provider of advanced information collection and analysis solutions focusing on government agencies, as well as eDiscovery and risk management applications to major corporations. The write up said:
D4’s highly technical method to eDiscovery and digital forensics leverages the maximum benefits available from the RiverGlass application.
With the solution:
Customers can harvest from many different types of data stores and ingest ESI in native format without having to have it processed. This includes network stores, SharePoint sites, websites, social media as well as structured databases.
This type of eDiscovery is blurring the lines between search and text analytics, creating a powerful tool for lawyers. It markedly improves the labor-intensive and mistake-prone legal discovery process.
Will eDiscovery go the way of customer support? What looks like a trivial exercise in using traditional search and retrieval for customer support is tough. Some of the vendors chasing customers in this segment are learning that customer support is more difficult than it appears. eDiscovery strikes me as having a higher level of complexity.
It is interesting to watch the shape shifting that is underway in the content processing sector.
Stephen E Arnold, June 27, 2011
You can read more about enterprise search and retrieval in The New Landscape of Enterprise Search, published by Pandia in Oslo, Norway, in June 2011.
Search: An Information Retrieval Fukushima?
May 18, 2011
Information about the scale of the horrific nuclear disaster in Japan at the Fukushima Daiichi nuclear complex is now becoming more widely known.
Expertise and Smoothing
My interest in the event is the engineering of a necklace of old-style reactors and the problems the LOCA (loss of coolant accident) triggered. The nagging thought I had was that today’s nuclear engineers understood the issues with the reactor design, the placement of the spent fuel pool, and the risks posed by an earthquake. After my years in the nuclear industry, I am quite confident that engineers articulated these issues. However, the technical information gets “smoothed” and simplified. The complexities of nuclear power generation are well known at least in engineering schools. The nuclear engineers are often viewed as odd ducks by the civil engineers and mechanical engineers. A nuclear engineer has to do the regular engineering stuff of calculating loads and looking up data in hefty tomes. But the nukes need grounding in chemistry, physics, and math, lots of math. Then the engineer who wants to become a certified, professional nuclear engineer has some other hoops to jump through. I won’t bore you with the details, but the end result of the process produces people who can explain clearly a particular process and its impacts.
Does your search experience emit signs of troubles within?
The problem is that art history majors, journalists, failed Web masters, and even Harvard and Wharton MBAs get bored quickly. The details of a particular nuclear process make zero sense to someone more comfortable commenting about the color of Mona Lisa’s gown. So “smoothing” takes place. The ridges and outcrops of scientific and statistical knowledge get simplified. Once a complex situation has been smoothed, the need for hard expertise is diminished. With these simplifications, the liberal arts crowd can “reason” about risks, costs, upsides, and downsides.
A nuclear fall out map. The effect of a search meltdown extends far beyond the boundaries of a single user’s actions. Flawed search and retrieval has major consequences, many of which cannot be predicted with high confidence.
Everything works in an acceptable or okay manner until there is a LOCA or some other problem like a stuck valve or a crack in a pipe in a radioactive area of the reactor. Quickly the complexities, risks, and costs of the “smoothed problem” reveal the fissures and crags of reality.
Web search and enterprise search are now experiencing what I call a Fukushima event. After years of contentment with finding information, suddenly the dashboards are blinking yellow and red. Users are unable to find the information needed to do their job or something as basic as locate a colleague’s telephone number or office location. I have separated Web search and enterprise search in my professional work.
I want to depart for a moment and consider the two “species” of search as a single process before the ideas slip away from me. I know that Web search processes publicly accessible content, has the luxury of ignoring servers with high latency, and filters content to create an index that meets the vendors’ needs, not the users’ needs. I know that enterprise search must handle diverse content types, must cope with security and access controls, and perform more functions than one of those two inch wide Swiss Army knives on sale at the airport in Geneva. I understand. My concern is broader in this write up. Please, bear with me.
Microsoft and Its Research about Search
May 13, 2011
We loved Microsoft’s use of the “beyond search” phrase to describe some of its earlier efforts to wrest the King of Search crown from the rampaging Googzilla.
Non-techies tend to take the complexities and subtle nuances of search for granted. I’ll admit that at one point I was also in the dark. Since the switch has been flipped, I find sites like the one summarizing Microsoft’s Information Retrieval and Mining research incredibly interesting.
The overview explains:
We aim at developing fundamental technologies for general web search and enterprise search. Our main technology areas include machine learning, information retrieval, data mining, and natural language processing. We partner with Microsoft Live Search and SharePoint Search. Currently, we are working on five projects: Learning to Rank, Search Result Ranking, Data Selection in Search, Search Log Data Mining, and Next Generation Enterprise Search.
I recommend scanning the page if the subject piques your interest, but here are some of the highlights. For ranking web pages, they have moved beyond the common practice of using web graph data to using large-scale graph data collected from users’ own browsing habits. Complementing this achievement is the work on a search log mining platform, culling search session and click-thru data, enabling the graph modeling mentioned above. They are even delving into what is on the tips of many tongues: enterprise social computing.
There are a lot of critics of Bing, even more of SharePoint. Regardless, Microsoft refuses to stand down when it comes to search development. Will these advancements launch Microsoft to the top of the field? Perhaps, with a little streamlining of their products or more negative PR for Google. If Apple could rise from the grave with the iPod, I guess anything is possible.
Sarah Rogers, May 13, 2011
Freebie unlike the technical and engineering support some of SharePoint search users experience
Simplexo Search
January 3, 2011
Short honk: I learned about Simplexo earlier this year. The company provides “optimized search for your mobile.” The company has a product that makes it possible for a user of Simplexo to search a desktop computer from a mobile device or a Web browser. Yahoo UK reported in “Simplexo Aims to Simplify Remote Desktop Searches”:
Simplexo said that the software could find emails in Outlook and Exchange Server, as well as documents in SharePoint, spreadsheets and database records, and can scour social networking applications such as Facebook, Twitter and LinkedIn.
The service is to go live early in 2011. If you are interested in this type of product, navigate to this link and sign up.
Stephen E Arnold, January 3, 2011
Freebie