Enterprise Search Has a Back Seat Driver

July 3, 2012

Once again, the technology road behind enterprise search is being questioned, and some are mapping out a new route for the company road trip. According to Norm.al’s article, ‘Search vs. Findability vs. Information Retrieval,’ findability is today’s new buzzword, but relying on a back seat driver seems questionable.

The self-appointed tour guides have determined:

“What Findability should be, and what the Semantic Web promises is a new approach. Order first and then the rest will be easy. By using Faceted Search or other Information Retrieval interfaces findability is achieved. Computer Search is based on indexing a junk of data, while Findability should be a process defined at the moment when the data are created.”

“If we could note the order, is Junk of Data, to Order by a third party who analyzes your content based on keywords, NLP and some other great metrics.”

No one really likes a back seat driver and now they are trying to hop in and bark out directions. Sometimes the search engine road may get a little bumpy, but utilizing the right landmarks will get you where you need to go without the interference of detours.

The pavement on this new road seems to still be a bit wet, so one might yet find themselves spattered with debris. Will these distinctions stick? We think not. Search is dead. Long live the next set of buzzwords from self-appointed experts, “real” analysts, and failed Webmasters.

Jennifer Shockley, July 3, 2012

Sponsored by PolySpot

Microsoft Snags a Big Search Project

July 2, 2012

Search Content Management recently reported on a new win for Microsoft in the article “FAST Enterprise Search at Core of European Court of Human Rights Website.”

According to the article, the European Court of Human Rights has quite a task ahead of it. After nearly a decade with a site built on Fulcrum Technologies’ document management software, ECHR has decided to use Microsoft’s FAST Enterprise Search to overhaul its Web site. The goals are to make the site as intuitive and simple as Amazon and to simplify the search process. ECHR is also working to make the new site accessible to mobile devices.

The stakes are high because the ECHR Web site currently receives 4.6 million visits a year from lawyers, government officials, students, professors, journalists and citizens seeking rulings and information about the state of individual freedoms in Europe. The new site will enable search of 90,000 documents on rulings that affect more than 800 million inhabitants.

When discussing the upcoming project, the article states:

“Beginning next week, ECHR expects to expand the reach of its site search capabilities to more than 5 million users and be able to accommodate 5,000 visitors at a given time when rulings are made. The integration of document management, enterprise search and a cloud-based collaboration in the Web CMS promises automated Google indexing for public-facing documents, improved ECHR real-time collaboration efforts and reduced overhead.”

Due to the nature and status of this project, being selected to do the ECHR’s Web site redesign is certainly a win for Microsoft.

Jasmine Ashton, July 2, 2012

Sponsored by PolySpot

New Version of Funnelback

June 25, 2012

Funnelback’s latest version boasts a number of new features, we learned at Regina’s List in “Funnelback 11 Launched with Automated Tuning and SEO Assistant.” The press release describes the new Automated Tuning component:

“Brett Matson, Managing Director of Funnelback, said Funnelback 11 has the ability to continually and automatically optimize its ranking using a correct answer set determined by the customer. This enables customers to intuitively adjust the search engine ranking algorithm to ensure it continuously adapts and is optimized to the ever-changing characteristics of their own information environment. A related benefit is that it exposes how effectively the search engine is ranking, said Mr. Matson.”
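To make the idea of tuning against a customer-supplied correct answer set concrete, here is a minimal Python sketch. It is not Funnelback’s algorithm, which the press release does not describe. The two scoring signals, the single blend weight, the function names, and the choice of mean reciprocal rank as the quality measure are all assumptions made for illustration.

```python
def reciprocal_rank(ranked_docs, correct_doc):
    """Return 1/position of the correct document, or 0.0 if it is missing."""
    for position, doc in enumerate(ranked_docs, start=1):
        if doc == correct_doc:
            return 1.0 / position
    return 0.0


def rank(docs, weight):
    """Order document ids by a blend of two hypothetical signals.

    docs maps a document id to a (text_score, link_score) pair.
    """
    def blended(doc_id):
        text_score, link_score = docs[doc_id]
        return weight * text_score + (1.0 - weight) * link_score

    return sorted(docs, key=blended, reverse=True)


def tune(results_by_query, correct_answers, candidate_weights):
    """Return (best_weight, best_mean_reciprocal_rank) over the candidates."""
    if not results_by_query:
        raise ValueError("need at least one query in the answer set")
    best_weight, best_score = None, -1.0
    for weight in candidate_weights:
        total = 0.0
        for query, docs in results_by_query.items():
            total += reciprocal_rank(rank(docs, weight), correct_answers[query])
        score = total / len(results_by_query)
        # Keep the first weight that achieves the highest score.
        if score > best_score:
            best_weight, best_score = weight, score
    return best_weight, best_score
```

The returned score is the same kind of number that would expose "how effectively the search engine is ranking": it reports how high the known correct answers land, averaged over the test queries.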

Other new features include an integrated SEO assistant, updatable indexes, efficient crawling, 64-bit indexing, a new high performance search interface, a broken links report, and a People Search feature for users’ customers. The software is available on Windows, on Linux, and as a cloud service.

Based in Australia, Funnelback grew from technology developed by premier scientific research agency CSIRO. The company was established in 2005 and was bought by UK content management outfit Squiz in 2009. It offers Enterprise and Website Search, both of which include customizable features. Its memorable name derives from the names of two Australian spiders, the funnel-web and the redback.

Cynthia Murrell, June 25, 2012

Sponsored by PolySpot

The Alleged Received Wisdom about Predictive Coding

June 19, 2012

Let’s start off with a recommendation. Snag a copy of the Wall Street Journal and read the hard copy front page story in the Marketplace section, “Computers Carry Water of Pretrial Legal Work.” In theory, you can read the story online if you don’t have Sections A-1, A-10 of the June 18, 2012, newspaper. A variant of the story appears as “Why Hire a Lawyer? Computers Are Cheaper.”

Now let me offer a possibly shocking observation: The costs of litigation are not going down for certain legal matters. Neither bargain basement human attorneys nor Fancy Dan content processing systems make the legal bills smaller. Your mileage may vary, but for those snared in some legal traffic jams, costs are tough to control. In fact, search and content processing can impact costs, just not in the way some of the licensees of next generation systems expect. That is one of the mysteries of online that few can penetrate.

The main idea of the Wall Street Journal story is that “predictive coding” can do work that human lawyers do for a higher cost but sometimes with much less precision. That’s the hint about costs in my opinion. But the article is traditional journalistic gold. Coming from the Murdoch organization, what did I expect? i2 Group has been chugging along with relationship maps for case analyses of important matters since 1990. Big alert: i2 Ltd. was a client of mine. Let’s see: basic discovery functions became available more than a couple of weeks ago.

The write up quotes published analyses which indicate that when humans review documents, those humans get tired and do a lousy job. The article cites “experts” from Thomson Reuters, a firm steeped in legal and digital expertise, who point out that predictive coding is going to be an even bigger business. Here’s the passage I underlined: “Greg McPolin, an executive at the legal outsourcing firm Pangea3 which is owned by Thomson Reuters Corp., says about one third of the company’s clients are considering using predictive coding in their matters.” This factoid is likely to spawn a swarm of azure chip consultants who will explain how big the market for predictive coding will be. Good news for the firms engaged in this content processing activity.

What goes up faster? The costs of a legal matter or the costs of a legal matter that requires automation and trained attorneys? Why do companies embrace automation plus human attorneys? Risk certainly is a turbo charger.

The article also explains how predictive coding works, offers some cost estimates for various actions related to a document, and adds some cautionary points about predictive coding proving itself in court. In short, we have a touchstone document about this niche in search and content processing.

My thoughts about predictive coding are related to the broader trends in the use of systems and methods to figure out what is in a corpus and what a document is about.

First, the driver for most content processing is related to two quite human needs. One is cost: coping with large volumes of information is expensive, and the expense is going up fast. The other is the need to reduce risk. Most professionals find quips about orange jump suits, sharing a cell with Mr. Madoff, and the iconic “perp walk” downright depressing. When a legal matter surfaces, the need to know what’s in a collection of content like corporate email is high. The need for speed is driven by executive urgency. The cost factor clicks in when the chief financial officer has to figure out the costs of determining what’s in those documents. Predictive coding to the rescue. One firm used the phrase “rocket docket” to communicate speed. Other firms promise optimized statistical routines. The big idea is that automation is faster and cheaper than having lots of attorneys sifting through documents in printed or digital form. The Wall Street Journal is right. Automated content processing is going to be a big business. I just hit the two key drivers. Why dance around what is fueling this sector?


More Predictive Silliness: Coding, Decisioning, Baloneying

June 18, 2012

It must be the summer vacation warm and fuzzies. I received another wild analytics news release today. This one comes from 5WPR, “a top 25 PR agency.” Wow. I learned from the spam: PeekAnalytics “delivers enterprise class Twitter analytics and help marketers understand their social consumers.”

What?

Then I read:

By identifying where Twitter users exist elsewhere on the Web, PeekAnalytics offers unparalleled audience metrics from consumer data aggregated not just from Twitter, but from over sixty social sites and every major blog platform.

The notion of algorithms explaining anything is interesting. But the problem with numerical recipes is that those who use outputs may not know what’s going on under the hood. Widespread knowledge of the specific algorithms, the thresholds built into the system, and the assumptions underlying the selection of a particular method is in short supply.

Analytics is the realm of the one percent of the population trained to understand the strengths and weaknesses of specific mathematical systems and methods. The 99 percent are destined to accept analytics system outputs without knowing how the data were selected, shaped, formed, and presented given the constraints of the inputs. Who cares? Well, obviously not some marketers of predictive analytics, automated indexing, and some trigger trading systems. Too bad for me. I do care.

When I read about analytics and understanding, I shudder. As an old goose, each body shake costs me some feathers, and I don’t have many more to lose at age 67. The reality of fancy math is that those selling its benefits do not understand its limitations.

Consider the notion of using a group of analytic methods to figure out the meaning of a document. Then consider the numerical recipes required to identify a particular document as important from thousands or millions of other documents.

When companies describe the benefits of a mathematical system, the details are lost in the dust. In fact, bringing up a detail results in a wrinkled brow. Consider the Kolmogorov-Smirnov Test. Has this nonparametric test been applied to the analytics system which marketers have presented to you in the last “death by PowerPoint” session? The response from 99.5 percent of the people in the world is, “Kolmo who?” or “Isn’t Smirnov a vodka?” Bzzzz. Wrong.
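For readers in the “Kolmo who?” camp, here is a minimal Python sketch of the two-sample Kolmogorov-Smirnov statistic. It computes only the statistic D, the largest vertical gap between the empirical distribution functions of two samples, and does not compute a p-value. The function name is my own, not a library call.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Return the two-sample Kolmogorov-Smirnov statistic D."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # between them occurs at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample_b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

A value of D near 0 means the two samples look alike in distribution; a value near 1 means they barely overlap. Deciding what counts as “too large” requires the sample sizes and a significance level, which is exactly the sort of detail that gets lost in the dust.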

Mathematical methods which generate probabilities are essential to many business sectors. When one moves fuel rods at a nuclear reactor, the decision about what rod to put where is informed by a range of mathematical methods. Specially trained experts, often with degrees in nuclear engineering plus postgraduate work, handle the fuel rod manipulation. Take it from me. Direct observation is not the optimal way to figure out fuel pool rod distribution. Get the math “wrong” and some pretty exciting events transpire. Monte Carlo anyone? John Gray? Julian Steyn? If these names mean nothing to you, you would not want to sign up for work in a nuclear facility.

Why then would a person with zero knowledge of numerical recipes or of the oddball outputs from particular types of algorithms, and with little or no experience with probability methods, use the outputs of a system as “truth”? The outputs of analytical systems require expertise to interpret. Looking at a nifty graphic generated by Spotfire or Palantir is NOT the same as understanding what decisions have been made, what limitations exist within the data display, and what blind spots are generated by the particular method or suite of methods. (Firms which do focus on explaining and delivering systems which make it clear to users about methods, constraints, and considerations include Digital Reasoning, Ikanow, and Content Analyst. Others? You are on your own, folks.)

Today I have yet another conference call with 30 somethings who are into analytics. Analytics is the “next big thing.” Just as people assume coding up a Web site is easy, people assume that mathematical methods are now the mental equivalent of clicking a mouse to get a document. Wrong.

The likelihood of misinterpreting the outputs of modern analytic systems is higher than it was when I entered the workforce after graduate school. The reasons include:

  1. A rise in the “something for nothing” approach to information. A few clicks, a phone call, and chit chat with colleagues make many people experts in quite difficult systems and methods. In the mid 1960s, there was limited access to systems which could do clever stuff with tricks from my relative Vladimir Ivanovich Arnold. Today, the majority of the people with whom I interact assume their ability to generate a graph and interpret a scatter diagram equips them as analytic mavens. Math is and will remain hard. Nothing worthwhile comes easy. That truism is not too popular with the 30 somethings who explain the advantages of analytics products they sell.
  2. Sizzle over content. Most of the wild and crazy decisions I have learned about come from managers who accept analytic system outputs as a page from old Torah scrolls from Yitzchok Riesman’s collection. High ranking government officials want eye candy, so modern analytic systems generate snazzy graphics. Does the government official know what the methods were and the data’s limitations? Nope. Bring this up and the comment is, “Don’t get into the weeds with me, sir.” No problem. I am an old advisor in rural Kentucky.
  3. Entrepreneurs, failing search system vendors, and open source repackagers are painting the bandwagon and polishing the tubas and trombones. The analytics parade is on. From automated and predictive indexing to surfacing nuggets in social media, the music is loud and getting louder. With so many firms jumping onto the bandwagon or joining the parade, the reality of analytics is essentially irrelevant.

The bottom line for me is that the social boom is at or near its crest. Marketers—particularly those in content processing and search—are desperate for a hook which will generate revenues. Analytics seems to be as good as any other idea which is converted by azure chip consultants and carpetbaggers into a “real business.”

The problem is that analytics is math. Math is easy as 1-2-3; math is as complex as MIT’s advanced courses. With each advance in computing power, more fancy math becomes possible. As math advances, the number of folks who can figure out what a method yields decreases. The result is a growing “cloud of unknowing” with regard to analytics. Putting this into a visualization makes clear the challenge.

Stephen E Arnold, June 18, 2012

Coveo Positions Itself with Insight Solutions

June 15, 2012

Coveo has a new positioning with Insight Solutions. It does search, business intelligence, and compliance. We learn from “3i Group Leverages Coveo Insight Solutions for Knowledge Continuity and Expertise Finding,” posted at the Wall Street Journal’s Market Watch, of at least one company that is very happy with the product. The press release states:

“3i needed a flexible solution that would easily scale as the amount of information and information sources continued to grow. After evaluating several vendors, 3i selected Coveo’s Insight Solutions based on ease-of-use, flexibility and Insight Consoles, the presentation layer of Coveo’s intelligent indexing technology, which provides information from across sources in a single, unified view, configured by role — so that each user views and interacts with contextually relevant, dynamically updated information.”

3i Group is a leading international investment company that used to rely on the on-board search functions of a myriad of data sources, from email to file systems. Naturally, this approach wasted a lot of time, and the company is happy to have found a solution to that problem that has also turned up more useful information than workers knew existed. 3i is so happy with Insight Solutions that it plans to expand its use to other initiatives such as legal, compliance, and business intelligence. The company also looks forward to an upcoming enterprise-wide roll out via mobile devices.

This development is an example of how Coveo shows ingenuity in positioning its search technology. The company was founded in 2005 by some of the team which developed Copernic Desktop Search. Coveo takes pride in solutions that are agile and easy to use yet scalable, fast, and efficient. They also boast that “people like doing business with us.” That is something not every company can say.

Cynthia Murrell, June 15, 2012

Sponsored by PolySpot

Prediction, Metadata, and Good Enough

June 14, 2012

Several PR mavens sent me multiple unsolicited emails today about their clients’ predictive statistical methods. I don’t like spam email. I don’t like PR advisories that promise wild and crazy benefits for predictive analytics applied to big data, indexing content, or figuring out what stocks to buy.

March Communications was pitching Lavastorm and Kabel Deutschland. The subject: analytics that are real time, predictive, and discovery driven.

Baloney.

Predictive analytics can be helpful in many business and technical processes. Examples range from figuring out where to sell an off lease mint green Ford Mustang convertible to planning when to ramp up outputs from a power generation station. Where predictive analytics are not yet ready for prime time is identifying which horse will win the Kentucky Derby and determining where the next Hollywood starlet will crash a sports car. Predictive methods can suggest how many cancer cells will die under certain conditions and assumptions, but the methods cannot identify which cancer cells will die.

Can predictive analytics make you a big winner at the race track? If firms with rock solid predictive analytics could predict a horse race, would these firms be selling software or would these firms be betting on horse races?

That’s an important point. Marketers promise magic. Predictive methods deliver results that provide some insight but rarely rock solid outputs. Prediction is fuzzy. Good enough is often the best a method can provide.

In between is where hopes and dreams rise and fall with less clear cut results. I am, of course, referring to the use by marketers of lingo like this:

The idea behind these buzzwords is that numerical recipes can process information or data and assign probabilities to outputs. When one ranks the outputs from highest probability to lowest probability, an analyst or another script can pluck the top five outputs. These outputs are the most likely to occur. The approach works for certain Google-type caching methods, providing feedback to consumer health searchers, and figuring out how much bandwidth is needed for a new office building when it is fully occupied. Picking numbers at the casino? Not so much.
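The rank-and-pluck step described above is simple enough to sketch in a few lines of Python. The function name and the idea of passing the scores as a dictionary are assumptions for illustration; no vendor’s system is being reproduced here.

```python
def top_outputs(probabilities, k=5):
    """Return the k (label, probability) pairs with the highest probability.

    probabilities maps an output label to the probability a method assigned it.
    """
    ranked = sorted(probabilities.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Note what the sketch does not do: it says nothing about whether the probabilities themselves are any good. The sort is trivial. The numerical recipe that produced the numbers is where the assumptions and thresholds live.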


Helios Treaty Creates Neutral File Ground

June 14, 2012

Helios just fired the web border patrol and initiated a peace treaty for neutral file ground. For decades Mac and Windows have possessively guarded their terrain, making it difficult for files to cross from border to border. That is changing. According to the article “Helios puts spotlight on cross-platform search,” the spotlight shining across the border now lights the path for synchronization.

Tom Hallinan, Strategic Partner Manager at HELIOS Software stated:

“The demise of the Xserve, and the increased usage of Macs and mobile devices in businesses, has revealed the shortcomings of the Mac-only Spotlight search from Apple, and the Windows-only Windows Search for Windows. The HELIOS Spotlight-compatible indexing and search system solves that problem.”

“Mac, Windows, and UNIX/Linux users can drag & drop project files from the web browser or local workstation into the WebShare Manager window to enable synchronization of files between the remote WebShare server and the local workstation. Automatic file versioning can also be enabled.”

Helios integrates into Windows Server, Mac OS X, Oracle Solaris, IBM AIX, and Linux, which covers all the major server operating systems. This virtual directory simplifies search by placing all this data on one individual file server, thus enabling ease of access. This is a perfect solution for businesses since the mobile device industry is becoming oversaturated. The Helios treaty designating neutral file territory came at a perfect time.

Jennifer Shockley, June 14, 2012

Sponsored by PolySpot

Microsoft SharePoint: Controlled Term Functionality

June 13, 2012

“SharePointSearch, Synonyms, Thesaurus, and You” provides a useful summary of Microsoft SharePoint’s native support for controlled term lists. Today, the buzzwords taxonomy and ontology are used to refer to term lists which SharePoint can use to index content. Term lists may consist of company-specific vocabulary, the names of people and companies with which a firm does business, or formal lists of words and phrases with “Use for” and “See also” cross references.

The importance of a controlled term list is often lost when today’s automated indexing systems process content. Almost any search system benefits when the content processing subsystem can use a controlled term list as well as the automated methods baked into the indexer.

In this TechGrowingPains write up, the author says:

A little known, and interesting, feature in SharePoint search is the ability to create customized thesaurus word sets. The word sets can either be synonyms, or word replacements, augmenting search functionality. This ability is not limited to single words, it can also be extend into specific phrases.

The article explains how controlled term lists can be used to assist a user in formulating a query. The method is called “replacement words”. The idea of suggesting terms is a good one which many users find a time saver when doing research. The synonym expansion function is mentioned as well. SharePoint can insert broader terms into a user’s query which increases or decreases the size of the result set.
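The difference between the two kinds of word sets can be sketched in Python. This is a language-neutral illustration, not SharePoint’s thesaurus file format or API. The term lists, the function name, and the whole-query matching are assumptions chosen to keep the sketch short.

```python
# Hypothetical word sets, made up for illustration only.
EXPANSIONS = [{"car", "automobile"}]   # synonyms: search for every member
REPLACEMENTS = {"soda": "soft drink"}  # replacements: search only for the substitute


def rewrite_query(query, expansions=EXPANSIONS, replacements=REPLACEMENTS):
    """Return the list of query strings to run for one user query.

    Matching is on the whole, lowercased query. A real system would
    also match words and phrases inside a longer query.
    """
    normalized = query.strip().lower()
    # A replacement swaps the user's term for the preferred term.
    if normalized in replacements:
        return [replacements[normalized]]
    # An expansion keeps the user's term and adds its synonyms.
    for group in expansions:
        if normalized in group:
            return sorted(group)
    return [normalized]
```

With these sample lists, a query for “Car” runs as both “automobile” and “car,” which enlarges the result set, while a query for “soda” runs only as “soft drink,” which can shrink or redirect it.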

The centerpiece of the article is a recipe for activating this functionality. A helpful code snippet is included as well.

If you want additional technical support, let us know. Our Search Technologies team has deep experience in Microsoft SharePoint search and customization. We can implement advanced controlled term features in almost any SharePoint system.

Iain Fletcher, June 13, 2012

Facebook and Search: A New Google Rival

June 7, 2012

Facebook is making plans to improve its search engine so users can more easily find shared or liked content. The current flawed search system needs a revamp, but a new survey reveals that almost half of respondents disliked the idea of Facebook launching its own search engine.

The article, “A Facebook Search Engine to Rival Google? Users Dislike That Idea,” tells us that even though Facebook could potentially capture 22 percent of the global search market, the public isn’t exactly receptive at the moment. Forty-eight percent of respondents to the recent survey by Greenlight said they would not, or probably would not, be interested in a Facebook search engine.

“Still, Greenlight says if Facebook launches its own search engine, it could potentially grab 22 percent of the global search market share and become the second most used search engine in every major market except for China, Japan, and Russia, where it would rank third.

‘It wouldn’t need to be a spectacular engine either, just well integrated into the Facebook experience and generally competent,’ said Greenlight Chief Operating Officer Andreas Pouros.”

However, Facebook isn’t currently interested in crawling and indexing the entire web. The company just wants content on the site that is shared by users to be more easily accessible. Regardless, Google’s 66.5 percent market share in the U.S. is quite intimidating and possibly the reason behind Facebook’s reluctance to join in the search engine war.

Andrea Hayden, June 7, 2012

Sponsored by PolySpot
