More Predictive Silliness: Coding, Decisioning, Baloneying

June 18, 2012

It must be the summer vacation warm and fuzzies. I received another wild analytics news release today. This one comes from 5WPR, “a top 25 PR agency.” Wow. I learned from the spam: PeekAnalytics “delivers enterprise class Twitter analytics and help marketers understand their social consumers.”

What?

Then I read:

By identifying where Twitter users exist elsewhere on the Web, PeekAnalytics offers unparalleled audience metrics from consumer data aggregated not just from Twitter, but from over sixty social sites and every major blog platform.

The notion of algorithms explaining anything is interesting. But the problem with numerical recipes is that those who use their outputs may not know what is going on under the hood. Widespread knowledge of the specific algorithms, the thresholds built into the system, and the assumptions underlying the selection of a particular method is in short supply.

Analytics is the realm of the one percent of the population trained to understand the strengths and weaknesses of specific mathematical systems and methods. The 99 percent are destined to accept analytics system outputs without knowing how the data were selected, shaped, formed, and presented given the constraints of the inputs. Who cares? Well, obviously not some marketers of predictive analytics, automated indexing, and some trigger trading systems. Too bad for me. I do care.

When I read about analytics and understanding, I shudder. As an old goose, each body shake costs me some feathers, and I don’t have many more to lose at age 67. The reality of fancy math is that those selling its benefits do not understand its limitations.

Consider the notion of using a group of analytic methods to figure out the meaning of a document. Then consider the numerical recipes required to identify a particular document as important from thousands or millions of other documents.

When companies describe the benefits of a mathematical system, the details are lost in the dust. In fact, bringing up a detail results in a wrinkled brow. Consider the Kolmogorov-Smirnov test. Has this nonparametric test been applied to the analytics system which marketers presented to you in the last “death by PowerPoint” session? The response from 99.5 percent of the people in the world is, “Kolmo who?” or “Isn’t Smirnov a vodka?” Bzzzz. Wrong.
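For the curious, here is a minimal sketch of what the Kolmogorov-Smirnov test actually does: it compares a sample's empirical distribution against a reference distribution. The data below are invented for illustration, and SciPy is assumed to be installed; the point is simply that the check exists and takes one line, yet almost nobody asks for it.

```python
# Sketch: does a batch of "analytics outputs" plausibly follow a
# standard normal distribution? Sample data are made up for illustration.
import random
from scipy import stats

random.seed(42)
sample = [random.gauss(0, 1) for _ in range(200)]

# Compare the empirical distribution against a standard normal.
statistic, p_value = stats.kstest(sample, "norm")

# A small p-value would suggest the outputs do NOT fit the assumed
# distribution -- an assumption many downstream methods quietly make.
print(f"KS statistic = {statistic:.3f}, p-value = {p_value:.3f}")
```

The test will not tell you a system is right; it only flags when a distributional assumption is visibly wrong, which is exactly the kind of detail that produces the wrinkled brow.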

Mathematical methods which generate probabilities are essential to many business sectors. When one moves fuel rods at a nuclear reactor, the decision about which rod to put where is informed by a range of mathematical methods. Specially trained experts, often with degrees in nuclear engineering plus postgraduate work, handle the fuel rod manipulation. Take it from me. Direct observation is not the optimal way to figure out fuel pool rod distribution. Get the math “wrong” and some pretty exciting events transpire. Monte Carlo anyone? John Gray? Julian Steyn? If these names mean nothing to you, you would not want to sign up for work in a nuclear facility.

Why then would a person with zero knowledge of numerical recipes, the oddball outputs of particular types of algorithms, and little or no experience with probability methods use the outputs of a system as “truth”? The outputs of analytical systems require expertise to interpret. Looking at a nifty graphic generated by Spotfire or Palantir is NOT the same as understanding what decisions have been made, what limitations exist within the data display, and what blind spots are generated by the particular method or suite of methods. (Firms which do focus on explaining and delivering systems which make methods, constraints, and considerations clear to users include Digital Reasoning, Ikanow, and Content Analyst. Others? You are on your own, folks.)

Today I have yet another conference call with 30-somethings who are into analytics. Analytics is the “next big thing.” Just as people assume coding up a Web site is easy, people assume that mathematical methods are now the mental equivalent of clicking a mouse to get a document. Wrong.

The likelihood of misinterpreting the outputs of modern analytic systems is higher than it was when I entered the workforce after graduate school. The reasons include:

  1. A rise in the “something for nothing” approach to information. A few clicks, a phone call, and chit chat with colleagues make many people experts in quite difficult systems and methods. In the mid-1960s, there was limited access to systems which could do clever stuff with tricks from my relative Vladimir Ivanovich Arnold. Today, the majority of the people with whom I interact assume their ability to generate a graph and interpret a scatter diagram equips them as analytic mavens. Math is and will remain hard. Nothing worthwhile comes easy. That truism is not too popular with the 30-somethings who explain the advantages of the analytics products they sell.
  2. Sizzle over content. Most of the wild and crazy decisions I have learned about come from managers who accept analytic system outputs as a page from old Torah scrolls from Yitzchok Riesman’s collection. High ranking government officials want eye candy, so modern analytic systems generate snazzy graphics. Does the government official know what the methods were and the data’s limitations? Nope. Bring this up and the comment is, “Don’t get into the weeds with me, sir.” No problem. I am an old advisor in rural Kentucky.
  3. Entrepreneurs, failing search system vendors, and open source repackagers are painting the bandwagon and polishing the tubas and trombones. The analytics parade is on. From automated and predictive indexing to surfacing nuggets in social media—the music is loud and getting louder. With so many firms jumping on the bandwagon or joining the parade, the reality of analytics is essentially irrelevant.

The bottom line for me is that the social boom is at or near its crest. Marketers—particularly those in content processing and search—are desperate for a hook which will generate revenues. Analytics seems to be as good as any other idea which is converted by azure chip consultants and carpetbaggers into a “real business.”

The problem is that analytics is math. Math is as easy as 1-2-3; math is as complex as MIT’s advanced courses. With each advance in computing power, more fancy math becomes possible. As math advances, the number of folks who can figure out what a method yields decreases. The result is a growing “cloud of unknowing” with regard to analytics. Putting this into a visualization makes the challenge clear.

Stephen E Arnold, June 18, 2012

Coveo Positions Itself with Insight Solutions

June 15, 2012

Coveo has repositioned itself with Insight Solutions. It does search, business intelligence, and compliance. We learn from “3i Group Leverages Coveo Insight Solutions for Knowledge Continuity and Expertise Finding,” posted at the Wall Street Journal’s MarketWatch, of at least one company that is very happy with the product. The press release states:

“3i needed a flexible solution that would easily scale as the amount of information and information sources continued to grow. After evaluating several vendors, 3i selected Coveo’s Insight Solutions based on ease-of-use, flexibility and Insight Consoles, the presentation layer of Coveo’s intelligent indexing technology, which provides information from across sources in a single, unified view, configured by role — so that each user views and interacts with contextually relevant, dynamically updated information.”

3i Group is a leading international investment company that used to rely on the on-board search functions of a myriad of data sources, from email to file systems. Naturally, this approach wasted a lot of time, and the company is happy to have found a solution to that problem, one that has also turned up more useful information than workers knew existed. 3i is so happy with Insight Solutions that it plans to expand its use to other initiatives such as legal, compliance, and business intelligence. The company also looks forward to an upcoming enterprise-wide rollout via mobile devices.

This development is an example of how Coveo shows ingenuity in positioning its search technology. The company was founded in 2005 by some of the team which developed Copernic Desktop Search. Coveo takes pride in solutions that are agile and easy to use yet scalable, fast, and efficient. They also boast that “people like doing business with us.” That is something not every company can say.

Cynthia Murrell, June 15, 2012

Sponsored by PolySpot

Prediction, Metadata, and Good Enough

June 14, 2012

Several PR mavens have sent me multiple unsolicited emails today about their clients’ predictive statistical methods. I don’t like spam email. I don’t like PR advisories that promise wild and crazy benefits for predictive analytics applied to big data, indexing content, or figuring out what stocks to buy.

March Communications was pitching Lavastorm and Kabel Deutschland. The subject: real-time, predictive, discovery-driven analytics.

Baloney.

Predictive analytics can be helpful in many business and technical processes. Examples range from figuring out where to sell an off-lease mint green Ford Mustang convertible to planning when to ramp up output from a power generation station. Where predictive analytics are not yet ready for prime time is identifying which horse will win the Kentucky Derby or determining where the next Hollywood starlet will crash a sports car. Predictive methods can suggest how many cancer cells will die under certain conditions and assumptions, but the methods cannot identify which cancer cells will die.

Can predictive analytics make you a big winner at the race track? If firms with rock-solid predictive analytics could predict a horse race, would these firms be selling software, or would they be betting on horse races?

That’s an important point. Marketers promise magic. Predictive methods deliver results that provide some insight but rarely rock solid outputs. Prediction is fuzzy. Good enough is often the best a method can provide.

In between is where hopes and dreams rise and fall with less clear cut results. I am, of course, referring to marketers’ use of lingo like “real time,” “predictive,” and “discovery driven.”

The idea behind these buzzwords is that numerical recipes can process information or data and assign probabilities to outputs. When one ranks the outputs from highest probability to lowest probability, an analyst or another script can pluck the top five outputs. These outputs are the most likely to occur. The approach works for certain Google-type caching methods, providing feedback to consumer health searchers, and figuring out how much bandwidth is needed for a new office building when it is fully occupied. Picking numbers at the casino? Not so much.
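The ranking step described above is simple enough to sketch. A toy illustration (candidate names and probabilities invented for this example): a numerical recipe has scored some outputs, and a script ranks them and plucks the top five.

```python
# Toy example: rank scored outputs and keep the five most probable.
# The candidates and their probabilities are invented for illustration.
candidates = {
    "output_a": 0.91, "output_b": 0.87, "output_c": 0.55,
    "output_d": 0.42, "output_e": 0.33, "output_f": 0.12,
    "output_g": 0.05,
}

# Rank from highest probability to lowest, then take the first five.
top_five = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:5]

for name, prob in top_five:
    print(f"{name}: {prob:.2f}")
# Note: nothing here says WHY a probability is 0.91. That is the
# "under the hood" knowledge most consumers of the output lack.
```

The mechanics are trivial; the hard and invisible part is the method that produced the 0.91 in the first place, which is the point of this post.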


Helios Treaty Creates Neutral File Ground

June 14, 2012

Helios just fired the Web border patrol and initiated a peace treaty for neutral file ground. For decades, Mac and Windows have possessively guarded their terrain, making it difficult for files to cross from border to border. That is changing, according to the article “Helios Puts Spotlight on Cross-Platform Search”; the spotlight shining across the border now lights the path for synchronization.

Tom Hallinan, Strategic Partner Manager at HELIOS Software stated:

“The demise of the Xserve, and the increased usage of Macs and mobile devices in businesses, has revealed the shortcomings of the Mac-only Spotlight search from Apple, and the Windows-only Windows Search for Windows. The HELIOS Spotlight-compatible indexing and search system solves that problem.”

“Mac, Windows, and UNIX/Linux users can drag & drop project files from the web browser or local workstation into the WebShare Manager window to enable synchronization of files between the remote WebShare server and the local workstation. Automatic file versioning can also be enabled.”

Helios integrates with Windows Server, Mac OS X, Oracle Solaris, IBM AIX, and Linux, which covers all the major server operating systems. This virtual directory simplifies search by placing all this data on one file server, enabling ease of access. This is a welcome solution for businesses as the mobile device industry becomes saturated. The Helios treaty designating neutral file territory came at a perfect time.

Jennifer Shockley, June 14, 2012

Sponsored by PolySpot

Microsoft SharePoint: Controlled Term Functionality

June 13, 2012

“SharePoint Search, Synonyms, Thesaurus, and You” provides a useful summary of Microsoft SharePoint’s native support for controlled term lists. Today, the buzzwords taxonomy and ontology are used to refer to term lists which SharePoint can use to index content. Term lists may consist of company-specific vocabulary, the names of people and companies with which a firm does business, or formal lists of words and phrases with “Use for” and “See also” cross references.

The importance of a controlled term list is often lost when today’s automated indexing systems process content. Almost any search system benefits when the content processing subsystem can use a controlled term list as well as the automated methods baked into the indexer.

In this TechGrowingPains write up, the author says:

A little known, and interesting, feature in SharePoint search is the ability to create customized thesaurus word sets. The word sets can either be synonyms, or word replacements, augmenting search functionality. This ability is not limited to single words, it can also be extend into specific phrases.

The article explains how controlled term lists can be used to assist a user in formulating a query. The method is called “replacement words.” The idea of suggesting terms is a good one which many users find a time saver when doing research. The synonym expansion function is mentioned as well. SharePoint can insert broader terms into a user’s query, which increases or decreases the size of the result set.

The centerpiece of the article is a recipe for activating this functionality. A helpful code snippet is included as well.
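The article’s snippet is not reproduced here, but SharePoint’s thesaurus lives in a language-specific XML file (tsenu.xml for English) in the search configuration folder. A minimal sketch of the documented shape, with terms invented for illustration, one synonym (expansion) set and one replacement set:

```xml
<XML ID="Microsoft Search Thesaurus">
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <diacritics_sensitive>0</diacritics_sensitive>
    <expansion>
      <!-- Synonyms: a query for any of these terms matches all of them -->
      <sub weight="1.0">laptop</sub>
      <sub weight="1.0">notebook</sub>
    </expansion>
    <replacement>
      <!-- Replacement: the pattern is swapped for the substitute term -->
      <pat>w2k</pat>
      <sub weight="1.0">Windows 2000</sub>
    </replacement>
  </thesaurus>
</XML>
```

Consult the article or Microsoft’s documentation for where the file goes and how to restart the search service; the file location varies by SharePoint version.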

If you want additional technical support, let us know. Our Search Technologies team has deep experience in Microsoft SharePoint search and customization. We can implement advanced controlled term features in almost any SharePoint system.

Iain Fletcher, June 13, 2012

Facebook and Search: A New Google Rival

June 7, 2012

Facebook is making plans to improve its search engine so users can more easily find shared or liked content. The current flawed search system needs a revamp, but a new survey reveals that almost half of respondents disliked the idea of Facebook launching its own search engine.

The article, “A Facebook Search Engine to Rival Google? Users Dislike That Idea,” tells us that even though Facebook could potentially capture 22 percent of the global search market, the public isn’t exactly receptive at the moment. Forty-eight percent of respondents to the recent survey by Greenlight said they would not, or probably would not, be interested in a Facebook search engine.

“Still, Greenlight says if Facebook launches its own search engine, it could potentially grab 22 percent of the global search market share and become the second most used search engine in every major market except for China, Japan, and Russia, where it would rank third.

‘It wouldn’t need to be a spectacular engine either, just well integrated into the Facebook experience and generally competent,’ said Greenlight Chief Operating Officer Andreas Pouros.”

However, Facebook isn’t currently interested in crawling and indexing the entire web. The company just wants content on the site that is shared by users to be more easily accessible. Regardless, Google’s 66.5 percent market share in the U.S. is quite intimidating and possibly the reason behind Facebook’s reluctance to join in the search engine war.

Andrea Hayden, June 7, 2012

Sponsored by PolySpot

SearchBlox 7.0 Available

June 7, 2012

SearchBlox’s blog invites us to “Compare SearchBlox 7.0 vs. Solr.” Okay, so I am to compare. I wonder how? There is no side-by-side comparison set up here, nor is there any link to one. Hmm. I guess I am expected to do the legwork.

Misleading headline aside, the write up does thoroughly describe SearchBlox’s new version 7.0 in relation to rival Solr. It reads:

“SearchBlox 7 is a (free) enterprise solution for website, ecommerce, intranet and portal search. The new 7.0 version makes it easy to add faceted search without the hassles of managing a schema and scales horizontally without any manual configuration or external software/scripts. SearchBlox enables you to achieve term, range and date based faceted search without manually maintaining a schema file as in Solr. SearchBlox enables to have distributed indexing and searching abilities without using any separate scripts/programs as in SolrCloud. SearchBlox provides on demand dynamic faceting of fields without specifying them through a config or script.”

The software also sports a Web-based administrator’s console. Unlike Solr, SearchBlox indexes custom meta fields without the need to specify custom fields or setup within the schema.xml file. It also supports: multiple indexes out of the box; indexing of custom content and multiple content types; and the specification of facets at runtime (as opposed to requiring a prior definition). Another nifty feature lets you add or remove SearchBlox servers from a cluster without the need to restart or stop the servers.

Perhaps SearchBlox 7.0 outpaces Solr in these metrics because it is built on top of that Apache product. SearchBlox Software was founded in 2003 and is based in Richmond, VA.

Cynthia Murrell, June 7, 2012

Sponsored by PolySpot

Get a Comprehensive Search Solution for SharePoint from Fabasoft Mindbreeze

June 4, 2012

In “SharePoint Log: When Databases Rebel,” Robert Schifreen looks at how one user can generate 16 gigabytes of logs in just three months. The article is the ninth part of a larger SharePoint 2010 series chronicling a SharePoint deployment at the ZDNet Blog.

Schifreen has this to say about navigating the growing amounts of data:

Microsoft markets a separate SharePoint add-on product called FAST Search, and likes to imply that no successful SharePoint installation is complete without it. In practice, from what I have read, it seems that FAST is unnecessary unless you have tens of millions of documents to index. Otherwise, SharePoint’s out-of-the-box indexing system will crawl the full text of all your documents (you’ll need to download a free ifilter, as it’s called, to crawl PDF files) perfectly well.

But he goes on to add:

There’s a handful of things missing from the standard search, such as having the number of hits displayed in brackets within the search results page, and there are no thumbnail previews of search results, but nothing that is sufficiently must-have to warrant the added expense or complication of learning yet another Microsoft technology.

We know SharePoint is a complex and beneficial system for content management, but we also know there are gaps in the out-of-the-box search feature. You don’t have to learn a new Microsoft technology or settle for less, though. Consider a third-party solution developed and devoted specifically to search, like Fabasoft Mindbreeze. Its Web Parts-based information pairing capabilities give you powerful searches and a complete picture of your business information, allowing you to get the most out of your enterprise search investments. And your end users will benefit from fast and intuitive search with clearly displayed results and simple navigation.

Creating relevant knowledge means processing data in a comprehensible form and utilizing relations between information objects. Data is sorted according to type and relevance. This is enterprise search for professionals.

Mindbreeze’s intuitiveness also means less training is required. The company offers tutorials and wikis that are easy to use and efficient. Here you can browse Mindbreeze’s support tools for users, including videos, FAQs, wikis, and other training options. Check out the full suite of solutions at Fabasoft Mindbreeze.

Philip West, June 4, 2012

Sponsored by Pandia.com

Forward Search 2.7 Arrives

June 1, 2012

The eagerly anticipated newest version of Forward Search has finally been released. According to “Release of Forward Search 2.7,” extensive testing has been done over the past several weeks to ensure the functionality of new features and improvements. The company is confident that this is the most stable, flexible, and versatile Forward Search version ever.

The highlighted backend features and improvements are as follows:

  • Facet Counted Search: a count for each found result
  • Faster Numerical Range Query: returns hits within a specified range
  • Type-ahead improvement: support for selected fields and sorting by frequency
  • Improved HTML5 support: new filtering options for extended custom fields
  • WebService improvement: a JSON-returning search interface
  • Web Crawler: now supports partial crawl
  • Indexing: complete re-indexing of an index

Some of the new administration client services are:

  • A new Atom-feed news reader relaying news from the Forward Search Partner Portal
  • Added support for the backend features above
  • An improved interface for related control element editing

Forward Search is a Microsoft Partner that offers enterprise search for solutions including content management, intranets, databases, document repositories, and OEM software. The company currently works with over 35 partners in nine countries, helping corporations handle large amounts of unstructured data and create success within their client circles by utilizing content management solutions based on Microsoft technology such as EPiServer, Sitecore, and Umbraco.

Jennifer Shockley, June 1, 2012

Sponsored by PolySpot

Lucid Imagination Previews Solr 4

May 30, 2012

With the first alpha release of Solr 4 promised to us soon, Lucid Imagination posts “Solr 4 Preview: SolrCloud, NoSQL, and More.” Solr 4 is full of features that enhance existing Solr applications. It also paves the way for new applications by further blurring the distinction between full text search and NoSQL.

SolrCloud is the code name for the largest set of features. These promise easy scalability for Solr as well as distributed indexing, alongside NoSQL-flavored elements such as real-time get, optimistic locking, and durable updates.

Solr 4 incorporates Apache’s robust distributed coordination project, ZooKeeper. This tool holds the Solr configuration as well as cluster metadata such as hosts, collections, shards, and replicas. The post describes how distributed coordination works in Solr 4:

“When a new node is brought up, it will automatically be assigned a role such as becoming an additional replica for a shard. A bounced node can do a quick ‘peer sync’ by exchanging updates with its peers in order to bring itself back up to date. New nodes, or those that have been down too long, recover by replicating the whole index of a peer while concurrently buffering any new updates.

“An update can be sent to any node in the cluster, and it’s automatically forwarded to the correct node and immediately replicated to a number of other nodes to enable fault tolerance, high availability, and query scalability. Likewise, queries may be sent to any node in a cluster and they will automatically be routed to the correct nodes and load balanced across replicas.”
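The forwarding behavior in the quote rests on a simple idea: every node maps a document id to the same shard, so it does not matter where an update arrives. A toy sketch (not Solr’s actual code; node and shard names are invented) of that stable-hash routing:

```python
# Toy illustration of hash-based document routing: any node that
# receives an update computes the same target shard for a given id.
import hashlib

SHARDS = ["shard1", "shard2", "shard3"]

def shard_for(doc_id: str) -> str:
    """Map a document id to a shard with a stable hash."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

def route_update(receiving_node: str, doc_id: str) -> str:
    """Whichever node receives the update, the target shard is the same."""
    return shard_for(doc_id)

# Two different entry nodes, one document: identical target shard.
assert route_update("nodeA", "doc42") == route_update("nodeB", "doc42")
print("doc42 ->", shard_for("doc42"))
```

Solr 4 layers replication, buffering, and peer sync on top of this routing, but the stable mapping is what lets “any node in the cluster” accept an update or a query.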

Solr 4 is packed with other new features, like pivot faceting, pseudo-fields, and a spell checker that can work from the main index to name just a few. See the write up for more.

Lucid Imagination is the commercial company for Lucene and its search server Solr. The company crafts robust scalable search solutions that make the most of the open source technology. Lucid prides itself on making open source search accessible and easy to learn for clients worldwide, many of which are industry heavyweights. These search gurus recently moved to new digs in Redwood City, CA.

Cynthia Murrell, May 30, 2012

Sponsored by PolySpot
