More Predictive Silliness: Coding, Decisioning, Baloneying

June 18, 2012

It must be the summer vacation warm and fuzzies. I received another wild analytics news release today. This one comes from 5WPR, “a top 25 PR agency.” Wow. I learned from the spam: PeekAnalytics “delivers enterprise class Twitter analytics and help marketers understand their social consumers.”

What?

Then I read:

By identifying where Twitter users exist elsewhere on the Web, PeekAnalytics offers unparalleled audience metrics from consumer data aggregated not just from Twitter, but from over sixty social sites and every major blog platform.

The notion of algorithms explaining anything is interesting. But the problem with numerical recipes is that those who use the outputs may not know what’s going on under the hood. Widespread knowledge of the specific algorithms, the thresholds built into the system, and the assumptions underlying the selection of a particular method is in short supply.

Analytics is the realm of the one percent of the population trained to understand the strengths and weaknesses of specific mathematical systems and methods. The 99 percent are destined to accept analytics system outputs without knowing how the data were selected, shaped, formed, and presented given the constraints of the inputs. Who cares? Well, obviously not some marketers of predictive analytics, automated indexing, and some trigger trading systems. Too bad for me. I do care.

When I read about analytics and understanding, I shudder. As an old goose, each body shake costs me some feathers, and I don’t have many more to lose at age 67. The reality of fancy math is that those selling its benefits do not understand its limitations.

Consider the notion of using a group of analytic methods to figure out the meaning of a document. Then consider the numerical recipes required to identify a particular document as important from thousands or millions of other documents.

When companies describe the benefits of a mathematical system, the details are lost in the dust. In fact, bringing up a detail results in a wrinkled brow. Consider the Kolmogorov-Smirnov test. Has this nonparametric test been applied to the analytics system which marketers presented to you in the last “death by PowerPoint” session? The response from 99.5 percent of the people in the world is, “Kolmo who?” or “Isn’t Smirnov a vodka?” Bzzzz. Wrong.
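For the curious, a two-sample Kolmogorov-Smirnov test is only a few lines of code. The sketch below uses SciPy with made-up data; it simply asks whether a system’s output distribution matches a reference distribution, which is the sort of sanity check I never see in a vendor deck.

```python
# Minimal illustration of the two-sample Kolmogorov-Smirnov test.
# The data are synthetic; in practice you would compare an analytics
# system's output distribution against a reference distribution.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=1_000)   # assumed baseline
observed = rng.normal(loc=0.3, scale=1.2, size=1_000)    # system output

statistic, p_value = ks_2samp(reference, observed)
print(f"KS statistic: {statistic:.3f}, p-value: {p_value:.4f}")
# A small p-value suggests the two samples were not drawn from the same
# distribution, i.e., the system's output deviates from the baseline.
```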

Mathematical methods which generate probabilities are essential to many business sectors. When one moves fuel rods at a nuclear reactor, the decision about which rod to put where is informed by a range of mathematical methods. Specially trained experts, often with degrees in nuclear engineering plus postgraduate work, handle the fuel rod manipulation. Take it from me. Direct observation is not the optimal way to figure out fuel pool rod distribution. Get the math “wrong” and some pretty exciting events transpire. Monte Carlo anyone? John Gray? Julian Steyn? If these names mean nothing to you, you would not want to sign up for work in a nuclear facility.

Why, then, would a person with zero knowledge of numerical recipes, of the oddball outputs particular types of algorithms produce, and with little or no experience with probability methods use the outputs of a system as “truth”? The outputs of analytical systems require expertise to interpret. Looking at a nifty graphic generated by Spotfire or Palantir is NOT the same as understanding what decisions have been made, what limitations exist within the data display, and what blind spots the particular method or suite of methods generates. (Firms which do focus on explaining their methods, constraints, and considerations to users include Digital Reasoning, Ikanow, and Content Analyst. Others? You are on your own, folks.)

Today I have yet another conference call with 30-somethings who are into analytics. Analytics is the “next big thing.” Just as people assume coding up a Web site is easy, people assume that mathematical methods are now the mental equivalent of clicking a mouse to get a document. Wrong.

The likelihood of misinterpreting the outputs of modern analytic systems is higher than it was when I entered the workforce after graduate school. The reasons include:

  1. A rise in the “something for nothing” approach to information. A few clicks, a phone call, and chit chat with colleagues make many people experts in quite difficult systems and methods. In the mid-1960s, there was limited access to systems which could do clever stuff with tricks from my relative Vladimir Ivanovich Arnold. Today, the majority of the people with whom I interact assume their ability to generate a graph and interpret a scatter diagram equips them as analytic mavens. Math is and will remain hard. Nothing worthwhile comes easy. That truism is not too popular with the 30-somethings who explain the advantages of the analytics products they sell.
  2. Sizzle over content. Most of the wild and crazy decisions I have learned about come from managers who accept analytic system outputs as a page from old Torah scrolls from Yitzchok Riesman’s collection. High ranking government officials want eye candy, so modern analytic systems generate snazzy graphics. Does the government official know what the methods were and the data’s limitations? Nope. Bring this up and the comment is, “Don’t get into the weeds with me, sir.” No problem. I am an old advisor in rural Kentucky.
  3. Entrepreneurs, failing search system vendors, and open source repackagers are painting the bandwagon and polishing the tubas and trombones. The analytics parade is on. From automated and predictive indexing to surfacing nuggets in social media—the music is loud and getting louder. With so many firms jumping on the bandwagon or joining the parade, the reality of analytics is essentially irrelevant.

The bottom line for me is that the social boom is at or near its crest. Marketers—particularly those in content processing and search—are desperate for a hook which will generate revenues. Analytics seems to be as good as any other idea which is converted by azure chip consultants and carpetbaggers into a “real business.”

The problem is that analytics is math. Math is as easy as 1-2-3; math is as complex as MIT’s advanced courses. With each advance in computing power, more fancy math becomes possible. As math advances, the number of folks who can figure out what a method yields decreases. The result is a growing “cloud of unknowing” with regard to analytics. Putting this into a visualization makes the challenge clear.

Stephen E Arnold, June 18, 2012

Findability and Design: How Sizzle Distracts from Understanding

May 9, 2012

I have been watching the Disneyfication of search. A results list is just not exciting unless there are dozens of links, images, videos, and graphs to help me find the answer to my research question. As far as I know, Palantir and several other analytics companies have built their businesses on outputting flashy graphics which I often have a tough time figuring out. My view is that looks are more important than substance in many organizations.

I read “Designers Are Not a Panacea.” I agree with the basic premise of the write up. Here’s a passage I tucked into my reference file:

Rather than granting designers full control over the product, remember that they need to play nice and integrate with several other aspects of your business. You need to remember that you are building a business not a pretty app. A designer co-founder could help (as could a sales co-founder), but does not offer any guarantees that you will make good business decisions, regardless of how “beautiful” an experience your application offers (not to say that adding more engineers does). Visual aesthetics are rarely enough. Getting a product into the hands of potential customers is important.

The write up leaves an important question unanswered: “Why is the pursuit of visual flashiness now so important?”

I have several hypotheses, and I don’t think that some of these have been explored in sufficient detail by the private equity firms pumping money into graphic-centric search and content processing companies. Here goes, and feel free to use the comments section of this blog if you disagree:

First, insecurity. I think that many professionals are not sure of their product or service, not sure of their expertise, and not sure of their own “aura of competence.” Hiding behind visually thrilling graphs distracts the audience to some degree. The behavior of listeners almost guarantees that really basic questions about sample size and statistical recipes used to output the visual will not be asked.

Second, misdirection. I think that humans like to look at pictures and then do the “thinking fast, thinking slow” thing and jump to conclusions for social or psychological reasons. The notion of an in depth discussion is something I have watched get kicked into the gutter in some recent meetings. The intellectual effort required to think about a problem is just not present. A visual makes it easy for the speaker to mislead intentionally and for the listener to be misled.

Third, indifference. In a recent meeting, several presenters put up slides which had zero to do with the topic at hand. The speaker pointed to the visual and made a totally unrelated comment or observation. No one in the audience cared. I don’t think most people were listening. Fiddling with smart phones or playing with iPads has replaced listening and old fashioned note taking. The speaker did not care either. I think the presentation was prepared by some corporate team and the presenter was trying to smile and get through the briefing.

What does design have to do with search? Looking at the “new” interfaces for Google and Microsoft Web search, I noted that neither service was making fundamental changes. In fact, Google seemed to be moving to the old Excite and Yahoo approach with three columns and a bewildering number of hot links. Microsoft, on the other hand, was emulating Google’s interface circa 2006 and 2007.

Visualization systems and methods have made significant contributions to engineering and certain types of mathematics. However, for other fields, visualization has become lipstick designed to distract, obfuscate, or distort information.

In US government briefings, visual sizzle is often more important than the content presented. I have seen the same disturbing trend at analytics and search conferences. Without accountability from colleagues and employers, design is going to convert search and findability into a walk through Disneyland. The walk is fun, but I don’t think an amusement park shares much with the nitty gritty of day to day revenue generation from software and services.

Stephen E Arnold, May 9, 2012

Sponsored by IKANOW

IBM Buys Vivisimo Allegedly for Its Big Data Prowess

April 25, 2012

Big data. Wow. That’s an angle only a public relations person with a degree in 20th century American literature could craft. Vivisimo is many things, but a big data system? News to me for sure.

IBM has been a strong consumer and integrator of open source search solutions. Watson, the game show winner, used Lucene with IBM wrapper software to keep the folks in Jeopardy post production on their toes.


A screen shot of the Vivisimo Velocity system displaying search results for the RAND organization. Notice the folders in the left hand panel. The interface reveals Vivisimo’s roots in traditional search and retrieval. The federating function operates behind the scenes. The newest versions of Velocity permit a user to annotate a search hit so the system will boost it in subsequent queries if the comment is positive. A negative rating on a result suppresses that result.
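Vivisimo has not published Velocity’s scoring internals, so the following is only a toy sketch of how annotation-driven boosting and suppression might work in general; the class, field names, and weights are invented for illustration.

```python
# Toy sketch of annotation-driven result boosting/suppression.
# Not Vivisimo's implementation; names and weights are invented.
from dataclasses import dataclass, field

@dataclass
class Hit:
    doc_id: str
    base_score: float
    annotations: list = field(default_factory=list)  # "positive" or "negative"

def adjusted_score(hit: Hit, boost: float = 0.2, penalty: float = 0.5) -> float:
    score = hit.base_score
    for note in hit.annotations:
        if note == "positive":
            score *= (1.0 + boost)      # boost in subsequent queries
        elif note == "negative":
            score *= (1.0 - penalty)    # suppress the result
    return score

hits = [Hit("doc-1", 0.8, ["positive"]), Hit("doc-2", 0.9, ["negative"])]
for h in sorted(hits, key=adjusted_score, reverse=True):
    print(h.doc_id, round(adjusted_score(h), 3))
```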

I learned that IBM allegedly purchased Vivisimo, a company which I have covered in my various monographs about search and content processing. Forbes ran a story which was at odds with my understanding of what the Vivisimo technology actually does. Here’s the Forbes’ title: “IBM To Buy Vivisimo; Expands Bet On Big Data Analytics.” Notice the phrase “big data analytics.”

Why do I point out the “big data” buzzword? The reasons include:

  • Vivisimo has a clustering method which takes search results and groups them, placing similar results identified by the method in “folders”
  • Vivisimo has a federating method which, like Bright Planet’s and Deep Web Technologies’, takes a user’s query and sends the query to two or more indexing systems, retrieves the results, and displays them to the user
  • Vivisimo has a clever de-duplication method which collapses duplicate hits so the results list presents a single item. This matters when a news story appears on multiple Web sites. (A generic sketch of the idea appears after this list.)
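De-duplication recipes vary by vendor. The sketch below shows one generic approach, fingerprinting normalized text so near-identical copies of a story collapse to a single hit; it is an illustration of the idea, not Vivisimo’s method.

```python
# Generic illustration of result de-duplication by content fingerprinting.
# Not Vivisimo's method; a common, simplified approach is shown instead.
import hashlib
import re

def fingerprint(text: str) -> str:
    # Normalize case, punctuation, and whitespace before hashing so that
    # trivially different copies of the same story map to one fingerprint.
    normalized = re.sub(r"[^a-z0-9 ]", "", text.lower())
    normalized = " ".join(normalized.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def dedupe(results: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for r in results:
        fp = fingerprint(r["body"])
        if fp not in seen:
            seen.add(fp)
            unique.append(r)
    return unique

results = [
    {"url": "siteA.example/story", "body": "IBM to buy Vivisimo."},
    {"url": "siteB.example/story", "body": "IBM to buy  Vivisimo!"},  # duplicate copy
]
print([r["url"] for r in dedupe(results)])  # only one item survives
```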

According to the write up in Forbes, a “real” news outfit:

IBM this morning said it has agreed to acquire Vivisimo, a Pittsburgh-based provider of big data access and analysis tools.

Okay, but in Beyond Search we have documented the trajectory Vivisimo has followed in its sales and marketing efforts since the company opened for business in 2000. In fact, the Wikipedia write up about Vivisimo says this:

Vivisimo is a privately held enterprise search software company in Pittsburgh that develops and sells software products to improve search on the web and in enterprises. The focus of Vivisimo’s research thus far has been the concept of clustering search results based on topic: for example, dividing the results of a search for “cell” into groups like “biology,” “battery,” and “prison.” This process allows users to intuitively narrow their search results to a particular category or browse through related fields of information, and seeks to avoid the “overload” problem of sorting through too many results.


SAS Gets More Visual

March 31, 2012

Inxight (now owned by BusinessObjects, part of the SAP empire) is history at SAS, or almost history. Now the company is moving in a different direction.

Jaikumar Vijayan writes about a new visual analytics application recently unveiled by SAS in his article “SAS Promises Pervasive BI with New Tool.” Einstein is believed to have once said “computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.” We noted this passage from Mr. Vijayan’s write up:

Unlike many purely server-based enterprise analytics technologies, Visual Analytics gives business users a full range of data discovery, data visualization and querying capabilities from desktop and mobile client devices, the company said.

The initial version of the new tool allows iPad users to view reports and download information to their devices. Future versions will support other mobile devices as well, SAS added. The quote is actually a good description of the concept that underlies Visual Analytics. The process uses analytic reasoning to detect specific information in massive amounts of data. For example, a clothing manufacturer might use it to determine current trends in ladies’ fashions. The results are presented in charts and graphs to the users, who can fine-tune the parameters until their specific queries are answered.

SAS is known for its statistical functionality, its programming language, and its need for SAS-savvy cow pokes to ride herd on the bits and bytes. Will SAS be able to react to the trend toward the consumerization of business intelligence?

While the technology is impressive, SAS may be a little late to the game. Palantir and Digital Reasoning have already introduced applications that offer clients powerful visual analytics capabilities. Time will tell if SAS is able to catch up to some competitors’ approaches. We are interested in Digital Reasoning, Ikanow, and Quid.

Stephen E Arnold, March 31, 2012

Sponsored by Pandia.com

Connotate Acquires Fetch Technologies

March 27, 2012

I know, “Who? Bought what?”

Connotate is a data fusion company which uses software bots (agents) to harvest information. Fetch Technologies, founded more than a decade ago, processes structured data. The deal comes on the heels of some executive ballroom dancing. Connotate snagged a new CEO, Keith Cooper, according to New Jersey Tech Week. Fetch also uses agent technology.

Founded in 1999, Fetch Technologies enables organizations to extract, aggregate and use real-time information from Web sites. Fetch’s artificial intelligence-based technology allows precise data extraction from any Web site, including the so-called Deep Web, and transforms that data into a uniform format that can be integrated into any analytics or business intelligence software.

The company’s technology originated at the University of Southern California’s Information Sciences Institute. Fetch’s founders developed the core artificial intelligence algorithms behind the Fetch Agent Platform while they were faculty members in Computer Science at USC. Fetch’s artificial intelligence solutions were further refined through years of research funded by the Defense Advanced Research Projects Agency (DARPA), the National Science Foundation (NSF), the U.S. Air Force, and other U.S. Government agencies.

The Connotate news release said:

“Fetch is very excited to combine our information extraction, integration, and data analytics solution with Connotate’s monitoring, collection and analysis solution,” said Ryan Mullholland, Fetch’s former CEO and now President of Connotate. “Our similar product and business development histories, but differing go-to-market strategies creates an extraordinary opportunity to fast-track the creation of world-class proprietary ‘big data’ collection and management solutions.”

Okay, standard stuff. But here’s the paragraph that caught my attention:

“Big data, social media and cloud-based computing are major drivers of complexity for business operations in the 21st century,” said Keith Cooper, CEO of Connotate. “Connotate and Fetch are the only two companies to apply machine learning to web data extraction and can now take the best of both solutions to create a best-of-breed application that delivers inherent business value and real-time intelligence to companies of all sizes.”

I am not comfortable with the assertion of “only two companies to apply machine learning to Web data extraction.” In our coverage of the business intelligence and text mining market in Inteltrax.com, we have written about many companies which have applied such technologies and generated more market traction. Examples range from Digital Reasoning to Palantir, among others.

The deal is going to deliver on a “unified vision.” That may be true; however, saying and doing are two different tasks. As I write this, unification is the focus of activities from big dogs like Autonomy, now part of Hewlett Packard, to companies which have lower profiles than Connotate or Fetch.

We think that the pressure open source business intelligence and open source search are exerting will increase. With giants like IBM (Cognos, i2 Group, SPSS) and Oracle working to protect their revenues, more mergers like the Connotate-Fetch tie up are inevitable. You can read a July 14, 2010, interview with Xoogler Mike Horowitz of Fetch Technologies at this link.

Will the combined companies rock the agent and data fusion market? We hope so.

Stephen E Arnold, March 27, 2012

Sponsored by Pandia.com

Lexmark: Under Its Own Nose

March 20, 2012

I read “Lexmark Acquires Isys Search Software and Nolij.” (Nolij: knowledge, get it?) In 2008, Hewlett Packard acquired Lexington-based Exstream Software. HP paid $350 million for the company, leaving Lexmark wondering what its arch printing enemy was doing. Now, more than three years later, Lexmark is lurching through acquisitions.

On March 7, 2012, I reported that Lexmark purchased Brainware, a search, eDiscovery, and back office system vendor. Brainware caught my attention because its finding method was based in part on trigram technology. I recall seeing patents on the method which were filed in 1999. I have a special report on Brainware if anyone is interested. Brainware has a rich history. Its technology stretches back to SER Solutions (see US6772164). SER was once part of SER Systems AG. The current owners bought the search technology and generated revenue from its back office capabilities, not the “pure” search technology. However, Brainware’s associative memory technology struck me as interesting because it partially addressed the limitations of trigram indexes. Brainware became part of Lexmark’s Perceptive Software unit.
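For readers who have not run into trigram matching, here is a minimal, generic sketch of character-trigram similarity. It shows only the basic idea; Brainware’s patented associative memory methods go well beyond this.

```python
# Generic character-trigram similarity, the basic idea behind trigram
# matching. An illustration only, not Brainware's patented method.
def trigrams(text: str) -> set[str]:
    padded = f"  {text.lower()}  "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)  # Jaccard overlap of trigram sets

# Trigram matching tolerates typos and OCR noise, one reason it suits
# back office document processing.
print(round(similarity("invoice number", "invoise number"), 2))
```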

Now, a mere two weeks later, Lexmark snags another search and retrieval company. Isys Search was started by Iain Davies in 1988. Mr. Davies was an author and an independent consultant in IBM mainframe fourth generation languages. His vision was to provide an easy-to-use search system. When I visited with him in 2009, I learned that Isys had more than 12,000 licensees worldwide. However, in the US, Isys never got the revenue traction which Autonomy achieved. Even Endeca, which was roughly one-tenth the size of Autonomy, was larger than Isys. The company began licensing its connectors to third parties a couple of years ago, and I did not get too many requests for analyses of the company’s technology. Like Endeca, the system processes content and generates a list of entities and other “facets” which can help a user locate additional information for certain types of queries.

Now Lexmark, which allowed Exstream to go to HP, has purchased two companies with technology which is respectively 24 and 12 years old. I am okay with this approach to obtaining search and retrieval functionality, but I do wonder what Lexmark is going to do to leverage these technologies now that HP has Autonomy and Oracle has Endeca. Microsoft is moving forward with Fast Search and a boat load of third party search solutions from certified Microsoft partners. IBM does the Lucene Watson thing, and every math major from New York to San Francisco is jumping into the big data search and analytics sector.

Here’s a screen shot of the Isys Version 8 interface, which has been updated, I have heard. You can see its principal features. I have an analysis of this system as well.


What will Lexmark do with two search vendors?

Here’s the news release lingo:

“Our recent acquisitions enable Lexmark to offer customers a differentiated, integrated system of solutions that are unique, cost effective, and deliver a rapid return on investment,” said Paul Rooke, Lexmark’s chairman and CEO. “The methodical shift in our focus and investments has strengthened our managed print services offerings and added new content and process technologies, positioning Lexmark as a key solutions provider to businesses large and small.”

Perceptive Software is now in the search and content processing business. However, unlike Exstream, these two companies do not have a repository and cross media publishing capability. I think it is unlikely that Lexmark/Perceptive will be able to shoehorn either of these two systems’ technology into its printers. Printers make money because of ink sales, not because of the next generation technology that some companies think will make smart printers more useful. Neither Brainware nor Isys has technology which meshes with the big data and Hadoop craziness now swirling around.

True, Lexmark can invest in both companies, but the cash required to update code from 1988 and methods from 1999 might stretch the Lexmark pocketbook. Lexmark has been a dog paddler since the financial crisis of 2008.


Source: Google Finance

Here’s the Lane Report’s take on the deal:

Lexmark’s recent acquisitions have advanced its “capture/manage/access” strategy, enabling the company to intelligently capture content from hardcopy and electronic documents through a range of devices including the company’s award-winning smart multifunction products and mobile devices, while also managing and processing content through its enterprise content management and business process management technologies. These technologies, when combined with Lexmark’s managed print services capabilities, give the company the unique ability to help customers save time and money by managing their printing and imaging infrastructure while providing complementary and high value, end-to-end content and process management solutions.

I have a different view:

First, a more fleet footed Lexmark would have snagged the Exstream company. It was close to home, generating revenue, and packaged a solution. Exstream was not a box of Lego blocks. What Perceptive now has is an assembly job, not a product which can go head to head against Hewlett Packard. Maybe Lexmark will find a new market in Oracle installations, but Lexmark is a printer company, not a data management company.

Second, technology is moving quickly. Neither Brainware nor Isys has the components which allow the company to process content and output the type of results one gets from Digital Reasoning or Palantir. Innovative Ikanow is leagues ahead of both Brainware and Isys.

Third, neither Brainware nor Isys is open source centric. Based on my research and our forthcoming information services about open source technology, neither company is in that game. Because growth is exploding in the open source sector, how will Lexmark recover its modest expenditures for these two companies?

I think there may be more lift in the analytics sector than the search sector, but I live in Harrod’s Creek, not the intellectual capital of Kentucky where Lexmark is located.

Worth watching.

Stephen E Arnold, March 20, 2012

Sponsored by Pandia.com

Prediction Data Joins the Fight

January 12, 2012

It seems that prediction data could be joining the fight against terrorism. According to the Social Graph Paper article “Prediction Data As An API in 2012,” some companies are working on developing prediction models that can be applied to terror prevention. The article mentions Palantir: “they emphasize development of prediction models as applied to terror prevention, and consumed by non-technical field analysts.” Recorded Future is another such company, but it relies on “creating a ‘temporal index’, a big data/semantic analysis problem, as a basis to predict future events.” Other companies that have been dabbling in big data and prediction modeling are Sense Networks, Digital Reasoning, BlueKai, and Primal. The author theorizes that “There will be data-domain experts spanning the ability to make sense of unstructured data, aggregate from multiple sources, run prediction models on it, and make it available to various ‘application’ providers.” Using data to predict the future seems a little far-fetched, but the technology is still new and not totally understood. Everyone does need to join the fight against terrorism, but exactly how data prediction fits in remains to be seen.

April Holmes, January 12, 2012

Sponsored by Pandia.com

Predictions on Big Data Miss the Real Big Trend

December 18, 2011

Athena, the goddess of wisdom, does not spend much time in Harrod’s Creek, Kentucky. I don’t think she’s ever visited. However, I know that she is not hanging out at some of the “real journalists’” haunts. I zipped through “Big Data in 2012: Five Predictions”. These are lists which are often assembled over a lunchtime chat or a meeting with quite a few editorial issues on the agenda. At year’s end, the prediction lunch was a popular activity when I worked in New York City, which is different in mental zip from rural Kentucky.

The write up churns through some ideas that are evident when one skims blog posts or looks at the conference programs for “big data.” For example—are you sitting down?—the write up asserts: “Increased understanding of and demand for visualization.” There you go. I don’t know about you, but when I sit in on “intelligence” briefings in the government or business environment, I have been enjoying the sticky tarts of visualization for years. Nah, decades. Now visualization is a trend? Helpful, right?

Let me identify one trend which is, in my opinion, an actual big deal. Navigate to “The Maximal Information Coefficient.” You will see a link and a good summary of a statistical method which allows a person to process “big data” in order to determine if there are gems within. More important, the potential gems pop out of a list of correlations. Why is this important? Without MIC methods, the only way to “know” what may be useful within big data was to run the process. If you remember guys like Kolmogorov, the “we have to do it because it is already as small as it can be” issue is an annoying time consumer. To access the original paper, you will need to go to the AAAS and pay money.

The abstract for “Detecting Novel Associations in Large Data Sets” by David N. Reshef, Yakir A. Reshef, Hilary K. Finucane, Sharon R. Grossman, Gilean McVean, Peter Turnbaugh, Eric S. Lander, Michael Mitzenmacher, and Pardis C. Sabeti, Science, December 16, 2011, is:

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R^2) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.

Stating a very interesting although admittedly complex numerical recipe in a simple way is difficult, but I think this paragraph from “The Maximal Information Coefficient” does a very good job:

The authors [Reshef et al] go on showing that that the MIC (which is based on “gridding” the correlation space at different resolutions, finding the grid partitioning with the largest mutual information at each resolution, normalizing the mutual information values, and choosing the maximum value among all considered resolutions as the MIC) fulfills this requirement, and works well when applied to several real world datasets. There is a MINE Website with more information and code on this algorithm, and a blog entry by Michael Mitzenmacher which might also link to more information on the paper in the future.
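To make the gridding idea concrete, here is a heavily simplified sketch in Python: it computes normalized mutual information on a few fixed equal-width grids and keeps the maximum. The real MIC searches over many grid partitionings and uses the authors’ normalization, so treat this as a conceptual illustration only.

```python
# Heavily simplified illustration of the MIC gridding idea: compute
# normalized mutual information on a few fixed grids and keep the maximum.
# The real MIC searches over many grid partitionings; this is a sketch only.
import numpy as np

def grid_mutual_information(x, y, bins):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    mi = np.sum(p[nz] * np.log2(p[nz] / (px[:, None] * py[None, :])[nz]))
    return mi / np.log2(bins)   # MIC-style normalization for a bins-by-bins grid

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = np.cos(4 * x) + rng.normal(0, 0.1, 500)   # nonlinear association

score = max(grid_mutual_information(x, y, b) for b in (4, 8, 16))
print(round(score, 3))   # high score despite near-zero linear correlation
```

Even this crude version gives a respectable score to the nonlinear relationship, which a simple correlation coefficient would miss.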

Another take on the MIC innovation appears in “Maximal Information Coefficient Teases Out Multiple Vast Data Sets”. Worth reading as well.

Forbes will definitely catch up with this trend in a few years. For now, methods such as MIC point the way to making “big data” a more practical part of decision making. Yep, a trend. Why? There’s a lot of talk about “big data,” but most organizations lack the expertise and the computational know-how to perform meaningful analyses. Similar methods are available from Digital Reasoning and the Google love child Recorded Future. Palantir is more into the make-pictures world of analytics. For me, MIC and related methods are not just a trend; they are the harbinger of processes which make big data useful, not a public relations, marketing, or PowerPoint chunk of baloney. Honk.

Stephen E Arnold, December 18, 2011

Sponsored by Pandia.com, a company located where high school graduates actually can do math.

Search Silver Bullets, Elixirs, and Magic Potions: Thinking about Findability in 2012

November 10, 2011

I feel expansive today (November 9, 2011), generous even. My left eye seems to be working at 70 percent capacity. No babies are screaming in the airport waiting area. In fact, I am sitting in a not too sticky seat, enjoying the announcements about keeping pets in their cage and reporting suspicious packages to law enforcement by dialing 250.

I wonder if the mother who left a pink and white plastic bag with a small bunny and box of animal crackers is evil. Much in today’s society is crazy marketing hype and fear mongering.

Whilst thinking about pets in cages and animal crackers which may be laced with rat poison, and plump, fabric bunnies, my thoughts turned to the notion of instant fixes for horribly broken search and content processing systems.

I think it was the association with the failure of societal systems that determined passengers at the gate would allow a pet to run wild or that a stuffed bunny was a threat. My thoughts jumped to the world of search, its crazy marketing pitches, and the satraps who have promoted themselves to “expert in search.” I wanted to capture these ideas, conforming to the precepts of the About section of this free blog. Did I say “free”?

A happy quack to http://www.alchemywebsite.com/amcl_astronomical_material02.html for this image of the 21st century azure chip consultant, a self appointed expert in search with a degree in English and a minor in home economics with an emphasis on finger sandwiches.

The Silver Bullets, Garlic Balls, and Eyes of Newts

First, let me list the instant fixes, the silver bullets, the magic potions, the faerie dust, and the alchemy which makes “enterprise search” work today. Fasten your alchemist’s robe, lift your chin, and grab your paper cone. I may rain on your magic potion. Here are 14 magic fixes for a lousy search system. Oh, one more caveat. I am not picking on any one company or approach. The key to this essay is the collection of pixie dust, not a single firm’s blend of baloney, owl feathers, and goat horn.

  1. Analytics (The kind of equations some of us wrangled and struggled with in Statistics 101, or the more complex predictive methods which, if you know how to make the numerical recipes work, will get you a job at Palantir, Recorded Future, SAS, or one of the other purveyors of wisdom based on big data number crunching)
  2. Cloud (Most companies in the magic elixir business invoke the cloud. Not even Macbeth’s witches do as good a job with the incantation of Hadoop the Loop as Cloudera, but there are many contenders in this pixie concoction. Amazon comes to mind, but A9 gives me a headache when I use A9 to locate a book for my trusty e-reader.)
  3. Clustering (Which I associate with Clustify and Vivisimo, but Vivisimo has morphed clustering into “information optimization” and gets a happy quack for this leap)
  4. Connectors (One cannot search unless one can acquire content. I like the Palantir approach, which triggered some push back, but I find the morphing of ISYS Search Software a useful touchstone in this potion category)
  5. Discovery systems (My associative thought process offers up Clearwell Systems and Recommind. I like Recommind, however, because it is so similar to Autonomy’s method and it has been the pivot for the company’s flip flop from law firms to enterprise search and back to eDiscovery in the last 12 or 18 months)
  6. Federation (I like the approach of Deep Web Technologies and, for the record, the company does not position its method as a magical solution, but some federating vendors do, so I will mention this concept. Think mash up and data fusion too)
  7. Natural language processing (My candidate for NLP wonder worker is Oracle, which acquired InQuira. InQuira is a success story because it was formed from the components of two antecedent search companies, pitched NLP for customer support, and got acquired by Oracle. Happy stakeholders all.)
  8. Metatagging (Many candidates here. I nominate the Microsoft SharePoint technology as the silver bullet candidate. SharePoint search offers almost flawless implementation of finding a document by virtue of knowing who wrote it, when, and what file type it is. Amazing. A first of sorts because the method has spawned third party solutions from Austria to the United States.)
  9. Open source (Hands down I think about IBM. From Content Analytics to the wild and crazy Watson, IBM has open source tattooed over large expanses of its corporate hide. Free? Did I mention free? Think again. IBM did not hit $100 billion in revenue by giving software away.)
  10. Relationship maps (I have to go with the Inxight Software solution. Not only was the live map an inspiration to every business intelligence and social network analysis vendor, it was also cool to drag objects around. Now Inxight is part of Business Objects, which is part of SAP, an interesting company occupied with reinventing itself while ignoring TREX, a search engine)
  11. Semantics (I have to mention Google as the poster child for making software know what content is about. I stand by my praise of Ramanathan Guha’s programmable search engine and the somewhat complementary work of Dr. Alon Halevy, both happy Googlers as far as I know. Did I mention that Google has oodles of semantic methods, but the focus is on selling ads and Pandas, which are somewhat related?)
  12. Sentiment analysis (the winner in the sentiment analysis sector is up for grabs. In terms of reinventing and repositioning, I want to acknowledge Attensity. But when it comes to making lemonade from lemons, check out Lexalytics (now a unit of Infonics). I like the Newssift case, but that is not included in my free blog posts and information about this modest multi-vehicle accident on the UK information highway is harder and harder to find. Alas.)
  13. Taxonomies (I am a traditionalist, so I quite like the pioneering work of Access Innovations. But firms run by individuals who are not experts in controlled vocabularies, machine assisted indexing, and ANSI compliance have captured the attention of the azure chip, home economics, and self appointed expert crowd. Access Innovations knows its stuff. Some of the boot camp crowd, maybe somewhat less? I read a blog post recently that said librarians are not necessary when one creates an enterprise taxonomy. My, how interesting. When we did the ABI/INFORM and Business Dateline controlled vocabularies we used “real” experts and quite a few librarians with experience conceptualizing, developing, refining, and ensuring logical consistency of our word lists. It worked because even the shadow of the original ABI/INFORM still uses some of our terms 30-plus years later. There are so many taxonomy vendors, I will not attempt to highlight others. Even Microsoft signed on with Cognition Technologies to beef up its methods.)
  14. XML (There are Google and MarkLogic again. XML is now a genuine silver bullet. I thought it was a markup language. Well, not anymore, pal.)


A Coming Dust Up between Oracle and MarkLogic?

November 7, 2011

Is XML the solution to enterprise data management woes? Is XML a better silver bullet than taxonomy management? Will Oracle sit on the sidelines or joust with MarkLogic?

Last week, an outfit named AtomicPR sent me a flurry of news releases. I wrote to a chipper Atomic person, mentioning that I sell coverage and that I thought the three news releases looked a lot like spam to me. No answer, of course.

A couple of years ago, we did some work for MarkLogic, a company focused on Extensible Markup Language or XML. I suppose that means AtomicPR can nuke me with marketing fluff. At age 67, getting nuked is not my idea of fun via email or just by aches and pains.

Since August 2011, MarkLogic has been “messaging” me. The recent 2011 news releases explained that MarkLogic was hooking XML to the buzz word “big data.” I am not exactly sure what “big data” means, but that is neither here nor there.

In September 2011, I learned that MarkLogic had morphed into a search vendor. I was surprised. Maybe amazed is a more appropriate word. See Information Today’s interview with Ken Bado, formerly an Autodesk employee. (Autodesk makes “proven 3D software that accelerates better design.” Autodesk was Carol Bartz’s employer when it was an engineering and architectural design software company. I have a difficult time keeping up with information management firms’ positioning statements. I refer to this as “fancy dancing” or “floundering” even though an azure chip consultant insists I really should use the word “foundering”. I love it when azure chip consultants and self appointed experts input advice to my free blog.)

In a joust between Oracle and MarkLogic, which combatant will be on the wrong end of the pointy stick thing? When marketing goes off the rails, the horse could be killed. Is that one reason senior executives exit the field of battle? Is that one reason veterinarians haunt medieval re-enactments?

Trade Magazine Explains the New MarkLogic

I thought about MarkLogic when I read “MarkLogic Ties Its Database to Hadoop for Big Data Support.” The PCWorld story stated:

MarkLogic 5, which became generally available on Tuesday, includes a Hadoop connector that will allow customers to “aggregate data inside MarkLogic for richer analytics, while maintaining the advantages of MarkLogic indexes for performance and accuracy,” the company said.

A connector is a software widget that allows one system to access the information in another system. I know this is a vastly simplified explanation. Earlier this year, Palantir and i2 Group (now part of IBM) got into an interesting legal squabble over connectors. I believe I made the point in a private briefing that “connectors are a new battleground.” The MarkLogic story in PCWorld indicated that MarkLogic is chummy with Hadoop via connectors. I don’t think MarkLogic codes its own connectors. My recollection is that ISYS Search Software licenses some connectors to MarkLogic, but that deal may have gone south by now. And MarkLogic is a privately held company funded, I believe, by Lehman Brothers, Sequoia Capital, and Tenaya Capital. I am not sure “open source” and these financial wizards are truly harmonized, but again I could be wrong, living in rural Kentucky and wasting my time in retirement writing blog posts.
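Connector internals differ from vendor to vendor, and I have no visibility into MarkLogic’s or ISYS’s code. The sketch below shows only the generic shape of the idea: a small adapter that pulls records from a source system and hands them over in a normalized form. All names are invented for illustration.

```python
# Generic sketch of a connector: pull records from a source system and
# hand them to a target system in a normalized form. Names are invented;
# this is not MarkLogic's or anyone else's actual connector API.
from typing import Iterator, Protocol

class SourceSystem(Protocol):
    def fetch_raw(self) -> Iterator[dict]: ...

class Connector:
    def __init__(self, source: SourceSystem):
        self.source = source

    def records(self) -> Iterator[dict]:
        for raw in self.source.fetch_raw():
            # Normalize whatever the source emits into a common shape.
            yield {"id": raw.get("id"), "body": raw.get("text", ""),
                   "source": raw.get("origin", "unknown")}

class ToyFileSource:
    def fetch_raw(self) -> Iterator[dict]:
        yield {"id": "1", "text": "hello", "origin": "toy"}

for rec in Connector(ToyFileSource()).records():
    print(rec)
```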


