Research and Development Innovation: A New Study from a Search Vendor

August 3, 2012

I received a message from LinkedIn about a news item called “What Are the Keys to Innovation in R&D?” I followed the links and learned that the “study” was sponsored by Coveo, a search vendor based in Canada. You can access similar information about the study by navigating to the blog post “New Study: The Keys to Innovation for R&D Organizations – Their Own, Unused Knowledge.” (You will also want to reference the news release about the study. It is on the Coveo News and Events page.)

Engineers need access to the drawings and the data behind the component or subsystem manufactured by their employer. Text-based search systems cannot handle this type of specialized data without some additional work or the use of third-party systems. A happy quack to PRLog: http://www.prlog.org/10416296-mechanical-design-drawing-services.jpg

The main point of the study, as I interpret it, is marketing Coveo as a tool to facilitate knowledge management. Even though I write a monthly column for the print and online publication KMWorld, I do not have a definition of knowledge management with which I am comfortable. The years I spent at Booz, Allen & Hamilton taught me that management is darned tough to define. Management as a practice is even more difficult to do well. Managing research and development is one of the more difficult tasks a CEO must handle. Not even Google has an answer. Google is now buying companies to have a future, not inventing its future with existing staff.

The unhappy state of many search and content processing companies is evidence that those with technological expertise may not be able to generate consistent and growing revenues. Innovation in search has become a matter of jazzing up interfaces and turning up the marketing volume. The $10 billion paid for Autonomy, the top dog in the search and content processing space, triggered grousing by Hewlett Packard’s top executives. Disappointing revenues may have contributed to the departure of some high profile Autonomy Corporation executives. Not even the HP way can make traditional search technology pay off as expected, hoped, and needed. Search vendors are having a tough time growing fast enough to stay ahead of spiking technical and support costs.

When I studied for a year at the Jesuit-run Duquesne University, I encountered Dr. Frances J. Chivers. The venerable PhD was an expert in epistemology with a deep appreciation for the lively St. Augustine and the comedian Johann Gottlieb Fichte. I was indexing medieval Latin sermons. I had to take “required” courses in “knowledge.” In the mid 1960s, there were not too many computer science departments in the text indexing game, so I assume that Duquesne’s administrators believed that sticking me in the epistemology track would improve the performance of my mainframe indexing software. Well, let me tell you: Knowledge is a tough nut to crack.

Now you can appreciate my consternation when the two words are juxtaposed and used by search vendors to sell indexing. Dr. Chivers did not have a clue about what I was doing and why. I tried to avoid getting involved in discussions that referenced existentialism, hermeneutics, and related subjects. Hey, I liked the indexing thing and the grant money. To this day, I avoid talking about knowledge.

Selected Findings

Back to the study. Coveo reports:

We recently polled R&D teams about how they use and share innovation across offices and departments, and the challenges they face in doing so. Because R&D is a primary creator and consumer of knowledge, these organizations should be a model for how to utilize and share it. However, as we’ve seen in the demand for our intelligent indexing technology, and as revealed in the study, we found that R&D teams are more apt to duplicate work, lose knowledge and operate in siloed, “tribal” environments where information isn’t shared and experts can’t be found. This creates a huge opportunity for those who get it right—to out-innovate and out-perform their competition.

The question I raised to myself was, “How were the responses from Twitter verified as coming from qualified respondents?” And, “How many engineers with professional licenses, versus individuals who, like Yahoo’s former president, arbitrarily awarded themselves a particular certification, were in the study?” Also, “What statistical tests were applied to the results to validate that the data met textbook-recommended margins of error?”
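The textbook arithmetic behind that last question is easy to illustrate. Here is a minimal sketch of the standard margin-of-error formula for a polled proportion; the sample sizes are invented for illustration, since the study does not disclose its methodology:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Textbook margin of error for a polled proportion.

    n: number of respondents; p: observed proportion (0.5 is the
    worst case); z: z-score for the confidence level (1.96 ~ 95%).
    """
    return z * math.sqrt(p * (1 - p) / n)

# 100 respondents yields roughly a +/- 9.8 point margin, before any
# question of whether the respondents were qualified at all.
print(round(margin_of_error(100) * 100, 1))  # -> 9.8
```

Note that quadrupling the sample only halves the margin, which is why small self-selected polls deserve skepticism.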

I may have the answers to these questions in the source documents. I have written about “number shaping” at some of the firms with which I have worked, and I have addressed the issue more directly in my opt-in, personal news service Honk. (Honk, a free weekly newsletter, is a no-holds-barred look at one hot topic in search and content processing. Those with a propensity to high blood pressure should not subscribe.)

Read more

Epic Analysis

August 2, 2012

A unique application of text analytics may hint at a future for English majors, at last! Phys.org News informs us, “Physicists Study the Classics for Hidden Truths.” Scholars at Coventry University analyzed the Iliad, Beowulf, and the Irish epic Táin Bó Cuailnge. They found that, in all three mythological works, character interactions mirror those found in today’s social networks.

The write up describes the study’s methodology:

“The researchers created a database for each of the three stories and mapped out the characters’ interactions. There were 74 characters identified in Beowulf, 404 in the Táin and 716 in the Iliad.

“Each character was assigned a number, or degree, based on how popular they were, or how many links they had to other characters. The researchers then measured how these degrees were distributed throughout the whole network.

“The types of relationships that existed between the characters were also analysed using two specific criteria: friendliness and hostility.

“Friendly links were made if characters were related, spoke to each other, spoke about one another or it is otherwise clear that they know each other amicably. Hostile links were made if two characters met in a conflict, or when a character clearly displayed animosity against somebody they know.”

These interaction maps paralleled those found in real-life networks. On the other hand, the same analysis of four fictional tales, Les Misérables, Richard III, The Fellowship of the Ring, and Harry Potter, turned up clear differences from real-life interactions. (See the article for more details on these differences.)
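The degree-counting step the researchers describe is straightforward to sketch. The following is an illustration only; the character list and link labels are invented stand-ins, not the study's actual data:

```python
from collections import Counter

# Each interaction links two characters and is labeled "friendly" or
# "hostile", per the study's two criteria. Entries here are invented.
interactions = [
    ("Beowulf", "Hrothgar", "friendly"),
    ("Beowulf", "Wiglaf", "friendly"),
    ("Beowulf", "Grendel", "hostile"),
    ("Hrothgar", "Wealhtheow", "friendly"),
    ("Grendel", "Hrothgar", "hostile"),
]

# A character's degree: how many links it has to other characters.
degree = Counter()
for a, b, _kind in interactions:
    degree[a] += 1
    degree[b] += 1

# The degree distribution: how many characters have each degree value.
# Comparing this distribution to those of real social networks is the
# heart of the study's method.
distribution = Counter(degree.values())
print(dict(degree))
print(dict(distribution))
```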

Interesting: the classical epics are more true-to-life than fiction. This is not to say that everything in them can be taken as fact, of course; no one insists Beowulf slew a real dragon, for example. However, the study does suggest that as the craft of story writing was refined, it moved away from realistic portrayal of societies and the ways folks related to each other. Why would that be? Ask an English major.

Cynthia Murrell, August 2, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Quote to Note: Manage the Decline

July 23, 2012

I snipped a quotation from the Wall Street Journal, dead tree edition, this morning (July 23, 2012). On page B-6 the “Idol Auditions New Judges” write up included this gem:

American Idol is a juggernaut franchise that still has many seasons left, but once a program starts to fall from its perch, you are working to minimize the decline, said Kris Magel, director of national broadcast at Initiative, a media buying firm…

I highlighted the phrase which I think is a keeper. I want to use this idea to characterize a number of search and content processing vendors’ actions in the closing months of 2012. With the shift to open source technologies beginning to gain momentum, many information retrieval companies, regardless of the spin in their marketing collateral, are likely to be working to maintain revenues. Growth may be tough. With funds in short supply for some firms, the white knight notion of an acquisition to get talented people (an acq-hire) may be galloping into the sunset. Trigger words for me now include predictive anything (analytics, tagging, coding, what have you), customer support or customer relationship management, and big data. Words do not equate with revenue in the tough months ahead.

Love that phrase, “minimize the decline.”

Stephen E Arnold, July 23, 2012

Sponsored by Ikanow

The TREC 2011 Results and Predictive Whatevers

July 20, 2012

Law.com reports, in “Technology-Assisted Review Boosted in TREC 2011 Results,” that technology-assisted review may be capable of ousting predictive coding from its title. TREC Legal Track is an annual government-sponsored project (2012 was canceled) to examine document review methods. The 2011 TREC results favored technology-assisted review, but it may have a way to go:

“As such, ‘There is still plenty of room for improvement in the efficiency and effectiveness of technology-assisted review efforts, and, in particular, the accuracy of intra-review recall estimation tools, so as to support a reasonable decision that ‘enough is enough’ and to declare the review complete. Commensurate with improvements in review efficiency and effectiveness is the need for improved external evaluation methodologies,’ the report states.”
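The “intra-review recall estimation” the report mentions can be illustrated with back-of-the-envelope arithmetic. This sketch assumes a simple random sample of the unreviewed pile; the figures are invented for illustration and do not come from the TREC report:

```python
def estimate_recall(responsive_found, discard_size, sample_size, sample_hits):
    """Estimate review recall by sampling the unreviewed (discard) pile.

    responsive_found: responsive documents located by the review
    discard_size: documents set aside without review
    sample_size / sample_hits: a random sample drawn from the discard
    pile, and how many of those sampled documents proved responsive.
    """
    # Project the sample's responsive rate onto the whole discard pile.
    missed = discard_size * (sample_hits / sample_size)
    return responsive_found / (responsive_found + missed)

# 9,000 responsive found; 50,000 discarded; a 500-document sample of the
# discards turns up 10 responsive ones. Projected misses: 1,000, so the
# estimated recall is 9,000 / 10,000 = 0.9.
print(estimate_recall(9000, 50000, 500, 10))  # -> 0.9
```

The hard part, as the report notes, is the accuracy of that projection: a small or unlucky sample can make “enough is enough” look defensible when it is not.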

The 2011 TREC asked participants to test three document review requests, but unlike past years, the rules were more specific, requiring participants to rank documents by responsiveness as well as identify the most responsive ones. The extra requirement meant that researchers were able to test hypothetical situations, but there were some downsides:

“TREC 2011 had its share of controversy. ‘Some participants may have conducted an all-out effort to achieve the best possible results, while others may have conducted experiments to illuminate selected aspects of document review technology. … Efficacy must be interpreted in light of effort,’ the report authors wrote. They noted that six teams devoted 10 or fewer hours for document review during individual rounds, two took 20 hours, one used 48 hours, and one, Recommind, invested 150 hours in one round and 500 in another.”

We noticed this passage in the write up as well:

“`It is inappropriate –- and forbidden by the TREC participation agreement –- to claim that the results presented here show that one participant’s system or approach is generally better than another’s. It is also inappropriate to compare the results of TREC 2011 with the results of past TREC Legal Track exercises, as the test conditions as well as the particular techniques and tools employed by the participating teams are not directly comparable. One TREC 2011 Legal Track participant was barred from future participation in TREC for advertising such invalid comparisons,’ the report states.”

TREC is sensitive to participants who use the data for commercial purposes. We wonder which vendor allegedly stepped over the end line. We also wonder if TREC is breaking out of the slump into which traditional indexing seems to have relaxed. Is “predictive” the future of search? We are not sure about the TREC results. We do have an opinion, however. Predictive works in certain situations. For others, there are other, more reliable tools. We also believe that there is a role for humans, particularly when the risks of an algorithm going crazy exist. A goof in placing an ad on a Web page is one thing. An error predicting more significant events? Well, we are more cautious. Marketers are afoot. We prefer the more pragmatic approach of outfits like Ikanow, and we avoid the high fliers whom we will not name.

Stephen E Arnold, July 20, 2012

Sponsored by Polyspot

 

Text Analysis and Text Mining Are Powerful Tools

July 19, 2012

Text analysis and text mining are services that many data analytics firms offer their clients. AME Info has the latest news on how “SAS to Add High Performance Text Mining to Its Powerful In-Memory Analytics Software in Q3 2012.” SAS is one of the leading big data analytics companies, and soon it will add High-Performance Analytics to its Teradata and EMC Greenplum platforms to perform even more complex big data analytics. The new text-mining technology will deliver insights into unstructured data, from emails to social media, more quickly and efficiently.
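To make “insights into unstructured data” concrete, here is a deliberately minimal sketch of text mining's first step, surfacing frequent terms from a pile of messages. It is illustrative only and bears no relation to SAS's actual technology; the stop word list and sample emails are invented:

```python
import re
from collections import Counter

# A tiny stop word list; production systems use far larger ones.
STOP = {"the", "a", "an", "is", "it", "to", "of", "and", "in", "we", "our"}

def top_terms(texts, n=3):
    """Return the n most frequent non-stop-word terms across texts."""
    words = re.findall(r"[a-z']+", " ".join(texts).lower())
    counts = Counter(w for w in words if w not in STOP)
    return counts.most_common(n)

emails = [
    "The shipment is late and the customer is unhappy.",
    "Another late shipment; the customer wants a refund.",
]
print(top_terms(emails))  # "shipment", "late", "customer" dominate
```

Even this crude pass hints at the signal ("late shipments are angering customers") buried in unstructured text; real systems layer entity extraction, sentiment, and clustering on top.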

SAS is proud of the advancement:

” ‘High-performance analytics is the most significant SAS technology advance over the last 10 years,’ said Jim Goodnight, CEO, SAS. ‘We realized that organizations were accumulating massive amounts of data that could provide answers to questions they could never ask before. The analysis took so long to process, answers were irrelevant by the time the computer spit them out. High-performance analytics provides answers when the information is still useful and leaves time to explore multiple possibilities.’ “

Text analysis is a basic service and many companies are trying to find ways to make their services stand out in the crowd. We suggest that you look at the next generation text analysis vendors; for example, Ikanow.

Whitney Grace, July 19, 2012

Sponsored by Ikanow

Now Business Intelligence Is Dead

July 18, 2012

I received a “news item” from Information Enterprise Software, an HTML email distributed by InformationWeek Software. The story was labeled “Commentary.” I did not think that “real” journalists engaged in “commentary.” Isn’t there “real” news out there to “cover” or “make”?

Read the article. Navigate to “If BI Is Dead, What’s Next?” The “commentary” is hooked to an azure chip consultant report called “BI Is Dead! Long Live BI” which costs a modest $250. You can buy this document from Constellation Research here. First, let’s look at the summary of the report and then consider the commentary. I want to wrap up with some blunt talk about analytic baloney which is winging through the air.

Here’s the abstract so get your credit card ready:

We [Constellation Research] suggest a dozen best practices needed to move Business Intelligence (BI) software products into the next decade. While five “elephants” occupy the lion’s share of the market, the real innovation in BI appears to be coming from smaller companies. What is missing from BI today is the ability for business analysts to create their own models in an expressive way. Spreadsheet tools exposed this deficiency in BI a long time ago, but their inherent weakness in data quality, governance and collaboration make them a poor candidate to fill this need. BI is well-positioned to add these features, but must first shed its reliance on fixed-schema data warehouses and read-only reporting modes. Instead, it must provide businesspeople with the tools to quickly and fully develop their models for decision-making.

I like the animal metaphors. I must admit I thought more in terms of baloney, but that’s just an addled goose’s reaction to “real” journalism.

The point is that business intelligence (I really dislike the BI acronym) can do a heck of a lot more. So what’s dead? Excel? Nah. Business intelligence? Nah. A clean break with the past which involved SAS, SPSS, and Cognos type systems? Nah.

Information about point and click business intelligence should be delivered in this type of vehicle. A happy quack to the marketing wizard at Oscar Mayer for the image at http://brentbrown98.hubpages.com/hub/12-of-the-Worst-Sports-Logos-Ever

So what?

Answer: Actually not a darned thing. What this report has going for it is a shocking headline. Sigh.

Now to the “commentary.” Look, a pay-to-play report is okay. The report is a joint work of InformationWeek and Constellation Research. Yep, IDC is one of the outfits involved in the study. The “commentary” is pretty much a commercial. Is this “real” journalism? Nah, it is a reaction to a lousy market for consulting studies and an attempt to breathe controversy into a well-known practice area.

Here’s the passage I noted:

We all saw the hand wringing in recent years over BI not living up to its promise, with adoption rates below 20% or even 10% of potential users at many enterprises. But that’s “probably the right level” given the limitations of legacy BI tools, says Raden. I couldn’t agree more, and I’ve previously called for better ease of use, ease of deployment, affordability, and ease of administration. What’s largely missing from the BI landscape, says Raden, is the ability for business users to create their own data models. Modeling is a common practice, used to do what-if simulation and scenario planning. Pricing models, for instance, are used to predict sales and profits if X low-margin product is eliminated in hopes of retaining customers with products A, B, and C.

So what we are learning is that business intelligence systems have to become easier to use. I find this type of dumbing down a little disturbing. Nothing can get a person into more business trouble faster than fiddling around with numbers and not understanding what the implications of a decision are. Whether it is the fancy footwork of a Peregrine or just the crazy US government data about unemployment, a failure to be numerically literate can have big consequences.

Read more

Inteltrax: Top Stories, July 16 to July 20

July 16, 2012

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, some breaking news in the industry.

Our story: “Data Mining and Other Issues on Slate at 2012 Joint Statistical Meetings” showed that analytics is rightly on statistic experts’ radar.

“Mike Miller Joins Digital Reasoning as VP of Sales” provided a glimpse into the wisest hiring minds in the business.

“Florida Community Benefits Medically and Financially from Analytics” gave a glimpse at the immediate impact analytics is making on the community level.

News crops up in all areas of analytics, so it’s helpful to have stories wrangled up that might slip through the cracks. We’re here every day, monitoring just such stories so you don’t have to.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax.

July 16, 2012

Inteltrax: Top Stories, July 2 to July 6

July 9, 2012

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, some of the more interesting niches in the industry.

America’s pastime takes center stage in our story, “Baseball and Analytics Hit a Home Run,” which showcases the data explosion in the sport.

Wall Street, too, is a big center of analytic thought and our story, “Analytic Financial Trends” unlocks some of the big moves happening.

Finally, Washington has long been a supporter of big data and our story, “Data Mining to Play Role in 2012 Election” shows that Obama, Romney and other offices are using this technology to their advantage.

Analytics is invading our world, often in the most unexpected places. This is just a small sampling of the deep research we provide every day.

Follow the Inteltrax news stream by visiting www.inteltrax.com

 

Patrick Roland, Editor, Inteltrax.

July 9, 2012

Protected: The Federal Government Turns to Litigation Software

July 5, 2012

This content is password protected.

Google and Latent Semantic Indexing: The KnowledgeGraph Play

June 26, 2012

One thing that is always constant is Google changing itself. Not too long ago Google introduced yet another new tool: Knowledge Graph. Business2Community spoke highly about how this new application proves the concept of latent semantic indexing in “Keyword Density is Dead… Enter ‘Thing Density.’” Google’s claim to fame is providing the most relevant search results based on a user’s keywords. Every time the company updates its algorithm, it is to keep relevancy up. The new Knowledge Graph allows users to break down their search by clustering related Web sites and finding what LSI exists between the results. From there the search conducts a secondary search, and so on. Google does this to reflect the natural use of human language, i.e., making its products user friendly.

But this change raises an important question:

“What does it mean for me!? Well first and foremost keyword density is dead, I like to consider the new term to be “Concept Density” or to coin Google’s title to this new development “Thing Density.” Which thankfully my High School English teachers would be happy about. They always told us to not use the same term over and over again but to switch it up throughout our papers. Which is a natural and proper style of writing, and we now know this is how Google is approaching it as well.”

The change means good content and SEO will be rewarded. This does not change the fact, of course, that Google will probably change its algorithm again in a couple of months, but now the company is recognizing that LSI has value. Most IVPs that provide latent semantic indexing, content, and text analytics, such as Content Analyst, have gone way beyond what Google offers, using the latest LSI trends to make data more findable and discover new correlations.
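For readers curious about what latent semantic indexing actually does, here is a toy sketch: a truncated SVD over an invented term-document matrix. It is illustrative only, and the vocabulary and counts are made up; Google's and Content Analyst's systems are far more elaborate. The point is that related terms land close together in "concept" space even when keyword matching would miss the connection:

```python
import numpy as np

# Toy term-document matrix: rows are terms, columns are documents,
# cells are raw term counts. Docs 0-1 are about vehicles, doc 2 is
# about cooking. All values here are invented.
terms = ["car", "auto", "engine", "recipe", "flour"]
docs = np.array([
    [2, 1, 0],  # car
    [1, 2, 0],  # auto
    [1, 1, 0],  # engine
    [0, 0, 2],  # recipe
    [0, 0, 1],  # flour
], dtype=float)

# Truncated SVD projects terms into a low-rank "concept" space.
U, S, Vt = np.linalg.svd(docs, full_matrices=False)
k = 2  # keep the top two concepts
term_vecs = U[:, :k] * S[:k]

def cosine(a, b):
    """Cosine similarity between two concept-space vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(term_vecs[0], term_vecs[1]))  # car vs. auto: high
print(cosine(term_vecs[0], term_vecs[3]))  # car vs. recipe: near zero
```

“Car” and “auto” never appear with identical keyword profiles, yet the reduced space treats them as nearly the same concept, which is the behavior “thing density” trades on.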

Whitney Grace, June 26, 2012

Sponsored by Content Analyst
