Inteltrax: Top Stories, October 31 to November 4

November 7, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, its impact on businesses and nations around the globe.

A good overview of this topic was our article, “Businesses Prepare for Analytic Bandwagon” http://inteltrax.com/?p=2674 which showed proof that businesses across all industries and sizes are latching onto the power of big data analytics to improve their bottom lines.

More specifically, we saw its impact on a tiny nation in the story, “New Zealand Stepping onto World BI Stage,” http://inteltrax.com/?p=2687 which showed that country’s passion for big data through companies like Right Hemisphere and ComOps.

We issued a firm warning to any business trying to get something for nothing in “Freemium BI Software Not the Total Answer to Analytic Woes,” http://inteltrax.com/?p=2694 which cautioned that free BI tools are no match for an investment in proven analytic tools.

This is a wide swath of analytic focus, but each story is well worth the attention. Whether analytics puts a small country on the tech map, offers companies chances to get more competitive, or tempts budgets with worthless freebies, IntelTrax is watching the pulse of the industry to keep readers informed.

Follow the Inteltrax news stream by visiting www.inteltrax.com

Patrick Roland, Editor, Inteltrax

The Perils of Searching in a Hurry

November 1, 2011

I read the Computerworld story “How Google Was Tripped Up by a Bad Search.” I assume that it is pretty close to events as the “real” reporter summarized them.

Let me say that I am not too concerned about the fact that Google was caught in a search trip wire. I am concerned with a larger issue, and one that is quite important as search becomes indexing, facets, knowledge, prediction, and apps. The case reported by Computerworld applies to much of “finding” information today.

Legal matters are rich with examples of big outfits fumbling a procedure or making an error under the pressure of litigation or even contemplating litigation. The Computerworld story describes an email which may be interpreted as having a bright LED to shine on the Java in Android matter. I found this sentence fascinating:

Lindholm’s computer saved nine drafts of the email while he was writing it, Google explained in court filings. Only to the last draft did he add the words “Attorney Work Product,” and only on the version that was sent did he fill out the “to” field, with the names of Rubin and Google in-house attorney Ben Lee.

Ah, the issue of versioning. How many content management experts have ignored this issue in the enterprise? When search systems index, does one want every version indexed or just the “real” version? Oh, what is the “real” version? A person has to investigate and then make a decision. Software and azure chip consultants, governance and content management experts, and busy MBAs and contractors are often too busy to perform this work. Grunt work, I believe, is how some may describe it.

What I am considering is the confluence of people who assume “search” works, the lack of time Outlook and iCalendar “priority one” people face, and the reluctance to sit down and work through documents in a thorough manner. This is part of the “problem” with search, and software is not going to resolve the problem quickly, if ever.

Source: http://www.clipartguide.com/_pages/0511-1010-0617-4419.html

What struck me is how people in a hurry, assumptions about search, and legal procedures underscore a number of problems in findability. But the key paragraph in the write up, in my opinion, was:

It’s unclear exactly how the email drafts slipped through the net, and Google and two of its law firms did not reply to requests for comment. In a court filing, Google’s lawyers said their “electronic scanning tools” — which basically perform a search function — failed to catch the documents before they were produced, because the “to” field was blank and Lindholm hadn’t yet added the words “attorney work product.” But documents produced for opposing counsel should normally be reviewed by a person before they go out the door, said Caitlin Murphy, a senior product manager at AccessData, which makes e-discovery tools, and a former attorney herself. It’s a time-consuming process, she said, but it was “a big mistake” for the email to have slipped through.

What did I think when I read this?

First, all the baloney (yep, the right word, folks) about search, facets, metadata, indexing, clustering, governance, and analytics underscores something I have been saying for a long, long time. Search is not working as lots of people assume it does. You can substitute “eDiscovery,” “text mining,” or “metatagging” for search. The statement holds water for each.

The algorithms will work within limits, but the problem with search has to do with language. Software, no matter how sophisticated, gets fooled by missing data elements, versions, and words themselves. It is high time that the people yapping about how wonderful automated systems are stop and ask themselves this question, “Do I want to go to jail because I assumed a search or content processing system was working?” I know my answer.
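To make the point concrete, here is a minimal sketch of how a field-and-phrase filter of the kind described in the court filing can miss earlier drafts, and how a crude text-similarity check could surface them for a human to review. The messages, the addresses, the `looks_privileged` rule, and the 0.6 threshold are all assumptions made up for illustration; none of this is taken from Google's tools or from any e-discovery product.

```python
from difflib import SequenceMatcher

# Hypothetical mailbox: two autosaved drafts, the sent version, and an unrelated note.
messages = [
    {"id": "draft-1", "to": [],
     "body": "Team, here is my summary of the licensing options."},
    {"id": "draft-2", "to": [],
     "body": "Team, here is my summary of the licensing options and the open questions."},
    {"id": "sent", "to": ["counsel@example.com"],
     "body": "Attorney Work Product. Team, here is my summary of the "
             "licensing options and the open questions."},
    {"id": "lunch", "to": ["team@example.com"],
     "body": "Lunch is at noon on Friday in the main cafeteria."},
]

ATTORNEYS = {"counsel@example.com"}  # assumed list of in-house lawyers
MARKER = "attorney work product"     # phrase the naive filter keys on


def looks_privileged(msg):
    """Naive rule: an attorney recipient or the marker phrase in the body."""
    has_attorney = any(addr in ATTORNEYS for addr in msg["to"])
    return has_attorney or MARKER in msg["body"].lower()


def similar(a, b, threshold=0.6):
    """Character-level similarity; the threshold is an arbitrary assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


flagged = [m for m in messages if looks_privileged(m)]
missed = [m for m in messages if not looks_privileged(m)]

# Drafts the filter let through that closely resemble a flagged message.
needs_review = [
    m["id"] for m in missed
    if any(similar(m["body"], f["body"]) for f in flagged)
]

print("flagged by filter:", [m["id"] for m in flagged])
print("send to a human:  ", needs_review)
```

The filter alone catches only the sent version because the drafts have a blank “to” field and no marker phrase. The similarity pass does not decide anything; it only produces a shorter pile for a person to read.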

Second, in the Computerworld write up, the user’s system dutifully saved multiple versions of the document. Okay, SharePoint lovers, here’s a question for you: Does your search system make clear which antecedent version is which and which document is the best and final version? We know from the Computerworld write up that the Google system did not make this distinction. My point is that the nifty sounding yap about how “findable” a document is remains mostly baloney. Azure chip consultants and investment banks can convince themselves and the widows from whom money is derived that a new search system works wonderfully. I think the version issue makes clear that most search and content processing systems still have problems with multiple instances of documents. Don’t believe me? Go look for the drafts of your last PowerPoint. Now to whom did you email a copy? From whom did you get inputs? Which set of slides was the one on the laptop you used for the briefing? What is the “correct” version of the presentation? If you cannot answer the question, how will software?

Inteltrax: Top Stories, October 24 to October 28

October 31, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the economic challenges that are realized and overcome thanks to the use of big data and analytics.

The best example of this situation that we found came from our story, “BI’s a Part of Germany’s Strong Economy,” http://inteltrax.com/?p=2647 which showcased the fascinating trend of how one of the few thriving European economies is directly tied to business intelligence and data analytics.

The story, “Analytic Jobs a Possible Economic Solution,” http://inteltrax.com/?p=2652 discussed how analytic work has been steady while other industries dry up. Could data analysis be the fix to sluggish economies?

Another economic staple, the FICO credit score, was examined in the story, “Pushing 60, FICO Adjusts to Analytics.” http://inteltrax.com/?p=2655 Here, we discovered how the credit giant takes massive amounts of personal data to streamline its analytic system.

No matter how you slice it, economics is a hot topic these days. We were pleased to discover a positive side to this talk when paired with analytics. We are optimistic about this union in the future and will continue giving it our attention at IntelTrax.

Follow the Inteltrax news stream by visiting http://www.inteltrax.com/

Patrick Roland, Editor, Inteltrax.

October 31, 2011

Datameer Creates Analytics Platform for Hadoop

October 31, 2011

Software development company Datameer has come up with another Hadoop business intelligence play to keep pace with the compounded 40 percent per year growth rate in the volume of corporate data being produced and consumed, with the lion’s share of the growth in unstructured data.

There are current technical challenges that need to be addressed. Hadoop is pushing out costly analytic databases and warehouses and, in its push forward, has given us yet another crazy acronym: ADBMS. Now Hadoop vendors are keeping the Big Data market in a state of churn.

In the Datameer blog write up “Why I Am at Datameer,” Brian Smith discusses a potential solution to this issue. He asserted:

Datameer is the first BI/Analytics platform built natively on Hadoop. On the surface it sounds interesting, but in practice the solution is game-changing. The Datameer Analytic Solution (DAS) connects business users directly with the entire volume and variety of their raw Hadoop data and makes it available for comprehensive analysis.

While Smith’s assertions are certainly interesting, we are not sure who is “first” in many of the assertions about the Big Data world. IBM is chugging away. Digital Reasoning is a player. There are, in fact, dozens of companies making claims and counter-claims. Perhaps in a dicey economy, marketing takes precedence over cold, hard facts?

Jasmine Ashton, October 31, 2011

Sponsored by Pandia.com

Big Data for Big Thinkers

October 31, 2011

“Big data analytics” is an emerging term in the storage industry. It originated within the open source community’s effort to develop analytics processes that were faster and more scalable than traditional data warehousing.

Open source advocates hope to extract value from the vast amounts of unstructured data produced daily by web users. I recently read an interesting Karmasphere write up called “Big Data IS Different— I Knew It!” in which Rich Guth mused about his past year spent at Karmasphere. In that period, he came to the opinion that Big Data requires different analytic techniques than traditional business intelligence products provide. Guth asserted:

Today we announced version 1.5 of our Karmasphere Analyst product, a workspace for performing Big Data Analytics. It implements a new workflow for data analysts to mine and analyze Big Data.  We also released a whitepaper “Deriving Intelligence from Big Data in Hadoop – A Big Data Analytics Primer” that describes this workflow, discusses why this workflow is necessary and compares it to traditional BI and data warehousing approaches.

The challenge is to make clear exactly what “old methods” will not work and which “new methods” will work. As important, how does a person using a system with new Big Data methods determine if the outputs are accurate? Who wants to make a decision only to find out that the underlying set up of the new methods was off the mark? Most business intelligence professionals don’t know when an old and well worn method is delivering accurate outputs. Toss in a snappy graphic and the disconnect may become significant.

Jasmine Ashton, October 31, 2011

Software and Smart Content

October 30, 2011

I was moving data from Point A to Point B yesterday, filtering junk that has marginal value. I scanned a news story from a Web site which covers information technology with a Canadian perspective. The story was “IBM, Yahoo turn to Montreal’s NStein to Test Search Tool.” In 2006, IBM was a pace-setter in search development cost control. The company was relying on the open source community’s Lucene technology, not the wild and crazy innovations from Almaden and other IBM research facilities. Web Fountain and jazzy XML methods were promising ways to make dumb content smart, but IBM needed a way to deliver the bread-and-butter findability at a sustainable, acceptable cost. The result was OmniFind. I had made a note to myself that we tested the Yahoo OmniFind edition when it became available and noted:

Installation was fine on the IBM server. Indexing seemed sluggish. Basic search functions generated a laundry list of documents. Ho hum.

Maybe this comment was unfair, but five years ago, there were arguably better search and retrieval systems. I was in the midst of the third edition of the Enterprise Search Report, long since bastardized by the azure chip crowd and the “real” experts. But we had a test corpus, lots of hardware, and an interest in seeing for ourselves how tough it was to get an enterprise search system up and running. Our impression was that most people would slam in the system, skip the fancy stuff, and move on to more interesting things such as playing Foosball.

Thanks to Adobe for making software that creates a need for Photoshop training. Source: http://www.practical-photoshop.com/PS2/pages/assign.html

Smart, Intelligent… Information?

In this blast from the past article, NStein’s product in 2006 was “an intelligent content management product used by media companies such as Time Magazine and the BBC, and a text mining tool called NServer.” The idea was to use search plus a value adding system to improve the enterprise user’s search experience.

Now the use of the word “intelligent” to describe a content processing system has a long history, reaching back through the decades to computer aided logistics and forward to the Extensible Markup Language methods.

The idea of “intelligent” is a pregnant one, with a gestation period measured in decades.

Flash forward to the present. IBM markets OmniFind and a range of products which provide basic search as a utility function. NStein is a unit of OpenText, and it has been absorbed into a conglomerate with a number of search systems. The investment needed to update, enhance, and extend BASIS, BRS Search, NStein, and the other systems OpenText “sells” is a big number. “Intelligent content” has not been an OpenText buzzword for a couple of years.

The torch has been passed to conference organizers and a company called Thoora, which “combines aggregation, curation, and search for personalized news streams.” You can get some basic information in the TechCrunch article “Thoora Releases Intelligent Content Discovery Engine to the Public.”

In two separate teleconference calls last week (October 24 to 28, 2011), “intelligent content” came up. In one call, the firm was explaining that traditional indexing systems missed important nuances. By processing a wide range of content and querying a proprietary index of the content, the information derived from the content would be more findable. When a document was accessed, the content was “intelligent”; that is, the document contained value added indexing.

The second call focused on the importance of analytics. The content processing system would ingest a wide range of unstructured data, identify items of interest such as the name of a company, and use advanced analytics to make relationships and other important facets of the content visible. The documents were decomposed into components, and each of the components was “smart”. Again the idea is that the fact or component of information was related to the original document and to the processed corpus of information.

No problem.

Shift in Search

We are witnessing another one of those abrupt shifts in enterprise search. Here’s my working hypothesis. (If you harbor a life long love of marketing baloney, quit reading because I am gunning for this pressure point.)

First, let’s face it. Enterprise search is just not revving the engines of the people in information technology or the chief financial officer’s office. Money pumped into search typically generates a large number of user complaints, security issues, and cost spikes. As content volume goes up, so do costs. The enterprise is not Google-land, and money is limited. The content is quite complex, and who wants to try and crack 1990s technology against the nut of 21st century data flows? Not I. So something hotter is needed.

Second, the hottest trends in “search” have nothing to do with search whatsoever. Examples include conflating the interface with precision and recall. Sorry. Does not compute for me. The other angle is “mobile.” Sure, search will work when everything is monitored and “smart” software provides what a statistically appropriate method suggests will work “most” of the time. There is also the baloney about apps, which is little more than the gamification of what in many cases might better be served with a system that makes the user confront actual data, not an abstraction of data. What this means is that people are looking for a way to provide information access without having to grunt around in the messy innards of editorial policies, precision, recall, and other tasks that are intellectually rigorous in a way that Angry Birds interfaces for business intelligence are not.
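For readers who have not had to grunt around in those innards, precision and recall are simple to state even if they are tedious to measure. Here is a minimal sketch with made-up document IDs: precision is the share of returned documents that are relevant, and recall is the share of relevant documents that were returned. Neither number says anything about the interface.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the system returns four documents; five are actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5", "d6", "d7"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

The hard part is not the arithmetic. Someone has to decide which documents are relevant for each test query, and that is the editorial grunt work.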

Third, companies engaged in content access are struggling for revenue. Sure, the best of the search vendors have been purchased by larger technology companies. These acquisitions guarantee three things.

  1. The Wild West spirit of the innovative content processing vendors is essentially going to be stamped out. Creativity will be herded into the corporate killing pens, and the “team” will be rendered as meat products for a technology McDonald’s.
  2. The cash sink holes that search vendors’ research programs were will be filled with procedure manuals and forms. There is no money for blue sky problem solving to crack the tough problems in information retrieval at a Fortune 1000 company. Cash can be better spent on things that may actually generate a return. After all, if the search vendors were so smart, why did most companies hit revenue ceilings and have to turn to acquisitions to generate growth? For firms unable to grow revenues, some just fiddled the books. Others had to get injections of cash like a senior citizen in the last six months of life in a care facility. So acquired companies are not likely to be hot beds of innovation.
  3. The pricing mechanisms which search vendors have so cleverly hidden, obfuscated, and complexified will be tossed out the window. When a technology is a utility, then giant corporations will incorporate some of the technology in other products to make a sale.

What we have, therefore, is a search marketplace where the most visible and arguably successful companies have been acquired. The companies still in the marketplace now have to market like the Dickens and figure out how to cope with free open source solutions and giant acquirers who will just give away search technology.

Inteltrax: Top Stories, October 17 to October 21

October 24, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, the ups and downs for some of the industry’s biggest names.

Those in the know about cloud computing were surprised to see our story, “Amazon Analytics Experiences Setbacks,” http://inteltrax.com/?p=2591 since the book and cloud giant’s analytics offerings aren’t taking off like its Kindle.

On the upswing, we offered “Jaspersoft Climbing the BI Competition Ladder” http://inteltrax.com/?p=2595 detailing how one of our favorite BI vendors has made some bold moves pay off recently.

Back on the negative side of the spectrum, “Google Analytics Gets Weaker in Germany” http://inteltrax.com/?p=2588 showed how tough data mining laws are keeping the search king from knowing too much about Germany’s users.

This is just a taste of the news we deliver. There’s never any telling from day-to-day when a major player will suffer a blow and when a little guy will climb higher. Sometimes vice versa. So we watch the big data game like a hawk, showing all sides of the story to give readers a full view of the roller coaster ride.

Follow the Inteltrax news stream by visiting http://www.inteltrax.com/

Patrick Roland, Editor, Inteltrax.

October 24, 2011

Make Case-Based Approximate Reasoning a Reality

October 23, 2011

I stumbled across an interesting book on Amazon.com that has received a great deal of attention over the past few years. The book is called Case-Based Approximate Reasoning (CBR) by Eyke Hullermeier.

CBR has established itself as a core methodology in the field of artificial intelligence. The key idea of CBR is to tackle new problems by referring to similar problems that have already been solved in the past. One reviewer wrote:

In the last years developments were very successful that have been based on the general concept of case-based reasoning. … will get a lot of attention and for a good while will be the reference for many applications and further research. … the book can be used as an excellent guideline for the implementation of problem-solving programs, but also for courses in Artificial and Computational Intelligence. Everybody who is involved in research, development and teaching in Artificial Intelligence will get something out of it.

The problem with CBR can be the time, effort, and cost required to create and maintain the rules. Automated systems work well if the inputs do not change. Flip in some human unpredictability and the CBR system can require babysitting.
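The key idea noted above, tackling a new problem by referring to similar problems already solved, can be sketched as a nearest-neighbor lookup over stored cases. This is a toy illustration and not the method from the book; the two-number feature vectors, the case base, and the solutions are hypothetical.

```python
from math import dist

# Hypothetical case base: (problem features, solution that worked last time).
case_base = [
    ((1.0, 0.0), "restart the indexer"),
    ((0.0, 1.0), "rebuild the access control list"),
    ((0.9, 0.8), "add a crawler node"),
]


def retrieve(problem, cases):
    """Return the stored case whose problem is closest to the new one."""
    return min(cases, key=lambda case: dist(case[0], problem))


def solve(problem, cases):
    """Reuse the nearest case's solution and retain the new case."""
    _, solution = retrieve(problem, cases)
    cases.append((problem, solution))  # retained without any validation
    return solution


print(solve((0.8, 0.1), case_base))  # restart the indexer
```

Note the `append` line. Every retained case that nobody checks can be reused for the next problem, which is where the babysitting cost comes from.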

Jasmine Ashton, October 23, 2011

Baseball Embraces SAS Analytics

October 20, 2011

Baseball as an institution is known for its love of numbers. Now it’s embracing analytics. KDNuggets reports more in “Pittsburgh Pirates tap SAS Analytics.”

The article explains the use of statistics and analytics:

As ‘Moneyball’ has become a valued statistical approach to selecting talent, teams such as the Pittsburgh Pirates are also embracing analytics to improve operations and marketing and build stronger relationships with fans. Using SAS Visual Data Discovery, the Pirates surface a treasure trove of fan insights. The point-and-click interface gives quick entry to advanced analytics from SAS, the leader in business analytics.

The Pirates had previously used Microsoft Excel, but it’s widely known that the application of such flat data is challenging.  SAS will now allow the club to analyze everything from attendance to marketing to statistics.  Now to get back to that business of actually winning some games . . .

Keep in mind that SAS now has the Teragram text processing technology. You can put words with your numbers.

Emily Rae Aldridge, October 20, 2011

Inteltrax: Top Stories, October 10 to October 14

October 17, 2011

Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, how analytic technology depends so heavily on funding and what those dollars signify.

Our feature story this week, “Palantir Back From the Grave,” http://inteltrax.com/?p=2775 details how one BI company suffered some near-fatal blows, but has bounced back with new software and confidence, thanks to some new funding.

Another funding-centric tale was our story, “Opera and Xignite Make Waves by Raising Millions” http://inteltrax.com/?p=2573 that showed two smaller companies on the rise thanks to some big time investments.

We turned the tables with “Actuate Analytics Contest Gets Attention” http://inteltrax.com/?p=2541 to show how one company is supporting the next generation of analytic thinkers by offering their financial support.

Money makes the big data globe spin; it’s no secret. But funding carries a lot of meaning in this industry, and usually it’s a sign of impending success. We’ll see if that theory holds true, as we follow these and other stories in the ever-expanding world of data analytics.

Follow the Inteltrax news stream by visiting http://www.inteltrax.com/

Patrick Roland, Editor, Inteltrax.
