Imagine the Internet without Search Engines

April 17, 2014

Centrifuge Systems proposes an interesting idea in “Big Data Discovery Without Link Analysis Is Like The Web Without Google.” Centrifuge Systems asks readers of the short article to imagine using the Internet without a search engine. How would we locate information? It would be similar to the librarian’s favorite description of the Internet: all the contents of a library spilled on the floor. The article goes on to explain that big data without link analysis works the same way as the Internet without a search engine.

What is link analysis?

“You can view link analysis as a data discovery technique that reveals the structure and content of information by representing it as a set of interconnected objects. When combined with a visual representation, an investigator can quickly gain an understanding of the strength of relationships and the frequency of contacts and immediately discover new associations. Link analysis offers an intuitive alternative to the traditional relational database formats and BI tools without deep technical expertise.”
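To make the quoted definition concrete, here is a minimal Python sketch of link analysis, assuming the data arrives as simple pairs of entities that were in contact. The records, the names, and the helper functions are all hypothetical and invented for illustration; this is not Centrifuge Systems' method or product API. It counts contact frequency per pair (the "strength of relationships") and surfaces entities that are not directly linked but share a neighbor (a crude stand-in for "new associations").

```python
from collections import Counter, defaultdict

# Hypothetical contact records (entity, entity); invented for illustration only.
contacts = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("alice", "bob"),
    ("dave", "carol"),
]

def build_links(records):
    """Count how often each unordered pair of distinct entities is in contact."""
    links = Counter()
    for a, b in records:
        if a != b:  # ignore self-contacts
            links[frozenset((a, b))] += 1
    return links

def build_graph(links):
    """Map each entity to the set of entities it is directly linked to."""
    graph = defaultdict(set)
    for pair in links:
        a, b = tuple(pair)
        graph[a].add(b)
        graph[b].add(a)
    return graph

def indirect_associations(graph, entity):
    """Entities with no direct link to `entity` that share at least one neighbor."""
    direct = set(graph[entity])
    found = set()
    for neighbor in direct:
        found |= graph[neighbor]
    return found - direct - {entity}

links = build_links(contacts)
graph = build_graph(links)

# The strongest relationship is the pair with the highest contact count.
strongest_pair, strongest_count = links.most_common(1)[0]
```

A visual tool would draw `graph` as nodes and edges, with edge thickness taken from `links`. The sketch only shows the underlying bookkeeping.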

It is a convincing analogy. To increase a potential client’s interest, Centrifuge Systems offers a Data Discovery Challenge, where the client is given a free solution. In other terms, it’s a free estimate for services. Big data is full of analytics, but does anyone other than Centrifuge Systems offer rich link analysis?

Whitney Grace, April 17, 2014
Sponsored by, developer of Augmentext

Big Data Buzzword Alert: Thick Data

April 14, 2014

I read “Your Big Data Is Worthless if You Don’t Bring It Into the Real World.” The article points out some often overlooked issues with Big Data. Now that the meaning of the phrase “Big Data” has morphed into a glory phrase, new wordsmithing is needed. This article uses the phrase “thick data.”

The article points out:

To really understand people, we must also understand the aspects of our experience — what anthropologists refer to as thick data. Thick data captures not just facts but the context of facts.

And then notes:

Rather than seeking to understand us simply based on what we do as in the case of big data, thick data seeks to understand us in terms of how we relate to the many different worlds we inhabit. Only by understanding our worlds can anyone really understand “the world” as a whole, which is precisely what companies like Google and Facebook say they want to do.

Will the phrase “thick data” add clarity to the explanations of the analytics frenzy evident in many vendors’ marketing materials? Will search vendors like IBM use the phrase to explain how Watson adds value to information processing?

Interesting semantic shift from “big” to “thick.”

Stephen E Arnold, April 14, 2014

Search and Big Data: Been There, Done That

April 12, 2014

Is the use of search to find information in large collections of content revolutionary? Er, no. What about using search to locate an Internet Protocol address in a repository of monitored email traffic? Er, no.

With the chatter on LinkedIn and the vacuous news releases from some floundering search companies, one would think that gathering up content and running a query was the equivalent of my ancestor stealing an ember and saying, “Look, I invented fire.”


Beyond the rather influential if specious IBM white paper published in 2010 (link is at, a large number of companies continue to position something old as new again.

One interesting twist on the “search is better than SQL” is the useful solution brief from RainStor. In some circles, RainStor has a low profile. In others, the company has caught the attention of some recognized “names” in the Big Data world; for example, Cloudera and Dell. So think Hadoop friendly.

RainStor focuses on cost effective solutions for gathering, archiving, and querying content. Like the old CrossZ technology, RainStor queries the compressed files. There are benefits from this approach. Unlike CrossZ, no proprietary routines have to be run to extract a data cube. The person looking for information can use standard query syntax using SQL, MapReduce, or off the shelf business intelligence tools.

If you are confused by peas-in-a-pod desperate for a cannery with cash, you will want to check out RainStor. The company’s Web site is I would have liked RainStor to publish the numbers of its patents that were granted by the USPTO in 2013. The general description here reminded me of several other firms’ systems and methods.

Stephen E Arnold, April 12, 2014

Guidance to Sidestep Big Data Failure

April 8, 2014

The article titled “The Metrics Missteps to Fix in 2014” on iMedia Connection offers tips to handle the revolution in marketing that has accompanied the big data era. The advice is aimed at marketers unable to keep up with the technology and all it has to offer. Tips such as presenting only necessary information and double-checking graphs for clarity may seem obvious enough, but the article offers this generic-sounding advice with a clear understanding of where marketers are generally failing. The article states,

“As data progresses and becomes more sophisticated, it is obvious there will be tons more to learn about the field. This means that it will be necessary to stay on the cutting edge, do your research, and constantly evaluate the quality of your information and reporting. It will often be a hard task, but those who dream of being extremely effective marketers will need to do it.”

There is both encouragement and insight to be found in this article, such as the tip headed “Think big…sample size” which goes on to explain that the most effective representation will be the largest collection of data. Both commitment and focus are necessary for a company to maintain a successful use of data.

Chelsea Kerwin, April 08, 2014


Meet The Armadillo

April 3, 2014

Armadillos are not native to France, but the Armadillo digital resources management company is. If you are curious to learn more about the French company, peruse the “Company Overview” with a little assistance from Google Translate. Armadillo was founded in 1998 and has since acquired a very long and prestigious client list.

Armadillo’s products offer a range of services that include research and development of information technology, custom data solutions, and packages for various digital content. The products are, of course, advertised as a big data solution and can be customized for any data type, content, and organizational method.

The director describes his products as:

“Armadillo packages are integrated into the information systems of companies and other organizations to facilitate data exchange between former silos. This creates repositories harmonized content easily shared and guaranteed “up to date “. Our solutions have a broad functional coverage with excellent performance for near-zero operating costs. Our technology is based on the latest innovations proposed by the Semantic Web and Big Data.”

It looks like another big data player peddling the usual solutions. However, the company has been around longer than other big data startups, so longevity and reliability are on its side.

Whitney Grace, April 03, 2014

You Want to Be a Real Data Scientist?

April 1, 2014

With $900 million in funding, Cloudera is making an attempt to legitimize data scientists. If you have a degree in statistics from CalTech, that might not be enough to land you a job in the Clouderaverse.

The fix is revealed in “Cloudera Launches Data Scientist Certification.” According to the write up:

Consisting of an essentials exam and data science challenge, the new program helps developers, analysts, statisticians, and engineers get experience with relevant big data tools and techniques and validate their abilities while helping prospective employers identify elite, highly skilled data scientists.

Will Cloudera become the equivalent of the American Bar Association and NCEES? Cloudera challenges me to prove my expertise at the highest level. Okay. Will a doctorate from Cambridge University’s or Moscow State University’s math program do the trick? Would my relative (now deceased) Vladimir Ivanovich Arnold make the cut? (He was a data lackey for Kolmogorov, who could add and subtract pretty well my relative told me.)

In the world of information technology, the ability to make something work or to code up a script that solves a problem are useful skills. Tossing in numerical recipes for Big Data cook outs adds spice.

The problem, in my opinion, is that anyone from former middle school teachers to failed webmasters can say, “I’m a Big Data expert.” The lack of certification in some application spaces is normal. Enterprise search has no certification. Look at the outstanding track record consultants, search procurement teams, and vendors have compiled. Enterprise search delivers solutions that 50 to 75 percent of a system’s users find wanting. Big Data, Cloudera style, wants to avoid the enterprise search train wreck.

But can a cloud centric company become the equivalent of the 1950s beacon, the Good Housekeeping Seal of Approval, just for data wizards? Does the company’s move speak to the needs of Cloudera’s marketing organization or call attention to the abrogation of certification from institutions of higher education? Perhaps Cloudera has concerns about a “hiring gap”? One way to snag candidates is to offer to train them. The best and brightest become data fish in a Big Data barrel.

Stephen E Arnold, April 1, 2014

Fasten Your Seat Belt: A Big Data List with Some Surprises

March 31, 2014

I read “The World’s Top 10 Most Innovative Companies in Big Data.” I am not sure if this Fast Company article is news, content marketing, or analysis for the Silicon Valley set. I found the list of companies that apparently are going to get a chunk of the “$18 billion” market for Big Data surprising. I read the article riding in the back seat of a friend’s SUV. I had my seat belt fastened. The shocks in this article prevented me from jumping upwards and striking my head against the vehicle’s roof. You are now warned.

First, there are some obvious companies on the list, assembled according to one of those undisclosed analyses embraced by “real” journalists and mid tier consulting firms hungry for engagements.

I recognized these big names: GE (General Electric, maker of jet engines and other gear that continues to bring “good things” to one’s life) and IBM (purchaser of companies like Cognos, Cybertap LLC, SPSS, and others in the data arena).

I had heard of Kaggle (for fee information and services) and Splunk, the company now in the gun sights of Elasticsearch, among others, for log file supremacy. I ran across Knewton (education) when I did a feature for Online Search Magazine not long ago.

There were some outfits that I had never heard of. My personal filtering system (Overflight) had little information about these organizations. New to me were Evolv (a personnel outfit), the Weather Company (not global warming type climate data, but the environment and shopping angle), and Ayasdi (a visualization services firm funded by DARPA).

I think the word is “eclectic” for this group.

But the two shocks were Mount Sinai Icahn School of Medicine (allegedly building the hospital of the future) and GNIP (another social media analytics firm).

Several observations:

First, the list raises more questions than it answers. What were the criteria used to determine who was able to make the cut for “most innovative”?

Second, what the heck is “innovation”? I think this word, like search and Big Data itself, is emerging as the go-to buzzword for the first half of 2014.

Third, are these outfits much different from hundreds of other organizations that process available data as a routine business process?

Beyond Search is surprised by the listicle itself and the helter skelter nature of the selection of companies. By the way, I thought IBM was in the game show winning business. Watson, Jeopardy, and revolutionizing health care just like Mount Sinai.

Little wonder folks are confused about Big Data. A dose of Google Flu might be necessary.

Stephen E Arnold, March 31, 2014

Tibco Connects with Popular Big Data Repositories

March 29, 2014

Tibco continues to grow. The Wall Street Journal’s Market Watch reveals, “TIBCO Expands Connectivity to Key Big Data Sources.” Now, users of the company’s Spotfire data analysis platform can connect directly to big data storehouses at Cloudera, Hortonworks, and Pivotal. The press release quotes VP of Spotfire product strategy, Lars Bauerle:

“Our ability to connect directly to these data sources, conduct in-database analysis, and mash-up the data in the worlds of Hadoop and others puts Spotfire in prime position for enterprises looking to get the most out of their data assets. Spotfire now further embraces data access in all forms, including Big Data architecture, enabling our customers to derive significantly greater value from their existing data.”

Cloudera development VP Tim Stevens added:

“Cloudera and TIBCO Big Data technologies complement one another by adding significant value to our joint customers’ IT environments. Until now, analytics and Hadoop have separately been two of the most significant enterprise technologies of the last few years. As these technologies come together in Spotfire, we see an opportunity for organizations to reap great business value as they build out their enterprise data hubs.”

Launched in 1997 and offering a range of infrastructure and business intelligence software, Tibco is based in Palo Alto, California. The company is so sure of the competitive edge granted by its BI software that it has trademarked the phrase “two-second advantage.”

Cynthia Murrell, March 29, 2014


Balance Big Data with Human Cognition

March 25, 2014

An example from the world of professional cricket illustrates the dangers of relying too much on data and too little on human intuition. ReadWrite tells us “How Big Data Fails to Make Big Plays in Sports.” Writer Matt Asay describes what went wrong when England’s data-smitten cricket coach led his team to a striking defeat against Australia earlier this year.

The article recounts a strategy based on analysis of data from previous matches, one that left no wiggle room for shifting factors and certainly no tolerance for hunches. The result—a humiliating 5-0 defeat. Asay goes on to extrapolate the lesson to business decisions. He writes:

“Data complements decisions, but shouldn’t rule them, because data is never truly objective. Choosing which data to collect is a human judgment—so, too, are the questions we ask of it. Still, data need not always be subservient to human intuition. At my company, for example, we recently found through extensive A/B testing that our best guesses as to which email subject lines would be most effective were way off. We therefore calibrated our email campaign to match the data, not our intuition. This is where data comes in handy: measuring one’s intuition for accuracy. But it also serves to inform that same intuition, so that our next ‘best guess’ is more likely to succeed. In the case of England’s cricket team, rather than respond to data, coach Flowers was bowled over by it, sticking to data even when it clearly wasn’t paying off in wins. In sport or business, that’s what we call ‘a losing strategy.’”
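The A/B testing Asay mentions can be sketched in a few lines. The Python below is a rough illustration, assuming two email subject lines are compared by open rate with a pooled two-proportion z-test. The article does not say which test Asay’s company used, and the function name and the counts in the usage lines are hypothetical, made up only to show the arithmetic.

```python
import math

def two_proportion_z(opens_a, sent_a, opens_b, sent_b):
    """Pooled two-proportion z-test for the difference in open rates (B minus A).

    Returns (z, two_sided_p_value) using the normal approximation.
    """
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: subject line A opened 200 of 1,000 times, B 260 of 1,000.
z, p_value = two_proportion_z(200, 1000, 260, 1000)
```

A small p-value suggests the gap in open rates is unlikely to be chance alone, which is the kind of check that can overrule a confident “best guess” about subject lines.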

Technology is great and all, but it is important to remember that it has its limitations. We are still a very long way from building a machine that can compare with the human brain, Watson notwithstanding. Heck, we don’t even fully understand that magnificent apparatus we’re born with. As big data just keeps getting bigger, the adage “trust your instincts” deserves to be reiterated.

Cynthia Murrell, March 25, 2014


Quote to Note: Big Data from a Xoogler Angle

March 24, 2014

Tucked deep into the Wall Street Journal’s stunning analysis of Big Data was a gem. Here’s the quote, allegedly made by Zest Finance’s big data dog, Douglas Merrill. This Xoogler is quoted in “Big Data Weird Data” as stating:

Machine learning isn’t replacing people.

Interesting. Many professionals at the Dubai intelligence conference in March 2014 were asserting that whizzy new systems worked without having to depend on humans.

Read the entire article in the Wall Street Journal, March 24, 2014, page R 5. It’s online but you may have to pay. Humans can be expensive when asked to do work like report on the underbelly of Big Data.

One question: I thought Google had figured out the automatic thinking thing. Is a disenchanted Googler not getting with the smart data program?

Stephen E Arnold, March 24, 2014
