Information Manipulation: Accountability Pipe Dream

July 5, 2014

I read an article with what I think is the original title: “What does the Facebook Experiment Teach us? Growing Anxiety About Data Manipulation.” I noted that the title presented on Techmeme was “We Need to Hold All Companies Accountable, Not Just Facebook, for How They Manipulate People.” In my view, this mismatch of titles is a great illustration of information manipulation. I doubt that the writer of the improved headline is aware of the irony.

Information manipulation reaches far beyond Facebook twirling the dials of its often breathless users. Navigate to Google and run this query:

cloud word processing

Note anything interesting in the results list displayed for me on my desktop computer:


The number one ad is for Google. In the first page of results, Google’s cloud word processing system is listed three more times. I did not spot Microsoft Office in the cloud except in item eight: “Is Google Docs Making Microsoft Word Redundant.”

For most Google search users, the results are objective. No distortion evident.

Here’s what Yandex displays for the same query:


No Google word processing and no Microsoft word processing whether in the cloud or elsewhere.

When it comes to searching for information, the notion that a Web indexing outfit is displaying objective results is silly. The Web indexing companies are in the forefront of distorting information and manipulating users.

Flash back to the first year of the Bush administration when Richard Cheney was vice president. I was in a meeting where the request was considered to make sure that the vice president’s office Web site would appear in hits in a prominent position. This, gentle reader, is a request that calls for hit boosting. The idea is to write a script or configure the indexing plumbing to make darned sure a specific URL or series of documents appears when and where it is required. No problem, of course. We created a stored query for the Fast Search & Transfer search system and delivered what the vice president wanted.
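For readers who have not seen hit boosting up close, here is a minimal sketch of the idea in Python. It is an illustration only, not the stored query we built and not the Fast Search & Transfer API. The boost table, the boost_hits function, and the example.gov address are all hypothetical names made up for this sketch.

```python
# Hypothetical sketch of hit boosting. This is NOT the Fast Search & Transfer
# API; every name and URL below is an assumption made for illustration.

# Assumed boost table: trigger phrase -> URLs forced to the top of the list.
BOOSTS = {
    "vice president": ["https://example.gov/vp/"],
}


def boost_hits(query, ranked_urls, boosts=BOOSTS):
    """Return ranked_urls with any boosted URLs for this query moved to the top."""
    q = query.lower()
    pinned = []
    for trigger, urls in boosts.items():
        if trigger in q:
            for url in urls:
                if url not in pinned:
                    pinned.append(url)
    # Everything that is not pinned keeps the order the engine gave it.
    rest = [u for u in ranked_urls if u not in pinned]
    return pinned + rest


organic = [
    "https://example.com/news",
    "https://example.gov/vp/",
    "https://example.org/bio",
]
print(boost_hits("Vice President schedule", organic))
```

The user sees one tidy results list. Nothing in it signals which entries earned their spot and which were placed there by the boost table.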

This type of results manipulation is more common than most people accept. Fiddling Web search, like shaping the flow of content on a particular semantic vector, is trivial. Search engine optimization is a fools’ game compared with the tried and true methods of weighting or just buying real estate on a search results page or on a Web site from a “real” company.

The notion that disinformation, reformation, and misinformation will be identifiable, rectified, and used to hold companies accountable is not just impossible. The notion itself reveals how little awareness there is of how the actual methods of digital content injection work.

How much of the content on Facebook, Twitter, and other widely used social networks is generated by intelligence professionals, public relations “professionals,” and folks who want to be perceived as intellectual luminaries? Whatever your answer, what data do you have to back up your number? At a recent intelligence conference in Dubai, one specialist estimated that half of the traffic on social networks is shaped or generated by law enforcement and intelligence entities. Do you believe that? Probably not. So good for you.

Amusing, but as someone once told me, “Ignorance is bliss.” So, hello, happy idealists. The job is identifying, interpreting, and filtering. Tough, time consuming work. Most of the experts prefer to follow the path of least resistance and express shock that Facebook would toy with its users. Be outraged. Call for action. Invent an algorithm to detect information manipulation. Let me know how that works out when you look for a restaurant and it is not findable from your mobile device.

Stephen E Arnold, July 5, 2014

Elasticsearch: Bulldozing Content Processing

June 7, 2014

When I left the intelligence conference in Prague, there were a number of companies in my graphic about open source search. When I got off the airplane, I edited my slide. Looks to me as if Elasticsearch has just bulldozed the commercialized open source group in the search and content sector. I would not want to be the CEO of LucidWorks, Ikanow, or any other open sourcey search and content processing company this weekend.

I read “Elasticsearch Scores $70 Million to Help Sites Crunch Tons of Data Fast.” Forget the fact that Elasticsearch is built on Lucene and some home grown code. Ignore the grammar in “data fast.” Skip over the sports analogy “scores.” Dismiss the somewhat narrow definition of what Elasticsearch ELK can really deliver.

What’s important is the $70 million committed to Elasticsearch. Added to the $30 or $40 million the outfit had obtained before, we are looking at a $100 million bet on an open source search based business. Compare this to the trifling $40 million the proprietary vendor Coveo had gathered or the $30 million put on LucidWorks to get into the derby.

I have been pointing out that Elasticsearch has demonstrated that it had several advantages over its open source competitors; namely, developers, developers, and developers.

Now I want to point out that it has another angle of attack: money, money, and money.

With the silliness of the search and content processing vendors’ marketing over the last two years, I think we have the emergence of a centralizing company.

No, it’s not HP’s new cloudy Autonomy. No, it’s not the wonky Watson game and recipe code from IBM. No, it’s not the Google Search Appliance, although I do love the little yellow boxes.

I will be telling those who attend my lectures to go with Elasticsearch. That’s where the developers and the money are.

Stephen E Arnold, June 7, 2014

Watson: The Most Gifted Digital Chef Using Butternut Squash

May 30, 2014

Does silicon have taste buds? Do algorithms sniff the essence of Kentucky barbecue?

I read a darned amazing article called “I Tasted BBQ Sauce Made By IBM’s Watson, And Loved It.” The write up reports that IBM and partner Co.Design used the open source, home grown code, and massive database to whip up a recipe for grilling. IBM is going whole hog with the billion dollar baby Watson, which is supposed to be one of IBM’s revenue fountains any day now.

According to the write up, which may or may not have the ingredients of a “real” news story:

Most BBQ sauces start with ingredients like vinegar, tomatoes, or even water, but IBM’s stands out from the get go. Ingredient one: White wine. Ingredient two: Butternut squash. The list contains more Eastern influences, such as rice vinegar, dates, cilantro, tamarind (a sour fruit you may know best from Pad Thai), cardamom (a floral seed integral to South Asian cuisine) and turmeric (the yellow powder that stained the skull-laden sets of True Detective) alongside American BBQ sauce mainstays molasses, garlic, and mustard.

And most important for the grillin’ fans in Harrod’s Creek, the author used the Watson concoction on tofu. I am not sure that the folks in Harrod’s Creek know what tofu is. I do know that the idea of creating a barbecue sauce without bourbon in it is a culinary faux pas. Splash tamarind on a couple of dead squirrels parked above the coals, and the friends of Daniel Boone may skin the offender and think about grillin’ something larger than a squirrel.

The author who is scoring the tofu and broccoli treat reports:

I test it again and again. Finally I just slather my plate in the stuff. It’s delicious–the best way I can describe it is as a Thai mustard sauce, or maybe the middle point between a BBQ sauce and a curry. Does that sound gross? I assure you that it isn’t…But as I mop my plate of the last drips of Bengali Butternut BBQ Sauce, contemplating the difference between a future in which computers addict us to the next Lean Cuisine and one where they attempt to eradicate us with Terminators, Napoleon’s old adage comes to mind: An army marches on its stomach. He–or that–who controls our stomachs controls it all.

Yes. From game show win to a tofu topping, IBM Watson is redefining search, corporate strategy, and the vocabulary of cuisine for tofu and broccoli lovers. Kentucky freshly killed and skinned grilled squirrel may not benefit.

Anyone who suggests that vendors of information retrieval technology have lost their keen marketing edge is not in touch with butternut squash and reality. Should the digital chefs put Kentucky bourbon in Bengali Butternut BBQ Sauce? Myron Mixon, the winningest man in barbecue, may say, “That’s what I am talkin’ for my whole hog.” Could IBM sponsor the barbecue cook off program? Mr. Mixon may be a lover of tamarind and tofu too.

Stephen E Arnold, May 30, 2014

Watson on the Move: Cognea

May 20, 2014

I wanted to associate Cognos with Cognea. Two different things. IBM’s Watson unit, according to “IBM Watson Acquires Artificial Intelligence Startup Cognea,” is beefing up its artificial intelligence capabilities. Facebook, Google, and other outfits are embracing the dreams of artificial intelligence like it is 1981 when Marvin Weinberger was giving talks about AI’s revolutionizing information processing. I have lost track of Marvin, although I recall his impassioned polemics, 30 years after hearing him lecture. Unfortunately I remain skeptical about “artificial intelligence” because Watson, as I understood the pitch after Jeopardy, was already super smart. I suppose Cognea can add some marketing credibility to Watson. That system is curing disease and performing wonders for the insurance industry, if I embrace the IBM public relations’ flow.

In my lectures about the Big O problem, I point out that many of today’s smartest systems (for example, Search2, to name one) implement clever methods to make well known numerical recipes run like a teenager who just gulped three cans of Jolt Cola followed by a Red Bull energy drink.

The reality is that there are more sophisticated mathematical tools available. The problem is that the systems available cannot exploit these algorithmic methods. I am pretty confident that Cognea tells a great story. I am even more confident that IBM will do the “Vivisimo” thing with whatever technology Cognea actually has. Without a concrete demo, benchmarks, and independent evaluations, I will remain skeptical about “a cognitive computing and conversational artificial intelligence platform.”

I am far more interested in the Cybertap technology that IBM acquired and seems to be keeping under wraps. Cybertap works. Artificial intelligence, well, it depends on how one defines “artificial” and “intelligence” doesn’t it?

Stephen E Arnold, May 20, 2014

Trifles in Enterprise Search History

May 6, 2014

Search conferences are, in my experience, context free. The history of enterprise search is interesting and contains useful examples pertaining to findability. Stephen E Arnold’s new video is “Trifles from Enterprise Search History.” The eight minute video reviews developments from the late 1970s and early 1980s. These mini snapshots provide information about where some of the hottest concepts today originated. Do you think MarkLogic invented an XML data management system that could do search and analytics? The correct answer may be Titan Search. What about “inventing” an open source search business model? Do you think Lucid Imagination, now Lucid Works, cooked up the concept of challenging proprietary systems with community created software? The correct answer may be Fulcrum Technologies’ early concoction of home brew code with the WAIS server. What about the invention of jargon that permeates discussions of content processing? A good example is a “parametric cube.” Is this the conjuring of Spotfire and Palantir? Verity is, in Mr. Arnold’s view, the undisputed leader in this type of lingo in its attempts to sell search without using the word “search.” Grab some SkinnyPop and check out Trifles.

Kenneth Toth, May 6, 2014

SAS Text Miner Gets An Upgrade

May 5, 2014

SAS is a well-recognized player in the IT game as a purveyor of data, security, and analytics software. In modern terms, SAS is a big player in big data, and we caught word that the company has updated its Text Miner to beef up its offerings. SAS Text Miner is advertised as a way for users to harness information not only in legacy data, but also in Web sites, databases, and other text sources. The process can be used to discover new ideas and improve decision-making.

SAS Text Miner offers a variety of benefits that make it different from the standard open source download. Not only do users receive the license and tech support, but Text Miner offers the ability to process and analyze knowledge in minutes, an interactive user interface, and predictive and data mining modeling techniques. The GUI is what will draw in developers:

“Interactive GUIs make it easy to identify relevance, modify algorithms, document assignments and group materials into meaningful aggregations. So you can guide machine-learning results with human insights. Extend text mining efforts beyond basic start-and-stop lists using custom entities and term trend discovery to refine automatically generated rules.”

Being able to modify proprietary software is a deal maker these days. With multiple options for text mining software, being able to make it unique is what will sell it.

Whitney Grace, May 05, 2014
Sponsored by, developer of Augmentext

Meme Attention Deficit

April 27, 2014

I read “Algorithm Distinguishes Memes from Ordinary Information.” The article reports that algorithms can pick out memes. A “meme”, according to Google, is “an element of a culture or system of behavior that may be considered to be passed from one individual to another by nongenetic means, especially imitation.” The passage that caught my attention is:

Having found the most important memes, Kuhn and co studied how they have evolved in the last hundred years or so. They say most seem to rise and fall in popularity very quickly. “As new scientific paradigms emerge, the old ones seem to quickly lose their appeal, and only a few memes manage to top the rankings over extended periods of time,” they say.

The factoid that reminded me how far smart software has yet to travel is:

To test whether these phrases are indeed interesting topics in physics, Kuhn and co asked a number of experts to pick out those that were interesting. The only ones they did not choose were: 12. Rashba, 14. ‘strange nonchaotic’ and 15. ‘in NbSe3’. Kuhn and co also checked Wikipedia, finding that about 40 per cent of these words and phrases have their own corresponding entries. Together this provides compelling evidence that the new method is indeed finding interesting and important ideas.

Systems produce outputs that are not yet spot on. I concluded that scientists, like marketers, like whizzy new phrases and ideas. Jargon, it seems, is an important part of specialist life.

Stephen E Arnold, April 27, 2014

Small Analytics Firms Reaping the Benefit of Investment Cycle

April 23, 2014

Small time analytics isn’t really as startup-y as people may think anymore. These companies are in high demand and are pulling in some serious cash. We discovered just how much and how serious from a recent Cambridge Science Park article, “Cambridge Text Analytics Linguamatics Hits $10m in Sales.”

According to the story:

Linguamatics’ sales showed strong growth and exceeded ten million dollars in 2013, it was announced today – outperforming the company’s targeted growth and expected sales figures. The increased sales came from a boost in new customers and increased software licenses to existing customers in the pharmaceutical and healthcare sectors. This included 130 per cent growth in healthcare sales plus increased sales in professional services.

This earning potential has clearly grabbed the attention of investors. This is feeding a cycle of growth, which is why the Linguamaticses of the world can rake in impressive numbers. Just the other day, for example, Tech Circle reported on a microscopic Mumbai big data company that landed $3m in investments. They say it takes money to make money and right now, the world of big data analytics has that cycle down pat. It won’t last forever, but it’s fun to watch while it does.

Patrick Roland, April 23, 2014

Sponsored by, developer of Augmentext

Digging for Data Gold

April 1, 2014

Tech Radar has an article that suggests an idea we have never heard before: “How Text Mining Can Help Your Business Dig Gold.” Be mindful that was a sarcastic comment. It is already common knowledge that text mining is an advantageous tool to learn about customers, products, new innovations, market trends, and other patterns. One of big data’s main scopes is capturing that information from an organization’s data. The article explains how much data is created in a single minute from text with some interesting facts (2.46 million Facebook posts, wow!).

It suggests understanding the type of knowledge you wish to capture and finding software with a user-friendly dashboard. It ends on this note:

“In summary, you need to listen to what the world is trying to tell you, and the premier technology for doing so is “text mining.” But, you can lean on others to help you use this daunting technology to extract the right conversations and meanings for you.”

The entire article is an overview of what text mining can do and how it is beneficial. It does not go further than basic explanations or explain how to mine the gold in the data mine. That will require further reading. We suggest a follow up article that explains how text mining can also lead to fool’s gold.

Whitney Grace, April 01, 2014
Sponsored by, developer of Augmentext

Twenty Electric Text Analytics Platforms

March 11, 2014

Butler Analytics collected a list of “20+ Text Analytics Platforms” that delves into the variety of text analytics platforms available and what their capabilities are. According to the list, text analytics has not reached its full maturity yet. There are three main divisions in the area: natural language processing, text mining, and machine learning. Each is distinct and each company has its own approach to using these processes:

“Some suppliers have applied text analytics to very specific business problems, usually centering on customer data and sentiment analysis. This is an evolving field and the next few years should see significant progress. Other suppliers provide NLP based technologies so that documents can be categorized and meaning extracted from them. Text mining platforms are a more recent phenomenon and provide a mechanism to discover patterns that might be used in operational activities. Text is used to generate extra features which might be added to structured data for more accurate pattern discovery. There is of course overlap and most suppliers provide a mixture of capabilities. Finally we should not forget information retrieval, more often branded as enterprise search technology, where the aim is simply to provide a means of discovering and accessing data that are relevant to a particular query. This is a separate topic to a large extent, although again there is overlap.”

Reading through the list shows the variety of options users have when it comes to text analytics. There does not appear to be a right or wrong way, but will the diverse offerings eventually funnel down to a few fully capable platforms?

Whitney Grace, March 11, 2014
Sponsored by, developer of Augmentext
