
Big Data Lake: Are the Data Safe to Consume?

August 2, 2015

I read “The Analytics Journey Leading to the Business Data Lake.” Data lake is one of the terms floating around (pun definitely intended!) to stimulate sales. If one has a great deal of water, one needs a place to put it. Even though water is dammed, piped, used, recycled, and dumped—storage is the key.

Enter EMC, a company which is in the business of helping those with water store it and make use of that substance.

The write up reflects effort. I assume there was a PowerPoint slide deck in the mix. There are some snazzy graphics. Here’s one that caught my eye:


Instead of enterprise search being the go-to enterprise software solution, EMC has slugged in the following umbrella terms:

  • Information ecosystem
  • Business intelligence (perhaps an oxymoron in light of this article)
  • Advanced analytics (obviously because regular analytics just are not zippy enough)
  • Knowledge layer (I remain puzzled about knowledge because I have a tough time defining it. In fact, I resigned from my for-fee knowledge management column because I just don’t know what the heck “knowledge” means.)
  • The unfathomable data lake (yep, pun intended). What’s wrong with the word “storage” or “database” by the way?
  • Master data which is also baffling. Is there servant data too?
  • Machine data. Again I have no clue what this means.

The chart scatters undefined and fuzzy buzzwords like a crazed Jethro Tull, a water soluble blend of Jethro Tull (inventor of the seed drill) and Jethro Tull (the commercially successful and eccentric rock band).

The write up is important because EMC has sucked in the jargon and assertions once associated with enterprise search and applied them to the dark and mysterious data lake.

I highlighted:

Our data lake is one logical data platform with multiple tiers of performance and storage levels to optimally serve various data needs based on Service Level Agreements (SLA). It will provide a vast amount of structured and unstructured data at the Hadoop and Greenplum layers to data scientists for advanced analytics innovation. The higher performance levels powered by Greenplum and in-memory caching databases will serve mission-critical and real-time analytics and application solutions. With more robust data governance and data quality management, we can ensure authoritative, high-quality data driving all of EMC business insights and analytics driven applications using data services from the lake.

Ah, the Mariana Trench of enterprise information: Governance. Like “knowledge” and “advanced analytics,” governance has euphony. I think of the water lapping against the shore of Lake Paseco.

So what? Several observations:

  1. This type of “suggest lots” marketing ended poorly for a number of companies that used this type of rhetoric when marketing search.
  2. The folks who swallow this bait are likely to find themselves in a most uncomfortable spot
  3. The problems associated with making use of information to improve decision making by reducing risk are not going to be solved by crazy diagrams and unsupported assertions.

EMC has been able to return revenue growth. But the company’s profit margin has flat lined.


I am not sure that increasing the buzzword density in marketing write ups will help angle the red lines to low earth orbit. With better margins, it is much easier to check out the topographic view and see where lakes meet land.

Stephen E Arnold, August 2, 2015

Watson: Following in the Footsteps of America Online with PR, not CD ROMs

July 31, 2015

I am now getting interested in the marketing efforts of IBM Watson’s professionals. I have written about some of the items which my Overflight system snags.

I have gathered a handful of gems from the past week or so. As you peruse these items, remember several facts:

  • Watson is Lucene, home brew scripts, and acquired search utilities like Vivisimo’s clustering and de-duplicating technology
  • IBM said that Watson would be a multi billion dollar business and then dropped that target from 10 or 12 Autonomy scale operations to something more modest. How modest the company won’t say.
  • IBM has tallied a baker’s dozen of quarterly reports with declining revenues
  • IBM’s reallocation of employee resources continues as IBM is starting to run out of easy ways to trim expenses
  • The good old mainframe is still a technology wonder, and it produces something Watson only dreams about: Profits.

Here we go. Remember high school English class and the “willing suspension of disbelief.” Keep that in mind, please.

ITEM 1: “IBM Watson to Help Cities Run Smarter.” The main assertion, which comes from unicorn land, is: “Purple Forge’s “Powered by IBM Watson” solution uses Watson’s question answering and natural language processing capabilities to let users ask questions and get evidence-based answers using a website, smartphone or wearable devices such as the Apple Watch, without having to wait for a call agent or a reply to an email.” There you go. Better customer service. Aren’t governments supposed to serve their citizens? Does the project suggest that city governments are not performing this basic duty? Smarter? Hmm.

ITEM 2: “Why I’m So Excited about Watson, IBM’s Answer Man.” In this remarkable essay, an “expert” explains that the president of IBM explained to a TV interviewer that IBM was being “reinvented.” Here’s the quote that I found amusing: “IBM invented almost everything about data,” Rometty insisted. “Our research lab was the first one ever in Silicon Valley. Creating Watson made perfect sense for us. Now he’s ready to help everyone.” Now the author is probably unaware that I was, lo, these many years ago, involved with an IBM Herb Noble who was struggling to make IBM’s own and much loved STAIRS III work. I wish to point out that Silicon Valley research did not have its hands on the steering wheel when it came to the STAIRS system. In fact, the job of making this puppy work fell to IBM folks in Germany as I recall.

ITEM 3: “IBM Watson, CVS Deal: How the Smartest Computer on Earth Could Shake Up Health Care for 70m Pharmacy Customers.” Now this is an astounding chunk of public relations output. I am confident that the author is confident that “real journalism” was involved. You know: Interviewing, researching, analyzing, using Watson, talking to customers, etc. Here’s the passage I highlighted: “One of the most frustrating things for patients can be a lack of access to their health or prescription history and the ability to share it. This is one of the things both IBM and CVS officials have said they hope to solve.” Yes, hope. It springs eternal as my mother used to say.

If you find these fact filled romps through the market activating technology of Watson convincing, you may be qualified to become a Watson believer. For me, I am reminded of Charles Bukowski’s alleged quip:

The problem with the world is that the intelligent people are full of doubts while the stupid ones are full of confidence.

Stephen E Arnold, July 31, 2015

Bing Is Very Important, I Mean VERY Important

July 31, 2015

The online magazine eWeek published, “What The Bing Search Engine Brings To Microsoft’s Web Strategy” and it explains how Bing spurs a lot of debate:

“Some who don’t like the direction in which Google is going say that Bing is the search engine they prefer, especially since Microsoft has honed Bing’s ability to deliver relevant results. Others, however, look at Bing as one of many products from Microsoft, which is still seen as the “Evil Empire” in some quarters and a search platform that’s incapable of delivering the results that compare favorably with Google. Bing, introduced six years ago in 2009, is still a remarkably controversial product in Microsoft’s lineup. But it’s one that plays an important role in so many of the company’s Internet services.”

Microsoft is ramping up Bing to become a valuable part of its software services. It continues its partnership with Yahoo and Apple, and it will also power AOL’s web advertising and search. Bing is becoming a more respected search engine, but what does it have to offer?

Bing has many features it is using to entice people to stop using Google. When you search for a person’s name, the results display a bio of the person (only if they are affluent, however). Bing has a loyalty program, seriously, called Bing Rewards: the more you search on Bing, the more points you earn, and the points are redeemable for gift cards, movie rentals, and other items.

Bing is already a big component in Microsoft software, including Windows 10 and Office 365.  It serves as the backbone for not only a system search, but searching the entire Internet.  Think Apple’s Spotlight, except for Windows.  It also supports a bevy of useful applications and do not forget about Cortana, which is Microsoft’s answer to Siri.

Bing is very important to Microsoft because of the ad revenue.  It is just a guess, but you can always ask Cortana for the answer.

Whitney Grace, July 31, 2015
Sponsored by, publisher of the CyberOSINT monograph



Finnish Content Discovery Case Study

July 31, 2015

There are many services that offer companies the ability to increase their content discovery. One of these services is Leiki, which offers intelligent user profiling, context-based intelligence, and semantic SaaS solutions. Rather than having humans adapt their content to get to the top of search engine results, the machine is altered to fit a human’s needs. Leiki pushes relevant content to a user’s search query. Leiki recently released “Case Study: Leiki Smart Services Increase Customer Flow Significantly At Alma Media.”

Alma Media is one of the largest media companies in Finland, owning many well-known Finnish brands.  These include Finland’s most popular Web site, classified ads, and a tabloid newspaper.  Alma Media employed two of Leiki’s services to grow its traffic:

“Leiki’s Smart Services are adept at understanding textual content across various content types: articles, video, images, classifieds, etc. Each content item is analyzed with our semantic engine Leiki Focus to create a very detailed “fingerprint” or content profile of topics associated with the content.

SmartContext is the market leading service for contextual content recommendations. It’s uniquely able to recommend content across content types and sites and does this by finding related content using the meaning of content – not keyword frequency.

SmartPersonal stands for behavioral content recommendations. As it also uses Leiki’s unique analysis of the meaning in content, it can recommend content from any other site and content type based on usage of one site.”
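The case study does not say how Leiki Focus builds or compares its content profiles, so the following is only a rough sketch of the general idea of matching by topic rather than by keyword frequency. It assumes each content item carries a hand-made profile of topic weights and ranks candidates by cosine similarity. The function names, item ids, and weights are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse topic-weight profiles (dicts)."""
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(profile, catalog, k=2):
    """Return the ids of up to k catalog items whose profiles best match `profile`."""
    scored = [(cosine(profile, item), item_id) for item_id, item in catalog.items()]
    # Highest similarity first; ties are broken alphabetically by item id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored[:k] if score > 0]


# Hypothetical profiles spanning several content types (article, video, classified).
catalog = {
    "article:election": {"politics": 0.9, "finland": 0.6},
    "video:hockey": {"sports": 0.9, "finland": 0.4},
    "classified:skates": {"sports": 0.7, "shopping": 0.8},
}

# Profile of the page a reader is currently viewing (also made up).
current_page = {"sports": 0.8, "finland": 0.5}
contextual = recommend(current_page, catalog)
print(contextual)
```

Because the comparison runs on topics, a classified ad for skates can be recommended next to a hockey video even though the two share no keywords. A behavioral variant could sum the profiles of the items a user has clicked and pass the result to the same `recommend` function.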

The case study runs down how Leiki’s services improved traffic and encouraged more users to consume Alma Media’s content. Leiki’s main selling point in the case study is that it offers users personal recommendations based on content they clicked on Alma Media Web sites. Leiki wants to be a part of developing Web 3.0 and the research shows that personalization is the way for it to go.

Whitney Grace, July 31, 2015

An Obscure Infographic About London Coffee Shops

July 29, 2015

Here’s a unique pair of graphics, particularly of interest for anyone who can see themselves enjoying a cup of joe in London. Gizmodo presents “A Taxonomy of Hip Coffee Shop Names.” The infographic from Information is Beautiful lays out London’s hipster coffee shops by both naming convention and location. Both charts size their entries by popularity: the more popular a shop, the bigger the disk (coaster?) its name sits upon. The brief write-up sets the scene:

“As you walk down the sidewalk, you see a chalkboard in the distance. As you step a little closer, you smell the deep musk of coffee emanating from an artfully distressed front door. Out steps a man with a beard, a Mac slung under his arm, sipping from small re-useable flat white-sized cup. You’ve stumbled across another hip coffee shop. Now, what’s it called?

“Information is Beautiful … breaks the naming structure down by type: there are ones themed around drugs, chatter, beans, brewing, socialism and more. But they all share one thing in common: they sound just like they could be hand-painted above that scene you just saw.”

So, if you like coffee, London, hipsters, or taxonomy-graphics, take a gander. From Alchemy to Maison d’être to Window, a shop or two are sure to pique the curiosity.

Cynthia Murrell, July 29, 2015



Web Sites Going The Way Of The Dodo

July 24, 2015

Apps are supposed to replace Web sites, but there is a holdup for universal adoption. Search Engine Watch explains why Web sites are still hanging tight and how a new Google acquisition might be a game changer: “The Final Hurdle Is Cleared-Apps Will Replace Web Sites.”  The article explains that people are “co-users” of both apps and classic Web sites, but online browsers are still popular.  Why is that?

Browsers are universal and can access any content with a Web address.  Most Web sites also do not have an app counterpart, so the only way to access content is to use the old-fashioned browser.  Another issue is that apps cannot be crawled by search engines, so they are left out of search results. The biggest pitfall for apps is that they have to be downloaded in order to be accessed, which takes up screen space and disk space.

A startup has created a solution to making apps work faster:

“Agawi has developed a technology to stream apps, just like Netflix streams videos. Instead of packaging the entire app into a single, large file for the user to download, the app is broken up into many small files, letting users interact with small portions of the app while the rest of it is downloading.  In the short term, it appears that Google wants to deploy Agawi for users try an app before downloading the full version.”
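The quoted description gives no detail on how Agawi splits or schedules an app’s files, so here is a toy simulation of the general idea only: an app is delivered as small chunks, and a feature becomes usable as soon as the chunks it depends on have arrived. The chunk names, the feature map, and the `stream_app` function are hypothetical.

```python
def stream_app(chunk_order, feature_needs):
    """Simulate chunks arriving one at a time.

    After each arrival, yield the chunk name and the sorted list of
    features whose required chunks have all been received.
    """
    received = set()
    for chunk in chunk_order:
        received.add(chunk)
        usable = sorted(
            feature for feature, needs in feature_needs.items()
            if needs <= received  # every required chunk is already here
        )
        yield chunk, usable


# Hypothetical app: each feature lists the chunks it needs before it can run.
feature_needs = {
    "home_screen": {"core"},
    "level_1": {"core", "level_1_assets"},
    "level_2": {"core", "level_2_assets"},
}

# Send the chunk most features depend on first.
chunk_order = ["core", "level_1_assets", "level_2_assets"]

timeline = list(stream_app(chunk_order, feature_needs))
for chunk, usable in timeline:
    print(f"after {chunk}: {usable}")
```

In this sketch the home screen is available after the first chunk, which is the “try it before the full download” behavior the quote attributes to Google’s short term plans.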

Google acquired Agawi, but do not expect it to be accessible soon. Google enjoys putting its own seal of approval on all acquisitions and making sure they work well. Mobile device usage is increasing and more users are moving towards using mobile devices over traditional computers. Search marketers will need to be more aware than ever of how search engines work with apps and encourage clients to make an app.


Whitney Grace, July 24, 2015

One Million Minutes of Unfindable Video

July 23, 2015

I read “AP Makes One Million Minutes of Historical Footage Available on YouTube.” This struck me as an anomaly. The AP is an outfit which, as I recall, rattled sabers and showed knives to people who quote from its articles. Also, the AP is in a revenue hunt; that is, the good old days of newspapers are history. The company is, like many outfits sired in the stable of dead tree journalism, adapting. Need a real time news feed with search? The AP offers this via a tie up with a former Bell Labs person. I will wager $1.00 in pennies that you cannot name the vendor. Send your answers to benkent2020 at yahoo dot com.

The AP write up reports that lots of video has been digitized and placed on YouTube. There are links to videos which AP finds interesting. The word “find” brings up an interesting question: “How does one locate a video?” and “How does one locate a series of related videos?” and “How does one find a video with a specific segment of text in it?” and “How does one find a video with a specific image in it?”

The answer, gentle reader, is that one cannot. I know that AP is excited about this collection. I assume that Google is pleased that the collection is not on Facebook.

As a user, the approach to locating a video is somewhat unsatisfying. Prepare your patient self to guess keywords, click, and watch in serial fashion one million videos. Well, maybe a couple.

Without search, this collection, like Google’s Life Magazine images, is useful to folks with time on their hands and even more time on their hands. A dump is not useful to me. To you, gentle reader, and to the executives at AP, I am picking nits. The problem is that these nits are the size of the synthetic creatures in Jurassic World. Big nits. My hunch is that the ad revenue from these videos will be the size of regular, run of the mill nits. I hope I am wrong. Don’t forget to submit the name of the AP’s real time, online news intelligence service. I will accept entries for 24 hours.

Stephen E Arnold, July 23, 2015

Semantic Search: How Far Will This Baloney Tube Stretch?

July 12, 2015

I have seen a number of tweets, messages, and comments about “Semantic Search: the Future of Search Marketing?”

For those looking for traffic, it seems that using the phrase “semantic search” in conjunction with “search marketing” is Grade A click bait. Go for it.

My view is a bit different. I think that the baloney manufactured from semantic search (more correctly the various methods that can be grouped under the word semantic) is low grade baloney.

Search marketing is on a par with the institutional pizza pumped out for freshmen in a dorm in DeKalb, Illinois. Yum, tasty. What is it? Oh, I know it is something that is supposed to be nutritious and tasty. The reality is that the pizza isn’t. That’s search marketing. The relevant result may not be. Relevance is jiggling results so that a message is displayed whether the user wants that message or not. Not pizza.

Here’s a passage in the write up I highlighted in pale yellow, the color in my marker set closest to the dorm pizza:

Semantic search is the technology the search engines employ to better understand the context of a search.

Contrast this definition with this one from “Breakthrough Analysis: Two + Nine Types of Semantic Search” published in 2010, five years before the crazy SEO adoption of the buzzword, if not the understanding of what “semantic” embraces:

Semantics (in an IT setting) is meaningful computing: the application of natural language processing (NLP) to support information retrieval, analytics, and data-integration that compass both numerical and “unstructured” information.

The article then trots out these semantic search options:

  1. Related searches and queries
  2. Reference results (dictionary look up)
  3. Annotated results
  4. Similarity search
  5. Syntactic annotations
  6. Concept search
  7. Ontology based search
  8. Semantic Web search
  9. Faceted search
  10. Clustered search
  11. Natural language search

Now there are many, many issues with this list. How about differentiating faceted, concept, and clustered search? Give up yet?

The point is that semantic search is not one thing. If one accepts this list as the touchstone, the functions referenced are going to contain other content processing operations.

The problem is that these functions on their own or used in some magical, affordable combination are not likely to deliver what the user wants.

The user wants relevant results which pertain directly to her specific information need.

The search engine optimization and marketing crowd want the results to be what they want to present to a user.

The objectives are different and may not be congruent or even similar.

In short, the notion of taking crazy, generalized concepts and slapping them on marketing is likely to produce howlers like this write up and the equally wonky list from 2010.

The point is that semantic baloney has been in the supermarket for a long time.

Obviously this baloney has a long shelf life.

In the meantime, how is ad supported Web search working for you? Oh, how is that in house information access system working for you?

If you want traffic, buy Adwords. Please, do not deliver to me the six pack of baloney.

Stephen E Arnold, July 12, 2015

Dealing with Company and Product Identity: Terbium Labs Nails It

July 11, 2015

Navigate to and read about the company.


Nifty name. Very nifty name indeed. Now, a bit of branding commentary.

I used to work at Halliburton Nuclear. Ah, the good old days of nuclear engineers poking fun at civil engineers and mathematicians not understanding any joke made by the computer engineers.

The problem of naming companies in high technology disciplines is a very big one. Before Halliburton gobbled up the Nuclear Utility Services outfit, the company with more than 400 nuclear engineers on staff struggled with its name. Nuclear Utility Services was abbreviated to NUS. A pretty sharp copywriter named Richard Harrington of the dearly loved Ketchum, McLeod and Gove ad agency came up with this catchy line:

After the EPA, call NUS.

The important point is that Mr. Harrington, a whiz person, wanted to have people read each letter: E-P-A, not “eepa,” and N-U-S, not “noose.” In Japanese, the sound “nus” has a negative meaning usually applied to pressurized body odor emissions. Not good.

Search and content processing vendors struggle with names. I have written about outfits which have fumbled the branding ball. Examples include Thunderstone, whose name has been usurped by a gaming company; Brainware, whose name has been snagged and used for interesting videos; and Smartlogic, whose name has been appropriated by a smaller outfit doing marketing/design stuff. There are names which are impossible to find; for example, i2, AMI, and ChaCha to name a few among many.

I want to call attention to a quite useful product naming which I learned about recently. Navigate to Consider the word Terbium. Look for the word “Matchlight.”

I find Terbium a darned good word because terbium is an element, which my old (and I mean old) chemistry professor pronounced “ter-beem.” The element has a number of useful applications. Think solid state devices and a magic ingredient in some rocket fuels and, okay, okay, some explosives.

But as good as “terbium” is for a company, I absolutely delight in this product name:


Now what’s Matchlight and why should anyone care? My hunch is that the technology which allows a next generation approach to content identification and other functions works to

  • light a match in the wilderness
  • illuminate a dark space
  • start a camp fire so I can cook a goose

You can and should learn more about Terbium Labs and its technology. The names will help you remember.

Important company; important technology. Great name Matchlight. (Hear that search and content processing vendors with dud names?)

Stephen E Arnold, July 11, 2015

Sprinklr Aims to Conquer Consolidation Market

July 8, 2015

Sprinklr is in a race with the likes of Salesforce as well as fellow social-consolidation startups. Forbes declares, “Sprinklr Acquires NewBrand, the $1 Billion Social Startup’s Seventh Buy in 18 Months.” Back when social media was new, companies scrambled to leverage its potential with a hodgepodge of tools. Now, Sprinklr founder Ragy Thomas sees a wave of consolidation approaching, as companies tire of struggling to unite disparate solutions. Alex Konrad writes:

“Sprinklr is one of a number of companies facing pressure to provide a more complete stack to brands looking to integrate their social marketing and customer support, Thomas says. An obvious example is the Salesforce Marketing Cloud, built off a nucleus of its own acquisitions like ExactTarget, Buddy Media and Radian6. Demand for a more end-to-end solution has intensified in the last year, Thomas argues. That’s why Sprinklr has acquired so much and so quickly, the CEO argues, typically taking the absorbed startup and absorbing its code directly into Sprinklr’s main code. …

“Sprinklr will face competition from also well-financed startups like Percolate as well as from larger suite offerings like Salesforce. ‘We are in a race against time to provide the capability to brands,’ Thomas says. ‘It’s becoming a three or four horse race with a clear set of companies that big brands can bank on moving forward.’”

At the moment, it looks like Sprinklr may be ahead in that race; predictive-analytics/business-intelligence firm NewBrand is its seventh acquisition since the beginning of 2014. NewBrand launched in 2010, and is based in Washington, DC.

Ragy Thomas founded Sprinklr in 2009. The company is headquartered in New York City, with offices around the world. The other six companies it has snapped up include Scup, Get Satisfaction, Pluck, Branderati, TBG Digital, and Dachis Group.

Cynthia Murrell, July 8, 2015

