
Hire Watson As Your New Dietitian

August 4, 2015

IBM’s supercomputer Watson is being “trained” in various fields, such as healthcare, app creation, customer service, and creating brand new recipes. The applications for Watson are possibly endless. The supercomputer is now combining its “skills” from healthcare and recipes by trying its hand at nutrition. Welltok invented the CaféWell Health Optimization Platform, a PaaS that creates individualized healthcare plans, and it has integrated Watson’s big data capabilities into its Healthy Dining CaféWell personal concierge app. eWeek explains in “Welltok Takes IBM Watson Out To Dinner” that the app can offer clients personalized restaurant menu choices.

“‘Optimal nutrition is one of the most significant factors in preventing and reversing the majority of our nation’s health conditions, like diabetes, overweight and obesity, heart disease and stroke and Alzheimer’s,’ said Anita Jones-Mueller, president of Healthy Dining, in a statement. ‘Since most Americans eat away from home an average of five times each week and it can be almost impossible to know what to order at restaurants to meet specific health needs, it is very important that wellness and condition management programs empower smart dining out choices. We applaud Welltok’s leadership in providing a new dimension to healthy restaurant dining through its groundbreaking CaféWell Concierge app.’”

Restaurant menus are very vague when it comes to nutritional information. A menu will state whether something is gluten-free, spicy, or a vegetarian option, but all other information is missing. To find a restaurant’s nutritional information, you have to hit the Internet and conduct research. A recently passed law will force restaurants to post calorie counts, but that will not include the amount of sugar, sodium, and other information. People have been making poor eating choices, partially due to the lack of information. If they know what they are eating, they can improve their health. If Watson’s abilities can decrease the US’s waistline, it is for the better. The bigger challenge would be to get people to use the information.

Whitney Grace, August 4, 2015
Sponsored by, publisher of the CyberOSINT monograph


Poor IBM i2: 15 Year Old Company Makes Headlines in Fraud Detection and Big Blue Is Not Mentioned

August 3, 2015

Before IBM purchased i2 Ltd from an investment outfit, I did some work for Mike Hunter, one of the founders of i2 Ltd. i2 is not a household name. The fault lies not with i2’s technology; the fault lies at the feet of IBM.

A bit of history. Back in the 1990s, Hunter was working on an advanced degree in physics at Cambridge University. His undergraduate degree was from Manchester University. At about the same time, Michael Lynch, founder of Autonomy and DarkTrace, was a graduate of Cambridge and an early proponent of guided machine learning implemented in the Digital Reasoning Engine or DRE, an influential invention from Lynch’s pre Autonomy student research. Interesting product name: Digital Reasoning Engine. Lynch’s work was influential and triggered some me too approaches in the world of information access and content processing. Examples can be found in the original Fast Search & Transfer enterprise systems and in Recommind’s probabilistic approach, among others.

By 2001, i2 had placed its content processing and analytics systems in most of the NATO alliance countries. There were enough i2 Analyst Workbenches in Washington, DC to cause the Cambridge-based i2 to open an office in Arlington, Virginia.

In the mid 1990s, i2 delivered tools which allowed an analyst to identify people of interest, display relationships among these individuals, and drill down into underlying data to examine surveillance footage or look at text from documents (public and privileged).

IBM has i2 technology, and it also owns the Cybertap technology. The combination allows IBM to deploy for financial institutions a remarkable range of field proven, powerful tools. These tools are mature.

Due to the marketing expertise of IBM, a number of firms looked at what Hunter “invented” and concluded that there were whizzier ways to deliver certain functions. Palantir, for example, focused on Hollywood style visualization, Digital Reasoning emphasized entity extraction, and Haystax stressed insider threat functions. Today there are more than two dozen companies involved in what I call the Hunter-i2 market space.

Some of these have pushed in important new directions. Three examples of important innovators are: Diffeo, Recorded Future, and Terbium Labs. There are others which I can name, but I will not. You will have to wait until my new Dark Web study becomes available. (If you want to reserve a copy, send an email to benkent2020 at yahoo dot com. The book will run about 250 pages and cost about $100 when available as a PDF.)

The reason I mention i2 is because a recent Wall Street Journal article called “Spy Tools Come to Wall Street” (print edition for August 3, 2015) and “Spy Software Gets a Second Life on Wall Street” did not. That’s not a surprise because the Murdoch property defines “news” in an interesting way.

The write up profiles a company called Digital Reasoning, which was founded in 2000 by a clever lad from the University of Virginia. I am confident of the academic excellence of the university because my son graduated from this fine institution too.

Digital Reasoning is one of the firms engaged in cognitive computing. I am not sure what this means, but I know IBM is pushing the concept for its fascinating Watson technology, which can create recipes and cure cancer. I am not sure about generating a profit, but that’s another issue associated with the cognitive computing “revolution.”

I learned:

In pitching prospective clients, Digital Reasoning often shows a demonstration of how its system responded when it was fed 500,000 emails related to the Enron scandal made available by the Federal Energy Regulatory Commission. After being “taught” some key concepts about compliance, the Synthesys program identified dozens of suspicious emails in which participants were using language that suggested attempts to conceal or destroy information.

Interesting. I would suggest that the Digital Reasoning approach is 15 years old; that is, only marginally newer than the i2 system. Digital Reasoning lacks the functionality of Cybertap. Furthermore, companies like Diffeo, Recorded Future, and Terbium incorporate sophisticated predictive methods which operate in an environment of real time information flows. The idea is that looking at an archive is interesting and useful to an attorney or investigator looking backwards. However, the focus for many financial firms is on what is happening “now.”

The Wall Street Journal story reminds me of the third party descriptions of Autonomy’s mid 1990s technology. Those who fail to understand the quantity of content preparation and manual, subject matter expert effort required to obtain high value outputs are watching smoke, not investigating the fire.

For organizations looking for next generation technology which is and has been working for several years, one must push beyond the Palantir valuation and look to the value of innovative systems and methods.

For a starter, check out Diffeo, Recorded Future, and Terbium Labs. Please, push IBM to exert some effort to explain the i2-Cybertap capabilities. I tip my hat to the PR firm which may have synthesized some information for a story that is likely to make the investors’ hearts race this fine day.

Stephen E Arnold, August 3, 2015

Data Science, Senior Managers, and the Ever Interesting Notion of Truth

August 3, 2015

I read “Data Scientists to CEOs: You Can’t Handle the Truth.” I enjoy write ups about data science which start off with the notion of truth. I know that the “truth” referenced is the outputs of analytics systems.

Call me skeptical. If the underlying data are not normalized, validated, and timely, the likelihood of truth becomes even murkier than it was in my college philosophy class. Roger Ailes allegedly said:

Truth is whatever people will believe.

Toss in the criticism of a senior manager who in the US is probably a lawyer or an accountant, and you have a foul brew. Why would a manager charged with hitting quarterly targets or generating enough money to meet payroll quiver with excitement when a data scientist presents “truth”?

There is that pesky perception thing. There are frames of reference. There are subjective factors in play. Think of the dentist who killed Cecil. I am not sure data science will solve his business and personal challenges. Do you?

The write up is a silly fan rant for the fuzzy discipline of data science. Data science does not pivot on good old statisticians with their love of SAS and SPSS, fancy math, and 17th century notions of what constitutes a valid data set. Nope.

The data scientist has to communicate the known unknowns to his or her CEO. Shades of Rumsfeld. Does today’s CEO want to know more about the uncertainty in the business? The answer is, “Maybe.” But senior managers often get information that is filtered, shaped, and presented to create an illusion. Shattering those illusions can have some negative career consequences even for data scientists, assuming there is such a discipline as data science.

Evoking the truth from statistical processes which are output from systems configured by others can be interesting. Those threshold settings are not theoretical. Those settings determine what the outputs are and what they are “about.”
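The threshold point can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the scores are invented, and `flag` is a hypothetical helper, not a function from any vendor's analytics system. The same data yield a different "truth" depending on who picked the cutoff.

```python
def flag(scores, threshold):
    """Return the item names whose score meets or exceeds the threshold."""
    return [name for name, score in scores.items() if score >= threshold]


# Invented example scores, e.g. a model's "suspicion" rating per message.
scores = {"msg-1": 0.55, "msg-2": 0.72, "msg-3": 0.91, "msg-4": 0.40}

# Two plausible configurations chosen by whoever set up the system.
lenient = flag(scores, 0.5)  # a low cutoff flags most of the items
strict = flag(scores, 0.9)   # a high cutoff flags almost nothing

print("lenient:", lenient)
print("strict:", strict)
```

Neither list is more "true" than the other. The difference comes entirely from a setting the data scientist may not have chosen and the CEO may never see.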

Connecting an automated output to something that the data scientist asserts should be changed strikes me as somewhat parental. How does that work on a manager like Dick Cheney? How does that work on the manager of a volunteer committee working on a parent teacher luncheon?

I thought the Jack Benny program from the 1930s to 1960s was amusing. Some of the output about data science suggests that comedy may be a more welcoming profession than management based on truth from data science. Truth and statistics. Amazing comedy.

Stephen E Arnold, August 3, 2015

Big Data Lake: Are the Data Safe to Consume?

August 2, 2015

I read “The Analytics Journey Leading to the Business Data Lake.” Data lake is one of the terms floating around (pun definitely intended!) to stimulate sales. If one has a great deal of water, one needs a place to put it. Even though water is dammed, piped, used, recycled, and dumped—storage is the key.

Enter EMC, a company which is in the business of helping those with water store it and make use of that substance.

The write up reflects effort. I assume there was a PowerPoint slide deck in the mix. There are some snazzy graphics. Here’s one that caught my eye:


Instead of enterprise search being the go-to enterprise software solution, EMC has slugged in the following umbrella terms:

  • Information ecosystem
  • Business intelligence (perhaps an oxymoron in light of this article)
  • Advanced analytics (obviously because regular analytics just are not zippy enough)
  • Knowledge layer (I remain puzzled about knowledge because I have a tough time defining it. In fact, I resigned from my for fee knowledge management column because I just don’t know what the heck “knowledge” means.)
  • The unfathomable data lake (yep, pun intended). What’s wrong with the word “storage” or “database” by the way?
  • Master data which is also baffling. Is there servant data too?
  • Machine data. Again I have no clue what this means.

The chart scatters undefined and fuzzy buzzwords like a crazed Jethro Tull, a water soluble blend of Jethro Tull (inventor of the seed drill) and Jethro Tull (the commercially successful and eccentric rock band).

The write up is important because EMC has sucked in the jargon and assertions once associated with enterprise search and applied them to the dark and mysterious data lake.

I highlighted:

Our data lake is one logical data platform with multiple tiers of performance and storage levels to optimally serve various data needs based on Service Level Agreements (SLA). It will provide a vast amount of structured and unstructured data at the Hadoop and Greenplum layers to data scientists for advanced analytics innovation. The higher performance levels powered by Greenplum and in-memory caching databases will serve mission-critical and real-time analytics and application solutions. With more robust data governance and data quality management, we can ensure authoritative, high-quality data driving all of EMC business insights and analytics driven applications using data services from the lake.

Ah, the Mariana Trench of enterprise information: Governance. Like “knowledge” and “advanced analytics,” governance has euphony. I think of the water lapping against the shore of Lake Paseco.

So what? Several observations:

  1. This type of “suggest lots” marketing ended poorly for a number of companies who used this type of rhetoric when marketing search.
  2. The folks who swallow this bait are likely to find themselves in a most uncomfortable spot.
  3. The problems associated with making use of information to improve decision making by reducing risk are not going to be solved by crazy diagrams and unsupported assertions.

EMC has been able to return revenue growth. But the company’s profit margin has flat lined.


I am not sure that increasing the buzzword density in marketing write ups will help angle the red lines to low earth orbit. With better margins, it is much easier to check out the topographic view and see where lakes meet land.

Stephen E Arnold, August 2, 2015

Elasticsearch: A Useful Overview

August 1, 2015

Want to shake free of the proprietary search and retrieval systems? I don’t blame you. Irregular and slow bug fixes and licensing handcuffs are two good reasons. Remember: The cost of search is not the licensing fee. The cost is a collection of fees, purchases, and expenses which burden every search system with which I am familiar.

Elasticsearch is the go to solution at this time in my opinion. If you want a useful overview of Elasticsearch, check out the Slideshare presentation “Introduction to ElasticSearch.” You may have to “join” LinkedIn / Slideshare to do anything useful, however.

The deck was prepared / delivered in the spring of 2015 by Roy Russo who is affiliated with or is “DevNexus.” The information is jargon free, an approach which the whiz kids at LucidWorks (Really?) may want to imitate. The presentation does contain a couple of buzzwords like NGram, but no MBA speak.

Stephen E Arnold, August 1, 2015

Darktrace: A Kin of Kinjin?

August 1, 2015

Many years ago I loaded a software application from Autonomy. The application watched what I was “doing” and automatically displayed search results sort of relevant to what the software thought I was writing.

Flash forward to now. I read “Mike Lynch’s Cyber security Startup Darktrace Valued at More than £60m.” The point of the write up is that Dr. Mike Lynch has what looks like another success in his digital Bialette k6857 Mocha Express machine.

Darktrace monitors digital flows for signals. Instead of displaying search results, the system alerts security officers of a probable issue. Maybe Kinjin is not the influencer of the system. No matter. The company is “valued at more than $100 million.”

Several observations:

  • The Hewlett Packard Autonomy hassle has not spoiled Dr. Lynch’s coffee
  • Dr. Lynch is once again moving into a market sector in which some of the competitors are likely to be unaware of Dr. Lynch’s electric powered kitchen appliance taking over their coffee machine.
  • Hewlett Packard may want to ask and answer: “Why did we lose this fellow?”

My hunch is that HP won’t ask the question and may not admit that the answer is not just technology. The murky world of management spoils an otherwise pristine cup of java. That’s a $100 million cup of joe.

Stephen E Arnold, August 1, 2015

The Hadoop Spark Thing: Simple, Simple

July 30, 2015

I am fascinated with the cheerleading about open source software which makes Big Data as easy as driving a Fiat 500 through a car wash. (Make sure the wheels fit inside the automated pulley system, of course.)

Navigate to “The Big Big Data Question: Hadoop or Spark?” Be prepared to read about two—count ‘em—two systems working as smoothly as the engine in a technical high school’s auto repair class’ project car.

I want to highlight two statements in the write up.

The first is:

As I [a Big Data practitioner] mentioned, Spark does not include its own system for organizing files in a distributed way (the file system) so it requires one provided by a third-party. For this reason many Big Data projects involve installing Spark on top of Hadoop, where Spark’s advanced analytics applications can make use of data stored using the Hadoop Distributed File System (HDFS).

In short, Spark is what I call a wrapper. One uses it like a taco shell to keep the good in position for real time munching.

The second is this comment:

The open source principle is a great thing, in many ways, and one of them is how it enables seemingly similar products to exist alongside each other – vendors can sell both (or rather, provide installation and support services for both, based on what their customers actually need in order to extract maximum value from their data).

What the write up omits is that there are some other bits and pieces needed; for example, how does one locate a particular string amidst the Big Data?

The point, for me, is that these nested and layered systems are truly exciting to troubleshoot. Not only are there issues with the integrity of the data, there is the thrill of getting each subsystem to work and then figuring out how to get useful outputs from the digital equivalent of a Lassie’s Double Revenge sandwich from Roy’s Place, which closed its doors in 2013.

A Lassie’s Double Revenge consisted of a knockwurst, cheese, grilled onions, baked beans, and assorted seasonings served to the discerning diner.

A little like an open source Big Data mash up.

As a bonus, one gets to hire consultants who can make separate products, systems, and solutions work in a way which benefits the licensee and the system’s users.

Stephen E Arnold, July 30, 2015

Organizations Should Consider Office 365 Utilization

July 30, 2015

Office 365 has been a bit contentious within the community. While Microsoft touts it as a solution that gives users more of the social and mobile components they were wishing for, it has not been widely adopted. IT Web gives some reasons to consider the upgrade in its article, “Why You Should Migrate SharePoint to Office 365.”

The article says:

“‘Although SharePoint as a technology has matured a great deal over the years, I still see many businesses struggling with issues related to on-premises SharePoint,’ says Simon Hepburn, director of bSOLVe. . . . ‘You may be thinking: “Are things really that different using SharePoint on Office 365?” Office 365 is constantly evolving and as I will explain, this evolution brings with it opportunities that your business should seriously consider exploring.’”

Of course, the irony is that with the new SharePoint 2016 upgrade, Microsoft is promising to stand behind on-premises installations while continuing to integrate and promote the Office 365 components. Only time and feedback will dictate the continued direction of the enterprise solution. In the meantime, stay tuned to Stephen E. Arnold and his Web service. Arnold is a longtime leader in search, and his dedicated SharePoint feed is a one-stop shop for all the latest news, tips, and tricks.

Emily Rae Aldridge, July 30, 2015




Whither Unix Data

July 30, 2015

For anyone using open-source Unix to work with data, IT World has a few tips for you in “The Best Tools and Techniques for Finding Data on Unix Systems.” In her regular column, “Unix as a Second Language,” writer Sandra Henry-Stocker explains:

“Sometimes looking for information on a Unix system is like looking for needles in haystacks. Even important messages can be difficult to notice when they’re buried in huge piles of text. And so many of us are dealing with ‘big data’ these days — log files that are multiple gigabytes in size and huge record collections in any form that might be mined for business intelligence. Fortunately, there are only two times when you need to dig through piles of data to get your job done — when you know what you’re looking for and when you don’t. 😉 The best tools and techniques will depend on which of these two situations you’re facing.”

When you know just what to search for, Henry-Stocker suggests the “grep” command. She supplies a few variations, complete with a poetic example. Sometimes, like when tracking errors, you’re not sure what you will find but do know where to look. In those cases, she suggests using the “sed” command. For both approaches, Henry-Stocker supplies example code and troubleshooting tips. See the article for the juicy details.
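The two situations can be sketched in a few lines of Python. This is a rough illustration only, not Henry-Stocker's own code: the helper names `grep_lines` and `print_range` are hypothetical, and the sample log entries are invented. The first helper mimics what `grep -n PATTERN file` reports; the second mimics printing a known range of lines, as `sed -n '2,3p' file` would.

```python
import re


def grep_lines(lines, pattern):
    """Return (line_number, line) pairs that match the regex, like `grep -n`."""
    rx = re.compile(pattern)
    return [(n, line) for n, line in enumerate(lines, start=1) if rx.search(line)]


def print_range(lines, first, last):
    """Return lines first..last (1-indexed, inclusive), like `sed -n 'first,lastp'`."""
    return lines[first - 1:last]


# Invented log entries standing in for a multi-gigabyte log file.
sample_log = [
    "Jul 30 09:00:01 host sshd[101]: session opened",
    "Jul 30 09:00:07 host kernel: disk error on sda",
    "Jul 30 09:01:15 host cron[202]: job started",
    "Jul 30 09:02:44 host kernel: disk error on sdb",
]

# Case 1: you know what you are looking for.
hits = grep_lines(sample_log, r"disk error")

# Case 2: you know where to look, but not what you will find.
window = print_range(sample_log, 2, 3)

for number, line in hits:
    print(f"{number}:{line}")
for line in window:
    print(line)
```

For real log files the command line tools remain the better choice, because they stream the file instead of holding every line in memory as this sketch does.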

Cynthia Murrell, July 30, 2015



Connecting SharePoint with External Data

July 28, 2015

One of the most frequently discussed SharePoint struggles is integrating SharePoint data with existing external data. IT Business Edge has compiled a short slideshow with helpful tips regarding integration, including the possible use of business connectivity services. See all the details in their presentation, “Eight Steps to Connect Office 365/SharePoint Online with External Data.”

The summary states:

“According to Mario Spies, senior strategic consultant at AvePoint, a lot of companies are in the process of moving their SharePoint content from on-premise to Office 365 / SharePoint Online, using tools such as DocAve Migrator from SharePoint 2010 or DocAve Content Manager from SharePoint 2013. In most of these projects, the question arises about how to handle SharePoint external lists connected to data using BDC. The good news is that SharePoint Online also supports Business Connectivity Services.”

To continue to learn more about the tips and tricks of SharePoint connectivity, stay tuned to the SharePoint feed. Stephen E. Arnold is a lifelong leader in all things search, and his expertise is especially helpful for SharePoint. Users will continue to be interested in data migration and integration, and how things may be easier with the SharePoint 2016 update coming soon.

Emily Rae Aldridge, July 28, 2015


