
The Race to Predict Began Years Ago: Journalism as Paleontology

August 4, 2015

I love reading the dead tree edition of the Wall Street Journal. This morning I learned that “Apple and Google Race to Predict What You Want.” The print story appears in the Business & Tech section on B1 and B6 for August 4, 2015. Note that the online version of the story has this title: “Apple and Google Know What You Want before You Do.” There is a difference for me between a “race” and “know.”

Nevertheless, the write up is interesting because of what is omitted. The story seems to fixate on mobile phone users and the notion of an assistant. The first thing I do with my mobile phone is find a way to disable this stuff. I dumped my test Microsoft phone because the stupid Cortana button was in a location which I inadvertently pressed. The Blackberry Classic is equally annoying, defaulting to a screen which takes three presses to escape. The iPhones and Android devices cannot understand my verbal instructions. Try looking up a Russian or Spanish name. Let me know how that works for you.

Now, what’s omitted from the write up? Three points struck me as ones which warranted a mention:

  1. Predictive methods are helping to reduce latency and unnecessary traffic (hence cost) between the user’s device and the service with the “answer.”
  2. Advertisers benefit from predictive analytics. Figuring out that someone wants food opens the door to a special offer. Why not cue that up in advance?
  3. Predictive technology is not limited to mobile applications. Google invested some bucks into an outfit called Recorded Future. What does Recorded Future do? Answer: Predictive analytics with a focus on time. The GOOG, like Apple, is mostly time blind.

Predictive methods are not brand spanking new to those who have followed the antics of physicists since Einstein’s miracle year. For the WSJ and its canines, isn’t “new” whatever seems bright and shiny today?

Stephen E Arnold, August 4, 2015

IBM to Do the Apple Thing: Will Jeans and Black Turtlenecks Be the New IBM Fashion Trend

August 4, 2015

I don’t know if this article is accurate or a belated April Fool joke: “IBM to Purchase Up to 200,000 Macs Annually, with 50-75 Percent of Employees Ultimately Switching from Lenovo.”

My recollection is that IBM and Microsoft were once pals. Well, maybe sort of pals. There was that OS/2 thing, wasn’t there?

In the write up I learned:

A year after teaming up with Apple on an enterprise partnership to push iOS devices and apps for business users, IBM is moving forward with plans to rapidly move its own employees onto Apple’s platforms, MacRumors has learned.  While IBM announced in an internal memo several months ago that it was planning to purchase up to 50,000 MacBooks for employees by the end of 2015, chief information officer Jeff Smith has revealed in a new internal video released to employees yesterday that he believes IBM could actually end up purchasing 150,000-200,000 Macs annually.

I am not sure if Lenovo is happy or sad to see this type of “news.” Years ago, I learned from IBM that it had “figured out Google.” I assume that this deal with Apple and the embrace of Apple hardware is part of that “figuring out.”

Stephen E Arnold, August 4, 2015

Coauthoring Documents in SharePoint to Save Time

August 4, 2015

SharePoint users are often looking for ways to save time and streamline the process of integration from other programs. Business Management Daily has devoted some attention to the topic with their article, “Co-authoring Documents in SharePoint and Office.” Read on for the full details of how to make the most of this feature.

The article begins:

“One of the best features of SharePoint 2010 and 2013 is the way it permits co-authoring. Co-authoring means more than one person is in a document, workbook or presentation at the same time editing different parts. It works differently in Word, Excel and PowerPoint . . . With Word 2013/SharePoint 2013, co-authors may edit either in Word Online (Word Web App) or the desktop version.”

SharePoint is a powerful but complicated solution that requires quite a bit of energy to maintain and use to the best of its ability. For those users and managers who are tasked with daily work in SharePoint, staying in touch with the latest tips and tricks is vital. Those users may benefit from Stephen E. Arnold’s Web site. A longtime leader in search, Arnold brings the latest SharePoint news together in one easy-to-digest news feed.

Emily Rae Aldridge, August 4, 2015

Sponsored by, publisher of the CyberOSINT monograph

Hire Watson As Your New Dietitian

August 4, 2015

IBM’s supercomputer Watson is being “trained” in various fields, such as healthcare, app creation, customer service relations, and creating brand new recipes.  The applications for Watson are possibly endless.  The supercomputer is combining its “skills” from healthcare and recipes by trying its hand at nutrition.  Welltok invented the CaféWell Health Optimization Platform, a PaaS that creates individualized healthcare plans, and it built Watson’s big data capabilities into its Healthy Dining CaféWell personal concierge app.  eWeek explains that “Welltok Takes IBM Watson Out To Dinner,” so it can offer clients personalized restaurant menu choices.

“ ‘Optimal nutrition is one of the most significant factors in preventing and reversing the majority of our nation’s health conditions, like diabetes, overweight and obesity, heart disease and stroke and Alzheimer’s,’ said Anita Jones-Mueller, president of Healthy Dining, in a statement. ‘Since most Americans eat away from home an average of five times each week and it can be almost impossible to know what to order at restaurants to meet specific health needs, it is very important that wellness and condition management programs empower smart dining out choices. We applaud Welltok’s leadership in providing a new dimension to healthy restaurant dining through its groundbreaking CaféWell Concierge app.’”

Restaurant menus are very vague when it comes to nutritional information.  A menu will state whether something is gluten-free, spicy, or a vegetarian option, but all other information is missing.  In order to find a restaurant’s nutritional information, you have to hit the Internet and conduct research.  A newly passed law will force restaurants to post calorie counts, but that will not include the amount of sugar, sodium, and other information.  People have been making poor eating choices, partially due to the lack of information.  If they know what they are eating, they can improve their health.  If Watson’s abilities can decrease the US’s waistline, it is for the better.  The bigger challenge would be to get people to use the information.

Whitney Grace, August 4, 2015


Facebook Wants You To Double Think About Using YouTube

August 4, 2015

Facebook does not like YouTube.  Facebook wants to encourage users to upload their videos to its network, rather than posting them on YouTube.  The Next Web shares how Facebook is trying to become major YouTube competition in “Facebook Throws Shade At YouTube When You Try To Paste A Link.”  How is Facebook doing this?  First, when a user tries to post a YouTube link, Facebook encourages users to upload to Facebook instead.  Most users do not want to upload to Facebook, because it does not offer the same posting options as YouTube.  Or does it?

Facebook has apparently upgraded how users can share their videos, including new features such as adding categories, sharing as an unlisted video, and disabling embedding.  One drawback is that this could increase the number of stolen videos.  Some users might upload a stolen video, claim it as theirs, and reap the benefits.  Facebook, however, does use Audible Magic to catch stolen copyrighted video.  A Facebook representative said:

“ ‘For years we’ve used the Audible Magic system to help prevent unauthorized video content. We also have reporting tools in place to allow content owners to report potential copyright infringement, and upon receiving a valid notice we remove unauthorized content. We also suspend accounts of people with repeated IP violations when appropriate.’”

Thievery of original content is an important factor Facebook needs to work on if it wishes to rival YouTube.  Popular YouTube celebrities and channels work hard to create original content, and YouTube is a proven, marketable network.  Facebook needs to offer competitive or better options to attract the big names, but for the average Facebook user, uploading a video directly to Facebook is a desirable option.

Whitney Grace, August 4, 2015

Poor IBM i2: 15 Year Old Company Makes Headlines in Fraud Detection and Big Blue Is Not Mentioned

August 3, 2015

Before IBM purchased i2 Ltd from an investment outfit, I did some work for Mike Hunter, one of the founders of i2 Ltd. i2 is not a household name. The fault lies not with i2’s technology; the fault lies at the feet of IBM.

A bit of history. Back in the 1990s, Hunter was working on an advanced degree in physics at Cambridge University. His undergraduate degree was from Manchester University. At about the same time, Michael Lynch, founder of Autonomy and DarkTrace, was a graduate of Cambridge and an early proponent of guided machine learning implemented in the Digital Reasoning Engine or DRE, an influential invention from Lynch’s pre Autonomy student research. Interesting product name: Digital Reasoning Engine. Lynch’s work was influential and triggered some me too approaches in the world of information access and content processing. Examples can be found in the original Fast Search & Transfer enterprise systems and in Recommind’s probabilistic approach, among others.

By 2001, i2 had placed its content processing and analytics systems in most of the NATO alliance countries. There were enough i2 Analyst Workbenches in Washington, DC to cause the Cambridge-based i2 to open an office in Arlington, Virginia.

In the mid 1990s, i2 delivered tools which allowed an analyst to identify people of interest, display relationships among these individuals, and drill down into underlying data to examine surveillance footage or look at text from documents (public and privileged).

IBM has i2 technology, and it also owns the Cybertap technology. The combination allows IBM to deploy for financial institutions a remarkable range of field proven, powerful tools. These tools are mature.

Due to the marketing expertise of IBM, a number of firms looked at what Hunter “invented” and concluded that there were whizzier ways to deliver certain functions. Palantir, for example, focused on Hollywood style visualization, Digital Reasoning emphasized entity extraction, and Haystax stressed insider threat functions. Today there are more than two dozen companies involved in what I call the Hunter-i2 market space.

Some of these have pushed in important new directions. Three examples of important innovators are: Diffeo, Recorded Future, and Terbium Labs. There are others which I can name, but I will not. You will have to wait until my new Dark Web study becomes available. (If you want to reserve a copy, send an email to benkent2020 at yahoo dot com. The book will run about 250 pages and cost about $100 when available as a PDF.)

The reason I mention i2 is because a recent Wall Street Journal article called “Spy Tools Come to Wall Street” (print edition for August 3, 2015) and “Spy Software Gets a Second Life on Wall Street” did not. That’s not a surprise because the Murdoch property defines “news” in an interesting way.

The write up profiles a company called Digital Reasoning, which was founded in 2000 by a clever lad from the University of Virginia. I am confident of the academic excellence of the university because my son graduated from this fine institution too.

Digital Reasoning is one of the firms engaged in cognitive computing. I am not sure what this means, but I know IBM is pushing the concept for its fascinating Watson technology, which can create recipes and cure cancer. I am not sure about generating a profit, but that’s another issue associated with the cognitive computing “revolution.”

I learned:

In pitching prospective clients, Digital Reasoning often shows a demonstration of how its system responded when it was fed 500,000 emails related to the Enron scandal made available by the Federal Energy Regulatory Commission. After being “taught” some key concepts about compliance, the Synthesys program identified dozens of suspicious emails in which participants were using language that suggested attempts to conceal or destroy information.

Interesting. I would suggest that the Digital Reasoning approach is 15 years old; that is, only marginally newer than the i2 system. Digital Reasoning lacks the functionality of Cybertap. Furthermore, companies like Diffeo, Recorded Future, and Terbium incorporate sophisticated predictive methods which operate in an environment of real time information flows. The idea is that looking at an archive is interesting and useful to an attorney or investigator looking backwards. However, the focus for many financial firms is on what is happening “now.”

The Wall Street Journal story reminds me of the third party descriptions of Autonomy’s mid 1990s technology. Those who fail to understand the quantity of content preparation and manual, subject matter expert effort required to obtain high value outputs are watching smoke, not investigating the fire.

For organizations looking for next generation technology which is and has been working for several years, one must push beyond the Palantir valuation and look to the value of innovative systems and methods.

For a starter, check out Diffeo, Recorded Future, and Terbium Labs. Please, push IBM to exert some effort to explain the i2-Cybertap capabilities. I tip my hat to the PR firm which may have synthesized some information for a story that is likely to make the investors’ hearts race this fine day.

Stephen E Arnold, August 3, 2015

My Refrigerator Door Shuts Automatically or Content Processing Vendor Works Hard at Repositioning

August 3, 2015

This weekend I checked out the flow of news from several dozen search and content processing vendors. What I discovered was surprising. For example, for the set of 36 vendors, there was zero substantive news about the companies’ information access technology. More disturbing were the hints of revenue difficulties; for example, New Zealand based SLI Systems, a publicly traded company, continues to lose money. Search and content processing sales challenges are forcing vendors to reposition themselves or align themselves with business trends which are more likely to have traction with senior managers.


How does a semantic technology company adapt? The approach is surprising, and it involves the Internet of Things. This is the push to put a Nest in your home and an Internet node in your appliances. One benefit is energy efficiency. The other idea is increased opportunities to push advertising to the hapless consumer who just wants to nuke a burrito in a microwave (smart or dumb microwave may not matter to a hungry teen).

I am not sure about your refrigerator. My double door General Electric refrigerator (what my grandmother called an “ice box” and some folks call a “fridge”) has doors which shut automatically. The refrigerator has an odd energy efficient sticker like the ones I remove from monitors which persist in going to sleep when my intelligence does not match the gizmo’s.

I understand that someday soon I will have a refrigerator with lots of intelligence. I am confident that with a few moments thought, I can kill that puppy’s brain.

In my narrow world, bounded by gun toting neighbors and dynamite crazed bridge builders, the Internet of Things or the somewhat odd acronym “IoT”, pronounced by my Spanish tutor “Eee ooooh tay”, will be a bit like Big Data, semantic search, natural language processing, artificial intelligence, and data lakes. The idea is that a search and content processing vendor can surf on a hot idea like fraud and pump some air into the sagging balloon labeled sales leads.

I am more convinced of this verbal magic each time I read about “new” technology from companies that are essentially vendors of look up functions applicable to information access.

The IoT is, in my opinion, more about getting information about a machine’s performance, the lessee’s adherence to maintenance schedules, and alerts about highly probable device failure.

One of my neighbors has a Mercedes which beeps, vibrates, and flashes when my neighbor strays across the white lines on the highway. Annoying but semi useful. The Mercedes also can phone home if my neighbor’s big expensive SUV experiences a malfunction. Useful. Maybe annoying if the malfunction occurs when the SUV is parked in front of the local Neiman Marcus or Goodwill store.

I read “Content Analysis and the Internet of Things: Never Leave the Fridge Door Open Again?” The main point of the write up is the question which I already answered. My refrigerator automatically shuts its door.

The article states:

The Internet of Things is the expanding network of physical objects that collect information, communicate and sense or interact with their internal states or the external environment according to Gartner, which reports that there will be nearly 26 billion devices on the Internet of Things by 2020.

Ah, yes, the mid tier firm Gartner, an excellent source of objective, unbiased, inclusion free information.

Here’s the article’s keeper passage I noted from a senior manager at a content processing company. Keep that phrase in mind: “content processing.”

With the common method of interaction, we will speak, devices will read, the design will be predicated upon our needs and less so upon the device. The trend seems so simple—for us to understand these devices, the devices must understand us. The difference is meaning. Data is an abstraction, understanding is communication, and to understand and communicate one must know meaning.

I am delighted that data have meaning. I just wonder how much of a stretch it is to apply text centric methods to outputs from an industrial machine connected to the Internet via an iGear service. My hunch is, “Not too much.”

To me the phrase “content processing” means words, not data output from my neighbor’s flashy Mercedes or an Internet enabled refrigerator.

As I said, my refrigerator door closes automatically. Do I want anyone to know that I let the hinges do the work?

Stephen E Arnold, August 3, 2015

Data Science, Senior Managers, and the Ever Interesting Notion of Truth

August 3, 2015

I read “Data Scientists to CEOs: You Can’t Handle the Truth.” I enjoy write ups about data science which start off with the notion of truth. I know that the “truth” referenced is the outputs of analytics systems.

Call me skeptical. If the underlying data are not normalized, validated, and timely, the likelihood of truth becomes even murkier than it was in my college philosophy class. Roger Ailes allegedly said:

Truth is whatever people will believe.

Toss in the criticism of a senior manager who in the US is probably a lawyer or an accountant, and you have a foul brew. Why would a manager charged with hitting quarterly targets or generating enough money to meet payroll quiver with excitement when a data scientist presents “truth”?

There is that pesky perception thing. There are frames of reference. There are subjective factors in play. Think of the dentist who killed Cecil. I am not sure data science will solve his business and personal challenges. Do you?

The write up is a silly fan rant for the fuzzy discipline of data science. Data science does not pivot on good old statisticians with their love of SAS and SPSS, fancy math, and 17th century notions of what constitutes a valid data set. Nope.

The data scientist has to communicate the known unknowns to his or her CEO. Shades of Rumsfeld. Does today’s CEO want to know more about the uncertainty in the business? The answer is, “Maybe.” But senior managers often get information that is filtered, shaped, and presented to create an illusion. Shattering those illusions can have some negative career consequences even for data scientists, assuming there is such a discipline as data science.

Evoking the truth from statistical processes which are output from a system configured by others can be interesting. Those threshold settings are not theoretical. Those settings determine what the outputs are and what they are “about.”
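
The point about threshold settings can be illustrated with a minimal sketch. The record names, scores, and cutoff values below are invented for illustration and do not come from any vendor’s system; the only assumption is a system that flags an item when its score meets a configured threshold.

```python
# Hypothetical scores assigned by an analytics system (invented numbers).
scores = {
    "invoice_17": 0.42,
    "invoice_23": 0.58,
    "invoice_31": 0.71,
    "invoice_44": 0.93,
}


def flag(scores, threshold):
    """Return the sorted names whose score meets or exceeds the threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)


# The data never change; only the configured cutoff does.
for threshold in (0.5, 0.7, 0.9):
    print(threshold, flag(scores, threshold))
```

The same four records yield three different lists of “suspicious” items depending on a setting that someone else may have picked, which is one reason the output deserves a question or two before it is presented as truth.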

Connecting an automated output to something that the data scientist asserts should be changed strikes me as somewhat parental. How does that work on a manager like Dick Cheney? How does that work on the manager of a volunteer committee working on a parent teacher luncheon?

I thought the Jack Benny program from the 1930s to 1960s was amusing. Some of the output about data science suggests that comedy may be a more welcoming profession than management based on truth from data science. Truth and statistics. Amazing comedy.

Stephen E Arnold, August 3, 2015

Sorry, Experts. NLP and Semantic Technology Will Guarantee Higher Precision and Recall

August 3, 2015

I read “5 Reasons for Developers to Build NLP and Semantic Search Skills.” It is one of those bait and switch write ups. The title suggests that NLP and semantic search are “skills.” The content of the article presents, without factual substantiation, assertions about the differences between Web search and enterprise search. The reality is that both are more closely related than they appear to some “experts.” Neither works particularly well for reasons which have to do with cost control, system management, and focus. The technology is, from my point of view, more stable than some search mavens believe.

Here’s the passage I highlighted in pale mauve because I did not have purple:

It at times feels magical that Search engines know, with unbelievable accuracy, exactly what you are looking for. This is the result of a heavy investment in NLP and Semantic technologies. These, along with speech-recognition, have the potential of enabling a future where search will transform into a smart machine that uses “connected knowledge” to answer significantly complex questions – a Star Trek Computer may not be too far away after all, if Amit Singhal – brain behind Google’s search engine evolution, has to be believed.

More remarkable was the introduction of the phrase “big, unstructured data.” I also found the notion of “commoditization” of data science amusing.

One idea warrants comment. The article calls attention to the “widening gap between enterprise search platforms and general purpose search engines.” Anyone who has attempted to index Web content quickly learns that it is a fruit basket which is in the process of being shoved into a blender. The notion of the enterprise search system was to process the content normally found inside an organization. But guess what? After the first query run on a restricted domain of content, the user says, “I need access to Internet content.” The “gap” is one of perception. The underlying components of the system and much of the gee whiz technology are similar. The fact that the Web search systems have been shaped to handle a restricted body of content is lost on some folks. Similarly the enterprise search systems are struggling because they, like Web search engines, cannot handle efficiently and automatically certain types of content. In short, neither works particularly well.

Will NLP and semantic skills help a developer? Not too much if the search system is not focused, the content is not reliable, and the functions are poorly defined. Forget big data, little data, and unstructured or structured data. Get the basics wrong and one has a lousy search system, which, sadly, is more common than not.

Stephen E Arnold, August 3, 2015

Bodleian Library Gets Image Search

August 3, 2015

There is a lot of free information on the Internet, but the veracity is always in question.  While libraries are still the gateway of knowledge, many of their rarer, more historic works are buried in archives.  These collections offer a wealth of information that is often very interesting.  The biggest problem is that libraries often lack the funds to scan archival collections and create a digital library.  Oxford University’s Bodleian Library, one of the oldest libraries in Europe, has the benefit of funds and an excellent collection to share with the world.

Digital Bodleian boasts 115,179 images as of this writing and states that it is constantly updating the collection.  The online library takes a modern approach to how users interact with the images by taking tips from social media.  Not only can users browse and search the images randomly or in the pre-sorted collections, they can also create their own custom libraries and share them with friends.

It is a bold move for a library, especially for one as renowned as Bodleian, to embrace a digital collection as well as offer a social media-like service.  In my experience, digital library collections are bogged down by copyright and incomplete indices or ontologies, and they lack images to pique a user’s interest.  Digital Bodleian is the opposite of many of its sister archives, but another thing I have noticed is that users are not too keen on joining a library social media site.  It means having to sign up for yet another service, and their friends probably aren’t on it.

Here is an idea: how about a historical social media site similar to Pinterest that pulls records from official library archives?  It would offer the ability to see the actual items, verify information, and even yield those clickbait top ten lists.

Whitney Grace, August 3, 2015
