Computerworld Becomes ComputerWatson

October 22, 2016

I followed a series of links to three articles about IBM Watson. Here are the stories I read:

The publication running these three write ups is Computerworld, which I translated as “ComputerWatson.”

Intrigued by the notion of “news,” I learned:

Watson uses some 50 technologies today, tapping artificial-intelligence techniques such as machine learning, deep learning, neural networks, natural language processing, computer vision, speech recognition and sentiment analysis.

But IBM does not like the term “artificial intelligence,” even though I have spotted its preferred synonyms in other “real” news write ups; for example, “augmented intelligence.”

There are factoids like “Watson can read more than 800 pages a second.” Figure 125 words per “page” and that works out to 100,000 words per second, which is a nice round number. Does Watson perform this magic on a basic laptop? Probably not. What are the bandwidth and storage requirements? Oh, not a peep.
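
For the curious, the arithmetic is easy enough to check. The words-per-page and bytes-per-word figures below are my assumptions, not IBM’s:

```python
# Back-of-the-envelope check of the "800 pages a second" factoid.
# 125 words per page and 6 bytes per word are my assumptions, not IBM's.
pages_per_second = 800
words_per_page = 125
bytes_per_word = 6  # roughly five characters plus a space

words_per_second = pages_per_second * words_per_page
raw_text_mb_per_second = words_per_second * bytes_per_word / 1_000_000

print(f"{words_per_second:,} words per second")           # 100,000
print(f"~{raw_text_mb_per_second:.1f} MB/s of raw text")  # ~0.6 MB/s
```

The raw text stream is modest; the open questions are the compute, memory, and corpus storage behind the parsing, which is exactly where the write up goes quiet.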

Computerworld—I mean ComputerWatson—provides a complete timeline of the technology too. The future begins in 1997. Imagine that. Boom. Deep Blue wins at chess, and the Watson story begins.

The “history” of Watson is embellished with a fanciful account of how IBM trained the system, with humans assembling the information. How much does corpus assembly cost? ComputerWatson—oh, I meant “Computerworld”—does not dive into the investment.

To make Watson’s inner workings clear, the “real” news write up provides a link to an IBM video. Here’s an example of the cartoonish presentation:


These three write ups strike me as a public relations exercise. If IBM paid Computerworld to write and run these stories, the three articles are advertising. Who wrote these “news stories”? The byline is Katherine Noyes, who describes herself as “an ardent geek.” Her beat? Enterprise software, cloud computing, big data, analytics, and artificial intelligence.

Remarkable stuff but I had several thoughts:

  1. Not much “news” is included in the articles. It seems to me that the information has appeared in other writings.
  2. IBM Watson is working overtime to be recognized as the leader in the smart software game. That’s okay, but IBM seems to be pushing big markets with no easy way to monetize its efforts; for example, education, cancer, and game show participation.
  3. The Computerworld IBM Watson content party strikes me as eroding the credibility of both outfits.

Oh, I remember. Dave Schubmehl, the fellow who tried to sell reports containing my research on Amazon without my consent, was hooked up with IDG. I have lost track of the wizard, but I do recall the connection. More information is here.

Yep, credibility for possible content marketing and possible presentation of “news” as marketing collateral. Fascinating. Perhaps I should ask Watson: “What’s up?”

Stephen E Arnold, October 22, 2016

Google Finds That Times Change: Privacy Redefined

October 21, 2016

I read “Google Has Quietly Dropped Ban on Personally Identifiable Web Tracking.” The main idea is that an individual can be mapped to just about anything in the Google-verse. The write up points out that in 2007, one of the chief Googlers said that privacy was a “number one priority when we [the Google] contemplate new kinds of advertising products.”

That was before Facebook saddled up with former Googlers (aka Xooglers) and started to ride the ad pony, harvest detailed user information, and feed the interstellar beast of user generated content. Googlers knew that social was a big deal, probably more important than offering Boolean operators and time stamp metadata to users of its index. But that was then, and this is now.

The write up reveals:

But this summer, Google quietly erased that last privacy line in the sand – literally crossing out the lines in its privacy policy that promised to keep the two pots of data separate by default. In its place, Google substituted new language that says browsing habits “may be” combined with what the company learns from the use of Gmail and other tools. The change is enabled by default for new Google accounts. Existing users were prompted to opt-in to the change this summer.

I must admit that when I saw the information, I ignored it. I don’t use too many Google services, and I am not one of the cats in the bag that Google is carrying to and fro. I am old (73), happy with my BlackBerry, and I don’t use mobile search. But the shift is an important part of the “new” Alphabet Google thing.

Tracking users 24×7 is the new black in Sillycon Valley. The yip yap about privacy, ethics, and making explicit what data are gathered is noise. Buy a new Pixel phone and live the dream, gentle reader.

You can work through the story cited above for more details. My thoughts went a slightly different direction:

  1. Facebook poses a significant challenge to Google, and today Google does not have a viable social option to offer its users.
  2. The shift to mobile means that Google has to — note the phrase “has to” — find a way to juice up ad revenues. Sure, those revenues are okay today, but to keep the Loon balloons aloft, more dough is needed.
  3. Higher value data boils down to detailed information about specific users, their cohorts, their affinity groups, and their behaviors. As the economy continues to struggle, the Alphabet Google thing will have data to buttress the Google ad sales professionals’ pitches to customers.
  4. Offering nifty data to nation states like China may allow Google to enter new markets with the Pixel and mobile search as Trojan horses.

In my monograph “Google Version 2.0: The Calculating Predator,” I described some of the technical underpinnings of Google’s acquisitions and inventions. With more data, the value of these innovations may begin to pay off. If the money does not flow, Google Version 3.0 may be a reprise of the agonies of the Yahooligans. Those Guha and Halevy “inventions” are fascinating in their scope and capabilities. Think about an email for which one can know who wrote it, who received it, who read it, who changed it, what the changes were, who the downstream recipients were, and other assorted informational gems.
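
To make that email example concrete, here is a hypothetical sketch of the kind of provenance record such a system would have to keep. The field names are mine, for illustration; they are not taken from the Guha or Halevy patent documents:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Hypothetical provenance record for one email message. The field names are
# illustrative only; they are not drawn from the Google patent filings.

@dataclass
class EditEvent:
    editor: str
    timestamp: datetime
    summary: str                 # what changed, in human readable form

@dataclass
class EmailProvenance:
    message_id: str
    author: str                  # who wrote it
    recipients: List[str]        # who received it
    read_by: List[str] = field(default_factory=list)       # who read it
    edits: List[EditEvent] = field(default_factory=list)   # who changed it, and what
    forwarded_to: List[str] = field(default_factory=list)  # downstream recipients
```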

Allow me to leave you with a single question:

Do you think the Alphabet Google thing was not collecting fine grained data prior to the official announcement?

Although out of print, I have a pre publication copy of the Google 2.0 monograph available as a PDF. If you want a copy, write my intrepid sales manager, Ben Kent at benkent2020 at yahoo dot com. Yep, Yahoo. Inept as it may be, Yahoo is not the GOOG. The Facebook, however, remains the Facebook, and that’s one of Google’s irritants.

Stephen E Arnold, October 21, 2016

The Thrill of Rising Yahoo Traffic

October 21, 2016

I love the Gray Lady. The Bits column is chock full of technology items which inspire, excite, and sometimes implant silly ideas in readers’ minds. That’s real journalism.

Navigate to “Daily Report: Explaining Yahoo’s Unexpected Rise in Traffic.”

The write up pivots on the idea that Internet traffic can be monitored in a way that is accurate and makes sense. A click is a click. A packet is a packet. Makes sense. Then there are the “minor” points of figuring out which clicks are from humans and which clicks are from automated scripts performing some function like probing for soft spots. There are outfits which generate clicks for various reasons, including running down a company’s advertising “checkbook.” There are clicks which ask such questions as, “Are you alive?” or “What’s the response time?” You get the idea because you have a bit of doubt about traffic generated by a landing page, a Web site, or even an ad. The counting thing is difficult.
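
A toy illustration of why the counting is difficult: any traffic report has to guess which log entries are people. The heuristics below are my own naive rules, not anyone’s production logic:

```python
# Toy click-log filter: guess which hits are human. Deliberately naive;
# real traffic measurement is much murkier than this.
BOT_HINTS = ("bot", "crawler", "spider", "monitor", "pingdom")

clicks = [
    {"ip": "203.0.113.7", "user_agent": "Mozilla/5.0 (Windows NT 10.0)", "path": "/story"},
    {"ip": "198.51.100.2", "user_agent": "UptimeBot/2.1 (health check)", "path": "/"},
    {"ip": "192.0.2.9", "user_agent": "python-requests/2.31", "path": "/story"},
]

def looks_human(click):
    ua = click["user_agent"].lower()
    if any(hint in ua for hint in BOT_HINTS):
        return False              # self-identified monitors and crawlers
    if "python-requests" in ua or "curl" in ua:
        return False              # scripts that do not bother to pretend
    return True                   # everything else we *hope* is a person

human_clicks = [c for c in clicks if looks_human(c)]
print(f"{len(human_clicks)} of {len(clicks)} clicks look human")  # 1 of 3
```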

The write up in the Gray Lady assumes that these “minor” points are irrelevant in the Yahoo scheme of things; for example:

an increased number of people were drawn to Yahoo in September. The reason may have been Yahoo’s disclosure that month that hackers stole data on 500 million users in 2014.

“People”? How do we know that the traffic is people?

The Gray Lady states:

Yahoo’s traffic has been declining for a long time, overtaken by more adept, varied and apparently secure places to stay on the internet.

Let’s think about this. We don’t know if the traffic data are counting humans, software scripts, or utility functions. We do know that Yahoo has been on a glide path to a green field without rocks and ruts. We know that Yahoo is a bit of a hoot in terms of management.

My hunch is that Yahoo’s traffic is pretty much what it has been; that is, oscillating a bit but heading in for a landing, either hard or soft.

Suggesting that Yahoo may be growing is interesting but unfounded. That traffic stuff is mushy. What’s the traffic to the New York Times’s paywalled subsite? How does the Times know that a click is a human from a “partner” and not a third party scraping content?

And maybe the traffic spike is a result of disenchanted Yahoo users logging in to change their password or cancel their accounts.

Stephen E Arnold, October 21, 2016

Twitter: A Security Breach

October 21, 2016

Several years ago, the Beyond Search Twitter account was compromised. I received emails about tweets relating to a pop singer named Miley Cyrus. We knew the Twitter CTO at the time, and it took about 10 days to fix the problem. Even then, I knew that Twitter had an issue.

I read “Passwords for 32 Million Twitter Accounts May Have Been Hacked and Leaked.” I learned:

the data comes from a Twitter hack in which 32 million Twitter accounts may have been compromised. The incident and the news comes from a rather unusual source that lets you download such data and even lets you remove yourself from the listing for free.

No word about how many days will be consumed addressing affected accounts.

Stephen E Arnold, October 21, 2016

Picking Away at Predictive Programs

October 21, 2016

I read “Predicting Terrorism From Big Data Challenges U.S. Intelligence.” I assume that Bloomberg knows that Thomson Reuters licenses the Palantir Technologies Metropolitan suite to provide certain information to Thomson Reuters’ customers. Nevertheless, I was surprised at some of the information presented in this “real” journalism write up.

The main point is that numerical recipes cannot predict what, when, where, why, and how bad actors will do bad things. Excluding financial fraud, which seems to be a fertile field for wrongdoing, the article chases the terrorist angle.

I learned:

  • “Connect the dots” is a popular phrase, but connecting the dots to create a meaningful picture of bad actors’ future actions is tough
  • Big data is a “fundamental fuel”
  • Intel, PredPol, and Global Intellectual Property Enforcement Center are working in the field of “predictive policing”
  • The buzzword “total information awareness” is once again okay to use in public

I highlighted this passage attributed to a big thinker at the Brennan Center for Justice at NYU School of Law:

Computer algorithms also fail to understand the context of data, such as whether someone commenting on social media is joking or serious,

Several observations:

  • Not a single peep about Google Deep Mind and Recorded Future, outfits which I consider the leaders in the predictive ball game
  • Not a hint that Bloomberg was itself late to the party because Thomson Reuters, not exactly an innovation speed demon, saw value in Palantir’s methods
  • Not much about what “predictive technology” does.

In short, the write up delivers a modest payload in my opinion. I predict that more work will be needed to explain the interaction of math, data, and law enforcement. I don’t think a five minute segment with talking heads on Bloomberg TV will do it.

Stephen E Arnold, October 21, 2016

Falcon Searches Through Browser History

October 21, 2016

Have you ever visited a Web site and then lost the address or could not find a particular section on it? You know that the page exists, but no matter how often you use an advanced search feature or scour your browser history, it cannot be found. If you use Google Chrome as your main browser, then there is a solution, says GHacks in the article, “Falcon: Full-Text history Search For Chrome.”

Falcon is a Google Chrome extension that adds full-text history search to the browser. Chrome usually suggests previously visited Web sites by matching what you type in the address bar against their titles and URLs. The Falcon extension augments the default behavior to also match text found on previously visited Web sites.

Falcon is a search option within a search feature:

The main advantage of Falcon over Chrome’s default way of returning results is that it may provide you with better results.  If the title or URL of a page don’t contain the keyword you entered in the address bar, it won’t be displayed by Chrome as a suggestion even if the page is full of that keyword. With Falcon, that page may be returned as well in the suggestions.

The new Chrome extension acts as a filter on recorded Web history and improves a user’s search experience so they do not have to sift through results one by one.
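
The difference between the omnibox’s default matching and full-text matching is easy to see in a toy example. This is conceptual Python, not the extension’s actual JavaScript:

```python
# Toy comparison: title/URL matching (what the omnibox does by default)
# versus full-text matching over stored page content. Conceptual only.
history = [
    {"url": "https://example.com/announcement",
     "title": "Company news",
     "text": "We are pleased to announce our new semantic search feature."},
    {"url": "https://example.com/search-tips",
     "title": "Search tips",
     "text": "Use quotes for exact phrases."},
]

def omnibox_match(query, entries):
    q = query.lower()
    return [e for e in entries if q in e["title"].lower() or q in e["url"].lower()]

def fulltext_match(query, entries):
    q = query.lower()
    return [e for e in entries if q in e["text"].lower()]

print(len(omnibox_match("semantic", history)))   # 0 -- not in any title or URL
print(len(fulltext_match("semantic", history)))  # 1 -- found in the page text
```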

Whitney Grace, October 21, 2016
Sponsored by, publisher of the CyberOSINT monograph


Data Silos: Here to Stay

October 20, 2016

Data silos have become a permanent part of the landscape. Even if data reside in a cloud, some data are okay for certain people to access. Other data are off limits. Whether the data silo is a result of access controls or because an enthusiastic marketer has a one off storage device in his or her cubicle’s desk drawer, we have silos.

I read “Battling Data Silos: 3 Tips to Finance and Operations Integration.” This is a very good example of providing advice which is impossible to implement. If I were to use the three precepts in an engagement, I have a hunch that a barrel of tar and some goose feathers will be next to my horse and buggy.

What are the “tips”? Here you go.

  1. Conduct a data discovery audit.
  2. Develop a plan.
  3. And my fave: “Realize the value of the cloud for high performance and scalability.”

Here we go, gentle reader.

The cost of a data discovery audit can be high. The cost of the time, effort, and lost productivity means that most data audits are limp wash rags. Few folks in an organization know what data are where, who manages those data, and the limits placed on the data. Figuring out the answers to these questions in a company with 25 people is tough. Try to do it for a government agency with dozens of locations and hundreds of staff and contractors. Automated audits can be a help, but there may be unforeseen consequences of sniffing who has what. The likelihood of a high value data discovery audit without considerable preparation, budgeting, and planning is zero. Most data audits, like software audits, never reach the finish line without a trip to the emergency room.
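
To be fair, automated audits can help with the easy part. Here is a minimal sketch of an inventory pass over a single file share; the path and output file are placeholders. The hard and expensive part is interpreting the rows, chasing down owners, and deciding what the data are allowed to do:

```python
import csv
import os
from pathlib import Path

# Naive inventory pass over one file share: path, size, and modification time.
# A real audit also needs owners, access controls, retention rules, and
# interviews, which is where the cost lives.
def inventory(root: str, out_csv: str) -> int:
    rows = 0
    with open(out_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "size_bytes", "modified"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = Path(dirpath) / name
                try:
                    info = path.stat()
                except OSError:
                    continue  # unreadable file: yet another silo signal
                writer.writerow([str(path), info.st_size, info.st_mtime])
                rows += 1
    return rows

# Example (placeholder path): inventory("/mnt/finance_share", "audit.csv")
```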

The notion of a plan for consolidating data is okay. Folks love meetings with coffee and food. A plan allows a professional to demonstrate that work has been accomplished. The challenge, of course, is to implement the plan. That’s another kettle of fish entirely. MBA think does not deliver much progress toward eliminating silos which proliferate like tweets about zombies.

The third point is value. Yep, value. What is value? I don’t know. Cloud value can be demonstrated for specific situations. But the thought of migrating data to a cloud and then making sure that no regulatory, legal, or common sense problems crop up is a work in progress. Data management, content controls, and security tasks nudge cloud functions toward one approach: Yet another data silo.

Yep, YADS. Three breezy notions crater due to the gravitational pull of segmented content repositories under the control of folks who absolutely love silos.

Stephen E Arnold, October 20, 2016

Semantiro and Ontocuro Basic

October 20, 2016

Quick update from the Australian content processing vendor SSAP or Semantic Software Asia Pacific Limited. The company’s Semantiro platform now supports the new Ontocuro tool.

Semantiro is a platform which “promises the ability to enrich the semantics of data collected from disparate data sources, and enables a computer to understand its context and meaning,” according to “Semantic Software Announces Artificial Intelligence Offering.”

I learned:

Ontocuro is the first suite of core components to be released under the Semantiro platform. These bespoke components will allow users to safely prune unwanted concepts and axioms; validate existing, new or refined ontologies; and import, store and share these ontologies via the Library.

The company’s approach is to leapfrog the complex interfaces other indexing and data tagging tools impose on the user. The company’s Web site for Ontocuro is at this link.
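
“Safely prune unwanted concepts and axioms” sounds exotic, but in toy form the idea is just removing a node and every statement that mentions it. The sketch below is my illustration, not SSAP’s code:

```python
# Toy ontology pruning: drop a concept and every axiom that mentions it.
# Illustration only; this is not how Ontocuro is implemented.
concepts = {"Camera", "Lens", "Tripod", "UnwantedConcept"}
axioms = [
    ("Camera", "hasPart", "Lens"),
    ("UnwantedConcept", "relatedTo", "Camera"),
    ("Tripod", "supports", "Camera"),
]

def prune(concept, concepts, axioms):
    kept_concepts = concepts - {concept}
    kept_axioms = [axiom for axiom in axioms if concept not in axiom]
    return kept_concepts, kept_axioms

concepts, axioms = prune("UnwantedConcept", concepts, axioms)
print(len(concepts), "concepts and", len(axioms), "axioms remain")  # 3 and 2
```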

Stephen E Arnold, October 20, 2016

Google and the Mobile Traffic Matter

October 20, 2016

I read a couple of write ups about “Google May Be Stealing Your Mobile Traffic.” Quite surprisingly, there was a response to these “stealing” articles from Google. You can read the explanation in a comment by Malte Ubl in the original article (link here).

I noted these comments in the response to the stealing article:

  • Mr. Ubl says, “‘stealing traffic’ is literally the opposite of what AMP is for.”
  • Mr. Ubl says, “there are audience measurement platforms that attribute traffic to publishers. They might in theory wrongly attribute AMP traffic to the AMP Cache (not Google) rather than to a publisher because they primarily use referrer information. That is why we worked with them in worldwide outreach to get this corrected (where it was a problem), so that traffic is correctly attributed to the publisher. If this is still a problem anywhere, AMP treats it as a highest priority to get it resolved.”
  • Mr. Ubl says, “AMP supports over 60 ad networks (2 of them are owned by Google) with 2-3 coming on board every week and makes absolutely no change to business terms whatsoever. There is no special revenue share for AMP.”
  • Mr. Ubl says, “The Android users might have already noticed that it is now scrolling out of the way and the same is coming soon for iOS (we’re just fighting a few jank issues in Safari).”

AMP is, therefore, not stealing traffic.
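
Mr. Ubl’s point about referrer-based attribution is the interesting one. A minimal sketch of how a measurement platform keyed on the referrer host would credit the AMP Cache instead of the publisher, and how the publisher can be recovered from the cache URL, might look like this. The hostnames are illustrative, and real AMP Cache URLs have more variations than this handles:

```python
from urllib.parse import urlparse

# Toy referrer-based attribution. A platform keyed on the referrer host credits
# the AMP cache, not the publisher, unless the cache URL is mapped back.
def attribute(referrer: str) -> str:
    parsed = urlparse(referrer)
    if parsed.netloc.endswith("cdn.ampproject.org"):
        # Cache URLs embed the publisher path, e.g.
        # https://publisher-example-com.cdn.ampproject.org/c/s/publisher.example.com/story
        parts = [p for p in parsed.path.split("/") if p]
        if "s" in parts:
            return parts[parts.index("s") + 1]  # publisher domain after /s/
    return parsed.netloc

print(attribute("https://publisher.example.com/story"))
print(attribute("https://publisher-example-com.cdn.ampproject.org/c/s/publisher.example.com/story"))
```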

I went back to my 2007 monograph “Google Version 2.0: The Calculating Predator,” and pulled out this diagram from a decade ago:

[Diagram: Google as container, 2007]

The user interacts with the Google, not the Internet, for certain types of content. The filtering is far from perfect, but it is an attempt to gain control over the who, what, why, when, and where of information access and delivery. © Stephen E Arnold, 2007, All rights reserved.

I offer this diagram as a way to summarize my understanding of the architecture which Google had spelled out in its patent documents and open source technical documents. (Yep, the GOOG did pay me a small amount of money, but that is supposed to be something you cannot know.) However, my studies of Google — The Google Legacy, Google Version 2.0: The Calculating Predator, and Google: The Digital Gutenberg — were written with open source content only.

Now back to the diagram. My research suggested that Google, like Facebook, envisioned that it would be the “Internet” for most people. In order to reduce latency and derive maximum efficiency from its global infrastructure, users would interact with Google via services like search. The content or information would be delivered from Google’s servers. In its simplest form, there is a Google cache which serves content. The company understood the cost of passing every query back to data centers, running each query, and then serving the content. Common sense said, “Hey, let’s store this stuff and knock out unnecessary queries.” In a more sophisticated form, the inventions of Ramanathan Guha and others illustrated a system and method for creating a sliced-and-diced archive of factoids. A user query for digital cameras would be handled by pulling factoids from a semantic database. (I am simplifying here.)
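
In its crudest form, the cache-and-serve idea is nothing more exotic than answering from a local factoid store instead of fetching the source. This is my simplification for illustration, not Google’s implementation:

```python
# Toy version of answering a query from a cached factoid store instead of
# sending the user to the source pages. My simplification, not Google's code.
factoid_store = {
    ("digital camera", "battery life"): {
        "answer": "Placeholder factoid for illustration.",
        "source": "https://example.com/camera-reviews",
    },
}

def answer(entity: str, attribute: str):
    fact = factoid_store.get((entity, attribute))
    if fact is not None:
        return fact["answer"]  # served from the cache; no trip to the source
    return None                # cache miss: fall back to a full query

print(answer("digital camera", "battery life"))
```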

In one of my unpublished presentations, I show a mobile phone user interacting with Google’s caches in order to eliminate the need to send the user to the source of the factoid.

Perhaps I misunderstood the technical information my researchers and I analyzed.

I don’t think Google is doing anything different today. The “stealing” idea comes from a person who finally takes a look at how the Google systems maximize efficiency and control the users. In order to sell ads, Google has to know who does what, when, where, and under what circumstances.

Today’s Google is now a legacy system. I know this is heretical, but Google is not a search company. The firm is using its legacy platform to deliver revenue and maximize that revenue. Facebook (which has lots of Xooglers running around) is doing essentially the same thing but with plumbing variations.

I am probably wildly out of step with youthful Googlers and the zippy mobile AMPers. But from my vantage point, Google has been delivering a closed garden solution for a long time.

My Google trilogy is now out of print. I can provide a fair copy with some production glitches for $250. If you are interested, write my intrepid marketer, Benny Kent at

Stephen E Arnold, October 20, 2016

Multiple Vendors Form Alliance to Share Threat Intelligence

October 20, 2016

In order to tackle increasing instances of digital security threats, multiple threat intelligence vendors have formed an alliance that will share the intelligence gathered by each of them.

An article that appeared on Network World, titled “Recorded Future aligns with other threat intelligence vendors,” states:

With the Omni Intelligence Partner Network, businesses that are customers of both Recorded Future and participating partners can import threat intelligence gathered by the partners and display it within Intelligence Cards that are one interface within Recorded Future’s platform

Apart from the intelligence itself, the consortium will also share IP addresses that may be the origin points of potential threats. Led by Recorded Future, the other members of the alliance include FireEye iSIGHT, Resilient Systems, and Palo Alto Networks.
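
Mechanically, sharing indicators across partners can be as plain as merging feeds and remembering who reported what. The feed contents below are invented for illustration; they are not from any vendor’s product:

```python
from collections import defaultdict

# Toy merge of IP indicators from partner feeds. The feed contents are invented
# for illustration; they are not from any vendor's product.
feeds = {
    "partner_a": ["203.0.113.7", "198.51.100.22"],
    "partner_b": ["203.0.113.7", "192.0.2.45"],
}

reported_by = defaultdict(set)
for partner, addresses in feeds.items():
    for ip in addresses:
        reported_by[ip].add(partner)

# An address reported by more than one partner is a stronger signal.
for ip, partners in sorted(reported_by.items()):
    print(ip, "reported by", ", ".join(sorted(partners)))
```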

We had earlier suggested the formation of an inter-governmental alliance that could be utilized for sharing incident reports in a seamless manner. The premise was:

Intelligence gathered from unstructured data on the Internet such as security blogs that might shed light on threats that haven’t been caught yet in structured-data feeds

The advent of the Internet of Things (IoT) will exacerbate the problems for the connected world. Will the Omni Intelligence Partner Network succeed in preempting those threats?

Vishal Ingole, October 20, 2016
Sponsored by, publisher of the CyberOSINT monograph

