Silicon Valley and the Butterflies
September 24, 2017
I read “The Tide Is Starting to Turn Against the World’s Digital Giants.” The idea is that those butterfly wings in Brazil can whip up Irma in Miami. Maybe? Maybe not? “Real” journalists have been paddling their canoes away from a whirlpool for decades.
Their efforts, like those of others sucked into the digital maws of the evil “Internet,” have not gone well. The Guardian newspaper, itself whipped by one digital transformation after another, wants the old order restored. The idea is that “real” journalists and other “intermediaries” were gatekeepers. Now the distributed technologies have replaced the “old” gatekeepers with “new” gatekeepers. Oh, the “real” journalists sigh, “We want to be Facebook. We were, you know.”
The write up explains that the information satans are about to get their comeuppance. About time, I think the write up suggests.
I noted these comments in the “real” news write up:
[a] Multimillion fines are just the start for Facebook and Google, as the world comes to realize how political big tech has become [a piñata? Satan’s right hand? the destroyer of “real” news? Sentence fragments invite completion even when they appear in the Guardian newspaper]
[b] What’s more interesting are various straws in the wind that show how digital behemoths are losing their shine. Many of these relate to Brexit and the election of Donald Trump, and to the dawning of a realization that Google and Facebook in particular may have played some role in these political earthquakes.
[c] What we’ve come to understand over the last two years is that, to coin a slogan, the technical is political.
I find it interesting that the intellectual touchstone for the write up is not the history book but Buzzfeed. Yep, Buzzfeed, a love child of the Guardian in spirit perhaps?
Ah, “real” journalism. Bashing successful companies is effective with a digital information service as one’s inspiration.
Stephen E Arnold, September 24, 2017
Security: Whom Does One Trust?
September 19, 2017
I read “The Market Can’t – and Won’t – Deal with IT Security, It Must Be Regulated, Argues Bruce Schneier.” The write up is about online, which is of interest to me. I found the summary of the remarks of Bruce Schneier, a security expert, interesting.
The main point is that government must regulate security. I highlighted this passage:
“The market can’t fix this. Markets work because buyers choose between sellers, and sellers compete for buyers. In case you didn’t notice, you’re not Equifax’s customer. You’re its product.”
Several questions occurred to me:
- Which government? Maybe the United Nations?
- What’s the enforcement mechanism? Is after-the-fact “punishment” feasible?
- What’s the end point of security regulation?
Here in rural Kentucky security boils down to keeping an eye on the two brothers who live in a broken down trailer next to the crazy people who have a collection of wild animals. The wild animals are less threatening than these fine examples of Appalachian oak.
In the larger world, which includes a number of nation states which are difficult to influence, how are the regulations to be enforced? What if one of these frisky nation states is behind the headline making security breaches?
Answers to this question are likely to be cause for discussion. Talk is easy. Remediation may be a bit more difficult. Perhaps the barn has burned and the horses already converted to glue and dog food?
Fixes are hard. Talk, well, just talk.
Stephen E Arnold, September 19, 2017
Bing and Google: The News Battle
September 15, 2017
I read “Bing Battles Google News with Its Own Make-Over.” I noted the alliteration: Bing battle. I immediately thought, “Google Gropes.” Both of these companies are trying to reinvent the newspaper using zeros and ones, not dead trees. Let’s look at some of the points I highlighted:
I noted this statement about everyone’s most lovable online ad vendor:
Google redesigned their desktop Google News website. Their [sic] new UI has a clean and uncluttered look.
Microsoft responded. I circled this statement:
Microsoft recently updated their Bing News experience that will help users in finding the most up to date and well-rounded information.
Note that the pivot of both sentences is a subjective assertion: “Clean and uncluttered” for the GOOG, and “most up to date and well rounded.”
Some facts would be useful. I am not sure what “clean” or “uncluttered” means. My recollection is that Einstein’s desk, like most “dead tree” newspapers, was organized in an eclectic manner. Facts supporting these assertions might be difficult to conjure.
The “most up to date” statement should be easy to back up. What’s the latency of the system? The superlative “most” means that Bing is the top dog in news. Hmmm. I don’t buy this.
My point is that the write up provides a useful idea: Neither Bing nor Google has figured out how to present “news” to each system’s online users. The implicit idea is that “dead tree” methods are of little use. Inspiration comes from each system’s response to what the other system does.
Cold War methods applied to online “news”? That’s what the write up signals to me.
Let’s step back.
Online users have different reasons for wanting news. Some folks chase sports, which as I recall was the most read section of the “dead tree” newspaper company at which I once worked. Other people have quite different reasons for scanning the news; for example, there are some who read the obituaries, others seek cartoons, and others want the latest on the real housewives.
Bing and Google have to figure out how to meet these diverse needs because the “dead tree” crowd has fallen in the forest.
The write up tells me one thing: Neither Google nor Microsoft has any idea about reinventing what “dead tree” newspapers used to do.
Now what? Shape the news to fit what each company’s filters “decide” is “real news”?
Stephen E Arnold, September 15, 2017
Why Not Have One or Two Smart Software Platforms? Great Idea!
September 8, 2017
I have been writing about online for decades. In my Eagleton Lecture in the 1980s (I forget the year I received an ASIS Award and had to give a talk as part of the deal), I pointed out that online concentrates. It is not just economy of scale. A single online service operates like a magnet. Once critical magnetism is achieved, other services clump around or to that central gizmo. There are fancy economic explanations; for example, economies of scale, convenience, utility, and so on.
Now couple the concentration with one of the properties of digital information flows and we have some exciting things to think about. In that Eagleton Lecture, I used the example of the telegraph to illustrate how even inefficient forms of information movement can rework social landscapes.
The point is that concentration and flows are powerful forces. When data flows through an institution, that institution comes apart. When online power is concentrated, the nature of “facts” changes. Even the unconscious decisions of a widely used online service alter how individuals perceive “facts” and set their priorities. (Are you checking your email now, gentle reader?)
When I read “Facebook and Microsoft collaborate to simplify conversions from PyTorch to Caffe2,” I thought about that decades old Eagleton Lecture of mine. The write up describes what seems to be a ho-hum integration play. Here’s a passage I highlighted:
The collaborative work Facebook and Microsoft are announcing helps folks easily convert models built in PyTorch into Caffe2 models. By reducing the barriers to moving between these two frameworks, the two companies can actually improve the diffusion of research and help speed up the entire commercialization process.
Just a tiny step, right?
A few observations:
- Clumping is now evident between Google and Walmart
- Amazon and Microsoft are partnering to make digital gizmos play well together
- Fusion (both data and services) is the go-to idea for next generation information access and analysis services.
The questions which seem interesting this morning are:
- Why do we need to have multiple artificial intelligence platforms? Won’t just one be more efficient and “better”?
- How will users understand reality when smart software seamlessly operates across what appear to be separate functions?
- What will regulators do to control clumping which will command the lion’s share of revenue, resources, and influence?
Yep, just a minor step with this PyTorch and Caffe2 deal.
Online can be exciting and transformative too.
Stephen E Arnold, September 8, 2017
Factoids about Toutiao: Smart News Filtering Service
August 28, 2017
The filtering service Toutiao is operated by Bytedance. The company attracted attention because it is generating money (allegedly) and has lots of users or “daily active users” in the 120 million range. (If you are acronym minded, the daily active user count is a DAU. Holy Dau!)
Forget Google’s “translate this page” for Toutiao. The service is blind to the Toutiao content. A work around is to cut and paste snippets into FreeTranslations.org or get someone who reads Chinese to explain what’s on Toutiao’s pages.
Other items of interest include the following. (Oh, the hyperlinks point to the source of the factoid.)
- $900 million in revenue (allegedly). Wall Street Journal, August 28, 2017 with a pay wall for your delectation
- Funding of $3 billion Crunchbase
- Valuation of $20 billion or more Reuters
- Toutiao means headlines Wikipedia
- What it does from Wikipedia:
Toutiao uses algorithms to select different quality content for individual users. It has created algorithmic models that understand information (text, images, videos, comments, etc.) in depth, and developed large-scale machine learning systems for personalized recommendation that surfaces content users have not necessarily signaled preference for yet. Using Natural Language Processing and Computer Vision technologies in A.I, Toutiao extracts hundreds of entities and keywords as features from each piece of content. When a user first open the app, Toutiao makes a preliminary recommendation based on the operation system of his mobile device, his location and other factors. With users’ interactions with the app, Toutiao fine-tunes its models and make better recommendations.
- Founded by Zhang Yiming, age 34, in 2012 Reuters
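The Wikipedia description quoted above boils down to two stages: a first guess based on coarse signals such as the device's operating system and location, then a profile that sharpens as the user interacts with content. Here is a toy sketch of that idea in Python. It is emphatically not Toutiao's code. The article records, the keyword features, the weights, and every function name are my own assumptions for illustration.

```python
from collections import Counter


def cold_start(articles, device_os, location):
    """Rank articles for a brand new user using only coarse signals.

    Hypothetical scoring: one point if the article's keywords mention
    the user's operating system, one point if they mention the location.
    """
    def score(article):
        keywords = article["keywords"]
        return (device_os in keywords) + (location in keywords)

    return sorted(articles, key=score, reverse=True)


def update_profile(profile, article, weight=1.0):
    """Add the keywords of an article the user interacted with to the profile."""
    for keyword in article["keywords"]:
        profile[keyword] += weight
    return profile


def recommend(articles, profile):
    """Rank articles by the summed profile weight of their keywords."""
    def score(article):
        return sum(profile[keyword] for keyword in article["keywords"])

    return sorted(articles, key=score, reverse=True)


# Made-up articles with made-up keyword features.
articles = [
    {"id": "a1", "keywords": {"android", "beijing"}},
    {"id": "a2", "keywords": {"beijing", "traffic"}},
    {"id": "a3", "keywords": {"football", "beijing"}},
    {"id": "a4", "keywords": {"football", "transfers"}},
]

first_guess = cold_start(articles, device_os="android", location="beijing")

# The user ignores the first guess and reads the football transfer story.
profile = update_profile(Counter(), articles[3])
refined = recommend(articles, profile)

print([a["id"] for a in first_guess])
print([a["id"] for a in refined])
```

The point of the sketch is the shift in ranking: the football story is last in the cold start list and first once a single interaction has been recorded. A production system would use learned models, not keyword counts.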
Technode’s “Why Is Toutiao, a News App, Setting Off Alarm Bells for China’s Giants?” suggests that Toutiao may be the next big Chinese online success. The reason is that the service aggregates “news” from disparate content sources; for example, text, video, images, and data.
Toutiao may be the next big thing in algorithmic, mobile centric information access solutions. The company generates revenues from online ads. The company’s secret sauce includes smart software plus some extra ingredients:
- Social functions
- Search
- Video
- User generated “original” content
- Global plans.
Net net: Worth watching.
Stephen E Arnold, August 28, 2017
Web Search Training Wheels: A Play for Precision
August 10, 2017
I read “How to Instantly Boost the Accuracy of Search Results on Google and Bing.” I love the word “instantly,” particularly when coupled to “accuracy.” The write up describes an overlay called Advangle, which helps a person create a search with more than 2.6 words. Interesting neologism, Advangle.
These services are what I call “training wheels.” The idea is that a person looking for information fills in a form, which helps the person create a query more sophisticated than “pizza.” Many systems in the last 50 years have tried these types of interfaces. In fact, one can find them in the whiz bang interfaces available to cyber OSINT software users. I won’t drag the old Dow Jones interface into this post, nor will I provide screenshots of Palantir Gotham interfaces. (Hey, you probably know about these already.)
The write up, however, does not explore the concept in too much detail. I noted this statement:
The Advangle interface makes it easier to string together targeted searches with the right syntax, and in half the time it would take to type it all out by hand.
Saving time, not precision or recall, is the unique selling proposition.
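For readers who want the two retrieval measures pinned down, here is a minimal sketch in Python. Precision is the share of returned results that are relevant. Recall is the share of relevant documents that were returned. The document identifiers and the function name are made up for illustration; nothing here comes from Advangle or any search vendor.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of document ids the system returned
    relevant:  iterable of document ids a human judged relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: four results returned, two of them relevant,
# out of five relevant documents in the whole collection.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d2", "d4", "d7", "d8", "d9"],
)
print(p, r)
```

A form that saves typing changes neither number unless it also changes which documents come back.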
It is useful to keep in mind that formal search operators are still available to users of Bing, Google, Yandex, and a number of other systems. The problem is that as Web search has massified, only a tiny fraction of the users of ad supported Web search systems bothers with formal operators like filetype: or other oddities.
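What a form based overlay does can be approximated in a few lines. The sketch below, in Python, assembles a query string from structured inputs using common operators: quoted phrases, filetype:, site:, and a leading minus for exclusion. It is an illustration under my own assumptions, not Advangle's implementation. The function name and parameters are hypothetical, and operator support varies from one search system to another.

```python
from urllib.parse import quote_plus


def build_query(terms, exact_phrase=None, filetype=None, site=None, exclude=()):
    """Assemble a query string that uses common Web search operators."""
    parts = list(terms)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')      # exact phrase match
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict by file type
    if site:
        parts.append(f"site:{site}")           # restrict to one domain
    parts.extend(f"-{word}" for word in exclude)  # exclude unwanted words
    return " ".join(parts)


query = build_query(
    ["cyber", "OSINT"],
    exact_phrase="dark web",
    filetype="pdf",
    site="example.com",
    exclude=["pizza"],
)
print(query)

# URL-encode the query so it can be placed in a search URL's q parameter.
encoded = quote_plus(query)
print("https://www.google.com/search?q=" + encoded)
```

The same string can be typed by hand into the search box, which is the point: the operators do the work, and the form only saves keystrokes.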
The real problems with search are far deeper than an interface overlay. Let me highlight several which I find consistently troublesome:
- Finding a way to impart the skills of a well executed reference interview conducted by an expert in online search and retrieval. (Marydee Ojala, Ruth Patel, Anne Mintz, Ulla de Stricker, and Barbara Quint are individuals who can help a PhD formulate a statement of what information and data are needed, convert that desire into appropriate queries of appropriate databases, and deliver a filtered list of results.) Software, no matter how nifty the interface, at this time cannot replicate this expertise.
- Individuals who need information are more crippled than their counterparts from 30 years ago. Online systems have worked hard to let popularity and past user behavior provide a context for a query like “cyrus.” If you think you will get the pop star before a long dead historical figure, you are more sophisticated than the eager consumers of pop up ads on a Pixel phone.
- Databases are governed by editorial policies. In the good old days of 1975, creators of databases figured out what and how to index. Today most users believe that Google has “all” the world’s information. Nothing could be more wrong headed. Indexes, particularly free ones, include what creates traffic. If the content gets a little too frisky, censorship, filtering, and smart / predictive software step in and deliver “better” information.
I suggest you give the Advangle service a try. You may find that it is better than a room stuffed with Quints and Ojalas and others of this ilk.
My approach is simple: Know what one wants. Formulate a suitable query. Pass the query across the sources/databases likely to have indexed the information. Review the results. Think about the information gaps. Repeat the process.
Pretty crazy today, right?
Who has time to figure out what companies are in the cyber OSINT business or what Dark Web sites continue to offer contraband in the wake of AlphaBay and Hansa?
Research via digital resources, unlike checking Facebook, is a bit of a mental workout.
On the other hand, why not let the ad supported search engines deliver exactly what they think you need? Better yet, let these outfits provide that information before you know you need it.
A system that actually delivered precise, on point, timely, and authoritative results would be great. It would be nice to be able to live forever and travel to the stars.
Reality is a tad different. UX is not yet a replacement for knowing how to research in a way that moves beyond finding Game of Thrones.
Stephen E Arnold, August 10, 2017
India Jumps on the Filtering Bandwagon
August 9, 2017
We noted “Internet Archive Contacted Indian Govt Regarding the Block but Got No Response.” The main point is that a repository (incomplete as its collection of Web pages may be) seems to be unavailable in India. Perhaps the Indian government has found a way to search for information in the service. We have noted that searching for rich media, including the collection of 78 rpm records, is a tough slog. It is tough to find information even when it is online. When services are filtered, locating facts, semi-facts, and outright hoohaw becomes impossible. We think the actions could impair the outstanding customer support services provided by the world’s second largest nation. Efficient delivery of information centric services, however, is likely to improve in Mumbai. China, Indonesia, Russia, Turkey, and now India may be taking steps to put the data doggies in the kennel.
Stephen E Arnold, August 9, 2017
Google and Its Vestager Adventure
July 7, 2017
I found the analyses of Google’s fine for certain misunderstood and misinterpreted behavior interesting. I noted a round up in that font of legal and technical wisdom, the Hollywood Reporter, which presented pros and cons of the decision. Well, sort of one pro and one con. My question, “Why was the Hollywood Reporter interested in a legal decision seemingly far removed from the concerns of Hollywood?”
I also noted “More Than Money: Why Google’s Antitrust Loss Matters.” One of the points in this write up was that the EU process might qualify some other companies for a day in court with a stop at the toll booth on the way out of the building.
I noted this passage:
These other cases involve: (1) the available range of mobile apps in the Android operating system, and (2) allegations that through AdSense, Google has prevented third-party websites from sourcing search ads. Once complete, these cases could result in similarly hefty fines. Indeed, given the European Commission’s statements regarding the potentiality of findings of abuse, it seems unlikely that Google will escape further punitive measures.
Several observations:
- Google will pay the fine one way or another, but there will be some legal excitement on the information highway leading to the pay station.
- Other US companies are likely to be getting an invitation to explain their business practices. Brussels and Strasbourg are fun cities with good restaurants and some nice hotels.
- Google will have an opportunity to explain some of its other systems and methods in the future.
I am not sure saying, “Hey, we’re sorry” will work very well. One thing is certain: Google will not ask IBM Watson for its take on the matter.
Stephen E Arnold, July 7, 2017
Online Filtering: China and “All” Rich Media
July 6, 2017
I read “China’s Bloggers, Filmmakers Feel Chill of Internet Crackdown.” The main idea is that control over Internet content is getting exciting. I noted this point in the “real” news story:
Over the last month, Chinese regulators have closed celebrity gossip websites, restricted what video people can post and suspended online streaming, all on grounds of inappropriate content.
Yep, an “all” in the headline and an “all” in the text of the story.
I also noted the point that emerges from the alleged statement of an academic whose travel to and from China is likely to become more interesting:
“According to these censorship rules, nothing will make it through, which will do away with audiovisual artistic creation,” Li Yinhe, an academic who studies sexuality at the government-run Chinese Academy of Social Sciences, wrote in an online post. Under the government rules, such works as Georges Bizet’s opera “Carmen” and Shakespeare’s “Othello” would technically have to be banned for depicting prostitution and overt displays of affection, she said.
What’s the key point? It seems to me that China wants to prevent digital content from eroding what the write up, quoting “an industry association,” calls “socialist values.” Yep, bad. Filtering and controls applied by commercial enterprises, therefore, must be better. And government filters applied by countries other than China may be sort of better than China’s approach.
Hey, gentle reader, this is news. But does “news” exist if one cannot access it online? Perhaps actions designed to limit Surface Web online content will increase the use of encrypted systems such as sites accessible via Tor.
Presumably Thomson Reuters’ new incubator for smart software and big data will not do any of the filtering thing? On the other hand, my hunch is that Thomson Reuters will filter like the Dickens: From screening ideas to fund to guiding the development trajectories of the lucky folks who get some cash.
Worth watching the publishing giant which has been struggling to generate significant top line growth.
Stephen E Arnold, July 5, 2017
Dark Web Notebook Now Available
June 5, 2017
Arnold Information Technology has published Dark Web Notebook: Investigative Tools and Tactics for Law Enforcement, Security, and Intelligence Organizations. The 250-page book provides an investigator with instructions and tips for the safe use of the Dark Web. The book, delivered as a PDF file, costs $49.

Orders and requests for more information should be directed to darkwebnotebook@yandex.com. Purchasers must verify that they work for a law enforcement, security, or intelligence organization. Dark Web Notebook is not intended for general distribution due to the sensitive information it contains.
The author is Stephen E Arnold, whose previous books include CyberOSINT: Next Generation Information Access and Google Version 2.0: The Calculating Predator, among others. Arnold, a former Booz, Allen & Hamilton executive, worked on the US government-wide index and the Threat Open Source Intelligence Gateway.
The Dark Web Notebook was suggested by attendees at Arnold’s Dark Web training sessions, lectures, and webinars. The Notebook provides specific information an investigator or intelligence professional can use to integrate Dark Web information into an operation.
Stephen E Arnold, author of the Dark Web Notebook, said:
“The information in the Dark Web Notebook has been selected and presented to allow an investigator to access the Dark Web quickly and in a way that protects his or her actual identity. In addition to practical information, the book explains how to gather information from the Dark Web. Also included are lists of vendors who provide Dark Web services to government agencies along with descriptions of open source and commercial software tools for gathering and analyzing Dark Web data. Much of the information has never been collected in a single volume written specifically for those engaged in active investigations or operations.”
The book includes a comprehensive table of contents, a glossary of terms and their definitions, and a detailed index.
The book is divided into 13 chapters. These are:
- Why write about the Dark Web?
- An Introduction to the Dark Web
- A Dark Web Tour with profiles of more than a dozen Dark Web sites, their products, and services
- Dark Web Questions and Answers
- Basic Security
- Enhanced Security
- Surface Web Resources
- Dark Web Search Systems
- Hacking the Dark Web
- Commercial Solutions
- Bitcoin and Variants
- Privacy
- Outlook
In addition to the Glossary, the annexes include a list of DARPA Memex open source software written to perform specific Dark Web functions, a list of spoofed Dark Web sites operated by law enforcement and intelligence agencies, and a list of training resources.
Kenny Toth, June 5, 2017