Factualities for August 28, 2019

August 28, 2019

Summer is coming to an end. But the flow of undocumented, slightly crazy statistics continues.

Our crazy number of the week is:

33. Number of counts of theft and attempted theft leveled at Xoogler Anthony Levandowski, the self driving car wizard. Source: New York Times

Other factualities we noted are:

3. Number of unicorns in Japan. Source: Financial Times (pay walled, you lucky reader)

22. Number of regions for Amazon AWS’ 69 availability zones. (No, we don’t know what regions and zones are either.) Source: Infoq

70. The percentage of the total crypto market Bitcoin has. Source: Next Web

74. The percentage of streaming video sessions throttled by AT&T. Source: Engadget

2,000. Number of surveillance cameras from unauthorized vendors in use by the US government. Source: Forbes

4,100. Number of unsafe or recalled product listings on Amazon. Source: Arstechnica

20,000. Size of the police case backlog created after a cyber security firm found its cyber security system breached. Source: Inquirer

30,000. Number of devices US Customs and Border Patrol searched without a warrant in 2018. Source: TechCrunch

501,000. The number of “fewer jobs” the US has. We think this means, more people out of work or just chilling with their mobiles. Source: AP

21.6 million. The number of fake accounts Microsoft LinkedIn blocked in the first six months of 2019. Source: Geekwire

57 million. Number of people in the US living with a disability. Are those mobile apps accessible? Source: Next Web

100 million. Number of probable downloads of the malware dropper Android CamScanner from the Google Play Store. Source: Bleeping Computer

$150 million. Cost of electricity to power the new Cray super computer for five years. Source: The Next Platform

12 trillion. The number of transistors on a new chip specifically designed for artificial intelligence/machine learning. Source: Slashdot

Stephen E Arnold, August 28, 2019

Disrupting Neural Nets: Adversarial Has a More Friendly Spin Than Weaponized

August 28, 2019

In my lecture about manipulation of algorithms, I review several methods for pumping false signals into a data set in order to skew outputs.

The basic idea is that if an entity generates content pulses which are semantically or otherwise related in a way the smart software “counts”, then the outputs are altered.

A good review of some of these flaws in neural network classifiers appears in “How Reliable Are Neural Networks Classifiers Against Unforeseen Adversarial Attacks.”

DarkCyber noted this statement in the write up:

attackers could target autonomous vehicles by using stickers or paint to create an adversarial stop sign that the vehicle would interpret as a ‘yield’ or other sign. A confused car on a busy day is a potential catastrophe packed in a 2000 pound metal box.

Dramatic, yes. Far fetched? Not too much.

Providing weaponized data objects to smart software can screw up the works. Examples range from adversarial clothing, discussed in the DarkCyber video program for August 27, 2019, to the wonky predictions that Google makes when displaying personalized ads.

The article reviews an expensive and time consuming method for minimizing the probability of weaponized data mucking up the outputs.

The problem, of course, is that smart software is supposed to handle the tricky, expensive, and slow process of assembling and refining a training set of data. Talk about smart software is really cheap. Delivering systems which operate in the real world is another kettle of what appear to be fish as determined by a vector’s norm.

The Analytics India article is neither broad nor deep. It does raise awareness of the rather interesting challenges which lurk within smart software.

Understanding how smart software can get off base and drift into LaLa Land begins with identifying the problem.

Smart software cannot learn and discriminate with the type of accuracy many people assume is delivered. Humans assume a system output is 99 percent accurate; for example, the answer to “Is it raining?”

The reality is that adversarial inputs can reduce the accuracy rate significantly.

On good days, smart software can hit 85 to 90 percent accuracy. That’s good enough unless a self driving car hits you. But with adversarial or weaponized data, that accuracy rate can drop below the 65 percent level which most of the systems DarkCyber has tested can reliably achieve.
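Here is a toy illustration of how adversarial inputs drag accuracy down. It is a minimal sketch and is not taken from the Analytics India article or from any system DarkCyber has tested. The linear classifier, its weights, the synthetic data, and the step size eps are all hypothetical. Each feature is nudged by a bounded amount in the direction that hurts the classifier most, which is the sign-of-the-gradient idea behind many published attacks.

```python
import random

# Hypothetical linear classifier: predicts class 1 when w.x + b > 0.
WEIGHTS = [1.0, 1.0, 1.0, 1.0]
BIAS = 0.0


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def predict(weights, bias, x):
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def perturb(weights, x, label, eps):
    """Shift every feature by eps so the score moves away from the true label.

    For a linear model the gradient of the score with respect to x is the
    weight vector, so stepping along sign(w) is the worst-case bounded nudge.
    """
    direction = -1 if label == 1 else 1
    return [xi + direction * eps * sign(w) for w, xi in zip(weights, x)]


def accuracy(weights, bias, samples):
    correct = sum(predict(weights, bias, x) == y for x, y in samples)
    return correct / len(samples)


def make_samples(count, seed=0):
    """Synthetic data: class 1 clusters near +1, class 0 near -1."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        samples.append(([rng.gauss(center, 0.5) for _ in range(4)], label))
    return samples


clean = make_samples(500)
attacked = [(perturb(WEIGHTS, x, y, eps=1.2), y) for x, y in clean]

clean_acc = accuracy(WEIGHTS, BIAS, clean)
attacked_acc = accuracy(WEIGHTS, BIAS, attacked)
print(f"clean accuracy:    {clean_acc:.2f}")
print(f"attacked accuracy: {attacked_acc:.2f}")
```

The classifier is unchanged between the two runs. Only the inputs differ, which is the point: the attacker needs no access to the training process, just knowledge of which way to push.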

To sum up, smart software makes mistakes. Weaponized data fed into smart software can increase the likelihood of an error.

The methods can be used in commercial and military theaters.

Neither humans nor software can prevent this from happening on a consistent basis.

So what? Yes, that’s a good question.

Stephen E Arnold, August 29, 2019

Facebook: One Must Respect Different Views of Reality

August 28, 2019

I wonder if the works of the Argentinean writer Borges are required reading at Facebook.

DarkCyber noted this article in Gizmodo: Alex Stamos, Ex-Facebook Security Chief, Blames Journalists for Cambridge Analytica Fallout. This passage warranted a tick mark:

According to Facebook’s former chief security officer, reporters who covered the company’s Cambridge Analytica scandal are at least partly to blame.

Alex Stamos, who oversaw security at Facebook when news first broke about the scandal last year, criticized BuzzFeed and “other outlets” over what he called “unbalanced reporting on privacy,” saying the media coverage of Facebook’s numerous privacy violations has been geared all along toward hampering its ability to share data for legitimate research.

DarkCyber spotted this write up as well: “Facebook Staff Had Concerns About ‘Sketchy’ Cambridge Analytica Year Before 2016 Election.” We circled this statement:

Facebook employees discussed for months how they would look into Cambridge Analytica’s practices. A document published by the company containing the emails appears to show the company only learned that Aleksandr Kogan, a developer working for the political data firm, had improperly gathered information on tens of millions of Americans in December of 2015, after the Guardian published a report.

Yep, interesting company. Too bad Borges is not around to explain the two views using Facebook’s duality.

Stephen E Arnold, August 28, 2019

The New Lingo of Enterprise Search

August 28, 2019

Enterprise search is back. My Google Alert has been delivering market research reports which tell me that finding information is huge. Plus, there have been some announcements about funding which have surprised me. Examples include:

  • Capacity raised $13.2 million. Source: DarkCyber
  • LucidWorks snagged an additional $100 million. Source: Globe News Wire
  • Squirro pulled in additional funds, but the timing of the Salesforce investment and additional funding of this Zurich based company remains a bit of a mystery. Source: Venture Lab

These are just three examples plucked from my box of note cards about search vendors.

What’s interesting is the lingo, the jargon, and the argot these outfits are using. Frankly the plumbing is usually open source, a fact which the companies bury beneath the blizzard of buzzwords.

Here are some examples:

AI powered

actionable insights

artificial intelligence

cloud

cognitive

connect the dots

data mining

fusion

information mining

machine learning

natural language

pattern detection

platform

self learning

transform

The problems with the vendors collecting investment funds are easy to identify:

  1. The content processed is text. The unstructured information in videos, podcasts, messaging apps like WhatsApp, images like chemical structures and engineering drawings, etc. is not included.
  2. Indexing content residing on cloud platforms may work today, but as market dynamics shift, access to that content may be blocked or prohibited by regulations in certain countries.
  3. On-the-fly federation, so that real time information is available, remains a challenge which typically requires script fiddling or new content filters.
  4. Configuration of “smart” systems is not significantly different from the complex, time consuming, and expensive procedures which added friction to some Autonomy, Convera, Fast Search & Transfer, and similar systems’ deployment.
  5. Maintenance is an issue. Micro services work well in a low latency environment. Under loads, the magic of sub three second response can disappear.
  6. Search remains an idiosyncratic solution. Many departments require specific features. As a result, enterprise search — regardless of the wrappers around open source information retrieval systems — is a series of customizations.

To sum up, enterprise search has failed to deliver for more than 50 years. Despite the optimism that investors have for “finding the next Google”, enterprise search vendors will find themselves hitting a revenue ceiling just as Autonomy, Fast Search, and similar firms did.

The fix was acquisitions and allegations of financial fancy dancing. If we assume that investors still dream of a 10x or higher return, is it possible that LucidWorks can generate sufficient revenue to pull off an IPO or a sale like Exalead, Vivisimo, and other search vendors were able to complete before the hammer fell?

This is an important question because new enterprise search vendors are popping up like mushrooms. The incumbents like Attivio, Coveo, Mindbreeze, and Sinequa are also trying to smash a ball over the fence.

Net net: Enterprise search appears to be putting on the worn slippers last used by the founders of Fast Search & Transfer. Maybe Microsoft will buy another enterprise search vendor? The problem is that enterprise search is easy to make visible with marketing LED lights. Delivering sustainable revenues is a far greater challenge when Amazon is a competitor and a platform enabler.

What happens when Amazon competes more aggressively, raises its prices, or bundles text search into another of its services?

Answer: Nothing particularly beneficial for the investors in new and improved enterprise search solutions based on Lucene/Solr and dusted with disco glitter.

Stephen E Arnold, August 28, 2019

Is Google Privacy Oriented?

August 28, 2019

Google may be like sugar. We love Google, so we consume a lot of its products. Eventually Google harms us in some way. Unlike sugar, Google does not rot teeth, cause weight gain, or contribute to numerous diseases.

Google instead collects private user information and shares it with advertisers to make a buck. Medium reports that Google does more to take advantage of its users: “Google Photo Is Making Your Photos Semi-Public And You Probably Don’t Realize.”

Millions of Google users upload, share, and store their photos on Google Photo. What these users may not know is that whenever a photo is shared on Google Photos, the service creates a link, and anyone in the world who has that link can view the photo. You do not believe me? Article writer Robert Wiblin discovered that no one believed him either, until he showed them.

When you share a photo via Google Photo it creates a “secret link.” If the secret link is shared, anyone can view the photo until it is manually deleted. People assume their photos are private because Google lists who each photo is shared with, but that is not true. Wiblin and I both agree this is unacceptable:

Firstly it’s unacceptable because most users don’t realize it’s happening. The interface is so poorly designed that the most common reaction I’ve had when I tell Photos users about this is literal disbelief. The only way to convince people is to show them with their own eyes. If our private and potentially sensitive data is going to be revealed this way, it should be clear that it’s going on.

We also noted this statement:

It’s also unacceptable because it creates an excessive risk of sensitive data being exposed. People often take photos of things like private documents, or themselves naked. It’s very important only the right people get to see these things! Google is a data company that has a responsibility to its users to make sure that’s the case.

You might not care, but think about this: any of these photos and the information they contain can be hacked, shared, or stolen. They can be posted publicly and perpetually exist online.
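The behavior Wiblin describes can be modeled in a few lines. This is a hypothetical sketch and not Google's implementation. The PhotoStore class, its method names, and the token length are all assumptions made for illustration. It contrasts a link that works for anyone who holds it with a link that also checks the viewer against the list of people the photo was shared with.

```python
import secrets


class PhotoStore:
    """Hypothetical store illustrating link-only versus list-checked sharing."""

    def __init__(self):
        self._links = {}        # secret token -> photo id
        self._shared_with = {}  # photo id -> names shown in the sharing list

    def share(self, photo_id, recipients):
        """Create an unguessable token for the photo and record the recipients."""
        token = secrets.token_urlsafe(16)
        self._links[token] = photo_id
        self._shared_with[photo_id] = set(recipients)
        return token

    def open_link(self, token, viewer):
        """Link-only access: holding the token is the only check.

        The viewer argument is deliberately ignored, so a forwarded or
        leaked link works for anyone.
        """
        return self._links.get(token)

    def open_link_checked(self, token, viewer):
        """Stricter variant: the viewer must also be on the sharing list."""
        photo_id = self._links.get(token)
        if photo_id is None or viewer not in self._shared_with[photo_id]:
            return None
        return photo_id

    def revoke(self, token):
        """Delete the link; until this happens the token keeps working."""
        self._links.pop(token, None)


store = PhotoStore()
link = store.share("beach.jpg", ["alice"])
print(store.open_link(link, "mallory"))          # an outsider with the link gets in
print(store.open_link_checked(link, "mallory"))  # the checked variant refuses
```

The sharing list in the first variant is purely informational, which matches the confusion in the article: the interface names the recipients, but the link itself does not care who is asking.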

Is there an easy way to resolve this issue? Could Google alter the Google Photo interface to match Google Drive, which is mostly transparent and states exactly where information is shared? Could Google Photo notify users of this link visibility?

Over to you, Google.

Whitney Grace, August 28, 2019

Google Cookies: Dancing Around

August 28, 2019

In my Google Version 2: The Calculating Predator, I summarized a number of Google innovations which embed tracking. One of the more interesting approaches was for Google to become the Internet; that is, when you run a query, you are accessing the Internet as it exists within Google. (If you want more information, write benkent2020 @ yahoo dot com. I sell a set of “fair copies” of these original books I submitted to a now defunct publisher in Brexitland. There are some minor typos and a dropped graphic or two, but the info is there.)

I wrote the Google monographs in 2003 to 2008.

The tracking functions, the walled garden, the Google version of the Internet: each of these was in place more than 15 years ago. Therefore, any modification of Google’s cookie policies and the associated technology like Ramanathan Guha’s and Alon Halevy’s innovations is a very big job. Given the present state of the Google architecture, I am not sure that the existing crew of 100,000 plus could make such modifications without having many Google services break. “Services”, however, are not what users experience. The services are the internal operations that ensure ads get displayed, the click stream data are collected, the internal components have access to fresh user behavior data, and the public facing outputs like search results, “did you mean”, and even the “I’m feeling lucky” are in line with what Google’s financial demands require. Remember: Ads have to be displayed and users induced to click on them to make the Yahoo-GoTo-Overture inspired system function.

Cookies, including the special DoubleClick variety and the garden variety “expire a long time in the future” type, are important to the Google system. If you can’t find content in an index, the reason may be that the site’s content is no longer generating clicks. Indexing becomes more important with each passing day. How does one control costs? Well, those cookies and beacons are helpful. No signals of click love, then less frequent or zero indexing. Thus, indexing costs can be managed, which is almost impossible if a spider just follows links, changed content, and new information. Where is an index to the content on “beat sites” like Beatstars.com? Answer: The content is not indexed if our recent test queries are accurate. (I know, “What’s beat content?” Not in this write up, gentle reader, not in this write up.)

Against this background I want to call your attention to “Deconstructing Google’s Excuses on Tracking Protection.” The write up is a reasonable analysis of Google saying that it wants to be more respectful of users’ privacy.

DarkCyber thought the summary of cookies was good. Here’s the passage we circled:

Our high-level points are:

1) Cookie blocking does not undermine web privacy. Google’s claim to the contrary is privacy gas lighting.

2) There is little trustworthy evidence on the comparative value of tracking-based advertising.

3) Google has not devised an innovative way to balance privacy and advertising; it is latching onto prior approaches that it previously disclaimed as impractical.

4) Google is attempting a punt to the web standardization process, which will at best result in years of delay.

My concern is that this type of write up does not specifically state what Google is doing. The use of the phrase “gas lighting” and the invocation of Shoshana Zuboff’s The Age of Surveillance Capitalism are very trendy.

Unfortunately, plain talk is needed. With Google search the primary conduit of what is “important”, the game is no longer one of cookies.

Exactly what can a government or a committee do to address more than 15 years of engineering specifically designed to track people, cluster individuals into groups, predict what the majority of those in a statistically valid cluster want, and make sense of individual user behavior cues?

One step may be that writers and analysts adopt a more direct, blunt way of explaining Google/DoubleClick tracking. The reason individuals do not speak out is that there is what I call “Google fright”. It affects news release services. It affects analysts. It affects “real journalists.” It affects Google’s would be government watch dogs.

Who doesn’t want a Google mouse pad or T shirt? Darned few. Fear of Google may be a factor to consider when reading about DarkCyber’s favorite ad supported, Web search system.

Stephen E Arnold, August 28, 2019

Google: Anything Goes Except Lots of Stuff

August 27, 2019

I read “What It Means to Work at Google When You Can No Longer Say Anything You Want.” This statement caught my attention:

Employees were encouraged to be their true, unfiltered selves on internal social forums as long as they were harnessing that energy to help Google succeed.

The write up quotes a Google internal memo that allegedly says:

Billions of people rely on us every day for high-quality, reliable information. It’s critical that we honor that trust and uphold the integrity of our products and services…

I also found this passage interesting:

An office environment that harms some workers and moderation policies that harm some users may be separate problems, but in Google’s case, the former never prodded the company to do anything about the latter—until it became a problem with implications for the very health of democracy, and lawmakers started to threaten the company with regulations. And both issues stem from the same formative Silicon Valley worldview that conceptualizes the internet as a place that functions best with as little oversight as possible.

Several observations are warranted because I am not involved in today’s GOOG:

  1. Google appears to be bewildered. The perception of itself is different from what some of its employee factions perceive. Money cannot buy obedience. The greatest threat to the country of Google is citizen revolt.
  2. The Slate write up is a long overdue critical look at the weaknesses of Google’s high school science club management methods. For a long time, Google seemed to just make stuff up as it moved along. That method may not work in today’s wild and crazy business environment.
  3. Google faces significant competition from Facebook. That’s less of an issue than Amazon, the Bezos bulldozer.

The earth is shaking around Google buildings.

Stephen E Arnold, August 27, 2019

Free Music Samples

August 27, 2019

Short honk: Looking for free music samples? A collection of samples is available on “Free Sound Samples.” Queries via search engines for samples produce some wonky results. Worth noting.

Stephen E Arnold, August 27, 2019

Grover and Real Fake News

August 27, 2019

The Next Web reported, “This Terrifying AI Generates Fake Articles from Any News Site.” Now, the point here is to create an AI that can easily detect fake news, but researchers at the Allen Institute for Artificial Intelligence began with one that could generate such content. Basically, it takes one to know one. We learn:

“A team of researchers at the institute recently developed Grover, a neural network capable of generating fake news articles in the style of actual human journalists. In essence, the group is fighting fire with fire because the better Grover gets at generating fakes, the better it’ll be at detecting them. … Most fake news is generated by humans and then spread on social media. But the rise of robust systems such as OpenAI’s controversial GPT-2 point toward a future where AI-generated articles are close enough to the real thing to obfuscate nearly any issue. While it’s easy enough to search a website to see if an article is legitimate, not everyone is going to do that. And if an article goes viral, no matter how false it is, some people will be convinced.”

Writer Tristan Greene shares some passages Grover wrote, so see the article if you wish to read those. They are pretty convincing, especially if one just skims the text (as many readers do). One example aptly mimics President Obama’s writing/speaking style, while another seems to spook Greene with how well it captures his own writing essence. The article concludes with this link, where each of us can take Grover for a test drive. Modern life is fun.

Cynthia Murrell, August 27, 2019

Google Does Podcasts Too

August 27, 2019

Everyone and his or her dog has a podcast, but the problem is you cannot find individual episodes in a search engine. Sure, you can go to individual Web sites, iTunes, or Anchor to track down specific episodes, but that requires a lot of searching and typing. Thankfully, Google has changed its search algorithm to be friendlier for individual podcast episodes. Tech Radar explains the news in the article, “Google Search Just Got Smarter At Finding Podcast Episodes.”

Now when people search for a podcast through Google, the podcast will appear in the search results along with a display carousel of individual episodes. Google is able to do this thanks to natural language processing and artificial intelligence programming. Google’s AI department is hard at work developing the search engine’s ability to “understand what is being talked about” in search terms.

It might be a simple return on state of the art technology, but it proves how Google’s search algorithm is getting smarter.

While search results list the podcast and its individual episodes, there are still some limitations:

“You can’t currently listen to the podcast direct from the search results, it will instead click through to the Google Podcasts web app, but support for third-party apps and websites that may hold exclusive rights to a podcast will be supported in the future, greatly increasing the potential search results. The blog post also mentions that the tech giant will be bringing the same functionality to Google Assistant later in the year, as well as the dedicated Google Podcasts for web, from which you’ll be able to also listen directly to the episode from the search result.”

Will Google put podcasts in YouTube? That’s an original idea. So if you want to find your dog’s podcast, all you have to do is type it into Google and it will appear. That’s the theory at least.

Whitney Grace, August 27, 2019
