Is Google Playing Defense?

May 31, 2018

The Search Engine Roundtable reports, “Google Has a Bias Towards Scientific Truth in Search.” Great! Now what about reproducible scientific studies?

This defense of a slant toward verifiable truth was made by Google engineer Paul Haahr on Twitter after someone questioned the impartiality of his company’s “quality raters guidelines,” section 3.2 (reproduced for our convenience in the write-up). The guidelines consider consensus and subject-matter expertise in search rankings, a position one Twitter user took issue with. Writer Barry Schwartz lets that thread speak for itself, so see the write-up for the back-and-forth. The engineer’s challenger basically questions Google’s right to discern good sources from bad (which, I’d say, is the basic job of a search engine). This is Haahr’s side:

“We definitely do have a bias towards, for example, what you call ‘Scientific Truth,’ where the guidance in section 3.2 says ‘High quality information pages on scientific topics should represent well-established scientific consensus on issues where such consensus exists. […]

‘It’s the decision we’ve made: we need to be able to describe what good search results are. Those decisions are reflected in our product. Ultimately, someone who disagrees with our principles may want a different product; there may be a market niche for them. […]

‘I think it’s the only realistic model if you want to build a search engine. You need to know what your objective in ranking is. Evaluation is central to the whole process and that needs clarity on what “good” means. If you don’t describe it, you only get noise.’”

The write-up concludes with this question from Haahr—if Google’s search results are bad, is it because they are too close to their guidelines, or too far away?
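
Haahr’s claim that evaluation “needs clarity on what ‘good’ means” is easy to make concrete. Here is a minimal, hypothetical sketch (my example, not Google’s actual metric or data): once raters supply relevance labels, a measure such as precision@k turns “good results” into a number a ranking team can optimize.

```python
# A toy illustration (my example, not Google's method) of turning
# rater judgments into a measurable definition of "good" results.
def precision_at_k(ranked_results, rater_labels, k=3):
    """Fraction of the top-k results that raters judged relevant."""
    top = ranked_results[:k]
    return sum(rater_labels[doc] for doc in top) / k

# Hypothetical rater labels: 1 = relevant, 0 = not relevant.
rater_labels = {"a": 1, "b": 0, "c": 1, "d": 1, "e": 0}

print(precision_at_k(["a", "b", "c", "d", "e"], rater_labels))  # 2 of top 3 relevant
```

Without a stated definition of “good,” there is nothing to score against, which is exactly Haahr’s “you only get noise” point.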

Cynthia Murrell, May 31, 2018

Google: Excellence Evolves to Good Enough

May 25, 2018

I read “YouTube’s Infamous Algorithm Is Now Breaking the Subscription Feed.” I assume the write up is accurate. I believe everything I read on the Internet.

The main point of the write up seems to me to be that good enough is the high water mark.

I noted this passage, allegedly output by a real, thinking Googler:

Just to clarify. We are currently experimenting with how to show content in the subs feed. We find that some viewers are able to more easily find the videos they want to watch when we order the subs feed in a personalized order vs always showing most recent video first.

I also found this statement interesting:

With chronological view thrown out, it’s going to become even more difficult to find new videos you haven’t seen — especially if you follow someone who uploads at a regular time each day.

I would like to mention that Google, along with In-Q-Tel, invested in Recorded Future. That company has some pretty solid date and time stamping capabilities. Furthermore, my hunch is that the founders of the company know the importance of time metadata to some of the Recorded Future customers.

What would happen if Google integrated some of Recorded Future’s time capabilities into YouTube and into good old Google search results?

From my point of view, good enough means “sells ads.” But I am usually incorrect, and I expect to learn just how off base I am when I explain how one eCommerce giant is about to modify the landscape for industrial strength content analysis. Oh, that company’s technology does the date and time metadata pretty well.

More on this mythical “revolution” on June 5th and June 6th. In the meantime, try and find live feeds of the Hawaii volcano event using YouTube search. Helpful, no?

Stephen E Arnold, May 25, 2018

IBM: Just When You Thought Crazy Stuff Was Dwindling

May 19, 2018

How has IBM marketing reacted to the company’s Watson and other assorted technologies? Consider IBM and quantum computing. That’s the next big thing, just as soon as the systems become scalable. And the problem of programming? No big deal. What about applications? Hey, what is this, a reality roll call?

Answer: Yes, plus another example of IBM predicting the future.

Navigate to “IBM Warns of Instant Breaking of Encryption by Quantum Computers: ‘Move Your Data Today’.”

I like that “warning.” I like that “instant breaking of encryption.” I like that command: “Move your data today.”


hog in mud

IBM’s quantum computing can solve encryption problems instantly. Can this technology wash this hog? The answer is that solving encryption instantly and cleaning this dirty beast remain highly improbable. To verify this hunch, let’s ask Watson.

The write up states with considerable aplomb:

“Anyone that wants to make sure that their data is protected for longer than 10 years should move to alternate forms of encryption now,” said Arvind Krishna, director of IBM Research.

So, let me get this straight. Quantum computing can break encryption instantly. I am supposed to move to an alternate form of encryption. But if encryption can be broken instantly, why bother?

That strikes me as a bit of the good old tautological reasoning which leads exactly to nowhere. Perhaps I don’t understand.

I learned:

The IBM Q is an attempt to build a commercial system, and IBM has allowed more than 80,000 developers run applications through a cloud-based interface. Not all types of applications will benefit from quantum computers. The best suited are problems that can be broken up into parallel processes. It requires different coding techniques. “We still don’t know which applications will be best to run on quantum computers,” Krishna said. “We need a lot of new algorithms.”

No kidding. Now we need numerical recipes, and researchers have to figure out what types of problems quantum computing can solve?

We have some dirty hogs in Harrod’s Creek, Kentucky. Perhaps IBM’s quantum cloud computing thing which needs algorithms can earn some extra money. You know that farmers in Kentucky pay pretty well for hog washing.

Stephen E Arnold, May 19, 2018

Text Classification: Established Methods Deliver Good Enough Results

April 26, 2018

Short honk: If you are a cheerleader for automatic classification of text centric content objects, you are convinced that today’s systems are home run hitters. If you have some doubts, you will want to scan the data in “Machine Learning for Text Categorization: Experiments Using Clustering and Classification.” The paper was free when I checked at 9:20 am US Eastern time. For the test sets, Latent Dirichlet Allocation performed better than other widely used methods. Worth a look. From my vantage point in Harrod’s Creek, automated processes, regardless of method, perform in a manner one expert explained to me at Cebit several years ago: “Systems are good enough.” Improvements are now incremental and, like removing the last few percentage points of pollutants from a catalytic converter, an expensive and challenging engineering task.
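
For readers who want a feel for the classical baselines such papers compare against, here is a minimal multinomial Naive Bayes text categorizer in pure Python. This is my illustrative sketch with made-up documents, not the method, data, or code from the paper, which tested Latent Dirichlet Allocation against established approaches of this kind.

```python
import math
from collections import Counter, defaultdict

# Minimal multinomial Naive Bayes text categorizer: a sketch of one
# classical "good enough" baseline (the documents below are invented).

def train(docs):
    """docs: list of (text, label). Returns (log priors, word counts, vocab)."""
    label_docs = defaultdict(list)
    for text, label in docs:
        label_docs[label].append(text.lower().split())
    vocab = {w for texts in label_docs.values() for t in texts for w in t}
    total = sum(len(texts) for texts in label_docs.values())
    priors, counts = {}, {}
    for label, texts in label_docs.items():
        priors[label] = math.log(len(texts) / total)
        counts[label] = Counter(w for t in texts for w in t)
    return priors, counts, vocab

def classify(model, text):
    priors, counts, vocab = model
    words = [w for w in text.lower().split() if w in vocab]
    best, best_score = None, float("-inf")
    for label in priors:
        n = sum(counts[label].values())
        # Laplace smoothing avoids zero probabilities for unseen words.
        score = priors[label] + sum(
            math.log((counts[label][w] + 1) / (n + len(vocab))) for w in words)
        if score > best_score:
            best, best_score = label, score
    return best

docs = [
    ("stocks fell on weak earnings", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins the championship game", "sports"),
    ("player scores in overtime game", "sports"),
]
model = train(docs)
print(classify(model, "earnings report moves stocks"))  # finance
```

On toy data like this, the baseline looks like a home run hitter; the paper’s point is about how these methods compare on real test sets.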

Stephen E Arnold, April 26, 2018

Quote to Note: Statistics May Spoil Like Bananas

April 13, 2018

I noticed this synopsis for a talk by Andrew Gelman, a wizard who teaches at Columbia University. You can find the summary in “Do Statistical methods Have an Expiration Date?” Here’s the quote I noted:

The statistical methods which revolutionized science in the 1930s-1950s no longer seem to work in the 21st century. How can this be? It turns out that when effects are small and highly variable, the classical approach of black-box inference from randomized experiments or observational studies no longer works as advertised.

What happens when these methods are bolted into next generation data analytics systems which humans use to make decisions? My great uncle Vladimir I. Arnold and his co-worker Andrey Kolmogorov could calculate an answer, I assume.
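
Gelman’s point about small, highly variable effects can be simulated directly. The sketch below (my illustration, not his code) runs many noisy experiments with a tiny true effect and shows that the runs clearing the conventional significance bar overstate the effect’s magnitude, the Type M error he has written about.

```python
import random
import statistics

# Simulation (my illustration): a small true effect buried in noise.
# Experiments that clear the p < .05 bar systematically exaggerate
# the effect size (a Type M, or magnitude, error).

random.seed(1)
true_effect, noise_sd, n = 0.1, 1.0, 25

def one_experiment():
    """Return (estimated effect, reached significance?) for one noisy study."""
    sample = [random.gauss(true_effect, noise_sd) for _ in range(n)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return mean, abs(mean) > 1.96 * se  # naive z-test at p < .05

results = [one_experiment() for _ in range(5000)]
sig_estimates = [abs(m) for m, sig in results if sig]
print(f"significant in {len(sig_estimates) / len(results):.0%} of runs")
print(f"average |estimate| among significant runs: "
      f"{statistics.fmean(sig_estimates):.2f} vs true effect {true_effect}")
```

The “winning” experiments report an effect several times larger than the truth, which is why black-box inference from such studies no longer works as advertised.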

Stephen E Arnold, April 13, 2018

The AI Spy Who Photographed Me

March 29, 2018

Artificial intelligence is one of the tools that law enforcement is using to thwart potential terrorist attacks and other illegal activities. Applications use AI to run data analysis, scan the Dark Web, and monitor identity theft. One major use for AI is image analysis and facial recognition. IEEE Spectrum takes a look at the huge demand for more accurate image AI in “Wanted: AI That Can Spy.” While fear of spy satellites is not much of a plot point anymore, the US has hundreds of satellites orbiting the planet capturing photographic data. Humans are only capable of reviewing so much photographic data, and the US government has FOMO, “fear of missing out,” on something important.

US intelligence officials sponsored an AI challenge to identify objects of interest in satellite images.  The entire goal is to improve AI standards and capabilities:

Since July, competitors have trained machine-learning algorithms on one of the world’s largest publicly available data sets of satellite imagery—containing 1 million labeled objects, such as buildings and facilities. The data is provided by the U.S. Intelligence Advanced Research Projects Activity (IARPA). The 10 finalists will see their AI algorithms scored against a hidden data set of satellite imagery when the challenge closes at the end of December.

The agency’s goal in sponsoring the Functional Map of the World Challenge aligns with statements made by Robert Cardillo, director of the U.S. National Geospatial-Intelligence Agency, who has pushed for AI solutions that can automate 75 percent of the workload currently performed by humans analyzing satellite images.

Lockheed research scientist Mark Pritt guessed that the US government wants to automatically generate maps instead of relying on manual labor. Pritt’s Lockheed team is one of the many teams competing for the $100,000 prize to develop the best deep-learning algorithm that can recognize specific patterns and identify objects of interest in satellite images. Satellite images are more complex than other images because they are shot from multiple angles, may be obscured by cloud cover, and come in a variety of resolutions.

Even if a deep-learning algorithm were developed, it would not be enough, because the algorithm lacks the ability for refinement. Think sentiment analysis, except with images. The practical solution for the moment is a combination of AI and human interaction. The AI does the bulk of the work, while humans examine flagged photos for further investigation.
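
That AI-plus-human workflow reduces to a simple triage rule. Here is a hypothetical sketch (the thresholds and image names are invented, not from the challenge): high-confidence detections are accepted automatically, ambiguous ones are routed to an analyst, and the rest are discarded.

```python
# Hypothetical AI + human triage: the model scores each image, and only
# the ambiguous middle band is sent to a human analyst for review.
def triage(detections, auto_threshold=0.9, review_threshold=0.5):
    """detections: list of (item, model confidence in [0, 1])."""
    auto_accept, needs_review, discard = [], [], []
    for item, confidence in detections:
        if confidence >= auto_threshold:
            auto_accept.append(item)      # AI handles these automatically
        elif confidence >= review_threshold:
            needs_review.append(item)     # flagged for a human analyst
        else:
            discard.append(item)
    return auto_accept, needs_review, discard

detections = [("img_001", 0.97), ("img_002", 0.62),
              ("img_003", 0.12), ("img_004", 0.91)]
auto, review, skipped = triage(detections)
print(auto)    # ['img_001', 'img_004']
print(review)  # ['img_002']
```

Tuning the two thresholds is how an agency would trade analyst workload against the risk of missing something, which is the 75 percent automation target Cardillo describes.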

Whitney Grace, March 29, 2018

Importance of Good Data to AI Widely Underappreciated

March 27, 2018

Reliance on AI has now become embedded in our culture, even as we struggle with issues of algorithmic bias and data-driven discrimination. Tech news site CIO reminds us, “AI’s Biggest Risk Factor: Data Gone Wrong.” In the detailed article, journalist Maria Korolov begins with some early examples of “AI gone bad” that have already occurred, and explains how this happens; hard-to-access data, biases lurking within training sets, and faked data are all concerns. So is building an effective team of data management workers who know what they are doing. Regarding the importance of good data, Korolov writes:

Ninety percent of AI is data logistics, says JJ Guy, CTO at Jask, an AI-based cybersecurity startup. All the major AI advances have been fueled by advances in data sets, he says. ‘The algorithms are easy and interesting, because they are clean, simple and discrete problems,’ he says. ‘Collecting, classifying and labeling datasets used to train the algorithms is the grunt work that’s difficult — especially datasets comprehensive enough to reflect the real world.’… However, companies often don’t realize the importance of good data until they have already started their AI projects. ‘Most organizations simply don’t recognize this as a problem,’ says Michele Goetz, an analyst at Forrester Research. ‘When asked about challenges expected with AI, having well curated collections of data for training AI was at the bottom of the list.’ According to a survey conducted by Forrester last year, only 17 percent of respondents say that their biggest challenge was that they didn’t ‘have a well-curated collection of data to train an AI system.’

Eliminating bias gleaned from training sets (like one AI’s conclusion that anyone who’s cooking must be a woman) is tricky, but certain measures could help. For example, tools that track how an algorithm came to a certain conclusion can help developers correct its impression. Also, independent auditors bring in a fresh perspective. These delicate concerns are part of why, says Korolov, AI companies are “taking it slow.” This is slow? We’d better hang on to our hats whenever (they decide) they’ve gotten a handle on these issues.

Cynthia Murrell, March 27, 2018

Cambridge Analytica and Fellow Travelers

March 26, 2018

I read Medium’s “Russian Analyst: Cambridge Analytica, Palantir and Quid Helped Trump Win 2016 Election.” Three points straight away:

  1. The write up may be a nifty piece of disinformation
  2. The ultimate source of the “factoids” in the write up may be a foreign country with interests orthogonal to those of the US
  3. The story I saw is dated July 2017, but dates – like other metadata – can be fluid unless in a specialized system which prevents after the fact tampering.

Against this background of what may be hefty problems, let me highlight several of the points in the write up I found interesting.

More than one analytics provider. The linkage of Cambridge Analytica, Palantir Technologies, and Quid is not a surprise. Multiple tools, each selected for its particular utility, are a best practice in some intelligence analytics operations.

A Russian source. The data in the write up appear to arrive via a blog by a Russian familiar with the vendors, the 2016 election, and how analytic tools can yield actionable information.

Attributing “insights.” Palantir allegedly output data which suggested that Mr. Trump could win “swing” states. Quid’s output suggested, “Focus on the Midwest.” Cambridge Analytica suggested, “Use Twitter and Facebook.”

If you are okay with the source and have an interest in what might be applications of each of the identified companies’ systems, definitely read the article.

On April 3, 2018, my DarkCyber video program focuses on my research team’s reconstruction of a possible workflow. And, yes, the video accommodates inputs from multiple sources. We will announce the location of the Cambridge Analytica, GSR, and Facebook “reconstruction” in Beyond Search.

Stephen E Arnold, March 26, 2018

Algorithm Positions Microsoft on Top of Global Tech Field

March 23, 2018

This is quite a surprise. Reporting the results of their own analysis, Reuters announces, “Microsoft Tops Thomson Reuters Top 100 Global Tech Leaders List.” The write-up tells us that second and third place went to:

… Chipmaker Intel and network gear maker Cisco Systems. The list, which aims to identify the industry’s top financially successful and organizationally sound organizations, features US tech giants such as Apple, Alphabet, International Business Machines and Texas Instruments, among its top 10. Microchip maker Taiwan Semiconductor Manufacturing, German business software giant SAP and Dublin-based consultant Accenture round out the top 10. The remaining 90 companies are not ranked, but the list also includes the world’s largest online retailer Amazon and social media giant Facebook.


The results are based on a 28-factor algorithm that measures performance across eight benchmarks: financial, management and investor confidence, risk and resilience, legal compliance, innovation, people and social responsibility, environmental impact, and reputation. The assessment tracks patent activity for technological innovation and sentiment in news and selected social media as the reflection of a company’s public reputation. The set of tech companies is restricted to those that have at least $1 billion in annual revenue.

That is an interesting combination of factors; I’d like to see that Venn diagram. Some trends emerged from the report. For example, 45 of those 100 companies are based in the US (but 47 in North America); 38 are headquartered in Asia, 14 in Europe, and one in Australia.
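
The scoring approach Reuters describes, many factors rolled up across weighted benchmark categories, can be sketched in a few lines. The benchmark names below follow the write-up; the weights and scores are invented for illustration, and Thomson Reuters’ actual 28-factor formula is not public in this detail.

```python
# Hypothetical weighted composite score across benchmark categories.
# Category names follow the Reuters write-up; weights and scores are invented.
BENCHMARKS = ["financial", "management", "risk", "legal",
              "innovation", "people", "environment", "reputation"]

def composite_score(factor_scores, weights=None):
    """factor_scores: {benchmark: 0-100 score}. Equal weights by default."""
    weights = weights or {b: 1 / len(BENCHMARKS) for b in BENCHMARKS}
    return sum(factor_scores[b] * weights[b] for b in BENCHMARKS)

company = {"financial": 92, "management": 88, "risk": 75, "legal": 90,
           "innovation": 95, "people": 80, "environment": 70, "reputation": 85}
print(round(composite_score(company), 1))  # 84.4
```

The interesting editorial decisions live in the weights: change them and a different company tops the list, which is why the factor mix deserves scrutiny.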

Cynthia Murrell, March 23, 2018

What Happens When Intelligence Centric Companies Serve the Commercial and Political Sectors?

March 18, 2018

Here’s a partial answer:
Years ago, certain types of companies with specific LE and intel capabilities maintained low profiles and, in general, focused on sales to government entities.

How times have changed!

In the DarkCyber video news program for March 27, 2018, I report on the Madison Avenue type marketing campaigns. These will create more opportunities for a Cambridge Analytica “activity.”

Net net: Sometimes discretion is useful.

Stephen E Arnold, March 18, 2018
