Google Introduces Fact Checking Tool

October 26, 2016

If it works as advertised, a new Google feature will be welcomed by many users—World News Report tells us, “Google Introduced Fact Checking Feature Intended to Help Readers See Whether News Is Actually True—Just in Time for US Elections.” The move is part of a trend among websites, which seem to have recognized that savvy readers don’t just believe everything they read. Writer Peter Woodford reports:

Through an algorithmic process known as ClaimReview, live stories will be linked to fact checking articles and websites. This will allow readers to quickly validate or debunk stories they read online. Related fact-checking stories will appear onscreen underneath the main headline. The example Google uses shows a headline about passport checks for pregnant women, with a link to Full Fact’s analysis of the issue. Readers will be able to see if stories are fake or if claims in the headline are false or being exaggerated. Fact check will initially be available in the UK and US through the Google News site as well as the News & Weather apps for both Android and iOS. Publishers who wish to become part of the new service can apply to have their sites included.
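In practice, ClaimReview is open markup defined at schema.org that fact-checking publishers embed in their pages so aggregators such as Google News can match a claim to a verdict. Below is a minimal, hypothetical example of such markup, written as a Python dictionary and serialized to JSON-LD; the property names follow the public schema.org/ClaimReview definition, but the claim, rating, and URLs are invented for illustration.

```python
import json

# Minimal, hypothetical ClaimReview record (schema.org/ClaimReview).
# Property names follow the public schema; the claim, rating, and URLs are invented.
claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "url": "https://example-factchecker.org/passport-checks-analysis",  # the fact-check article
    "claimReviewed": "Pregnant women face extra passport checks",       # the claim being checked
    "author": {"@type": "Organization", "name": "Example Fact Checker"},
    "datePublished": "2016-10-26",
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": 2,       # position on the scale below
        "bestRating": 5,
        "worstRating": 1,
        "alternateName": "Mostly false",
    },
}

# Publishers typically embed this as a JSON-LD <script> block in the page.
print(json.dumps(claim_review, indent=2))
```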

Woodford points to Facebook’s recent trouble with the truth within its Trending Topics feature and observes that many people are concerned about the lack of honesty on display during this particular election cycle. Google, wisely, did not mention any candidates, but Woodford notes that Politifact rates 71% of Trump’s statements as false (and, I would add, 27% of Secretary Clinton’s statements as false; everything is relative). If the trend continues, it will be prudent for all citizens to rely on (unbiased) fact-checking tools on a regular basis.

Cynthia Murrell, October 26, 2016
Sponsored by the publisher of the CyberOSINT monograph

Trending Topics: Google and Twitter Compared

October 25, 2016

For those with no time to browse through the headlines, tools that aggregate trending topics can provide a cursory way to keep up with the news. The blog post from communications firm Cision, “How to Find Trending Topics Like an Expert,” examines the two leading trending topic tools—Google’s and Twitter’s. Each approaches its tasks differently, so the best choice depends on the user’s needs.

Though the Google Trends homepage is limited, according to writer Jim Dougherty, one can get further with its extension, Google Explore. He elaborates:

If we go to the Google Trends Explore page, our sorting options become more robust. We can sort by the following criteria:

* By country (or worldwide)

* By time (search within a customized date range – minimum: past hour, maximum: since 2004)

* By category (arts and entertainment, sports, health, et cetera)

* By Google Property (web search, image search, news search, Google Shopping, YouTube)

You can also use the search feature via the Trends page or the Explore page to search the popularity of a search term over a period (custom date ranges are permitted), and you can compare the popularity of search terms using this feature as well. The Explore page also allows you to download any chart to a .csv file, or to embed the table directly into a website.

The write-up goes on to note that there are no robust third-party tools to parse data found with Google Trends/Explore, because the company has not made the API publicly available.
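Because there is no public API, one practical route is to download the .csv file the Explore page offers and analyze it offline. Here is a minimal sketch, assuming a hypothetical export named trends.csv with a date column followed by one column per search term; the number of metadata rows at the top of a real export can vary, so the skiprows value is only a placeholder to adjust.

```python
import pandas as pd

# Hypothetical Google Trends export: a date column plus one column per term.
# Real exports often carry a couple of metadata lines at the top; adjust
# skiprows to match whatever your download actually contains.
df = pd.read_csv("trends.csv", skiprows=2)
df = df.rename(columns={df.columns[0]: "week"})
df["week"] = pd.to_datetime(df["week"])

# Compare the average interest of each term over the downloaded period.
print(df.drop(columns="week").mean().sort_values(ascending=False))
```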

Unlike Google, we’re told, Twitter does not make it intuitive to find and analyze trending topics. However, its inclusion of location data can make Twitter a valuable source for this information, if you know how to find it. Dougherty suggests a work-around:

To ‘analyze’ current trends on the native Twitter app, you have to go to the ‘home’ page. In the lower left of the home page you’ll see ‘trending topics’ and immediately below that a ‘change’ button which allows you to modify the location of your search.

Location is a huge advantage of Twitter trends compared to Google: Although Google’s data is more robust and accessible in general, it can only be parsed by country. Twitter uses Yahoo’s GeoPlanet infrastructure for its location data, so trends can be examined at a much more granular level than Google Trends allows.

Since Twitter does publicly share its trending-topics API, there are third-party tools one can use with Twitter Trends, like TrendoGate, TrendsMap, and ttHistory. The post concludes with a reminder to maximize the usefulness of data with tools that “go beyond trends,” like (unsurprisingly) the monitoring software offered by Dougherty’s company. Paid add-ons may be worth it for some enterprises, but we recommend you check out what is freely available first.
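For example, the endpoint those third-party tools build on can be called directly. Here is a minimal sketch, assuming you have registered an application and obtained a bearer token; the WOEID shown is Yahoo GeoPlanet’s well-known identifier for London, and standard rate limits apply.

```python
import requests

# Hypothetical sketch against Twitter's v1.1 trends endpoint; you need your
# own application bearer token, and rate limits apply.
BEARER_TOKEN = "YOUR_APP_BEARER_TOKEN"
WOEID_LONDON = 44418  # Yahoo GeoPlanet "Where On Earth ID" for London

resp = requests.get(
    "https://api.twitter.com/1.1/trends/place.json",
    params={"id": WOEID_LONDON},
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
)
resp.raise_for_status()

# The response is a list with one object holding the "trends" array.
for trend in resp.json()[0]["trends"]:
    print(trend["name"], trend.get("tweet_volume"))
```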

Cynthia Murrell, October 25, 2016
Sponsored by the publisher of the CyberOSINT monograph

Google Finds That Times Change: Privacy Redefined

October 21, 2016

I read “Google Has Quietly Dropped Ban on Personally Identifiable Web Tracking.” The main idea is that an individual can be mapped to just about anything in the Google-verse. The write up points out that in 2007, one of the chief Googlers said that privacy was a “number one priority when we [the Google] contemplate new kinds of advertising products.”

That was before Facebook saddled up with former Googlers (aka Xooglers) and started to ride the ad pony, detailed user information, and the interstellar beast of user generated content. Googlers knew that social was a big deal, probably more important than offering Boolean operators and time stamp metadata for users of its index. But that was then and this is now.

The write up reveals:

But this summer, Google quietly erased that last privacy line in the sand – literally crossing out the lines in its privacy policy that promised to keep the two pots of data separate by default. In its place, Google substituted new language that says browsing habits “may be” combined with what the company learns from the use of Gmail and other tools. The change is enabled by default for new Google accounts. Existing users were prompted to opt in to the change this summer.

I must admit that when I saw the information, I ignored it. I don’t use too many Google services, and I am not one of the cats in the bag that Google is carrying to and fro. I am old (73), happy with my BlackBerry, and I don’t use mobile search. But the shift is an important part of the “new” Alphabet Google thing.

Tracking users 24×7 is the new black in Sillycon Valley. The yip yap about privacy, ethics, and making explicit what data are gathered is noise. Buy a new Pixel phone and live the dream, gentle reader.

You can work through the story cited above for more details. My thoughts went a slightly different direction:

  1. Facebook poses a significant challenge to Google, and today Google does not have a viable social option to offer its users
  2. The shift to mobile means that Google has to — note the phrase “has to” — find a way to juice up ad revenues. Sure, these are okay, but to keep the Loon balloons aloft more dough is needed.
  3. Higher value data boils down to detailed information about specific users, their cohorts, their affinity groups, and their behaviors. As the economy continues to struggle, the Alphabet Google thing will have data to buttress the Google ad sales professionals’ pitches to customers.
  4. Offering nifty data to nation states like China-type countries may allow Google to enter a new market with the Pixel and mobile search as Trojan horses.

In my monograph “Google Version 2.0: The Calculating Predator,” I described some of the technical underpinnings of Google’s acquisitions and inventions. With more data, the value of these innovations may begin to pay off. If the money does not flow, Google Version 3.0 may be a reprise of the agonies of the Yahooligans. Those Guha and Halevy “inventions” are fascinating in their scope and capabilities. Think about an email for which one can know who wrote it, who received it, who read it, who changed it, what the changes were, who the downstream recipients were, and other assorted informational gems.

Allow me to leave you with a single question:

Do you think the Alphabet Google thing was not collecting fine grained data prior to the official announcement?

Although it is out of print, I have a pre-publication copy of the Google 2.0 monograph available as a PDF. If you want a copy, write my intrepid sales manager, Ben Kent at benkent2020 at yahoo dot com. Yep, Yahoo. Inept as it may be, Yahoo is not the GOOG. The Facebook, however, remains the Facebook, and that’s one of Google’s irritants.

Stephen E Arnold, October 21, 2016

Falcon Searches Through Browser History

October 21, 2016

Have you ever visited a Web site and then lost the address or could not find a particular section on it?  You know that the page exists, but no matter how often you use an advanced search feature or scour through your browser history, it cannot be found.  If you use Google Chrome as your main browser, then there is a solution, says GHacks in the article, “Falcon: Full-Text History Search For Chrome.”

Falcon is a Google Chrome extension that adds full-text history search to the browser.  Chrome usually suggests previously visited Web sites when you type their titles or addresses into the address bar.  The Falcon extension augments the default behavior to also match text found on previously visited Web sites.

Falcon is a search option within a search feature:

The main advantage of Falcon over Chrome’s default way of returning results is that it may provide you with better results.  If the title or URL of a page don’t contain the keyword you entered in the address bar, it won’t be displayed by Chrome as a suggestion even if the page is full of that keyword. With Falcon, that page may be returned as well in the suggestions.

The new Chrome extension acts as a full-text filter over recorded Web history and improves a user’s search experience, so they do not have to sift through results individually.
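Falcon’s internals are not described in the write-up, but the basic idea of full-text history search can be illustrated with a toy inverted index: store the words on each visited page, then look pages up by any word in their body rather than only by title or URL. The sketch below is my own illustration of that idea, not Falcon’s actual code.

```python
from collections import defaultdict

# Toy inverted index over visited pages: word -> set of URLs containing it.
# An illustration of the idea only, not Falcon's implementation.
index = defaultdict(set)
titles = {}

def record_visit(url, title, body_text):
    """Index the full text of a page at visit time."""
    titles[url] = title
    for word in body_text.lower().split():
        index[word].add(url)

def search_history(keyword):
    """Return pages whose body contained the keyword, even when the title
    and URL never mention it (Chrome's default suggestions only match those)."""
    return [(url, titles[url]) for url in index.get(keyword.lower(), set())]

record_visit("https://example.com/post-17", "Weekend notes",
             "a long essay that mentions falconry exactly once")
print(search_history("falconry"))  # found despite the unrelated title and URL
```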

Whitney Grace, October 21, 2016
Sponsored by the publisher of the CyberOSINT monograph


Google and the Mobile Traffic Matter

October 20, 2016

I read a couple of write ups about “Google May Be Stealing Your Mobile Traffic.” Quite surprisingly, there was a response to these “stealing” articles from Google. You can read the explanation in a comment by Malte Ubl in the original article.

I noted these comments in the response to the stealing article:

  • Mr. Ubl says, “‘stealing traffic’ is literally the opposite of what AMP is for.”
  • Mr. Ubl says, “there are audience measurement platforms that attribute traffic to publishers. They might in theory wrongly attribute AMP traffic to the AMP Cache (not Google) rather than to a publisher because they primarily use referrer information. That is why we worked with them in worldwide outreach to get this corrected (where it was a problem), so that traffic is correctly attributed to the publisher. If this is still a problem anywhere, AMP treats it as a highest priority to get it resolved.”
  • Mr. Ubl says, “AMP supports over 60 ad networks (2 of them are owned by Google) with 2-3 coming on board every week and makes absolutely no change to business terms whatsoever. There is no special revenue share for AMP.”
  • Mr. Ubl says, “The Android users might have already noticed that it is now scrolling out of the way and the same is coming soon for iOS (we’re just fighting a few jank issues in Safari).”

AMP is, therefore, not stealing traffic.

I went back to my 2007 monograph “Google Version 2.0: The Calculating Predator,” and pulled out this diagram from a decade ago:

[Diagram: the Google container, 2007]

The user interacts with the Google, not the Internet, for certain types of content. The filtering is far from perfect, but it is an attempt to gain control over the who, what, why, when, and where of information access and delivery. © Stephen E Arnold, 2007, All rights reserved.

I offer this diagram as a way to summarize my understanding of the architecture which Google had spelled out in its patent documents and open source technical documents. (Yep, the GOOG did pay me a small amount of money, but that is supposed to be something you cannot know.) However, my studies of Google — The Google Legacy, Google Version 2.0: The Calculating Predator, and Google: The Digital Gutenberg — were written with open source content only.

Now back to the diagram. My research suggested that Google, like Facebook, envisioned that it would be the “Internet” for most people. In order to reduce latency and derive maximum efficiency from its global infrastructure, users would interact with Google via services like search. The content or information would be delivered from Google’s servers. In its simplest form, there is a Google cache which serves content. The company understood the cost of passing every query back to data centers, running each query, and then serving the content. Common sense said, “Hey, let’s store this stuff and knock out unnecessary queries.” In a more sophisticated form, the inventions of Ramanathan Guha and others illustrated a system and method for creating a sliced-and-diced archive of factoids. A user query for digital cameras would be handled by pulling factoids from a semantic database. (I am simplifying here.)
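The economics are easy to see in miniature: serving a repeated query from a cache costs a dictionary lookup instead of a trip to the back-end index. The toy sketch below is my own illustration of that idea, not Google’s plumbing.

```python
import functools

# Toy stand-in for a query hitting the back-end index: expensive to run.
def run_query_against_index(query):
    print(f"full index lookup for: {query!r}")   # pretend this is the costly part
    return [f"result for {query}"]

# A simple memoizing cache in front of the "index"; repeated queries are
# answered without touching the back end at all.
@functools.lru_cache(maxsize=10_000)
def cached_search(query):
    return tuple(run_query_against_index(query))

cached_search("digital cameras")   # pays the full cost once
cached_search("digital cameras")   # served from the cache, no index lookup
print(cached_search.cache_info())  # hits=1, misses=1
```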

In one of my unpublished presentations, I show a mobile phone user interacting with Google’s caches in order to eliminate the need to send the user to the source of the factoid.

Perhaps I misunderstood the technical information my researchers and I analyzed.

I don’t think Google is doing anything different today. The “stealing” idea comes from a person who finally takes a look at how the Google systems maximize efficiency and control the users. In order to sell ads, Google has to know who does what, when, where, and under what circumstances.

Today’s Google is now a legacy system. I know this is heretical, but Google is not a search company. The firm is using its legacy platform to deliver revenue and maximize that revenue. Facebook (which has lots of Xooglers running around) is doing essentially the same thing but with plumbing variations.

I am probably wildly out of step with youthful Googlers and the zippy mobile AMPers. But from my vantage point, Google has been delivering a closed garden solution for a long time.

My Google trilogy is now out of print. I can provide a fair copy with some production glitches for $250. If you are interested, write my intrepid marketer, Benny Kent at

Stephen E Arnold, October 20, 2016

Google Cloud, Azure, and AWS Differences

October 18, 2016

With so many options for cloud computing, it can be hard to decide which one to use for your personal or business files.  Three of the most popular cloud computing options are Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure.  Beyond the pricing, the main differences range from what services they offer to what they name them.  SitePoint did us a favor with its article comparing the different cloud services: “A Side-By-Side Comparison Of AWS, Google Cloud, And Azure.”

Cloud computing has the great benefit of offering flexible price options, but those options can often get very intricate based on how much processing power you need, how many virtual servers you deploy, where they are deployed, etc.  AWS, Azure, and Google Cloud do offer canned solutions along with individual ones.

AWS has the most extensive service array, but it is also the most expensive.  It is best to decide how you want to use cloud computing, because prices will vary based on the usage and each service does have specializations.  All three are good for scalable computing on demand, but Google is less flexible in its offering, although its pricing is easier to understand.  Amazon has the most robust storage options.

When it comes to big data:

This requires very specific technologies and programming models, one of which is MapReduce, which was developed by Google, so maybe it isn’t surprising to see Google walking forward in the big data arena by offering an array of products — such as BigQuery (managed data warehouse for large-scale data analytics), Cloud Dataflow (real-time data processing), Cloud Dataproc (managed Spark and Hadoop), Cloud Datalab (large-scale data exploration, analysis, and visualization), Cloud Pub/Sub (messaging and streaming data), and Genomics (for processing up to petabytes of genomic data). Elastic MapReduce (EMR) and HDInsight are Amazon’s and Azure’s take on big data, respectively.
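To make “managed data warehouse” concrete, here is a minimal, hypothetical sketch using the google-cloud-bigquery Python client against one of Google’s public sample datasets; it assumes a GCP project with credentials configured, and the dataset and column names come from the public samples rather than from the article.

```python
from google.cloud import bigquery

# Minimal BigQuery sketch; assumes a GCP project with credentials configured
# (e.g. via GOOGLE_APPLICATION_CREDENTIALS). The dataset is one of Google's
# public samples, so only query costs apply.
client = bigquery.Client()

sql = """
    SELECT word, SUM(word_count) AS uses
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY word
    ORDER BY uses DESC
    LIMIT 5
"""

for row in client.query(sql).result():
    print(row.word, row.uses)
```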

Without getting too much into the nitty gritty, each of the services has its strengths and weaknesses.  If one of the canned solutions does not work for you, read the fine print to learn how cloud computing can help your project.

Whitney Grace, October 18, 2016
Sponsored by the publisher of the CyberOSINT monograph

Pattern of Life Analysis to Help Decrypt Dark Web Actors

October 18, 2016

Google-funded Recorded Future plans to use technologies like natural language processing, social network analysis, and temporal pattern analysis to track Dark Web actors. This, in turn, will help security professionals detect patterns and thwart security breaches well in advance.

An article that appeared on DarkReading, “Decrypting The Dark Web: Patterns Inside Hacker Forum Activity,” points out:

Most companies conducting threat intelligence employ experts who navigate the Dark Web and untangle threats. However, it’s possible to perform data analysis without requiring workers to analyze individual messages and posts.

Recorded Future, which deploys around 500-700 servers across the globe, monitors Dark Web forums to identify and categorize participants based on their language and geography. Using advanced algorithms, it then identifies individuals and their aliases who are involved in various fraudulent activities online. This is a type of automation where AI is deployed rather than relying on human intelligence.

The major flaw in this method is that bad actors do not necessarily use the same or even similar aliases or handles across different Dark Web forums. Christopher Ahlberg, CEO of Recorded Future, who is leading the project, says:

A process called mathematical clustering can address this issue. By observing handle activity over time, researchers can determine if two handles belong to the same person without running into many complications.
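That clustering idea can be illustrated with a toy sketch, which is my own simplification and not Recorded Future’s method: represent each handle as a vector of posting activity over time, then treat handles with nearly identical activity profiles as candidates for the same actor. The handle names and counts below are invented.

```python
import numpy as np

# Toy illustration of clustering handles by activity pattern (not Recorded
# Future's method): each handle is a vector of posts per time-of-day bucket.
activity = {
    "crypt0king": np.array([0, 0, 1, 4, 9, 12, 7, 3, 1, 0, 0, 0]),
    "kc_resell":  np.array([0, 0, 2, 5, 8, 11, 8, 2, 1, 0, 0, 0]),
    "nightowl":   np.array([9, 7, 3, 0, 0, 0, 0, 0, 1, 4, 8, 10]),
}

def cosine(a, b):
    """Cosine similarity between two activity profiles."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Handles with near-identical profiles are candidates for the same actor.
names = list(activity)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(cosine(activity[a], activity[b]), 3))
```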

Again, researchers, and not AI or intelligent algorithms, will have to play a crucial role in identifying the bad actors. What is interesting to note is that Google, which pretty much dominates information on the Open Web, is trying to make inroads into the Dark Web through many of its fronts. The question is – will it succeed?

Vishal Ingole, October 18, 2016
Sponsored by the publisher of the CyberOSINT monograph

Artificial Intelligence Is Only a Download Away

October 17, 2016

Artificial intelligence still remains a thing of imagination in most people’s minds, because we do not understand how much it actually impacts our daily lives.  If you use a smartphone of any kind, it is programmed with software, apps, and a digital assistant teeming with artificial intelligence.  We are just so used to thinking that AI is the product of robots that we are unaware our phones, tablets, and other mobile devices are little robots of their own.

Artificial intelligence programming and development is also on the daily task list of many software technicians.  If you happen to have any technical background, you might be interested to know that there are many open source options for experimenting with artificial intelligence.  Datamation rounded up the “15 Top Open Source Artificial Intelligence Tools,” and one of these might be the next tool you use to complete your machine learning project.  The article shares that:

Artificial Intelligence (AI) is one of the hottest areas of technology research. Companies like IBM, Google, Microsoft, Facebook and Amazon are investing heavily in their own R&D, as well as buying up startups that have made progress in areas like machine learning, neural networks, natural language and image processing. Given the level of interest, it should come as no surprise that a recent artificial intelligence report from experts at Stanford University concluded that ‘increasingly useful applications of AI, with potentially profound positive impacts on our society and economy are likely to emerge between now and 2030.’

The statement reiterates what I already wrote.  The list runs down open source tools, including PredictionIO, Oryx 2, OpenNN, MLlib, Mahout, H2O, Distributed Machine Learning Toolkit, Deeplearning4j, CNTK, Caffe, SystemML, TensorFlow, and Torch.  The use of each tool is described and most of them rely on some sort of Apache software.  Perhaps your own artificial intelligence project can contribute to further development of these open source tools.
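As a small taste of what working with one of the listed tools looks like, here is a minimal TensorFlow sketch; it uses the current Keras-style API rather than the graph-and-session style of the 2016-era releases, and the toy data is invented for illustration.

```python
import tensorflow as tf

# Tiny TensorFlow/Keras sketch: fit one dense layer to a toy linear relation.
# Illustrative only; 2016-era TensorFlow used an explicit graph/session API.
x = tf.constant([[0.0], [1.0], [2.0], [3.0]])
y = tf.constant([[1.0], [3.0], [5.0], [7.0]])   # y = 2x + 1

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss="mse")
model.fit(x, y, epochs=500, verbose=0)

print(model.predict(tf.constant([[4.0]]), verbose=0))  # close to [[9.0]]
```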

Whitney Grace, October 17, 2016
Sponsored by the publisher of the CyberOSINT monograph

Google: Fragmentation and the False Universal Search

October 14, 2016

I read “Within Months, Google to Divide Its Index, Giving Mobile Users Better & Fresher Content.” Let’s agree to assume that this write up is spot on. I learned that Google plans “on releasing a separate mobile search index, which will become the primary one.”

The write up states:

The most substantial change will likely be that by having a mobile index, Google can run its ranking algorithm in a different fashion across “pure” mobile content rather than the current system that extracts data from desktop content to determine mobile rankings.

The news was not really news here in Harrod’s Creek. Since 2007, the utility of Google’s search system has been in decline for the type of queries the Beyond Search goslings and I typically run. On rare occasion we need to locate a pizza joint, but the bulk of our queries require old fashioned relevance ranking, with results demonstrating high precision and on-point recall.


Time may be running out for Google Web search.

Several observations:

  1. With the volume of queries from mobile surpassing desktop queries, why would Google spend money to maintain two indexes? Perhaps Google will have a way to offer advertisers messaging targeted to mobile users and then sell ads for the old school desktop users? If the ad revenue does not justify the second index, well, why would an MBA continue to invest in desktop search? Kill it, right?
  2. What happens to the lucky Web sites which did not embrace AMP and other Google suggestions? My hunch is that traffic will drop and probably be difficult to regain. Sure, an advertiser can buy ads targeted at desktop users, but Google does not put much wood behind that which becomes a hassle, an annoyance, or a drag on the zippy outfit’s aspirations.
  3. What will the search engine optimization crowd do? Most of the experts will become instant and overnight experts in mobile search. There will be a windfall of business from Web sites addressed to business customers and others who use mobile but need an old fashioned boat anchor computing device. Then what? Answer: An opportunity to reinvent themselves. Data scientist seems like a natural fit for dispossessed SEO poobahs.

If the report is not accurate, so what? Here’s an idea. Relevance will continue to be eroded as Google tries to deal with the outflow of ad dollars to social outfits pushing grandchildren lovers and the folks who take snaps of everything.

The likelihood of a separate mobile index is high. Remember universal search? I do. Did it arrive? No. If I wanted news, I had to search Google News. Same separate index for scholar, maps, and other Google content. The promise of universal search was PR fluff.

Fragmentation is the name of the game in the world of Alphabet Google. And fragmented services have to earn their keep or get terminated with extreme prejudice. Just like Panoramio (I know. You are asking, “What’s Panoramio?”), Google Web search could very well be on the digital glide way to the great beyond.

Stephen E Arnold, October 14, 2016

National Geographic Quad View

October 13, 2016

Google Maps and other map tools each have their unique features, but some are better than others at helping you find your way.  However, most of these online map tools have the same basic function and information.  While they can help you if you are lost, they are not that useful for topography.  National Geographic comes to our rescue with free topographic PDFs.  Check them out at PDF Quads.

Here are the details straight from the famous nature magazine:

National Geographic has built an easy to use web interface that allows anyone to quickly find any quad in the country for downloading and printing. Each quad has been pre-processed to print on a standard home, letter size printer. These are the same quads that were printed by USGS for decades on giant bus-sized presses but are now available in multi-page PDFs that can be printed just about anywhere. They are pre-packaged using the standard 7.5 minute, 1:24,000 base but with some twists.

How can there be twists in a topographic map?  They are not really that surprising, just explanations about how the images are printed out.  Page one is an overview map, pages two through five are standard topographic maps sized to print on regular paper, and hill shading is added to provide the maps with more detail.

Not everyone uses topographic maps, but a precise tool is invaluable to those who do.

Whitney Grace, October 13, 2016
Sponsored by the publisher of the CyberOSINT monograph
