Google Biases: Real, Hoped For, or Imagined?

November 10, 2016

I don’t have a dog in this fight. Here at Beyond Search we point to open source documents and offer comments designed to separate the giblets from the goose feathers. Yep, that’s humor, gentle reader. Like it or not.

The write up “Opinion: Google Is Biased Toward Reputation-Damaging Content” pokes into an interesting subject. When I read the article, I thought, “Is this person a user of Proton Mail?”

The main point of the write up is that Google’s relevance ranking method responds actively to content which the “smart” software determines is negative. But people wrote the software, right? What’s up, people writing relevance ranking modules?

The write up states:

Google has worked very hard to interpret user intent when searches are conducted. It’s not easy to fathom what people may be seeking when they submit a keyword or a keyword phrase.

Yep, Google did take this approach prior to its initial public offering in 2004. Since then, I ask, “What changes did Google implement in relevance in the post IPO era?” I ask, “Did Google include some of the common procedures which have known weaknesses with regard to what lights the fires of the algorithms’ interests?”

The write up tells me:

Since Google cannot always divine a specific intention when a user submits a search query, it’s evolved to using something of a scattergun approach — it tries to provide a variety of the most likely sorts of things that people are generally seeking when submitting those keywords. When this is the name of a business or a person, Google commonly returns things like the official website of the subject, resumes, directory pages, profiles, business reviews and social media profiles. Part of the search results variety Google tries to present includes fresh content — newly published things like news articles, videos, images, blog posts and so on. [Emphasis added.]

Perhaps “fresh” content triggers the following relevance components? For example, fresh content signals change, and change may mean that the “owner” of the Web page is interested in buying AdWords. Does a boost for “new stuff” mean that, when a search result drifts lower over a span of a week or two, the willingness to buy AdWords goes up? I think about this question because it suggests that tuning certain methods provides a signal to the AdWords subsystems of people and code. I have described how such internal “janitors” within Google modules perform certain chores. Is this a “new” chore designed to create a pool of AdWords prospects? Alas, the write up does not explore this matter.

The write up points to a Googler’s public explanation of some of the relevance ranking methods in use today. That’s good information. But judging from the public presentations of Google systems and methods with which I am familiar, what’s revealed is like touching an elephant when one is blind. There is quite a bit more of the animal to explore and understand. In fact, “understand” is pretty tough unless one is a Googler with access to other Googlers, the company’s internal database system, and the semi-clear guidelines from whoever seems to be in charge at a particular time.

I highlighted this passage from the original write up as interesting:

I’ve worked on a number of cases in which all my research indicates my clients’ names have extremely low volumes of searches.  The negative materials are likely to receive no more clicks than the positive materials, according to my information, and, in many cases, they have fewer links.

Okay, so there’s no problem? If so, why is the write up headed down the Google distorts results path? My hunch is that the assurance is a way to keep Googzilla at bay. The author may want to work at the GOOG someday. Why be too feisty and remind the reader of the European Commission’s view of Google’s control of search results?

The write up concludes with a hope that Google says more about how it handles relevance. Yep, that’s a common request from the search engine optimization crowd.

My view from rural Kentucky is that there are a number of ways to have an impact on what Google presents in search results. Some of these methods exploit weaknesses in the most common algorithms used for basic functions within the Google construct. Other methods are available as well, but these are identified by trial and error by SEO wizards who flail for a way to make their clients’ content appear in the optimum place for one of the clients’ favorite keywords.

Three observations:

  • The current crop of search mavens at Google are in the business of working with what is already there. Think in terms of using a large, frequently modified, and increasingly inefficient system for determining relevance. That’s what the new hires confront. Fun stuff.
  • The present climate for relevance at Google is focused on dealing with the need to win in mobile search. The dominant market share in desktop search is not a given in the mobile world. Google is fragmenting its index for a reason. The old desktop model looks a bit like a 1990s Corvette. Interesting. Powerful. Old.
  • The need for revenue is putting more and more pressure on Google to bridge the gap between mobile user behavior and desktop user behavior in terms of search. Google is powerful, but different methods are needed to get closer to that $100 billion in revenue Eric Schmidt referenced in 2006. Relevance may be an opportunity.

My view is that Google is more than 15 years down the search road. Relevance is no longer defined by precision and recall. What’s important is reducing costs, increasing revenue, and dealing with the problems posed by Amazon, Facebook, Snapchat, et al.

Relevance is not high on the list of to-dos in some search-centric companies. Poking Google about relevance may produce some reactions. But not from me. I love the Google. Proton Mail is back in the index because Google allegedly made a “fix.” See? Smart algorithms need some human attention. If you buy a lot of AdWords, I would wager that some human Googlers will pay attention to you. Smart software isn’t everything once it alerts a Googler to activate the sensitivity function in the wetware.

Stephen E Arnold, November 10, 2016

Google Search Tips That Make You Say DUH

November 10, 2016

Unless you are establishing the trends in the search field, there is room for you to learn new search-related skills.  Search is a basic function in the developed world, and it is more powerful than simply typing a word or phrase into Google’s search box.  Google also has more tricks in its toolbox than you might be aware of.  Single Grain published “Google Like A Pro: 42 Of The Most Useful Google Search Tricks,” which runs down useful ways to use the search engine.  Some of them, however, are cheap tricks we have discussed before.

Single Grain runs down the usual Google stats about how many people use the search engine, its multiple services, and the hard-to-find Advanced Search.  Here is a basic article description:

Here’s a list of 42 of the most useful Google search tricks that you’ve probably never thought of—some practical, some just plain fun. Everyone knows how to Google, but by learning how to Google like a pro, you can harness the full power of the search giant and impress your boss and friends alike. Or at least find stuff.

These tips include: calculator, package tracker, stock watcher, tip calculator, conversions, weather, flight tracker, coin flipping, voice search, fact checking, and other tips you probably know.  What I love is that the article treats Boolean operators as if they are a brand new thing.  It does not even use the word “Boolean”!  Call me old school, but give credit where credit is due.
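Those unnamed Boolean operators are worth a refresher, since AND, OR, and NOT are just set operations over the documents that contain each term. Here is a minimal sketch in Python using a hypothetical toy inverted index (real engines index billions of documents, but the semantics are the same):

```python
# Toy inverted index: term -> set of document IDs containing that term.
# The terms and IDs are invented for illustration only.
index = {
    "search": {1, 2, 3, 5},
    "google": {2, 3, 4},
    "boolean": {3, 5},
}

# AND: documents containing both terms (set intersection)
and_result = index["search"] & index["google"]   # {2, 3}

# OR: documents containing either term (set union)
or_result = index["google"] | index["boolean"]   # {2, 3, 4, 5}

# NOT: documents containing one term but not the other (set difference)
not_result = index["search"] - index["google"]   # {1, 5}
```

Google’s minus operator and quoted phrases ride on top of the same idea, which is why it seems odd to present them as a novelty.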

Whitney Grace, November 10, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Word Embedding Captures Semantic Relationships

November 10, 2016

The article on O’Reilly titled “Capturing Semantic Meanings Using Deep Learning” explores word embedding in natural language processing. NLP systems typically encode words as atomic strings, but word embedding offers a richer approach that emphasizes relationships and similarities between words by treating them as vectors. The article posits,

For example, let’s take the words woman, man, queen, and king. We can get their vector representations and use basic algebraic operations to find semantic similarities. Measuring similarity between vectors is possible using measures such as cosine similarity. So, when we subtract the vector of the word man from the vector of the word woman, then its cosine distance would be close to the distance between the word queen minus the word king (see Figure 1).

The article investigates neural network models that avoid the expense of working with large data. Word2Vec and its two architectures, CBOW and continuous skip-gram, are touted as models, and the article goes into great technical detail about the entire process. The final result is that the vectors capture the semantic relationships between the words in the example. Why does this approach to NLP matter? A few applications include predicting future business applications, sentiment analysis, and semantic image searches.
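The quoted analogy is easy to reproduce. The sketch below uses hypothetical three-dimensional toy vectors (real word2vec embeddings run to hundreds of dimensions) to show that the woman minus man difference vector lines up with the queen minus king difference vector under cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, chosen so the "gender" direction
# (second component) is the only thing separating each pair.
woman = [0.9, 0.4, 0.1]
man   = [0.9, 0.1, 0.1]
queen = [0.2, 0.4, 0.9]
king  = [0.2, 0.1, 0.9]

diff_1 = [w - m for w, m in zip(woman, man)]   # woman - man
diff_2 = [q - k for q, k in zip(queen, king)]  # queen - king

# The two difference vectors point the same way: similarity near 1.0
similarity = cosine_similarity(diff_1, diff_2)
```

With trained embeddings the alignment is approximate rather than exact, but the same subtraction-and-compare operation applies.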

Chelsea Kerwin,  November 10, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Palantir Technologies: Less War with Gotham?

November 9, 2016

I read “Peter Thiel Explains Why His Company’s Defense Contracts Could Lead to Less War.” I noted that the write up appeared in the Washington Post, a favorite of Jeff Bezos I believe. The write up referenced a refrain which I have heard before:

Washington “insiders” currently leading the government have “squandered” money, time and human lives on international conflicts.

What I highlighted as an interesting passage was this one:

a spokesman for Thiel explained that the technology allows the military to have a more targeted response to threats, which could render unnecessary the wide-scale conflicts that Thiel sharply criticized.

I also put a star by this statement from the write up:

“If we can pinpoint real security threats, we can defend ourselves without resorting to the crude tactic of invading other countries,” Thiel said in a statement sent to The Post.

The write up pointed out that Palantir booked about $350 million in business between 2007 and 2016 and added:

The total value of the contracts awarded to Palantir is actually higher. Many contracts are paid in a series of installments as work is completed or funds are allocated, meaning the total value of the contract may be reflected over several years. In May, for example, Palantir was awarded a contract worth $222.1 million from the Defense Department to provide software and technical support to the U.S. Special Operations Command. The initial amount paid was $5 million with the remainder to come in installments over four years.

I was surprised at the Washington Post’s write up. No ads for Alexa and no Beltway snarkiness. That too was interesting to me. And I don’t have a dog in the fight. For those with dogs in the fight, there may be some billability worries ahead. I wonder if the traffic jam at 355 and Quince Orchard will now abate when IBM folks do their daily commute.

Stephen E Arnold, November 9, 2016

Code.gov: Missing Some Stuff

November 9, 2016

I love US government Web sites. They come and then they fade. The new kid on the block is Code.gov. The idea is that the US government has created a portal for open source software. Here are the entities whose code is available to anyone able to navigate to this link.

  • Agriculture
  • Commerce
  • EPA
  • Energy
  • Executive Office of the President
  • GSA (home of 18F)
  • Labor
  • NASA
  • National Archives and Records Administration
  • OPM (yep, the security conscious folks)
  • Treasury
  • Veterans Affairs (what are you looking at, cupcake?)

I did notice some interesting gaps; for example, does the Department of Defense have open source software? Well, maybe not. We do include a pointer to more than 100 useful programs in the forthcoming Dark Web Notebook. Want to reserve a copy? Write benkent2020 at yahoo dot com and we’ll put your name on the list.

Stephen E Arnold, November 9, 2016

Blue Chipper and Marketing Analytics

November 9, 2016

I think this write up “Reporter’s Notebook: McKinsey’s Heller Talks Analytics” is a summary plus odds and ends based on a McKinsey blue chip consultant’s lecture. McKinsey prides itself on hiring smart people, and it does some crafty buzzwording when it makes the obvious so darned obvious.

I noted this passage:

CMOs are asking: Do we have enough data scientists? Are we accelerating customer acquisition? Are we increasing customer value? What they care about is taking the intense amount of data that happens every day from call centers, Web sites and stores, then stitching it together and identifying new customer segmentation and new opportunities to create growth. The CMO is thinking about data science — how it can drive growth across the organization.

The idea is that federating disparate information is important from McKinsey’s point of view.

How does a marketer deal with data in a way that makes revenue? I highlighted this MBA formula: Get organized, plan, and hire McKinsey to help. The 4Ds will help too:

  • “Data. Aggregate as much information as possible and everything you do downstream creates more value.
  • Decisioning. Run advanced models — propensity models, churn models — against that data. You don’t become a data scientist overnight. The organization needs to do customer scoring and advanced analytics. Identify where the data fiefdoms are in your organization (people holding on to their data to protect their jobs) and get the right people together.
  • Design. Managing the content, offers and experience the customer receives and being curious and experimenting. Testing. A/B testing. Once you have the models, what are the experiences these customers want to see?
  • Distribution. Push both the decision data and test design into marketing. Close the loop and measure everything. If I’m in a room of marketers and I ask them what their roles are, they’re distributing marketing communications, just not in a truly data-driven way.”
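The “Decisioning” step above, scoring customers with a propensity or churn model, can be sketched in a few lines. The features, weights, and threshold below are hypothetical illustrations, not McKinsey’s method; a real model would be fitted to historical customer data:

```python
import math

def churn_propensity(features, weights, bias):
    """Logistic score: a probability-like churn propensity in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical feature order: [days_since_last_visit, support_calls, used_discount]
weights = [0.05, 0.8, -0.6]
bias = -2.0

customers = {
    "A": [30, 2, 1],   # long absence, two support calls: at risk
    "B": [2, 0, 1],    # visited recently, no complaints
}

# Score every customer, then flag high-propensity ones for a retention offer
scores = {cid: churn_propensity(f, weights, bias) for cid, f in customers.items()}
flagged = [cid for cid, s in scores.items() if s > 0.5]
```

The “Distribution” step is then just routing the flagged list into the marketing system and measuring whether the offers actually reduce churn.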

But the marketing officer must embrace the five core beliefs behind “mobilization.” I bet you are eager to learn these five insights. Here you go:

  1. “Mobilize cross-functional leaders around the opportunity. The CMO needs CIO, store operations, different people to help break down the silos.
  2. Get creative about navigating the legacy … be relentless about solutions.
  3. Walk before you run. Identify a roadmap, pick some high priority areas and execute.
  4. Prioritize “lighthouse” projects to kick-start execution.
  5. Let data activation drive your new marketing operations model.”

What’s the payoff? Well, for McKinsey it is billable hours. For the client:

We see real aggressive growth with clients doing nothing wrong in the range of a 6X revenue capture. If I can increase the speed by which you test, you’re increasing revenue. Typically conversion rate increases from the low end of the 20s to the high end of the 150 percent plus range … on the digital sales side yield exponential gains of 2, 3, 5X. Just 1 percent, 2 percent or 3 percent of enterprise value creation for a multi-billion company — driven by digital — is huge.

Huge? That seems to be a trendy word. Where have I heard it before? Hmmm. Will McKinsey guarantee the measurable benefit of its consultants’ work? My hunch is that McKinsey sends invoices; it does not write checks when its work wanders a bit from the data in a presentation.

Stephen E Arnold, November 9, 2016

Ontotext: The Fabric of Relationships

November 9, 2016

Relationships among metadata, words, and other “information” are important. Google’s Dr. Alon Halevy, founder of Transformic, which Google acquired in 2006, has been beavering away in this field for a number of years. His work on “dataspaces” is important for Google and germane to the “intelligence-oriented” systems which knit together disparate factoids about a person, event, or organization. I recall one of his presentations, specifically the PODS 2006 keynote, in which he reproduced a “colleague’s” diagram of a flow chart which made it easy to see who received the document, who edited it and what changes were made, and to whom recipients forwarded it.

Here’s the diagram from Dr. Halevy’s lecture:


Principles of Dataspace Systems, slide 4, by Dr. Alon Halevy, delivered on June 26, 2006, at PODS. Note that PODS is an annual ACM database-centric conference.

I found the Halevy discussion interesting.


Shining a Flashlight in Space

November 9, 2016

A tired yet thorough metaphor for explaining the dark web is shining a flashlight in space.  If you shine a flashlight in space, your puny battery-powered beacon will not shed any light on the trillions of celestial objects that exist in the vacuum.  While you wave the flashlight around trying to see something in the cosmos, the beam itself blinds you to the grand galactic show beyond it.  The University of Michigan shared the article “Shadow Of The Dark Web” about Computer Science and Engineering Professor Mike Cafarella and his work with DARPA.

Cafarella is working on Memex, a project that goes beyond the regular text-based search engine.  Using more powerful search tools, Memex concentrates on discovering information related to human trafficking.  Older dark web search tools skimmed over information and were imprecise.  Cafarella’s work improved dark web search tools, supplying data sets with more accurate information on traffickers, their contact information, and their location.

Humans are still needed to interpret the data as the algorithms do not know how to interpret the black market economic worth of trafficked people.  His dark web search tools can be used for more than just sex trafficking:

His work can help identify systems of terrorist recruitment; bust money-laundering operations; build fossil databases from a century’s worth of paleontology publications; identify the genetic basis of diseases by drawing from thousands of biomedical studies; and generally find hidden connections among people, places, and things.

‘I would never have thought a few years ago that database and data-mining research could have such an impact, and it’s really exciting,’ says Cafarella. ‘Our data has been shipped to law enforcement, and we hear that it’s been used to make real arrests. That feels great.’

In order to see the dark web, you need more than a flashlight.  To continue the space metaphor, you need a powerful telescope that scans the heavens and can search the darkness where no light ever passes.

Whitney Grace, November 9, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Google May Be Edging Out Its Competitors Surreptitiously

November 9, 2016

Leading secure email service provider ProtonMail mysteriously vanished from Google’s search results for 10 long months. Though the search engine giant denies any wrongdoing on its part, privacy advocates are crying foul.

ZDNet, in an article titled “ProtonMail strikes out at Google for crippling encrypted email service searches,” says:

ProtonMail has accused Google of hiding the company from search results in what may have been an attempt to suffocate the Gmail competitor. The free encrypted email service, which caters to nearly one million users worldwide, has enjoyed an increasing user base and popularity over the past few years as governments worldwide seek to increase their surveillance powers.

This is not the first time that Google has been accused of misusing its dominant position to edge out its competitors. The technology giant is also facing anti-trust lawsuit in Europe over the way it manipulates search results to retain its dominance.

Though ProtonMail tried to contact Google multiple times, all attempts elicited no response from the company. Just as mysteriously as the secure email service provider vanished from organic search results, it reappeared, enabling the company to get back on its feet financially.

As stated in the article:

Once Google issued a “fix,” ProtonMail’s search ranking immediately recovered. Now, the company is ranked at number one and number three for the search terms at the heart of the situation.

What caused the outage is still unknown. According to ProtonMail, it might be a bug in the search engine algorithm. Privacy advocates, however, are of the opinion that ProtonMail’s encrypted email might have been irking Google.

Vishal Ingole, November 9, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

HonkinNews for November 8, 2016 Now Available

November 8, 2016

This week HonkinNews comments on Microsoft’s mobile phone adventure. You will learn about geospatial analytics companies that may have an impact in certain secret applications. Palantir makes news again. There is more. You can view the seven-minute video at this link https://youtu.be/UWCk4n_AC0Y.

Kenny Toth, November 8, 2016
