Google Search Evaluator Handbook
June 12, 2018
How does Google shape search results? The pay to play search giant allegedly has a guide for individuals who interact with the automated search system. The information appears at this link. The information dates to 2017. There may be a revision or additional instructional material online. If we come across that information, we will post the link in Beyond Search.
The information is described as “Search Quality Rating System.” A sample from the table of contents for the documentation appears below:
An example of the information provided to the human making quality decisions appears below:
Here’s the guidance for queries about kittens:
In my first Google monograph (The Google Legacy, 2004), I gathered about 100 factors allegedly used to determine “quality” of Google search results. What I found interesting is that Google’s listing has many more entries than I identified 14 years ago.
Quality, it seems, is more difficult to pinpoint today. The rules for relevance, however, seem to have been marginalized.
I do know that in order to obtain useful results from Google, I have to craft my queries carefully. In fact, creating a query for an old school Boolean system is easier to do. Google has added on to what was essentially a key word system by wrapping layers of software around an ageing core.
Worth spending a few minutes with the document in my opinion.
Stephen E Arnold, June 12, 2018
Want Info about a Small Town? Hit the Library
June 6, 2018
For many, libraries are obsolete, deader than a Peruvian mummy. This is true for some, but if you live in a small town then libraries are far from dead. Big news outlets cover global issues, so they skip over small town stories. Small towns, however, still have news and the residents want to read it. Where do they go to get information when local newspapers have dried up? They go to the local library. The Atlantic shares the story in “The Libraries Bringing Small-Town News Back To Life,” which explains how the US’s smaller cities still rely on libraries as information centers.
Libraries have seen their budgets slashed and branches closed, and professional librarians have been replaced by para-professionals. Yet people still go to libraries and even trust librarians over journalists and other news sources. Why? Librarians understand the importance of accurate information and of its sources.
Librarians have picked up the slack where local news sources have failed or disappeared. In some towns, being a news source has increased participation at libraries. The write up stated:
“Various types of community building are happening across the nation. In some cities, libraries are partnering with established news sources, teaming up in Dallas to train high schoolers in news gathering or hosting a satellite studio in Boston for the public radio station WGBH. In San Antonio, the main library offers space to an independent video news site that trains students and runs a C-SPAN-style operation in America’s seventh-biggest city. (That site was the only video outlet covering a mayoral debate last year in which the incumbent mayor’s comments on poverty became a national story—and may have contributed to her electoral defeat.)”
Where libraries once stored information, they are now turning into the information source. They are also reinforcing important information literacy skills, which are desperately needed as fake news and instant search weaken people’s judgment.
Whitney Grace, June 6, 2018
Are Auto Suggestions Inherently Problematic?
June 3, 2018
Politics is a dangerous subject to bring up in any social situation. My advice is to keep quiet and nod; then you can avoid loudmouths trying to press their agendas down your throat. Despite attempts to remain polite, the Internet always brings out the worst in people, and The Sun shows how a simple search engine function does the job in “‘Trump Should Be Shot’ Google And Bing Searches For ‘Trump’ And ‘Conservatives’ Offer Disgusting Auto-Suggestions.”
Auto-complete is notorious for making hilarious mistakes, and the same is true of auto-suggest on search engines, but these mistakes end up being more gruesome than a misspelling.
If you want to see some interesting suggestions, type “Trump should be…” into a blank search bar and the results are endless, including: shot, arrested, killed, in jail, and banned from Twitter (okay, the last one might be a little funny).
Typing in “conservatives need…” results in less derogatory terms, but the auto-suggestions include: to die, to go, a new party, and not apply.
Hmmm.
What creates these auto-suggestions?
“These are based on a number of factors including real-time searches, trending results, your location, and previous activity. The intuitive predictions change in “response to new characters being entered into the search box” explains Google. And the company also has its own set of “autocomplete policies” in case something untoward should pop up. Along with prohibiting predictions that contain sexually explicit, violent, and harmful terms, Google says it also removes hateful suggestions against groups and individuals. ‘We remove predictions that include graphic descriptions of violence or advocate violence generally,’ states the firm.”
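To make the quoted description concrete, here is a minimal sketch of prefix-based suggestions ranked by popularity and filtered by a policy list. It is an illustration only, not how Google or Bing actually work. The function name `suggest`, the `query_counts` data, and the `blocked_terms` set are all hypothetical, and the sketch ignores location, trending signals, and previous activity.

```python
def suggest(prefix, query_counts, blocked_terms, limit=5):
    """Return popular past queries that start with `prefix`.

    Queries containing any blocked term are dropped before ranking,
    a crude stand-in for an "autocomplete policy" (assumption).
    """
    prefix = prefix.lower()
    candidates = [
        (count, query)
        for query, count in query_counts.items()
        if query.startswith(prefix)
        and not any(term in query.split() for term in blocked_terms)
    ]
    # Most frequent first; ties are broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [query for _, query in candidates[:limit]]


# Hypothetical query log: query text mapped to how often it was searched.
query_counts = {
    "libraries near me": 120,
    "libraries are obsolete": 45,
    "libraries should be burned": 30,
    "library hours": 200,
}
suggestions = suggest("libraries", query_counts, blocked_terms={"burned"})
```

A word list like this is easy to evade, which may be one reason the scrubbing seems to happen only after someone complains.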
Google and Bing deserve some credit for removing the slander from auto-complete, but sometimes they only do it when they are pushed. Trolls and bigots create these terms, and it would be nice to see them scrubbed from auto-suggest, but that is nearly impossible. Hey, Bing and Google, try scrubbing 4chan!
Whitney Grace, June 3, 2018
AI: A Little Helper for Those Seeking Information
May 31, 2018
Search is a powerful tool and big data software has only improved search’s quality. Search can now locate items in all data structures, ranging from structured to unstructured. Do users, however, actually find the answers they want? InfoWorld runs through the impact AI has had on search in the article, “The Wonders Of AI-Or The Shortcomings Of Search?”
In essence, Google and Amazon’s subsecond search results have spoiled and ruined users. Users are so used to accurate and quick results that they expect all Web sites, software, and hardware to do the same. These search tools are actually providing users with an information overload.
On the other hand, AI makes search and other tools more robust. Organizations use AI not only to power search, but to feed and filter data to make business recommendations. Google and Amazon are not the only ones using it. Other companies that use AI to power their businesses are Uber, Tesla, Spotify, Pandora, Netflix, and Bristol-Myers Squibb. AI takes the search out of search:
“Those last points are crucial. A structural shift is under way. AI cuts through the clutter to provide not endless pages of results to wade through, but with specific recommendations tailored to you as the seeker of knowledge—or simply as the seeker of where to find the best Chicago-style pizza while away from home on a business trip. (Which is not to admit, certainly not in print, that I have not supplemented my normal whole-foods, plant-based, no-meat-or-dairy nutrition by indulging in such a cheesy, guilty pleasure. I present it merely for illustration.) The key construct: AI-driven systems present either the single best solution or a tight shortlist of best-fit solutions.”
AI also augments search by providing recommendations that are related to the original query, but are simply suggestions. This requires that AI be fed a lot of data, so that it can offer proactive assistance.
Big data and AI are empowering, but they do need a checks and balances system. The solution is to combine AI search and regular search into one tool: the curated list and the raw data list.
Whitney Grace, May 31, 2018
Want Mobile Traffic? New Tactics May Be Needed
May 30, 2018
I read “Mobile Direct Traffic Eclipses Facebook.” As with any research, I like to know the size of the sample, the methodology, and the “shaping” which the researchers bring to the project. To answer these questions, one must consult other sources cited in the write up, including Nieman Lab, which appears to be recycling Chartbeat data. In short, I don’t know much about the research design or other aspects of the research.
Nevertheless, I noted a handful of statements or “facts” which on the surface struck me as interesting. The study data appear to support the assertion that “mobile does not equal social”.
First, the study reports that “mobile direct to traffic has surpassed Facebook.” I think this means that if those in the sample use a mobile device, some of those users use an app or a browser to go directly to a site. At first glance, Facebook seems to be a major player but it is, according to the survey, trending down from being the gateway to information for some mobile device users.
Second, the write up points out sites offering “content” are not losing visitors. On one hand, the finding suggests that Facebook is not a gateway trending upwards. I have seen reports suggesting that Facebook has been negatively affected by the Cambridge Analytica matter, but I have also seen reports which assert that Facebook is adding users. Which is it? That’s the question, isn’t it?
Third, the Chartbeat data put Google as the leading source of traffic to sites. What this means is that the “gap” between Facebook and Google as referrers seems to be getting bigger. Bad news for Facebook and good news for Google if the data are accurate.
Several observations:
- The data, if accurate, make it clear that Google and its Android operating system have a clear path to the barn.
- Facebook may have to begin the process of adapting to mobile users who do not use Facebook as the gateway to the Internet (whatever that ends up being).
- Governments interested in censoring certain content streams have a crude road map for determining which online destinations should be cut off from the information superhighway. (The law enforcement addiction to Facebook and Twitter may require some special treatment at clinics run by Google and high traffic destinations accessed via an app.)
To sum up, if the data in the Chartbeat report are accurate, changes are underway. Some positive, some negative. There is, however, that “if.”
Stephen E Arnold, May 30, 2018
Bing Keeps on Trying
May 21, 2018
Ah, Bing.
Microsoft has struggled to garner the respect in the search engine world that its software has commanded.
Bing is often seen as the Avis to Google’s Hertz. Maybe a stepchild of the search game patriarchs, Sergey and Larry.
Microsoft is not blind to these views, which is resulting in some interesting innovations to close the gap between it and Google. We learned about these steps from a recent TechRadar story, “Microsoft Unveils New Features for Bing in Bid to Make You Switch from Google.”
The biggest upgrade? The fact that Bing now gives you an “Intelligent Answer” and not just the one that ranks first. It seems like a good move, which the article highlights:
“We’re pleased to see Microsoft attempt to win over users by adding more features (which you can read about more on the Bing blog), rather than trying to strong-arm people who use Windows 10 into using the search engine, but will this be enough to make people switch?”
We’re going to go out on a (not very long) limb and suggest, no. This isn’t enough to make people switch. That’s especially true when we see news like this, that claims that Google’s Assistant is the most accurate. Looks like the game board is shifting beneath Microsoft’s feet as they try to catch up. How does one find information available on the Internet?
One doesn’t without recourse to commercial systems from vendors with low or zero profile among consumers. Money is required to find relevant information. Free stuff returns what earns money to pay for the “free lunch.”
Patrick Roland, May 21, 2018
Bing Engineers Serendipity, Not Just Irrelevant Results
May 5, 2018
It is Saturday. Innovation in search never rests. I read “Bing: Search Engines Have a Responsibility to Get People Out of Their Bubbles.” The headline is one guaranteed to give me a headache.
My view is that when I use a search system I expect, want, and need the system to:
- Process my keyword query, accept Boolean logic (AND, OR, and NOT arguments), and generate a list of results that optimize relevance.
- If I need more results, synonyms, or Endeca-like “facets,” I want a button or a menu option that allows me to specify what I think I need to get the information I seek.
- I want to have ads, sponsored content, and SEO skewed content flagged in a color which is easily visible and put within a ruled “box.”
- I want to know [a] the date at which the displayed result was indexed, [b] the date assigned by whoever wrote the item to the specific article, and [c] an explicit link to a cache in the event the page indexed has been removed or is otherwise unavailable.
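The first requirement in the list above is not exotic. A minimal sketch of old school Boolean retrieval over an inverted index looks something like the following. The function names, the tiny document set, and the `must`, `any_of`, and `must_not` parameters are my own illustrative assumptions, not any vendor’s API, and the sketch does no relevance ranking at all.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_search(index, all_ids, must=(), any_of=(), must_not=()):
    """AND the `must` terms, OR the `any_of` terms, then drop `must_not` hits."""
    hits = set(all_ids)
    for term in must:  # AND: every term has to be present
        hits &= index.get(term.lower(), set())
    if any_of:  # OR: at least one of these terms has to be present
        either = set()
        for term in any_of:
            either |= index.get(term.lower(), set())
        hits &= either
    for term in must_not:  # NOT: remove documents containing the term
        hits -= index.get(term.lower(), set())
    return sorted(hits)


# Hypothetical three-document collection.
docs = {
    1: "loss of coolant accident at a nuclear power plant",
    2: "la vida loca lyrics",
    3: "coolant accident report for a truck engine",
}
index = build_index(docs)
results = boolean_search(index, docs, must=["coolant", "accident"], must_not=["truck"])
```

Here `results` holds only document 1: the query says exactly what it wants and the system does not guess at a “question.”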
I have other requirements for a commercial search system; for example, Diffeo’s or Recorded Future’s approach. But these are specialized and inappropriate for a Bing style Web index.
The Bing approach, according to the write up:
Bing has launched a new feature called Intelligent Answers. When you enter a question with several valid answers, the search engine summarizes them all in a carousel to give a balanced overview.
I don’t want answers. I want a list of relevant locations which may contain the information I seek. For example, I needed to identify the term for a penance device and access images of these gizmos. Bing, Google, Yandex, and even lesser known systems like iSeek.com and Qwant.com failed.
The systems returned everything from a church calendar to a correction of penance to pennant. I did not want baseball information.
Now Bing is going to identify from my query my “question” and provide a range of answers. I don’t want this to happen. If I search LOCA, I want information about a loss of coolant accident, not this:
I would be happy to add a field code for power, nuclear if such a feature were supported by Bing. I would also add key words to get something close to my term.
The complete and utter silliness of Bing results exists right now. The company which has managed minimal progress in search now expects me to believe that its “smart software” can provide answers.
The write up states:
“Take a simple query like ‘Is coffee good for you?’” said Ribas. “There are plenty of reputable sources that tell you that there are good reasons for drinking coffee, but there are also some very reputable ones that say the opposite. Deep learning allows us to project multiple queries in the passages to what we call the semantic space and find the matches.”
Based on my limited experience with whizzy 2018 search technology, I am not sure if Bing’s innovation will be helpful to me. Where “semantic space” is concerned, the systems with which I am familiar provide a number of other tools and functions to ensure relevance and accuracy.
Even with those tools, including state of the art systems from developers from Madrid to San Carlos, the user has to think, analyze, and run additional queries. Phone calls, interviews, and even visits to libraries are often required to obtain helpful information.
Bing promises “intelligent answers.”
Sounds like MBA infused marketing with a few notes added from engineers with better things to do than explain exactly what a content processing component can do with 80 percent accuracy.
Time out. The referee wants the coach to get the MBA marketers off the field for intellectual fantasizing. This is the same outfit which owns Fast Search & Transfer, created the racist chatbot, and missed the mobile phone business by a country mile. Why not ask Bing a question like, “How did these missteps occur?” Perhaps Watson would be able to take a crack at “intelligent answers”?
Stephen E Arnold, May 5, 2018
Web Archives
May 4, 2018
Short honk: Here’s a list of Web archives. These services allow one to find pages from a defunct or unavailable Web site or page:
- Archive Today at http://archive.is/
- Internet Archive at http://archive.org/web/
- Perma.cc at https://perma.cc/ (which is a collection of permalinks)
The Beyond Search goose has learned to generate a PDF of information. Quite a bit of content “disappearing” is taking place. To cite one example: Try to locate the list of MIC, RAC, and ZPIC vendors once engaged in locating health care billing fraud and similar misunderstandings. Enjoy your hunt for these items of information.
The source article was “Force Archive Websites to Pick up Webpages with This Handy Tool.”
Stephen E Arnold, May 4, 2018
LucidWorks Has a Search App for That. What?
April 27, 2018
Is there life in enterprise search after many years of hype, razzle dazzle, and over the top marketing?
Maybe?
Lucidworks announced that they have a brand new search tool for enterprise business systems, says Global Newswire in the article, “Lucidworks Launches AI-Powered Site Search App For Enterprise.” The new application is dubbed Lucidworks Site Search, and it is an easily configurable, embeddable site-based application.
Lucidworks Site Search uses workflows that optimize natural language processing and machine learning so that users can personalize their search results. The application uses rich faceting and filtering to drill down to the most accurate results. Users will be able to access content and insights more quickly than with older applications.
The Lucidworks CEO said,
“‘Developing a website’s search with both a powerful backend and an elegant UI can be an arduous process. We’ve created Site Search to empower more teams to get site search apps done and out the door,’ explains Lucidworks CEO Will Hayes. ‘By increasing the usability through an applications-based approach, we’re able to bring Lucidworks’ operationalized AI to more customers.’”
We enjoy terms like “operationalize.” Do we understand these MBA inspired noun to verb arabesques? Not really.
Key word search is a useful utility. The new Lucidworks Site Search scans through every document, allows quick configuration, and has an attractive user interface. Elasticsearch does this as well.
We believe the future belongs to vendors with a more comprehensive next generation information access system. In short, more like Palantir Gotham or BAE NetReveal and less like the mainframe centric IBM Stairs approach.
Whitney Grace, April 27, 2018
Taking Time for Search Vendor Limerance
April 18, 2018
Life is a bit hectic. The Beyond Search and the DarkCyber teams are working on the US government hidden Web presentation scheduled this week. We also have final research underway for the two Telestrategies ISS CyberOSINT lectures. The first is a review of the DarkCyber approach to deanonymizing Surface Web and hidden Web chat. The second focuses on deanonymizing digital currency transactions. Both sessions provide attendees with best practices, commercial solutions, open source tools, and the standard checklists which are a feature of my LE and intel lectures.
However, one of my associates asked me if I knew what the word “limerance” meant. This individual is reasonably intelligent, but the bar for brains is pretty low here in rural Kentucky. I told the person, “I think it is psychobabble, but I am not sure.”
The fix was a quick Bing.com search. The wonky relevance of the Google was the reason for the shift to the once indomitable Microsoft.
Limerance, according to Bing’s summary of Wikipedia, means “a state of mind which results from a romantic attraction to another person typically including compulsive thoughts and fantasies and a desire to form or maintain a relationship and have one’s feelings reciprocated.”
Upon reflection, I decided that limerance can be liberated from the woozy world of psychologists, shrinks, and wielders of water witches.
Consider this usage in the marginalized world of enterprise search:
Limerance: The state of mind which causes a vendor of key word search to embrace any application or use case which can be stretched to trigger a license to the vendor’s “finding” system.