Boolean Is Better but Maybe Google Must Motor Through Ad Inventory by Relaxing Queries…a Lot?

May 17, 2020

A brief exchange on StackExchange demonstrates some common sense. One user, moseisley.2015, asks the community, “Should Default Search Behavior be ‘This AND That,’ or ‘This OR That’?” They elaborate:

“I have web application that shows lists of various data types … employees, customers, inventory items, orders, and so on. There’s one simple search field for doing a ‘global’ search … . Question is, when a user enters multi-word text in the field should the default search behavior be (1) this OR that or (2) this AND that? What default behavior do you think average users would expect?”

Their example lists four records: John Smith, John Jones, Michael Smith, and Betty Taylor-Smith. Would users expect the query “John Smith” to return just the first record (AND) or all four (OR)? As any online researcher from the ‘70s and ‘80s would tell you, the Boolean AND is the better default. The first respondent, SNag, sensibly writes:

“As a user, the more I type in, the more specific I’m expecting the results to get, and this is what happens with AND. With OR, your results would explode! If my search for popular Google Doodle games gave me everything that was popular, everything Google, everything Doodle and every game out there, I’d be lost! If you’re expecting your user to fetch all matching either John or Smith results, consider supporting syntax like John|Smith (where | is the logical symbol for OR) and placing a hint ? icon next to the search box to showcase the various supported syntaxes. You could also consider quotes in the search syntax for exact matches, where “Smith” wouldn’t match Taylor-Smith, but Smith would. “John”|”Smith” would then match all John and all Smith but not Betty Taylor-Smith.”

We concur. The second respondent, Big_Chair, adds a good observation—users without any programming background are probably unfamiliar with the | character and may need a more explicit cue that their query is about to return results based on OR rather than AND.
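The difference between the two defaults is easy to see with the four records from the question. The sketch below is only an illustration: the `search` function and its `mode` parameter are hypothetical names, and the case-insensitive substring matching is an assumption chosen so that "Smith" also matches "Taylor-Smith", as in the original example.

```python
# Toy illustration of AND vs. OR as the default for multi-word queries.
# Names here (RECORDS, search, mode) are invented for this sketch.
RECORDS = ["John Smith", "John Jones", "Michael Smith", "Betty Taylor-Smith"]


def search(query, records, mode="AND"):
    """Return records matching all terms (AND) or any term (OR)."""
    terms = query.lower().split()
    # all() requires every term to appear; any() requires at least one.
    combine = all if mode == "AND" else any
    return [r for r in records if combine(t in r.lower() for t in terms)]


print(search("John Smith", RECORDS, "AND"))  # ['John Smith']
print(search("John Smith", RECORDS, "OR"))   # all four records
```

With AND, each extra word narrows the result set; with OR, each extra word widens it, which is the "explosion" SNag describes.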

Cynthia Murrell, May 17, 2020

Google: Regular Search Not Up to Covid19 Queries. Who Knew?

May 15, 2020

Google has launched a new semantic search tool designed to help researchers fight this pandemic. The Google AI Blog reveals “An NLU-Powered Tool to Explore COVID-19 Scientific Literature.” As one might expect, researchers around the world have been turning out an enormous number of papers on the disease and how we might fight it. Why does this call for a special tool? Google researcher Keith Hall writes:

“Traditional search engines can be excellent resources for finding real-time information on general COVID-19 questions like ‘How many COVID-19 cases are there in the United States?’, but can struggle with understanding the meaning behind research-driven queries. Furthermore, searching through the existing corpus of COVID-19 scientific literature with traditional keyword-based approaches can make it difficult to pinpoint relevant evidence for complex queries. To help address this problem, we are launching the COVID-19 Research Explorer, a semantic search interface on top of the COVID-19 Open Research Dataset (CORD-19), which includes more than 50,000 journal articles and preprints.”

Based on the BERT technology recently injected into the general Google Search, this bespoke semantic AI has been trained on biomedical literature. The team chose to build a hybrid term-neural retrieval model for this platform—a combination of keyword search and neural retrieval; see the article for the technical details. Here’s how the search functions:

“When the user asks an initial question, the tool not only returns a set of papers (like in a traditional search) but also highlights snippets from the paper that are possible answers to the question. The user can review the snippets and quickly make a decision on whether or not that paper is worth further reading. If the user is satisfied with the initial set of papers and snippets, we have added functionality to pose follow-up questions, which act as new queries for the original set of retrieved articles.”
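To make the "hybrid term-neural" idea mentioned above concrete, here is a minimal sketch of one common way to blend a keyword score with a vector-similarity score. It is not Google's method; the weighted sum, the `alpha` parameter, the function names, and the tiny hand-written vectors that stand in for neural embeddings are all assumptions made for illustration.

```python
import math


def term_score(query, doc):
    """Fraction of query words that also appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    """Weighted blend of keyword overlap and vector similarity (assumed form)."""
    return alpha * term_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)


# Placeholder vectors; a real system would get these from a trained model.
print(hybrid_score("virus transmission", "virus spread in households",
                   [1.0, 0.0], [1.0, 0.0]))
```

The point of the blend is that a document can rank well either because it shares the query's words or because its vector sits close to the query's vector, even when the wording differs.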

The open-alpha platform is available for free to the research community, and Google plans to continue refining the system over the next few months. May this tool help scientists find solutions that much faster.

Cynthia Murrell, May 15, 2020

Deindexing: Does It Officially Exist?

May 14, 2020

DarkCyber noted “LinkedIn Temporarily Deindexed from Google.” The rock solid, hard news service stated:

LinkedIn found itself deindexed from Google search results on Wednesday, which may or may not have occurred due to an error on their part. The telltale sign of an entire domain being deindexed from Google is performing a “site:” search and seeing zero results.

Mysterious.

DarkCyber has fielded two reports of deindexing from Google in the last three days. In one case a site providing automobile data was disappeared. In another, a site focused on the politics of the intelligence sector was pushed from page one to the depths of page three.

Why?

No explanation, of course.

LinkedIn is owned by Microsoft. Is that a reason? Did LinkedIn’s engineers ignore a warning about a problem in AMP?

Google does not make errors. If a problem arises, the cause is the vaunted Google smart software.

DarkCyber’s view is that Google is taking stepped up action to filter certain types of content. We have documented that one Google office has access to controls that can selectively block certain content from appearing in the public facing Web search system. The content is indeed indexed and available to those with certain types of access.

What’s up? Here are our theories:

  1. Google is trying to deal with problematic content in a more timely manner by relaxing constraints on search engineers working in Google “virtual offices” around the world. Human judgments will affect some Web sites. (Contacting Google is as difficult as it has been for the last 20 years.)
  2. Google wants to make sure that ads do not appear next to content that might cause a big spender to pull away. Google needs the cash. The thought is that Amazon and Facebook are starting to put a shunt in the money pipeline.
  3. Google is struggling to control costs. Slowing indexing, removing sites from a crawl, and pushing content that is rarely viewed to the side of the Information Superhighway reduce some of the costs associated with serving more than 95 percent of the queries launched by humans each day.

Regardless of the real reason or the theoretical ones, Google’s control over findable content can have interesting consequences. For example, more investigations are ramping up in Europe about the firm’s practices (either human or software centric).

Interesting. Too bad others affected by Google actions are not of the girth and heft of LinkedIn. Oh, well, the one percent are at the top for a reason.

Stephen E Arnold, May 14, 2020

New Arnold-Steele Discussion: Findability Is Terrible

May 7, 2020

Robert David Steele, a former CIA professional, stored a video of our recent discussion about finding open source information. The main point is that findability has degraded to the point that results are generally useless. Bing, Google, and other ad-supported systems have abandoned precision and relevance. Search results are a dog’s breakfast. To view the findability discussion, navigate to this link. The video was produced by Mr. Steele.

Stephen E Arnold, May 7, 2020

Search Engine Optimization: The Next Frontier Is Smart SEO

April 29, 2020

Content strategy plans are the most overlooked part of any Web site design and advertising campaign. Good content is integral to selling a product or a service, but not everyone is good at creating it. News Patrolling runs down the “Best AI Tools For Content Marketing Strategy” and explains how AI is becoming an industry game changer.

Content is usually the first impression consumers have of companies. It is meant to engage the consumer, then:

“It serves as a tool to communicate with your audience. If you identify their pain points to provide them with a solution, they will trust you and be more interested in buying your offerings. The growth of your business depends on content strategy. It must be as effective as possible if you do not go downhill. Artificial intelligence can help you make an effective content marketing strategy. There are various tools to help you from targeting keywords to choosing the right topic. You will be surprised to know that AI tools can create a smarter content strategy by identifying the behaviour of users. Such software can help you increase revenues and reduce cost.”

The article recommends four content marketing tools: Hubpost, Quill, Clearscope, and BrightEdge. Hubpost is advertised as using machine learning to help one get an edge on competition. The software analyzes keywords to discover what consumers want, then it clusters topics based on competition level.

Quill specializes in keyword optimization and generating quality content. Clearscope also optimizes content using keywords. It helps you generate keywords based on Google data and select the best keywords to use. Once you choose a keyword and write your post, Clearscope compares the post with other top-ranking posts.

BrightEdge is one integrated software solution that provides performance measurement, optimization, and keywords. It is described as a one-size-fits-all for content marketing strategies.

AI can provide insights into how to create the best content, but the most important part of a content strategy plan remains creative humans.

Yep, SEO is modernizing and automating methods to ensure that ad-supported Web search engines decide what matches a query. Precision, recall, and objectivity? Forget those irrelevant concepts.

Whitney Grace, April 29, 2020

Dig.ccMixter for Royalty-Free Tunes

April 22, 2020

Here is a resource that makers (and aspiring makers) of video content and games will want to bookmark. ccMixter is an online community where musicians share their work through Creative Commons licenses. Dig.ccMixter is our search portal into that content, free to download and use even for commercial purposes. Scrolling down reveals three categories: instrumental music for film & video; free music for commercial projects; and music for video games. Clicking the “Dig!” button leads to a keyword search page, where you can search by attributes like genre, mood, and instruments. The site’s About page, titled Yea, But Is It Legal? explains:

“This is a community music remixing site featuring remixes and samples licensed under Creative Commons licenses. Music on this site is licensed under a Creative Commons license. You are free to download and sample from music on this site and share the results with anyone, anywhere, anytime. Some songs might have certain restrictions, depending on their specific licenses. Each submission is marked clearly with the license that applies to it.”

So there you have it—a free source of music for your projects, even ones you intend to profit from. All you have to do is give credit where credit is due.

Interestingly, developers can also access the site’s ccHost Query API. We’re told:

“The ccHost Query API is an open, publicly available interface that is available for public use, especially by 3rd party websites, mobile applications, smart TV appliances and any other network connected device. We here at ccMixter use it to help expose the artists that upload their Creative Commons licensed music to audiences that otherwise would not have access to. The API and software implementation is owned by ArtIsTech Media under a license agreement with Creative Commons. The music itself is owned by the individual artists that uploaded it to the site and agree, through the Creative Commons licenses to share the music through this mechanism.”

Bing, Google, and Yandex are not suited for some types of music search. Enter Dig.ccMixter. Applause, please.

Cynthia Murrell, April 22, 2020

Video Search: Maybe Find That for Which You Were Looking? Ha Ha

April 9, 2020

Searching for a motion picture online? It is collective intelligence to the rescue at Ask MetaFilter’s thread, “How to Find What Streaming Services Certain Films Are On?” Canadian poster NoneOfTheAbove was perusing this 1000 Greatest Films list and asked for an easy way to locate specific films on streaming services across the web.

The obvious is stated (use Google) with the caveat that those results may not tell you if a membership is required. Another suggestion is to follow links in the movie’s IMDb description, and one respondent notes that if one already has Roku, its search results point to sources available through that subscription. A couple of people point to the streaming-service consolidator JustWatch, and one suggests Reelgood as a similar platform. The most descriptive answers, though, discuss Letterboxd:

“Another option is to sign up for a free membership with Letterboxd – that is a social-media movie-logging site that is really [darn] comprehensive. You can track what movies you want to see, what movies you have seen, and make endless lists of all kinds (‘Movies with female leads,’ ‘Movies with cute dogs,’ ‘Movies with Left-Handed Protaganists,’ whatever you want). A lot of members already have their own lists tracking their progress through the 1000 Greatest Movies list. Best of all – Letterboxd links to JustWatch and you can look at the streaming availability for a given movie when you pull it up on Letterboxd. So it may be fun to sign up for Letterboxd, make your own copy of the 1000 list, and then track your viewing progress. …Letterboxd also has a paid ‘Pro’ account where you can filter such a list based on a given streaming service like Netflix, but you may find that that’s overkill.” posted by EmpressCallipygos at 11:45 AM on March 31 [1 favorite]

“Bonus of having your own Letterboxd account is that you can already mark the ones you’ve seen and quickly visually scan for the ones you haven’t seen yet, then click through per film to see on which streaming services it’s available. I’ve been going through a bunch of the Criterion Collection this way recently myself. :D” posted by rather be jorting at 12:23 PM on March 31 [2 favorites]

So there you have several options supplied by the hive mind. Even if you aren’t looking for a film right now, this list may be worth bookmarking for future reference. Finding videos remains a challenge. Search has been solved, right? Yeah, sure.

Cynthia Murrell, April 9, 2020

Hyland Updates Document Processing Platform

April 8, 2020

Remember ISYS, the Australian search system? DarkCyber does. Hyland owns the technology. In a series of updates over the last six months, content-services provider Hyland Software has added file formats, capabilities, and support to its Document Filters platform, we learn from the press release posted by ProgrammableWeb, “Hyland Document Processing Update Includes New APIs.” The company aims to provide tools that allow its clients to process any type of file an organization may encounter in a typical day. Over 550 file formats are now supported. The write-up lists the new features:

  • Text and metadata support for Apple iBook file types, Apple PList binary files, EPUB ebook file types, and Quattro Pro Spreadsheet files
  • High definition support for NCR images, MS Project Gantt Charts, Microsoft Windows Clipboard (CLP) files, Microsoft Outlook for Mac OLK15MsgSource files, Paint Shop Pro images, Windows Cursor images, X-Windows-Bitmap images, X-Windows-Pixmap images, and WordPerfect Graphics (version 1)
  • New API for extraction and processing of hierarchical bookmark information
  • New API for the extraction and processing of static PDF form data
  • Added option, DETECT_MACROS, that outputs a metadata value if macros are detected in MS Office documents
  • New API to allow for adding common annotations such as notes, lines, shapes, polygons, and stamps. *When added to PDF output, annotations are created as native PDF annotations, that a user can interact with and modify
  • New API to allow the control of graphic effects on a per page basis
  • New option, GRAPHIC_ROTATE, to allow the rotation of an entire document rendition, or individual pages via the new graphic effects API
  • Added support for mark-up and drawing functions onto an HTML5 canvas

Hyland helps clients in several different industries leverage their data to better serve their own customers. It boasts that over half of 2019’s Fortune 100 companies use its products. Founded in 1991, the firm is based in Westlake, Ohio. How many years has ISYS been available? Good question, and DarkCyber knows the answer. If you said a number less than 30, you might be on a walkabout.

Cynthia Murrell, April 8, 2020

Techspert: Search and Experts

April 6, 2020

“How Our AI Search Technology Finds Experts Others Can’t” provides a crunchy description about an application of artificial intelligence. Techspert.io provides a diagram of its approach:

[Image: Techspert.io diagram of its approach]

The idea is that the approach operates with pinpoint precision. Then a semantic search engine is used to identify context. The old school lingo was Endeca’s Guided Search or maybe side search. Then a social graph is generated. That’s a relationship map like those used by i2 Ltd’s Analyst’s Notebook in the early 1990s. The i2 Ltd outfit had some Cambridge grads on its team. Finally the system can identify candidates.

What’s interesting is that the pinpoint angle appears to focus on a narrow domain; that is, individuals in STM with a focus on the M (medicine, biotechnology, etc.). This approach reduces the difficulty of indexing for any business or technical discipline. Focus means that descriptive terms are narrower than general business lingo. Second, the crawling for specialized personnel becomes somewhat easier because many sites can be ignored because they are not related to medicine and related fields; for example, the garden gnome site www.designsoscano.com. Plus, the social graph complexity can be reduced by applying qualifiers that NOT out individuals and other entities unrelated to the focus of Techspert.io; for example, David Drummond and Jennifer Blakely.

Several observations are warranted:

  1. The implemented method is useful when deployed in a focused way; that is, vertical search for different “terminologies”.
  2. Scaling the approach across different content domains may require innovative engineering. And the engineering solutions will be expensive to implement, update, and enhance.
  3. Generating market magnetism will require effective marketing and sales programs. Business development must generate sufficient revenue because once certain hires are made by a company, the recruiting service is put on ice; and sustainable revenues will have to come from recruiting services which offer lower costs, perquisites to customers, etc. These factors may inhibit some venture cash investments.

Worth monitoring this firm. A pivot may be necessary due to the uncertain economic environment.

Stephen E Arnold, April 6, 2020

Semantic Search: From Whence to What

April 2, 2020

A post from semantic SEO firm InLinks traces “The Evolution of Semantic Search.” The buzzword-filled summary does relate an interesting saga, which prompts us to wonder why enterprise search results are generally still pretty poor.

The write-up traces the evolution from the card-catalogue-like directories of early Yahoo to today’s semantic search. Along the way it details these concepts and milestones: directory-based search vs. text-based search; the crawl and discover phase; JavaScript challenges; turning text into math; the continuous bag of words (CBOW) and nGrams; vectors; semantic markup; and trusted seed sets. See the post for elaboration on any of these headings.
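For readers who have not met the "turning text into math" step, here is a minimal sketch of two of the simpler building blocks: n-grams and plain bag-of-words counts. The function names are invented for this example, and the whitespace tokenizer is a simplifying assumption. Note that this is not the continuous bag of words model itself, which learns word vectors by predicting a word from its neighbors; the counts below are only the simpler idea it builds on.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokens)


tokens = tokenize("the knowledge graph is the old web directory")
print(ngrams(tokens, 2)[:3])
# [('the', 'knowledge'), ('knowledge', 'graph'), ('graph', 'is')]
print(bag_of_words(tokens).most_common(1))
# [('the', 2)]
```

Once text is reduced to counts like these, documents and queries can be compared numerically, which is the step the later "vectors" heading in the post builds on.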

The piece concludes:

“We started the journey of search by discussing how human-led web directories like Yahoo Directory and the Open Directory Project was surpassed by full-text search. The move to Semantic search, though, is a blending of the two ideas. At its heart, Google’s Knowledge-based extrapolates ideas from web pages and augments its database. However, the initial data set is trained by using ‘trusted seed sets’. the most visible of these is the Wikipedia foundation. Wikipedia is curated by humans and if something is listed in Wikipedia, it is almost always listed as an entity in Google’s Knowledge Graph. … So in many regards. the Knowledge Graph is the old web Directory going full circle. The original directories used a tree-like structure to give the directory and ontology, whilst the Knowledge Graph is more fluid in its ontology. In addition, the smallest unit of a directory structure was really a web page (or more often a website) whilst the smallest unit of a knowledge graph is an entity which can appear in many pages, but both ideas do in fact stem from humans making the initial decisions.”

Here is where we are reminded of the post’s source: for the SEO platform, the takeaway is that what Google considers an “entity” has become key to effective SEO marketing. For our part, we look forward to the continuation of the saga, hopefully resulting in truly effective enterprise search solutions. Some day.

Cynthia Murrell, April 2, 2020
