Bing Keeps on Trying

May 21, 2018

Ah, Bing.

Microsoft has struggled to garner the respect in the search engine world that its software has commanded.

Bing is often seen as the Avis to Google’s Hertz. Maybe a stepchild of the search game patriarchs, Sergey and Larry.

Microsoft is not blind to these views, and they are prompting some interesting innovations intended to close the gap between Bing and Google. We learned about these steps from a recent TechRadar story, “Microsoft Unveils New Features for Bing in Bid to Make You Switch from Google.”

The biggest upgrade? Bing now gives you an “Intelligent Answer,” not just the answer that ranks first. It seems like a good move, and the article says as much:

“We’re pleased to see Microsoft attempt to win over users by adding more features (which you can read about more on the Bing blog), rather than trying to strong-arm people who use Windows 10 into using the search engine, but will this be enough to make people switch?”

We’re going to go out on a (not very long) limb and suggest, no. This isn’t enough to make people switch. That’s especially true when we see news claiming that Google’s Assistant is the most accurate. Looks like the game board is shifting beneath Microsoft’s feet as it tries to catch up. How does one find information available on the Internet?

One doesn’t, at least not without recourse to commercial systems from vendors with a low or zero profile among consumers. Money is required to find relevant information. Free stuff returns what earns money to pay for the “free lunch.”

Patrick Roland, May 21, 2018

Bing Engineers Serendipity, Not Just Irrelevant Results

May 5, 2018

It is Saturday. Innovation in search never rests. I read “Bing: Search Engines Have a Responsibility to Get People Out of Their Bubbles.” The headline is one guaranteed to give me a headache.

My view is that when I use a search system I expect, want, and need the system to:

  1. Process my keyword query, accept Boolean logic (AND, OR, and NOT operators), and generate a list of results ranked to optimize relevance.
  2. If I need more results, synonyms, or Endeca-like “facets,” I want a button or a menu option that allows me to specify what I think I need to get the information I seek.
  3. I want ads, sponsored content, and SEO-skewed content flagged in an easily visible color and placed within a ruled “box.”
  4. I want to know [a] the date at which the displayed result was indexed, [b] the date assigned by whoever wrote the item to the specific article, and [c] an explicit link to a cache in the event the page indexed has been removed or is otherwise unavailable.
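Requirement 1 describes classic Boolean retrieval. As a rough illustration only, here is a minimal Python sketch of Boolean matching over an inverted index. The function names (`build_index`, `evaluate`) and the three sample documents are hypothetical, invented for this example. The evaluator reads operators strictly left to right with no parentheses, and NOT means “and not.” No commercial engine is this simple.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build an inverted index: each lowercased term maps to the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return dict(index)


def evaluate(query: str, index: dict[str, set[int]]) -> set[int]:
    """Evaluate a flat query such as 'loca AND coolant' strictly left to right.

    Supported operators: AND, OR, NOT (binary, meaning 'and not').
    No parentheses or precedence: a deliberate simplification.
    """
    tokens = query.split()
    if not tokens or len(tokens) % 2 == 0:
        raise ValueError("expected: term (OPERATOR term)*")
    result = set(index.get(tokens[0].lower(), set()))  # copy so the index is never mutated
    for op, term in zip(tokens[1::2], tokens[2::2]):
        postings = index.get(term.lower(), set())
        if op.upper() == "AND":
            result &= postings
        elif op.upper() == "OR":
            result |= postings
        elif op.upper() == "NOT":
            result -= postings
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


# Hypothetical sample collection for illustration only.
docs = {
    1: "LOCA: loss of coolant accident in a nuclear power plant",
    2: "Loca, the song",
    3: "Coolant pumps and power plant maintenance",
}
index = build_index(docs)
print(evaluate("loca AND coolant", index))
print(evaluate("loca NOT coolant", index))
print(evaluate("loca OR pumps NOT song", index))
```

Because evaluation is left to right, `loca OR pumps NOT song` means `(loca OR pumps) NOT song`. A production query parser would add operator precedence, parentheses, phrase matching, and field codes.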

I have other requirements for a commercial search system, for example, capabilities along the lines of Diffeo’s or Recorded Future’s approach. But these are specialized and inappropriate for a Bing-style Web index.

The Bing approach, according to the write up:

Bing has launched a new feature called Intelligent Answers. When you enter a question with several valid answers, the search engine summarizes them all in a carousel to give a balanced overview.

I don’t want answers. I want a list of relevant locations which may contain the information I seek. For example, I needed to identify the term for a penance device and access images of these gizmos. Bing, Google, Yandex, and even lesser known systems like and failed.

The systems returned everything from a church calendar to a correction of penance to pennant. I did not want baseball information.

Now Bing is going to infer my “question” from my query and provide a range of answers. I don’t want this to happen. If I search LOCA, I want information about a loss of coolant accident, not results about the song “Loca.”

I would happily add a field code for “power, nuclear” if Bing supported such a feature. I would also add key words to get something close to my term.

The complete and utter silliness of Bing results is with us right now. The company, which has made minimal progress in search, now expects me to believe that its “smart software” can provide answers.

The write up states:

“Take a simple query like ‘Is coffee good for you?’” said Ribas. “There are plenty of reputable sources that tell you that there are good reasons for drinking coffee, but there are also some very reputable ones that say the opposite. Deep learning allows us to project multiple queries in the passages to what we call the semantic space and find the matches.”
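As we understand the general technique, a “semantic space” means representing queries and passages as vectors and ranking passages by how close they sit to the query. The Python sketch below is a toy under loudly stated assumptions: the three-number vectors are made up by hand (a real system would learn them with a neural model and use far more dimensions), cosine similarity is one common closeness measure, and nothing here reflects Bing’s actual model or code.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], passages: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k passages whose vectors lie closest to the query vector."""
    ordered = sorted(passages, key=lambda text: cosine(query_vec, passages[text]), reverse=True)
    return ordered[:k]


# ASSUMPTION: hand-made 3-d vectors standing in for learned embeddings.
passages = {
    "Coffee is linked to a lower risk of some diseases": [0.9, 0.1, 0.2],
    "Too much coffee can raise blood pressure": [0.8, -0.3, 0.1],
    "Baseball pennant race heats up": [0.0, 0.1, 0.95],
}
query = [0.85, 0.0, 0.1]  # stands in for "Is coffee good for you?"

for text in rank(query, passages):
    print(f"{cosine(query, passages[text]):.3f}  {text}")
```

With these invented numbers, both coffee passages, the pro and the con, outrank the baseball item, which is the sort of “balanced” pairing the carousel is meant to show. Whether learned vectors behave this neatly on real queries is a separate question.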

Based on my limited experience with whizzy 2018 search technology, I am not sure Bing’s innovation will be helpful to me. Where “semantic space” is concerned, the systems with which I am familiar provide a number of other tools and functions to ensure relevance and accuracy.

Even with those tools, including state-of-the-art systems from developers from Madrid to San Carlos, the user has to think, analyze, and run additional queries. Phone calls, interviews, and even visits to libraries are often required to obtain helpful information.

Bing promises “intelligent answers.”

Sounds like MBA-infused marketing with a few notes added by engineers who have better things to do than explain exactly what a content processing component can do with 80 percent accuracy.

Time out. The referee wants the coach to get the MBA marketers off the field for intellectual fantasizing. This is the same outfit which owns Fast Search & Transfer, created the racist chatbot, and missed the mobile phone business by a country mile. Why not ask Bing a question like, “How did these missteps occur?” Perhaps Watson would be able to take a crack at “intelligent answers”?

Stephen E Arnold, May 5, 2018

Web Archives

May 4, 2018

Short honk: Here’s a list of Web archives. These services allow one to find pages from a defunct or unavailable Web site or page:

The Beyond Search goose has learned to generate a PDF of information. Quite a bit of content is “disappearing.” To cite one example, try to locate the list of MIC, RAC, and ZPIC vendors once engaged in locating health care billing fraud and similar misunderstandings. Enjoy your hunt for these items of information.

The source article was “Force Archive Websites to Pick up Webpages with This Handy Tool.”

Stephen E Arnold, May 4, 2018

LucidWorks Has a Search App for That. What?

April 27, 2018

Is there life in enterprise search after many years of hype, razzle dazzle, and over the top marketing?


Lucidworks has announced a brand-new search tool for enterprise business systems, according to the Global Newswire article “Lucidworks Launches AI-Powered Site Search App For Enterprise.” The new application, dubbed Lucidworks Site Search, is an easily configurable, embeddable, site-based application.

Lucidworks Site Search uses workflows that optimize natural language processing and machine learning so users can personalize their search results. The application uses rich faceting and filtering to drill down to the most accurate results. Users will be able to access content and insights more quickly than with older applications.

The Lucidworks CEO said,

“‘Developing a website’s search with both a powerful backend and an elegant UI can be an arduous process. We’ve created Site Search to empower more teams to get site search apps done and out the door,’ explains Lucidworks CEO Will Hayes. ‘By increasing the usability through an applications-based approach, we’re able to bring Lucidworks’ operationalized AI to more customers.’”

We enjoy terms like “operationalize.” Do we understand these MBA-inspired noun-to-verb arabesques? Not really.

Key word search is a useful utility. The new Lucidworks Site Search scans through every document, allows quick configuration, and has an attractive user interface. Elasticsearch does this as well.

We believe the future belongs to vendors with a more comprehensive next generation information access system. In short, more like Palantir Gotham or BAE NetReveal and less like the mainframe-centric IBM Stairs approach.

Whitney Grace, April 27, 2018

Taking Time for Search Vendor Limerance

April 18, 2018

Life is a bit hectic. The Beyond Search and the DarkCyber teams are working on the US government hidden Web presentation scheduled for this week. We also have final research underway for the two Telestrategies ISS CyberOSINT lectures. The first reviews the DarkCyber approach to deanonymizing Surface Web and hidden Web chat. The second focuses on deanonymizing digital currency transactions. Both sessions provide attendees with best practices, commercial solutions, open source tools, and the standard checklists that are a feature of my LE and intel lectures.

However, one of my associates asked me if I knew what the word “limerance” meant. This individual is reasonably intelligent, but the bar for brains is pretty low here in rural Kentucky. I told the person, “I think it is psychobabble, but I am not sure.”

The fix was a quick search. The wonky relevance of the Google prompted me to use Bing, the search engine from the once indomitable Microsoft.

Limerance, according to Bing’s summary of Wikipedia, means “a state of mind which results from a romantic attraction to another person typically including compulsive thoughts and fantasies and a desire to form or maintain a relationship and have one’s feelings reciprocated.”


Upon reflection, I decided that limerance can be liberated from the woozy world of psychologists, shrinks, and wielders of water witches.

Consider this usage in the marginalized world of enterprise search:

Limerance: The state of mind which causes a vendor of key word search to embrace any application or use case which can be stretched to trigger a license to the vendor’s “finding” system.



About That Google Question Answering: Books, Scholar, and Open Source at Its Talon Tips

April 17, 2018

Googzilla prides itself on consuming search queries. Answering those questions? That’s a matter for discussion. Note that here in Harrod’s Creek we understand that if Google does not point to an entity, Web site, or factoid—that entity, Web site, or factoid does not exist. Who knew that those in Harrod’s Creek were into epistemology?

However, Pagal Parrot found “10 Questions Even Google Can’t Answer.” Let us take a look at the write up’s exemplary 10 questions:

“1. Why does a round pizza come in a square box?

2. Why are boxing rings square?

3. What is Satan’s last name?

4. Why do we press harder on a remote control when we know the batteries are flat?

5. Why is Google not the most translated website?

6. Why do banks charge a fee on ‘insufficient funds’ when they know there is not enough?

7. Why is it that people say they ‘slept like a baby’ when babies wake up, like, every two hours?

8. Why do Baidu lead Google in China?

9. Do Atheist also swear by the Bible /Quran when they go to court?

10. Why do people get angry each time another passenger sits beside them in a seat?”

These questions also raise another question: Do people spend time trying to dumbfound Google? It appears that the answer is, “Folks do try to bedevil the GOOG.”

The article is mostly for giggles, but there are definitely more than 10 questions Google cannot answer. Here is one: When will Google answer questions with precision and recall balanced for relevance and “accuracy”? Would advertisers respond to the functionality?

Whitney Grace, April 17, 2018

Google Argues With Russia About Website Rankings

April 10, 2018

Amidst its employee petitions and the increasing concern about YouTube videos for children, Google is annoyed with Russia.

Google fiddled with its ranking algorithm to stop the dissemination of fake news, and Russia believes the change is biased against two of its news agencies. Reuters describes the argument in the story “Google Seeks To Defuse Row With Russia Over Website Rankings.” Roskomnadzor, Russia’s communications regulator, called out Alphabet Inc. and its popular search engine Google, claiming that Google pushed the Russian media sites Sputnik and Russia Today lower in search results.

Eric Schmidt said that Google would not delete those links; instead, they would be pushed lower in search results. Russia claimed Google discriminated against Russia Today and Sputnik and said it would take action if necessary. Google responded:

“ ‘We’d like to inform you that by speaking about ranking of web-sources, including the websites of Russia Today and Sputnik, Dr. Eric Schmidt was referring to Google’s ongoing efforts to improve search quality,’ Google said in a letter posted on Roskomnadzor’s website… ‘We don’t change our algorithm to re-rank,’ it added. A Google spokeswoman confirmed the letter had been sent by the company but provided no further comment.”

Years ago Mr. Brin’s trip to space fizzled. Now the search giant is finding fault with a country known to use interesting methods to solve problems.

Whitney Grace, April 10, 2018

Mondeca: Another Semantic Search Option

April 9, 2018

Mondeca, based in France, has long been focused on indexing and taxonomy. Now they offer a search platform named, simply enough, Semantic Search. Here’s their description:

“Semantic search systems consider various points including context of search, location, intent, variation of words, synonyms, generalized and specialized queries, concept matching and natural language queries to provide relevant search results. Augment your SolR or ElasticSearch capabilities; understand the intent, contextualize search results; search using business terms instead of keywords.”

A few details from the product page caught my eye. Let’s begin with the search functionality, which the page describes succinctly:

“Navigational search – quickly locate specific content or resource. Informational search – learn more about a specific subject. Compound term processing, concept search, fuzzy search, simple but smart search, controlled terms, full text or metadata, relevancy scoring. Takes care of language, spelling, accents, case. Boolean expressions, auto complete, suggestions. Disambiguated queries, suggests alternatives to the original query. Relevance feedback: modify the original query with additional terms. Contextualize by user profile, location, search activity and more.”
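The “relevance feedback” item in that list can be illustrated with the classic Rocchio method. Mondeca does not say which technique it uses, so treat this Python sketch as a generic stand-in, not Mondeca’s implementation. The queries and documents are hypothetical term-weight dictionaries, and the alpha, beta, and gamma weights are common textbook-style defaults, not Mondeca settings.

```python
def rocchio(
    query: dict[str, float],
    relevant: list[dict[str, float]],
    nonrelevant: list[dict[str, float]],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> dict[str, float]:
    """Rocchio relevance feedback over term-weight dictionaries.

    new_query = alpha * query
                + beta  * (mean of relevant document vectors)
                - gamma * (mean of non-relevant document vectors)
    Terms whose final weight is not positive are dropped.
    """
    new: dict[str, float] = {t: alpha * w for t, w in query.items()}
    for doc in relevant:
        for t, w in doc.items():
            new[t] = new.get(t, 0.0) + beta * w / len(relevant)
    for doc in nonrelevant:
        for t, w in doc.items():
            new[t] = new.get(t, 0.0) - gamma * w / len(nonrelevant)
    return {t: w for t, w in new.items() if w > 0}


# Hypothetical feedback: the user marks one result relevant and one not relevant.
query = {"taxonomy": 1.0}
relevant = [{"taxonomy": 1.0, "thesaurus": 1.0, "ontology": 1.0}]
nonrelevant = [{"taxonomy": 1.0, "biology": 1.0}]
print(rocchio(query, relevant, nonrelevant))
```

Terms from the document the user liked (“thesaurus,” “ontology”) are added to the query, while a term that appears only in the rejected document (“biology”) ends up with a negative weight and is dropped. That is the “modify the original query with additional terms” behavior in miniature.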

The software includes a GUI for visualizing the semantic data, and features word-processing tools like auto complete and a thesaurus. Results are annotated, with key terms highlighted, and filters provide significant refinement, complete with suggestions. Results can also be clustered by either statistics or semantic tags. A personalized dashboard and several options for sharing and publishing round out my list. See the product page for more details.

Established in 1999, Mondeca delivers pragmatic semantic solutions to clients in Europe and North America and is proud to have developed its own successful semantic methodology. The firm is based in Paris. Perhaps the next time our beloved leader, Stephen E Arnold, visits Paris, the company will make time to speak with him. Previous attempts to set up a meeting were for naught. Ah, France.

Cynthia Murrell, April 9, 2018

Video Search: Still a Challenge

April 6, 2018

As MIT Technology Review describes in its article, “The Next Big Step for AI? Understanding Video,” artificial intelligence still tends to have trouble correctly interpreting video. A recent slew of new jobs at YouTube (owned by Google) underscores this flaw—“YouTube is Hiring 10,000 People to Police Offensive Videos,” reports the New York Post. When it comes to objectionable content, algorithms just don’t get it. Yet. Meanwhile, the PR machine keeps running.

MIT Technology Review editor Will Knight discusses some promising solutions in the article, beginning close to home with a collaboration between MIT and IBM. He writes:

“MIT and IBM this week released a vast data set of video clips painstakingly annotated with details of the action being carried out. The Moments in Time Dataset includes three-second snippets of everything from fishing to break-dancing. ‘A lot of things in the world change from one second to the next,’ says Aude Oliva, a principal research scientist at MIT and one of the people behind the project. ‘If you want to understand why something is happening, motion gives you lot of information that you cannot capture in a single frame.’” … “The MIT-IBM project is in fact just one of several video data sets designed to spur progress in training machines to understand actions in the physical world. Last year, for example, Google released a set of eight million tagged YouTube videos called YouTube-8M. Facebook is developing an annotated data set of video actions called the Scenes, Actions, and Objects set.”

Knight also mentions Twenty Billion Neurons, which, he notes:

“… Created a custom data set by paying crowdsourced workers to perform simple tasks. One of the company’s cofounders, Roland Memisevic, says it also uses a neural network designed specifically to process temporal vision information.”

So, we should not be surprised if, soon, AI can comprehend what it “sees.” Meanwhile, sites that host video content would do well to employ the judgment of humans.

Cynthia Murrell, April 6, 2018

Build an Alternative Google: How To Wanted

April 6, 2018

Hacker News presented an interesting question, “How would you build an internet scale web crawler?” We have been talking with companies which have developed Internet search systems that are not available for free Web search. Those conversations have produced some fascinating information. Some of the data will be included in my upcoming lecture for a government agency and then in my two presentations at the June 2018 Telestrategies ISS Conference in Prague.
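For readers who want a feel for the question, here is a minimal single-machine sketch of a crawler’s core loop in Python: a breadth-first URL frontier, de-duplication after resolving relative links and stripping #fragments, and a per-host page cap. The `FAKE_WEB` dictionary and the `fetch_links` callable are hypothetical stand-ins for real HTTP fetching and HTML link extraction. An Internet-scale crawler would also honor robots.txt, space out requests to each host, and shard the frontier and seen-set across many machines; none of that is shown here.

```python
from collections import deque
from collections.abc import Callable
from urllib.parse import urldefrag, urljoin, urlsplit

# HYPOTHETICAL stand-in for the Web: page URL -> raw links found on that page.
FAKE_WEB = {
    "https://a.example/": ["/about", "https://b.example/", "https://a.example/#top"],
    "https://a.example/about": ["/"],
    "https://b.example/": ["https://a.example/about", "/news"],
    "https://b.example/news": [],
}


def normalize(base: str, link: str) -> str:
    """Resolve a possibly relative link against its page and drop any #fragment."""
    absolute, _fragment = urldefrag(urljoin(base, link))
    return absolute


def crawl(
    seeds: list[str],
    fetch_links: Callable[[str], list[str]],
    max_pages: int = 100,
    max_per_host: int = 50,
) -> list[str]:
    """Breadth-first crawl; returns URLs in the order they were fetched."""
    frontier = deque(seeds)  # URLs waiting to be fetched
    seen = set(seeds)  # every URL ever queued, so nothing is queued twice
    per_host: dict[str, int] = {}  # pages fetched so far, per host
    fetched: list[str] = []
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        host = urlsplit(url).netloc
        if per_host.get(host, 0) >= max_per_host:
            continue  # host budget spent; skip without fetching
        per_host[host] = per_host.get(host, 0) + 1
        fetched.append(url)
        for link in fetch_links(url):
            nxt = normalize(url, link)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return fetched


print(crawl(["https://a.example/"], lambda u: FAKE_WEB.get(u, [])))
```

The hard parts of the Hacker News question live outside this loop: storage for billions of seen URLs, politeness at scale, and refresh scheduling. The sketch only shows why a frontier plus a seen-set is the usual starting point.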

What was interesting about this question was how few people responded. That is interesting because my team’s research for my new presentations on deanonymizing encrypted chat and deanonymizing digital currency transactions pivots on comprehensive Internet indexing. In fact, more companies are indexing Internet content than at any time in the last 10 years.

The second issue the post triggered was a realization that only a handful of people jumped on the topic. This low response is in itself interesting. With more activity in indexing, why aren’t more people helping out JustinGarrison? That’s a question worth thinking about.

Third, one of the responses to the Hacker News question was a pointer to the open source project. We once included this technology in our Internet Research for Law Enforcement training program. My recollection of the system is fuzzy, so I will get one of my team to take a look.

The final thought the Hacker News’ story triggered was, “Have people just accepted Bing, Google, Qwant, and a handful of metasearch systems as too dominant to challenge?” My view is that an opportunity exists to create a public facing Internet search and retrieval system. The reason? Outstanding alternatives to Bing, Google, and Qwant are available for those who qualify as customers and who are willing to pay the license fees.

My hunch is that just as enterprise search has coalesced around the open source Lucene/Solr technologies, free Web search has become “game over” because the ad supported model has won.

The problem, of course, is that a person looking for information usually does not realize that free Web search results are not comprehensive, timely, or objective.

I hope individuals like JustinGarrison get the information needed to seize an opportunity in Internet search.

Stephen E Arnold, April 6, 2018
