Google and IBM: Me Too Marketing or a Coincidence?

October 15, 2018

I noted this article: “Google AI Researchers Find Strange New Reason to Play Jeopardy.” What caught my attention was the mention of the TV game show on which IBM Watson stomped mere humans in a competition. I dismissed the human-versus-machine showdown as a Madison Avenue ad confection. IBM wanted to convince the folks in West Virginia and rural Kentucky that Watson smart software was bigger than college basketball.

I think it worked. It allowed me to crank out write ups poking fun at the cognitive computing assertion, the IBM billion-dollar revenue target, and the assorted craziness of IBM’s ever-escalating assertions about the efficacy of Watson. I even pointed out that humans had to figure out the content used to “train” Watson and then fiddle with digital knobs and levers to get the accuracy up to snuff. The behind-the-scenes work was hidden from the Madison Avenue creatives; the focus was on the sizzle, not the preparatory work in the knowledge abattoir.

The Googlers have apparently discovered Jeopardy. I learned that Google uses Jeopardy to inform its smart software about reformulating questions. Here’s a passage I highlighted:

“Active Question Answering,” or Active QA, as the TensorFlow package is called, will reformulate a given English-language question into multiple different re-wordings, and find the variant that does best at retrieving an answer from a database.

I am not going to slog through the history of query parsing. The task is an important one, and in my opinion, without precise indexing such as “company type” and other quite specific terms, queries go off base. The elimination of explicit Boolean has put the burden on query processors of figuring out what humans mean when they type a query using the word “terminal,” for instance. Is it a computer terminal, or is it a bus terminal? No indexing? Well, smart software which looks up data in a dynamic table will do the job in a fine, fine way. What if one wants to locate a white house? Is it the DC residence of the president, or is it the term for Benjamin Moore house paint when one does not know the color code 2126-70?
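
To make the reformulation idea concrete, here is a minimal sketch of the loop the quoted passage describes: generate several rewordings, retrieve an answer for each, and keep the highest scoring one. The functions below are invented stand-ins, not Google’s Active QA code; the real system reportedly learns its rewrites rather than using canned substitutions like these.

```python
# Hypothetical sketch of an Active QA style loop: reword a question
# several ways, retrieve an answer for each variant, keep the best.
# reformulate() and retrieve_answer() are stand-ins, not a real API.

def reformulate(question: str) -> list[str]:
    # A real system learns its rewrites; here we fake a few variants.
    return [
        question,
        question.replace("terminal", "computer terminal"),
        question.replace("terminal", "bus terminal"),
    ]

def retrieve_answer(query: str) -> tuple[str, float]:
    # Stand-in for a QA backend returning (answer, confidence score).
    canned = {
        "computer terminal": ("a device for entering data", 0.9),
        "bus terminal": ("a station where buses stop", 0.8),
    }
    for key, result in canned.items():
        if key in query:
            return result
    return ("no answer found", 0.1)

def active_qa(question: str) -> str:
    # Score every rewording and return the answer from the best one.
    scored = [retrieve_answer(q) for q in reformulate(question)]
    answer, _ = max(scored, key=lambda pair: pair[1])
    return answer

print(active_qa("where is the nearest terminal"))
```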

Well, Google has embraced Jeopardy to make its smart software smarter while ignoring the cost, time, and knowledge work of creating controlled term lists, assigning index terms and verifying their accuracy, and applying fine-grained indexing to deal with the vagaries of language.

So, Google seems to have hit upon the idea of channeling IBM Watson.

But I recalled seeing this article: “Google AI Can Spot Advanced Breast Cancer More Effectively Than Humans.” That reminded me of IBM Watson’s message carpet bombing about the efficacy of Big Blue cancer fighting. The only problem was that articles like “IBM Pitched Its Watson Supercomputer As a Revolution in Cancer Care. It’s Nowhere Close” continue to appear.

Is Google channeling IBM’s marketing?

My hypothesis is that Google is either consciously or unconsciously tilling an already prepped field for marketing touch points. IBM did Jeopardy; Google does Jeopardy with the question understanding twist. IBM did cancer; Google does a specific type of cancer better than humans and, obviously, better than IBM Watson.

So what? My thought is that Google is shifting its marketing gears. In the process, the Google-dozer is dragging its sheepsfoot roller across a landscape slowly recovering from IBM’s marketing blitzes.

Will this work?

Hey, Google, like Amazon, wants to be the 21st-century IBM. Who knows? I thank both companies for giving me some new fodder for my real live goats, which can walk away from behemoth smart machines reworking the information landscape.

Here’s a thought: Google is more like IBM than it realizes.

Stephen E Arnold, October 15, 2018

Omnity Search: Adjusting Fast and Slow

October 14, 2018

Beyond Search maintains a file about the Omnity search system. We noted that a new white paper became available in April 2018. If you want a copy of the 42-page document, you can download it free at this URL.

The white paper is interesting because it suggests that the current methods of finding information are “inherently biased.” Omnity’s indexing is different; for example:

Omnity has developed a semantic signature technology that impartially and mathematically articulates the deep structure of a document, and self-assembles by inter-connecting to other documents with similar structure.
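
Omnity does not spell out its method in the passage, so what follows is only a guess at the flavor of the idea: treat each document’s distinctive-term profile as its “signature” and link documents whose signatures are similar. The tokenizer, the similarity measure, and the threshold below are all invented for illustration.

```python
# Speculative sketch of a "semantic signature": a weighted bag of a
# document's terms, with documents linked ("self-assembled") when
# their signatures are similar. Our guess, not Omnity's algorithm.
import math
from collections import Counter

def signature(text: str) -> Counter:
    # Crude signature: term frequencies, ignoring very short words.
    return Counter(w.lower() for w in text.split() if len(w) > 3)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = {
    "a": "semantic search over patent filings and grant records",
    "b": "patent filings linked to research grant records",
    "c": "recipes for sourdough bread and pizza dough",
}
# Connect any pair of documents above an arbitrary threshold.
links = [(x, y) for x in docs for y in docs if x < y
         and cosine(signature(docs[x]), signature(docs[y])) > 0.3]
print(links)  # [('a', 'b')] -- the two patent documents connect
```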

Omnity may be the first search and retrieval system to embrace blockchain technology, but we are not 100 percent certain. Frankly, we don’t pay much attention to distributed databases because the technology is another spin down database lane toward the next-big-thing mall.

The document contains some interesting diagrams. Some of these remind us of sense-making systems for law enforcement and intelligence professionals. The company positions itself against Palantir and Quid as well as Bloomberg and LexisNexis. Surprisingly, Linguamatics is a “leader” like Omnity.

What is fascinating is that Omnity seems to be embracing the digital currency approach to raising funds. One of the firm’s advisors is the really famous Danny Kahneman.

My recollection is that Omnity was going to knock Google search off its mountain top. Then Omnity shifted to a commercial model like the old Dialog Information Services. Now it is blending findability with blockchain and crypto currency.

More information about the company is at www.omnity.io. Get the white paper. Check out the diagrams. One question is, “Should Palantir and Quid be looking over their individual and quite broad shoulders?”

Omnity’s approach is a good example of search vendors repositioning fast and slow.

Stephen E Arnold, October 14, 2018

Search Revisionism: Alive and Well

September 27, 2018

I read “The Google Graveyard: Remembering Three Dead Search Engines.” I find it interesting how the reality perceived today seems to differ from the reality that existed in the 1990s. The write up answers the question, “Yo, dudes, what happened to three search engines?”

The three dead search engines explained or sort of described in the article are AskJeeves, Dogpile, and AltaVista.

The write up states:

Google is so ingrained in online culture that it feels as if it’s always been there.

I like feelings. Although after working at Halliburton Nuclear, I am not sure I am quite so warm and cuddly. Definitely Google was not “always” there.

And for those unfamiliar with the commercial databases like Chemical Abstracts and other commercial research services, I find this statement a bit disconcerting:

Google holds humanity’s knowledge in its search bar, and it has the ability to shape conversations on a massive scale. Imagine the internet as a million-volume collection of books, each one densely packed with essential information (and cat pictures).

Quite a statement. But people who use “always” often look for point-and-click solutions which require little or no attention.

You can skim the explanations of each of the three search engines. I would like to offer additional information.

AskJeeves

This was a rule based system. Rules were written by humans. The AskJeeves’ system looked at a query, matched it to the rules, and offered an answer. Humans were and are expensive. Humans have to write and modify rules. AskJeeves’ death had little to do with Google and everything to do with the ineffectiveness of the system, its costs, and the resources required to come up with answers to those questions. A version of the service lives on and it is a “diller.” Sorry, dilly.
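
A toy sketch of the pattern may make the economics clear: humans hand-write question-matching rules, and anything outside the rules falls through. The rules below are invented for illustration, not AskJeeves’ actual rule base; the point is that every new question shape requires a person to write and maintain another rule.

```python
# Toy rule-based answering in the AskJeeves spirit: humans write
# pattern -> answer rules; unmatched questions fall through.
# These rules are invented examples, not the real rule base.
import re

RULES = [
    (re.compile(r"what is the capital of (\w+)", re.I),
     lambda m: f"Look up the capital of {m.group(1)} in an atlas."),
    (re.compile(r"how tall is (.+)", re.I),
     lambda m: f"Check a reference for the height of {m.group(1)}."),
]

def answer(question: str) -> str:
    for pattern, responder in RULES:
        match = pattern.search(question)
        if match:
            return responder(match)
    # No rule matched: a human must write (and maintain) a new one.
    return "No rule covers this question."

print(answer("What is the capital of Peru?"))
print(answer("Why is the sky blue?"))
```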

Dogpile

This service was a metasearch engine, and for a few years, a reasonable one. A user entered a query. Dogpile sent the query to other Web search engines and displayed the results. The service ended up in the hands of InfoSpace, and Dogpile engaged in some legal excitement and ended up as the modern version of a one-stop shop. In short, Dogpile is not yet dead.
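
The mechanics are simple enough to sketch: fan one query out to several engines, merge the result lists, and drop duplicates. The backends below are fakes standing in for real search engine calls; Dogpile’s actual sources and ranking are not reproduced here.

```python
# Bare-bones metasearch in the Dogpile mold: send one query to
# several engines, merge the result lists, drop duplicates.
# The backends here are fakes standing in for real search APIs.

def engine_a(query: str) -> list[str]:
    return ["example.com/one", "example.com/two"]

def engine_b(query: str) -> list[str]:
    return ["example.com/two", "example.org/three"]

def metasearch(query: str) -> list[str]:
    merged, seen = [], set()
    for backend in (engine_a, engine_b):
        for url in backend(query):
            if url not in seen:  # keep the first appearance only
                seen.add(url)
                merged.append(url)
    return merged

print(metasearch("beyond search"))
# ['example.com/one', 'example.com/two', 'example.org/three']
```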

AltaVista

Now that’s an interesting case. AltaVista was a demo of the DEC Alpha. Search was and is a complicated application. Compaq bought DEC. HP bought Compaq. HP, the management wizards, left AltaVista high and dry. Messrs. Brin and Page hired several interesting people from AltaVista; for example, Jeff Dean, Simon Tong, et al. AltaVista disappeared because HP was not exactly on the ball. Alums of AltaVista went on to set up Exalead, now a unit of Dassault Systèmes. The Exalead search system is still online at www.exalead.com/search.

Net Net

AskJeeves was not a Web search engine. Dogpile was a metasearch engine and did little original crawling and indexing. AltaVista is embedded in certain technological ways in the Google system. And, by the way, Google is not the place to go if your child has been poisoned and your doctor needs an antidote.

Even those who do not understand information can figure out the limits of ad supported, free information. At least I hope so.

Stephen E Arnold, September 27, 2018

Bing: No More Public URL Submissions

September 19, 2018

Ever wondered why some Web site content is not indexed? Heck, ever talk to a person who cannot find their Web site in a “free” Web index? I know that many people believe that “free” Web search services are comprehensive. Here’s a thought: The Web indexes are not comprehensive. The indexing is selective, disconnected from meaningful date and time stamps, and often limited to following links to a specified depth; for example, three levels down or fewer in many cases.
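
For the curious, here is a minimal sketch of what “three levels down” means in practice: a crawler that abandons links beyond a fixed hop count from the seed. The miniature “Web” and the fetch function are stubs; a real spider would issue HTTP requests and parse links, and its policies are exactly the part the engines do not spell out.

```python
# Minimal depth-limited crawler: follow links only max_depth hops
# from the seed, one reason "free" indexes miss content.
# fetch_links() is a stub for an HTTP fetch plus link extraction.
from collections import deque

FAKE_WEB = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": [],
    "c": ["d"],  # "d" sits three hops from the seed
    "d": [],
}

def fetch_links(page: str) -> list[str]:
    return FAKE_WEB.get(page, [])

def crawl(seed: str, max_depth: int = 2) -> set[str]:
    indexed, queue = set(), deque([(seed, 0)])
    while queue:
        page, depth = queue.popleft()
        if page in indexed or depth > max_depth:
            continue  # already seen, or too deep to bother with
        indexed.add(page)
        for link in fetch_links(page):
            queue.append((link, depth + 1))
    return indexed

print(crawl("seed"))  # {'seed', 'a', 'b', 'c'} -- 'd' is missed
```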

I thought about the perception of comprehensiveness when I read “Bing Is Removing Its Public URL Submission Tool.” The tool allowed a savvy SEO professional or an informed first time Web page creator to let Bing know that a site was online and ready for indexing.

No more.

How do “free” Web indexes find new sites? Now that’s a good question, and the answers range from “I don’t know” to “Bing and Google are just able to find these sites.”

A couple of thoughts:

  • Editorial or spidering policies are not spelled out by most Web indexing outfits
  • Users assume that if information is available online, that information is accurate
  • “Free” Web indexing services are not set up to deliver results that are necessarily timely (indexed on a daily basis) or comprehensive.

Bing’s allegedly turning off public URL submissions is a small thing. My question is, “Who looked at these submissions and made a decision about what to index or exclude from indexing?” Perhaps the submission form operated like the thermostat control in a hotel room?

Stephen E Arnold, September 19, 2018

Semantic Struggles and Metadata

August 31, 2018

I have noticed the flood of links and social media posts about semantics from David Amerland. I found many of the observations interesting; a few struck me as a wildly different view of indexing. A recent essay by David Amerland, “Snipers Use Metadata Much Like Semantic Search Does,” caught the Beyond Search team’s attention.


Learn about “The Sniper Mind” at this link.

According to the story:

“There are two key takeaways here [about metadata and trained killers]: First, such skills are directly transferable in the business domain and even in most life situations. Second, in order to use their brain in this way snipers need training. The mental training and the psychological aids that are developed as a result of it is what I detailed…”

We must admit that it is a fresh metaphor: comparing killers’ use of indexing with semantic search. In our experience with professional indexing systems and human indexers, the word “sniper” has not, to our recollection, been used.

Watch your back, your blind side, and your ontology. Oh, also your metaphors.

Patrick Roland, August 31, 2018

Internet Search Engines that Reach Past Bing or Google Search

August 27, 2018

An article at Kimallo shares a roster of their ten “Most Valuable Deep Web Search Engines.” Billed as a list of search engines that plumb depths not found in a Google or Bing search, this collection is indeed that. One could wish the Dark Web and the Deep Web were not conflated in the piece’s introduction, but anyone who is fuzzy on the difference can click here for clarification. The list is an assortment of search engines that tap into the Deep and/or Dark Web to different degrees in different ways. Only one, “not Evil,” uses Tor, about which we’re told:

“Unlike other Tor search engines, not Evil is not for profit. The cost to run not Evil is a contribution to what one hopes is a growing shield against the tyranny of an intolerant majority. Not Evil is another search engine in the Tor network. According to its functionality and quality it is highly competitive with the competitors. There is no advertising and tracking. Due to thoughtful and continuously updated algorithms of search it is easy to find the necessary goods, content or information. Using not Evil, you can save a lot of time and keep total anonymity. The user interface is highly intuitive. It should be noted that previously this project was widely known as TorSearch.”

The other nine entries include people-prying tools pipl and mylife; metasearch engines Yippy, Fazzle, and privacy-centric DuckDuckGo; SurfWax, which seeks to turn search into a “visual process”; StartPage, another platform emphasizing privacy; the Wayback Machine, an archive of open web pages; and Google Scholar, which can be configured to access the NCSU Libraries’ databases and journal subscriptions. I’ll add that Beyond Search pointed out Ichidan last autumn, a search engine designed to look up sites hosted through the Tor network. Though one should not rely on the Kimallo article to distinguish between these general Web classifications, anyone who would like to go beyond the reach of Bing or Google may want to explore these options.

One question: Do metasearch systems go “beyond” Google? Some here at Beyond Search believe metasearch engines are recyclers, not indexes which point to content not included in primary spidering and indexing systems.

Cynthia Murrell, August 27, 2018

Twitter Bans Accounts

August 22, 2018

I read “Facebook and Twitter Ban over 900 Accounts in Bid to Tackle Fake News.” Twitter was founded about 12 years ago. The company found itself in the midst of the 2016 election messaging flap. The article reports:

Facebook said it had identified and banned 652 accounts, groups and pages which were linked to Iran and to Russia for “co-ordinated inauthentic behavior”, including the sharing of political material.

One of the interesting items of information which surfaced when my team was doing the research for CyberOSINT and the Dark Web Notebook, both monographs designed for law enforcement and intelligence professionals, was the ease with which Twitter accounts can be obtained.

For a program we developed for a conference organizer in Washington, DC, in 2015, we illustrated Twitter messages with links to information designed to attract young men and women to movements which advocated some activities which broke US laws.

The challenge had several dimensions in 2015. Let me run down the ones the other speakers and I mentioned:

  • The ease with which an account could be created
  • The ease with which multiple accounts could be created
  • The ease with which messages could be generated with suitable index terms
  • The ease with which messages could be disseminated across multiple accounts via scripts
  • The lack of filtering to block weaponized content.

Back to the present.

Banning an account addresses one of these challenges.

The notion of low friction content dissemination, unrestricted indexing, and the ability to create accounts is one to ponder.

Killing an account or a group of accounts may not have the desired effect.

Compared to other social networks, Twitter has a strong following in certain socioeconomic sectors. That in itself adds a bit of spice to the sauce.

Stephen E Arnold, August 22, 2018

DuckDuck Go and Its View of Google

August 16, 2018

A post at the Search Engine Journal reproduces a series of tweets—“DuckDuckGo Blasts Google for Anti-Competitive Search Behavior,” they report. Writer Matt Southern introduces the captured tweets, noting that DuckDuckGo seems to have been prompted by the record $5 billion fine recently levied on Google by the EU for antitrust violations. Here’s what DuckDuckGo had to say about specific ways Googley practices have affected them:

“We welcome the EU cracking down on Google’s anti-competitive search behavior. We have felt its effects first hand for many years and has led directly to us having less market share on Android vs iOS and in general mobile vs desktop.

We noted:

“Up until just last year, it was impossible to add DuckDuckGo to Chrome on Android, and it is still impossible on Chrome on iOS. We are also not included in the default list of search options like we are in Safari, even though we are among the top search engines in many countries.

And this statement was interesting:

“The Google search widget is featured prominently on most Android builds and is impossible to change the search provider. For a long time it was also impossible to even remove this widget without installing a launcher that effectively changed the whole way the OS works. Their anti-competitive search behavior isn’t limited to Android. Every time we update our Chrome browser extension, all of our users are faced with an official-looking dialogue asking them if they’d like to revert their search settings and disable the entire extension.”

Google owns the domain Duck.com, which redirects to the Google home page and may confuse some DuckDuckGo users. Southern notes the privacy-centric search engine continues to dog Google on Twitter; for example, they recently called it a “myth” that users cannot be tracked when using (Google-owned) Chrome in Incognito mode and linked to a post that details why their process is far more effective at protecting user privacy. I suggest the curious navigate to that resource for the technical details.

Beyond Search believes that DuckDuckGo is a metasearch system with some unique content. Depending on one’s point of view, there may be significant differences between DuckDuckGo and primary Web indexing systems like Exalead, Qwant, or Yandex. Running the same query on different systems is often a useful way to get a sense of what is in an index and what is not.

Cynthia Murrell, August 16, 2018

About Wanting China to Change

August 2, 2018

I read “Google Developing News App for China.” Interesting tactical shift at the GOOG. I won’t bring up the remarkable suggestion some senior Googlers floated years ago. Nope. I won’t write: “Google wants China to change.” No. I will not mention that the Middle Kingdom has not been a social construct ready to rush into the Brave New World. China is, well, China.

The main point of the write up is that Google employs some people who have probably figured out that China is a big market. In terms of market share, Google is looking at the Great Wall from afar. Where there is money, there is now a desire to become a player in what sure looks like one of the world’s largest markets.

Bottom line: Google will do things the way China wants them done. That killing courtyard in Xi’an made it clear: once inside that clever reception area, one did things China’s way, or the clueless traders found themselves at a strategic and tactical disadvantage. That’s a nice way of saying “trapped.”

Indexing information for China requires a basic tweak: Exclude content not on the Chinese white list.
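
In code terms, the “basic tweak” amounts to a coarse filter at index time, something like the sketch below. The white list contents and field names are invented for illustration.

```python
# Sketch of index-time white list filtering: only documents from
# approved sources are admitted. The list itself is hypothetical.
APPROVED_SOURCES = {"state-news.example.cn", "weather.example.cn"}

def admit_to_index(doc: dict) -> bool:
    # Drop anything whose source is not on the white list.
    return doc.get("source") in APPROVED_SOURCES

docs = [
    {"source": "state-news.example.cn", "title": "Harvest up 3%"},
    {"source": "blocked-site.example.org", "title": "Banned topic"},
]
index = [d for d in docs if admit_to_index(d)]
print([d["title"] for d in index])  # ['Harvest up 3%']
```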

What does this mean for the old “information wants to be free” idea?

It means that filtered information is what a person will see if Beyond Search understands the assertions in the write up.

Interesting stuff.

Google has learned a basic lesson at a cost of hundreds of millions, perhaps billions, in revenue: Companies are not nation states.

Beyond Search has learned that certain “ideals” are what one might describe as “flexible.” Think “real” news and MBAs discussing ethics. As the Beyond Search goose knows, “Bend like the willow in a wind.”

Stephen E Arnold, August 2, 2018

DarkCyber for July 17, 2018, Now Available

July 17, 2018

DarkCyber for July 17, 2018, is now available. You may view the nine-minute news program about the Dark Web and lesser known Internet services at www.arnoldit.com/wordpress or on Vimeo at this link.

This week’s program covers four stories.

The first story reviews the enhanced capabilities of Webhose.io’s Dark Web and Surface Web monitoring service. Tor Version 3 is supported. The content collection system can now access content on Dark Web and i2p services. Plus, Webhose’s system now scans compressed attachments and can access obfuscated sites with Captcha and user name and password requirements.

The second story reports that NSO, an Israeli intelligence services firm, suffered an insider breach. NSO’s Pegasus platform can extract email, text messages, SIM card and cell network information, GPS location data, keychain passwords, including Wi-Fi and router, and voice and image data. The NSO Pegasus system was advertised on the Dark Web. The insider was identified and arrested.

The third story takes a look at Dark Web money laundering services. Mixers, tumblers, and flip concepts are explained. These services are becoming more popular and are coming under closer scrutiny by law enforcement.

The fourth story explains Diffeo’s approach to next generation information access. Diffeo was one of the technology vendors for the Defense Advanced Research Projects Agency’s Memex Dark Web indexing program. The commercial version of Diffeo’s analytic tool is in use at major financial institutions and the US Department of Defense.

Enjoy.

Kenny Toth, July 17, 2018
