Amazon and YouTube: The Hong Kong Protests Mark the Day that Twitch.tv Made Clear the Limitations of YouTube

June 16, 2019

I heard there was a small protest underway in Hong Kong. The time is now 6:30 am US Eastern time. I navigated to YouTube, entered the query “Hong Kong protest”, and saw links to videos from a day ago (today is June 16, 2019). I navigated to the YouTube “Live” page, which provides a limited selection of streaming videos on YouTube. If you have not seen that somewhat incomplete index, navigate to https://www.youtube.com/live. No live stream of the Hong Kong protest.

If it’s not on YouTube, then it doesn’t exist, or so goes some old-time catchphrase.

Well, not quite.

Navigate to Amazon’s Twitch.tv. Run a query for Hong Kong. Here’s what I saw before I clicked on the live stream of Unable to Breath.

image

Amazon Twitch.tv search result. The Unable to Breath stream is not one but an aggregate of eight separate feeds from Hong Kong.

Front and center was a link to Unable to Breath, which presents this streaming image:

image

This is a screen shot of a single screen which is eight different feeds showing different views of the handful of people who are participating in the event. Note: Handful means more than one million.

Notice that there are eight live streams of this modest protest. This is one live stream with eight separate views of the modest demonstration in Hong Kong. Eight in one stream! No registration required. No in-stream pop-up ads. Just high value intelligence in pretty good streaming video quality.


Google: Can Semantic Relaxing Display More Ads?

June 10, 2019

For some reason, vendors of search systems shudder if a user’s query returns a null set. The idea is that a user sends a query to a system or, more correctly, an index. The terms in the query do not match entries in the database. The system displays a message which says, “No results match your query.”

For some individuals, that null set response is high value information. One can bump into null sets when running queries on a Web site; for example, send the anti fungicide query to the Arnold Information Technology blog at this link. Here’s the result:

image

From this response, one knows that there is no content containing the search phrase. That’s valuable for some people.

To address this problem, modern systems “relax” the query. The idea is that the user did not want what he or she typed in the search box. The search system then changes the query and displays those results to the stupid user. Other systems take action and display results which the system determines are related to the query. You can see these relaxed results when you enter the query shadowdragon into Google. Here are the results:

image

Google ignored my spelling and displayed information about a video game, not the little known company Shadowdragon. At least Google told me what it did and offered a way to rerun the query using the word I actually entered. But the point is that the search was “relaxed.”
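For readers who want to see the mechanics, here is a minimal sketch of one way a system might relax a query: try a strict match on every term, and if that returns a null set, fall back to matching any term while flagging the change. The toy index, the function names, and the AND-to-OR fallback are my assumptions for illustration; this is not a description of how Google or any other vendor actually does it.

```python
def search(index, terms, mode="all"):
    """Return sorted document ids matching all (AND) or any (OR) of the terms."""
    postings = [index.get(t, set()) for t in terms]
    if not postings:
        return []
    if mode == "all":
        hits = set.intersection(*postings)
    else:
        hits = set.union(*postings)
    return sorted(hits)


def relaxed_search(index, query):
    """Try a strict match first; fall back to a relaxed match and say so."""
    terms = query.lower().split()
    strict = search(index, terms, mode="all")
    if strict:
        return {"results": strict, "relaxed": False}
    # Null set on the strict query: relax AND to OR and flag the change,
    # so the user can tell the original query matched nothing.
    return {"results": search(index, terms, mode="any"), "relaxed": True}


# Hypothetical toy inverted index: term -> set of document ids.
INDEX = {
    "shadow": {1, 2},
    "dragon": {2, 3},
    "search": {4},
}

print(relaxed_search(INDEX, "shadow dragon"))
print(relaxed_search(INDEX, "shadow search"))
```

The `relaxed` flag is the part that matters to the user who values a null set: it preserves the information that the query as typed matched nothing.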

The purpose of semantic expansion is a variation of Endeca’s facets. The idea is that a key word belongs to a category. If a system can identify a category, then the user can get more results by selecting the category and maybe finding something useful. Endeca’s wine demonstration makes this function and its value clear.
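The facet idea can be sketched in a few lines. The catalog below is invented for illustration and only gestures at a wine demonstration; the field names and the two helper functions are my assumptions, not Endeca’s data model or API.

```python
from collections import Counter

# Hypothetical catalog; names, types, and regions are invented.
WINES = [
    {"name": "Wine A", "type": "red", "region": "Napa"},
    {"name": "Wine B", "type": "red", "region": "Bordeaux"},
    {"name": "Wine C", "type": "white", "region": "Napa"},
]


def facet_counts(records, field):
    """Count how many records fall into each category of a field."""
    return Counter(r[field] for r in records)


def refine(records, field, value):
    """Keep only the records in the category the user selected."""
    return [r for r in records if r[field] == value]


print(facet_counts(WINES, "type"))
reds = refine(WINES, "type", "red")
print(facet_counts(reds, "region"))
```

Selecting a category narrows the result set, and the remaining facets are recounted against what is left, which is what lets a user browse toward something useful.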


Nosing Beyond the Machine Learning from Human Curated Data Sets: Autonomy 1996 to Smart Software 2019

April 24, 2019

How does one teach a smart indexing system like Autonomy’s 1996 “neurodynamic” system?* Subject matter experts (SMEs) assembled a training collection of textual information. The articles and other content would replicate the characteristics of the content which the Autonomy system would process; that is, index and make searchable or analyzable. The work was important. Get the training data wrong and the indexing system would assign metadata or “index terms” and “category names” which could cause a query to generate results the user could perceive as incorrect.

image

How would a licensee adjust the Autonomy “black box”? (Think of my reference to Autonomy and search as a way of approaching “smart software” and “artificial intelligence.”)

The method was to perform re-training. The approach was practical, and for most content domains, the re-training worked. It was an iterative process. Because the words in the corpus fed into the “black box” included new words, concepts, bound phrases, entities, and key sequences, there were several functions integrated into the basic Autonomy system as it matured. Examples ranged from support for term lists (controlled vocabularies) to dictionaries.

The combination of re-training and external content available to the system allowed Autonomy to deliver useful outputs.
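To make the training and re-training loop concrete, here is a minimal sketch of a generic naive Bayes text classifier with add-one smoothing. This is a textbook technique I am using as a stand-in; it is an assumption for illustration and not Autonomy’s proprietary method. The class name, the categories, and the sample texts are all invented.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Generic multinomial naive Bayes; a stand-in, not any vendor's system."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word counts
        self.doc_counts = Counter()              # category -> training docs seen
        self.vocab = set()

    def train(self, text, category):
        """Add one labeled example; calling this again later is re-training."""
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the category with the highest log posterior score."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category in self.doc_counts:
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(self.word_counts[category].values())
            for w in words:
                # Add-one (Laplace) smoothing so unseen words do not zero out a category.
                count = self.word_counts[category][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


model = TinyNaiveBayes()
model.train("stock market shares rally", "finance")
model.train("team wins football match", "sports")
print(model.classify("football match"))

# New entities arrive; without re-training the model has never seen them.
model.train("bitcoin wallet exchange", "crypto")
print(model.classify("bitcoin wallet"))
```

The second `train` call is the whole point: until someone supplies labeled examples containing the new terms, the model has no basis for assigning them to the right category.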

The gap between optimal results and real world results usually boiled down to several factors, often working in concert. First, licensees did not want to pay for re-training. Second, maintenance of the external dictionaries was necessary because new entities arrive with reasonable frequency. Third, testing and organizing the freshened training sets and doing the editorial work required to keep dictionaries ship shape were too expensive, time consuming, and tedious.

Not surprisingly, some licensees grew unhappy with their Autonomy IDOL (integrated data operating layer) system. That, in my opinion, was not Autonomy’s fault. Autonomy explained in the presentations I heard what was required to get a system up and running and outputting results that could easily hit 80 percent or higher on precision and recall tests.
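For reference, precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch follows; the function name and the document ids are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query from two collections of ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical test query: the system returned five documents,
# and the judges marked five documents as relevant; four overlap.
p, r = precision_recall(retrieved=[1, 2, 3, 4, 5], relevant=[1, 2, 3, 4, 6])
print(p, r)
```

In this made-up case both numbers come out at 80 percent: four of the five returned documents are relevant, and four of the five relevant documents were returned.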

The Autonomy approach is widely used. In fact, wherever there is a Bayesian system in use, there is the training, re-training, external knowledge base demand. I just took a look at Haystax Constellation. It’s Bayesian, and Haystax makes it clear that the “model” has to be trained. So what’s changed between 1996 and 2019 with regard to Bayesian methods?

Nothing. Zip. Zero.


Expert System: Interesting Financials

April 6, 2019

Expert System SpA is a firm providing semantic software that extracts knowledge from text by replicating human processes. I noticed information on the company’s Web site which informed me:

  • The company had sales revenues of 28.7 million euros for 2018
  • The company’s growth was 343 percent compared to 2017
  • The net financial position was 12.4 million euros up from 8.8 million euros in March 2017.

Remarkable financial performance.

Out of curiosity I navigated to Google Finance and plugged in Expert System Spa to see what data the GOOG could offer.

Here’s the chart displayed on April 6, 2019:

image

The firm’s stock does not seem to be responding as we enter the second quarter of 2019.


Facebook: Ripples of Confusion, Denial, and Revisionism

March 18, 2019

Facebook contributed to an interesting headline about the video upload issue related to the bad actor in New Zealand. Here’s the headline I noted as it appeared on Techmeme’s Web page:

image

The Reuters’ story ran a different headline:

image

What caught my attention is the statement “blocked at upload.” If a video were blocked at upload, were those videos removed? If blocked, then the number of videos drops to 300 million.

This type of information is typical of the coverage of Facebook, a company which has become the embodiment of social media.

There were two other interesting Facebook stories in my news feed this morning.

The first concerns a high profile Silicon Valley investor, Marc Andreessen. The write up reports and updates a story whose main point is:

Facebook Board Member May Have Met Cambridge Analytica Whistleblower in 2016.


The Search Wars: When Open Starts to Close

March 12, 2019

Compass Search. The precursor. The result? Elasticsearch. No proprietary code. Free and open source. The world of enterprise search shifted.

As a result of Shay Banon’s efforts, there was an alternative to proprietary search and interesting financial maneuvers: an individual or organization could download code and set up a functional enterprise search system.

There are proprietary search systems available like Coveo. But most of the offerings are sort of open sourcey. It is a marketing ploy. The forward leaning companies do not use the word search to market their products because zippier functionality is what brings tire kickers and some buyers.

The landscape of search seems to be doing its Hawaii volcano act. No real eruption, but shakes, hot gas, and cracks have begun to appear. The lava flows will come soon enough.

a bezos art

The path is clear to the intrepid developer.

The tip off is Amazon’s announcement that it now offers an open distro for Elasticsearch. Why is Amazon taking this step? The company explains:

Elasticsearch has become an essential technology for log analytics and search, fueled by the freedom open source provides to developers and organizations. Our goal is to ensure that open source innovation continues to thrive by providing a fully featured, 100% open source, community-driven distribution that makes it easy for everyone to use, collaborate, and contribute.

DarkCyber’s briefings about Amazon’s policeware initiative suggest that the online bookstore is adding another component to its robust intelligence system and services.

The move involves or will involve:

  • Entrepreneurs who will see Amazon as creating low friction for new products and services
  • Partners because implementing search can be a consulting gold mine
  • Users
  • Developers who will use an Amazon “off the shelf” solution
  • Competitors who may find the “other open source” Elasticsearch lagging behind the Amazon “house brand”.

The move is not much of a surprise. Amazon seeks to implement its version of IBM’s 1960s style vendor lock in. Open source is open source, isn’t it? Amazon gets a version of the popular Elasticsearch system which has utility in everything from commercial products to add ons which help make log files more mine-able. Plus search snaps into the DNA of the Amazon jungle of services, functions, and features. Where there is confusion, there are opportunities to make money.

Adding a house brand to its ecosystem is a basic tactic in the Amazon playbook. Those T shirts with the great price are Amazon’s, not the expensive stuff with a fancy brand name. T shirts and search? Who cares?

What’s the play mean for over extended proprietary search systems which may never generate a pay day for investors? A lot of explaining seems likely.

What’s the play mean for Elastic, the company which now operates the son of Compass Search? Some long off site meetings may be ahead and maybe some chats with legal eagles.

What’s the play mean for vendors using Amazon as back end plumbing for their enterprise or policeware services? A swap out of the Elasticsearch system for the Amazon version could be in the cards. Amazon Elasticsearch will probably deliver fewer headaches and lost weekends than using the Banon-Elastic version. Who wants headaches in an already complex, expensive implementation?

The Register quotes an evangelist from AWS as saying:

“We will continue to send our contributions and patches upstream to advance these projects.”

DarkCyber interprets this action and Amazon’s explanations from the perspective and context of a high school football coach:

“Front line, listen up, fork that QB. I want that guy put down. Hard. Let’s go.”

Amazon. The best defense is a good offense, right?

The coach shouts:

“Let’s hit those Sheep hard. Arrrgh.”

Stephen E Arnold, March 12, 2019

IBM Debate Contest: Human Judges Are Unintelligent

February 12, 2019

I was a high school debater. I was a college debater. I did extemp. I did an event called readings. I won many cheesy medals and trophies. Also, I have a number of recollections about judges who shafted me and my team mate or just hapless, young me.

I learned:

Human judges mean human biases.

When I learned that the audience voted a human the victor over the Jeopardy-winning, subject matter expert sucking, and recipe writing IBM Watson, I knew the human penchant for distortion, prejudice, and foul play made an objective, scientific assessment impossible.

ibm debate

Humans may not be qualified to judge state of the art artificial intelligence from sophisticated organizations like IBM.

The rundown and the video of the 25 minute travesty are on display via Engadget with a non argumentative explanation in words in the write up “IBM AI Fails to Beat Human Debating Champion.” The real news report asserts:

The face-off was the latest event in IBM’s “grand challenge” series pitting humans against its intelligent machines. In 1996, its computer system beat chess grandmaster Garry Kasparov, though the Russian later accused the IBM team of cheating, something that the company denies to this day — he later retracted some of his allegations. Then, in 2011, its Watson supercomputer trounced two record-winning Jeopardy! contestants.

Yes, past victories.

Now what about the debate and human judges?

My thought is that the dust up should have been judged by a panel of digital devastators; specifically:

  • Google DeepMind. DeepMind trashed a human Go player and understands the problems humanoids have being smart and proud
  • Amazon SageMaker. This is a system tuned with work for a certain three letter agency and, therefore, has a Deep Lens eye to spot the truth
  • Microsoft Brainwave (remember that?). This is a system which was the first hardware accelerated model to make Clippy the most intelligent “bot” on the planet. Clippy, come back.

Here’s how this judging should have worked.

  1. Each system “learns” what it takes to win a debate, including voice tone, rapport with the judges and audience, and physical gestures (presence)
  2. Each system processes the video, audio, and sentiment expressed when the people in attendance clap, whistle, laugh, sub vocalize “What a load of horse feathers,” etc.
  3. Each system generates a score with 0.000001 the low and 0.999999 the high
  4. The final tally would be calculated by Facebook FAIR (Facebook AI Research). The reason? Facebook is among the most trusted, socially responsible smart software companies.

The notion of a human judging a machine is what I call “deep stupid.” I am working on a short post about this important idea.

A human judged by humans is neither just nor impartial. Not Facebook FAIR.

An also participated award goes to IBM marketing.

participant meda

IBM snagged an also participated medal. Well done.

Stephen E Arnold, February 13, 2019

Deloitte and NLP: Is the Analysis On Time and Off Target?

January 18, 2019

I read “Using AI to Unleash the Power of Unstructured Government Data.” I was surprised because I thought that US government agencies were using smart software (NLP, smart ETL, digital notebooks, etc.). My recollection is that use of these types of tools began in the mid 1990s, maybe a few years earlier. i2 Ltd., a firm for which I did a few minor projects, rolled out its Analyst’s Notebook in the mid 1990s, and it gained traction in a number of government agencies a couple of years after British government units began using the software.

The write up states:

DoD’s Defense Advanced Research Projects Agency (DARPA) recently created the Deep Exploration and Filtering of Text (DEFT) program, which uses natural language processing (NLP), a form of artificial intelligence, to automatically extract relevant information and help analysts derive actionable insights from it.

My recollection is that DEFT fired up in 2010 or 2011. Once funding became available, activity picked up in 2012. That was six years ago.

However, DEFT is essentially a follow on from other initiatives which reach back to Purple Yogi (Stratify) and DR-LINK, among others.

The capabilities of NLP are presented as closely linked technical activities; for example:

  • Name entity resolution
  • Relationship extraction
  • Sentiment analysis
  • Topic modeling
  • Text categorization
  • Text clustering
  • Information extraction

The collection of buzzwords is interesting. I would annotate each of these items to place them in the context of my research into content processing, intelware, and related topics:


Google: Innovation Desperation or Innovation Innovation

December 5, 2018

Google has an innovation problem. The company has tried 20 percent free time. Engineers were supposed to work on personal projects. Google tried creating investment units. Google has acquired companies, often in time frames that seemed compressed. Anyone remember buying Motorola Mobility in 2011? Google created a super secret innovation center because the ageing Google Labs was not up to the task of creating Loon balloons and solving death. There have been competitions to identify bright young sprouts who can bring new ideas to the Google. If I dig through my files, there are probably innovation initiatives I have forgotten. Google is either a forward looking outfit, or it is struggling to do more than keep the 20 year old system looking young.

Image result for archimedes eureka

Has Google tried thinking in the hot tub like Archimedes? Google has bean bags, volleyball courts, and Foosball. But real innovations like those AltaVista mechanisms or GoTo’s pay to play for search visibility? There is Web Accelerator, of course.

I read “An Exclusive look inside Google’s in-house incubator Area 120.” The write up reports that a wizard Googler allegedly said and may actually believe:

“We built a place and a process to be able to have those folks come to us and then select what we thought were the most promising teams, the most promising ideas, the most promising markets,” explains managing director Alex Gawley, who has spent a decade at Google and left his role as product manager for Google Apps (since renamed G Suite) to spearhead this new effort. Employees “can actually leave their jobs and come to us to spend 100% of their time pursuing something that they are particularly passionate about,” he says.

Okay, Area 120. That’s even more mysterious than the famous Area 51. I am thinking of the theme from “Outer Limits.”

The Googlers “pitch” ideas in the hope of getting funding. A Japanese management expert explained a somewhat similar approach to keeping smart employees innovating. See Kuniyasu Sakai’s explanations of the method in “To Expand We Divide.” You probably have this and his other management writings on your desk, right? Someone at Google seems to have brushed against these concepts. In Fast Company / Google speak, these new companies are “hatchlings.”

Several observations:

  1. Innovation is a problem as companies become larger. Google illustrates this problem.
  2. Google’s approach to innovation is bifurcated. Most of its “innovations” originated elsewhere; for example, IBM Clever, AltaVista technology, GoTo-Overture “pay to play” advertising. The company’s goal is to innovate using original ideas, not refinements of other innovators’ breakthroughs.
  3. Google faces an innovation free environment. A recent example may be found in the wild and crazy Amazon announcements at its Re:Invent conference. Somewhere in the jet blast of announcements, there were a couple of substantive innovations. Google does phones with problems and wraps search in layers of cotton wool. Amazon, it seems, is sucking search innovation from Google.

For these reasons Google is gasping. Even a rah rah write up about Google like the recent encomium to Jeff Dean and Sanjay Ghemawat (both AltaVista veterans) is a technical “You Can’t Go Home Again” description of the good old days.

On one hand, Google’s efforts to become innovative are admirable. Persistence, patience, investment—yada yada. On the other hand, Google remains trapped as a servant to its Yahoo (GoTo and Overture) business model for online advertising.

The PR will continue to flow, but innovations? Maybe.

Stephen E Arnold, December 5, 2018

A Reasonable Assertion: Google Is Dying

October 10, 2018

Nope, this is not the view in Harrod’s Creek. The idea that “Google Is Dying” comes from a write up in Vortex by Lauren, whom I assume is a real, living entity and not an avatar, construct, or VR thing.

google is dying

You can find the analysis at this link.

I am not going to push back against the entity Lauren’s ideas.

I want to point out that:

  1. Companies, like real living humans, have a lifespan. It does not matter that some Googlers are awaiting the opportunity to merge with a machine, save their brain (assuming that intelligence is indeed the sole province of thought), and live a long time. Ideally? Forever. The death of Google, therefore, is hard wired, and, if I may offer a controversial idea, has already taken place. Today we are dealing with the progeny of Google.
  2. The misstep which has captured some Google embracers’ attention is the outright failure of Google’s ability to create a secure environment for management and for users of the descendant of Orkut. The lapses are not an indication that Google is dying. The examples are logical manifestations of the consequences of inbreeding. Imagine West Virginia’s isolated communities connected via a mobile system. That does not change the inbreeding for some individuals. If you are not up on inbreeding, here’s a handy reference. The key point is cognitive deterioration. Stated more clearly, stupid decision making, impaired analytic skills, etc.
  3. Google’s lab rat approach to innovation has not, so far, been able to disprove Steve Ballmer’s brilliant observation: “One trick pony.” But what few analysts care to remember is that the “one trick pony” was online advertising derived from the GoTo.com/Overture.com/Yahoo.com idea. My recollection is that prior to the Google IPO, a legal settlement was reached with Yahoo. This billion dollar deal kept good old Yahoo afloat for several years. Thus, Google’s big idea was a bit of a “me too.” One might argue that the failure to find a way to generate an equivalent amount of revenue is not surprising. Even the Android ecosystem is like a sucker fish on a shark. The symbiosis between online advertising, data harvesting, and revenue is difficult to disentangle. The key point: The big idea was GoTo.com, implemented in a Googley way.

After writing three monographs about the Google and adding comments to my research about the company, I could write more.

Read the alleged humanoid’s “real news” essay. Make your own decision.

I am not pushing back. I am just disappointed that 20 years after the Backrub folks morphed into Google, analyses continue to look at here-and-now events, not the broader trends the company manifests.

Maybe Generation Z will step forward and fill the void?

Stephen E Arnold, October 11, 2018
