Dow Jones: Fake News As a Training Error

October 11, 2017

In the dead tree edition of the Wall Street Journal, I read an interesting but all too brief article; to wit: “Dow Jones Publishes Errant Headlines in Systems Snafu.” The main point is that Dow Jones pushed out “nearly 2,000 dummy headlines and articles.” The company, of course, is sorry, very sorry. The “false headlines” were disappeared. The small item, at the bottom of page B5 of the newsprint edition, included this statement on October 11, 2017:

I take today’s inadvertent and erroneous publication of testing materials extremely seriously.

Fake news. Nah, just a digital flub from the proud Murdoch outfit. Mistakes happen. Perhaps the Dow Jones engine will factor in this human response when it next excoriates Silicon Valley outfits who stub their toes.

Oh, if you are looking for the story online, you have to search Google News for “fake news” and follow the links to everyone except the Wall Street Journal. Google does point to this item on the Web site. The publicist does not include the mea culpa, which I find interesting.

Stephen E Arnold, October 11, 2017

Palantir Settlement Makes Good Business Sense

October 11, 2017

Palantir claims it is focusing on work, not admitting guilt over a labor dispute, in a recent settlement. This is creating a divide in the industry about what exactly the settlement means. We first learned of the $1.66 million settlement in How To Zone’s story, “Palantir Settles Discrimination Complaint with U.S. Labor Agency.”

How did we get here? According to the story:

The Labor Department said in an administrative complaint last year that it conducted a review of Palantir’s hiring process beginning in 2010. The agency alleged that the company’s reliance on employee referrals resulted in bias against Asians. Contracts worth more than $370 million, including with the U.S. Defense Department, Treasury Department and other federal agencies, were in jeopardy if the Labor Department had found Palantir guilty of discrimination.

Serious accusations. But this settlement might not signal what you think it does. Palantir said in a statement:

We settled this matter, without any admission of liability, in order to focus on our work.

This might be the smartest action on their behalf. Consider what happened to Salesforce when it got wrapped up in a legal battle earlier this year. The suit not only slowed down sales, but some experts feel it may have altered enterprise search for good.

Something tells us Palantir, with its rich government contracts, wants to simply put this behind them and not get caught in a legal web.

Patrick Roland, October 11, 2017

Smart Software with a Swayed Back Pony

October 1, 2017

I read “Is AI Riding a One-Trick Pony?” and felt those old riding sores again. Technology Review presents as nifty and new technology that is old; Bayesian methods date from the 18th century. The MIT write up has pegged Geoffrey Hinton, a beloved producer of artificial intelligence talent, as the flag bearer for the great man theory of smart software.

Dr. Hinton is a good subject for study. But the need to generate clicks and zip in the quasi-academic world of big time universities may encourage “practical” public relations. For example, the write up praises Dr. Hinton’s method of “back propagation.” At the same time, the MIT publication explains how the neural network method popular today works:

you change each of the weights in the direction that best reduces the error overall. The technique is called “backpropagation” because you are “propagating” errors back (or down) through the network, starting from the output.

This makes sense. The idea is that the method allows the real world to be subject to a numerical recipe.
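The quoted description can be sketched numerically. The toy below is my own illustration, not code from the article: a single weight is nudged against the gradient of a squared error, which is the backpropagation idea boiled down to one neuron.

```python
# Toy illustration of backpropagation: compute the output error,
# "propagate" it back to the weight, and adjust the weight to
# reduce the error. One linear neuron: output = w * x.
def train(x, target, w=0.0, lr=0.1, steps=50):
    squared_errors = []
    for _ in range(steps):
        output = w * x             # forward pass
        error = output - target    # error at the output
        grad = error * x           # error propagated back to the weight
        w -= lr * grad             # step against the gradient
        squared_errors.append(error ** 2)
    return w, squared_errors

w, errs = train(x=2.0, target=6.0)
print(round(w, 3))       # weight settles near 3.0 (since 3.0 * 2.0 = 6.0)
print(errs[-1] < errs[0])  # squared error shrinks over training
```

The same update rule, applied layer by layer via the chain rule, is what the full method does across a deep network.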

The write up states:

Neural nets can be thought of as trying to take things—images, words, recordings of someone talking, medical data—and put them into what mathematicians call a high-dimensional vector space, where the closeness or distance of the things reflects some important feature of the actual world.
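The vector-space idea in the quoted passage can be shown with a toy example. The vectors below are hand-picked for illustration, not learned by any network, but the principle is the same: related things sit closer together than unrelated ones.

```python
import math

def cosine(a, b):
    """Cosine similarity: closeness of two vectors, ignoring length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-picked toy vectors; a real neural net would learn these
# from data, in hundreds of dimensions rather than three.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

# "cat" sits closer to "dog" than to "car" in this space.
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["car"]))
```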

Yes, reality. The way the brain works. A way to make software smart. Indeed a one trick pony which can be outfitted with a silver bridle, a groomed mane and tail, and black liquid shoe polish on its dainty hooves.

The sway back? A genetic weakness. A one trick pony with a sway back may not be able to carry overweight kiddies to the Artificial Intelligence Restaurant, however.

MIT’s write up suggests there is a weakness in the method; specifically:

these “deep learning” systems are still pretty dumb, in spite of how smart they sometimes seem.


Neural nets are just thoughtless fuzzy pattern recognizers, and as useful as fuzzy pattern recognizers can be—hence the rush to integrate them into just about every kind of software—they represent, at best, a limited brand of intelligence, one that is easily fooled.

As for the software, the article points out that:

And though we’ve started to get a better handle on what kinds of changes will improve deep-learning systems, we’re still largely in the dark about how those systems work, or whether they could ever add up to something as powerful as the human mind.

There is hope too:

Essentially, it is a procedure he calls the “exploration–compression” algorithm. It gets a computer to function somewhat like a programmer who builds up a library of reusable, modular components on the way to building more and more complex programs. Without being told anything about a new domain, the computer tries to structure knowledge about it just by playing around, consolidating what it’s found, and playing around some more, the way a human child does.

We have a braided mane and maybe a combed tail.

But what about that swayed back, the genetic weakness which leads to a crippling injury when the poor pony is asked to haul a Facebook or Google sized child around the ring? What happens if low cost, more efficient ways to create training data, replete with accurate metadata and tags for human things like sentiment and context awareness become affordable, fast, and easy?

My thought is that it may be possible to do a bit of genetic engineering and make the next pony healthier and less expensive to maintain.

Stephen E Arnold, October 1, 2017

Why the Future of Computing Lies in Natural Language Processing

September 26, 2017

In a blog post, EasyAsk declares, “Cognitive Computing, Natural Language & AI: Game Changers.” We must keep in mind that the “cognitive eCommerce” company does have a natural language search engine to sell, so they are a little biased. Still, writer and CEO Craig Bassin makes some good points. He begins by citing research firm Gartner’s assessment that natural-language query “will dramatically change human-computer interaction.” After throwing in a couple of amusing videos, Bassin examines the role of natural language in two areas of business: business intelligence (BI) and customer relationship management (CRM). He writes:

That shift [to natural language and cognitive computing] enables two things. First, it enables users to ask a computer questions the same way they’d ask an associate, or co-worker. Second, it enables the computer to actually answer the question. That’s the game changer. The difference is a robust Natural Language Linguistic Engine. Let’s go back to the examples above for a reexamination of our questions. For BI, what if there was an app that looked beyond the dashboards into the data to answer ad hoc questions? Instead of waiting days for a report to be generated, you could have it on the fly – right at your fingertips. For CRM, what if that road warrior could ask and answer questions about the current status across prospects in a specific region to deduce where his/her time would be best spent? Gartner and Forrester see the shift happening. In Gartner’s Magic Quadrant Report for Business Intelligence and Analytics Platforms [PDF], strategic planning assumptions incorporate the use of natural language. It may sound like a pipe dream now, but this is the future.

Naturally, readers can find natural-language goodness in EasyAsk’s platform; to be fair, the company has been building its cognitive computing tech for years now. Businesses looking for a more sophisticated search solution would do well to check them out—along with their competition. Based in Burlington, Mass., EasyAsk also maintains a European office in Berkshire, UK. The company was founded in 2000 and was acquired by Progress Software in 2005.

Cynthia Murrell, September 26, 2017

Google Invests Hefty Sums in Lobbying Efforts

September 19, 2017

Since Microsoft was caught flat-footed by antitrust charges in 1992, the tech industry has steadily increased its lobbying efforts. Now, The Guardian asks, “Why is Google Spending Record Sums on Lobbying Washington?” Writer Jonathan Taplin describes some reasons today’s political climate prompts such spending and points out that Google is the “largest monopoly in America,” though the company does its best to downplay that trait. He also notes that Google is libertarian in nature, and staunchly advocates against regulation. Looking forward, Taplin posits:

Much of Google’s lobbying may be directed toward its future business. That will be running artificial intelligence networks that control the transportation, medical, legal and educational businesses of the future. In a speech last Saturday to the National Governor’s Conference, the tech entrepreneur Elon Musk stated: ‘AI is a rare case where I think we need to be proactive in regulation instead of reactive.’ Coming from a Silicon Valley libertarian, this was a rare admission, but Musk went on to say: ‘There certainly will be job disruption. Because what’s going to happen is robots will be able to do everything better than us … I mean all of us.’ Both Google and Facebook pushed back hard against Musk’s remarks, because they have achieved their extraordinary success by working in an unregulated business environment. But now, for the first time in their histories, the possibility of regulation may be on the horizon. Google’s response will be to spend more of its $90 bn in cash on politicians. K Street is lining up to help.

We are reminded that, for many industries, lobbying Congress has long been considered a routine cost of doing business. The tech industry is now firmly in that category and is beginning to outspend the rest. See the article for more details.

Cynthia Murrell, September 19, 2017

Annoyed Xooglers and Lawyers: A Volatile Mixture

September 15, 2017

Straightaway you will want to read the “real” news story from a “real” newspaper. The write up is “Former Employees Sue Google Alleging Bias against Women in Pay and Promotion.” (The story was online as of 5:15 pm Eastern US time on September 14, 2017. Any other time? Who knows? The Guardian, another “real news” outfit, jumped on the story as well at this link.)

The main point, in my opinion, is to heap more criticism on the Alphabet Google thing.

I highlighted this passage:

Three female former employees of Alphabet Inc’s Google filed a lawsuit on Thursday accusing the tech company of discriminating against women in pay and promotions. The proposed class action lawsuit, filed in California state court in San Francisco, comes as Google is facing an investigation by the U.S. Department of Labor into sex bias in pay practices.

Since I am not a woman, I have zero knowledge about what did or did not happen when the GOOG decided what to pay each person. The write up suggests that Google is a throwback because “its treatment of female employees has not entered the 21st century.”

I think the GOOG is an innovative and progressive outfit. The company creates new products and services using multiple tactics. It is socially progressive because, like Walmart, it allows employees to park their campers in the Google parking lots.

The paragraph which raised my eyebrows was this one:

The department [of labor] last month appealed an administrative judge’s July decision that rejected its request for contact information for more than 20,000 Google employees.

My recollection is that Google is on record with a factual statement revealing that collecting certain employee compensation data is a job that is too difficult.

Why can’t regulators and lawyers trust Alphabet Google the way we do in Harrod’s Creek?

Gathering information about a closed domain of employees is tough. Accept the Google fact. And Google is progressive. Some employees are allowed to live in their trucks, emulating a parking policy of Walmart’s.

Stephen E Arnold, September 15, 2017

Markov: Maths for the Newspaper Reader

September 14, 2017

Remarkable. I read a pretty good write up called “That’s Maths: Andrey Markov’s Brilliant Ideas Are Still Bearing Fruit.” I noted the source of the article: The Irish Times. A “real” newspaper. Plus it’s Irish. Quick: name a great Irish mathematician. I like Sir William Rowan Hamilton, whom my slightly addled mathy relative Vladimir Igorevich Arnold and his boss/mentor/leader of semi clothed hikes in the winter Andrey Kolmogorov thought was an okay guy.

Markov liked literature. Well, more precisely, he liked to count letter frequencies and occurrences in Russian novels like everyone’s fave Eugene Onegin. His observations fed his insight that a Markov Process or Markov Chain was a useful way to analyze probabilities in certain types of data. Applications range from making IBM Watson great again to helping outfits like Sixgill generate useful outputs. (Not familiar with Sixgill? I cover the company in my forthcoming lecture at the TechnoSecurity & Digital Forensics Conference next week.)

I noted this passage which I thought was sort of accurate or at least close enough for readers of “real” newspapers:

For a Markov process, only the current state determines the next state; the history of the system has no impact. For that reason we describe a Markov process as memoryless. What happens next is determined completely by the current state and the transition probabilities. In a Markov process we can predict future changes once we know the current state.
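The memoryless property in the quoted passage can be sketched with a toy chain. The states and transition probabilities below are invented for illustration; the point is that the next state is drawn using only the current state, with no look at the history.

```python
import random

# Toy two-state weather chain. Probabilities are made up for
# illustration; each row gives P(next state | current state).
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(state, rng):
    """Pick the next state from the current state alone (memoryless)."""
    probs = transitions[state]
    return rng.choices(list(probs), weights=list(probs.values()))[0]

rng = random.Random(42)  # seeded for a reproducible walk
state = "sunny"
path = [state]
for _ in range(10):
    state = step(state, rng)
    path.append(state)
print(path)  # an 11-state walk through "sunny"/"rainy"
```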

The write up does not point out that the Markov Process becomes even more useful when applied to Bayesian methods enriched with some Laplacian procedures. Now stir in the nuclear industry’s number one with a bullet, the Monte Carlo method, and mix the ingredients. In my experience and that of my dear but departed relative, one can do a better job at predicting what’s next than a bookie at the Churchill Downs Racetrack. MBAs on Wall Street have other methods for predicting the future; namely, chatter at the NYAC or some interactions with folks in the know about an important financial jet blast before ignition.
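For readers who have not met the Monte Carlo method mentioned above, the classic demonstration (my own sketch, not from the article) estimates π by throwing random points at a unit square and counting how many land inside the quarter circle.

```python
import random

def estimate_pi(samples, seed=0):
    """Monte Carlo estimate of pi from random points in the unit square."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    inside = sum(
        1 for _ in range(samples)
        # Point (x, y) falls inside the quarter circle of radius 1?
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    # Area of quarter circle / area of square = pi/4.
    return 4 * inside / samples

print(estimate_pi(100_000))  # lands close to 3.14
```

More samples shrink the error, at the usual slow 1/sqrt(n) Monte Carlo rate; the same sampling trick powers the reactor simulations the nuclear industry made famous.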

A happy quack to the Irish Times for running a useful write up. My great uncle would emit a grunt, which is as close as he came to saying, “Good job.”

Stephen E Arnold, September 14, 2017

IBM Watson: The US Open As a Preview of an IBM Future

September 12, 2017

I read a remarkable essay, article, or content marketing “object” called “What We Can Glean From The 2017 U.S Open to Imagine a World Powered by the Emotional Intelligence AI Can Offer.” The author is affiliated with an organization with which I am not familiar. Its name? Brandthropologie.

Let’s pull out the factoids from the write up which has two themes: US government interest in advanced technology and IBM Watson.

Factoid 1: “Throughout time, the origin of many modern-day technologies can be traced to the military and Defense Research Projects Agency (DARPA).”

Factoid 2: “Just as ARPA was faced with wide spread doubt and fear about how an interconnected world would not lead to a dystopian society, IBM, among the top leaders in the provision of augmented intelligence, is faced with similar challenges amidst today’s machine learning revolution.”

Factoid 3: “IBM enlisted its IBM Watson Media platform to determine the best highlights of matches. IBM then broadcasted the event live to its mobile app, using IBM Watson Media to watch for match highlights as they happened. It took into account crowd noises, emotional player reactions, and other factors to determine the best highlight of a match.”

Factoid 4: “The U.S. Open used one of the first solutions available through IBM Watson Media, called Cognitive Highlights. Developed at IBM Research with IBM iX, Cognitive Highlights was able to identify a match’s most important moments by analyzing statistical tennis data, sounds from the crowd, and player reactions using both action and facial expression recognition. The system then ranked the shots from seven U.S. Open courts and auto-curated the highlights, which simplified the video production process and ultimately positioned the USTA team to scale and accelerate the creation of cognitive highlight packages.”

Factoid 5: “Key to the success of this sea change will be the ability for leading AI providers to customize these solutions to make them directly relevant to specific scenarios, while also staying agilely informed on the emotional intelligence required to not only compete, but win, in each one.”

My reaction to these snippets was incredulity.

My comment about Factoid 1: I was troubled by the notion of “throughout time” DARPA has been the source of “many modern day technologies.” It is true that government funding has assisted outfits from the charmingly named Purple Yogi to Interdisciplinary Laboratories. Government funding is often suggestive and, in many situations, reactive; for example, “We need to move on this autonomous weapons thing.” The idea of autonomous weapons has been around a long time; for example, Thracians’ burning wagon assaults, which were a small improvement over Neanderthals pushing stones off a cliff onto their enemies. Drones with AI are not a big leap from my point of view.

My comment about Factoid 2: I like the idea that one company, in this case IBM, was the prime mover for smart software. IBM, like other early commercial computing outfits, was on the periphery of many innovations. If anything, the good ideas from IBM were not put into commercial use because the company needed to generate revenue. IBM Almaden wizard Jon Kleinberg came up with CLEVER. The system and method influenced the Google. Where is IBM in search and information access today? Pretty much nowhere, and I am including the marketing extravaganza branded “Watson.” IBM, from my point of view, acted like an innovation brake, not an innovator. Disagree? That’s your prerogative. But building market share via wild and crazy assertions about Lucene, home brew code, and acquired technology like Vivisimo is not going to change my mind about the sluggishness of large companies.

My comment about Factoid 3: The assertion that magic software delivered video programming is sort of true. But the reality of today’s TV production is that humans in trailers handle 95 percent of the heavy lifting. Software can assist, but the way TV production works at live events is that there are separate and unequal worlds of keeping the show moving along, hitting commercial points, and spicing up the visual flow. IBM, from my point of view, was the equivalent of salt free spices which a segment of the population love. The main course was human-intermediated TV production of the US Open. Getting the live sports event to work is still a human intermediated task. Marketing may not believe this, but, hey, reality is different from uninformed assertions about what video editing systems can do quickly and “automatically.”

My comment about Factoid 4: See my comment about Factoid 3. If you know a person who works in a trailer covering a live sports event, get their comments about smart editing tools.

My comment about Factoid 5: Conflating the ability of automated functions to identify a segment of a video stream with emotion detection is pretty much science fiction. Figuring out sentiment in text is tough. Figuring out “emotion” in a stream of video is another kettle of fish. True, there is progress. I saw a demo from an Israeli company whose name I cannot recall. That firm was able to parse video to identify when a goal was scored. The system sort of worked. Flash forward to today: Watson sort of works. Watson is a punching bag for some analysts and skeptics like me for good reason. Talk is easy. Delivering is tough.

Reality, however, seems to be quite different for the folks at Brandthropologie.

Stephen E Arnold, September 12, 2017

Yet Another Digital Divide

September 8, 2017

Recommind sums up what happened at a recent technology convention in the article, “Why Discovery & ECM Haven’t, Must Come Together (CIGO Summit 2017 Recap).” Author Hal Marcus first recounts how he issued a staunch challenge to anyone who said they could provide a complete information governance solution. He recently spoke at CIGO Summit 2017 about how to make information governance a feasible goal for organizations.

The problem with information governance is that there is no one simple solution, and projects tend to be self-contained with only one goal: data collection, data reduction, etc. When he spoke, he explained that there are five main reasons there is no one comprehensive solution: defining a project’s parameters takes a long time; data can come from multiple streams; mass-scale indexing is challenging; analytics help only if humans interpret the data; and risk and cost put a damper on projects.

Yet we are closer to a solution:

Corporations seem to be dedicating more resources for data reduction and remediation projects, triggered largely by high profile data security breaches.

Multinationals are increasingly scrutinizing their data sharing and retention practices, spurred by the impending May 2018 GDPR deadline.

ECA for data culling is becoming more flexible and mature, supported by the growing availability and scalability of computing resources.

Discovery analytics are being offered at lower, all-you-can-eat rates, facilitating a range of corporate use cases like investigations, due diligence, and contract analysis.

Tighter, more seamless and secure integration of ECM and discovery technology is advancing and seeing adoption in corporations, to great effect.

And it always seems farther away.

Whitney Grace, September 8, 2017

Old School Searcher Struggles with Organizing Information

September 7, 2017

I read a write up called “Semantic, Adaptive Search – Now that’s a Mouthful.” I cannot decide if the essay is intended to be humorous, plaintive, or factual. The main idea in the headline is that there is a type of search called “semantic” and “adaptive.” I think I know about the semantic notion. We just completed a six-month analysis of syntactic and semantic technology for one of my few remaining clients. (I am semi retired as you may know, but tilting at the semantic and syntactic windmills is great fun.)

The semantic notion has inspired such experts as David Amerland, an enthusiastic proponent of the power of positive thinking and tireless self promotion, to heights of fame. The syntax idea gives experts in linguistics hope for lucrative employment opportunities. But most implementations of these hallowed “techniques” deliver massive computational overhead and outputs which require legions of expensive subject matter experts to keep on track.

The headline is one thing, but the write up is about another topic in my opinion. Here’s the passage I noted:

The basic problem with AI is no vendor is there yet.

Okay, maybe I did not correctly interpret “Semantic, Adaptive Search—Now That’s a Mouthful.” I just wasn’t expecting artificial intelligence, a very SEO type term.

But I was off base. The real subject of the write up seems to be captured in this passage:

I used to be organized, but somehow I lost that admirable trait. I blame it on information overload. Anyway, I now spend quite a bit of time searching for my blogs, white papers, and research, as I have no clue where I filed them. I have resorted to using multiple search criteria. Something I do, which is ridiculous, is repeat the same erroneous search request, because I know it’s there somewhere and the system must have misunderstood, right? So does the system learn from my mistakes, or learn the mistakes? Does anyone know?

Okay, disorganized. I would never have guessed without a title that references semantic and adaptive search, the lead paragraph about artificial intelligence, and this just cited bit of exposition which makes clear that the searcher cannot make the search systems divulge the needed information.

One factoid in the write up is that a searcher will use 2.73 terms per query. I think that number applies to desktop boat anchor searches from the Dark Ages of old school querying. Today, more than 55 percent of queries are from mobile devices. About 20 percent of those are voice based. Other queries just happen because a greater power like Google or Microsoft determines what you “really” wanted is just the ticket. To me, the shift from desktop to mobile makes the number of search terms in a query a tough number to calculate. How does one convert data automatically delivered to a Google Map when one is looking for a route with an old school query with 2.73 terms? Answer: You maybe just use whatever number pops out from a quick Bing or Google search from a laptop and go with the datum in a hit on an ad choked result list.

The confused state of search and content processing vendors is evident in their marketing, their reliance on jargon and mumbo jumbo, and fuzzy thinking about obtaining information to meet a specific information need.

I suppose there is hope. One can embrace a taxonomy and life will be good. On the other hand, disorganization does not bode well for a taxonomy created by a person who cannot locate information.

Well, one can use smart software to generate those terms, the Use Fors and the See Alsos. One can rely on massive amounts of Big Data to save the day. One can allow a busy user of SharePoint to assign terms to his or her content. Many good solutions which make information access a thrilling discipline.

Now where did I put that research for my latest book, “The Dark Web Notebook”? Ah, I know. In a folder called “DWNB Research” on my back up devices with hard copies in a banker’s box labeled “DWNB 2016-2017.”

Call me old fashioned but the semantic, syntactic, artificially intelligent razzmatazz underscores the triumph of jargon over systems and methods which deliver on point results in response to a query from a person who knows that for which he or she seeks.

Plus, I have some capable research librarians to keep me on track. Yep, real humans with MLS degrees, online research expertise, and honest-to-god reference desk experience.

Smart software and jargon require more than disorganization and arm waving accompanied by toots from the jargon tuba.

Stephen E Arnold, September 7, 2017
