Facebook and Humans: Reality Is Not Marketing

May 16, 2016

I read “Facebook News Selection Is in Hands of Editors Not Algorithms, Documents Show.” The main point of the story is that Facebook uses humans to do the work; algorithms, it seems, are not a big part of picking out what’s important.

The write up comes from a “real” journalism outfit. The article points out:

The boilerplate about its [Facebook’s] news operations provided to customers by the company suggests that much of its news gathering is determined by machines: “The topics you see are based on a number of factors including engagement, timeliness, Pages you’ve liked and your location,” says a page devoted to the question “How does Facebook determine what topics are trending?”

After reading this, I thought of Google’s poetry created by its artificial intelligence system. Here’s the line which came to mind:

I started to cry. (Source: Quartz)

I vibrate with the annoyance bubbling under the surface of the newspaper article. Imagine. Facebook has great artificial intelligence. Facebook uses smart software. Facebook open sources its systems and methods. The company says it is at the cutting edge of replacing humans with objective procedures.

The article’s belief in baloney is fried and served cold on stale bread. Facebook uses humans. The folks at real journalism outfits may want to work through articles like “Different Loci of Semantic Interference in Picture Naming vs. Word-Picture Matching Tasks” to get a sense of why smart systems go wandering.

So what’s new? Palantir Technologies uses humans to index content. Without that human input, the “smart” software does some useful work, but humans are part of the work flow process.

Other companies use humans too. But the marketing collateral and the fizzy presentations at fancy conferences paint a picture of a world in which cognitive, artificially intelligent, smart systems do the work that subject matter experts used to do. Humans, like indexers and editors, are no longer needed.

Now reality pokes its rose-tinted fingertips into the real world.

Let me be clear. My unhappiness with the verbiage generated about smart software comes down to one simple fact.

Most of the smart software systems require humans to fiddle at the beginning when a system is set up, while the system operates to deal with exceptions, and after an output is produced to figure out what’s what. In short, smart software is not that smart yet.

There are many reasons, but the primary one is that the math and procedures underpinning many of the systems with which I am familiar are immature. Smart software works well when certain caveats are accepted. For example, the vaunted Watson must be trained. Watson, therefore, is not that much different from the training Autonomy baked into its IDOL system in the mid 1990s. Palantir uses humans for one simple reason: figuring out what’s important to a team under fire works much better if the humans with skin in the game provide indexing terms and identify important points, like local names for stretches of highway where bombs can be placed without too much hassle.

Dig into any of the search and content processing systems and you find expenditures for human work. Companies licensing smart systems which index automatically face significant budget overruns, operational problems because of lousy outputs, and piles of exceptions to either ignore or deal with. The result is that the smoke and mirrors of marketers speaking to people who want a silver bullet cannot match the carefully crafted demonstrations. Consider the lineup:

  • IBM i2 Analyst’s Notebook requires humans.
  • Fast Search (now an earlobe in SharePoint) requires humans.
  • Coveo’s system requires humans.
  • Attivio’s system requires humans.
  • OpenText’s suite of search and content processing requires humans.
  • Even Maxxcat benefits from informed set up and deployment.
  • Out of the box, dtSearch can index, but one needs to know how to set it up and make it work in a specific Microsoft environment.

Every search and content processing system that asserts it is automatic is spackling flawed wallboard.

For years, I have given a lecture about the essential sameness of search and content processing systems. These systems use the same well known and widely taught mathematical procedures. The great breakthroughs at SRCH2 and similar firms amount to optimization of certain operations. But the whiziest system is pretty much like other systems. As a result, these systems perform in a similar manner. These systems require humans to create term lists, look up tables of aliases for persons of interest, hand craft taxonomies to represent the chunk of reality the system is supposed to know about, and other “libraries” and “knowledgebases.”
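The human-built “libraries” and “knowledgebases” these systems depend on can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration; the alias entries, taxonomy terms, and function names are invented for this example, not taken from any vendor’s product:

```python
# Minimal sketch of the human-curated resources a search system leans on.
# All names and entries here are hypothetical illustrations.

# Alias look-up table: maps variant spellings of a person of interest
# to one canonical form, the kind of list a human analyst maintains.
ALIASES = {
    "j. smith": "John Smith",
    "johnny smith": "John Smith",
    "j smith": "John Smith",
}

# Hand-crafted taxonomy: a broader term mapped to narrower terms,
# used to expand a query beyond its literal wording.
TAXONOMY = {
    "explosives": ["ied", "roadside bomb", "car bomb"],
}

def canonical_name(raw: str) -> str:
    """Resolve a raw mention to its canonical entry, if a human indexed it."""
    return ALIASES.get(raw.strip().lower(), raw)

def expand_query(term: str) -> list[str]:
    """Expand a query term with human-supplied narrower terms."""
    return [term] + TAXONOMY.get(term.lower(), [])
```

The point of the sketch is that none of these mappings falls out of the math; each entry is an editorial decision a person had to make and must keep maintaining.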

The fact that Watson is a source of amusement to me is precisely because the human effort required to make a smart system work is never converted to cost and time statements. People assume Watson won Jeopardy because it was smart. People assume Google knows what ads to present because Google’s software is so darned smart. People assume Facebook mines its data to select news for an individual. Sure, there is automation of certain processes, but humans are needed. Omit the human and you get the crazy Microsoft Tay system which humans taught to be crazier than some US politicians.

For decades I have reminded those who listened to my lectures not to confuse what they see in science fiction films with reality. Progress in smart software is evident. But the progress is very slow, hampered by the computational limits of today’s hardware and infrastructure. Just like real time, the concept is easy to say but quite expensive and difficult to implement in a meaningful way. There’s a reason millisecond access to trading data costs so much that only certain financial operations can afford the bill. Smart software is the same.

How about less outrage from those covering smart software and more critical thinking about what’s required to get a system to produce a useful output? In short, more information and less puffery, more analysis and less sawdust. Maybe I imagined it, but both the Google and Tesla self-driving vehicles have crashed, right? Humans are essential because smart software is not as smart as those who believe in unicorns assume. Demos, like TV game shows, require pre- and post-production, gentle reader.

What happens when humans are involved? Isn’t bias part of the territory?

Stephen E Arnold, May 16, 2016

The Trials, Tribulations, and Party Anecdotes of “Edge Case” Names

May 16, 2016

The article titled “These Unlucky People Have Names That Break Computers” on BBC Future delves into the strange world of “edge cases”: people with unexpected or problematic names that reveal glitches in the commonplace systems those of us named “Smith” or “Jones” take for granted. Consider Jennifer Null, the Virginia woman who can’t book a plane ticket or complete her taxes without extensive phone calls and headaches. The article says,

“But to any programmer, it’s painfully easy to see why “Null” could cause problems for a database. This is because the word “null” is often inserted into database fields to indicate that there is no data there. Now and again, system administrators have to try and fix the problem for people who are actually named “Null” – but the issue is rare and sometimes surprisingly difficult to solve.”

People named Null may be tricky to find, but because of the nature of the controls related to names, issues generally arise for them in systems where the name actually matters, like government forms. This is not an issue unique to the US, either. Patrick McKenzie, an American programmer living in Japan, has run into regular difficulties because of the length of his last name. But that is nothing compared to Janice Keihanaikukauakahihulihe’ekahaunaele, a Hawaiian woman who campaigned for more flexibility in name-length restrictions for state ID cards.
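Why the surname collides with a missing-value marker is easy to sketch. The snippet below is a hypothetical, deliberately sloppy serializer of the kind the quote alludes to, not any airline’s or agency’s actual code:

```python
# Sketch of how the surname "Null" can collide with a missing-value sentinel.
# This mimics a careless system that writes records out as text, where the
# marker for "no data" and the name "Null" become indistinguishable.

MISSING = "NULL"  # sentinel some legacy systems write for an absent field

def serialize(last_name):
    """Write a surname field; None means the field was never filled in."""
    return MISSING if last_name is None else last_name.upper()

def is_missing(field):
    """Downstream check that confuses the sentinel with real data."""
    return field == MISSING

# A genuinely absent surname and Ms. Null's real surname look identical...
assert serialize(None) == serialize("Null")
# ...so her record is wrongly treated as incomplete.
assert is_missing(serialize("Null"))
```

Well-designed databases distinguish a true NULL from the string “Null,” which is why the article calls the surviving cases rare but surprisingly hard to fix: the bug lives in ad hoc text layers, not in the database proper.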


Chelsea Kerwin, May 16, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Anonymous Hacks Turkish Cops

May 16, 2016

Anonymous has struck again, this time hacking the Turkish General Directorate of Security (EGM) in its crusade against corruption. The International Business Times reports, “Anonymous: Hacker Unleashes 17.8 GB Trove of Data from a Turkish National Police Server.” It is believed that the hacker responsible is ROR[RG], who was also deemed responsible for last year’s Adult Friend Finder breach. The MySQL-friendly files are now available for download at TheCthulhu website, which seems to be making a habit of posting hacked police data.

Why has Anonymous targeted Turkey? Reporter Jason Murdock writes:

“Anonymous has an established history with carrying out cyberattacks against Turkey. In 2015 the group, which is made up of a loose collection of hackers and hacktivists from across the globe, officially ‘declared war’ on the country. In a video statement, the collective accused Turkish President Recep Tayyip Erdoğan’s government of supporting the Islamic State (Isis), also known as Daesh.

“’Turkey is supporting Daesh by buying oil from them, and hospitalising their fighters,’ said a masked spokesperson at the time. ‘We won’t accept that Erdogan, the leader of Turkey, will help Isis any longer. If you don’t stop supporting Isis, we will continue attacking your internet […] stop this insanity now Turkey. Your fate is in your own hands.’”

We wonder how Turkey will respond to this breach, and what nuggets of troublesome information will be revealed. We are also curious to see what Anonymous does next; stay tuned.


Cynthia Murrell, May 16, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Now Big Data Has to Be Fast

May 15, 2016

I read “Big Data Is No Longer Enough: It’s Now All about Fast Data.” The write up is interesting because it shifts the focus from having lots of information to infrastructure which can process the data in a timely manner. Note that “timely” means different things in different contexts. For example, to a crazed MBA stock market maven, next week is not too useful. To a clueless marketing professional with a degree in art history, “next week” might be just speedy enough.

The write up points out:

Processing data at these breakneck speeds requires two technologies: a system that can handle developments as quickly as they appear and a data warehouse capable of working through each item once it arrives. These velocity-oriented databases can support real-time analytics and complex decision-making in real time, while processing a relentless incoming data feed.
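The two-part pattern the quote describes (handle each item as it arrives, keep an aggregate current) can be sketched in a few lines. Real “fast data” deployments use stream engines and velocity-oriented warehouses; the class below is only an illustrative toy with hypothetical names:

```python
# Minimal sketch of a "fast data" pattern: ingest each event as it arrives
# and keep a rolling aggregate over a fixed-size window up to date.
from collections import deque

class RollingAverage:
    def __init__(self, window: int):
        # deque with maxlen drops the oldest event automatically
        self.window = deque(maxlen=window)

    def add(self, value: float) -> float:
        """Ingest one event and return the current aggregate immediately."""
        self.window.append(value)
        return sum(self.window) / len(self.window)

ra = RollingAverage(window=3)
for v in [10.0, 20.0, 30.0, 40.0]:
    latest = ra.add(v)
# After the 4th event the window holds only the last three values.
```

Even this toy shows where the costs hide: someone must decide the window size, the aggregate, and what “timely” means for the business before a single line of infrastructure is built.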

The point omitted from the article is that speed comes at a cost: the humans required to figure out what’s needed to go fast, the engineers to build the system, and the time required to complete the task. The “cloud” is not a solution to the cost.

Another omission in the article is that the numerical recipes required to “make sense” of large volumes of data require specialist knowledge. A system which outputs nifty charts may be of zero utility when it comes to making a decision.

The write up ignores the information in “What Beats Big Data? Small Data.” Some organizations cannot afford the cost of fast data. Even outfits which have the money can find themselves tripping over their analyses. See, for example, “Amazon Isn’t Racist, It’s Just Been an Unfortunate Victim of Big Data.” Understanding the information is important. Smart software often lacks the ability to discern nuances or issues with data quality, poor algorithm selection, or knowing what to look for in the first place.

Will the write up cause marketers and baloney makers to alter their pitches about Big Data and smart software? Not a chance. Vendors’ end game is revenue; licensees have a different agenda. When the two do not meet, there may be some excitement.

Stephen E Arnold, May 15, 2016

Excite and Ask: Where Are They Now?

May 14, 2016

I learned a factoid from “Yahoo Stock: Analyzing 5 Key Suppliers.” Here’s the passage with the items I noted in bold face:

Excite Japan Co., Ltd. was established in 1997 as a joint venture with Excite, Inc., which is wholly owned by IAC/InterActiveCorp. At the time, Excite, Inc., which is known in 2016 as Ask.com, was among the largest and most popular Web portals offering personalized home pages for searching content. In 2015, Excite Japan generated 9.91% of its revenues from Yahoo through a revenue-sharing agreement for ad-clicks going through Yahoo’s search engine. In 2015, the company had revenue of $66.47 million in U.S. dollars and a market capitalization of $3.77 billion.

Interesting about Excite. About Yahoo? Not so much.

Stephen E Arnold, May 14, 2016

Deep Learning: Old Wine, New Labels

May 13, 2016

I read “Deep Learning: Definition, Resources, and Comparison with Machine Learning.” The most useful segment of the article to me is the list of resources. I did highlight this statement and its links:

Many deep learning algorithms (clustering, pattern recognition, automated bidding, recommendation engine, and so on)  — even though they appear in new contexts such as IoT or machine to machine communication — still rely on relatively old-fashioned techniques such as logistic regression, SVM, decision trees, K-NN, naive Bayes, Bayesian modeling, ensembles, random forests, signal processing, filtering, graph theory, gaming theory, and many others. Click here and here for details about the top 10 algorithms.
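How old-fashioned some of these building blocks are is easy to show. Here is a toy nearest-neighbour classifier, one of the listed techniques, written from scratch; the labelled points are hypothetical and the example is a sketch, not anyone’s production system:

```python
# Toy 1-nearest-neighbour classifier, one of the "relatively old-fashioned"
# techniques the article says still underpins much smart software.
import math

def nearest_neighbor(train, query):
    """train: list of ((x, y), label) pairs; returns the label of the
    training point closest to query in Euclidean distance."""
    def dist(item):
        point, _label = item
        return math.hypot(point[0] - query[0], point[1] - query[1])
    return min(train, key=dist)[1]

# Hypothetical labelled points forming two clusters.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]

assert nearest_neighbor(train, (0.2, 0.1)) == "a"
assert nearest_neighbor(train, (4.8, 5.1)) == "b"
```

The method dates back decades; what changes in the “new contexts” the article mentions is mostly the volume of data and the plumbing around the same core idea.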

The point is that folks are getting interested in established methods hooked together in interesting ways. Perhaps new methods will find their way into the high flying vehicles for smart software? But wait. Are computational barriers acting like a venturi in the innovation flow? What about that vacuum?

Stephen E Arnold, May 13, 2016

Dissing the GOOG: After 15 Years, the Halo Tarnishes

May 13, 2016

I love the Google. Sorry. I love the Alphabet Google thing. I read the Google invention explaining how I could have a computer implanted in my eye. The Alphabet Google thing has sufficient time, talent, and money to move beyond Dr. Babak Amir Parviz’s contact lens invention. Amir Parviz or Amirparviz has left the Google building and the problem of cooling a computer in an eye for others to solve.

Alphabet Google can solve some problems; for example, Loon balloon drifting and making it difficult for me to locate information directly relevant to a query I pass to the Google search systems. Management glitches? No problem. Solve them with personnel shifts and reorganization.

From my perspective, the search giant turned Leonardo can envision with the best mankind has offered. The challenge seems to be finding a way to keep the online advertising machine pumping money.

I read “Is the Online Advertising Bubble Finally Starting to Pop?” This is an interesting question. The write up presents some data which make clear that Google is generating less revenue per click than it did in 2014. I looked at a chart which shows a decline in the “cost of ad space per dollar of revenue.”

If the data are accurate, erosion of Google’s ad revenue is now a problem for Google to solve. The write up opined:

We estimate that the online advertising market has been artificially inflated since the end of 2013, and is much more mature than its pundits are claiming. 90% of Google’s revenues come from advertising. We expect Alphabet’s share price to go down by 75%…

The article concludes with a list of other sources which suggest that Google’s ad revenue is “crumbling to the ground.”

My reaction is that Alphabet Google’s business model pivots on the Overture/GoTo.com pay to play model. Google and now Alphabet have tried for years to find another source of revenue which would prove that Steve Ballmer’s “one trick pony” observation was not accurate.

How have those revenue initiatives worked out? Google remains dependent on online advertising for the bulk of its revenue. The desktop search approach is not the principal method of obtaining answers to questions for most mobile users. Facebook, it appears, is more successful in providing must have information to users who will put up with Facebook’s revenue methods. Amazon, despite its woeful search systems, generates money from a couple of talented ponies, not one.

What’s going on? Here’s my view:

  • Google’s vision was to build a better Alta Vista and generate revenue with online ads. That model is the foundation of the Alphabet Google thing and a digital straitjacket which Google-dini cannot escape
  • Alphabet Google is a combination of science club projects and me-too innovation. Without something “new,” the GOOG is a bit of an artifact for many users. Convenience is one thing, and revenue is slightly different. A mismatch perhaps?
  • Google is distracted. There are legal hassles. There are staffing hassles. There are competitive hassles. Is the pony addled by crowd noise in the online circus ring?

The Google is not going away quickly. Messrs. Brin and Page need to find the imitative magic that created a better Alta Vista. Then that “new” thing has to produce sufficient revenue to add some meaningful revenue to the company’s financials.

Is Google “feeling lucky”?

Stephen E Arnold, May 13, 2016

Facebook and Law Enforcement in Cahoots

May 13, 2016

Did you know that Facebook combs your content for criminal intent? American Intelligence Report reveals, “Facebook Monitors Your Private Messages and Photos for Criminal Activity, Reports them to Police.” Naturally, software is the first entity to scan content, using keywords and key phrases to flag items for human follow-up. Of particular interest are “loose” relationships. Reporter Kristan T. Harris writes:

Reuters’ interview with the security officer explains: “Facebook’s software focuses on conversations between members who have a loose relationship on the social network. For example, if two users aren’t friends, only recently became friends, have no mutual friends, interact with each other very little, have a significant age difference, and/or are located far from each other, the tool pays particular attention.

“The scanning program looks for certain phrases found in previously obtained chat records from criminals, including sexual predators (because of the Reuters story, we know of at least one alleged child predator who is being brought before the courts as a direct result of Facebook’s chat scanning). The relationship analysis and phrase material have to add up before a Facebook employee actually looks at communications and makes the final decision of whether to ping the authorities.

“’We’ve never wanted to set up an environment where we have employees looking at private communications, so it’s really important that we use technology that has a very low false-positive rate,’ Sullivan told Reuters.”
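The two-stage gate the quote describes, where relationship signals and phrase matches must both “add up” before a human looks, can be sketched as follows. The phrases, signals, and thresholds below are invented for illustration and are emphatically not Facebook’s actual rules:

```python
# Sketch of a two-stage escalation gate: a conversation reaches a human
# reviewer only when BOTH the relationship looks "loose" AND flagged phrases
# appear. All phrases, signals, and thresholds here are hypothetical.

FLAGGED_PHRASES = {"meet alone", "dont tell anyone"}  # placeholder phrases

def is_loose_relationship(mutual_friends, messages_exchanged, age_gap):
    """Hypothetical heuristic for a 'loose' tie between two accounts."""
    return mutual_friends == 0 and messages_exchanged < 5 and age_gap > 10

def needs_human_review(chat_text, mutual_friends, messages_exchanged, age_gap):
    phrase_hit = any(p in chat_text.lower() for p in FLAGGED_PHRASES)
    return phrase_hit and is_loose_relationship(
        mutual_friends, messages_exchanged, age_gap)

# A flagged phrase alone is not enough; the relationship must also look loose.
assert not needs_human_review("meet alone later", mutual_friends=12,
                              messages_exchanged=300, age_gap=1)
assert needs_human_review("Meet alone after school", mutual_friends=0,
                          messages_exchanged=2, age_gap=25)
```

Requiring both signals is what keeps the false-positive rate low enough that, as Sullivan says, employees rarely end up reading private communications.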

Uh-huh. So, one alleged predator has been caught. We’re told potential murder suspects have also been identified this way, with one case awash in 62 pages of Facebook-based evidence. Justice is a good thing, but Harris notes that most people will be uncomfortable with the idea of Facebook monitoring their communications. She goes on to wonder where this will lead; will it eventually be applied to misdemeanors and even, perhaps, to “thought crimes”?

Users of any social media platform must understand that anything they post could eventually be seen by anyone. Privacy policies can be updated without notice, and changes can apply to old as well as new data. And, of course, hackers are always lurking about. I was once cautioned to imagine that anything I post online I might as well be shouting on a public street; that advice has served me well.


Cynthia Murrell, May 13, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Parts Unknown of Dark Web Revealed in Study

May 13, 2016

While the parts unknown of the internet are said to be populated by terrorists’ outreach and propaganda, research shows a different picture. Quartz reports on this in the article “The dark web is too slow and annoying for terrorists to even bother with, experts say.” The research comes from Thomas Rid and Daniel Moore of the Department of War Studies at King’s College London. They found 140 extremist Tor hidden services; inaccessible or inactive services topped the list at 2,482, followed by 1,021 non-illicit services. As for illicit services, those related to drugs far outnumbered extremism, at 423. The write-up offers a few explanations for the lack of terrorists publishing on the Dark Web,

“So why aren’t jihadis taking advantage of running dark web sites? Rid and Moore don’t know for sure, but they guess that it’s for the same reason so few other people publish information on the dark web: It’s just too fiddly. “Hidden services are sometimes slow, and not as stable as you might hope. So ease of use is not as great as it could be. There are better alternatives,” Rid told Quartz. As a communications platform, a site on the dark web doesn’t do what jihadis need it to do very well. It won’t reach many new people compared to “curious Googling,” as the authors point out, limiting its utility as a propaganda tool. It’s not very good for internal communications either, because it’s slow and requires installing additional software to work on a mobile phone.”

This article provides fascinating research and interesting conclusions. However, we must add “unreliable” and “insecure” to the list of reasons the Dark Web may not be suitable for such uses.


Megan Feil, May 13, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Smart Software a Derby Winner. Watson Does Not Show

May 12, 2016

I read “AI Predicts All Four Top Places in the Kentucky Derby: Machine Uses Swarm Intelligence to Turn $20 bet into $11,000.” Let’s assume that the magic revealed in the write up is spot on. I will not ask the question, “How much would IBM have won if it had bet a couple of hundred million on the Kentucky Derby using the revealed technology?” Believe me, I want to ask that question, but I will exercise restraint.

According to the write up:

An artificial intelligence program developed by Unanimous A.I. successfully predicted the Superfecta at the 142nd Kentucky Derby last Saturday, turning a $20 bet into nearly $11,000. Using ‘Swarm Intelligence,’ the AI was able to correctly choose the winning horse, Nyquist – along with the second, third, and fourth finishers.

The article includes a nifty animated GIF to illustrate how “swarm intelligence” made big money at the track.

The idea originated at the real news outfit TechRepublic.

The trick:

Many minds are better than one.
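The underlying intuition can be illustrated with a few lines of arithmetic. Note this is only the crude wisdom-of-crowds effect; Unanimous A.I.’s actual swarm method is an interactive process among live participants, not a simple pooling of numbers:

```python
# Crude illustration of "many minds are better than one": the median of many
# independent noisy estimates usually lands closer to the truth than a
# typical individual estimate does. This is NOT Unanimous A.I.'s algorithm,
# only the statistical intuition behind pooling independent judgments.
import random
import statistics

random.seed(42)  # fixed seed so the example is reproducible

TRUTH = 100.0
# 500 hypothetical "minds", each guessing the truth with independent noise.
guesses = [random.gauss(TRUTH, 15.0) for _ in range(500)]

crowd_error = abs(statistics.median(guesses) - TRUTH)
typical_error = statistics.median(abs(g - TRUTH) for g in guesses)

# The pooled estimate beats the typical individual guess.
assert crowd_error < typical_error
```

Pooling only helps when the errors are independent, which is one reason a swarm of racing enthusiasts can look smart while a single expert, or a single expensive system, stumbles.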

Should one ask Watson if it can perform the same big payday magic? Nah.

Stephen E Arnold, May 12, 2016
