Faster Text Classification from Facebook, the Social Outfit

August 29, 2016

I read “Faster, Better Text Classification.” Facebook’s artificial intelligence team has made available some of its whizzy code. The software may be a bit of a challenge to the vendors of proprietary text classification software, but Facebook wants to help everyone. Think of the billion plus Facebook users who need to train an artificially intelligent system with one billion words in 10 minutes. You may want to try this on your Chromebook, gentle reader.

I learned:

Automatic text processing forms a key part of the day-to-day interaction with your computer; it’s a critical component of everything from web search and content ranking to spam filtering, and when it works well, it’s completely invisible to you. With the growing amount of online data, there is a need for more flexible tools to better understand the content of very large datasets, in order to provide more accurate classification results. To address this need, the Facebook AI Research (FAIR) lab is open-sourcing fastText, a library designed to help build scalable solutions for text representation and classification.

What does the Facebook text classification code deliver as open sourciness? I learned:

FastText combines some of the most successful concepts introduced by the natural language processing and machine learning communities in the last few decades. These include representing sentences with bag of words and bag of n-grams, as well as using subword information, and sharing information across classes through a hidden representation. We also employ a hierarchical softmax that takes advantage of the unbalanced distribution of the classes to speed up computation. These different concepts are being used for two different tasks: efficient text classification and learning word vector representations.
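For readers who want to see what "bag of words and bag of n-grams" plus "subword information" might look like in practice, here is a minimal Python sketch. It is an illustration of the general idea only, not fastText's code: the function names, the bucket count of 1000, and the choice of zlib.crc32 as the hash are all assumptions made for this example.

```python
import zlib
from collections import Counter


def word_ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def char_ngrams(word, n=3):
    """Return character n-grams of a word padded with < and > boundary marks."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hashed_bag(features, buckets=1000):
    """Count features after hashing each one into a fixed number of buckets.

    The bucket count is an arbitrary value chosen for this sketch.
    """
    return Counter(zlib.crc32(f.encode("utf-8")) % buckets for f in features)


tokens = "faster better text classification".split()
grams = word_ngrams(tokens)   # 4 unigrams followed by 3 bigrams
bag = hashed_bag(grams)       # sparse count vector keyed by bucket id
subwords = char_ngrams("bro")
```

Hashing keeps the feature table at a fixed size no matter how many distinct n-grams turn up, at the cost of occasional collisions. A classifier would then be trained on these counts; that part is left out here.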

The write up details some of the benefits of the code; for example, its multilingual capabilities and its accuracy.

What will other do gooders like Amazon, Google, and Microsoft do to respond to Facebook’s generosity? My thought is that more text processing software will find its way to open source green pastures.

What will the for fee vendors peddling proprietary classification systems do? Here’s a short list of ideas I had:

  1. Pivot to become predictive analytics companies and seek new rounds of financing
  2. Pretend that open source options are available but not good enough for real world tasks
  3. Generate white papers and commission mid tier consulting firms to extol the virtues of their innovative, unique, high speed, smart software
  4. Look for another line of work in search engine optimization, direct sales for a tool and die company, or check out Facebook.

Stephen E Arnold, August 29, 2016

Microsoft to Sunset China Search and News Services

August 22, 2016

Recent news has made clear that online content from the U.S. or any country foreign to China faces challenges in China. CNN Money recently published an article titled Microsoft is giving up on its Chinese web portal. The piece informs us that Microsoft will sunset its MSN website in China on June 7. In its company statement, Microsoft says its commitment to China remains and notes that China is home to its largest R&D facility outside the U.S. An antitrust investigation of Microsoft in China has been underway since July 2014. The article shares an overview of the bigger picture:

“The company’s search engine, Bing, also flopped in the country amid tough competition with homegrown rivals. It didn’t help that in Chinese, “Bing” sounds similar to the word for “sickness.”

In September, Microsoft finally ditched Bing for users of its Edge browser in China, striking a deal with Chinese Internet giant Baidu (BIDU, Tech30) to use its search engine as the default.

Other Western tech firms have come under scrutiny in China before, including Qualcomm (QCOM, Tech30) and Apple (AAPL, Tech30). Social networks like Facebook (FB, Tech30) and Google (GOOG) remain blocked in the country.”

It looks like Bing will bite the dust soon, in China at least. Does this news mean anything for Microsoft as a company? While regulations in China are notably stringent, the size of its population makes for a sizable market. We will be watching to see how search plays out in China.

Megan Feil, August 22, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

There is a Louisville, Kentucky Hidden/Dark Web meet up on August 23, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233019199/

Improving Information for Everyone

August 14, 2016

I love it when Facebook and Google take steps to improve information quality for everyone.

I noted “Facebook’s News Feed to Show Fewer Clickbait Headlines.” I thought the Facebook news feed was 100 percent beef. I learned:

The company receives thousands of complaints a day about clickbait, headlines that intentionally withhold information or mislead users to get people to click on them…

Thousands. I am impressed. Facebook is going to do some filtering to help its many happy users avoid clickbait, a concept which puzzles me. I noted:

Facebook created a system that identifies and classifies such headlines. It can then determine which pages or web domains post large amounts of clickbait and rank them lower in News Feed. Facebook routinely updates its algorithm for News Feed, the place most people see postings on the site, to show users what they are most interested in and encourage them to spend even more time on the site.

Clustering methods are readily available. I ask myself, “Why did Facebook provide streams of clickbait in the first place?”

On a related note, the Google released exclusive information to Time Warner, which once owned AOL and now owns a chunk of Hulu. Google’s wizards have identified bad bits, which it calls “unwanted software.” The Googlers converted the phrase into UwS and then into the snappy term “ooze.”

Fortune informed me:

people bump into 60 million browser warnings for download attempts of unwanted software at unsafe Web pages every week.

Quite a surprise I assume. Google will definitely talk about “a really big problem.” Alas, Fortune was not able to provide information about what Mother Google will do to protect its users. Obviously the “real” journalists at Fortune did not find the question, “What are you going to do about this?” germane.

It is reassuring to know that Facebook and Google are improving the quality of the information each provides. Analytics and user feedback are important.

Stephen E Arnold, August 13, 2016

No Dark Web Necessary

August 11, 2016

Do increased Facebook restrictions on hate speech and illegal activity send those users straight to the Dark Web? From The Atlantic comes an article entitled American Neo-Nazis Are on Russia’s Facebook, which hints that is not always the case. This piece explains that an online group called “United Aryan Front” moved from Facebook to Russia’s version of Facebook, VKontakte. The article describes a shift to cyber racism:

The move to VK is part of the growing tendency of white supremacists to interact in online forums, rather than through real-life groups like the KKK, according to Heidi Beirich, director of the Southern Poverty Law Center’s anti-terror Intelligence Project. Through the early 2000s, skinheads and other groups would host dozens of events per year with hundreds of attendees, she says, but now there are only a handful of those rallies each year. “People online are talking about the same kinds of things that used to happen at the rallies, but now they’re doing it completely through the web,” she said.

It is interesting to consider the spaces people choose, or are forced into, for conducting ill-intentioned activities. Even when Facebook cracks down on it, hate speech amongst other activities is not relegated solely to the Dark Web. While organized online hate speech analogous to rallies may be experiencing a surge in the online world, rallies are not the only avenue for real-world racism. At the core of this article, like many we cover on the Dark Web, is a question about the relationship between place and malicious activity.

 

Megan Feil, August 11, 2016

Facebook Algorithms: Doing What Users Expect Maybe

August 9, 2016

I read an AOL-Yahoo post titled “Inside Facebook Algorithms.” With the excitement of algorithms tingeing the air, explanations of smart software make the day so much better.

I learned:

if you understand the rules, you can play them by doing the same thing over and over again

Good point. But how many Facebook users are sufficiently attentive to correlate a particular action with an outcome which may not be visible to the user?

Censorship confusing? It doesn’t need to be. I learned:

Mr. Abbasi [a person whose Facebook post was censored] used several words which would likely flag his post as hate speech, which is against Facebook’s community guidelines. It is also possible that the number of the words flagged would rank it on a scale of “possibly offensive” to “inciting violence”, and the moderators reviewing these posts would allocate most of their resources to posts closer to the former, and automatically delete those in the latter category. So far, this tool continues to work as intended.

There is nothing like a word look up list containing words which will result in censorship. We love word lists. Nonpublic word lists are not much fun for some.
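The word list approach described in the quoted passage can be sketched in a few lines of Python. Everything specific here is hypothetical: the placeholder watch list, the function names, and the two thresholds are invented for illustration, and nothing in the write up says Facebook's tool works this way.

```python
import re

# Hypothetical watch list: placeholder tokens, not any real moderation list.
FLAGGED = {"badword1", "badword2", "badword3"}


def count_flagged(text, flagged=FLAGGED):
    """Count how many words in the text appear on the watch list."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return sum(1 for w in words if w in flagged)


def triage(text, review_at=1, delete_at=3):
    """Map the flagged-word count onto a three-step scale.

    The thresholds are assumptions chosen for this sketch.
    """
    hits = count_flagged(text)
    if hits >= delete_at:
        return "auto-delete"
    if hits >= review_at:
        return "human review"
    return "allow"


decision = triage("a post containing badword1 and nothing else")
```

A counter like this knows nothing about context, which is one reason a scale with a human review band in the middle makes sense. It also shows why a nonpublic list frustrates users: the rule is trivial once the list is known and opaque when it is not.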

Now what about algorithms? The examples in the write up are standard procedures for performing brute force actions. Algorithms, as presented in the AOL Yahoo article, seem to be collections of arbitrary rules. Straightforward for those who know the rules.

A “real” newspaper tackled the issue of algorithms and bias. The angle, which may be exciting to some, is “racism.” Navigate to “Is an Algorithm Any Less Racist Than a Human?” Since algorithms are often generated by humans, my hunch is that bias is indeed possible. The write up tells me:

any algorithm can – and often does – simply reproduce the biases inherent in its creator, in the data it’s using, or in society at large. For example, Google is more likely to advertise executive-level salaried positions to search engine users if it thinks the user is male, according to a Carnegie Mellon study. While Harvard researchers found that ads about arrest records were much more likely to appear alongside searches for names thought to belong to a black person versus a white person.

Don’t know the inside rules? Too bad, gentle reader. Perhaps you can search for an answer using Facebook’s search systems or the Wow.com service. Better yet. Ask a person who constructs algorithms for a living.

Stephen E Arnold, August 9, 2016

Facebook vs. LinkedIn for Job Hunters

August 4, 2016

The article on Lifehacker titled Facebook Can Be Just As Important As LinkedIn For Finding a Job emphasizes the importance of industry connections. As everyone knows, trying to find a job online is like trying to date online. A huge number of job postings are scams, schemes, or utter bollox. Navigating these toads and finding the job equivalent to Prince Charming is frustrating, which is why Facebook might offer a happy alternative. The article states,

“As business site Entrepreneur points out, the role Facebook plays in helping people find jobs shouldn’t be surprising. Any time you can connect with someone who works in your industry, that’s one more person who could potentially help you get a job. Research from Facebook itself shows that both strong and weak ties on the site can lead to jobs… Well, weak ties are important collectively because of their quantity, but strong ties are important individually because of their quality.”

Obviously, knowing someone in the industry you seek to work in is the key to finding and getting a job. But a site like Facebook is much easier to exploit than LinkedIn because more people use it and more people check it. LinkedIn’s endless emails eventually become white noise, but scrolling through Facebook’s Newsfeed is an infinite source of time-wasting pleasure for the bulk of users. Time to put the networking back into social networking, job seekers.

 

Chelsea Kerwin, August 4, 2016

Facebook Acknowledges Major Dependence on Artificial Intelligence

July 28, 2016

The article on Mashable titled Facebook’s AI Chief: ‘Facebook Today Could Not Exist Without AI’ relates the current conversations involving Facebook and AI. Joaquin Candela, the director of applied machine learning at Facebook, states that “Facebook could not exist without AI.” He uses the examples of the News Feed, ads, and offensive content, all of which involve AI stimulating a vastly more engaging and personalized experience. He explains,

“If you were just a random number and we changed that random number every five seconds and that’s all we know about you then none of the experiences that you have online today — and I’m not only talking about Facebook — would be really useful to you. You’d hate it. I would hate it. So there is value of course in being able to personalize experiences and make the access of information more efficient to you.”

And we thought all Facebook required is humans and ad revenue. Candela makes it very clear that Facebook is driven by machine learning and personalization. He paints a very bleak picture of what Facebook would look like without AI: completely random ads, unranked News Feeds, and offensive content splashing around like a beached whale. Only in the last few years has computer vision changed Facebook’s process of removing such content. What used to take reports and human raters now is automated.

Chelsea Kerwin, July 28, 2016

Twitter Influential but a Poor Driver of News Traffic

June 20, 2016

A recent report from social analytics firm Parse.ly examined the relationship between Twitter and digital publishers. NiemanLab shares a few details in, “Twitter Has Outsized Influence, but It Doesn’t Drive Much Traffic for Most News Orgs, a New Report Says.” Parse.ly tapped into data from a couple hundred of its clients, a group that includes digital publishers like Business Insider, the Daily Beast, Slate, and Upworthy.

Naturally, news sites that make the most of Twitter do so by knowing what their audience wants and supplying it. The study found there are two main types of Twitter news posts, conversational and breaking, and each drives traffic in its own way. While conversations can engage thousands of users over a period of time, breaking news produces traffic spikes.

Neither of those findings is unexpected, but some may be surprised that Twitter feeds are not inspiring more visits to publishers’ sites. Writer Joseph Lichterman reports:

“Despite its conversational and breaking news value, Twitter remains a relatively small source of traffic for most publishers. According to Parse.ly, less than 5 percent of referrals in its network came from Twitter during January and February 2016. Twitter trails Facebook, Google, and even Yahoo as sources of traffic, the report said (though it does edge out Bing!)”

Still, publishers are unlikely to jettison their Twitter accounts anytime soon, because that platform offers a different sort of value. One that is, perhaps, more important for consumers. Lichterman quotes the report:

“Though Twitter may not be a huge overall source of traffic to news websites relative to Facebook and Google, it serves a unique place in the link economy. News really does ‘start’ on Twitter.”

And the earlier a news organization knows about a situation, the better. That is an advantage few publishers will want to relinquish.

Cynthia Murrell, June 20, 2016

Statistical Translation: Dead Like Marley

June 16, 2016

I read “Facebook Says Statistical Machine Translation Has Reached End of Life.” Hey, it is Facebook. Truth for sure. I learned:

Scale is actually one reason Facebook has invested in its own MT technology. According to Packer [Facebook wizard], there are more than two trillion posts and comments, which grows by over a billion each day. “Pretty clearly, we’re not going to solve this problem with a roomful or even a building-full of human translators,” he quipped, adding that to have even “a hope of solving this problem, we need AI; we need automation.” The other reason is adaptability. “We tried that,” said Packer about using third-party MT, but it “did not work well enough for our needs.” The reason? The language of Facebook is different from what is on the rest of the Web. Packer described Facebook language as “extremely informal. It’s full of slang, it’s very regional.” He said it is also laden with metaphors, idiomatic expressions, and is riddled with misspellings (most of them intentional). Additionally, as in the rest of the world, there is a marked difference in the way different age groups communicate on Facebook.

I wonder if it is time to send death notices to the vendors who use statistical methods? Perhaps I should wait a bit. Predictions are often different from reality.

Stephen E Arnold, June 16, 2016

Facebook AI Explainer

June 10, 2016

Facebook posted a partial explanation of its artificial intelligence system. You can review the document “Introducing DeepText: Facebook’s Text Understanding Engine” and decide if Facebook or IBM is winning the smart software race. The Facebook document states:

In traditional NLP approaches, words are converted into a format that a computer algorithm can learn. The word “brother” might be assigned an integer ID such as 4598, while the word “bro” becomes another integer, like 986665. This representation requires each word to be seen with exact spellings in the training data to be understood. With deep learning, we can instead use “word embeddings,” a mathematical concept that preserves the semantic relationship among words. So, when calculated properly, we can see that the word embeddings of “brother” and “bro” are close in space. This type of representation allows us to capture the deeper semantic meaning of words. Using word embeddings, we can also understand the same semantics across multiple languages, despite differences in the surface form. As an example, for English and Spanish, “happy birthday” and “feliz cumpleaños” should be very close to each other in the common embedding space. By mapping words and phrases into a common embedding space, DeepText is capable of building models that are language-agnostic.
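The claim that the embeddings of “brother” and “bro” are "close in space" can be made concrete with cosine similarity. The Python sketch below is a toy: the three-dimensional vectors are made up for illustration and are not DeepText output, and the function names are assumptions of this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up vectors for illustration only; real embeddings are learned from
# data and usually have hundreds of dimensions.
EMBEDDINGS = {
    "brother": [0.90, 0.10, 0.30],
    "bro": [0.85, 0.15, 0.35],
    "invoice": [0.05, 0.95, 0.10],
}


def nearest(word, embeddings=EMBEDDINGS):
    """Return the other word whose vector has the highest cosine similarity."""
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine(target, embeddings[w]))


closest = nearest("brother")
```

With integer IDs such as 4598 and 986665, there is no meaningful notion of distance between the two words. With vectors, nearness becomes a number that can be compared, which is the point the Facebook document is making.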

Due to Facebook’s grip on the 18 to 35 demographic, its approach may have more commercial impact than the methods in use at other firms. Just ask IBM Watson.

Stephen E Arnold, June 10, 2016
