Google Looks to Curb Hate Speech with Jigsaw
January 6, 2017
No matter how advanced technology becomes, certain questions continue to vex us. For example, where is the line between silencing expression and prohibiting abuse? Wired examines Google’s efforts to walk that line in its article, “Google’s Digital Justice League: How Its Jigsaw Projects are Hunting Down Online Trolls.” Reporter Merjin Hos begins by sketching the growing problem of online harassment and the real-world turmoil it creates, arguing that rampant trolling serves as a sort of censorship — silencing many voices through fear. Jigsaw, a project from Google, aims to automatically filter out online hate speech and harassment. As Jared Cohen, Jigsaw founder and president, put it, “I want to use the best technology we have at our disposal to begin to take on trolling and other nefarious tactics that give hostile voices disproportionate weight, to do everything we can to level the playing field.”
The extensive article also delves into Cohen’s history, the genesis of Jigsaw, how the team is teaching its AI to identify harassment, and problems they have encountered thus far. It is an informative read for anyone interested in the topic.
Hos describes how the Jigsaw team has gone about instructing their algorithm:
The group partnered with The New York Times (NYT), which gave Jigsaw’s engineers 17 million comments from NYT stories, along with data about which of those comments were flagged as inappropriate by moderators.
Jigsaw also worked with the Wikimedia Foundation to parse 130,000 snippets of discussion around Wikipedia pages. It showed those text strings to panels of ten people recruited randomly from the CrowdFlower crowdsourcing service and asked whether they found each snippet to represent a ‘personal attack’ or ‘harassment’. Jigsaw then fed the massive corpus of online conversation and human evaluations into Google’s open source machine learning software, TensorFlow. …
By some measures Jigsaw has now trained Conversation AI to spot toxic language with impressive accuracy. Feed a string of text into its Wikipedia harassment-detection engine and it can, with what Google describes as more than 92 per cent certainty and a ten per cent false-positive rate, come up with a judgment that matches a human test panel as to whether that line represents an attack.
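To make the two figures in the passage above concrete, here is a minimal sketch of how agreement with a human panel and a false-positive rate could be computed. The majority-vote rule, the function names, and the toy data are assumptions for illustration only; the article does not say how Jigsaw aggregates panel votes or derives its numbers.

```python
from typing import List, Tuple


def panel_label(votes: List[bool]) -> bool:
    """Collapse one panel's votes into a single label by strict majority.

    Assumption: a snippet counts as an attack when more than half of the
    raters say so. The article does not state the rule Jigsaw used.
    """
    return sum(votes) * 2 > len(votes)


def agreement_and_fpr(predicted: List[bool], human: List[bool]) -> Tuple[float, float]:
    """Return (share of snippets where the model matches the panel,
    share of panel-approved snippets the model wrongly flags)."""
    if not human or len(predicted) != len(human):
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == h for p, h in zip(predicted, human))
    false_pos = sum(p and not h for p, h in zip(predicted, human))
    negatives = sum(not h for h in human)
    agreement = matches / len(human)
    fpr = false_pos / negatives if negatives else 0.0
    return agreement, fpr


# Toy data: four snippets, each rated by a hypothetical panel of ten.
panels = [
    [True] * 8 + [False] * 2,  # most raters see an attack
    [True] * 3 + [False] * 7,  # most raters see no attack
    [False] * 10,              # nobody sees an attack
    [True] * 6 + [False] * 4,  # narrow majority sees an attack
]
human_labels = [panel_label(v) for v in panels]
model_labels = [True, True, False, True]  # hypothetical model output

agreement, fpr = agreement_and_fpr(model_labels, human_labels)
print(f"agreement={agreement:.2f}, false-positive rate={fpr:.2f}")
```

The two numbers measure different things: a model can match the panel most of the time and still flag a noticeable share of harmless comments, which is why the quoted passage reports both.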
There is still much to be done, but soon Wikipedia and the New York Times will be implementing Jigsaw, at least on a limited basis. At first, the AI’s judgments will be checked by humans. This is important, partially because the software still returns some false positives—an inadvertent but highly problematic overstep. Though a perfect solution may be impossible, it is encouraging to know Jigsaw’s leader understands how tough it will be to balance protection with freedom of expression. “We don’t claim to have all the answers,” Cohen emphasizes.
Cynthia Murrell, January 6, 2017
Lucidworks Sees Watson as a Savior
December 21, 2016
Lucidworks (really?). A vision has appeared to the senior managers of Lucidworks, an open source search outfit which has ingested $53 million and sucked in another $6 million in debt financing in June 2016. Yep, that Lucidworks. The “really” which the name invokes is an association I form when someone tells me that commercializing open source search is going to knock off the pesky Elastic of Elasticsearch fame while returning a juicy payoff to the folks who coughed up the funds to keep the company founded in 2007 chugging along. Yep, Lucid works. Sort of, maybe.
I read “Lucidworks Integrates IBM Watson into Fusion Enterprise Discovery Platform.” The write up explains that Lucidworks is “tapping into” the IBM Watson developer cloud and that Lucidworks has:
an application framework that helps developers to create enterprise discovery applications so companies can understand their data and take action on insights.
Ah, so many buzzwords. Search has become applications. “Action on insights” puts some metaphorical meat on the bones of Solr, the marrow of Lucidworks. Really?
With Watson in the company’s back pocket, Lucidworks will deliver. I learned:
Customers can rely on Fusion to develop and deploy powerful discovery apps quickly thanks to its advanced cognitive computing features and machine learning from Watson. Fusion applies Watson’s machine learning capabilities to an organization’s unique and proprietary mix of structured and unstructured data so each app gets smarter over time by learning to deliver better answers to users with each query. Fusion also integrates several Watson services such as Retrieve and Rank, Speech to Text, Natural Language Classifier, and AlchemyLanguage to bolster the platform’s performance by making it easier to interact naturally with the platform and improving the relevance of query results for enterprise users.
But wait. Doesn’t Watson perform these functions already? And if Watson comes up a bit short in one area, isn’t IBM-infused Yippy ready to take up the slack?
That question is not addressed in the write up. It seems that the differences among Watson, its current collection of partners, and affiliated entities like Yippy are vast. The write up tells me:
customers looking for hosted, pre-tuned machine learning and natural language processing capabilities can point and click their way to building sophisticated applications without the need for additional resources. By bringing Watson’s cognitive computing technology to the world of enterprise data apps, these discovery apps made with Fusion are helping professionals understand the mountain of data they work with in context to take action.
This sounds like quite a bit of integration work. Lucidworks. Really?
Stephen E Arnold, December 21, 2016
IBM Open Sourciness Goes Only So Far
December 19, 2016
I love IBM, Big Blue, creator of Watson. Watson, as you may know, is a confection consisting of goodies from IBM’s internal code wizards, acquired technologies like the instantly Big Data friendly Vivisimo, and Lucene. Yep, like Attivio and many other “search” vendors, open source Lucene is the way to reduce the costs for basic information retrieval.
I assume you know about OpenLava, which is an open source system for managing certain types of IBM systems. The OpenLava Web page here states:
With an active community of users and developers, OpenLava development is accelerating, delivering high-quality implementations of important new features including:
- Fair-share scheduling – allocate resources between users and groups according to configurable policies
- Job pre-emption – Ensure that critical users, jobs and groups have the resources they need – when they need them
- Docker support – Providing application isolation, fast service deployment and cloud mobility
- Cloud & VM friendly auto-scaling – Easily add or remove cluster nodes on the fly without cluster re-configuration
These features are in addition to the many advanced capabilities already in OpenLava including job arrays, run-windows, n-way host failover, job limits, dependencies for multi-step workflows, parallel job support and much more.
I read “OpenLava under IBM Attack.” I believe everything I read on the Internet. The write up explains that Big Blue wants the OpenLava open source code removed. The write up states:
IBM claims that the versions of OpenLava starting from 3.0 infringe their copyright and that some source code have been stolen from them, copied, or otherwise taken from their code base.
Several thoughts:
- The folks involved with OpenLava did knowingly and intentionally rip off IBM’s software, and the marketer of the open source tinged Watson is taking a logical and appropriate action against the open source alternative to IBM’s own management software.
- IBM is unhappy with OpenLava’s adoption by IBM customers. IBM customers should buy only software from IBM-authorized sources. Other old school enterprise software companies have this philosophy too.
- There is a failure to communicate. OpenLava is not making its case understandable to the outfit poised to hire 25,000 more employees and IBM is not making itself clear to the crafty folks at OpenLava.
I don’t have a dog in the fight. But I find it interesting that IBM Watson with its Lucene tinged capabilities is finding open source distasteful in some circumstances.
Life was far simpler when open source projects were more malleable. Next stop? The legal eagles’ nests.
Stephen E Arnold, December 19, 2016
Tor Phone to Take on Google
December 13, 2016
Tor users have few or no options for surfing the Underground Web anonymously, as Android-powered phones still manage to scrape user data. The Tor Project intends to beat Google at its own game with a Tor-enabled smartphone.
An article that appeared on Ars Technica, titled “Tor Phone Is Antidote to Google ‘Hostility’ Over Android, Says Developer,” says:
The prototype is meant to show a possible direction for Tor on mobile. We are trying to demonstrate that it is possible to build a phone that respects user choice and freedom, vastly reduces vulnerability surface, and sets a direction for the ecosystem with respect to how to meet the needs of high-security users.
The phone is powered by the custom-made CopperheadOS and runs only on Google Nexus or Pixel hardware. Because of the technical complexity involved, it is recommended only for Linux geeks.
For voice calls, according to the article:
To protect user privacy, the prototype runs OrWall, the Android firewall that routes traffic over Tor, and blocks all other traffic. Users can punch a hole through the firewall for voice traffic, for instance, to enable Signal.
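The default-deny idea in the quote above can be sketched in a few lines. This is a conceptual illustration only, not OrWall's code or configuration format; the `Policy` class, the route names, and the app identifiers are all hypothetical.

```python
from enum import Enum


class Route(Enum):
    TOR = "tor"        # send the app's traffic through Tor
    DIRECT = "direct"  # bypass Tor: the "hole" punched through the firewall
    BLOCK = "block"    # drop the traffic


class Policy:
    """Hypothetical per-app policy: anything not explicitly listed is blocked."""

    def __init__(self, tor_apps=(), direct_apps=()):
        self.tor_apps = set(tor_apps)
        self.direct_apps = set(direct_apps)

    def route_for(self, app: str) -> Route:
        if app in self.direct_apps:
            return Route.DIRECT
        if app in self.tor_apps:
            return Route.TOR
        return Route.BLOCK  # default deny


policy = Policy(tor_apps={"browser", "mail"})
policy.direct_apps.add("signal")  # the user opts in to a bypass for voice

for app in ("browser", "signal", "weather"):
    print(app, "->", policy.route_for(app).value)
```

The point of the design is that forgetting to list an app fails safe: unlisted traffic is dropped rather than leaked outside Tor.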
Google’s Android is an open source platform that OEMs can customize. This creates multiple security threats, enabling hackers and snoopers to create backdoors. CopperheadOS, on the other hand, plugs these security holes with verified boot and also stops Google Play Store from overriding native apps. It seems the days of mobile Tor are finally here.
Vishal Ingole, December 13, 2016
Super Secretive Google DeepMind Open Sources AI Rocket Science
December 6, 2016
Take that, IBM. And you, Microsoft, I see you and raise you more smart software. The high stakes poker game for the control of the burgeoning market for smart software is getting exciting and expensive. The Google either bursts with confidence, or it fears that outfits like IBM and Microsoft are poised to run the table.
Navigate to the “real” news story “Google DeepMind Makes AI Training Platform Publicly Available.” You will learn that:
DeepMind is putting the entire source code for its training environment — which it previously called Labyrinth and has now renamed as DeepMind Lab — on the open-source depository GitHub, the company said Monday. Anyone will be able to download the code and customize it to help train their own artificial intelligence systems. They will also be able to create new game levels for DeepMind Lab and upload these to GitHub.
Is the move a response to auto maker, dreamer, and launcher of expensive fireworks Elon Musk’s puttering in smart software? The “real” news story said:
OpenAI, a rival research shop set up by billionaire entrepreneur Elon Musk, venture capitalist Peter Thiel and Sam Altman, a founder of Silicon Valley startup accelerator Y Combinator, made its own AI training platform, called OpenAI Gym, available to the public in April [2016]. On Monday it also announced that it was making public an interface called Universe that lets an AI agent “use a computer like a human does: by looking at screen pixels and operating a virtual keyboard and mouse,” the company said in a statement. In short, it’s a go-between that lets an AI system learn the skills needed to play games or operate other applications. Researchers can use tools in OpenAI’s Gym to measure how these agents perform.
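For readers unfamiliar with what a "training platform" provides, the loop below is a self-contained toy sketch of an agent acting in an environment and being scored. It does not use DeepMind Lab, OpenAI Gym, or Universe; the `GuessNumberEnv` class, the reward values, and the agent are hypothetical, and the reset/step shape is only loosely modeled on that general style of interface.

```python
import random


class GuessNumberEnv:
    """Toy environment: the agent must find a hidden integer in [low, high]."""

    def __init__(self, low=0, high=15, seed=0):
        self.low, self.high = low, high
        self._rng = random.Random(seed)
        self._target = None

    def reset(self):
        """Start a new episode and return the initial observation (no hint yet)."""
        self._target = self._rng.randint(self.low, self.high)
        return 0

    def step(self, guess):
        """Apply one action and return (observation, reward, done)."""
        if guess == self._target:
            return 0, 1.0, True
        hint = -1 if guess > self._target else 1  # -1: go lower, 1: go higher
        return hint, -0.1, False


def binary_search_agent(env, max_steps=10):
    """Hypothetical agent: halves the remaining range using each hint."""
    low, high = env.low, env.high
    env.reset()
    total_reward, done = 0.0, False
    for _ in range(max_steps):
        guess = (low + high) // 2
        hint, reward, done = env.step(guess)
        total_reward += reward
        if done:
            break
        if hint < 0:
            high = guess - 1
        else:
            low = guess + 1
    return total_reward, done


# "Measuring how an agent performs" here just means averaging episode rewards.
results = [binary_search_agent(GuessNumberEnv(seed=s)) for s in range(20)]
scores = [reward for reward, _ in results]
print(f"mean reward over {len(scores)} episodes: {sum(scores) / len(scores):.2f}")
```

A shared environment interface of this kind is what lets different research groups compare agents on the same tasks.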
Why pay for artificial intelligence to perform the handful of tasks that the Harvard Business Review documented in the helpful write up “What Artificial Intelligence Can and Can’t Do Right Now”?
Perhaps free software will allow the capabilities of smart software to achieve the heights of wonder the marketers envision? How does one spell lock in?
Stephen E Arnold, December 6, 2016
Iran-Russia Ink Pact for Search Engine Services
November 28, 2016
Owing to geopolitical differences, countries like Iran are turning towards like-minded nations like Russia for technological developments. A Russian diplomat posted in Iran recently announced that home-grown search engine service provider Yandex will offer its services to the people of Iran.
Financial Tribune, in a news report titled “Yandex to Arrive Soon,” said that:
Last October, Russian and Iranian communications ministers Nikolay Nikiforov and Mahmoud Vaezi respectively signed a deal to expand bilateral technological collaborations. During the meeting, Russian Ambassador Vaezi said, We are familiar with the powerful Russian search engine Yandex. We agreed that Yandex would open an office in Iran. The system will be adapted for the Iranian people and will be in Persian.
Iran has traditionally been an extremist nation at the center of numerous international controversies, which indirectly bars American corporations from conducting business in this hostile territory. Russia, on the other hand, which is seen as a foe of the US, stands to gain from these sour relations.
As of now, .com and .com.tr domains owned by Yandex are banned in Iran, but with the MoU signed, that will change soon. There is another interesting point to be observed in this news piece:
Looking at Yandex.ir, an official reportedly working for IRIB purchased the website, according to a domain registration search. DomainTools, a portal that lists the owners of websites, says Mohammad Taqi Mozouni registered the domain address back in July.
Technically, and by internationally accepted practice, no individual or organization can own, without the necessary permissions, a domain name under any extension that belongs to a company that has already carved out a niche for itself online. It is thus worth pondering what prompted a Russian search engine giant to let a foreign governmental agency acquire its domain name.
Vishal Ingole, November 28, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Android Has No Competition in Mobile OS Market
November 23, 2016
Google’s Android OS currently powers about 88 percent of the smartphones in the world, leaving a minuscule 12.1 percent to Apple’s iOS and the remaining 0.3 percent to Windows Mobile, BlackBerry OS and Tizen.
IBTimes, in an article titled “Android Rules! 9 out of Every 10 Phones Run Google’s OS,” says:
Google’s Android OS dominated the world by powering 88 percent of the world’s smartphone market in the third quarter of 2016. This means 9 out of every 10 mobile phones in the world are using Android, while the rest rely on iOS or other mobile OS such as BlackBerry OS, Tizen and Windows Phone.
The growth occurred despite the fact that smartphone shipments are falling. China and Africa, which were big markets, have been performing poorly for the last three quarters. Android’s gain can thus be attributed to the fact that Android is an open source system that can be used by any device manufacturer.
Despite being the clear leader, the mobile OS is full of bugs and other inherent problems, as the article points out:
Android platform is getting overcrowded with hundreds of manufacturers, few Android device vendors make profits, and Google’s new Pixel range is attacking its own hardware partners that made Android popular in the first place.
At present, Samsung, Huawei, Oppo and Vivo are the leading Android phone makers. However, Google recently unveiled Pixel, its flagship phone for the premium category. Does it mean that Google has its eyes set on the premium handset market? Only time will tell.
Vishal Ingole, November 23, 2016
Code.gov: Missing Some Stuff
November 9, 2016
I love US government Web sites. They come and then they fade. The new kid on the block is Code.gov. The idea is that the US government has created a portal for open source software. Here are the entities whose code is available to anyone able to navigate to this link.
- Agriculture
- Commerce
- EPA
- Energy
- Executive Office of the President
- GSA (home of 18F)
- Labor
- NASA
- National Archives and Records Administration
- OPM (yep, the security conscious folks)
- Treasury
- Veterans Affairs (what are you looking at, cupcake?)
I did notice some interesting gaps; for example, does the Department of Defense have open source software? Well, maybe not. We do include a pointer to more than 100 useful programs in the forthcoming Dark Web Notebook. Want to reserve a copy? Write benkent2020 at yahoo dot com and we’ll put your name on the list.
Stephen E Arnold, November 9, 2016
Solr: The Prestigious Bossie Winner
September 30, 2016
Beyond Search learned that open source search and retrieval solution Solr won a Bossie Award. The outfit involved in the awards said that Solr was “a trusted and mature search engine technology.” Big outfits using Solr include Zappos, Comcast, and DuckDuckGo.
Also bringing home an award was Lucene-based Elasticsearch. The description of Elasticsearch pointed out:
As part of the ELK stack (Elasticsearch, Logstash, and Kibana, all developed by Elasticsearch’s creators, Elastic), Elasticsearch has found its killer app as an open source Splunk replacement for log analysis.
Users of Lucene include Microsoft and LinkedIn. (What’s the problem with SharePoint Search? What prevents Microsoft from using Fast Search & Transfer technology in lieu of open source search?)
Why are Solr and Lucene the go-to search utilities? Free? Actual bug fixes and not excuses? No licensing leg shackles? Did I mention free?
Stephen E Arnold, September 30, 2016
The Uncertain Fate of OpenOffice
September 27, 2016
We are in danger of losing a popular open-source alternative to the Microsoft Office suite, we learn from the piece, “Lack of Volunteer Contributors Could Mean the End for OpenOffice” at Neowin. Could this be the fate of open source search, as well?
Writer William Burrows observes that few updates for OpenOffice have emerged of late, only three since 2013, and the last stable point revision was released about a year ago. More strikingly, it took a month to patch a major security flaw over the summer, reports Burrows. He goes on to summarize OpenOffice’s 14-year history, culminating in the project’s donation to Apache by Oracle in 2011. It appears to have been downhill from there. The article tells us:
It was at this point that a good portion of the volunteer developer base reportedly moved onto the forked LibreOffice project. Since becoming Apache OpenOffice, activity on project has diminished significantly. In a statement by Dennis Hamilton, the project’s volunteer vice president, released in an email to the mailing list it was suggested that “retirement of the project is a serious possibility” citing concerns that the current team of around six volunteer developers who maintain the project may not have sufficient resources to eliminate security vulnerabilities. There is still some hope for OpenOffice, though, with some of the contributors suggesting that discussion about a shutdown may be a little premature, and that attracting new contributors is still possible.
In fact, OpenOffice was downloaded over 29 million times last year, so obviously it still has a following. LibreOffice is currently considered more successful, but that could change if OpenOffice manages to attract a resurgence of developers willing to contribute to the project. Any volunteers?
Cynthia Murrell, September 27, 2016
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/