Natural Language Takes Lessons from Famous Authors

April 18, 2016

What better way to train a natural language AI than to bring venerated human authors into the equation? Wired reports, “Google Wants to Predict the Next Sentences of Dead Authors.” Not surprisingly, Google researchers are tapping into Project Gutenberg for their source material. Writer Matt Burgess relates:

“The network is given millions of lines from a ‘jumble’ of authors and then works out the style of individual writers. Pairs of lines were given to the system, which made a simple ‘yes’ or ‘no’ decision as to whether they matched up. Initially the system didn’t know the identity of any authors, but still only got things wrong 17 percent of the time. By giving the network an indication of who the authors were, giving it another factor to compare work against, the computer scientists reduced the error rate to 12.3 percent. This was also improved by adding a fixed number of previous sentences to give the network more context.”
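
The setup described reads like a sentence-pair classifier with an optional author-identity signal. Here is a minimal sketch of what such a network could look like; this is my own illustration with an assumed bag-of-embeddings encoder and invented layer sizes, not Google’s actual architecture:

```python
# Hypothetical sketch of the pair-matching task described above: given two
# lines, predict whether they come from the same author, optionally
# conditioned on an author-identity embedding. Names and sizes are assumptions.
import torch
import torch.nn as nn

class LinePairMatcher(nn.Module):
    def __init__(self, vocab_size, num_authors, dim=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.author_emb = nn.Embedding(num_authors + 1, dim)  # index 0 = author unknown
        self.classifier = nn.Sequential(
            nn.Linear(dim * 3, dim), nn.ReLU(), nn.Linear(dim, 1))

    def encode(self, token_ids):
        # Bag-of-embeddings encoder; a real system would likely use a recurrent net.
        return self.word_emb(token_ids).mean(dim=1)

    def forward(self, line_a, line_b, author_id):
        feats = torch.cat([self.encode(line_a), self.encode(line_b),
                           self.author_emb(author_id)], dim=-1)
        return torch.sigmoid(self.classifier(feats)).squeeze(-1)  # P(same author)

# Toy usage: a batch of two line pairs, each padded to five token ids.
model = LinePairMatcher(vocab_size=1000, num_authors=50)
line_a = torch.randint(0, 1000, (2, 5))
line_b = torch.randint(0, 1000, (2, 5))
authors = torch.tensor([0, 7])  # 0 = identity withheld, as in the first experiment
print(model(line_a, line_b, authors))  # two match probabilities in [0, 1]
```

Withholding the author id (index 0) versus supplying it mirrors the quote’s 17 percent versus 12.3 percent error comparison.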

The researchers carry their logic further. As the Wired title says, they have their AI predict an author’s next sentence; we’re eager to learn what Proust would have said next. They also have the software draw conclusions about authors’ personalities. For example, we’re told:

“Google admitted its predictions weren’t necessarily ‘particularly accurate,’ but said its AI had identified William Shakespeare as a private person and Mark Twain as an outgoing person. When asked ‘Who is your favourite author?’ and [given] the options ‘Mark Twain’, ‘William Shakespeare’, ‘myself’, and ‘nobody’, the Twain model responded with ‘Mark Twain’ and the Shakespeare model responded with ‘William Shakespeare’. Asked who would answer the phone, the AI Shakespeare hoped someone else would answer, while Twain would try and get there first.”

I can just see Twain jumping over Shakespeare to answer the phone. The article notes that Facebook is also using the work of human authors to teach its AI, though that company elected to use children’s classics like The Jungle Book, A Christmas Carol, and Alice in Wonderland. Will we eventually see a sequel to Through the Looking Glass?

Cynthia Murrell, April 18, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Watson Weekly: IBM Watson Hooks Up with Hana

April 17, 2016

Is this a Tinder tech date or what? I read “IBM to Bring Watson’s Cognitive Capabilities to SAP Customers.” IBM’s strategy for cognitive, it seems, is to either partner with or acquire every possible technology company. The write up reports:

IBM Corp. is turning to its partners for help with widening the adoption of Watson in the enterprise. As part of the effort, the company this morning announced an alliance with SAP SE that will see the capabilities of the cognitive computing platform made available for users of the latter’s flagship S/4HANA business software suite.

Has anyone asked an SAP customer if he or she needs Watson?

I learned:

If the current feature set is anything to go by, then SAP and IBM are probably looking to deliver something akin to what mutual rival Microsoft Corp. offers with its Cortana Intelligence Suite. The bundle combines the virtual assistant with a number of Redmond’s cloud-based analytics services to make complex operational information accessible to everyday knowledge workers. Big Blue’s announcement specifies that the Watson integration will similarly target a “broad range of business users and … all C-suite professions.”

I wonder if SAP customers using Microsoft technology will eagerly embrace Watson.

Keep the PR machine, if not the revenues, flowing.

Stephen E Arnold, April 17, 2016

Watson Impresses a Stakeholder

April 16, 2016

I read “IBM Shows Me What Watson Can Do.” One of the points I noted about the write up was that it was written by a person who sort of thought Watson was a “computer language.” I think of Watson as open source software, acquired technology, and home brew code.

I noted this statement:

The folks at IBM ran me through a couple of examples of what Watson does. Some were more impressive than others, but one example stuck in my mind because of the language component. The company wouldn’t reveal its partner’s name, but an insurance company is using Watson to help increase online sales. According to IBM that customer has seen a high single-digit uptick in online sales because of Watson.

I love rock solid case examples.

I noted this statement:

But, like a human, Watson doesn’t always come up with the right answers at first. Watson makes mistakes while it’s learning. It understands things in the wrong way and pulls the wrong answers out of the information it has at its disposal. The team working with Watson then corrects it and tries again with another question. The time this takes depends on a lot of different variables, of course, but one customer took a year to train Watson.

How do I know that this write up may not reflect the sentiments of an objective, “real journalist”? Here’s the disclaimer:

I wrote this article myself, and it expresses my own opinions. I am not receiving compensation for it (other than from Seeking Alpha). I have no business relationship with any company whose stock is mentioned in this article.

Stakeholders, how did you like the write up? More important: Watson, how do you feel about the write up?

Stephen E Arnold, April 16, 2016

Interface Design: An Argument for the IBM i2 Approach

April 15, 2016

I read “Why I Love Ugly, Messy Interfaces — and You Probably Do Too.” I have been checking out information about interfaces for augmented intelligence or what I call “cyber OSINT.” The idea I am exploring is how different vendors present information functions to people who are working under pressure. Now the pressure in which I am interested involves law enforcement, intelligence, and staying alive. I am not too worried about how to check the weather on a mobile phone.

The write up points out that

…there is no single right way to do things. There’s no reason to assume that having a lot of links or text on a page, or a dense UI, or a sparse aesthetic is fundamentally bad — those might be fine choices for the problem at hand. Especially if it’s a big, hairy problem. Products that solve big, hairy problems are life savers. I love using these products because they work so damn well. Sure they’re kind of a sprawling mess. That’s exactly why they work!

Consider the IBM i2 Analyst’s Notebook interface. Here’s an example, courtesy of Google Images:

[image: IBM i2 Analyst’s Notebook interface]

The interface has a menu bar across the top, display panels, and sidebar options. To use this application, which is called Analyst’s Notebook, one attends classes. Years ago I did a little work for i2 before it became part of IBM. Without regular use of the application, I would forget how to perform certain tasks.

There is a competitor to i2’s Analyst’s Notebook: Palantir Gotham. Again, courtesy of Google Images, here’s an example of the Palantir Gotham interface:

[image: Palantir Gotham interface]

The interface includes options in the form of a title bar with icons, a sidebar, and some right click features which display a circular context menu.

At first glance, the principal difference between the two interfaces boils down to color.

There are, however, some significant differences, and these include:

  • Palantir provides more helper and wizard functions. These allow a user to perform many tasks without sitting through five or more days of classroom and hands-on instruction.
  • The colors and presentation are more stylish, not exactly a mobile phone app approach but slicker than the Analyst’s Notebook design.
  • The interface automates more functions. Both applications require the user to perform some darned tedious work. But once that work is completed, Gotham allows software to perform some tasks with a mouse click.

My point is that interface choices and functionality have to work together. If the workflows are not assisted by the interface and smart software, simple or complex interfaces will be a barrier to quick, high value work.

When someone is shooting at the person operating the laptop with either of these applications in use, the ability to complete a task without confusion is paramount. Confusing pretty with staying alive is not particularly helpful.

Stephen E Arnold, April 15, 2016

Talk to Text: Problem. What Problem?

April 15, 2016

I marvel at the baloney I read about smart software. The most effective systems blend humans with sort of smart software. The interaction of the human with the artificial intelligence can speed some work processes. But right now, I am not sure that I want a smart software driven automobile to navigate near the bus on which I am riding. I don’t need smart automobile keys which don’t work when the temperature drops, do you? I am not keen on reading about the wonders of IBM Watson type systems when IBM struggles to generate revenue.

I read “Why Our Crazy-Smart AI Still Sucks at Transcribing Speech.” Frankly, I was surprised by the candor about the difficulty software has in figuring out human speech. I highlighted this passage:

“If you have people transcribe conversational speech over the telephone, the error rate is around 4 percent,” says Xuedong Huang, a senior scientist at Microsoft, whose Project Oxford has provided a public API for budding voice recognition entrepreneurs to play with. “If you put all the systems together—IBM and Google and Microsoft and all the best combined—amazingly the error rate will be around 8 percent.” Huang also estimates commercially available systems are probably closer to 12 percent. “This is not as good as humans,” Huang admits, “but it’s the best the speech community can do. It’s about twice as bad as humans.”
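
For context, the percentages quoted here are word error rates: the word-level edit distance between a reference transcript and the system’s hypothesis, divided by the length of the reference. Here is a minimal sketch of the standard computation; it is my own illustration of the metric, not code from any vendor mentioned:

```python
# Word error rate (WER): edit distance over words, normalized by reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # substitute / delete / insert
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("call me at noon", "call me at new moon"))  # 0.5: one substitution, one insertion
```

By this measure, Huang’s numbers put humans around 0.04 and commercially available systems nearer 0.12.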

I suggest you read the article. My view is that speech recognition is just one area which requires more time, effort, research, and innovation.

The situation today is that as vendors struggle to prove their relevance and importance to investors, many companies are struggling to generate sustainable revenue. In case anyone has not noticed, Microsoft’s smart system Tay was a source of humor and outrage. IBM Watson spends more on marketing the wonders of its Lucene, acquired technology, and home brew confection than many companies earn in a year.

There are folks who insist that speech to text is not that hard. It may not be hard, but this one tiny niche in the search and content processing sector seems to be lagging. Hyperbole, assurance, and marketing depict one reality. The software often delivers a different one.

Who is the leader? The write up points out:

…most transcription start-ups seem to be mainly licensing Google’s API and going from there.

Yep, the Alphabet Google thing.

Stephen E Arnold, April 15, 2016

First Surface Web Map of the Dark Web

April 15, 2016

Interested in a glimpse of the Dark Web without downloading Tor and navigating it yourself? E-Forensics Magazine published “Peeling Back the Onion Part 1: Mapping the Dark Web” by Stuart Peck, which shares an overview of services and content on this anonymity-oriented internet. A new map covering the contents of the Dark Web, the first to do so, was launched recently by Intelliagg, a threat intelligence service and key ZeroDayLab partner. The write-up explains,

“But this brings me to my previous point: why is this map so important? Until recently, it had been difficult to understand the relationships between hidden services, and more importantly the classification of these sites. As a security researcher, understanding hidden services, such as private chat forums and closed sites, and how these are used to plan and discuss potential campaigns, such as DDoS, Ransom Attacks, Kidnapping, Hacking, and Trading of Vulnerabilities and leaked data, is key to protecting our clients through proactive threat intelligence.”

Understanding the layout of an online ecosystem is an important first step for researchers and related business ventures. But what about a visualization showing how these web services connect to functions, such as financial and other services, and to brick-and-mortar establishments? It is also important to note that while this may be the first Surface Web map of the Dark Web, navigational “maps” on .onion sites have existed for as long as users have been browsing on Tor.

Megan Feil, April 15, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Microsoft Azure Plans Offer Goldilocks and Three Bears Strategy to Find Perfect Fit

April 15, 2016

The article on eWeek titled “Microsoft Debuts Azure Basic Search Tier” relates the perks of the new plan from Microsoft, namely, that it is cheaper than the others. At $75 per month (and currently half off for the preview period, so get it while it’s hot!) the Basic Azure plan has lower capacity when it comes to indexing, but that is the intention. The completely Free plan enables indexing of 10,000 documents and allows for 50 megabytes of storage, while the new Basic plan goes up to one million documents. The more expensive Standard plan costs $250 per month and provides for up to 180 million documents and 300 gigabytes of storage. The article explains,

“The new Basic tier is Microsoft’s response to customer demand for a more modest alternative to the Standard plans, said Liam Cavanagh, principal program manager of Microsoft Azure Search, in a March 2 announcement. “Basic is great for cases where you need the production-class characteristics of Standard but have lower capacity requirements,” he stated. Those production-class capabilities include dedicated partitions and service workloads (replicas), along with resource isolation and service-level agreement (SLA) guarantees, which are not offered in the Free tier.”

So just how efficient is Azure? Cavanagh stated that his team measured indexing performance at 15,000 documents per minute (although he also stressed that this was with batches organized into groups of 1,000 documents). With this new plan, Microsoft continues to build out its cloud search capabilities.
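
The 1,000-document batching Cavanagh mentions maps to how Azure Search’s document indexing endpoint accepts uploads. Below is a minimal sketch of that batching pattern; the service name, index name, API key, document schema, and api-version string are all placeholder assumptions, not details from the article:

```python
# Hypothetical sketch of batched uploads to Azure Search's indexing endpoint.
# Service/index names, key, schema, and api-version are placeholders.
import requests

SERVICE = "my-service"   # placeholder search service name
INDEX = "my-index"       # placeholder index name
API_KEY = "..."          # admin api-key (elided)
URL = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}"
       f"/docs/index?api-version=2015-02-28")

def index_in_batches(docs, batch_size=1000):
    """POST documents in groups of up to 1,000 per request."""
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        actions = [{"@search.action": "upload", **doc} for doc in batch]
        resp = requests.post(URL, json={"value": actions},
                             headers={"api-key": API_KEY})
        resp.raise_for_status()  # fail loudly on a bad batch

# Example: 2,500 tiny documents become three requests (1,000 + 1,000 + 500).
index_in_batches([{"id": str(i), "body": f"document {i}"} for i in range(2500)])
```

At the quoted 15,000 documents per minute, each 1,000-document batch would clear in roughly four seconds.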

Chelsea Kerwin, April 15, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Free Book? Semantic Mining of Social Networks

April 14, 2016

I saw a reference to a 2015 book, Semantic Mining of Social Networks by Jie Tang and Juanzi Li. This volume consists of essays about things semantic. The book is published by Morgan & Claypool, but the link I clicked returned neither a bibliographic citation nor a review. The link displayed the book itself, which appeared to be downloadable. If your engines are revved with the notion of semantic analysis, you may want to explore the volume yourself. I advocate purchasing monographs. Here’s the link I followed. Keep in mind that if the link 404s you, the fault is not mine.

Stephen E Arnold, April 14, 2016

eBay and Facebook: Different Spins in Online Sales

April 14, 2016

I noted two seemingly unrelated items about two different companies:

  1. Russian Diplomat: ISIS Making $200 Million Selling Stolen Artifacts on eBay
  2. Weapons for Sale on Facebook in Libya

In our work on the “Dark Web Notebook,” we have examined a number of sites which purport to offer contraband or prohibited products. These sites have been accessible using special software.

What is interesting is that the difference between the Dark Web and the “regular” Web seems to be blurring.

If these two stories are accurate, questions about governance by the owners of the Web sites may be raised. Since we began working on this new study of online content, we have noted that the boundary separating the Web which billions use from the Web tailored to a smaller set of online users is growing more difficult to discern.

In itself, the boundary’s change is interesting.

Stephen E Arnold, April 14, 2016

The Force of the Dark Web May Not Need Sides

April 14, 2016

The name “Dark Web” has sensational language written all over it. Such a label calls for myth-busting articles, such as the recent one from Infosecurity Magazine, “The Dark Web — Is It All Bad?” This piece highlights the opinions of James Chappell, CTO and co-founder of Digital Shadows, who argues that the way the Dark Web is portrayed in the media pigeonholes sites accessible via Tor as serving criminal purposes. Chappell is quoted,

“Looking at some of the press coverage you could be forgiven for thinking that the Dark Web is solely about criminality,” he told Infosecurity. “In reality, this is not the case and there are many legitimate uses alongside the criminal content that can be found on these services. Significantly – criminality is an internet-wide problem, rather than exclusively a problem limited to just the technologies that are labelled with the Dark Web.”

The author’s allusion to Star Wars’ divided force, between supposed “good” and “bad,” seems an appropriate analogy to the two sides of the internet. However, with a slightly more nuanced perspective, could it not be argued that Jedi practices, like those of the Sith, are also questionable? Binaries may be our preferred cultural tropes, as well as the building blocks of computer software programming, but let’s not forget the elements of variability: humans and time.

Megan Feil, April 14, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
