On the Value of Customized Sentiment Analysis

August 26, 2014

One of the most-discussed business functions of natural language processing is sentiment analysis. Over at the SmartData Collective, Lexalytics’ Scott Van Boeyen tells us “Why Sentiment Analysis Engines Need Customization.” The short answer: slang. The write-up explains:

The problem with sentiment analysis is sometimes it’s wrong.[…]

“Oh man, that was nasty!” Is this sentence positive or negative? Surely, it must be negative. “Nasty” is a negative word, and everything else in this sentence is neutral. Final answer, negative! Drum roll…. Wrong! It’s positive.

The person who said this used the American slang definition of nasty, which has positive sentiment. There is absolutely no way to know by reading the sentence. So, if you (a human) were just tricked by reading this article, how is a machine supposed to figure it out? Answer: Tell the engine what’s positive and what’s negative.

High quality NLP engines will let you customize your sentiment analysis settings. “Nasty” is negative by default. If you’re processing slang where “nasty” is considered a positive term, you would access your engine’s sentiment customization function, and assign a positive score to the word.

The man has a point. Still, we are left with a few questions: How much more should one expect to pay for a customization feature? Also, how long does it take to teach an NLP platform comprehensive alternate vocabulary? How does one decide what slang to include—has anyone developed a list of suggestions? Perhaps one could start by consulting the Urban Dictionary.
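To make the excerpt's point concrete, here is a minimal sketch of a word-score lookup with a customization step. It is not Lexalytics' engine or API; the lexicon, the scores, and the function names are all assumptions invented for illustration.

```python
import re

# Hypothetical default lexicon: word -> sentiment score (negative < 0 < positive).
DEFAULT_LEXICON = {"nasty": -0.8, "great": 0.7, "awful": -0.9}


def score_sentence(text, overrides=None):
    """Sum per-word scores; custom overrides replace the default scores."""
    lexicon = {**DEFAULT_LEXICON, **(overrides or {})}
    words = re.findall(r"[a-z']+", text.lower())
    # Words missing from the lexicon are treated as neutral (0.0).
    return sum(lexicon.get(word, 0.0) for word in words)


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sentence = "Oh man, that was nasty!"
print(label(score_sentence(sentence)))                  # default lexicon
print(label(score_sentence(sentence, {"nasty": 0.8})))  # slang override
```

The override dictionary stands in for the "sentiment customization function" the excerpt mentions. A real engine would also have to handle negation, context, and phrases, which this sketch ignores.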

Cynthia Murrell, August 26, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

SharePoint Online Admin Center Simplifies

August 26, 2014

Microsoft is trying to simplify the Office 365 experience from all angles, but the latest SharePoint simplification focuses on the SharePoint Online admin center. Read all the details in the WinBeta article, “Simplified Office 365 Admin Experience Arrives for SharePoint Online Admin Center.”

The article begins:

“Microsoft has simplified the SharePoint Online admin center, as part of the company’s journey to simplify the Office 365 admin experience. You can now choose between a simple or advanced admin experience, control top navigation items, and block workflows from being used from your environment. The simple admin center experience displays only the essential options used in the most common scenarios. This includes site collection management, user profile management, and the main settings for external sharing, Information Rights Management, and more. The advanced admin center, on the other hand, offers access to all SharePoint Online management capabilities.”

The simplification is needed and welcome. However, using the admin center will still be far from intuitive. SharePoint is a huge infrastructure and tips, tricks, and workarounds are still greatly needed for most users and administrators. Stephen E. Arnold has dedicated his career to all things search, and reports his findings on ArnoldIT.com. His SharePoint feed is a great resource for those who are learning more about all angles of SharePoint.

Emily Rae Aldridge, August 26, 2014

Hackers Leverage Elasticsearch Flaw in the Cloud

August 25, 2014

Just as Elasticsearch is reveling in its recent successes, CloudPro informs us that “Hackers Target Elasticsearch to Set Up DDoS Botnet on AWS.” Writer Rene Millman reports that cloud providers besides Amazon Web Services could be affected by the attacks, which leverage a vulnerability in the older Elasticsearch 1.1 versions. Because of its ability to run on multiple nodes, Elasticsearch’s open source, Java-based full-text-search application is a popular choice for use with cloud environments. The article describes the vulnerability hackers are now exploiting:

“Researchers at Kaspersky Labs have found that cybercriminals have exploited a flaw in the software to install DDoS malware on various clouds. The flaw was found in Elasticsearch v. 1.1x and a scripting exploit. The software has default support for active scripting, but does not use authentication and also does not sandbox the script code. Criminals can use the flaw to hack into EC2 VMs and then use a use a new variant of Linux DDoS Trojan Mayday – Backdoor.Linux.Mayday.g – to launch their attack, according to Kaspersky Lab principal security researcher Kurt Baumgartner.”

Millman goes on to quote a blog post by Kurt Baumgartner, principal security researcher at Kaspersky Lab. Baumgartner states:

“The [Mayday variants] in use on compromised EC2 instances oddly enough were flooding sites with UDP traffic only. The flow is strong enough that the DDoS’d victims were forced to move from their normal hosting operations IP addresses to those of an anti-DDoS solution.

“The flow is also strong enough that Amazon is now notifying their customers, probably because of potential for unexpected accumulation of excessive resource charges for their customers. The situation is probably similar at other cloud providers.”

Unsurprisingly, the goal of these attacks seems to be financial. Baumgartner notes that among those affected by these attacks are a large regional U.S. bank, a large electronics maker, and a Japanese service provider. For its part, Amazon is urging users to upgrade as soon as possible to the latest version of Elasticsearch, which is free from this vulnerability.

Cynthia Murrell, August 25, 2014

Questioning How To Search New Sound Files

August 25, 2014

Sound is an underrated science, but it is a remarkable topic to study. MIT News reports an impressive experiment: “Extracting Audio From Visual Information.” The article explains that Adobe, Microsoft, and MIT researchers developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. The team has been able to recover audible sound from the leaves of a potted plant, the surface of a glass of water, aluminum foil, and a potato-chip bag.

The recovered sound could be used by law enforcement organizations, but MIT graduate student Abe Davis says the technique creates a “new kind of imaging.”

“ ‘We’re recovering sounds from objects,’ [Davis] says. ‘That gives us a lot of information about the sound that’s going on around the object, but it also gives us a lot of information about the object itself, because different objects are going to respond to sound in different ways.’”

The team speculates that the technology community will embrace the research and amazing applications will be developed from it. The new sound technology will also create a new slew of content. How will we search the new content? A specific and exact ontology will be needed to distinguish sound files. Will a search application smart enough to read the sound data be developed to identify the user’s information need? Oh wait, enterprise search systems index “all information” so it already exists.

Whitney Grace, August 25, 2014

Open Source Software Costs: The Wal-Mart View

August 24, 2014

Which is more economical? Proprietary software or open source software? Which approach delivers greater “value”? In Wal-Mart’s tussle with Amazon, will it deliver a better online experience for shopping, search, and logistics? I ask because the Wal-Mart closest to Harrod’s Creek has fewer products, dimmer lighting, and restocking challenges in my experience.

Some information that may help answer these questions appeared in “Wal-Mart’s Investment in Open Source Isn’t Cheap.” Note that this publication is owned by IDG / IDC, the mid tier consulting firm that sold my content on Amazon without my permission. Some details are at this link.

This write up explains that open source software is more than a price:

Wal-Mart has put in place a set of metrics to estimate the return on investment. Hammer explains “every five startups using Hapi translated to the value of one full-time developer, while every 10 large companies translated to one full-time senior developer.” In return for its extra work on open development, Wal-Mart gets high-quality programming at a cost far below that of recruiting and retaining extra staff. In turn, this demonstrable return allows the company to justify further development investment because “by paying developers to work on Hapi full time, we get back twice (or more) that much in engineering value.”
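The adoption metric in that passage is simple arithmetic, and a short sketch shows how it scales. Only the two ratios (five startups per developer, ten large companies per senior developer) come from the quoted passage; the function name and the adopter counts in the usage line are hypothetical.

```python
from fractions import Fraction

# Ratios quoted in the passage above.
STARTUPS_PER_DEVELOPER = 5
LARGE_COMPANIES_PER_SENIOR_DEVELOPER = 10


def developer_equivalents(startups, large_companies):
    """Return (full-time developers, full-time senior developers) implied by adoption."""
    developers = Fraction(startups, STARTUPS_PER_DEVELOPER)
    senior_developers = Fraction(large_companies, LARGE_COMPANIES_PER_SENIOR_DEVELOPER)
    return developers, senior_developers


# Hypothetical adopter counts, chosen only to show the calculation.
devs, seniors = developer_equivalents(startups=20, large_companies=30)
print(f"{devs} developer equivalents, {seniors} senior developer equivalents")
```

The sketch says nothing about whether the ratios themselves are sound. It only shows how the quoted rule of thumb turns adopter counts into headcount equivalents.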

Wal-Mart, however, is a place that sells stuff at what looks like low prices. There are some legal arabesques related to Wal-Mart’s parsimonious streak.

Three questions:

  • Is Wal-Mart looking for ways to obtain maximum freedom from traditional vendors, not just value or cost savings? Freedom can translate to handling software the Wal-Mart way.
  • Will developers find themselves subject to the same cost parameters that Wal-Mart has honed to deliver its competitive prices?
  • How will Wal-Mart adapt when an open source project loses its community?

With Amazon looking more and more proprietary, Wal-Mart seems to be heading in the opposite direction. Will Wal-Mart out-Amazon Amazon, or will Wal-Mart become more like Amazon?

The search experience for both Amazon and Wal-Mart online is often frustrating. Perhaps in a few months one of these discounters will crack their information retrieval nuts.

For those looking for information about the cost of open source, the Wal-Mart approach is worth tucking into one’s card file.

Stephen E Arnold, August 24, 2014

Forrester and Physical Storage: HP Autonomy May Like This Mid Tier Prognostication

August 23, 2014

I recommend reading “Forrester Says It’s Time to Give Up on Physical Storage Arrays.” The position of the mid tier consulting firm is clear: Local storage bad, cloud storage good. What’s missing is nuance. The comments point out a couple of issues with this Promethean assertion; for example:

  • The time has therefore come to recognize that arrays are expensive and inflexible, Baltazar says, and make the jump to virtual arrays for future storage purchases. Fancy words for outsource and off site.—from Ole Juul
  • Until workmen outside cut through your comms cable …… It can and does happen (Power cable for one company I worked for, water mains for another). We hear all about the resilience built up at the other end to near guaranty your data, but there’s always single points of failure much closer to home.—from Dappman
  • Data needs to be local. How can you move 1000TB of data around? The storage needs to be local to where it’s being used. Increasingly, the data is coming in from the cloud. What happens in the cloud stays in the cloud(R).—from Anonymous Coward

But for me the article tips Forrester’s hand with regard to HP Autonomy. HP is reporting record revenues from sales of PCs. HP is emphasizing the value of HP Autonomy IDOL as an enterprise app. Against this background, I noted this passage in the source article:

Forrester knows this, too: one of its analysts, Henry Baltazar, just declared you should “make your next storage array an app”.

I look forward to HP’s picking up on this “expert” opinion and giving the hobby horse a whack. Content marketing? Yep yep.

Stephen E Arnold, August 23, 2014

Will Apps Built on IDOL Gin Cash?

August 23, 2014

I don’t know if the data in “Most Smartphone Users Download Zero Apps per Month” are accurate. The majority (65%) of smartphone users download zero apps per month. I suppose the encouraging point in the write up is that 35% of smartphone users download at least one app per month. The Hewlett Packard IDOL app can be a slam dunk when HP unleashes IDOL enterprise apps. If HP converts just one percent of the 35 percent, millions will flow to the printer ink and personal computer company. At least, that’s one way to interpret the data the MBA way. Plug those numbers into Excel, fatten up the assumptions, and the money is in the virtual bank. At least that’s one way to leverage spreadsheet fever into a corporate initiative for Big Data IDOL enterprise apps.
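For those who want to skip Excel, the spreadsheet arithmetic fits in a few lines. The 35 percent and one percent figures come from the paragraph above; the size of the smartphone user base in the usage line is a made-up assumption, as are the function names.

```python
from fractions import Fraction

DOWNLOADER_SHARE = Fraction(35, 100)  # users who download apps in a given month
CONVERSION_RATE = Fraction(1, 100)    # the "just one percent" HP would convert


def converted_share():
    """Share of all smartphone users who would end up with the app."""
    return DOWNLOADER_SHARE * CONVERSION_RATE


def projected_buyers(smartphone_users):
    """Whole number of converted users for a given user base."""
    return int(smartphone_users * converted_share())


# Hypothetical user base, used only to show how the assumption scales.
print(float(converted_share()))
print(projected_buyers(1_000_000))
```

One percent of 35 percent works out to 0.35 percent of all smartphone users, so any "millions" depend entirely on the size of the user base and the price plugged in.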

Stephen E Arnold, August 23, 2014

Google Android: Fragmentation? No Fragmentation.

August 22, 2014

I read “There Are 18,796 Distinct Android Devices, According to OpenSignal’s Latest Fragmentation Report.” I noted this factoid in the write up:

18,796 separate Android devices

Several years ago, one of the interchangeable Google mobile engineers emphasized that there was minimal Android fragmentation.

One aspect of this issue is the emergence of open source Android. Has Google lost control of Android and the opportunity to extract high end device revenue in its quest for ads?

At least one Chinese phone outfit is working the angle “Show me the money.” With many distinct Android devices and folks going their own way like Amazon and Samsung, Google does not have a fragmentation problem. Google has competition, confusion, and cash challenges breeding and cross breeding.

I know the Google response: “Trivial.” If Google believes this, will meta-tactics grind the challengers to disconnected ones and zeros?

Stephen E Arnold, August 22, 2014

Google Barge Scheme Abandoned Without Comment

August 22, 2014

The Portland Press Herald article titled “Scrap the Mystery: High-tech Vision for Google barge Crumbles in a Heap” reports that the mysterious barge that landed in Portland’s harbor on October 10, 2013, has been relegated to the scrap pile. The barge was believed to be intended as an elite showroom for Google’s latest innovations, such as Google Glass. The remaining question: why abandon the project? Google did not comment, but the article states:

“After some digging by reporters on both coasts, Google admitted that it had commissioned the barges to serve as “an interactive space where people can learn about new technology.” When finished, the barge in Portland was to be towed to New York City and opened for an invitation-only crowd of hip and affluent urbanites. Never mind… the structure…was being prepared to leave Portland for an ocean voyage to an undisclosed location….The containers, though, will be disassembled at Turner’s Island and scrapped”

This was a major disappointment for Portland, a disappointment soothed by the half a million dollars in property taxes accumulated on the barge while it sat in the harbor. That money, along with the cost of assembling the containers now headed for use as scrap metal, has many interested parties scratching their heads. Is this a metaphor for the future of Google’s moon shots? A second barge still sitting in San Francisco’s bay might answer that question.

Chelsea Kerwin, August 22, 2014

Where to Find Digitized Medieval Manuscripts Online

August 22, 2014

The article titled “Consulting Medieval Manuscripts Online” from The University of Tennessee at Martin offers links to some 13,000 digitized manuscripts. It is an amazing thing to consider that these manuscripts, having survived thus far, will now be safe for posterity. Unfortunately, search across the collections is not available, but the manuscripts are organized as follows:

“Below, I have included a block of links leading to collections containing fully digitized medieval manuscripts (over 13,000), one for digitized individual manuscripts, and one devoted to projects choosing to digitize selected pages for things like illustrations, examples musical notation, etc. This page is part of the Andy Holt Virtual Library’s section on medieval source-based textual scholarship. You may also wish to consult some of the incunabula readable on line.”

The article gives credit to UCLA’s web database, the Catalogue of Digitized Medieval Manuscripts, for its collection of over 3,000 works. That collection is also easier to navigate due to search capabilities based on title, location, author, and perhaps most importantly, language. You may notice on that site the note that warns the collection is not comprehensive, along with a mention that UCLA’s “active” work on the collection is on hiatus. Hopefully that will not be the case for UTM’s collections, which already dwarf UCLA’s.

Chelsea Kerwin, August 22, 2014
