Lexalytics Offers Tunable Text Mining

May 13, 2015

Want to do text mining without some of the technical hassles? If so, you will want to read about Lexalytics, which offers “the industry’s most tunable and configurable text mining technology.” Navigate to “Lexalytics Unveils Industry’s First Wizard for Text Mining and Sentiment Analysis.” I learned that text mining can be “fun, easy, and intuitive.” I highlighted this quote from the news story as an indication that one does not need to understand exactly what’s going on in the text mining process:

“Before, our customers had to understand the meaning of things like ‘alpha-numeric content threshold’ and ‘entities confidence threshold,’” Jeff continued. “Lexalytics provides the most knobs to turn to get the results exactly as you want them, and now our customers don’t even have to think about them.”

Text mining, the old-fashioned way, required an understanding of the task, knowledge of which procedures were appropriate, and the ability to edit or write scripts. There are other skills that used to be required as the entry fee to text mining. The modern world of interfaces allows anyone to text mine. Do users understand the outputs? Sure. Perfectly.

As I read the story, I recalled a statement in “A Review of Three Natural Language Processors, AlchemyAPI, OpenCalais, and Semantria.” Here is the quote I noted in that July 2014 write up by Marc Clifton:

I find the concept of Natural Language Processing intriguing and that it holds many possibilities for helping to filter and analyze the vast and growing amount of information out there on the web.  However, I’m not quite sure exactly how one uses the output of an NLP service in a productive way that goes beyond simple keyword matching.  Some people will of course be interested in whether the sentiment is positive or negative, and I think the idea of extracting concepts (AlchemyAPI) and topics (Semantria) are useful in extracting higher level abstractions regarding a document.  NLP is therefore an interesting field of study and I believe that the people who provide NLP services would benefit from the feedback of users to increase the value of their service.

Perhaps the feedback was, “Make this stuff easy to do.” Now the challenge is to impart an understanding of what a text mining system outputs. That might be a bit more difficult.

Stephen E Arnold, May 13, 2015

MBAs Gone Wild: Is There a Video?

May 13, 2015

I find Harvard fascinating. As a poor student in a small, ignored, also-ran university, I bumped into the fancy university folks at debate tournaments. Our record against many fine institutions, such as Dartmouth and the other Big Dogs, was okay.

We won. Lots.

Then there were the Big Dog MBAs at Booz, Allen & Hamilton. Sigh. Wall Street forced more of these “special ones” onto my radar screen. Delightful. There was an MBAism for every issue. Maybe not a correction or appropriate solution, but there was jargon, generalizations, and entitlement thinking. In meeting after meeting, I reviewed nifty, but often meaningless or incoherent, graphs or diagrams. I did not count this work as “quality time.” Today, as the beloved and now departed Yogi Berra said, “It’s déjà vu all over again.”

I read “Where the Digital Economy Is Moving the Fastest” and circled a remarkable diagram. True, the write up is not about search, but the spirit of search and content processing system vendors tinted my perception, using only pleasing and compatible calming colors.

The authors developed criteria for countries which are moving fast. Not product sales or market share, countries. The world but for nation states like Yemen and its ilk. Then, like good MBAs, the authors crafted a matrix and plotted the fast movers, the losers, the ones to watch, and the maybes. Here’s the graphic:

image

There are “Watch Out” countries. I admit I interpreted this label in a manner different from the article’s sense of the phrase. I checked out the countries between Stall Out (dogs) and Stand Out (invest for sure maybe?). Poor Sweden, Britain, and Germany. Look at the countries on the move. Check out the tweeners: Brazil, Turkey, and the Russian Federation.

Now this map, the graphic, and the meaning of the “data” strike me as less than useful.

The diagram reminded me of other consulting firms’ matrices. I snagged this at random from Google.

image

What about this version from

Which makes more sense? The diagram from the Harvard Business Review, the diagram with lots of dots, or the dead simple diagram from Boston Consulting Group?

From my point of view, the BCG approach makes the most sense. BCG analyzed market share data, actual numbers. The diagram presents visual cues which relate directly to the numerical data. The viewer of the diagram does not have to wonder why a specific company is in one box or why a specific country is a “break out.”

Whether they analyze countries or companies, diagrams built on methods which do not tie directly to numbers raise more questions than they answer. When the BCG matrix became available in the 1970s, the reaction from envious consultants at McKinsey and Booz, Allen & Hamilton was, “Wow, what a great diagram!”

What I believe is that the value of the envy-generating BCG matrix lies in the data to which the quadrants were tied, because the focus was a specific product and its market share. If the share was increasing and lights were flashing green, then the product was a star.
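For readers who want to see how directly the quadrants follow from numbers, here is a minimal sketch of a growth-share classification. It is an illustration only: the cut-off values, the `Product` fields, and the sample portfolio are assumptions of mine, not figures from BCG or from the article discussed above.

```python
from dataclasses import dataclass

# Both cut-offs are illustrative assumptions, not values taken from BCG.
SHARE_CUTOFF = 1.0    # relative market share versus the largest competitor
GROWTH_CUTOFF = 0.10  # annual market growth rate (0.10 means 10 percent)


@dataclass
class Product:
    name: str
    relative_share: float  # own share divided by the largest rival's share
    market_growth: float   # growth rate of the product's market


def quadrant(p: Product) -> str:
    """Map a product's two numbers to one of the four matrix boxes."""
    high_share = p.relative_share >= SHARE_CUTOFF
    high_growth = p.market_growth >= GROWTH_CUTOFF
    if high_share and high_growth:
        return "star"
    if high_share:
        return "cash cow"
    if high_growth:
        return "question mark"
    return "dog"


# Hypothetical portfolio, invented for this example.
portfolio = [
    Product("Widget A", relative_share=1.8, market_growth=0.15),
    Product("Widget B", relative_share=1.8, market_growth=0.02),
    Product("Widget C", relative_share=0.4, market_growth=0.15),
    Product("Widget D", relative_share=0.4, market_growth=0.02),
]

for item in portfolio:
    print(f"{item.name}: {quadrant(item)}")
```

The point of the sketch is that another person with the same two numbers lands in the same box. No opinion is required once the cut-offs are stated.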

The more recent versions of the matrix keep the diagram, add complexity, and lack the quite specific analysis of numerical data which another person could analyze and, presumably, use to reach similar data-anchored observations. Math and data are helpful when properly combined.

In short, data and the BCG analysis make the BCG matrix an effective communication tool. Matrices without similar data rigor are artifacts of “experts” who want something that makes a sale possible.

BCG wanted to help its clients, not confuse them. MBAs, please, do not write me and tell me I don’t understand the sophistication of the methods underpinning these presentation ready diagrams. For me there is a gap between data-anchored graphics and subjective or opinion-based graphics.

Poetry and fiction are noble pursuits. Data may not be exciting. But for me, an old school BCG matrix is more satisfying as long as there are verifiable data underpinning the items mapped to the matrix.

Stephen E Arnold, May 13, 2015

IBM and the Watson Burrito: A Semi Classic from Armonk

May 13, 2015

A podcaster used the phrase “emotional burrito.” I liked the connotation. IBM has crafted a burrito recipe. I by chance saw a link to a story about the IBM Watson burrito. “IBM’s Watson Designed The Worst Burrito I’ve Ever Had” explains a recipe generated by IBM’s smart system. Keep in mind that Watson is now helping physicians treat cancer.

According to the article:

you zest a whole orange into a skillet of ground beef with a pinch of cinnamon. You puree some plain edamame as a sort of salsa. And you reduce apricot puree, vanilla bean, and dark chocolate to create a sauce. Then you top off this mixture with some cotija cheese. Well that’s what the recipe said. Then I noticed an asterisk. Apparently Watson had originally suggested cheese curds, while the human chef in charge of this recipe, ICE Creative Director Michael Laiskonis, had opted for a more traditional cotija cheese. I figured, when one of the smartest AI engines on the planet tells you to eat more cheese curds, you eat more cheese curds. So I opted for cheese curds.

The author, a cooking wizard, followed IBM’s recipe. The author says:

I prepared to roll $32 worth of groceries into a warm tortilla shell, I doubled my pour of sake, and soldiered on following General Watson to burrito town.

So how did this discovery by Watson work out?

My meal wasn’t good. In fact, it was pretty bad. If I’d made this for my family, I’d apologize and suggest we order a pizza.

I get my burritos at the lone semi-authentic restaurant in Harrod’s Creek. I think the chef microwaves burritos from Kroger. They are at least burritos. IBM’s PR department is not reassuring me about Watson’s open source code, home brew scripts, and software from acquired companies.

My concern with the Watson burrito is a question, “What happens if Watson’s cancer treatment is like this chocolate, edamame apricot salsa, and cheese burrito?”

Do the docs and nurses order a pizza? Hold the apricots.

Stephen E Arnold, May 13, 2015

The Forgotten List of Telegraph

May 13, 2015

Technology experts and information junkies in the European Union are in an uproar over a ruling that forces Google to remove specific information from search results.  “The right to be forgotten” policy upheld by the EU is supposed to help people who want “inadequate, irrelevant, or no longer relevant” information removed from Google search results.  Many news outlets in Europe have been affected, including the United Kingdom’s Telegraph.  The Telegraph has been recording a list called “Telegraph Stories Affected By ‘EU Right To Be Forgotten’” of all the stories they have been forced to remove.

According to the article, Google has received over 250,000 requests to remove information. Some of these requests concern stories published by the Telegraph. While many oppose the ‘right to be forgotten,’ including the House of Lords, others are still upholding the policy:

“But David Smith, deputy commissioner and director of data protection for the Information Commissioner’s Office (ICO), hit back and claimed that the criticism was misplaced, ‘as the initial stages of its implementation have already shown.’ ”

Many of the “to be forgotten” requests concern people with criminal pasts and misdeeds that color them in a bad light. The Telegraph’s content might be removed from Google, but the paper is keeping a long, long list on its website. Read the stories there or head on over to the US Google website; freedom of the press still holds true here.

Whitney Grace, May 13, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The Philosophy of Semantic Search

May 13, 2015

The article “Taking Advantage of Semantic Search NOW: Understanding Semiotics, Signs, & Schema” on Lunametrics delves into semantics on a philosophical and linguistic level as well as in regard to business. The author goes through the emergence of semantic search, beginning with Ray Kurzweil’s interest in machines learning meaning as opposed to simpler keyword search. In order to fully grasp this concept, the author of the article provides a brief refresher on Saussure’s semiotics.

“a Sign is comprised of a signifier, or the name of a thing, and the signified, what that thing represents… Say you sell iPad accessories. “iPad case” is your signifier, or keyword in search marketing speak. We’ve abused the signifier to the utmost over the years, stuffing it onto pages, calculating its density with text tools, jamming it into title tags, in part because we were speaking to robot who read at a 3-year-old level.”

In order to create meaning, we must go beyond even just the addition of a price tag and a picture to create a sign. The article suggests the need for schema, that is, the addition of some indication of whom and what the thing is for. The author, Michael Bartholow, has a background in linguistics, marketing, and search engine optimization. His article ends with the question of when linguists, philosophers, and humanists will be invited into the conversation with businesses, perhaps making him a true visionary in a field populated by data engineers with tunnel vision.
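To make the schema idea concrete, here is a small sketch that builds schema.org Product markup as JSON-LD for the iPad case example. The function name `product_jsonld`, the URL, the price, and the audience label are hypothetical values chosen for illustration; they do not come from the Lunametrics article.

```python
import json


def product_jsonld(name, image_url, price, currency, audience_type):
    """Build a schema.org Product description as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,        # the signifier, or keyword
        "image": image_url,  # the picture
        "offers": {          # the price tag
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
        "audience": {        # whom the thing is for
            "@type": "Audience",
            "audienceType": audience_type,
        },
    }


# All values below are invented for this example.
markup = product_jsonld(
    name="iPad case",
    image_url="https://example.com/ipad-case.jpg",
    price="19.99",
    currency="USD",
    audience_type="tablet owners",
)

# The serialized text would typically sit inside a
# <script type="application/ld+json"> element on the product page.
print(json.dumps(markup, indent=2))
```

The keyword alone is the signifier. The price, picture, and audience fields are one way, among several, to tell a search system what the thing is and whom it is for.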

Chelsea Kerwin, May 13, 2015


CyberOSINT Videos

May 12, 2015

Xenky.com has posted a single page which provides one click access to the three CyberOSINT videos. The videos provide highlights of Stephen E Arnold’s new monograph about next generation information access. You can explore the videos, which run a total of 30 minutes, on the Xenky site. One viewer said, “This has really opened my eyes. Thank you.”

Kenny Toth, May 12, 2015

Another Google Challenger with Two Angles: Anonymity and Charity

May 12, 2015

I read another “Google killer” write up today. Hope springs eternal, I know. The article is “New Search Engine from Waterfox Founder Aims to Take a Punch at Google.” The idea is that the search engine will offer “users absolute privacy online.” I thought that the combination of an alias, a VPN, and the Tor bundle delivered at least some privacy online.

The idea is that the new Storm search system will deliver anonymized search and pay some money derived from the system to non profit outfits. The article reports:

The aim is to tempt millions of users away from Google and create substantial revenues for worthy organizations. Up to £20 could be generated from each active user per year for charitable causes, the company claims.

The article points out that Storm has some competition:

Most of the successful entrants offer users the ability to search the web privately and securely, hiding their data from brands and data crunchers online. DuckDuckGo, which brands itself as a champion of privacy rights, has now been included on Apple’s internet browser Safari. Qwant, StartPage, and Ixquick are also vying for market share in the private browsing space.

My question, “What metasearch engines do you use regularly?”

Did I hear, “None.”

Another question, “Are you certain the queries are anonymous?”

Did I hear someone say, “I don’t know.”

Exactly.

Stephen E Arnold, May 12, 2015

Google and Your JavaScript: Good News and Reality

May 12, 2015

If you want to keep Mother Google happy, you want to feed her content which is mobile friendly, meets her rules for her children, and delivers bang up information to her many minions.

I read “We Tested How Googlebot Crawls Javascript And Here’s What We Learned.” You may want to check out the SEO oriented write up if you wonder why no one visits your Web site. (Tip: Most Web sites do not get much traffic. Traffic is often helped out by buying Adwords. Remember. You heard this from me. Oh, prepare to invest substantial sums for maximum payback. SEO is usually less efficacious.)

The article explains a series of tests to reveal how Mother Google interprets and makes use of JavaScript. I found this passage highlight worthy:

Javascript links work in a similar manner to plain HTML links (at face value, we do not know what’s happening behind the scenes in the algorithms).

JavaScript away. Just remember that Adwords deliver traffic. SEO is usually a somewhat less reliable method. But those SEO experts do charge money. So make your own decision. Adwords which work. SEO methods which are at best uneven.

Stephen E Arnold, May 12, 2015

MarkLogic: Now a Unicorn in Database Land

May 12, 2015

I read “Database Vendor MarkLogic Joins Billion Dollar Club with New Funding.” The main point for me is that MarkLogic is described as a “database vendor.” MarkLogic has been working hard to explain that it is an enterprise search vendor, a business intelligence vendor, and an XML publishing system appropriate for finance, health care, and publishing. There is MarkLogic DNA in Autonomy.

The headline brushes these assertions away, clearing the path for the unicorn to charge directly in the face of Oracle and maybe IBM.

According to the write up:

MarkLogic in the last few years has gained several new database rivals–including Cloudera Inc., last valued at $4.1 billion; MongoDB Inc., last valued at $1.6 billion; MapR Technologies Inc.; and Datastax Inc.–in addition to traditional competitors Oracle, Microsoft Corp. and International Business Machines Corp. MarkLogic customers include Dow Jones & Co., which publishes VentureWire and The Wall Street Journal. The company said the new money would be used to expand globally across Europe, Japan and Asia and invested in MarkLogic partners and in research and development.

Is this what MarkLogic will do with the money? I thought some of it would be allocated to purchase other firms; for example, companies which allegedly shore up MarkLogic’s content processing gaps. Concept Searching, Content Analyst, Smartlogic? Also, there may be some long suffering investors who want a payback for the millions pumped into the company. I noticed that the lead investor was Wellington Management with some help from Arrowpoint Partners.

Before the current president, I was working for some of the nifty outfits in Sillycon Valley. I learned that MarkLogic had missed some important financial targets. A spin of the revolving door put some new faces in familiar positions.

If one looks for MarkLogic today, the company is findable, but it maintains a comparatively low profile. I dropped the blog from my useful source list. I can’t recall the last time I saw a substantive link to the company in Twitter. I don’t see the company at some of the conferences I attend, but, hey, I attend some very specialized information centric hoe downs.

Several observations:

Oracle may expand on its “we’re a better XML database” white paper, which you can find here. An earlier paper called “Mark Logic XML Server 4.1” points out some issues which Oracle perceived in the MarkLogic approach. In a shoot out with Oracle, the bullets will fly. Does MarkLogic have the arsenal to deal with Oracle’s cache of armaments?

Will proprietary NoSQL data management systems be able to generate a billion in revenue in the next six or eight quarters? Outfits like Lucid Imagination (Really?) have been running into headwinds, and I think a similar weather system may turn MarkLogic’s sunny skies into a cloudy day. I understand that the Wall Street Journal is a MarkLogic believer. How many more can MarkLogic bring to its picnic? The assumption, I assume, is a lot.

MarkLogic’s core technology dates from 2001. Like many companies from this time period, MarkLogic has to find a way to get that old time start up excitement back. Companies which are 14 years old often continue along the same trajectory in my experience.

This will be interesting and maybe a big payday for the increasingly strapped owners of companies with technology which can caulk some leaks in the MarkLogic lake raft.

Stephen E Arnold, May 12, 2015

Emojis Spur Ancient Language Practices

May 12, 2015

Emojis, different from their cousin emoticons, are a standard in Internet jargon and are still resisted by most who grew up in a world sans instant connection.  Mike Isaac, who writes the New York Times Bits blog, tried his best to resist the urge to use a colon and parentheses to express his mood.  Isaac’s post “The Rise Of Emoji On Instagram Is Causing Language Repercussions” discusses the rise of the emoji language.

Emojis are quickly replacing English abbreviations, such as LOL and TTYL. People are finding it easier to select a smiley face picture than to type text. Isaac points to how users of social media platforms like Facebook, Twitter, Instagram, and Snapchat are relying more on these pictograms for communication. Instagram’s Thomas Dimson mentioned we are watching the rise of a new language.

People string emojis together to form complete sentences and sentiments.  Snapchat and Instagram rely on pictures as their main content, which in turn serves as communication.

“Instagram itself is a means of expression that does not require the use of words. The app’s meteoric rise has largely been attributed to the power of images, the ease that comes, for instance, in looking at a photo of a sunset rather than reading a description of one.  Other companies, like Snapchat, have also risen to fame and popularity through the expressive power of images.”

Facebook and Twitter are pushing more images and videos on their own platforms. It is a rudimentary form of communication, but it harkens back to the days of cave paintings. People are drawn to images because their basic meaning is easy to interpret and they do not have a language barrier. A picture of a dog is still the same in Spanish or English. The only problem with using emojis is actually understanding the meaning behind them. A smiley face is easy to interpret, but a dolphin, baseball glove, and maple leaf might need some words for clarification.

Isaac closes by noting that one of the reasons he resisted emojis so much was that they made him feel childish, so he reserved them for his close friends and family. The term “childish” is subjective, just like the meaning of emojis, so as they become more widely adopted they will become more accepted.

Whitney Grace, May 12, 2015
