Data Insight: Common Sense Makes Sense
February 25, 2016
I am skeptical about lists of problems which hot buzzwords leave in their wake. I read “Why Data Insight Remains Elusive,” which I thought was another content marketing pitch to buy, buy, buy. Not so. The write up contains some clearly expressed, common sense reminders for those who want to crunch big data and point and click their way through canned reports. Those who actually took the second semester of Statistics 101 know that ignoring the data quality and the nitty gritty of the textbook procedures can lead to bonehead outputs.
The write up identifies some points to keep in mind, regardless of which analytics vendor system a person is using to make more informed or “augmented” decisions.
Here’s the pick of the litter:
- Manage the data. Yep, time consuming, annoying, and essential. Skip this step at your decision making peril.
- Manage the indexing. The buzzword is metadata, but assigning keywords and other indexing items makes the difference when trying to figure out who, what, why, when, and where. Time? Yep, metadata which not even the Alphabet Google thing does particularly well.
- Create data models. Do the textbook stuff. Get the model wrong, and what happens? Failure on a scale equivalent to fumbling the data management processes.
- Visualization is not analytics. Visualization makes outputs of numerical recipes appear in graphical form. Do not confuse Hollywood outputs with relevance, accuracy, or math on point to the problem one is trying to resolve.
- Knee jerking one’s way through analytics. Sorry, reflexes are okay but useless without context. Yep, have a problem, get the data, get the model, test, and examine the outputs.
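The "manage the data" point can be made concrete with a tiny sketch. Everything in it is an assumption for illustration: the readings, the -999 placeholder code, and the helper name `clean` are invented and do not come from the write up. The idea is only that a summary statistic computed before a basic quality check can be wildly off.

```python
from statistics import mean

# Hypothetical readings: None marks a missing value and -999.0 is an
# invented placeholder code that some export tool might have written.
readings = [12.0, 11.5, None, 12.3, -999.0, 11.9]


def clean(values, sentinel=-999.0):
    """Drop missing values and the placeholder code before any analysis."""
    return [v for v in values if v is not None and v != sentinel]


# Naive approach: skip only the obvious gaps, so the placeholder leaks in.
naive_mean = mean(v for v in readings if v is not None)

# Checked approach: clean first, then summarize.
checked = clean(readings)
checked_mean = mean(checked)

print(f"naive mean:   {naive_mean:.2f}")
print(f"checked mean: {checked_mean:.2f}")
```

The naive figure comes out negative even though every legitimate reading sits near 12. A chart drawn from it would look authoritative and be wrong.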
Common sense. Most basic stuff was in the textbooks for one’s college courses. Too bad more folks did not internalize those floorboards and now seek contractors to do a retrofit. Quite an insight when the bill arrives.
Stephen E Arnold, February 25, 2016
Text Analytics Vendors for Your Retirement Fund
February 10, 2016
I located a list of companies involved in content processing. You may want to add one or more of these to your retirement investment portfolio. Which one will be the next Facebook, Google, or Uber? I know I would love to have a hat or T shirt from each of these outfits:
Api.ai
Appinions
Automated Insights
Bitext
Clueda
Cortical.io
Dataminr
DigitalGenius
Equivio
Health Fidelity
Jobandtalent
Linguasys
Medallia
MonkeyLearn
NetBase
NewBrandAnalytics
Semantic Machines
Sensai
Sentisis
Signal
Strossle
Sysomos
TEMIS (Expert System)
Texternel
Textio
Treparel
Viralheat
Wibbitz
Wit.ai
Stephen E Arnold, February 8, 2016
HP Enterprise Investigative Analytics
February 5, 2016
Shiver me timbers. Batten the hatches. There is a storm brewing in the use of Autonomy-type methods to identify risks and fraud. To be fair, HP Enterprise no longer pitches Autonomy, but the spirit of Dr. Mike Lynch’s 1990s technology is there, just a hint maybe, but definitely noticeable to one who has embraced IDOL.
For the scoop, navigate to “HPE Launches Investigative Analytics, Using AI and Big Data to Identify Risk.” I was surprised that the story’s headline did not add “When Swimming in the Data Lake.” But the message is mostly clear despite the buzzwords.
Here’s a passage I highlighted:
The software is initially geared toward financial services organizations, and it combines existing HPE products like Digital Safe, IDOL, and Vertica all on one platform. By using big data analytics and artificial intelligence, it can analyze a large amount of data and help pinpoint potential risks of fraudulent behavior.
Note the IDOL thing.
The write up added:
Investigative Analytics starts by collecting both structured sources like trading systems, risk systems, pricing systems, directories, HR systems, and unstructured sources like email and chat. It then applies analysis to query “aggressively and intelligently across all those data sources,” Patrick [HP Enterprise wizard] said. Then, it creates a behavior model on top of that analysis to look at certain communication types and see if they can define a certain problematic behavior and map back to a particular historical event, so they can look out for that type of communication in the future.
This is okay, but the words, terminology, and phrasing remind me of more than 1990s Autonomy marketing collateral, BAE’s presentations after licensing Autonomy technology in the late 1990s, the i2 Ltd. Analyst’s Notebook collateral, and, more recently, the flood of jabber about Palantir’s Metropolis platform and Thomson Reuters’ version of Metropolis called QA Direct or QA Studio or QA fill in the blank.
The fact that HP Enterprise is pitching this new service developed with “one bank” at a legal eagle tech conference is a bit like me offering to do my Dark Web Investigative Tools lecture at Norton Elementary School. A more appropriate audience might deliver more bang for each PowerPoint slide, might it not?
Will HP Enterprise put a dent in the vendors already pounding the carpeted halls of America’s financial institutions?
HP Enterprise stakeholders probably hope so. My hunch is that a me-too, me-too product is a less than inspiring use of the collection of acquired technologies HP Enterprise appears to put in a single basket.
Stephen E Arnold, February 5, 2016
Big Data: A Shopsmith for Power Freaks?
February 4, 2016
I read an article that I dismissed. The title nagged at my ageing mind and dwindling intellect. “This is Why Dictators Love Big Data” did not ring my search, content processing, or Dark Web chimes.
Annoyed at my inner voice, I returned to the story, annoyed with the “This Is Why” phrase in the headline.
Predictive analytics are not new. The packaging is better.
I think this is the main point of the write up, but I am never sure with online articles. The articles can be ads or sponsored content. The authors could be looking for another job. The doubts about information today plague me.
The circled passage is:
Governments and government agencies can easily use the information every one of us makes public every day for social engineering — and even the cleverest among us is not totally immune. Do you like cycling? Have children? A certain breed of dog? Volunteer for a particular cause? This information is public, and could be used to manipulate you into giving away more sensitive information.
The only hitch in the git along is that this is not just old news. The systems and methods for making decisions based on the munching of math in numerical recipes have been around for a while. Autonomy? A pioneer in the 1990s. Nope. Not even the super secret use of Bayesian, Markov, and related methods during World War II reaches back far enough. Nudge the ball hundreds of years farther back on the timeline. Not new in my opinion.
I also noted this comment:
In China, the government is rolling out a social credit score that aggregates not only a citizen’s financial worthiness, but also how patriotic he or she is, what they post on social media, and who they socialize with. If your “social credit” drops below a certain level because you post anti-government messages online or because you’re socially associated with other dissidents, you could be denied credit approval, financial opportunities, job promotions, and more.
Just China? I fear not, gentle reader. Once again the “real” journalists are taking an approach which does not do justice to the wide diffusion of certain mathy applications.
Net net: I should have skipped this write up. My initial judgment was correct. Not only is the headline annoying to me, the information is par for the Big Data course.
Stephen E Arnold, February 4, 2016
Palantir: Revenue Distribution
January 27, 2016
I came across a write up in a Chinese blog about Palantir. You can find the original text at this link. I have no idea if the information is accurate, but I had not seen this breakdown before:
The chart from “Touchweb” shows that in FY 2015 privately held Palantir derives 71 percent of its revenue from commercial clients.
The report then lists the lines of business which the company offers. Again this was information I had not previously seen:
- Energy, disaster recovery, consumer goods, and card services
- Retail, pharmaceuticals, media, and insurance
- Audit, legal prosecution
- Cyber security, banking
- Healthcare research
- Local law enforcement, finance
- Counter terrorism, war fighting, special forces.
Because Palantir is privately held, there are no solid, audited data available to folks in Kentucky at this time.
Nevertheless, the important point is that the Palantir search and content processing platform has a hefty valuation, lots of venture financing, and what appears to be a diversified book of business.
Stephen E Arnold, January 27, 2016
Cheerleading for the SAS Text Exploration Framework
January 27, 2016
SAS is a stalwart in the number crunching world. I visualize the company’s executives chatting among themselves about the Big Data revolution, the text mining epoch, and the predictive analytics juggernaut.
Well, SAS is now tapping that staff interaction.
Navigate to “To Data Scientists and Beyond! One of Many Applications of Text Analytics.” There is an explanation of the ease of use of SAS. Okay, but my recollection was that I had to hire a PhD in statistics from Cornell University to chase down the code which was slowing our survivability analyses to a meander instead of a trot.
I learned:
One of the misconceptions I often see is the expectation that it takes a data scientist, or at least an advanced degree in analytics, to work with text analytics products. That is not the case. If you can type a search into a Google toolbar, you can get value from text analytics.
The write up contains a screenshot too. Where did the text analytics plumbing come from? Perchance an acquisition in 2008 like the canny purchase of Teragram’s late 1990s technology?
The write up focuses on law enforcement and intelligence applications of text analytics. I find that interesting because Palantir is allegedly deriving more than 60 percent of the firm’s revenue from commercial customers like JP Morgan and starting to get some traction in health care.
Check out the screenshot. That is worth 1,000 words. SAS has been working on the interface thing to some benefit.
Stephen E Arnold, January 27, 2016
Dark Web and Tor Investigative Tools Webinar
January 5, 2016
Telestrategies announced on January 4, 2016, a new webinar for active LEA and intel professionals. The one hour program is focused on tactics, new products, and ongoing developments for Dark Web and Tor investigations. The program is designed to provide an overview of public, open source, and commercial systems and products. These systems may be used as standalone tools or integrated with IBM i2 ANB or Palantir Gotham. More information about the program is available from Telestrategies. There is no charge for the program. In 2016, Stephen E Arnold’s new Dark Web Notebook will be published. More information about the new monograph upon which the webinar is based may be obtained by writing benkent2020 at yahoo dot com.
Stephen E Arnold, January 5, 2016
Text Analytics Jargon: You Too Can Be an Expert
December 22, 2015
Want to earn extra money as a text analytics expert? Need to drop some cool terms like Latent Dirichlet Allocation at a holiday function? Navigate to “Text Analytics: 15 Terms You Should Know Surrounding ERP.” The article will make clear some essential terms. I am not sure the enterprise resource planning crowd will be up to speed on probabilistic latent semantic analysis, but the buzzword will definitely catch everyone’s attention. If you party in certain circles, you might end up with a consulting job at a mid tier services firm or, better yet, land several million in venture funding to dance with Dirichlet.
Stephen E Arnold, December 22, 2015
Palantir Profile: Search Plus Add Ons
November 25, 2015
Short honk: If you read French, you will learn quite a bit about Palantir, an interesting company with a $20 billion valuation. The write up is “Palantir et la France : naissance d’une nouvelle théorie abracadabrantesque ?” A listicle in the heart of the article provides a good run down of the system’s search and content processing capabilities. Yep, search. The difference between Palantir and outfits like Attivio, Coveo, Smartlogic, et al is the positioning, the bundle of technology, and – oh, did I mention the $20 billion valuation? I do like the abracadabra reference. Magic?
Stephen E Arnold, November 25, 2015
Inferences: Check Before You Assume the Outputs Are Accurate
November 23, 2015
Predictive software works really well as long as the software does not have to deal with horse races, the stock market, and the actions of a single person and his closest pals.
“Inferences from Backtest Results Are False Until Proven True” offers a useful reminder to those who want to depend on algorithms someone else set up. The notion is helpful when the data processed are unchecked, unfamiliar, or just assumed to be spot on.
The write up says:
the primary task of quantitative traders should be to prove specific backtest results worthless, rather than proving them useful.
What throws backtests off the track? The write up provides a useful list of reminders:
- Data-mining and data snooping bias
- Use of non tradable instruments
- Unrealistic accounting of frictional effects
- Use of the market close to enter positions instead of the more realistic open
- Use of dubious risk and money management methods
- Lack of effect on actual prices
The author is concerned about financial applications, but the advice may be helpful to those who just want to click a link, output a visualization, and assume the big spikes are really important to the decision you will influence in one hour.
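One item on that list, unrealistic accounting of frictional effects, is easy to illustrate. What follows is a minimal sketch, not the method from the write up: the daily returns, the positions, the half percent cost per trade, and the function name `backtest` are all assumptions invented for the example.

```python
def backtest(daily_returns, positions, cost_per_trade=0.0):
    """Compound strategy returns, charging a cost each time the position changes.

    positions holds 1 (in the market) or 0 (out) for each day; the
    starting position before the first day is assumed to be 0.
    """
    equity = 1.0
    previous = 0
    for ret, pos in zip(daily_returns, positions):
        if pos != previous:
            # Each entry or exit pays a proportional frictional cost.
            equity *= 1.0 - cost_per_trade
        equity *= 1.0 + pos * ret
        previous = pos
    return equity - 1.0


# Hypothetical five day series; the strategy trades every single day.
returns = [0.01, -0.005, 0.008, -0.002, 0.006]
positions = [1, 0, 1, 0, 1]

gross = backtest(returns, positions)                      # frictionless world
net = backtest(returns, positions, cost_per_trade=0.005)  # assumed 0.5% per trade

print(f"gross return: {gross:.4%}")
print(f"net return:   {net:.4%}")
```

With these made-up numbers the frictionless run shows a gain and the run with costs shows a loss. Same signal, same data, opposite conclusion, which is the reason for trying to prove a backtest worthless before trusting it.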
One point I highlighted was:
Widely used strategies lose any edge they might have had in the past.
Degradation occurs just like the statistical drift in Bayesian based systems. Exciting if you make decisions on outputs known to be flawed. How are those automatic indexing, business intelligence, and predictive analytics systems working?
Stephen E Arnold, November 23, 2015