March 7, 2014
Infogistics calls itself a leading company in text analysis, document retrieval, and text extraction for various industries. One would not guess that after visiting its Web site, which has not been updated since 2005. The company does, however, have a new vested interest in DaXtra Technologies, its new endeavor to provide content processing solutions for personnel and human resources applications.
Here is an official description from the Web site:
“For almost a decade we’ve been at the forefront of technology and solutions within our marketplace, giving our customers the competitive edge in their challenge to source the best available jobseekers, and find them quickly. Over 500 organizations, spanning all continents, use our resume analysis, matching and search products – from the world’s largest staffing companies to boutique recruiters, corporate recruitment departments, job boards and software vendors. This global reach is made possible via our multilingual CV technology which can automatically parse in over 25 different languages.”
DaXtra’s products include DaXtra Capture, a recruitment management application; DaXtra Search; DaXtra Parser, which turns raw resume data into structured XML; DaXtra Components, for managing Web services; and DaXtra Analytics, due in 2014. The company appears to make top-of-the-line personnel software that cuts through the confusion in HR departments. What is even better is that this Web site is updated.
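Turning a raw resume into structured XML, as DaXtra Parser does, can be sketched in a few lines. This is a toy illustration of the general technique, not DaXtra’s proprietary parser; the regular expressions and XML schema here are my own assumptions.

```python
import re
import xml.etree.ElementTree as ET

def parse_resume(text):
    """Toy resume parser: pull out an email address and phone number,
    then emit them as structured XML. Illustrative only; a commercial
    parser handles names, skills, work history, and 25+ languages."""
    root = ET.Element("candidate")
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    phone = re.search(r"\+?\d[\d\s().-]{7,}\d", text)
    if email:
        ET.SubElement(root, "email").text = email.group()
    if phone:
        ET.SubElement(root, "phone").text = phone.group()
    return ET.tostring(root, encoding="unicode")

xml_out = parse_resume("Jane Doe, jane@example.com, +1 303 555 0100")
print(xml_out)
```

The payoff of the XML target format is that downstream recruitment systems can consume the fields without caring how the free text was laid out.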
March 4, 2014
Bayes’s Theorem is the foundation of predictive analytics. Gigaom’s article “How the Solution To the Monty Hall Problem Is Also The Key To Predictive Analytics” tries to explain how Bayes’s Theorem powers predictive analytics by way of a famous game show puzzle.
The Monty Hall Problem is named after the Let’s Make a Deal host. Here is how it works:
“The show used what came to be known as the Monty Hall Problem, a probability puzzle named after the original host. It works like this: You choose between three doors. Behind one is a car and the other two are Zonks. You pick a door – say, door number one – and the host, who knows where the prize is, opens another door – say, door number three – which has a goat. He then asks if you want to switch doors. Most contestants assume that since they have two equivalent options, they have a 50/50 shot of winning, and it doesn’t matter whether or not they switch doors. Makes sense, right?”
If a data scientist had been on the show, he or she would have used Bayes’s Theorem to win the prize. The solution is to switch doors: switching raises the probability of winning from 1/3 to 2/3.
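The switching advantage is easy to verify empirically. Here is a quick simulation (my own sketch, not from the Gigaom article) that plays the game many times with each strategy:

```python
import random

def play(switch, rng):
    # Place the prize behind one of three doors; contestant picks at random.
    prize = rng.randrange(3)
    choice = rng.randrange(3)
    # Host opens a door that is neither the contestant's pick nor the prize.
    opened = next(d for d in range(3) if d != choice and d != prize)
    if switch:
        # Switch to the one remaining closed door.
        choice = next(d for d in range(3) if d != choice and d != opened)
    return choice == prize

rng = random.Random(42)
trials = 100_000
switch_wins = sum(play(True, rng) for _ in range(trials)) / trials
stay_wins = sum(play(False, rng) for _ in range(trials)) / trials
print(f"switch: {switch_wins:.3f}, stay: {stay_wins:.3f}")
```

Over many trials the switcher wins about two thirds of the time and the stayer about one third, which is exactly what Bayes’s Theorem predicts once the host’s door-opening is treated as evidence.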
The Monty Hall Problem is used in business, but Bayes’s Theorem itself is becoming more widespread. It is used to link big data and cloud computing, which together power predictive analytics. The article goes on to explain the theorem’s importance and impact on business, which is not new, and ends by encouraging readers to rely on Bayes rather than Monty Hall.
What will the next metaphor comparison be?
March 4, 2014
Did you ever think that predictive analytics would be used to determine the next singing sensation? Neither did I. “SAPVoice: How To Predict A Future Pop Star” from Forbes details how music labels are using data to find star power. This form of predictive analytics is called predictive business. Despite its immaterial aspects, music does contain many data points:
“Her record label, Universal Music Group taps thousands of data points generated daily for the artists it manages that reveal how particular customer segments are responding to them. Managers search a database of a million interview subjects, containing data on everything from where a consumer shops to the new music she prefers. With such tools at hand, YouTube won’t be the only way to find the next stars; scouts will also dig through the data.”
It is not just the music industry tapping into this new resource. Consumer goods, healthcare, technology, and manufacturing companies are using it to raise red flags and increase efficiency.
SAP steps in with its own predictive business model that focuses on predicting with accuracy, determining the best actions to take based on the data, and acting fast on the results. This approach has paid off for many companies.
Will the singing capitals of the world embrace SAP’s methodology? Don’t some recording moguls shoot handguns when disaffected? If the software does not deliver value, will there be gunplay at a Las Vegas intersection, or maybe on Wall Street if it does not pay off in the finance sector?
February 26, 2014
I read “Splunk Feels the Heat from Stronger, Cheaper Open Source Rivals.” InfoWorld is up to its old tricks again. Log files have been around for decades. Many organizations allow more recent entries to overwrite previous log files. I know that some people believe that this practice has gone the way of the dodo. Well, would you like to buy a bridge?
For those who keep log files and want to figure out what treasures nestle therein, an outfit has marketed an expensive “search” system. Splunk is the darling of many information technology gurus. In Washington, DC, I am surprised when laborers in the Federal vineyard do not sport a Splunk tattoo.
IDC’s view is that there is change rolling down the road. The write up points out that Splunk is no longer limited to log search. Like most information access companies, it has expanded. In fact, the wizards at IDC parrot the jargon: analytics. Here’s the passage I noted:
“Splunk started strong and has only grown stronger as it’s branched out to become a wide-ranging analytics platform. But the free version of Splunk is quite limited, and the enterprise version’s pricing is based on the amount of data indexed, which adds up to prohibitive costs for some.”
The important factoid is, in my opinion, cost. Most organizations want to reduce costs for little-understood information tasks. Making heads or tails of ever burgeoning and frequently overwritten log files may be near the top of the budget tightening list.
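Volume-based licensing is easy to illustrate with back-of-the-envelope arithmetic. Both figures below are hypothetical assumptions of mine; neither comes from Splunk’s actual price list.

```python
# Hypothetical numbers, purely to show how indexing-volume pricing "adds up":
price_per_gb_per_day = 1500   # assumed annual license cost per GB/day indexed
daily_ingest_gb = 200         # assumed log volume for a mid-size shop

annual_license = price_per_gb_per_day * daily_ingest_gb
print(f"${annual_license:,}")  # prints $300,000
```

Since log volume tends to grow with the business, a license keyed to gigabytes indexed grows with it, which is precisely the opening the no-license-fee open source alternatives exploit.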
IDC, truly an expert in open source software, points out that “open source competition has been emerging in the background.” I suppose that is why IDC is selling analyses of open source at $3,500 a whack, such as this gem produced in part by IDC’s wizards. See Report 237410. Who wrote that? Worth a look, I suppose.
The angle is that Graylog2 and Elasticsearch are chasing after Splunk. I am not sure if this is old news, good news, or silly news. What’s clear is that InfoWorld is covering open source and not emphasizing its deep research.
Cost control is a subtle point. I am delighted that the write up creeps up on one of the central attributes of open source software: No license fees. But what of the costs of installing, tuning, and maintaining the open source solution? Ah, not included in the write up. If you pony up $3,500 for an IDC open source report, I assume more substance is provided. Who wrote those IDC open source reports like 237410? Was it an IDC analyst, marketer, or reporter? Did the information come from another source?
Anyway, good PR for Elasticsearch. Bad PR for Splunk.
Stephen E Arnold, February 26, 2014
February 25, 2014
I read “Publishers Withdraw More than 120 Gibberish Papers.” The article reports that Springer and IEEE have begun the process of removing “computer generated nonsense.” The article explains how to create a fake paper in case you are curious. What about the papers in online services and commercial databases that contain bogus data? Do researchers discern false information?
PLOS, an open access scientific publisher, said that it would ask authors to make their data more available. You can read about this long overdue action in “PLOS’ New Data Policy: Public Access to Data.”
I wonder why the much vaunted text analysis software does not flag suspect information. Perhaps marketing is more important than accuracy?
Stephen E Arnold, February 25, 2014
February 24, 2014
One of the goslings dug up additional information on the PathAR company. The firm caught my attention with its assertion that it could identify one specific, meaningful content object in a large corpus.
The company’s Web site is http://www.pathar.net.
According to an SEC Form D, the executives of the company are:
- Patrick D. Butler
- Andrew Woglom, chief financial officer
- Anthony (Tony) Marshall
- Mark Jacobson
The address for the company is listed on Form D as:
110 S. Sierra Madre Street
Colorado Springs, CO 80903.
The company is seeking a software development manager and a senior architect/developer.
The CrunchBase profile states that the company provides “leading edge analysis capabilities.”
The firm has received $500,000 in funding.
The company appears to be throwing its hat in the ring with IBM, Palantir, and Recorded Future. With Palantir still pursuing a $9 billion valuation, the smart analytics sector continues to attract innovators and entrepreneurs. The question is, “Are there enough customers to make the dozens of analytics firms profitable?”
Stephen E Arnold, February 24, 2014
February 24, 2014
Mind maps can be a valuable tool for the visual among us, and you can easily build your own virtual version with Knowledgebase Builder 2.6 from InfoRapid, based in Waiblingen, Germany. The best part—it’s free for personal use. As with most such business models, the company hopes you’ll try the freeware version and decide you can’t live without the tool in your workplace. The Professional Edition, which lets multiple users work together on the same knowledge base, goes for 99 euros (about $135 as of this writing). The price for the version with all the bells and whistles, the Enterprise Version, varies by company size, but starts at 1,000 euros (about $1,360 as I type) for a small business.
The description tells us:
“InfoRapid KnowledgeBase Builder allows you to easily create complex Mind Maps with millions of interconnected items. One single Mind Map can hold your entire knowledge, all your thoughts and ideas in a clear way. The data is stored securely in a local database file. While traditional Mind Maps don’t offer cross connections, InfoRapid KnowledgeBase Builder can connect any item with each other and label the connection lines. The program contains an archive for documents, images and web pages that may be imported and attached to any chart item or connection line.”
The six-minute video on the website demonstrates the Builder’s functionality, using as its example text about the software itself. The connection lines they mention above, which shift to adjust to new input, are reason enough to switch from pen-and-paper or MSPaint mapping techniques. Another key feature: You can link to documents or web pages from within the map, simplifying follow-through (a weak point for many of us). The Highlighter Analysis is pretty nifty, too. Anyone curious about this tool should check out the site—the (personal use) price can’t be beat.
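The cross connections InfoRapid describes amount to a labeled graph: any item may link to any other, and the link itself carries a label. Here is a minimal sketch of that data structure in Python; it is my own illustration, not the product’s actual data model.

```python
from collections import defaultdict

class MindMap:
    """A minimal labeled graph: any item can connect to any other item."""
    def __init__(self):
        self.edges = defaultdict(list)  # item -> [(other_item, label)]

    def connect(self, a, b, label=""):
        # Cross connections go both ways, each carrying the same label.
        self.edges[a].append((b, label))
        self.edges[b].append((a, label))

    def neighbors(self, item):
        return self.edges[item]

m = MindMap()
m.connect("KnowledgeBase Builder", "Mind Maps", "creates")
m.connect("Mind Maps", "Documents", "links to")
print(m.neighbors("Mind Maps"))
```

The point of the sketch: unlike a strict tree, nothing limits an item to one parent, which is what makes the advertised “millions of interconnected items” in a single map plausible.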
Cynthia Murrell, February 24, 2014
February 23, 2014
I came across a quite remarkable marketing assertion. The company using the wording is PathAR LLC, based in the midwest. Here’s what the company says:
Today 1 of the 3.8 Billion users of social media WILL impact your organization! Do you know who that 1 user is? How do we do it?
We built the world’s most advanced commercially available end-to-end solution for creating actionable intelligence from big data! Our proprietary intelligence engine powers Dunami, our web-based software platform. Dunami combines breakthrough advances in network analysis with advanced analytical techniques derived from long standing intelligence practices. Dunami’s broad capabilities are being used to Find, Understand, and Predict the behaviors of thought leaders and organizers on any topic, including identifying extremists, criminals, and others who are inciting potential violence around the globe!
When I read the statements, I wonder how predictive methods can pinpoint a single datum as the pivotal item of information.
Dunami, as a product/service name, poses some findability challenges. The name is already used by an exercise studio, carries a religious connotation, and titles a visual novel.
Is this another outfit chasing after IBM i2, Recorded Future, and the dozens of vendors listed on the Carahsoft Web site?
Stephen E Arnold, February 23, 2014
February 19, 2014
I read a long report and then a handful of spin-off reports about HP and Autonomy, mid-February 2014 version. The Financial Times’s story is a for-fee job. You can get a feel for the information in “HP Executives Knew of Autonomy’s Hardware Sales Losses: Report.” There are clever discussions of this allegedly “new information” in a number of blogs. Also interesting is an allegedly accurate chunk of information in “HP Explores Settlement of Autonomy Shareholder Lawsuit.” My head is spinning. HP buys something. It changes the person on watch when the deal was worked out. HP gets a new boss and makes changes to its board of directors. HP then blames everyone except itself for buying Autonomy for a lot of money. HP then whips up the regulators, agitates accounting firms, and pokes Michael Lynch with a cattle prod.
As this activity was in the microwave, it appears that HP knew how the hardware/software deals were handled. If the reports are accurate, Dell hardware was more desirable than HP’s hardware.
But there is a more interesting twist. I refer you, gentle reader, to “A Fervent Defense of Frequentist Statistics.” Autonomy’s “black box” consists of Bayesian methods and what I call MCMC, or Markov chain Monte Carlo, techniques. The idea is that once some judgment calls are made, the Intelligent Data Operating Layer, or IDOL, can chug away without human involvement. When properly resourced and trained, the Autonomy system works for certain types of content processing and information retrieval applications. You can read more about IDOL in our for-fee analysis, which reviews several important patents germane to the Autonomy system. You can purchase a copy at https://gumroad.com/l/autonomy.
In “A Fervent Defense,” an old battle line is reactivated. The “frequentists” are not exactly thrilled with the rise of Bayesian methods. Autonomy emerged from Cambridge University after some of the Bayesian methods were revealed as crucial to World War II activities. Frequentists point out that there are some myths about Bayesian methods. The write up is not for MBAs, failed Web masters, or unemployed middle school teachers. The myths allegedly dispelled in the article are:
- “Bayesian methods are optimal.
- Bayesian methods are optimal except for computational considerations.
- We can deal with computational constraints simply by making approximations to Bayes.
- The prior isn’t a big deal because Bayesians can always share likelihood ratios.
- Frequentist methods need to assume their model is correct, or that the data are i.i.d.
- Frequentist methods can only deal with simple models, and make arbitrary cutoffs in model complexity (aka: “I’m Bayesian because I want to do Solomonoff induction”).
- Frequentist methods hide their assumptions while Bayesian methods make assumptions explicit.
- Frequentist methods are fragile, Bayesian methods are robust.
- Frequentist methods are responsible for bad science.
- Frequentist methods are unprincipled/hacky.
- Frequentist methods have no promising approach to computationally bounded inference.”
The key point is that HP is going to learn, has already learned, or learned and has just forgotten that Bayesian methods are not suitable for every single information processing application. In fact, using Bayesian methods when a frequentist approach is more appropriate can produce unsatisfactory results for a discriminating data scientist. The use of frequentist methods when Bayesian is more appropriate can yield equally dissatisfying outputs.
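A toy example of how the two schools can diverge (my own illustration, not drawn from the cited essay): estimating a success rate from 3 wins in 10 trials. The frequentist maximum-likelihood estimate is just the observed rate; the Bayesian answer depends on the prior you choose.

```python
from fractions import Fraction

successes, trials = 3, 10

# Frequentist: the maximum-likelihood estimate is the observed rate.
mle = Fraction(successes, trials)

# Bayesian: with a Beta(a, b) prior, the posterior is
# Beta(a + successes, b + failures), whose mean is:
def posterior_mean(a, b):
    return Fraction(a + successes, a + b + trials)

uniform = posterior_mean(1, 1)      # flat prior: no strong belief
skeptical = posterior_mean(10, 10)  # strong prior belief that the rate is 0.5

print(mle, uniform, skeptical)  # 3/10, 1/3, 13/30
```

With only ten observations, the three estimates disagree noticeably; with ten thousand observations they would converge. That is the practical sense in which applying one school’s machinery where the other fits better, on the wrong kind or amount of data, can disappoint a knowledgeable user.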
The point is that if one buys a system built on one method and then applies it inappropriately, the knowledgeable user is going to be angry. It is possible that some disappointed users will take legal action, demand a license refund, or just hit the conference circuit and explain why such and such a system was a failure.
Will HP put the three ring circus of buying Autonomy to rest and then find itself mired in the jaws of a Bayesian versus frequentist dispute? My hunch is, “Yep.”
Could HP have convinced itself that Autonomy was a universal fix it kit for information processing problems? If the answer is, “Yes,” then HP is going to have to come to grips with licensees who are going to point out that the solution did not cure the problem.
In short, HP faces more excitement. The company will not be “idle” any time soon. HP may not be amused, but I am. Search is indeed a bit more difficult than some would have customers believe.
Stephen E Arnold, February 19, 2014
February 17, 2014
The article titled “Elasticsearch Debuts Marvel To Deploy And Monitor Its Open Source Search And Data Analytics Technology” on TechCrunch provides insight into Marvel, which the article calls a “deployment management and monitoring solution.” Elasticsearch is a technology for extracting information from structured and unstructured data, and its users include such big names as Netflix, Verizon, and Facebook. The article explains how Marvel will work to manage Elasticsearch:
“Enter Marvel, Elasticsearch’s first commercial offering, that makes it easy to run search, monitor performance, get visual views in real time and take action to fix things and improve performance. Marvel allows Elasticsearch system operators, who manage the technology at companies like Foursquare, see their Elasticsearch deployments in action, initiate instant checkup, and access historical data in context. Potential systems issues can be spotted and resolved before they become problems, and troubleshooting is faster. Pricing starts at $500 per five nodes.”
Elasticsearch reported revenue growth of over 400% in 2013, and Marvel will only further its popularity. Already a user-friendly and lightweight technology, Elasticsearch is targeting developers interested in real-time visibility into their data. Marvel may be great news for Elasticsearch and its users, but it is certainly bad news for competitor Lucid Imagination.
Chelsea Kerwin, February 17, 2014