November 6, 2014
Here’s an interesting development from the world of text-processing technology. GeekWire reports, “Microsoft and Amazon Vets Form Textio, a New Startup Looking to Discover Patterns in Documents.” The new company expects to release its first product next spring. Writer John Cook tells us:
“Kieran Snyder, a linguistics expert who previously worked at Amazon and Microsoft’s Bing unit, and Jensen Harris, who spent 16 years at Microsoft, including stints running the user experience team for Windows 8, have formed a new data visualization startup by the name of Textio.
“The Seattle company’s tagline: ‘Turn business text into insights.’ The emergence of the startup was first reported by Re/code, which noted that the Textio tool could be used by companies to scour job descriptions, performance reviews and other corporate HR documents to uncover unintended discrimination. In fact, Textio was formed after Snyder conducted research on gender bias in performance reviews in the tech industry.”
That is an interesting origin, especially amid the discussions about gender that currently suffuse the tech community. Textio sees much room for improvement in text analytics, and hopes to help clients reach insights beyond those competing platforms can divine. CEO Snyder’s doctorate and experience in linguistics and cognitive science should give the young company an edge in the competitive field.
Cynthia Murrell, November 06, 2014
October 31, 2014
I received a notice about a new conference called “The First International Conference on Predictive APIs and Apps.” According to the write up I saw:
Several companies who are building predictive APIs and tools to make predictive app development easier will be at PAPIs (BigML, Datagami, Dataiku, Indico, Intuitics, GraphLab, Openscoring, PredictionIO, RapidMiner, Yhat). We’re expecting to see both actual and potential users who will share and learn how to use these products. Newcomers will learn and get inspiration from the keynotes, showcases and practical “predictive for all” user stories. Experts will also be interested in the sessions on technical challenges and in the panel discussion on the future of predictive APIs.
A number of search and content processing vendors suggest they deliver advanced analytics. Text analytics vendors are either feeding data into predictive engines or delivering outputs that are predictive.
Are predictive analytics one of the next big things? If so, traditional information retrieval and content processing companies are likely to be attending this conference on November 17 and 18, 2014.
At this time, IBM and Microsoft are on the program.
IBM will be addressing “intelligent APIs.” In the abstract for the talk, I did not see a reference to Watson. Microsoft’s talk abstract is not on the program page as of October 30, 2014.
Worth attending if you are in the Barcelona area.
Stephen E Arnold, October 31, 2014
October 31, 2014
The article on Fortune titled “The Company Was In a Death Spiral. She Brought It Back From the Brink” lauds the work of Penny Herscher at data analytics firm FirstRain. Herscher took over the company in 2004 after successful work at Cadence Design Systems, Simplex and Texas Instruments. FirstRain was a bankrupt company with a great prototype but no product. Herscher embraced the challenges posed by FirstRain and began her overhaul with a move from New York to California. The article goes on,
“She raised $20 million from new investors and hired a trusted team, including chief operating officer Y.Y. Lee, a mathematician and software engineer… Today, more than 50% of FirstRain’s senior leadership is women. The fledgling company had barely started developing a product when storms began brewing on the horizon. It was 2008. The global economy was beginning to collapse. ‘The wheels came off the bus,’ Herscher says with lament. To survive, the company had to completely change course again…It pulled through.”
But only after major lay-offs and changes in the structure. Today FirstRain customers include IBM and Cisco, and it is only continuing to grow, with new offices in San Mateo. Herscher’s story of success is one of commitment and creative problem-solving.
Chelsea Kerwin, October 31, 2014
October 29, 2014
I read “72 Hours of #Gamergate.” I don’t follow the high buck world of video games. The write up contains oodles of data. Some of the information is in the form of bar charts. Other information is presented in words, spreadsheets, and graphics. I am okay with the bar charts. These have labels and numbers on the x and y axes. The visualization shown below baffles me:
The image adds graphic impact. I have been in briefings in which senior executives and military brass have presented similar visualizations.
I suppose clarity is less important than sizzle. Analytics vendors, are you listening? I think not if this graphic is any indication of the way data are presented.
Stephen E Arnold, October 29, 2014
October 28, 2014
I found the Attensity blog post “Attensity Takes Utah Tech Week” quite interesting. I cannot recall when mainstream content processing companies embraced hackathons so fiercely.
The blog post explains:
A hackathon, for the uninitiated, is exactly what it sounds like: a hybrid of computer hacking and a marathon in a grueling, caffeine-fueled, 12-hour time period. Groups comprised of mostly engineers and IT whizzes compete against the clock and other teams to create a project to present at the end of the day to a panel of judges.
What did Attensity’s engineers build to showcase the company’s sentiment analysis and analytics technologies? Here’s the Attensity description:
With the Twitter API up and running, Team Attensity used Raspberry Pi to process tweets using #obama and #utahtechweek. Simultaneously, the team used Arduino to code sentiments from the tweets using a red light for negative sentiments, blue for positive sentiments, and yellow for neutral sentiments.
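The color coding in that description can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the score range of -1 to 1, the neutral band of 0.1, the function name, and the sample scores are all hypothetical, and nothing here reflects Attensity's actual code, the Twitter API, or Arduino wiring.

```python
def sentiment_to_color(score, neutral_band=0.1):
    """Map a sentiment score in [-1, 1] to an LED color name.

    The threshold is an assumption for illustration: scores within
    +/- neutral_band of zero are treated as neutral.
    """
    if score > neutral_band:
        return "blue"    # positive sentiment
    if score < -neutral_band:
        return "red"     # negative sentiment
    return "yellow"      # neutral sentiment


# Hypothetical scores standing in for the output of a sentiment engine.
scores = {"tweet_a": 0.8, "tweet_b": -0.45, "tweet_c": 0.02}
colors = {name: sentiment_to_color(value) for name, value in scores.items()}

for name, color in colors.items():
    print(name, color)
```

A real build would replace the dictionary of scores with live output from a sentiment engine and send each color name to the hardware driving the lights.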
Attensity was pleased with the outcome in Utah. More hackathons are in the firm’s future. I wonder if one can deploy IBM Watson using a Raspberry Pi or showcase HP Autonomy with an Arduino.
How will hackathons generate revenue? I am not sure. The effort seems like a cost hole to me.
Stephen E Arnold, October 28, 2014
October 28, 2014
I learned about a new book that will be available in early 2015. Its title is The Black Box Society: The Secret Algorithms That Control Money and Information. The author is Frank Pasquale, a professor of law at the University of Maryland.
The Harvard promotional Web site for the book asserts:
Hidden algorithms can make (or ruin) reputations, decide the destiny of entrepreneurs, or even devastate an entire economy. Shrouded in secrecy and complexity, decisions at major Silicon Valley and Wall Street firms were long assumed to be neutral and technical. But leaks, whistleblowers, and legal disputes have shed new light on automated judgment. Self-serving and reckless behavior is surprisingly common, and easy to hide in code protected by legal and real secrecy. Even after billions of dollars of fines have been levied, underfunded regulators may have only scratched the surface of this troubling behavior.
The Institute for Ethics and Emerging Technologies mentioned the forthcoming book here. One of the comments about that post was interesting to me. TooManyJoes wrote:
The control of the results by the decision makers is what makes this future menacing. Right now, Google is under attack being too good at search prediction and making money on targeted advertisements whose brilliantly written algorithms allow such a sophisticated variety of information to be indexed. As a result search bubbles have formed, and a lack of statistics comprehension prevents the awareness of control over this new medium. Snake oil salesmen turned into Mad Men and psychiatrists, it’s the medium of internet based controlled by one snake oil salesman that frightens us all. I believe it’s not possible without a formal computational human algorithm to have enough of an impact to have widespread influence. I bring up these mediums because to engage in them is to participate, participation can be tracked, then imagine the expense of the things we have access to because free participation drives those products and services by up selling those products. Without education, which most people won’t be open to, and time for the common man to analyze the data…those in control of the data will be people delegated by others. Welcome to the age of transparency.
The Google reference may presage some discussion of the company’s predictive wizardry.
Stephen E Arnold, October 28, 2014
October 24, 2014
Here in Harrod’s Creek, Kentucky there is not too much chatter about machine learning. It is hunting season. Time to get out the Barrett Automatic Rifle and go hunting for varmints.
At sundown yesterday, when calm returned to the hollow, I read “Machine-Learning Maestro Michael Jordan on the Delusions of Big Data and Other Huge Engineering Efforts.”
My thought after reading the IEEE article was that I was really tired of the artificial intelligence yap yap. Now a whiz at UC Berkeley is pointing out that some of the methods are a “cartoon.”
Dr. Michael Jordan says:
I think data analysis can deliver inferences at certain levels of quality. But we have to be clear about what levels of quality. We have to have error bars around all our predictions. That is something that’s missing in much of the current machine learning literature. … if people use data and inferences they can make with the data without any concern about error bars, about heterogeneity, about noisy data, about the sampling pattern, about all the kinds of things that you have to be serious about if you’re an engineer and a statistician, then you will make lots of predictions, and there’s a good chance that you will occasionally solve some real interesting problems. But you will occasionally have some disastrously bad decisions. And you won’t know the difference a priori. You will just produce these outputs and hope for the best. And so that’s where we are currently.
In short, marketing hyperbole takes precedence over the plodding realities of the steps a person aspiring to a PhD in statistics is supposed to follow.
With regard to the applications that deliver predictive outputs, Dr. Jordan says:
But unless you’re actually doing the full-scale engineering statistical analysis to provide some error bars and quantify the errors, it’s gambling. It’s better than just gambling without data. That’s pure roulette. This is kind of partial roulette.
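Dr. Jordan's point about error bars can be made concrete with a small sketch. The Python below uses a percentile bootstrap, which is one common way to put an interval around an estimate. It is not a method taken from the interview. The function name, the sample values, and the 95 percent level are assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_interval(samples, n_resamples=2000, level=0.95, seed=0):
    """Return (point_estimate, low, high) for the mean of `samples`.

    Percentile bootstrap: resample with replacement, recompute the mean
    each time, and read the interval off the sorted resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(samples)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n))
        for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    low_idx = round(tail * n_resamples)
    high_idx = round((1 - tail) * n_resamples) - 1
    return statistics.fmean(samples), means[low_idx], means[high_idx]


# Hypothetical measurements standing in for a model's observed outcomes.
observed = [12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 10.9, 11.7]
point, low, high = bootstrap_interval(observed)
print(f"estimate {point:.2f}, interval [{low:.2f}, {high:.2f}]")
```

Reporting the interval alongside the estimate is the difference between the "partial roulette" Dr. Jordan describes and a prediction whose uncertainty is at least stated.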
I strongly recommend you read the interview. I would not involve a search or content processing marketer in the exercise, however.
Stephen E Arnold, October 24, 2014
October 21, 2014
Navigate to Thinglink. At this location is an example of the type of graphic that can be generated with output from Watson, IBM’s next big thing. A graphic artist has taken the data and created an eye-snapping infographic. How many other systems can generate this type of output? Quite a few if the information in my analytics files is representative. Is it necessary to use IBM Watson when Microsoft Excel and a visualization tool like Tableau are available? IBM Watson analyzed 135 million tweets from 10 countries in Central and South America. Brazil was excluded.
Twitter said in 2013:
Brazil is one of our largest markets with a strong user base. Twitter has already become an important part of our lives in Brazil and, by strengthening our local presence, we plan to continue delighting our users as well as creating new opportunities for marketers who want to connect with them.
Perhaps I overlooked Brazil. No big deal.
Stephen E Arnold, October 22, 2014
October 21, 2014
Curious about Hewlett Packard’s Autonomy APIs? You can see the list of 33 at IdolOnDemand.com. If you are curious about Autonomy’s Big Data capabilities, you may be puzzled about the lack of explicit analytics application programming interfaces. Don’t be. The savvy developer selects operations, takes outputs, and pumps the data into a search based application, third party number crunching system, a data management system, or plain old Excel. What’s interesting is that the naming of the APIs makes clear the search-centric nature of Autonomy. The marketing of IDOL as a service or a cloud solution shifts attention away from search in my view.
Stephen E Arnold, October 21, 2014
October 13, 2014
How much data are available for teen demographics, popular music sales by genre and medium, downloads from iTunes and Amazon, the music trade associations, and myriad other sources? If there is one industry with data, lots of data, isn’t it the music business?
I read “No One Knows How Teens Listen to Music.” The information is surprising. I thought we lived in the world of Big Data. With flashy algorithms and lots of zeros and ones, the secrets of the universe are exposed. Business strategists and entrepreneurs would flourish. The world would be a better place. Isn’t that what Big Data marketers suggest?
Here’s a passage I noted:
Fast forward to 2014. Nielsen’s recent analysis of the music industry at large showed a six-percent decrease in digital music sales and a 32-percent increase in overall streaming. According to the company, these changes were largely… because of teens. As Martin Pyykkonnen, an analyst at Wedge Partners, told Yahoo last year, “Young people today don’t buy music anymore.” Except maybe they do, according to the Piper Jaffray report. Or maybe they don’t buy MP3s but do download them. Or maybe they don’t download them but do listen to them.
So lots of data about music and teens. We learn, “All the major surveys disagree. Maybe it’s a secret.”
Yep, Big Data delivers. Oh, how about those Ebola predictions?
Stephen E Arnold, October 14, 2014