How Data Science Pervades
May 2, 2017
We think Information Management may be overstating a bit with the headline, “Data Science Underlies Everything the Enterprise Now Does.” While perhaps not underpinning quite “everything,” the use of data analysis has indeed spread throughout many companies (especially larger ones).
Writer Michael O’Connell cites a few key developments over the last year alone, including the rise of representative data, a wider adoption of predictive analysis, and the refinement of customer analytics. He predicts even more changes in the coming year, then uses a hypothetical telecom company for a series of examples. He concludes:
You’ll note that this model represents a significant broadening beyond traditional big data/analytics functions. Such task alignment and comprehensive integration of analytics functions into specific business operations enable high-value digital applications ranging far beyond our sample Telco’s churn mitigation — cross-selling, predictive and condition-based maintenance, fraud detection, price optimization, and logistics management are just a few areas where data science is making a huge difference to the bottom line.
See the article for more on the process of turning data into action, as illustrated with the tale of that imaginary telecom’s data-wrangling adventure.
Cynthia Murrell, May 2, 2017
Enterprise Search and a Chimera: Analytical Engines
May 1, 2017
I put on my steam punk outfit before reading “Leading Analytical Engines for Enterprise Search.” Now there was one small factual error; specifically, the Google Search Appliance is a goner. When it was alive and tended to by authorized partners, it was not particularly adept at doing “analytical engine” type things.
What about the rest of the article? Well, I found it amusing.
Let me get to the good stuff and then deal with the nasty reality which confronts the folks who continue to pump money into enterprise search.
What companies does this “real journalism” outfit identify as purveyors of high top shoes for search? Yikes, sorry. I meant to say enterprise search systems which do analytical engine things.
Here’s the line up:
The Google Search Appliance. As noted, this is a goner. Yep, the Google threw in the towel. Lots of reasons, but my sources say, cost of sales was a consideration. Oh, and there were a couple of Google FTEs plus assorted costs for dealing with those annoyed with the product’s performance, relevance, customization, etc. Anyway. Museum piece.
Microsoft SharePoint. I find this a side splitter. Microsoft SharePoint is many things. In fact, armed with Visual Studio one can actually make the system work in a useful manner. Don’t tell the HR folks who wonder why certified SharePoint experts chew up a chunk of the budget and “fast.” Insider joke. Yeah, Excel is the go to analysis tool no matter what others may say. The challenge is to get the Excel thing to interact in a speedy, useful way with whatever the SharePoint administrator has managed to get working in a reliable way. Nuff said.
Coveo. Interesting addition to the list because Coveo is doing the free search thing, the Salesforce thing, the enterprise search thing, the customer support thing, and I think a bunch of other things. The Canadian outfit wants to do more than surf on government inducements, investors’ trust and money, and a key word based system. So it’s analytical engine time. I am not sure how the wrappers required to make key word search do analytics help out performance, but the company says it is an “analytical engine.” So be it.
Attivio. This is an interesting addition. The company emerged from some “fast” movers and shakers. The baseball data demo was nifty about six years ago. Now the company does search, publishing, analytics, etc. The shift from search to analytical engine is somewhat credible. The challenge the company faces is closing deals and generating sustainable revenue. There is that thing called “open source”. A clever programmer can integrate Lucene (Elasticsearch), use its open source components, and maybe dabble with Ikanow. The result? Perhaps an Attivio killer? Who knows.
Lucidworks (Really?). Yep, this is the Avis to the Hertz in the open source commercial sector. Lucidworks (Really?) is now just about everything sort of associated with Big Data, search, smart software, etc. The clear Lucid problem is Shay Banon and Elastic. Not only does Elastic have more venture money, Elastic has more deployments and, based on information available to me, more revenue, partners, and clout in the open source world. Lucidworks (Really?) has a track record of executive and founder turnover and the thrill of watching Amazon benefit from a former Lucid employee’s inputs. Exciting. Really?
So what do I think of this article in CIO Review? Two things:
- It is not too helpful to me and those looking for search solutions in Harrod’s Creek, Kentucky. The reason? The GSA error and gasping effort to make key word search into something hot and cool. “Analytical engines” does not rev my motor. In fact, it does not turn over.
- CIO Review does not want me to copy a quote from the write up. Tip to CIO Review: Anyone can copy the wildly crazy analytical engines article by viewing source and copying the somewhat uninteresting content.
Stephen E Arnold, May 1, 2017
Amazon Aims to Ace the Chatbots
April 26, 2017
Amazon aims to insert itself into every aspect of daily life, and the newest way it does so is with the digital assistant Alexa. Reuters reports that, “Amazon Rolls Out Chatbot Tools In Race To Dominate Voice-Powered Tech,” explaining how Amazon plans to expand Alexa’s development. The retail giant recently released the technology behind Alexa to developers, so they can build chat features into apps.
Amazon is eager to gain dominance in voice-controlled technology. Apple and Google both reign supreme when it comes to talking computers, chatbots, and natural language processing. Amazon has a huge reach, perhaps even greater than Apple and Google, because people have come to rely on it for shopping. Chatbots have a notorious history for being useless and Microsoft’s Tay even turned into a racist, chauvinist program.
The new Alexa development tool is called Amazon Lex, which is hosted in the cloud. Alexa is already deployed in millions of homes and it is fed a continuous data stream that is crucial to the AI’s learning:
Processing vast quantities of data is key to artificial intelligence, which lets voice assistants decode speech. Amazon will take the text and recordings people send to apps to train Lex – as well as Alexa – to understand more queries.
That could help Amazon catch up in data collection. As popular as Amazon’s Alexa-powered devices are, such as Echo speakers, the company has sold an estimated 10 million or more.
Amazon Alexa is a competent digital assistant, able to respond to vocal commands and even offer voice-only shopping via Amazon. As noted, Alexa’s power rests in its data collection and ability to learn natural language processing. Bitext uses a similar method but instead uses trained linguists to build its analytics platform.
Whitney Grace, April 26, 2017
Smart Software, Dumb Biases
April 17, 2017
Math is objective, right? Not really. Developers of artificial intelligence systems, what I call smart software, rely on what they learned in math school. If you have flipped through math books ranging from the Googler’s tome on artificial intelligence, Artificial Intelligence: A Modern Approach, to the musings of the ACM’s journals, you see the same methods recycled. Sure, the algorithms are given a bath and their whiskers are cropped. But underneath that show dog’s sleek appearance is a familiar pooch. K-means. We have k-means. Decision trees? Yep, decision trees.
What happens when developers feed content into Rube Goldberg machines constructed of mathematical procedures known and loved by math wonks the world over?
The answer appears in “Semantics Derived Automatically from Language Corpora Contain Human Like Biases.” The headline says it clearly, “Smart software becomes as wild and crazy as a group of Kentucky politicos arguing in a bar on Friday night at 2:15 am.”
Biases are expressed and made manifest.
The article in Science reports, with considerable surprise it seems to me:
word embeddings encode not only stereotyped biases but also other knowledge, such as the visceral pleasantness of flowers or the gender distribution of occupations.
Ah, ha. Smart software learns biases. Perhaps “smart” correlates with bias?
The canny whiz kids who did the research crawfish a bit:
We stress that we replicated every association documented via the IAT that we tested. The number, variety, and substantive importance of our results raise the possibility that all implicit human biases are reflected in the statistical properties of language. Further research is needed to test this hypothesis and to compare language with other modalities, especially the visual, to see if they have similarly strong explanatory power.
Yep, nothing like further research to prove that when humans build smart software, “magic” happens. The algorithms manifest biases.
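For readers who want to see how an association can hide inside word vectors, here is a minimal sketch in Python. It is not the researchers’ code. The three-dimensional vectors are made up for illustration, and the `association` helper is a simplified stand-in for the kind of embedding association measure the Science article describes: the mean cosine similarity of a word to one attribute set minus its mean similarity to another.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    mean_a = sum(cosine(word_vec, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(word_vec, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


# Made-up toy vectors (an assumption for illustration, not real embeddings).
toy = {
    "flower": [0.9, 0.1, 0.0],
    "insect": [0.1, 0.9, 0.0],
    "lovely": [0.8, 0.2, 0.1],
    "nasty": [0.2, 0.8, 0.1],
}

pleasant = [toy["lovely"]]
unpleasant = [toy["nasty"]]

flower_score = association(toy["flower"], pleasant, unpleasant)
insect_score = association(toy["insect"], pleasant, unpleasant)

print(f"flower leans pleasant: {flower_score > 0}")
print(f"insect leans pleasant: {insect_score > 0}")
```

With real embeddings trained on a large corpus, the same arithmetic is applied to vectors the software learned from human text, which is where the human-like associations come from.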
What the write up did not address is a method for developing less biased smart software. Is such a method beyond the ken of computer scientists?
To get more information about this question, I asked one of the world leaders in the field of computational linguistics, Dr. Antonio Valderrabanos, the founder and chief executive officer at Bitext. Dr. Valderrabanos told me:
We use syntactic relations among words instead of using n-grams and similar statistical artifacts, which don’t understand word relations. Bitext’s Deep Linguistics Analysis platform can provide phrases or meaningful relationships to uncover more textured relationships. Our analysis will provide better content to artificial intelligence systems using corpuses of text to learn.
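The contrast between n-grams and syntactic relations can be sketched in a few lines of Python. This is not Bitext’s implementation. The example sentence and the (head, relation, dependent) triples are hand-written assumptions standing in for what a parser might produce. The point is only that adjacent-word pairs miss a subject and verb separated by other words.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the battery that I bought yesterday died".split()
bigrams = ngrams(tokens, 2)

# Hand-annotated (head, relation, dependent) triples. These are hypothetical
# parser output written by hand for this sentence, not produced by any tool.
relations = [
    ("died", "subject", "battery"),
    ("bought", "object", "battery"),
    ("bought", "subject", "I"),
]

# Adjacent-word pairs never link "battery" to "died".
pair_in_bigrams = ("battery", "died") in bigrams

# The relation triples link them directly.
pair_in_relations = any(
    {head, dep} == {"battery", "died"} for head, _, dep in relations
)

print(f"bigrams link battery and died: {pair_in_bigrams}")
print(f"relations link battery and died: {pair_in_relations}")
```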
Bitext’s approach is explained in the exclusive interview which appeared in Search Wizards Speak on April 11, 2017. You can read the full text of the interview at this link and review the public information about the breakthrough DLA platform at www.bitext.com.
It seems to me that Bitext has made linguistics the operating system for artificial intelligence.
Stephen E Arnold, April 17, 2017
Bitext: Exclusive Interview with Antonio Valderrabanos
April 11, 2017
On a recent trip to Madrid, Spain, I was able to arrange an interview with Dr. Antonio Valderrabanos, the founder and CEO of Bitext. The company has its primary research and development group in Las Rozas, the high-technology complex a short distance from central Madrid. The company has an office in San Francisco and a number of computational linguists and computer scientists in other locations. Dr. Valderrabanos worked at IBM in an adjacent field before moving to Novell and then making the jump to his own start up. The hard work required to invent a fundamentally new way to make sense of human utterance is now beginning to pay off.
Dr. Antonio Valderrabanos, founder and CEO of Bitext. Bitext’s business is growing rapidly. The company’s breakthroughs in deep linguistic analysis solve many difficult problems in text analysis.
Founded in 2008, the firm specializes in deep linguistic analysis. The systems and methods invented and refined by Bitext improve the accuracy of a wide range of content processing and text analytics systems. What’s remarkable about the Bitext breakthroughs is that the company supports more than 40 different languages, and its platform can support additional languages with sharp reductions in the time, cost, and effort required by old-school systems. With the proliferation of intelligent software, Bitext, in my opinion, puts the digital brains in overdrive. Bitext’s platform improves the accuracy of many smart software applications, ranging from customer support to business intelligence.
In our wide ranging discussion, Dr. Valderrabanos made a number of insightful comments. Let me highlight three and urge you to read the full text of the interview at this link. (Note: this interview is part of the Search Wizards Speak series.)
Linguistics as an Operating System
One of Dr. Valderrabanos’ most startling observations addresses the future of operating systems for increasingly intelligent software and applications. He said:
Linguistic applications will form a new type of operating system. If we are correct in our thought that language understanding creates a new type of platform, it follows that innovators will build more new things on this foundation. That means that there is no endpoint, just more opportunities to realize new products and services.
Better Understanding Has Arrived
Some of the smart software I have tested is unable to understand what seem to be very basic instructions. The problem, in my opinion, is context. Most smart software struggles to figure out the knowledge cloud which embraces certain data. Dr. Valderrabanos observed:
Search is one thing. Understanding what human utterances mean is another. Bitext’s proprietary technology delivers understanding. Bitext has created an easy to scale and multilingual Deep Linguistic Analysis or DLA platform. Our technology reduces costs and increases user satisfaction in voice applications or customer service applications. I see it as a major breakthrough in the state of the art.
If he is right, the Bitext DLA platform may be one of the next big things in technology. The reason? As smart software becomes more widely adopted, the need to make sense of data and text in different languages becomes increasingly important. Bitext may be the digital differential that makes the smart applications run the way users expect them to.
Snap In Bitext DLA
Advanced technology like Bitext’s often comes with a hidden cost. The advanced system works well in a demonstration or a controlled environment. When that system has to be integrated into “as is” systems from other vendors or from a custom development project, difficulties can pile up. Dr. Valderrabanos asserted:
Bitext DLA provides parsing data for text enrichment for a wide range of languages, for informal and formal text and for different verticals to improve the accuracy of deep learning engines and reduce training times and data needs. Bitext works in this way with many other organizations’ systems.
When I asked him about integration, he said:
No problems. We snap in.
I am interested in Bitext’s technical methods. In the last year, he has signed deals with companies like Audi, Renault, a large mobile handset manufacturer, and an online information retrieval company.
When I thanked him for his time, he was quite polite. But he did say, “I have to get back to my desk. We have received several requests for proposals.”
Las Rozas looked quite a bit like Silicon Valley when I left the Bitext headquarters. Despite the thousands of miles separating Madrid from the US, interest in Bitext’s deep linguistic analysis is surging. Silicon Valley has its charms, and now it has a Bitext US office for what may be the fastest growing computational linguistics and text analysis system in the world. Worth watching this company, I think.
For more about Bitext, navigate to the firm’s Web site at www.bitext.com.
Stephen E Arnold, April 11, 2017
Diffeo Incorporates Meta Search Technology
March 24, 2017
Will search-and-discovery firm Diffeo’s recent acquisition give it the edge? Yahoo Finance shares, “Diffeo Acquires Meta Search and Launches New Offering.” Startup Meta Search developed a local computer and cloud search system that uses smart indexing to assign index terms and keep the terms consistent. Diffeo provides a range of advanced content processing services based on collaborative machine intelligence. The press release specifies:
Diffeo’s content discovery platform accelerates research analysts by applying text analytics and machine intelligence algorithms to users’ in-progress files, so that it can recommend content that fills in knowledge gaps — often before the user thinks of searching. Diffeo acts as a personal research assistant that scours both the user’s files and the Internet. The company describes its technology as collaborative machine intelligence.
Diffeo and Meta’s services complement each other. Meta provides unified search across the content on all of a user’s cloud platforms and devices. Diffeo’s Advanced Discovery Toolbox displays recommendations alongside in-progress documents to accelerate the work of research analysts by uncovering key connections.
Meta’s platform integrates cloud environments into a single keyword search interface, enabling users to search their files on all cloud drives, such as Dropbox, Google Drive, Slack and Evernote all at once. Meta also improves search quality by intelligently analyzing each document, determining the most important concepts, and automatically applying those concepts as ‘Smart Tags’ to the user’s documents.
This seems like a promising combination. Founded in 2012, Diffeo made Meta Search its first acquisition on January 10 of this year. The company is currently hiring. Meta Search, now called Diffeo Cloud Search, is based in Boston.
Cynthia Murrell, March 24, 2017
Big Data: The Crawfish Approach to Meaningful Information
March 21, 2017
Have you ever watched a crawfish (sometimes called a crawdad or a crayfish) get away from trouble? The freshwater crustaceans can go backwards. Members of the Astacidae can be found in parts of the south, so you will have to wander in a Georgia swamp to check out the creature’s behavior.
The point is that crawfish go backwards to protect themselves and achieve their tiny lobster like goals. Big time consultants also crawfish in order to sell more work and provide “enhanced” insight into a thorny business or technical problem other consultants have created.
To see this in action, navigate to “The Conundrum of Big Data.” A super consultant explains that Big Data is not exactly the home run, silver bullet, or magic potion some lesser consultants said Big Data would be. I learned:
Despite two decades of intensive IT investment in data [mining] applications, recent studies show that companies continue to have trouble identifying metrics that can predict and explain performance results and/or improve operations. Data mining, the process of identifying patterns and structures in the data, has clear potential to identify prescriptions for success but its wide implementation fails systematically. Companies tend to deploy ‘unsupervised-learning’ algorithms in pursuit of predictive metrics, but this automated [black box] approach results in linking multiple low-information metrics in theories that turn out to be improbably complex.
Big surprise. For folks who are not trained in the nuts and bolts of data analysis and semi fancy math, Big Data is a giant vacuum cleaner for money. The cash has to pay for “experts,” plumbing, software, and more humans. The outputs are often fuzzy wuzzy probabilities which more “wizards” interpret. Think of a Greek religious authority looking at the ancient equivalent of road kill.
The write up cites the fizzle that was Google Flu Trends. Cough. Cough. But even that sneeze could be fixed with artificial intelligence. Yep, when smart humans make mistakes, send in smart software. That will work.
In my opinion, the highlight of the write up was this passage:
When it comes to data, size isn’t everything because big data on their own cannot just solve the problem of ‘insight’ (i.e. inferring what is going on). The true enablers are the data-scientists and statisticians who have been obsessed for more than two centuries to understand the world through data and what traps lie in wait during this exercise. In the world of analytics (AaaS), it is agility (using science, investigative skills, appropriate technology), trust (to solve the client’s real business problems and build collateral), and ‘know-how’ (to extract intelligence hidden in the data) that are the prime ‘assets’ for competing, not the size of the data. Big data are certainly here but big insights have yet to arrive.
Yes. More consulting is needed to make those payoffs arrive. But first, hire more advisers. What could possibly go wrong? Cough. Sneeze. One goes forwards with Big Data by going backwards for more analysis.
Stephen E Arnold, March 21, 2017
Big Data Requires More Than STEM Skills
March 13, 2017
It will require training Canada’s youth in design and the arts, as well as STEM subjects, if that country is to excel in today’s big-data world. That is the advice of a trio of academic researchers in that country, Patricio Davila, Sara Diamond, and Steve Szigeti, who declare, “There’s No Big Data Without Intelligent Interface” at the Globe and Mail. The article begins by describing why data management is now a crucial part of success throughout society, then emphasizes that we need creative types to design intuitive user interfaces and effective analytics representations. The researchers explain:
Here’s the challenge: For humans, data are meaningless without curation, interpretation and representation. All the examples described above require elegant, meaningful and navigable sensory interfaces. Adjacent to the visual are emerging creative, applied and inclusive design practices in data “representation,” whether it’s data sculpture (such as 3-D printing, moulding and representation in all physical media of data), tangible computing (wearables or systems that manage data through tactile interfaces) or data sonification (yes, data can make beautiful music).
Infographics is the practice of displaying data, while data visualization or visual analytics refers to tools or systems that are interactive and allow users to upload their own data sets. In a world increasingly driven by data analysis, designers, digital media artists, and animators provide essential tools for users. These interpretive skills stand side by side with general literacy, numeracy, statistical analytics, computational skills and cognitive science.
We also learn about several specific projects undertaken by faculty members at OCAD University, where our three authors are involved in the school’s Visual Analytics Lab. For example, the iCity project addresses transportation network planning in cities, and the Care and Condition Monitor is a mobile app designed to help patients and their healthcare providers better work together in pursuit of treatment goals. The researchers conclude with an appeal to their nation’s colleges and universities to develop programs that incorporate data management, data numeracy, data analysis, and representational skills early and often. Good suggestion.
Cynthia Murrell, March 13, 2017
To Make Data Analytics Sort of Work: Attention to Detail
March 10, 2017
I read “The Much-Needed Business Facet for Modern Data Integration.” The write up presents some useful information. Not many of the “go fast and break things” crowd will relate to some of the ideas and suggestions, but I found the article refreshing.
What does one do to make modern data centric activities sort of work? The answers are ones that I have found many more youthful wizards often elect to ignore.
Here they are:
- Do data preparation. Yikes. Normalization of data. I have fielded this question in the past, “Who has time for that?” Answer: Too few, gentle reader. Too few.
- Profile the data. Another gasp. In my experience it is helpful to determine what data are actually germane to the goal. Think about the polls for the recent
- Create data libraries. Good idea. But it is much more fun to just recreate data sets. Very Zen like.
- Have rules which are now explained as “data governance.” The jargon does not change the need for editorial and data guidelines.
- Take a stab at data quality. This is another way of saying, “Clean up the data.” Even whiz bang modern systems are confused by differences like I.B.M and International Business Machines or numbers with decimal points in the incorrect place.
- Get colleagues in the game. This is a good idea, but in many organizations in which I have worked “team” is spelled “my bonus.”
Useful checklist. I fear that those who color unicorns will not like the dog work which accompanies implementing the ideas. That’s what makes search and content processing so darned interesting.
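The data quality item can be made concrete with a minimal Python sketch. Everything here is an assumption for illustration: the `CANONICAL` alias table, the sample names and prices, and the rule of thumb that a value ten times larger or smaller than the median may have a misplaced decimal point. It is a sketch of the dog work, not a production cleaning pipeline.

```python
import re
from statistics import median

# Hypothetical alias table mapping cleaned-up keys to one canonical name.
CANONICAL = {
    "ibm": "International Business Machines",
    "international business machines": "International Business Machines",
}


def normalize_name(raw):
    """Lowercase, drop punctuation, collapse spaces, then look up an alias."""
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    key = re.sub(r"\s+", " ", key).strip()
    return CANONICAL.get(key, raw.strip())


def flag_decimal_slips(values, factor=10.0):
    """Flag positive values at least `factor` times above or below the median."""
    mid = median(values)
    return [v for v in values if v >= mid * factor or v <= mid / factor]


names = ["I.B.M", "IBM", "International Business Machines", "Acme Corp"]
cleaned = [normalize_name(n) for n in names]

prices = [19.99, 21.50, 2050.0, 20.25, 1.98]
suspects = flag_decimal_slips(prices)

print(cleaned)
print(suspects)
```

The flagged values still need a human to decide whether they are errors, which is the point of the checklist.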
Stephen E Arnold, March 10, 2017
Intelligence Industry Becoming Privatized and Concentrated
March 10, 2017
Monopolies aren’t just for telecoms and zipper manufacturers. The Nation reveals a much scarier example in its article, “5 Corporations Now Dominate Our Privatized Intelligence Industry.” Reporter Tim Shorrock outlines the latest merger that brings us to this point, one between Pentagon & NSA contractor Leidos Holdings and a division of Lockheed Martin called Information Systems and Global Solutions. Shorrock writes:
The sheer size of the new entity makes Leidos one of the most powerful companies in the intelligence-contracting industry, which is worth about $50 billion today. According to a comprehensive study I’ve just completed on public and private employment in intelligence, Leidos is now the largest of five corporations that together employ nearly 80 percent of the private-sector employees contracted to work for US spy and surveillance agencies.
Yes, that’s 80 percent. For the first time since spy agencies began outsourcing their core analytic and operational work in the late 1990s, the bulk of the contracted work goes to a handful of companies: Leidos, Booz Allen Hamilton, CSRA, SAIC, and CACI International. This concentration of ‘pure plays’—a Wall Street term for companies that makes one product for a single market—marks a fundamental shift in an industry that was once a highly diverse mix of large military contractors, small and medium technology companies, and tiny ‘Beltway Bandits’ surrounding Washington, D.C.
I should mention that our beloved leader, Stephen E Arnold, used to work as a gopher for one of these five companies, Booz Allen Hamilton. Shorrock details the reasons such concentrated power is a problem in the intelligence industry, and shares the profile he has made on each company. He also elaborates on the methods he used to analyze the shadowy workforce they employ. (You’ll be unsurprised to learn it can be difficult to gather data on intelligence workers.) See the article for those details, and for Shorrock’s discussion of negligence by the media and by Congress on this matter. We can agree that most folks don’t seem to be aware of this trend, or of its potential repercussions.
Cynthia Murrell, March 10, 2017