Bitext: Exclusive Interview with Antonio Valderrabanos

April 11, 2017

On a recent trip to Madrid, Spain, I was able to arrange an interview with Dr. Antonio Valderrabanos, the founder and CEO of Bitext. The company has its primary research and development group in Las Rosas, the high-technology complex a short distance from central Madrid. The company also has an office in San Francisco and a number of computational linguists and computer scientists in other locations. Dr. Valderrabanos worked at IBM in an adjacent field before moving to Novell and then making the jump to his own startup. The hard work required to invent a fundamentally new way to make sense of human utterances is now beginning to pay off.

Dr. Antonio Valderrabanos, founder and CEO of Bitext. Bitext’s business is growing rapidly. The company’s breakthroughs in deep linguistic analysis solve many difficult problems in text analysis.

Founded in 2008, the firm specializes in deep linguistic analysis. The systems and methods invented and refined by Bitext improve the accuracy of a wide range of content processing and text analytics systems. What’s remarkable about the Bitext breakthroughs is that the company supports more than 40 different languages, and its platform can support additional languages with sharp reductions in the time, cost, and effort required by old-school systems. With the proliferation of intelligent software, Bitext, in my opinion, puts the digital brains in overdrive. Bitext’s platform improves the accuracy of many smart software applications, ranging from customer support to business intelligence.

In our wide-ranging discussion, Dr. Valderrabanos made a number of insightful comments. Let me highlight three and urge you to read the full text of the interview at this link. (Note: this interview is part of the Search Wizards Speak series.)

Linguistics as an Operating System

One of Dr. Valderrabanos’ most startling observations addresses the future of operating systems for increasingly intelligent software and applications. He said:

Linguistic applications will form a new type of operating system. If we are correct in our thought that language understanding creates a new type of platform, it follows that innovators will build more new things on this foundation. That means that there is no endpoint, just more opportunities to realize new products and services.

Better Understanding Has Arrived

Some of the smart software I have tested is unable to understand what seems to be very basic instructions. The problem, in my opinion, is context. Most smart software struggles to figure out the knowledge cloud which embraces certain data. Dr. Valderrabanos observed:

Search is one thing. Understanding what human utterances mean is another. Bitext’s proprietary technology delivers understanding. Bitext has created an easy to scale and multilingual Deep Linguistic Analysis or DLA platform. Our technology reduces costs and increases user satisfaction in voice applications or customer service applications. I see it as a major breakthrough in the state of the art.

If he is right, the Bitext DLA platform may be one of the next big things in technology. The reason? As smart software becomes more widely adopted, the need to make sense of data and text in different languages becomes increasingly important. Bitext may be the digital differential that makes the smart applications run the way users expect them to.

Snap In Bitext DLA

Advanced technology like Bitext’s often comes with a hidden cost. The advanced system works well in a demonstration or a controlled environment. When that system has to be integrated into “as is” systems from other vendors or from a custom development project, difficulties can pile up. Dr. Valderrabanos asserted:

Bitext DLA provides parsing data for text enrichment for a wide range of languages, for informal and formal text and for different verticals to improve the accuracy of deep learning engines and reduce training times and data needs. Bitext works in this way with many other organizations’ systems.
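The interview does not go into implementation detail, but the general pattern Dr. Valderrabanos describes, a linguistic enrichment step that feeds cleaner features to a downstream learning engine, can be sketched in a few lines. The endpoint URL, JSON field names, and function names below are hypothetical placeholders for illustration, not Bitext’s actual API.

```python
import requests

# Hypothetical deep-linguistic-analysis endpoint; a real integration would
# call the vendor's own parsing service instead of this placeholder URL.
DLA_ENDPOINT = "https://dla.example.com/parse"

def enrich(text: str, lang: str = "en") -> dict:
    """POST raw text to a (hypothetical) linguistic analysis service and
    return its parse: lemmas, part-of-speech tags, and entities."""
    resp = requests.post(DLA_ENDPOINT, json={"text": text, "language": lang}, timeout=10)
    resp.raise_for_status()
    return resp.json()

def training_features(text: str) -> list[str]:
    """Feed lemmas rather than raw tokens to the downstream model, the kind
    of text enrichment said to cut training time and data requirements."""
    return enrich(text).get("lemmas", [])
```

The point of the pattern is that the learning engine never sees raw, inflected, multilingual text; it sees normalized linguistic features.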

When I asked him about integration, he said:

No problems. We snap in.

I am interested in Bitext’s technical methods. In the last year, Dr. Valderrabanos has signed deals with companies like Audi, Renault, a large mobile handset manufacturer, and an online information retrieval company.

When I thanked him for his time, he was quite polite. But he did say, “I have to get back to my desk. We have received several requests for proposals.”

Las Rosas looked quite a bit like Silicon Valley when I left the Bitext headquarters. Despite the thousands of miles separating Madrid from the US, interest in Bitext’s deep linguistic analysis is surging. Silicon Valley has its charms, and now it has a Bitext US office for what may be the fastest-growing computational linguistics and text analysis system in the world. Worth watching this company, I think.

For more about Bitext, navigate to the firm’s Web site at www.bitext.com.

Stephen E Arnold, April 11, 2017

Diffeo Incorporates Meta Search Technology

March 24, 2017

Will search-and-discovery firm Diffeo’s recent acquisition give it the edge? Yahoo Finance shares, “Diffeo Acquires Meta Search and Launches New Offering.” Startup Meta Search developed a local computer and cloud search system that uses smart indexing to assign index terms and keep the terms consistent. Diffeo provides a range of advanced content processing services based on collaborative machine intelligence. The press release specifies:

Diffeo’s content discovery platform accelerates research analysts by applying text analytics and machine intelligence algorithms to users’ in-progress files, so that it can recommend content that fills in knowledge gaps — often before the user thinks of searching. Diffeo acts as a personal research assistant that scours both the user’s files and the Internet. The company describes its technology as collaborative machine intelligence.

Diffeo and Meta’s services complement each other. Meta provides unified search across the content on all of a user’s cloud platforms and devices. Diffeo’s Advanced Discovery Toolbox displays recommendations alongside in-progress documents to accelerate the work of research analysts by uncovering key connections.

Meta’s platform integrates cloud environments into a single keyword search interface, enabling users to search their files on all cloud drives, such as Dropbox, Google Drive, Slack and Evernote all at once. Meta also improves search quality by intelligently analyzing each document, determining the most important concepts, and automatically applying those concepts as ‘Smart Tags’ to the user’s documents.
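The press release does not say how those “most important concepts” are chosen. A common baseline for this sort of automatic tagging is TF-IDF keyword scoring; the sketch below illustrates only that generic idea and is not Meta’s or Diffeo’s actual algorithm.

```python
import math
import re
from collections import Counter

def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def smart_tags(doc: str, corpus: list[str], k: int = 5) -> list[str]:
    """Score each term in one document against a small reference corpus with
    TF-IDF and return the top-k terms as candidate tags (illustration only)."""
    doc_tokens = _tokens(doc)
    tf = Counter(doc_tokens)
    n_docs = len(corpus) + 1
    def idf(term: str) -> float:
        hits = sum(term in _tokens(other) for other in corpus) + 1
        return math.log(n_docs / hits)
    scores = {t: (c / len(doc_tokens)) * idf(t) for t, c in tf.items()}
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]

# Example: tag one note against a tiny collection of other notes.
notes = ["quarterly revenue forecast and budget", "team offsite travel plans"]
print(smart_tags("draft budget and revenue forecast for next quarter", notes))
```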

This seems like a promising combination. Founded in 2012, Diffeo made Meta Search its first acquisition on January 10 of this year. The company is currently hiring. Meta Search, now called Diffeo Cloud Search, is based in Boston.

Cynthia Murrell, March 24, 2017

Big Data: The Crawfish Approach to Meaningful Information

March 21, 2017

Have you ever watched a crawfish (sometimes called a crawdad or a crayfish) get away from trouble? The freshwater crustaceans can go backwards. Members of the Astacidae family can be found in parts of the South, so you will have to wander in a Georgia swamp to check out the creature’s behavior.

The point is that crawfish go backwards to protect themselves and achieve their tiny, lobster-like goals. Big-time consultants also crawfish in order to sell more work and provide “enhanced” insight into a thorny business or technical problem other consultants have created.

To see this in action, navigate to “The Conundrum of Big Data.” A super consultant explains that Big Data is not exactly the home run, silver bullet, or magic potion some lesser consultants said Big Data would be. I learned:

Despite two decades of intensive IT investment in data [mining] applications, recent studies show that companies continue to have trouble identifying metrics that can predict and explain performance results and/or improve operations. Data mining, the process of identifying patterns and structures in the data, has clear potential to identify prescriptions for success but its wide implementation fails systematically. Companies tend to deploy ‘unsupervised-learning’ algorithms in pursuit of predictive metrics, but this automated [black box] approach results in linking multiple low-information metrics in theories that turn out to be improbably complex.

Big surprise. For folks who are not trained in the nuts and bolts of data analysis and semi fancy math, Big Data is a giant vacuum cleaner for money. The cash has to pay for “experts,” plumbing, software, and more humans. The outputs are often fuzzy wuzzy probabilities which more “wizards” interpret. Think of a Greek religious authority looking at the ancient equivalent of road kill.

The write up cites the fizzle that was Google Flu Trends. Cough. Cough. But even that sneeze could be fixed with artificial intelligence. Yep, when smart humans make mistakes, send in smart software. That will work.

In my opinion, the highlight of the write up was this passage:

When it comes to data, size isn’t everything because big data on their own cannot just solve the problem of ‘insight’ (i.e. inferring what is going on). The true enablers are the data-scientists and statisticians who have been obsessed for more than two centuries to understand the world through data and what traps lie in wait during this exercise. In the world of analytics (AaaS), it is agility (using science, investigative skills, appropriate technology), trust (to solve the client’s real business problems and build collateral), and ‘know-how’ (to extract intelligence hidden in the data) that are the prime ‘assets’ for competing, not the size of the data. Big data are certainly here but big insights have yet to arrive.

Yes. More consulting is needed to make those payoffs arrive. But first, hire more advisers. What could possibly go wrong? Cough. Sneeze. One goes forwards with Big Data by going backwards for more analysis.

Stephen E Arnold, March 21, 2017

Big Data Requires More Than STEM Skills

March 13, 2017

It will require training Canada’s youth in design and the arts, as well as STEM subjects, if that country is to excel in today’s big-data world. That is the advice of a trio of academic researchers in that country, Patricio Davila, Sara Diamond, and Steve Szigeti, who declare, “There’s No Big Data Without Intelligent Interface” at the Globe and Mail. The article begins by describing why data management is now a crucial part of success throughout society, then emphasizes that we need creative types to design intuitive user interfaces and effective analytics representations. The researchers explain:

Here’s the challenge: For humans, data are meaningless without curation, interpretation and representation. All the examples described above require elegant, meaningful and navigable sensory interfaces. Adjacent to the visual are emerging creative, applied and inclusive design practices in data “representation,” whether it’s data sculpture (such as 3-D printing, moulding and representation in all physical media of data), tangible computing (wearables or systems that manage data through tactile interfaces) or data sonification (yes, data can make beautiful music).

Infographics is the practice of displaying data, while data visualization or visual analytics refers to tools or systems that are interactive and allow users to upload their own data sets. In a world increasingly driven by data analysis, designers, digital media artists, and animators provide essential tools for users. These interpretive skills stand side by side with general literacy, numeracy, statistical analytics, computational skills and cognitive science.

We also learn about several specific projects undertaken by faculty members at OCAD University, where our three authors are involved in the school’s Visual Analytics Lab. For example, the iCity project addresses transportation network planning in cities, and the Care and Condition Monitor is a mobile app designed to help patients and their healthcare providers better work together in pursuit of treatment goals. The researchers conclude with an appeal to their nation’s colleges and universities to develop programs that incorporate data management, data numeracy, data analysis, and representational skills early and often. Good suggestion.

Cynthia Murrell, March 13, 2017

To Make Data Analytics Sort of Work: Attention to Detail

March 10, 2017

I read “The Much-Needed Business Facet for Modern Data Integration.” The write up presents some useful information. Not many of the “go fast and break things” crowd will relate to some of the ideas and suggestions, but I found the article refreshing.

What does one do to make modern data-centric activities sort of work? The answers are ones that, I have found, many of the more youthful wizards elect to ignore.

Here they are:

  1. Do data preparation. Yikes. Normalization of data. I have fielded this question in the past: “Who has time for that?” Answer: Too few, gentle reader. Too few.
  2. Profile the data. Another gasp. In my experience it is helpful to determine what data are actually germane to the goal. Think about the polls for the recent election.
  3. Create data libraries. Good idea. But it is much more fun to just recreate data sets. Very Zen-like.
  4. Have rules, which are now explained as “data governance.” The jargon does not change the need for editorial and data guidelines.
  5. Take a stab at data quality. This is another way of saying, “Clean up the data.” Even whiz-bang modern systems are confused by differences like I.B.M and International Business Machines or numbers with decimal points in the incorrect place. (A minimal sketch of this kind of cleanup follows the list.)
  6. Get colleagues in the game. This is a good idea, but in many organizations in which I have worked “team” is spelled “my bonus.”
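As promised in point 5, here is a minimal sketch of the kind of cleanup involved: collapsing variant company names to a single canonical form before analysis. The alias table is hand-built and purely illustrative.

```python
# Purely illustrative alias table; in practice it becomes part of the
# editorial and data guidelines mentioned in point 4.
CANONICAL = {
    "ibm": "International Business Machines",
    "i.b.m": "International Business Machines",
    "i.b.m.": "International Business Machines",
    "international business machines": "International Business Machines",
}

def normalize_company(name: str) -> str:
    """Return the canonical company name, or the trimmed original if unknown."""
    return CANONICAL.get(name.strip().lower(), name.strip())

assert normalize_company(" I.B.M ") == "International Business Machines"
assert normalize_company("IBM") == "International Business Machines"
assert normalize_company("Bitext") == "Bitext"
```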

Useful checklist. I fear that those who color unicorns will not like the dog work which accompanies implementing the ideas. That’s what makes search and content processing so darned interesting.

Stephen E Arnold, March 10, 2017

Intelligence Industry Becoming Privatized and Concentrated

March 10, 2017

Monopolies aren’t just for telecoms and zipper manufacturers. The Nation reveals a much scarier example in its article, “5 Corporations Now Dominate Our Privatized Intelligence Industry.” Reporter Tim Shorrock outlines the latest merger that brings us to this point, one between Pentagon & NSA contractor Leidos Holdings and a division of Lockheed Martin called Information Systems and Global Solutions. Shorrock writes:

The sheer size of the new entity makes Leidos one of the most powerful companies in the intelligence-contracting industry, which is worth about $50 billion today. According to a comprehensive study I’ve just completed on public and private employment in intelligence, Leidos is now the largest of five corporations that together employ nearly 80 percent of the private-sector employees contracted to work for US spy and surveillance agencies.

Yes, that’s 80 percent. For the first time since spy agencies began outsourcing their core analytic and operational work in the late 1990s, the bulk of the contracted work goes to a handful of companies: Leidos, Booz Allen Hamilton, CSRA, SAIC, and CACI International. This concentration of ‘pure plays’—a Wall Street term for companies that makes one product for a single market—marks a fundamental shift in an industry that was once a highly diverse mix of large military contractors, small and medium technology companies, and tiny ‘Beltway Bandits’ surrounding Washington, D.C.

I should mention that our beloved leader, Stephen E Arnold, used to work as a gopher for one of these five companies, Booz Allen Hamilton. Shorrock details the reasons such concentrated power is a problem in the intelligence industry, and shares the profile he has made of each company. He also elaborates on the methods he used to analyze the shadowy workforce they employ. (You’ll be unsurprised to learn it can be difficult to gather data on intelligence workers.) See the article for those details, and for Shorrock’s discussion of negligence by the media and by Congress on this matter. We can agree that most folks don’t seem to be aware of this trend, or of its potential repercussions.

Cynthia Murrell, March 10, 2017

Cambridge Analytica: Buzz, Buzz, Buzz

March 9, 2017

The idea that software can make sense of information is a powerful one. Many companies tout the capabilities of their business processes, analytical tools, and staff to look at data and get a sense of the future. The vast majority of these firms have tools and methods which provide useful information.

What happens when a person who did not take a course in analytics learns about the strengths and limitations of these systems?

Answer: You get some excitement.

I read “Big Data’s Power Is Terrifying. That Could Be Good News for Democracy.” The main idea, that companies with nifty analytic systems and methods can control life, is magnetic. Lots of folks want to believe that a company’s analyses can have a significant impact on elections, public opinion, and maybe the stock market.

The write up asserts:

Online information already lends itself to manipulation and political abuse, and the age of big data has scarcely begun. In combination with advances in cognitive linguistics and neuroscience, this data could become a powerful tool for changing the electoral decisions we make. Our capacity to resist manipulation is limited.

My view is that one must not confuse the explanations from marketing mavens, alarmists, and those who want to believe that Star Trek is “real” with what today’s systems can do. Firms like Cambridge Analytica and others generate reports. In fact, companies have been using software to figure out what’s what for many years.

What’s interesting is that folks learn about these systems and pick up the worn ball and carry it down field while screaming, “Touchdown.”

Sorry. The systems don’t warrant that type of excitement. Reality is less exciting. Probabilities are useful, but they are not reality. Still, why not carry the ball? It is easier than learning what analytics firms actually do.

Stephen E Arnold, March 9, 2017

Voice Recognition Software Has Huge Market Reach

March 3, 2017

Voice recognition software still feels like a futuristic technology, despite its prevalence in our everyday lives. WhaTech explains how deeply voice recognition technology has embedded itself into our habits in “Listening To The Voice Recognition Market.”

The biggest example of speech recognition technology is the automated phone system. Automated phone systems are used across the board, especially in banks, retail chains, restaurants, and office phone directories. People usually despise automated phone systems because they cannot understand responses and tend to put people on hold for extended periods of time.

Despite how much we hate automated phone systems, they are useful, they have gotten better at understanding human speech, and the industry applications are endless:

The Global Voice Recognition Systems Sales Market 2017 report by Big Market Research is a comprehensive study of the global voice recognition market. It covers both current and future prospect scenarios, revealing the market’s expected growth rate based on historical data. For products, the report reveals the market’s sales volume, revenue, product price, market share and growth rate, each of which is segmented by artificial intelligence systems and non-artificial intelligence systems. For end-user applications, the report reveals the status for major applications, sales volume, market share and growth rate for each application, with common applications including healthcare, military and aerospace, communications, and automotive.

Key players in the voice recognition software field are Validsoft, Sensory, Biotrust ID, Voicevault, Voicebox Technologies, Lumenvox, M2SYS, Advanced Voice Recognition Systems, and Mmodal.  These companies would benefit from using Bitext’s linguistic-based analytics platform to enhance their technology’s language learning skills.

Whitney Grace, March 3, 2017

IBM and Root Access Misstep?

March 2, 2017

Maybe this is fake news? Maybe. Navigate to “Big Blue’s Big Blunder: IBM Accidentally Hands Over Root Access to Its Data Science Servers.” When I read the title, my first reaction was, “Hey, Yahoot is back in the security news.” Wrong.

According to the write up, which I assume to be exposing the “truth”:

IBM left private keys to the Docker host environment in its Data Science Experience service inside freely available containers. This potentially granted the cloud service’s users root access to the underlying container-hosting machines – and potentially to other machines in Big Blue’s Spark computing cluster. Effectively, Big Blue handed its cloud users the secrets needed to potentially commandeer and control its service’s computers.

IBM hopped to it. Two weeks after the stumble was discovered, IBM fixed the problem.
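The broader lesson is generic: any secret baked into an image ships to everyone who pulls that image. A rough, do-it-yourself check for this class of leak is to scan a container’s exported filesystem for private-key material. The sketch below is a generic illustration (the root path is a placeholder); it is not how the researcher found IBM’s keys.

```python
import os

# PEM private keys all contain this header fragment (RSA, EC, PKCS#8, ...).
KEY_MARKER = b"PRIVATE KEY-----"

def find_leaked_keys(root: str) -> list[str]:
    """Walk an exported container filesystem (e.g. from `docker export`)
    and list files that appear to contain PEM private keys."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    chunk = handle.read(64 * 1024)  # PEM headers sit near the top
            except OSError:
                continue  # skip unreadable files, broken symlinks, device nodes
            if KEY_MARKER in chunk:
                hits.append(path)
    return hits

if __name__ == "__main__":
    for path in find_leaked_keys("./container-rootfs"):  # placeholder path
        print("possible private key:", path)
```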

The write up includes this upbeat statement, attributed to the person who, using a demo account, exposed the glitch:

I think that IBM already has some amazing infosec people and a genuine commitment to protecting their services, and it’s a matter of instilling security culture and processes across their entire organization. That said, any company that has products allowing users to run untrusted code should think long and hard about their system architecture. This is not to imply that containers were poorly designed (because I don’t think they were), but more that they’re so new that best practices in their use are still being actively developed. Compare a newer-model table saw to one decades old: The new one comes stock with an abundance of safety features including emergency stopping, a riving knife, push sticks, etc, as a result of evolving culture and standards through time and understanding.

Bad news. Good news.

Let’s ask Watson about IBM security. Hold that thought, please. Watson is working on health care information. And don’t forget the March 2017 security conference sponsored by those security pros at IBM.

Stephen E Arnold, March 2, 2017

Finding Meaning in Snapchat Images, One Billion at a Time

February 27, 2017

The article on InfoQ titled “Amazon Introduces Rekognition for Image Analysis” explores the managed service aimed at the explosive image market. According to research cited in the article, over 1 billion photos are taken every single day on Snapchat alone, compared to the 80 billion total taken in the year 2000. Rekognition’s deep learning power is focused on identifying meaning in visual content. The article states:

The capabilities that Rekognition provides include Object and Scene detection, Facial Analysis, Face Comparison and Facial Recognition. While Amazon Rekognition is a new public service, it has a proven track record. Jeff Barr, chief evangelist at AWS, explains: Powered by deep learning and built by our Computer Vision team over the course of many years, this fully-managed service already analyzes billions of images daily. It has been trained on thousands of objects and scenes. Rekognition was designed from the get-go to run at scale.

The facial analysis features include markers for image quality, facial landmarks like facial hair and open eyes, and sentiment expressed (smiling = happy). The face comparison feature includes a similarity score that estimates the likelihood of two pictures being of the same person. Perhaps the most useful feature is object and scene detection, which Amazon believes will help users find specific moments by searching for certain objects. The use cases also span vacation rental markets and travel sites, which can now tag images with key terms for improved classifications.
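For readers curious about what the service looks like in practice, here is a minimal boto3 sketch of two of the capabilities mentioned above, label detection and face comparison. The bucket and object names are placeholders, and an AWS account with Rekognition access is assumed.

```python
import boto3

# Placeholders: substitute a real bucket and object keys from your account.
BUCKET = "my-photo-bucket"

rekognition = boto3.client("rekognition", region_name="us-east-1")

# Object and scene detection: which labels (e.g. "Beach", "Dog") appear in a photo?
labels = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": BUCKET, "Name": "vacation/beach.jpg"}},
    MaxLabels=10,
    MinConfidence=75,
)
for label in labels["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))

# Face comparison: how likely are two photos to show the same person?
match = rekognition.compare_faces(
    SourceImage={"S3Object": {"Bucket": BUCKET, "Name": "profiles/alice.jpg"}},
    TargetImage={"S3Object": {"Bucket": BUCKET, "Name": "events/party.jpg"}},
    SimilarityThreshold=80,
)
for face in match["FaceMatches"]:
    print("similarity:", round(face["Similarity"], 1))
```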

Chelsea Kerwin, February 27, 2017
