Turning to AI for Better Data Hygiene
December 28, 2017
Most big data is flawed in some way, because humans are imperfect beings. That is the premise behind ZDNet’s article, “The Great Data Science Hope: Machine Learning Can Cure Your Terrible Data Hygiene.” Editor-in-Chief Larry Dignan explains:
The reality is enterprises haven’t been creating data dictionaries, meta data and clean information for years. Sure, this data hygiene effort may have improved a bit, but let’s get real: Humans aren’t up for the job and never have been. ZDNet’s Andrew Brust put it succinctly: Humans aren’t meticulous enough. And without clean data, a data scientist can’t create algorithms or a model for analytics.
Luckily, technology vendors have a magic elixir to sell you…again. The latest concept is to create an abstraction layer that can manage your data, bring analytics to the masses and use machine learning to make predictions and create business value. And the grand setup for this analytics nirvana is to use machine learning to do all the work that enterprises have neglected.
I know you’ve heard this before. The last magic box was the data lake where you’d throw in all of your information–structured and unstructured–and then use a Hadoop cluster and a few other technologies to make sense of it all. Before big data, the data warehouse was going to give you insights and solve all your problems along with business intelligence and enterprise resource planning. But without data hygiene in the first place enterprises replicated a familiar, but failed strategy: Poop in. Poop out.
What the observation lacks in eloquence it makes up for in insight—the whole data-lake concept was flawed from the start since it did not give adequate attention to data preparation. Dignan cites IBM’s Watson Data Platform as an example of the new machine-learning-based cleanup tools, and points to other noteworthy vendors investigating similar ideas—Alation, Io-Tahoe, Cloudera, and HortonWorks. Which cleaning tool will perform best remains to be seen, but Dignan seems sure of one thing—the data that enterprises have been diligently collecting for the last several years is as dirty as a dustbin lid.
Cynthia Murrell, December 28, 2017
IBM Thinks It Can Crack Pharmaceutical Code with AI
December 20, 2017
Artificial intelligence has been tasked with solving every problem from famine to climate change to helping you pick a new favorite song. So, it should come as no surprise that IBM thinks it can revolutionize another industry with AI. We learned exactly which one from a Digital Trends story, “IBM’s New AI Predicts Chemical Reactions, Could Revolutionize Drug Development.”
According to the story,
As described in a new research paper, the A.I. chemist is able to predict chemical reactions in a way that could be incredibly important for fields like drug discovery. To do this, it uses a highly detailed data set of knowledge on 395,496 different reactions taken from thousands of research papers published over the years.
Teo Laino, one of the researchers on the project from IBM Research in Zurich, told Digital Trends that it is a great example of how A.I. can draw upon large quantities of knowledge that would be astonishingly difficult for a human to master — particularly when it needs to be updated all the time.
It’s an absolutely valid plan, but we aren’t sure if IBM is the one to really pull off this trick. IBM trying to work in big pharma seems kind of like your uncle tinkering on his “inventions” out in the shed. We’d rather see someone whose primary focus is AI and medicine, like Certara, PhinC, and Chem Abstracts.
Patrick Roland, December 20, 2017
IBM AI: Speeding Up One Thing, Ignoring a Slow Thing
December 12, 2017
I read “IBM Develops Preprocessing Block, Makes Machine Learning Faster Tenfold.” I read this statement and took out my trusty Big Blue marketing highlight felt tip:
“To the best of our knowledge, we are first to have generic solution with a 10x speedup. Specifically, for traditional, linear machine learning models — which are widely used for data sets that are too big for neural networks to train on — we have implemented the techniques on the best reference schemes and demonstrated a minimum of a 10x speedup.” [Emphasis added to make it easy to spot certain semantically rich verbiage.]
I like the traditional, linear, and demonstrated lingo.
From my vantage point, this is useful, but it is one modest component of a traditional, linear machine learning “model”.
The part which sucks up subject matter experts, time, and money (lots of money) includes these steps:
- Collecting domain-specific information, figuring out what’s important and what’s not, and figuring out how to match what a person or subsystem needs to know against this domain knowledge
- Collecting the information. Sure, this seems easy, but it can be a slippery fish for some domains. Tidy, traditional domains like a subset of technical information make it easier and cheaper to fiddle with word lists, synonym expansion “helpers”, and sources which are supposed to be accurate. Accuracy, of course, is a bit like mom’s apple pie.
- Converting the source information into a format which the content processing system can use without choking storage space with exceptions or engaging in computationally expensive conversions which have to be checked by software or humans before pushing the content to the content processing subsystem. (Some outfits fudge by limiting content types. The approach works in some eDiscovery systems because the information is in more predictable formats.) A minimal sketch of this conversion step follows the list.
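To make that third step concrete, here is a minimal sketch in Python of what the conversion chore looks like: normalize mixed source documents into one record format and set aside the exceptions for a human to check. The format names and handlers are made up for illustration; this is not IBM’s pipeline or anyone else’s production code.

```python
# Hypothetical sketch of the conversion step: normalize mixed source
# documents into one plain-text record format and route anything the
# handlers cannot process to an exceptions queue for human review.
import json

def normalize(doc):
    """Return a {'id', 'text'} record, or None if the type is unsupported."""
    handlers = {
        "txt":  lambda d: d["body"],
        "json": lambda d: json.dumps(d["body"], ensure_ascii=False),
    }
    handler = handlers.get(doc.get("type"))
    return None if handler is None else {"id": doc["id"], "text": handler(doc)}

docs = [
    {"id": 1, "type": "txt",  "body": "Quarterly sales report ..."},
    {"id": 2, "type": "json", "body": {"title": "Service manual", "rev": 3}},
    {"id": 3, "type": "tiff", "body": b"II*"},  # unsupported: needs a human
]

ready, exceptions = [], []
for doc in docs:
    record = normalize(doc)
    if record is None:
        exceptions.append(doc)
    else:
        ready.append(record)

print(len(ready), "ready for content processing;", len(exceptions), "exceptions")
```

Even a toy like this hints at where the money goes: every new content type means another handler, and every exception means a person or another program has to look at it.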
What is the time and money relationship of dealing with these three steps versus the speed up for the traditional machine learning models? In my experience, the cost of the three steps identified above is often greater than the cost of the downstream processes. So a 10x speed up in a single process is helpful, but it won’t pay for pizza for the development team.
Just my view from Harrod’s Creek, which sees things in a way which is different from IBM marketing and IBM Zurich wizards. Shoot those squirrels before eating them, you hear.
Stephen E Arnold, December 12, 2017
Microsoft Bing Has the Last AI Laugh
December 1, 2017
Nobody likes Bing, but because it is a Microsoft product it continues to endure. It chugs along as the second most-used search engine in the US, but apparently first is the worst and second is the best for creating a database of useful information for AI. India News 24 argues in “Microsoft Bing: The Redmond Giant’s Overlooked Tool” that the search engine is worth far more than commonly thought.
Every day, millions of people use Bing, inputting search queries as basic keywords, questions, and even images. In order to test an AI algorithm, huge datasets are needed so the algorithm can learn and discover patterns. Bing is the key to creating those necessary datasets. You also might be using Bing without knowing it, as it powers Yahoo search and is also on Amazon tablets.
All of this has helped Microsoft better understand language, images and text at a large scale, said Steve Clayton, who as Microsoft’s chief storyteller helps communicate the company’s AI strategy. It is amazing how Bing serves a dual purpose:
Bing serves dual purposes, he said, as a source of data to train artificial intelligence and a vehicle to be able to deliver smarter services. While Google also has the advantage of a powerful search engine, other companies making big investments in the AI race – such as IBM or Amazon – do not.
Amazon has access to search queries centered on e-commerce, but it lacks data on everything that is not available in one of its warehouses. This is where Bing comes in. Bing feeding Microsoft’s AI projects has yet to turn a profit, but AI is still a new market and new projects are always being worked on.
Whitney Grace, December 1, 2017
Watson Works with AMA, Cerner to Create Health Data Model
December 1, 2017
We see IBM Watson is doing the partner thing again, this time with the American Medical Association (AMA). I guess they were not satisfied with blockchain applications and the i2 line of business after all. Forbes reports, “AMA Partners With IBM Watson, Cerner on Health Data Model.” Contributor Bruce Japsen cites James Madara of the AMA when he reports that though the organization has been collecting a lot of valuable clinical data, it has not yet been able to make the most of it. Of the new project, we learn:
The AMA’s ‘Integrated Health Model Initiative’ is designed to create a ‘shared framework for organizing health data, emphasizing patient-centric information and refining data elements to those most predictive of achieving better outcomes.’ Those already involved in the effort include IBM, Cerner, Intermountain Healthcare, the American Heart Association, the American Academy of Family Physicians and the American Medical Informatics Association. The initiative is open to all healthcare and information stakeholders and there are no licensing fees for participants or potential users of what is eventually created. Madara described the AMA’s role as being like that of Switzerland: working to tell companies like Cerner and IBM what data elements are important and encouraging best practices, particularly when patient care and clinical information is involved. The AMA, for example, would provide ‘clinical validation review to make sure there is an evidence base under it because we don’t want junk,’ Madara said.
IBM and Cerner each have their own healthcare platforms, of course, but each is happy to work with the AMA. Japsen notes that as the healthcare industry shifts from the fee-for-service approach to value-based pricing models, accurate and complete information becomes more crucial than ever.
Cynthia Murrell, December 1, 2017
IBM Can Train Smart Software ‘Extremely Fast’ an IBM Wizard Asserts
November 30, 2017
Short honk: If you love IBM, you will want to read “AI, Cognitive Realities and Quantum Futures – IBM’s Head of Cognitive Solutions Explains.” The article contains extracts of an IBM wizard’s comments at a Salesforce event. Here’s the passage I noted:
What we find is we always start with POCs, proof of concept. They can be small or large. They’re very quick now, because we can train Watson [on] our new data extremely fast.
If this is true, IBM may have an advantage over many other smart software vendors. Why? Gathering input data, formatting that data into a form the system’s content processing module can handle, and crunching the data to generate the needed indexes takes time and costs a great deal of money. If one needs to get inputs from subject matter experts, the cost of putting engineers in a room with the SMEs can be high.
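For readers who want a picture of the “crunching” part, here is a toy sketch of building an inverted index from already-formatted documents. It is a hypothetical illustration only, nothing to do with Watson’s internals; even at this scale one can see why the real thing, run over millions of documents that first have to be gathered and formatted, eats time and money.

```python
# Toy illustration of "crunching the data to generate the needed
# indexes": build an inverted index mapping each term to the set of
# document ids that contain it. Hypothetical sketch, not Watson.
from collections import defaultdict

def build_inverted_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    "d1": "gather the input data before training",
    "d2": "format the data for the content processing module",
}
index = build_inverted_index(docs)
print(sorted(index["data"]))    # -> ['d1', 'd2']
print(sorted(index["format"]))  # -> ['d2']
```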
It would be interesting to know the metrics behind the IBM “extremely fast” comment. My hunch is that head-to-head tests with comparable systems will reveal that none of the systems have made a significant breakthrough in these backend and training processes.
Talk is easy and fast; training smart software not so much.
Stephen E Arnold, November 30, 2017
IBM Watson: Shedding Dreams?
November 27, 2017
I read a short item in “IBM to Retire Two Watson IoT Services.” IBM rolled out Watson in 2011, but the bits and pieces were floating around or acquired years before Watson won a TV game show. (Post production, anyone?)
The short write up reveals a factoid, which I assume not to be too fake. Specifically, IBM Watson is no longer providing two Watson-infused Internet of Things services. These are or were Context Mapping and Driver Behavior. Presumably clever customers can whip up something to perform Watson-like services with other chunks of IBM code.
From birth to shedding functions in just 72 months. What’s next? Shall I ask Watson? No, I shall not. It seems to me that reality may be dawning in some IBM management circles. There is more to shed as baby Watson tries to generate money, not PR and marketing hyperbole.
Stephen E Arnold, November 27, 2017
Google Made AI Learning Fun
November 22, 2017
Games that are supposed to be educational and fun usually stink worse than rotten fruit (except for Oregon Trail). One problem is that these games are not designed by gamers, i.e. people who actually play games! Another problem is that when gamers do design games, they lack the ability to convey the material in a learnable manner. Thankfully, Google has both gamers and teachers. According to Engadget, Google has a fun way to learn about AI: “Google Created A Fun Way To Learn Simple AI.”
Google created the Teachable Machine, which teaches users about simple AI with only a webcam and a microphone. What is great about the Teachable Machine is that it does not require any coding experience to use. Anyone from children to adults can use it, and it has already been put to silly and stupid uses along with smart and practical ones.
Teachable Machine conveys just how important pattern recognition is becoming in the technology world. It’s used in photo apps to recognize faces and objects, but it also powers supercomputers like IBM’s Watson. Looking ahead, we might eventually be able to use similar machine learning techniques to train our smarthomes. For example, it could automatically turn on your living room lights and TV when it detects you’ve come home. Or a pet feeder could dispense more food when your cat sits in front of it.
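Under the hood, the idea is simple example-based classification: the user records a few labeled examples, and each new webcam frame gets the label of the closest stored example. Here is a minimal, hypothetical sketch of that idea in Python, using plain feature vectors to stand in for whatever representation Google actually extracts from the camera feed; this is not Google’s code.

```python
# Minimal sketch of example-based classification in the spirit of
# Teachable Machine: store a few labeled feature vectors, then label a
# new frame by nearest neighbor. Hypothetical; not Google's code.
import numpy as np

class ExampleClassifier:
    def __init__(self):
        self.features = []   # one feature vector per recorded example
        self.labels = []     # the class the user assigned to it

    def add_example(self, feature_vector, label):
        self.features.append(np.asarray(feature_vector, dtype=float))
        self.labels.append(label)

    def predict(self, feature_vector):
        query = np.asarray(feature_vector, dtype=float)
        distances = [np.linalg.norm(query - f) for f in self.features]
        return self.labels[int(np.argmin(distances))]

# Toy usage: two "gestures" trained from a couple of vectors each.
clf = ExampleClassifier()
clf.add_example([1.0, 0.1], "wave")
clf.add_example([0.9, 0.2], "wave")
clf.add_example([0.1, 1.0], "thumbs_up")
print(clf.predict([0.8, 0.3]))  # -> "wave"
```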
It is neat to play around with Teachable Machine and get your computer to respond to simple commands. The article ends on a sour and scary note: machine learning technology will always be watching and listening to users to learn more. Yes, very creepy.
Whitney Grace, November 22, 2017
IBM Watson: A Fashionista Never Says Sorry
October 19, 2017
I haven’t paid much attention to IBM Watson since the popular media began poking holes in IBM’s marketing assertions. However, I feel compelled to highlight the information in “Presenting Intelligent Fashion: IBM’s Watson and Vogue Unveil the World’s First AI Inspired Saree.” There’s nothing quite like versatile software able to treat cancer and whip up a saree.
Here’s the passage that I found amusing:
Findability Sciences, an IBM ecosystem partner, used Watson’s Visual Recognition API to extract specific context around patterns and colors that were most dominant. Aura Gupta used this data via a custom-built IBM application, to design a never-before-seen saree-gown that was worn by the event’s MC and Emmy Award winning actress, Archie Panjabi. The designs embodied the achievements of every woman and two men, yet were unique to each individual winner.
Quite a case example.
Stephen E Arnold, October 19, 2017
IBM: A Roll Downhill?
October 18, 2017
I read “IBM Reports Marginal Dip in Quarterly Revenue.” I think the headline qualifies as politically correct information. I don’t have the energy to point out that Big Blue is making some stakeholders blue. I highlighted this passage from the Reuters news story:
IBM’s revenue has declined for nearly six years as the company continues to exit some legacy businesses, while bolstering its cloud services.
Yep, six years. No, I don’t want to ask Watson anything.
Stephen E Arnold, October 18, 2017