Watson Based Tradeoff Analytics Weighs Options
July 13, 2015
IBM’s Watson now lends its considerable intellect to helping users make sound decisions. In “IBM Watson Tradeoff Analytics—General Availability,” the Watson Developer Community announces that the GA release of this new tool can be obtained through the Watson Developer Cloud platform. The release follows an apparently successful Beta run that began last February. The write-up explains that the tool:
“… Allows you to compare and explore many options against multiple criteria at the same time. This ultimately contributes to a more balanced decision with optimal payoff.
“Clients expect to be educated and empowered: ‘don’t just tell me what to do,’ but ‘educate me, and let me choose.’ Tradeoff Analytics achieves this by providing reasoning and insights that enable judgment through assessment of the alternatives and the consequent results of each choice. The tool identifies alternatives that represent interesting tradeoff considerations. In other words: Tradeoff Analytics highlights areas where you may compromise a little to gain a lot. For example, in a scenario where you want to buy a phone, you can learn that if you pay just a little more for one phone, you will gain a better camera and a better battery life, which can give you greater satisfaction than the slightly lower price.”
For those interested in the technical details behind this Watson iteration, the article points you to Tradeoff Analytics’ documentation. Those wishing to glimpse the visualization capabilities can navigate to this demo. The write-up also lists post-beta updates and explains pricing, so check it out for more information.
Cynthia Murrell, July 13, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Elsevier and Its Business Model May Be Ageing Fast
July 13, 2015
If you need to conduct research and are not attached to a university or academic library, then you are going to get hit with huge subscription fees for access to quality material. This is especially true for the scientific community, but on the Internet, where there is a will there is most certainly a way. Material locked behind a subscription service can often be found if you dig around the Internet long enough, mostly on sites hosted in foreign countries, though the material is often pirated. Gizmodo shares in the article “Academic Publishing Giant Fights To Keep Science Paywalled” that Elsevier, one of the largest academic publishers, is angry about its content being stolen and shared on third party sites. Elsevier recently filed a complaint with the New York District Court against Library Genesis and SciHub.org.
“The sites, which are both popular in developing countries like India and Indonesia, are a treasure trove of free pdf copies of research papers that typically cost an arm and a leg without a university library subscription. Most of the content on Libgen and SciHub was probably uploaded using borrowed or stolen student or faculty university credentials. Elsevier is hoping to shut both sites down and receive compensation for its losses, which could run in the millions.”
Gizmodo acknowledges Elsevier has a right to complain, but it also flips the argument in the other direction by pointing out that access to quality scientific research material is expensive. The article brings up Netflix’s entertainment offerings: Netflix users pay a flat fee every month and have access to thousands of titles. Netflix remains popular because it remains cheap, and the company openly acknowledges that it sets its prices to be competitive against piracy sites.
Publishers and authors should be compensated for their work, and it is well known that academics do not rake in millions, but access to academic works should be less expensive. Following Netflix’s model or offering a subscription service like Amazon Prime might be a better business model.
Whitney Grace, July 13, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Need a 1.3 GB Corpus with a Million Text Objects?
July 12, 2015
Short honk: If you have a search and content processing system, you might want to navigate to this link. You can access the Hacker News data dump. My thought would be for the Watson team to process this information and then put up a demo of the Watson system using the Hacker News content. Any other search and content processing vendors game? The dump offers interesting content and a beefy enough corpus to produce interesting results.
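The dump is, as far as I know, a pile of JSON objects, so any vendor could sketch a proof of concept before committing serious iron. A minimal Python sketch follows; the sample items and field names are hypothetical stand-ins for the real 1.3 GB corpus:

```python
from collections import defaultdict

# Hypothetical sample of Hacker News items; the real dump is ~1.3 GB of
# JSON objects with fields along the lines of "id", "type", "title", "text".
sample_items = [
    {"id": 1, "type": "story", "title": "Watson demo on Hacker News data"},
    {"id": 2, "type": "comment", "text": "Search vendors should try this corpus"},
    {"id": 3, "type": "story", "title": "Content processing at scale"},
]

def build_index(items):
    """Build a minimal inverted index from item titles and comment text."""
    index = defaultdict(set)
    for item in items:
        content = (item.get("title", "") + " " + item.get("text", "")).lower()
        for token in content.split():
            index[token].add(item["id"])
    return index

index = build_index(sample_items)
print(sorted(index["search"]))  # ids of items containing "search"
```

Scaling this toy to a million text objects is exactly the kind of demo the Watson team could stand up.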
Stephen E Arnold, July 12, 2015
Semantic Search: How Far Will This Baloney Tube Stretch?
July 12, 2015
I have seen a number of tweets, messages, and comments about “Semantic Search: the Future of Search Marketing?”
For those looking for traffic, it seems that using the phrase “semantic search” in conjunction with “search marketing” is Grade A click bait. Go for it.
My view is a bit different. I think that the baloney manufactured from semantic search (more correctly the various methods that can be grouped under the word semantic) is low grade baloney.
Search marketing is on a par with the institutional pizza pumped out for freshmen in a dorm in DeKalb, Illinois. Yum, tasty. What is it? Oh, I know: it is something that is supposed to be nutritious and tasty. The reality is that the pizza isn’t. That’s search marketing. The relevant result may not be relevant. Relevance here means jiggling results so that a message is displayed whether the user wants that message or not. Not pizza.
Here’s a passage in the write up I highlighted in pale yellow, the color in my marker set closest to the dorm pizza:
Semantic search is the technology the search engines employ to better understand the context of a search.
Contrast this definition with this one from “Breakthrough Analysis: Two + Nine Types of Semantic Search” published in 2010, five years before the crazy SEO adoption of the buzzword, if not the understanding of what “semantic” embraces:
Semantics (in an IT setting) is meaningful computing: the application of natural language processing (NLP) to support information retrieval, analytics, and data-integration that compass both numerical and “unstructured” information.
The article then trots out these semantic search options:
- Related searches and queries
- Reference results (dictionary look up)
- Annotated results
- Similarity search
- Syntactic annotations
- Concept search
- Ontology based search
- Semantic Web search
- Faceted search
- Clustered search
- Natural language search
Now there are many, many issues with this list. How about differentiating faceted, concept, and clustered search? Give up yet?
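For readers who do give up: a toy sketch (invented documents and field names, no vendor’s actual schema) shows that faceted and clustered search are different mechanical operations, which is precisely why lumping them under one “semantic” label muddies the water:

```python
from itertools import groupby

# Toy documents with invented fields; not any real vendor's schema.
docs = [
    {"title": "Lucene tuning", "topic": "search", "year": 2014},
    {"title": "Watson tradeoffs", "topic": "analytics", "year": 2015},
    {"title": "Elasticsearch facets", "topic": "search", "year": 2015},
]

# Faceted search: filter on metadata fields fixed at indexing time.
facet_hits = [d for d in docs if d["topic"] == "search" and d["year"] == 2015]

# Clustered search: group whatever results came back, after the fact.
by_topic = {
    topic: [d["title"] for d in group]
    for topic, group in groupby(
        sorted(docs, key=lambda d: d["topic"]), key=lambda d: d["topic"]
    )
}

print([d["title"] for d in facet_hits])  # ['Elasticsearch facets']
```

Facets depend on metadata decided before query time; clusters are computed from whatever comes back. Concept search, which maps queries to ideas rather than tokens, is yet a third mechanism.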
The point is that semantic search is not one thing. If one accepts this list as the touchstone, the functions referenced are going to contain other content processing operations.
The problem is that these functions on their own or used in some magical, affordable combination are not likely to deliver what the user wants.
The user wants relevant results which pertain directly to her specific information need.
The search engine optimization and marketing crowd want the results to be what they want to present to a user.
The objectives are different and may not be congruent or even similar.
In short, the notion of taking crazy, generalized concepts and slapping them on marketing is likely to produce howlers like this write up and the equally wonky list from 2010.
The point is that semantic baloney has been in the supermarket for a long time.
Obviously this baloney has a long shelf life.
In the meantime, how is ad supported Web search working for you? Oh, how is that in house information access system working for you?
If you want traffic, buy AdWords. Please, do not deliver to me the six pack of baloney.
Stephen E Arnold, July 12, 2015
Information Technology: The Myth of Control
July 12, 2015
In the good old days circa 1962, one went to a computer center. In the center was a desk, usually too high for the supplicant to lean on comfortably. There were young people ready to ask the supplicant to sign in, fill out a form to request computer time, and wait. Once in a while, a supplicant would be granted a time slot on a keypunch machine. Most of the time, the supplicant was simply given a time slot and sent away to wait. But that was the start of the process.
I won’t bore you with the details of submitting decks of punched cards, returning to get a green bar print out, and the joy or heartbreak of finding out whether your program ran.
I figured out quickly that working in the computer center was the sure fire way to get access to the computer, a giant IBM thing which required care and feeding of two or three people plus others on call.
The pain of those experiences has not gone away, gentle reader. If you are fortunate enough to be in a facility with a maybe-is or maybe-isn’t quantum computer, the mainframe mentality is the only way to go. There are research facilities with even more stringent guidelines, but the average mobile phone user thinks that computer use is a democracy. It is not. It never will be. Controls are important. Period. But senior management, not information technology, has the responsibility to steer the good ship Policies & Procedures.
When I read “Cloudy with a Chance of Data Loss: Has Corporate IT Lost Control?” I was not comfortable. The reality is that corporate information technology has control in certain situations. In others, for all practical purposes, there is no organizational information technology department.
MBAs, venture capital types, and those without patience want what they want when they want it. The controls are probably in place, but the attitude of these hyperkinetic history majors with a law degree is that those rules do not apply to them. Toss in a handful of entitled but ineffective middle school teachers and a clueless webmaster, and you have the chemical components of bonehead information technology behaviors.
The information technology professionals just continue to do their thing, hoping that they can manage the systems in today’s equivalent of a 1960s air conditioned, sealed off, locked, and limited access computer room.
Other stuff is essentially chaos.
The write up assumes that control is a bad thing. The write up uses words like “consumer oriented,” “ease of use,” and “ownership.” The reason a non mainframe mentality exists among most people with whom I interact is a reptilian memory of the mainframe method. For most people, entitlement and do your own thing are the keys to effective computing.
If an information technology professional suggests a more effective two factor authentication procedure or a more locked down approach to high value content, that professional is either ignored, terminated, or simply worked around.
As a result of organizations’ penchant for hiring those who are friendly and on the team, one gets some darned exciting information technology situations. Management happily cuts budgets. One Fortune 100 company CFO told me, “We are freezing the IT budget. Whatever those guys do, they have to do it with a fixed allocation.” Wonderful reasoning.
The write up concludes with this statement:
Modern IT departments realize that to overcome security challenges they must work together with users, not dictate to them. The advent of the cloud model means that smart users can readily circumvent restrictions if they see no value in abiding by the rules. IT teams must therefore be inclusive and proactive, investing in secure file-sharing solutions that are accepted by users while also providing visibility, compliance and security. Fortunately, there are good alternatives for the 84 per cent of senior IT management who admit they are “concerned” over employee-managed cloud services. The bottom line is this: there are times when we all need to share files. But there is never an occasion when any of us should trust a consumer-grade service with critical business data. It simply presents too many risks.
Nope. The optimal way in my view is for organizations to knock off the shortcuts, focus on specific methods required to deliver functionality and help reduce the risk of a “problem,” and shift from entitlement feeling good attitudes to a more formal, business-centric approach.
It is not a matter of control. It is a matter of common sense and of senior management creating a work environment in which control exists across business policies and procedures.
The hippy dippy approach to information technology is more risky than some folks realize. As the wall poster in my server room says, “Ignorance is bliss. Hello, happy.”
Stephen E Arnold, July 12, 2015
Dealing with Company and Product Identity: Terbium Labs Nails It
July 11, 2015
Navigate to www.terbiumlabs.com and read about the company.
Nifty name. Very nifty name indeed. Now, a bit of branding commentary.
I used to work at Halliburton Nuclear. Ah, the good old days of nuclear engineers poking fun at civil engineers and of mathematicians not understanding any joke made by the computer engineers.
The problem of naming companies in high technology disciplines is a very big one. Before Halliburton gobbled up the Nuclear Utility Services outfit, the company with more than 400 nuclear engineers on staff struggled with its name. Nuclear Utility Services was abbreviated to NUS. A pretty sharp copywriter named Richard Harrington of the dearly loved Ketchum, McLeod and Gove ad agency came up with this catchy line:
After the EPA, call NUS.
The important point is that Mr. Harrington, a whiz person, wanted to have people read each letter: E-P-A, not say eepa and say N-U-S not say noose. In Japanese, the sound “nus” has a negative meaning usually applied to pressurized body odor emissions. Not good.
Search and content processing vendors struggle with names. I have written about outfits which have fumbled the branding ball. Examples range from Thunderstone, whose name has been usurped by a gaming company, to Brainware, whose name has been snagged and used for interesting videos, to Smartlogic, whose name has been appropriated by a smaller outfit doing marketing/design stuff. There are names which are impossible to find; for example, i2, AMI, and ChaCha, to name a few among many.
I want to call attention to a quite useful product naming which I learned about recently. Navigate to TerbiumLabs.com. Consider the word Terbium. Look for the word “Matchlight.”
I find Terbium a darned good word because terbium is an element, which my old (and I mean old) chemistry professor pronounced “ter-beem.” The element has a number of useful applications. Think solid state devices, a role as a magic ingredient in some rocket fuels, and—okay, okay—some explosives.
But as good as “terbium” is for a company I absolutely delight in this product name:
Matchlight.
Now what’s Matchlight, and why should anyone care? My hunch is that the technology, which allows a next generation approach to content identification and other functions, works to
- light a match in the wilderness
- illuminate a dark space
- start a camp fire so I can cook a goose
You can and should learn more about Terbium Labs and its technology. The names will help you remember.
Important company; important technology. Great name Matchlight. (Hear that search and content processing vendors with dud names?)
Stephen E Arnold, July 11, 2015
Holy Cow. More Information Technology Disruptors in the Second Machine Age!
July 11, 2015
I read a very odd write up called “The Five Other Disruptors about to Define IT in the Second Machine Age.”
Whoa, Nellie. The second machine age. I thought we were in the information age. Dorky machines are going to be given an IQ injection with smart software. The era is defined by software, not machines. You know. Mobile phones are pretty much a commodity with the machine part defined by fashion and brand and, of course, software.
So a second machine age. News to me. I am living in the second machine age. Interesting. I thought we had the Industrial Revolution, then the boring seventh grade mantra of manufacturing, the nuclear age, the information age, etc. Now we are doing the software thing.
My hunch is that the author of this strange article is channeling Shoshana Zuboff’s In the Age of the Smart Machine. That’s okay, but I am not convinced that the one, two thing is working for me.
Let’s look at the disruptors which the article asserts are just as common as the wonky key fob I have for my 2011 Kia Soul. A gray Kia Soul. Call me exciting.
Here are the four disruptors that, I assume, are about to remake current information technology models. Note that these four disruptors are “about to define IT.” These are like rocks balanced above Alexander the Great’s troops as they marched through the valleys in what is now Afghanistan. A 12 year old child could push a rock from its perch and crush a handful of Macedonians. That potential was scary enough to help Alexander decide to march in a different direction. Hello, India.
These disruptors are the rocks about to plummet into my information technology department. The department’s staff, I wish to point out, work from their hovels and automobiles, dialing in when the spirit moves them.
Here we go:
- Big Data
- Cloud
- Mobile
- Social
I am not confident that these four disruptors have done much to alter my information technology life, but if one is young, I assume that these disruptors are just part of the everyday experience. I see grade school children poking their smart phones when I take my dogs for their morning constitutional.
But the points which grabbed my attention were the “five other disruptors.” I had to calm down because I assumed I had a reasonable grasp on the disruptors important in my line of work. But, no. These disruptors are not my disruptors.
Let’s look at each:
The Trend to NoOps
What the heck does this mean? In my experience, experienced operations professionals are still needed, even at some of the smart outfits I used to work with.
Agility Becomes a First Class Citizen
I did not know that the ability to respond to issues and innovations was not essential for a successful information technology professional.
Identity without Barriers
What the heck does this mean? The innovations in security are focused on ensuring that barriers exist and are not improperly breached. The methods have little to do with an individual’s preferences. The notion of federation is an interesting one. In some cases, federation is one of the unresolved challenges in information technology. Mixing up security, “passwords,” and disparate content from heterogeneous systems is a very untidy serving of fruit salad.
I found myself thinking about information technology after reading Rush’s book of farmer flummoxing poetry. Is this required reading for a mid tier consultant? I wonder if Dave Schubmehl has read it. I wonder if some Gartner or Forrester consultants have dipped into its meaty pages. (No pun intended.)
IT Goes Bi Modal?
What the heck does this mean again? Referencing Gartner is a surefire way to raise grave concerns about the validity of the assertion. But bi-modal. Two modes. Like zero and one. Organizations have to figure out how to use available technology to meet that organization’s specific requirements. The problem of legacy and next generation systems defines the information landscape. Information technology has to cope with a fuzzy technology environment. Bi-modal? Baloney.
The Second Machine Age
Okay, I think I understand the idea of a machine age. The problem is that we are in a software and information datasphere. The machine thing is important, but it is software that allows legacy systems to coexist with more with-it approaches. This silly numbering of ages makes zero sense and is essentially a subjective, fictional, metaphorical view of the present information technology environment.
Maybe that’s why Gartner hires poets and high profile publications employ folks who might find an hour discussing the metaphorical implications of “bare ruined choirs.”
None of these five disruptions makes much sense to me.
My hunch is that you, gentle reader, may be flummoxed as well.
Stephen E Arnold, July 11, 2015
What Is Watt? It Is the Innovation That Counts.
July 11, 2015
Years ago I worked with a polymath named Fred Czufin. Czufin was an author, writer, consultant, and former Office of Strategic Services cartographic specialist. Today Czufin would be buried in geocoding.
Why am I mentioning a fellow who died in 2009?
Czufin introduced me to James Watt. I knew the steam engine thing, but Czufin was bonkers over James Watt’s innovative streak.
I thought of Czufin, my ignorance of an important scientist, and our reasonably fun times when we collaborated on some interesting projects.
I read “A Twelve Year Flash of Genius.” The write up sparked anew my effort to chip away at my ignorance of this 18th century inventor. Watt struggled with the engineering problems of early Newcomen pumps. Mostly these puppies exploded.
Watt went for a walk and cooked up the idea of a condenser. Eureka. Steam engines mostly worked. Even my server room air conditioner contains a version of Watt’s invention.
I am not going to take sides in the flash of genius approach to innovation. One can argue that the antecedents for Watt’s thinking littered the laboratories of his predecessors, tinkerers, and fellow scientists.
My hunch is that there was no single epiphany. The result of sifting through many facts, fiddling around, and then trying to figure out if, and then why, something worked made him a bright person.
As I think about James Watt, I wonder when a similar thinker will come up with a breakthrough in information access. Most of the search systems with which I am familiar are in their pre-condenser stage. They blow up, fizzle, disappoint, hiss, and produce more angst than smiley faces.
My hunch is that Czufin would be as impatient as I about the opportunity a modern day James Watt can deliver. Search has more in common with Newcomen’s pump than a solution to a very important information problem.
Stephen E Arnold, July 11, 2015
Google and Smart Software: We Are a Leader, Darn It
July 10, 2015
Google once posted lots of research papers. Over the years, finding them has required that I participate in many Easter egg hunts. I came across a page on the Google Research Blog that is titled (fetchingly, I might add) “ICML 2015 and Machine Learning Research at Google.” Many people within companies are into smart software. This page should be called, “Darn it. We are leaders in machine learning.”
Two things are evident on the Google Web page.
- Google is sending a platoon of its centurions to the International Conference on Machine Learning. Attendees will have many opportunities to speak with Googlers.
- Google is displaying a range of machine learning expertise. Some of it is typical data crunching, like reducing internal covariate shift; some is research that provides a glimpse of what Google will do with its knowledge. See, for example, “Fictitious Self Play in Extensive Form Games.” Google just really wants to do the game thing and get the game revenue, gentle reader.
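The “reducing internal covariate shift” work, for the curious, is the batch normalization idea presented at ICML 2015: standardize each feature across a mini-batch so downstream layers see stable statistics. A minimal sketch of the core computation follows; this is an illustration of the idea, not Google’s implementation, and omits the learned scale and shift parameters the full method uses:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Standardize each feature (column) across a mini-batch (rows).

    eps guards against division by zero for constant features.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# A toy mini-batch: 3 examples, 2 features on wildly different scales.
batch = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
normalized = batch_norm(batch)
print(normalized.mean(axis=0))  # approximately [0, 0]
```

After normalization each feature has roughly zero mean and unit variance, which is what keeps later layers from chasing shifting input distributions during training.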
I suggest you download the papers quickly. If Google loses enthusiasm for making these documents available, you will end up having to buy them when the research surfaces in assorted journals. Some papers, once off the Google page, may be unavailable unless you are pals with the author or authors.
Good stuff here. Take that Baidu, Facebook, et al.
Stephen E Arnold, July 10, 2015
Business Intelligence: The Grunt Work? Time for a Latte
July 10, 2015
I read “One Third of BI Pros Spend Up to 90% of Time Cleaning Data.” Well, well, well. Good old and frail eWeek has reported what those involved in data work have known for what? Decades, maybe centuries? The write up states with typical feather duster verbiage:
A recent survey commissioned by data integration platform provider Xplenty indicates that nearly one-third of business intelligence (BI) professionals are little more than “data janitors,” as they spend a majority of their time cleaning raw data for analytics.
What this means is that the grunt work in analytics still has to be done. This is difficult and tedious work even with normalization tools and nifty hand crafted scripts. Who wants to do this work? Not the MBAs who need slick charts to nail their bonus. Not the frantic marketer who has to add some juice to the pale and wan vice president’s talk at the Rotary Club. Not anyone, except those who understand the importance of scrutinizing data.
The write up points out that extract, transform, and load functions (ETL in the jargon of Sillycon Valley) are work. Guess what? The eWeek story uses these words to explain what the grunt work entails:
- Integrating data from different platforms
- Transforming data
- Cleansing data
- Formatting data.
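The four chores above can be compressed into a tiny sketch. The field names and sample rows here are hypothetical; real pipelines face the same steps at terabyte scale:

```python
# Hypothetical raw records from two source systems (integration is why
# the same field arrives in two different shapes).
raw_rows = [
    {"name": "  Acme Corp ", "revenue": "1,200", "date": "07/10/2015"},
    {"name": "Beta LLC", "revenue": None, "date": "2015-07-10"},
]

def clean_row(row):
    cleaned = {}
    # Cleanse: trim stray whitespace from text fields.
    cleaned["name"] = row["name"].strip()
    # Transform: coerce revenue strings like "1,200" to integers, default 0.
    revenue = row["revenue"] or "0"
    cleaned["revenue"] = int(revenue.replace(",", ""))
    # Format: normalize both date styles to ISO 8601 (YYYY-MM-DD).
    date = row["date"]
    if "/" in date:
        month, day, year = date.split("/")
        date = f"{year}-{month}-{day}"
    cleaned["date"] = date
    return cleaned

cleaned_rows = [clean_row(r) for r in raw_rows]
print(cleaned_rows[0])  # {'name': 'Acme Corp', 'revenue': 1200, 'date': '2015-07-10'}
```

Multiply those four comment lines by a few hundred source systems and you have the “data janitor” job the survey describes.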
But here’s the most important item in the article: if the report on which the article is based is correct, 21 percent of the data require special care and feeding. How does that grab you when you are pumping a terabyte of social media or intercept data a day? Right. Time for a bit of Facebook and a trip to Starbucks.
What happens if the data are not shipshape? Well, think about the fine decisions flowing from organizations which are dependent on data analytics. Why not chase down good old United Airlines and ask the outfit if anyone processed log files for the network glitch which effectively grounded all flights? Know anyone at the Office of Personnel Management? You might ask the same question.
Ignoring data or looking at outputs without going through the grunt work is little better than guessing. No, wait. Guessing would probably return better outcomes. Time for some Foosball.
Stephen E Arnold, July 10, 2015