Watson: Following in the Footsteps of America Online with PR, not CD ROMs
July 31, 2015
I am now getting interested in the marketing efforts of IBM Watson’s professionals. I have written about some of the items which my Overflight system snags.
I have gathered a handful of gems from the past week or so. As you peruse these items, remember several facts:
- Watson is Lucene, home brew scripts, and acquired search utilities like Vivisimo’s clustering and de-duplicating technology
- IBM said that Watson would be a multi billion dollar business and then dropped that target from 10 or 12 Autonomy scale operations to something more modest. How modest the company won’t say.
- IBM has tallied a baker’s dozen of quarterly reports with declining revenues
- IBM’s reallocation of employee resources continues as IBM is starting to run out of easy ways to trim expenses
- The good old mainframe is still a technology wonder, and it produces something Watson only dreams about: Profits.
Here we go. Remember high school English class and the “willing suspension of disbelief.” Keep that in mind, please.
ITEM 1: “IBM Watson to Help Cities Run Smarter.” The main assertion, which comes from unicorn land, is: “Purple Forge’s “Powered by IBM Watson” solution uses Watson’s question answering and natural language processing capabilities to let users ask questions and get evidence-based answers using a website, smartphone or wearable devices such as the Apple Watch, without having to wait for a call agent or a reply to an email.” There you go. Better customer service. Aren’t governments supposed to serve their citizens? Does the project suggest that city governments are not performing this basic duty? Smarter? Hmm.
ITEM 2: “Why I’m So Excited about Watson, IBM’s Answer Man.” In this remarkable essay, an “expert” explains that the president of IBM explained to a TV interviewer that IBM was being “reinvented.” Here’s the quote that I found amusing: “IBM invented almost everything about data,” Rometty insisted. “Our research lab was the first one ever in Silicon Valley. Creating Watson made perfect sense for us. Now he’s ready to help everyone.” Now the author is probably unaware that I was, lo, these many years ago, involved with an IBM Herb Noble who was struggling to make IBM’s own and much loved STAIRS III work. I wish to point out that Silicon Valley research did not have its hands on the steering wheel when it came to the STAIRS system. In fact, the job of making this puppy work fell to IBM folks in Germany as I recall.
ITEM 3: “IBM Watson, CVS Deal: How the Smartest Computer on Earth Could Shake Up Health Care for 70m Pharmacy Customers.” Now this is an astounding chunk of public relations output. I am confident that the author is confident that “real journalism” was involved. You know: Interviewing, researching, analyzing, using Watson, talking to customers, etc. Here’s the passage I highlighted: “One of the most frustrating things for patients can be a lack of access to their health or prescription history and the ability to share it. This is one of the things both IBM and CVS officials have said they hope to solve.” Yes, hope. It springs eternal as my mother used to say.
If you find these fact filled romps through the market activating technology of Watson convincing, you may be qualified to become a Watson believer. For me, I am reminded of Charles Bukowski’s alleged quip:
The problem with the world is that the intelligent people are full of doubts while the stupid ones are full of confidence.
Stephen E Arnold, July 31, 2015
Enterprise Search: You Cannot Do It Yourself, People.
July 31, 2015
I love write ups like “Don’t Settle When It Comes to Enterprise Search Platforms.” These articles are designed to provide consulting firms with the marketing flim flam which positions each as an “expert” in enterprise information access. I would not be surprised to find copies of this article in the peddler kit of search sales professionals.
The main point of the write up is that enterprise search is a “platform.” Because there are options, no self respecting company will try to implement search without the equivalent of the F Troop in mid tier or below consultants.
I noted:
Let’s look at two very common workarounds some have tried, and then we will talk about why you must go with a reputable developer when you make your final decision.
When I read this, I wondered if the “expert” were familiar with the Maxxcat line of enterprise search systems or the Blossom hosted solution.
The write up dismisses an open source solution, apparently unaware of the research by Diomidis Spinellis and Vaggelis Giannikas published in the Journal of Systems and Software, March 2012, pages 666 to 682. That’s okay. My hunch is that those finding the “Don’t Settle” article compelling are not likely to be interested in researchy type stuff.
One of the more interesting segments in the write up is the assertion that scalability is a “given.” Hmmm. In my experience, there are some ongoing enterprise search challenges: Scalability is one facet of a nest of vipers which includes my favorite reptile, indexing latency.
The article states:
Open source platforms are only as scalable as their code allows, so if the person who first made it didn’t have your company’s needs in mind, you’ll be in trouble. Even if they did, you could run into a problem where you find out that scaling up actually reveals some issues you hadn’t encountered before. This is the exact kind of event you want to avoid at all costs.
I don’t want to rain on this parade of “information,” but every enterprise search system which I have had the pleasure of procuring, managing, investigating, and analyzing has scalability problems.
The reason is simple: The volume of changed information and the flow of new information go up. Whatever one starts with is rather rapidly choked. The solutions are painful: Spend more or index less.
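A back-of-the-envelope sketch makes the choking concrete. Every number below is hypothetical (a 10 million document corpus, a system sized for 20 million, 5 percent monthly growth), and the function name is mine, not any vendor's. The only assumption is compound growth in what must be indexed.

```python
def months_until_full(current_docs, capacity_docs, monthly_growth):
    """Count whole months until the corpus exceeds index capacity.

    Assumes simple compound growth; real content flows are lumpier.
    """
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive")
    months = 0
    docs = current_docs
    while docs <= capacity_docs:
        docs *= 1 + monthly_growth  # this month's new and changed content
        months += 1
    return months


# Hypothetical figures: 10M documents today, capacity for 20M, 5% growth per month.
print(months_until_full(10_000_000, 20_000_000, 0.05))
```

Under those made-up figures, a system bought with double the headroom is out of room in a bit over a year. Then the choice is the one above: spend more or index less.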
I am not confident that one who follows the advice of certain experts will find his or her enterprise search journey pleasant. On the other hand, there are opportunities as Uber drivers one can pursue.
Stephen E Arnold, July 31, 2015
Google to the French: Wrong to Be Forgotten
July 31, 2015
I read “Google Says Non to French Demand to Expand Right to Be Forgotten Worldwide.” When third parties want the GOOG to do something, those suggestions face headwinds. It is okay for the Google to terminate unused Gmail accounts. It is okay for the Google to nuke APIs. It is okay for the Google to deliver “relevant” results which are beyond the statistical embrace of precision and recall analyses.
But when a third party wants to be forgotten? According to the write up from the increasingly anti Google folks in the UK, I learned:
Google has rejected the French data protection authority’s demand that it censor search results worldwide in order to comply with the European Court of Justice’s so-called right to be forgotten ruling. The company’s rejection of the ruling could see its French subsidiary facing daily fines, although no explicit sanction has yet been declared.
The write up also reminded me of Google’s official view of third party requests to be forgotten:
In a blog post, Peter Fleischer, Google’s Global Privacy Counsel, said: “We believe this order is disproportionate and unnecessary, given that the overwhelming majority of French internet users – currently around 97% – access a European version of Google’s search engine like Google.fr, rather than Google.com or any other version of Google.” Additionally, Fleischer added, the company is concerned that complying with the French courts could potentially set a precedent that one country’s laws can control access to content globally.
My hunch is that Google wants its policies and procedures applied globally. Google has suggested that some nation states alter their behavior to better mesh with the Googley universe.
Standing by for more Google vs. France dust ups.
Stephen E Arnold, July 31, 2015
Bing Is Very Important, I Mean VERY Important
July 31, 2015
The online magazine eWeek published “What The Bing Search Engine Brings To Microsoft’s Web Strategy,” which explains how Bing spurs a lot of debate:
“Some who don’t like the direction in which Google is going say that Bing is the search engine they prefer, especially since Microsoft has honed Bing’s ability to deliver relevant results. Others, however, look at Bing as one of many products from Microsoft, which is still seen as the “Evil Empire” in some quarters and a search platform that’s incapable of delivering the results that compare favorably with Google. Bing, introduced six years ago in 2009, is still a remarkably controversial product in Microsoft’s lineup. But it’s one that plays an important role in so many of the company’s Internet services.”
Microsoft is ramping up Bing to become a valuable part of its software services. It continues its partnership with Yahoo and Apple, and it will also power AOL’s web advertising and search. Bing is becoming a more respected search engine, but what does it have to offer?
Bing has many features it is using to entice people to stop using Google. When a user searches for a person’s name, the results display a bio of the person (only if they are affluent, however). Bing also has a loyalty program, seriously, called Bing Rewards: the more you search on Bing, the more points you earn, and the points are redeemable for gift cards, movie rentals, and other items.
Bing is already a big component in Microsoft software, including Windows 10 and Office 365. It serves as the backbone for not only a system search, but searching the entire Internet. Think Apple’s Spotlight, except for Windows. It also supports a bevy of useful applications and do not forget about Cortana, which is Microsoft’s answer to Siri.
Bing is very important to Microsoft because of the ad revenue. It is just a guess, but you can always ask Cortana for the answer.
Whitney Grace, July 31, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Finnish Content Discovery Case Study
July 31, 2015
There are many services that offer companies the ability to increase their content discovery. One of these services is Leiki, which offers intelligent user profiling, context-based intelligence, and semantic SaaS solutions. Rather than having humans adapt their content to get to the top of search engine results, the machine is altered to fit a human’s needs. Leiki pushes relevant content to a user’s search query. Leiki recently released “Case Study: Lieki Smart Services Increase Customer Flow Significantly At Alma Media.”
Alma Media is one of the largest media companies in Finland, owning many well-known Finnish brands. These include Finland’s most popular Web site, classified ads, and a tabloid newspaper. Alma Media employed two of Leiki’s services to grow its traffic:
“Leiki’s Smart Services are adept at understanding textual content across various content types: articles, video, images, classifieds, etc. Each content item is analyzed with our semantic engine Leiki Focus to create a very detailed “fingerprint” or content profile of topics associated with the content.
SmartContext is the market leading service for contextual content recommendations. It’s uniquely able to recommend content across content types and sites and does this by finding related content using the meaning of content – not keyword frequency.
SmartPersonal stands for behavioral content recommendations. As it also uses Leiki’s unique analysis of the meaning in content, it can recommend content from any other site and content type based on usage of one site.”
The case study runs down how Leiki’s services improved traffic and encouraged more users to consume Alma Media’s content. Leiki’s main selling point in the case study is that it offers users personal recommendations based on content they clicked on Alma Media Web sites. Leiki wants to be a part of developing Web 3.0 and the research shows that personalization is the way for it to go.
Whitney Grace, July 31, 2015
The Semantic Web and JSON LD: Some Irritation Perhaps?
July 30, 2015
I read the Wikipedia article about JSON LD or JavaScript Object Notation for Linked Data when I was pondering the fate of the XML centric start ups like MarkLogic. I highlighted one sentence in the Wikipedia write up which is subject to the usual caveats about bias, incorrect information, etc. And that sentence was:
JSON-LD is designed around the concept of a “context” to provide additional mappings from JSON to an RDF model.
Yes, the much loved RDF model.
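For those who have not stared at a “context,” here is a minimal sketch of the idea in Python. The document, the person, and the schema.org terms are my own illustrative choices, not taken from the Wikipedia article, and the `to_triples` helper is a deliberately naive stand-in. It is not a conforming JSON-LD processor; it handles only a flat document with string values.

```python
# Hypothetical JSON-LD document. The @context maps plain JSON keys to IRIs.
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": "http://schema.org/url",
    },
    "@id": "http://example.org/people/alice",
    "name": "Alice",
    "homepage": "http://example.org/alice",
}


def to_triples(document):
    """Use @context to turn JSON keys into RDF style (subject, predicate, object) triples.

    Naive on purpose: flat documents, string values, no nested contexts.
    """
    context = document["@context"]
    subject = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # @context and @id are keywords, not properties
        triples.append((subject, context[key], value))
    return triples


for triple in to_triples(doc):
    print(triple)
```

The JSON stays readable to a developer who has never heard of RDF, while the context quietly supplies the predicate IRIs. A real JSON-LD library does far more (nested contexts, typed values, compaction and expansion), which is where the complexity complaints start.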
When I read “JSON-LD and Why I Hate the Semantic Web,” I noticed a bit of friskiness in the word choice; for example, misguided souls, cryptic, complicated, market share, “kick RDF in the nuts,” and similar rhetorical arabesques. I do like the active verb “kick” however.
The passage I highlighted with my bright orange marker was this one:
The problem with getting a room full of smart people together is that the group’s world view gets skewed. There are many reasons that a working group filled with experts don’t consistently produce great results. For example, many of the participants can be humble about their knowledge so they tend to think that a good chunk of the people that will be using their technology will be just as enlightened. Bad feature ideas can be argued for months and rationalized because smart people, lacking any sort of compelling real world data, are great at debating and rationalizing bad decisions.
Seems normal to me.
In my opinion, this write up explains why some XML centric, Semantic Web cheerleaders have labored to generate organic growth. Just a thought. Talking to fellow travelers is reassuring and comfortable. Those not on the cruise ship may have a different point of view.
Stephen E Arnold, July 30, 2015
The Hadoop Spark Thing: Simple, Simple
July 30, 2015
I am fascinated with the cheerleading about open source software which makes Big Data as easy as driving a Fiat 500 through a car wash. (Make sure the wheels fit inside the automated pulley system, of course.)
Navigate to “The Big Big Data Question: Hadoop or Spark?” Be prepared to read about two—count ‘em—two systems working as smoothly as the engine in a technical high school’s auto repair class’ project car.
I want to highlight two statements in the write up.
The first is:
As I [a Big Data practitioner] mentioned, Spark does not include its own system for organizing files in a distributed way (the file system) so it requires one provided by a third-party. For this reason many Big Data projects involve installing Spark on top of Hadoop, where Spark’s advanced analytics applications can make use of data stored using the Hadoop Distributed File System (HDFS).
In short, Spark is what I call a wrapper. One uses it like a taco shell to keep the good stuff in position for real time munching.
The second is this comment:
The open source principle is a great thing, in many ways, and one of them is how it enables seemingly similar products to exist alongside each other – vendors can sell both (or rather, provide installation and support services for both, based on what their customers actually need in order to extract maximum value from their data.
What the write up omits is that there are some other bits and pieces needed; for example, how does one locate a particular string amidst the Big Data?
The point, for me, is that these nested and layered systems are truly exciting to troubleshoot. Not only are there issues with the integrity of the data, there is the thrill of getting each subsystem to work and then figuring out how to get useful outputs from the digital equivalent of a Roy’s Place Lassie’s Double Revenge sandwich before it closed its doors in 2013.
A Lassie’s Double Revenge consisted of a knockwurst, cheese, grilled onions, baked beans, and assorted seasonings served to the discerning diner.
A little like an open source Big Data mash up.
As a bonus, one gets to hire consultants who can make separate products, systems, and solutions work in a way which benefits the licensee and the system’s users.
Stephen E Arnold, July 30, 2015
Organizations Should Consider Office 365 Utilization
July 30, 2015
Office 365 has been a bit contentious within the community. While Microsoft touts it as a solution that gives users more of the social and mobile components they were wishing for, it has not been widely adopted. IT Web gives some reasons to consider the upgrade in its article, “Why You Should Migrate SharePoint to Office 365.”
The article says:
“Although SharePoint as a technology has matured a great deal over the years, I still see many businesses struggling with issues related to on-premises SharePoint, says Simon Hepburn, director of bSOLVe . . . You may be thinking: ‘Are things really that different using SharePoint on Office 365?’ Office 365 is constantly evolving and as I will explain, this evolution brings with it opportunities that your business should seriously consider exploring.”
Of course the irony is that with the new SharePoint 2016 upgrade, Microsoft is giving users a promise to stand behind on-premise installations, but they are continuing to integrate and promote the Office 365 components. Only time and feedback will dictate the continued direction of the enterprise solution. In the meantime, stay tuned to Stephen E. Arnold and his Web service, ArnoldIT.com. Arnold is a longtime leader in search and his dedicated SharePoint feed is a one-stop-shop for all the latest news, tips, and tricks.
Emily Rae Aldridge, July 30, 2015
Viva The Academic Publisher Boycott!
July 30, 2015
Academic databases provide access to quality research material, which is key for any student, professor, or researcher to succeed in their work. One major drawback to academic databases is the high cost associated with subscription fees. Individual researchers cannot justify subscribing to an academic database, and the price of purchasing a single article runs high. This is why they rely on academic libraries to cover the costs. Due to changing publishing trends, academic publishers are raising subscription fees.
Elsevier is one of the largest and most well-known scientific journal databases, but it is also the most notorious for its expensive subscription fees, and universities are getting tired of it. Univers reports that “Dutch Universities Start Their Elsevier Boycott.” The Netherlands, led by state secretary Sander Dekker, wants all scientific content to be free online. Under that model, the university or financier pays for an article to be published. All content by Dutch scientists will hopefully be open access by 2024.
In the meantime, the Association of Universities in the Netherlands has asked all Dutch scientists that work with Elsevier to resign from their positions. As to be expected, some are willing and others are more reluctant. The goal is to pressure Elsevier to change its practices.
“In Univers nr. 8, in January, professor Jan Blommaert called the current publishing system ‘completely absurd’. Not only because of the costs for subscription, but also because the journals have a lot of power over the content: ‘A young PhD student who has been able to get an article accepted by a journal may still have to wait 18 months for it to be published, because the editors prefer well-known names. It is not unthinkable that if I would submit a love letter, it would be published sooner than an intelligent scholarly article by a young researcher.’ ”
The Dutch universities are setting a standard that many libraries and universities may also follow, but the hardest part is encouraging more to participate. Libraries and universities have an obligation to provide needed materials to researchers, and a boycott will make that obligation harder to meet. Large boycotts, rather than individual ones, will be more effective and instrumental in changing Elsevier’s practices.
Whitney Grace, July 30, 2015
Whither Unix Data
July 30, 2015
For anyone using open-source Unix to work with data, IT World has a few tips for you in “The Best Tools and Techniques for Finding Data on Unix Systems.” In her regular column, “Unix as a Second Language,” writer Sandra Henry-Stocker explains:
“Sometimes looking for information on a Unix system is like looking for needles in haystacks. Even important messages can be difficult to notice when they’re buried in huge piles of text. And so many of us are dealing with ‘big data’ these days — log files that are multiple gigabytes in size and huge record collections in any form that might be mined for business intelligence. Fortunately, there are only two times when you need to dig through piles of data to get your job done — when you know what you’re looking for and when you don’t. 😉 The best tools and techniques will depend on which of these two situations you’re facing.”
When you know just what to search for, Henry-Stocker suggests the “grep” command. She supplies a few variations, complete with a poetic example. Sometimes, like when tracking errors, you’re not sure what you will find but do know where to look. In those cases, she suggests using the “sed” command. For both approaches, Henry-Stocker supplies example code and troubleshooting tips. See the article for the juicy details.
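To give a flavor of the two situations without reproducing the article’s commands, here is a rough Python analogue. The log lines are made up, and the `grep` and `select_range` helpers are hypothetical names that only approximate a basic `grep PATTERN` and a sed range print such as `sed -n '/start/,/end/p'`.

```python
import re

# Made-up log lines for illustration only.
LOG = """\
09:00:01 INFO service started
09:00:05 ERROR disk quota exceeded
09:00:06 INFO retrying write
09:00:09 ERROR write failed
09:00:12 INFO service stopped
""".splitlines()


def grep(pattern, lines):
    """You know what you want: keep the lines that match a regular expression."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


def select_range(start, end, lines):
    """You know where to look: keep each run from a `start` match through the next `end` match."""
    start_rx, end_rx = re.compile(start), re.compile(end)
    selected, inside = [], False
    for line in lines:
        if not inside:
            if start_rx.search(line):
                inside = True
                selected.append(line)
        else:
            selected.append(line)
            if end_rx.search(line):
                inside = False  # range closed; a later start match reopens it
    return selected


print(grep(r"ERROR", LOG))
print(select_range(r"quota", r"failed", LOG))
```

The first call pulls out the two ERROR lines. The second returns everything between the quota complaint and the failed write, including the INFO line in the middle that a plain pattern search would skip.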
Cynthia Murrell, July 30, 2015