SharePoint Server 2016 Details Released
May 12, 2015
Some details about the rollout of SharePoint Server 2016 were revealed at the much-anticipated Ignite event in Chicago last week. Microsoft now says it is on track with the project: a public beta will be available in the fourth quarter of this year, with “release candidate” and “general availability” versions to follow. Read more in the Redmond Magazine article, “SharePoint Server 2016 Roadmap Highlighted at Ignite Event.”
The article addresses the tension between cloud and on-premises versions:
“While Microsoft has been developing the product based on its cloud learnings, namely SharePoint Online as part of its Office 365 services, those cloud-inspired features eventually will make their way back into the server product. The capabilities that don’t make it into the server will be offered as Office 365 services that can be leveraged by premises-based systems.”
It appears that the delayed timeline may be a “worst case scenario” measure, and that the release could happen earlier. After all, it is better for customers to be prepared for the worst and be pleasantly surprised. To stay in touch with the latest news regarding features and timeline, keep an eye on ArnoldIT.com, specifically the SharePoint feed. Stephen E. Arnold is a longtime leader in search and serves as a great resource for individuals who need access to the latest SharePoint news at a glance.
Emily Rae Aldridge, May 12, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Reading in the Attention Deficit World
May 12, 2015
The article on Popist titled “Telling the Truth with Charts” outlines the most effective and simple method of presenting the information on the waning of book-reading among Americans. While the article focuses on the effectiveness of the chart, the information in the chart is disturbing as well: the share of Americans who read zero books rose to 23% in 2014, from 8% in 1978. The article links to another article on The Atlantic titled “The Decline of the American Book Lover.” That article presents an argument for some hope,
“The percentage of young folks reading for pleasure stopped declining. Last year, the NEA found that 52 percent of 18-24 year-olds had read a book outside of work or school, the same as in the pre-Facebook days of 2002. If book culture were in terminal decline, this is the demographic where you’d expect it to be fading fastest. Perhaps the worst of the fall is over.”
The article demonstrates the connection between education level and reading for pleasure, which may be validation for many teachers and professors. However, there also seems to be a growing tendency among students to read, even homework, without absorbing anything, or in other words, to skim texts instead of paying close attention. This may be the effect of too much TV or Facebook, or even the No Child Left Behind generation entering college. Students are far more interested in their grades than in their education, and just tallying up the number of books they or anyone else read is not going to paint an accurate portrait. Similarly, what books are the readers reading? If they are all Twilight and 50 Shades of Grey, do we still celebrate the accomplishment?
Chelsea Kerwin, May 12, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
HP Autonomy Dust Up: Details, Details
May 11, 2015
I belatedly read yet another analysis of the HP lawsuit against Autonomy: “Details of HP Lawsuit against Autonomy Executives.” The write up reports that HP is taking “direct legal action against Lynch.” There is nothing like a personal legal action to keep the legal eagles circling in search of money.
The HP position is that Lynch (the founder of Autonomy) and Sushovan Hussain (former Autonomy CFO) overstated Autonomy’s growth and profits. My reaction is “Yeah, but didn’t you guys review the numbers before you wrote a check for roughly $11 billion?”
Details, details.
The article states:
The acquisition has been seen as a disaster for HP since the tech giant was forced to write down $8.8 billion from the deal in 2012. The $5.1 billion legal claim is one of the largest ever brought against an individual in Britain. HP bases the claim on a $4.6 billion charge linked to the alleged financial misconduct, roughly $400 million connected to shares given to Lynch and Hussain and a further $100 million loss associated with Autonomy that was suspected of being caused by the former executives’ activities, according to the British court documents.
HP may not be a tech leader or even a C student in acquisition analyses, but it is the leader in the magnitude of the claim it is making against Dr. Lynch. If he is found liable for selling something to HP, which analyzed the deal and then decided to buy the company, he will have to pay $5.1 billion.
I don’t have a dog in this fight. But it seems to me that HP reviewed Qatalyst Partners’ financial presentation about Autonomy. Then HP analyzed the numbers. Then HP involved third parties in the review of the numbers. Then HP decided to buy Autonomy. Then HP bought the company. Then HP found that Autonomy is not exactly a product like a tube of Colgate Total toothpaste. Then HP fired, forced, or tasered Lynch and others out of the HP carpet land. Then HP tried to convert the technology into some sort of cloud based toolkit. And finally HP decided to go after Dr. Lynch. You don’t have to like him, but he is a bit of a celebrity in the Silicon Fen, holds an Order of the British Empire, and he is quite intelligent, maybe brilliant, and in my experience, not into dorks, fools, goof balls, losers, or dopey managers. Your mileage may vary, of course.
I am sufficiently experienced to know that when a buyer wants a product, service, or company, craving—nay, lust and craziness—kick in. “Yo, we’re 17 years old again. Let’s do it,” scream the adrenaline-charged experts. “This is a slam dunk. We can take Autonomy waaaay beyond the place it is today. Rah, rah, rah. Get ’em, team.”
Autonomy’s management and its advisors know that PowerPoint dust can close deals. The blend of blood frenzy and the feeling of power one gets when taking ownership of a new La Ferrari is what business is about, dog. Smiles and PowerPointing from Autonomy played a part, but HP made the decision and wrote the check. Caveat emptor is good advice.
Frankly I see HP as the ideal candidate for a marvelous business school case. The HP Autonomy story is better than the Yahoo track record of blunders and blind luck. The management of HP believed in something that has never ever ever been done: generating billions of dollars in new revenue quickly. Google generates billions from advertising. Autonomy generated hundreds of millions in revenues from the licensing of dozens of products. HP got its wires crossed, reasoning in a way that does not line up with the history of the search and content processing industry.
Billions do not flow from content processing and search technology. Investors can pump big money into a content processing company like Palantir. Will these investors get their money back? Don’t know. But to spend billions for a search and content processing company and then project that a $600 million or $800 million per year outfit would produce a gusher of billions is a big, but quite incorrect, thought.
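A back-of-the-envelope calculation makes the scale problem concrete. Suppose, purely for illustration (my numbers, not HP’s), that an $800 million per year business is expected to reach $4 billion in three years. The required compound annual growth rate is

$$\left(\frac{4000}{800}\right)^{1/3} - 1 = 5^{1/3} - 1 \approx 0.71,$$

that is, roughly 71 percent growth per year, sustained for three straight years, from a mature licensing business.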
Never has happened. Never will. It took Autonomy 15 years, good management, intelligent acquisitions, and lots of adaptation to hit the $600-$700 million plus in annual revenue it generated. Only energy-drink-fueled MBAs with Excel fever can convert 15 years and multiple revenue streams from dozens of quite different products into one giant multibillion dollar business in a couple of years. The scale is out of whack. When I visited the store in Manhattan with the big crazy pencil and the other giant products, I could see the difference between my pencil and the big pencil. HP, I assume, would see the two pencils as identical. HP, if it purchased a big pencil, would sue the shop in Manhattan because the big pencil would not fit into a Panasonic desktop pencil sharpener. Scale of thinking, accuracy of perception—they matter to me. HP? Hmm.
This is not bad business on HP’s part. This is not flawed acquisition analysis on HP’s part. This is not HP’s inability to ask the right questions. This is medieval lunacy with managers dancing on the grass under a full moon. Isn’t HP that company which has floundered, investigated its own Board of Directors, chased good managers from one office in Silicon Valley into the arms of a competitor based on the old Sea World property? Maybe. Maybe HP is a fully stocked fishing pond, not a water deficient stream in Palo Alto?
My personal view is that HP has itself, its Board of Directors, and its advisors to blame. I find it very difficult to believe that, talented as Dr. Lynch is, he could spoof HP’s Board, HP’s financial professionals, HP’s advisors, HP’s lawyers, and HP’s Meg Whitman. Hey, the guy is talented, but he is not Houdini.
Well, we have a show, gentle reader. We have a really big show. Where is Ed Sullivan when we need an announcer?
Stephen E Arnold, May 11, 2015
Show Business and Enterprise Search
May 11, 2015
Short Honk: I read “In Our Increasingly Automated and Global Economy, Every Business Is Becoming Just a Little Bit Like Show Business.” Quite a Google-ized string of words. The write up asserts that work will be done by skilled contractors who come together when there is a project, money, and a need for specialists. This is—wait for it—the Hollywood model.
I think the author is sort of right. For certain types of work, hiring specialists makes sense. When an employee needs a hip replacement, few companies want to have the requisite specialists on staff.
The article asserts:
Our economy is in the midst of a grand shift toward the Hollywood model.
The author adds:
It’s a surprisingly good system for many workers too, in particular those with highly sought after skills.
The future will be
a new era of the human-robot partnership, in which robots can be told what to do without the use of difficult programming languages.
Sounds fantastic as long as one has in-demand skills and can market those skills to generate awareness of one’s capabilities. (Too bad for those without skills and lacking in visibility. Tough luck.)
My interest is search.
If there is a tech sector where the Hollywood model should be visible, it is enterprise search. The experts come together, implement a system, and users become really happy with their new information retrieval system.
Unfortunately the data I have gathered suggests that anywhere from 55 to 75 percent of a search system’s users are unhappy. The folks in information technology departments have become gun shy when it comes to search. The folks who manage enterprise search solutions live a life of quiet desperation. In many search-intolerant organizations, it is not whether the person managing search will be RIFed; it is when.
The generalizations about the outputs of a Hollywood-style approach to staffing don’t make much sense to me on a practical level, even for television and motion pictures. I find the quality and value of the outputs at odds with the claims made for the model. The fact that a handful of specialists contribute their skills, via services, to a product that looks good, appeals to the young in mind, and taps into the rich repository of comic book literature is evidence that the Hollywood model does not work for me.
Enterprise search has embraced the Hollywood model and tossed in superstars like the Google Search Appliance as well. How is that working? In my experience, search remains a problem no matter what the experts say or do.
Maybe the Hollywood model works, but only in a superficial way. At least those who are unemployed can watch TV or go to the motion pictures. That’s value.
For enterprise search, more than a buzzword and a management catchphrase are needed to deliver a usable system. I hear the song now, “Another opening, another show…”
Stephen E Arnold, May 11, 2015
Neural Networks Finally Have Their Day
May 11, 2015
The Toronto Star offers a thoughtful piece about deep learning titled, “How a Toronto Professor’s Research Revolutionized Artificial Intelligence.” Professor Geoffrey Hinton has been instrumental in the development of neural network-based AI since long before the concept was popular. Lately, though, this “deep learning” approach has taken off, launching many a product, corporate division, and startup. Reporter Kate Allen reveals whom we can credit for leading neural networks through the shadows of doubt:
“Ask anyone in machine learning what kept neural network research alive and they will probably mention one or all of these three names: Geoffrey Hinton, fellow Canadian Yoshua Bengio and Yann LeCun, of Facebook and New York University.
“But if you ask these three people what kept neural network research alive, they are likely to cite CIFAR, the Canadian Institute for Advanced Research. The organization creates research programs shaped around ambitious topics. Its funding, drawn from both public and private sources, frees scientists to spend more time tackling those questions, and draws experts from different disciplines together to collaborate.”
Hooray for CIFAR! The detailed article describes what gives deep learning the edge, explains why “machine learning” is a better term than “AI”, and gives several examples of ways deep learning is being used today, including Hinton’s current work at Google and the University of Toronto. Allen also traces the history of the neural network from its conceptualization in 1958 by Frank Rosenblatt, through an era of skepticism, to its recent warm embrace by the AI field. I recommend interested parties check out the full article. We’re reminded:
“In 2006, Hinton and a PhD student, Ruslan Salakhutdinov, published two papers that demonstrated how very large neural networks, once too slow to be effective, could work much more quickly than before. The new nets had more layers of computation: they were ‘deep,’ hence the method’s rebranding as deep learning. And when researchers began throwing huge data sets at them, and combining them with new and powerful graphics processing units originally built for video games, the systems began beating traditional machine learning systems that had been tweaked for decades. Neural nets were back.”
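For readers who want a concrete sense of what “more layers of computation” means, here is a toy forward pass; the sizes and random weights are arbitrary illustrations, not anything resembling Hinton’s actual models:

```python
# Toy "deep" network: each layer is a matrix multiply plus a nonlinearity;
# depth simply means stacking more such layers between input and output.
import numpy as np

rng = np.random.RandomState(0)
# Two hidden layers and an output layer; sizes are arbitrary.
layers = [rng.randn(64, 128), rng.randn(128, 128), rng.randn(128, 10)]

def forward(x, layers):
    for W in layers[:-1]:
        x = np.maximum(0, x.dot(W))   # ReLU-activated hidden layer
    return x.dot(layers[-1])          # linear output layer (e.g., class scores)

scores = forward(rng.randn(1, 64), layers)  # one input vector in, ten scores out
```

Training such a stack efficiently, on huge data sets and GPUs, is what the 2006-era breakthroughs made practical.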
What detailed discussion of machine learning would be complete without a nod to concerns that we develop AI at our peril? Allen takes some time to sketch out both sides of that debate, and summarizes:
“Some in the field believe that artificial intelligence will augment, not replace: algorithms will free us from rote tasks like memorizing reams of legal precedents and allow us to pursue the higher-order thinking our massive brains are capable of. Others think the only tasks machines can’t do better are creative ones.”
I suppose the answers to those debates will present themselves eventually. Personally, I’m more excited than scared by the possibilities. How about you, dear reader?
Cynthia Murrell, May 11, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Elasticsearch Transparent about Failed Jepsen Tests
May 11, 2015
The article on Aphyr titled “Call Me Maybe: Elasticsearch 1.5.0” demonstrates the ongoing tendency of Elasticsearch to lose data during network partitions. The author works through several scenarios and finds that users can lose documents if nodes crash, if a primary pauses, or if a network partitions into two intersecting components or into two discrete components. The article explains,
“My recommendations for Elasticsearch users are unchanged: store your data in a database with better safety guarantees, and continuously upsert every document from that database into Elasticsearch. If your search engine is missing a few documents for a day, it’s not a big deal; they’ll be reinserted on the next run and appear in subsequent searches. Not using Elasticsearch as a system of record also insulates you from having to worry about ES downtime during elections.”
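That upsert pattern is simple to sketch. Below is a minimal illustration using the Python elasticsearch client; the index name, document shape, and the `fetch_documents` callable are my assumptions for illustration, not anything from the article, and details vary across ES versions:

```python
# Sketch: the database stays the system of record; every document is
# periodically re-indexed into Elasticsearch. Using a stable _id makes
# each pass an upsert, so documents ES dropped during a partition
# simply reappear on the next run.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()  # assumes a node on localhost:9200

def sync_index(fetch_documents, index="articles"):
    """fetch_documents is assumed to yield (doc_id, doc_dict) pairs
    pulled from the authoritative database."""
    actions = (
        {"_index": index, "_id": doc_id, "_source": doc}
        for doc_id, doc in fetch_documents()
    )
    ok, errors = helpers.bulk(es, actions, raise_on_error=False)
    return ok, errors  # a production job would log and retry errors
```

Run on a schedule, this turns the search index into a disposable view of the database rather than a system of record.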
The article praises Elasticsearch for its forthright approach to documenting the problems, especially the resiliency page it opened in September. That page clarifies a question among users about what it meant that the relevant ticket was closed: it states pretty clearly that ES failed its Jepsen tests. The article exhorts other vendors to follow a similar regimen of supplying such information to users.
Chelsea Kerwin, May 11, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Math and Search Experts
May 10, 2015
I found “There’s More to Mathematics Than Rigor and Proofs” a useful reminder of the difference between the person who is comfortable with math and the person who asserts he is good at math. With more search and content processing systems embracing numerical recipes, the explanations of what a math-centric system can do often leave me rolling my eyes and, in some cases, laughing out loud.
This essay explains that time and different types of math experiences are necessary stages in developing a useful facility with some of today’s information retrieval systems and methods. The write up points out:
The distinction between the three types of errors can lead to the phenomenon (which can often be quite puzzling to readers at earlier stages of mathematical development) of a mathematical argument by a post-rigorous mathematician which locally contains a number of typos and other formal errors, but is globally quite sound, with the local errors propagating for a while before being cancelled out by other local errors. (In contrast, when unchecked by a solid intuition, once an error is introduced in an argument by a pre-rigorous or rigorous mathematician, it is possible for the error to propagate out of control until one is left with complete nonsense at the end of the argument.)
Perhaps this section of the article sheds some light on the content processing systems which wander off the track of relevance and accuracy? As my mathy relative Vladimir Igorevich Arnold was fond of saying to anyone who would listen: Understand first, then talk.
Stephen E Arnold, May 10, 2015
Watson: The Swiss Army Knife for Digital Information, Business Intelligence, and Search?
May 10, 2015
I thought I could make it through the weekend without being subjected to another fusillade of IBM Watson Braunschweiger. Nope. My Overflight system delivered 24 16-ounce tubes in plastic this smoggy spring morning in Harrod’s Creek. One tube struck me square in my bald spot. I am still groggy.
Navigate to “IBM’s Watson Supercomputer Strives to Be Jack of All Trades.” My immediate reaction to this conflation of IBM with hardware and smart software built from open source components, home brew scripts, and acquired technology was to think, “And master of none.”
The last Renaissance man was not the poster child for innovation that my sixth grade teacher described. Reality was a bit more gritty and a trifle sad. Leonardo, you gave it the old college try, but the inspiration of the frescos in Nero’s Domus Aurea revealed that imitation played a part in your repertoire of insights.
The write up does not focus on what Watson is in terms of hardware or software. Instead I learn:
Watson, which gained fame in 2011 for defeating human opponents on the “Jeopardy” quiz show, has been reaching into its computing power since then for an array of other services.
The article then lists Watson’s initiatives: an engagement advisor for the military, a leak management capability for the petroleum industry, cancer treatment, management of post-operative conditions, smart toys for tots, and analysis of financial investment opportunities. Included in the list is Watson’s ability to develop recipes with tamarind as an ingredient.
How long will it be before Watson delivers sustainable revenues and profits to the struggling IBM? Watson, would you answer that question? Watson, Watson, are you there?
Stephen E Arnold, May 10, 2015
Semantic Search: The View from a Taxonomy Consultant
May 9, 2015
My team and I are working on a new project. With our Overflight system, we have an archive of memorable and not so memorable factoids about search and content processing. One of the goslings who was actually working yesterday asked me, “Do you recall this presentation?”
The presentation was “Implementing Semantic Search in the Enterprise,” created in 2009, which works out to six years ago. I did not recall the presentation. But the title evoked an image in my mind like this:
I asked, “How is this germane to our present project?”
The reply the gosling quacked was, “Semantic search means taxonomy.” The gosling enjoined me to examine this impressive looking diagram:
Okay.
I don’t want a document. I don’t want formatted content. I don’t want unformatted content. I want on-point results I can use. To illustrate the gap between dumping a document in my lap and presenting something useful, look at this visualization from Geofeedia:
The idea is that a person can draw a shape on a map, see the real time content flowing via mobile devices, and look at a particular object. There are search tools and other utilities. The user of this Geofeedia technology examines information in a manner that does not produce a document to read. Sure, a user can read a tweet, but the focus is on understanding information, regardless of type, in a particular context in real time. There is a classification system operating in the plumbing of this system, but the key point is the functionality, not the fact that a consulting firm specializing in taxonomies is making a taxonomy the Alpha and the Omega of an information access system.
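Functionally, the geofencing piece of such a system reduces to one question per incoming item: does this point fall inside the shape the user drew? A toy sketch of that test (the polygon and posts below are invented; Geofeedia’s actual plumbing is surely more elaborate, with spatial indexes and streaming infrastructure):

```python
# Ray-casting point-in-polygon test: keep only geotagged items whose
# coordinates fall inside a user-drawn polygon.
def point_in_polygon(lon, lat, polygon):
    """polygon is a list of (lon, lat) vertices; returns True if inside."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does the edge (j, i) cross the horizontal ray from our point?
        if (yi > lat) != (yj > lat) and \
           lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

# Hypothetical stream of geotagged posts: (lon, lat, text).
posts = [(-73.99, 40.75, "parade on 7th Ave"), (-118.24, 34.05, "LA traffic")]
midtown = [(-74.00, 40.74), (-73.97, 40.74), (-73.97, 40.77), (-74.00, 40.77)]
hits = [p for p in posts if point_in_polygon(p[0], p[1], midtown)]  # first post only
```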
The deck starts with the premise that semantic search pivots on a taxonomy. The idea is that a “categorization scheme” makes it possible to index a document even though the words in the document may not be the words in the taxonomy.
For me, the slide deck’s argument was off kilter. The mixing up of a term list and semantic search is the evidence of a Rube Goldberg approach to a quite important task: Accessing needed information in a useful, actionable way. Frankly, I think that dumping buzzwords into slide decks creates more confusion when focus and accuracy are essential.
At lunch the goslings and I flipped through the PowerPoint deck, which is available via LinkedIn Slideshare. You may have to register to view the PowerPoint deck. I am never clear about what is viewable, what’s downloadable, and what’s on Slideshare. LinkedIn has its real estate, publishing, and personnel businesses to attend to, so search and retrieval is obviously not a priority. The entire experience was superficially amusing but on a more profound level quite disturbing. No wonder enterprise search implementations careen into a swamp of cost overruns and angry users.
Now creating taxonomies, or what I call controlled term lists, can be a darned exciting process. If one goes the human route, there are discussions about what term maps to what word or phrase. Think buzz group, discussion group, and online collaboration. Which terms go with which other terms? In the good old days, these term lists were crafted by subject matter and indexing specialists. For example, the guts of the ABI/INFORM classification coding terms originated in the 1981-1982 period and were the product of more than 14 individuals, one advisor (the now deceased Betty Eddison), and the begrudging assistance of the Courier Journal’s information technology department, which performed analyses of the index terms and key words in the ABI/INFORM database. The classification system was reasonably robust, and it was licensed by the Royal Bank of Canada, IBM, and some other savvy outfits for their own indexing projects.
As you might know, investing two years in human and some machine inputs was an expensive proposition. It was the initial step in the reindexing of the ABI/INFORM database, which at the time was one of the go to sources of high value business and management information culled from more than 800 publications worldwide.
The only problem I have with the slide deck’s making a taxonomy a key concept is that one cannot craft a taxonomy without knowing what one is indexing. For example, you have a flow of content through and into an organization. In a business engaged in the manufacture of laboratory equipment, there will be a wide range of information. There will be unstructured information like Word documents prepared by wild-eyed marketing associates. There will be legal documents artfully copied and pasted together from boilerplate. There will be images of the products themselves. There will be databases containing the names of customers, prospects, suppliers, and consultants. There will be information that employees download from the Internet or tote into the organization on a storage device.
The key concept of a taxonomy has to be anchored in the organization’s reality, not in an external term list like those Oracle used to provide for certain vertical markets. In short, the time and cost of processing these items of information so that confidentiality is not breached is likely to make the organization’s accountant sit up and take notice.
Today many vendors assert that their systems can intelligently, automatically, and rapidly develop a taxonomy for an organization. I suggest you read the fine print. Even the whizziest taxonomy generator is going to require some babysitting. To get a sense of what is required, track down an experienced licensee of the Autonomy IDOL system. There is a training period which requires a cohesive corpus of representative source material. Sorry, no images or videos accepted, but the existing image and video metadata can be processed. Once the system is trained, it is run against a test set of content. The results are examined by a human who knows what he or she is doing, and then the system is tuned. After the smart system runs for a few days, the human inspects and calibrates. The idea is that as content flows through the system and periodic tweaks are made, the system becomes smarter. In reality, indexing drift creeps in. In effect, the smart software can never stray too far from the human subject matter experts riding herd on the algorithms.
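The train-test-tune cycle described above is generic to supervised text categorizers, not specific to IDOL. A minimal sketch with scikit-learn (the corpus, categories, and split are invented for illustration; IDOL’s internals differ):

```python
# Sketch of the cycle: train on a representative corpus, score a held-out
# test set, and let a human decide whether the categorizer needs retuning.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Invented toy corpus standing in for the "cohesive corpus of
# representative source material" a real training period requires.
docs = [
    "centrifuge rotor torque specification", "spectrometer calibration notes",
    "invoice for reagent order", "supplier payment terms and billing",
    "rotor vibration fault report", "quarterly billing reconciliation",
]
labels = ["product", "product", "finance", "finance", "product", "finance"]

train_docs, test_docs, y_train, y_test = train_test_split(
    docs, labels, test_size=0.33, random_state=0)

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(train_docs), y_train)

# The "human inspects and calibrates" step: if accuracy on fresh content
# sags over time, the index has drifted and the model needs retraining.
print("holdout accuracy: %.2f"
      % accuracy_score(y_test, clf.predict(vec.transform(test_docs))))
```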
The problem exists even when there is a relatively stable core of technical terminology. The indexing challenge for a lab gear manufacturer’s content is many times greater than that facing a company focused on a specific branch of engineering, science, technology, or medicine. Indexing Halliburton nuclear energy information is trivial compared to indexing more generalized business content like that found in ABI/INFORM or in the typical services organization today.
I agree that a controlled term list is important. One cannot easily resolve entities unless there is a combination of automated processes and look-up lists. An example is figuring out whether a reference to I.B.M., Big Blue, or Armonk is a reference to the much loved marketers of Watson. Now handle a transliterated name like Anwar al-Awlaki and its variants. This type of indexing is quite important. Get it wrong and one cannot find information germane to a query. When one is investigating aliases used by bad actors, an error can become a bad day for some folks.
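The look-up list half of that combination can be as plain as an alias table; a minimal sketch (the aliases below are illustrative, and a production system layers fuzzy matching and context on top):

```python
# Minimal alias table for entity resolution: map surface forms to one
# canonical entity. Note that "armonk" is only IBM in the right context,
# which is exactly why the automated half of the combination matters.
ALIASES = {
    "i.b.m.": "IBM", "ibm": "IBM", "big blue": "IBM", "armonk": "IBM",
    "anwar al-awlaki": "Anwar al-Awlaki", "anwar al-aulaqi": "Anwar al-Awlaki",
}

def resolve(mention):
    """Return the canonical entity for a mention, or flag it for review."""
    return ALIASES.get(mention.strip().lower(), "UNRESOLVED: " + mention)

print(resolve("Big Blue"))         # -> IBM
print(resolve("Anwar al-Aulaqi"))  # -> Anwar al-Awlaki
```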
The remainder of the slide deck rides the taxonomy pony into the sunset. When one looks at information created 72 months ago, it is easy to understand why enterprise search and content processing has become an “oh, my goodness” problem in many organizations. I think that a mid-sized company would grind to a halt if it needed a controlled vocabulary that matched today’s content flows.
My take away from the slide deck is easy to summarize: putting the cart before the horse won’t get enterprise search where it must go to retain credibility and deliver utility.
Stephen E Arnold, May 9, 2015
Making Queries of PostgreSQL Data Easy
May 9, 2015
If you query PostgreSQL tables, you may find yourself making nice with a script herder. Tired of that intermediated approach? Navigate to Slinky. You will want to watch the demo in Internet Explorer because I encountered flakiness in Firefox and Mozilla. You enter what you want in a search box, pick the table, and the system spits out the SQL query. Punch a button and you get a data table. Looked good and worked for us.
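For those who would rather skip the script herder entirely, the kind of SQL such a tool generates can, of course, be run directly; a quick sketch with psycopg2 (the connection details, table, and columns are invented for illustration):

```python
# Running a generated query against PostgreSQL without an intermediary.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="sales", user="analyst")
with conn, conn.cursor() as cur:
    # e.g., "orders over $500 in April 2015" might come back as:
    cur.execute(
        "SELECT order_id, customer, total FROM orders "
        "WHERE total > %s AND order_date >= %s AND order_date < %s",
        (500, "2015-04-01", "2015-05-01"),
    )
    for row in cur.fetchall():
        print(row)
conn.close()
```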
Stephen E Arnold, May 9, 2015