Hybrid Is Essential to SharePoint 2016

May 19, 2015

It looks like Microsoft is planning to bring the cloud to SharePoint Server 2016 users at critical points rather than forcing them to go “all cloud.” This approach lets Microsoft continue with the cloud-based services it has invested in while improving the on-premises experience that users are demanding. ZDNet covers the whole story in its article, “Microsoft’s SharePoint 2016: What’s Hybrid Got to Do With It?”

The article sums up the much-talked-about hybrid approach:

“Though it will run on top of Windows Server 2012 R2 and/or Windows Server 2016, SharePoint 2016 will include support for what Microsoft calls ‘cloud-accelerated experiences,’ meaning new hybrid scenarios . . . Instead of trying to push all SharePoint users and all SharePoint workloads to the cloud, Microsoft is acknowledging there are some reasons (compliance among them) that not all data can or should be in SharePoint Online. That said, Microsoft wants to enable its SharePoint users to get at their data wherever it’s stored.”

Stephen E. Arnold is a lifelong leader in search and a long-time expert in SharePoint. He keeps managers and users updated on the latest SharePoint news through his Web service ArnoldIT.com. Keep your eyes peeled for further developments as a public release of SharePoint Server 2016 draws closer.

Emily Rae Aldridge, May 19, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Archive.is Preserves Online Information

May 18, 2015

Today’s information seekers use the Internet the way some of us used reference books growing up. Unlike the paper tomes on our dusty bookshelves, however, websites can change their content without so much as a by-your-leave. Suggestions for preserving online information can be found in “Create Publicly Available Web Page Archives with Archive.is” at gHacks.net.

Writer Martin Brinkmann begins by listing several local options familiar to many of us. There’s Ctrl-S, of course, and assorted screenshot-saving methods. Website archivers like HTTrack perform their own crawls and save the results to the user’s local machine. Remotely, Archive.org automatically creates snapshots of prominent sites, but users cannot control the results. Enter Archive.is. Brinkmann writes:

“Archive.is is a free service that helps you out. To use it, paste a web address into the form on the service’s main page and hit submit url afterwards. The service takes two snapshots of that page at that point in time and makes it available publicly. The first takes a static snapshot of the site. You find images, text and other static contents included while dynamic contents and scripts are not. The second snapshot takes a screenshot of the page instead. An option to download the data is provided. Note that this downloads the textual copy of the site only and not the screenshot. A Firefox add-on has been created for the service which may be useful to some of its users. It creates automatic snapshots of every web page that you bookmark in the web browser after installation of the add-on.”

Wow, don’t set and forget that Firefox option! In fact, the article cautions, be mindful of the public availability of every Archive.is snapshot; Brinkmann reasonably suggests the tool could benefit from a password feature. Still, this could be an option to preserve important (but, for the prudent, impersonal) information found online.

Cynthia Murrell, May 18, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

HP Idol and Hadoop: Search, Analytics, and Big Data for You

May 16, 2015

I was clicking through links related to Autonomy IDOL. One of the links which I noted was to a YouTube video labeled “HP IDOL for Hadoop: Create a Smarter Data Lake.” Hadoop has become a synonym for making sense of Big Data. I am not sure what Big Data are, but I assume I will know when my eight-gigabyte USB key cannot accept another file. Big Data? Doesn’t it depend on one’s point of view?

What is fascinating about the HP Idol video is that it carries a posting date of October 2014, which is in the period when HP was ramping up its anti-Autonomy legal activities. The video, I assumed before watching, would break from the Autonomy marketing assertions and move in a bold, new direction.

The video contained some remarkable assertions. Please watch the video yourself, because I may have missed some howlers as I was chuckling and writing on my old-school notepad with a decidedly old-fashioned pencil. Hey, these tools work, which is more than I can say for some of the software we examined last week.

Here’s what I noted with the accompanying screenshot so you can locate the frame in the YouTube video to double check my observation with the reality of the video.

First, there is the statement that 88 percent of an organization’s information is “unanalyzed.” The source is a 2012 study, Forrsights Strategy Spotlight: Business Intelligence and Big Data. Forrester, another mid-tier consulting firm, produces these reports for its customers. Okay, the research is a couple of years old. Maybe it is valid? Maybe not? My thought was that HP may be a company which did not examine the data to which it had access about Autonomy before it wrote a check for billions of dollars. I assume HP has rectified any glitch along this line. HP’s litigation with Autonomy and the billions in write-downs for the deal underscore the problem with unanalyzed data. Alas, no reference was made to this case example in the HP video.

Second, Hadoop, a variant of Google’s MapReduce technology, is presented as a way to reap the benefits of cost efficiency and scalability. These are generally desirable attributes of Hadoop and other data management systems. The hitch, in my opinion, is that it is a collection of projects. These have been developed via the open source / commercial model. Hadoop works well for certain types of problems. Extract, transform, and load works reasonably well once the Hadoop installation is set up, properly resourced, and the Java code debugged so it works. Hadoop requires some degree of technical sophistication; otherwise, the system can be slow, stuffed with duplicates, and a bit like a Rube Goldberg machine. But the Hadoop references in the video are not a demonstration. I noted this “explanation.”

[Screenshot from the HP IDOL for Hadoop video]
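For readers who have not met the MapReduce model that Hadoop implements, the classic word-count job can be sketched in plain, single-process Python. This illustrates only the map, shuffle, and reduce steps; it is not Hadoop’s Java API, and it leaves out everything that makes a Hadoop installation hard to run well, such as distribution across machines, HDFS storage, and fault tolerance.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    In Hadoop the framework performs this grouping between map and reduce tasks.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over a collection of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["Big Data changes everything", "big data lake"]))
```

The point of the split is that map and reduce functions are independent per key, so a framework can spread them across many machines; the hard parts the video glosses over live in that framework, not in these few lines.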

Third, HP jumps from the Hadoop segment to “what if” questions. I liked the “democratize Big Data” because “Big Data Changes everything.” Okay, but the solution is Idol for Hadoop. The HP approach is to create a “smarter data lake.” Hmmm. Hadoop to Idol to data lake for the purpose of advanced analytics, machine learning functions, and enterprise level security. That sounds quite a bit like Autonomy’s value proposition before it was purchased from Dr. Lynch and company. In fact, Autonomy’s connectors permitted the system to ingest disparate types of data as I recall.

Fourth, the next logical discontinuity is the shift from Hadoop to something called “contextual search.” A Gartner report is presented which states with Douglas MacArthur-like confidence:

HP Idol. A leader in the 2014 Gartner Magic Quadrant for Contextual Search.

What the heck is contextual search in a Hadoop system accessed by Autonomy Idol? The answer is SEARCH. Yep, a concept that has been difficult to implement for 20, maybe 30 years. Search is so difficult to sell that Dr. Lynch generated revenues by acquiring companies and applying his neuro-linguistic methods to these firms’ software. I learned:

The sophistication and extensibility of HP Autonomy’s Intelligent Data Operating Layer (Idol) offering enable it to tackle the most demanding use cases, such as fraud detection and search within large video libraries and feeds.

Yo, video. I thought Autonomy acquired video-centric companies and the video content resided within specialized storage systems using quite specific indexing and information access features. Has HP cracked the problem of storing video in Hadoop so that a licensee can perform fraud detection and search within video libraries? My experience with large video libraries is that certain video like surveillance footage is pretty tough to process with accuracy. Humans, even academic trainees, can be placed in front of a video monitor and told, “Watch this stream. Note anomalies.” Not exciting but necessary because processing large volumes of video remains what I would describe as “a bit of a challenge, grasshopper.” Why is Google adding wild and crazy banners, overlays, and required metadata inputs? Maybe because automated processing and magical deep linking are out of reach? HP appears to have improved or overhauled Autonomy’s video analysis functions, and the Gartner analyst is reporting a major technical leap forward. Identifying a muzzle flash is different from recognizing a face in a flow of subway patrons captured on a surveillance camera, is it not?

[Screenshot from the HP IDOL for Hadoop video]

I have heard some pre-HP Autonomy sales pitches, but I can’t recall hearing that Idol can crunch flows of video content unless one uses the quite specialized system Autonomy acquired. Well, I have been wrong before, and I am certainly not qualified to be an analyst like the ones Gartner relies upon. I learned that HP Idol has a comprehensive list of data connectors. I think I would use the word “library,” but why niggle?

Fifth, the video jumps to a presentation of a “content hub.” The idea is that HP Idol provides visual programming tools. I assume an HP Idol customer will point and click to create queries. The queries will deliver outputs from the Hadoop data management system and the content which embodies the data lake. The user can also run a query and see a list of documents. But the video moves right past the fact that scanning a results list strikes me as exactly what many users no longer want to do to locate information. One can search effectively only when one knows what one is looking for and that the needed information is actually in the index. The use case appears to be health care, and the video concludes with a reminder that one can perform advanced analytics. There is a different point of view available in this ParAccel white paper.

I understand the strengths and weaknesses of videos. I have been making some home brew videos since I retired. But HP is presenting assertions about Autonomy’s technology which seem to be out of step with my understanding of what Idol, the digital reasoning engine, and Autonomy’s acquired video technology can do.

The point is that HP seems to be out-marketing Autonomy’s marketing. The assertions and logical leaps in the HP Idol Hadoop video stretch the boundaries of my credulity. I find this interesting because HP is alleging that Autonomy used similar verbal polishing to convince HP to write a billion-dollar check for a search vendor which had grown via acquisitions over a period of 15 years.

Stephen E Arnold, May 16, 2015

Lousy Search Results. An Attention Span Issue?

May 15, 2015

I read the enervating “Humans Have Shorter Attention Span Than Goldfish, Thanks to Smartphones.” Yep, thanks. When I am working and someone speaks to me, I often let out a squeal and twitch. I concentrate on the task at hand to the exclusion of the world. Some folks may lack this old-school concentration.

According to the write up, short attention spans are due to smartphones, not stupidity, a failure to exercise discipline over the mind, or the cranial wiring which permits one to focus. I learned:

According to scientists, the age of smartphones has left humans with such a short attention span even a goldfish can hold a thought for longer. Researchers surveyed 2,000 participants in Canada and studied the brain activity of 112 others using electroencephalograms. The results showed the average human attention span has fallen from 12 seconds in 2000, or around the time the mobile revolution began, to eight seconds.

Right, 12 seconds. That is probably enough attention for pre-Millennials. Eight seconds is too darned long to concentrate on any one thing.

Is this the next Dark Web research specialist I will hire?

When one of the people lobbying me for work whips out a smartphone, scans an iPad, and lets his or her eyes roam around the room, that’s it. No work. The goldfish has a nine-second attention span. The fish I have watched in the holding tank of a Chinese restaurant in Wuhan seemed able to fix their attention for far longer. One red fish just hovered in place and regarded me for 30 seconds, maybe more.

Instead of hiring humans, perhaps I should go with a giant koi? Are lousy search skills an example of what happens when one cannot concentrate? Nah, blame the vendor or the IT department. Entitlement management works well.

Stephen E Arnold, May 15, 2015

Developing an NLP Semantic Search

May 15, 2015

Can you imagine a natural language processing semantic search engine? It would be a lovely tool to use in your daily routine and would make research a bit easier. If you are working on such a project and are making progress, keep at that startup, because this is a lucrative field at the moment. Over at Stack Overflow, an entrepreneurial spirit is trying to develop a “Semantic Search With NLP And Elasticsearch”:

“I am experimenting with Elasticsearch as a search server and my task is to build a “semantic” search functionality. From a short text phrase like “I have a burst pipe” the system should infer that the user is searching for a plumber and return all plumbers indexed in Elasticsearch.

Can that be done directly in a search server like Elasticsearch or do I have to use a natural language processing (NLP) tool like e.g. Maui Indexer. What is the exact terminology for my task at hand, text classification? Though the given text is very short as it is a search phrase.”

Since this question was asked about three years ago, a lot of work has been done not only on Elasticsearch but also on NLP. Search is moving toward a more organic experience, but accuracy is often muddled by several factors. These include the quality of the technology, classification, taxonomies, ads in results, and even keywords (still!).

NLP semantic search is closer now than it was three years ago, and technology companies would invest a lot of money in a startup that can bridge the gap between natural language and machine learning.
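The questioner’s own terminology guess, text classification, points to one simple way to handle the “burst pipe” example: classify the short phrase into a service category first, then send Elasticsearch an ordinary structured query. The Python sketch below is an assumption-laden toy, not a recommended production design. The keyword lists are hypothetical and hand-built (a real system would learn the mapping from labeled queries, or use a tool such as the Maui Indexer the questioner mentions), and the index is assumed to have a hypothetical keyword field named `category` and a text field named `description`. Only the query body is built, as a plain dict in Elasticsearch’s query DSL (`term` and `match` queries); sending it is left to whatever client is in use.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical, hand-built trigger terms for each service category.
# A production system would learn these from labeled queries instead.
CATEGORY_TERMS: Dict[str, Set[str]] = {
    "plumber": {"pipe", "pipes", "leak", "burst", "drain", "faucet", "toilet"},
    "electrician": {"wiring", "outlet", "fuse", "breaker", "socket"},
    "locksmith": {"lock", "locked", "key", "keys", "door"},
}


def classify(phrase: str) -> Optional[str]:
    """Return the category whose trigger terms overlap the phrase most, or None."""
    tokens = set(re.findall(r"[a-z]+", phrase.lower()))
    scores = {cat: len(tokens & terms) for cat, terms in CATEGORY_TERMS.items()}
    best = max(scores, key=lambda cat: scores[cat])
    return best if scores[best] > 0 else None


def build_query(phrase: str) -> dict:
    """Build an Elasticsearch query body for a free-text search phrase."""
    category = classify(phrase)
    if category is None:
        # No category inferred: fall back to full-text search on a
        # hypothetical "description" field.
        return {"query": {"match": {"description": phrase}}}
    # Category inferred: exact match on a hypothetical keyword field "category".
    return {"query": {"term": {"category": category}}}


if __name__ == "__main__":
    print(build_query("I have a burst pipe"))
```

Separating classification from retrieval keeps the search server doing what it is good at (matching indexed fields) and isolates the hard NLP step, which is exactly the part a keyword list handles poorly.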

Whitney Grace, May 15, 2015

Automated Search News: Lost in Link Land

May 14, 2015

I scanned Paper.li’s “The Enterprise Search Daily.” I spotted this item:

[Screenshot of the Paper.li item]

Curious, I clicked on it. Here’s what Sinequa displayed:

[Screenshot of the Sinequa “page not found” response]

Isn’t Sinequa one of the vendors Gartner described as a leader of the search pack? Not only was the Paper.li link submitted by Embedded something wrong, but the source URL is also a 404.

So, how are those automated information systems supposed to work? See my write up about IBM’s burrito to get a glimpse of what happens when big ideas cannot be converted into workable components.

Yep, page not found. Reality is different from the marketing hoo hah.

Stephen E Arnold, May 14, 2015

Don’t Fear the AI

May 14, 2015

Will intelligent machines bring about the downfall of the human race? Unlikely, says The Technium, in “Why I Don’t Worry About a Super AI.” The blogger details four specific reasons he or she is unafraid: First, AI does not seem to adhere to Moore’s law, so no Terminators anytime soon. Also, we do have the power to reprogram any uppity AI that does crop up and (reason three) it is unlikely that an AI would develop the initiative to reprogram itself, anyway. Finally, we should see managing this technology as an opportunity to clarify our own principles, instead of a path to dystopia. The blog opines:

“AI gives us the opportunity to elevate and sharpen our own ethics and morality and ambition. We smugly believe humans – all humans – have superior behavior to machines, but human ethics are sloppy, slippery, inconsistent, and often suspect. […] The clear ethical programing AIs need to follow will force us to bear down and be much clearer about why we believe what we think we believe. Under what conditions do we want to be relativistic? What specific contexts do we want the law to be contextual? Human morality is a mess of conundrums that could benefit from scrutiny, less superstition, and more evidence-based thinking. We’ll quickly find that trying to train AIs to be more humanistic will challenge us to be more humanistic. In the way that children can better their parents, the challenge of rearing AIs is an opportunity – not a horror. We should welcome it.”

Machine learning as a catalyst for philosophical progress—interesting perspective. See the post for more details behind this writer’s reasoning. Is he or she being realistic, or naïve?

Cynthia Murrell, May 14, 2015

Explaining Big Data Mythology

May 14, 2015

Mythologies usually develop over the course of centuries, but big data has only been around for (arguably) a couple of decades, at least in its modern incarnation. Recently, big data has received enough media attention and product development for the Internet to create a big data mythology. The Globe and Mail wanted to dispel some of the bigger myths in the article “Unearthing Big Myths About Big Data.”

The article focuses on Prof. Joerg Niessing’s big data expertise and how he explains the truth behind many of the biggest big data myths. One of the main points Niessing wants people to understand is that gathering data does not automatically equal dollar signs; you have to act on the data:

“You must take control, starting with developing a strategic outlook in which you will determine how to use the data at your disposal effectively. ‘That’s where a lot of companies struggle. They do not have a strategic approach. They don’t understand what they want to learn and get lost in the data,’ he said in an interview. So before rushing into data mining, step back and figure out which customer segments and what aspects of their behavior you most want to learn about.”

Niessing says that big data is not really big but is made up of many diverse data points. Big data also does not have all the answers; instead, it provides ambiguous results that need to be interpreted, so have the questions you want answered in hand before gathering data. Also, not all of the data returned is good. Some of it is garbage and cannot be used for a project. Several other myths are uncovered, but the truth remains that having a strategic big data plan in place is the best way to make the most of big data.

Whitney Grace, May 14, 2015

The Philosophy of Semantic Search

May 13, 2015

The article “Taking Advantage of Semantic Search NOW: Understanding Semiotics, Signs, & Schema” on Lunametrics delves into semantics on a philosophical and linguistic level as well as in regard to business. The author traces the emergence of semantic search, beginning with Ray Kurzweil’s interest in machines learning meaning as opposed to simpler keyword search. To help readers fully grasp this concept, the article provides a brief refresher on Saussure’s theory of the sign.

“a Sign is comprised of a signifier, or the name of a thing, and the signified, what that thing represents… Say you sell iPad accessories. “iPad case” is your signifier, or keyword in search marketing speak. We’ve abused the signifier to the utmost over the years, stuffing it onto pages, calculating its density with text tools, jamming it into title tags, in part because we were speaking to robot who read at a 3-year-old level.”

To create meaning, we must go beyond simply adding a price tag and a picture to build a sign. The article argues for schema: some indication of whom and what the thing is for. The author, Michael Bartholow, has a background in linguistics, marketing, and search engine optimization. His article ends with the question of when linguists, philosophers, and humanists will be invited into the conversation with businesses, perhaps making him a true visionary in a field populated by data engineers with tunnel vision.
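As an illustration of what “schema” can look like in practice, the article’s iPad case example can be expressed as schema.org JSON-LD, generated here with Python’s standard `json` module. The product name, price, currency, and image URL are made-up placeholders, and using schema.org’s `isAccessoryOrSparePartFor` property to say what the case is for is this sketch’s own choice, not something the article specifies.

```python
import json


def product_jsonld(name: str, accessory_for: str, price: float,
                   currency: str, image_url: str) -> str:
    """Serialize a schema.org Product description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,          # the signifier, e.g. "iPad case"
        "image": image_url,    # the picture
        # What the thing is for: a pointer to the product it accessorizes.
        "isAccessoryOrSparePartFor": {"@type": "Product", "name": accessory_for},
        # The price tag, as a schema.org Offer.
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
        },
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(product_jsonld("Leather iPad case", "iPad", 29.99, "USD",
                         "https://example.com/ipad-case.jpg"))
```

The resulting JSON would typically be embedded in a page inside a `<script type="application/ld+json">` element, where search engines can read the relationships instead of inferring them from keyword density.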

Chelsea Kerwin, May 13, 2015

CyberOSINT Videos

May 12, 2015

Xenky.com has posted a single page which provides one-click access to the three CyberOSINT videos. The videos provide highlights of Stephen E Arnold’s new monograph about next generation information access. You can explore the videos, which run a total of 30 minutes, on the Xenky site. One viewer said, “This has really opened my eyes. Thank you.”

Kenny Toth, May 12, 2015