Is Collaboration the Key to Big Data Progress?

May 22, 2015

The article titled Big Data Must Haves: Capacity, Compute, Collaboration on GCN offers insights into the best areas of focus for big data researchers. The Internet2 Global Summit is in D.C. this year with many exciting panelists who support the emphasis on collaboration in particular. The article mentions the work being presented by several people including Clemson professor Alex Feltus,

“…his research team is leveraging the Internet2 infrastructure, including its Advanced Layer 2 Service high-speed connections and perfSONAR network monitoring, to substantially accelerate genomic big data transfers and transform researcher collaboration…Arizona State University, which recently got 100 gigabit/sec connections to Internet2, has developed the Next Generation Cyber Capability, or NGCC, to respond to big data challenges.  The NGCC integrates big data platforms and traditional supercomputing technologies with software-defined networking, high-speed interconnects and visualization for medical research.”

Arizona’s NGCC provides the essence of the article’s claims, stressing capacity with Internet2, several types of computing, and of course collaboration between everyone at work on the system. Feltus commented on the importance of cooperation in Arizona State’s work, suggesting that personal relationships outweigh individual successes. He claims his own teamwork with network and storage researchers helped him find new potential avenues of innovation that might not have occurred to him without thoughtful collaboration.

Chelsea Kerwin, May 22, 2015

Stephen E Arnold, Publisher of CyberOSINT at

Long-term Plans for SharePoint

May 21, 2015

Through all the iterations of SharePoint, it seems that Microsoft has wised up and is finally giving customers more of what they want. The release of SharePoint Server 2016 shows a shift back toward on-premises installations, and yet there will still be functions supported through the cloud. This new hybrid emphasis provides a third pathway through which users are experiencing SharePoint. The CMS Wire article, “3 SharePoint Paths for the Next 10 Years,” covers all the details.

The article begins:

“Microsoft Office 365 has proven to be a major disruption of how companies use SharePoint to meet business requirements. Rumors, fear, uncertainty and doubt proliferate around Microsoft’s plans for SharePoint’s future releases, as well as the support of critical features and functionality companies rely on . . . So, taking into account Office 365, the question is: How will companies be using SharePoint over the next 10 years?”

Stephen E. Arnold is a leader in SharePoint, with a lifelong career in search. His SharePoint feed is a great resource for users and managers alike, or anyone who needs to keep on top of the latest developments. It may be that the hybrid solution is a way to keep on-premises users happy while they still benefit from the latest cloud functions like Delve and OneDrive.

Emily Rae Aldridge, May 21, 2015

Sponsored by, publisher of the CyberOSINT monograph

Developing an NLP Semantic Search

May 15, 2015

Can you imagine a natural language processing semantic search engine? It would be a lovely tool to use in your daily routines and would make research a bit easier. If you are working on such a project and are making progress, keep at that startup, because this is a lucrative field at the moment. Over at Stack Overflow, an entrepreneurial spirit is trying to develop a “Semantic Search With NLP And Elasticsearch”:

“I am experimenting with Elasticsearch as a search server and my task is to build a “semantic” search functionality. From a short text phrase like “I have a burst pipe” the system should infer that the user is searching for a plumber and return all plumbers indexed in Elasticsearch.

Can that be done directly in a search server like Elasticsearch or do I have to use a natural language processing (NLP) tool like e.g. Maui Indexer. What is the exact terminology for my task at hand, text classification? Though the given text is very short as it is a search phrase.”

In the roughly three years since this question was asked, a lot has been done not only with Elasticsearch, but also with NLP. Search is moving towards a more organic experience, but accuracy is often muddled by different factors. These include the quality of the technology, classification, taxonomies, ads in results, and even keywords (still!).

NLP semantic search is closer now than it was three years ago, but technology companies would invest a lot of money in a startup that can bridge the gap between natural language and machine learning.
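As a rough illustration of the “burst pipe” question quoted above, here is a minimal sketch in Python. It is not the approach from the Stack Overflow thread: it assumes a tiny hand-written table of trigger words (`CATEGORY_TERMS`) in place of a real NLP classifier, and the field names `profession` and `description` are hypothetical. It only builds a query body in Elasticsearch's query DSL (`term` and `match` queries) and does not contact a server.

```python
import re

# Hypothetical trigger terms per trade. A real system would learn these
# from labeled queries or a taxonomy rather than hard-code them.
CATEGORY_TERMS = {
    "plumber": {"pipe", "leak", "drain", "faucet", "toilet"},
    "electrician": {"outlet", "wiring", "fuse", "breaker"},
}


def infer_category(phrase):
    """Return the category whose trigger terms overlap the phrase most, or None."""
    tokens = set(re.findall(r"[a-z]+", phrase.lower()))
    best, best_hits = None, 0
    for category, terms in CATEGORY_TERMS.items():
        hits = len(tokens & terms)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def build_query(phrase, field="profession"):
    """Build a query body: exact category filter if one is inferred,
    otherwise fall back to plain full-text matching on the phrase."""
    category = infer_category(phrase)
    if category is None:
        # "description" is an assumed full-text field name.
        return {"query": {"match": {"description": phrase}}}
    return {"query": {"term": {field: category}}}


if __name__ == "__main__":
    print(build_query("I have a burst pipe"))
```

The point of the sketch is the split the questioner asks about: the inference from phrase to category happens outside the search server, and Elasticsearch only receives an ordinary structured query.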

Whitney Grace, May 15, 2015

Don’t Fear the AI

May 14, 2015

Will intelligent machines bring about the downfall of the human race? Unlikely, says The Technium, in “Why I Don’t Worry About a Super AI.” The blogger details four specific reasons he or she is unafraid: First, AI does not seem to adhere to Moore’s law, so no Terminators anytime soon. Also, we do have the power to reprogram any uppity AI that does crop up and (reason three) it is unlikely that an AI would develop the initiative to reprogram itself, anyway. Finally, we should see managing this technology as an opportunity to clarify our own principles, instead of a path to dystopia. The blog opines:

“AI gives us the opportunity to elevate and sharpen our own ethics and morality and ambition. We smugly believe humans – all humans – have superior behavior to machines, but human ethics are sloppy, slippery, inconsistent, and often suspect. […] The clear ethical programing AIs need to follow will force us to bear down and be much clearer about why we believe what we think we believe. Under what conditions do we want to be relativistic? What specific contexts do we want the law to be contextual? Human morality is a mess of conundrums that could benefit from scrutiny, less superstition, and more evidence-based thinking. We’ll quickly find that trying to train AIs to be more humanistic will challenge us to be more humanistic. In the way that children can better their parents, the challenge of rearing AIs is an opportunity – not a horror. We should welcome it.”

Machine learning as a catalyst for philosophical progress—interesting perspective. See the post for more details behind this writer’s reasoning. Is he or she being realistic, or naïve?

Cynthia Murrell, May 14, 2015

Explaining Big Data Mythology

May 14, 2015

Mythologies usually develop over the course of centuries, but big data has only been around for (arguably) a couple of decades, at least in its modern incarnation. Big data has recently received a lot of media attention and product development, enough for the Internet to create a big data mythology. The Globe and Mail set out to dispel some of the bigger myths in the article “Unearthing Big Myths About Big Data.”

The article focuses on Prof. Joerg Niessing’s big data expertise and how he explains the truth behind many of the biggest big data myths. One of the main points Niessing wants people to understand is that gathering data does not equal dollar signs; you have to be active with the data:

“You must take control, starting with developing a strategic outlook in which you will determine how to use the data at your disposal effectively. “That’s where a lot of companies struggle. They do not have a strategic approach. They don’t understand what they want to learn and get lost in the data,” he said in an interview. So before rushing into data mining, step back and figure out which customer segments and what aspects of their behavior you most want to learn about.”

Niessing says that big data is not really big, but is made up of many diverse data points. Big data also does not have all the answers; instead, it provides ambiguous results that need to be interpreted. Have the questions you want answered in mind before gathering data. Not all of the data returned is good, either: some of it is garbage and cannot be used for a project. Several other myths are addressed, but the truth remains that having a strategic big data plan in place is the best way to make the most of big data.

Whitney Grace, May 14, 2015

The Forgotten List of Telegraph

May 13, 2015

Technology experts and information junkies in the European Union are in an uproar over a ruling that forces Google to remove specific information from search results.  “The right to be forgotten” policy upheld by the EU is supposed to help people who want “inadequate, irrelevant, or no longer relevant” information removed from Google search results.  Many news outlets in Europe have been affected, including the United Kingdom’s Telegraph.  The Telegraph has been recording a list called “Telegraph Stories Affected By ‘EU Right To Be Forgotten’” of all the stories they have been forced to remove.

According to the article, Google has received over 250,000 requests to remove information. Some of these requests concern stories published by the Telegraph. While many oppose the ‘right to be forgotten,’ including the House of Lords, others are still upholding the policy:

“But David Smith, deputy commissioner and director of data protection for the Information Commissioner’s Office (ICO), hit back and claimed that the criticism was misplaced, ‘as the initial stages of its implementation have already shown.’ ”

Many of the “to be forgotten” requests concern people with criminal pasts and misdeeds that color them in a bad light. The Telegraph’s content might be removed from Google, but the paper is keeping a long, long list on its website. Read the stories there or head on over to the US Google website. Freedom of the press still holds true here.

Whitney Grace, May 13, 2015

The Philosophy of Semantic Search

May 13, 2015

The article Taking Advantage of Semantic Search NOW: Understanding Semiotics, Signs, & Schema on Lunametrics delves into semantics on a philosophical and linguistic level as well as in regard to business. The author traces the emergence of semantic search, beginning with Ray Kurzweil’s interest in machines learning meaning as opposed to simpler keyword search. To help readers fully grasp this concept, the author provides a brief refresher on Saussure’s semiotics.

“a Sign is comprised of a signifier, or the name of a thing, and the signified, what that thing represents… Say you sell iPad accessories. “iPad case” is your signifier, or keyword in search marketing speak. We’ve abused the signifier to the utmost over the years, stuffing it onto pages, calculating its density with text tools, jamming it into title tags, in part because we were speaking to robot who read at a 3-year-old level.”

In order to create meaning, we must go beyond even the addition of a price tag and picture to create a sign. The article suggests the need for schema, adding some indication of whom and what the thing is for. The author, Michael Bartholow, has a background in linguistics, marketing, and search engine optimization. His article ends with the question of when linguists, philosophers and humanists will be invited into the conversation with businesses, perhaps making him a true visionary in a field populated by data engineers with tunnel vision.
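To make the “schema” idea concrete, the sketch below builds schema.org `Product` markup as JSON-LD in Python, using the “iPad case” example from the quoted passage. It is only an illustration, not something taken from the Lunametrics article: the helper name `product_jsonld` and every product detail passed to it (description, price, currency, audience) are made-up placeholders.

```python
import json


def product_jsonld(name, description, price, currency, audience):
    """Return schema.org Product markup as a plain dict ready for JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,                # the signifier: what the thing is called
        "description": description,  # part of the signified: what it is
        "audience": {                # whom the thing is for
            "@type": "Audience",
            "audienceType": audience,
        },
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    markup = product_jsonld(
        name="iPad case",
        description="Protective folio case for a tablet",
        price="29.99",
        currency="USD",
        audience="commuters",
    )
    print(json.dumps(markup, indent=2))
```

The printed JSON would typically be embedded in a page inside a `script` element of type `application/ld+json`, so that a search engine reads the name, the description, and the intended audience as one structured sign rather than as loose keywords.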

Chelsea Kerwin, May 13, 2015

Math and Search Experts

May 10, 2015

I found “There’s More to Mathematics Than Rigor and Proofs” a useful reminder of the difference between the person who is comfortable with math and the person who asserts he is good in math. With more search and content processing embracing numerical recipes, the explanations of what a math centric system can do often leave me rolling my eyes and, in some cases, laughing out loud.

This essay explains that time and different types of math experiences are necessary stages in developing a useful facility with some of today’s information retrieval systems and methods. The write up points out:

The distinction between the three types of errors can lead to the phenomenon (which can often be quite puzzling to readers at earlier stages of mathematical development) of a mathematical argument by a post-rigorous mathematician which locally contains a number of typos and other formal errors, but is globally quite sound, with the local errors propagating for a while before being cancelled out by other local errors.  (In contrast, when unchecked by a solid intuition, once an error is introduced in an argument by a pre-rigorous or rigorous mathematician, it is possible for the error to propagate out of control until one is left with complete nonsense at the end of the argument.)

Perhaps this section of the article sheds some light on the content processing systems which wander off the track of relevance and accuracy? As my mathy relative Vladimir Igorevich Arnold was fond of saying to anyone who would listen: Understand first, then talk.

Stephen E Arnold, May 10, 2015

Google Glass: A Harsh Assessment

May 8, 2015

I read “The Debacle of Google Glass.” As a 70 year old wearer of trifocal lenses, I failed to see (pun alert) the future in this wonky product. I haven’t thought too much about Google Glass, although I did a research report for one of those really stable financial outfits.

“Debacle” comes at Glass with some zest. I read:

When Google introduced their Google Glass, this was the first thing that came to mind about this project. I wondered if Google even had a clue how tech adoption cycles develop. While it is true glasses had been used in vertical markets since 1998, even after all of this time, we saw no interest by consumers. Google’s decision to aim Glass at consumers first, yet price them as if they were going to vertical markets, stumped me. Even the folks who had spent decades making glasses for use in manufacturing, government applications, and transportation were dumfounded by Google’s consumer focus with Google Glass, priced at $1500. Apparently, Google found out the hard way how tech products get adopted. They lost hundreds of millions of dollars on this project and, worse yet, they soured the consumer market for similar products. Even those with disposable income who could afford to be a Glass Explorer have to feel taken as Google used them as beta testers at their personal expense. I have seen a recent report that details the damage in consumer minds about Google Glass and, even if a competitor came to market with a cheaper product better than Glass, they would have a hard time getting anything but vertical users interested.

The idea that Google has some weak spots is not a new one. The write up includes what strikes me as a positive nod to the Apple Watch. My hunch is that the idea is that Apple is better at some things than Google.

The write up pops the “debacle” word again in this passage which I highlighted with my trusty pink marker. I reserve pink for anti Google sentiments, by the way:

Google glasses was a debacle for multiple reasons. It gave Google a black eye in the minds of consumers and cost them a lot in the way of consumer confidence when it comes to their efforts in hardware. It also tainted the market for consumer glasses for them and competitors in the future beyond how these products can be used in vertical markets. It also proved to be a debacle for a lot of partners who lost serious money on the Google Glass project. I spoke at a major customer conference of a company who was highly focused on the optical side of the glass. For years, they were very successful in vertical markets but were pulled into the consumer glasses area by Google and the media hype and tried to convince their own customers to jump into the space with competitive products. To their chagrin, most of their customers passed on this and I am sure they are glad they did.

Like most Glass analyses, this write up ignores some of the points I still find interesting; for example, the Babak Parviz (yep, the smart contact lens person with the multiple versions of his name) Microsoft-Google-Amazon adventure, the impact of the senior manager-marketer interaction on intra company inter personal processes, the fascinating sales approach, the likely re-emergence of a more fashionable and stylish Glass, and the concomitant use of the festive neologism “glasshole.” Not many products warrant a coinage like “glasshole.”

If you are interested in Glass, you will find the write up fascinating. Perhaps the full story of Glass will emerge as a Netflix original series?

Stephen E Arnold, May 8, 2015

Annual Ranking of Legal Sector Puts Omnivere at the Top

May 6, 2015

The article titled Omnivere Voted Best National End-To-End Ediscovery, Managed Ediscovery & Litigation Support, and Data & Technology Provider in 2015 Best of the National Law Journal on Blackbird discusses the ranking and what it means. This is an annual ranking that is conducted with readers of The National Law Journal & Legal Times casting ballots based on their experiences with their own legal services. Omnivere won this year’s legal sector “best in show.” The article states,

“In less than a year, OmniVere has established itself as a trailblazer in the next wave of data and technology consulting, eDiscovery services and litigation support. In creating an in-house team of expert, veteran data consultants, including former senior leadership from FTI, Navigant Consulting, Integreon, Recommind, Xerox and Berkeley Research Group, OmniVere is well positioned to deliver a range of products and services on a global playing field.”

Omnivere was launched in May 2014 and rapidly grew into one of the biggest and most sought-after companies for its work in litigation support and discovery management. Erik Post, Omnivere President, is quoted in the article celebrating the win and the overall success of the company. He suggests that in spite of their new brand, the work and abilities of the staff are “resonating across the country.”

Chelsea Kerwin, May 6, 2015
