Altiar And dtSearch Combine

May 27, 2015

Sometimes when items are combined they create something even better, such as Oreos and peanut butter, Disney and Marvel, and Netflix and original series.  EContentMag alerted us that a new team-up is underway between two well-known companies.  The press release title says it all: “Altiar Cloud-Based ECM Platform Is Embedding The dtSearch Engine.”  Altiar is an enterprise content management platform that has been specifically used by Microsoft Azure.  The popular dtSearch platform has been searching through terabytes since 1991 and is referred to as a powerful search tool.  Embedding dtSearch into the Altiar core will make it a more powerful ECM.

Altiar is a popular ECM and can only be improved by dtSearch:

“A cloud-based service, Altiar includes rapid setup, scalability, and storage. It can accept any type of file, from PowerPoint to streaming video, as well as providing a host of tools and services to create custom content pages, newsletters, personal zones, and the like. The platform lets users not only access content from any connected device, but also manage, share, and track content, including features like email alerts.”

Microsoft is not a main player in cloud computing, and Microsoft Azure is supposed to drive more customers to it.  Any improvement, such as Altiar upgrading its search, will make the platform more appealing.

Whitney Grace, May 27, 2015

Sponsored by, publisher of the CyberOSINT monograph

Welcome YottaSearch

May 26, 2015

There is another player in the world of enterprise search.  Yotta Data Technologies announced its newest product under the headline “Yotta Data Technologies Announces Enterprise Search And Big Data Analytics Platform.”  Yotta Data Technologies is known for its affordable and easy-to-use information management solutions.  Yotta has expanded its offerings by creating YottaSearch, a data analytics and search platform designed to be a data hub for organizations.

“YottaSearch brings together the most powerful and agile open source technologies available to enable today’s demanding users to easily collect data, search it, analyze it and create rich visualizations in real time.  From social media and email for Information Governance and eDiscovery to web and network server logs for Information Technology Operations Analytics (ITOA), YottaSearch™ provides the Big Data Analytics for users to derive information intelligence that may be critical to a project, case, business unit or market.”

YottaSearch uses the popular SaaS model and offers users not only data analytics and search, but also knowledge management, information governance, eDiscovery, and IT operations analytics.  Yotta decided to create YottaSearch to earn revenue from the burgeoning big data market, especially the enterprise search end.

The market is worth $1.7 billion, so Yotta has a lot of competition, but if it offers something different and better than its rivals, it stands a chance to rise to the top.

Whitney Grace, May 26, 2015

Hadoop Has Accessories

May 25, 2015

ZDNet’s article, “Why Hadoop Is Hard, And How To Make It Easier,” alludes to the idea that Hadoop will disappear at some point.  We don’t know about you, but the open source big data platform has a huge support community, and hundreds, if not thousands, of companies have deployed Hadoop.  The article argues otherwise, citing a recent Gartner survey that found only 26 percent of the corporate world is actively using it.

One of the biggest roadblocks for Hadoop is that it is designed for specialists to tinker with and is not an enterprise tool.  That might change when Microsoft releases its new SQL Server 2016.  With the new server, Microsoft will add PolyBase, which bridges Hadoop to the server.  Microsoft still supplies the most popular OS for enterprise systems, and when this upgrade becomes available Hadoop will be a more viable enterprise option.

What is the counterpoint?

“It’s also a counterpoint to the interpretation of Gartner’s survey that says Hadoop is somehow languishing. What’s languishing is the Enterprise’s willingness to invest in a new, premium skill set, and the low productivity involved in working with Hadoop through its motley crew of command-line shells and scripting languages. A good data engine should work behind the scenes and under the covers, not in the spotlight.”

So once more enterprise systems need to be updated, which is comparable to how Hadoop needs to be augmented with add-on features to make it more accessible, such as mature analytics tools, DBMS abstraction layers and Hadoop-as-a-Service cloud offerings.

Whitney Grace, May 25, 2015

Big Data: The Shrinky Dink Approach

May 21, 2015

I read “To Handle Big Data, Shrink It.” Years ago I did a job for a unit of a blue chip consulting firm. My task was to find a technology which allowed a financial institution to query massive data sets without bringing the computing system to its knees and causing the on-staff programmers to howl with pain.

I located an outfit in what is now someplace near a Prague-like location. The company was CrossZ, and it used a wonky method of compression and a modified version of SQL with a point and click interface. The idea was that a huge chunk of the bank data (for instance, the transactions in the week before Mother’s Day) could be queried for purchasing-related trends. Think fraud. Think flowers. Think special promotions that increased sales. I have not kept track of the low profile, secretive company. I did think of it when I read the “shrink Big Data story.”

This passage resonated and sparked my memory:

MIT researchers will present a new algorithm that finds the smallest possible approximation of the original matrix that guarantees reliable computations. For a class of problems important in engineering and machine learning, this is a significant improvement over previous techniques. And for all classes of problems, the algorithm finds the approximation as quickly as possible.

The point is that it is now 2015 and a mid 1990s notion seems to be fresh. My hunch is that the approach will be described as novel, innovative, and a solution to the problems Big Data poses.

Perhaps the MIT approach is those things. For me, the basic idea is that Big Data has to be approached in a rational way. Otherwise, how will queries of “Big Data” which has been processed and a stream of new or changed “Big Data” be processed in a way that is affordable, is computable, and is meaningful to a person who has no clue what is “in” the Big Data?

Fractal compression, recursive methods, mereological techniques, and other methods are a good idea. I am delighted with the notion that Big Data has to be made small in order to be more useful to companies with limited budgets and a desire to answer some basic business questions with small data.

Stephen E Arnold, May 21, 2015

Long-term Plans for SharePoint

May 21, 2015

Through all the iterations of SharePoint, it seems that Microsoft has wised up and is finally giving customers more of what they want. The release of SharePoint Server 2016 shows a shift back toward on-premises installations, and yet there will still be functions supported through the cloud. This new hybrid emphasis provides a third pathway through which users are experiencing SharePoint. The CMS Wire article, “3 SharePoint Paths for the Next 10 Years,” covers all the details.

The article begins:

“Microsoft Office 365 has proven to be a major disruption of how companies use SharePoint to meet business requirements. Rumors, fear, uncertainty and doubt proliferate around Microsoft’s plans for SharePoint’s future releases, as well as the support of critical features and functionality companies rely on . . . So, taking into account Office 365, the question is: How will companies be using SharePoint over the next 10 years?”

Stephen E. Arnold is a leader in SharePoint, with a lifelong career in search. His SharePoint feed is a great resource for users and managers alike, or anyone who needs to keep on top of the latest developments. It may be that the hybrid solution is a way to keep on-premises users happy while they still benefit from the latest cloud functions like Delve and OneDrive.

Emily Rae Aldridge, May 21, 2015

Open Source Conquers Proprietary Software, Really?

May 19, 2015

Open source is an attractive option for organizations wanting to design their own software as well as save money on proprietary licenses.  ZDNet reports that “It’s An Open Source World-78 Percent of Companies Run Open Source Software,” but the adopters do not manage their open source systems very well.  Every year Black Duck Software, an open source software logistics and legal solutions provider, and North Bridge, a seed to growth venture capital firm, run the Future of Open Source Survey.  Organizations love open source, but:

“Lou Shipley, Black Duck’s CEO, said in a statement, ‘In the results this year, it has become more evident that companies need their management and governance of open source to catch up to their usage. This is critical to reducing potential security, legal, and operational risks while allowing companies to reap the full benefits OSS provides.’”

The widespread adoption is due to people thinking that open source software is easier to scale, has fewer security problems, and is much faster to deploy.  Organizations, however, do not have a plan to manage open source, an automated code approval process, or an inventory of open source components.  Even worse is that they are unaware of the security vulnerabilities.

It is great that open source is being recognized as a more viable enterprise solution, but nobody knows how to use it.

Whitney Grace, May 19, 2015

Data Mining Algorithms Explained

May 18, 2015

In plain English too. Navigate to “Top 10 Data Mining Algorithms in Plain English.” When you fire up an enterprise content processing system, the algorithms beneath the user experience layer are chestnuts. Universities do a good job of teaching students about some reliable methods to perform data operations. In fact, the universities do such a good job that most content processing systems include almost the same old chestnuts in their solutions. The decision to use some or all of the top 10 data mining algorithms has some interesting consequences, but you will have to attend one of my lectures about the weaknesses of these numerical recipes to get some details.

The write up is worth a read. The article includes a link to information which underscores the ubiquitous nature of these methods. This is the Xindong Wu et al. write up “Top 10 Algorithms in Data Mining.” Our research reveals that dependence on these methods is more widespread now than it was seven years ago when the paper first appeared.

The implication then and now is that content processing systems are more alike than different. The use of similar methods means that the differences among some systems are essentially cosmetic. There is a flub in the paper. I am confident that you, gentle reader, will spot it easily.

Now to the “made simple” write up. The article explains quite clearly the what and why of 10 widely used methods. The article also identifies some of the weaknesses of each method. If there is a weakness, do you think it can be exploited? This is a question worth considering, I suggest.

Example: What is a weakness of k-means?

Two key weaknesses of k-means are its sensitivity to outliers, and its sensitivity to the initial choice of centroids. One final thing to keep in mind is k-means is designed to operate on continuous data — you’ll need to do some tricks to get it to work on discrete data.

Note the key word “tricks.” When one deals with math, the way to solve problems is to be clever. It follows that some of the differences among content processing systems boil down to the cleverness of the folks working on a particular implementation. Think back to your high school math class. Was there a student who just spit out an answer and then said, “It’s obvious”? Well, that’s the type of cleverness I am referencing.
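The two k-means weaknesses quoted above can be seen with a few lines of Python. What follows is a minimal sketch, not code from the article: the function name `kmeans_1d`, the toy data sets, and the starting centroids are all made up for illustration, and the data is one-dimensional to keep the arithmetic easy to check by hand.

```python
def kmeans_1d(points, centroids, max_iterations=50):
    """Lloyd's algorithm on 1-D data; returns the final centroids, sorted."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # nothing moved, so the run has converged
            break
        centroids = updated
    return sorted(centroids)


# Hypothetical toy data: two tight groups, and three tight groups.
two_groups = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
three_groups = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]

# Outlier sensitivity: one extreme value captures a centroid of its own.
clean = kmeans_1d(two_groups, [1.0, 13.0])
skewed = kmeans_1d(two_groups + [100.0], [1.0, 13.0])

# Initialization sensitivity: same data, different starting centroids.
good_start = kmeans_1d(three_groups, [0.0, 10.0, 20.0])
bad_start = kmeans_1d(three_groups, [0.0, 1.0, 15.0])

print(clean, skewed)
print(good_start, bad_start)
```

With the single outlier added, the two real groups collapse into one cluster centered at 7.0 while the outlier keeps a centroid to itself. With the poor starting centroids, the run converges without ever separating the groups near 10 and 20. Production libraries soften both problems with smarter seeding and repeated restarts, which is one place the “tricks” come in.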

The author does not dig too deeply into PageRank, but it too has some flaws. An easy way to identify one is to attend a search engine optimization conference. One flaw turbocharges these events.

My relative Vladimir Arnold, whom some of the Arnolds called Vlad the Annoyer, would have liked the paper. So do I. The write up is a keeper. Plus there is a video, perfect for the folks whose attention span is better than a goldfish’s.

Stephen E Arnold, May 18, 2015

Exit Governance. Enter DMP.

May 17, 2015

A DMP is a data management platform. I think in terms of databases. I find that software does not do a particularly reliable job “managing data.” Software can run processes, write log files, and perform other functions. But management, based on my experience at Booz, Allen & Hamilton, requires humans. Talking about analytics from Big Data and implementing a platform to perform management are apples and house paint in my mind.

Intrigued by the reference, I downloaded a document available upon registration from Infinitive. The white paper maps out 10 ways a data management platform can help me.

I was not familiar with Infinitive. According to the firm’s Web site: Infinitive is

A Different Kind of Consultancy. Results-driven and client-centric. Fun, focused and flexible. Highly engaged and easy to work with. Those are the qualities that make Infinitive a different kind of consultancy. And they’re the pillars of our unique culture. Headquartered in the Washington, D.C. area, Infinitive specializes in digital ad solutions, business transformation, customer & audience intelligence and enterprise risk management. Leveraging best practices in process engineering, change management and program management, we design and deliver custom solutions for leading organizations in communications, media and entertainment, financial services and educational services. For our clients, the results include quantifiable performance improvement and tangible bottom-line value in addressing their most pressing challenges and fulfilling their top-priority objectives.

What is a data management platform?

The white paper, or two page document, identifies these benefits of a DMP. I was hoping for an explanation of the “platform,” but let’s look at the payoffs from the platform.

The company points out that a DMP makes ad money go farther. Big Data become actionable. A DMP provides a foundation for analytics. The DMP “ensures the quality and accessibility of customer and audience intelligence data.” The DMP can harmonize data. A DMP allows me to “adapt traditional CRM strategies and technology to incorporate new customer behavior.” I can create new customer and audience “segments.” The DMP becomes the central nervous system for my company. And the DMP protects privacy.

That is a bundle of benefits. But what is the platform provided by a consulting company, especially one that is “fun”? I was not able to locate details about the platform. The company appears to be a firm focused on advertising.

The Web site includes a page about the DMP at this link. The information is buzzword heavy and fact free. My view is that the DMP is a marketing hook. The implied technology is consulting services. That’s okay, but I find the approach representative of marketing billable time, not delivering a platform with the remarkable and perhaps unattainable benefits suggested in the white paper.

The approach must work. The company’s Web site points out this message:


Not a platform, however.

Stephen E Arnold, May 17, 2015

Developing an NLP Semantic Search

May 15, 2015

Can you imagine a natural language processing semantic search engine?  It would be a lovely tool to use in your daily routines and would make research a bit easier.  If you are working on such a project and are making progress, keep at that startup because this is a lucrative field at the moment.  Over at Stack Overflow, an enterprising spirit is trying to develop a “Semantic Search With NLP And Elasticsearch”:

“I am experimenting with Elasticsearch as a search server and my task is to build a “semantic” search functionality. From a short text phrase like “I have a burst pipe” the system should infer that the user is searching for a plumber and return all plumbers indexed in Elasticsearch.

Can that be done directly in a search server like Elasticsearch or do I have to use a natural language processing (NLP) tool like e.g. Maui Indexer. What is the exact terminology for my task at hand, text classification? Though the given text is very short as it is a search phrase.”

Given that this question was asked about three years ago, a lot has been done not only with Elasticsearch, but also NLP.  Search is moving towards a more organic experience, but accuracy is often muddled by different factors.  These include the quality of the technology, classification, taxonomies, ads in results, and even keywords (still!).

NLP semantic search is closer now than it was three years ago, but technology companies would invest a lot of money in a startup that can bridge the gap between natural language and machine learning.
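The task in the Stack Overflow question, going from “I have a burst pipe” to a search for plumbers, can be sketched as a classification step that runs before the search server is queried.  The Python below is only an illustration under loud assumptions: the example phrases, the trade names, and the index fields `profession` and `description` are hypothetical, and plain word overlap stands in for a real text classifier.  The final dictionary follows the shape of an Elasticsearch `match` query, but nothing here talks to a server.

```python
import re
from collections import Counter

# Hypothetical example phrases per trade; a real system would learn
# these associations from labeled data rather than a hand-made table.
TRAINING = {
    "plumber": ["burst pipe", "leaking tap", "blocked drain", "no hot water"],
    "electrician": ["power outage", "broken socket", "flickering light"],
    "locksmith": ["locked out", "lost my keys", "broken lock"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def classify(query):
    """Return the trade whose phrases share the most words with the query, or None."""
    words = set(tokenize(query))
    scores = Counter()
    for trade, phrases in TRAINING.items():
        vocabulary = {w for phrase in phrases for w in tokenize(phrase)}
        scores[trade] = len(words & vocabulary)
    trade, score = scores.most_common(1)[0]
    return trade if score > 0 else None


def build_query(query):
    """Build a match-query body; the field names are assumptions for this sketch."""
    trade = classify(query)
    if trade is None:
        # No trade recognized: fall back to ordinary full-text matching.
        return {"query": {"match": {"description": query}}}
    return {"query": {"match": {"profession": trade}}}


print(build_query("I have a burst pipe"))
```

Splitting the work this way keeps the inference step outside the search server, which is one answer to the poster’s question about whether the job belongs in Elasticsearch or in a separate NLP tool.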

Whitney Grace, May 15, 2015

The Latest SharePoint News from Ignite

May 14, 2015

The Ignite conference in Chicago has answered many of the questions that SharePoint users have been curious about for months now. Among them was the question of release timing and features for the latest iteration of SharePoint. CMS Wire gives a rundown in their article, “What’s Up With SharePoint? #MSIgnite.”

The article sums up the biggest news:

“Microsoft will continue to enhance the core offerings in the on-premises edition. It will also continue to develop SharePoint Online and update it as quickly as the updates are available. A preview version of SharePoint 2016 will be made available later this summer, with a beta version expected by the end of the year . . . In an afternoon session entitled Evolution of SharePoint Overview and Roadmap, the duo gave a rough outline of Microsoft’s plans, albeit without precise delivery dates.”

Having had to push back delivery dates once already, Microsoft is likely hesitant to announce anything solid until development is final. As for the qualities of the new version, Microsoft is focusing on user experience, extensibility, and SharePoint management. The inclusion of user experience should be a welcome change for many. To stay in touch with developments as they become available, keep an eye on Stephen E. Arnold’s feed devoted to SharePoint. Arnold has made a lifelong career out of all things search, and he has a knack for distilling down the “need to know” facts to keep an organization on track.

Emily Rae Aldridge, May 14, 2015
