
A Little Lucene History

March 26, 2015

Instead of venturing to Wikipedia to learn about Lucene’s history, visit the blog and read the post, “Lucene: The Good Parts.”  After detailing how Doug Cutting created Lucene in 1999, the post describes how searching through SQL in the early 2000s was a huge task.   SQL databases are not the best when it comes to unstructured search, so developers installed Lucene to make SQL document search more reliable.  What is interesting is how much it has been adopted:

“At the time, Solr and Elasticsearch didn’t yet exist. Solr would be released in one year by the team at CNET. With that release would come a very important application of Lucene: faceted search. Elasticsearch would take another 5 years to be released. With its recent releases, it has brought another important application of Lucene to the world: aggregations. Over the last decade, the Solr and Elasticsearch packages have brought Lucene to a much wider community. Solr and Elasticsearch are now being considered alongside data stores like MongoDB and Cassandra, and people are genuinely confused by the differences.”

The post is also worth a read if you need a refresher or a brief overview of how Lucene works, related jargon, tips for using it in big data projects, and a few more tricks. Lucene might just be a Java library, but it makes using databases much easier. We have said for a while that information is only useful if you can find it easily. Lucene made information search and retrieval much simpler and more accurate. It laid the groundwork for the current big data boom.

Whitney Grace, March 26, 2015
Stephen E Arnold, Publisher of CyberOSINT at

SAS Text Miner Provides Valuable Predictive Analytics

March 25, 2015

If you are searching for predictive analytics software that provides in-depth text analysis with advanced linguistic capabilities, you may want to check out “SAS Text Miner.” Predictive Analytics Today runs down the features of SAS Text Miner and details how it works.

It is user-friendly software with data visualization, flexible entity options, document theme discovery, and more.

“The text analytics software provides supervised, unsupervised, and semi-supervised methods to discover previously unknown patterns in document collections.  It structures data in a numeric representation so that it can be included in advanced analytics, such as predictive analysis, data mining, and forecasting.  This version also includes insightful reports describing the results from the rule generator node, providing clarity to model training and validation results.”
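
To make the "numeric representation" idea concrete, here is a minimal sketch of a term-document count matrix in plain Python. It is only an illustration of the general technique, not SAS Text Miner's actual method; the function names and the sample documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is the sorted set of every term seen in any document.
    vocabulary = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]
    return vocabulary, matrix


# Hypothetical sample documents, used only for illustration.
docs = ["Text mining finds patterns", "Patterns in text, text everywhere"]
vocabulary, matrix = term_document_matrix(docs)
```

Each row of `matrix` is a numeric vector for one document, which is the kind of input that downstream predictive models or clustering routines typically expect.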

Other SAS Text Miner features draw on automatic Boolean rule generation to categorize documents, and other rules can be exported as Boolean rules. Data sets can be made from a directory or crawled from the Web. The visual analysis feature highlights the relationships between discovered patterns and displays them using a concept link diagram. SAS Text Miner has received high praise as predictive analytics software, and it might be the solution your company is looking for.

Whitney Grace, March 25, 2015

Modus Operandi Gets a Big Data Storage Contract

March 24, 2015

The US Missile Defense Agency awarded Modus Operandi a huge government contract to develop an advanced data storage and retrieval system for the Ballistic Missile Defense System.  Modus Operandi specializes in big data analytic solutions for national security and commercial organizations.  Modus Operandi posted a press release on their Web site to share the news, “Modus Operandi Awarded Contract To Develop Advanced Data Storage And Retrieval System For The US Missile Defense Agency.”

The contract is a Phase I Small Business Innovation Research (SBIR) award, under which Modus Operandi will work on the BMDS Analytic Semantic System (BASS). The BASS will replace the old legacy system and update it to be compliant with social media communities, the Internet, and intelligence.

“ ‘There has been a lot of work in the areas of big data and analytics across many domains, and we can now apply some of those newer technologies and techniques to traditional legacy systems such as what the MDA is using,’ said Dr. Eric Little, vice president and chief scientist, Modus Operandi. ‘This approach will provide an unprecedented set of capabilities for the MDA’s data analysts to explore enormous simulation datasets and gain a dramatically better understanding of what the data actually means.’ ”

It is worrisome that the missile defense system is relying on an old legacy system, but at least it is being upgraded now. Modus Operandi also sells Cyber OSINT, and the company is applying this technology in an interesting way for the government.

Whitney Grace, March 24, 2015

SharePoint’s Evolution of Ease

March 24, 2015

At SharePoint’s beginning, users and managers viewed it as a framework. It is often still referred to as an installation, and many third-party vendors do quite well offering add-on options to flesh out the solution. However, due to users’ expectations, SharePoint is shifting its focus to accommodate quick and full implementation without a lengthy build-out. Read more in the CMS Wire article, “From Build It and Go, to Ready to Go with SharePoint.”

The article sums up the transformation:

“We hunger for solutions that can be quickly acquired and implemented, not ones that require building out complex and robust solutions.  The world around us is changing fast and it’s exciting to see how productivity tools are beginning to encompass almost every area of our lives. The evolution not only impacts new tools and products, but also the tools we have been using all long. In SharePoint, we can see this in the addition of Experiences and NextGen Portals.”

SharePoint 2016 is on its way, and additional information will leak out over the coming months. Keep an eye out for breaking news and the latest releases. Stephen E. Arnold has made a career out of all things search, including enterprise and SharePoint, and his dedicated SharePoint feed is a great resource for professionals who need to keep up without a huge investment in research time.

Emily Rae Aldridge, March 24, 2015


Data and Marketing Come Together for a Story

March 23, 2015

An article on the Marketing Experiments Blog titled “Digital Analytics: How To Use Data To Tell Your Marketing Story” explains the primacy of the story in the world of data. The conveyance of the story, the article claims, should be a collaboration between the marketer and the analyst, with both players working together to create an engaging and data-supported story. The article suggests breaking this story into several parts, similar to the plot points you might study in a creative writing class: exposition, rising action, climax, denouement, and resolution. The article states,

“Nate [Silver] maintained throughout his speech that marketers need to be able to tell a story with data or it is useless. In order to use your data properly, you must know what the narrative should be…I see data reporting and interpretation as an art, very similar to storytelling. However, data analysts are too often siloed. We have to understand that no one writes in a bubble, and marketing teams should understand the value and perspective data can bring to a story.”

Silver, Founder and Editor in Chief of is also quoted in the article from his talk at the Adobe Summit Digital Marketing Conference. He said, “Just because you can’t measure it, doesn’t mean it’s not important.” This is the back-to-basics approach that companies need to consider.

Chelsea Kerwin, March 23, 2015


Apache Samza Revamps Databases

March 19, 2015

Databases have advanced far beyond basic relational databases. They need to be consistently managed and updated in real time to stay useful. The Apache Software Foundation developed the Apache Samza software to help maintain an asynchronous stream processing network. Samza was made in conjunction with Apache Kafka.

If you are interested in learning how to use Apache Samza, the Confluent blog posted “Turning The Database Inside-Out With Apache Samza” by Martin Kleppmann. Kleppmann recorded a talk he gave at Strange Loop 2014 that explains how Samza can improve many features of a database:

“This talk introduces Apache Samza, a distributed stream processing framework developed at LinkedIn. At first it looks like yet another tool for computing real-time analytics, but it’s more than that. Really it’s a surreptitious attempt to take the database architecture we know, and turn it inside out. At its core is a distributed, durable commit log, implemented by Apache Kafka. Layered on top are simple but powerful tools for joining streams and managing large amounts of data reliably.”
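
The "inside-out" idea in the quote can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not the Samza or Kafka API: the `CommitLog` class, the `materialize` and `join_stream_with_table` helpers, and the sample data are all hypothetical names chosen for illustration. It shows an append-only log, a table rebuilt by replaying that log, and a simple join of an event stream against the table.

```python
class CommitLog:
    """Append-only, ordered log of events (an in-memory stand-in for a log topic)."""

    def __init__(self):
        self._entries = []

    def append(self, key, value):
        """Append an event and return its offset in the log."""
        self._entries.append((key, value))
        return len(self._entries) - 1

    def read(self, from_offset=0):
        """Yield (offset, key, value) for every entry at or after from_offset."""
        for offset in range(from_offset, len(self._entries)):
            key, value = self._entries[offset]
            yield offset, key, value


def materialize(log):
    """Replay a log into a key -> latest value table (a materialized view)."""
    view = {}
    for _, key, value in log.read():
        view[key] = value  # later entries overwrite earlier ones
    return view


def join_stream_with_table(events, table):
    """Enrich each event with the table row for its key, if one exists."""
    return [
        {"user": key, "page": page, "name": table.get(key, "unknown")}
        for _, key, page in events.read()
    ]


# Hypothetical data: a changelog of user profiles and a stream of page views.
profiles = CommitLog()
profiles.append("u1", "Ada")
profiles.append("u2", "Grace")
profiles.append("u1", "Ada Lovelace")  # an update to an existing key

clicks = CommitLog()
clicks.append("u1", "/home")
clicks.append("u3", "/pricing")

profile_view = materialize(profiles)
enriched = join_stream_with_table(clicks, profile_view)
```

The design point is that the log is the source of truth and the table is derived from it, so the view can be discarded and rebuilt at any time by replaying the log from offset zero.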

Learning new ways to improve database features and functionality always improves your skill set. Apache software also forms the basis for many open source projects and startups. Martin Kleppmann’s talk might give you a brand new idea or at least improve your database.

Whitney Grace, March 20, 2015


Give Employees the Data they Need

March 19, 2015

A classic quandary: will it take longer to reinvent a certain proverbial wheel, or to find the documentation from the last time one of your colleagues reinvented it? That all depends on your organization’s search system. An article titled “Help Employees to ‘Upskill’ with Access to Information” at DataInformed makes the case for implementing a user-friendly, efficient data-management platform. Writer Diane Berry, not coincidentally a marketing executive at enterprise-search company Coveo, emphasizes that re-covering old ground can really sap workers’ time and patience, ultimately impacting customers. Employees simply must be able to quickly and easily access all company data relevant to the task at hand if they are to do their best work. Berry explains why this is still a problem:

“Why do organizations typically struggle with implementing these strategies? It revolves around two primary reasons. The first reason is that today’s heterogeneous IT infrastructures form an ‘ecosystem of record’ – a collection of newer, cloud-based software; older, legacy systems; and data sources that silo valuable data, knowledge, and expertise. Many organizations have tried, and failed, to centralize information in a ‘system of record,’ but IT simply cannot keep up with the need to integrate systems while also constantly moving and updating data. As a result, information remains disconnected, making it difficult and time consuming to find. Access to this knowledge often requires end-users to conduct separate searches within disconnected systems, often disrupting co-workers by asking where information may be found, and – even worse – moving forward without the knowledge necessary to make sound decisions or correctly solve the problem at hand.

“The second reason is more cultural than technological. Overcoming the second roadblock requires an organization to recognize the value of information and knowledge as a key organizational asset, which requires a cultural shift in the company.”

Fair enough; she makes a good case for a robust, centralized data-management solution. But what about that “upskill” business? Best I can tell, it seems the term is not about improving skills, but about supplying employees with resources they need to maximize their existing skills. The term was a little confusing to me, but I can see how it might be catchy. After all, marketing is the author’s forte.

Cynthia Murrell, March 19, 2015


Zementis and Software AG Team Up

March 18, 2015

I learned that Software AG (a digital business platform for enterprises) and Zementis (a company that empowers Big Data insights) have teamed up. According to “Zementis and Software AG Announce Joint Solution at CeBIT 2015”, the new solution is Apama, an analytics platform. It is:

“designed to rapidly process streaming, fast-moving and real-time data sets at massive scale to support intelligent, automated actions and rapid, insightful business decisions. Its functionality comprises event processing, messaging, in-memory data management and visualization. The Apama platform allows businesses to analyze and act on high-volume business operations and customer interactions in real-time. It rapidly correlates, aggregates and detects patterns across large volumes of fast-moving data from multiple sources, so that business decision makers can take the right action at the right time.”

The software allows the user to design and visualize real-time analytics, connect to streaming and static data, and detect and analyze patterns in real time.

The system can be used for multichannel fraud detection, risk-based product pricing, and risk-based capital management. No word about the system’s application to law enforcement and intelligence tasks.

Stephen E Arnold, March 18, 2015


IBM Hadoop

March 18, 2015

For anyone who sees setting up an instance of Hadoop as a huge challenge, Open Source Insider points to IBM’s efforts to help in, “Has IBM Made (Hard) Hadoop Easier?” Why do some folks consider Hadoop so difficult? Blogger Adrian Bridgwater elaborates:

“More specifically, it has been said that the Hadoop framework for distributed processing of large data sets across clusters of computers using simple programming models is tough to get to grips with because:

Hadoop is not a database

Hadoop is not an analytics environment

Hadoop is not a visualisation tool

Hadoop is not known for clusters that meet enterprise-grade security requirements

Foundation fixation

This is because Hadoop is a ‘foundational’ technology in many senses, so its route to ‘business usefulness’ is neither direct or clear cut in many cases.”

Hmm. So, perhaps one should understand what Hadoop is and what it does before trying to implement it. Still, the folks at IBM would prefer companies just pay them to handle it. The article cites a survey of “big-data developers” (commissioned by IBM) that shows about a quarter of the respondents use IBM’s Hadoop. Bridgwater also mentions:

“IBM also recently conducted an independently audited benchmark, which was reviewed by third-party Infosizing, of three popular SQL-on-Hadoop implementations, and the results showed that IBM’s Big SQL was the only Hadoop solution tested that was able to run all 99 Hadoop-DS queries…. Smith says that this new report and benchmark are proof that customers can ask more complex questions of IBM when it comes to Hadoop implementation.”

I’m not sure that’s what those factors prove, but it is clear that many companies do turn to the tech giant for help with Hadoop. But is their assistance worth the cost? Unfortunately, this article includes no word on IBM’s Hadoop pricing.

Cynthia Murrell, March 18, 2015


Vilocity 2.0 Released by NuWave

March 17, 2015

The article on Virtual Strategy Magazine titled “NuWave Enhances their Vilocity Analytic Framework with Release of Vilocity 2.0 Update” promotes the upgraded framework as a mixture of Oracle Business Intelligence Enterprise Edition and Oracle Endeca Information Discovery. The ability to interface across both of these tools, as well as include components from both in a single dashboard, makes this a very useful program, with capabilities such as exporting to Microsoft to create slideshows, pre-filtering, and the ability to choose sections of a page and print across both frameworks. The article explains,

“ ‘The voices of our Vilocity customers were vital in the Vilocity 2.0 release and we value their input,’ says Rob Castle, NuWave’s Chief Technology Officer… The most notable Vilocity deployment NuWave has done is for the U.S. Army EMDS Program. From deployment and through continuous support NuWave has worked closely with this client to communicate issues and identify tools that could improve Vilocity. The Vilocity 2.0 release is a culmination of NuWave’s desire for their clients to be successful.”

It looks like they have found a way to make Endeca useful. Users of the Vilocity Analytic Framework will be able to find answers to the right questions as well as make new discoveries. The consistent look and feel of both systems should help users get used to them and make the most of their new platform.

Chelsea Kerwin, March 17, 2015

