
SAS Text Miner Provides Valuable Predictive Analytics

March 25, 2015

If you are searching for predictive analytics software that provides in-depth text analysis with advanced linguistic capabilities, you may want to check out “SAS Text Miner.” Predictive Analytics Today runs down the features of SAS Text Miner and details how it works.

It is user-friendly software with data visualization, flexible entity options, document theme discovery, and more.

“The text analytics software provides supervised, unsupervised, and semi-supervised methods to discover previously unknown patterns in document collections.  It structures data in a numeric representation so that it can be included in advanced analytics, such as predictive analysis, data mining, and forecasting.  This version also includes insightful reports describing the results from the rule generator node, providing clarity to model training and validation results.”

SAS Text Miner includes other features that draw on automatic Boolean rule generation to categorize documents, and the rules can be exported as Boolean rules. Data sets can be built from a directory or crawled from the Web. The visual analysis feature highlights the relationships between discovered patterns and displays them using a concept link diagram. SAS Text Miner has received high praise as predictive analytics software, and it might be the solution your company is looking for.
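The two ideas above, turning text into a numeric representation and categorizing documents with Boolean rules, can be illustrated with a minimal Python sketch. This is not SAS Text Miner's API or algorithm; the documents, the function names, and the rule format are all hypothetical and chosen only to show the general technique.

```python
from collections import Counter

# Hypothetical mini-corpus, invented for illustration only.
DOCS = [
    "missile defense contract awarded",
    "search engine startup acquired",
    "defense agency awards storage contract",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_document_matrix(docs):
    """Return (vocab, rows) where rows[i][j] counts vocab[j] in docs[i]."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def matches_rule(text, all_of=(), none_of=()):
    """Boolean rule: every term in all_of is present AND no term in none_of is."""
    tokens = set(tokenize(text))
    has_required = all(term in tokens for term in all_of)
    has_excluded = any(term in tokens for term in none_of)
    return has_required and not has_excluded


vocab, matrix = term_document_matrix(DOCS)

# Hypothetical rule: "defense AND contract AND NOT startup".
defense_docs = [
    index
    for index, doc in enumerate(DOCS)
    if matches_rule(doc, all_of=("defense", "contract"), none_of=("startup",))
]
print(defense_docs)
```

Each row of `matrix` is a numeric vector that could feed a downstream model, while `matches_rule` shows why Boolean rules are easy to read and export: the rule is just a list of required and excluded terms.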

Whitney Grace, March 25, 2015
Stephen E Arnold, Publisher of CyberOSINT at

Elasticsearch Becomes Elastic, Acquires Found

March 25, 2015

The article titled Elasticsearch Changes Its Name, Enjoys An Amazing Open Source Ride and Hopes to Avoid Mistakes explains the latest acquisition and the reasons behind the name change to simply Elastic. The choice is surmised to stem from Elastic’s wish to avoid confusion between the open source product Elasticsearch and the company itself. It also signals the company’s movement beyond solely providing search technology. The article also discusses the acquisition of Found, a Norwegian company:

“Found provides hosted and fully managed Elasticsearch clusters with technology that automates processes such as installation, configuration, maintenance, backup, and high-availability. Doing all of this heavy-lifting enables developers to integrate a search engine into their database, website or app quickly. In addition, Found has created a turnkey process to scale Elasticsearch clusters up or down at any time and without any downtime. Found’s Elasticsearch as a Service offering is being used by companies like Docker, Gild… and the New York Public Library.”

Elasticsearch has raised almost $105 million since its start after being created by Shay Banon in 2010. The article posits that they have been doing the right things so far, such as the acquisition of Kibana, the visualization vendor. Although some startups relying on Elasticsearch may throw shade at the Found acquisition, there are no foreseeable threats to Elastic’s future.

Chelsea Kerwin, March 25, 2015


Modus Operandi Gets a Big Data Storage Contract

March 24, 2015

The US Missile Defense Agency awarded Modus Operandi a huge government contract to develop an advanced data storage and retrieval system for the Ballistic Missile Defense System.  Modus Operandi specializes in big data analytic solutions for national security and commercial organizations.  Modus Operandi posted a press release on their Web site to share the news, “Modus Operandi Awarded Contract To Develop Advanced Data Storage And Retrieval System For The US Missile Defense Agency.”

The contract is a Phase I Small Business Innovation Research (SBIR) award, under which Modus Operandi will work on the BMDS Analytic Semantic System (BASS). The BASS will replace the legacy system and update it to be compliant with social media communities, the Internet, and intelligence.

“ ‘There has been a lot of work in the areas of big data and analytics across many domains, and we can now apply some of those newer technologies and techniques to traditional legacy systems such as what the MDA is using,’ said Dr. Eric Little, vice president and chief scientist, Modus Operandi. ‘This approach will provide an unprecedented set of capabilities for the MDA’s data analysts to explore enormous simulation datasets and gain a dramatically better understanding of what the data actually means.’ ”

It is worrisome that the missile defense system is relying on an old legacy system, but at least it is being upgraded now. Modus Operandi also sells Cyber OSINT, and they are applying this technology in an interesting way for the government.

Whitney Grace, March 24, 2015

Data and Marketing Come Together for a Story

March 23, 2015

An article on the Marketing Experiments Blog titled Digital Analytics: How To Use Data To Tell Your Marketing Story explains the primacy of the story in the world of data. The conveyance of the story, the article claims, should be a collaboration between the marketer and the analyst, with both players working together to create an engaging and data-supported story. The article suggests breaking this story into several parts, similar to the plot points you might study in a creative writing class: exposition, rising action, climax, denouement, and resolution. The article states:

“Nate [Silver] maintained throughout his speech that marketers need to be able to tell a story with data or it is useless. In order to use your data properly, you must know what the narrative should be…I see data reporting and interpretation as an art, very similar to storytelling. However, data analysts are too often siloed. We have to understand that no one writes in a bubble, and marketing teams should understand the value and perspective data can bring to a story.”

Silver, Founder and Editor in Chief of is also quoted in the article from his talk at the Adobe Summit Digital Marketing Conference. He said, “Just because you can’t measure it, doesn’t mean it’s not important.” This is the back-to-basics approach that companies need to consider.

Chelsea Kerwin, March 23, 2015


Apache Samza Revamps Databases

March 19, 2015

Databases have advanced far beyond the basic relational model. They need to be consistently managed and updated in real time to stay useful. Apache Samza, a framework developed at LinkedIn and now an Apache Software Foundation project, supports asynchronous stream processing. Samza was built to work in conjunction with Apache Kafka.

If you are interested in learning how to use Apache Samza, the Confluent blog posted “Turning The Database Inside-Out With Apache Samza” by Martin Kleppmann. Kleppmann recorded a talk he gave at Strange Loop 2014 that explains how Samza can improve many features of a database:

“This talk introduces Apache Samza, a distributed stream processing framework developed at LinkedIn. At first it looks like yet another tool for computing real-time analytics, but it’s more than that. Really it’s a surreptitious attempt to take the database architecture we know, and turn it inside out. At its core is a distributed, durable commit log, implemented by Apache Kafka. Layered on top are simple but powerful tools for joining streams and managing large amounts of data reliably.”
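The “inside-out” idea in the quote can be sketched in a few lines of Python: writes go to an append-only log, and queryable state is a view derived by replaying that log. This is only an illustration under stated assumptions. It uses neither the Samza nor the Kafka API, and the class names, event fields, and follower-count example are hypothetical.

```python
class CommitLog:
    """Append-only, ordered list of events (a stand-in for one log partition)."""

    def __init__(self):
        self._entries = []

    def append(self, event):
        """Store the event and return its offset in the log."""
        self._entries.append(event)
        return len(self._entries) - 1

    def read_from(self, offset):
        """Return every event at or after the given offset."""
        return self._entries[offset:]


class FollowerCounts:
    """Materialized view: follower count per user, derived from the log."""

    def __init__(self):
        self.counts = {}
        self.next_offset = 0  # first log offset this view has not yet applied

    def catch_up(self, log):
        """Apply any events appended since the last call."""
        for event in log.read_from(self.next_offset):
            delta = 1 if event["type"] == "follow" else -1
            target = event["target"]
            self.counts[target] = self.counts.get(target, 0) + delta
            self.next_offset += 1


log = CommitLog()
log.append({"type": "follow", "user": "a", "target": "c"})
log.append({"type": "follow", "user": "b", "target": "c"})

view = FollowerCounts()
view.catch_up(log)

log.append({"type": "unfollow", "user": "a", "target": "c"})
log.append({"type": "follow", "user": "a", "target": "b"})
view.catch_up(log)

# A brand-new view rebuilt from offset 0 reaches the same state.
rebuilt = FollowerCounts()
rebuilt.catch_up(log)
print(view.counts, rebuilt.counts)
```

Because the log is the source of truth, a view can be thrown away and rebuilt at any time, and several different views can be derived from the same events.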

Learning new ways to improve database features and functionality always improves your skill set. Apache software also forms the basis for many open source projects and startups. Martin Kleppmann’s talk might give you a brand new idea or at least improve your database.

Whitney Grace, March 20, 2015


DataStax Buys Graph-Database Startup Aurelius

February 20, 2015

DataStax has purchased open-source graph-database company, Aurelius, we learn in “DataStax Grabs Aurelius in Graph Database Acqui-Hire” at TechCrunch. Aurelius’ eight engineers will reportedly be working at DataStax, delving right into a scalable graph component for the company’s Cassandra-based Enterprise database. This acquisition, DataStax declares, makes theirs the only database platform with graph, analytics, search, and in-memory in one package. Writer Ron Miller tells us:

“DataStax is the commercial face of the open source Apache Cassandra database. Aurelius was the commercial face of the Titan graph database.

“Matt Pfeil, co-founder and chief customer officer at DataStax, says customers have been asking about graph database functionality for some time. Up until now customers have been forced to build their own on top of the DataStax offering.

“‘This was something that was on our radar. As we started to ramp up, it made sense from corporate [standpoint] to buy it instead of build it.’ He added that getting the graph-database engineering expertise was a bonus. ‘There’s not a ton of graph database experts [out there],’ he said.

“This expertise is especially important as two of the five major DataStax key use cases — fraud detection and recommendation engines — involve a graph database.”

Though details of the deal have not been released, see the write-up for some words on the fit between these two companies. Founded on an open-source model, Aurelius was doing just fine on its own. Co-founder Matthias Bröcheler is excited, though, about what his team can do at DataStax. Bröcheler did note that the graph database’s open-source version, Titan, will live on. Aurelius is located in Oakland, California, and was just launched in 2014.

Headquartered in San Mateo, California, DataStax was founded in 2010. Their Cassandra-based software implementations are flexible and scalable. Clients range from young startups to Fortune 100 companies, including such notables as eBay, Netflix and HealthCare Anytime.

Cynthia Murrell, February 20, 2015

Sponsored by, developer of Augmentext

Apache Solr Search NoSQL Search Shines Solo

February 3, 2015

Apache Solr is an open source enterprise search engine that is used for relational databases and Hadoop. ZDNet’s article, “Why Apache Solr Search Is On The Rise And Why It’s Going Solo” explores why its lesser-known use as a NoSQL store might explode in 2015.

At the beginning of 2014, most Solr deployments were using it in the old-fashioned way, but 2015 shows that fifty percent of the pipeline is now using it as a first-class data store. Companies are upgrading their old file intranets for the enterprise cloud. They want the upgraded system to be searchable and they are relying on Solr to get the job done.

Search is more complex than basic NoSQL and needs something more robust to handle the new data streams. Solr adds the extra performance level, so users have access to their data and nothing is missing.

“ ‘So when we talk about Solr, it’s all your data, all the time at scale. It’s not just a guess that we think is likely the right answer. ‘We’re going to go ahead and push this one forward’. We guarantee the quality of those results. In financial services and other areas where guarantees are important, that makes Solr attractive,’ [CEO Will Hayes of LucidWorks, Apache Solr’s commercial sponsor] said.”

It looks like anything is possible for LucidWorks in the coming year.

Whitney Grace, February 03, 2015

Basho: A Comeback?

January 18, 2015

I read “NoSQL Pioneer Basho Scores $25M to Attempt a Comeback.” In 2012, Basho looked like a player. Then the company lost traction. The all-too-familiar “staff changes” kicked in. Now the company has gobbled another $25 million on top of the $32 million previously raised. My thought is that generating this much cash from a NoSQL system is going to be a task I would not undertake. I do have a profile of Basho when it was looking like a contender. I will hunt it down and post a version on the Xenky Vendor Profiles page. I will put an item in Beyond Search and provide the link to a free profile of the company in the next few days.

Stephen E Arnold, January 18, 2015

On Commercial vs Open Source Databases

December 22, 2014

Perhaps we should not be surprised that MarkLogic’s Chet Hayes urges caution before adopting an open-source data platform. His article, “Thoughts on How to Select Between COTS and Open Source” at Sys-Con Media can be interpreted as a defense of his database company’s proprietary approach. (For those unfamiliar with the acronym, COTS stands for commercial off-the-shelf.) Hayes urges buyers to look past initial cost and consider other factors in three areas: technical, cultural, and, yes, financial.

In the “technical” column, Hayes asserts that whether a certain solution will meet an organization’s needs is more complex than a simple side-by-side comparison of features would suggest; we are advised to check the fine print. “Cultural” refers here to taking workers’ skill sets into consideration. Companies usually do this with their developers, Hayes explains, but often overlook the needs of the folks in operational support, who might appreciate the more sophisticated tools built into a commercial product. (No mention is made of the middle ground, where we find third-party products designed to add such tools to Hadoop iterations.)

In his comments on financial impact, Hayes basically declares: It’s complicated. He writes:

“Organizations need to look at the financial picture from a total-cost perspective, looking at the acquisition and development costs all the way through the operations, maintenance and eventual retirement of the system. In terms of development, the organization should understand the costs associated with using a COTS provided tool vs. an Open Source tool.

“[…] In some cases, the COTS tool will provide a significant productivity increase and allow for a quicker time to market. There will be situations where the COTS tool is so cumbersome to install and maintain that an Open Source tool would be the right choice.

“The other area already alluded to is the cost for operations and maintenance over the lifecycle of project. Organizations should take into consideration existing IT investments to understand where previous investments can be leveraged and the cost incurred to leverage these systems. Organizations should ask whether the performance of one or the other allow for a reduced hardware and deployment footprint, which would lead to lower costs.”

These are all good points, and organizations should indeed do this research before choosing a solution. Whether the results point to an open-source solution or to a commercial option depends entirely upon the company or institution.

Cynthia Murrell, December 22, 2014


Amazon and Oracle: The Love Affair Ends

November 14, 2014

I recall turning in a report about Amazon’s use of Oracle as its core database. The client, a bank type operation, was delighted that zippy Amazon had the common sense to use a name brand database. For the bank types, recognizable names used to be indicators of wise technological decisions.

I read “Amazon: DROP DATABASE Oracle; INSERT Our New Fast Cheap MySQL Clone.” Assuming the write up is spot on, Amazon and Oracle have fallen out of love, or at least Amazon has fallen out of love with making beefy payments for the sort of old Oracle data management system. This comment becomes quite interesting to me:

“This old-world relational database software is very expensive,” Jassy [Amazon tech VP] said. “They’re proprietary. There’s a high level of lock-in. And they’ve got punitive licensing terms, not just allowing very little flexibility in moving to the cloud the way customers want, but also in the auditing and fining of their customers.”

Several thoughts flitted through my mind as I kept one eye on the Philae gizmo:

  1. Amazon’s move, if it proves successful, may allow Mr. Bezos to mount a more serious attack on the enterprise market. Bad news for Oracle and possibly good news for those who want to save some Oracle bucks and trim the number of Oracle DBAs on the payroll.
  2. Encourage outfits that offer enterprise cloud solutions. Will Amazon snap up some of the enterprise services and put the squeeze on Google and Microsoft?
  3. Trigger another round of database wars. Confusion and marketing hype often add a bit of spice to the Codd fest.
  4. Cause concern among the commercial, proprietary NoSQL outfits. Think of MarkLogic and its ilk trying to respond to an Amazon package designed to make a 20-something developer jump up and down.

Interesting move by the digital WalMart.

Stephen E Arnold, November 14, 2014
