DataStax Buys Graph-Database Startup Aurelius

February 20, 2015

DataStax has purchased the open-source graph-database company Aurelius, we learn in “DataStax Grabs Aurelius in Graph Database Acqui-Hire” at TechCrunch. Aurelius’ eight engineers will reportedly be working at DataStax, delving right into a scalable graph component for the company’s Cassandra-based Enterprise database. This acquisition, DataStax declares, makes theirs the only database platform with graph, analytics, search, and in-memory in one package. Writer Ron Miller tells us:

“DataStax is the commercial face of the open source Apache Cassandra database. Aurelius was the commercial face of the Titan graph database.

“Matt Pfeil, co-founder and chief customer officer at DataStax, says customers have been asking about graph database functionality for some time. Up until now customers have been forced to build their own on top of the DataStax offering.

“‘This was something that was on our radar. As we started to ramp up, it made sense from corporate [standpoint] to buy it instead of build it.’ He added that getting the graph-database engineering expertise was a bonus. ‘There’s not a ton of graph database experts [out there],’ he said.

“This expertise is especially important as two of the five major DataStax key use cases — fraud detection and recommendation engines — involve a graph database.”

Though details of the deal have not been released, see the write-up for some words on the fit between these two companies. Founded on an open-source model, Aurelius was doing just fine on its own. Co-founder Matthias Bröcheler is excited, though, about what his team can do at DataStax. Bröcheler did note that the graph database’s open-source version, Titan, will live on. Aurelius is located in Oakland, California, and was just launched in 2014.

Headquartered in San Mateo, California, DataStax was founded in 2010. Their Cassandra-based software implementations are flexible and scalable. Clients range from young startups to Fortune 100 companies, including such notables as eBay, Netflix and HealthCare Anytime.

Cynthia Murrell, February 20, 2015

Sponsored by, developer of Augmentext

Apache Solr Search NoSQL Search Shines Solo

February 3, 2015

Apache Solr is an open source enterprise search engine that is used for relational databases and Hadoop. ZDNet’s article, “Why Apache Solr Search Is On The Rise And Why It’s Going Solo” explores why its lesser-known use as a NoSQL store might explode in 2015.

At the beginning of 2014, most Solr deployments were using it in the old-fashioned way, but 2015 shows that fifty percent of the pipeline is now using it as a first class data store. Companies are upgrading their old file intranets for the enterprise cloud. They want the upgraded system to be searchable and they are relying on Solr to get the job done.

Search is more complex than basic NoSQL and needs something more robust to handle the new data streams. Solr adds the extra performance level, so users have access to their data and nothing is missing.

“‘So when we talk about Solr, it’s all your data, all the time at scale. It’s not just a guess that we think is likely the right answer. ‘We’re going to go ahead and push this one forward’. We guarantee the quality of those results. In financial services and other areas where guarantees are important, that makes Solr attractive,’ [CEO Will Hayes of LucidWorks, Apache Solr’s commercial sponsor] said.”

It looks like anything is possible for LucidWorks in the coming year.

Whitney Grace, February 03, 2015

Basho: A Comeback?

January 18, 2015

I read “NoSQL Pioneer Basho Scores $25M to Attempt a Comeback.” In 2012, Basho looked like a player. Then the company lost traction. The all-too-familiar “staff changes” kicked in. Now the company has gobbled another $25 million on top of the $32 million previously raised. My thought is that generating a return on this much cash from a NoSQL system is a task I would not undertake. I do have a profile of Basho from when it was looking like a contender. I will hunt it down and post a version on the Xenky Vendor Profiles page. In the next few days, I will put an item in Beyond Search with a link to the free profile of the company.

Stephen E Arnold, January 18, 2015

On Commercial vs Open Source Databases

December 22, 2014

Perhaps we should not be surprised that MarkLogic’s Chet Hayes urges caution before adopting an open-source data platform. His article, “Thoughts on How to Select Between COTS and Open Source” at Sys-Con Media can be interpreted as a defense of his database company’s proprietary approach. (For those unfamiliar with the acronym, COTS stands for commercial off-the-shelf.) Hayes urges buyers to look past initial cost and consider other factors in three areas: technical, cultural, and, yes, financial.

In the “technical” column, Hayes asserts that whether a certain solution will meet an organization’s needs is more complex than a simple side-by-side comparison of features would suggest; we are advised to check the fine print. “Cultural” refers here to taking workers’ skill sets into consideration. Companies usually do this with their developers, Hayes explains, but often overlook the needs of the folks in operational support, who might appreciate the more sophisticated tools built into a commercial product. (No mention is made of the middle ground, where we find third-party products designed to add such tools to Hadoop iterations.)

In his comments on financial impact, Hayes basically declares: It’s complicated. He writes:

“Organizations need to look at the financial picture from a total-cost perspective, looking at the acquisition and development costs all the way through the operations, maintenance and eventual retirement of the system. In terms of development, the organization should understand the costs associated with using a COTS provided tool vs. an Open Source tool.

“[…] In some cases, the COTS tool will provide a significant productivity increase and allow for a quicker time to market. There will be situations where the COTS tool is so cumbersome to install and maintain that an Open Source tool would be the right choice.

“The other area already alluded to is the cost for operations and maintenance over the lifecycle of project. Organizations should take into consideration existing IT investments to understand where previous investments can be leveraged and the cost incurred to leverage these systems. Organizations should ask whether the performance of one or the other allow for a reduced hardware and deployment footprint, which would lead to lower costs.”

These are all good points, and organizations should indeed do this research before choosing a solution. Whether the results point to an open-source solution or to a commercial option depends entirely upon the company or institution.

Cynthia Murrell, December 22, 2014

Amazon and Oracle: The Love Affair Ends

November 14, 2014

I recall turning in a report about Amazon’s use of Oracle as its core database. The client, a bank type operation, was delighted that zippy Amazon had the common sense to use a name brand database. For the bank types, recognizable names used to be indicators of wise technological decisions.

I read “Amazon: DROP DATABASE Oracle; INSERT Our New Fast Cheap MySQL Clone.” Assuming the write up is spot on, Amazon and Oracle have fallen out of love, or at least out of the habit of beefy payments from Amazon for the sort of old Oracle data management system. This comment becomes quite interesting to me:

“This old-world relational database software is very expensive,” Jassy [Amazon tech VP] said. “They’re proprietary. There’s a high level of lock-in. And they’ve got punitive licensing terms, not just allowing very little flexibility in moving to the cloud the way customers want, but also in the auditing and fining of their customers.”

Several thoughts flitted through my mind as I kept one eye on the Philae gizmo:

  1. Amazon’s move, if it proves successful, may allow Mr. Bezos to mount a more serious attack on the enterprise market. Bad news for Oracle and possibly good news for those who want to save some Oracle bucks and trim the number of Oracle DBAs on the payroll.
  2. Encourage outfits that offer enterprise cloud solutions. Will Amazon snap up some of the enterprise services and put the squeeze on Google and Microsoft?
  3. Trigger another round of database wars. Confusion and marketing hype often add a bit of spice to the Codd fest.
  4. Cause concern among the commercial, proprietary NoSQL outfits. Think of MarkLogic and its ilk trying to respond to an Amazon package designed to make a 20 something developer jump up and down.

Interesting move by the digital WalMart.

Stephen E Arnold, November 14, 2014

Google and Images: What Does Remove Mean?

October 4, 2014

I read “After Legal Threat, Google Says It Removed ‘Tens of Thousands’ of iCloud Hack Pics.” On the surface, the story is straightforward. A giant company gets a ringy dingy from attorneys. The giant company takes action. Legal eagles return to their nests.

However, a question zipped through my mind:

What does remove mean?

If one navigates to a metasearch engine, the user can run queries. A query often generates results with a hot link to the Google cache. Have other services constructed versions of the Google index to satisfy certain types of queries? Are there third parties that have content in Web mirrors? Is content removed from those versions of content? Does “remove” mean from the Fancy Dan pointers to content or from the actual Google or other data structure? (See my write ups in Google Version 2.0 and The Digital Gutenberg to get a glimpse of how certain content can be deconstructed and stored in various Google data structures.)

Does remove mean a sweep of Google Images? Again, are the objects themselves purged or are the pointers deleted?

Then I wondered what happens if Google suffers a catastrophic failure. Will the data and content objects be restored from a back up? Are those back ups purged?

I learned in the write up:

The Hollywood Reporter on Thursday published a letter to Google from Hollywood lawyers representing “over a dozen” of the celebrity victims of last month’s leak of nude photos. The lawyers accused Google of failing to expeditiously remove the photos as it is required to do under the Digital Millennium Copyright Act. They also demanded that Google remove the images from Blogger and YouTube as well as suspend or terminate any offending accounts. The lawyers claimed that four weeks after sending the first DMCA takedown notice relating to the images, and filing over a dozen more since, the photos are still available on the Google sites.

What does “remove” mean?

Stephen E Arnold, October 4, 2014

Why Good Enough Is the New Norm in Search

September 29, 2014

Navigate to “Postgres Full Text Search Is Good Enough.” I first heard this argument at a German information technology conference a few years ago. The idea is surprisingly easy to understand. As long as a user can bang in a couple of key words, scan a result list, and locate information that the user finds helpful, the job is done. The search results may consist of flawed or manipulated information. The search results may be off point for the user’s query when evaluated by old fashioned methods such as precision and recall. The user may be dumb and rely on what the user finds accurate.
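For readers who have not bumped into the old fashioned measures, here is a minimal sketch of how precision and recall are usually computed for a single query. The function name and the document identifiers are made up for illustration; nothing here comes from the PostgreSQL write up.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the search system returned (hypothetical ids)
    relevant:  document ids a human judged relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were returned

    # Precision: what share of the returned documents are relevant?
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what share of the relevant documents were returned?
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

A good enough system can post mediocre numbers on both measures and still leave the user satisfied, which is the point of the argument.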


This write up explains the good enough approach in terms of PostgreSQL, a useful open source Codd type data management system. Please, note. I am not uncomfortable with good enough search. I understand that when the herd stampedes, it is not particularly easy to stop the run. Prudence suggests that one take cover.

Here’s the guts of the write up:

What do I mean by ‘good enough’? I mean a search engine with the following features:

  • Stemming
  • Ranking / Boost
  • Support Multiple languages
  • Fuzzy search for misspelling
  • Accent support

Luckily PostgreSQL supports all these features.
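To make two items on that list concrete, the sketch below shows accent folding and fuzzy matching for misspellings using only the Python standard library. It is an illustration of the ideas, not PostgreSQL's implementation and not code from the write up; the function names, the tiny vocabulary, and the 0.8 similarity cutoff are assumptions chosen for the example.

```python
import unicodedata
from difflib import get_close_matches


def fold_accents(text: str) -> str:
    """Strip combining marks so that 'café' and 'cafe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fuzzy_lookup(term: str, vocabulary: list[str], cutoff: float = 0.8) -> list[str]:
    """Return vocabulary words that approximately match a possibly misspelled term.

    The cutoff is an illustrative similarity threshold, not a tuned value.
    """
    # Map each folded, lowercased form back to the original spelling.
    folded = {fold_accents(word).lower(): word for word in vocabulary}
    query = fold_accents(term).lower()
    matches = get_close_matches(query, list(folded), n=3, cutoff=cutoff)
    return [folded[m] for m in matches]


if __name__ == "__main__":
    vocab = ["database", "search", "café"]
    print(fuzzy_lookup("databse", vocab))  # misspelled query term
    print(fuzzy_lookup("cafe", vocab))     # query typed without the accent
```

A database engine does this work inside the index rather than in application code, but the user-facing behavior is the same: a sloppy query still finds the intended term.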

The write up contains some useful code snippets to make use of search features. The discussion of full text search is coherent and addresses a vast swath of content. Note that proprietary vendors have tilled acres of marketing earth and fertilizer to convert search into a mind boggling range of functions.

This article includes code snippets to tackle full text within PostgreSQL.

Querying is included as well. Again, code snippets are included. (My teenage advisors said, “Very useful snippets.” Okay. Good.)

The write up concludes:

We have seen how to build a decent multi-language search engine based on a non-trivial document. This article is only an overview but it should give you enough background and examples to get you started with your own….Postgres is not as advanced as ElasticSearch and SOLR but these two are dedicated full-text search tools whereas full-text search is only a feature of PostgreSQL and a pretty good one

Reasonable observation. Worth reading.

If you are a vendor of proprietary search technology, there will be more individuals infused with the spirit of open source, not fewer. How many experts are there for proprietary systems? Fewer than the cadres of open source volk I surmise.

Stephen E Arnold, September 29, 2014

MarkLogic Bets New Offices Equal Revenues

September 25, 2014

MarkLogic, founded more than a decade ago, is an interesting company. I heard that Google kicked its tires because Christopher Lindblad is a true wizard.

The outfit offers an Extensible Markup Language data management solution. Over the years, the company has positioned the system to slice and dice content for publishers, intelligence analysis for government entities, and enterprise search. Along the way, the company’s technology has been shaped to meet the needs of the pivoting forces in content processing. Stated another way, when one thing won’t sell at a pace to keep investors happy, try another way. In the course of its journey, the company brushed against Oracle and then found itself snarled in the confusion between JSON and XML and the sort of open proprietary extensions to the query language used to extract results from the XML store only to get buffeted by the hoo hah about Hadoop and assorted open source alternatives to Codd databases. Wow.

I read a content marketing / public relations story called “MarkLogic Expands Global Reach with New Offices in Chicago.” Check the source quickly because some BusinessWire content can disappear or become available to those who fork over dough to the “news” service. The write up asserted:

“The opening of these new offices is well-timed for the growing number of global customers who need the enterprise grade NoSQL solutions we are delivering to US-based customers,” said David Ponzini, senior vice president of corporate development, MarkLogic. “We are in an advantageous position to make an immediate impact in Europe and Southeast Asia. We continue broadening the market awareness for MarkLogic throughout the world.”

The trick, of course, will be to blast through the financial goals for the company set by the investors years ago. A failure to produce more than $60 million in revenues several years ago led to the departure of one president. A couple more senior executives have spun through the revolving door not too far from Google Island with its quirky dinosaur skeleton. Does that skeleton stand as a metaphor for proprietary software solutions?

In my view, the business thinking at work is more sales offices equals more sales. I once had an office in Manhattan even though I worked in Illinois. The cost was about $20 per month. I had an address on Park Avenue, south unfortunately, and a 212 phone number. I made a sale or two to an organization run by John Suhler, but I quickly figured out that the key to making sales was my being in and around midtown.

I thought I read that outfits like IBM are going to a “no office” approach. Maybe MarkLogic has identified a solution to the overhead associated with full time equivalents and physical space? That begs another question, “What does MarkLogic know that IBM does not know?”

Some vendors have found that more sales offices increase costs without generating sufficient revenue to cover the overhead, miscellaneous costs and in country marketing expenses. I can name several Paris, France based content processing companies that learned first hand that additional offices are a very, very expensive proposition. Other companies leverage partners for revenues. In one of my industry reports, I pointed out that prior to the sale of Autonomy to HP, Autonomy figured out a hybrid sales model that seemed to work as long as Dr. Lynch was cracking the whip. Remove the management, and the partnering model can go off the rails.

Don’t get me wrong. XML is a wonderful solution to certain types of information challenges. Thomson Reuters can produce hundreds of for fee publications using XQuery and XSLT with proprietary extensions. A quick look at Thomson Reuters’ financial results suggests that more may be needed by this company than a foundation and an XML data store.

How quickly will MarkLogic deliver a five or ten X return on the $70 million investors have pumped in? In today’s market, cranking out $300 to $700 million in revenues from content processing technology that competes with open source alternatives is a tall order.

Maybe more sales offices will do it? My hunch is that more closed deals are the evidence some stakeholders seek.

Stephen E Arnold, September 25, 2014

Open Sourcers Believe In Cassandra

August 27, 2014

In Greek mythology, Cassandra had the gift of prophecy, but she was also cursed so that no one believed her. The NoSQL database of the same name shared a similar problem when it first started, but unlike the tragic heroine it has since grown to be a popular and profitable bit of code. Wired discusses Cassandra’s history and current endeavors in “Out In the Open: The Abandoned Facebook Tech That Now Helps Power Apple.”

Cassandra originated at Facebook, which used it to better scale information across machines and open sourced it in 2008. Jonathan Ellis built DataStax around it. Cassandra faded into the background for a while, but DataStax continued to gain traction with its proprietary software. Apple has since joined the Cassandra community and is its second largest contributor. DataStax, however, will not acknowledge that Apple is one of its clients.

The article points out that a single database product cannot reign supreme in 2014’s market. New ways to house and utilize data will continue to grow, much of it driven by open source. What does that mean for DataStax and Cassandra?

“Ellis says the strategy for Cassandra and DataStax will be ensuring that its technology can work with any new technology that can come along. For example, DataStax recently released a connector for Spark that will enable developers to easily use Spark to analyze data stored in Cassandra. ‘We’re trying to be the database that drives our application, not necessarily the analytics,’ he says. ‘There’s nothing that marries us to one of those platforms.’”

From reading this, it seems the big data push has quieted down somewhat, but companies based on open source software are trying to create products that allow people to use their data smarter and without the holdups of earlier big data pushes. One thing for sure is if DataStax truly does have Apple as a client, they can kiss success on the mouth.

Whitney Grace, August 27, 2014

Guide to Hiding Java Source in Oracle Database

August 13, 2014

The explanatory article on MacLochlainns Weblog titled Hiding a Java Source offers information for those interested in concealing a Java source in an Oracle database. It is a relatively brief article that consists of straightforward instructions. The article begins,

“The ability to deploy Java inside the Oracle database led somebody to conclude that the source isn’t visible in the data catalog. Then, that person found that they were wrong because the Java source is visible when you use a DDL command to CREATE, REPLACE, and COMPILE the Java source. This post discloses how to find the Java source and how to prevent it from being stored in the data catalog.”

The article concludes with instructions on how to ascertain that the Java source is compiled outside the database. Obviously, this article is only intended for white hat reasons, right? Michael McLaughlin, the author of the blog, has a long history with Oracle, going back to Oracle 6. He has written several handbooks on Oracle and teaches database technology at BYU-Idaho. The blog used to be focused solely on Oracle as well, but now offers posts on a range of topics from Java to Mac OS to Microsoft Excel and more.

Chelsea Kerwin, August 13, 2014
