March 31, 2014
Microsoft recently announced changes to SharePoint, some well received and others less so. For instance, the next SharePoint server update is planned for 2015. In other news, SQL Server 2014 will be supported within SharePoint 2013. Read more in the Redmond article, “Microsoft Adding SQL Server 2014 Support to SharePoint 2013.”
The article says:
“SharePoint Server 2013 will be capable of supporting SQL Server 2014 when Microsoft releases the next SharePoint cumulative update next month, according to an announcement on Friday. SQL Server 2014 is currently in the release-to-manufacturing (RTM) stage, and is expected to hit general availability on April 1.”
SharePoint is continuing its quest to be all things to all people, incorporating more and more outside components. However, it is becoming more difficult and more complicated for users to manage such complex implementations. Stephen E. Arnold is a longtime leader in search and gives a lot of coverage to SharePoint on his Web site ArnoldIT.com.
Emily Rae Aldridge, March 31, 2014
March 27, 2014
I believe that MarkLogic opened for business in 2001. One of the founders was involved with Ultraseek, a search engine that eventually ended up in the hands of HP Autonomy. In case you do not recall Ultraseek, that product dates from the mid 1990s.
Why is this relevant to MarkLogic, a company offering an XML database?
I read “MarkLogic Poised for Continued Growth as the Industry Leader in NoSQL Marketplace.” The write up states:
growth in new markets including Japan and Europe, steady customer acquisition, strategic partner relationships and industry recognition, has further propelled the company into the leadership position within the NoSQL database market.
The company points to the release of MarkLogic, Version 7, which works out to one release every two years. The company “introduced new pricing and packaging, a free developer license, and cloud ready hourly pricing for Amazon Web services.” No details on the pricing were in the story. No information about MarkLogic’s revenues was included. After the last shift in senior management, MarkLogic seemed to be nosing toward $60 million in revenues in 2011, based on our estimates. Now, three years later, the company is showing renewed press release activity, but I would have preferred some hard numbers. In those three years, MarkLogic has suggested that its XML database can work as an information retrieval system, a platform for conducting intelligence, and a useful content processing system for print publishers. In this 36 month period, open source solutions, JSON, and competitors have been moving in similar directions. Choice, at least in data management, abounds.
MarkLogic, since 2001, according to Crunchbase, has ingested $73.6 million in funding with the last cash infusion coming in 2013 from Sequoia Capital, Tenaya Capital, Northgate Capital, and Gary Bloom, who is, according to Businessweek, the Chief Executive Officer, President, and Director of MarkLogic.
The news release points out:
MarkLogic received many industry accolades during the last year. The company was favorably positioned in Gartner’s “Magic Quadrant for Operational Database Management Systems,” published in October 2013. In addition, MarkLogic was the only enterprise NoSQL database vendor featured in the report that integrates search and application services. The company was also recognized in the April 2013 “Gartner Magic Quadrant for Enterprise Search,”- the only company to have the same product featured on both reports. Other accolades include the 2013 Computerworld Honors Laureate, by IDG’s Computerworld Honors Program. The annual award program honors visionary applications of information technology promoting positive social, economic, and educational change. Furthermore, MarkLogic was selected as one of the 2013 Red Herring 100 Global Winners – recognized as a leading global private company and an innovator in the technology industry.
These types of awards are not identified as “content marketing” or pay-to-play studies. I assume these accolades are objective and based on the cited firms’ deep experience with Extensible Markup Language and its applications. Anything less would be suspect in my way of looking at the world of databases, semantics, search systems, and business intelligence solutions.
With fast moving deals for outfits like Oculus Rift, the surging growth of Elasticsearch among developers, and almost frantic efforts of some MarkLogic competitors to find a way to generate revenue growth and profits, MarkLogic appears in the news release to be showing signs of revivification.
My view is that investors may be looking for some return on the money pumped into MarkLogic. Assuming that patience is a virtue, I wonder if this 2001 start up is ready to deliver a big pay day to its stakeholders. WhatsApp, founded in 2009, was a home run for its stakeholders. Cloudera seems to be on a similar trajectory.
MarkLogic is 13 years old and proving to be like a teen in a fancy private school. Money is needed periodically. Do teens repay their parents? My teens did not. Investors may not have the appetite for underwriting without a return that I did as a happy parent.
Stephen E Arnold, March 27, 2014
March 3, 2014
The legacy of TeraText is long, but many in the information field have never heard of the pioneering database. Our own Stephen E. Arnold shares his extensive knowledge on the subject in a free 30-page analysis, “TeraText: Decades in the Making, Still Performing Mission Critical Functions.” The report is number 11 in Mr. Arnold’s valuable Vendor Profiles series. Why should we learn about a veteran like TeraText? He explains:
“TeraText provides a robust, scalable information processing system to government entities in the U.S., Australia, and elsewhere. TeraText is the forerunner of such systems as Recorded Future (funded by In-Q-Tel and Google) and IBM i2 Analyst Notebook. Yet most vendors marketing search and content processing systems are unaware of this important system. My report fills an important gap in the literature describing advanced information retrieval systems.”
Originally funded by university research grants, TeraText became a core system for governmental entities in law-making, defense, and intelligence. Perhaps the system’s low profile stems from the company’s sales approach; they prefer to capture a few large-scale contracts on their product’s merits, rather than capture widespread attention with flashy marketing.
If you are not familiar with the Vendor Profile series, you owe it to yourself to check out this free resource. Arnold brings his formidable expertise to bear on analyses of search and content processing vendors like Convera, Entopia, Fulcrum, and Verity. These papers are no thin giveaways; they rival reports from firms that charge as much as $3,500. Arnold shares this work for free, because he believes knowledge about foundational search systems can help companies make better decisions about vendor claims. He also hopes that spreading basic information about important search and content processing systems will speed up innovation in this typically sluggish field.
Cynthia Murrell, March 03, 2014
January 24, 2014
Who knew LinkedIn could be so useful? The site’s Engineering blog supplies a thorough look at logs in “The Log: What Every Software Engineer Should Know About Real-Time Data’s Unifying Abstraction.” Writer and LinkedIn Engineer Jay Kreps aims to fill what he sees as a large gap in the education of most software engineers. The site’s transition last year from a centralized database to a distributed, Hadoop-based system opened his eyes.
“One of the most useful things I learned in all this was that many of the things we were building had a very simple concept at their heart: the log. Sometimes called write-ahead logs or commit logs or transaction logs, logs have been around almost as long as computers and are at the heart of many distributed data systems and real-time application architectures. You can’t fully understand databases, NoSQL stores, key value stores, replication, paxos, hadoop, version control, or almost any software system without understanding logs; and yet, most software engineers are not familiar with them. I’d like to change that. In this post, I’ll walk you through everything you need to know about logs, including what is log and how to use logs for data integration, real time processing, and system building.”
He isn’t kidding. The extensive article is really a mini-course that any programmer who hasn’t already mastered logs should look into. Part one, titled “What is a log?”, covers logs in general as well as their place in both databases and distributed systems. Part two discusses data integration, including potential complications, the relationship to a data warehouse, log files, and building a scalable log. Real-time stream processing is discussed in part three, as well as data flow graphs, real-time processing, and log compaction. Part four covers system building, delving into the prospect of unbundling and where logs fit into system architecture. At the end, Kreps supplies an extensive list of resources for further study.
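For readers who have not met the idea before, here is a minimal sketch of the concept Kreps describes: an append-only sequence of records that readers replay in order. The class and function names (`AppendOnlyLog`, `replay`) and the sample records are hypothetical and are not taken from his article or from any LinkedIn system.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory log: records are appended, never modified."""

    _records: List[Tuple[str, Any]] = field(default_factory=list)

    def append(self, key: str, value: Any) -> int:
        """Add a record at the end and return its offset (position in the log)."""
        self._records.append((key, value))
        return len(self._records) - 1

    def read_from(self, offset: int = 0) -> List[Tuple[str, Any]]:
        """Return every record at or after the given offset, in write order."""
        return self._records[offset:]


def replay(log: AppendOnlyLog, offset: int = 0) -> Dict[str, Any]:
    """Rebuild a key-value table by applying records in order; later writes win."""
    table: Dict[str, Any] = {}
    for key, value in log.read_from(offset):
        table[key] = value
    return table


log = AppendOnlyLog()
log.append("user:1", "Ada")
log.append("user:2", "Grace")
last_offset = log.append("user:1", "Ada Lovelace")

primary = replay(log)
replica = replay(log)  # a second reader of the same log reaches the same state
```

Because every reader applies the same records in the same order, two independent replays end in the same state, which is the property that makes a log useful for replication and data integration.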
Cynthia Murrell, January 24, 2014
January 1, 2014
The article titled “Codd’s Relational Vision – Has NoSQL Come Full Circle” on OpenSource Connections relates the history of relational databases and applies their lessons to the NoSQL databases so popular today. The article walks through the simplest databases that followed the hierarchical model and then into generalized databases. The article then delves into the work of Edgar F. Codd himself:
“When Codd wrote his paper, he criticized the DBTG databases of the day around the area of how the application interacted with the databases abstractions. Low-level abstractions leaked into user applications. Application logic became dependent on aspects of the databases: Specifically, he cites three criticisms: access dependencies… order dependencies… index dependencies… Codd proposed to get around these limitations by focusing on a specific abstraction: relations…. In short, Codd created a beautiful abstraction that turned out to be reasonable to implement.”
Then came the decision to build horizontally scalable systems, which were incompatible with Codd’s abstraction. The article ultimately suggests that the smart way to approach a database is to base it on your needs, not on what is currently trending. There is even a Contact us link for readers who aren’t sure what type of database to select, hierarchical or relational.
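To make the index and order dependencies Codd criticized concrete, here is a small sketch using Python’s built-in `sqlite3` module. The `vendor` table, its columns, and the `ExampleCo` row are invented for illustration; the founding years for MarkLogic and Basho come from other posts on this page. The same query text returns the same rows before and after an index is added, so the application does not depend on how the data is physically accessed.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (name TEXT, founded INTEGER)")
conn.executemany(
    "INSERT INTO vendor VALUES (?, ?)",
    [("MarkLogic", 2001), ("Basho", 2008), ("ExampleCo", 1995)],  # ExampleCo is made up
)

# The query names the data it wants and the order it wants it in.
# It says nothing about storage order or which index to use.
query = "SELECT name FROM vendor WHERE founded > 2000 ORDER BY name"

before_index = conn.execute(query).fetchall()

# Adding an index changes how the engine may find rows, not what the query means.
conn.execute("CREATE INDEX idx_vendor_founded ON vendor (founded)")
after_index = conn.execute(query).fetchall()

conn.close()
```

The application code is untouched by the physical change, which is the separation between logic and storage that the relational abstraction was meant to provide.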
Chelsea Kerwin, January 01, 2014
December 18, 2013
Interesting: it seems the venerated Thomas Bayes is now with us in database land. BayesDB is being developed, in conjunction with an analysis method called CrossCat, by a team of folks from MIT’s Probabilistic Computing Project and the Shafto Lab at the University of Louisville.
The project’s page explains:
“BayesDB, a Bayesian database table, lets users query the probable implications of their data as easily as a SQL database lets them query the data itself. Using the built-in Bayesian Query Language (BQL), users with no statistics training can solve basic data science problems, such as detecting predictive relationships between variables, inferring missing values, simulating probable observations, and identifying statistically similar database entries.
BayesDB is suitable for analyzing complex, heterogeneous data tables with up to tens of thousands of rows and hundreds of variables. No preprocessing or parameter adjustment is required, though experts can override BayesDB’s default assumptions when appropriate.
BayesDB’s inferences are based in part on CrossCat, a new, nonparametric Bayesian machine learning method, that automatically estimates the full joint distribution behind arbitrary data tables.”
The database is designed for two types of folks: those with no statistics chops who nonetheless have tabular data to analyze, and those proficient with statistics who have a non-standard problem or who have no time or patience for custom modeling. The team credits CrossCat in part with making BayesDB possible, but also says the BQL language was key to its development.
The description includes examples, a discussion of which types of data and problems the database addresses best, reasons to trust the results, why they named it BayesDB, and more. Check out the page for all the details.
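To give a feel for what “inferring missing values” from a table means, here is a deliberately simple sketch. It is not BayesDB, BQL, or CrossCat: it fills a gap with the most frequent value seen in rows that share another column’s value, which is a much cruder estimate than the joint-distribution modeling the project describes. The function name `infer_missing` and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def infer_missing(rows: List[Row], target: str, given: str) -> List[Row]:
    """Fill a missing `target` with the most common value among rows sharing `given`."""
    # Count observed target values for each value of the conditioning column.
    counts: Dict[str, Counter] = {}
    for row in rows:
        if row[target] is not None and row[given] is not None:
            counts.setdefault(row[given], Counter())[row[target]] += 1

    filled: List[Row] = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        if new_row[target] is None and new_row[given] in counts:
            new_row[target] = counts[new_row[given]].most_common(1)[0][0]
        filled.append(new_row)
    return filled


rows: List[Row] = [
    {"model": "NoSQL", "schema": "flexible"},
    {"model": "NoSQL", "schema": "flexible"},
    {"model": "relational", "schema": "fixed"},
    {"model": "NoSQL", "schema": None},  # the value to infer
]

result = infer_missing(rows, target="schema", given="model")
```

A real probabilistic system would also report how confident it is in the filled value; this toy version simply picks the most common match.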
Cynthia Murrell, December 18, 2013
December 16, 2013
Through their blog, Attivio weighs in on the HealthCare.gov service: “Could IBM or Oracle Have Been the Miracle Cure for Healthcare.gov?” The telling subtitle reads, “if you believe that, then I have a bridge to sell you.” Yes, Attivio comes out against pinning all the blame on a refusal to go with the tried and true (or outdated and limited, depending on one’s perspective).
Senior Attivio marketing VP (and blogger) MaryAnne Sinville observes that the latest trend in the finger-pointing crusade is to assert that the site’s database component should have gone to an old stalwart like IBM, Oracle, or Microsoft instead of to the NoSQL firm MarkLogic. Not because those databases are better suited to the project, necessarily, but because it is easier to find technicians familiar with those systems.
“Does anyone really believe a better solution to a project involving many disparate sources of information, complex logic, and a dynamic interface, which must be built in a very short timeframe would have been to select IBM, Microsoft or Oracle? The idea that legacy mega-vendors have the agility required for a project of this scope is absurd, as the states of Oregon, Pennsylvania and the US Air Force have all recently learned the hard way.
Let’s take a look at the real issues at play here. Selecting a NoSQL database like MarkLogic, or more precisely in this case, an XML database, means that all of the Healthcare.gov data sources would have to be converted to XML. Of course that’s a monumental task, but it’s no more difficult and time consuming than the arduous extract, transform and load (ETL) processes required by traditional relational databases because of their fixed schema. The enormous time and cost associated with ETL is precisely why new technologies are emerging.”
For a nation that prides itself on innovation, we seem to have a lot of folks afraid of progress. Granted, Attivio has a stake in encouraging organizations to break away from traditional database providers. Still, I agree that a project this size called for the most up-to-date approach available. Let us turn our accusatory gaze from MarkLogic, which after all represents a small fraction of the vendors involved with this website, to where it belongs: on our government’s unwieldy and outdated procurement process. Granted, addressing that will be much tougher than assigning a scapegoat, but the approach has a singular advantage—it might actually fix a problem currently poised to cause us trouble for years to come.
Cynthia Murrell, December 16, 2013
November 24, 2013
I read “DB-Engines Ranking.” What struck me is that search engines were included in the list. More remarkable, some of the search systems are not data management systems at all. One data management system bills itself as a search engine. I was surprised to find the Google Search Appliance listed. The system is expensive and garners only basic support from the “search experts” at Google.
Let me highlight the search related notes I made as I worked through the list of 171 systems.
- At position 12 is Solr. This is the open source faceted search engine that can be downloaded and installed—usually.
- At position 21 is ElasticSearch. The person who created Compass whipped up ElasticSearch and made some changes to enhance system performance. With $39 million in venture funding, ElasticSearch can be many things, but for me the company does search and retrieval.
- At position 27 is Sphinx Search. This system makes it easy to retrieve information from MySQL and some other databases without writing formal SQL queries.
- At position 38, MarkLogic is the polymath among the group. The company bills itself as enterprise search, XML data management system, and business intelligence vendor. The company also enjoys some notoriety due to its contributions to the exceptional Healthcare.gov project.
- In position 44 is the Google Search Appliance. The system is among the most expensive appliances I have examined. Is the GSA an end of life project? Is the GSA a database system? My view is that it is a somewhat limited way to get Google style results for users who see Google as the champion in the search derby.
- At position 104 is Xapian. Again, I don’t think of Xapian and its enthusiastic supporters as card carrying members of the database society. For me, Xapian evokes thoughts of Flax.
- At position 124 is CloudSearch. Amazon’s somewhat old fashioned search system. Frankly I think of Amazon as more of a database services outfit than a search outfit.
- At position 127 is the end of life Compass Search. This was the precursor to ElasticSearch. There are those who are happy with an old school open source solution. Good for them.
- At position 149 is SearchBlox. Now SearchBlox uses ElasticSearch. Interesting?
- At position 163 is SRCH2. This vendor is one that has some organizational challenges. The focus of the company seems to be shifting to mobile search.
Quite an eclectic list. Some of the systems mentioned are search engines; for example, Basho Riak. In terms of list “points”, ElasticSearch looks like the big winner. Shay Banon made the list with Compass. ElasticSearch is moving up the charts. SearchBlox uses ElasticSearch in its product. What happened to LucidWorks and reflexive search?
Which of these systems would you select for data management? My thought is that one should check out the software before taking a list at face value.
The confusion about search is evident in this list. No wonder the LinkedIn discussion groups want to do surveys to figure out what search means.
Stephen E Arnold, November 24, 2013
November 13, 2013
Basho has released a technical preview of Riak 2.0, the company announced at the Ricon West developers’ conference last month in San Francisco. Several key improvements have been made to the open source distributed database: additional Riak data types; the option for strong consistency; full-text search integration with Apache Solr; more flexibility in security administration; simplified configuration management; and the option of storing fewer replicas across multiple data centers. See the article for details on each of these changes.
The press release emphasizes that this is not the final release of Riak 2.0, and that Basho would like users’ feedback:
“Please note that this is only a Technical Preview of Riak 2.0. This means that it has been tested extensively, as we do with all of our release candidates, but there is still work to be completed to ensure its production hardened. Between now and the final release, we will be continuing manual and automated testing, creating detailed use cases, gathering performance statistics, and updating the documentation for both usage and deployment. As we are finalizing Riak 2.0, we welcome your feedback for our Technical Preview. We are always available to discuss via the Riak Users mailing list, IRC (#riak on freenode), or contact us.”
Riak is developed by Basho Technologies, which naturally offers a commercial edition of the NoSQL database. The company also offers Riak CS, a cloud-based object storage system deployable on top of Riak. It positions its enterprise version as the solution for companies whose needs go beyond the traditional database or who have wrestled with scalability constraints within relational databases. Founded in 2008, Basho is headquartered in Cambridge, Massachusetts, and maintains offices in London, San Francisco, Tokyo, and Washington D.C.
Cynthia Murrell, November 13, 2013
November 13, 2013
We already knew that MarkLogic is good at search. Now the company is being recognized for its database management chops, we learn from “MarkLogic Featured in the Gartner Magic Quadrant for Operational Database Management Systems” at BWW Geeks World.
The press release tells us:
“MarkLogic has been positioned for its ability to execute and is the only Enterprise NoSQL database vendor featured in the report that integrates search and application services. . . .
MarkLogic is the only schema-agnostic Enterprise NoSQL database that integrates semantics, search and application services with the enterprise features customers require for production applications. This combination helps enterprises make better-informed decisions and create robust, scalable applications to drive revenue, streamline operations, manage risk and make the world safer. MarkLogic features ACID transactions, horizontal scaling, real-time indexing, high availability, disaster recovery, and government-grade security.”
CEO Gary Bloom does not let us forget his company’s search success. He points out that they also captured a place on Gartner’s 2013 Magic Quadrant for Enterprise Search roster, and that they are the only company to be included in both reports. He understandably takes this achievement as evidence that MarkLogic is on the right track with its integrated approach. The company focuses on scalability, enterprise-readiness, and leveraging the latest technology. Founded in 2001, MarkLogic is headquartered in Silicon Valley and maintains offices around the world.
Cynthia Murrell, November 13, 2013