MarkLogic and the New York Times

December 2, 2013

On Saturday, November 30, 2013, The New York Times published “Health Care Site Rushing to Make Fixes by Sunday.” As I now know, mission accomplished. But there was no aircraft carrier, brass band, or flag. (Here’s the link to the online story, but like so many “real” journalistic efforts, the link can go dead and you will have to hunt for a November 30, 2013 Times and look on pages A 1 with a jump to page A 12. Penguin, there is nothing I care to do about the link. Sorry.)

I wanted to document this passage from the Times’ story about MarkLogic. What’s interesting is that the company gets little attention from other “real” journalists. I suppose if I were curious, I would attempt to answer the question, “Why?”

I am not curious. Here’s what snagged my attention on the 30th:

Gary C. Bloom, the chief executive officer of another vendor, MarkLogic, said his firm is also moving its software to differently configured servers.

The idea is from MarkLogic’s neighbor in Silicon Valley, Oracle. A few years ago, Oracle wrote a white paper banging on MarkLogic’s technology. You can find a copy of that analysis in “Mark Logic XML Server 4.1.” I wrote about the tempest in “A Coming Dust Up between Oracle and MarkLogic?”

The Times’ story continued:

MarkLogic provided the technology for the database that serves as the system’s internal filing cabinet and index.

The story does not make clear whether MarkLogic is an XML server that acts like a junction box among the moving parts of the HealthCare.gov site, a data management system interacting with Oracle’s technology, or a search engine for the Web site. MarkLogic positions its technology as doing each of these functions plus analytics, business intelligence, customer relationship management, publishing, and probably some other functions as well.

The Times quotes Mr. Bloom as having said:

“I am picking up my house and moving it to a better foundation next door,” he [Mr. Bloom] said in an interview. He said MarkLogic is performing up to standard, but “the network and the storage systems are not properly sized and not properly run.”

It is not clear to me which vendor is providing the storage systems. Is it MarkLogic or is it another vendor such as Oracle, a company apparently unimpressed with some of MarkLogic’s technology if I understand the Oracle white paper?

The Times added:

Another critical problem involved the specifications for a major computer switch that connects the computer services through a security firewall to the Internet. Mr. Bloom said it has been upgraded from four gigabytes a second to 60 [gigabytes a second]. He said the earlier speed was the equivalent of employing four security staffers to screen Heathrow Airport’s passengers. “The line to get through,” he said, “would go back to the city of London.”

I am not sure how these issues did not become known to the vendors pushing data through the system, but apparently, the 15X shortfall was not noticed. I wonder how many home builders move a completed house to a new foundation. Also, what if the security folks at Heathrow are more or maybe less efficient than those located where HealthCare.gov is?

I will keep my eye on this issue because MarkLogic has been emphasizing that it offers a search system. Where there is a search vendor, there seems to be some activity of interest. And where there are MarkLogic and Oracle, there may be some interesting discussion between the parties.

Stephen E Arnold, December 2, 2013

MarkLogic: Data Management and Healthcare.gov

November 23, 2013

I read “Tension and Flaws Before Health Website Crash.” The good news is that the story focuses on what is now old news: Management challenges at the agency responsible for Healthcare.gov. The bad news—at least for champions of XML repositories, XML normalization, and XML as the “answer” to a wide range of information management woes—is that XML (extensible markup language) is not the slam dunk, whiz bang solution some true believers hope.

Here’s the passage that caught my attention:

Another sore point was the Medicare agency’s decision to use database software, from a company called MarkLogic, that managed the data differently from systems by companies like IBM, Microsoft and Oracle. CGI officials argued that it would slow work because it was too unfamiliar. Government officials disagreed, and its configuration remains a serious problem.

MarkLogic has not been identified as a vendor creating some headaches until now. MarkLogic has a system that can store information and data in an XML data management system. The trick is that content not in XML must be normalized; that is, converted to XML. MarkLogic has developed some proprietary methods to perform its data management operations. A person familiar with XML may not be conversant with the MarkLogic conventions. The upside of this approach is that MarkLogic has experts who are able to address most customer requests. The downside is that a person familiar with XML but not MarkLogic can introduce some problems into an otherwise spiffy system.
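To make “normalized” concrete, here is a minimal sketch in Python of converting non-XML content (a small CSV export) into XML. It is a generic illustration only and does not use MarkLogic’s tools or proprietary conventions. The function name csv_to_xml, the tag names, and the sample data are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Convert CSV text to an XML string, one child element per column.

    Hypothetical helper for illustration; assumes every column name
    is already a valid XML element name.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Each CSV column becomes a child element of the record.
            ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample data, not taken from any real system.
sample = "name,state\nAlice,KY\nBob,OH\n"
xml_text = csv_to_xml(sample)
print(xml_text)
```

Even in a toy like this, the choices (element names, what to do with odd column headers) are conventions someone has to know, which is the point about unfamiliar conventions made above.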

In the last few years, MarkLogic has had a number of senior management changes. I track the company via my Overflight system and have noted that the firm has gone from a company that does a good job of publicizing itself to an outfit that has trimmed back on its public presence. You can check out the MarkLogic Overflight on the ArnoldIT.com Web site. The minimal news flow, the absence of tweets, and the termination of public blog content can be verified by visiting the page every few days.

One interesting aspect of MarkLogic is that the company has positioned itself as a publishing platform. Once content is in the repository, it is possible to slice and dice information and data. Publishers can use this feature to whip out books with little or no involvement of human editors. But the company has, like Verity, grafted on other features and services. These range from enterprise search to text mining to electronic mail management.

I heard that the company was to have been a $200 or $300 million a year operation a few years ago. The firm may be the best kept secret in terms of its revenues and profits. If so, kudos. But if the company has not been able to demonstrate strong growth and healthy net profits, the firm may need to ramp up its publicity and marketing activities.

The New York Times’s comment may be hogwash. Even if a stretch, getting a paragraph that strikes me as less than favorable raises some questions; for example:

  1. Are proprietary extensions a good idea for an XML system that must be used by folks who are not into XML?
  2. Will the transformations between and among content from disparate systems remain bottlenecks during periods of high content flow and usage?
  3. Will Oracle seize on the MarkLogic system and revive its flow of information about the weaknesses of XML as compared with content stored in an Oracle data management system?

MarkLogic has rolled through three or four presidents in the last few years. Dave Kellogg departed, and I mostly lost track of who followed him. At the time of his departure MarkLogic had estimated revenues in the $60 million range. Will the management turmoil kick in again? Will the company continue to expand its features and functions as Verity did prior to its initial public offering? Are there parallels between the trajectories of Convera, Delphes, Entopia, and Verity and MarkLogic? For some case analyses, check out www.xenky.com/vendor-profiles.

Stephen E Arnold, November 23, 2013

Xenky Vendor Profile: Dieselpoint

November 6, 2013

If you need a search system and love Java, you will want to read the most recent Xenky Vendor Profile. Dieselpoint is based in Chicago, Illinois. Compared to some search vendors, Dieselpoint keeps a low profile. The profile is available without charge at Xenky’s Vendor Profile page. Be sure to read the caveats for these free profiles. If you want to make a comment or explain a point I missed by a mile, use the comments section of Beyond Search. The profiles are drafts and will not be updated.

Stephen E Arnold, November 6, 2013

RaptorXML

July 31, 2013

For all you XML lovers out there, particularly those with dual-core machines, RaptorXML is here. Market Wired hosts, “Altova Announces General Availability of RaptorXML.” The product is part of Altova’s suite of server products. The press release informs us:

“Altova RaptorXML is a high-performance XML and XBRL server optimized for today’s multi-CPU, multi-core computers and servers. Developers creating solutions using Altova MissionKit XML development and XBRL development tools will be able to power server applications with RaptorXML for hyper-performance, increased throughput, and efficient memory utilization to validate and process large amounts of XML or XBRL data cost-effectively. . . .

“RaptorXML conforms to the latest versions of all relevant XML and XBRL standards and has been submitted to rigorous regression and conformance testing. The server is available in three versions.”

These versions include RaptorXML Server, RaptorXML+XBRL Server, and RaptorXML Development Edition. The last of these facilitates applications testing by developers working in Altova’s XMLSpy, MapForce, and StyleVision. The products are available for use on Windows, 32-bit or 64-bit, and for the 64-bit MacOS. Pricing is on an annual licensing basis, determined by the number of CPU cores in a prospective customer’s server. A few features include a low memory footprint, cross-platform capabilities, and beefed-up error reporting. See the article above (and/or this one) for more details.

The developer-centered Altova focuses on data management, software development, and data integration. The company boasts that 91% of Fortune 500 companies use their products, but emphasizes that small and medium businesses are also valuable clients. Altova splits its headquarters between Beverly, Massachusetts and Vienna, Austria.

Cynthia Murrell, July 31, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Temis and MarkLogic: Timid? Not on the Semantic Highway

April 12, 2013

My in box overfloweth. Temis has rolled out a number of announcements in the last 10 days. The company is one of the many firms offering “semantic” technology. Due to the vagaries of language, Temis is in the “content enrichment” business. The idea is that technology indexes key words and concepts even though a concept may not be expressed in a text document. I call this indexing, but “enrichment” is certainly okay.

The first announcement which caught my attention was a news release I saw on the Marketwatch for fee distribution service. The title of the article was “TEMIS Completes Successful Wide Scale Semantic Content Enrichment Test in Windows Azure.” A news release about a test struck me as unusual. The key point for me was that Temis is positioning itself to go after the SharePoint add in market.

The second announcement was a news story distributed by Eureka Alert called “Wiley Selects Temis for Semantic Big Data Initiative.” The key point is that a traditional publishing company has licensed software to do what humans used to do in a venerable publishing company which, until recently, was sticking with traditional methods and products. Will Temis propel John Wiley to the top of the leader board of professional publishers? Hopefully some information will become available quickly.

The third announcement which I noted was “Temis and MarkLogic Strengthen Strategic Alliance.” The write up hits the concepts of semantics and big data. Here’s the passage which intrigued me:

MarkLogic® Server is the only enterprise NoSQL database designed for building reliable, scalable and secure search, analytics and information applications quickly and easily. The platform includes tools for fast application development, powerful analytics and visualization widgets for greater insight, and the ability to create user-defined functions for fast and flexible analysis of huge volumes of data.

I am uncomfortable with the notion of “only”. MarkLogic is an XML centric data management system. Software wrappers can use the XML back end for a range of applications. These range from something as exotic as a Web site for the US Army to more sophisticated applications for publishing technical documents for an aircraft manufacturing firm. However, there are a number of ways to accomplish these tasks and some of the options make use of somewhat similar technology; for example, eXist-db. While not perfect, the fact that an alternative exists only increases my discomfort with an “only”.

So what’s up? My hunch is that both MarkLogic and Temis are in flat out marketing mode. Clusters of announcements are, in my experience, an indication that the pipeline needs to be filled. Equally surprising is that MarkLogic is morphing into a big data player and an enterprise search system, not a publishing system. Most vendors are morphing. The tie up with Temis suggests that Temis’ back end needs some beefing up. The MarkLogic positioning is that it is now a player in semantics and big data. I think that partnering is a quick way to fill gaps.

Will MarkLogic blast through the $100 million in revenue ceiling? Will Temis emerge as a giant slayer in semantic big data? MarkLogic recently raised $25 million to become a player in big data. (See “Big Data Boon: MarkLogic Pulls In $25 Million In VC Funding”.) Converting $25 million into high margin revenue could tax the likes of Jack Welch in his prime.

My hunch is that both firms’ management teams have this as a 2013 goal. With the patience of investors wearing thin for many search and content processing vendors, closed deals are a must. The economy may be improving for analysts on CNBC, but for search vendors, making Autonomy-scale or Endeca-scale revenues may be difficult, if not impossible.

In my opinion, the labels “big data” and semantics do not by themselves deliver revenue the way Google delivers Adwords. As more search firms chase additional funding, has the world of search switched from finding information for customers to getting money to stay in business?

No timidity visible as these two firms race down the semantic interstate.

Stephen E Arnold, April 12, 2013

Understanding JSON

April 8, 2013

The Altova Blog piece “Editing, Converting and Generating JSON” provides a helpful guide to using JSON. The use of JSON as a data transport protocol has been on the rise and so has the debate about the advantages of JSON vs. XML. The debate has been raging on, but the author actually sums it up fairly well.

“But when you boil it down, there are simply some cases for which JSON is the best choice, and others where XML makes more sense. While you might need to choose between JSON and XML depending on the development task at hand, you don’t have to choose between code editors – XMLSpy supports both technologies and will even convert between the two.”

Altova has extended its intelligent XML editing features to its JSON editor in order to make JSON editing as simple as possible. Users who begin editing JSON in text view will get lots of help along the way from XMLSpy in the form of syntax coloring, bracket matching, source folding, entry helper windows, menus and other helpful tools. A one click option on the XMLSpy convert menu makes converting XML to or from JSON quick and easy. The ability to edit but also convert items directly within the XML editor program is extremely useful. JSON lovers will definitely have something to look forward to.
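For readers curious what a JSON to XML conversion involves, here is a rough sketch in Python. It is not how XMLSpy does it; the mapping rules (object keys become element names, list entries become "item" elements) and the function names are assumptions chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET


def _build(parent, value):
    """Recursively attach a decoded JSON value under an XML element."""
    if isinstance(value, dict):
        for key, child in value.items():
            # Assumes each key is a valid XML element name.
            _build(ET.SubElement(parent, key), child)
    elif isinstance(value, list):
        for entry in value:
            # Hypothetical convention: each list entry becomes an <item>.
            _build(ET.SubElement(parent, "item"), entry)
    else:
        # Scalars become text; type information (number vs. string) is lost.
        parent.text = str(value)


def json_to_xml(json_text, root_tag="root"):
    """Convert a JSON document to an XML string (illustrative mapping only)."""
    root = ET.Element(root_tag)
    _build(root, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")


converted = json_to_xml('{"product": "XMLSpy", "formats": ["XML", "JSON"]}')
print(converted)
```

The lost type information is one reason the two formats are not simply interchangeable, and one reason the JSON vs. XML debate keeps going.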

April Holmes, April 08, 2013

MarkLogic Takes Olympic Coverage From Probable Nightmare to Practical Success

February 26, 2013

Most people never really think about how news organizations transmit data across continents when there is a big event. For the Summer Olympics in 2012 The Press Association relied on MarkLogic’s XML repository’s ability to store and query hundreds of thousands of pieces of metadata per second.

In “How PA Cleared The Big Data Hurdle At The London Olympics” the Press Association’s director of technical architecture, John O’Donovan, gives consumers an in depth look at how the office was able to cope with more than 50,000 requests per second.

“The problem with that is having to sit down and design a relational database model that can represent everything that’s in the XML. That takes quite a lot of time, you have to build all of your input/output extenders and map XML objects into relational stores.”

At first look it seems like an impossible task, organizing all of the photos, biographical information, statistics, and competition results for thousands of athletes and beaming it to televisions, phones and computers everywhere, but, by removing the relational database the PA made it possible. The data went straight into an XML store instead of being stored in the relational database and then transferred back to XML.
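The idea of querying the XML as it arrives, with no relational model in between, can be sketched in a few lines of Python. This is a toy using the standard library, not MarkLogic or the PA’s actual system; the feed layout, element names, and athlete data are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical results feed; structure and values are made up.
FEED = """
<results>
  <event sport="rowing" name="single sculls">
    <athlete country="AAA" rank="1">Athlete One</athlete>
    <athlete country="BBB" rank="2">Athlete Two</athlete>
    <athlete country="CCC" rank="4">Athlete Four</athlete>
  </event>
  <event sport="archery" name="individual">
    <athlete country="DDD" rank="1">Athlete Five</athlete>
  </event>
</results>
""".strip()


def top_three(feed_xml, sport):
    """Return (country, name) pairs ranked 1 to 3 for one sport.

    The query runs directly against the XML tree; no tables or
    object-to-relational mapping are defined first.
    """
    root = ET.fromstring(feed_xml)
    athletes = root.findall(f"./event[@sport='{sport}']/athlete")
    return [(a.get("country"), a.text) for a in athletes if int(a.get("rank")) <= 3]


rowing = top_three(FEED, "rowing")
print(rowing)
```

A real XML repository adds indexing and scale on top of this, but the design choice is the same: skip the step of designing a relational model that mirrors the XML.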

The approach cut the time needed to get the delivery system off the ground from 100 to 34 man hour days and was so successful that The Press Association will be utilizing the new system for all of its wire and output communications.

Big thumbs up to MarkLogic’s ability to handle the process and to the PA for finding a new way to utilize an already reliable resource.

Leslie Radcliff, February 26, 2013

Altova Releases New Version of MissionKit

November 30, 2012

Altova, a data management solutions provider and creator of XMLSpy, recently published the news release, “Altova Announces the Release of Version 2013 of MissionKit” on its website.

According to the article, Altova has released an integrated suite of XML, SQL, and UML tools. It offers automatic error correction and support for SQL stored procedures in data mapping projects. Prices start at $59 per product, and the tools are available for purchase in the Altova online shop.

The release states:

“Among the many updates and new features we incorporated into the Version 2013 release, one of the most significant is Smart Fix. Smart Fix is unique to XMLSpy 2013 and is a huge leap forward in intelligent XML editing. It provides options for fixing validation errors that developers can apply automatically, with a single click. It’s true XML alchemy,” said Alexander Falk, President and CEO for Altova. “With increased demands on developers today we are always looking for ways to incorporate efficiencies into our products. You simply won’t find this functionality in other tools.”

Altova’s MissionKit is certainly affordable and the suite offers great tools. However, it only saves you money if you plan on using equal numbers of XMLSpy and MapForce.

Jasmine Ashton, November 30, 2012

Easy XML Converter for Sale

September 23, 2012

Perhaps this is useful: Sofotex offers through its site an Easy XML Converter. The downloadable software runs $119, but there is a twenty day trial period. The product description reads:

“Easy XML Converter helps to convert XML files into a variety of formats. Easy XML Converter also has a help screen that tells you which tables (elements) that are related to each other. What you want to convert, choose from a tree view, select the desired columns that you want, making it very easy to set up. The converter also supports batch job. Paths and all conversion functions are set and stored in a schema, which you activate when you are in need of conversion of the XML file. Supported formats: Excel 2003 and 2007, Text, Access (.mdb), HTML and XML”

The page goes on to list these functions: the software can convert several XML files, then merge them into one output file; users can filter converted data; a detail view of the file allows the software to double as a handy XML viewer; and backup folders are available.
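For a sense of what “convert several XML files, then merge them into one output file” amounts to, here is a short Python sketch. It has nothing to do with how Easy XML Converter is built; the function name xml_to_csv, the element names, and the sample documents are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_docs, row_tag, columns):
    """Merge rows from several XML documents into one CSV string.

    Only the selected columns are written; a missing child element
    becomes an empty cell.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for row in root.iter(row_tag):
            writer.writerow([row.findtext(col, default="") for col in columns])
    return out.getvalue()


# Two made-up input documents standing in for separate XML files.
docs = [
    "<orders><order><id>1</id><total>9.50</total></order></orders>",
    "<orders><order><id>2</id><total>20.00</total><note>rush</note></order></orders>",
]
merged = xml_to_csv(docs, row_tag="order", columns=["id", "total"])
print(merged)
```

Choosing the columns up front mirrors the “select the desired columns” step in the product description; here the note element is simply ignored.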

We haven’t given the converter a spin yet, but it could be useful if it works as advertised. If you think such a product could help you, try it out for about nineteen days, then decide.

Cynthia Murrell, September 23, 2012

XML Exhausting, Possibly Too Complex to Last

August 19, 2012

A post on DevXtra Editors’ Blog, “Is XML Too Big? Does Anyone Care?,” poses an interesting sentiment on the size and possibilities of XML.

XML, or the Extensible Markup Language, is too big and can be quite complex depending on the size and purpose of the documents. Syntactic analysis of XML documents is time consuming and difficult, not only for the people completing the task but also for the CPU. The World Wide Web Consortium says that XML “is a simple, very flexible text format.”

The blog post disagrees, stating:

“[…]it’s actually more difficult to parse a large document than to create one. If an XML document is damaged or malformed, software can become very confused, and often, even trivial errors or corruption in the XML document can stop processing. Working with schema extensions can be difficult, and older documents written using DTDs (Document Type Definitions) and Document Object Models (DOMs) can be incomprehensible.”
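The claim that a trivial error can stop processing is easy to demonstrate. Below is a minimal Python sketch using the standard library parser; the helper name try_parse and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return 'ok' if the text is well-formed XML, else a failure message."""
    try:
        ET.fromstring(xml_text)
        return "ok"
    except ET.ParseError as err:
        # A conforming parser rejects the whole document, not just the bad part.
        return f"failed: {err}"


well_formed = "<doc><title>Report</title><body>Text</body></doc>"
# Same document with one closing tag missing.
damaged = "<doc><title>Report<body>Text</body></doc>"

good_result = try_parse(well_formed)
bad_result = try_parse(damaged)
print(good_result)
print(bad_result)
```

One missing closing tag is enough: the parser raises an error and returns nothing, even though the rest of the document is intact.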

We think the better question is: “Will people care about XML in two years?” Currently, XML is crucial to exchange data and documents, but will the complexity of the system make it an unsustainable solution? It is hard to justify using such extensive resources. A simplified system is surely, hopefully, on the way.

Andrea Hayden, August 19, 2012
