Betting $10 Million That Content Processing Can Be Fixed

February 13, 2020

The Semantic Web, data lakes, data ponds, dark data, federated information, natural language processing — you have heard the buzzwords for years. The solution? MarkLogic, IBM (Data Fountain, OmniFind, Vivisimo, or Watson), social graph outfits like CluedIn, and Google’s Ramanathan Guha inventions. What about Kapow? And there are others, hundreds maybe.

Nevertheless, making sense of oceans of digital information is a bit of a task. What MBA-inspired manager asks about document exception folders? Ah, what’s that mean? Just delete them because no one wants to explain. It is Foosball time.

“AI Document Engineering Startup Docugami Raises $10M Seed Round in Unusually Large Early Stage Deal” reports some interesting information; for example:

Some former Microsofties did not gain traction at the Amazon-chasing Redmond firm

Funding sources include an assortment of investment firms, among them SignalFire and NextWorld Capital. There are some people with links to the Google.

What does Docugami seek to do? The article states:

The startup’s technology uses artificial intelligence to help users create documents such as contracts and reports that can then be analyzed in the aggregate as if the contents were stored in a structured database.

Okay, smart software, machine learning, computer vision, and “unique XML approaches.”

The millions of money indicate that the company founder Jean Paoli (who had his fingers on the keyboard cranking out the XML standard) can tell a heck of a story. The official word for this craft is “creating a narrative.”

The most interesting factoid in the write up is the multiple references to InfoPath. As you may know, InfoPath appeared in Office 2003 and disappeared in 2014. Like many Microsoft ideas, filling in the blanks, like filling out a form to get work at Wendy’s, is a logical way to get users to generate structured data. Yeah, well. InfoPath is still around, and there are some rah rah users, but support officially ends in 2026. (Some of those users like forms and spend lots of money for SharePoint and other Microsoft works in progress.)

What happened to InfoPath other than not becoming the next Azure super service? XML and structured data for information in email, note apps, Excel files used to allow analysts to write their reports in a spreadsheet, and other Microsoft products was not a home run. That’s one problem, and the idea is to let smart software apply structure, assign index terms, extract named entities, and perform “knowledge extraction.” Sounds easy. Yeah, well.

But the federation issue has some other facets, and it is not clear if the Docugami approach will solve these; for example:

  • Does a company want software to have access to content which may be confidential, incriminating, or restricted by law or common sense (that new drug in trial seems to be killing people so let’s not index that)?
  • How does a content and indexing system deal with the wild and crazy information on the Internet? Some of that information may be important in litigation, competitive intelligence, and personal idiosyncrasies like comments added to certain interesting social media content.
  • What happens when copyrighted material is sucked into the Docugami digital weather system? What happens when pornographic, drug related, and other information of a possible criminal nature is indexed along with those human resource salary data and the actual earnings data on the CFO’s computing device?
  • Where will the content reside? What’s the cost for storage, transmission, updating, and flagging “incorrect” data?

For quite specific types of content, InfoPath and probably Docugami make sense.

But the narrative may be more important than the word painting to describe a world in which information is at one’s fingertips.

Is DarkCyber skeptical? Not at all. There is insufficient information at this time to determine if those millions are bet on a potential Kentucky Derby winner or a creature who will spend its life carrying kids around a dude ranch’s pony ride.

Stephen E Arnold, February 13, 2020

Et Tu, Brutus? Oracle Database on the Way Out

January 10, 2017

I read “NoSQL to Undo Oracle’s Database Reign.” The author is a person who once worked at Oracle. Like Brutus, the author knows Julius Caesar. Sorry, I meant the jet loving, top dog at Oracle.

The tussle between Oracle and MarkLogic seems likely to continue in 2017. The write up explains that Oracle has become a lot like IBM. I learned:

Like IBM did in the past, Oracle and the other incumbents are adding features to old technologies in an attempt to meet today’s challenges — features such as in-memory, graph, JSON and XML support. None of them have changed their underlying architectures so their efforts will fall short, just as IBM’s did in the last generational shift of the database industry 35 years ago. What’s more, their widely publicized moves of shifting old technology to the cloud changes the deployment model but doesn’t help solve the modern data challenges their customers are facing. An outdated database technology on the cloud is still an outdated database.

The new champion of the data management world is MarkLogic, the outfit where Gary Bloom labors. MarkLogic, I concluded, is one of the “emergent winners.”

That’s good.

MarkLogic is an XML centric data management system. XML is ideal for slicing and dicing once the data have been converted to validated XML. For some folks, changing a legacy AS/400 Ironside output into XML might be interesting. But, it seems, that MarkLogic has cracked the data conversion, transformation, extraction, and loading processes. Anyone can do it. Perhaps not everyone because there are some proprietary tweaks to the open source methods required by the MarkLogic system. No problem, but volume, time, and cost constraints might be an issue for some use cases.

I noted this passage in the undated write up:

There is definitely shake out of the NoSQL vendors and MarkLogic is one of the emergent victors. As an enterprise-ready NoSQL database that handles multiple models natively and doesn’t care if you have two or hundreds of data silos, MarkLogic is becoming the database platform for those with complex data integration problems. In fact, some companies are skipping the relational generation altogether and going straight from the mainframe to NoSQL. Virginia’s Fairfax County recently migrated years of historical data from its 30-year-old mainframe system to MarkLogic’s NoSQL. Residents and employees can now more easily and quickly search all the data—including property records going back to the 1950s and both old and new data coming from multiple data silos.

MarkLogic, however, is no spring chicken. The company was founded in 2001, which works out to 16 years old. Oh, you might recall that the total equity funding is $173.23 million with the most recent round contributing $102 million in May 2015 if the Crunchbase data are on the money. Some of that $102 million came from Gary Bloom, the author of the write up. (No wonder he is optimistic about MarkLogic. Hope is better than fear that one might have to go look for another job.)

My view is that MarkLogic wants a big fight with Oracle. That adds some zip to what is one of the less magnetic types of software in a business world excited by Amazon, Google, Facebook, Tesla, and Uber. Personally I find data management exciting, but I gravitate to the systems and methods articulated by Googler Ramanathan Guha. Your mileage may vary.

The challenge for MarkLogic is to generate sufficient sustainable revenue to achieve one of these outcomes:

  1. A sale of the company to a firm which believes in the XML tinted world of the XML rock stars. (Yes, there is an XML rock star video at this link.) Obviously the folks watching their $173 million grow into a huge payday would find a lucrative sale an exit worthy of a happy face emoji.
  2. A surge in the number of companies that are convinced MarkLogic, and not an open source, no license fee alternative, is the answer and that write checks for multi year licenses and six figure service deals. Rapid revenue growth and high margin services may not get the $173 million back, but life would be less stressful if those numbers soar.
  3. MarkLogic goes public fueled in part by a PR battle with Oracle.

Will systems like MarkLogic’s become the future of next generation operational and transaction systems? MarkLogic believes NoSQL is the future. Will Oracle wake up and buy MarkLogic? Will Google realize its error when it passed on a MarkLogic buy out? Will Amazon figure out that life will be better without the home brew approach to data management that Amazon has taken since it shifted from an Oracle type fixation? Will Facebook see MarkLogic as a solution to some of its open source data management hassles?

Here in Harrod’s Creek, we still remember the days when MarkLogic was explaining that it was an enterprise search system, an analytics system, and a content production system. A database can be many things. The one important characteristic, however, is that the data management system generate substantial revenue and juicy profits.

Stephen E Arnold, January 10, 2017

MarkLogic Tells a Good Story

May 25, 2016

I lost track of MarkLogic when the company hit about $51 million in revenue and changed CEOs in 2006. In 2012, another CEO change took place. Since Gary Bloom, a former Oracle executive, took over, the company, according to “Gary Bloom Interview: Big Data Driving Sales Boom at MarkLogic,” is now “topping” $100 million in annual revenue.

MarkLogic is one of the outfits laboring in the DCGS / DI2E vineyard. The company may be butting heads with outfits like Palantir Technologies as the US Army’s plan to federate its systems and data moves forward.

MarkLogic opened for business in 2003 and has ingested, according to Crunchbase, $175 million in venture funding. With a timeline equivalent to Palantir Technologies’, there may be some value in comparing these two “startups” and their performance. That is an exercise better left to the feisty young MBAs who have to produce a return for the Sequoia and Wellington experts.

The interview contained two interesting statements which I found surprising:

The driver is Big Data: large corporations are convinced there is an El Dorado of untapped commercial opportunities — if only they can run their reports across all their data sources. But integrating all that data is too costly, and takes too long with relational databases. The future will be full of data in many forms, formats, and sources and how that data is used will be the differentiator in many competitive battles. If that data can’t be searched it can’t be used.

That is indeed the belief and the challenge. Based on what I have learned via open sources about the DCGS project, the reality is different from the “all” notions which fill the heads of some of the vendors delivering a comprehensive intelligence system to US government clients. In fact, the reality today seems to me to be similar to the hope for the Convera system when it was doing the “all” approach to some US government information. That, as you may recall, did not work out as some had hoped.

The second statement I highlighted is:

Although MarkLogic is tiny compared to Oracle there are some interesting parallels. “MarkLogic is at about the same size as Oracle was when I began working there. It took a long time for Oracle to get security and other enterprise features right, but when it did, that was when company really took off.”

The stakeholders hope that MarkLogic does “take off.” With more than 12 years of performance history under its belt, MarkLogic could be the next big thing. The only hitch in the git along is that normalization of information and data has to take place. Then there is the challenge of the query language. One cannot overlook the competitors which continue to bedevil those in the data management game.

With Oracle also involved in some US government work, there might be a bit of push back as the future of MarkLogic rolls forward. What happens if IBM’s data management systems group decides to acquire MarkLogic? Excitement? Perhaps.

Stephen E Arnold, May 25, 2016

The Semantic Web and JSON LD: Some Irritation Perhaps?

July 30, 2015

I read the Wikipedia article about JSON LD or JavaScript Object Notation for Linked Data when I was pondering the fate of the XML centric start ups like MarkLogic. I highlighted one sentence in the Wikipedia write up which is subject to the usual caveats about bias, incorrect information, etc. And that sentence was:

JSON-LD is designed around the concept of a “context” to provide additional mappings from JSON to an RDF model.

Yes, the much loved RDF model.
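For readers who have not bumped into a “context,” here is a minimal sketch of the idea in Python. The document, the example.org addresses, and the helper name to_triples are my own illustrations, not anything from the Wikipedia article, and the code handles only a flat context with string values. The real JSON-LD processing rules cover far more than this.

```python
# A small JSON-LD style document. The "@context" maps short JSON keys to IRIs.
# The person and the example.org URLs are illustrative assumptions.
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": "http://schema.org/url",
    },
    "@id": "http://example.org/people/alice",
    "name": "Alice",
    "homepage": "http://example.org/alice",
}


def to_triples(document):
    """Turn a flat JSON-LD style dict into (subject, predicate, object) triples.

    Simplified on purpose: only a flat context and plain string values
    are handled, and the rest of the JSON-LD rules are ignored.
    """
    context = document.get("@context", {})
    subject = document["@id"]
    triples = []
    for key, value in document.items():
        if key.startswith("@"):
            continue  # keywords such as @context and @id are not properties
        predicate = context.get(key)
        if predicate is None:
            continue  # keys with no mapping in the context are skipped
        triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    for triple in to_triples(doc):
        print(triple)
```

The point of the sketch is that the JSON itself stays plain. The context carries the mapping that lets the same keys be read as RDF style statements.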

When I read “JSON-LD and Why I Hate the Semantic Web,” I noticed a bit of friskiness in the word choice; for example, misguided souls, cryptic, complicated, market share, “kick RDF in the nuts,” and similar rhetorical arabesques. I do like the active verb “kick” however.

The passage I highlighted with my bright orange marker was this one:

The problem with getting a room full of smart people together is that the group’s world view gets skewed. There are many reasons that a working group filled with experts don’t consistently produce great results. For example, many of the participants can be humble about their knowledge so they tend to think that a good chunk of the people that will be using their technology will be just as enlightened. Bad feature ideas can be argued for months and rationalized because smart people, lacking any sort of compelling real world data, are great at debating and rationalizing bad decisions.

Seems normal to me.

In my opinion, this write up explains why some XML centric, Semantic Web cheerleaders have labored to generate organic growth. Just a thought. Talking to fellow travelers is reassuring and comfortable. Those not on the cruise ship may have a different point of view.

Stephen E Arnold, July 30, 2015

Amazon Learns from XML Adventurers

October 10, 2014

I recall learning a couple of years ago that Amazon was a great place to store big files. Some of the XML data management systems embraced the low prices and pushed forward with cloud versions of their services.

When I read “Amazon’s DynamoDB Gets Hugely Expanded Free Tier And Native JSON Support,” I formed some preliminary thoughts. The trigger was this passage in the write up:

many new NoSQL and relational databases (including Microsoft’s DocumentDB service) now use JSON-style document models. DynamoDB also allowed you to store these documents, but developers couldn’t directly work with the information stored in them. That’s changing today. With this update, developers can now use the AWS SDKs for Java, .NET, Ruby and JavaScript to easily map their JSON data to DynamoDB’s own data types. That turns DynamoDB in a fully-featured document store and is going to make life easier for many developers on the platform.
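What does “map their JSON data to DynamoDB’s own data types” look like? Here is a rough sketch in Python, a language the passage does not list, so treat it as my illustration and not the AWS SDK behavior. The function name to_dynamodb and the sample document are hypothetical. The type descriptors (S, N, BOOL, NULL, L, M) are the ones DynamoDB uses for attribute values.

```python
import json
from decimal import Decimal


def to_dynamodb(value):
    """Map a parsed JSON value to a DynamoDB style attribute value.

    A sketch only: sets and binary data are left out.
    """
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # numbers are carried as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_dynamodb(item) for item in value]}
    if isinstance(value, dict):
        return {"M": {key: to_dynamodb(item) for key, item in value.items()}}
    raise TypeError(f"Unsupported type: {type(value).__name__}")


# Hypothetical sample document for illustration.
SAMPLE = '{"title": "Annual Report", "pages": 12, "tags": ["finance", "2014"], "draft": false, "reviewer": null}'

if __name__ == "__main__":
    print(json.dumps(to_dynamodb(json.loads(SAMPLE)), indent=2))
```

Nested objects become maps and arrays become lists, which is what lets a developer reach inside a stored document instead of treating it as an opaque blob.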

Is JSON better than XML? Is JSON easier to use than XML? Is JSON development faster than XML? Ask an XML rock star and the answer is probably, “You crazy.” I can hear the guitar riff from Joe Walsh now.

Ask a 20 year old in a university programming class, and the answer may be different. I asked the 20 something sitting in my office about XML and he snorted: “Old school, dude.” I hire only people with respect for their elders, of course.

Here are the thoughts that flashed through my 70 year old brain:

  1. Is Amazon getting ready to make a push for the customers of Oracle, MarkLogic, and other “real” database systems capable of handling XML?
  2. Will Amazon just slash prices, take the business, and make the 20 year old in my office a customer for life just because Amazon is “new school”?
  3. Will Amazon’s developer love provide the JSON fan with development tools, dashboards, features, and functions that push clunky methods like proprietary XQuery messages into a reliquary?

No answers… yet.

Stephen E Arnold, October 10, 2014


MarkLogic and the New York Times

December 2, 2013

On Saturday, November 30, 2013, The New York Times published “Health Care Site Rushing to Make Fixes by Sunday.” As I now know, mission accomplished. But there was no aircraft carrier, brass band, or flag. (Here’s the link to the online story, but like so many “real” journalistic efforts, the link can go dead and you will have to hunt for a November 30, 2013 Times and look on page A 1 with a jump to page A 12. Penguin, there is nothing I care to do about the link. Sorry.)

I wanted to document this passage from the Times’ story about MarkLogic. What’s interesting is that the company gets little attention from other “real” journalists. I suppose if I were curious, I would attempt to answer the question, “Why?”

I am not curious. Here’s what snagged my attention on the 30th:

Gary C. Bloom, the chief executive officer of another vendor, MarkLogic, said his firm is also moving its software to differently configured servers.

The idea is from MarkLogic’s neighbor in Silicon Valley, Oracle. A few years ago, Oracle wrote a white paper banging on MarkLogic’s technology. You can find a copy of that analysis in “Mark Logic XML Server 4.1.” I wrote about the tempest in “A Coming Dust Up between Oracle and MarkLogic?”

The Times’ story continued:

MarkLogic provided the technology for the database that serves as the system’s internal filing cabinet and index.

The story does not make clear whether MarkLogic is an XML server that acts like a junction box among the moving parts of the site, a data management system interacting with Oracle’s technology, or a search engine for the Web site. MarkLogic positions its technology as doing each of these functions plus analytics, business intelligence, customer relationship management, publishing, and probably some other functions as well.

The Times quotes Mr. Bloom as having said:

“I am picking up my house and moving it to a better foundation next door,” he [Mr. Bloom] said in an interview. He said MarkLogic is performing up to standard, but “the network and the storage systems are not properly sized and not properly run.”

It is not clear to me which vendor is providing the storage systems. Is it MarkLogic or is it another vendor such as Oracle, a company apparently unimpressed with some of MarkLogic’s technology if I understand the Oracle white paper.

The Times added:

Another critical problem involved the specifications for a major computer switch that connects the computer services through a security firewall to the Internet. Mr. Bloom said it has been upgraded from four gigabytes a second to 60 [gigabytes a second]. He said the earlier speed was the equivalent of employing four security staffers to screen Heathrow Airport’s passengers. “The line to get through,” he said, “would go back to the city of London.”

I am not sure how these issues did not become known to the vendors pushing data through the system, but apparently, the 15X shortfall was not noticed. I wonder how many home builders move a completed house to a new foundation. Also, what if the security folks at Heathrow are more or maybe less efficient than those located where is?
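The 15X figure is simple arithmetic on the two rates in the Times passage. A quick check in Python, with a function name of my own choosing and the units taken as quoted:

```python
def upgrade_factor(old_rate, new_rate):
    """Return how many times larger the new rate is than the old one."""
    return new_rate / old_rate


# Rates as quoted in the Times passage, in gigabytes a second.
OLD_RATE = 4
NEW_RATE = 60

if __name__ == "__main__":
    factor = upgrade_factor(OLD_RATE, NEW_RATE)
    print(f"The upgraded switch is rated at {factor:g} times the original.")
    # By the Heathrow analogy, four screeners become this many:
    print(f"Equivalent screeners: {4 * factor:g}")
```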

I will keep my eye on this issue because MarkLogic has been emphasizing that it offers a search system. Where there is a search vendor, there seems to be some activity of interest. And where there are MarkLogic and Oracle, there may be some interesting discussion between the parties.

Stephen E Arnold, December 2, 2013

MarkLogic: Data Management and

November 23, 2013

I read “Tension and Flaws Before Health Website Crash.” The good news is that the story focuses on what is now old news: Management challenges at the agency responsible for the health care Web site. The bad news, at least for champions of XML repositories, XML normalization, and XML as the “answer” to a wide range of information management woes, is that XML (extensible markup language) is not the slam dunk, whiz bang solution some true believers hope.

Here’s the passage that caught my attention:

Another sore point was the Medicare agency’s decision to use database software, from a company called MarkLogic, that managed the data differently from systems by companies like IBM, Microsoft and Oracle. CGI officials argued that it would slow work because it was too unfamiliar. Government officials disagreed, and its configuration remains a serious problem.

MarkLogic has not been identified as a vendor creating some headaches until now. MarkLogic has a system that can store information and data in an XML data management system. The trick is that content not in XML must be normalized; that is, converted to XML. MarkLogic has developed some proprietary methods to perform its data management operations. A person familiar with XML may not be conversant with the MarkLogic conventions. The upside of this approach is that MarkLogic has experts who are able to address most customer requests. The downside is that a person familiar with XML but not MarkLogic can introduce some problems into an otherwise spiffy system.
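To make “normalized; that is, converted to XML” concrete, here is a minimal sketch using Python’s standard library. It is not MarkLogic’s method, which relies on the company’s own conventions. The record, the field names, and the function record_to_xml are hypothetical, and the sketch assumes every field name is already a valid XML element name.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, root_tag="record"):
    """Convert a flat dict into a small XML document, one element per field.

    Assumes each key is a valid XML element name. Real conversion jobs
    also need schemas, validation, and handling for nested or messy data.
    """
    root = ET.Element(root_tag)
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = "" if value is None else str(value)
    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(root, encoding="unicode")


# Hypothetical record standing in for non-XML source content.
SAMPLE_RECORD = {"parcel_id": "0123", "owner": "Smith & Sons", "year": 1957}

if __name__ == "__main__":
    print(record_to_xml(SAMPLE_RECORD))
```

Even this toy version shows where the work hides: special characters must be escaped, every value becomes text, and someone has to decide what the element names are. Multiply that by every source system and the bottleneck questions below make sense.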

In the last few years, MarkLogic has had a number of senior management changes. I track the company via my Overflight system and have noted that the firm has gone from a company that did a good job of publicizing itself to an outfit that has trimmed back on its public presence. You can check out the MarkLogic Overflight on the Web site. The minimal news flow, the absence of tweets, and the termination of public blog content can be verified by visiting the page every few days.

One interesting aspect of MarkLogic is that the company has positioned itself as a publishing platform. Once content is in the repository, it is possible to slice and dice information and data. Publishers can use this feature to whip out books with little or no involvement of human editors. But the company has, like Verity, grafted on other features and services. These range from enterprise search to text mining to electronic mail management.

I heard that the company was to have been a $200 or $300 million a year operation a few years ago. The firm may be the best kept secret in terms of its revenues and profits. If so, kudos. But if the company has not been able to demonstrate strong growth and healthy net profits, the firm may need to ramp up its publicity and marketing activities.

The New York Times’s comment may be hogwash. Even if a stretch, getting a paragraph that strikes me as less than favorable raises some questions; for example:

  1. Are proprietary extensions a good idea for an XML system that must be used by folks who are not into XML?
  2. Will the transformations between and among content from disparate systems remain bottlenecks during periods of high content flow and usage?
  3. Will Oracle seize on the MarkLogic system and revive its flow of information about the weaknesses of XML as compared with content stored in an Oracle data management system?

MarkLogic has rolled through three or four presidents in the last few years. Dave Kellogg departed, and I mostly lost track of who followed him. At the time of his departure MarkLogic was in the $60 million range in estimated revenues. Will the management turmoil kick in again? Will the company continue to expand its features and functions as Verity did prior to its initial public offering? Are there parallels between the trajectories of Convera, Delphes, Entopia, and Verity and MarkLogic? For some case analyses, check out

Stephen E Arnold, November 23, 2013

Xenky Vendor Profile: Dieselpoint

November 6, 2013

If you need a search system and love Java, you will want to read the most recent Xenky Vendor Profile. Dieselpoint is based in Chicago, Illinois. Compared to some search vendors, Dieselpoint keeps a low profile. The profile is available without charge at Xenky’s Vendor Profile page. Be sure to read the caveats for these free profiles. If you want to make a comment or explain a point I missed by a mile, use the comments section of Beyond Search. The profiles are drafts and will not be updated.

Stephen E Arnold, November 6, 2013


July 31, 2013

For all you XML lovers out there, particularly those with dual-core machines, RaptorXML is here. Market Wired hosts, “Altova Announces General Availability of RaptorXML.” The product is part of Altova’s suite of server products. The press release informs us:

“Altova RaptorXML is a high-performance XML and XBRL server optimized for today’s multi-CPU, multi-core computers and servers. Developers creating solutions using Altova MissionKit XML development and XBRL development tools will be able to power server applications with RaptorXML for hyper-performance, increased throughput, and efficient memory utilization to validate and process large amounts of XML or XBRL data cost-effectively. . . .

“RaptorXML conforms to the latest versions of all relevant XML and XBRL standards and has been submitted to rigorous regression and conformance testing. The server is available in three versions.”

These versions include RaptorXML Server, RaptorXML+XBRL Server, and RaptorXML Development Edition. The last of these facilitates applications testing by developers working in Altova’s XMLSpy, MapForce, and StyleVision. The products are available for use on Windows, 32-bit or 64-bit, and for the 64-bit MacOS. Pricing is on an annual licensing basis, determined by the number of CPU cores in a prospective customer’s server. A few features include a low memory footprint, cross-platform capabilities, and beefed-up error reporting. See the article above (and/or this one) for more details.

The developer-centered Altova focuses on data management, software development, and data integration. The company boasts that 91% of Fortune 500 companies use their products, but emphasizes that small and medium businesses are also valuable clients. Altova splits its headquarters between Beverly, Massachusetts and Vienna, Austria.

Cynthia Murrell, July 31, 2013

Sponsored by, developer of Augmentext

Temis and MarkLogic: Timid? Not on the Semantic Highway

April 12, 2013

My in box overfloweth. Temis has rolled out a number of announcements in the last 10 days. The company is one of the many firms offering “semantic” technology. Due to the vagaries of language, Temis is in the “content enrichment” business. The idea is that technology indexes key words and concepts even though a concept may not be expressed in a text document. I call this indexing, but “enrichment” is certainly okay.

The first announcement which caught my attention was a news release I saw on the Marketwatch for fee distribution service. The title of the article was “TEMIS Completes Successful Wide Scale Semantic Content Enrichment Test in Windows Azure.” A news release about a test struck me as unusual. The key point for me was that Temis is positioning itself to go after the SharePoint add in market.

The second announcement was a news story distributed by Eureka Alert called “Wiley Selects Temis for Semantic Big Data Initiative.” The key point is that a traditional publishing company has licensed software to do what humans used to do in a venerable publishing company which, until recently, was sticking with traditional methods and products. Will Temis propel John Wiley to the top of the leader board of professional publishers? Hopefully some information will become available quickly.

The third announcement which I noted was “Temis and MarkLogic Strengthen Strategic Alliance.” The write up hits the concepts of semantics and big data. Here’s the passage which intrigued me:

MarkLogic® Server is the only enterprise NoSQL database designed for building reliable, scalable and secure search, analytics and information applications quickly and easily. The platform includes tools for fast application development, powerful analytics and visualization widgets for greater insight, and the ability to create user-defined functions for fast and flexible analysis of huge volumes of data.

I am uncomfortable with the notion of “only”. MarkLogic is an XML centric data management system. Software wrappers can use the XML back end for a range of applications. These include something as exotic as a Web site for the US Army to more sophisticated applications for publishing technical documents for an aircraft manufacturing firm. However, there are a number of ways to accomplish these tasks and some of the options make use of somewhat similar technology; for example, eXist-db. While not perfect, the fact that an alternative exists only increases my discomfort with an “only”.

So what’s up? My hunch is that both MarkLogic and Temis are in flat out marketing mode. Clusters of announcements are, in my experience, an indication that the pipeline needs to be filled. Equally surprising is that MarkLogic is morphing into a big data player and an enterprise search system, not a publishing system. Most vendors are morphing. The tie up with Temis suggests that Temis’ back end needs some beefing up. The MarkLogic positioning is that it is now a player in semantics and big data. I think that partnering is a quick way to fill gaps.

Will MarkLogic blast through the $100 million in revenue ceiling? Will Temis emerge as a giant slayer in semantic big data? MarkLogic recently raised $25 million to become a player in big data. (See “Big Data Boon: MarkLogic Pulls In $25 Million In VC Funding”.) Converting $25 million into high margin revenue could tax the likes of Jack Welch in his prime.

My hunch is that both firms’ management teams have this as a 2013 goal. With the patience of investors wearing thin for many search and content processing vendors, closed deals are a must. The economy may be improving for analysts on CNBC, but for search vendors, making Autonomy-scale or Endeca-scale revenues may be difficult, if not impossible.

In my opinion, the labels “big data” and semantics do not by themselves deliver revenue the way Google delivers Adwords. As more search firms chase additional funding, has the world of search switched from finding information for customers to getting money to stay in business?

No timidity visible as these two firms race down the semantic interstate.

Stephen E Arnold, April 12, 2013
