Betting $11 Million That Content Processing Can Be Fixed

February 13, 2020

The Semantic Web, data lakes, data ponds, dark data, federated information, natural language processing — you have heard the buzzwords for years. The solution? MarkLogic, IBM (Data Fountain, OmniFind, Vivisimo, or Watson), social graph outfits like CluedIn, and Google’s Ramanathan Guha inventions. What about Kapow? And there are others, hundreds maybe.

Nevertheless, making sense of oceans of digital information is a bit of a task. What MBA-inspired manager asks about document exception folders? Ah, what does that mean? Just delete them because no one wants to explain. It is Foosball time.

“AI Document Engineering Startup Docugami Raises $10M Seed Round in Unusually Large Early Stage Deal” reports some interesting information; for example:

Some former Microsofties did not gain traction at the Amazon-chasing Redmond firm

Funding sources include an assortment of investment firms such as SignalFire and NextWorld Capital. There are some people with links to the Google, too.

What does Docugami seek to do? The article states:

The startup’s technology uses artificial intelligence to help users create documents such as contracts and reports that can then be analyzed in the aggregate as if the contents were stored in a structured database.

Okay, smart software, machine learning, computer vision, and “unique XML approaches.”

The millions indicate that company founder Jean Paoli (who had his fingers on the keyboard cranking out the XML standard) can tell a heck of a story. The official word for this craft is “creating a narrative.”

The most interesting factoid in the write up is that it refers to InfoPath multiple times. As you may know, InfoPath appeared in Office 2003 and was discontinued in 2014. Like many Microsoft ideas, InfoPath’s premise made sense: filling in the blanks (like filling out a form to get work at Wendy’s) is a logical way to get users to generate structured data. Yeah, well. InfoPath is still around, and there are some rah-rah users, but support officially ends in 2026. (Some of those users like forms and spend lots of money on SharePoint and other Microsoft works in progress.)

What happened to InfoPath other than not becoming the next Azure super service? Adding XML and structured data to email, note-taking apps, the Excel files analysts use to write their reports in a spreadsheet, and other Microsoft products was not a home run. That’s one problem. The idea now is to let smart software apply structure, assign index terms, extract named entities, and perform “knowledge extraction.” Sounds easy. Yeah, well.

But the federation issue has other facets, and it is not clear whether the Docugami approach will address them; for example:

Does a company want software to have access to content which may be confidential, incriminating, or restricted by law or common sense (that new drug in trial seems to be killing people so let’s not index that)?
How does a content and indexing system deal with the wild and crazy information on the Internet? Some of that information may be important in litigation, competitive intelligence, and personal idiosyncrasies like comments added to certain interesting social media content.
What happens when copyrighted material is sucked into the Docugami digital weather system? What happens when pornographic, drug related, and other information of a possible criminal nature is indexed along with those human resource salary data and the actual earnings data on the CFO’s computing device?
Where will the content reside? What’s the cost for storage, transmission, updating, and flagging “incorrect” data?

For quite specific types of content, InfoPath and probably Docugami make sense.

But the narrative may be more important than the word painting used to describe a world in which information is at one’s fingertips.

Is DarkCyber skeptical? Not at all. There is insufficient information at this time to determine if those millions are bet on a potential Kentucky Derby winner or a creature who will spend its life carrying kids around a dude ranch’s pony ride.

Stephen E Arnold, February 13, 2020

Written by Stephen E. Arnold · Filed Under Business strategy, Investment, News, XML


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.