Map Economics: Useful Content and One Major Omission

February 13, 2020

DarkCyber spotted a paper called “The Economics of Maps.” The authors have presented some extremely useful and interesting information about depicting the real world.

One of the most useful aspects of the article is the list of companies providing different types of mapping services and data. The list of firms in this business includes such providers, vendors, and technology companies as:

Airbus

Farmers Edge

Mapbox

Pitney Bowes

There are some significant omissions, notably the category of geo-analytics for law enforcement and intelligence applications; for example, the low profile Geogence and investigative tools like those available from Verint.

Worth reading and tucking into one’s intelligence folder in our opinion.

Stephen E Arnold, February 13, 2020

Will Amazon Send President Trump a Valentine This Year?

February 13, 2020

DarkCyber noted “Amazon Wants Trump to Testify on Order to Screw Amazon in Pentagon Deal.” The Australian information service states:

Amazon Web Services said on Monday it was seeking to depose President Donald Trump and Defense Secretary Mark Esper in its lawsuit over whether the president was trying “to screw Amazon” when it awarded a Pentagon contract for cloud computing to rival Microsoft Corp. The Amazon.com Inc unit alleged that Trump, who has publicly derided Amazon head Jeff Bezos and repeatedly criticized the company, exerted undue influence on the decision to deny it the US$10 billion contract.

Years ago I read the handbook of modern management, De Principatibus by Niccolo Machiavelli. One observation I sort of recall is:

If an injury has to be done to a man it should be so severe that his vengeance need not be feared.

Worth monitoring billionaires fighting.

Stephen E Arnold, February 13, 2020

The Clouds in UAE: Amazon Not Mentioned

February 13, 2020

Here’s the big reveal in the write up titled “Microsoft Sees Room for Growth Opportunities for All Cloud Providers in UAE.”

The US technology giant [Microsoft] offers three main clouds – Azure, Microsoft Office 365 and Dynamics 365.

Three clouds. Well, Google has 10 chat apps. So much for efficiency, federation, and distributed architectures.

Other factoids in the write up, in DarkCyber’s opinion, are:

  • The growth for Microsoft and “all” is PaaS or Platform as a Service.
  • Competition is good, presumably among the members of the “cloud” oligopoly
  • Alibaba has a cloud data center in the UAE
  • The intelligent edge is a reality. What’s an intelligent edge? Hey, no need to explain this bit of verbal frippery.
  • There are more than 1,000 technologies on Azure. Can anyone list these? DarkCyber cannot.

The write up does not mention the other outfit near Microsoft. But Amazon has some operations in the UAE, Bahrain, and other countries in the area as well.

The write up toots Microsoft’s tuba and Oracle’s flute.

Yep, useful marketing packaged as “real” analysis. But three “main clouds”?

Stephen E Arnold, February 13, 2020

Betting $11 Million That Content Processing Can Be Fixed

February 13, 2020

The Semantic Web, data lakes, data ponds, dark data, federated information, natural language processing — you have heard the buzzwords for years. The solution? MarkLogic, IBM (Data Fountain, OmniFind, Vivisimo, or Watson), social graph outfits like CluedIn, and Google’s Ramanathan Guha inventions. What about Kapow? And there are others, hundreds maybe.

Nevertheless, making sense of oceans of digital information is a bit of a task. What MBA-inspired manager asks about document exception folders? Ah, what’s that mean? Just delete them because no one wants to explain. It is Foosball time.

“AI Document Engineering Startup Docugami Raises $10M Seed Round in Unusually Large Early Stage Deal” reports some interesting information; for example:

Some former Microsofties did not gain traction at the Amazon-chasing Redmond firm

Funding sources include an assortment of investment firms, including SignalFire and NextWorld Capital. There are some people with links to the Google.

What does Docugami seek to do? The article states:

The startup’s technology uses artificial intelligence to help users create documents such as contracts and reports that can then be analyzed in the aggregate as if the contents were stored in a structured database.

Okay, smart software, machine learning, computer vision, and “unique XML approaches.”

The millions of money indicate that the company founder Jean Paoli (who had his fingers on the keyboard cranking out the XML standard) can tell a heck of a story. The official word for this craft is “creating a narrative.”

The most interesting factoid in the write up is the multiple references to InfoPath. As you may know, InfoPath appeared in Office 2003 and disappeared in 2014. Like many Microsoft ideas, filling in the blanks — like filling out a form to get work at Wendy’s — is a logical way to get users to generate structured data. Yeah, well. InfoPath is still around, and there are some rah rah users, but support officially ends in 2026. (Some of those users like forms and spend lots of money for SharePoint and other Microsoft works in progress.)

What happened to InfoPath other than not becoming the next Azure super service? XML and structured data for information in email, note apps, Excel files used to allow analysts to write their reports in a spreadsheet, and other Microsoft products was not a home run. That’s one problem, and the idea is to let smart software apply structure, assign index terms, extract named entities, and perform “knowledge extraction.” Sounds easy. Yeah, well.

But the federation issue has some other facets, and it is not clear if the Docugami approach will solve these; for example:

  • Does a company want software to have access to content which may be confidential, incriminating, or restricted by law or common sense (that new drug in trial seems to be killing people so let’s not index that)?
  • How does a content and indexing system deal with the wild and crazy information on the Internet? Some of that information may be important in litigation, competitive intelligence, and personal idiosyncrasies like comments added to certain interesting social media content.
  • What happens when copyrighted material is sucked into the Docugami digital weather system? What happens when pornographic, drug related, and other information of a possible criminal nature is indexed along with those human resource salary data and the actual earnings data on the CFO’s computing device?
  • Where will the content reside? What’s the cost for storage, transmission, updating, and flagging “incorrect” data?

For quite specific types of content, InfoPath and probably Docugami make sense.

But the narrative may be more important than the word painting to describe a world in which information is at one’s fingertips.

Is DarkCyber skeptical? Not at all. There is insufficient information at this time to determine if those millions are bet on a potential Kentucky Derby winner or a creature who will spend its life carrying kids around a dude ranch’s pony ride.

Stephen E Arnold, February 13, 2020

Managing a Science Club Is Hard: Human Re What? Personnel Who?

February 13, 2020

DarkCyber noted a story titled “Google HR Chief Eileen Naughton Steps Aside As Worker Activis…” No, that’s the title. There are other versions of the story, but the Gadgets 360 take captures how many Googlers respond when human resources is mentioned. “Human re what? Personnel who?”

The story is a revolving door tale. Get out while the getting is good. Recently some high profile and somewhat interesting people have left the online ad machine: A lawyer, two founders, some disgruntled employees, and now the head of human re what?

The article states:

In recent years, the Google workplace has been disrupted by employee opposition to top-level decisions ranging from forging contracts with the US military to tailoring a version of the search engine for China. Google in November fired four employees on the grounds they had violated data security policies, but the tech titan was accused of persecuting them for trying to unionize staff. The dismissals of the quartet — dubbed the “Thanksgiving Four” on social media — deepened staff-management tensions at a company once seen as a paradigm of Silicon Valley freedoms but now embroiled in numerous controversies. One of the workers fired was connected to a petition condemning Google for working with the US customs and border patrol agency, which has been involved in President Donald Trump’s crackdown on illegal immigration. Google employees have also openly opposed the company pursuing contracts to put its technology to work for the US military.

The Googlers are a frisky group of youngish wizards. Managing a science club is difficult. A high school science club? Yep, more difficult than a college science club.

Human re what? Personnel who? There is LinkedIn for those in need of a job.

Stephen E Arnold, February 13, 2020

Acquiring Data: Addressing a Bottleneck

February 12, 2020

Despite all the advances in automation and digital technology, humans are still required to manually input information into computers. While modern technology makes automation easier than ever, millions of hours are spent on data entry. Artificial intelligence and deep learning could be the key to ending data entry, says the VentureBeat article “How Rossum Is Using Deep Learning To Extract Data From Any Document.”

Rossum is an AI startup based in Prague, Czech Republic, founded by Tomas Gogar, Tomas Tunys, and Petr Baudis. Rossum was started in 2017, and its client list has grown to include top tier names: IBM, Box, Siemens, and Bloomberg. Its recent project focuses on using deep learning to end invoice data entry. Instead of relying entirely on optical character recognition (OCR), Rossum uses “cognitive data capture” that trains machines to evaluate documents like a human. Rossum’s cognitive data capture is like an OCR upgrade:

“OCR tools rely on different sets of rules and templates to cover every type of invoice they may come across. The training process can be slow and time-consuming, given that a company may need to create hundreds of new templates and rule sets. In contrast, Rossum said its cloud-based software requires minimal effort to set up, after which it can peruse a document like a human does — regardless of style or formatting — and it doesn’t rely on fully structured data to extract the content companies need. The company also claims it can extract data 6 times faster than with manual entry while saving companies up to 80% in costs.”

Rossum’s cloud-based approach to cognitive data capture differentiates it from similar platforms. Because Rossum does not need on-site installation, all of Rossum’s resources and engineering go directly to client support. The approach is similar to the software-as-a-service model Salesforce established in 1999.

The cognitive data capture tool works faster than its predecessors and in a different way:

“Rossum’s pretrained AI engine can be tried and tested within a couple of minutes of integrating its REST API. As with any self-respecting machine learning system, Rossum’s AI adapts as it learns from customers’ data. Rossum claims an average accuracy rate of around 95%, and in situations where its system can’t identify the correct data fields, it asks a human operator for feedback to improve from.”
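The fallback described in the quote, where fields the system cannot pin down go to a human operator, can be pictured with a minimal sketch. This is an illustration only and not Rossum’s API: the `ExtractedField` type, the `route_fields` function, and the 0.95 cut-off are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str          # e.g. "invoice_number" (hypothetical field name)
    value: str         # text the model extracted
    confidence: float  # model confidence between 0.0 and 1.0


def route_fields(fields, threshold=0.95):
    """Split fields into auto-accepted ones and ones needing human review.

    The threshold is a hypothetical cut-off, not a figure from Rossum.
    """
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


if __name__ == "__main__":
    sample = [
        ExtractedField("invoice_number", "INV-001", 0.99),
        ExtractedField("total_amount", "1,250.00", 0.72),
    ]
    accepted, needs_review = route_fields(sample)
    print("accepted:", [f.name for f in accepted])
    print("needs review:", [f.name for f in needs_review])
```

In a real system the operator’s corrections would presumably feed back into training; the sketch covers only the routing step.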

Rossum is not seeking to replace human labor; instead, it wants to free up human time to focus on more complex problems.

Whitney Grace, February 12, 2020

Live at Five: Cue the Avatar! Slash Costs!

February 12, 2020

Thomson Reuters has been looking for a revenue hockey stick since Michael Brown and Gene Garlan departed. The company has not been a home run in the innovation department. Palantir Technologies did not provide the zoom zoom some stakeholders wanted. The Thomson “labs”? Sorry, no TikTok from those hard working Thomson Reuters wizards.

The fix, however, may be deep fakes, automated news, and some AI sizzle. Reddit, a social information service, posted a link to “Reuters Built a Prototype for Automated News Videos Using Deepfakes Tech.” The write up explains:

Designed as a proof-of-concept, the system takes real-time scoring data from football matches and generates news reports complete with photographs and a script. Synthesia and Reuters then use a neural network similar to Deepfakes and prerecorded footage of a real news anchor to turn the script into a “live” video of the news anchor giving up-to-the-second scoring updates.

The technology comes from Synthesia, founded in 2017. (One of the company’s investors is the Shark Tank and video savant Mark Cuban.) The company describes itself as a “next gen content creation” outfit.

You can try the service by navigating to this link. I said I was Nancy. And this fake humanoid delivered a short summary to me.

The company’s Web site says:

Go beyond the regular edit suite … forge more meaningful relationships with your global audiences using Synthesia’ powerful content tools.

Is this the winner Thomson Reuters has been seeking for a decade or so? If the company applies its 10-10-20 formula, that’s possible, just unlikely. If today’s Thomson Reuters can manage some of the Lord Thomson of Fleet magic, the professional publishing and news company could disrupt how news can be generated and streamed at bargain basement rates. Hasta la vista talking heads.

An avatar with real AI will present the news: Objective, content rich, and without the hassles of humans, vacations, benefits, and dealing with wimpy humanoid issues like “my manager is not treating me fairly.”

Worth watching.

Stephen E Arnold, February 12, 2020

NoSQL DBMS: A Surprising Inclusion

February 12, 2020

“Top Databases Used in Machine Learning Project” is a listicle. The information in the write up is similar to the lists of “best” products whipped up by Silicon Valley type publications, mid tier consulting firms (a shade off the blue chip outfits like McKinsey, Booz, and BCG), and 20 somethings fresh from university.

The interesting inclusion in the list of DBMS is?

If you said Elasticsearch, you would be correct. Elasticsearch is an open source play doing business as Elastic. The open source version is at its core a search and retrieval system. (Does this mean the index is the data and the database?)

DarkCyber is not going to get into a discussion of whether an enterprise search system can be a database management system. Both sides in the battle are less interested in resolving the fuzzy language than making sales.

Maybe Elasticsearch is just doing what other enterprise search systems have done since the 1980s? Vendors describe search and retrieval as the solution to the world’s data management Wu Flu.

Net net: Without boundaries, why make distinctions? Just close the deal. Distinctions are irrelevant for some business tasks.

Stephen E Arnold, February 12, 2020

Easy Facial Recognition

February 11, 2020

DarkCyber spotted a Twitter thread. You can view it here (verified on February 8, 2020). The main point is that using open source software, an individual was able to obtain (scrape; that is, copy) images from publicly accessible services. Then the images were “processed.” The idea was to identify a person from an image. Net net: People can object to facial recognition, but once a technology migrates from “little known” to publicly available, there may be difficulty putting the tech cat back in the black bag.

Stephen E Arnold, February 11, 2020

TemaTres: Open Source Indexing Tool Updated

February 11, 2020

Open source software is the foundation for many proprietary software startups, including the open source developers themselves. Most open source software tends to lag in updates and patches, but TemaTres recently released an update, according to the blog post “TemaTres 3.1 Release Is Out! Open Source Web Tool To Manage Controlled Vocabularies.”

TemaTres is an open source vocabulary server designed to manage controlled vocabularies, taxonomies, and thesauri. The recent update includes the following:

  • “Utility for importing vocabularies encoded in MARC-XML format
  • Utility for the mass export of vocabulary in MARC-XML format
  • New reports about global vocabulary structure (ex: https://r020.com.ar/tematres/demo/sobre.php?setLang=en#global_view)
  • Distribution of terms according to depth level
  • Distribution of sum of preferred terms and the sum of alternative terms
  • Distribution of sum of hierarchical relationships and sum of associative relationships
  • Report about terms with relevant degree of centrality in the vocabulary (according to prototypical conditions)
  • Presentation of terms with relevant degree of centrality in each facet
  • New options to config the presentation of notes: define specific types of note as prominent (the others note types will be presented in collapsed div).
  • Button for Copy to clipboard the terms with indexing value (Copy-one-click button)
  • New user login scheme (login)
  • Allows to config and add Google Analytics tracking code (parameter in config.tematres.php file)
  • Improvements in standard exposure of metadata tags
  • Inclusion of the term notation or code in the search box predictive text
  • Compatibility with PHP 7.2”
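One of the reports in the list, the distribution of terms by depth level, is easy to picture with a small sketch. This is not TemaTres code (TemaTres itself runs on PHP); the `broader` mapping and the `depth_distribution` function are hypothetical names, and the sample vocabulary is invented for illustration.

```python
from collections import Counter


def depth_distribution(broader):
    """Count terms at each depth of a vocabulary hierarchy.

    `broader` maps each term to its broader (parent) term, or None for a
    top term. Top terms sit at depth 1. Assumes the hierarchy has no cycles.
    """
    def depth(term):
        level = 1
        while broader[term] is not None:
            term = broader[term]
            level += 1
        return level

    return Counter(depth(term) for term in broader)


if __name__ == "__main__":
    # Invented sample vocabulary: one top term, two children, one grandchild.
    vocabulary = {
        "Animals": None,
        "Mammals": "Animals",
        "Birds": "Animals",
        "Dogs": "Mammals",
    }
    for level, count in sorted(depth_distribution(vocabulary).items()):
        print(f"depth {level}: {count} term(s)")
```

A lopsided distribution, with most terms at a single depth, is the kind of signal such a report might give a taxonomy editor.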

TemaTres does updates frequently, but it is monitored. The main ethos about open source is to give back as much as you take. TemaTres appears to follow this modus operandi. If TemaTres wants to promote its web image, the organization should really upgrade its Web site, fix the broken links, and provide more information on what the software actually does.

Whitney Grace, February 11, 2020
