Tracking Trends in News Homepage Links with Google BigQuery

October 17, 2019

Some readers may be familiar with the term “culturomics,” a particular application of n-gram-based linguistic analysis to text. The practice arose after a 2010 project that applied such analysis to five million historical books across seven languages. The technique creates n-gram word frequency histograms from the source text. Now the technique has been applied to links found on news organizations’ home pages using Google’s BigQuery platform. Forbes reports, “Using the Cloud to Explore the Linguistic Patterns of Half a Trillion Words of News Homepage Hyperlinks.” Writer Kalev Leetaru explains:

“News media represents a real-time reflection of localized events, narratives, beliefs and emotions across the world, offering an unprecedented look into the lens through which we see the world around us. The open data GDELT Project has monitored the homepages of more than 50,000 news outlets worldwide every hour since March 2018 through its Global Frontpage Graph (GFG), cataloging their links in an effort to understand global journalistic editorial decision-making. In contrast to traditional print and broadcast mediums, online outlets have theoretically unlimited space, allowing them to publish a story without displacing another. Their homepages, however, remain precious fixed real estate, carefully curated by editors that must decide which stories are the most important at any moment. Analyzing these decisions can help researchers better understand which stories each news outlet believed to be the most important to its readership at any given moment in time and how those decisions changed hour by hour.”
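For readers who have not met n-gram frequency histograms before, here is a minimal sketch of the idea in Python. It is not the GDELT or BigQuery method described in the article; the function name `ngram_counts` and the sample link texts are hypothetical, chosen only to illustrate counting runs of consecutive words.

```python
from collections import Counter
import re


def ngram_counts(texts, n=2):
    """Count n-grams (runs of n consecutive words) across a list of strings."""
    counts = Counter()
    for text in texts:
        # Lowercase and split into simple word tokens (an assumed tokenization).
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Slide a window of n words across the token list.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


# Hypothetical link texts standing in for homepage hyperlinks.
sample_links = [
    "Markets rally as trade talks resume",
    "Trade talks resume after long pause",
    "Storm warning issued for coast",
]

bigrams = ngram_counts(sample_links, n=2)
for phrase, count in bigrams.most_common(3):
    print(count, phrase)
```

Run over billions of links instead of three, the same kind of tally is what lets an analyst see which phrases rise and fall over time.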

The project has now collected more than 134 billion such links. The article describes how researchers have used BigQuery to analyze this dataset with a single SQL query, so navigate there for the technical details. Interestingly, one thing they are looking at is trends across the 110 languages represented by the samples. Leetaru emphasizes that this endeavor demonstrates how much faster these computations can be completed than in the 2010 project. He concludes:

“Even large-scale analyses are moving so close to real-time that we are fast approaching the ability of almost any analysis to transition from ‘what if’ and ‘I wonder’ to final analysis in just minutes with a single query.”

Will faster analysis lead to wiser decisions? We shall see.

Cynthia Murrell, October 17, 2019

Algolia: Cash Funding Hits $184 Million

October 15, 2019

Exalead was sucked into Dassault Systèmes. Then former Exaleaders abandoned ship. Algolia benefited from some Exalead experience. But unlike Exalead, Algolia embraced venture funding with cash provided by Accel, Point Nine Capital, Storm Ventures, and Y Combinator, among others.

DarkCyber noted “Algolia Finds $110M from Accel and Salesforce for Its Search-As-a-Service, Used by Slack, Twitch and 8K Others.” The write up reports that the company has “closed a Series C of $110 million, money that it plans to invest in R&D around its search technology, including doubling down on voice, and further global expansion in Europe, North America and Asia Pacific.”

The write up adds:

Having Salesforce as a strategic backer in this round is notable: the CRM giant currently does not have a native search product in its wide range of cloud-based services for enterprises, instead opting for endorsed integrations with third parties, such as Algolia competitor Coveo. The plan will be to further integrate with Salesforce although no products to speak of as of yet.

The challenge will be to go where few search and retrieval systems have gone before.

Some people have forgotten the disappointments and questionable financial tricks promising search vendors delivered to stakeholders and customers.

With venture firms looking for winners, returns of 20 percent will not deliver what the sources of the funds expect. The good old days of a 17X return may have cooled, but generating an 8X or 12X return may be a challenge.

Why?

In the course of researching and writing the enterprise search report from 2003 to 2006 and in our subsequent work, several “themes” or “learnings” surfaced:

  1. Good enough search is now the order of the day; that is, an organization-wide search system does not meet the needs of many operating units. Examples range from the legal department to research and development to engineering and the drawings plus data embedded in product manufacturing systems to information under security umbrellas with real time data and video content objects. Therefore, the “one solution” approach dissipates like morning fog.
  2. Utility search from outfits like Amazon is “good enough.” This means that a developer using Amazon blockchain services and workflow tools may use the search functions available from Amazon. Maybe Amazon will buy Algolia, but for the foreseeable future, search is a tag-along function, not a driver of the big money apps which Amazon is aiming toward.
  3. Search, regardless of vendor, must spend significant sums to enrich the functions of the system. Natural language processing, predictive analytics, entity extraction, and other desired functions are moving targets. Adding and tuning these capabilities becomes expensive. And if the experiences of Autonomy and Fast Search & Transfer are representative, the costs become difficult to control.

DarkCyber hopes that Algolia can adapt to these research factoids. If not, search and retrieval may be rushing toward a disconnect between revenues, sustainable profits, and investor expectations.

The wheel of fortune is spinning. Where will it stop? On a winner or a loser? This is a difficult question to answer, and one which Attivio, BA-Insight, Coveo, Elastic, IBM Watson, Lucidworks, Microsoft, Sinequa, Voyager Search, and others have been trying to answer with millions of dollars, thousands of engineering hours, and massive investments in marketing. I am not including the search vendors positioned as policeware and intelware; for example, BAE NetReveal, Diffeo, LookingGlass, Palantir Technologies, and Shadowdragon, among others.

Worth monitoring the trajectory of Algolia.

Stephen E Arnold, October 15, 2019

Real Life Q and A for Information Access Allegedly Arrives

October 14, 2019

DarkCyber noted “Promethium Tool Taps Natural Language Processing for Analytics.” The write up, which may be marketing oriented, asserts:

software, called Data Navigation System, was designed to enable non-technical users to make complex SQL requests using plain human language and ease the delivery of data.

The company developing the system, Promethium, founded in 2018, may have delivered what users have long wanted: Ask the computer a question and get a usable, actionable answer. If the write up is accurate, Promethium has achieved with $2.5 million in funding a function that many firms have pursued.

The article reports:

After users ask a question, Promethium locates the data, demonstrates how it should be assembled, automatically generates the SQL statement to get the correct data and executes the query. The queries run across all databases, data lakes and warehouses to draw actionable knowledge from multiple data sources. Simultaneously, Promethium ensures that data is complete while identifying duplications and providing lineage to confirm insights. Data Navigation System is offered as SaaS in the public cloud, in the customer’s virtual private cloud or as an on-premises option.

More information is available at the firm’s Web site.

Stephen E Arnold, October 14, 2019

A List of Enterprise Search Vendors

October 7, 2019

DarkCyber does not follow the enterprise search sector. In fact, two of the flagships from the 2000s found themselves caught in embarrassing financial missteps. Why? It certainly suggests that making big bucks from a search and retrieval service is difficult.

We came across a Web site called Trust Radius. This site has a section devoted to enterprise search. What we found interesting is that the site lists what seem to be the key players in the sector today. With most LE and intel policeware platforms relying on open source search like Lucene, DarkCyber was quite surprised with the line up of systems and the information provided by Trust Radius.

Here’s the list of vendors in alphabetical order, a method of presenting information which is not in favor with some whiz kids:

3RDi Search

Aderant Handshake (knowledge management for law firms)

Agree Ya Site Administrator

Algolia

Amazon Cloud Search (Lucene)

Apache Lucene

Apache Solr

Expert Systems Cogito Discover

Constructor.io Search

Coveo

Customer Matrix (customer support)

Dassault Systems Exalead (Exalead)

Dieselpoint

Elasticsearch (Elastic)

Fabasoft Mindbreeze

Fabasoft Mindbreeze Inspire

Google Search Appliance (discontinued)

IBM Watson (once Omnifind)

IBM Watson Discovery for Salesforce

IBM Watson Explorer

IManage Insight (Interwoven, Autonomy, HP, now a standalone)

Inbenta Enterprise Search

Lookeen Desktop Search (listed as Enterprise Search however)

Lucidworks Fusion ($100 million in funding)

Maana

Microfocus IDOL (Autonomy to HP to HPE to Microfocus)

Microsoft Azure (Fast Search & Transfer)

Microsoft Bing Search

Perceptive Search (ISYS Search Software to Lexmark to Hyland)

Rocket NXT Enterprise Search (Aerotext)

Rockset

Searchify

Search Spring (product search)

Search Tap

Search Unify

Sinequa

SLI Systems (e commerce)

Swiftype

Synacor Video Search & Discovery

TeraText Searchable Archive for Files and Email (SAIC)

Zakta

What DarkCyber finds interesting is the omission of outfits like Oracle Endeca, Antidot, and Blossom. Also, in this listing of 41 “search systems” there are multiple enterprise search products from single companies like IBM and Microsoft. There are also e-commerce search systems and systems which do not handle enterprise content because the service supports desktops. There are two “no longer around” products and a weird blend of search utilities with text processing features. In short, this list is illustrative of the chaos, confusion, and craziness that leads some information technology professionals to buy a solution that just delivers key word search and some optional features.

DarkCyber believes that Amazon’s approach is likely to gain traction. That’s bad news for most of the companies on this list, particularly search vendors who manage to confuse individuals or the smart software used to create this list at Trust Radius.

It seems that the message from this list is that search is a bit of a dog’s breakfast—just as it has been for decades.

Stephen E Arnold, October 7, 2019

Open Source: Everything New Is Old Again

October 7, 2019

The Andreessen Horowitz open source info blitz contains some good stuff. You will want to read the essay “Open Source: From Community to Commercialization” and, if you qualify, download the pdf of lecture notes. We noted this statement from the essay about the SaaS open source business model:

In a SaaS model, you provide a complete hosted offering of the software. If your value and competitive edge is in the operational excellence of the software, then SaaS is a good choice. However, since SaaS is usually based around cloud hosting, there is the potential risk that public clouds will choose to take your open source code and compete.

Accurate.

We noted this statement at the end of the article:

I [Peter Levine / Jennifer Li?] believe Open Source 3.0 will expand how we think of and define open source businesses. Open source will no longer be RedHat, Elastic, Databricks, and Cloudera; it will be – at least in part – Facebook, Airbnb, Google, and any other business that has open source as a key part of its stack. When we look at open source this way, then the renaissance underway may only be in its infancy. The market and possibilities for open source software are far greater than we have yet realized.

Correct.

Years ago, the DarkCyber team undertook a study of a dozen open source software vendors specializing in search and retrieval. Today, most of those vendors have embraced “artificial intelligence”, “predictive analytics”, and “natural language processing”. That’s because search is a utility and the developers and vendors of general purpose open source software have to differentiate themselves. In the course of that research, DarkCyber noted several things.

  1. Big companies in 2008 were among the most enthusiastic testers and eventually users of open source software. Why? Our data suggested that open source allowed users of commercial proprietary software more freedom to make changes. Bug fixes would often arrive in a more timely way. Plus, the IBM- and Oracle-style license fees did not come along for the ride. That is probably true in some cases today.
  2. Open source was a free lunch. The developers often contributed for the common good; others created and made available open source software as a way to demonstrate and prove their capabilities. Translation, as one person told one of my researchers, “A job, man. Big bucks.”
  3. Monetization was mostly “little plays”; that is, use our free stuff and then pay for support or proprietary extensions.

Flash forward to today. Some of these three decade-old findings may still be in play, but the context is now very different.

What’s changed?

For the first time, meta plays are possible. Forget the investment, merger, and acquisition angles that motivate venture capital firms. Think in terms of just using Amazon and paying for what you need.

Start ups no longer just use Microsoft because it is available and works. Start ups use Amazon because it appears to be open source, cheap or subsidized, and available globally.

The challenge this presents to open source is significant. DarkCyber is not convinced that open source developers, users of open source software, analysts, and other professionals recognize what Amazon’s meta play and strategy is doing; that is, creating a new context of open source.

Want to learn more about Amazon’s meta play for open source? Write seaky2000 at yahoo dot com and inquire about our Amazon strategy webinar. Note: It’s not a freebie.

Everything new is old again, including vendor lock in.

Stephen E Arnold, October 7, 2019

Amazon AWS, DHS Tie Up: Meaningful or Really Meaningful?

October 7, 2019

In my two lectures at the TechnoSecurity & Digital Forensics conference in San Antonio last week, my observations about Amazon AWS and the US government generated puzzled faces. Let’s face it. Amazon means a shopping service for golf shirts and gym wear.

I would like to mention — very, very briefly because interest in Amazon’s non shopping activities is low among some market sectors — “DHS to Deploy AWS-Based Biometrics System.” The deal is for Homeland Security:

to deploy a cloud-based system that will process millions of biometrics data and support the department’s efforts to modernize its facial recognition and related software.

The system will run on the AWS GovCloud platform. Amazon snagged this deal from the incumbent Northrop Grumman. AWS takes over the program in 2021. DarkCyber estimates that the contract will be north of $80 million, excluding ECOs and scope changes.

This is not a new biometrics system. It’s been up and running since the mid 1990s. What’s interesting is that the seller of golf shirts displaced one of the old line vendors upon which the US government has traditionally relied.

DarkCyber finds this suggestive, which is a step toward really meaningful. Watch for “Dark Edge: Amazon Policeware”. It will be available in the next few months.

Stephen E Arnold, October 7, 2019

Will the Real Disintermediating Entity Step Forward?

October 3, 2019

Big Microsoft day. It’s back in the mobile phone business. Sometime next year, probably coincident with a delayed Win 10 update, the Microsoft Surface Dual Screen Folding Android Phone becomes available. You can get the scoop and one view of Microsoft’s “we’re in phones again” strategy in “Microsoft’s Future Is Built on Google Code.” Do I agree? Of course not. That’s my method: Find other ways to look at an announcement.

The write up posits:

Google underpins Microsoft’s browser and mobile OS now.

I noted this statement as well:

… it could come as quite a shock that the CEO of Microsoft doesn’t care that much about operating systems. But there it is, in black and white. Microsoft obviously isn’t abandoning Windows — it announced a new version of it today — but it matters much more to Microsoft that you use its services like Office. That’s where the money is, after all.

Money. A phone that is not here?

But there’s another side to Microsoft. Amazon, the evil enemy, makes it possible to run Microsoft on the AWS platform.

Now who is going to disintermediate whom?

Will Google get frisky and nuke Microsoft’s Android love?

Will Amazon just push MSFT SQLServer and other Microsoft innovations off the AWS platform and suck up the MSFT business?

Will Microsoft find that loving two enemies is more a management hassle than getting a Windows 10 server out the door?

Will Amazon and Google escalate their skirmishes and take actions that miss one enemy and plug the Redmond frenemy?

The stakes are high. Microsoft has done a pivot with a double backflip.

Perfect 10 or broken foot? Enron tried something like Microsoft’s approach. The landing was bumpy. The cloud may not cushion a lousy landing.

Stephen E Arnold, October 3, 2019

Microsoft and Software Problems

September 30, 2019

Microsoft wants DarkCyber to trust its cloud solutions. Not going to happen.

Navigate to “Windows 10 Problems Are Ruining Microsoft’s Reputation – and the Damage Can’t Be Understated.” The article asserts:

Reputation deflation is the path to damnation…

What? The data dignity company is on the path to hellfire?

We learned from the article:

According to ACSI, customer satisfaction with software for PCs has dropped by 1.3% compared to last year, with Microsoft slipping the most out of all software makers with a 3% decrease. The report further notes: “According to ACSI data, customer perceptions of quality have deteriorated significantly for Microsoft over the past year, as the manufacturer has encountered a host of customer issues with its Windows 10 updates.”

The write up stated:

This bug-related reputational damage isn’t just about desktop operating systems, though. The wider public perception of Microsoft flailing around in an almost amateurish fashion could well have a knock-on effect when it comes to the levels of trust in the company, and all those future dreamy cloud products we mentioned at the outset could be subsequently affected…

Microsoft wants to catch up with Amazon. Amazon, on the other hand, does not seem worried about catching up with Microsoft.

Microsoft may be creating problems for itself.

Stephen E Arnold, September 30, 2019

AI: Of, By, and For the One Percenters

September 28, 2019

I read “At Tech’s Leading Edge, Worry About a Concentration of Power.” You can too if you pay the Gray Lady or have a dead tree version of the estimable newspaper.

The main point of the write up is that doing smart software with machine learning and lots of data is expensive. Therefore, if a person struggles to pay the rent, smart software is going to be out of reach.

Sure, Amazon offers deals, but the fees for big time machine learning can be beyond the reach of the average country club member. Even a pro athlete with a history of interesting tweets may not be able to handle the invoices from Google, Microsoft, and other cloud vendors.

Against the backdrop of these somewhat poorly kept smart software secrets, the newspaper observes:

Computer scientists say A.I. research is becoming increasingly expensive, requiring complex calculations done by giant data centers, leaving fewer people with easy access to the computing firepower necessary to develop the technology behind futuristic products like self-driving cars or digital assistants that can see, talk and reason.

Is there a fix?

Well, sort of. The New York Times pointed to foundation support; for example:

At the Allen Institute in Seattle, Mr. Etzioni [former professor and online expert] said, the team will pursue techniques to improve the efficiency of artificial intelligence technology. “This is a big push for us,” he said. But Mr. Etzioni emphasized that what he was calling green A.I. should be seen as “an opportunity for additional ingenuity, not a restraint” — or a replacement for deep learning, which relies on vast computing power, and which he calls red A.I.

Net net: Smart software requires big bucks, big brains, big computing, and big effort. Can innovations emerge from a lab like the one beleaguered Tesla operated?

Maybe, just not probable. When big outfits “help”, the opportunity for “borrowing” may be tempting. In an ethics free zone, who wins?

The one percent. What’s different this time?

Stephen E Arnold, September 28, 2019

Amazon Policeware: The Path to IBM-Style Lock In on Steroids

September 27, 2019

Quite a bit of Amazon news has flowed through the DarkCyber system. The problem is that most of the information is oblivious to Amazon’s policeware initiative. DarkCyber’s research suggests that Amazon is building a surveillance system. One DarkCyber team member said, “Amazon is building what China has been working on for several years.” Is this DarkCyber researcher correct? Who knows?

I do want to provide a diagram from our Amazon webinar which puts Amazon’s activities into a context for enforcement. The scope of Amazon’s business strategy extends beyond local law enforcement and the Ring video doorbell activities, beyond the cloud services for several US government agencies, and beyond the company’s online businesses.

Amazon may be positioning itself to provide:

  • IRS-related services associated with tax investigations
  • Drug enforcement actions related to physicians who allegedly overprescribe or entities which obtain certain compounds using obfuscation methods
  • SEC-related services to determine entity interaction, expenditures, and related financial activities
  • Credit verification, including other financial analyses, for government and retail financial activities.

Other “extensions” are possible. What’s interesting is that few have noticed and even fewer pay much attention beyond hand waving about Alexa. There’s more than Alexa, which is a low level gateway service.

Here’s the diagram, which is copyrighted by Stephen E Arnold, operator of DarkCyber, and author of the forthcoming monograph, Dark Edge: Amazon’s Policeware Initiative.

[Image: diagram of Amazon’s policeware activities]

© Stephen E Arnold, 2019.

How do you use this diagram? Just map Amazon’s most recent product announcements into the grid.

The DarkCyber Amazon policeware webinar walks through the tactics and the strategy for this “in plain sight” play. Analysts, journalists, policeware vendors paying Amazon to host their systems, and Microsoft-type outfits are oblivious to what is now the end game for a 12 year push by Amazon to make IBM-style lock in seem as quaint as a Model T Ford.

For those who recycle my information and claim it as your own creative output, why not be somewhat ethical and provide attribution? You know. Old-fashioned stuff like a footnote. Yep, that includes a real journalist who writes for the New York Times and the Epstein-linked MIT publication, among others.

Stephen E Arnold, September 27, 2019
