Where Did That Technology Originate?

January 23, 2020

Western tech companies have been under fire for cooperating with China’s efforts to spy on its own citizens. Google and Apple have both been criticized for censoring apps and other content the Chinese government found problematic. Now, BuzzFeed tells us that “Amazon, Apple, and Google Are Distributing Products from Companies Building China’s Surveillance State.” Reporters Rosalind Adams and Ryan Mac write:

“The goods and apps come from three companies — Hikvision, Dahua Technology, and iFlytek — which the US Commerce Department recently placed on an export blacklist for their role in aiding in the surveillance and detention of more than a million Uighur Muslims and other Muslim ethnic minorities in China’s northwest Xinjiang region. The blacklist designation prevents US companies from exporting commodities or software to those companies. But it does not stop Amazon and eBay from selling their products in their own online marketplaces, or Google and Apple from distributing their apps to US consumers. BuzzFeed News’ findings underscore, however, the extent to which the technology industry’s leading companies continue to work with entities that supply surveillance software and cameras to watch over one of the world’s most persecuted ethnic minorities. BuzzFeed News counted hundreds of products from Dahua and Hikvision, which manufacture security system equipment, and iFlytek, a voice recognition and translation company, on Amazon, eBay, and Overstock. Apple and Google also collectively distributed more than 100 apps from the three Chinese companies on the Apple App Store and Google Play, the main marketplace for Android software.”

The article supplies more information about Hikvision, Dahua Technology, and iFlytek and the products they sell, so navigate there for those details. Western companies risk being expelled from China if they do not cooperate with the government’s demands, and it is hard to turn down the profits a market of 1.4 billion people offers. However, it is difficult for Western democracies to put pressure on China to change its ways when our own companies support it. By vending these blacklisted companies’ apps and hardware through their online marketplaces, the tech giants embrace a loophole. Doing so is further evidence they value profits over principles.

Cynthia Murrell, January 23, 2020

China: Making Sales in a Booming Surveillance Market

January 22, 2020

China has an authoritarian government, so it is not surprising that it is developing AI surveillance technology. What is surprising, and yet not so much, is that China is exporting its AI surveillance technology to other countries. Japan Times reports, “AI Surveillance Proliferating, With China Exporting Tech To Over 60 Countries, Report Says.”

Among the countries China has sold the technology to are Venezuela, Myanmar, Iran, and Zimbabwe (all less than reputable places).

China uses facial recognition technology to monitor Muslim minorities, who have been imprisoned in concentration camps. The Carnegie Endowment for International Peace shared the news about China’s AI technology sales. The fears are that these authoritarian governments would use the technology to augment their dominance and share the data with China.

As China slowly gains more economic prominence, it is trying to encourage more countries to purchase its technology and other electronics. These include countries in Europe, Asia, and Africa. China is slowly making these countries rely on its technology:

“ ‘Chinese product pitches are often accompanied by soft loans to encourage governments to purchase their equipment,’ [the report] said. ‘This raises troubling questions about the extent to which the Chinese government is subsidizing the purchase of advanced repressive technology.’

China has come under international condemnation in the wake of an investigative report by the International Consortium of Investigative Journalists on the country’s surveillance and predictive-policing system to oppress Uighurs and send them to internment camps.”

Democratic countries are also developing AI surveillance technology, but they are not controlling how the technology is used and how it could violate laws.

China has a powerful piece of policeware technology and is already using it to violate human rights. What will China do when the technology becomes more advanced?

Whitney Grace, January 22, 2020

Amazon: Wooden Shoes, Tulips, and Cheese. Oh, and Money. Yes, Money

January 21, 2020

Amazon is moving into the Netherlands. “Amazon Confirms Netherlands Expansion” states:

Amazon has said it plans to expand its Amazon.nl site by making physical product categories available to Dutch customers later this year. The e-commerce seller launched an e-book shop on Amazon.nl in 2014, but physical products have been offered via a Dutch language option on Amazon’s Germany country site. Netherlands-based customers have also been offered Prime membership since 2017. Amazon has also announced that third-party sellers in the Netherlands and around the world can now register their accounts in preparation for the launch.

This is an important step for Amazon. The Netherlands is an ideal location for same day services. For merchants wanting to tap into the dense population centers serviced from Amazon’s Netherlands location, navigate to “Step-by-Step Guide: How to Sell in Europe with Amazon” for some useful information. To get a sense of the scope of Amazon’s international operations, you may find this map to be helpful:


The darker orange indicates regions served by Amazon via its ecommerce network.

Stephen E Arnold, January 21, 2020

Calling Out Search: Too Little, Too Late

January 20, 2020

The write up’s title is going to be censored in DarkCyber. We are not shrinking violets, but we think that stop word lists do exist. Problem? Buzz your favorite ad supported search vendor and voice your complaints.

The write up is “How Is Search So #%&! Bad? A ‘Case Study’.” The author appears to be frustrated with the outputs of ad supported and probably other types of seemingly “free” search systems providing links to Web content. This is what some people call “open source intelligence online”. There are other information resources available, but most of the consumer oriented, eyeball hungry vendors ignore i2p, forums with minimal traffic, what some experts call the Dark Web, and even some government information services. How many people pay any attention to the US National Archives? Be honest in your assessment.

Here’s a passage we noted:

Google Search is ridiculously, utterly bad.

This seems clear.

The write up provides some examples, but I anticipate that some other people have found that the connection between a user’s query and the Google search outputs is tenuous at best. One criticism DarkCyber has of the write up is that it mentions Google, shifts to Reddit, and then to metadata. The key point for us was the focus on time.

Now time is an interesting issue in indexing. Years ago I did a research project on the “meaning” of “real time” in online services. I think my research team identified five or six different types of time. I will skip the nuances we identified and focus only on the date or freshness of an item in a results list.

Let’s be sympathetic to the indexing company. Here’s why:

First, many documents do not provide an explicit date in the text of the article. In Beyond Search and DarkCyber, you will notice that we provide the author’s name and the day and date on which the article was posted. Many write ups on the open Web don’t bother. In fact, there will be no easy way to determine when the author posted the story from the content displayed in a browser. Don’t you love news releases which do not include a date, time, and time zone?

Second, many write ups include dates and times in the text of an article. For example, the reference to Day 2 of the recent CES trade show may include the explicit date January 8, 2020, for a product announcement. The approach is similar to using CES without spelling out “Consumer Electronics Show.” But, hey, these folks are busy, and everyone in the know understands the what and when, right?

Third, auto-assigned dates by operating systems may be “correct” when a file or content object is created. But what happens when a file or drive is restored? The original dates and metadata may be replaced with the time stamp of the restore. What about date last accessed or date last changed? Too much detail. Yada yada.

Fourth, time sorting is possible. Google invested in Recorded Future (now part of Insight Partners). I had heard that someone at the GOOG thought Recorded Future’s time functions were nifty. Guess not. Google did not implement more sophisticated time functions in any service other than those related to advertising. For the great unwashed masses of those who don’t work at Google, tough luck I suppose.

Fifth, when was the content first indexed? More significantly, when was the content last updated? Important? Maybe, gentle reader. Maybe.

There are several other conditions as well. For the purposes of a blog post, I want to make clear: The person who is annoyed with search should have been annoyed decades ago. These time problems are not new, and they are persistent.
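The five date signals listed above can be sketched in a few lines of Python. This is only an illustration of why a results list date is a guess: the class name, field names, and trust order are assumptions made for this sketch, not a description of how any search vendor actually works.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class DateSignals:
    """Hypothetical bundle of the date evidence an indexer may hold for one item."""
    byline_date: Optional[datetime] = None     # explicit posting date, often missing
    body_dates: Tuple[datetime, ...] = ()      # dates mentioned in the text, e.g. an event day
    file_timestamp: Optional[datetime] = None  # operating system stamp; a restore can overwrite it
    first_indexed: Optional[datetime] = None   # when the crawler first saw the item
    last_updated: Optional[datetime] = None    # when the content last changed


def best_date(signals: DateSignals) -> Tuple[Optional[datetime], str]:
    """Return (date, source) using an assumed trust order, or (None, "unknown").

    Dates found in the body are deliberately ignored: they describe events
    in the story, not when the story was posted.
    """
    candidates = [
        (signals.byline_date, "byline"),
        (signals.last_updated, "last_updated"),
        (signals.first_indexed, "first_indexed"),
        (signals.file_timestamp, "file_timestamp"),
    ]
    for value, source in candidates:
        if value is not None:
            return value, source
    return None, "unknown"


# An undated news release that mentions a trade show day in its text.
release = DateSignals(
    body_dates=(datetime(2020, 1, 8),),
    first_indexed=datetime(2020, 1, 10),
)
print(best_date(release))
```

Reporting the source alongside the date is the point of the sketch: a searcher who sees "first_indexed" knows the date is the crawler's, not the author's.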

The author with a penchant for tardy profanity stated:

Part of the issue in this specific case is that they’ve started ignoring settings for displaying results from specific time periods. It’s definitely not the whole issue though, and not something new or specific to phone searches. Now, I’ve always been biased towards the new – books, tech, everything, but I can’t help but feel that a lot of things which were done pretty well before are done worse today. We do have better technology, yet we somehow build inferior solutions with it all too often. Further, if they had the same bias of showing me only recent results I’ll understand it better, but that’s not even the case. And yes, I get that the incentives of users and providers don’t align perfectly, that Google isn’t your friend, etc. But what is DDG’s excuse? As for the Case Study part, and me saying this isn’t simply a rant – I lied, hence the quotation marks in the title. Don’t trust everything you read, especially the goddamn dates on your search results.

The write up omits a few other minor problems with modern search and retrieval systems. Yep, this includes Reddit, LinkedIn, and a bunch of others. Let me provide a few dot points:

  • Poorly implemented Boolean search
  • Zero information about what’s in an index
  • Zero information about what’s excluded from an index and why
  • Minimal auto linking to information about an “author” or the “source” of the content
  • No data to make a precision or recall calculation possible and reproducible
  • No data to make it possible to determine overlap among Web indexes. Analyses must be brute forced. Due to the volatility, latency, and editorial vagaries of ad supported Web search systems, data are mostly suggestive.
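The last two dot points refer to standard measures which are simple to compute once the data exist. A minimal sketch, assuming one had a set of retrieved items and a set of items judged relevant (the sets, function names, and the use of Jaccard similarity for overlap are assumptions for illustration; ad supported systems do not publish such data):

```python
from typing import Set


def precision(retrieved: Set[str], relevant: Set[str]) -> float:
    """Share of the retrieved items that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Share of the relevant items that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def overlap(index_a: Set[str], index_b: Set[str]) -> float:
    """Jaccard overlap: items found in both indexes divided by items found in either."""
    union = index_a | index_b
    return len(index_a & index_b) / len(union) if union else 0.0


# Made up document identifiers, not real search results.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc3", "doc5"}
print(precision(retrieved, relevant))  # 2 of 4 retrieved are relevant -> 0.5
print(overlap({"doc1", "doc2", "doc3"}, {"doc2", "doc3", "doc4"}))  # 2 shared of 4 total -> 0.5
```

The arithmetic is trivial. The hard part is the one the dot points identify: without knowing what is in an index, the "relevant" set cannot be built, so the numbers cannot be reproduced.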

Why? Why are none of these dot points operative?

Answer: Too expensive, too hard, not appropriate for our customers, and “What are you talking about? We never heard of half these issues you identified.”

Net net: Years ago I wrote an article for Searcher Magazine, edited at the time by Barbara Quint, a bit of an expert in online information retrieval. She worked at RAND for a number of years as an information expert. She said, “Do you really want me to use the title ‘Search Sucks’ on your article?” I told her, use whatever title you want. But if you agree with me, go with “sucks.” She used “sucks”. Let’s see, that was a couple of decades ago.

Did anyone care? Nope. Does anyone care today? Nope. There you go.

Stephen E Arnold, January 20, 2020

Amazon and Microsoft: Different Ways to Leverage $1 Billion

January 17, 2020

Author and big gun Brad Smith, president of Microsoft, allegedly wrote “Microsoft Will Be Carbon Negative by 2030.” To achieve this goal, the company will spend $1 billion. Okay, that appears to work out to $8.3 million per month for 10 years. That’s about 11 Azure Cognitive S4 transactions. Impressive. I suppose it depends on one’s point of view. From the PR perspective, this is probably a decent billion. From other points of view, one’s mileage may vary.

Now contrast this Microsoft $1 billion with Amazon’s. DarkCyber noted “During Bezos Visit, India minister Says Amazon’s $1 Billion Investment Is No Big Favour.” The write up states something that is a PR downer:

Amazon and Walmart’s Flipkart are facing mounting criticism from India’s brick-and-mortar retailers, which accuse the U.S. giants of violating Indian law by racking up billions of dollars of losses to fund deep discounts and discriminating against small sellers. The companies deny the allegations.

Amazon’s reaction? Read on:

Bezos said on Wednesday [January 15, 2020] Amazon would invest $1 billion to bring small businesses online in the country, adding to the $5.5 billion the company had committed since 2014.

Stepping back, Microsoft is going for good ink. Amazon seems to be going after what may be the second or third largest market in the world for Amazon services and battery powered Ring doorbells.

Interesting uses of $1 billion.

Stephen E Arnold, January 17, 2020

Education: Is the Future in the Hands of Google Type Companies?

January 15, 2020

I spotted a news item which would not be fodder for either this blog or our DarkCyber video program. Then one of the research team emailed me a link to an apparently unrelated article. Then it struck me: The future of education is probably going to be ceded to big companies and sources of revenue which may have interesting avocations.

Let me explain.

The first news item reports that “US Colleges Struggling with Low Enrollment Are Closing at Increasing Rate.” The article, from a source with which I am not familiar, asserts:

For 185 years this college campus in Vermont was teeming with students. Now it sits empty. In January, the school announced it would be closing. “I’ve had a very long professional career. It’s the hardest thing I’ve ever had to do – to stand in front – in our auditorium with 400 people and telling principally students, but faculty and staff, that we wouldn’t be opening this fall,” said Bob Allen, President at Green Mountain College.

Sure enough. The institution is a goner.

Then the article which I spotted but decided was not suitable for this blog. Its title? “UVM Gets $1 Million from Google for Open Source Research.” The write up from the delightfully named WCAX asserts:

The unrestricted gift is to support open-source research. Open source is a type of computer software, where source code is released under a license, and the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose.

We know that august institutions like the Massachusetts Institute of Technology will deal with individuals of questionable character when the cash pay off is big enough.

Let’s assume these items are accurate. Now let’s look into a future in which universities become increasingly desperate for money.

Who will provide the dough?

Answer: People who have the money and have a need.

Why? Let me suggest a few reasons:

  1. Access to lower cost talent
  2. Opportunity to recycle research into commercial products
  3. Force students to “like” big companies. See “‘Techlash’: Positive Perceptions of Facebook, Google Crumble on Campuses.”

So who owns what the grant money generates, particularly if the output is open source? What happens if Amazon uses Google funded open source as part of its platform? Who determines how the money is used or, in the case of MIT, how its origin is obfuscated? Is academic R&D a more efficient way to generate innovation?

Net net: The financial situation is likely to lead to the equivalent of corporate naming rights to NFL football stadia. And if you don’t like it, don’t attend.

Stephen E Arnold, January 15, 2020

Qatalyst Autonomy Presentation 2

January 14, 2020

DarkCyber spotted a link to a second presentation apparently prepared by Qatalyst Partners prior to Hewlett Packard’s purchase of Autonomy in 2011. This second slide deck covers:

  • Historical trading performance and related financial data
  • Shareholder ownership
  • Comparative financial data; for example, Google, Oracle, HP, and other firms.

If you want to check out the first Qatalyst Autonomy presentation, you can find that document at this link. You may be able to locate other Autonomy documents via some scouting around on the Vdocuments.mx site.

These documents are almost a decade old, but they provide useful information for anyone considering an investment in or purchase of an organization engaged in enterprise search and text analysis software.

Documents like these provide some of the factual foundation we use in our reports and analyses. It is far easier to talk about the revenue potential of search and text processing. It is far more difficult to generate sustainable revenue and growing profits.


The reasons include:

  • Ignoring the highly particularized nature of search and text analysis; that is, one size fits all doesn’t, so expensive, one off tailoring is required
  • Making a search or text analysis sale is time consuming. The reasons range from “we have been burned before” to “this got the previous information people fired.”
  • Keeping the search and text analysis system up and running is expensive.
  • Staying competitive is very expensive. Innovation is easy to talk about but difficult to deliver.
  • Growth requires acquisitions, and these add to the technical debt which the acquirer must generate money to pay down.

Net net: Documents like these are useful and often difficult to obtain.

Stephen E Arnold, January 14, 2020

Looking at Some Research Made Public by Google

January 10, 2020

The today Google is different from the yesterday Google. When I began work on the Google Legacy in 2002, I was able to locate Google presentations in PowerPoint form and Google papers posted on Google sub sites, and I obtained documents from Googlers who staffed booths at trade shows. Often these individuals would email me links to public information stored at obscure online URLs.

Today figuring out where often obscure Google information is located is very difficult. Google is not so much secretive as really disorganized. Now that’s saying something because the early days of Google were comparable to predicting which way a squirrel would jump when a driver honked at a critter sitting in the road.

You can access some Google documents, often for a limited period of time, in the Google Research publication database. Today’s version of the service looks like this:


The service has about 6,000 papers posted. Some of these are full text; others are bibliographic citations. Some papers disappear.

In its present form, one can get some insight into what Google wants to expose to the public. Thus, the listing has a bit of marketing and PR spin to it. If you want to know about Alon Halevy’s Transformics technology, this collection is not for you. Ramanathan Guha has a single citation.

The good news is that the service is online. As you use the resource as a complement to other research, the limitations of the service become visible.

What’s Google’s philosophy of research? The Web site contains a link to the Google research philosophy. There you go. And I did not spot any advertising on the pages I examined…yet.

Stephen E Arnold, January 10, 2020

Lucidworks: Beyond Search for Sure

January 9, 2020

Lucid Imagination experienced what DarkCyber recalls as a bit of turmoil. From the git go, there was tension in the open sourcey ranks. One of the founders was unceremoniously given an opportunity to find his future elsewhere. Then there was the game of Revolving Door Presidents. Next was the defection of some lucid thinkers to Amazon, not in Seattle but just up the 101 to some nondescript buildings. Like a law of nature, another round of presidential revolving doors followed. Along the way, more investors wrote checks for what was an open source play based on Lucene/Solr. (I know that writing the two “names” together does not capture the grandiosity of the conception of community supported search and the privately held company’s efforts to create a huge, billion dollar information access business. Sigh.)

Now Lucidworks (which I automatically interpret as the phrase “Lucidworks. Really?”) has acquired an eCommerce vendor. Hello, what’s happening Magento, Mercado, Shopify, and Amazon. Yep, Amazon. But doesn’t Amazon have search too? Trivial point. Lucidworks is going to turn the $200 million in investment capital, an interface scripting engine, open source software, and Cirrus10 (an ecommerce service provider) into billions. Yes, billions!

“Lucidworks Acquires Cirrus10, Global Ecommerce Service Provider, to Deepen Domain Expertise and Become a Leader in Digital Commerce Solutions” states:

 Lucidworks, leader in AI-powered search, acquires Cirrus10, ecommerce solutions expert with more than 100 ecommerce customers. Lucidworks and Cirrus10 have worked together as partners for the past two years and now combine their domain expertise to provide more targeted solutions for different domains in the fast-moving ecommerce market.

The Yahoo news story points out that Lucidworks’ secret sauce is a system that:

produces relevant results, recommends products that meet customer goals, and predicts shopper intent to create a more engaging experience.

And don’t forget artificial intelligence. AI! Obviously.

But whose AI? The answer appears to be AI from Cirrus10. DarkCyber noted this statement from a co founder of the ecommerce service provider:

“Fusion is the world’s only platform for extensible AI-driven search. Fusion elevated our service offerings by giving us a framework for exploring machine learning with our customers, and using it, we can build personalized and scalable relevancy models without a black box or army of data scientists. By combining Lucidworks search and AI expertise with our deep experience in the ecommerce space we can cement our role as digital commerce solution leaders.”—Peter Curran, Cirrus10

What appears to be the business strategy for Lucidworks is to get something that generates sustainable revenue, allows the company to upsell Cirrus10’s customers, and differentiate Lucidworks from the competitors in plain old search.

There are competitors; for example, outfits with venture capital backers demanding results (Algolia, Coveo). Also, open sourcey solutions (Drupal Commerce, Magento Community Edition) and small, feisty outfits like SLI Systems and EasyAsk. Note: This is a partial list. I almost forgot companies like Amazon, eBay, and Google.

DarkCyber interprets the “beyond search” phrase as an attempt to make a 12 year old company into a revenue and profit machine.

DarkCyber, which is an annex to our blog Beyond Search, wishes the clear thinkers a great 2020. The question “Lucidworks. Really?” could be answered as long as AI, NLP, machine learning, open source, and synergy produce a winner, not a horse designed by a committee.

Stephen E Arnold, January 9, 2020

Oracle, Amazon, and Maybe Soon Open Source Excitement?

January 6, 2020

Remember the ongoing Google-Oracle Java dust up? Oracle may. According to “Oracle Copied Amazon’s API. Was That Copyright Infringement?”:

Among the companies offering a copy of Amazon’s S3 API is Oracle itself. In order to be compatible with S3, Oracle’s “Amazon S3 Compatibility API” copies numerous elements of Amazon’s API, down to the x-amz tags. Did Oracle infringe Amazon’s copyright here? Ars Technica contacted Oracle to ask them if they had a license to copy Amazon’s S3 API. An Oracle spokeswoman said that the S3 API was licensed under an Apache 2.0 license. She pointed us to the Amazon SDK for Java, which does indeed come with an Apache 2.0 license. However, the Amazon SDK is code that uses the S3 API, not code that implements it—the difference between a customer who orders hash browns and the Waffle House cook who interprets the orders.

DarkCyber thinks the author is saying, “Yep, we copied.”

But… and this is interesting.

the Amazon SDK is code that uses the S3 API, not code that implements it.

Is this going to have an impact on API use? A court may decide.
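The distinction between code that uses an API and code that implements it can be sketched in Python. Everything below is hypothetical: the `ObjectStore` interface, its two methods, and the in-memory class are invented for illustration and are not Amazon’s S3 API or Oracle’s compatibility layer.

```python
from typing import Dict, Protocol, Tuple


class ObjectStore(Protocol):
    """The API itself: names and signatures, no behavior. (Hypothetical.)"""

    def put_object(self, bucket: str, key: str, body: bytes) -> None: ...

    def get_object(self, bucket: str, key: str) -> bytes: ...


class InMemoryStore:
    """Code that IMPLEMENTS the API: the cook who interprets the orders."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], bytes] = {}

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self._data[(bucket, key)] = body

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._data[(bucket, key)]


def backup(store: ObjectStore, bucket: str, files: Dict[str, bytes]) -> int:
    """Code that USES the API: the customer placing orders.

    It works with any store that matches the interface and returns the
    number of objects written.
    """
    for key, body in files.items():
        store.put_object(bucket, key, body)
    return len(files)


store = InMemoryStore()
count = backup(store, "reports", {"jan.txt": b"search sucks"})
print(count, store.get_object("reports", "jan.txt"))
```

An SDK resembles `backup`: it calls the method names. A compatibility service resembles `InMemoryStore`: it must reproduce those names and signatures so existing callers keep working. That reproduction is the copying at issue.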

In the meantime, let’s approach this from a different angle.

What’s the future of software? In DarkCyber’s opinion the future of software is a mix of open source code with proprietary components. DarkCyber doesn’t have a nifty Waffle House analogy for this trajectory.

The idea is that the technical constructs we know and love as FANG for Facebook, Amazon, Netflix, and Google want to reduce costs, create a glide path for young open sourcey developers, and lock in big spending customers.

One way to think about the Oracle copying Amazon move is in the context of the 2020 version of proprietary software. The APIs and the need for lock in are essential to the persistence of certain big companies.

Net net: What looks open is not? What looks like wordsmithing is a prelude to more aggressive maneuvers.

The name of the game is revenue and growth. Losers will eat in a Waffle House. Winners will not.

Stephen E Arnold, January 6, 2020
