Google Wants to Help. Really Help

January 8, 2020

We noted “Google Assistant Now Lets You Torment Roommates with Household Notes.” The “Don’t be evil” thing and organizing all the world’s information are long gone. But the write up reminded the DarkCyber team that Google is now embracing digital notes. These are reminders, or what some “OK, boomer” types call nags. The write up states, in prose which we have edited to make semi tasteful:

You’ll feel a little prouder when you pretentiously remind your roommate they [failed at a task] because you no longer have to waste paper on post-it notes to teach your fellow basement dwellers proper manners.

Which is more impressive? Google Assistant with a notes (nag) function or the write up by an aspiring Hemingway type?

Stephen E Arnold, January 8, 2020

Linguistics: Becoming Useful to Regular People?

January 8, 2020

Now here is the linguistic reference app I have been waiting for: IDEA’s “In Other Words.” Finally, an online resource breaks the limiting patterns left over from book-based resources like traditional dictionaries and thesauri. The app puts definitions into context by supplying real-world examples from both fiction and nonfiction works of note from the 20th and 21st centuries. It also lets users explore several types of linguistic connections. Not surprisingly, this thoroughly modern approach leverages a combination of artificial and human intelligence. Here is how they did it:

“Building on the excellent definitions written by the crowd-sourced editors at Wiktionary, IDEA’s lexicographic team wrote more than 2,700 short, digestible definitions for all common words, including ‘who,’ ‘what,’ and ‘the.’ For over 100k other words that also have Wikipedia entries, we included a snippet of the article as well. To power the app, our team created the IDEA Linguabase, a database of word relationships built on an analysis of various published and open source dictionaries and thesauri, an artificial intelligence analysis of a large corpus of published content, and original lexicographic work. Our app offers relationships for over 300,000 terms and presents over 60 million interrelationships. These include close relationships, such as synonyms, as well as broader associations and thousands of interesting lists, such as types of balls, types of insects, words for nausea, and kinds of needlework. Additionally, the app has extensive information on word families (e.g., ‘jump,’ ‘jumping’) and common usage (‘beautiful woman’ vs. ‘handsome man’), revealing words that commonly appear before or after a word in real use. In Other Words goes beyond the traditional reference text by allowing users to explore interesting facts about words and wordplay, such as common letter patterns and phonetics/rhymes.”

The team has endeavored to give us an uncluttered, intuitive UI that makes it quick to look up a word and easy to follow a chain of meanings and associations. Users can also save and share what they have found across devices. Be warned, though—In Other Words does not shy away from salty language; it even points out terms that were neutral in one time period and naughty in another. (They will offer a sanitized version for families and schools.) They say the beta version is coming soon and will be priced at $4.99, or $25 with a custom tutorial. We look forward to it.

Cynthia Murrell, January 8, 2020

Google and Open Innovation: A Tiny Ripple, the Flap of a Butterfly Wing?

January 7, 2020

The US government is rethinking its approach to commercial artificial intelligence and, perhaps, to application programming interfaces. “The Case for Open Innovation” is interesting.

The write up, allegedly written by a senior vice president and legal eagle at Google, states:

Software programs work better when they work together. Open software interfaces let smartphone apps and other services connect across devices and operating systems. And interoperability—the ability of different software systems to exchange information—lets people mix and match great features, and helps developers create new products that work across platforms. The result? Consumers get more choices for how they use software tools; developers and startups can challenge bigger incumbents; and businesses can move data from one platform to another without missing a beat. This kind of open and collaborative innovation, from scientific peer-reviewed papers to open-source software, has been key to America’s achievements in science and technology.

The Googler emphasizes that Google is fighting Oracle’s claim that the online ad company improperly used Oracle’s intellectual property.

The write up claims:

That’s why today we filed our opening Supreme Court brief in Oracle’s lawsuit against us. We’re asking the Court to reaffirm the importance of the software interoperability that has allowed millions of developers to write millions of applications that work on billions of devices.

After reading this, I jotted down factors which have facilitated information exchange:

  • Technical experts from other countries working for US companies in the US
  • Desire to reduce costs
  • Need to piggyback to avoid reinventing the wheel
  • Presence of staff who worked on a technology when it was developed at a different company
  • The need for an acquiring firm to maximize the financial return on its purchase of a company and its technology; for example, Oracle’s acquisition of Sun Microsystems and Java.

Also, the ideas of openness and interoperability are interesting, particularly when articulated by commercial firms eager to lock in revenue, users, and customers. The US government’s actions to restrict the export of smart software may be sucked into this particular legal dispute. Export controls seem to be at odds with the intent of open innovation.

The timing is important. In this particular case of Google versus Oracle, timing plays a significant role. The court’s decision, or non-decision, might unsettle today’s context of commerce and politics.

Stephen E Arnold, January 7, 2020

Is Open Source Changing and Rapidly?

January 7, 2020

Open source technology is what some perceive as unencumbered, handcuff-free code. For developers and organizations eager to slash costs, open source software is a footstool. One interpretation of open source operates on the premise that the technology should be free and available for anyone. The social contract is that users “give back” to the open source community.

Some critics of Amazon Web Services appear to suggest that the company is not giving back. Not surprisingly, some AWS-ers are not happy campers. ZDNet shares more on the story in the article, “AWS Hits Back At Open-Source Software Critics.”

Also, the deeply technical New York Times was not kind to AWS when it stated that AWS, a giant cloud computing provider, consistently integrated open source software that non-AWS developers created. Andi Gutmans, AWS vice president of analytics and ElastiCache, claims that AWS is giving its customers what they want. Gutmans says that AWS customers want technology and services based on open source technology, so AWS is not strip mining but truly answering its clients’ desires. He continued:

“The story is largely talking about open source software projects and companies who’ve tried to build businesses around commercializing that open-source software. These open-source projects enable any company to utilize this software on-premises or in the cloud, and build services around it. AWS customers have repeatedly asked AWS to build managed services around open source,” Gutmans said.

He noted that AWS contributes to open-source projects such as Linux, Java, Kubernetes, Xen, KVM, Chromium, Robot Operating System, Apache Lucene, Redis, s2n, FreeRTOS and Elasticsearch.

The complaints apparently come from AWS’s rivals, who have also discussed filing antitrust complaints against the company. One rival CEO, Matthew Prince of Cloudflare, is afraid Amazon’s ambitions are endless and might overpower or monopolize the entire cloud computing market.

Will open source return to its roots? Will some open source developers not permit big companies to privatize the community technology?

Which will triumph? Open source precepts or the needs of a publicly-traded company?

Elastic, the developer of open source Elasticsearch, is one data point. The write up “Why Elastic Stock Dropped 19% in December” may presage the impact of efforts to change the definition of open source.

Whitney Grace, January 7, 2020

Shutting Down a C Suite Person to Cyber Security

January 7, 2020

DarkCyber spotted an interesting approach to marketing. The write up “Implications for CEOs Who Miss Security Targets” offers words of wisdom from a consultancy doing business as Thycotic. With what does this name rhyme? Note: This is a question you, gentle reader, can answer. DarkCyber thinks “stenotic,” perhaps. The word, as you may know, means narrowing.

With the poetry out of the way, what are the issues related to a “security target”?

One of the main reasons behind this is that there is a disconnect between the C-suite and the IT security team. A lack of effective communication between the two can often result in security targets that are based on KPIs that have little relation to business objectives.

Yes, we have a failure to communicate.


And there is evidence, proof from a sample of 550 “IT decision makers”:

a Thycotic survey of 550 IT decision makers shows that a quarter (26 percent) report that IT security is not prioritized or invested in by their boards as strategically important. Further, more than half (52 percent) of IT security decision makers say their organizations struggle to align business goals and security initiatives. Four out of 10 (43 percent) say their business’s goals are not communicated with them and a third (36 percent) admit that they aren’t clear on what the business goals even are.

DarkCyber can add the following downsides:

  1. The IT person will be given an opportunity to [a] testify and [b] find his/her future elsewhere
  2. New cyber security vendors will be hired, adding to the confusion and complexity for sitting ducks trying to fend off guerrilla hunters working alone, in squads, or for an industrialized criminal organization
  3. Employees will be reminded to change their passwords, zip their lips, and avoid clicking on emails which usually look pretty darned authentic.

DarkCyber’s view is that change, particularly with regard to cyber security, comes slowly for many organizations.

PS. The C suite may be given an overhaul.

Stephen E Arnold, January 7, 2020

Abandoned Books: Yep, Analytics to the Rescue

January 6, 2020

DarkCyber noted “The Most ‘Abandoned’ Books on GoodReads.” The idea is that by using available data, a list of books people could not finish reading can be generated. Disclosure: I will try free or $1.99 books on my Kindle and bail out if the content does not make me quiver with excitement.

The research, which is presented in academic finery, reports that the author of Harry Potter’s adventures churned out a book few people could finish. The title? The Casual Vacancy by J.K. Rowling. I was unaware of the book, but I will wager that the author is happy enough with the advance and any royalty checks which clear the bank. Success is not completion; success is money, I assume.

I want to direct your attention, gentle reader, to the explanation of the methodology used to award this singular honor to J.K. Rowling, who is probably pleased as punch with the bank interaction referenced in the preceding paragraph.

Several points merit brief, very brief comment:

  • Bayesian. A go-to method. Works reasonably well. Guessing has its benefits.
  • Data sets. Not exactly comprehensive. Amazon? What about the Kindle customer data, including time to abandonment, page of abandonment, etc.? Library of Congress? Any data to share? Top 20 library systems in the US? Got some numbers; for example, number of copies in circulation?
  • Communication. The write up is a good example of why some big-time thinkers ignore the inputs of certain analysts.
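For readers wondering what the Bayesian bullet means in practice, here is a minimal sketch, assuming a common Beta-Binomial approach: each book’s abandonment rate is shrunk toward the rate pooled across all books, so a title with a handful of readers cannot top the list on luck alone. This is not necessarily the method in the write up. The `BookStats` fields, the `prior_weight` of 50 imaginary readers, and the sample numbers are hypothetical, not GoodReads data.

```python
from dataclasses import dataclass


@dataclass
class BookStats:
    title: str
    abandoned: int  # readers who marked the book "abandoned" (hypothetical field)
    started: int    # readers who started the book (hypothetical field)


def bayesian_rate(abandoned: int, started: int,
                  prior_rate: float, prior_weight: float) -> float:
    """Posterior mean abandonment rate under a Beta-Binomial model.

    The prior is Beta(alpha, beta) with alpha = prior_rate * prior_weight and
    beta = (1 - prior_rate) * prior_weight, so prior_weight behaves like a
    number of imaginary readers who abandon at prior_rate.
    """
    alpha = prior_rate * prior_weight
    beta = (1.0 - prior_rate) * prior_weight
    return (abandoned + alpha) / (started + alpha + beta)


def rank_by_abandonment(books: list[BookStats],
                        prior_weight: float = 50.0) -> list[tuple[str, float]]:
    """Rank books by shrunken abandonment rate, highest first."""
    # Pooled rate across all books serves as the prior mean (an assumption).
    prior_rate = sum(b.abandoned for b in books) / sum(b.started for b in books)
    scored = [(b.title, bayesian_rate(b.abandoned, b.started, prior_rate, prior_weight))
              for b in books]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    books = [
        BookStats("Book A", abandoned=3, started=4),       # 75% raw, tiny sample
        BookStats("Book B", abandoned=300, started=1000),  # 30% raw, large sample
        BookStats("Book C", abandoned=50, started=1000),   # 5% raw
    ]
    for title, rate in rank_by_abandonment(books):
        print(f"{title}: {rate:.3f}")
```

With these made-up numbers, Book B ranks first (about 0.294) and Book A, despite its raw 75 percent rate, drops to about 0.219. Pick a different `prior_weight` and the list can reshuffle, which is where the guessing comes in.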

To sum up, perhaps The Casual Vacancy may make a great gift when offered by Hamilton Books? A coffee table book perhaps?

Stephen E Arnold, January 6, 2020

Oracle, Amazon, and Maybe Soon Open Source Excitement?

January 6, 2020

Remember the ongoing Google-Oracle Java dust-up? Oracle may. According to “Oracle Copied Amazon’s API. Was That Copyright Infringement?”:

Among the companies offering a copy of Amazon’s S3 API is Oracle itself. In order to be compatible with S3, Oracle’s “Amazon S3 Compatibility API” copies numerous elements of Amazon’s API, down to the x-amz tags. Did Oracle infringe Amazon’s copyright here? Ars Technica contacted Oracle to ask them if they had a license to copy Amazon’s S3 API. An Oracle spokeswoman said that the S3 API was licensed under an Apache 2.0 license. She pointed us to the Amazon SDK for Java, which does indeed come with an Apache 2.0 license. However, the Amazon SDK is code that uses the S3 API, not code that implements it—the difference between a customer who orders hash browns and the Waffle House cook who interprets the orders.

DarkCyber thinks the author is saying, “Yep, we copied.”

But… and this is interesting.

the Amazon SDK is code that uses the S3 API, not code that implements it.

Is this going to have an impact on API use? A court may decide.
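The “uses versus implements” distinction can be sketched in Python with `typing.Protocol`. The interface name `ObjectStore` and the method names `put_object` and `get_object` are hypothetical stand-ins, not Amazon’s actual S3 API. The sketch only shows which code declares an interface, which code implements it, and which code calls it.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical S3-like interface: the declarations callers depend on."""

    def put_object(self, bucket: str, key: str, body: bytes) -> None: ...

    def get_object(self, bucket: str, key: str) -> bytes: ...


class InMemoryStore:
    """Code that *implements* the interface (the Waffle House cook)."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], bytes] = {}

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self._objects[(bucket, key)] = body

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


def archive_report(store: ObjectStore, report: str) -> bytes:
    """Code that *uses* the interface (the customer ordering hash browns).

    It works with any object whose methods match ObjectStore, whoever wrote it.
    """
    store.put_object("reports", "latest.txt", report.encode("utf-8"))
    return store.get_object("reports", "latest.txt")


if __name__ == "__main__":
    print(archive_report(InMemoryStore(), "quarterly numbers"))
```

Swap `InMemoryStore` for any other class with the same method names and `archive_report` keeps working without a change. That drop-in compatibility is what a vendor gets by copying another outfit’s API declarations, and it is the sort of copying a court may be asked to weigh.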

In the meantime, let’s approach this from a different angle.

What’s the future of software? In DarkCyber’s opinion the future of software is a mix of open source code with proprietary components. DarkCyber doesn’t have a nifty Waffle House analogy for this trajectory.

The idea is that the outfits we know and love as FANG (Facebook, Amazon, Netflix, and Google) want to reduce costs, create a glide path for young open sourcey developers, and lock in big-spending customers.

One way to think about Oracle’s copying of Amazon’s API is in the context of the 2020 version of proprietary software. The APIs and the need for lock-in are essential to the persistence of certain big companies.

Net net: What looks open is not? What looks like wordsmithing is a prelude to more aggressive maneuvers.

The name of the game is revenue and growth. Losers will eat in a Waffle House. Winners will not.

Stephen E Arnold, January 6, 2020

Megaputer Spans Text Analysis Disciplines

January 6, 2020

What exactly do we mean by “text analysis”? That depends entirely on the context. Megaputer shares a useful list of the most popular types in its post, “What’s in a Text Analysis Tool?” The introduction explains:

“If you ask five different people, ‘What does a Text Analysis tool do?’, it is very likely you will get five different responses. The term Text Analysis is used to cover a broad range of tasks that include identifying important information in text: from a low, structural level to more complicated, high-level concepts. Included in this very broad category are also tools that convert audio to text and perform Optical Character Recognition (OCR); however, the focus of these tools is on the input, rather than the core tasks of text analysis. Text Analysis tools not only perform different tasks, but they are also targeted to different user bases. For example, the needs of a researcher studying the reactions of people on Twitter during election debates may require different Text Analysis tasks than those of a healthcare specialist creating a model for the prediction of sepsis in medical records. Additionally, some of these tools require the user to have knowledge of a programming language like Python or Java, whereas other platforms offer a Graphical User Interface.”

The list begins with two of the basics: Part-of-Speech (POS) Taggers and Syntactic Parsing. These tasks usually underpin more complex analysis. Concordance or Keyword tools create alphabetical lists of a text’s words and put them into context. Text Annotation Tools, either manual or automated, tag parts of a text according to a designated schema or categorization model, while Entity Recognition Tools often use knowledge graphs to identify people, organizations, and locations. Topic Identification and Modeling Tools derive emerging themes or high-level subjects using text-clustering methods. Sentiment Analysis Tools diagnose positive and negative sentiments, some with more refinement than others. Query Search Tools let users search text for a word or a phrase, while Summarization Tools pick out and present key points from lengthy texts (provided they are well organized). See the article for more on any of these categories.
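To make the concordance category concrete, here is a minimal keyword-in-context (KWIC) sketch in plain Python. The function names `kwic` and `word_index`, the letters-and-apostrophes tokenizer, and the default three-word window are assumptions for illustration; this is not how PolyAnalyst or any other product does it.

```python
import re

# Simple tokenizer (an assumption): runs of letters and apostrophes count as words.
WORD = re.compile(r"[A-Za-z']+")


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Keyword-in-context: each hit with up to `width` words on either side."""
    tokens = WORD.findall(text)
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


def word_index(text: str) -> list[tuple[str, int]]:
    """Alphabetical list of the distinct words in `text`, with counts."""
    counts: dict[str, int] = {}
    for token in WORD.findall(text.lower()):
        counts[token] = counts.get(token, 0) + 1
    return sorted(counts.items())


if __name__ == "__main__":
    sample = "The cat sat. The dog sat on the cat."
    for line in kwic(sample, "cat", width=2):
        print(line)
    print(word_index(sample))
```

On the sample sentence, `kwic(sample, "cat", width=2)` returns `["The [cat] sat The", "on the [cat]"]`. The context runs straight across the sentence break, one of many details a real concordance tool would handle.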

The post concludes by noting that most text analysis platforms offer one or two of the above functions, but that users often require more than that. This is where the article shows its PR roots—Megaputer, as it happens, offers just such an all-in-one platform called PolyAnalyst. Still, the write-up is a handy rundown of some different text-analysis tasks.

Based in Bloomington, Indiana, Megaputer launched in 1997. The company grew out of AI research at Moscow State University and Bauman Technical University. Just a few of its many prominent clients include HP, Johnson & Johnson, American Express, and several US government offices.

Cynthia Murrell, January 02, 2020

Are Catalogs Made from Dead Trees Rising from the Ashes of Retail?

January 6, 2020

DarkCyber spotted an interesting write up about dead trees. The article is about printed catalogs. Paper. You remember the stuff, don’t you? “Catalog Retailers See Reason for Optimism after Declines” contains a somewhat surprising statement; to wit:

New companies are mailing catalogs. And even died-in-the-wool online retailers like Amazon and Bonobos are getting into the act. “They’re tapping out on what they’re able to do digitally,” said Tim Curtis, president of CohereOne, a direct marketing agency in California. “They’ve got to find some new way to drive traffic to their websites.”

Does this assertion translate into a certain exhaustion of the possibilities of online advertising? Maybe pop-up or Google fatigue is affecting people looking for information.

Consider online information services. I encountered a situation with the Daily Mail, a British newspaper. The site would not display. There were ads loading, questions to answer, and pop ups to dismiss. I solved the problem by navigating to another site. Too much hassle.

A paper catalog can be viewed and maybe used to buy something without the annoyance.

“Driving traffic to a Web site” may be less important than a catalog’s ability to deliver information without digital annoyances, creepy tracking cookies, and ads for products one just purchased.

Stephen E Arnold, January 6, 2020

Why Black Boxes in Smart Software?

January 5, 2020

I read “Why Are We Using Black Box Models in AI When We Don’t Need To? A Lesson From An Explainable AI Competition.” The source is HDSR (Harvard Data Science Review), which is published by MIT Press. Didn’t MIT find an alleged human trafficker an ideal source of contributions and worthy of a bit of “black boxing”? (See “Jeffrey Epstein’s money bought a cover-up at the MIT Media Lab.”) The answer seems obvious: Keep prying eyes out. Prevent people from recognizing how mundane flashy stuff actually is.

The write up from HDSR states:

The belief that accuracy must be sacrificed for interpretability is inaccurate. It has allowed companies to market and sell proprietary or complicated black box models for high-stakes decisions when very simple interpretable models exist for the same tasks.

The write up moves with less purpose than Jeffrey Epstein.

I noted this statement as well:

Let us insist that we do not use black box machine learning models for high-stakes decisions unless no interpretable model can be constructed that achieves the same level of accuracy. It is possible that an interpretable model can always be constructed—we just have not been trying. Perhaps if we did, we would never use black boxes for these high-stakes decisions at all.

I love the privileged tone of the passage.

Here’s my take:

Years ago I prepared for a European country’s intelligence service an analysis of the algorithms used in smart software. I thought this was an impossible job. But after making some calls, talking to wizards, and doing a bit of reading about what’s taught in computer science classes, my team and I unearthed several interesting factoids:

  1. The black box became the marketing hot button in the mid-1990s. The outfit adding oomph to mystery and secrecy was Autonomy. If you are not familiar with the company, think Bayesian maths. Keeping the neuro linguistic programming mechanism under wraps differentiated Autonomy from its competition.
  2. Computer science and advanced mathematics courses around the world incorporated into their courses of study some useful and mostly reliable methods; for example, k-means. We identified another nine computational touchstones. Did we miss a few? Probably, but my team concluded that most of the fancy math outfits were using a handful of procedures and fiddling with thresholds, training data, and workflows to deliver their solutions. Why reveal to anyone that, under the hood, most of the fancy stuff for NLP, text analytics, machine learning, and the other buzzwords which seem so 2020 was the same?
  3. My team also identified that each of the widely used methods, what we called “good enough” methods, could be manipulated. Change a threshold here, modify training data there, create a feedback loop and rules there: the system output results that appeared quite accurate, even useful. For decades, putting the methods in a black box disguised how simple they were, including the methods used by Cambridge Analytica to skew outputs and probably elections. Differentiation comes not from the underlying methods; uniqueness is a result of the little bitty tweaks. Otherwise, most systems are just like the competitors’ systems.
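For readers who have not met k-means, the method named in point 2, here is a minimal sketch of Lloyd’s algorithm on 2-D points in plain Python. The function name, the seeded random initialization, and the iteration cap are assumptions; production systems add smarter seeding (k-means++), other distance measures, and stopping thresholds, which are the sort of little bitty tweaks point 3 describes.

```python
import math
import random

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> tuple[list[Point], list[list[Point]]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    clusters: list[list[Point]] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(members), sum(ys) / len(members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, clusters = kmeans(pts, k=2)
    print(centroids)
    print(clusters)
```

On these six hypothetical points the method finds the two obvious groups of three. On messier data, changing `k`, the seed, or the stopping rule can change the clusters, and nobody outside the box would know which knob was turned.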

Net net: Will transparent methods prevail? Unlikely. Making something clear reduces its perceived value. Just think how linking Jeffrey Epstein to MIT alters the outputs about good judgment.

Black boxes? Very useful indeed. Secrets? Selective revelation of facts? Millennial marketing? All useful.

Stephen E Arnold, January 5, 2020
