About Those Cut and Paste Smart Software Recipes
October 15, 2019
DarkCyber noted this write up in Vice: “A Code Glitch May Have Caused Errors In More Than 100 Published Studies.” Okay, no big deal. A few errors.
The write up quotes another source as saying:
“This simple glitch in the original script calls into question the conclusions of a significant number of papers on a wide range of topics in a way that cannot be easily resolved from published information because the operating system is rarely mentioned,” the new paper reads. “Authors who used these scripts should certainly double-check their results and any relevant conclusions using the modified scripts in the [supplementary information].”
So what?
If the code led Williams [expert who made the mistake] to wrongly identify the contents of his sample, chemists trying to recreate the molecule to test as a potential cancer drug would be chasing after the wrong compound.
So what?
What unknown, unrealized errors exist within the cut and paste world of smart software?
What about errors in warfighting or crime fighting smart systems?
Don’t know?
No one does.
That’s the issue, isn’t it?
Stephen E Arnold, October 15, 2019
Amazon: Elasticsearch Bounced and Squished
October 14, 2019
DarkCyber noted “AWS Elasticsearch: A Fundamentally-Flawed Offering.” The write up criticizes Amazon’s implementation of Elasticsearch. Amazon hired some folks from Lucidworks a few years ago. But under the covers, Lucene thrums along within Amazon and a large number of other search-and-retrieval companies, including those which present themselves as policeware vendors. There are many reasons: [a] good enough, [b] no one company fixes the bugs, [c] good enough, [d] comparatively cheap, [e] good enough. Oh, one other point: Lucene is not under the control of one company, as were good, old-fashioned solutions like STAIRS III, Fulcrum (remember that?), or Delphes (the francophone folks).
This particular write up is unlikely to earn a gold star from Amazon’s internal team. The Spun.io essay states:
I’m currently working on a large logging project that was initially implemented using AWS Elasticsearch. Having worked with large-scale mainline Elasticsearch clusters for several years, I’m absolutely stunned at how poor Amazon’s implementation is and I can’t fathom why they’re unable to fix or at least improve it.
I think the tip off is the phrase “how poor Amazon’s implementation is…”
The section “Amazon Elasticsearch Operation” adds color to the author’s viewpoint; for example:
On Amazon, if a single node in your Elasticsearch cluster runs out of space, the entire cluster stops ingesting data, full stop. Amazon’s solution to this is to have users go through a nightmare process of periodically changing the shard counts in their index templates and then reindexing their existing data into new indices, deleting the previous indices, and then reindexing the data again to the previous index name if necessary. This should be wholly unnecessary, is computationally expensive, and requires that a raw copy of the ingested data be stored along with the parsed record because the raw copy will need to be parsed again to be reindexed. Of course, this also doubles the storage required for “normal” operation on AWS. [Emphasis in the original essay.]
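The workaround the essay complains about can be sketched as an ordered series of Elasticsearch REST calls. This is a minimal, hedged sketch: it assumes the legacy index-template endpoint (`PUT _template/<name>`) and the `_reindex` API; the template name, index pattern, index names, and shard count are hypothetical; and it only builds the request plan rather than contacting a cluster. It does not reproduce Amazon’s service or the essay author’s pipeline.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# A request is (HTTP method, path, JSON body or None).
Request = Tuple[str, str, Optional[Dict[str, Any]]]


def reshard_plan(template: str, pattern: str, old_index: str, new_index: str,
                 shards: int, restore_name: bool = False) -> List[Request]:
    """Build (but do not send) the calls for one reshard-and-reindex cycle.

    All names are hypothetical placeholders; the endpoints assumed here are the
    legacy index-template API (PUT _template/...) and the _reindex API.
    """
    plan: List[Request] = [
        # 1. Change the shard count that newly created matching indices receive.
        ("PUT", f"/_template/{template}",
         {"index_patterns": [pattern],
          "settings": {"number_of_shards": shards}}),
        # 2. Copy every document into a new index created under that template.
        ("POST", "/_reindex",
         {"source": {"index": old_index}, "dest": {"index": new_index}}),
        # 3. Delete the old, under-sharded index.
        ("DELETE", f"/{old_index}", None),
    ]
    if restore_name:
        # 4. Optionally copy everything back under the original index name.
        plan += [
            ("POST", "/_reindex",
             {"source": {"index": new_index}, "dest": {"index": old_index}}),
            ("DELETE", f"/{new_index}", None),
        ]
    return plan


if __name__ == "__main__":
    for method, path, body in reshard_plan("logs-template", "logs-*",
                                           "logs-2019.10", "logs-2019.10-v2",
                                           shards=12, restore_name=True):
        print(method, path, json.dumps(body) if body else "")
```

Each `_reindex` pass reads and rewrites every document, which is where the computational expense the essay mentions comes from; the optional pass that restores the original name doubles it. Newer Elasticsearch releases also offer composable templates under `_index_template`.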
The wrap up for the essay is clear from this passage:
I cannot fathom how Amazon decided to ship something so broken, and how they haven’t been able to improve the situation after over two years.
DarkCyber’s team formulated several observations. Let’s look at these in the form of questions and trust that some young sprites will answer them:
- Will Amazon make its version of Elasticsearch proprietary?
- Are these changes designed to “pull” developers deeper into the AWS platform, making departure more difficult or impossible for some implementations?
- Are the components the author of the essay finds objectionable designed to generate more revenue for Amazon?
Stephen E Arnold, October 14, 2019
Real Life Q and A for Information Access Allegedly Arrives
October 14, 2019
DarkCyber noted “Promethium Tool Taps Natural Language Processing for Analytics.” The write up, which may be marketing oriented, asserts:
software, called Data Navigation System, was designed to enable non-technical users to make complex SQL requests using plain human language and ease the delivery of data.
The company developing the system, Promethium, was founded in 2018 and may have delivered what users have long wanted: Ask the computer a question and get a usable, actionable answer. If the write up is accurate, Promethium has achieved, with $2.5 million in funding, a function that many firms have pursued.
The article reports:
After users ask a question, Promethium locates the data, demonstrates how it should be assembled, automatically generates the SQL statement to get the correct data and executes the query. The queries run across all databases, data lakes and warehouses to draw actionable knowledge from multiple data sources. Simultaneously, Promethium ensures that data is complete while identifying duplications and providing lineage to confirm insights. Data Navigation System is offered as SaaS in the public cloud, in the customer’s virtual private cloud or as an on-premises option.
More information is available at the firm’s Web site.
Stephen E Arnold, October 14, 2019
Australian Police Crowdsource Missing Person Investigations
October 14, 2019
DarkCyber noted the report “Police Turn to Hackers in Australia’s First Crowdsourced Attempt to Find Missing People.” The idea is interesting and illustrates the lateral thinking law enforcement is increasingly directing at certain investigative challenges.
The write up states:
More than 350 internet sleuths and “ethical hackers” — hobbyists and professionals — gathered at 10 locations around the country on Friday in a national missing-persons “hackathon”. The aim was to generate leads for 12 of Australia’s most frustrating cold cases, using sophisticated but legal methods of trawling the Internet.
OSINT (open source intelligence) efforts have been applied to criminal matters before, and open source information is an important component of this approach.
WorldStack, according to the article, “has built a search index of content on the ‘dark web’ — a network of hidden, encrypted websites, sometimes used to organize illegal activity, and hoped to use image-matching software to help find some of the 12 missing people.”
Australian engineers have developed or contributed a number of useful tools. Examples include Sintelix, TeraText, ISYS Search Software, Funnelback, and LMNTrix, among others.
Stephen E Arnold, October 14, 2019
Want to Be Found Via Google?
October 13, 2019
Do you want your business and/or Web site to be at the top of Google’s search results? It is a hard race to the top, but it can be won, and Business 2 Community explains how in “How To Use Google My Business Posts.” Google My Business (GMB) posts are part of the Google My Business profile, one of the many services Google offers to help businesses optimize their profiles. According to the article, GMB posts are mini-ads for your business or the goods and services you offer.
GMB posts let users publish products, services, events, and other information directly to Google’s search and maps. Your content is then placed in front of potential customers. The clincher is that GMB posts appear across Google’s many services in real time. That is a big deal! Being able to view and interact with content in real time is part of augmented reality.
“Google offers four different types of posts to help you promote your business:
• Events, like a wine night or networking event
• Offers, such as sales or discounts
• Product updates, like new merchandise
• Announcements, such as ‘We’re open late’ or ‘Closed due to inclement weather!’”
A GMB post can be created on a desktop or on a mobile device, and videos and photos can be added to it. All posts appear in a user’s GMB profile and are live on Google search for seven days, unless an event is more than seven days in the future.
GMB is like Facebook or LinkedIn, except for businesses. How long will it take before it fills with spam? Google Ads works too, but that requires a monetary investment.
Whitney Grace, October 13, 2019
Games Go International: New Challenge for LE?
October 12, 2019
Gamers the world over need no longer struggle to master games in a foreign tongue. The International Business Times reports, “AI-Based Emulator Will Translate Previously Untranslated Languages.” While some video games have multiple language options, the most sought-after ones tend to be available in just English, and many leading RPGs are first released in Japanese only. This new emulator taps into Google Translate to solve that problem. Writer Rishbah Jain informs us:
“A new software aims to change this equation. RetroArch emulator 1.7.8 has introduced an artificial intelligence feature which will use machine learning to master translation. It will translate the text used in the game to a language of the user’s choice. The player will get the option to see the text or get it through voice instructions. The former will disturb the gameplay, the latter won’t. It will do this using Google Translate. … The developers claim that the emulator can work with all kinds of arcade and classic consoles. Not only can it be used to translate English to other languages, but also the other way around. ‘You can set the source and target language already. How well it works is up to the translation services being used,’ the company behind the project, LibRetro says in its YouTube video on the emulator. This means that for gamers whose native language is English, they no longer need to go blind into a Japanese game.”
Jain acknowledges that setting up an emulator, which must be done before one begins playing, is a step many gamers will skip (impatient beings that we are). For those willing to take the trouble, though, RetroArch has posted instructions here.
Will policeware systems process the comments and emojis used in some online games’ chat functions?
Cynthia Murrell, October 11, 2019
The Chernobyl Control Rooms of the Digital Era
October 11, 2019
A minor error. Chernobyl melted down. Radiation galore. According to Red Ferret (great name!), a motivated individual can now tour the control room of that nuclear plant. For details, navigate to “Chernobyl Control Room – You Can Now Go Inside the Infamous Site.” [Note: This story has an interesting URL. If the link doesn’t work, I am not sure a Bing, Google, or Yandex query will point to the source.]
My thought is, “Will those in the future visit the offices in which major companies made decisions that are Chernobyl-like in their impact?”
Here are a handful of offices which might become tour destinations for the vacations of the future:
- The office in which Tim Cook at Apple decided to explain that Apple was making an independent, objective decision about an app. This app informed iPhone users of Hong Kong police movements. For details, see the Guardian.
- The office at Google where executives made the decision that the HKmap.live would endanger lives. See the paywalled Wall Street Journal here.
- The work area of the person who “secretly” recorded Mark Zuckerberg explaining that he would go to the mats to fight the breakup of Facebook. See Bloomberg’s story here.
A new way to boost revenue for the whiz bang tech companies?
Could be.
Stephen E Arnold, October 11, 2019
Hot Buzzword: Continuous Intelligence
October 11, 2019
No, I don’t know what “continuous intelligence” means. When I worked at Booz, Allen, one of the presidents from that era remarked to me, “I have a sixth sense for great jargon.” That fellow, James Farley, would have embraced “continuous intelligence.” The phrase sounds good. It is metaphorical. It could support a new practice area.
I heard the phrase at the TechnoSecurity & Digital Forensics Conference. I am not sure which speaker dropped it. Maybe Cisco’s or Coalfire’s? At the time, I noted the phrase but did not think much about it.
This morning it surfaced again in “Clear the Path to Continuous Intelligence with Machine Learning, Consultancy Urges.” Interestingly, this is not a Booz, Allen pitch. The jargon outputters are from ThoughtWorks.
The write up defines the phrase “continuous intelligence” this way:
… The continuous intelligence state: This is where CD4ML platform thinking and a data DevOps culture become the norm. This is “continuous delivery for data,” the ThoughtWorks team explains. “As data scientists create more refined and accurate models, they can easily deploy these into production as replacements for prior models. Being able to create products which learn and complete the intelligence cycle in a continuous fashion is what sets this stage apart. The loops become more seamless and most of the hurdles are removed. Loops become tighter and faster with more use and more experimentation, which is a key indicator of the health of intelligence cycle.”
Got it? If not, a mid tier consulting firm will assist you as you travel the learning curve. A conference opportunity? Absolutely.
“Continuous intelligence” has arrived.
Stephen E Arnold, October 11, 2019
The Roots of Common Machine Learning Errors
October 11, 2019
It is a big problem when faulty data analysis underpins big decisions or public opinion, and it is happening more often in the age of big data. Data Science Central outlines several “Common Errors in Machine Learning Due to Poor Statistics Knowledge.” Easy to make mistakes? Yep. Easy to manipulate outputs? Yep. We believe the obvious fix is to make math point and click: let developers decide for the clueless.
Blogger Vincent Granville describes what he sees as the biggest problem:
“Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables, say with 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article How to Lie with P-values (also discussing how to handle and fix it). This is being done on such a large scale, I think it is probably the main cause of fake news, and the impact is disastrous on people who take for granted what they read in the news or what they hear from the government. Some people are sent to jail based on evidence tainted with major statistical flaws. Government money is spent, propaganda is generated, wars are started, and laws are created based on false evidence. Sometimes the data scientist has no choice but to knowingly cook the numbers to keep her job. Usually, these ‘bad stats’ end up being featured in beautiful but faulty visualizations: axes are truncated, charts are distorted, observations and variables are carefully chosen just to make a (wrong) point.”
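Granville’s point about artificial correlations is easy to reproduce at a smaller scale. This is a minimal sketch using only Python’s standard library; the 500 variables (instead of 100,000), the 10 observations, and the seed are illustrative choices, not his setup. Every variable is independent Gaussian noise, so any strong correlation it finds is spurious by construction. With 500 variables there are 500 × 499 / 2 = 124,750 pairs to test.

```python
import math
import random
from itertools import combinations


def max_spurious_correlation(n_vars: int = 500, n_obs: int = 10, seed: int = 0):
    """Return (largest |Pearson r|, pair of indices) among independent noise variables.

    Variables share no underlying relationship, so any large r is an artifact of
    testing n_vars * (n_vars - 1) / 2 pairs on very few observations.
    """
    rng = random.Random(seed)
    standardized = []
    for _ in range(n_vars):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        mean = sum(xs) / n_obs
        sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n_obs)
        # z-scores with the population sd, so r(a, b) is the mean of a_k * b_k
        standardized.append([(x - mean) / sd for x in xs])

    best, best_pair = 0.0, None
    for i, j in combinations(range(n_vars), 2):
        r = sum(a * b for a, b in zip(standardized[i], standardized[j])) / n_obs
        if abs(r) > best:
            best, best_pair = abs(r), (i, j)
    return best, best_pair


if __name__ == "__main__":
    r, pair = max_spurious_correlation()
    print(f"max |r| = {r:.3f} for variables {pair}")
```

Running it typically reports a maximum |r| above 0.9 even though no variable is related to any other. Adding variables grows the number of pairs quadratically and pushes the maximum higher; adding observations pulls it down.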
Granville goes on to specify several other sources of mistakes. Analysts sometimes take the accuracy of their data sets for granted, for example, instead of performing a walk-forward test. Relying too heavily on old standbys such as R-squared and the normal distribution can also lead to errors. Furthermore, he reminds us, scale-invariant modeling techniques must be used when data is expressed in different units (like yards and miles). Finally, missing data must be handled correctly: do not assume that bridging a gap with an average will produce accurate results. See the post for more explanation on each of these points.
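A walk-forward test, one of the checks Granville mentions, evaluates a model only on data that comes after the data used to fit it. The sketch below shows one common variant, an expanding training window. It is a hedged illustration: the function names, window sizes, and the naive “repeat the last observed value” forecaster are hypothetical stand-ins for a real model, not Granville’s method.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(n: int, initial_train: int,
                        horizon: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for an expanding-window walk-forward test.

    Each test window lies strictly after its training window, so the model is
    always scored on data it could not have seen when it was fit.
    """
    if initial_train <= 0 or horizon <= 0:
        raise ValueError("initial_train and horizon must be positive")
    train_end = initial_train
    while train_end + horizon <= n:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += horizon  # roll forward by one test window


def walk_forward_mae(series: List[float], initial_train: int,
                     horizon: int) -> List[float]:
    """Mean absolute error per fold for a naive last-value forecaster (hypothetical model)."""
    errors = []
    for train, test in walk_forward_splits(len(series), initial_train, horizon):
        forecast = series[train[-1]]  # uses only data inside the training window
        errors.append(sum(abs(series[t] - forecast) for t in test) / len(test))
    return errors
```

For example, `walk_forward_mae([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], initial_train=4, horizon=2)` scores three folds and returns `[1.5, 1.5, 1.5]`. Scoring on the training data instead would hide how the model behaves on genuinely new observations.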
Cynthia Murrell, October 11, 2019
Amazon Twitch: Some Thinking and Work to Do
October 10, 2019
I assume that this Verge story is accurate: “An Anti-Semitic Shooting in Germany Was Live-Streamed on Twitch.” Twitch allegedly said:
We are shocked and saddened by the tragedy.
Okay, but it is time for:
- Time delays in Twitch streams
- More aggressive content takedowns for soft porn, transmission of commercial television shows, and interesting online gambling sessions, among others
- Blocking banned users who return under new names (one banned as SweetSaltyPeach is now streaming as RachelKay).
The Verge reports:
Today’s attack echoed the March mass shooting of Muslims in Christchurch, New Zealand — which was streamed on Facebook Live. In today’s roughly 35-minute video, a man is seen shooting two people and attempting unsuccessfully to break into the synagogue. He also gives a brief speech into the camera, railing against Jews and denying that the Holocaust happened. Two people have been confirmed dead in today’s attack, and German law enforcement has raised the possibility that multiple attackers were involved. Only one perpetrator appears in this video.
Were young kids and young adults watching murder in real time? The Verge dances around the point:
It’s unclear how many people watched the initial stream or how many copies may have been archived at Twitch — which is owned by Amazon — or on other sites. Extremism researcher Megan Squire reported that the video was also spread through the encrypted platform Telegram, with clips being viewed by around 15,600 accounts. The Christchurch shooting was viewed live by only a few people, but reuploaded roughly 1.5 million times after the attack — so dealing with the aftermath will be a real concern. Complicating this is the fact that video of the attack — from people besides the perpetrator — is newsworthy footage. But as all social networks continue to fight hate content, live videos of shootings are a uniquely sensitive issue for live-streaming platforms.
Amazon wants to be a player in the policeware market. Amazon Twitch streaming crime is one thing. I might even believe it if the driver of the Bezos bulldozer opined, “Well, that’s a lot of video to screen.”
I think streaming murder may be even more important. What advertiser wants a pre-roll before a series of killings?
Does a live stream encourage illegal activity?
DarkCyber opines that the answer is, “Yes.”
The good old days are dead just like those who were killed on the Twitch stream.
Responsibility, not arrogance, may be useful.
Stephen E Arnold, October 10, 2019