MIT and Ethics for the 21st Century: A New Spin on Academia, Ethics, and Technology
January 13, 2020
Yes, a new spin. There is nothing like spin, particularly when an august institution has accepted money from an interesting person. Who is this fascinating individual?
Jeffrey Epstein, alleged procurer, human trafficker, and hobnobber with really great and wonderful people.
I read, with some disgust, “Eight Revelations from MIT’s Jeffrey Epstein Report,” which was conveniently published in Technology Review, an organ of truth and insight affiliated with MIT. For context, I had just completed “Alphabet’s Top Lawyer to Retire after Google Founders Leave,” which appeared in the Bloomberg news-iverse. You remember Bloomberg, the outfit which reported with some nifty assertions that motherboard spying was afoot.
But to MIT and Epstein, then a comment about the sterling outfit Google.
MIT’s write up explained that MIT was prudent. Instead of accepting $10 million from the interesting and now allegedly deceased Mr. Epstein, the university accepted a mere $800,000. Such restraint. And that’s the subtitle for the write up!
What are the eight teachings derived from the fraternization, support, and joy of accepting the interesting Mr. Epstein? Here you go, gentle reader:
- The relationship for money extended over 15 years. Such tenacity.
- The hook up with Mr. Epstein was happenstance. Maybe MIT was seduced?
- The $10 million didn’t happen, but the donations had to be anonymous. Such judgment.
- It was the MIT Corporation, not the real school.
- Mr. Epstein prevaricated about his donations. Quite a surprise, of course. Lies, deception, manipulation, etc. etc.
- Mr. Epstein attended real MIT events, like the funeral for “AI pioneer Marvin Minsky.” An icon, of course.
- No big wheels like Bill Gates were involved in directing Mr. Epstein’s money. Perhaps a bit of color on this point would be helpful.
- A real MIT professional asserted that Mr. Epstein was a person whom MIT “should treat with respect.”
And the write up concludes that the Media Lab [a unit of MIT] rejected $25,000 Mr. Epstein tried to donate in 2019. Another example of judgment.
To sum up, quite a write up about an institution which I assume offers a course in ethics. Well, maybe not. Full disclosure: I was quoted in the MIT Technology Review late in 2019. I was not thrilled with that association with an outfit willing to treat Mr. Epstein with respect.
Now to the Google. The world’s largest online advertising agency seems to be channeling the antics of Madison Avenue in the 1950s. In this episode of “Science Club Explores Biological Impulses,” I learned:
David Drummond, the legal chief of Google parent Alphabet Inc. and a company veteran, stepped down following questions about his conduct at the technology giant.
The conduct may have involved another Googler. What do two Googlers create? Why, another Googler, it seems. Who knew that Madison Avenue extended from New York City to Mountain View, California?
Net net: Two outfits with people who should have known about propriety demonstrated poor judgment. Look for slightly used ethical compasses on eBay. Lightly used but likely to manifest flawed outputs.
I would suggest that certain non technical behaviors qualify as grounds for viewing MIT and Google as very poorly managed institutions staffed by individuals who operate from a position above the “madding crowd.”
Stephen E Arnold, January 13, 2020
Journalists: Welcome Your New Colleague Artificial Intelligence
January 12, 2020
Life is going to become more interesting for journalists. I use the word in its broadest possible sense. That includes the DarkCyber team, gentle reader. Who needs humans when smart software is available?
“AI-Written Articles Are Copyright-Protected, Rules Chinese Court” explains that software can create content. Then that content is protected by copyright laws.
DarkCyber noted this statement:
According to state media outlet China News Service (CNS), a court in Shenzhen this month ruled in favor of Tencent, which claimed that work created by its Dreamwriter robot had been copied by a local financial news company. The Shenzhen Nanshan District People’s Court ruled that, in copying the Dreamwriter article, Shanghai Yingxun Technology Company had infringed Tencent’s copyright. Dreamwriter is an automated writing system created by Tencent and based on the company’s own algorithms.
Presumably software can ingest factoids, apply algorithms, and output new, fresh, and original information. No hanging out at the Consumer Electronics Show looking for solid information at a real technology event. Imagine the value of creating “real” news without having to pay humans. No hotel, airplane, taxi, or meal expenses.
Special content can be produced on an industrialized scale like Double Happiness ping pong balls.
Upsides for journalists include:
- Opportunities to explore new careers in fast food, blogging, and elder care
- Time to study with coal miners learning to code
- Mental space to implement entrepreneurial ideas like elderberry products designed for those who suffer from certain allergic responses.
Downsides, but only a few, of course, are:
- No or reduced income
- Loss of remaining self respect
- Weight loss due to items one and two in this downside list
- No need to have lunch with these content generators.
One question: What’s a nation state able to do with content robots?
Stephen E Arnold, January 12, 2020
Why Archived Information Can Be Useful
January 11, 2020
There’s nothing like a ubiquitous service like email and systems for keeping copies of information. Online is interesting and often surprising. This thought struck DarkCyber while reading the Time Magazine article “‘This Airplane Is Designed by Clowns.’ Internal Boeing Messages Describe Efforts to Dodge FAA Scrutiny of MAX.” Here’s the passage of interest:
“This airplane is designed by clowns, who in turn are supervised by monkeys,” said one company pilot in messages to a colleague in 2016, which Boeing disclosed publicly late Thursday.
Will the clowns and monkeys protest?
Another statement which comes directly from the Guide Book for Captain Obvious Rhetoric, which may have influenced this Time Magazine editorial insight:
The communications threaten to upend Boeing’s efforts to rebuild public trust in the 737 Max…
Ah, email and magazines. One good thing, however. No references to AI, NLP, or predictive analytics appear in the write up.
Stephen E Arnold, January 11, 2020
Amazon Finds a Home in the UK
January 10, 2020
Just a quick item about Amazon Web Services contract size. “Home Office Reinforces Commitment to AWS with £100m Cloud Hosting Deal” makes clear that a UK government entity has not been won over by Microsoft Azure. The write up reports this information:
“The award of the public cloud hosting services contract to Amazon is a continuation of services already provided to the Home Office,” a departmental spokesperson told Computer Weekly. “The contract award provides significant savings for the department of a four-year term.” The Home Office is renowned for being a heavy user of cloud technologies, and is – according to the government’s own Digital Marketplace IT spending league table – by far the biggest buyer of off-premise services and technologies via the G-Cloud procurement framework.
The contract is significant because it suggests that other Five Eyes’ participants will be exposed to the AWS approach.
For Amazon staff working on the contract, there may be some meetings at Clarendon Terrace. London taxi drivers know where that is. No digital map needed.
Stephen E Arnold, January 10, 2020
Looking at Some Research Made Public by Google
January 10, 2020
Today’s Google is different from yesterday’s Google. When I began work on the Google Legacy in 2002, I was able to locate Google presentations in PowerPoint form and Google papers posted on Google sub sites, and to get information from Googlers who staffed booths at trade shows. Often these individuals would email me links to public information stored at obscure online URLs.
Today figuring out where often obscure Google information is located is very difficult. Google is not so much secretive as really disorganized. Now that’s saying something because the early days of Google were comparable to predicting which way a squirrel would jump when a driver honked at a critter sitting in the road.
You can access some Google documents, often for a limited period of time, in the Google Research publication database. Today’s version of the service looks like this:
The service has about 6,000 papers posted. Some of these are full text; others are bibliographic citations. Some papers disappear.
In its present form, one can get some insight into what Google wants to expose to the public. Thus, the listing has a bit of marketing and PR spin to it. If you want to know about Alon Halevy’s Transformics technology, this collection is not for you. Ramanathan Guha has a single citation.
The good news is that the service is online. As you use the resource as a complement to other research, the limitations of the service become visible.
What’s Google’s philosophy of research? The Web site contains a link to the Google research philosophy. There you go. And I did not spot any advertising on the pages I examined…yet.
Stephen E Arnold, January 10, 2020
Why Sci Tech Publishers Fight Online Innovation: Money
January 10, 2020
Who reads academic papers? Give up?
Answer: Other academics, students, and curious people with an interest in often arcane research.
Here’s another question: What characteristic do many of these journal readers share?
Answer: A desire to zoom through information without paying.
“The Unstoppable Rise of Sci Hub: How Does a New Generation of Researchers Perceive Sci Hub?” explains:
Interestingly, Sci-Hub’s attraction, unlike RG’s, is not its social media features (it has none), but that it offers free and relatively easy access to millions of papers harvested (illegally) from publishers’ websites. It is an open one-stop full-text warehouse, which is thought to be more convenient to use than clunky heavily regulated library platforms. There is another, possibly, more important explanation for Sci-Hub’s popularity and that is it speaks to ECRs’ sharing beliefs and open access (OA) sympathies.
The write up closes with this statement:
The bigger question is whether in the longer term Sci-Hub will still exist? The answer is: not in its current form, but, as Napster was for music, it might be the precursor to the collapse of the status quo. However, unlike RG [ResearchGate, a European commercial online site for sci tech types to share information], Sci-Hub does not have the opportunity to sell its platform, there is no advertising, or ‘social networking’ to obtain vital user data that it can monetize; it is a pure and unashamed ‘pirate’.
The write up is worded in a polite way. The message seems clear, even in rural Kentucky: Professional database publishers are likely to face continued pressure from journal readers who want to cut the cost of information.
What will professional publishers do? Their historical behavior: Raise prices. Innovative? The approach has worked before.
Stephen E Arnold, January 10, 2020
Lucidworks: Beyond Search for Sure
January 9, 2020
Lucid Imagination experienced what DarkCyber recalls as a bit of turmoil. From the get go, there was tension in the open sourcey ranks. One of the founders was unceremoniously given an opportunity to find his future elsewhere. Then there was the game of Revolving Door Presidents. Next was the defection of some lucid thinkers to Amazon, not in Seattle but just up the 101 to some nondescript buildings. Like a law of nature, another round of presidential revolving doors followed. Along the way, more investors wrote checks for what was an open source play based on Lucene/Solr. (I know that writing the two “names” together does not capture the grandiosity of the conception of community supported search and the privately held company’s efforts to create a huge, billion dollar information access business. Sigh.)
Now Lucidworks (which I automatically interpret as the phrase “Lucidworks. Really?”) has acquired an eCommerce vendor. Hello, what’s happening, Magento, Mercado, Shopify, and Amazon? Yep, Amazon. But doesn’t Amazon have search too? Trivial point. Lucidworks is going to turn the $200 million in investment capital, an interface scripting engine, open source software, and Cirrus10 (an ecommerce service provider) into billions. Yes, billions!
“Lucidworks Acquires Cirrus10, Global Ecommerce Service Provider, to Deepen Domain Expertise and Become a Leader in Digital Commerce Solutions” states:
Lucidworks, leader in AI-powered search, acquires Cirrus10, ecommerce solutions expert with more than 100 ecommerce customers. Lucidworks and Cirrus10 have worked together as partners for the past two years and now combine their domain expertise to provide more targeted solutions for different domains in the fast-moving ecommerce market.
The Yahoo news story points out that Lucidworks’ secret sauce is a system which:
produces relevant results, recommends products that meet customer goals, and predicts shopper intent to create a more engaging experience.
And don’t forget artificial intelligence. AI! Obviously.
But whose AI? The answer appears to be AI from Cirrus10. DarkCyber noted this statement from a co founder of the ecommerce service provider:
“Fusion is the world’s only platform for extensible AI-driven search. Fusion elevated our service offerings by giving us a framework for exploring machine learning with our customers, and using it, we can build personalized and scalable relevancy models without a black box or army of data scientists. By combining Lucidworks search and AI expertise with our deep experience in the ecommerce space we can cement our role as digital commerce solution leaders.”—Peter Curran, Cirrus10
What appears to be the business strategy for Lucidworks is to get something that generates sustainable revenue, allows the company to upsell Cirrus10’s customers, and differentiates Lucidworks from the competitors in plain old search.
There are competitors; for example, outfits with venture capital backers demanding results (Algolia, Coveo). Also, open sourcey solutions (Drupal Commerce, Magento Community Edition) and small, feisty outfits like SLI Systems and EasyAsk. Note: This is a partial list. I almost forgot companies like Amazon, eBay, and Google.
DarkCyber interprets the “beyond search” phrase as an attempt to make a 12 year old company into a revenue and profit machine.
DarkCyber, which is an annex to our blog Beyond Search, wishes the clear thinkers a great 2020. The question “Lucidworks. Really?” could be answered as long as AI, NLP, machine learning, open source, and synergy produce a winner, not a horse designed by a committee.
Stephen E Arnold, January 9, 2020
YouTube: Adulting Continues
January 9, 2020
YouTube is taking a step designed to protect children on its platform, despite concerns that the move may decrease revenue for the creators of children’s content. CBS News shares their six minute Privacy Watch segment, “YouTube to Limit Kids Video Data Collection.” Specifically, the platform will no longer attach personalized advertising to children’s content. The video description states:
“YouTube will be limiting the amount of data it collects on children. Going forward, videos made for children won’t have personalized ads. Creators are concerned this could result in less revenue, and ultimately less content for children. CNET senior producer Dan Patterson joins CBSN to discuss the development.”
The segment begins by explaining what personalized ads are, then covers who pushed for this change: privacy and security experts, regulators, and even YouTube’s parent company. As we are reminded, Google was sued last year over the issue of children’s privacy on that platform. Now, in fact, the company is trying to assert the platform is not for children under age 13 at all. That declaration rings hollow, though. As the Privacy Watch host notes, kids “live on YouTube, they consume videos for hours at a time. It’s basically their Netflix.” Just try to rebottle that genie.
The interview also discusses the Children’s Online Privacy Protection Act of 1998, the speed at which laws get outdated, and the sophistication of today’s ad technology. We also learn that generalized ads will still be included in children’s content, but those creating that content worry that will not be enough to maintain their revenue streams. Perhaps, but let us ask this—if the platform is no longer intended for those under 13, shouldn’t many of those operations be shuttering or shifting to another platform, anyway?
Cynthia Murrell, January 9, 2020
Free and Legal Movies Online
January 9, 2020
We have found a handy resource—Fossbytes lists “20 Free Movie Download Sites for 2020 [Legal Streaming].” It can be difficult to pick out the legit sites from the illegal sources in search results, and I’m sure all our gentle readers prefer to be on the up-and-up. While he is at it, writer Adarsh Verma links to his publication’s lists of free and legal music downloads and sports streaming, as well.
Verma provides a screenshot from each site and a paragraph or two describing its content. You may want to bookmark it so you can take your time exploring all 20 options. Some entries, like Netflix and Hulu, are only free for the extent of their trial periods. Still, one could get some mileage out of that. Several fill very specific niches, like the Korean Film Archive (classic Korean films), Le CiNéMa Club (Indie films), and Open Culture (international films). Others are more general-purpose, like The Internet Archive, Pluto TV, and Vimeo.
Here is what the write-up says about one of our favorite Googley properties, YouTube:
“In an earlier version of this article, YouTube was listed a lot lower in this list. However, thanks to some recent changes and the company’s growing inclination towards free, ad-supported content, I’ve decided to list it as the #1 source of free movies on the web. Now, YouTube offers more than 100 complete feature-length films on its platform. It’s also mentioned on our latest list of best movie streaming sites. This makes YouTube a perfectly free movie website for those who can’t afford to spend a premium for accessing Netflix and Hulu content. You can simply visit this link and watch movies like Zookeeper, Legally Blonde, The Terminator, Flawless, Kung Fu Killer, IP Man, and more. Moreover, YouTube also has plans to make its original shows and movies free from 2020. It’s worth noting that these movies for streaming are currently only available in the USA.”
Some options, like YouTube, are supported by interruptive ads. Most of the sites are available anywhere in the world, though several are only available in the U.S. Check out the article for the rest of the list.
Cynthia Murrell, January 9, 2020
Data Are a Problem? And the Solution Is?
January 8, 2020
I attended a conference about managing data last year. I sat in six sessions and listened as enthusiastic people explained that in order to tap the value of data, one has to have a process. Okay? A process is good.
Then in each of the sessions, the speakers explained the problem and outlined that knowing about the data and then putting it in a system is the way to derive value.
Neither Pros Nor Cons: Just Consulting Talk
This morning I read an article called “The Pros and Cons of Data Integration Architectures.” The write up concludes with this statement:
Much of the data owned and stored by businesses and government departments alike is constrained by the silos it’s stuck in, many of which have been built over the years as organizations grow. When you consider the consolidation of both legacy and new IT systems, the number of these data silos only increases. What’s more, the impact of this is significant. It has been widely reported that up to 80 per cent of a data scientist’s time is spent on collecting, labeling, cleaning and organizing data in order to get it into a usable form for analysis.
Now this is mostly true. However, the 80 percent figure is not backed up. An IDG expert whipped up some percentages about data and time, and these, I suspect, have become part of the received wisdom of those struggling with silos for decades. Most of a data scientist’s time is frittered away in meetings, struggling with budgets and other resources, and figuring out what data are “good” and what to do with the data identified by person or machine as “bad.”
The source of this statement is MarkLogic, a privately held company founded in 2001 and a magnet for $173 million from funding sources. That works out to an 18 years young start up if DarkCyber adopts a Silicon Valley T shirt.
A modern silo is made of metal and impervious to some pests and most types of weather.
One question the write up begs is, “After 18 years, why hasn’t the methodology of MarkLogic swept the checker board?” But the same question can be asked of other providers’ solutions, open source solutions, and the home grown solutions creaking in some government agencies in Europe and elsewhere.
Several reasons:
- The technical solution offered by MarkLogic-type companies can “work”; however, proprietary considerations linked with the issues inherent in “silos” have caused data management solutions to become consultantized; that is, process becomes the task, not delivering on the promise of data, either dark or sunlit.
- Customers realize that the cost of dealing with the secrecy, legal, and technical problems of disparate, digital plastic trash bags of bits cannot be justified. Like odd duck knickknacks one of my failed publishers shoved into his lumber room, ignoring data is often a good solution.
- Individuals tasked with organizing data begin with gusto and quickly morph into bureaucrats who treasure meetings with consultants and companies pitching magic software and expensive wizards able to make the code mostly work.
DarkCyber recognizes that with boundaries like budgets, timetables, and measurable objectives, federation can deliver some zip.
Silos: A Moment of Reflection
The article uses the word “silo” five times. That’s the same frequency of its use in the presentations to which I listened in mid December 2019.
So you want to break down this missile silo which is hardened and protected by autonomous weapons? That’s what happens when a data scientist pokes around a pharma company’s lab notebook for a high potential new drug.
Let’s pause a moment to consider what a silo is. A silo is a tower or a pit used to store corn, wheat, or some other grain. Dust in silos can be exciting. Tip: Don’t light a match in a silo on a dry, hot day in a state where farms still operate. A silo can also be a structure used to house a ballistic missile, but one has to be a child of the Cold War to appreciate this connotation.
As applied to data, it seems that a silo is a storage device containing data. Unlike a silo used to house maize or a nuclear capable missile, the data silo contains information of value. How much value? No one knows. Are the data in a digital silo explosive? Who knows? Maybe some people should not know? Who wants to flick a Bic and poke around?