A Googley Rah Rah for Synthetic Data

April 27, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I want to keep this short. I know from experience that most people don’t think too much about synthetic data. The idea is important, but other concepts are important and no one really cares too much. When was the last time Euler’s Number came up at lunch?

A gaggle of Googlers extol the virtues of synthetic data in a 19-page arXiv document called “Synthetic Data from Diffusion Models Improves ImageNet Classification.” The main idea is that data derived from “real” data are an expedient way to improve some indexing tasks.

I am not sure that a quote from the paper will do much to elucidate this facet of the generative model world. The paper includes charts, graphs, references to math, footnotes, a few email addresses, some pictures, wonky jargon, and this conclusion:

And we have shown improvements to ImageNet classification accuracy extend to large amounts of generated data, across a range of ResNet and Transformer-based models.

The specific portion of this quote which is quite important in my experience is the segment “across a range of ResNet and Transformer-based models.” Translating to Harrod’s Creek lingo, I think the wizards are saying, “Synthetic data is really good for text too.”

What’s bubbling beneath the surface of this archly-written paper? Here are my answers to this question:

  1. Synthetic data are a heck of a lot cheaper to generate for model training; therefore, embrace “good enough” and move forward. (Think profits and bonuses.)
  2. Synthetic data can be produced and updated more easily than fooling around with “real” data. Assembling training sets, testing, deploying, and reprocessing are time sucks. (There is more work to do than humanoids to do it when it comes to training, which is needed frequently for some applications.)
  3. Synthetic datasets can be smaller. Even baby Satan aka Sam Altman is down with synthetic data. Why? Elon could only buy so many nVidia processing units. Thus finding a way to train models with synthetic data works around a supply bottleneck.

My summary of the Googlers’ article is much more brief than the original: Better, faster, cheaper.

You don’t have to pick one. Just believe the Google. Who does not trust the Google? Why not buy synthetic data and ready-to-deploy models for your next AutoGPT product? Google’s approach worked like a champ for online ads. Therefore, Google’s approach will work for your smart software. Trust Google.

Stephen E Arnold, April 27, 2023

Is It Lights Out on the Information Superhighway?

April 26, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

We just completed a lecture about the shadow web. This is our way of describing a number of technologies specifically designed to keep law enforcement, tax authorities, and other entities charged with enforcing applicable laws in the dark.

Among the tools available are roulette services. These can be applied to domain proxies so it is very difficult to figure out where a particular service is at a particular point in time. Tor has uttered noises about supporting the Mullvad browser and baking in a virtual private network. But there are other VPNs available, and one of the largest infrastructure service providers is under what appears to be “new” ownership. Change may create a problem for some enforcement entities. Other developers work overtime to provide services primarily to those who want to deploy what we call “traditional Dark Web sites.” Some of these obfuscation software components are available on Microsoft’s GitHub.

I want to point to “Global Law Enforcement Coalition Urges Tech Companies to Rethink Encryption Plans That Put Children in Danger from Online Abusers.” The main idea behind the joint statement (the one to which I point is from the UK’s National Crime Agency) is:

The announced implementation of E2EE on META platforms Instagram and Facebook is an example of a purposeful design choice that degrades safety systems and weakens the ability to keep child users safe. META is currently the leading reporter of detected child sexual abuse to NCMEC. The VGT has not yet seen any indication from META that any new safety systems implemented post-E2EE will effectively match or improve their current detection methods.

From my point of view, a questionable “player” has an opportunity to make it possible to enforce laws related to human trafficking, child safety, and related crimes like child pornography. The “player” seems interested in implementing encryption that would make government enforcement more difficult, if not impossible in some circumstances.

The actions of this “player” illustrate what’s part of a fundamental change in the Internet. What was underground is now moving above ground. The implementation of encryption in messaging applications is a big step toward making the “regular” Internet or what some called the Clear Web into a new version of the Dark Web. Not surprisingly, the Dark Web will not go away, but why develop Dark Web sites when Clear Web services provide communications, secrecy, the ability to transmit images and videos, and the ability to perform financial transactions related to these data? Thus the Clear Web is falling into the shadows.

My team and I are not pleased when appropriate and what we call “ethical” behavior is ignored in favor of specific actions that increase risks to average Internet users. In fact, some of the “player’s actions” are specifically designed to make the player’s service more desirable to a market segment once largely focused on the Dark Web.

More than suggestions are needed in my opinion. Direct action is required.

Stephen E Arnold, April 26, 2023

What Smart Software Will Not Know and That May be a Problem

April 26, 2023

This blog post is the work of a real, live dinobaby. No smart software involved.

I read a short item called “Who Owns History? How Remarkable Historical Footage Is Hidden and Monetized.” The main point of the article was to promote a video which makes clear that big companies are locking “extraordinary footage… behind paywalls.” The focus is on images, and I know something about the subject from conversations with people with whom I worked who managed image rights years ago. The companies are history; for example, BlackStar and Modern Talking Pictures. And there were others.

Images are now a volleyball, and the new spiker on the Big Dog Team is smart software generated images. I have a hunch that individuals and companies will aggregate as many of these as possible. The images will then be subject to the classic “value adding” process and magically become for fee. Image trolls will feast.

I don’t care too much about images. I do think more about textual and tabular content. The rights issue is a big one, but I came at smart software from a different angle. Smart software has to be trained, whether via a traditional human constructed corpus, a fake-o corpus courtesy of the synthetic data wizards, or some shotgun marriage of “self training” and a mash up of other methods.

But what if important information is not available to the smart software? Won’t that smart software be like a student who signs up for Differential Geometry without Algebraic Topology? Lots of effort but that insightful student may not be in gear to keep pace with other students in the class. Is not knowing the equivalent of being uninformed or just dumb?

One of the issues I have with smart software is that some content, which I think is essential to clear thinking, is not available to today’s systems. Let me give one example. In 1963, when I was a sophomore at a weird private university, a professor urged me to read the metaphysics text by a person named A. E. Taylor. The college I attended did not have too many of Dr. Taylor’s books. There was a copy of his Aristotle and nothing else. I did some hunting and located a copy of Elements of Metaphysics, a snappy thriller.

However, Dr. Taylor wrote a number of other books. I went looking for these because I assume that the folks training smart software want to make sure the “model” has information about the nature of information and related subjects. Guess what? Project Gutenberg, the Internet Archive, and the online gem Amazon have the Aristotle book and a couple of others. FYI: You can get a copy of A. E. Taylor’s Metaphysics for $3.88, a price illustrating the esteem in which Dr. Taylor’s work is held today.

My team and I ran some queries on the smart software systems to which we have access. We learned that information from Dr. Taylor is as scarce as hen’s teeth. We shifted gears and checked out information generated by the much loved brother of Henry James. More of William James’s books were available at bargain basement prices. A collection of essays was less than $2 on Amazon.

My point is that images are likely to be locked up behind a paywall. However, books which may be important to one’s understanding of useless subjects like ethics, perception, and information are not informing the outputs of the smart software we probed. (Yes, we mean you, gentle Bard, and you too ChatGPT.)

Does the possible omission of these types of content make a difference?

Probably not. Embrace synthetic data. The “old” content is not digitally massaged. Who cares? We are in “good enough” land. It’s like a theme park with a broken rollercoaster and some dicey carnies.

Stephen E Arnold, April 26, 2023

Google: Any Day Now, Any Day Now

April 21, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I read what could be a recycled script from the Sundar and Prabhakar Comedy Show. Although not yet a YouTube video series, the company is edging ever closer to becoming the most amusing online advertising company in Mountain View.

“Google Devising Radical Search Changes to Beat Back A.I. Rivals” is chock full of one-liners. Now these are not as memorable as Jack Benny’s “I’m thinking it over” or Abbott and Costello’s “I don’t know is on third”, but the Google is in the ball park.

I liked these statements:

The tech giant is sprinting. [Exactly how does Googzilla sprint?]

Google is racing [Okay, Kentucky Derby stuff or NASCAR stuff? One goes at the speed of organisms, and the other is into the engineering approach to speed. Google is in progressive tense mode, not delivering results mode.]

“we’re excited about bringing new A.I.-powered features to search, and will share more details soon.” [I laughed at the idea of an outfit in panic and Red Alert mode getting excited. Is this like a high school science club learning that it has qualified to participate in the international math competition or excited like members of the high school science club learning that the club will not be expelled for hijacking the principal’s morning announcements?]

“Modernizing its search engine has become an obsession at Google…” [I wonder if this is the type of obsession that pulled the Google VP to his yacht with a specialized contractor allegedly in possession of a controlled substance or the legal eagle populating his nest or the Google HR mastermind who made stochastic parrot the go-to phrase for discrimination and bias.]

The article contains more comedic gems. The main point is that my team and I cannot keep pace with the number of new applications of the chatbot technology. Amazon is giving the capability away free. China’s technical sector continues to beaver away adding to its formidable array of software capabilities. Plus we spotted a German outfit able to crank out interesting videos of former President Obama making fascinating statements about another former president.

The future and progressive present tenses are interesting. Other firms are outputting features, services, and products at a remarkable pace.

And what are the Google search sensitive professionals doing? Creating more grist for the Sundar and Prabhakar Comedy Show.

The only problem is that Google continues to talk, do PR, and promise. What’s that suggest about quantum supremacy or delivering relevant search results? I do know one thing. If I want an answer, I am going to run the query on the You.com service, thank you very much.

Stephen E Arnold, April 21, 2023

SenseChat: Better Than TikTok?

April 18, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

In the midst of “chat”-ter about smart software, the Middle Kingdom shifts into babble mode. “Meet SenseChat, China’s Latest Answer to ChatGPT” is an interesting report. Of course, I believe everything I read on the Internet. Others may be more skeptical. To those Doubting Thomasinas I say, “Get with the program.”

The article reports with the solemnity of an MBA quoting from Sunzi or Sun-Tzu (what does a person unable to make sense of ideographs know?):

…SenseChat could tell a story about a cat catching fish, with multiple rounds of questions and responses.

And what else? The write up reported:

… the bot could help with writing computer code, taking in layman-level questions in English or Chinese and then translating them into a workable product.

SenseTime, the company which appears to “own” the technology, is, according to the write up:

best known as a leader in computer vision.

Who is funding SenseTime? Perhaps Alibaba, the dragon with the clipped wings and docked tail. The company is on the US sanctions list. Investors in the US? Chinese government entities?

The write up suggests that SenseTime is resource intensive. How will the Chinese company satiate its thirst for computing power? The article “China’s Loongson Unveils 32 Core CPU, Reportedly 4X Faster Than Arm Chip” implies that China’s push to be AMD, Intel, and Qualcomm free is stumbling forward.

But where did the surveillance savvy SenseTime technology originate? The answer is the labs and dorms at Massachusetts Institute of Technology. Tang Xiao’ou started the company in 2021. Where does SenseTime operate? From a store front in Cambridge, Massachusetts, or a shabby building on Route 128? Nope. The MIT student labors away in the Miami Beach of the Pacific Rim, Pudong, Shanghai.

Several observations:

  1. Chinese developers, particularly entities involved with the government of the Middle Kingdom, are unlikely to respond to letters signed by US luminaries.
  2. The software is likely to include a number of interesting features, possibly like those on one of the Chinese branded mobiles I once owned which sent data to Singapore data centers and then to other servers in a nearby country. That cloud interaction is a wonderful innovation for some in my opinion.
  3. Will individuals be able to determine what content was output by SenseTime-type systems?

That last question is an interesting one, isn’t it?

Stephen E Arnold, April 18, 2023

Good Enough Is Not Good Enough. Sorry, You Get an F from Me

April 17, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I have become increasingly concerned about the idea of good enough. Whether it is the quality of Amazon’s customer service or the work of a handyman planting bushes, the idea of excellence has vaporized.

How far has the rot of “good enough” chewed into the philosophy of over achievers? I would say that the wooden structure of excellence has been beavered into woodchips. The result is that the woodchips have clogged the stream, flooded basements, and drowned children and pets.

One outstanding example of “good enough” is the essay (thank heavens, the write up was not presented as “real” news) “Being Mediocre Sets You Free.” I wonder if the author would have been able to submit this to William James as a required analysis of motivation in his Harvard psychology class in the 19th Century? My hunch is that Mr. James would have urged the aspiring student to consider what might be called the pursuit of excellence.

The article posits as a truth which can be extended to cover a wide swath of intellectual ground ideas like this statement about being a so-so gardener:

There was no performance with this hobby. No end goal. No metric of success other than I suppose, do I enjoy it? And even enjoyment isn’t quite the right word for enjoyment has its own never ending metrics. I suppose gardening brings me a modest sort of happiness. It focuses me. It releases me from my head and my nerves. And that is quite enough.

The idea exerts a powerful magnetic pull on those who lack the gumption to commit to a task, master it, and deliver excellence. Who judges excellence? May I suggest it is a result obtained from others engaged in the same activity. What if the person does not enjoy the activity? My response is, “Suck it in. Do the job in the best way possible.”

Mediocrity provides the warmth and comfort of a heavy blanket filled with plasticized pellets. Excellence means cold fingers wrestling with flower bulbs or recalcitrant books in a library, making notes when others are working on pre-diabetes at a tavern, or slapping plaster in a careless manner in order to watch TikToks.

I don’t need to learn about good enough. I want work done in an excellent way. Good enough is a C. Average. Sure, there is comfort in the normcore. Why not find solace in excellence? Why define freedom as gray? The bright colors of life shine from doing one’s best.

Stephen E Arnold, April 17, 2023

Google versus Microsoft: Whose Marketing Is Wonkier?

April 17, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I want to do what used to be called a comparison. I read Microsoft’s posts on April 12, 2023 (I don’t know for certain because LinkedIn does not provide explicit date and time information because who really cares about indexing anymore). The first post shown in the screenshot is from the Big Dog himself at Microsoftland. The information is one more announcement about the company’s use of OpenAI’s technology in another Microsoftland product. I want to shout, “Enough already,” but my opinion is not in sync with Microsoft’s full-scale assault on Microsoft users. It is now a combination of effective hyperbole and services designed to “add value.” The post below Mr. Nadella’s is from another Softie. The main point is that Microsoft is doing smart things for providers and payors. My view is that Microsoft is doing this AI thing for money, but again my view is orthogonal to the company which cannot make some of its software print on office printers.

[Screenshot of the LinkedIn posts]

Source: LinkedIn 2023 at shorturl.at/egnpz. Note: The LinkedIn url is a long worm thing. I do not know if the short url will render. If not, give Microsoft’s search function a whirl.

Key takeaways: Microsoft owns a communications channel. Microsoft posts razzmatazz verbiage about smart software. Microsoft controls the message. Want more? Just click the big plus and Microsoft will direct more information directly at you, maybe on your Windows 11 start menu.

Now navigate to “Sundar Pichai’s Response to the Delayed Launch of Bard Is Brilliant and Reminds Us Why Google Is Still Great.” I want to cry for joy because the Google has not lost the marketing battle with Microsoft. I want to shout, “Google is number one.” I want to wave Googley color pom poms and jump up and down. Join me. “Google is number one.”

The write up strikes me as a remarkable example of lip flapping and arm waving; to wit:

Google secures its competitive advantage not necessarily by being the fastest to act, but by staying the course on why it exists and what it stands for. Innovation and product disruption is baked into its existence. From its operating models to its people strategy, everything gets painted with a stroke of ingenuity, curiosity, and creativity. While other companies may have been first to market with new technologies or products, Google’s focus on innovation and improving upon existing solutions has allowed it to surpass competitors and become the market leader in many areas.

The statements in this snippet are remarkable for several reasons:

  1. Google itself announced Code Red, a crisis. Google itself called Mom and Dad (Messrs. Brin and Page) to return to the Mountain View mothership to help figure out what to do after Microsoft’s Davos AI blizzard. Google itself has asked every employee to work on smart software. Now Google is being cautious. Is that why Googler Jeff Dean has invested in a ChatGPT competitor?
  2. Google is killing off products. The online magazine with the weird logo published “The Google Graveyard” in 2019. On April 12, 2023, Google killed off something called Currents. Believe it or not, the product was to replace Google Plus. Yeah, Google really put wood behind the hit for a social media home run.
  3. The phrase “ingenuity, curiosity, and creativity” does not strike me as the way to sum up how Google operates. I think in terms of “poaching and paying for the GoTo, Overture, Yahoo online advertising inspiration,” perfecting the swinging door so all parties to an ad deal pay Google, and speaking like a wandering holy figure when answering questions before a legal body.

Key takeaways: Google relies on a PR firm or a Ford F 150 Lightning carrying Google mouse pads to get a magazine to write an article which appears to be a reality not reflected by the quite specific statements and actions of the Google.

Bottom-line: Microsoft bought a channel. Google did not. Google may want to consider implementing the “me too” approach and buy an Inc.-type publication. I am now going to be increasingly skeptical of the information presented by Inc. Magazine. I already know to be deeply suspicious of LinkedIn.

Stephen E Arnold, April 17, 2023

A Trend? Silicon Valley Type Media Squabbles

April 13, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

In rural Kentucky the Silicon Valley type media don’t capture the attention of too many in Harrod’s Creek. I noted several stories from what I call the Sillycon Valley “real” news outfits which may suggest a trend. And what is the OMG slay?

Let’s let three examples shape what’s shakin’ in “real” news:

ONE: The write up “Mehdi Hasan Dismantles The Entire Foundation Of The Twitter Files As Matt Taibbi Stumbles To Defend It” makes clear that author Matt Taibbi is not up to the “real” news standards of an online publication called “TechDirt.” The charges are interesting; for instance, “Taibbi shrugs, sighs, and makes it clear he’s totally out of his depth when confronted with facts.” That’s clear: Facts are important.

TWO: A publication with a logo I find minty but at odds with the silly idea of legible typography published “Substack CEO Pushes Back at Elon Musk, Says Twitter Situation Is Very Frustrating.” The article explains that a financially challenged Silicon Valley reinterpretation of old-fashioned magazine publishing called Substack is struggling with the vibe checked outfit Twitter. The article provides examples of some back and forth or what my deceased grandmother called “tit for tat” talk.

THREE: The world-changing owner of Twitter (an old school TikTok) labeled the very sensitive National Public Radio as state sponsored radio. Apart from the fact that NPR runs ads, I suppose the label would annoy some people. However, the old school Fortune Magazine reported that the “real” news outfit Twitter had changed the facts. “Elon Musk Changes NPR’s Twitter Label to Government Funded Media after US State Affiliated Media Draws Heavy Criticism” said, “Musk is known for being impulsive, and on Friday he tweeted, ‘I am dumb way more often than I’d like to be.’”

Is the trend navel gazing at drip outfits? If one takes each of the publications as outfits which want to capture the spirit of Silicon Valley (oh, please, exclude Fortune Magazine from the Silicon Valley set. The Time Inc. legacy and New York attitude make its stories different, well, sort of.)

I find the uptick in criticism about the ripples in the “real” news pond originating from Sillycon Valley interesting. I am watching for the scrutiny to vibrate in social media. Who knows? Maybe “real” TV will pick up the story? One can hope. Ad hominem, spiteful remarks, and political characterizations — yes, “real” news Sillycon Valley style.

Stephen E Arnold, April 13, 2023

The Chivalric Ideal: Social Media Companies as Jousters or Is It Jesters?

April 12, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

As a dinobaby, my grade school education included some biased, incorrect, yet colorful information about the chivalric ideal. The basic idea was that knights were governed by the chivalric social codes. And what are these, pray tell, squire? As I recall from Miss Soapes, my seventh grade teacher, the guts included honor, honesty, valor, and loyalty. Scraping away the glittering generalities from the disease-riddled, classist, and violent Middle Ages – the knights followed the precepts of the much-beloved Church, opened doors for ladies, and embodied the characters of Sir Gawain, Lancelot, King Arthur, and a heaping dose of Hector of Troy and Alexander the Great (who by the way figured out pretty quickly that what is today Afghanistan would be tough to conquer). These, plus baloney gathered by Ramon Llull, were the way to succeed.

Flash forward to 2023, and it appears that the chivalric ideals are back in vogue. “Google, Meta, Other Social Media Platforms Propose Alliance to Combat Misinformation” explains that social media companies have written a five page “proposal.” The recipient is the Indian Ministry of Electronics and IT. (India is a juicy market for social media outfits not owned by Chinese interests… in theory.)

The article explains that a proposed alliance of outfits like Meta and Google:

will act as a “certification body” that will verify who a “trusted” fact-checker is.

Obviously these social media companies will embrace the chivalric ideals to slay the evils of weaponized, inaccurate, false, and impure information. These companies mount their bejeweled hobby horses and gallop across the digital landscape. The actions evidence honor, loyalty, justice, generosity, prowess, and good manners. Thrilling. Cinematic in scope.

The article says:

Social media platforms already rely on a number of fact checkers. For instance, Meta works with fact-checkers certified by the International Fact-Checking Network (IFCN), which was established in 2015 at the US-based Poynter Institute. Members of IFCN review and rate the accuracy of stories through original reporting, which may include interviewing primary sources, consulting public data and conducting analyses of media, including photos and video. Even though a number of Indian outlets are part of the IFCN network, the government, it is learnt, does not want a network based elsewhere in the world to act on content emanating in the country. It instead wants to build a homegrown network of fact-checkers.

Will these white knights defeat the blackguards who would distort information? But what if the companies slaying the inaccurate factoids are implementing a hidden agenda? What if the companies are themselves manipulating information to gain an unfair advantage over any entity not part of the alliance?

Impossible. These are outfits which uphold the chivalric ideals. Truth, honor, etc., etc.

The historical reality is that chivalry was cooked up by nervous “rulers” in order to control the knights. Remember the phrase “knight errant”?

My hunch is that the alliance may manifest some of the less desirable characteristics of the knights of old; namely, weapons, big horses, and a desire to do what was necessary to win.

Knights, mount your steeds. To battle in a far off land redolent with exotic spices and revenue opportunities. Toot toot.

Stephen E Arnold, April 12, 2023

AI Is Not the Only System That Hallucinates

April 7, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I personally love it when software goes off the deep end. From the early days of “Fatal Error” to the more interesting outputs of a black box AI system, the digital comedy road show delights me.

Reading “The Call to Halt ‘Dangerous’ AI Research Ignores a Simple Truth” reminds me that it is not just software which is subject to synapse wonkiness. Consider this statement from the Wired Magazine story:

… there is no magic button that anyone can press that would halt “dangerous” AI research while allowing only the “safe” kind.

Yep, no magic button. No kidding. We have decades of experience with US big technology companies’ behavior to make clear exactly the trajectory of new methods.

I love this statement from Wired Magazine no less:

Instead of halting research, we need to improve transparency and accountability while developing guidelines around the deployment of AI systems. Policy, research, and user-led initiatives along these lines have existed for decades in different sectors, and we already have concrete proposals to work with to address the present risks of AI.

Wired was one of the cheerleaders when it fired up its unreadable pink text with orange headlines in 1993 as I recall. The cheerleading was loud and repetitive.

I would suggest that “simple truth” is in short supply. In my experience, big technology savvy companies will do whatever they can do to corner a market and generate as much money as possible. Lock in, monopolistic behavior, collusion, and other useful tools are available.

Nice try, Wired. Transparency is good to consider, but big outfits are not in the let the sun shine in game.

Stephen E Arnold, April 7, 2023

