Is Real News Synthetic?

June 13, 2018

New artificial intelligence algorithms are being designed to develop new security measures. AI algorithms “learn” when they are fed large datasets and discover patterns, inconsistencies, and other factors. It is harder than one might think to generate large datasets, so Facebook has turned to fake…er…synthetic data over real. Valuewalk wrote more about synthetic data in, “Why Facebook Now Uses Synthetic (‘Fake’) Data.”

Facebook recently announced plans to open two new AI labs to develop user security tools and the algorithms would be built on synthetic data. Sergey Nikolenko, a data scientist, complimented the adoption of synthetic data, especially since it would enable progress without hindering user privacy.

“While fake news has caused problems for Facebook, fake data will help fix those problems,” said Nikolenko. “In a computing powerhouse like Facebook, where reams of data are generated every day, you want a solution in place that will help you quickly train different AI algorithms to perform different tasks, even if all the training data is fake. That’s where synthetic data gets the job done!”

One of the biggest difficulties AI developers face is a lack of usable data; that is, data that is high-quality, task-specific, and does not compromise user privacy. Companies like Neuromation spotted this niche and started creating quality synthetic data.
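Nikolenko’s point reduces to a simple idea: fabricate labeled records, then train on them, so no real user data is ever touched. The toy below is my own illustration, not Facebook’s actual pipeline; the “benign”/“abusive” labels, the 2-D feature points, and the nearest-centroid classifier are all assumptions made for the sake of the sketch.

```python
import random

random.seed(42)

def make_synthetic(n, center, label):
    # Sample 2-D points around a center; entirely fabricated, no user data.
    return [((random.gauss(center[0], 1.0), random.gauss(center[1], 1.0)), label)
            for _ in range(n)]

# Two hypothetical behavior classes stand in for real user activity.
train = make_synthetic(200, (0.0, 0.0), "benign") + \
        make_synthetic(200, (4.0, 4.0), "abusive")

def centroid(points):
    xs = [p[0][0] for p in points]
    ys = [p[0][1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

centroids = {label: centroid([p for p in train if p[1] == label])
             for label in ("benign", "abusive")}

def classify(point):
    # Nearest-centroid rule, trained entirely on synthetic samples.
    return min(centroids,
               key=lambda lbl: (point[0] - centroids[lbl][0]) ** 2
                             + (point[1] - centroids[lbl][1]) ** 2)

print(classify((0.5, 0.2)))   # lands in the "benign" cluster
print(classify((3.8, 4.1)))   # lands in the "abusive" cluster
```

The design point is that the generator, not a user database, is the source of truth: one can mint as many labeled examples as the training run needs.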

Facebook will use the AI tools to fight online harassment, political propaganda from foreign governments, and fake news across its various networking tools and apps. This might be the start of better safety protocols that protect users and deter online bullies.

Perhaps “real news” is synthetic?

Whitney Grace, June 13, 2018

An Upside to Fake Data

February 2, 2018

We never know if “data” are made up or actual factual. Nevertheless, we read “How Fake Data Can Help the Pentagon Track Rogue Weapons.” The main idea, from our point of view, is that predictive analytics can adapt to that which has not yet happened. We circled this statement from the company holding the US government contract to make “fake” data useful:

IvySys Founder and Chief Executive Officer James DeBardelaben compared the process to repeatedly finding a needle in a haystack, but making both the needle and haystack look different every time. Using real-world data, agencies can only train algorithms to spot threats that already exist, he said, but constantly evolving synthetic datasets can train tools to spot patterns that have yet to occur.
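DeBardelaben’s “different needle, different haystack every time” idea can be sketched as a generator loop that fabricates a fresh background and a freshly shaped anomaly on every pass. This is my own illustration, not IvySys’s actual method; the transaction-amount framing, the “spike”/“burst” anomaly types, and every parameter are assumptions.

```python
import random

def make_haystack(rng, size=1000):
    # Background "hay": ordinary-looking transaction amounts.
    return [rng.gauss(100, 15) for _ in range(size)]

def plant_needle(rng, haystack):
    # The needle's shape changes each generation: sometimes one huge
    # spike, sometimes a burst of suspicious duplicate values.
    kind = rng.choice(["spike", "burst"])
    pos = rng.randrange(len(haystack) - 5)
    if kind == "spike":
        haystack[pos] = rng.uniform(500, 1000)
    else:
        value = rng.uniform(90, 110)
        for i in range(pos, pos + 5):
            haystack[i] = value
    return kind, pos

rng = random.Random(7)
for generation in range(3):
    data = make_haystack(rng)
    kind, pos = plant_needle(rng, data)
    # A detector would be trained on thousands of such evolving variants,
    # learning the notion of "anomaly" rather than one fixed signature.
    print(generation, kind, pos)
```

Because both haystack and needle are regenerated per pass, a model trained on many such runs is pushed toward generalizing over anomaly shapes instead of memorizing one known threat, which is the claimed advantage over real-world data.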

Worth monitoring IvySys at https://www.ivysys.com/.

Stephen E Arnold, February 2, 2018

Enterprise Search: Will Synthetic Hormones Produce a Revenue Winner?

October 27, 2017

One of my colleagues provided me with a copy of the 24 page report with the hefty title:

In Search for Insight 2017. Enterprise Search and Findability Survey. Insights from 2012-2017

I stumbled on the phrase “In Search for Insight 2017.”


The report combines survey data with observations about what’s going to make enterprise search great again. I use the word “again” because:

  • The buy up and sell out craziness which culminated with Microsoft’s buying Fast Search & Transfer in 2008 and Hewlett Packard’s purchase of Autonomy in 2011 marked the end of the old-school enterprise search vendors. As you may recall, Fast Search was the subject of a criminal investigation and the HP Autonomy deal continues to make its way through the legal system. You may perceive these two deals as barn burners. I see them as capstones for the era during which search was marketed as the solution to information problems in organizations.
  • The word “search” has become confusing and devalued. For most people, “search” means the Danny Sullivan search engine optimization systems and methods. For those with some experience in information science, “search” means locating relevant information. SEO erodes relevance; the less popular connotation of the word suggests answering a user’s question. Not surprisingly, jargon has been used for many years in an effort to explain that “enterprise search” is infused with taxonomies, ontologies, semantic technologies, clustering, discovery, natural language processing, and other verbal chrome trim to make search into a Next Big Thing again. From my point of view, search is a utility and a code word for spoofing Google so that an irrelevant page appears instead of the answer the user seeks.
  • The enterprise search landscape (the title of one of my monographs) has been bulldozed and reworked. The money in the old school precision and recall type of search comes from consulting. Search Technologies was acquired by Accenture to add services revenue to the management consulting firm’s repertoire of MBA fixes. What is left are companies offering “solutions” which require substantial engineering, consulting, and training services. The “engines,” in many cases, are open source systems which one can download without burdensome license fees. From my point of view, search boils down to picking an open source solution. If those don’t work, one can license a proprietary system wrapped around open source. If one wants a proprietary system, there are some available, but these are not likely to reach the lofty heights of the Fast Search or Autonomy IDOL systems in the salad days of enterprise search and its promises of a universal search system. The universal search outfit Google pulled out of enterprise search for a reason.

I want to highlight five of the points in the 24 page write up. Please, register to get your own copy of this document.

Here are my five highlights. My comments are in italics after each quote from the document:


Big Data and Its Less-Than-Gentle Lessons

August 1, 2013

I read “9 Big Data Lessons Learned.” The write up is interesting because it explores the buzzword that every azure chip consultant has used in their marketing pitches over the last year. Some true believers have the words Big Data tattooed on their arms like those mixed martial arts fighters sporting the names of casinos. Very attractive I say.

Because “big data” has sucked up search, content processing, and analytics, the term is usually not defined. The “problems” of Big Data are ignored. Since not much works when it comes to search and content processing, use of another undefined term is not particularly surprising. What caught my attention is that Datamation reports about some “lessons” its real journalists have tracked down and verified.

Please, read the entire original write up to get the full nine lessons. I want to highlight three of them:

First, Datamation points out that getting data from Point A to Point B can be tricky. I think that once the data has arrived at Point B, the next task is to get the data into a “Big Data” system. Datamation does not provide any cost information in its statement “Don’t underestimate the data integration challenges.” I would point out that the migration task can be expensive. Real expensive.

Second, Datamation states, “Big Data success requires scale and speed.” I agree that scale and speed are important. Once again, Datamation does not bring these generalizations down to an accounting person’s desktop. Scale and speed cost money. Often a lot of money. In the analysis I did of “real time” a year or two ago, chopping latency down to a millisecond or two exponentiates the cost of scale and speed. Bandwidth and low latency storage are not sporting WalMart price tags.

Third, Datamation warns (maybe threatens) those with children in school and mortgages with, “If you’re not in the Big Data pool now, the lifespan of your career is shrinking by the day.” A couple of years ago this sentence would have said, “If you’re not in the social media pool now, the lifespan of your career is shrinking by the day.” How long will these all-too-frequent “next big things” sweep through information technology? I just learned that “CIO” means chief innovation officer. I also learned that the future of computing rests with synthetic biology.

The Big Data revolution is here. The problem is that the tools, the expertise, and the computational environment are inadequate for most Big Data problems. Companies with the resources like Google and Microsoft are trimming the data in order to get a handle on what today’s algorithms assert is important. Is it reasonable to think that most organizations can tackle Big Data when large organizations struggle to locate attachments in intra-organization email?

Reality has not hampered efforts to surf on the next big thing. Some waves are more challenging than others, however. I do like the fear angle. Nice touch at a time when senior managers are struggling to keep revenues and profits from drifting down. The hope is that Big Data will shore up products and services which are difficult to sell.

Catch the wave I suppose.

Stephen E Arnold, August 1, 2013

Sponsored by Xenky

Kapow Reinforces It Is a Big Data Platform

July 21, 2013

Short honk: Data integration, like search, is expanding. We noted a news release called “Kapow Software Quarterly Revenue Rises as Newly Acquired Customer Bookings and Subscriptions Fuel Growth.” The news release explains that a privately held firm is growing. The important point for me was this phrase: “a leading Big Data solution provider.”

The news release explains:

The Kapow Enterprise Big Data Integration Platform enables companies to integrate any cloud or on-premise data source using Kapow Software’s patented, intelligent integration workflows and Synthetic APIs™. Once the critical data is found and surgically extracted, Kapow Enterprise 9.2 delivers timely information to the workforce in an easily consumable form called Kapow Kapplets™ through an enterprise app library offering called the Kapow KappZone™. KappZones can be easily branded and distributed for employees to discover and use on any computing device they choose.

The Kapow Web site points out that the company’s business includes:

  • Content integration
  • Content migration
  • Legacy application integration
  • Enterprise search.

The company also offers three products: Katalyst, Kapplets, and KappZone. I find this semantic embrace fascinating and indicative of a trend in which vendors pretty much do anything related to information, which is, it seems, Big Data.

Stephen E Arnold, July 21, 2013

Sponsored by Xenky

LinkedIn Content Ripple: Possible Wave Amplification

April 19, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Google continues to make headlines. This morning (April 19, 2024) I flicked through the information in my assorted newsreaders. The coverage of Google’s calling the police and having alleged non-Googley professionals chatted up by law enforcement sparked many comments. One of those comments about this most recent demonstration of management mastery was from Dr. Timnit Gebru. My understanding of the Gebru incident is that she called attention to the bias in Google’s smart software systems and methods. She wrote a paper. Big thinkers at Google did not like the paper. The paper appeared, and Dr. Gebru disappeared from the Google payroll. I have oversimplified this remarkable management maneuver, but like some of Google’s synthetic data, I think I am close enough for horse shoes.


Is change coming to a social media service which has been quite homogeneous? Thanks, MSFT Copilot. How’s the security work coming?

Dr. Gebru posted a short item on LinkedIn, which is Microsoft’s professional social media service. Here’s what Dr. Gebru made available to LinkedIn’s members:

Not even 24 hrs after making history as the first company to mass fire workers for pro-Palestine protests, by summarily firing 28 people, Google announced that the “(ir)responsible AI org,” the one they created in response to firing me, is now reporting up the Israeli office, through an SVP there. Seems like they want us to know how forcefully and clearly they are backing this genocide.

To provide context, Dr. Gebru linked to a Medium (a begging for dollars information service). That article brandished the title “STATEMENT from Google Workers with the No Tech for Apartheid Campaign on Google’s Mass, Retaliatory Firings of Workers: [sic].” This Medium article is at this link. I am not sure if [a] these stories are going to require registration or payment to view and [b] the items will remain online.

What’s interesting about the Dr. Gebru item and her link is the comments made by LinkedIn members. These suggest that [a] most LinkedIn members either did not see Dr. Gebru’s post or were not motivated to click one of the “response” icons or [b] topics like Google’s management mastery are not popular with the LinkedIn audience.

Several observations based on my experience:

  1. Dr. Gebru’s use of LinkedIn may be a one-time shot, but on the other hand, it might provide ideas for others with a specific point of view to use as a platform
  2. With Apple’s willingness to remove Meta apps from the Chinese iPhone app store, will LinkedIn follow with its own filtering of content? I don’t know the answer to the question, but clicking on Dr. Gebru’s link will make it easy to track
  3. Will LinkedIn begin to experience greater pressure to allow content not related to self promotion and the search for business contacts? I have noticed an uptick in requests from what appear to be machine-generated images of preponderately young females asking, “Will you be my contact?” I routinely click, No, and I often add a comment along the lines of “I am 80 years old. Why do you want to interact with me?”

Net net: Change may be poised to test some of the professional social media service’s policies.

Stephen E Arnold, April 19, 2024

Harvard University: William James Continues Spinning in His Grave

March 15, 2024

This essay is the work of a dumb dinobaby. No smart software required.

William James, the brother of a novelist (my mind wanders just thinking about any one of his 20 novels), loved Harvard University. In a speech at Stanford University, he admitted his untoward affection. If one wanders by William’s grave in Cambridge Cemetery (daylight only, please), one can hear a sound similar to a giant sawmill blade emanating from a modest tombstone. “What’s that horrific sound?” a passer-by might ask. The answer: “William is spinning in his grave. It is a bit like a perpetual motion machine now,” one elderly person says. “And it is getting louder.”


William is spinning in his grave because his beloved Harvard appears to foster making stuff up. Thanks, MSFT Copilot. Working on security today or just getting printers to work?

William is amping up his RPMs. Another distinguished Harvard expert, professor, shaper of the minds of young men and women and thems has been caught fabricating data. This is not the overt synthetic data shop at Stanford University’s Artificial Intelligence Lab and the commercial outfit Snorkel. Nope. This is just a faculty member who, by golly, wanted to be respected it seems.

The Chronicle of Higher Education (the immensely popular online information service consumed by thumb typers and swipers) published “Here’s the Unsealed Report Showing How Harvard Concluded That a Dishonesty Expert Committed Misconduct.” (Registration required because, you know, information about education is sensitive and users must be monitored.) The report allegedly required 1,300 pages. I did not read it. I get the drift: Another esteemed scholar just made stuff up. In my lingo, the individual shaped reality to support her / its vision of self. Reality was not delivering honor, praise, rewards, money, and freedom from teaching horrific undergraduate classes. Why not take the Excel macro to achievement: Invent and massage information. Who is going to know?

The write up says:

the committee wrote that “she does not provide any evidence of [research assistant] error that we find persuasive in explaining the major anomalies and discrepancies.” Over all, the committee determined “by a preponderance of the evidence” that Gino “significantly departed from accepted practices of the relevant research community and committed research misconduct intentionally, knowingly, or recklessly” for five alleged instances of misconduct across the four papers. The committee’s findings were unanimous, except for in one instance. For the 2012 paper about signing a form at the top, Gino was alleged to have falsified or fabricated the results for one study by removing or altering descriptions of the study procedures from drafts of the manuscript submitted for publication, thus misrepresenting the procedures in the final version. Gino acknowledged that there could have been an honest error on her part. One committee member felt that the “burden of proof” was not met while the two other members believed that research misconduct had, in fact, been committed.

Hey, William, let’s hook you up to a power test dynamometer so we can determine exactly how fast you are spinning in your chill, dank abode. Of course, if the data don’t reveal high-RPM spinning, someone at Harvard can be enlisted to touch up the data. Everyone seems to be doing it, from my vantage point in rural Kentucky.

Is there a way to harness the energy of professors who may cut corners and respected but deceased scholars to do something constructive? Oh, look. There’s a protest group. Let’s go ask them for some ideas. On second thought… let’s not.

Stephen E Arnold, March 15, 2024

Stanford: Tech Reinventing Higher Education: I Would Hope So

March 15, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I read “How Technology Is Reinventing Education.” Essays like this one are quite amusing. The ideas flow without important context. Let’s look at this passage:

“Technology is a game-changer for education – it offers the prospect of universal access to high-quality learning experiences, and it creates fundamentally new ways of teaching,” said Dan Schwartz, dean of Stanford Graduate School of Education (GSE), who is also a professor of educational technology at the GSE and faculty director of the Stanford Accelerator for Learning. “But there are a lot of ways we teach that aren’t great, and a big fear with AI in particular is that we just get more efficient at teaching badly. This is a moment to pay attention, to do things differently.”


A university expert explains to a rapt audience that technology will make them healthy, wealthy, and wise. Well, that is what the marketing copy the lecturer recites promises. Thanks, MSFT Copilot. Are you security safe today? Oh, that’s too bad.

I would suggest that Stanford’s Graduate School of Education consider these probably unimportant points:

  • The president of Stanford University resigned allegedly because he fudged some data in peer-reviewed documents. True or false. Does it matter? The fellow quit.
  • The Stanford Artificial Intelligence Lab or SAIL innovated with cooking up synthetic data. Not only was synthetic data the fast food of those looking for cheap and easy AI training data, but Stanford also became super glued to the fake data movement, which may be good or may be bad. Hallucinating is easier if the models are trained using fake information, perhaps?
  • Stanford University produced some outstanding leaders in the high technology “space.” The contributions of famous graduates have delivered social media, shaped advertising systems, and interesting intelware companies which dabble in warfighting and saving lives from one versatile software and consulting platform.

The essay operates in smarter-than-you territory. It presents a view of the world which seems to be at odds with research results that are not reproducible, ethics-free researchers, and how silly it looks to someone in rural Kentucky when a president is accused of pulling a grade-school essay cheating trick.

Enough pontification. How about some progress in remediating certain interesting consequences of Stanford faculty and graduates innovations?

Stephen E Arnold, March 15, 2024

Is AI Another VisiCalc Moment?

February 14, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The easy-to-spot orange newspaper ran a quite interesting “essay” called “What the Birth of the Spreadsheet Can Teach Us about Generative AI.” Let me cut to the point when the fox is killed. AI is likely to be a job creator. AI has arrived at “the right time.” The benefits of smart software are obvious to a growing number of people. An entrepreneur will figure out a way to sell an AI gizmo that is easy to use, fast, and good enough.

In general, I agree. There is one point that the estimable orange newspaper chose not to include. The VisiCalc innovation converted old-fashioned ledger paper into software which could eliminate manual grunt work to some degree. The poster child of the next technology boom seems tailor-made to facilitate surveillance, weapons, and development of novel bio-agents.


AI is going to surprise some people more than others. Thanks, MSFT Copilot Bing thing. Not good but I gave up with the prompts to get a cartoon because you want to do illustrations. Sigh.

I know that spreadsheets are used by defense contractors, but the link between a spreadsheet and an AI-powered drone equipped with octanitrocubane variants is less direct. Sure, spreadsheets arrived in numerous use cases, some obvious, some not. But the capabilities for enabling a range of weapons systems strike me as far more obvious.

The Financial Times’s essay states:

Looking at the way spreadsheets are used today certainly suggests a warning. They are endlessly misused by people who are not accountants and are not using the careful error-checking protocols built into accountancy for centuries. Famous economists using Excel simply failed to select the right cells for analysis. An investment bank used the wrong formula in a risk calculation, accidentally doubling the level of allowable risk-taking. Biologists have been typing the names of genes, only to have Excel autocorrect those names into dates. When a tool is ubiquitous, and convenient, we kludge our way through without really understanding what the tool is doing or why. And that, as a parallel for generative AI, is alarmingly on the nose.

Smart software, however, is not a new thing. One can participate in quasi-religious disputes about whether AI is 20, 30, 40, or more years old. What’s interesting to me is that after chugging along like a mule cart on the Information Superhighway, AI is everywhere. Old-school British newspapers liken it to the spreadsheet. Entrepreneurs spend big bucks on Product Hunt roll outs. Owners of mobile devices can locate “pizza near me” without having to type, speak, or express an interest in a cardiologist’s favorite snack.

AI strikes me as a different breed of technology cat. Here are my reasons:

  1. Serious AI takes serious money.
  2. Big AI is going to be a cloud-linked service which invites consolidation just like those hundreds of US railroads became the glorious two player system we have today: One for freight and one for passengers who love trains more than flying or driving.
  3. AI systems are going to have to find a way to survive and thrive without becoming victims of content inbreeding and bizarre outputs fueled by synthetic data. VisiCalc spawned spreadsheet fever in humans from the outset. The difference is that AI does its work largely without humanoids.

Net net: The spreadsheet looks like a convenient metaphor. But metaphors are not the reality. Reality can surprise in interesting ways.

Stephen E Arnold, February 14, 2024

AI: Big Ideas and Bigger Challenges for the Next Quarter Century. Maybe, Maybe Not

February 13, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I read an interesting ArXiv.org paper with a good title: “Ten Hard Problems in Artificial Intelligence We Must Get Right.” The topic is one which will interest some policy makers, a number of AI researchers, and the “experts” in machine learning, artificial intelligence, and smart software.

The structure of the paper is, in my opinion, a three-legged stool analysis designed to support the weight of AI optimists. The first part of the paper is a compressed historical review of the AI journey. Diagrams, tables, and charts capture the direction in which AI “deep learning” has traveled. I am no expert in what has become the next big thing, but the surprising point in the historical review is that 2010 is the date pegged as the start of what becomes, by 2016, “the large scale era.” That label is interesting for two reasons. First, I recall that some intelware vendors were in the AI game before 2010. And, second, the use of the phrase “large scale” defines a reality in which small outfits are unlikely to succeed without massive amounts of money.

The second leg of the stool is the identification of the “hard problems” and a discussion of each. Research data and illustrations bring each problem to the reader’s attention. I don’t want to get snagged in the plagiarism swamp which has captured many academics, wives of billionaires, and a few journalists. My approach will be to boil down the 10 problems to a short phrase and a reminder to you, gentle reader, that you should read the paper yourself. Here is my version of the 10 “hard problems” which the authors seem to suggest will be or must be solved in 25 years:

  1. Humans will have extended AI by 2050
  2. Humans will have solved problems associated with AI safety, capability, and output accuracy
  3. AI systems will be safe, controlled, and aligned by 2050
  4. AI will make contributions in many fields; for example, mathematics by 2050
  5. AI’s economic impact will be managed effectively by 2050
  6. Use of AI will be globalized by 2050
  7. AI will be used in a responsible way by 2050
  8. Risks associated with AI will be managed effectively by 2050
  9. Humans will have adapted their institutions to AI by 2050
  10. Humans will have addressed what it means to be “human” by 2050

Many years ago I worked for a blue-chip consulting firm. I participated in a number of big-idea projects. These ranged across technology, R&D investment, new product development, and the global economy. In our for-fee reports we did include a look at what we called the “horizon.” The firm had its own typographical signature for this portion of a report. I recall learning in the firm’s “charm school” (a special training program to make sure new hires knew the style, approach, and ground rules for remaining employed at that blue-chip firm) that we kept the horizon tight; that is, talking about the future was typically in the six to 12 month range. Nosing out 25 years was a walk into a mine field. My boss, as I recall, told me, “We don’t do science fiction.”


The smart robot is informing the philosopher that he is free to find his future elsewhere. The date of the image is 2025, right before the new year holiday. Thanks, MidJourney. Good enough.

The third leg of the stool is the academic impedimenta. To be specific, the paper is 90 pages in length, of which 30 present the argument. The remaining 60 pages present:

  • Traditional footnotes, about 35 pages containing 607 citations
  • An “Electronic Supplement” presenting eight pages of annexes with text, charts, and graphs
  • Footnotes to the “Electronic Supplement” requiring another 10 pages for the additional 174 footnotes.

I want to offer several observations, and I do not want them to be less than constructive or in any way like what one of my professors endured in Letters to the Editor over an article he published about Chaucer. He described that fateful letter as “mean spirited.”

  1. The paper makes clear that mankind has some work to do in the next 25 years. The “problems” the paper presents are difficult ones because they touch upon the fabric of social existence. Consider the application of AI to war. I think this aspect of AI may be one to warrant a bullet on AI’s hit parade.
  2. Humans have to resolve issues of automated systems consuming verifiable information, synthetic data, and purpose-built disinformation so that smart software does not do things at speed and behind the scenes. Do those working to resolve the 10 challenges have an ethical compass, and if so, what does “ethics” mean in the context of at-scale AI?
  3. Social institutions are under stress. A number of organizations and nation-states operate as dictators. One Central American country has a rock star dictator, but what about the rock star dictators running techno feudal companies in the US? What governance structures will be crafted by 2050 to shape today’s technology juggernaut?

To sum up, I think the authors have tackled a difficult problem. I commend their effort. My thought is that any message of optimism about AI is likely to be hard pressed to point to one of the 10 challenges and say, “We have this covered.” I liked the write up. I think college students tasked with writing about the social implications of AI will find the paper useful. It provides much of the research a fresh young mind requires to write a paper, possibly a thesis. For me, the paper is a reminder of the disconnect between applied technology and the appallingly inefficient, convenience-embracing humans who are ensnared in the smart software.

I am a dinobaby, and let me tell you, “I am glad I am old.” With AI struggling with go-fast and regulators waffling about go-slow, humankind has quite a bit of social system tinkering to do by 2050 if the authors of the paper have analyzed AI correctly. Yep, I am delighted I am old, really old.

Stephen E Arnold, February 13, 2024
