SQL Made Easy: Better Than a Human? In Some Cases

January 9, 2023

Just a short item for anyone who has to formulate Structured Query Language queries. Years ago, SQL queries were routine for my research team. Today, the need has decreased. I have noticed that my recollection and muscle memory for SQL queries have eroded. Now there is a solution which seems to work reasonably well. Is the smart software as skilled as our precious Howard? Nope. But Howard lives in DC, and I am in rural Kentucky. Since neither of us likes email or telephones, we communicate via links to data available for download and analysis. Hey, the approach works for us. But back to SQL queries. Just navigate to TEXT2SQL.AI. Once you sign in using one of the popular privacy invasion methods, you can enter a free text statement and get a well-formed SQL query. Is the service useful? It may be. The downside is the overt data collection approach.
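
For readers whose SQL muscle memory has also eroded, here is a rough idea of what "free text in, well-formed query out" means. This is a hand-written sketch, not output from TEXT2SQL.AI. The tables, column names, and sample rows are invented for illustration, and the free text request is my own. The Python wrapper uses the standard sqlite3 module so the query can be run against an in-memory database.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        order_date TEXT NOT NULL  -- ISO 8601 date string
    );
    INSERT INTO customers (id, name) VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO orders (customer_id, order_date) VALUES
        (1, '2022-03-01'), (1, '2022-07-15'), (1, '2021-12-30'),
        (2, '2022-05-20'),
        (3, '2022-01-05'), (3, '2022-02-06'), (3, '2022-09-09');
""")

# Free text request: "Which two customers placed the most orders in 2022?"
# A hand-written query of the kind one would hope to get back.
QUERY = """
    SELECT c.name, COUNT(*) AS order_count
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.order_date >= '2022-01-01' AND o.order_date < '2023-01-01'
    GROUP BY c.id, c.name
    ORDER BY order_count DESC, c.name ASC
    LIMIT 2;
"""

rows = conn.execute(QUERY).fetchall()
for name, order_count in rows:
    print(name, order_count)
```

Whatever produces the query, it is worth running it against a small sample like this one before trusting it with real data.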

Stephen E Arnold, January 9, 2023

Confessions? It Is That Time of Year

December 23, 2022

Forget St. Augustine.

Big data, data science, or whatever you want to call it was the precursor to artificial intelligence. Tech people pursued careers in the field, but after the synergy and hype wore off the real work began. According to WD in his RYX,R blog post “Goodbye, Data Science,” the work was tedious, low-value, and unfulfilling, and it left little room for career growth.

WD worked as a data scientist for a few years, then quit in pursuit of the higher calling as a data engineer. He will be working on the implementation of data science instead of its origins. He explained why he left in four points:

• “The work is downstream of engineering, product, and office politics, meaning the work was only often as good as the weakest link in that chain.

• Nobody knew or even cared what the difference was between good and bad data science work. Meaning you could suck at your job or be incredible at it and you’d get nearly the same regards in either case.

• The work was often very low value-add to the business (often compensating for incompetence up the management chain).

• When the work’s value-add exceeded the labor costs, it was often personally unfulfilling (e.g. tuning a parameter to make the business extra money).”

WD’s experiences sound like everyone who is disenchanted with their line of work. He worked with managers who would not listen when they were told stupid projects would fail. The managers were more concerned with keeping their bosses and shareholders happy. He also mentioned that engineers are inflamed with self-grandeur and scientists are bad at code. He worked with young and older data people who did not know what they were doing.

As a data engineer, WD has more free time, more autonomy, and better prospects for career advancement, and he will continue to learn.

Whitney Grace, December 23, 2022

The Internet: Cue the Music. Hit It, Regrets, I Have Had a Few

December 21, 2022

I have been around online for a few years. I know some folks who were involved in creating what is called “the Internet.” I watched one of these luminaries unbutton his shirt and display a tee with the message, “TCP on everything.” Cute, cute, indeed. (I had the task of introducing this individual only to watch the disrobing and the P on everything joke. Tip: It was not a joke.)

Imagine my reaction when I read “Inventor of the World Wide Web Wants Us to Reclaim Our Data from Tech Giants.” The write up states:

…in an era of growing concern over privacy, he believes it’s time for us to reclaim our personal data.

Who wants this? Tim Berners-Lee and a startup. Is this content marketing or a sincere effort to derail the core functionality of ad trackers, beacons, cookies which expire in 99 years, etc., etc.?

The article reports:

Berners-Lee hopes his platform will give control back to internet users. “I think the public has been concerned about privacy — the fact that these platforms have a huge amount of data, and they abuse it,” he says. “But I think what they’re missing sometimes is the lack of empowerment. You need to get back to a situation where you have autonomy, you have control of all your data.”

The idea is that Web 3 will deliver a different reality.

Do you remember this lyric:

Yes, there were times I’m sure you knew
When I bit off more than I could chew
But through it all, when there was doubt
I ate it up and spit it out
I faced it all and I stood tall and did it my way.

The my becomes big tech, and it is the information highway. There’s no exit, no turnaround, and no real chance of change before I log off for the final time.

Yeah, digital regrets. How’s that working out at Amazon, Facebook, Google, Twitter, and Microsoft among others? Unintended consequences and now the visionaries are standing tall on piles of money and data.

Change? Sure, right away.

Stephen E Arnold, December 21, 2022

Google Did What? Misleading Users? Google!

November 15, 2022

In the midst of an economic downturn, most businesses try to avoid: [a] bad publicity regarding a sensitive issue and [b] paying lots of cash to US states. I suppose I could add [c] buying Twitter and [d] funding the metaverse, but let’s stick to the information in “Google Will Pay $392m to 40 States in Largest Ever US Privacy Settlement.”

For a big outfit like the Google my thought is that the negative publicity is more painful than writing checks. But advertisers are affected by the economic downturn and may be looking for ways to make sales without cutting deals with companies found guilty of user/customer surveillance.

The write up, which I assume is mostly on the money, says:

The states’ investigation was sparked by a 2018 Associated Press story, which found that Google continued to track people’s location data even after they opted out of such tracking by disabling a feature the company called “location history”.

The article points out:

It [the penalty] comes at a time of mounting unease over privacy and surveillance by tech companies that has drawn growing outrage from politicians and scrutiny by regulators.

Free services are great as long as users/customers don’t know exactly what’s happening. In the early days of the Google, there was not a generation interested in dinobaby ideas. Well, this decision suggests that some dinobabies with law degrees expect commercial enterprises to act with some sense of propriety.

The article makes clear exactly what Google did:

The attorneys general said Google misled users about its location tracking practices since at least 2014, violating state consumer protection laws. As part of the settlement, Google also agreed to make those practices more transparent to users. That includes showing them more information when they turn location account settings on and off and keeping a webpage that gives users information about the data Google collects.

Hmmm. What about targeted ads which miss their targets? Perhaps that’s an issue which will capture the attention of US attorneys general? Perhaps, but I am not optimistic. Awareness and subsequent legal processes move slowly, and slow is the friend of some firms.

Stephen E Arnold, November 15, 2022

One Tiny Point about Oracle

September 5, 2022

When Silicon Valley-type real news outfits “correct” one another, we tend to wonder why. In this case, it appears Gizmodo writer Matt Novak feels readers should know one key bit of information omitted by a recent Vox article: the fact that “Larry Ellison’s Oracle Started As a CIA Project.” He writes:

“Vox simply says that Oracle was founded in ‘the late 1970s’ and ‘sells a line of software products that help large and medium-sized companies manage their operations.’ All of which is true! But as the article continues, it somehow ignores the fact that Oracle has always been a significant player in the national security industry. And that its founder would not have made his billions without helping to build the tools of our modern surveillance state.”

One of those tools, of course, being the sort of database Oracle specializes in. The write-up emphasizes Ellison’s longstanding belief in a large federal database, asserting the attacks of 9/11 gave the tech tycoon the chance to push his vision. Novak quotes:

“‘The single greatest step we Americans could take to make life tougher for terrorists would be to ensure that all the information in myriad government databases was copied into a single, comprehensive national security database,’ Larry Ellison wrote in the New York Times in January of 2002. ‘Creating such a database is technically simple. All we have to do is copy information from the hundreds of separate law enforcement databases into a single database. A national security database could be built in a few months,’ Ellison explained. ‘A national security database combined with biometrics, thumb prints, hand prints, iris scans or whatever is best can be used to detect people with false identities.'”

We are not sure whether Novak is suggesting Vox deliberately downplayed Oracle’s role in facilitating a surveillance state infrastructure. He certainly wants us to know the company’s fortunes rose after that fateful day in September 2001, with federal government contracts making up 23 percent of its licensing revenue in 2003 to the tune of $2.5 billion. We are reminded Oracle’s David Carney stated in 2002, while trying but failing to avoid sounding callous, that 9/11 had been good for business. Perhaps Vox did not believe this facet of Oracle’s history to be relevant, but Gizmodo can consider us, dear readers, duly informed.

Cynthia Murrell, September 5, 2022

Data: A Disappointing Ride Down Zero Lane to Cell One

August 26, 2022

Projects meant to glean business insights through the analysis of vast troves of data still tend to disappoint. On its blog, British data-project management firm Brijj lists “5 Reasons Why 80% of Data and Insight Projects Fail.” The write-up tells us:

“In the UK alone, we spend £24bn on data projects every year. According to recent studies, however, organizational leadership has been dissatisfied with the value they get from data. In fact, they consider 80% of all data projects a failure. That equates to £19bn of waste. And why? Because so many don’t do the basics well. They never stood a chance.”

Not surprisingly, writer and Brijj founder/CEO Adrian Mitchell suggests consulting outside data experts from the start to make sure one’s project delivers those sweet, sweet insights:

“The bottom line is that both data creators and their business customers need to be involved in the data & insight project from the initial question through to the outcome and work closely together for it to provide actionable insights and urge action. Currently, there are many gaps between the two groups, resulting in disconnect, frustrations, time and financial losses, and no real-world outcomes. Organizations need to close these to truly harness the power of data and maximize its value.”

The list Mitchell offers looks awfully familiar; we think we have heard some of these “reasons” before. We are told the biggest problem is asking the wrong questions in the first place. Then there is, as mentioned above, a lack of collaboration between data analysts and their clients. If one has managed to gather useful bits of knowledge, they must be both communicated to the right people and made easy to find. Finally, standardized systems (like Brijj’s, we presume) should be put in place to make the whole process easier for the technically disinclined.

Perhaps Mitchell is right and these measures can help some companies make the most of the data they were persuaded to accumulate. It is worth keeping in mind, though, that any concepts derived by software have limitations… just like a blind date.

Cynthia Murrell, August 26, 2022

Quality Defined: Just Two Ways?

August 25, 2022

I am not sure what to make of “The Two Types of Quality.” A number of years ago I was in Osaka and Tokyo to deliver several lectures about commercial databases. The topic had to be narrowed, so I focused on what set apart commercially successful databases like those produced by the Courier Journal & the Louisville Times, the Petroleum Institute, and ERIC, among a handful of other must-have, professionally operated databases. I explained that database quality could be defined by technical requirements; for example, timeliness of record updating, assigning a specific number of index terms from a subject matter expert developed and maintained controlled vocabulary (term) list, accurate spelling, abstracts conforming to the database publishers’ editorial guidelines, etc. The other type of quality was determined by the user; for example, was the information provided by the database timely, accurate, and in line with the scope of the database? Neither definition of quality was particularly good. I made this point in my lectures. Quality is like any abstract noun. Context defines it. Today quality means, as I was told after a lecture in Germany, “good enough.” I thought the serious young person was joking. He was not. This professional, who worked for an important European Union department, embraced “good enough” as the definition of quality.

The cited essay explains that there are two types of quality. The first is “purely functional.” I think that’s close to my definition of quality for old-fashioned databases. They were expensive to produce, difficult to index in a useful, systematic way even with our human plus smart software systems, and quite difficult to explain to a person unfamiliar with the difference between looking up something using Google and looking up a chemical structure in Chemical Abstracts. When I was working full time, I had a tough time explaining that Google was neither comprehensive nor focused on consistency. Google wanted to sell ads. Popularity determined quality, but that’s not what “quality” means to a person researching a case in a commercial database of legal information.

The second is “quality that fascinates.” I must admit that this is related to my notion of context, but I am not sure that “fascination” is exactly what I mean by context. A ball of string can fascinate the cat owner as well as the cat. Is this quality? Not in my book.

Several observations:

  1. Quality cannot be defined. I do believe that a company, its products, and an individual can produce objects or services that serve a purpose and do not anger the user. Well, maybe not anger. How about annoy, frustrate, stress, or force a product or service change? It is also my perception that quality is becoming a lost art like chipping stone arrowheads.
  2. The word “quality” can be defined in terms of cost cutting. I use products and services that are not without big flaws. Whether it is getting Microsoft Windows to print or preventing a Tesla from exploding into flames, short cuts seem to create issues. These folks are not angels in my opinion.
  3. The marketers I met, many of whom were former journalists or art history majors, explain quality and other abstract terms in a way which obfuscates the verifiable behavior of a product or service. These folks are mendacious in my opinion.

Net net: Quality now means good enough.

That’s why nothing seems to work: Airport luggage handling, medical treatments of a president, electric vehicles, contacting a local government agency about a deer killed by a rolling smoke pickup truck driver, etc.

Quality products and services exist. Is it possible to find these using Bing, Google, or Yandex?

Nope.

Stephen E Arnold, August 25, 2022

TikTok: Allegations of Keylogging

August 22, 2022

I am not a TikTok person; therefore, I exist in a trend free zone. Others are sucking down short videos with alacrity. I admire a company, possibly linked to China’s government, which has pioneered a next generation video editor and caused the Alphabet Google YouTube DeepMind thing to innovate via its signature “me too” method of innovation.

Now TikTok has another feature, which is an interesting allegation. “TikTok’s In-App Browser Can Monitor Your Every Click and Keystroke” asserts:

When Krause [a security researcher] dug a little deeper into what these apps’ in-app browsers really do, he’d found that TikTok does some bad things, including monitoring all of users’ keyboard inputs and taps. So, if you open a web page inside of TikTok’s app, and enter your credit card details there, TikTok can access all of those details. TikTok is also the only app, out of all the apps Krause has looked into, that doesn’t even offer an option to open the link in the device’s default browser, forcing you to go through its own in-app browser.

Let’s assume this finding is spot on. First question: Does anyone care? Second question: So what?

I don’t have answers to either question. I do, however, have several observations:

  1. Oracle, for some reason, seems to care. The estimable database company is making an effort to find information that suggests TikTok data are kept in a cupboard. Only grandma can check out who will be an easy target for psychological manipulation. No results yet, but if TikTok is a neutral service, why’s Oracle involved?
  2. A number of Silicon Valley pundits have pointed out that TikTok is no big deal. That encapsulates the “so what” issue. “Put that head in the sand and opine forward” is the rule of thumb for these insightful folks.
  3. Keyloggers are a fave of certain actors. TikTok may have found them useful for benign purposes.

Quite an allegation.

Stephen E Arnold, August 22, 2022

The Expanding PR Challenge for Cyber Threat Intelligence Outfits

August 10, 2022

Companies engaged in providing specialized services to law enforcement and intelligence entities have to find a way to surf on the building wave of NSO Group backlash.

What do I mean?

As real journalists have taken an interest in specialized software and services, more scrutiny has come from reporters, financial analysts, and outfits like Citizen Lab.

The most recent example is the article which appeared in an online publication focused on gadgets. The write up is “These Companies Know When You’re Pregnant—And They’re Not Keeping It Secret. Gizmodo Identified 32 Brokers Selling Data on 2.9 Billion Profiles of U.S. Residents Pegged as Actively Pregnant or Shopping for Maternity Products.” The write up reports:

A Gizmodo investigation into some of the nation’s biggest data brokers found more than two dozen promoting access to datasets containing digital information on millions of pregnant and potentially pregnant people across the country. At least one of those companies also offered a large catalogue of people who were using the same sorts of birth control that’s being targeted by more restrictive states right now. In total, Gizmodo identified 32 different brokers across the U.S. selling access to the unique mobile IDs from some 2.9 billion profiles of people pegged as “actively pregnant” or “shopping for maternity products.” Also on the market: data on 478 million customer profiles labeled “interested in pregnancy” or “intending to become pregnant.”

To add some zest to the write up, the “real news” outfit provided a link to 32 companies allegedly engaged in such data aggregation, normalization, and provision. Here are the companies listed at the gadget blog’s link. Note: “sic” means this is the actual company name; “trendy” means very hip marketing.

123Push
Adprime Health
Adstra
Alike Audience
Anteriad (180byTwo)
Cross Pixel
Datastream Group
Dstillery (sic and trendy)
Epsilon
Experian
Eyeota (sic and trendy)
FieldTest
Fluent
Fyllo (sic)
LBDigital
Lighthouse (Ameribase Digital)
PurpleLab
Quotient
Reklaim (sic)
ShareThis
Skydeo
Stirista (Crosswalk) (sic)
TrueData
Valassis Digital
Weborama Inc
Ziff Davis
ZoomInfo (Clickagy)

How many of these do you recognize? Perhaps Experian, usually associated with pristine security practices and credit checks? What about Ziff Davis, the outfit which publishes blogs which reveal the inner workings of Microsoft and other “insider” information? Or ZoomInfo, an outfit once focused on executive information and now apparently identified as a source of information to make a pregnant teen fear the “parent talk”?

But the others? Most people won’t have a clue. Now keep in mind these are companies in the consumer information database business. Are there other firms with more imaginative sources of personal data than outfits poking around open source datasets, marketing companies with helpful log file data, and blossoming data scientists gathering information from retail outlets?

The answer is, “Yes, there are.”

That brings me to the building wave of NSO Group backlash. How does one bridge the gap between a government agency using NSO Group type tools and data?

The answer is that specialized software and services firms themselves are the building blocks, engineer-constructors, and architect-engineers of these important bridges.

So what’s the PR problem?

Each week interesting items of information surface. For example, cyber threat firms report new digital exploits. I read this morning about Cerebrate’s Redeemer. What’s interesting is that cyber threat firms provide software and services to block such malware, right? So the new threat appears to evade existing defense mechanisms. Isn’t this a circular proposition: Buy more cyber security. Learn about new threats. Ignore the fact that existing systems do not prevent the malware from scoring a home run? Iterate… iterate… iterate.

At some point, a “real news” outfit will identify the low profile engineers engaged in what might be called “flawed bridge engineering.”

Another PR problem is latent. People like the Kardashians are grousing about Instagram. What happens when influencers and maybe some intrepid “real journalists” push back against the firms collecting personal information very few people think of as enormously revelatory? Example: Who has purchased a “weapon” within a certain geofence? Or who has outfitted an RV with a mobile Internet rig? Or who has signed up for a Dark Web forum and accessed it with a made up user name?

Who provides these interesting data types?

The gadget blog is fixated on pregnancy because of the current news magnetism. Unfortunately the pursuit of clicks with what seems really significant does not provide much insight into the third party data businesses in the US, Israel, and other countries.

That’s the looming PR problem. Someone is going to step back and take a look at companies which do not want to become the subject of a gadget blog write up with a 30 plus word headline. In my opinion, that will happen, and that’s the reason certain third party data providers and specialized software and services firms face a crisis. These organizations have to sell to survive, except for a handful supported by their countries’ governments. If that marketing becomes too visible, then the gadget bloggers will out them.

What’s it mean when a cyber threat company hires a former mainstream media personality to bolster the company’s marketing efforts? I have some thoughts. Mine are colored by great sensitivity to the NSO Group and the allegations about its Pegasus specialized software. If these allegations are true, what better way to get personal data than suck it directly from a single target’s or group of targets’ mobile devices in real time?

Here are the chemical compounds in the data lab: The NSO Group-type technology which is increasingly understood and replicated. Gadget bloggers poking around data aggregators chasing ad and marketing service firms. Cyber threat companies trying to market themselves without being too visible.

The building wave is on the horizon, just moving slowly.

Stephen E Arnold, August 10, 2022

Oracle: Marketing Experience or MX = Zero?

August 10, 2022

How does one solve the problem MX = 0? One way is to set M to zero and X to zero and bingo! You have zero. If the information in the super select, restricted, juicy article called “Oracle Insiders Describe the Complete Chaos from Layoffs and Restructuring While Employees Brace for More” is accurate, the financially lucrative Oracle database system is unhappy with the firm’s marketing. Not just the snappy PowerPoint decks or the obedient database administrator documentation. Nope. Everything is apparently a bit of indigestion.

The write up, which as I have mentioned is super select, restricted, and juicy, is a bit jumbled. Nevertheless, I noted several observations I found interesting. Let me summarize the 1,100 word report this way: Lots of people from marketing and customer experience (whatever that is) have been fired. Okay. Now let’s look at the comments that struck me as significant. Keep in mind that I love Oracle. Yep, clients just pay those who can make the sleek, efficient, tightly integrated components hum like an electric motor on a fully functioning Ford F 150 Lightning. Here we go. (My comments appear in parentheses after each bullet.)

  • “The common verb to describe ACX is that they were obliterated,” said a person who works at Oracle. (I quite liked the use of the word “obliterated.” Was Oracle using a Predator launched flying ginsu management bomb or just an email or maybe a Zoom call?)
  • “There’s no marketing anymore…” (My question is, “Was there ever any marketing at Oracle?” Bombast, yes. Rah rah conferences. Jet flights after curfew at the San Jose airport. But marketing? In my opinion, no.)
  • “There’s a sense among many at Oracle of impending doom…” (Yep, upbeat stuff.)
  • “We’ve been kind of working like zombies the last couple of weeks because there’s just this sense of ‘What am I doing here?’” (The outfit on the former Sea World exit excels at management. Well, maybe it doesn’t? How does the Oracle hit above its weight? That’s a good question. Let’s ask Cerner about the electronic medical record business and its seamless functioning with the Oracle database, shall I? No I shall not.)
  • “…Oracle’s code base is so complicated that it can take years before engineers are fully up to speed with how everything works, and workers with over a decade of experience were cut…” (Ah, ha, Oracle is weeding out the dinobabies. Useless deadwood. A 20 something engineer can figure out where an entire database is hiding.)

Net net: I hate to suggest this, but perhaps some database types think using AWS, the GOOG, or the super secure MSFT data management systems is better, faster, and cheaper. Pick two.

Stephen E Arnold, August 10, 2022
