Datasette: Useful Tool for Crime Analysts
February 15, 2023
If you want to explore data sets, you may want to take a look at Datasette, a Swiss Army knife billed as an “open source multi-tool for exploring and publishing data.”
The project says,
It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API. Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world. It is part of a wider ecosystem of 42 tools and 110 plugins dedicated to making working with structured data as productive as possible.
A handful of demos are available. Worth a look.
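For a sense of what "data of any shape" might look like before it reaches a tool like Datasette, here is a minimal sketch that builds a small SQLite file with Python's standard library. The table name, columns, and incident rows are all invented for illustration; nothing here comes from the Datasette demos.

```python
import sqlite3

# Hypothetical sample data: these incident records are invented for illustration.
INCIDENTS = [
    ("2023-01-03", "burglary", "Ward 1"),
    ("2023-01-05", "vehicle theft", "Ward 2"),
    ("2023-01-09", "burglary", "Ward 2"),
    ("2023-01-12", "vandalism", "Ward 1"),
]


def build_database(path):
    """Create a small SQLite database with one table of incidents."""
    conn = sqlite3.connect(path)
    conn.execute("DROP TABLE IF EXISTS incidents")
    conn.execute(
        "CREATE TABLE incidents (occurred_on TEXT, offense TEXT, district TEXT)"
    )
    conn.executemany("INSERT INTO incidents VALUES (?, ?, ?)", INCIDENTS)
    conn.commit()
    return conn


def count_by_offense(conn):
    """Return (offense, count) rows, most frequent first, ties sorted by name."""
    query = (
        "SELECT offense, COUNT(*) AS n FROM incidents "
        "GROUP BY offense ORDER BY n DESC, offense"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_database("incidents.db")
    for offense, n in count_by_offense(conn):
        print(offense, n)
    conn.close()
```

With Datasette installed, running `datasette incidents.db` from a terminal should serve the resulting file as a browsable site. The same GROUP BY query could then be typed into its SQL box.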
Stephen E Arnold, February 15, 2023
Summarize for a Living: Should You Consider a New Career?
February 13, 2023
In the pre-historic age of commercial databases, many people earned money by reading and summarizing articles, documents, monographs, and consultant reports. In order to prepare and fact check a 130 word summary of an article in the Harvard Business Review in 1980, the cost to the database publisher worked out to something like $17 to $25 per summary for what I would call general business information. (If you want more information about this number, write benkent2020@yahoo.com, and maybe you will get more color on the number.) Flash forward to the present: The cost for a human to summarize an article in the Harvard Business Review has increased. That’s why it is necessary to pay to access and print an abstract from a commercial online service. Even with yesterday’s technology, the costs are a killer. Now you know why software that eliminates the human, the editorial review, the fact checking, and the editorial policies which define what is indexed, why, and what terms are assigned is a joke to many of those in the search and retrieval game.
I mention this because if you are in the A&I business (abstracting and indexing), you may want to take a look at HunKimForks ChatGPT Arxiv Extension. The idea is that ChatGPT can generate an abstract which is certainly less fraught with cost and management hassles than running one of the commercial database content generation systems dependent on humans, some with degrees in chemistry, law, or medicine.
Are the summaries any good? For the last 40 years abstracts and summaries have been, in my opinion, degrading. Fact checking is out the window along with editorial policies, style guidelines, and considered discussion of index terms, classification codes, time handling and signifying, among other useful knowledge attributes.
Three observations:
- Commercial database publishers may want to check out this early-days open source contribution
- Those engaged in abstracting, writing summaries of books, and generating distillations of turgid government documents (yep, blue chip consulting firms I am thinking of you) may want to think about their future
- Say “hello” to increasingly inaccurate outputs from smart software. Recursion and liquid algorithms are not into factual stuff.
Stephen E Arnold, February 13, 2023
SQL Made Easy: Better Than a Human? In Some Cases
January 9, 2023
Just a short item for anyone who has to formulate Structured Query Language queries. Years ago, SQL queries were a routine for my research team. Today, the need has decreased. I have noticed that my recollection and muscle memory for SQL queries have eroded. Now there is a solution which seems to work reasonably well. Is the smart software as skilled as our precious Howard? Nope. But Howard lives in DC, and I am in rural Kentucky. Since neither of us likes email or telephones, we communicate via links to data available for download and analysis. Hey, the approach works for us. But SQL queries? Just navigate to TEXT2SQL.AI. Once you sign in using one of the popular privacy invasion methods, you can enter a free text statement and get a well formed SQL query. Is the service useful? It may be. The downside is the overt data collection approach.
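To make "free text statement in, well formed SQL query out" concrete, here is a minimal sketch. The question, the `orders` table, and the rows are all hypothetical, and the SQL is a hand-written translation, not output captured from TEXT2SQL.AI or any other service. Running it against an in-memory SQLite database is one way to check that a generated query is at least well formed before trusting it.

```python
import sqlite3

# Free-text request of the kind one might type into a text-to-SQL tool.
QUESTION = "Which three customers spent the most in 2022?"

# A hand-written translation of QUESTION (an assumption, not service output).
SQL = """
SELECT customer, SUM(amount) AS total
FROM orders
WHERE order_date BETWEEN '2022-01-01' AND '2022-12-31'
GROUP BY customer
ORDER BY total DESC
LIMIT 3
"""

# Hypothetical rows, invented so the query has something to run against.
ORDERS = [
    ("Acme", "2022-02-01", 120.0),
    ("Birch", "2022-03-15", 300.0),
    ("Acme", "2022-07-04", 80.0),
    ("Cedar", "2021-12-31", 999.0),  # outside 2022, so the query ignores it
    ("Cedar", "2022-05-20", 50.0),
    ("Dune", "2022-09-09", 10.0),
]


def run_query(sql):
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(QUESTION)
    for customer, total in run_query(SQL):
        print(customer, total)
```

A syntactically valid query can still answer the wrong question, so a human check against known data remains worthwhile.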
Stephen E Arnold, January 9, 2023
Confessions? It Is That Time of Year
December 23, 2022
Forget St. Augustine.
Big data, data science, or whatever you want to call it was the precursor to artificial intelligence. Tech people pursued careers in the field, but after the synergy and hype wore off the real work began. According to WD in his RYX,R blog post: “Goodbye, Data Science,” the work was tedious, low-value, unfulfilling, and left little room for career growth.
WD worked as a data scientist for a few years, then quit in pursuit of the higher calling as a data engineer. He will be working on the implementation of data science instead of its origins. He explained why he left in four points:
• “The work is downstream of engineering, product, and office politics, meaning the work was only often as good as the weakest link in that chain.
• Nobody knew or even cared what the difference was between good and bad data science work. Meaning you could suck at your job or be incredible at it and you’d get nearly the same regards in either case.
• The work was often very low value-add to the business (often compensating for incompetence up the management chain).
• When the work’s value-add exceeded the labor costs, it was often personally unfulfilling (e.g. tuning a parameter to make the business extra money).”
WD’s experiences sound like those of anyone who is disenchanted with their line of work. He worked with managers who would not listen when they were told stupid projects would fail. The managers were more concerned with keeping their bosses and shareholders happy. He also mentioned that engineers are inflamed with self-grandeur and scientists are bad at code. He worked with young and older data people who did not know what they were doing.
As a data engineer, WD has more free time, more autonomy, better career advancement, and will continue to learn.
Whitney Grace, December 23, 2022
The Internet: Cue the Music. Hit It, Regrets, I Have Had a Few
December 21, 2022
I have been around online for a few years. I know some folks who were involved in creating what is called “the Internet.” I watched one of these luminaries unbutton his shirt and display a tee with the message, “TCP on everything.” Cute, cute, indeed. (I had the task of introducing this individual only to watch the disrobing and the P on everything joke. Tip: It was not a joke.)
Imagine my reaction when I read “Inventor of the World Wide Web Wants Us to Reclaim Our Data from Tech Giants.” The write up states:
…in an era of growing concern over privacy, he believes it’s time for us to reclaim our personal data.
Who wants this? Tim Berners-Lee and a startup. Content marketing or a sincere effort to derail the core functionality of ad trackers, beacons, cookies which expire in 99 years, etc., etc.?
The article reports:
Berners-Lee hopes his platform will give control back to internet users. “I think the public has been concerned about privacy — the fact that these platforms have a huge amount of data, and they abuse it,” he says. “But I think what they’re missing sometimes is the lack of empowerment. You need to get back to a situation where you have autonomy, you have control of all your data.”
The idea is that Web 3 will deliver a different reality.
Do you remember this lyric:
Yes, there were times I’m sure you knew
When I bit off more than I could chew
But through it all, when there was doubt
I ate it up and spit it out
I faced it all and I stood tall and did it my way.
The my becomes big tech, and it is the information highway. There’s no exit, no turnaround, and no real chance of change before I log off for the final time.
Yeah, digital regrets. How’s that working out at Amazon, Facebook, Google, Twitter, and Microsoft among others? Unintended consequences and now the visionaries are standing tall on piles of money and data.
Change? Sure, right away.
Stephen E Arnold, December 21, 2022
Google Did What? Misleading Users? Google!
November 15, 2022
In the midst of an economic downturn, most businesses try to avoid: [a] bad publicity regarding a sensitive issue and [b] paying lots of cash to US states. I suppose I could add [c] buying Twitter and [d] funding the metaverse, but let’s stick to the information in “Google Will Pay $392m to 40 States in Largest Ever US Privacy Settlement.”
For a big outfit like the Google my thought is that the negative publicity is more painful than writing checks. But advertisers are affected by the economic downturn and may be looking for ways to make sales without cutting deals with companies found guilty of user/customer surveillance.
The write up, which I assume is mostly on the money, says:
The states’ investigation was sparked by a 2018 Associated Press story, which found that Google continued to track people’s location data even after they opted out of such tracking by disabling a feature the company called “location history”.
The article points out:
It [the penalty] comes at a time of mounting unease over privacy and surveillance by tech companies that has drawn growing outrage from politicians and scrutiny by regulators.
Free services are great as long as users/customers don’t know exactly what’s happening. In the early days of the Google, there was not a generation interested in dinobaby ideas. Well, this decision suggests that some dinobabies with law degrees expect commercial enterprises to act with some sense of propriety.
The article makes clear exactly what Google did:
The attorneys general said Google misled users about its location tracking practices since at least 2014, violating state consumer protection laws. As part of the settlement, Google also agreed to make those practices more transparent to users. That includes showing them more information when they turn location account settings on and off and keeping a webpage that gives users information about the data Google collects.
Hmmm. What about targeted ads which miss their targets? Perhaps that’s an issue which will capture the attention of US attorneys general? Perhaps, but I am not optimistic. Awareness and subsequent legal processes move slowly, and slow is the friend of some firms.
Stephen E Arnold, November 15, 2022
One Tiny Point about Oracle
September 5, 2022
When Silicon Valley-type real news outfits “correct” one another, we tend to wonder why. In this case, it appears Gizmodo writer Matt Novak feels readers should know one key bit of information omitted by a recent Vox article: the fact that “Larry Ellison’s Oracle Started As a CIA Project.” He writes:
“Vox simply says that Oracle was founded in ‘the late 1970s’ and ‘sells a line of software products that help large and medium-sized companies manage their operations.’ All of which is true! But as the article continues, it somehow ignores the fact that Oracle has always been a significant player in the national security industry. And that its founder would not have made his billions without helping to build the tools of our modern surveillance state.”
One of those tools, of course, being the sort of database Oracle specializes in. The write-up emphasizes Ellison’s longstanding belief in a large federal database, asserting the attacks of 9/11 gave the tech tycoon the chance to push his vision. Novak quotes:
“‘The single greatest step we Americans could take to make life tougher for terrorists would be to ensure that all the information in myriad government databases was copied into a single, comprehensive national security database,’ Larry Ellison wrote in the New York Times in January of 2002. ‘Creating such a database is technically simple. All we have to do is copy information from the hundreds of separate law enforcement databases into a single database. A national security database could be built in a few months,’ Ellison explained. ‘A national security database combined with biometrics, thumb prints, hand prints, iris scans or whatever is best can be used to detect people with false identities.'”
We are not sure whether Novak is suggesting Vox deliberately downplayed Oracle’s role in facilitating a surveillance state infrastructure. He certainly wants us to know the company’s fortunes rose after that fateful day in September 2001, with federal government contracts making up 23 percent of its licensing revenue in 2003 to the tune of $2.5 billion. We are reminded Oracle’s David Carney stated in 2002, while trying but failing to avoid sounding callous, that 9/11 had been good for business. Perhaps Vox did not believe this facet of Oracle’s history to be relevant, but Gizmodo can consider us, dear readers, duly informed.
Cynthia Murrell, September 5, 2022
Data: A Disappointing Ride Down Zero Lane to Cell One
August 26, 2022
Projects meant to glean business insights through the analysis of vast troves of data still tend to disappoint. On its blog, British data-project management firm Brijj lists “5 Reasons Why 80% of Data and Insight Projects Fail.” The write-up tells us:
“In the UK alone, we spend £24bn on data projects every year. According to recent studies, however, organizational leadership has been dissatisfied with the value they get from data. In fact, they consider 80% of all data projects a failure. That equates to £19bn of waste. And why? Because so many don’t do the basics well. They never stood a chance.”
Not surprisingly, writer and Brijj founder/CEO Adrian Mitchell suggests consulting outside data experts from the start to make sure one’s project delivers those sweet, sweet insights:
“The bottom line is that both data creators and their business customers need to be involved in the data & insight project from the initial question through to the outcome and work closely together for it to provide actionable insights and urge action. Currently, there are many gaps between the two groups, resulting in disconnect, frustrations, time and financial losses, and no real-world outcomes. Organizations need to close these to truly harness the power of data and maximize its value.”
The list Mitchell offers looks awfully familiar; we think we have heard some of these “reasons” before. We are told the biggest problem is asking the wrong questions in the first place. Then there is, as mentioned above, a lack of collaboration between data analysts and their clients. If one has managed to gather useful bits of knowledge, they must be both communicated to the right people and made easy to find. Finally, standardized systems (like Brijj’s, we presume) should be put in place to make the whole process easier for the technically disinclined.
Perhaps Mitchell is right and these measures can help some companies make the most of the data they were persuaded to accumulate? It is worth keeping in mind, though, that any concepts derived by software have limitations… just like a blind date.
Cynthia Murrell, August 26, 2022
Quality Defined: Just Two Ways?
August 25, 2022
I am not sure what to make of “The Two Types of Quality.” A number of years ago I was in Osaka and Tokyo to deliver several lectures about commercial databases. The topic had to be narrowed, so I focused on the differences between a commercially successful database like those produced by the Courier Journal & the Louisville Times, the Petroleum Institute, and ERIC, among a handful of other must-have professionally operated databases. I explained that database quality could be defined by technical requirements; for example, timeliness of record updating, assigning a specific number of index terms from a subject matter expert developed and maintained controlled vocabulary (term) list, accurate spelling, abstracts conforming to the database publishers’ editorial guidelines, etc. The other type of quality was determined by the user; for example, was the information provided by the database timely, accurate, and in line with the scope of the database? Neither definition of quality was particularly good. I made this point in my lectures. Quality is like any abstract noun. Context defines it. Today quality means, as I was told after a lecture in Germany, “good enough.” I thought the serious young person was joking. He was not. This professional, who worked for an important European Union department, embraced “good enough” as the definition of quality.
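The "technical requirements" flavor of quality lends itself to a checklist a machine can run. What follows is a minimal sketch under loud assumptions: the vocabulary, the term counts, the word limit, and the 30 day window are all hypothetical values I made up, not the editorial policy of any real database publisher.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical editorial policy; every value below is an assumption.
CONTROLLED_VOCABULARY = {"banking", "marketing", "petroleum", "education"}
MIN_TERMS, MAX_TERMS = 2, 4
MAX_ABSTRACT_WORDS = 150
MAX_DAYS_TO_INDEX = 30


@dataclass
class Record:
    title: str
    abstract: str
    index_terms: list
    published: date
    indexed: date


def quality_problems(record):
    """Return a list of policy violations; an empty list means the record passes."""
    problems = []
    unknown = [t for t in record.index_terms if t not in CONTROLLED_VOCABULARY]
    if unknown:
        problems.append("terms not in controlled vocabulary: " + ", ".join(unknown))
    if not MIN_TERMS <= len(record.index_terms) <= MAX_TERMS:
        problems.append("wrong number of index terms")
    if len(record.abstract.split()) > MAX_ABSTRACT_WORDS:
        problems.append("abstract too long")
    if (record.indexed - record.published).days > MAX_DAYS_TO_INDEX:
        problems.append("record indexed late")
    return problems
```

Note what the sketch cannot do: it says nothing about whether the abstract is accurate or whether the user found the record useful, which is the second kind of quality.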
The cited essay explains that there are two types of quality. The first is “purely functional.” I think that’s close to my definition of quality for old-fashioned databases. These were expensive to produce, difficult to index in a useful, systematic way even with our human plus smart software systems, and quite difficult to explain to a person unfamiliar with the difference between looking up something using Google and looking up a chemical structure in Chemical Abstracts. When I was working full time, I had a tough time explaining that Google was neither comprehensive nor focused on consistency. Google wanted to sell ads. Popularity determined quality, but that’s not what “quality” means to a person researching a case in a commercial database of legal information.
The second is “quality that fascinates.” I must admit that this is related to my notion of context, but I am not sure that “fascination” is exactly what I mean by context. A ball of string can fascinate the cat owner as well as the cat. Is this quality? Not in my book.
Several observations:
- Quality cannot be defined. I do believe that a company, its products, and an individual can produce objects or services that serve a purpose and do not anger the user. Well, maybe not anger. How about annoy, frustrate, stress, or force a product or service change? It is also my perception that quality is becoming a lost art like chipping stone arrowheads.
- The word “quality” can be defined in terms of cost cutting. I use products and services that are not without big flaws. Whether it is getting Microsoft Windows to print or preventing a Tesla from exploding into flames, short cuts seem to create issues. These folks are not angels in my opinion.
- The marketers (many of those I met were former journalists or art history majors) explain quality and other abstract terms in a way which obfuscates the verifiable behavior of a product or service. These folks are mendacious in my opinion.
Net net: Quality now means good enough.
That’s why nothing seems to work: Airport luggage handling, medical treatments of a president, electric vehicles, contacting a local government agency about a deer killed by a rolling smoke pickup truck driver, etc.
Quality products and services exist. Is it possible to find these using Bing, Google, or Yandex?
Nope.
Stephen E Arnold, August 25, 2022
TikTok: Allegations of Keylogging
August 22, 2022
I am not a TikTok person; therefore, I exist in a trend free zone. Others are sucking down short videos with alacrity. I admire a company, possibly linked to China’s government, which has pioneered a next generation video editor and caused the Alphabet Google YouTube DeepMind thing to innovate via its signature “me too” method of innovation.
Now TikTok has another feature, which is an interesting allegation. “TikTok’s In-App Browser Can Monitor Your Every Click and Keystroke” asserts:
When Krause [a security researcher] dug a little deeper into what these apps’ in-app browsers really do, he’d found that TikTok does some bad things, including monitoring all of users’ keyboard inputs and taps. So, if you open a web page inside of TikTok’s app, and enter your credit card details there, TikTok can access all of those details. TikTok is also the only app, out of all the apps Krause has looked into, that doesn’t even offer an option to open the link in the device’s default browser, forcing you to go through its own in-app browser.
Let’s assume this finding is spot on. First question: Does anyone care? Second question: So what?
I don’t have answers to either question. I do, however, have several observations:
- Oracle, for some reason, seems to care. The estimable database company is making an effort to find information that suggests TikTok data are kept in a cupboard. Only grandma can check out who will be an easy target for psychological manipulation. No results yet, but if TikTok is a neutral service, why’s Oracle involved?
- A number of Silicon Valley pundits have pointed out that TikTok is no big deal. That encapsulates the “so what” issue. “Put that head in the sand and opine forward” is the rule of thumb for these insightful folks.
- Keyloggers are a fave of certain actors. TikTok may have found them useful for benign purposes.
Quite an allegation.
Stephen E Arnold, August 22, 2022