Twitter and the Fire Hose for Academics

January 29, 2021

I read “Enabling the Future of Academic Research with the Twitter API.” According to the official Twitter statement:

Our developer platform hasn’t always made it easy for researchers to access the data they need, and many have had to rely on their own resourcefulness to find the right information.

Understatement, of course.

The post continues:

We’ve also made improvements to help academic researchers use Twitter data to advance their disciplines, answer urgent questions during crises, and even help us improve Twitter.

Help is sometimes, well, helpful. But self-help is often a positive step; for example, verifying the actual identity of a person who uses the tweeter thing. There are some software robots chugging along, I believe.

Also, charging a subscription fee. The amount is probably less important than obtaining verifiable bank information. Sure, some software robots have accounts at outstanding institutions like Credit Suisse and HSBC, but whatever account data are available might be helpful under certain circumstances.

But academics? How many academics work for nongovernmental or governmental entities as experts, analysts, and advisors? Will the tweeter thing’s new initiative take such affiliations into account before and during usage of Twitter data?

I assume that a tweeter senior manager will offer an oracular comment like, “For sure.”

There are three hoops through which the agile academic must jump, and I quote:

  1. You are either a master’s student, doctoral candidate, post-doc, faculty, or research-focused employee at an academic institution or university.
  2. You have a clearly defined research objective, and you have specific plans for how you intend to use, analyze, and share Twitter data from your research…
  3. You will use this product track for non-commercial purposes….

Sounds like a plan which will make some nation states’ academics wriggle with anticipative joy.

My view is that this new initiative may unfold in interesting ways. But I am sure the high school science club managers have considered such possibilities. Why, who would hire a graduate student to access tweeter outputs to obtain actionable information for use by a country’s intelligence professionals? The answer in the twitterverse is, “Who would risk losing the trust of Twitter by doing that?” Certainly not an academic funded by an intelligence or law enforcement entity.

Right, no one. Misuse the tweeter? Inconceivable.

Stephen E Arnold, January 29, 2021

Tweet This! Real News Discovers the Concept of Hidden in Plain Sight

December 31, 2020

Remember the Purloined Letter? No, that’s okay. Thumbtypers don’t either. I read “Just How Bad Was This Year? These Professors Found Answers on Twitter.” I noted this passage:

Since 2008, the duo [professors at a school in Vermont] has taken a random 10 percent of everything tweeted each day, seeking truths hidden in plain sight. (While acknowledging, as Danforth put it, that “Twitter is a nonuniform subsample of utterances made by a nonuniform subsample of humans who are on the Internet.”) They’ve used it, for example, to explore fame, finding that Donald Trump and K-pop band BTS are mentioned as commonly as some regular words (think: “after,” “would.”). As Dodds put it, “The word ‘Trump’ has been in the top 300 words all year this year, which he’s never done before. That’s more common than the word ‘God.’”

The sampling is done by the Hedonometer, possibly a reference to either a town in England or a unit of pleasure used to theoretically weigh people’s happiness. I like the latter candidate, split infinitive, and the weird idea of “weighing” happiness. I often say to the grocery clerk in Harrod’s Creek, Kentucky, “I will take a pound of happiness and a half pound of ricotta, please.”
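The “weighing” is less exotic than a grocery scale. As I understand the published Hedonometer approach, each common word carries an average happiness rating from human raters on a 1 to 9 scale, and a day’s score is the frequency-weighted mean of those ratings, often after setting aside near-neutral words. Here is a minimal Python sketch under those assumptions. The word scores, the sample day, the 4 to 6 neutral band, and the name `hedonometer_score` are all made up for illustration; this is not the professors’ code.

```python
from collections import Counter

# Hypothetical happiness ratings on a 1 (sad) to 9 (happy) scale.
# The real word list is crowd-rated; these numbers are invented.
WORD_SCORES = {
    "christmas": 8.0,
    "happy": 8.3,
    "after": 5.1,
    "would": 5.0,
    "shooting": 2.2,
    "sad": 2.4,
}


def hedonometer_score(words, scores=WORD_SCORES, neutral_band=(4.0, 6.0)):
    """Frequency-weighted average happiness of the scored words in `words`.

    Words without a score, or with a score strictly inside `neutral_band`
    (an assumed filter for near-neutral words), are ignored.
    Returns None if nothing is left to average.
    """
    low, high = neutral_band
    counts = Counter(w.lower() for w in words)
    weighted_sum = 0.0
    total_weight = 0
    for word, freq in counts.items():
        h = scores.get(word)
        if h is None or low < h < high:
            continue  # unscored or near-neutral: does not move the needle
        weighted_sum += h * freq
        total_weight += freq
    return weighted_sum / total_weight if total_weight else None


day = "happy happy christmas after sad".split()
print(hedonometer_score(day))
```

In the made-up day, “after” drops out as neutral, and the two mentions of “happy” count twice. That is the whole trick: frequent words pull the average harder.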

The big find seems to be:

Some trends have emerged through the years. All else being equal, Saturday is the week’s happiest day on Twitter, Tuesday the saddest. National holidays cause huge spikes in happiness, with Christmas being the most cheerful. Major sporting events and birthdays of pop stars, particularly K-pop stars, tend to make for gleeful days. On the flip side, natural disasters and mass shootings tend to spark more unhappy days.

What does the analysis reveal?

“In the last five years, we’ve seen the usual weekly cycle just get busted,” Dodds added. “It’s sort of all over the place now. Events are happening any day of the week. It’s much more what I would call emotional turbulence.”

Remarkable in a way, a modest way.

Stephen E Arnold, December 31, 2020

Saddle Up, Statistical Analytics Fans. Place Your Bets on AI

December 7, 2020

“The Best Way to Win a Horse Race? Mathematicians May Have the Answer” is an interesting example of a snappy headline not supported by the write-up’s text. For gamblers, the promise of finding a way to predict which mighty steed will cross the finish line first is catnip. (Sorry for the mixed animal metaphor, but I could not resist.) The problem is that the summary of the study includes lots of references to data collection and number crunching. Then the killer statement:

Various other scientific attempts to explain performance over the past 4 decades “haven’t been particularly successful,” he says—and not just because horses vary so much in body size and aerobic capacity: The models cannot account for the horse’s own behaviors. For example, a horse might give up when another horse passes it, because it doesn’t understand that it’s supposed to win. Until researchers can get inside the horse’s head and account for psychological variables, Knight says, “we can’t truly model performance.”

Net net: If one can’t model an equine, what does that suggest for figuring out which human will cross an innovation finish line, crack a tough problem, or write a headline which does not mislead the pony player?

Stephen E Arnold, December 7, 2020

Clarity: A Better Name Than Pluton. Pluton?

November 20, 2020

After two years, Clarity has finally made it out of Beta, we learn from “Microsoft Clarity Debuts as Free Analytics Tool with Heat Maps” at Search & Performance Marketing Daily. The free tool uses heat maps to analyze the behavior of visitors to one’s website. Reporter Laurie Sullivan writes:

“Clarity — designed to have a low impact on page-load times and there are no caps on traffic no matter what the number of visitors to the website — helps give marketers a deeper understanding of why a website performs one way and not another. It also provides anonymized heat maps and data that show where site visitors clicked and scrolled, and enables marketers to analyze user behavior on the website exactly as it happened through a job description code. Some of the data includes the name of the browser, and whether they are using a PC, tablet or mobile phone to access the site. Heat maps provide a visual way to examine large numbers of site visitor interactions. Microsoft built two types: click maps and scroll maps. While the heat maps tell marketers which pages get the most clicks, the click maps tell marketers what website page content visitors interact with the most. Areas in the map marked in red have the highest frequency of clicks and are usually centered on focal points.”
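A click map is, at bottom, a two-dimensional histogram. The quoted description does not say how Clarity computes its maps, so here is a generic Python sketch, assuming clicks arrive as (x, y) page coordinates in pixels and an arbitrary 50-pixel grid. The names `click_map` and `hottest_cells` are hypothetical.

```python
from collections import Counter


def click_map(clicks, cell_size=50):
    """Bin (x, y) pixel coordinates into square grid cells and count them.

    Returns a Counter keyed by (column, row). Coloring the busiest cells
    red and the quiet ones blue is what turns the counts into a heat map.
    """
    return Counter((x // cell_size, y // cell_size) for x, y in clicks)


def hottest_cells(grid, n=3):
    """The n cells with the most clicks, busiest first."""
    return grid.most_common(n)


clicks = [(10, 12), (30, 40), (49, 49), (60, 10), (400, 300), (420, 310)]
grid = click_map(clicks)
print(hottest_cells(grid, 2))
```

The cell size is a design choice: smaller cells show exactly where people click; larger cells smooth out jitter from thousands of visitors.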

The heat maps let marketers know whether visitors are clicking where they want them to. The tool also reports certain behaviors: excessive scrolling, dead clicks, and rage clicks. The last term describes users clicking several times on a spot they believe should be a hyperlink but is not; one would want to either fix an intended link or tweak the graphics on those spots. The tool also supplies a dashboard that presents metrics of the overall traffic patterns, time spent on the site, and even concurrent JavaScript errors. Microsoft pledges Clarity complies with the EU’s General Data Protection Regulation.
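A rage click can be approximated with a simple rule: several clicks, close together in both space and time. The write-up does not give Clarity’s thresholds, so the three-click, one-second, 30-pixel values below are assumptions, as are the `Click` record and the `find_rage_clicks` helper. This is a Python sketch of the idea, not Microsoft’s detector.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Click:
    t_ms: int  # timestamp in milliseconds
    x: int     # page x coordinate in pixels
    y: int     # page y coordinate in pixels


def find_rage_clicks(clicks, min_clicks=3, window_ms=1000, radius_px=30):
    """Return bursts of clicks that look like rage clicks.

    A burst is a run of at least `min_clicks` consecutive clicks, each
    within `window_ms` of the burst's first click and within `radius_px`
    of its position. All three thresholds are illustrative assumptions.
    """
    clicks = sorted(clicks, key=lambda c: c.t_ms)
    bursts = []
    i = 0
    while i < len(clicks):
        first = clicks[i]
        j = i + 1
        while (
            j < len(clicks)
            and clicks[j].t_ms - first.t_ms <= window_ms
            and hypot(clicks[j].x - first.x, clicks[j].y - first.y) <= radius_px
        ):
            j += 1
        if j - i >= min_clicks:
            bursts.append(clicks[i:j])
            i = j  # skip past the burst so it is reported only once
        else:
            i += 1
    return bursts


session = [Click(0, 100, 100), Click(200, 102, 99), Click(380, 101, 103),
           Click(5000, 500, 40)]
print(len(find_rage_clicks(session)))
```

Three quick jabs at nearly the same spot form one burst; the lone click five seconds later does not. A dead-click check would be the same idea minus the repetition: a click on an element that does nothing.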

But Pluton, Microsoft’s mystery processor? Pluton?

Cynthia Murrell, November 20, 2020

Surveys: These Marketing Devices Are Accurate, Right?

November 10, 2020

There’s nothing like a sample, a statistical sample, that is. What’s interesting is that the US polls seem to have been reflecting marketing-type trends. The bastion of “real journalism,” the UK Daily Mail, published “…We Did a Good Job: Defiant Pollster Nate Silver Rushes to Defend His Profession after Another Systematic Failure of Polls in the Build-Up to an Election.” Bibliophiles will note that I have omitted the tasteful obscenity. I like to avoid using words likely to irritate the really smart software which edits blog posts.

The write up points out:

FiveThirtyEight founder and editor-in-chief Nate Silver hit back at those slamming the website for being so off with their election predictions.

Let’s think about why FiveThirtyEight and other polls seem to have predicted a reality different from the one generated by humanoids marking ballots.

First, there is the sample. Picking people at random depends on a number of factors: sources, selection bias, humanoids who don’t respond, etc.

Second, there are the humanoids themselves. Some people plug in the “answers” which get the poll over with really fast. I lose interest at the first hint of dark patterns which make it tough to know how many questions I have to answer to get the coupon, pat on the head, or the free shopping sack.

Third, there is counting. Yep, whether humans or machines do it, things can happen.

Fourth, there is analysis. It is remarkable what one can do when counting or doing “analytics.”
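On the first item, the sample: the familiar “plus or minus three points” covers only random luck in who gets picked. It says nothing about who declines to answer. Here is a minimal Python sketch with made-up numbers: a 1,000-person poll of a population split 50/50, where one side is assumed to respond at 5 percent and the other at 6 percent. The response rates are hypothetical, chosen only to show the effect.

```python
from math import sqrt


def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation 95% interval for a proportion.

    Assumes a simple random sample; it does not account for who refuses.
    """
    return z * sqrt(p * (1 - p) / n)


def expected_poll_share(true_share, rate_a, rate_b):
    """Share of group A among respondents when A and B answer at different rates."""
    a = true_share * rate_a
    b = (1 - true_share) * rate_b
    return a / (a + b)


moe = margin_of_error(0.5, 1000)
poll = expected_poll_share(0.5, 0.05, 0.06)
print(f"margin of error: +/- {moe:.1%}")
print(f"true share 50.0%, expected poll reading {poll:.1%}")
```

The margin of error comes out near 3.1 points, but the expected reading is about 45.5 percent, a miss of roughly 4.5 points. Quadrupling the sample to 4,000 halves the margin of error to about 1.5 points and leaves the 45.5 percent untouched. More interviews do not fix who picks up the phone.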

The Daily Mail quotes an expert about making polls better:

‘The polling profession needs to reshape and reorganize their questionnaires,’ Luntz [the polling expert] told DailyMail.com. ‘It’s the only way they’ll ever get it right.’

But I keep thinking about the FiveThirtyEight obscenity. Defensive? Eloquent? Subjective? Insightful?

That subjective thing.

Stephen E Arnold, November 10, 2020

Linear Math Textbook: For Classroom Use or Individual Study

October 30, 2020

Jim Hefferon’s Linear Algebra is a math textbook. You can get it for free by navigating to this page. From Mr. Hefferon’s Web page for the book, you can download a copy and access a range of supplementary materials. These include:

  • Classroom slides
  • Exercise sets
  • A “lab” manual which requires Sage
  • Video.

The book is designed for students who have completed one semester of calculus. Remember: Linear algebra is useful for poking around in search or neutralizing drones. Zaap. Highly recommended.
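For readers wondering how linear algebra touches search: in the classic vector space model, documents and queries become term-count vectors, and ranking by the cosine of the angle between them is a dot product divided by two lengths. A small Python sketch with a made-up three-document collection follows. Real engines add weighting such as TF-IDF, which this sketch omits.

```python
from collections import Counter
from math import sqrt


def vectorize(text, vocab):
    """Term-count vector for `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]


def cosine(u, v):
    """Cosine similarity dot(u, v) / (|u| * |v|); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = {
    "d1": "linear algebra for search",
    "d2": "drones and radar",
    "d3": "search ranking with linear models",
}
query = "linear search"

vocab = sorted({w for text in docs.values() for w in text.lower().split()})
q = vectorize(query, vocab)
ranked = sorted(docs, key=lambda d: cosine(q, vectorize(docs[d], vocab)), reverse=True)
print(ranked)
```

Both d1 and d3 contain both query words, but d1 is shorter, so its vector points more nearly in the query’s direction and it ranks first. The drone document shares nothing and lands last.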

Stephen E Arnold, October 30, 2020

Exclusive: Interview with DataWalk’s Chief Analytics Officer Chris Westphal, Who Guides an Analytics Rocket Ship

October 21, 2020

I spoke with Chris Westphal, Chief Analytics Officer for DataWalk about the company’s string of recent contract “wins.” These range from commercial engagements to heavy lifting for the US Department of Justice.

Chris Westphal, founder of Visual Analytics (acquired by Raytheon), brings his one-click approach to advanced analytics.

The firm provides what I have described as an intelware solution. DataWalk ingests data and outputs actionable reports. The company has leapfrogged a number of investigative solutions, including IBM’s Analyst’s Notebook and the much-hyped Palantir Technologies’ Gotham products. This interview took place in a Covid-compliant way. In my previous interviews with Chris Westphal, we met at intelligence or law enforcement conferences. Now the experience is virtual, but it was as interesting and informative as our July 2019 conversation. In my most recent interview with Mr. Westphal, I sought more information on what is causing some competitors to take notice of DataWalk and its use of smart software to deliver what customers want: results, not PowerPoint presentations and promises. We spoke on October 8, 2020.

DataWalk is an advanced analytics tool with several important innovations. On one hand, the company’s information processing system performs IBM i2 Analyst’s Notebook and Palantir Gotham type functions — just with a more sophisticated and intuitive interface. On the other hand, Westphal’s vision for advanced analytics has moved past what he accomplished with his previous venture Visual Analytics. Raytheon bought that company in 2013. Mr. Westphal has turned his attention to DataWalk. The full text of our conversation appears below.


Tickeron: The Commercial System Which Reveals What Some Intel Professionals Have Relied on for Years

October 16, 2020

Are you curious about the capabilities of intelware systems developed by specialized services firms? You can get a good idea about the type of information available to an authorized user:

  • Without doing much more than plugging in an entity with a name
  • Without running ad hoc queries like one does on free Web search systems unless there is a specific reason to move beyond the provided output
  • Without reading a bunch of stuff and trying to figure out what’s reliable and what’s made up by a human or a text robot
  • Without having to spend time decoding a table of numbers, a crazy looking chart, or figuring out weird colored blobs which represent significant correlations.

Sound like magic?

Nope, it is the application of pattern matching and established statistical methods to streams of data.

The system, tailored to Robinhood types and small brokerages, has been assembled by Tickeron. There’s original software, some middleware, and some acquired technology. Data are ingested, and outputs indicate what to buy or sell or, as a country western star crooned, when to “hold ’em.”
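The post does not describe Tickeron’s models, so the following is not its method. It is a textbook instance of an “established statistical method” applied to a price stream: a moving-average crossover that emits buy, sell, or hold. The window lengths (3 and 5) and the price series are made up for illustration.

```python
def moving_average(prices, window):
    """Simple moving average; None until `window` prices are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def crossover_signals(prices, short=3, long=5):
    """'buy' when the short average crosses above the long one,
    'sell' when it crosses below, otherwise 'hold'."""
    if not 0 < short < long:
        raise ValueError("need 0 < short < long")
    fast = moving_average(prices, short)
    slow = moving_average(prices, long)
    signals = []
    for i in range(len(prices)):
        if i == 0 or slow[i - 1] is None:
            signals.append("hold")  # not enough history to compare yet
            continue
        before = fast[i - 1] - slow[i - 1]
        now = fast[i] - slow[i]
        if before <= 0 < now:
            signals.append("buy")
        elif before >= 0 > now:
            signals.append("sell")
        else:
            signals.append("hold")
    return signals


prices = [10, 10, 10, 10, 10, 12, 14, 16, 14, 10, 8, 6]
print(crossover_signals(prices))
```

The rally triggers a buy on the sixth price and the slide triggers a sell on the eleventh. Whether such signals make money is another matter; the sketch only shows how mechanical the “pattern matching” can be.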

A rah rah review appeared in The Stock Dork. “Tickeron Review: An AI-Powered Trading Platform That’s Worth the Hype” provides a reasonably good overview of the system. If you want to check out the system, navigate to Tickeron’s Web site.

Here’s an example of a “card,” the basic unit of information output from the system:

[Image: example Tickeron card]

The key elements are:

  • Icon to signal “think about buying” the stock
  • A chart with red and green cues
  • A hot link to text
  • A game angle with the “odds” link
  • A “more” link
  • Hashtags (just like Twitter).

Now imagine this type of data presented to an intel officer monitoring a person of interest. Sound useful? The capability has been available for more than a decade. It’s interesting to see this type of intelware find its way to those who want to invest like the wizards at the former Bear Stearns (remember that company, the bridge players, the implosion?).

DarkCyber thinks that the providers of high-priced Wall Street information solutions may wonder about the $15-a-month fee for the Tickeron service.

Keep in mind that predictions, if right, can allow you to buy an exotic car, an island, and a nice house in a Covid-free location. If incorrect, there’s van life.

The good news is that the functionality of intelware is finally becoming more widely available.

Stephen E Arnold, October 16, 2020

Rah, Rah, Sis Boom Analytics. No, Wait. Boo, Boo, Hiss, Hiss Analytics

October 16, 2020

One of the DarkCyber researchers alerted me to “Most CMOs Disappointed with Analytics Results.” We are wrapping up an interview with one of the senior technologists at DataWalk, and complexity in easy-to-use analytics systems was a topic of discussion. Watch for this revealing interview in an upcoming issue of DarkCyber.

The article about disappointed CMOs is not surprising. What is surprising is how widespread the expectation is that smart software will generate just the answer one needs to produce bigly sales.

The write-up reports, citing a study by the mid-tier consulting firm Gartner Group:

“Though CMOs understand the importance of applying analytics throughout the marketing organization, many struggle to quantify the relationship between insights gathered and their company’s bottom line. In fact, nearly half of respondents in this year’s survey say they’re unable to measure marketing ROI,” says Lizzy Foo Kune, senior director analyst in the Gartner Marketing practice. “This inability to measure ROI tarnishes the perceived value of the analytics team.”
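Part of the measurement problem is arithmetic that rests on an assumption. ROI itself is simple: attributed revenue minus cost, divided by cost. The hard part is deciding how much revenue each channel gets credit for. This Python sketch uses invented journeys and spend to compare two common attribution rules, last-touch and linear (equal credit to every touch); the channel names and figures are hypothetical.

```python
from collections import defaultdict


def attribute(journeys, rule):
    """Split each journey's revenue across its channels.

    journeys: list of (channels_in_order, revenue)
    rule: 'last' gives all credit to the final touch;
          'linear' splits credit equally across touches.
    """
    credit = defaultdict(float)
    for channels, revenue in journeys:
        if rule == "last":
            credit[channels[-1]] += revenue
        elif rule == "linear":
            share = revenue / len(channels)
            for ch in channels:
                credit[ch] += share
        else:
            raise ValueError(f"unknown rule: {rule}")
    return credit


def roi(revenue, cost):
    """Return on investment as a fraction of cost."""
    return (revenue - cost) / cost


journeys = [
    (["display", "email", "search"], 300.0),
    (["display", "search"], 200.0),
    (["email"], 100.0),
]
spend = {"display": 150.0, "email": 50.0, "search": 100.0}

for rule in ("last", "linear"):
    credit = attribute(journeys, rule)
    print(rule, {ch: round(roi(credit[ch], cost), 2) for ch, cost in spend.items()})
```

Same customers, same dollars: under last-touch the display budget returns minus 100 percent; under linear credit it returns a positive third. Pick the rule, pick the answer.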

Other findings from the study of 415 marketing “leaders” are:

  • Training staff is not a priority
  • Data science and campaigns are behind other analytic use cases
  • Most organizations will spend more for analytics.

These types of surveys deliver results that gild the available lilies.

For those without numerical skills and training, many of today’s analytic tools are likely to disappoint. The digital oracle of Delphi is not working particularly well for many users. Even individuals with a couple of statistics courses on their record have to spend time familiarizing themselves with the analytic tools and their options. Plus if bad data go in, not even a super smart system can produce silk purses from chubby data pigs. Nevertheless, MBAs believe in analytics and, of course, magic.

Stephen E Arnold, October 16, 2020

Spreadsheet Fever Case Example

October 12, 2020

I have been using the phrase “spreadsheet fever” to describe the impact that fiddling with numbers in Microsoft Excel has on MBAs. With Excel providing the backbone for numerous statistical confections, the sugar hit of magic assumptions cannot be underestimated. The mental structure of a crazed investment analyst brooks no interference from common sense.

“Excel: Why Using Microsoft’s Tool Caused Covid-19 Results to Be Lost” provides a possible case example of what happens when thumbtypers and overconfident innumerates tangle with a digital spreadsheet. No green eyeshades and no pencils needed. Calculators? One can hear a 22-year-old ask, “What’s a calculator? I have one on my iPhone?”

The Beeb reports:

PHE [Public Health England, a fine UK entity] had set up an automatic process to pull this data together into Excel templates so that it could then be uploaded to a central system and made available to the NHS Test and Trace team, as well as other government computer dashboards.

And what tool did these over confident wizards use?

Microsoft Excel, the weapon of choice for business and STEM analysis, of course.

How did the experts wander off the information highway into a thicket of errors? The Beeb explains:

The problem is that PHE’s own developers picked an old file format to do this – known as XLS. As a consequence, each template could handle only about 65,000 rows of data rather than the one million-plus rows that Excel is actually capable of. And since each test result created several rows of data, in practice it meant that each template was limited to about 1,400 cases. When that total was reached, further cases were simply left off.
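The arithmetic is worth spelling out. The legacy XLS format caps a worksheet at 65,536 rows (2^16); XLSX allows 1,048,576 (2^20). The Beeb’s figures imply roughly 65,000 / 1,400 ≈ 46 rows per case, so 46 below is an inferred assumption, as is the single header row. Here is a minimal Python sketch of the guard that seems to have been missing: spread records across as many sheets as needed instead of letting rows fall off the end. The function names are hypothetical; this is not PHE’s code.

```python
XLS_MAX_ROWS = 65_536        # legacy .xls worksheet limit (2**16)
XLSX_MAX_ROWS = 1_048_576    # .xlsx worksheet limit (2**20)


def cases_per_sheet(rows_per_case, max_rows=XLS_MAX_ROWS, header_rows=1):
    """How many whole cases fit on one sheet after the header."""
    return (max_rows - header_rows) // rows_per_case


def chunk_rows(rows, max_rows=XLS_MAX_ROWS, header_rows=1):
    """Split data rows into sheet-sized chunks so nothing is silently cut off."""
    capacity = max_rows - header_rows
    if capacity <= 0:
        raise ValueError("sheet has no room for data rows")
    return [rows[i : i + capacity] for i in range(0, len(rows), capacity)]


print(cases_per_sheet(46))                                # 1424
print(cases_per_sheet(46, max_rows=XLSX_MAX_ROWS))        # 22795
chunks = chunk_rows(list(range(100_000)))
print(len(chunks), sum(len(c) for c in chunks))           # 2 100000
```

About 1,400 cases per XLS template matches the Beeb’s number. Switching to XLSX raises the ceiling roughly sixteenfold, but only the chunking, or a check that refuses to truncate, actually stops cases from vanishing.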

The fix? Can kicking perhaps:

But insiders acknowledge that the current clunky system needs to be replaced by something more advanced that excludes Excel, as soon as possible.

Righto.

Stephen E Arnold, October 12, 2020
