Accidental Bias or a Finger on the Scale?

September 18, 2023

Who knew? According to Bezos’ rag The Washington Post, “Chat GPT Leans Liberal, Research Shows.” Writer Gerrit De Vynck cites a study on OpenAI’s ChatGPT from researchers at the University of East Anglia:

“The results showed a ‘significant and systematic political bias toward the Democrats in the U.S., Lula in Brazil, and the Labour Party in the U.K.,’ the researchers wrote, referring to Luiz Inácio Lula da Silva, Brazil’s leftist president.”

Then there’s research from Carnegie Mellon’s Chan Park. That study found Facebook’s LLaMA, trained on older Internet data, and Google’s BERT, trained on books, supplied right-leaning or even authoritarian answers. But GPT-4, trained on the most up-to-date Internet content, is more economically and socially liberal. Why might the younger algorithm, much like younger voters, skew left? There’s one more juicy little detail. We learn:

“Researchers have pointed to the extensive amount of human feedback OpenAI’s bots have gotten compared to their rivals as one of the reasons they surprised so many people with their ability to answer complex questions while avoiding veering into racist or sexist hate speech, as previous chatbots often did. Rewarding the bot during training for giving answers that did not include hate speech, could also be pushing the bot toward giving more liberal answers on social issues, Park said.”

Not exactly a point in conservatives’ favor, we think. Near the bottom, the article concedes this caveat:

“The papers have some inherent shortcomings. Political beliefs are subjective, and ideas about what is liberal or conservative might change depending on the country. Both the University of East Anglia paper and the one from Park’s team that suggested ChatGPT had a liberal bias used questions from the Political Compass, a survey that has been criticized for years as reducing complex ideas to a simple four-quadrant grid.”

So does ChatGPT lean left or not? Hard to say from the available studies. But will researchers ever be able to pin down the rapidly evolving AI?

Cynthia Murrell, September 18, 2023

Sam AI-Man: A Big Spender with Trouble Ahead?

August 15, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

$700,000 per day. That’s an interesting number if it is accurate. “ChatGPT In Trouble: OpenAI May Go Bankrupt by 2024, AI Bot Costs Company $700,000 Every Day” states that the number is the number. What’s that mean? First, forget salaries, general and administrative costs, the much-loved health care for humans, and the oddments one finds on balance sheets. (What was that private executive flight to Tampa Bay?)

A young entrepreneur realizes he cannot pay his employees. Thanks, MidJourney, whom did you have in your digital mind?

I am a dinobaby, but I can multiply. The total is $255,500,000. I want to ask about money (an investment, of course) from Microsoft, how the monthly subscription fees are floating the good ship ChatGPT, and the wisdom of hauling an orb to scan eyeballs from place to place. (Doesn’t that take away from watching the bourbon caramel cookies reach their peak of perfection? My hunch is, “For sure.”)
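For anyone who wants to check the dinobaby's multiplication, here is a minimal sketch in Python. It assumes the reported $700,000 daily figure stays flat across a 365-day year. The function name `annual_cost` is made up for illustration and does not come from the cited write-up.

```python
# Back-of-the-envelope check of the annual figure.
# Assumptions: the daily cost is constant and the year has 365 days.
DAILY_COST_USD = 700_000  # daily figure reported in the cited write-up
DAYS_PER_YEAR = 365       # assumed non-leap year


def annual_cost(daily_cost: int, days: int = DAYS_PER_YEAR) -> int:
    """Return the total cost over the given number of days."""
    return daily_cost * days


total = annual_cost(DAILY_COST_USD)
print(f"${total:,}")
```

The sketch covers only the reported running cost. Salaries and the other balance sheet oddments are left out, as in the paragraph above.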

The write up reports:

…the shift from non-profit to profit-oriented, along with CEO Sam Altman’s lack of equity ownership, indicates OpenAI’s interest in profitability. Although Altman might not prioritize profits, the company does. Despite this, OpenAI hasn’t achieved profitability; its losses reached $540 million since the development of ChatGPT.

The write up points out that Microsoft’s interest in ChatGPT continues. However, the article observes:

Complicating matters further is the ongoing shortage of GPUs. Altman mentioned that the scarcity of GPUs in the market is hindering the company’s ability to enhance and train new models. OpenAI’s recent filing for a trademark on ‘GPT-5’ indicates their intention to continue training models. However, this pursuit has led to a notable drop in ChatGPT’s output quality.

Another minor issue facing Sam AI-Man is that legal eagles are circling. The Zuck dumped his pet Llama as open source. And the Google and Googley chugs along and Anthropic “clawed” into visibility.

Net net: Sam AI-Man may find that he will have an opportunity to explain how the dial on the garage heater got flipped from Hot to Fan Only.

Stephen E Arnold, August 15, 2023

Generative AI: Good or Bad the Content Floweth Forth

August 11, 2023

Hollywood writers are upset that major studios want to replace them with AI algorithms. While writing bots have not replaced human writers yet, AI algorithms such as ChatGPT, Ryter, and more are everywhere. Threat Source Newsletter explains that, “Every Company Has Its Own Version of ChatGPT Now.”

A flood of content. Thinking drowned. Thanks Mid Journey. I wanted words but got letters. Great Job.

AI writing algorithms are also known as AI assistants. They are programmed to answer questions and perform text-based tasks. The text-based tasks include writing résumés, outlines, press releases, Web site content, and more. While the AI assistants still cannot pass the Turing test, it is not stopping big tech companies from developing their own bots. Meta released Llama 2 and IBM rebranded its powerful computer system from Watson to watsonx (it went from a big W to a lower case w and got an “x” too).

While Llama 2, the “new” Watson, and ChatGPT are helpful automation tools, they are also dangerous in the hands of bad actors, who use them to draft spam campaigns, phishing emails, and scripts. Author Jonathan Munshaw tested AI assistants to see how they responded to illegal prompts.

Llama 2 refused to assist in generating an email for malware, while ChatGPT “gladly” helped draft an email. When Munshaw asked both to write a script to ask a grandparent for a gift card, each interpreted the task differently. Llama 2 advised Munshaw to be polite and aware of the elderly relative’s financial situation. ChatGPT wrote a TV script.

Munshaw wrote that:

“I commend Meta for seeming to have tighter restrictions on the types of asks users can make to its AI model. But, as always, these tools are far from perfect and I’m sure there are scripts that I just couldn’t think of that would make an AI-generated email or script more convincing.”

It will be a while before writers are replaced by AI assistants. They are wonderful tools to improve writing, but humans are still needed for now.

Whitney Grace, August 10, 2023

DarkCyber for October 2, 2018, Now Available

October 2, 2018

DarkCyber for October 2, 2018, is now available, including on Vimeo.

Stephen E Arnold’s DarkCyber is a weekly video news and analysis program about the Dark Web and lesser known Internet services. This week’s program covers four Dark Web and security related stories.

The first story reports some of the findings from Carbon Black’s study of cryptojacking. The exploit uses an unsuspecting organization’s computers to mine cryptocurrency without the knowledge of the unwitting host. Organizations in the US, according to the study, are the number one target in the world. DarkCyber reveals how to get a free copy of this report.

The second story explores a new Dark Web crowd funding site called SadaqaCoins. The purpose of the site is to make it easy for terrorist – activists to support specific projects; for example, funding ransom, purchasing weapons, or contributing money so that sacrificial animals can be purchased by the devout. Contributions are accepted in Bitcoin, Monero, and Ethereum. The SadaqaCoins site then provides the funds to the person or organization requesting them. SadaqaCoins is not a replacement for the hawala method of fund transfer.

The third story provides a snapshot of a hacking tool called theHarvester. Included with Kali Linux, theHarvester acquires information about a domain, including subdomains and other information. The system uses publicly available sources of information, including Web searches, PGP registries, Shodan, and similar content resources. The software can display names, email addresses, and related information. The software tool can be used for forensic and more aggressive information gathering tasks. DarkCyber provides information so that a viewer can download the software without charge.

The final story reports that the Drug Llama has been identified and captured. A 31-year-old woman allegedly sold controlled substances, including fentanyl, and engaged in money laundering. The investigation included state and federal law enforcement units. For now, the Drug Llama is no longer roaming the highs and lows of the Dark Web.

Watch for our Amazon Policeware series beginning on Tuesday, October 30, 2018.

Stephen E Arnold, October 2, 2018

Smart Software and Clever Humans

September 23, 2018

Online translation works pretty well. If you want 70 to 85 percent accuracy, you are home free. Most online translation systems handle routine communications like short blog posts written in declarative sentences and articles written in technical jargon just fine. Stick to mainstream languages, and the services work okay.

But if you want an online system to translate my pet phrases like HSSCM or azure chip consultant, you have to attend more closely. HSSCM refers to the way in which some Silicon Valley outfits run their companies. You know. Like a high school science club which decides that proms are for goofs and football players are not smart. The azure chip thing refers to consulting firms which lack the big time reputation of outfits like Bain, BCG, Booz, etc. (Now don’t get me wrong. The current incarnations of these blue chip outfits are far from stellar. Think questionable practices. Maybe criminal behavior.) The azure chip crowd means second string, maybe third string, knowledge work. Just my opinion, but online translation systems don’t get my drift. My references to Harrod’s Creek are geocoding nightmares when I reference squirrel hunting and bourbon in cereal. Savvy?

I was, therefore, not surprised when I read “AI Company Accused of Using Humans to Fake Its AI.” The main point seems to be:

[An] interpreter accuses leading voice recognition company of ripping off his work and disguising it as the efforts of artificial intelligence.

There are rumors that some outfits use Amazon’s far from mechanical Turk or just use regular employees who can translate that which baffles the smart software.

The allegation comes from a human formerly disguised as smart software. Sixth Tone, the outlet publishing the article, reports:

In an open letter posted on Quora-like Q&A platform Zhihu, interpreter Bell Wang claimed he was one of a team of simultaneous interpreters who helped translate the 2018 International Forum on Innovation and Emerging Industries Development on Thursday. The forum claimed to use iFlytek’s automated interpretation service.

Trust me, you zippy millennials, smart software can be fast. It can be efficient. It can be less expensive than manual methods. But it can be wrong. Not just off base. Playing a different game with expensive Ronaldo types.

Why not run this blog post through Google Translate and check out the French or Spanish the system produces? Better yet, aim the system at a poor quality surveillance video or a VoIP call laden with insider talk between a cartel member and the Drug Llama?

Stephen E Arnold, September 23, 2018

Linguamatics and the US FDA

August 24, 2012

Linguamatics recently announced that the FDA’s Center for Drug Evaluation and Research (CDER) is set to use its I2E platform, the company’s interactive data mining and extraction software, across CDER’s laboratory research relating to drug safety.

The write-up “Linguamatics’ I2E Text Mining Platform Chosen by FDA” provides more details about why the text mining company was selected by the CDER:

“I2E’s NLP-based querying capabilities, coupled with its scalability and flexibility, mean it is ideally suited to answering many challenging, high value questions in life sciences and healthcare by unlocking knowledge buried in the scientific literature and other textual information. Rather than just retrieving documents, I2E can rapidly identify, extract, synthesize and analyze specific, relevant facts and relationships, such as those between genes and diseases or compounds and side effects. Customers include nine of the top ten global pharmaceutical companies.”

What’s great about the I2E platform is that unlike other text mining systems, I2E provides businesses with full control over what information is to be extracted, what query definitions are, and the kind of output. With it, users can obtain information in a short period of time even from large documents.

Again, another company from the science sector has opted to use Linguamatics’ I2E platform. CDER joins Pfizer, Selventa, AstraZeneca, and others from the company’s roster of prestigious clients. Linguamatics has truly evolved from being a small player to being the industry leader in NLP-based text mining within just a few years. We’re excited to see what the company will become two to three years from now.

Lauren Llamanzares, August 24, 2012

Sponsored by, developer of Augmentext

Microsoft Issues Exchange Sharepoint Related Security Advisory

August 24, 2012

Possibly a first in the industry, the Microsoft Security Response Center published Microsoft Security Advisory (2737111), which describes how possible vulnerabilities in Oracle Outside In libraries affect the WebReady Document Viewing functionality of Microsoft Exchange and FAST Search Server. Oracle also released its own Critical Patch Update Advisory. Here are more details about the security risk:

“The vulnerabilities exist due to the way that files are parsed by the third-party, Oracle Outside In libraries. In the most severe case of Microsoft Exchange Server 2007 and Microsoft Exchange Server 2010, it is possible under certain conditions for the vulnerabilities to allow an attacker to take control of the server process that is parsing a specially crafted file. An attacker could then install programs; view, change, or delete data; or take any other action that the server process has access to do.”

If you think you may be affected by this, see the blog post that recommends workarounds.

Note that 24 other companies – some of them industry giants – also use the Oracle library, among them IBM, Cisco, Symantec, and McAfee. Hopefully, these companies will assess the impact of the vulnerability on their platforms and issue security updates soon.

Lauren Llamanzares, August 24, 2012

Lexalytics and FutureEverything Join Forces to Analyze London Olympics Sentiment

August 23, 2012

To further add to the hype of the London Olympics, text analysis firm Lexalytics has announced that it has partnered with FutureEverything with the goal of analyzing the overall sentiment during the said event. “Lexalytics tracks mood of London Olympics” describes how they’re set on accomplishing this:

“The Amherst-based software business has provided Salience, a multi-lingual text analysis engine that is integrated into systems for media monitoring, analysis and business intelligence, to Emoto, a project by FutureEverything.

… Launched this week, Emoto provides the worldwide mood in response to events that are taking place in London 2012. The project tracks micro-blogging sites such as Twitter for themes that are related to the Olympic Games and then analyzes the messages for content and tone, according to the company.”

The public can then access this information via the Emoto website and through the Android mobile app aptly named Emoto in London.

While we all used to think that emotions and computers just don’t mix, Lexalytics has done a good job of getting a lifeless, emotionless machine to quantify sentiment and draw out meaning from text. Of course, the company is far from perfecting this technology and is currently refining it. But once it does, I’m really excited to see what sort of big applications will emerge, particularly in the area of mobile tech. I can imagine our phones summarizing our emails for us, complete with the action items. What do you think?

Lauren Llamanzares, August 23, 2012

Thunderstone Offers Smooth Transition from Google Mini

August 23, 2012

If you have been using Google Mini as your search appliance of choice, then probably by now you know that you’ve been forcefully exiled. But worry not, Mini refugees. There’s still good news.

Thunderstone, a leading player in search and content processing, has announced an upgrade path that will allow Google Mini owners to transition smoothly to the Thunderstone Search Appliance.

The write-up “Thunderstone Provides Special Competitive Upgrade Offer for Google Mini Owners” gives us more details:

“For customers upgrading to the Thunderstone Search Appliance, Thunderstone will honor the remaining warranty and support contract on the Google Mini as an extension to the standard two year support contract on a Thunderstone Search Appliance. In addition Thunderstone will provide assistance in the migration and a 30-day money back guarantee, so that the entire process is painless.”

For a long time, many felt sorry for Thunderstone for having been forced to fight head-on with Google even though they pioneered the search appliance. It’s such a pity that the first runner in the field doesn’t necessarily win. But good thing that it was able to stick around. Now, it even offers a good upgrade package for Google Mini after it was announced that the latter’s production will be discontinued. What a relief for Mini users.

Lauren Llamanzares, August 23, 2012

Blue Washing of IBM’s Recent Acquisitions Might Affect Licensing Deals

August 23, 2012

Lately, IBM seems to be on an analytics-buying spree, having successfully completed three acquisitions spaced only a month apart. Its latest purchase is Tealeaf Technology, a company that focuses on customer behavior analysis and digital customer experience management. Prior to that, it was Varicent, which specializes in sales performance and compensation. And before that was Vivisimo, a discovery and data-capture software provider.

But pre-acquisition customers may have a problem with this later on. “IBM’s “Blue Washing” Affects Customers Worldwide – Scott & Scott, LLP Alerts Customers of Potential Licensing Surprises” discusses why:

“…IBM’s recent acquisition of Tealeaf, Vivisimo, and Varicent will likely change existing license agreements with the newly acquired publishers.

There are a number of legal strategies that can be employed when IBM ‘blue washes’ its code and its license agreements. Blue washing is IBM’s term used when IBM releases updated code and changes its licensing metrics for products acquired from other publishers. Once customers upgrade to IBM’s product, it is often too late to negotiate and avoid hefty licensing charges associated with changed licensing metrics…”

Blue washing isn’t an ideal situation for companies who have been using business-critical software under terms that are advantageous to them. But what can they do against IBM? What negotiating power do small companies have versus the blue giant?

Changes in licensing deals by IBM wouldn’t surprise us here at Beyond Search. The only thing left for the affected companies to do is to absorb one of two costs: IBM’s more costly licensing metrics or shifting to other software.

Lauren Llamanzares, August 23, 2012
