Academic Research Resources: Smart Software Edition
August 8, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
One of my research team called my attention to “The Best AI Tools to Power Your Academic Research.” The article identifies five AI-infused tools; specifically:
- ChatPDF
- Consensus
- Elicit.org
- Research Rabbit
- Scite.ai
Each of the tools is described briefly. The “academic research” phrase is misleading. These tools can provide useful information related to inventors and experts (real or alleged), specific technical methods, and helpful background or context for certain social, political, and intellectual issues.
If you have access to an LLM question-and-answer system, experiment with article summaries, lists of information, and names of people associated with a particular activity. Give a ChatGPT system a whirl too.
Stephen E Arnold, August 8, 2023
AI-Search Tool Talpa Burrows Into Library Catalogues
July 19, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
For a few years now, libraries have been able to augment their online catalogue with enrichment services from Syndetics Unbound, which adds details and imagery to each entry. Now the company is incorporating new AI capabilities, we learn from its write-up, “Introducing Talpa Search.” Talpa is still experimental and is temporarily available to libraries already using Syndetics Unbound.
A book lover in action. Thanks, MidJourney. You made me more appealing than I was in 1951 when I got kicked out of the library for reading books for adults, not stuff about Freddy the Pig.
Participating libraries will get a year of the service for free. We cannot know just how much they will be saving, though, since the pricing remains a mystery. Writer Tim Spalding describes how Talpa works:
“First, Talpa queries large language models (from Claude AI and ChatGPT) for books and other media. Critically, every item is checked against true and authoritative bibliographic data, solving the problem of invented answers (called ‘hallucinations’) that such models can fall into. Second, Talpa uses the natural-language abilities of large language models to parse and understand queries, which are then answered using traditional library data. Thus a search for ‘novels about World War II in France’ is broken down into subjects and tags and answered with results from the library’s collection. Our authoritative book data comes from Syndetics Unbound, Bowker and LibraryThing. Surprisingly, Talpa’s ability to find books by their cover design isn’t powered by AI at all, but by the effort of thousands of book lovers who have played LibraryThing’s CoverGuess cover-tagging game since 2010!”
Interesting. If you don’t happen to be part of a library using Syndetics, you can try Talpa out at one of the three libraries linked to in the post. The tool sports a cute mole mascot and, to add a bit of personality, supplies mole facts beneath the search bar. As with many AI tools, the functionality has plenty of room to grow. For example, my search for “weaving velvet” did return a few loom-centered books scattered through the results but more prominently suggested works of fiction or philosophy that simply contained “velvet” in the title. (Including, adorably, several versions of “The Velveteen Rabbit.”) The write-up does not share when the tool will be available more widely, but we hope it will be more refined when it is. Is it AI? Isn’t everything?
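The checking step Spalding describes (every model-suggested item is verified against authoritative bibliographic data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Talpa’s code: the names `Suggestion`, `normalize`, and `verify_suggestions` are hypothetical, and the “catalog” is a toy set of title/author pairs standing in for Syndetics Unbound, Bowker, or LibraryThing records. Exact matching on normalized title and author is a simplification; a production system would presumably match on ISBNs or fuzzier keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """A title/author pair proposed by a language model (hypothetical structure)."""
    title: str
    author: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences still match."""
    return " ".join(text.lower().split())


def verify_suggestions(suggestions, catalog):
    """Keep only suggestions that appear in the authoritative catalog.

    `catalog` is a set of (normalized title, normalized author) pairs standing in
    for real bibliographic records. Anything not found is treated as a possible
    hallucination and dropped.
    """
    return [s for s in suggestions if (normalize(s.title), normalize(s.author)) in catalog]


# Toy catalog of real novels set in World War II France.
catalog = {
    (normalize("All the Light We Cannot See"), normalize("Anthony Doerr")),
    (normalize("Suite Française"), normalize("Irène Némirovsky")),
}

# Pretend model output: the middle entry is invented.
llm_output = [
    Suggestion("All the Light We Cannot See", "Anthony Doerr"),
    Suggestion("The Velvet Resistance", "A. Nonexistent"),
    Suggestion("Suite française", "Irène Némirovsky"),
]

print([s.title for s in verify_suggestions(llm_output, catalog)])
```

The invented title is filtered out because no catalog record backs it, while the two real novels survive despite a capitalization difference.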
Cynthia Murrell, July 19, 2023
Amazon Is Winning the Product Search Derby… for Now
July 12, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Google cannot be happy about these numbers. We learn from a piece at Search Engine Land that now “50% of Product Searches Start on Amazon.” That is even worse for the competition than previously predicted. In fact, Google’s share of this market has slipped to less than a third at 31.5%. What’s Google’s solution to this click loss? Higher ad pricing? Or maybe an even higher ad-to-real content ratio?
The search racers are struggling to win traffic related to products. What has Amazon accomplished? Has Google’s vehicle lost power? What about Microsoft, a company whose engine is Bing-ing?
We also learn that just 14% of respondents start their searches at retail or brand websites, while social media and review sites each capture a measly 2%. But that could change as Generation Z continues to age into independent shoppers. That group is the most likely to launch searches from social media. They are also most inclined to check online reviews. Reviews with photos are especially influential. Writer Danny Goodwin cites a recent Pew survey as he writes:
“Reviews and ratings can make or break a sale more than any other factor, including product price, free shipping, free returns and exchanges, and more. Overall, 77% of respondents said they specifically seek out websites with reviews – and this number was even higher for Gen Z (87%) and millennials (81%). Ratings without accompanying reviews are considered untrustworthy by 56% of survey respondents. Where people read reviews and ratings:
- Amazon: 94%
- Retail websites (e.g., Target, Wal-Mart): 91%
- Search engines: 70%
- Brand websites (the brand that manufactures the product): 68%
- Independent review sites: 40%
User-generated photos and videos gain value. Sixty percent of consumers looked at user-generated images or videos when learning about new products.
- 77% of respondents said they trust customer photos and videos.
- 53% said user-generated photos and videos from previous customers impacted their decision whether to purchase a product.”
So there you have it—if you have a product to market online, best encourage reviews. With pics, or it didn’t happen. Videos are a significant marketing factor. What happens if Zuck’s Threads pushes into product search, effectively linking text promotions with Instagram? And the Google? Let’s ask Bard?
Cynthia Murrell, July 12, 2023
Scinapse Is A Free Academic-Centric Database
July 11, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Quality, academic-worthy databases are difficult to locate outside of libraries and schools. Google Scholar attempted to qualify as an alternative to paywalled databases, but it returns repetitive and inaccurate results. Thanks to AI algorithms, free databases such as Scinapse have improved.
Scinapse is designed by Pluto and advertised as the “researcher’s favorite search engine.” Scinapse delivers accurate and updated research materials in each search. Many free databases pull their results from old citations and fail to include recent publications. Pluto promises Scinapse delivers high-performing results due to its original algorithm optimized for research.
The algorithm returns research materials based on when a paper was published, how many times it was cited, and how impactful it was in notable journals. Scinapse consistently delivers results that are better than Google Scholar’s. Each search item includes a complete citation for quick reference. The customized filters offer the typical ways to narrow or broaden results, including journal, field of study, conference, author, publication year, and more.
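Pluto does not publish its formula, so the following is only a guess at how the three signals mentioned above (publication date, citation count, journal impact) might be blended into one ranking score. Everything here is an assumption for illustration: the `Paper` fields, the `score` and `rank` helpers, the 1/(1+age) recency decay, the log-damped citation count, and the equal default weights.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int
    citations: int
    journal_impact: float  # hypothetical journal-level impact score


def score(paper, current_year=2023, w_recency=1.0, w_citations=1.0, w_impact=1.0):
    """Combine recency, citation count, and journal impact into one number.

    The weights and functional forms are illustrative assumptions.
    """
    age = max(current_year - paper.year, 0)
    recency = 1.0 / (1.0 + age)              # 1.0 for this year's papers, decaying with age
    citations = math.log1p(paper.citations)  # damp huge counts so old classics do not swamp everything
    return w_recency * recency + w_citations * citations + w_impact * paper.journal_impact


def rank(papers, **weights):
    """Return papers sorted from highest to lowest score."""
    return sorted(papers, key=lambda p: score(p, **weights), reverse=True)


papers = [
    Paper("Recent preprint", 2023, 5, 1.0),
    Paper("Older, heavily cited paper", 2010, 400, 2.0),
    Paper("Mid-career paper", 2020, 40, 0.5),
]
print([p.title for p in rank(papers)])
```

With equal weights the heavily cited paper wins; passing, say, `w_recency=10.0, w_citations=0.0, w_impact=0.0` to `rank` pushes the recent preprint to the top. The point is that any such engine is making a tunable trade-off between freshness and authority.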
People can also create an account to organize their research in reading lists, share with other scholars, or export as a citation list. Perhaps the most innovative feature is the paper recommendation service, in which Scinapse sends paper citations that align with a user’s research. Scinapse aggregates over 48,000 journals. It has users in 196 countries and at 1,130 reputable affiliations. Scinapse’s data sources include Microsoft Research, PubMed, Semantic Scholar, and Springer Nature.
Whitney Grace, July 11, 2023
In the Midst of Info Chaos, a Path Identified and Explained
July 10, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
The Threads – Twitter spat, playing out alongside BlueSky and Mastodon, marks a modest change in having one place to go for current information. How does one maintain awareness with high school taunts flying, Mastodon explaining how easy it is to use, and BlueSky doing its deep gaze thing?
One answer and a quite good one at that appears in “RSS for Post-Twitter News and Web Monitoring.” The author knows quite a bit about finding information, and she also has the wisdom to address me as “dinobaby.” I know a GenZ when I get an email that begins, “Hey, there.” Trust me. That salutation does not work as the author expects.
In the cited article, you will get useful information about newsfeeds, screenshots, and practical advice. Here’s an example of what’s in the excellent how-to:
If you want to check a site for RSS feeds and you think it might be a WordPress site, just add /feed/ to the end of the domain name. You might get a 404 error, but you also might get a page full of information!
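For readers who would rather script the tip above than type URLs by hand, here is a small sketch using only Python’s standard library. The helper names `wordpress_feed_url` and `looks_like_feed` are made up for this example, and the feed check is a heuristic (it looks at the Content-Type header and the first few bytes for `<rss` or `<feed`), not a full RSS or Atom parser. As the tip says, many sites will simply return a 404.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def wordpress_feed_url(site: str) -> str:
    """Build the conventional WordPress feed address: the domain plus /feed/."""
    if "://" not in site:
        site = "https://" + site  # assume HTTPS when no scheme is given
    parts = urlsplit(site)
    # Keep only scheme and host; drop any path, query, or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/feed/", "", ""))


def looks_like_feed(url: str, timeout: float = 10.0) -> bool:
    """Fetch the URL and guess whether it returned an RSS or Atom feed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            content_type = resp.headers.get("Content-Type", "")
            head = resp.read(2048).lower()
    except urllib.error.HTTPError:
        return False  # e.g. the 404 the tip warns about
    except OSError:
        return False  # DNS failure, refused connection, timeout, and so on
    return "xml" in content_type or b"<rss" in head or b"<feed" in head


print(wordpress_feed_url("example.com"))
print(wordpress_feed_url("https://example.com/2023/07/some-post/?ref=x"))
```

Both calls print `https://example.com/feed/`. Calling `looks_like_feed(wordpress_feed_url("some-site.example"))` would then make the network request and report whether a feed appears to be there.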
There are more tips. Just navigate to Research Buzz, and learn.
This dinobaby awards one swish of its tail to Tara Calishain. Swish.
Stephen E Arnold, July 10, 2023
Neeva: Is This Google Killer on the Run?
May 18, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Sometimes I think it is 2007 doing the déjà vu dance. I read “Report: Snowflake Is in Advanced Talks to Acquire Search Startup Neeva.” Founded by Xooglers, Neeva was positioned to revolutionize search and generate subscription revenue. Along the highway to the pot of gold, Neeva would deliver on-point results. How did that pay-for-search model work out?
According to the article:
Snowflake Inc., the cloud-based data warehouse provider, is reportedly in advanced talks to acquire a search startup called Neeva Inc. that was founded by former Google LLC advertising executive Sridhar Ramaswamy.
Like every other content processing company I bump into, Neeva was doing smart software. Combine the relevance angle with generative AI and what do you get? A startup that is going to be acquired by a firm with some interesting ideas about how to use search and retrieval to make life better.
Are there other search outfits with a similar business model? Sure, Kagi comes to mind. I used to keep track of startups which had technology that would provide relevant results to users and a big payday to the investors. Do these names ring a bell?
- Cluuz
- Deepset
- Glean
- Kyndi
- Siderian
- Umiboza
If the Snowflake Neeva deal comes to fruition, will it follow the trajectory of IBM Vivisimo? Vivisimo disappeared as an entity and morphed into a big data component. No problem. But Vivisimo was a metasearch and on-the-fly tagging system. Will the tie-up be similar to the Microsoft acquisition of Fast Search & Transfer? Fast still lives, but I don’t know too many Softies who know about the backstory. Then there is the HP Autonomy deal. The acquisition is still playing out in the legal eagle sauna.
Few care about the nuances of search and retrieval. Those seemingly irrelevant details can have interesting consequences. Some are okay like the Dassault Exalead deal. Others? Less okay.
Stephen E Arnold, May 18, 2023
Am I a Moron Because I Use You.com?
May 10, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
“Only Morons Use ChatGPT As a Substitute for Google” is a declarative statement. Three words in the title of the Lifehacker (an online publication) article strike me as important.
First, “morons.” According to TheFreeDictionary.com, the first definition of “moron” is a city in eastern Argentina, although that Morón has an accented ó. On to the next definition, which is “A person who is considered foolish or stupid.” I think this is closer to the mark. I am not comfortable invoking the third definition because it aims its denotative punch at a person having a mental age of from seven to 12. I am 78, so let’s go with “foolish or stupid.” I am in that set.
Second, “ChatGPT.” I think the moniker can apply specifically to the for-fee service of OpenAI. It is possible that “ChatGPT” stands for an entire class of generative software. I tried to make a list of a who’s who in generative software and abandoned the task. Quite a few companies are in the game either directly like the aforementioned OpenAI or a bandwagon of companies joyfully tallied by ProductWatch.com and a few LinkedIn contributors. I think the idea is that ChatGPT outputs content which is either derivative (a characteristic of a machine eating other people’s words and images) or hallucinatory (a feature of software which can go off the rails and output like a digital Lewis Carroll galumphing around a park in which young females frolic).
Third, “Google.” My hunch is that the author is an expert online searcher who, like many open source intelligence professionals, relies on the advertising-supported Google search for objective, on-point answers. Oh, my, that’s quite a reliable source of information. I want to point out that Google focuses on revenue-generation from advertising. Accuracy of results often has little connection to the user’s query. My interpretation of the word “Google” is that Google is good, probably better than “ChatGPT” in providing answers designed to meet the needs of users who may not read above the 9th grade level, struggle with derivatives, and cannot name the capital of Tasmania. (It is Hobart, by the way.)
I am on the fence with the word “only.” I am not comfortable with categorical affirmatives. Given the context of the article and the fact that Google is the Web search engine of choice (conscious or manipulated) for 90 percent of people in North America and Western Europe, I can understand why the field of view is narrow. An expert with regard to Google knows more and more about less and less.
Why is ChatGPT presented as the yin to Google’s yang? The write up says:
If you can’t fact-check the bot, the answers it gives are useless.
Okay, so people fact-check Google outputs. What does Google present with regard to Larry Page and his alleged interactions with Mr. Epstein, the late Mr. Epstein I should say? Run the query “Google Page Epstein,” and here is what the system presented to me on May 6, 2023:
Are these stories accurate? Is CNBC the most relevant story about a lawsuit involving JPMorgan Chase and Jeffrey Epstein? What about the New York Post article “Virgin Islands Struggles to Subpoena Google’s Larry Page in Jeffrey Epstein Lawsuit against JPMorgan.” Or the tech insider outfit Gizmodo which offers “Google’s Larry Page Could Be Served in Jeffrey Epstein Case.” There are citations to the British newspaper Telegraph and one from Law & Crime. There is a Yahoo story and a Daily Beast article.
Now what does You.com offer me when I query Google Page Epstein?
There have been recent news articles reporting that the US Virgin Islands has issued subpoenas to Google co-founder Larry Page in connection with a lawsuit against JPMorgan Chase over the Jeffrey Epstein sex trafficking case [1]. The lawsuit alleges that JPMorgan Chase facilitated Epstein’s trafficking scheme by continuously offering financial support to Epstein despite knowing about the illegal activities conducted at Little St. James, a private island owned by Epstein. The subpoenas seek documents and communications related to JPMorgan’s banking relationship with Epstein. Sergey Brin, another Google co-founder, has also reportedly been subpoenaed. Despite the subpoenas, it appears that the US Virgin Islands is struggling to serve Page with the legal documents. A recent ruling allows Page to be served with the subpoenas via his attorney in California. [1] CNBC
I would suggest that the Google citations provide a list with no indication of which source is more or less highly regarded for accuracy. Google wants me to click on one or more of the links, ingest the content of each article, and then synthesize the items of information which strike me as on the money. You.com, on the other hand, provides me with the bare bones of the alleged involvement with a person who, like Lewis Carroll, may have had an interest in hanging out around a park on a sunny Saturday afternoon. Catching some rays and perhaps coming up with new ideas are interpretations of such an action by a lawyer hired to explain the late and much lamented Mr. Epstein.
So which is it? The harvesting of buckwheat the old-fashioned way or the pellet of information spat out in a second or two?
I think the idea is that morons are going to go the ChatGPT-like route. Wizards and authors of online “real” news articles want to swing that sickle and relive the thrill of the workers in Vincent van Gogh’s “The Harvest.”
The article says:
you can’t tell whether an AI-generated fact is true or not by the way the text looks; it’s designed to look plausible and correct. You have to fact-check it.
Does one need to fact-check what Google spits out? What about the people who follow Google Maps’s instructions and drive off a cliff? What about the links in Google Scholar to papers with non-reproducible results?
Here’s the conclusion to the write up:
So if you want to use ChatGPT to get ideas or brainstorm places to look for more information, fine. But don’t expect it to base its answers on reality. Even for something as innocuous as recommending books based on your favorites, it’s likely to make up books that don’t even exist.
I like that “don’t even exist.” Google Bard would never do that. Google management would never fire a smart software executive who points out that Google’s smart software is biased. Google would never provide search results that explain how to steal copyright protected software. Well, maybe just one time like this:
Oh, no. Wonky software would never ever do that, except for Google’s YouTube results for the query “Magix Vegas crack.” Now who is a moron? Perhaps an apologist for Google?
Stephen E Arnold, May 10, 2023
Divorcing the Google: Legal Eagles Experience a Frisson of Anticipation
April 24, 2023
No smart software has been used to create this dinobaby’s blog post.
I have poked around looking for a version or copy of the contract Samsung signed with Google for the firms’ mobile phone tie-up. Based on what I have heard at conferences and read on the Internet (of course, I believe everything I read on the Internet, don’t you?), it appears that there are several major deals.
The first is the use of and access to the mindlessly fragmented Android mobile phone software. Samsung can do some innovating, but the Google is into providing “great experiences.” Why would a mobile phone maker like Samsung allow a user to manage contacts and block mobile calls without implementing a modern-day hunt for gold near Placer?
The second is the “suggestion” — mind you, the suggestion is nothing more than a gentle nudge — to keep that largely-malware-free Google Play Store front and center.
The third is the default search engine. Buy a Samsung get Google Search.
Now you know why the legal eagles are shivering when they think of litigation to redo the Google – Samsung deal. For those who believe the misinformation zipping around about Microsoft Bing displacing Google Search, my suggestion is to ask yourself, “Who gains by pumping out this type of disinformation?” One answer is big Chinese mobile phone manufacturers. This is Art of War stuff, and I won’t dwell on this. What about Microsoft? Maybe, but I like to think happy thoughts about Microsoft. I say, “No one at Microsoft would engage in disinformation intended to make life difficult for the online advertising king.” Another possibility is Silicon Valley-type journalists who pick up rumors, amplify them, and then comment that Samsung is kicking the tires of Bing with ChatGPT. Suddenly a “real” news outfit emits the Samsung rumor. Exciting for the legal eagles.
The write up “Samsung Can’t Dump Google for Bing As the Default Search Engine on Its Phones” does a good job of explaining the contours of a Google – Samsung tie up.
Several observations:
First, the alleged Samsung search replacement provides a glimpse of how certain information can move from whispers at conferences to headlines.
Second, I would not bet against lawyers. With enough money, contracts can be nullified, transformed, or left alone. The only option which disappoints attorneys is the one that lets sleeping dogs lie.
Third, the growing upswell of anti-Google sentiment is noticeable. That may be a far larger problem for Googzilla than rumors about Samsung. Perceptions can be quite real, and they translate into impacts. I am tempted to quote William James, but I won’t.
Net net: If Samsung wants to swizzle a deal with an entity other than the Google, the lawyers may vibrate with such frequency that a feather or two may fall off.
Stephen E Arnold, April 24, 2023
Useful Scholarly / Semi-Scholarly Research System with Deduplicated Results
March 24, 2023
I was delighted to receive a link to OpenAIRE Explore. The service is sponsored by a non-profit partnership established in 2018 as a legal outfit. The objective is to “ensure a permanent open scholarly communication infrastructure to support European research.” (I am not sure whoever wrote the description has read “Book Publishers Won’t Stop Until Libraries Are Dead.”)
The specific service I found interesting is Explore located at https://explore.openaire.eu. The service is described by OpenAIRE this way:
A comprehensive and open dataset of research information covering 161m publications, 58m research data, 317k research software items, from 124k data sources, linked to 3m grants and 196k organizations.
Maybe looking at that TechDirt article will be useful.
I ran a number of queries. The probably unreadable screenshot below illustrates the nice interface and the results to my query for Hopf fibrations (if this query doesn’t make sense to you, there’s not much I can do. Perhaps OpenAIRE Explore is ill-suited to queries about Taylor Swift and Ticketmaster?):
The query returned 127 “hits” and identified four organizations as having people interested in the subject. (Hopf fibrations are quite important, in my opinion.) No ads, no crazy SEO baloney, but probably some non-error-checked equations. Plus, the result set was deduplicated. Imagine that. A useful Vivisimo-type function available again.
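OpenAIRE does not explain here how its deduplication works, so the following is only a simple stand-in technique, not OpenAIRE’s method: treat two records as the same item if they share a DOI, otherwise if their titles match after stripping accents, case, and punctuation. The helper names `dedup_key` and `deduplicate` are hypothetical, and the records and DOIs below are made-up placeholders.

```python
import re
import unicodedata


def dedup_key(record: dict) -> tuple:
    """Prefer the DOI as identity; otherwise fall back to a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Strip accents, lowercase, and collapse punctuation/whitespace runs to single spaces.
    title = unicodedata.normalize("NFKD", record.get("title", ""))
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    title = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return ("title", title)


def deduplicate(records: list) -> list:
    """Keep the first record for each key, preserving the original order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up records; the DOIs are placeholders, not real identifiers.
results = [
    {"title": "Hopf Fibrations in Physics", "doi": "10.0000/example.1"},
    {"title": "Hopf fibrations in physics.", "doi": "10.0000/EXAMPLE.1"},
    {"title": "On Hopf Fibrations"},
    {"title": "On  Hopf  fibrations"},
]
print([r["title"] for r in deduplicate(results)])
```

Four records collapse to two. One known weakness of this simple key: a copy with a DOI and a copy without one will not be merged, which is why large aggregators typically rely on fuzzier clustering.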
Observation: Some professional publishers are likely to find the service objectionable. Four of the giants are watching their legal eagles circle the hapless Internet Archive. But soon… maybe OpenAIRE will attract some scrutiny.
For now, OpenAIRE Explore is indeed useful.
Stephen E Arnold, March 24, 2023
20 Years Ago: Primus Knowledge Solutions
March 20, 2023
Note: Written by a real-live dinobaby. No smart software involved.
I am not criticizing Primus Knowledge Solutions (acquired by ATG in 2004 and then Oracle purchased ATG in 2011). I would ask that you read this text and consider what was marketed in 2003. The source is a description of Primus’ Answer Engine which was once located at dub dub dub primus.com/products/answerEngine:
Primus Answer Engine helps companies take full advantage of the valuable content that already exists in corporate documents and databases. Using proprietary natural language processing, Answer Engine delivers quick, relevant answers to plain English questions by bringing widespread corporate knowledge to support agents, as well as to customers, partners, and employees via the web.
What “features” did the system provide two decades ago? The fact sheet I picked up at a search conference in 2003 told me:
- Natural language processing
- Scalability
- Database integration
- All major document types
- Insightful reporting
- Customizable interface
- Centralized administration.
The system can suggest questions, interpret these or other questions, and return a list of answers found in a company’s online documents. This allows users to view the answer in context if desired.
I mention Primus because it is one example from dozens in my files about NLP technology.
Several observations/questions:
- Where is Oracle in the ChatGPT derby? May I suggest this link for starters.
- Isn’t the principal difference between Primus and other NLP “smart software” that users are now chasing ChatGPT-type systems, not innovators outputting marketing words?
- Are issues like updating training models and their content, biases in the models themselves, and the challenge of accurate, current data still enjoying the naïveté of 2003?
Net net: ChatGPT is just one manifestation of innovators’ attempts to deal with the challenge of finding accurate, on-point, and timely information in the digital world. (This is a world I call the datasphere.)
Stephen E Arnold, March 20, 2023