Amazon Search: Just Outstanding

September 2, 2021

Authors at Paste Magazine are dedicated to assembling lists of the best streaming content from Netflix, Hulu, Amazon Prime, and other services. They know almost as much about these content libraries as the services’ own developers. The title of Paste Magazine’s article, “Amazon Prime Video’s Library Is Now Genuinely Impossible To Browse,” says it all.

Amazon Prime’s content library is notoriously difficult to browse, and the problem was noted in 2018. The library contains a lot of content, much of it considered unwatchable. The only way to locate anything is to search by its proper name; users who want to browse films as they would in physical libraries and the video stores of yore are abandoned.

Amazon Prime has also hidden its search function. Instead, it wants users to work around this roadblock:

It quickly becomes apparent that there is no obvious way to view that full list of sci-fi movies, suggesting that Amazon doesn’t want consumers to be able to easily find that kind of information—its user experience is built around you choosing one of the small handful of suggested films, or knowing in advance what you want to see and then specifically searching it out. However, it is possible to see the full list—in order for it to display, you just have to click on any specific sci-fi film, look at the movie’s genre tags, and click on the words “science fiction” once again.”

The search function is worse than that available in a medieval scriptorium. When users return to certain genre pages and browse the supposed complete list, the same twenty-one movies continuously reload.

Amazon Prime has thousands of titles and is designed by a high tech company, yet it cannot fix its search function? Why is Amazon, an important company that is shaking the film and television industry, not offering its users the best of the best when it comes to search? Amazon did A9, it sucked in Lucid Imagination “experts,” it intruded on Elastic search territory. And now search doesn’t work the way users expect. Has another high-tech outfit become customer hostile or just given up making search useful?

Whitney Grace, September 1, 2021

Written by Stephen E. Arnold · Filed Under Amazon, Business strategy, News, Search

The British Library Channels University Microfilms and the Google

September 1, 2021

A quick Google search can yield pertinent information, but that information is hard to find. Why? Google search results are clogged with paid ads and Web sites that are not authoritative sources. Newspapers are still a valuable resource, especially newspapers from before the Internet’s invention. The brilliant news, as IanVisits shares, is that “The British Library Puts 1 Million Newspaper Pages Online For Free.”

The British Newspaper Archive contains over forty-four million newspaper pages that range from 1600 to 2009. The newspapers are from British and Irish sources, and they represent over 10% of the newspapers the British Library owns. Around half a million pages are added to the archive every month.

The newspapers currently require a subscription, but all funds go to scanning more pages for the archive. The British Newspaper Archive has released one million pages for free and plans to add another million over the next four years. Not all pages will be free, however:

“They won’t add all papers, as they say that while they consider newspapers made before 1881 to be in the public domain, that does not mean that will make all pre-1881 digitized titles available for free, as the archive is dependent on subscriptions to cover its costs. If like me you do a lot of historical research, then the cost of the full subscription is not that bad – just £80 a year for the full archive.”

The archive offers 158 free newspaper titles that range from 1720 to 1880. All of the newspapers that fall within this date range are in the public domain.

It would be awesome if all newspapers were available for free on the Internet, but money makes the world go round. Libraries and universities offer free access to newspaper databases, and subscription services, in most cases, are not that expensive.

The good news is that researchers may have access to news stories infused with some of that good old “real” journalistic wire tapping.

Whitney Grace, September 1, 2021

Written by Stephen E. Arnold · Filed Under News, Reference tool

Semantic: Scholar and Search

September 1, 2021

The new three musketeers could be named Semantic, Scholar, and Search. What’s missing is a digital d’Artagnan. What are the three valiant mousquetaires up to? Fixing search for scholarly information.

To learn why smart software goes off the rails, navigate to “Building a Better Search Engine for Semantic Scholar.” The essay documents how a group of guardsmen fixed up search which is sort of intelligent and sort of sensitive to language ambiguities like “cell”: A biological cell or “cell” in wireless call admission control. Yep, English and other languages require context to figure out what someone might be trying to say. Less tricky for bounded domains, but quite interesting for essay writing or tweets.

Please, read the article because it makes clear some of the manual interventions required to make search deliver objective, on point results. The essay is important because it talks about issues most search and retrieval “experts” prefer to keep under their kepis. Imagine what one can do with the knobs and dials in this system to generate non-objective and off point results. That would be exciting in certain scholarly fields I think.

Here are some quotes which suggest that Fancy Dan algorithmic shortcuts, like those enabled by Snorkel-type solutions, have their limits. For example:

Quote A

The best-trained model still makes some bizarre mistakes, and posthoc correction is needed to fix them.

Meaning: Expensive human and maybe machine processes are needed to get the model outputs back into the realm of mostly accurate.

Quote B

Here’s another:

Machine learning wisdom 101 says that “the more data the better,” but this is an oversimplification. The data has to be relevant, and it’s helpful to remove irrelevant data. We ended up needing to remove about one-third of our data that didn’t satisfy a heuristic “does it make sense” filter.

Meaning: Rough sets may be cheaper to produce but may be more expensive in the long run. Why? The outputs are just wonky, at odds with what an expert in a field knows, or just plain wrong. Does this make you curious about black box smart software? If not, it should.
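For the curious, here is a minimal sketch of what a “does it make sense” filter could look like. The essay does not spell out the rule the team used, so the word-overlap test, the function names, and the sample data below are all assumptions made up for illustration. The idea is simply to throw away training pairs where the query and the clicked paper title share no words at all.

```python
# Hypothetical sketch: the overlap rule, the function names, and the sample
# data are assumptions for illustration, not the Semantic Scholar team's code.

def makes_sense(query: str, clicked_title: str) -> bool:
    """Keep a (query, clicked title) pair only if they share at least one word."""
    query_terms = set(query.lower().split())
    title_terms = set(clicked_title.lower().split())
    return bool(query_terms & title_terms)


def filter_training_pairs(pairs):
    """Drop pairs that fail the heuristic before they reach model training."""
    return [(query, title) for query, title in pairs if makes_sense(query, title)]


# Invented sample data: the second pair shares no words and gets removed.
SAMPLE_PAIRS = [
    ("cell admission control", "Call Admission Control in Wireless Cell Networks"),
    ("deep learning", "A History of Medieval Scriptoria"),
    ("semantic search", "Building a Semantic Search Engine"),
]

kept = filter_training_pairs(SAMPLE_PAIRS)
print(f"kept {len(kept)} of {len(SAMPLE_PAIRS)} pairs")
```

A rule this crude would also toss legitimate pairs, such as a query that uses a synonym of a title word. That trade-off is exactly the kind of manual knob twiddling the essay describes.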

Quote C

And what about this statement:

The model learned that recent papers are better than older papers, even though there was no monotonicity constraint on this feature (the only feature without such a constraint). Academic search users like recent papers, as one might expect!

Meaning: The three musketeers like their information new, fresh, and crunchy. From my point of view, this is a great reason to delete the backfiles. Even though “old” papers may contain high value information, the new breed wants recent papers. Give ‘em what they want and save money on storage and other computational processes.

Net Net

My hunch is that many people think that search is solved. What’s the big deal? Everything is available on the Web. Free Web search is great. But commercial search systems like LexisNexis and Compendex with for-fee content are chugging along.

A free and open source approach is a good concept. The trajectory of innovation points to a need for continued research and innovation. The three musketeers might find themselves replaced with a more efficient and unmanageable force like smart software trained by the Légion étrangère drunk on digital pastis.

Stephen E Arnold, September 1, 2021

Written by Stephen E. Arnold · Filed Under News, Search, Semantic

T-Mobile Security: A Quote to Note

September 1, 2021

“T-Mobile Hacker Found Weakness” is a summary of the all-too-familiar story of a big company, indifference, security hand waving, and an alleged breach of alleged customers. Please, read the original “real” news story. No payee; no viewee, however. I want to highlight what I think is the most important direct quote in the write up; to wit:

Their security is awful.

That’s pretty juicy.

Wait, please. One more gem is tucked into the write up. Here’s that statement:

On August 13, the security research firm Unit221B LLC reported to T-Mobile that an account was attempting to sell T-Mobile customer data, according to the security firm.

This statement, if accurate, suggests that the hundreds of high end, proactive threat detection systems did not spot this breach and offer of customer data.

One firm did. And what about other cyber security experts?

My hunch is that if the statements in the article are on the money, it may be time to entertain this question: Why don’t high end cyber security systems work?

Stephen E Arnold, September 1, 2021

Written by Stephen E. Arnold · Filed Under cybersecurity, News, Security

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.