Amazon Search: Just Outstanding
September 2, 2021
Authors at Paste Magazine are dedicated to assembling lists of the best streaming content from Netflix, Hulu, Amazon Prime, and other services. They know almost as much about these content libraries as the services' own developers do. The title of Paste Magazine’s article, “Amazon Prime Video’s Library Is Now Genuinely Impossible To Browse,” says it all.
It is notoriously difficult to browse Amazon Prime’s content library, and the problem was noted as far back as 2018. Amazon Prime’s library contains a lot of content, much of it considered unwatchable. The only way to locate anything is to search for it by its proper name. Users who want to browse films the way they once browsed physical libraries and the video stores of yore are abandoned.
Amazon Prime has also hidden its search function. Instead, it wants users to work around this road block:
“It quickly becomes apparent that there is no obvious way to view that full list of sci-fi movies, suggesting that Amazon doesn’t want consumers to be able to easily find that kind of information—its user experience is built around you choosing one of the small handful of suggested films, or knowing in advance what you want to see and then specifically searching it out. However, it is possible to see the full list—in order for it to display, you just have to click on any specific sci-fi film, look at the movie’s genre tags, and click on the words “science fiction” once again.”
The search function is worse than that available in a medieval scriptorium. When users return to certain genre pages and browse the supposed complete list, the same twenty-one movies continuously reload.
Amazon Prime has thousands of titles and is designed by a high tech company, yet it cannot fix its search function? Why does Amazon, an important company that is shaking up the film and television industry, not offer its users the best of the best when it comes to search? Amazon did A9, it sucked in Lucid Imagination “experts,” it intruded on Elasticsearch territory. And now search doesn’t work the way users expect. Has another high-tech outfit become customer hostile or just given up making search useful?
Whitney Grace, September 1, 2021
The British Library Channels University Microfilms and the Google
September 1, 2021
While a quick Google search can yield pertinent information, that information is hard to find. Why? Google search results are clogged with paid ads and Web sites that are not authoritative sources. Newspapers are still a valuable resource, especially newspapers from before the Internet’s invention. The brilliant news, as IanVisits shares, is that “The British Library Puts 1 Million Newspaper Pages Online For Free.”
The British Newspaper Archive contains over forty-four million newspaper pages that range from 1600 to 2009. The newspapers are from British and Irish sources, and they represent over 10% of the newspapers the British Library owns. Around half a million pages are added to the archive every month.
The newspapers currently require a subscription, but all funds go to scanning more pages for the archive. The British Newspaper Archive has released one million pages for free and plans to add another million over the next four years. Not all pages will be free, however:
“They won’t add all papers, as they say that while they consider newspapers made before 1881 to be in the public domain, that does not mean that will make all pre-1881 digitized titles available for free, as the archive is dependent on subscriptions to cover its costs. If like me you do a lot of historical research, then the cost of the full subscription is not that bad – just £80 a year for the full archive.”
The archive offers 158 free newspaper titles that range from 1720 to 1880. All of the newspapers that fall within this date range are in the public domain.
It would be awesome if all newspapers were available for free on the Internet, but money makes the world go round. Libraries and universities offer free access to newspaper databases, and subscription services, in most cases, are not that expensive.
The good news is that researchers may have access to news stories infused with some of that good old “real” journalistic wire tapping.
Whitney Grace, September 1, 2021
Semantic: Scholar and Search
September 1, 2021
The new three musketeers could be named Semantic, Scholar, and Search. What’s missing is a digital d’Artagnan. What are the three valiant mousquetaires up to? Fixing search for scholarly information.
To learn why smart software goes off the rails, navigate to “Building a Better Search Engine for Semantic Scholar.” The essay documents how a group of guardsmen fixed up search which is sort of intelligent and sort of sensitive to language ambiguities like “cell”: A biological cell or “cell” in wireless call admission control. Yep, English and other languages require context to figure out what someone might be trying to say. Less tricky for bounded domains, but quite interesting for essay writing or tweets.
Please, read the article because it makes clear some of the manual interventions required to make search deliver objective, on point results. The essay is important because it talks about issues most search and retrieval “experts” prefer to keep under their kepis. Imagine what one can do with the knobs and dials in this system to generate non-objective and off point results. That would be exciting in certain scholarly fields I think.
Here are some quotes which suggest that Fancy Dan algorithmic shortcuts like those enabled by Snorkel-type solutions have limits; for example:
Quote A
The best-trained model still makes some bizarre mistakes, and posthoc correction is needed to fix them.
Meaning: Expensive human and maybe machine processes are needed to get the model outputs back into the realm of mostly accurate.
Quote B
Here’s another:
Machine learning wisdom 101 says that “the more data the better,” but this is an oversimplification. The data has to be relevant, and it’s helpful to remove irrelevant data. We ended up needing to remove about one-third of our data that didn’t satisfy a heuristic “does it make sense” filter.
Meaning: Rough sets may be cheaper to produce but may be more expensive in the long run. Why? The outputs are just wonky, at odds with what an expert in a field knows, or just plain wrong. Does this make you curious about black box smart software? If not, it should.
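What might a “does it make sense” filter look like? The essay quote does not spell out the rule, so here is a minimal sketch under loud assumptions: the click-log rows, the `makes_sense` function, and the word-overlap test are all hypothetical stand-ins, not the Semantic Scholar team's method. The idea is simply to drop training rows in which the query and the clicked title have nothing in common.

```python
# Hypothetical training rows: (query, clicked_title). Invented for illustration;
# none of this data or logic comes from the Semantic Scholar essay.

def tokens(text):
    """Lowercase a string and split it into a set of whitespace-separated words."""
    return set(text.lower().split())

def makes_sense(query, clicked_title):
    """Assumed heuristic: keep a row only if the query shares a word with the title."""
    return bool(tokens(query) & tokens(clicked_title))

def filter_rows(rows):
    """Return only the rows that pass the sanity heuristic."""
    return [row for row in rows if makes_sense(*row)]

rows = [
    ("cell membrane transport", "Transport across the cell membrane"),
    ("wireless call admission control", "Call admission control in wireless cell networks"),
    ("deep learning", "A history of medieval scriptoria"),
]

kept = filter_rows(rows)
print(len(kept), "of", len(rows), "rows kept")
```

A rule this crude will also throw away sensible rows (synonyms, abbreviations), which is one reason a human has to keep an eye on the knobs and dials.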
Quote C
And what about this statement:
The model learned that recent papers are better than older papers, even though there was no monotonicity constraint on this feature (the only feature without such a constraint). Academic search users like recent papers, as one might expect!
Meaning: The three musketeers like their information new, fresh, and crunchy. From my point of view, this is a great reason to delete the backfiles. Even though “old” papers may contain high value information, the new breed wants recent papers. Give ‘em what they want and save money on storage and other computational processes.
Net Net
My hunch is that many people think that search is solved. What’s the big deal? Everything is available on the Web. Free Web search is great. But commercial search systems like LexisNexis and Compendex with for-fee content are chugging along.
A free and open source approach is a good concept. The trajectory of innovation points to a need for continued research and innovation. The three musketeers might find themselves replaced with a more efficient and unmanageable force like smart software trained by the Légion étrangère drunk on digital pastis.
Stephen E Arnold, September 1, 2021
T-Mobile Security: A Quote to Note
September 1, 2021
“T-Mobile Hacker Found Weakness” is a summary of the all-too-familiar story of a big company, indifference, security hand waving, and an alleged breach of alleged customers. Please, read the original “real” news story. No payee; no viewee, however. I want to highlight what I think is the most important direct quote in the write up; to wit:
Their security is awful.
That’s pretty juicy.
Wait, please. One more gem is tucked into the write up. Here’s that statement:
On August 13, the security research firm Unit221B LLC reported to T-Mobile that an account was attempting to sell T-Mobile customer data, according to the security firm.
This statement, if accurate, suggests that the hundreds of high end, proactive threat detection systems did not spot this breach and the offer of customer data.
One firm did. And what about other cyber security experts?
My hunch is that if the statements in the article are on the money, it may be time to entertain this question: Why don’t high end cyber security systems work?
Stephen E Arnold, September 1, 2021