July 4, 2015
Short honk: I have been a fan of the Silobreaker system, which is available for commercial and governmental content processing. This morning I read the Network Products Guide “New Products and Service: Winners 10th Annual 2015 IT Awards” league table of recommended solutions. Silobreaker was founded by a couple of wizards with military and commercial experience. According to the league table, the Silobreaker content processing and information access system is the top dog for applications centered on Europe, the Middle East and Asia. This means that the system’s multi-lingual capabilities were the best, according to the Network Products Guide’s editors. The company also nailed a silver medal for US focused solutions. You can get more information about Silobreaker at www.silobreaker.com. Sign up. Join the thousands of users who want to work with a winner.
Stephen E Arnold, July 4, 2015
June 22, 2015
I read “Publishers Slam Apple over Presumptuous News App Conditions.” Publishers presumptuous? I know of one publisher who used my research and marketed it on Amazon without my permission. Was that presumptuous of IDC and its wizard Dave Schubmehl?
According to the write up:
Publishers are up in arms following an email from Apple about inclusion in the firm’s upcoming News application and the kind of conditions that will be imposed. The email said that participants are presumed to have accepted Apple’s terms unless they explicitly opt out. It’s the old opt-out over opt-in thing.
Yes, up in arms. I can see the publishers at the New York Athletic Club wielding their squash rackets with malice. My goodness, what a chilling thought. What if those white clad clubsters were to descend on the Apple store in Manhattan and threaten the geniuses?
My fears subsided when I read:
The service will draw content from publicly available RSS feeds, and it is possible that Apple will be challenged, according to one expert, but not in any really meaningful way.
My concern for a Squash Assault receded. Publishers may have to retire to the Yacht Club to find another option.
Stephen E Arnold, June 22, 2015
June 18, 2015
Forget the mom and pop app. A couple of big outfits are going to select and present the information you will consume. Choice? Well, for those who are [a] busy, [b] unable to read, or [c] afflicted with short attention spans, life is going to be just peachy.
The first rumble comes from lovable Apple. Navigate to “Apple Inc. To Hire Journalists For Curated Content On News App.” I highlighted this passage:
Apple’s decision to hire journalists is the latest example of fusion between news media and tech companies. In the last few years, many social networks such as Facebook, Twitter, and LinkedIn, have hired editors and reporters from high profile news media, such as NBC and News Corp. Recently, Snapchat also hired reporters from CNN, and The Verge, a tech site.
The article reminded me that Facebook is ambling down a content path as well.
The next item is that the recruiting tool LinkedIn is going to use humans to “tailor news.” The details, which I assume are spot on, appear in “LinkedIn Brings Back Human Editors to Tailor News to You.” I circled this statement:
But to compete with these other products, Kothari knows that Pulse must offer something different. It’s the “world’s first personalized business news digest,” he says. More importantly, perhaps, LinkedIn’s Pulse is bringing back human editors, not just algorithms, to tailor the news you see to what it already knows about you. And yet it may not be alone—Apple is reportedly planning to curate news with the help of humans, too.
Also, the GOOG, already armed with APIs and the warm and fuzzy news service, is taking another baby step into content as well. The story I printed out is called “A New Window into Our World with Real Time Trends.” Yep, one big family, because it is “our world.” Google says:
On the new google.com/trends, you’ll find a ranked, real-time list of trending stories that are gaining traction across Google. In addition to Search, we now look at trends from YouTube and Google News and combine them to better understand what topics and stories are trending across the web right now. The redesigned homepage is now available in 28 countries around the world, and we’ll continue to add more locations in the coming months.
What’s the impact of these digital Gutenberg twirls?
My initial reaction is that TheNeeds.com and similar services will be doing some talking with their investors. Whatever money these news recyclers have is probably not going to be enough to deal with the Apples, Facebooks, Googles, and LinkedIns of the world. Heck, LinkedIn may need more dough too.
Second, are there enough readers to allow each of these services to meet the expectations of the spreadsheet jockeys who project revenues? My hunch is that the answer is, “Nope.” More concentration ahead, I opine.
And, third, what about the old line publishing companies which continue to pretend that their products and services are exactly what the market wants? More pain and not much gain, I assume.
Exciting times for the digital Gutenbergs? Too bad my study Google: The Digital Gutenberg is out of print. If you are curious about this trend, let me know and I will spin up a PDF of that original study. Write email@example.com.
Stephen E Arnold, June 18, 2015
June 1, 2015
Short honk: I read this item: “Slashdot Burying Stories About Slashdot Media Owned SourceForge.” The idea is that publications have to have an editorial policy. In this article, it seems that one popular next generation news aggregator is making some interesting choices. According to the article:
If you’ve followed any tech news aggregator in the past week, you’ve probably seen the story about how SourceForge is taking over admin accounts for existing projects and injecting adware in installers for packages like GIMP. For anyone not following the story, SourceForge has a long history of adware laden installers, but they used to be opt-in. It appears that the process is now mandatory for many projects.
The write up concludes:
In that vein, it’s funny to see Slashdot (which is owned by the same company as SourceForge) also attempting to destroy their own brand. They’re the only major tech news aggregator which hasn’t had a story on this, and that’s because they’ve buried every story that someone submits. This has prompted people to start submitting comments about this on other stories.
If the write up is accurate, filtering happens at publications large and small. It even happens at Beyond Search. We don’t report some of the crazier assertions made by search and content processing companies. Examples: a “new” search system that displays results as page thumbnails or “breakthroughs” in natural language processing. Sorry.
Stephen E Arnold, June 1, 2015
April 25, 2015
Need patent information? Lots of folks believed that making sense of the public documents available from the USPTO was the road to riches. Before I kicked back to enjoy the sylvan life in rural Kentucky, I did some work on Fancy Dan patent systems. There was a brush with the IBM Intelligent Patent Miner system. For those who do not recall their search history, you can find a chunk of information in “Information Mining with the IBM Intelligent Miner Family.” Keep in mind that the write up is about 20 years old. (Please notice that the LexisNexis system discussed below uses many of the same time-worn techniques.)
Patented dog coat.
Then there was the Manning & Napier “smart” patent analysis system, which displayed its output in three-D visualizations. I bumped into Derwent (now Intellectual Property & Science) and other Thomson Corp. solutions as well. And, of course, there was my work for an unnamed, mostly clueless multi billion dollar outfit related to Google’s patent documents. I summarized the results of this analysis in my Google Version 2.0 monograph, portions of which were published by Bear Stearns before it met its thrilling end seven years ago. (Was my boss the fellow carrying a box out of Bear Stearns’ Midtown building?)
Why the history?
Well, patents are expensive to litigate. For some companies, intellectual property is a revenue stream.
There is a knot in the headphone cable. Law firms are not the go go business they were 15 or 20 years ago. Law school grads are running gyms; some are Uber drivers. As in many modern post-Reagan businesses, concentration is the name of the game. For the big firms with the big buck clients, money is no object.
The problem in the legal information business is that smaller shops, including the one and two person outfits operating in Dixie Highway type real estate, do not want to pay the $200 and up per search that commercial online services charge. Even when I was working for some high rollers, the notion of a five or six figure online charge elicited what I would diplomatically describe as gentle push back.
I read “LexisNexis TotalPatent Keeps Patent Research out of the Black Box with Improved Version of Semantic Search.” For those out of touch with online history, I worked for a company in the 1980s which provided commercial databases to LexisNexis. I knew one of the founders (Don Wilson). I even had reasonably functional working relationships with Dan Prickett and people named “Jim” and “Sharon.” In one bizarre incident, a big wheel from LexisNexis wanted to meet with me in the Cherry Hill Mall’s parking lot across from the old Bell Labs’ facility where I was a consultant at the time. Err, no thanks. I was okay with the wonky environs of Bell Labs. I was not okay with the lash up of a Dutch and British company.
Snippet of code from a Ramanathan Guha invention. Guha used to be at IBM Almaden and he is a bright fellow. See US7593939 B2.
What does LexisNexis TotalPatent deliver for a fee? According to the write up:
TotalPatent, a web-based patent research, retrieval and analysis solution powered by the world’s biggest assortment of searchable full-text and bibliographic patent authorities, allows researchers to enter as much as 32,000 characters (comparable to more than 10 pages of text), much longer than a whole patent abstract, into its search engine. The newly enhanced semantic engine, pioneered by LexisNexis in 2009 and continually improved upon utilizing contextual information supplied by the patent data offered to the engine, presents results in the form of a user-adjustable term cloud, where the weighting and positioning of terms may be managed for more precise results. In addition to countless full-text patent documents, TotalPatent also utilizes scientific, technical and non-patent literature to return the deepest, most comprehensive search results.
March 29, 2015
Let me be clear. I am not a Facebook user. One of the goslings configured the Beyond Search blog to send content to a Facebook page. I, however, do not need a stream of information about my high school and college classmates. At my last reunion, the 50th, I saw only two mobile phones: my wife’s and mine. Obviously central Illinois is not a technology hot spot for the over-70 set.
I read “Many, Many Facebook Users Still Don’t Know That Their News Feeds Are Filtered by an Algorithm.” Big whoop. Most of the MBAs I know are clueless about Google’s personalization functions and don’t have much appetite for understanding that what you see may not be what is available. For these cohorts, a little learning is just fine. Drinking from a spring is okay as long as the water comes from an authentic source like Dasani. Isn’t that Coca-Cola’s outfit?
The write up reveals what strikes me as a no brainer type factoid:
But a majority of everyday Facebook users in a recent study had no idea that Facebook constructs their experience, pushing certain posts into their stream and leaving others out. And worse, many participants blamed themselves, not Facebook’s software, when friends or family disappeared from their news feeds.
The article reports:
While some participants were upset by the idea that Facebook was changing their social experience, more than half of the study participants “came to appreciate the algorithm over the course of the study.” Most came to think that the filtering and ranking software was actually doing a decent job. “Honestly I have nothing to change which I’m surprised!” one said. “Because I came in like ‘Ah, they’re screwing it all!’”
Sigh. Is there a remedy for this lack of understanding? Nope.
Do most online “experts” care? Nah, but some of them charge windmills with their iPad Airs as shields.
The reality is that a comprehensive understanding of a particular content domain requires good, old fashioned research. The idea is to read, talk to informed individuals, gather additional primary data, analyze what you collect, and then figure out who knows what about a topic.
We are doing this type of grunt work about one facet of the Dark Web. Early results are in. Most of the people writing about the Dark Web are not doing a particularly good job of explaining where the “dark” content lives, how to find it, or what the content reveals about a fundamental shift in online usage for a small but important and interesting group of users worldwide.
If one cannot understand what Facebook is doing, the Dark Web is of zero consequence. If a Google user accepts search results as objective, I am not sure there is much hope for remedial intervention.
Net net: At a time when ease, convenience, short cuts, and distractions are of primary importance, thinking about information is not of much interest to many people.
“Hey, after the NCAA games, let’s binge watch Breaking Bad. We can post our comments on Facebook too!”
Sound fun? Oh, wait. I have to take this call, send an SMS, and post a picture of our pizza to Facebook. Cool.
Stephen E Arnold, March 29, 2015
January 9, 2015
Books. Interesting idea. Are books a growth market in the Amazon world? Bing is looking at books. Err, doesn’t Amazon/Goodreads do this? I read “Finding Great Books Just got Easier with Bing Best Sellers Search.” The article provides some suggested searches; for example, best business books. I am not sure how many of the thumb typing crowd are into books. Perhaps Bing can pull new readers with its new service? My hunch is that Bing is likely to generate more sales for Amazon. Publishers will find the Bing thing a step in the right direction.
Stephen E Arnold, January 9, 2015
December 30, 2014
I read “WWWTXT: The Oldest Internet Archive.” The write up makes clear that archival online content is tough to find. I like the idea that online history is lost. The idea, one might say, is that lack of awareness of the past makes everything new again. Here’s a quote I noted:
(Rehn’s archive was acquired from the now-defunct Deja News, which was acquired by Google in 2001.) These days, the majority of new content he gets is from old BBS archives, either given to him, or found on old floppy disks.
When experts in search were clueless about early information retrieval systems, I used to think it was a failure on the part of the expert. Now I see. Those folks have no past to which to refer. Hence, old stuff is innovative. Good to know.
Stephen E Arnold, December 30, 2014
December 22, 2014
To my two or three readers:
We will be reducing the flow of stories from December 18, 2014, to January 1, 2015. Coverage in Beyond Search will be expanded to include the new Cyber OSINT data stream and content about NGIA (next generation information access). I will be moving the IDC/Schubmehl content to the Xenky.com Web site to make ongoing references to the reputation surfing easier.
Enjoy the holidays.
Stephen E Arnold, December 22, 2014
December 22, 2014
Years and years ago, a unit of the Courier Journal & Louisville Times created the Business Dateline database. As far as I know, it was the first full text online database to feature corrections. The team believed that most online content contained flaws, and neither the database producers, the publishers, nor the online distributors like LexisNexis invested much effort in accuracy. How many databases followed in our footsteps? Well, not too many. At one time it was exactly zero. Yet, based on studies we did at the newspaper and subsequently as part of ArnoldIT’s work, people perceive information from a computer as accurate.
Flash forward to our go go now. The worm, after several decades, may be turning, albeit slowly. Navigate to “Elsevier Retracting 16 Papers for Faked Peer Review.” Assuming the write up was itself accurate, I noted this passage:
We consider ourselves to have an important role in prevention. We try to put a positive tone to our education material, so it’s not a draconian “we will catch you” – it’s also about the importance of research integrity for science, the perception of science with taxpayers…there are a lot of rewards for doing this the right way.
The questions in my mind are:
- How many errors are in the LexisNexis online file? What steps are being taken to remove the ones known to be incorrect; for example, technical papers with flawed information?
- How will Elsevier alert its customers that some information may be inaccurate?
- What process is in place for other Elsevier properties to correct, minimize, and eliminate errors in print and online content?
I can imagine myself in a meeting with Elsevier’s senior management. My task is to propose specific measures to ensure quality, accuracy, and timeliness in Elsevier’s products. I am not sure my suggestions will be ones that generate a great deal of enthusiasm. Hopefully, I am incorrect.
Stephen E Arnold, December 22, 2014