
LexisNexis: Riding the Patent Pony

April 25, 2015

Need patent information? Lots of folks believed that making sense of the public documents available from the USPTO was the road to riches. Before I kicked back to enjoy the sylvan life in rural Kentucky, I did some work on Fancy Dan patent systems. There was a brush with the IBM Intelligent Patent Miner system. For those who do not recall their search history, you can find a chunk of information in “Information Mining with the IBM Intelligent Miner Family.” Keep in mind that the write up is about 20 years old. (Please, notice that the LexisNexis system discussed below uses many of the same, time worn techniques.)


Patented dog coat.

Then there was the Manning & Napier “smart” patent analysis system with analyses’ output displayed in three-D visualizations. I bumped into Derwent (now Intellectual Property & Science) and other Thomson Corp. solutions as well. And, of course, there was my work for an unnamed, mostly clueless multi billion dollar outfit related to Google’s patent documents. I summarized the results of this analysis in my Google Version 2.0 monograph, portions of which were published by BearStearns before it met its thrilling end seven years ago. (Was my boss the fellow carrying a box out of the Midtown BearStearns’ building?)

Why the history?

Well, patents are expensive to litigate. For some companies, intellectual property is a revenue stream.

There is a knot in the headphone cable. Law firms are not the go go business they were 15 or 20 years ago. Law school grads are running gyms; some are Uber drivers. Like many modern post Reagan businesses, concentration is the name of the game. For the big firms with the big buck clients, money is no object.

The problem in the legal information business is that smaller shops, including the one and two person outfits operating in Dixie Highway type of real estate, do not want to pay the $200 and up per search that commercial online services charge. Even when I was working for some high rollers, the notion of a five or six figure online charge elicited what I would diplomatically describe as gentle push back.

I read “LexisNexis TotalPatent Keeps Patent Research out of the Black Box with Improved Version of Semantic Search.” For those out of touch with online history, I worked for a company in the 1980s which provided commercial databases to LexisNexis. I knew one of the founders (Don Wilson). I even had reasonably functional working relationships with Dan Prickett and people named “Jim” and “Sharon.” In one bizarre incident, a big wheel from LexisNexis wanted to meet with me in the Cherry Hill Mall’s parking lot across from the old Bell Labs’ facility where I was a consultant at the time. Err, no thanks. I was okay with the wonky environs of Bell Labs. I was not okay with the lash up of a Dutch and British company.


Snippet of code from a Ramanathan Guha invention. Guha used to be at IBM Almaden and he is a bright fellow. See US7593939 B2.

What does LexisNexis TotalPatent deliver for a fee? According to the write up:

TotalPatent, a web-based patent research, retrieval and analysis solution powered by the world’s biggest assortment of searchable full-text and bibliographic patent authorities, allows researchers to enter as much as 32,000 characters (comparable to more than 10 pages of text)—much over along a whole patent abstract—into its search industry. The newly enhanced semantic brains, pioneered by LexisNexis during 2009 and continually improved upon utilizing contextual information supplied by the useful patent data offered to the machine, current results in the form of a user-adjustable term cloud, where the weighting and positioning of terms may be managed for lots more precise results. And countless full-text patent documents, TotalPatent in addition utilizes systematic, technical also non-patent literature to go back the deepest, most comprehensive serp’s.


Facebook Users Lack Understanding of Filters: No Big Surprise

March 29, 2015

Let me be clear. I am not a Facebook user. One of the goslings configured the Beyond Search blog to send content to a Facebook page. I, however, do not need a stream of information about my high school and college classmates. At my last reunion, the 50th, I saw only two mobile phones: My wife’s and mine. Obviously central Illinois is not a technology hot spot for the over 70 set.

I read “Many, Many Facebook Users Still Don’t Know That Their News Feeds Are Filtered by an Algorithm.” Big whoop. Most of the MBAs I know are clueless about Google’s personalization functions and don’t have much appetite for understanding that what you see may not be what is available. For these cohorts, a little learning is just fine. Drinking from a spring is okay as long as the water comes from an authentic source like Dasani. Isn’t that Coca Cola’s outfit?

The write up reveals what strikes me as a no brainer type factoid:

But a majority of everyday Facebook users in a recent study had no idea that Facebook constructs their experience, pushing certain posts into their stream and leaving others out. And worse, many participants blamed themselves, not Facebook’s software, when friends or family disappeared from their news feeds.

The article reports:

While some participants were upset by the idea that Facebook was changing their social experience, more than half of the study participants “came to appreciate the algorithm over the course of the study.” Most came to think that the filtering and ranking software was actually doing a decent job. “Honestly I have nothing to change which I’m surprised!” one said. “Because I came in like ‘Ah, they’re screwing it all!’”

Sigh. Is there a remedy for this lack of understanding? Nope.

Do most online “experts” care? Nah, but some of them charge windmills with their iPad Airs as a shield.

The reality is that a comprehensive understanding of a particular content domain requires good, old fashioned research. The idea is to read, talk to informed individuals, gather additional primary data, analyze what you collect, and then figure out who knows what about a topic.

We are doing this type of grunt work about one facet of the Dark Web. Early results are in. Most of the people writing about the Dark Web are not doing a particularly good job of explaining where the “dark” content lives, how to find it, or what the content reveals about a fundamental shift in online usage for a small but important and interesting group of users worldwide.

If one cannot understand what Facebook is doing, the Dark Web is of zero consequence. If a Google user accepts search results as objective, I am not sure there is much hope for remedial intervention.

Net net: At a time when ease, convenience, short cuts, and distractions are of primary importance, thinking about information is not of much interest to many people.

“Hey, after the NCAA games, let’s binge watch Breaking Bad. We can post our comments on Facebook too!”

Sound fun? Oh, wait. I have to take this call, send an SMS, and post a picture of our pizza to Facebook. Cool.

Stephen E Arnold, March 29, 2015

Bing Books: Chasing a Market

January 9, 2015

Books. Interesting idea. Are books a growth market in the Amazon world? Bing is looking at books. Err, doesn’t Amazon/Goodreads do this? I read “Finding Great Books Just got Easier with Bing Best Sellers Search.” The article provides some suggested searches; for example, best business books. I am not sure how many of the thumb typing crowd are into books. Perhaps Bing can pull new readers with its new service? My hunch is that Bing is likely to generate more sales for Amazon. Publishers will find the Bing thing a step in the right direction.

Stephen E Arnold, January 9, 2015

Losing the Past Online

December 30, 2014

I read “WWWTXT: The Oldest Internet Archive.” The write up makes clear that archival online content is tough to find. I like the idea that online history is lost. The idea, one might say, is that lack of awareness of the past makes everything new again. Here’s a quote I noted:

(Rehn’s archive was acquired from the now-defunct Deja News, which was acquired by Google in 2001.) These days, the majority of new content he gets is from old BBS archives, either given to him, or found on old floppy disks.

When experts in search are clueless about early information retrieval systems, I thought it was a failure on the part of the expert. Now I see. Those folks have no past to which to refer. Hence, old stuff is innovative. Good to know.

Stephen E Arnold, December 30, 2014

Beyond Search Content Flow

December 22, 2014

To my two or three readers:

We will be reducing the flow of stories from December 18, 2014, to January 1, 2015. Coverage in Beyond Search will be expanded to include the new Cyber OSINT data stream and content about NGIA (next generation information access). I will be moving the IDC/Schubmehl content to the Web site to make ongoing references to the reputation surfing easier to find.

Enjoy the holidays.

Stephen E Arnold, December 22, 2014

Elsevier and Bad Information

December 22, 2014

Years and years ago, a unit of the Courier Journal & Louisville Times created the Business Dateline database. As far as I know, it was the first full text online database to feature corrections. The team believed that most online content contained flaws, and neither the database producers, the publishers, nor the online distributors like LexisNexis invested much effort in accuracy. How many databases followed in our footsteps? Well, not too many. At one time it was exactly zero. But people perceive information from a computer as accurate, based on studies we did at the newspaper and subsequently as part of ArnoldIT’s work.

Flash forward to our go go now. The worm, after several decades, may be turning, albeit slowly. Navigate to “Elsevier Retracting 16 Papers for Faked Peer Review.” Assuming the write up was itself accurate, I noted this passage:

We consider ourselves to have an important role in prevention. We try to put a positive tone to our education material, so it’s not a draconian “we will catch you” – it’s also about the importance of research integrity for science, the perception of science with taxpayers…there are a lot of rewards for doing this the right way.

The questions in my mind are:

  • How many errors are in the LexisNexis online file? What steps are being taken to remove the ones known to be incorrect; for example, technical papers with flawed information?
  • How will Elsevier alert its customers that some information may be inaccurate?
  • What process is in place for other Elsevier properties to correct, minimize, and eliminate errors in print and online content?

I can imagine myself in a meeting with Elsevier’s senior management. My task is to propose specific measures to ensure quality, accuracy, and timeliness in Elsevier’s products. I am not sure my suggestions will be ones that generate a great deal of enthusiasm. Hopefully, I am incorrect.

Stephen E Arnold, December 22, 2014

UK Paintings Catalog: When Every Does Not Mean Every

December 2, 2014

I love headlines like “Every Painting in the UK at Your Fingertips.” The idea is to provide “images and details of every painting (in tempera or acrylic) in public ownership through the United Kingdom.” Well, obviously the “every” is not every painting. There is an 86 volume set which presumably presents the images and metadata. The digital images are available at Your Paintings. There is a search box and a number of other options. I ran a query for Patrick Heron, an artist whose work I find interesting. There are some of his pictures in the Tate, and he was born . Here’s what I found:


Pretty thin. The Patrick Heron entry for the St Ives School offers a bit more information.


I am not sure if the BBC index is incomplete. It appears that posting information or links to other UK online sources is not part of the project. Also, the presentation of different search boxes on the BBC site does not make accessing the Your Paintings information easier.

The enthusiasm of the newspaper is admirable. I expect/hope that the service will improve its usability and completeness in the months ahead. The BBC is, as one of my British acquaintances with an Oxford education used to say, “performant.”

Stephen E Arnold, December 2, 2014

Online Accuracy: The Hollywood Sign Approach

November 24, 2014

I read “Why People Keep Trying to Erase the Hollywood Sign from Google Maps.” The write up underscores the fluidity of the notion about accurate online information. Last time I was in Hollywood, I gave my talk at an intel conference and beat a quick path back to Kentucky. For those who think that life has not been lived until one stands at the base of a giant letter, Google Maps, if the write up is correct, may give you an extra workout. Here’s the passage I noted:

Even though Google Maps clearly marks the actual location of the sign, something funny happens when you request driving directions from any place in the city. The directions lead you to Griffith Observatory, a beautiful 1920s building located one mountain east from the sign, then—in something I’ve never seen before, anywhere on Google Maps—a dashed gray line arcs from Griffith Observatory, over Mt. Lee, to the sign’s site. Walking directions show the same thing.

Obviously, in the world of online, this is the only instance of information being modified so it does not match reality. I am comforted, unlike some folks.

Stephen E Arnold, November 24, 2014

Mozilla and Search Changes: Meh

November 20, 2014

You can read the crashing waves of opinions about Mozilla and its falling out of love with the GOOG. “Firefox Drops Google as Default Search Engine…” presents the new, “real” journalism approach; to wit:

Firefox has lost market share in recent years but is still used by roughly 17 percent of web goers.

Juicy factoid. Small percentage in a world in which traffic and eyeballs matter.

You can get the search engine optimization/inside scoop viewpoint in “Mozilla CEO: It Wasn’t Money — Yahoo Was The Better Strategic Partner For Firefox.” I noted this:

The official line from the Mozilla blog post about the deal helps parse what being a good strategic partner seems to be. It praises Yahoo as being “aligned with our values of choice and independence” — which suggests that Firefox was feeling that Google had become too controlling or wanted more control about what was happening within Firefox. Or, perhaps Mozilla felt Google has been less about supporting the web and more about supporting itself than in the past.

My view is not just tepid; it is indifferent. Monopolistic behaviors are the order of the day. Yahoo is no monopoly. Yandex may have a shot as long as it stays on the right side of certain governmental authorities. Baidu is the best of the bunch, but one misstep and I would suggest that life could be viewed through a filter.

As the browser becomes the new operating system, if you are not running what’s mainstream, there may be some challenges ahead. Do you still have an Eagle desktop computer? If so, dig it out, plug in your DEC Rainbow, and let me know how you read this blog post.

Oh, and what about search? It seems to rank right along with the Mozilla attitude toward money in my opinion.

Stephen E Arnold, November 20, 2014

Ah, History and the 20 Somethings

November 16, 2014

I had a conversation last week with a quite assured expert in content processing. I mentioned that I was 70 years old and would not be attending a hippy dippy conference in New York. I elicited a chuckle.

I thought of this gentle dismissal of old stuff when I read “Old Scientific Papers Never Die, They Just Fade Away. Or They Used to.” The main idea of the article seems to be that “old” work can provide some useful factoids for the 20 somethings and 35 year old whiz kids who wear shirts with unclothed females on them. Couple a festive shirt with a tattoo, and you have a microcosm of the specialists inventing the future.

Here’s a passage I noted:

“Our [Googlers] analysis indicates that, in 2013, 36% of citations were to articles that are at least 10 years old and that this fraction has grown 28% since 1990,” say Verstak and co. What’s more, the increase in the last ten years is twice as big as in the previous ten years, so the trend appears to be accelerating.

Quite an insight considering that much of the math used to deliver whizzy content processing is a couple of centuries old. I looked for a reference to Dr. Gene Garfield and did not notice one. Well, maybe he’s too old to be remembered. Should I send a link to the 20 something with whom I spoke? Nah, waste of time.

Stephen E Arnold, November 16, 2014
