Web Search Training Wheels: A Play for Precision

August 10, 2017

I read “How to Instantly Boost the Accuracy of Search Results on Google and Bing.” I love the word “instantly”, particularly when coupled to “accuracy.” The write up describes an overlay called Advangle, which helps a person create a search with more than 2.6 words. Interesting neologism, Advangle.

These services are what I call “training wheels.” The idea is that a person looking for information fills in a form, which helps the person create a query more sophisticated than “pizza.” Many systems in the last 50 years have tried these types of interfaces. In fact, one can find them in the whiz bang interfaces available to cyber OSINT software users. I won’t drag the old Dow Jones interface into this post, nor will I provide screenshots of Palantir Gotham interfaces. (Hey, you probably know about these already.)

The write up, however, does not explore the concept in too much detail. I noted this statement:

The Advangle interface makes it easier to string together targeted searches with the right syntax, and in half the time it would take to type it all out by hand.

Saving time, not precision or recall, is the unique selling proposition.

It is useful to keep in mind that formal search operators are still available to users of Bing, Google, Yandex, and a number of other systems. The problem is that as Web search has massified, only a tiny fraction of the users of ad supported Web search systems bother with formal operators like filetype: or other oddities.
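For readers who have not typed an operator in years, here is a minimal sketch of what a form-style overlay does behind the scenes: it glues plain terms and a few operators into one query string. The function name `build_query` and its parameters are hypothetical, invented for illustration. The operators shown (quoted phrases, site:, filetype:, and a leading minus sign for exclusion) are widely supported, but exact behavior varies from one search system to another.

```python
from urllib.parse import quote_plus


def build_query(terms, phrase=None, site=None, filetype=None, exclude=()):
    """Assemble a query string from plain terms and a few common operators.

    Hypothetical helper for illustration; operator support differs by engine.
    """
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')          # exact phrase match
    if site:
        parts.append(f"site:{site}")         # restrict to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to one file type
    parts.extend(f"-{word}" for word in exclude)  # exclude unwanted terms
    return " ".join(parts)


query = build_query(
    ["cyber", "OSINT"],
    phrase="dark web",
    filetype="pdf",
    exclude=["pizza"],
)
print(query)
# URL-encode the query so it can be placed in a search URL's query parameter.
print("q=" + quote_plus(query))
```

The sketch only saves typing. It does nothing to decide which terms, sources, or file types are the right ones to ask for.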

The real problems with search are far deeper than an interface overlay. Let me highlight several which I find consistently troublesome:

  1. Finding a way to impart the skills of a well executed reference interview conducted by an expert in online search and retrieval. (Marydee Ojala, Ruth Patel, Anne Mintz, Ulla de Stricker, and Barbara Quint are individuals who can help a PhD formulate a statement of what information and data are needed, convert that desire into appropriate queries of appropriate databases, and deliver a filtered list of results.) Software, no matter how nifty the interface, at this time cannot replicate this expertise.
  2. Individuals who need information are more crippled than their counterparts from 30 years ago. Online systems have worked hard to let popularity and past user behavior provide a context for a query like “cyrus.” If you think you will get the pop star before a long dead historical figure, you are more sophisticated than the eager consumers of pop up ads on a Pixel phone.
  3. Databases are governed by editorial policies. In the good old days of 1975, creators of databases figured out what and how to index. Today most users believe that Google has “all” the world’s information. Nothing could be more wrong headed. Indexes, particularly free ones, include what creates traffic. If the content gets a little too frisky, censorship, filtering, and smart / predictive software step in and deliver “better” information.

I suggest you give the Advangle service a try. You may find that it is better than a room stuffed with Quints and Ojalas and others of this ilk.

My approach is simple: Know what one wants. Formulate a suitable query. Pass the query across the sources/databases likely to have indexed the information. Review the results. Think about the information gaps. Repeat the process.

Pretty crazy today, right?

Who has time to figure out what companies are in the cyber OSINT business or what Dark Web sites continue to offer contraband in the wake of AlphaBay and Hansa?

Research via digital resources, unlike checking Facebook, is a bit of a mental workout.

On the other hand, why not let the ad supported search engines deliver exactly what they think you need? Better yet, let these outfits provide that information before you know you need it.

A system that actually delivered precise, on point, timely, and authoritative results would be great. It would be nice to be able to live forever and travel to the stars.

Reality is a tad different. UX is not yet a replacement for knowing how to research in a way that moves beyond finding Game of Thrones.

Stephen E Arnold, August 10, 2017

India Jumps on the Filtering Bandwagon

August 9, 2017

We noted “Internet Archive Contacted Indian Govt Regarding the Block but Got No Response.” The main point is that a repository (incomplete as its collection of Web pages may be) seems to be unavailable in India. Perhaps the Indian government has found a way to search for information in the service. We have noted that searching for rich media, including the collection of 78 rpm records, is a tough slog. It is tough to find information even when it is online. When services are filtered, locating facts, semi-facts, and outright hoohaw becomes impossible. We think the actions could impair the outstanding customer support services provided by the world’s second largest nation. Efficient delivery of information centric services, however, is likely to improve in Mumbai. China, Indonesia, Russia, Turkey, and now India may be taking steps to put the data doggies in the kennel.

Stephen E Arnold, August 9, 2017

Google and Its Vestager Adventure

July 7, 2017

I found the analyses of Google’s fine for certain misunderstood and misinterpreted behavior interesting. I noted a round up in that font of legal and technical wisdom, the Hollywood Reporter, which presented pros and cons of the decision. Well, sort of one pro and one con. My question, “Why was the Hollywood Reporter interested in a legal decision seemingly far removed from the concerns of Hollywood?”

I also noted “More Than Money: Why Google’s Antitrust Loss Matters.” One of the points in this write up was that the EU process might qualify some other companies for a day in court with a stop at the toll booth on the way out of the building.

I noted this passage:

These other cases involve: (1) the available range of mobile apps in the Android operating system, and (2) allegations that through AdSense, Google has prevented third-party websites from sourcing search ads. Once complete, these cases could result in similarly hefty fines. Indeed, given the European Commission’s statements regarding the potentiality of findings of abuse, it seems unlikely that Google will escape further punitive measures.

Several observations:

  1. Google will pay the fine one way or another but there will be some legal excitement on the information highway leading to the pay station
  2. Other US companies are likely to be getting an invitation to explain their business practices. Brussels and Strasbourg are fun cities with good restaurants and some nice hotels.
  3. Google will have an opportunity to explain some of its other systems and methods in the future.

I am not sure saying, “Hey, we’re sorry” will work very well. One thing is certain: Google will not ask IBM Watson for its take on the matter.

Stephen E Arnold, July 7, 2017

Online Filtering: China and “All” Rich Media

July 6, 2017

I read “China’s Bloggers, Filmmakers Feel Chill of Internet Crackdown.” The main idea is that control over Internet content is getting exciting. I noted this point in the “real” news story:

Over the last month, Chinese regulators have closed celebrity gossip websites, restricted what video people can post and suspended online streaming, all on grounds of inappropriate content.

Yep, an “all” in the headline and an “all” in the text of the story.

I also noted the point that emerges from the alleged statement of an academic whose travel to and from China is likely to become more interesting:

“According to these censorship rules, nothing will make it through, which will do away with audiovisual artistic creation,” Li Yinhe, an academic who studies sexuality at the government-run Chinese Academy of Social Sciences, wrote in an online post. Under the government rules, such works as Georges Bizet’s opera “Carmen” and Shakespeare’s “Othello” would technically have to be banned for depicting prostitution and overt displays of affection, she said.

What’s the key point? It seems to me that China wants to prevent digital content from eroding what the write up calls, via a quote from “an industry association,” “socialist values.” Yep, bad. Filtering and controls applied by commercial enterprises, therefore, must be better. Government filters applied by countries other than China may be sort of better than China’s approach.

Hey, gentle reader, this is news. But does “news” exist if one cannot access it online? Perhaps actions designed to limit Surface Web online content will increase the use of encrypted systems such as sites accessible via Tor.

Presumably Thomson Reuters’ new incubator for smart software and big data will not do any of the filtering thing? On the other hand, my hunch is that Thomson Reuters will filter like the Dickens: From screening ideas to fund to guiding the development trajectories of the lucky folks who get some cash.

Worth watching the publishing giant which has been struggling to generate significant top line growth.

Stephen E Arnold, July 5, 2017

Dark Web Notebook Now Available

June 5, 2017

Arnold Information Technology has published Dark Web Notebook: Investigative Tools and Tactics for Law Enforcement, Security, and Intelligence Organizations. The 250-page book provides an investigator with instructions and tips for the safe use of the Dark Web. The book, delivered as a PDF file, costs $49.

Orders and requests for more information should be directed to darkwebnotebook@yandex.com. Purchasers must verify that they work for a law enforcement, security, or intelligence organization. Dark Web Notebook is not intended for general distribution due to the sensitive information it contains.

The author is Stephen E Arnold, whose previous books include CyberOSINT: Next Generation Information Access and Google Version 2.0: The Calculating Predator, among others. Arnold, a former Booz, Allen & Hamilton executive, worked on the US government-wide index and the Threat Open Source Intelligence Gateway.

The Dark Web Notebook was suggested by attendees at Arnold’s Dark Web training sessions, lectures, and webinars. The Notebook provides specific information an investigator or intelligence professional can use to integrate Dark Web information into an operation.

Stephen E Arnold, author of the Dark Web Notebook, said:

“The information in the Dark Web Notebook has been selected and presented to allow an investigator to access the Dark Web quickly and in a way that protects his or her actual identity. In addition to practical information, the book explains how to gather information from the Dark Web. Also included are lists of vendors who provide Dark Web services to government agencies along with descriptions of open source and commercial software tools for gathering and analyzing Dark Web data. Much of the information has never been collected in a single volume written specifically for those engaged in active investigations or operations.”

The book includes a comprehensive table of contents, a glossary of terms and their definitions, and a detailed index.

The book is divided into 13 chapters. These are:

  1. Why write about the Dark Web?
  2. An Introduction to the Dark Web
  3. A Dark Web Tour with profiles of more than a dozen Dark Web sites, their products, and services
  4. Dark Web Questions and Answers
  5. Basic Security
  6. Enhanced Security
  7. Surface Web Resources
  8. Dark Web Search Systems
  9. Hacking the Dark Web
  10. Commercial Solutions
  11. Bitcoin and Variants
  12. Privacy
  13. Outlook

In addition to the Glossary, the annexes include a list of DARPA Memex open source software written to perform specific Dark Web functions, a list of spoofed Dark Web sites operated by law enforcement and intelligence agencies, and a list of training resources.

Kenny Toth, June 5, 2017

Facebook Excitement: The Digital Country and Kids

May 4, 2017

I read “Facebook Admits Oversight after Leak Reveals Internal Research On Vulnerable Children.” The write up reports that an Australian newspaper:

reported that Facebook executives in Australia used algorithms to collect data on more than six million young people in Australia and New Zealand, “indicating moments when young people need a confidence boost.”

The idea one or more Facebook professionals had strikes me as one with potential. If an online service can identify a person’s moment of weakness, that online service could deliver content designed to leverage that insight. The article said:

The data analysis — marked “Confidential: Internal Only” — was intended to reveal when young people feel “worthless” or “insecure,” thus creating a potential opening for specific marketing messages, according to The Australian. The newspaper said this case of data mining could violate Australia’s legal standards for advertising and marketing to children.

Not surprisingly, the “real” journalism said:

“Facebook has an established process to review the research we perform,” the statement continued. “This research did not follow that process, and we are reviewing the details to correct the oversight.”

When Facebook seemed to be filtering advertising based on race, Facebook said:

“Discriminatory advertising has no place on Facebook.”

My reaction to this revelation is, “What? This type of content shaping is news?”

My hunch is that some folks forget that when advertisers suggest one has a lousy complexion, particularly a disfiguring rash, the entire point is to dig at insecurities. When I buy the book Flow for a friend, I suddenly get lots of psycho-babble recommendations from Amazon.

Facebook, like any other sales oriented and ad hungry outfit, is going to push as many psychological buttons as possible to generate revenue. I have a hypothesis that the dependence some people have on Facebook “success” is part of the online business model.

What’s the fix?

“Fix” is a good word. The answer is, “More social dependence.”

In my experience, drug dealers do not do intervention. The customer keeps coming back until he or she doesn’t.

Enforcement seems to be a hit-and-miss solution. Intervention makes some Hollywood types oodles of money in reality programming. Social welfare programs slump into bureaucratic floundering.

Could it be that online dependence is a cultural phenomenon? Facebook is in the right place at the right time. Technology makes it easy to refine messages for maximum financial value.

Interesting challenge, and the thrashing about for a “fix” will be fascinating to watch. Perhaps the events will be live streamed on Facebook? That may provide a boost in confidence to Facebook users and to advertisers. Win win.

Stephen E Arnold, May 4, 2017

Which Beyond Search? Text Processing or Meet Market?

April 3, 2017

In Madrid last week, a person showed me a link to Beyond Search. Nope, not this Beyond Search but to an executive recruitment firm based in London. This outfit owns the url beyondsearch.net and had the good sense to piggyback on the semantic value created by my Kentucky thoughts about search, content processing, text analytics and related subjects.

I took a quick look at the company’s Web site, which looks quite a bit like one of those Squarespace instant sites with sliders, large type, and zippy images. There were a couple of points I noted. Permit me to focus on the staff and the partners of the London-based “get you a new job, pal” store front.

First, the list of partners includes a link to a Brazilian executive recruitment company named Grupo Selpe. I used to live in Campinas, and I did a quick check of this company. The connection between Grupo Selpe and Beyond Search seems to be one of Beyond Search’s “directors.” There’s not much information about the executive directors, but we will continue to monitor the named entities. There was one link related to Grupo Selpe and Beyond Search, and it was dated 2005. Odd that in 12 years, there’s only one modest reference to the London shot house type company.

Second, we noted that the founder of Beyond Search is a person allegedly named James Davies. He too exists in a bit of an information vacuum. His LinkedIn page reports that he is a graduate of Keele University, and he has been the founder of two interesting Google-scale operations; specifically:

  • ScaleUp Works, a conference designed to raise investment funds
  • Walker Davies, an outfit described as “the UK’s pre-eminent startup and scale up hiring specialists”.

Walker Davies is interesting because it is listed as one of the “partners” of the Beyond Search recruitment outfit. It strikes me that Walker Davies and Beyond Search are in the same business: Headhunting, a colloquial term popular in the US for moving a person to a new job.


Headhunting refers to the practice of some indigenous people. Beyond Search, despite its aboriginal origins, consumes only geese. Beyond Search in London may consume the careers of certain individuals. Beyond Search is enjoyed by certain individuals who are familiar with our approach and work for certain government entities engaged in law enforcement. Beyond Search in London is familiar with the pay-to-play aspect of executive recruitment; for instance, this company, Not Actively Looking.

Third, one of the partners of the recruitment outfit is the Financial Times. It apparently had a Non Executive Directors’ Club. I clicked on the link to the Financial Times, a publication which I view as one which tries not to get embroiled in illegal, underhanded, and deceptive practices. (I could be incorrect of course.) What happens when I follow the link? I get a 404 error.


This snippet from the headhunting Web site says that Beyond Search is proud to be partners with the Financial times Non Executive Director’s Club. Please, note the typographical error introduced between the logo and the executive placement service’s rendering of the identical text. Careless? No, just a bad link. I saw this when I clicked on the logo:


It seems that the Financial Times does not want to be captured in the headhunters’ pot of boiling oil or the Beyond Search headhunting outfit does not have the ability to get details right. If that is indeed the case, I am not sure I would entrust my Beyond Search goose’s job search to those who might plop the dear bird into a pot and sit back and wait for goose with sauerkraut. “Sour” right?

Fourth, The OwenJames’s link is not active. But it seems to be given pride of place on the Beyond Search LinkedIn page. I find that interesting because even my LinkedIn page includes slightly more timely information. Compare the two entries and decide for yourself: The Arnold LinkedIn page vs. the James Davies’ page.

Fifth, the Beyond Search partner Paradox is in the coaching business. No, not football in the Roman Abramovich school of management. (See “Ruthless Sacking Is the Hallmark of Roman Abramovich Empire.”) The Paradox service strikes me as somewhat vague. As a former Booz, Allen & Hamilton lackey, I understand the value of vagueness. I did enjoy the quote from Niels Bohr: “The opposite of a correct statement is a false statement.” But is that what Paradox is about? False statements? I know that folks in Harrod’s Creek are not as sharp as those from more sophisticated cities like London, but the paradox is that I don’t understand how paradox is the heart of leadership.

An outfit with the same name as this beloved blog may have some good qualities. Granted, the punctuation errors, the Financial Times’s link which isn’t, and the fascinating grab bag of partners suggest that the headhunter outfit is an interesting operation.

Rah, rah, to any company which wishes to hang on the webbed feet of the flying goose. Remember. When the Beyond Search goose lands, it can lay golden eggs. Sometimes, however, it can leave a deposit which can discolor paint with poo burn like this:


The opposite of the truth is what again? Ah, right. The Beyond Search operation in the UK. Recruit on, I say.

Stephen E Arnold, April 3, 2017

Canada: Right to Be Forgotten

February 15, 2017

I found this interesting. According to “Did a Canadian Court Just Establish a New Right to Be Forgotten Online?”

the Federal Court of Canada issued a landmark ruling that paves the way for a Canadian version of the right to be forgotten that would allow courts to issue orders with the removal of Google search results on a global basis very much in mind. The case – A.T. v. Globe24H.com – involves a Romanian-based website that downloaded thousands of Canadian judicial and tribunal decisions, posted them online and demanded fees for their swift removal. The decisions are all public documents and available through the Canadian Legal Information Institute (CanLII), a website maintained by the legal profession in support of open access to legal materials.

I find the logic interesting. I believe that Thomson Reuters processes public legal documents and charges a fee to access them and the “value add” that WestLaw and its sister outfits impose. Maybe I am addled like the goose in Harrod’s Creek, but it seems that what’s good for one gander is not so good for the Google.

Poor Romanian entrepreneur! Come up with an original idea and learn that a country wants the data removed. No word on the views of Reed Elsevier which operates LexisNexis. Thomson Reuters, anything to add?

The removal of links is a hassle at best and a real pain at the worst for the Google. For researchers, hey, find the information another way.

Stephen E Arnold, February 15, 2017

About Twitter: Kill It, Kill It Now

January 14, 2017

I am not sure what to make of “It’s Time to Kill Twitter, Before It Kills Us.” I understand how drone swarms can kill. I grasp the notion of fungibles doing bad in airport baggage claim. But I had not considered the idea that sending short digital messages would kill “us.”

The write up explained to me:

The best thing you might say about Twitter is that it’s become the new micro press release—a way for the famous and powerful to promote, with as little effort as possible, their next project, product or random thought.

Twitter, therefore, can trigger people to do bad things. Therefore, kill Twitter.

The logic is obviously rock solid for some folks.

The write up continued:

From its founding, Twitter never had a purpose.

Okay, new media have no purpose. Interesting notion, particularly when viewed in the context of the tradition of communication methods.

But Twitter might be tough to kill. The write up pointed out:

Twitter might prove harder to get rid of than raccoons at a campsite. The company is still worth nearly $12 billion. It still has around 300 million monthly users. And it still has Trump, so if anyone tried to shutter it, he’d probably step in and classify Twitter as essential to our national security and install Ivanka to run it.

Fascinating. The question is, “Is the write up humorous like the Beyond Search weekly video news program, or is the write up making clear that certain types of communication must be stopped?”

News week or news weak?

Stephen E Arnold, January 14, 2017

The Dark Web and Surface Web Connection

January 11, 2017

IBM is doing its part to educate about the Dark Web. IBM Big Data and Analytics Hub shared a podcast episode entitled “Should we shut down the Dark Web?” which addresses the types of illegal activities on the Dark Web, explains challenges for law enforcement, and discusses the difficulty in identifying Dark Web actors. Bob Stasio, senior product manager of cyber analysis with IBM i2 Safer Planet, hosts the podcast. We found what one of the guests, Tyler Carbone, had to say quite interesting:

The parts of the internet we’re particularly interested in is where stolen information is posted and traded. What’s interesting is that that’s happening not through Tor…For what we’re interested in, a lot of stolen information is posted (traded and sold) on lite web sites — you can access them in Internet Explorer or Chrome. They’re just hosted in countries that aren’t particularly listed. One of the most well-known carding marketplaces…is hosted on a .cm….That’s not hidden within Tor at all. The problem is that individuals are logging in in an anonymous way so we can’t follow up with the individuals.

The line between the Surface Web and the Dark Web may be blurring or blurred. Ultimately, the internet is rooted in connection, so it’s hard to imagine clear separation between actors and activities being relegated to one or the other. We recommend giving this podcast a listen to ruminate on questions such as whether the Dark Web could and should be shut down. 

Megan Feil, January 11, 2017
