The World’s Wealthiest People Should Fear Big Data

November 24, 2017

One of the strengths that the planet’s elite and wealthy have is secrecy. In most cases, average folks and media don’t know where big money is stored or how it is acquired. However, that recently changed for the Queen of England, several Trump cabinet members, and other powerful men and women. And they should be afraid of what big data and search can do with their info, as we learned in the Guardian’s piece, “Paradise Papers Leak Reveals Secrets of the World’s Elite Hidden Wealth.”

The story found a lot of fishy dealings with political donors and those in power, including Queen Elizabeth having tax-free money in the Caymans, and more. According to the story:

At the centre of the leak is Appleby, a law firm with outposts in Bermuda, the Cayman Islands, the British Virgin Islands, the Isle of Man, Jersey and Guernsey. In contrast to Mossack Fonseca, the discredited firm at the centre of last year’s Panama Papers investigation, Appleby prides itself on being a leading member of the “magic circle” of top-ranking offshore service providers.

 

Appleby says it has investigated all the allegations, and found “there is no evidence of any wrongdoing, either on the part of ourselves or our clients”, adding: “We are a law firm which advises clients on legitimate and lawful ways to conduct their business. We do not tolerate illegal behaviour.”

Makes you wonder what would happen if some of the brightest minds in search and big data got ahold of this information? We suspect a lot of the financial knots this money ties to keep itself concealed would untangle. In an age of increasing transparency, we wouldn’t be shocked to see that happen.

Patrick Roland, November 24, 2017

Google Relevance: A Light Bulb Flickers

November 20, 2017

The Wall Street Journal published “Google Has Chosen an Answer for You. It’s Often Wrong” on November 17, 2017. The story is online, but you have to pay money to read it. I gave up on the WSJ’s online service years ago because at each renewal cycle, the WSJ kills my account. Pretty annoying because the pivot of the WSJ write up about Google implies that Google does not do information the way “real” news organizations do. Google does not annoy me the way “real” news outfits do with their online services.

For me, the WSJ is a collection of folks who find themselves looking at the exhaust pipes of the Google Hellcat. A source for a story like “Google Has Chosen an Answer for You. It’s Often Wrong” is a search engine optimization expert. Now that’s a source of relevance expertise! Other useful sources are the terse posts by Googlers authorized to write vapid, cheery comments in Google’s “official” blogs. The guts of Google’s technology are described in wonky technical papers, the background and claims sections of Google’s patent documents, and systematic queries run against Google’s multiple content indexes over time. A few random queries do not reveal the shape of the Googzilla in my experience. Toss in a lack of understanding about how Google’s algorithms work and their baked in biases, and you get a write up that slips on a banana peel of the imperative to generate advertising revenue.

I found the write up interesting for three reasons:

  1. Unusual topic. Real journalists rarely address the question of relevance in ad-supported online services from a solid knowledge base. But today everyone is an expert in search. Just ask any millennial, please. Jonathan Edwards had less conviction about his beliefs than a person skilled in locating a pizza joint on a Google Map.
  2. SEO is an authority. SEO (search engine optimization) experts have done more to undermine relevance in online than any other group. The one exception is the teams who have to find ways to generate clicks from advertisers who want to shove money into the Google slot machine in the hopes of an online traffic pay day. Using SEO experts’ data as evidence grinds against my belief that old fashioned virtues like editorial policies, selectivity, comprehensive indexing, and a bear hug applied to precision and recall calculations are helpful when discussing relevance, accuracy, and provenance.
  3. You don’t know what you don’t know. The presentation of the problems of converting a query into a correct answer reminds me of the many discussions I have had over the years with search engine developers. Natural language processing is tricky. Don’t believe me? Grab your copy of Gramatica didactica del espanol and check out the “rules” for el complemento circunstancial. Online systems struggle with what seems obvious to a reasonably informed human, but toss in multiple languages for automated question answering, and “Houston, we have a problem” echoes.

I urge you to read the original WSJ article yourself. You decide how bad the situation is at ad-supported online search services, big time “real” news organizations, and among clueless users who believe that what’s online is, by golly, the truth dusted in accuracy and frosted with rightness.

Humans often take the path of least resistance; therefore, performing high school term paper research is a task left to an ad supported online search system. “Hey, the game is on, and I have to check my Facebook” takes precedence over analytic thought. But there is a free lunch, right?


In my opinion, this particular article fits in the category of dead tree media envy. I find it amusing that the WSJ is irritated that Google search results may not be relevant or accurate. There’s 20 years of search evolution under Googzilla’s scales, gentle reader. The good old days of the juiced up CLEVER methods and Backrub’s old fashioned ideas about relevance are long gone.

I spoke with one of the earlier Googlers in 1999 at a now defunct (thank goodness) search engine conference. As I recall, that confident and young Google wizard told me in a supercilious way that truncation was “something Google would never do.”

What? Huh?

Guess what? Google introduced truncation because it was a required method to deliver features like classification of content. Mr. Page’s comment to me in 1999 and the subsequent embrace of truncation make clear that Google was willing to make changes to increase its ability to capture the clicks of users. Kicking truncation to the curb and then digging through the gutter trash told me two things: [a] Google could change its mind for the sake of expediency prior to its IPO and [b] Google could say one thing and happily do another.

I thought that Google would sail into accuracy and relevance storms almost 20 years ago. Today Googzilla may be facing its own Ice Age. Articles like the one in the WSJ are just belated harbingers of push back against a commercial company that now has to conform to “standards” for accuracy, comprehensiveness, and relevance.

Hey, Google sells ads. Algorithmic methods refined over the last two decades make that process slick and useful. Selling ads does not pivot on investing money in identifying valid sources and the provenance of “facts.” Not even the WSJ article probes too deeply into the SEO experts’ assertions and survey data.

I assume I should be pleased that the WSJ has finally realized that algorithms integrated with online advertising generate a number of problematic issues for those concerned with factual and verifiable responses.


Searx: Another Privacy Oriented Web Search System

November 13, 2017

There are a number of privacy oriented Web search systems. If you want to poke around, try the quirky Unbubble or give Gibiru a whirl. I noted another entrant called Searx. There are some important differences. Searx is a system which takes a page from peer to peer access systems. You host it yourself. The system is a metasearch engine like Ixquick (Startpage). This means that the user’s query is converted to the query syntax used by search systems like Bing.com. The results are merged and a results list displayed. Deduplication is a slippery fish. You will need to scan the results and run through the familiar, but much maligned procedure of scan, click, browse, and save the Web page with the information you want. If you are like a millennial, you will take the first result because everything on the Web is true.
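The merge-and-deduplicate step described above can be sketched in a few lines. This is a minimal illustration, not how Searx itself works: the function names `normalize_url` and `merge_results`, the round-robin interleaving, and the example URLs are all assumptions made for the sketch.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Build a comparison key: lowercase host without 'www.', path without a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        # Keep the query string so different pages on one path stay distinct.
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping the first copy of each normalized URL."""
    merged = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                url = results[rank]
                key = normalize_url(url)
                if key not in seen:
                    seen.add(key)
                    merged.append(url)
    return merged


# Hypothetical result lists from two upstream engines.
engine_a = ["https://www.example.com/page/", "https://example.org/a"]
engine_b = ["http://example.com/page", "https://example.net/b"]
merged = merge_results([engine_a, engine_b])
print(merged)
```

The two example.com entries collapse into one because they differ only in scheme, "www." prefix, and trailing slash. Real pages that differ in subtler ways (tracking parameters, mirrors, syndicated copies) slip through a key like this, which is why deduplication is a slippery fish.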

Stephen E Arnold, November 13, 2017

Ichidan Simplifies Dark Web Searches

November 10, 2017

Now there is an easier way to search the Dark Web, we learn from a write-up at Cylance, “Ichidan, a Search Engine for the Dark Web.” Cybersecurity pro and writer Kim Crawley informs us:

Ichidan is a search engine for looking up websites that are hosted through the Tor network, which may be the first time that’s been done at this scale. Websites on Tor usually have the .onion top level domain and you typically need a web browser with the Tor plugin or Tor’s own configured web browser in order to access them. … The search engine is less like Google and more like Shodan, in that it allows users to see technical information about .onion websites, including their connected network interfaces, such as TCP/IP ports.

Researchers at BleepingComputer explored the possibilities of this search engine. They were able to reproduce OnionScan’s findings on the shrinkage of the Dark Web: the number of Dark Web services decreased from about 30,000 in April 2016 to about 4,400 not quite a year later (so by about 85%). Researchers found this alarming capability, too:

BleepingComputer was also able to use Ichidan to find a website with a lot of exposed ports, including OpenSSH, an email server, a Telnet implementation, vsftpd, and an exposed Fritzbox router. That sort of information is very attractive to cyber attackers. Using Ichidan is a lot easier than command line pentesting tools, which require more specific technical know-how.

Uh-oh. Crawley predicts that use of Ichidan will grow as folks on both sides of the law discover its possibilities. She advises anyone administering a .onion site to strengthen their cyber defenses posthaste, “if they want to survive.”

Cynthia Murrell, November 10, 2017

Reddit Search Improves with Lucidworks

November 10, 2017

YouTube might swallow all of your free time with videos, but Reddit steals your entire life with videos, plus images, GIFs, posts, jokes, and cute pictures of doggos, danger noodles, trash pandas, and floofs.  If you do not know what those are, then shame on you.  If you are a redditor, then you might have noticed that the search function stinks worse than a troll face.  According to TechCrunch, Reddit has finally given their search function a facelift, “Reddit Teams With Lucidworks To Build New Search Framework.”

Reddit has some serious stats when it comes to user searches and postings.  The online discussion platform has more than 500 million users who generate 5 million comments and conduct 40 million searches each day.  While one of Reddit’s search challenges is dealing with the varied content, another is returning personalized search results without redditors having to explicitly write them in the search box.

Reddit’s poor search performance is legendary and its head honchos wanted to improve it, but trying to find the time to fix it was a problem.  That is why they hired Lucidworks to do the job for them:

Caldwell said that the company went with the Lucidworks Fusion platform because it had the right combination of technology and the ability to augment his engineering team, while helping search to continually evolve on Reddit. Buying a tool was only part of the solution though. Reddit also needed to hire a group of engineers with what Caldwell called “world class search and relevance engineering expertise.” To that end, he has set up a 30-person engineering search team devoted to maximizing the potential of the new search platform.

Lucidworks currently remains in charge of fixing Reddit’s search issues, but eventually, Reddit will take over.  Searches for danger noodle, floof, and doggo should not only return more accurate results, but also teach you the aww lingo through the results.

Whitney Grace, November 10, 2017

Treating Google Knowledge Panel as Your Own Home Page

November 8, 2017

Now, this is interesting. Mike Ramsey at AttorneyAtWork suggests fellow lawyers leverage certain Google tools in “Three Reasons Google Is Your New Home Page.” He points out that Google now delivers accessibility to many entities directly on the results page, reducing the number of clicks potential clients must perform to contact a firm. He writes:

[Google] has rolled out three products that provide potential clients with information about your law firm before they get to your site:

*Messages (on mobile)

*Questions and Answers (on mobile)

*Optional URLs for booking appointments (both mobile and desktop)

 

This means that Google search results are becoming your new ‘home page.’ All three products — Messages, Questions and Answers and URLs for appointments — are accessible from your Google My Business dashboard. They appear in your local Knowledge Panel in Google. If Google really is becoming your home page, but also giving you a say in providing potential clients with information about your firm, you will definitely want to take advantage of it.

The article explains how to best leverage each tool. For example, Messages let you incorporate text messages into your Knowledge Panel; Ramsey notes that customers prefer using text messages to resolve customer service issues. Questions and Answers will build an FAQ-like dialogue for the panel, while optional URLs allow clients to schedule appointments right from the results page. Ramsey predicts it should take about an hour to set up these tools for any given law firm, and emphasizes it is well worth that investment to make it as easy as possible for potential clients to get in touch.

Cynthia Murrell, November 8, 2017

The Power of Search: Forget Precision, Recall, and Accuracy of the Items in the Results List

November 3, 2017

Thank you, search engine optimization. I now have incontrovertible proof that search which is useful to the user is irrelevant. Maybe dead? Maybe buried?

Navigate to “70 SEO Statistics That Prove the Power of Search.” Prepare to be amazed. If you actually know about precision and recall, you will find that those methods for evaluating the efficacy of a search system belong in the grave.
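For readers who have not met precision and recall, here is a minimal sketch of the two calculations. The document identifiers and the relevance judgments are made up for illustration; a real evaluation depends on a judged test collection.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical results list and hypothetical relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case half of what the system returned is relevant (precision 0.50), and it found two of the three relevant documents (recall 0.67). Neither number says anything about clicks or rankings on the first page.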

The “power of search” is measured by statistics presented without silliness like sample size, date, confidence level, etc. Who needs these artifacts from Statistics 101?
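To see why sample size matters, consider a rough sketch of the margin of error for a reported percentage. It uses the textbook normal approximation at a 95 percent confidence level (z = 1.96). The sample sizes are hypothetical, since the SEO list reports none.

```python
import math


def margin_of_error(proportion, sample_size, z=1.96):
    """Normal-approximation margin of error for a sample proportion (z=1.96 is about 95%)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# The "75% of users" figure under three hypothetical sample sizes.
for n in (100, 1000, 10000):
    moe = margin_of_error(0.75, n)
    print(f"n={n}: 75% +/- {moe * 100:.1f} points")
```

The same "75%" is a much looser claim when it comes from 100 respondents than from 10,000, and without the sample size the reader cannot tell which one it is.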

Let’s look at five of the 70 statistics. Please, consult the original for the full listing which proves the power of search. I like that “proves” angle too.

First, users don’t do much research. Here’s the statistic which proves the assertion “Online users just take what the system serves up”:

75% of users never click past the first page of search results.

So if you, your product, your company, or your “fake news” item does not appear at the top of a search result list or an output determined by a black box algorithm, you, your product, your company, or your “fake news” item does not exist. How’s that grab you?

Second, users are not too swift when it comes to figuring out what’s content and what’s an ad. Amazing assertion, right?

55% of searchers don’t know which links in the Search Engine Results pages are PPC ads, according to a new survey. And up to 50% of users shown a Search engine Results page screenshot could not identify paid ads.

If users can’t figure out what’s an ad, how many can tell whether a statistic, like those which prove search is powerful, is accurate information or hogwash?

Third, search results mean trust. Sound crazy to you? No. Well, it sure does to me. Here’s the statistic that proves search eats Wheaties:

88% of consumers trust online reviews as much as they trust personal recommendations.

I believe everything I read on the Internet, don’t you?

Fourth, if you blog, prepare to be inundated with sales calls and maybe money. Here’s the statistic which proves that search has power:

Companies who blog have 434% more indexed pages than those who don’t. That means more leads!

I would suggest that if your company engages in hate speech, certain product sales, or violates terms of use, you will have to chase customers on the Dark Web or via i2p. By the way, I think a company is a thing, so “which” not “who” seems more appropriate. Don’t y’all agree?

Fifth, using pictures is a good thing. Hey, who has time to read? This statistic conflicts with “longer articles are better” but I get the picture:

The Backlinko study also reported that using a single image within content will increase search engine rankings.

Here’s a picture to make this write up more compelling:

image

Search has power. Really?

Stephen E Arnold, November 3, 2017

Yet Another Way to Make Search Smarter

November 3, 2017

Companies are always inventing new ways to improve search.  Their upgrades are always guaranteed to do this or that, but usually they do nothing at all.  BA Insight is one of the few companies that offers a decent search product and guess what?  They have a new upgrade!  According to their blog, “BA Insight Makes Search Smarter With Smarthub.”  BA Insight’s latest offering is called Smarthub, and it is specifically designed for cognitive search.  It leverages cloud-based search and cognitive computing services from Google, Elastic, and Microsoft.

Did I mention it was an app?  Most of them are these days.  Smarthub also supports and is compatible with other technology, has search controls built from metadata, machine learning personalization analytics, cognitive image processing, and simultaneous access to content from over sixty enterprise systems. What exactly is cognitive search?

‘Cognitive search, and indeed, the entire new wave of cognitive applications, are the next leap forward in information access.  These apps rest on a search backbone that integrates information, making it findable and usable.  Companies such as BA Insight are now able to not only provide better search results, but also uncover patterns and solve problems that traditional search engines can’t,’ said Sue Feldman, Co-Founder and Managing Director at the Cognitive Computing Consortium.  ‘There’s a cognitive technology race going on between the big software superpowers, which are developing platforms on which these applications are built.  Smart smaller vendors go the next mile, layering highly integrated, well designed, purpose-built applications on top of multiple platforms so that enterprises can leave their information environments in place while adding in the AI, machine learning, and language understanding that gets them greater, faster insights.’

It sounds like what all search applications are supposed to do.  I guess it is just a smarter version of the search applications that already exist, but what makes them different is the analytics and machine learning components that make information more findable and personalize the experience.

Whitney Grace, November 3, 2017

China Trusts AI to Facilitate Human Communication

November 1, 2017

With the world’s largest population, one would think that the Chinese would not have any trouble finding someone to talk to.  Apparently, China enjoys talking with robots, says the MIT Technology Review’s article, “Why 500 Million People In China Are Talking To This AI.”  Going by the name iFlyTek, the AI app acts as an on-demand translation service, but it does more than translate languages.

Over 500 million Chinese are using iFlyTek to manage their conversations with other people, including dictating texts, translating accents, transcribing, and generating automated messages.  iFlyTek is programmed with all the good tools related to communication: voice recognition, natural language processing, machine translation, data mining, and more.  While the app has many applications in day to day life, the translation feature still has issues and intent is lost in translation.

The iFlyTek app is used in a variety of industries, especially healthcare and driving.  Drivers issue it vocal commands so their hands can remain on the wheel.  Also, a hospital implemented ten female-looking robots to assist the overworked medical staff.  The robots can answer questions and direct patients to the correct department.  Doctors are also using iFlyTek to dictate a patient’s medical records.  Dictation will become more important, especially since it offers people a hands-free way to get work done.  There, of course, remain problems:

Although voice-based AI techniques are becoming more useful in different scenarios, one fundamental challenge remains: machines do not understand the answers they generate, says Xiaojun Wan, a professor at Peking University who does research in natural-language processing. The AI responds to voice queries by searching for a relevant answer in the vast amount of data it was fed, but it has no real understanding of what it says.  In other words, the natural-language processing technology that powers today’s voice assistants is based on a set of rigid rules.

Vocal-based technology is becoming better, more accurate, and more reliable, but there are still kinks in the system.

Whitney Grace, November 1, 2017

 

Natural Language Processing: Tomorrow and Yesterday

October 31, 2017

I read “Will Natural Language Processing Change Search as We Know It?” The write up is by a search specialist who, I believe, worked at Convera. The Search Technologies’ Web site asserts:

He was the architect and inventor of RetrievalWare, a ground-breaking natural-language based statistical text search engine which he started in 1989 and grew to $50 million in annual sales worldwide. RetrievalWare is now owned by Microsoft Corporation.

I think Fast Search acquired a portion of Convera. When Microsoft purchased Fast Search, the Convera technology was part of the deal. When Convera faded, one rumor I captured in 2007 was that some of the Convera technology was used by Ntent, formed as the result of a merger between Convera Corporation and Firstlight ERA. If accurate, the history of Convera is fascinating with Excalibur, ConQuest, and Allen & Co. in the mix.

In the “Will Natural Language Processing Change Search As We Know It” blog post, I noted these points:

  • Intranets incorporating NLP, semantic search and AI can fuel chatbots as well as end-to-end question-answering systems that live on top of search. It is a truly semantic extension to the search box with far-reaching implications for all types of search.
  • With NLP, enterprise knowledge contained in paper documentation can be encoded in a machine-readable format so the machine can read, process and understand it enough to formulate an intelligent response.
  • it’s good to know about established tool sets and methodologies for developing and creating effective solutions for use cases like technical support. But like all development projects, take care to create the tools based on mimicking the responses of actual human domain experts. Otherwise, you may run into the proverbial development problem of “garbage in, garbage out” which has plagued many such expert system initiatives.

Mr. Nelson is painting a reasonable picture about the narrow use of widely touted technologies. In fact, the promise of NLP has been part of enterprise search marketing for decades.

What I found interesting was the Convera document called “Accurate Search: What a Concept,” published by Convera in 2002. I noted this passage on page 4 of the document:

Concept Search capitalizes on the richness of language, with its multiple term meanings, and transforms it from a problem into an advantage. RetrievalWare performs natural language processing and search term expansion to paraphrase queries, enabling retrieval of documents that contain the specific concepts requested rather than just the words typed during the query while also taking advantage of its semantic richness to rank documents in results lists. RetrievalWare’s powerful pattern search abilities overcome common errors in both content and queries, resulting in greater recall and user satisfaction.
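The "search term expansion" idea in the passage above can be illustrated with a toy sketch. This is not RetrievalWare's method, which the Convera document does not spell out. The synonym table, function names, and sample documents below are hypothetical.

```python
# Hypothetical hand-built synonym table; a real system would use a much larger semantic network.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase"},
}


def expand_query(query, synonyms=SYNONYMS):
    """Turn each query term into a group holding the term and its synonyms."""
    return [{term} | synonyms.get(term, set()) for term in query.lower().split()]


def matches(document, expanded_query):
    """A document matches when it contains at least one word from every term group."""
    words = set(document.lower().split())
    return all(words & group for group in expanded_query)


docs = [
    "Where to purchase a used automobile",
    "Car wash opening hours",
    "Buy a car today",
]
expanded = expand_query("buy car")
hits = [doc for doc in docs if matches(doc, expanded)]
print(hits)
```

The first document matches even though it contains neither "buy" nor "car", which is the recall gain the passage describes. The cost is that every added synonym is another chance to pull in something off target.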

I find the shift from a broad solution to a more narrow solution interesting. In the span of 15 years, the technology of search seems to be struggling to deliver.

Perhaps consulting and engineering services are needed to make search “work”? Contrast search with mobile phone technology. Progress has been evident. For search, success narrows to improving “documentation” and “customer support.”

Has anyone tried to reach PayPal’s customer support or United Airlines’ customer support? Try it. United was at one time a “customer” of Convera’s. From my point of view, United Airlines’ customer service has remained about the same over the last decade or two.

Enterprise search, broad or narrow, remains a challenge for marketers and users in my opinion. NLP, I assume, has arrived after a long journey. For a free profile of Convera, check out this link.

Stephen E Arnold, October 31, 2017
