Online: Welcome to 1981 and 2018

January 8, 2018

I have been thinking about online. I met with a long-time friend and owner of a consumer-centric Web site. For many years (since 1993, in fact), the site grew and generated a solid stream of revenue.

At lunch, the site owner told me that revenue had been falling for the last three years. As I listened to this sharp businessperson, I realized that his site had shifted from ads which he and his partners sold directly to ads provided by automated systems.

The move from direct control to the ease of automated ad provision created the current predicament: falling revenue. At the same time, the mechanisms for selling ads directly evolved as well. A handful of large business sector conferences replaced the many smaller industry events. There were more potential customers at these shows, but the attendees shifted from hands-on marketers to people who wanted to make use of online automated sales and marketing systems.


He said, “In the good old days of 1996, I could go to a trade show and meet people who made advertising and marketing decisions based on experience with print and TV advertising, dealer promotions, and ideas.”

“Now,” he continued, “I meet smart people who want to use methods which rely on automated advertising. When I talk about buying an ad on our site or sponsoring a section of our content, the new generation look at me like I’m crazy. What’s that?”

I listened. What could I say?

The good old days maybe never existed.

I read “Facebook and Google Are Free. They Shouldn’t Be.” The write up has a simple premise: Users should pay for information.

I am not certain the write up acknowledges that paying for online information was the only way to generate revenue from digital content in the past. I know that partners in law firms who run queries on LexisNexis and Westlaw have to allocate cash to pay for the digital information about laws, decisions, and cases. Researchers and chemists have to pay for the technical information in Chemical Abstracts. Financial data for traders costs money as well.

Read more

SIXGILL: Dark Web Intelligence with Sharp Teeth

December 14, 2017

“Sixgill” refers to the breathing apparatus of a shark. Deep. Silent. Stealthy. SIXGILL offers software and services which function like “your eyes in the Dark Web.”

Based in Netanya, just north of Tel Aviv, SIXGILL offers services built on its cyber intelligence platform for the Dark Web. What sets the firm apart is its understanding of social networks and their mechanisms of operation.

The company’s primary product is called “Dark-i.” The firm’s Web site states that the firm’s system can:

  • Track and discover communication nodes across darknets, with the capability to trace malicious activity back to its original sources
  • Track criminal activity throughout the cyber crime lifecycle
  • Operate in a covert manner, including the ability to pinpoint and track illegal hideouts
  • Support clients with automated and intelligence methods

The Dark-i system is impressive. In a walk-through of the firm’s capabilities, I noted these specific features:

  • Easy-to-understand reports, including summaries of alleged bad actors’ behaviors with time stamp data
  • Automated “profiles” of Dark Web malicious actors
  • The social networks of the alleged bad actors
  • The behavior patterns in accessing the Dark Web and the Dark Web sites the individuals visit
  • Access to the information on Dark Web forums

Details about the innovations the company uses are very difficult to obtain. Based on open source information, a typical interface for SIXGILL looks like this:

[Screenshot: a typical SIXGILL Dark-i interface]

Based on my reading of the information in the screenshot, it appears that this SIXGILL display provides the following information:

  • The results of a query
  • Items in the result set on a time line
  • One-click filtering based on categories taken from the sources and from tags generated by the system, threat actors, and Dark Web sources (a toy sketch of this filtering pattern appears after the list)
  • A list of forum posts with the “creator” identified along with the source site and the date of the post
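SIXGILL does not disclose its implementation, but the one-click filtering shown in the screenshot is a classic faceted search pattern. Here is a minimal sketch under that assumption; the records, field names, and tag values are invented for illustration, not SIXGILL’s schema:

```python
from collections import Counter
from datetime import date

# Hypothetical forum-post records; the fields mirror the display described above.
posts = [
    {"creator": "actor_a", "source": "forum_x", "date": date(2017, 11, 2), "tags": ["carding"]},
    {"creator": "actor_b", "source": "forum_y", "date": date(2017, 11, 5), "tags": ["malware"]},
    {"creator": "actor_a", "source": "forum_y", "date": date(2017, 11, 9), "tags": ["malware", "carding"]},
]

def facet_counts(results, field):
    """Count category values so a UI can render one-click filters."""
    counts = Counter()
    for post in results:
        values = post[field] if isinstance(post[field], list) else [post[field]]
        counts.update(values)
    return counts

def filter_by(results, field, value):
    """Apply a single facet selection, as a one-click filter would."""
    return [p for p in results
            if p[field] == value or (isinstance(p[field], list) and value in p[field])]

print(facet_counts(posts, "tags"))             # Counter({'carding': 2, 'malware': 2})
print(filter_by(posts, "creator", "actor_a"))  # the two posts by one alleged actor
```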

Compared with reports about Dark Web activity from other vendors providing Dark Web analytics, monitoring, and search services, the Dark Web Notebook team pegs SIXGILL in the top tier of services.

Read more

Google Relevance: A Light Bulb Flickers

November 20, 2017

The Wall Street Journal published “Google Has Chosen an Answer for You. It’s Often Wrong” on November 17, 2017. The story is online, but you have to pay money to read it. I gave up on the WSJ’s online service years ago because at each renewal cycle, the WSJ kills my account. That is pretty annoying, because the pivot of the WSJ write up is that Google does not do information the way “real” news organizations do. Yet Google does not annoy me the way “real” news outfits’ online services do.

For me, the WSJ is a collection of folks who find themselves looking at the exhaust pipes of the Google Hellcat. A source for a story like “Google Has Chosen an Answer for You. It’s Often Wrong” is a search engine optimization expert. Now that’s a source of relevance expertise! Another useful source is the terse posts by Googlers authorized to write vapid, cheery comments in Google’s “official” blogs. The guts of Google’s technology are described in wonky technical papers, the background and claims sections of Google’s patent documents, and systematic queries run against Google’s multiple content indexes over time. A few random queries do not reveal the shape of the Googzilla in my experience. Toss in a lack of understanding about how Google’s algorithms work and their baked-in biases, and you get a write up that slips on a banana peel of the imperative to generate advertising revenue.

I found the write up interesting for three reasons:

  1. Unusual topic. Real journalists rarely address the question of relevance in ad-supported online services from a solid knowledge base. But today everyone is an expert in search. Just ask any millennial, please. Jonathan Edwards had less conviction about his beliefs than a person skilled in locating a pizza joint on a Google Map.
  2. SEO is an authority. SEO (search engine optimization) experts have done more to undermine relevance in online services than any other group. The one exception is the teams who have to find ways to generate clicks from advertisers who want to shove money into the Google slot machine in the hopes of an online traffic pay day. Using SEO experts’ data as evidence grinds against my belief that old fashioned virtues like editorial policies, selectivity, comprehensive indexing, and a bear hug applied to precision and recall calculations (a worked example follows this list) are helpful when discussing relevance, accuracy, and provenance.
  3. You don’t know what you don’t know. The presentation of the problems of converting a query into a correct answer reminds me of the many discussions I have had over the years with search engine developers. Natural language processing is tricky. Don’t believe me? Grab your copy of Gramática didáctica del español and check out the “rules” for el complemento circunstancial. Online systems struggle with what seems obvious to a reasonably informed human, but toss in multiple languages for automated question answering, and “Houston, we have a problem” echoes.
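For readers who have not wrestled with the metrics mentioned in point 2, here is a minimal worked example of the precision and recall calculations. The result set and relevance judgments are invented; real evaluations use pools of editor-judged documents.

```python
# Toy relevance judgments for a single query; all data invented for illustration.
retrieved = {"doc1", "doc2", "doc3", "doc4"}           # what the engine returned
relevant  = {"doc2", "doc4", "doc7", "doc9", "doc11"}  # what a human editor judged relevant

true_positives = retrieved & relevant  # results that were actually relevant

precision = len(true_positives) / len(retrieved)  # share of results worth reading: 2/4 = 0.50
recall    = len(true_positives) / len(relevant)   # share of the relevant set found: 2/5 = 0.40

print(f"precision={precision:.2f} recall={recall:.2f}")
```

SEO, by design, pushes pages into result sets where they do not belong, which is exactly the kind of thing that drags precision down.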

I urge you to read the original WSJ article yourself. You decide how bad the situation is at ad-supported online search services, big time “real” news organizations, and among clueless users who believe that what’s online is, by golly, the truth dusted in accuracy and frosted with rightness.

Humans often take the path of least resistance; therefore, performing high school term paper research is a task left to an ad supported online search system. “Hey, the game is on, and I have to check my Facebook” takes precedence over analytic thought. But there is a free lunch, right?


In my opinion, this particular article fits in the category of dead tree media envy. I find it amusing that the WSJ is irritated that Google search results may not be relevant or accurate. There’s 20 years of search evolution under Googzilla’s scales, gentle reader. The good old days of the juiced up CLEVER methods and Backrub’s old fashioned ideas about relevance are long gone.

I spoke with one of the earlier Googlers in 1999 at a now defunct (thank goodness) search engine conference. As I recall, that confident and young Google wizard told me in a supercilious way that truncation was “something Google would never do.”

What? Huh?

Guess what? Google introduced truncation because it was a required method to deliver features like classification of content. Mr. Page’s comment to me in 1999 and the subsequent embrace of truncation makes clear that Google was willing to make changes to increase its ability to capture the clicks of users. Kicking truncation to the curb and then digging through the gutter trash told me two things: [a] Google could change its mind for the sake of expediency prior to its IPO and [b] Google could say one thing and happily do another.
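Google has never published the details of its truncation methods, so the sketch below only illustrates the principle: truncation (stemming) collapses word variants to a common indexing form, which is what makes features like content classification workable. The Porter stemmer from NLTK stands in for whatever Google actually deployed.

```python
from nltk.stem import PorterStemmer  # pip install nltk

stemmer = PorterStemmer()

documents = {
    "doc_a": "advertisers advertise advertising",
    "doc_b": "the advertisement industry",
}

# Stemming maps each token to a truncated form; variants collapse together.
for doc_id, text in documents.items():
    stems = {stemmer.stem(token) for token in text.split()}
    print(doc_id, stems)

# Both documents share the stem 'advertis', so a classifier or query keyed on
# that stem matches both; exact-match indexing would keep them apart.
```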

I thought that Google would sail into accuracy and relevance storms almost 20 years ago. Today Googzilla may be facing its own Ice Age. Articles like the one in the WSJ are just belated harbingers of push back against a commercial company that now has to conform to “standards” for accuracy, comprehensiveness, and relevance.

Hey, Google sells ads. Algorithmic methods refined over the last two decades make that process slick and useful. Selling ads does not pivot on investing money in identifying valid sources and the provenance of “facts.” Not even the WSJ article probes too deeply into the SEO experts’ assertions and survey data.

I assume I should be pleased that the WSJ has finally realized that algorithms integrated with online advertising generate a number of problematic issues for those concerned with factual and verifiable responses.

Read more

Enterprise Search: Will Synthetic Hormones Produce a Revenue Winner?

October 27, 2017

One of my colleagues provided me with a copy of the 24 page report with the hefty title:

In Search for Insight 2017. Enterprise Search and Findability Survey. Insights from 2012-2017

I stumbled over the phrase “In Search for Insight 2017.”


The report combines survey data with observations about what’s going to make enterprise search great again. I use the word “again” because:

  • The buy up and sell out craziness which culminated with Microsoft’s buying Fast Search & Transfer in 2008 and Hewlett Packard’s purchase of Autonomy in 2011 marked the end of the old-school enterprise search vendors. As you may recall, Fast Search was the subject of a criminal investigation and the HP Autonomy deal continues to make its way through the legal system. You may perceive these two deals as barn burners. I see them as capstones for the era during which search was marketed as the solution to information problems in organizations.
  • The word “search” has become confusing and devalued. For most people, “search” means the Danny Sullivan search engine optimization systems and methods. For those with some experience in information science, “search” means locating relevant information. SEO erodes relevance; the less popular connotation of the word suggests answering a user’s question. Not surprisingly, jargon has been used for many years in an effort to explain that “enterprise search” is infused with taxonomies, ontologies, semantic technologies, clustering, discovery, natural language processing, and other verbal chrome trim to make search into a Next Big Thing again. From my point of view, search is a utility and a code word for spoofing Google so that an irrelevant page appears instead of the answer the user seeks.
  • The enterprise search landscape (the title of one of my monographs) has been bulldozed and reworked. The money in the old school precision and recall type of search comes from consulting. Search Technologies was acquired by Accenture to add services revenue to the management consulting firm’s repertoire of MBA fixes. What is left are companies offering “solutions” which require substantial engineering, consulting, and training services. The “engine,” in many cases, is an open source system which one can download without burdensome license fees. From my point of view, search boils down to picking an open source solution. If those don’t work, one can license a proprietary system wrapped around open source. If one wants a proprietary system, there are some available, but these are not likely to reach the lofty heights of the Fast Search or Autonomy IDOL systems in the salad days of enterprise search and its promises of a universal search system. The universal search outfit Google pulled out of enterprise search for a reason.

I want to highlight five of the points in the 24 page write up. Please register to get your own copy of this document.

Here are my five highlights. My comments are in italics after each quote from the document:

Read more

Understanding Intention: Fluffy and Frothy with a Few Factoids Folded In

October 16, 2017

Introduction

One of my colleagues forwarded me a document called “Understanding Intention: Using Content, Context, and the Crowd to Build Better Search Applications.” To get a copy of the collateral, one has to register at this link. My colleague wanted to know what I thought about this “book” by Lucidworks. That’s what Lucidworks calls the 25 page marketing brochure. I read the PDF file and was surprised at what I perceived as fluff, not facts or a cohesive argument.


The topic was of interest to my colleague because we completed a five month review and analysis of “intent” technology. In addition to two white papers about using smart software to figure out and tag (index) content, we had to immerse ourselves in computational linguistics, multi-language content processing technology, and semantic methods for “making sense” of text.

The Lucidworks document purported to explain intent in terms of content, context, and the crowd. The company explains:

With the challenges of scaling and storage ticked off the to-do list, what’s next for search in the enterprise? This ebook looks at the holy trinity of content, context, and crowd and how these three ingredients can drive a personalized, highly-relevant search experience for every user.
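Lucidworks does not spell out how the three Cs are combined, so as a thought experiment only, here is a toy scorer showing one way content, context, and crowd signals could be mixed into a single relevance number. The fields, weights, and data are my inventions, not Lucidworks’ method:

```python
def score(doc, query_terms, user_department, click_log):
    """Toy relevance score mixing the 'three Cs'; the weights are arbitrary."""
    content = sum(doc["text"].lower().count(t) for t in query_terms)    # content: term frequency
    context = 2.0 if doc.get("department") == user_department else 0.0  # context: who is asking
    crowd   = 0.5 * click_log.get(doc["id"], 0)                         # crowd: prior clicks
    return content + context + crowd

docs = [
    {"id": "d1", "text": "travel expense policy", "department": "finance"},
    {"id": "d2", "text": "expense report template", "department": "hr"},
]
clicks = {"d2": 6}  # invented click history

ranked = sorted(docs, key=lambda d: score(d, ["expense"], "finance", clicks), reverse=True)
print([d["id"] for d in ranked])  # ['d2', 'd1']: crowd clicks outweigh the context boost here
```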

The presentation of “intent” was quite different from what I expected. The details of figuring out what content “means” were sparse. The focus was not on methodology but on selling integration services. I found this interesting because I have Lucidworks in my list of open source search vendors. These are companies which repackage open source technology, create some proprietary software, and assist organizations with engineering and integrating services.

The book was an explanation anchored in buzzwords, not the type of detail we expected. After reading the text, I was not sure how Lucidworks would go about figuring out what an utterance might mean. The intent-centric systems we reviewed over the course of five months followed several different paths.

Some companies relied upon statistical procedures. Others used dictionaries and pattern matching. A few combined multiple approaches in a content pipeline. Our client, a firm based in Madrid, focused on computational linguistics plus a series of procedures which combined proprietary methods with “modules” to perform specific functions. The idea for this approach was to reduce errors in intent identification, lifting accuracy from between 65 and 80 percent to approaching, and often exceeding, 90 percent. For text processing in multi-language corpuses, the Spanish company’s approach was a breakthrough.
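To make the contrast concrete, here is a minimal sketch of the pipeline pattern described above: a dictionary and pattern-matching stage backed by a statistical fallback. The intents, patterns, and threshold are hypothetical; none of the firms we reviewed publishes its rules.

```python
import re

# Stage 1: hand-built patterns (the dictionary / pattern-matching approach).
PATTERNS = [
    (re.compile(r"\b(refund|money back)\b", re.I), "request_refund"),
    (re.compile(r"\b(track|where is)\b.*\border\b", re.I), "track_order"),
]

def statistical_fallback(utterance):
    """Stand-in for a trained classifier; a real system would call a model here."""
    return ("other", 0.55)

def classify_intent(utterance, threshold=0.70):
    # Patterns first: cheap, precise, and auditable.
    for pattern, label in PATTERNS:
        if pattern.search(utterance):
            return label, 1.0
    # Otherwise defer to the statistical stage.
    label, confidence = statistical_fallback(utterance)
    if confidence < threshold:
        label = "needs_human_review"  # route low-confidence cases out of the pipeline
    return label, confidence

print(classify_intent("Where is my order #1234?"))  # ('track_order', 1.0)
print(classify_intent("Hola, buenos días"))         # ('needs_human_review', 0.55)
```

Combining stages is one way to attack the error rates the Madrid firm reported: patterns catch the unambiguous cases, and the statistical stage handles the long tail.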

I was disappointed but not surprised that Lucidworks’ approach was breezy. One of my colleagues used the word “frothy” to describe the information in the “Understanding Intention” document.

As I read the document, which struck me as a shotgun marriage of generalizations and examples of use cases in which “intent” was important, I made some notes.

Let me highlight five of the observations I made. I urge you to read the original Lucidworks document so you can judge its arguments for yourself.

Imitation without Attribution

My first reaction was that Lucidworks had borrowed conceptually from ideas articulated by Dr. Gregory Grefenstette in his book Search Based Applications: At the Confluence of Search and Database Technologies. You can purchase this 2011 book on Amazon at this link. Lucidworks’ approach, unlike Dr. Grefenstette’s, borrowed some of the analysis but did not include the detail which supports the increasing importance of using search as a utility within larger information access solutions. Without detail, the Lucidworks document struck me as a description of the type of solutions that a company like Tibco is now offering its customers.

Read more

Lucidworks: The Future of Search Which Has Already Arrived

August 24, 2017

I am pushing 74, but I am interested in the future of search. The reason is that with each passing day I find it more and more difficult to locate the information I need for the routine research behind my books and other work. I was anticipating a juicy read when I requested a copy of “Enterprise Search in 2025.” The “book” is a nine-page PDF. After two years of effort and much research, my team and I were able to squeeze the basics of Dark Web investigative techniques into about 200 pages. I assumed that a nine-page book would deliver a high-impact payload comparable to one of the chapters in one of my books like CyberOSINT or Dark Web Notebook.

I was surprised that a nine-page document was described as a “book.” I was quite surprised by Lucidworks’ description of the future. For me, Lucidworks is describing information access already available to me and to most companies from established vendors.

The book’s main idea in my opinion is as understandable as this unlabeled, data-free graphic which introduces the text content assembled by Lucidworks.

[Unlabeled, data-free graphic from the Lucidworks ebook]

However, the pamphlet’s text does not make this diagram understandable to me. I noted these points as I worked through the basic argument that client-server search is on the downturn. Okay. I think I understand, but the assertion “Solr killed the client-server stars” was interesting. I read this statement and highlighted it:

Other solutions developed, but the Solr ecosystem became the unmatched winner of the search market. Search 1.0 was over and Solr won.

In the world of open source search, Lucene and Solr have gained adherents. Based on the information my team gathered when we were working on an IDC open source search project, the dominant open source search library was Lucene. If our data were accurate when we did the research, Elastic’s Elasticsearch had emerged as the go-to open source search system. Alternatives like Solr and Flaxsearch have their users and supporters, but Elastic, founded by Shay Banon, delivered a definite step up from his earlier search engine, Compass.
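For readers who have not touched these systems, the kinship is visible at the API level: both Solr and Elasticsearch wrap Lucene behind an HTTP interface. A minimal sketch, assuming stock local installs and a hypothetical “posts” index:

```python
import requests  # pip install requests

# The same Lucene-backed keyword search issued to both systems; illustrative only.

# Solr: query parameters sent to a /select handler on a collection.
solr = requests.get(
    "http://localhost:8983/solr/posts/select",
    params={"q": "title:search", "rows": 5, "wt": "json"},
)

# Elasticsearch: a JSON query DSL body sent to the _search endpoint.
es = requests.get(
    "http://localhost:9200/posts/_search",
    json={"query": {"match": {"title": "search"}}, "size": 5},
)

print(solr.json()["response"]["numFound"])  # Solr reports hit counts under response.numFound
print(es.json()["hits"]["total"])           # Elasticsearch reports them under hits.total
```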

In the span of two and a half years, Elastic had garnered more than $100 million in funding by 2014 and expanded into a number of adjacent information access market sectors. Reports I have received from those attending Elastic meetings were that Elastic was putting considerable pressure on proprietary search systems and a bit of a squeeze on Lucidworks. Google’s withdrawing its odd duck Google Search Appliance may have been, in small part, due to the rise of Elasticsearch and the changes made by organizations trying to figure out how to make sense of the digital information to which their staff had access.

But enough about the Lucene-Solr and open source versus proprietary search yin and yang tension.

Read more

New Enterprise Search Market Study

August 1, 2017

Don Quixote and Solving Death: No Problem, Amigo

I read “Global Enterprise Search Market 2017-2022.” I was surprised that a consulting firm would invest time and energy in writing about a market sector which has not been thriving. Now don’t start sending me email about my lack of cheerfulness about enterprise search. The sector survives, but it does so with approaches that are disguised as applications which deliver something other than inflated expectations, business closures, and lawsuits.


I will slay the beast that is enterprise search. “Hold still, you knave!”

First, let’s look at what the report covers, then I will tackle some of the issues about which I think as the author of the Enterprise Search Report and a number of search-related articles and analyses. (The articles are available from the estimable Information Today Web site, and the free analyses may be located at www.xenky.com/vendor-profiles.)

The write up told me that enterprise search boils down to these companies:

Coveo Corp
Dassault Systemes
IBM Corp
Microsoft
Oracle
SAP AG

Coveo is a fork of Copernic. Yep, it’s a proprietary system which originally focused on providing search for Microsoft environments. Now the company has spread its wings to include a raft of functions which range from the cloud to customer support / help desk services.

Dassault Systèmes is the owner of Exalead. Since the acquisition, Exalead as a brand has faded. The desktop search system was killed, and its proprietary technology lives on mostly as a replacement for Dassault’s internal search system which was based on Autonomy. Most of the search wizards have left, but the Exalead technology was good before Dassault learned that selling search was indeed a challenge.

IBM offers a number of products which include open source Lucene, acquired technology like Vivisimo’s clustering engine, and home brew code from its IBM wizards. (Did you know that the precursor of PageRank was an IBM “invention”?) The key is that IBM uses search to sell services which have higher margins than providing a free version of brute force information access.

Read more

Google: What For-Fee Thought Leader Love? And for Money? Yep

July 13, 2017

Talk about disinformation. Alphabet Google finds itself in the spotlight for normal consulting service purchases. How many of those nifty Harvard Business Review articles, essays in Strategy & Business (the money loser published by the former Booz, Allen & Hamilton), or white papers generated by experts like me are labors of thought leader love?

Why not ask a person like me, an individual who has written a white paper for an interesting company in Spain? You won’t. Well, let me interview myself:

Question: Why did you write the white paper about multi-language text analysis?

Answer: I did a consulting job and was asked to provide a report about the who, what, why, etc. of the company’s technology.

Question: Is the white paper objective and factual?

Answer: Yes, I used information from my book research, a piece of published material from the “old” Autonomy Software, and the information one of my colleagues gathered from the engineers at the company’s headquarters in Madrid. I had a couple of other researchers chase down information about the company, its products, customers, and founder. I then worked through the information about text analysis in my archive. I think I did a good job of presenting the technology and why it is important.

Question: Were you paid?

Answer: Yes, I retired in 2013, and I don’t write for third parties unless those third parties pony up cash.

Question: Do you flatter the company or distort the company’s technology, its applications, or its benefits?

Answer: I try to work through the explanation in order to inform. I offer my opinion at the end of the write up. In this particular case, the technology is pretty good. I state that.

Question: Would another expert agree with you?

Answer: Some would and some would not. When evaluating a complex multi-lingual platform which processes text in 50 languages, there is room for differences of opinion with regard to such factors as [a] text throughput for a particular application, [b] corpus collection and preparation, [c] system tuning for a particular application such as a chatbot, and other factors.

Question: Have you written similar papers for money over the years?

Answer: Yes, I started doing this type of writing in 1972 when I left the PhD program at the University of Illinois to join Halliburton Nuclear in Washington, DC.

Question: Do people know you write white papers or thought leader articles for money?

Answer: Anyone who knows me is aware of my policy of charging money for knowledge work. I worked at Booz, Allen & Hamilton and a number of other equally prestigious firms. To my knowledge, I have never been confused with Mother Teresa.


I offer this information as my reaction to the Wall Street Journal’s write up “Google Pays Scholars to Influence Policy.” You will have to pay to read the original article because Mr. Murdoch is not into free information. The original appeared in my dead tree edition of the WSJ on July 12, 2017, on the first page with a jump to a beefy travelogue of Google’s pay-for-praise and pay-for-influence activities. A correction to the original story appears on Fox News. Gasp. Find that item here.

Google, it seems, is now finding itself in the spotlight for search results, presenting products to consumers, and its public relations/lobbying activities.

My view is that Google does not deserve this type of criticism. I would prefer that real journalists tackle such subjects as [a] the Loon balloon patent issue, [b] Google’s somewhat desperate attempts to discover the next inspiration like Yahoo’s online advertising approach, and [c] solving death’s progress.

Getting excited about white papers which have limited impact probably makes a real journalist experience a thrill. For me, the article triggers a “What’s new?”

But I am not Mother Teresa, who would have written for Google for nothing. Nah, not a chance.

Stephen E Arnold, July 14, 2017

Palantir Technologies: The Buzzfeed Beat

July 3, 2017

I read “There’s a Fight Brewing between the NYPD and Silicon Valley’s Palantir.” I have two points about this story. Palantir Technologies, a vendor profiled in my CyberOSINT and Dark Web Notebook reports, is probably going to keep its eye on the real journalistic outfit Buzzfeed. I don’t know much about “real” journalism, but my hunch is that if Palantir’s stakeholders find the Buzzfeed coverage interesting, some of those folks might spill their Philz coffee.

The other point is that the New York Police Department may find questions about its contractual dealings a bit of a distraction from the quotidian tasks the force faces each day. I would not characterize “real” journalists asking questions as “annoying,” but I would hazard the phrase “time consuming” or the word “distracting.”


“You want me to believe that?” asks Max, a skeptical show dog who knows that some owners will do anything to win.

The “Fight Brewing” write up strikes me as a story designed to suggest that Palantir Technologies may be showing some signs of stress. When I read the story, I thought of the news which swirled around some of the now defunct enterprise search companies when one of their client engagements went south. Vendors hit with these situations can do little but ride out the storm.

Hey, enterprise search was routinely oversold. When a system was up and running, the results were usually similar to the results generated by the previous “solution to all your information problems.” The search engineers who coded the systems knew that overpromising and under delivering were highly probable once the on switch was flipped. But the sales professionals were going to say what was necessary to close the deal. In fact, most of the fancy promises about an enterprise search system set the company up for failure.

Is that what’s going on in the NYPD-Palantir “showdown”? To wit:

Palantir explained the system’s functions and outputs. The NYPD signed on. Then when the system was installed, additional work was needed to make the Palantir system meet the expectations set by the Palantir sales engineers.

The “Fight Brewing” story says:

The NYPD quietly began work last summer on its replacement data system, and in February it announced internally that it would cancel its Palantir contract and switch to the new system by the beginning of July, according to three people familiar with the matter. The new system, named Cobalt, is a group of IBM products tied together with NYPD-created software. The police department believes Cobalt is cheaper and more intuitive than Palantir, and prizes the greater degree of control it has over this system.

Keep in mind that, before I retired in 2013, I had been an adviser to the original i2 Group Ltd., the company which created in the 1990s what is, in my opinion, the analytic and visualization method that defines modern cyber eDiscovery.

The notion that IBM, which now owns i2’s Analyst’s Notebook, is working hard to close deals in key Palantir accounts squares with what I have heard in the general store in Harrod’s Creek.

I don’t have to go much farther than my own experience to get a sense that the “fight” may be a manifestation of how the world works when it comes to making sales for systems like Palantir’s Gotham or IBM’s i2. In my work career I have seen some interesting jabs and punches thrown to close a deal.

The NYPD, like any organization, wants systems which work and represent good value. Incumbent vendors have to find a way to retain a customer. Competitors have to find a way to get a licensee of one product to switch to a different product.

I noted this statement in the “Fight Brewing” story:

Palantir has struggled to expand its work with the police force, the emails show. As of March and April 2015, Palantir had had “little exposure to the top brass,” and although it wanted to add more business, “the door there clearly still remains closed given the larger political environment,” staffers wrote in emails. A staffer at one point invoked a phrase popularized by Thiel, author of Zero to One: Notes on Startups, or How to Build the Future, saying that Palantir still needed to get “from 0->1 at NYPD.”

Now how many police forces in the US can afford a comprehensive cyber eDiscovery system like Palantir Gotham or IBM Analyst’s Notebook? This is an important point because the number of potential customers is quite small. For example, after NY, LA, Chicago, Miami, and maybe three or four other cities, the sales professional runs out of viable prospects. How many counties can foot the bill for the software, the consultants, and the people required to tag and analyze the data? The number is modest. How many US states can afford the investment in high end cyber eDiscovery software? Again, the number is small, and you can count out Illinois because getting bills paid is an interesting challenge. The same market size problem exists for US government entities.
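A back-of-envelope version of that market-size argument makes the point. Every figure below is my assumption for illustration; neither Palantir nor IBM publishes such numbers.

```python
# Back-of-envelope sizing of the US public-sector market; all figures assumed.
cities   = 8    # big-city police departments that could plausibly pay
counties = 20   # counties with budget for software, consultants, and analysts
states   = 15   # states able to fund high-end cyber eDiscovery
federal  = 12   # assorted US government entities

annual_cost = 3_000_000  # assumed all-in yearly cost: licenses, consulting, staff

prospects = cities + counties + states + federal
print(f"viable US prospects: {prospects}")                         # 55
print(f"theoretical annual market: ${prospects * annual_cost:,}")  # $165,000,000
```

Even with generous assumptions, the pool is small, which is why incumbents fight to retain each account and competitors fight to flip one.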

Read more

Bitvore: The AI, Real Time, Custom Report Search Engine

May 16, 2017

Just when I thought information access had slumped quietly through another week, I read in the capitalist tool which you know as Forbes, the content marketing machine, this article:

This AI Search Engine Delivers Tailored Data to Companies in Real Time.

This write up struck me as more interesting than the most recent IBM Watson prime time commercial about smart software for zealous professional basketball fans or Lucidworks’ (really?) acquisition of the interface builder Twigkit. Forbes Magazine’s write up did not point out that the company seems to be channeling Palantir Technologies; for example, Jeff Curie, the president, refers to employees as “Bitvorians.” Take that, you Hobbits and Palanterians.


A Bitvore 3D data structure.

The AI, real time, custom report search engine is called Bitvore. Here in Harrod’s Creek, we recognized the combination of the computer term “bit” with one of our favorite morphemes, “-vore,” as in carnivore, omnivore, or the vegan-sensitive herbivore.

Read more
