Enterprise Search: Confusing Going to Weeds with Being Weeds

November 30, 2014

I seem to run into references to the write up by an “expert”. I know the person is an expert because the author says:

As an Enterprise Search expert, I get a lot of questions about Search and Information Architecture (IA).

The source of this remarkable personal characterization is “Prevent Enterprise Search from going to the Weeds.” Spoiler alert: I am on record as documenting that enterprise search is at a dead end, unpainted, unloved, and stuck on the margins of big time enterprise information applications. For details, read the free vendor profiles at www.xenky.com/vendor-profiles or, if you can find them, read one of my books such as The New Landscape of Search.

Okay. Let’s assume the person writing the Weeds’ article is an “expert”. The write up is about misconcepts [sic]; specifically, crazy ideas about what a 50 year plus old technology can do. The solution to misconceptions is “information architecture.” Now I am not sure what “search” means. But I have no solid hooks on which to hang the notion of “information architecture” in this era of cloud based services. Well, the explanation of information architecture is presented via a metaphor:

The key is to understand: IA and search are business processes, rather than one-time IT projects. They’re like gardening: It’s up to you if you want a nice and tidy garden — or an overgrown jungle.

Gentle reader, the fact that enterprise search has been confused with search engine optimization is one thing. The fact that there are a number of companies happily leapfrogging the purveyors of utilities to make SharePoint better or improve automatic indexing is another.

Let’s look at each of the “misconceptions” and ask, “Is search going to the weeds or is search itself weeds?”

The starting line for the write up is that no one needs to worry about information architecture because search “will do everything for us.” How are thoughts about plumbing and a utility function equivalent? The issue is not whether a system runs on premises, from the cloud, or in some hybrid set up. The question is, “What has to be provided to allow a person to do his or her job?” In most cases, delivering something that addresses the employee’s need is overlooked. The reason is that the problem is one that requires the attention of individuals who know budgets, know goals, and know technology options. The confluence of these three characteristics is quite rare in my experience. Many of the “experts” working in enterprise search are either frustrated and somewhat insecure academics or individuals who bounced into a niche where the barriers to entry are a millimeter or two high.

Next there is a perception, asserts the “expert”, that search and information architecture are one time jobs. If one wants to win the confidence of a potential customer, explaining that the bills will just keep on coming is a tactic I have not used. I suppose it works, but the incredible turnover in organizations makes it easy for an unscrupulous person to just keep on billing. The high levels of dissatisfaction result from a number of problems. Pumping money into a failure is what prompted one French engineering company to buy a search system and sideline the incumbent. Endless meetings about how to set up enterprise systems are ones to which search “experts” are not invited. The information technology professionals have learned that search is not exactly a career building discipline. Furthermore, search “experts” are left out of meetings because information technology professionals have learned that a search system will consume every available resource and produce a steady flow of calls to the help desk. Figuring out what to build still occupies Google and Amazon. Few organizations are able to do much more than embrace the status quo and wait until a mid tier consultant, a cost consultant, or a competitor provides the stimulus to move. Search “experts” are, in my experience, on the outside of serious engineering work at many information access challenged organizations. That’s a good thing in my view.

The middle example is what the expert calls “one size fits all.” Yep, that was the pitch of some of the early search vendors. These folks packaged keyword search and promised that it would slice, dice, and chop. The reality is that even the next generation information access companies with which I work focus on making customization as painless as possible. In fact, these outfits provide some ready-to-roll components, but where the rubber meets the road is providing information tailored to each team or individual user. At Target last night, my wife and I bought Christmas gifts for needy people. One of the gifts was a 3X sweater. We had a heck of a time figuring out if the store offered such a product. Customization is necessary for more and more everyday situations. In organizations, customization is the name of the game. The companies pitching enterprise search today lag behind next generation information access providers in this very important functionality. The reason is that the companies lack the resources and insight needed to deliver. But what about information architecture? How does one cloud based search service differ from another? Can you explain the technical and cost and performance differences between SearchBlox and Datastax?

The penultimate point is just plain humorous: Search is easy. I agree that search is a difficult task. The point is that no one cares how hard it is. What users want are systems that facilitate their decision making or work. In this blog I reproduced a diagram showing one firm’s vision for indexing. Suffice it to say that few organizations know why that complexity is important. The vendor has to deliver a solution that fits the technical profile, the budget, and the needs of an organization. Here is the diagram. Draw your own conclusion:

[Diagram: infolibrarian metadata and data governance building blocks]

The final point is poignant. Search, the “expert” says, can be a security leak. No, people are the security leak. There are systems that process open source intelligence and take predictive, automatic action to secure networks. If an individual wants to leak information, even today’s most robust predictive systems struggle to prevent that action. The most advanced systems from Centripetal Networks and Zerofox offer robust systems, but a determined individual can allow information to escape. What is wrong with search has to do with the way in which provided security components are implemented. Again we are back to people. Information architecture can play a role, but it is unlikely that an organization will treat search differently from legal information or employee pay data. There are classes of information to which individuals have access. The notion that a search system provides access to “all information” is laughable.

I want to step back from this “expert’s” analysis. Search has a long history. If we go back and look at what Fulcrum Technologies or Verity set out to do, the journeys of the two companies are quite instructive. Both moved quickly to wrap keyword search with a wide range of other functions. The reason for this was that customers needed more than search. Fulcrum is now part of OpenText, and you can buy nubbins of Fulcrum’s 30 year old technology today, but it is wrapped in huge wads of wool that comprise OpenText’s products and services. Verity offered some nifty security features and what happened? The company chewed through CEOs, became hugely bloated, struggled for revenues, and ended up as part of Autonomy. And what about Autonomy? HP is trying to answer that question.

Net net: This weeds write up seems to have a life of its own. For me, search is just weeds, clogging the garden of 21st century information access. The challenges are beyond search. Experts who conflate odd bits of jargon are the folks who contribute to confusion about why Lucene is just good enough so those in an organization concerned with results can focus on next generation information access providers.

Stephen E Arnold, November 30, 2014

Government Initiatives and Search: A Make-Work Project or Innovation Driver?

March 25, 2013

I don’t want to pick on government funding of research into search and retrieval. My goodness, questioning the payoffs from government funded research into information retrieval would bring down the wrath of the Greek gods. Canada, the European Community, the US government, Japan, and dozens of other nation states have poured funds into search.

In the US, a look at the projects underway at the Center for Intelligent Information Retrieval reveals a wide range of investigations. Three of the projects have National Science Foundation support: Connecting the ephemeral and archival information networks, Transforming long queries, and Mining a million scanned books. These are interesting topics and the activity is paralleled in other agencies and in other countries.

Is fundamental research into search high level busy work? Researchers are busy, but the results are not having a significant impact on most users, who struggle with modern systems’ usability, relevance, and accuracy.

In 2007 I read “Meeting of the MINDS: An Information Retrieval Research Agenda.” The report was sponsored by various US government agencies. The points made in the report, like the University of Massachusetts’ current research run down, were excellent. The 2007 recent influences are timely six years later. The questions about commercial search engines, if anything, are unanswered. The challenges of heterogeneous data also remain. The material on information analysis and organization, which is today associated with analytics and visualization-centric systems, could be reprinted with virtually no changes. I cite one example, now 72 months young, for your consideration:

We believe the next generation of IR systems will have to provide specific tools for information transformation and user-information manipulation. Tools for information transformation in real time in response to a query will include, for example, (a) clustering of documents or document passages to identify both an information group and also the document or set of passages that is representative of the group; (b) linking retrieved items in timelines that reflect the precedence or pseudo-causal relations among related items; (c) highlighting the implicit social networks among the entities (individuals) in retrieved material; and (d) summarizing and arranging the responses in useful rhetorical presentations, such as giving the gist of the “for” vs. the “against” arguments in a set of responses on the question of whether surgery is recommended for very early-stage breast cancer. Tools for information manipulation will include, for example, interfaces that help a person visualize and explore the information that is thematically related to the query. In general, the system will have to support the user both actively, as when the user designates a specific information transformation (e.g., an arrangement of data along a timeline), and also passively, as when the system recognizes that the user is engaged in a particular task (e.g., writing a report on a competing business). The selection of information to retrieve, the organization of results, and how the results are displayed to the user all are part of the new model of relevance.

In Europe, there are similar programs. Examples range from Europa’s sprawling ambitions to Future Internet activities. There is Promise. There are data forums, health competence initiatives, and “impact”. See, for example, Impact. I documented Japan’s activities in the 1990s in my monograph Investing in an Information Infrastructure, which is now out of print. A quick look at Japan’s economic situation and its role in search and retrieval reveals that modest progress has been made.

Stepping back, the larger question is, “What has been the direct benefit of these government initiatives in search and retrieval?”

On one hand, a number of projects and companies have been kept afloat due to the funds injected into them. In-Q-Tel has supported dozens of commercial enterprises, and most of them remain somewhat narrowly focused solution providers. Their work has been suggestive, but none has achieved the breathtaking heights of Facebook or Twitter. (Search is a tiny part of these two firms, of course, but the government funding has not had a comparable winner in my opinion.) The benefit has been employment, publications like the one cited above, and opportunities for researchers to work in a community.

On the other hand, the fungible benefits have been modest. As the economic situation in the US, Europe, and Japan has worsened, search has not kept pace. The success story is Google, which has used search to sell advertising. I suppose that’s an innovation, but it is not one which is a result of government funding. The Autonomy, Endeca, Fast Search-type of payoff has been surprising. Money has been made by individuals, but the technology has created a number of waves. The Hewlett Packard Autonomy dust up is an example. Endeca is a unit of Oracle and is becoming more of a utility than a technology game changer. Fast Search has largely contracted and has, like Endeca, become a component.

Some observations are warranted.

First, search and retrieval is a subject of intense interest. However, information retrieval is advancing slowly in my opinion. I think there are fundamental issues which researchers have not been able to resolve. If anything, search is more complicated today than it was when the Minds Agenda cited above was published. The question is, “Maybe search is more difficult than finding the Higgs Boson?” If so, more funding for search and retrieval investigations is needed. The problem is that the US, Europe, and Japan are operating at a deficit. Priorities must come into play.

Second, the narrow focus of research, while useful, may generate insights which affect the margins of larger information retrieval questions. For example, modern systems can be spoofed. Modern systems generate strong user antipathy more than half the time because they are too hard to use or don’t answer the user’s question. The problem is that the systems output information which is quite likely incorrect or not useful. Search may contribute to poor decisions, not improve decisions. The notion that one is better off using more traditional methods of research is something not discussed by some of the professionals engaged in inventing, studying, or selling search technology.

Third, search has fragmented into a mind boggling number of disciplines and sub-disciplines. Examples range from Coveo (a company which has ingested millions in venture funding and support from the province of Québec) which is sometimes a customer support system and sometimes a search system to Palantir (a recipient of venture funding and US government funding) which outputs charts and graphs, relegating search to a utility function.

Net net: I am not advocating the position that search is unimportant. Information retrieval is very important. One cannot perform some work today unless one can locate a specific digital item in many cases.

The point is that money is being spent, energies invested, and initiatives launched without accountability. When programs go off the rails, these programs need to be redirected or, in some cases, terminated.

What’s going on is that information about search produced in 2007 is as fresh today as it was 72 months ago. That’s not a sign of progress. That’s a sign that very little progress is evident. The government initiatives have benefits in terms of making jobs and funding some start ups. I am not sure that the benefits affect a broader base of people.

With deficit financing the new normal, I think accountability is needed. Do we need some conferences? Do we need giveaways like pens and bags? Do we need academic research projects running without oversight? Do we need to fund initiatives which generate Hollywood type outputs? Do we need more search systems which cannot detect semantically shaped or incorrect outputs?

Time for change is upon us.

Stephen E Arnold, March 25, 2013

PR in the Digital Arenas

January 27, 2013

I don’t know zip about public relations. First, I don’t do much “public” work. The connotation of “relations” remains mildly distasteful to me. I suppose that is a consequence of a high school English teacher who did not permit certain words to be used in class. If a student were to encounter a word on the banned list, he or she had to skip it when reading aloud. The notion of “public relations” gives me the willies.

You can check out the best in PR and real journalism in the scary “Microsoft: Google Blames Us for All Its Problems.” I thought I was jaded with corporate slickness. One is never too old to learn how the big guys handle communications.

I had a client ask me about a company which could post messages to LinkedIn and other social media. I mentioned that the work was getting difficult. For example, Instagram wants a person who posts a picture to register with a government issued ID card. Now that is interesting because I use a passport for identification, and I am not too keen on having that information in the hands of a 20 something engineer working from a drafty apartment in a country to which the data processing has been outsourced. Also, LinkedIn has a number of groups which are managed by those who start the groups. LinkedIn wants anyone who found the group interesting to participate or the “member” is kicked out of the group. Some groups are lax about advertising. Other groups are not. LinkedIn has turned into a job hunting and marketing service, so its utility to me has declined. I find the “expert” commentary sent to me by LinkedIn employees annoying too. Facebook is a wild and crazy place. I am not sure how the new Facebook search will work when a person posting can be linked to “interesting” topics and “friends.” The Google Plus thing is mandatory with each post linked to a “real” person. Maybe Google will just issue official ID cards and skip the government angle. Google’s mission to North Korea was fascinating, and I hope no one draws a connection between the Google visit and the increasingly hostile rhetoric from that country toward the United States.

So what about public relations?

I did a quick check online and found that a consulting and publishing company called O’Dwyer Company, Inc. publishes a list of the PR firms ranked by revenue. After all, what could be more important than revenue in today’s economic climate? (Do I hear a tiny voice saying, “Quality and integrity”? No, not here in Harrod’s Creek.)

The list exists in a couple of different forms. The dates covered by the list are not clear to me. But the PR league table I reviewed contained 118 firms. Of these 118, the total revenue reported by O’Dwyer was $1,776,859,523, slightly more than the revenues for the enterprise search market which I wrote about here. The top 10 firms generated $1,120,706,215 or 63 percent of the total revenue in the O’Dwyer report. What’s interesting is that this concentration of money is similar to the concentration of revenues in enterprise search prior to the consolidation craze which peaked in 2012. Once a search vendor is absorbed into a giant new owner like Microsoft or Oracle, the revenues from search related deals disappear into the accounting miasma. Become too open about enterprise search revenues and an Autonomy type of situation may unfold.
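The concentration arithmetic can be checked in a few lines. This is a minimal sketch that uses only the two revenue totals and the firm count quoted above; the function and variable names are mine, and the average for the remaining firms is a derived figure, not one reported by O’Dwyer.

```python
# Figures as quoted in the post from the O'Dwyer league table.
TOTAL_REVENUE = 1_776_859_523   # all 118 firms on the list
TOP_10_REVENUE = 1_120_706_215  # the ten largest firms
FIRM_COUNT = 118


def revenue_share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


top_10_share = revenue_share(TOP_10_REVENUE, TOTAL_REVENUE)

# Derived, not reported: what is left for the other 108 firms.
remaining_firms = FIRM_COUNT - 10
remaining_revenue = TOTAL_REVENUE - TOP_10_REVENUE
average_remaining = remaining_revenue / remaining_firms

print(f"Top 10 share of revenue: {top_10_share:.0f} percent")
print(f"Average revenue of the other {remaining_firms} firms: ${average_remaining:,.0f}")
```

The second figure is only an average. It says nothing about how revenue is spread across the smaller firms.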

What I found interesting was that of the top ten firms, two were flat with no significant increase in revenue and one new entrant was able to pump out $21 million quickly. Whoa, Nelly.

Another point I found interesting is that I recognized the names of only these firms among the 118:

  • Edelman, not sure why
  • Waggener Edstrom, the Microsoft PR outfit
  • Ruder Finn, not sure why

Several observations:

  1. PR seems to be a low profile business. I am confident that the big dogs know how to market, but I am quite certain that most of the firms do not build a “brand” nor do they play a role in my world as “thought leaders.” I presume the reason is that the PR firms are so focused on their clients that any visibility for the PR firm would be a big no no.
  2. The revenues for PR are almost identical to those reported for enterprise search by Forrester. Does this mean that PR is a better business from a revenue point of view than search or content processing? Presumably the search vendors hire PR firms so the cash available for search marketing helps pump up the PR revenues. Interesting, particularly at a time when it is difficult to track sales to PR. (After all, if PR worked, wouldn’t the firms showing flat and declining revenue use their own tools to get those sales going?)
  3. PR, like enterprise search, generates one of those nifty long tail graphs which are so popular in today’s learned discussions about “concentration,” “oligopolies,” and “market forces.”

I told the client to take the O’Dwyer list and pick a firm close to home. The challenge is that the biggest firms are in the big cities; for example, Manhattan boasts 31 firms on the list, more if I include New Jersey and Connecticut. A quick check of Louisville, Kentucky’s PR density revealed 18 firms. More were listed if I tossed in marketing communications, social media, and similarly nebulous terms. PR advisors are as plentiful as consultants it seems. The swelling ranks of the unemployed create a fertile ground for advisors, wizards, mavens, and poobahs in search, business consulting, and public relations.

My big finding is that the vast majority of public relations firms are likely to be struggling to generate revenue. What’s new in today’s economy? Is PR a discipline? Don’t know. Don’t care. I do know I tell those who write me PR spam that I am not a journalist. I get pretty frisky when people ignore my about page and assume I am, at age 69, a real journalist. Heaven forbid that I should be confused with a real journalist, a PR person, or an effective marketer. I am none of those things. Never will be.

Stephen E Arnold, January 26, 2013

The Alleged Received Wisdom about Predictive Coding

June 19, 2012

Let’s start off with a recommendation. Snag a copy of the Wall Street Journal and read the hard copy front page story in the Marketplace section, “Computers Carry Water of Pretrial Legal Work.” In theory, you can read the story online if you don’t have Sections A-1, A-10 of the June 18, 2012, newspaper. A variant of the story appears as “Why Hire a Lawyer? Computers Are Cheaper.”

Now let me offer a possibly shocking observation: The costs of litigation are not going down for certain legal matters. Neither bargain basement human attorneys nor Fancy Dan content processing systems make the legal bills smaller. Your mileage may vary, but for those snared in some legal traffic jams, costs are tough to control. In fact, search and content processing can impact costs, just not in the way some of the licensees of next generation systems expect. That is one of the mysteries of online that few can penetrate.

The main idea of the Wall Street Journal story is that “predictive coding” can do work that human lawyers do for a higher cost but sometimes with much less precision. That’s the hint about costs in my opinion. But the article is traditional journalistic gold. Coming from the Murdoch organization, what did I expect? i2 Group has been chugging along with relationship maps for case analyses of important matters since 1990. Big alert: i2 Ltd. was a client of mine. Let’s see: it was more than a couple of weeks ago that basic discovery functions were available.

The write up quotes published analyses which indicate that when humans review documents, those humans get tired and do a lousy job. The article cites “experts” from Thomson Reuters, a firm steeped in legal and digital expertise, who point out that predictive coding is going to be an even bigger business. Here’s the passage I underlined: “Greg McPolin, an executive at the legal outsourcing firm Pangea3 which is owned by Thomson Reuters Corp., says about one third of the company’s clients are considering using predictive coding in their matters.” This factoid is likely to spawn a swarm of azure chip consultants who will explain how big the market for predictive coding will be. Good news for the firms engaged in this content processing activity.

What goes up faster? The costs of a legal matter or the costs of a legal matter that requires automation and trained attorneys? Why do companies embrace automation plus human attorneys? Risk certainly is a turbo charger.

The article also explains how predictive coding works, offers some cost estimates for various actions related to a document, and adds some cautionary points about predictive coding proving itself in court. In short, we have a touchstone document about this niche in search and content processing.

My thoughts about predictive coding are related to the broader trends in the use of systems and methods to figure out what is in a corpus and what a document is about.

First, the driver for most content processing is related to two quite human needs. One is that the costs of coping with large volumes of information are high and going up fast. The other is the need to reduce risk. Most professionals find quips about orange jump suits, sharing a cell with Mr. Madoff, and the iconic “perp walk” downright depressing. When a legal matter surfaces, the need to know what’s in a collection of content like corporate email is high. The need for speed is driven by executive urgency. The cost factor clicks in when the chief financial officer has to figure out the costs of determining what’s in those documents. Predictive coding to the rescue. One firm used the phrase “rocket docket” to communicate speed. Other firms promise optimized statistical routines. The big idea is that automation is faster and cheaper than having lots of attorneys sifting through documents in printed or digital form. The Wall Street Journal is right. Automated content processing is going to be a big business. I just hit the two key drivers. Why dance around what is fueling this sector?

Prediction, Metadata, and Good Enough

June 14, 2012

Several PR mavens have sent me multiple unsolicited emails today about their clients’ predictive statistical methods. I don’t like spam email. I don’t like PR advisories that promise wild and crazy benefits for predictive analytics applied to big data, indexing content, or figuring out what stocks to buy.

March Communications was pitching Lavastorm and Kabel Deutschland. The subject was analytics: real time, predictive, and discovery driven.

Baloney.

Predictive analytics can be helpful in many business and technical processes. Examples range from figuring out where to sell an off lease mint green Ford Mustang convertible to planning when to ramp up outputs from a power generation station. Where predictive analytics are not yet ready for prime time is identifying which horse will win the Kentucky Derby and determining where the next Hollywood starlet will crash a sports car. Predictive methods can suggest how many cancer cells will die under certain conditions and assumptions, but the methods cannot identify which cancer cells will die.

Can predictive analytics make you a big winner at the race track? If firms with rock solid predictive analytics could predict a horse race, would these firms be selling software or would these firms be betting on horse races?

That’s an important point. Marketers promise magic. Predictive methods deliver results that provide some insight but rarely rock solid outputs. Prediction is fuzzy. Good enough is often the best a method can provide.

In between is where hopes and dreams rise and fall with less clear cut results. I am, of course, referring to the use by marketers of lingo like this:

The idea behind these buzzwords is that numerical recipes can process information or data and assign probabilities to outputs. When one ranks the outputs from highest probability to lowest probability, an analyst or another script can pluck the top five outputs. These outputs are the most likely to occur. The approach works for certain Google-type caching methods, providing feedback to consumer health searchers, and figuring out how much bandwidth is needed for a new office building when it is fully occupied. Picking numbers at the casino? Not so much.
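The rank-and-pluck step described above can be sketched in a few lines. This is a minimal illustration, not any vendor’s method: the outcome labels and probabilities are made up, and the function name `top_outputs` is hypothetical.

```python
from typing import Dict, List, Tuple


def top_outputs(probabilities: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Rank outputs from highest to lowest probability and keep the top k."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical probabilities assigned by some numerical recipe.
scores = {
    "outcome_a": 0.05,
    "outcome_b": 0.30,
    "outcome_c": 0.10,
    "outcome_d": 0.20,
    "outcome_e": 0.02,
    "outcome_f": 0.25,
    "outcome_g": 0.08,
}

for name, probability in top_outputs(scores):
    print(f"{name}: {probability:.2f}")
```

Even the top ranked output here carries a probability of only 0.30. The ranking says which outputs are most likely relative to the others, not that any one of them will occur, which is the “good enough” point.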

Is Google a Monopoly? Is There Internet Freedom?

June 8, 2012

You will want to read the Wall Street Journal hard copy edition’s story “Google Monopoly and Internet Freedom.” (You may be able to access the online version at this link, but no promises where News Corp.’s business model is in action.) The print version is important. The article (more accurately, the “essay,” “op-ed,” or “gentrified blog post”) has pride of place. Perched at the top of the “Opinion” page A-15, the four-column item comes with a beefy headline and a color picture. The author is Jeffrey Katz, who is “the CEO of Nextag, and a former CEO of Orbitz Inc., Swissair, and LeapFrog Enterprises.”

[Image: wave distortion]

Is distortion inevitable or is it a part of decision making?

I was not familiar with Mr. Katz. A biography appears on the Nextag Web site. He is a Stanford graduate, and he flew from the airline industry to learning products to Nextag. That company loves shopping. The company says:

Expert deal-hunters since 1999, we make it surprisingly easy for you to find everything from tech to travel to tiki torches all at the price, place and moment that’s right for you. Browse, review, share, get the 411, get the deal: with Nextag, you’ll love the way you shop. 30+ million people consult us each month to make their online purchases, and we use our best-in-class search technology and proven expertise to ensure that each and every one of those shoppers is a happy one. This focus and commitment benefits our partners as well, delivering impressive sales volume and ROI for merchants and a streamlined user experience for search providers. (Source: http://www.nextag.com/about/main)

The background helps because I understand that online ticket agencies and online shopping comparison sites need utility services to allow these enterprises to do business without having to build a global infrastructure, attract and cultivate large numbers of users, and have a business model based on advertising.

Point of view is important.

In the News Corp. essay, Mr. Katz points out that Google is powerful. Well, that’s not much of a surprise. The company is more than a decade old, has an enviable business model, and online technology which works. I enjoy comparing Google’s ability to deliver online services when I sit in an airport waiting for United Airlines to cope with the 300 people stranded in London Heathrow on Friday, June 1, 2012. Have you had an experience similar to mine with an airline? I also recall fondly turning up at a hotel with my Orbitz reservation in hand to hear, “Sir, we have no record of your reservation.” I also enjoy the many messages which induce me to compare prices at Nextag.com. In 2009 Nextag filled my Yahoo page with Nextag ads. (See this Yahoo Answers response.) Nextag has implemented an “advertising cookie opt out.” You can learn more here. I, therefore, find the suggestions Mr. Katz offers to Google fascinating.

First, Mr. Katz asserts that “Google needs to be transparent about how its search engine operates.” He believes that Google “hides behind forked-tongue gobbledygook that is meant to obfuscate.” I don’t agree. I have written three monographs based on open source information provided by Google to anyone who takes the time to read it. The disconnect is that Google is a deeply technical company, and it does a very good job of explaining its systems and methods. However, if a person is an expert because he or she can use a browser to surf the Web, that type of knowledge is not going to be particularly helpful. For example, one of the systems and methods in use at Google involves populating missing cells in a database. The approach is clearly explained again and again and again. Most recently Dr. Alon Halevy gave yet another repetitive presentation about this method at the EDBT/ICDT 2012 Joint Conference on March 26 to 30, 2012 in Berlin, Germany. Of the major information retrieval companies with which I am familiar, Google does one of the best jobs making crystal clear exactly what it does, when, and under what circumstances. The problem is that if one lacks the motivation, resources, or sticktoitivity, the Google information is tough to parse. Want to know how Google search works? Read U.S. Patent 6285999. There it is. English. Clear. Equations. Background. Functions. What exactly does Mr. Katz want Google to do that it is not doing? Believe me, my relative Vladimir Ivanovich Arnold would have had zero trouble figuring out what Google does, and he would have been able to replicate it. The problem is that some folks are less sharp than Googlers and my uncle. If one does not take time to learn from what is publicly available, why should Google invest time and money in what amounts to remedial education?

Second, Mr. Katz opines, “Google should provide consumers with access to the unbiased search results it was once known for—regardless of which company or organization owns the service. It should also allow users to reduce the number of ads shown or incorporate a user’s preferred services in search results.” First, no set of search results from any vendor or any system at any time has delivered unbiased search results. The decision to use a specific relevancy method, what stop words to use, how to implement a default Boolean AND or OR, or any of hundreds of other key decisions introduces variants in search results. Research itself is not unbiased. As soon as sampling is used within any online system, objectivity is sacrificed. Hey, ask two advisors what to do about a personnel issue and you get non-objective results. Google is upfront and clear about the systems and methods used to determine what gets shown under what circumstances. Pick one of Google’s public disclosures—say, for example, US8065311. Google has dozens of open source publications that explain the exact system and method used to perform a specific task. What Mr. Katz wants is for Google to explain something that most Googlers could not figure out in a month of Sundays. Google uses “smart” software. When inputs change, then the selection of a particular method occurs. Not every method gets selected for every input. As a result, the outputs adapt to inputs. With millions of these decisions made in an interdependent system, exactly what does Mr. Katz want Google to explain? My suggestion: read what Google has written. The cloud of unknowing is not caused by Google. But asking for an explanation of a particular action within a massively parallel intelligent system is what I would describe as “uninformed.”
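The point about stop words and a default Boolean AND or OR can be shown with a toy. What follows is a minimal sketch in Python, not a description of how Google or any vendor implements search. The three-document corpus, the stop word list, and the function names are all hypothetical, made up for illustration. The same query goes in each time; only the configuration decisions change.

```python
# Hypothetical three-document corpus; every name here is illustrative only.
DOCS = {
    "d1": "the price of the camera",
    "d2": "camera reviews and ratings",
    "d3": "compare the price",
}

# Assumed stop word list; real systems choose their own.
STOP_WORDS = frozenset({"the", "of", "and"})


def tokenize(text, stop_words=frozenset()):
    """Lowercase, split on whitespace, and drop any configured stop words."""
    return [t for t in text.lower().split() if t not in stop_words]


def search(query, default_op="AND", stop_words=frozenset()):
    """Return sorted ids of documents matching the query under the given settings."""
    terms = tokenize(query, stop_words)
    if not terms:
        return []
    hits = []
    for doc_id, text in DOCS.items():
        doc_terms = set(tokenize(text, stop_words))
        matched = [t in doc_terms for t in terms]
        # AND requires every query term; OR requires at least one.
        if all(matched) if default_op == "AND" else any(matched):
            hits.append(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    query = "the camera"
    print(search(query))                         # default AND, no stop words
    print(search(query, default_op="OR"))        # default OR, no stop words
    print(search(query, stop_words=STOP_WORDS))  # default AND, stop words removed
```

Run the query “the camera” three ways and three different result lists come back: one document, then all three, then two. Nobody lied. Somebody made a design decision, and the decision shaped the output.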

Third, Mr. Katz wants one of those categorical affirmatives which I find logically uncomfortable. He says:

Google should grant all companies equal access to advertising opportunities regardless of whether they are considered a competitor. Given its market share and public commitment to providing users with the most relevant, helpful information, Google has an obligation to provide a level playing field.

My hunch is that in Mr. Katz’s own business operations, there are business processes which are of great interest to consumers; for example, when I run a query on Nextag.com, “Why do I see eBay results at the top of a results list with a big logo?” I don’t want eBay results. How does Mr. Katz implement this specific function? Does it apply to “all” result sets? You don’t need me to write down trade secret type of questions because no executive is going to reveal these unless there are quite specific circumstances and safeguards in place. Why should a company which has an obligation to its shareholders do anything other than focus on delivering value to those shareholders as long as those actions are within the letter and spirit of applicable regulations? I don’t own shares in Google, but if I did, I would expect Google to take appropriate steps to grow the company’s revenue and profits. The reason is anchored in how capitalism works. Is Mr. Katz uncomfortable with capitalism when practiced with considerable skill and finesse?

The final point is an interesting one. Mr. Katz offers:

But mostly, Google should take a good, hard look at its philosophy and business model, and ask if this is the company Sergey Brin and Larry Page set out to build when they chose as their motto: “Don’t be evil.”

Ah, the chestnut “Don’t be evil.” In my research, the phrase originated with another Googler and it ended up becoming the shibboleth waved in front of the bulls running after Messrs. Brin and Page. The current business environment is easy to explain: If you can generate revenue by an appropriate business model, do it. One does not need to flip through Schumpeter’s or Austrian school economists’ writings for an explanation. Good and evil have zero to do with business. I have experienced the pragmatism of changing a flight using Orbitz. I have to pay. I have experienced the thrill of contacting a merchant, ordering a product identified by Nextag, and then receiving a bait-and-switch in a week. I had to live with the trickery because neither the online service nor the delivery company was “responsible.” Hmmm. Why not do some local investigation into business practices, Mr. Katz?

Now what this News Corp. write up is “about” in my opinion is:

  1. Nextag wants more traffic and preferential listings for its Web pages. I understand the desire to get more from Google’s free service, but why should Google do any more or any less than it is now doing? Google is tweaking its systems, methods, and business models. Are these actions not permitted? “Compete more effectively. Complain less.” might be a starting point.
  2. I believe the News Corp. wants to advance agendas. I hope that the Wall Street Journal is above the alleged criminal behavior associated with some News Corp. properties. But there is Fox News, and it seems to advance an agenda. When I read Mr. Katz-type opinion pieces, I wonder, “Is the Wall Street Journal looking for clicks or just poking Google in the ribs because it is thriving and the Wall Street Journal is dogpaddling in terms of advertising revenue?” Just a question. Nothing concrete. But there is potential for bias when making decisions about what action to take, what story to feature, what numerical recipe to employ.
  3. Writing about Google serves the needs of the readers. I think that the Wall Street Journal is adopting some of the methods which have made Mr. Murdoch’s properties successful for many years. Hard business reporting is expensive and Google is important. I would like to see more analysis of Google’s enterprise strategy as articulated by the most recent vice president responsible for what seems to me a most disappointing market initiative. I would like to see less of the Monday morning quarterbacking.

I don’t have any direct involvement with Google. In fact, I spend less and less of ArnoldIT’s research resources chasing down the company’s innovations. The reason warrants an in-depth article in a newspaper like the Wall Street Journal. Why has Google’s ability to innovate internally become such a problem? What are the management methods Google will use to integrate its recent spate of acquisitions into the firm’s existing service line? How will Google’s dataspace and semantic technology contribute to predictive search outputs; that is, search without search? I am 68, and I think I will go gently into that good night without reading substantive business analyses about an important company in a Murdoch publication. I will have ample opportunities to read baloney about Google. That’s too bad. Who’s being “evil”? Am I? Google? The Wall Street Journal?

Stephen E Arnold, June 8, 2012

Freebie from ArnoldIT.com

The Courier Journal: A Louisville Death Rattle

May 13, 2012

In 1981, I joined the Courier Journal and Louisville Times. That was 31 years ago. I am not sure how I made the decision to leave the Washington, DC, area to journey to a city whose zip code and telephone area code were unknown to me. I am a 212, 202, and 301 type of person.

I recall meeting Barry Bingham Jr. He asked me what I did in my spare time. I was thunderstruck. My former employers—Halliburton Nuclear Utility Services and Booz, Allen & Hamilton—never asked me those questions. Those high powered, hard charging outfits wanted to know how much revenue I had generated and how much money I had saved the company, when the next meeting with the Joint Committee on Atomic Energy was, and how the Cleveland Design & Development man trip vehicle was rolling along. The personal stuff floored me.

I did not have an answer. As a Type A, Midwestern, over-achieving, no-brothers-and-no-sisters worker bee, fun was not a big part of my personal repertoire.

I asked him, “Why?”

I recall to this day his answer, “I want our officers and employees to have time with their families, get involved in the community, and do great work without getting into that New York City thing.”

Interesting. The Courier Journal had a very good reputation. The newspaper was profitable, operated a wide range of businesses, printed the New York Times’s magazine for the Gray Lady, and operated a commercial database company. In fact, in 1980 the Courier Journal was one of the leaders in commercial online information, competing with a handful of other companies in the delivery of information via digital channels, not the dead-tree, ruin-the-environment, and dump-chemicals approach of most publishing companies.

In 1986, Gannett bought the Courier Journal. The commercial database unit was of zero interest to Gannett, so it and I were sold to Bell+Howell. After a short stint at a company entrenched in 16 mm motion picture film projectors, I headed back to New York City.

I retained my residence in Louisville, and I have watched the trajectory of the Courier Journal as it moved forward.

I have to be blunt. The Courier Journal is not the newspaper, the company, or the community force it was when I joined Mr. Bingham and a surprisingly diverse, bright, forward-looking team 31 years ago. The 1981 management approach of the Courier Journal was a culture shock to me. Think of the difference between Dick Cheney and Mr. Rogers. The 2012 approach saddens me.

This morning I read “Answering Your Questions on CJ Changes,” written by a person whom I do not know. The author of the article is Wesley Jackson, publisher of the Courier Journal. (I never liked the acronym CJ and still do not.)

The main point of the article is that the Courier Journal has to raise its prices. Last week, Mr. Jackson wrote a short article in the Courier Journal informing subscribers a letter would arrive explaining the new services that would be available. We received our letter on Wednesday, May 9, 2012. We called on Thursday, May 10, 2012, and cancelled our subscription. I am not sure how many other subscribers took this action, but a sufficient number of Courier Journal readers called to kill the phone system at the newspaper.

Mr. Jackson wrote this morning:

Unfortunately our Customer Service Center’s phone system had technical problems, and many of you had long wait times or could not get through to get your questions answered. That I know was frustrating.

I bet. I would love to see the data about the number of calls and the number of cancellations that the paper received when it announced the rate hike, a free iPad application for subscribers, and an email copy of the newspaper sent each day to paying customers.

The write up troubled me for several other reasons:

  1. Some of the word choices were of the touchy-feely school of communication. There are 19 “we’s.” The word “value” appears twice; there are seven categoricals (six all’s and one never); and the word “conversation” appears twice.
  2. There is at least one split infinitive: “to personally apologize.”
  3. An absolutely amazing promise is expressed in this statement: “For those of you who would like to ask questions directly, please email me at publisher@courier-journal.com or send a letter to Publisher, Courier-Journal Media, 525 W. Broadway, Louisville, KY 40202. I promise you will each receive a response.”

“Promise,” “all,” and “never”—yep, I believe those assertions.

I would have included an image of Wesley Jackson but I had to pay for it. Not today, sorry.

My view is that I hear a death rattle from the Courier Journal. The reality of the newspaper is that it runs more and more syndicated content. The type of local coverage for which the paper was known when I joined in 1981 has decreased over the years. When I want news, I look at online services. What I have noticed is that what appears in the Courier Journal has been mentioned on Facebook, Twitter, or headline aggregation services two or three days before the information appears in either the Courier Journal’s hard copy edition or its online site, www.courier-journal.com.

Dave Kellogg, the former president of MarkLogic, used to chide me that I should not refer to major publishing operations as “dead tree publishers.” My view was and is that I am entitled to my opinion. Traditional publishing companies have failed to respond to new opportunities to disseminate and profit from information.

The list of mistakes includes:

  1. Belief that an app will generate new revenue. Unfortunately apps are not automatic money machines. (Print-centric apps are not the go-to medium for many digital device users.)
  2. Assumptions about a person’s appetite for paying for “nice to have content.” (One pays for “must have” content, not “nice to have” content.)
  3. Failure to control costs. (Print margins continue to narrow as traditional publishers try to regain the glory of the pre-digital business models.)
  4. Firing staff who then go on to compete by generating content funded by a different business model. (This blog is an example. We do online advertising and inclusions and sell technical services. For some reason, this works for me thanks to my team which includes some former “real” journalists.)
  5. Assuming that mastering new technology for printing color on newsprint equips an information technology department to handle other information technologies in an effective manner. (Skill in one technical area does not automatically transfer to another technical field.)

I can hear the labored breathing of a local newspaper struggling to stay alive. What do you hear?

Stephen E Arnold, May 13, 2012

Sponsored by HighGainBlog, which is ArnoldIT

Two Pundits and Their Punditry

March 31, 2012

I find the notion of pundits fascinating. The US in 2012 pivots on a news hook, the Warhol fame thing, and a desire to share viewpoints with Flipboard and Pulse users.

This morning I was listening to the crackle of small arms fire in rural Kentucky. Dawn had not yet extended its crepuscular reach to my hollow but two write ups did. Neither is one of those magnum loads squirrel hunters desire here in the Commonwealth. Nope, these were birdshot, but each write up is interesting nonetheless.

Both indirectly concern search and retrieval. Both found their way into my “gems of the poobahs” folder.

First, I noted the digital Atlantic’s write up “The Advertising Industry’s Definition of ‘Do Not Track’ Doesn’t Make Sense.” What caught my attention was the juxtaposition of the word “advertising” with the phrase “doesn’t make sense.” Advertising making sense? The Atlantic “real” journalist has not watched television with a 67 year old. More than half of the TV commercials which I find embedded in basketball games every four minutes don’t make sense. Advertising is about creating a demand for must-have products. Advertising is part of the popular culture and an engine of growth for companies unable to generate sales without the craft and skill of psychological tactics. Check out an advertisement for Kentucky bourbon. Does this headline make sense?

“Honk if you’re proud to be a redneck?”

As a resident of Kentucky, I am not sure I know what a redneck is, but I bet those folks in Boston do. But what’s the “making sense” part? What advertising does is tickle the brain to make some folks want to drink. And we all know how important it is to imbibe whiskey, engage in “real” journalism, and ferry children to soccer practice. Yep, makes “sense” to me.

But here’s the passage which caught my attention:

Stanford’s Aleecia McDonald found that 61 percent of people expect that clicking a Do Not Track button should shut off *all* data collection. Only 7 percent of people expected that websites could collect the same data before and after clicking a ‘Do Not Track’ button. That is to say, 93 percent of people do not understand the industry’s definition of DNT. Which totally makes sense! Who would ever think saying, “Do not track me,” actually means, “It’s fine to collect data on me, but don’t show me any signs that you’re doing so.” Simply because the industry itself has defined ‘Do Not Track’ in an idiosyncratic way doesn’t mean their self-serving decision should be the basis for all policy and practice in this field.

Almost any redneck would understand this passage, the implications of persistent cookies, and the distinction between various types of tracking, including my favorite, the iFrames-based method.

Second, I read “Debunking Senator Al Franken On Google, The Internet & Privacy.” This screed is from a “real” journalist and favorite source of juicy quotes on the subject of search and retrieval. The point of the write up is that despite the author’s affection for a US senator as a comedian, the US senator does not know beans about tracking, Google, and, by extension, search and retrieval. Now “search” does not mean find. Search, I believe, means to the “real” journalist using methods to generate traffic to a Web site. I define “search” differently, but the good part in my opinion is this passage:

Ya think? But I mean, Facebook kind of does sell my friends. I can export all of them out to Yahoo and Bing, because Facebook and Yahoo and Bing all have deals. I can’t export them to Google, because, you know, they aren’t friends. Would you call that selling to the highest bidder? When I go over to search on Bing, by default, all my Facebook friends are being used to personalize my search results. Oh, I can opt-out, but you know how hard that is. Since that’s part of a Bing-Facebook deal, is that a line that’s crossed?

Please, read the entire “real” journalistic analysis of a talk by a US senator. I must admit I don’t relate to the questions and analytic points in this paragraph. I recognize the names of the companies mentioned, but “the deal” baffles me.

Why do I care? Three points:

  1. I sense the emotion in these write ups. Passion is good for advertising and good for capturing attention. However, I am struggling to figure out what the problem is. Advertising seems to be what America is. Untangling the warp and woof of this fabric is difficult for me.
  2. The ad hominem method and charged language cause me to think that the lingo of advertising has become the common parlance of “real” journalists.
  3. I struggle to unravel the meaning of certain parts of these two write ups. Am I alone?

Net net: technology and advertising are an interesting compound. Now “real” journalism is quite similar. To quote one “real” journalist, “Ya think?” Well, not much.

Stephen E Arnold, March 31, 2012

Sponsored by Pandia.com

A Road Map for Censorship

March 31, 2012

David Bamman, Brendan O’Connor, and Noah A. Smith present some interesting facts based on a study they wrote about in their article, “Censorship and Deletion Practices in Chinese Social Media.” Their study touches on a variety of different aspects regarding how China allegedly controls the intake and outflow of information.

The Chinese government’s methods are far different from the United States’ approach. My understanding of the situation is that China takes censorship to extremes and infringes on the freedom of its citizens using the GFW (Great Firewall of China), which filters key phrases and words, preventing access to sites like America’s Facebook and Google. However, Sina Weibo is the Chinese equivalent of Facebook where bloggers post and pass information presumably in a way the officials perceive as more suitable for the Middle Kingdom.

Sina Weibo is monitored and as long as members stay within the boundaries or disguise their information, posts go unnoticed. If any of the outlawed phrases are entered, the user’s post is deleted and anyone searching for the information is met with the phrase ‘Target weibo does not exist’. If the user properly masks the phrase or words used, the information will get through, showing that there is the possibility of future change regarding the censorship practices in China.

The GFW will catch obvious outgoing information such as the names of political figures, which were monitored during the study. The article asserted:

In late June/early July 2011, rumors began circulating in the Chinese media that Jiang Zemin, general secretary of the Communist Party of China from 1989 to 2002, had died. These rumors reached their height on 6 July, with reports in the Wall Street Journal, Guardian and other Western media sources that Jiang’s name had been blocked in searches on Sina Weibo (Chin, 2011; Branigan, 2011). If we look at all 532 messages published during this time period that contain the name Jiang Zemin, we note a striking pattern of deletion: on 6 July, the height of the rumor, 64 of the 83 messages containing that name were deleted (77.1 percent); on 7 July, 29 of 31 (93.5 percent) were deleted.

No firewall is perfect, but according to the studies done on searches, blogs and texts containing prohibited information, China has a pretty impressive figure. It may not seem reasonable by American standards, but by filtering anything it deems politically sensitive, China protects the privacy of its country, preventing global rumors and interference.

On one level, censorship makes sense, in particular regarding the business world. The Chinese government makes its corporations responsible for their employees, meaning if an employee is blogging instead of working and puts in illegal information, the company itself is fined, or worst case scenario, shut down. Thus Chinese factories have a high rate of productivity because their workers are actually doing their job.

How is China’s alleged position relevant to the US? There may be little relevance, but to officials in other countries, the article’s information may be just what one needs to check into a Holiday Inn of censorship.

Jennifer Shockley, March 31, 2012

Sponsored by Pandia.com

Another Pundit Outfit Predicts Doom for GOOG

March 17, 2012

I don’t think the Google is going anywhere. Granted the outfit is floundering, but have you ever tried to coordinate 60,000 employees with high IQs, deal with legal annoyances on every continent except Antarctica, and fight off the incursions of Amazon, Apple, Facebook, and Microsoft plus dozens of other companies looking to get a chunk of Googzilla’s tail? Nah, I did not think so. It is much, much easier to post punditage and collect paychecks.

I just read “This Is Why Google Is Losing the Future.” I grimaced at the “this is why” phrase and rebelled at “losing the future.” I wonder if the use of “its” was spiked by a “real” journalist. The point of the write up in my opinion was a way to work the word “crack” and the phrase “roach hotel” into a “real” article. I use on occasion Latin, Greek, and French. I don’t think I have ever used the phrase “roach hotel” to describe an online service. Nice metaphor.

Here’s the phrase that sets the news and opinion piece apart:

And, as an increasing number of developers feel that Google will treat them poorly, or that it is simply too much of a threat, it’s lost the future. Yet Larry Page is even telling his own engineers that they should leave if they don’t agree with his plan to focus on a “single, unified, ‘beautiful’ product across everything”. If that’s what’s happening inside the Googleplex, what hope for those on the outside? Let’s go back to where we started: the startup founder who sees Google as a drug dealer looking to offer him a sweetener that gets him addicted. Since he doesn’t want that to happen, he’s left with that single question.

I am okay with humor, sarcasm, criticism, and cynicism. I am not okay with “real” journalists, failed webmasters, unemployed political science majors working as experts, and folks who have never managed a big operation sitting in the balcony emitting catcalls.

I am not sure that heckling is particularly constructive even when the intended listener has no choice but to attend to the message. The game is traffic I suppose.

Stephen E Arnold, March 17, 2012

Sponsored by Pandia.com
