The Courier Journal: A Louisville Death Rattle

May 13, 2012

In 1981, I joined the Courier Journal and Louisville Times. That was 31 years ago. I am not sure how I made the decision to leave the Washington, DC, area to journey to a city whose zip code and telephone area code were unknown to me. I am a 212, 202, and 301 type of person.

I recall meeting Barry Bingham Jr. He asked me what I did in my spare time. I was thunderstruck. My former employers—Halliburton Nuclear Utility Services and Booz, Allen & Hamilton—never asked me those questions. Those high powered, hard charging outfits wanted to know how much revenue I had generated and how much money I had saved the company, when the next meeting with the Joint Committee on Atomic Energy was, and how the Cleveland Design & Development man trip vehicle was rolling along. The personal stuff floored me.

I did not have an answer. As a Type A, Midwestern, over-achieving, no-brothers-and-no sisters worker bee, fun was not a big part of my personal repertoire.

I asked him, “Why?”

I recall to this day his answer, “I want our officers and employees to have time with their families, get involved in the community, and do great work without getting into that New York City thing.”

Interesting. The Courier Journal had a very good reputation. The newspaper was profitable, operated a wide range of businesses, printed the New York Times’s magazine for the Gray Lady, and operated a commercial database company. In fact, in 1980 the Courier Journal was one of the leaders in commercial online information, competing with a handful of other companies in the delivery of information via digital channels, not the dead-tree, ruin-the-environment, and dump-chemicals approach of most publishing companies.

In 1986, Gannett bought the Courier Journal. The commercial database unit was of zero interest to Gannett, so it and I were sold to Bell+Howell. After a short stint at a company entrenched in 16 mm motion film projectors, I headed back to New York City.

I retained my residence in Louisville, and I have watched the trajectory of the Courier Journal as it moved forward.

I have to be blunt. The Courier Journal is not the newspaper, the company, or the community force it was when I joined Mr. Bingham and a surprisingly diverse, bright, forward-looking team 31 years ago. The 1981 management approach of the Courier Journal was a culture shock to me. Think of the difference between Dick Cheney and Mr. Rogers. The 2012 approach saddens me.

This morning I read “Answering Your Questions on CJ Changes,” written by a person whom I do not know. The author of the article is Wesley Jackson, publisher of the Courier Journal. (I never liked the acronym CJ and still do not.)

The main point of the article is that the Courier Journal has to raise its prices. Last week, Mr. Jackson wrote a short article in the Courier Journal informing subscribers a letter would arrive explaining the new services that would be available. We received our letter on Wednesday, May 9, 2012. We called on Thursday, May 10, 2012, and cancelled our subscription. I am not sure how many other subscribers took this action, but a sufficient number of Courier Journal readers called to kill the phone system at the newspaper.

Mr. Jackson wrote this morning:

Unfortunately our Customer Service Center’s phone system had technical problems, and many of you had long wait times or could not get through to get your questions answered. That I know was frustrating.

I bet. I would love to see the data about the number of calls and the number of cancellations that the paper received when it announced the rate hike, a free iPad application for subscribers, and an email copy of the newspaper sent each day to paying customers.

The write up troubled me for several other reasons:

Some of the word choices were of the touchy-feely school of communication. There are 19 “we’s.” The word “value” appears twice; there are seven categoricals (six “all’s” and one “never”); and the word “conversation” appears twice.
There is at least one split infinitive: “to personally apologize.”
An absolutely amazing promise expressed in this statement: “For those of you who would like to ask questions directly, please email me at publisher@courier-journal.com or send a letter to Publisher, Courier-Journal Media, 525 W. Broadway, Louisville, KY 40202. I promise you will each receive a response.”

“Promise,” “all,” and “never”—yep, I believe those assertions.

I would have included an image of Wesley Jackson but I had to pay for it. Not today, sorry.

My view is that I hear a death rattle from the Courier Journal. The reality of the newspaper is that it runs more and more syndicated content. The type of local coverage for which the paper was known when I joined in 1981 has decreased over the years. When I want news, I look at online services. What I have noticed is that what appears in the Courier Journal has been mentioned on Facebook, Twitter, or headline aggregation services two or three days before the information appears in either the Courier Journal’s hard copy edition or its online site, www.courier-journal.com.

Dave Kellogg, the former president of MarkLogic, used to chide me that I should not refer to major publishing operations as “dead tree publishers.” My view was and is that I am entitled to my opinion. Traditional publishing companies have failed to respond to new opportunities to disseminate and profit from information.

The list of mistakes includes:

Belief that an app will generate new revenue. Unfortunately apps are not automatic money machines. (Print-centric apps are not the go-to medium for many digital device users.)
Assumptions about a person’s appetite for paying for “nice to have content.” (One pays for “must have” content, not “nice to have” content.)
Failure to control costs. (Print margins continue to narrow as traditional publishers try to regain the glory of the pre-digital business models.)
Firing staff who then go on to compete by generating content funded by a different business model. (This blog is an example. We do online advertising and inclusions and sell technical services. For some reason, this works for me thanks to my team which includes some former “real” journalists.)
Assuming that new technology for printing color on newsprint equips an information technology department to handle other information technologies in an effective manner. (Skill in one technical area does not automatically transfer to another technical field.)

I can hear the labored breathing of a local newspaper struggling to stay alive. What do you hear?

Stephen E Arnold, May 13, 2012

Sponsored by HighGainBlog, which is ArnoldIT

Written by Stephen E. Arnold · Filed Under Editorial opinion, Feature, Publishing

Two Pundits and Their Punditry

March 31, 2012

I find the notion of pundits fascinating. The US in 2012 pivots on a news hook, the Warhol fame thing, and a desire to share viewpoints with Flipboard and Pulse users.

This morning I was listening to the crackle of small arms fire in rural Kentucky. Dawn had not yet extended its crepuscular reach to my hollow but two write ups did. Neither is one of those magnum loads squirrel hunters desire here in the Commonwealth. Nope, these were birdshot, but each write up is interesting nonetheless.

Both indirectly concern search and retrieval. Both found their way into my “gems of the poobahs” folder.

First, I noted the digital Atlantic’s write up “The Advertising Industry’s Definition of ‘Do Not Track’ Doesn’t Make Sense.” What caught my attention was the juxtaposition of the word “advertising” with the phrase “doesn’t make sense.” Advertising making sense? The Atlantic “real” journalist has not watched television with a 67 year old. More than half of the TV commercials which I find embedded in basketball games every four minutes don’t make sense. Advertising is about creating a demand for must-have products. Advertising is part of the popular culture and an engine of growth for companies unable to generate sales without the craft and skill of psychological tactics. Check out an advertisement for Kentucky bourbon. Does this headline make sense?

“Honk if you’re proud to be a redneck?”

As a resident of Kentucky, I am not sure I know what a redneck is, but I bet those folks in Boston do. But what’s the “making sense” part? What advertising does is tickle the brain to make some folks want to drink. And we all know how important it is to imbibe whiskey, engage in “real” journalism, and ferry children to soccer practice. Yep, makes “sense” to me.

But here’s the passage which caught my attention:

Stanford’s Aleecia McDonald found that 61 percent of people expect that clicking a Do Not Track button should shut off *all* data collection. Only 7 percent of people expected that websites could collect the same data before and after clicking a ‘Do Not Track’ button. That is to say, 93 percent of people do not understand the industry’s definition of DNT. Which totally makes sense! Who would ever think saying, “Do not track me,” actually means, “It’s fine to collect data on me, but don’t show me any signs that you’re doing so.” Simply because the industry itself has defined ‘Do Not Track’ in an idiosyncratic way doesn’t mean their self-serving decision should be the basis for all policy and practice in this field.

Almost any redneck would understand this passage, the implications of persistent cookies, and the distinction between various types of tracking, including my favorite, the iFrames-based method.

Second, I read “Debunking Senator Al Franken On Google, The Internet & Privacy.” This screed is from a “real” journalist and favorite source of juicy quotes on the subject of search and retrieval. The point of the write up is that despite the author’s affection for a US senator as a comedian, the US senator does not know beans about tracking, Google, and, by extension, search and retrieval. Now “search” does not mean find. Search, I believe, means to the “real” journalist using methods to generate traffic to a Web site. I define “search” differently, but the good part in my opinion is this passage:

Ya think? But I mean, Facebook kind of does sell my friends. I can export all of them out to Yahoo and Bing, because Facebook and Yahoo and Bing all have deals. I can’t export them to Google, because, you know, they aren’t friends. Would you call that selling to the highest bidder? When I go over to search on Bing, by default, all my Facebook friends are being used to personalize my search results. Oh, I can opt-out, but you know how hard that is. Since that’s part of a Bing-Facebook deal, is that a line that’s crossed?

Please, read the entire “real” journalistic analysis of a talk by a US senator. I must admit I don’t relate to the questions and analytic points in this paragraph. I recognize the names of the companies mentioned, but “the deal” baffles me.

Why do I care? Three points:

I sense the emotion in these write ups. Passion is good for advertising and good for capturing attention. However, I am struggling to figure out what the problem is. Advertising seems to be what America is. Untangling the warp and woof of this fabric is difficult for me.
The ad hominem method and charged language cause me to think that the lingo of advertising has become the common parlance of “real” journalists.
I struggle to unravel the meaning of certain parts of these two write ups. Am I alone?

Net net: technology and advertising are an interesting compound. Now “real” journalism is quite similar. To quote one “real” journalist, “Ya think?” Well, not much.

Stephen E Arnold, March 31, 2012

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Editorial opinion, News, Online (general), Privacy

A Road Map for Censorship

March 31, 2012

David Bamman, Brendan O’Connor, and Noah A. Smith present some interesting facts from a study described in their article, “Censorship and Deletion Practices in Chinese Social Media.” Their study touches on several aspects of how China allegedly controls the intake and outflow of information.

The Chinese government’s methods are far different from the United States’ approach. My understanding of the situation is that China takes censorship to extremes and infringes on the freedom of its citizens using the GFW (Great Firewall of China), which filters key phrases and words, preventing access to sites like America’s Facebook and Google. However, Sina Weibo is the Chinese equivalent of Facebook where bloggers post and pass information presumably in a way the officials perceive as more suitable for the Middle Kingdom.

Sina Weibo is monitored and as long as members stay within the boundaries or disguise their information, posts go unnoticed. If any of the outlawed phrases are entered, the user’s post is deleted and anyone searching for the information is met with the phrase ‘Target weibo does not exist’. If the user properly masks the phrase or words used, the information will get through, showing that there is the possibility of future change regarding the censorship practices in China.
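The mechanism described above, in which an exact phrase triggers deletion while a disguised phrase slips through, can be sketched in a few lines of Python. This is an illustration only: the blocklist, the function name, and the sample posts are hypothetical, and nothing here claims to show how Sina Weibo actually implements its filtering.

```python
# Hypothetical blocklist for illustration; the real term lists are not given in this write up.
BLOCKED_PHRASES = {"forbidden phrase", "outlawed term"}

def is_deleted(post: str) -> bool:
    """Return True if the post contains any blocked phrase verbatim (case-insensitive)."""
    text = post.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

posts = [
    "This post mentions a forbidden phrase directly.",
    "This post mentions a f0rbidden phr4se in disguise.",
]
for post in posts:
    # A deleted post would be answered with "Target weibo does not exist".
    print(is_deleted(post), "-", post)
```

In this sketch the first post is flagged and the second is not, which is the point of the passage: a verbatim match is easy to catch, and a masked spelling defeats a simple substring check.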

The GFW will catch obvious outgoing information such as the names of political figures, which the study monitored. The article asserted:

In late June/early July 2011, rumors began circulating in the Chinese media that Jiang Zemin, general secretary of the Communist Party of China from 1989 to 2002, had died. These rumors reached their height on 6 July, with reports in the Wall Street Journal, Guardian and other Western media sources that Jiang’s name had been blocked in searches on Sina Weibo (Chin, 2011; Branigan, 2011). If we look at all 532 messages published during this time period that contain the name Jiang Zemin, we note a striking pattern of deletion: on 6 July, the height of the rumor, 64 of the 83 messages containing that name were deleted (77.1 percent); on 7 July, 29 of 31 (93.5 percent) were deleted.
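The percentages in the quoted passage can be checked from the message counts it reports. A minimal sketch, using only the figures quoted above; the variable and function names are my own:

```python
# (deleted, total) messages naming Jiang Zemin, as reported in the quoted passage.
counts = {"6 July": (64, 83), "7 July": (29, 31)}

def deletion_rate(deleted: int, total: int) -> float:
    """Share of messages deleted, as a percentage rounded to one decimal place."""
    return round(100 * deleted / total, 1)

rates = {day: deletion_rate(d, t) for day, (d, t) in counts.items()}
print(rates)  # {'6 July': 77.1, '7 July': 93.5}
```

Both results match the figures the authors give.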

No firewall is perfect, but according to the studies done on searches, blogs and texts containing prohibited information, China has a pretty impressive figure. It may not seem reasonable by American standards, but by filtering anything it deems politically sensitive, China protects the privacy of its country, preventing global rumors and interference.

On one level, censorship makes sense, in particular regarding the business world. The Chinese government makes its corporations responsible for their employees, meaning if an employee is blogging instead of working and puts in illegal information, the company itself is fined, or worst case scenario, shut down. Thus Chinese factories have a high rate of productivity because their workers are actually doing their job.

How is China’s alleged position relevant to the US? There may be little relevance, but to officials in other countries, the article’s information may be just what one needs to check into a Holiday Inn of censorship.

Jennifer Shockley, March 31, 2012

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Editorial opinion, News, Online (general), Privacy, Social

Another Pundit Outfit Predicts Doom for GOOG

March 17, 2012

I don’t think the Google is going anywhere. Granted the outfit is floundering, but have you ever tried to coordinate 60,000 employees with high IQs, deal with legal annoyances on every continent except Antarctica, and fight off the incursions of Amazon, Apple, Facebook, and Microsoft plus dozens of other companies looking to get a chunk of Googzilla’s tail? Nah, I did not think so. It is much, much easier to post punditage and collect paychecks.

I just read “This Is Why Google Is Losing the Future.” I grimaced at the “this is why” phrase and rebelled at “losing the future.” I wonder if the use of “its” was spiked by a “real” journalist. The point of the write up in my opinion was a way to work the word “crack” and the phrase “roach hotel” into a “real” article. I use on occasion Latin, Greek, and French. I don’t think I have ever used the phrase “roach hotel” to describe an online service. Nice metaphor.

Here’s the phrase that sets the news and opinion piece apart:

And, as an increasing number of developers feel that Google will treat them poorly, or that it is simply too much of a threat, it’s lost the future. Yet Larry Page is even telling his own engineers that they should leave if they don’t agree with his plan to focus on a “single, unified, ‘beautiful’ product across everything”. If that’s what’s happening inside the Googleplex, what hope for those on the outside? Let’s go back to where we started: the startup founder who sees Google as a drug dealer looking to offer him a sweetener that gets him addicted. Since he doesn’t want that to happen, he’s left with that single question.

I am okay with humor, sarcasm, criticism, and cynicism. I am not okay with “real” journalists, failed webmasters, unemployed political science majors working as experts, and folks who have never managed a big operation sitting in the balcony emitting catcalls.

I am not sure that heckling is particularly constructive even when the intended listener has no choice but to attend to the message. The game is traffic, I suppose.

Stephen E Arnold, March 17, 2012

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Editorial opinion, Google, News

Censorship Inputs: Filtering Content and Unintended Consequences

January 29, 2012

I find “inputs” annoying. An “input” is advice, a comment delivered in parental mode, or suggestions which are more about the person making the suggestion than the person receiving the suggestion. Twitter is getting “inputs” about the alleged filtering of tweets in certain countries. (Keep in mind that search engines filter on a routine basis.)

No tweets needed in this woodcut of the 1844 Nativist riot in Philadelphia. Social media just accelerates information flow. A happy quack to Wikipedia.

A good example is “Letter to Twitter Executive Chairman Jack Dorsey Urging Him Not to Cooperate with Censors.” The idea is a simple one: when asked to filter content, Twitter should ignore the request. But what happens when the request is made by a governmental entity? Does Twitter ignore that governmental request? This type of blow off sounds great sitting in a college dorm at 3 am talking about what is right and wrong. The problem is that it ignores three salient facts topmost in the minds of governmental executives around the world:

Social media is the mechanism for starting and sustaining revolt. Even the Googler involved in Egypt’s transformation pointed the finger at Facebook. Facebook’s executives were half a world away and probably not thinking about the system as a mechanism for revolt.
Governments are behind the curve when it comes to technology. As a result, governments and officials with power want to stop the technology in its tracks. The idea is that if a service is a problem, one can make the problem go away. That’s why India, China, and other outfits want to clamp down hard on certain content channels or at least be able to pry them open and take action if warranted.
The companies want to keep earning money and keep their executives out of jail or out of harm’s way. Most of the folks providing inputs don’t know what could and may happen to a frisky executive who ignores a request from a nation state. In case you don’t know, the actions include jail time, death, harassment, and multiple actions across financial and personal spheres of behavior. This is hard ball, kids, and you need to know that nation states act lawfully within their borders and have the same extra-nation state options that the US, England, Israel, and other countries do.

Here’s an example of the sort of input which can lead to some interesting situations:

We are very disturbed by this decision, which is nothing other than local level censorship carried out in cooperation with local authorities and in accordance with local legislation, which often violates international free speech standards. Twitter’s position that freedom of expression is interpreted differently from country to country is inacceptable. This fundamental principle is enshrined in the Universal Declaration of Human Rights. We call on you to be transparent about the way you propose to carry out this censorship. Posting the removal requests you receive from governments on the Chilling Effects website will not suffice to offset the harm done by denying access to content. Twitter has said that, if it receives “a valid and properly scoped request from an authorized entity,” it may respond by withholding access to certain content in a particular country, while notifying the content’s author.

I heard that one nation state turned force on a crowd of protestors. See “Chinese Troops Seal Off Tibetan Protest Region.” Quite a spicy filter in my opinion. This is the real world and the social media which is touted as replacing search as the next big thing is fostering some interesting unintended consequences; namely, forcing governments to embrace tougher behaviors. I generally worked for governments and law enforcement. As a result, I am making an observation based on experience. There are two types of force: hard and soft. Filtering with software is about as soft as force gets. The hard force, on the other hand, is not something most readers of this blog want to experience as a receiver of input with intent.

Remember: I am okay with a person making inputs. I am not okay with the assumption that a commercial enterprise is going to be able to do the college dorm version of the “right thing.” Missing a class is one thing. Getting arrested, killed, or becoming the focus of a disinformation attack is another.

Finding is one thing. Inciting is quite another. Lowest common denominator, consumerization, commoditization—describe it as you will. There are interactions in the real world that don’t exist in a philosophical discussion among soon to be unemployable students.

Stephen E Arnold, January 29, 2012

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Editorial opinion, News, Social, Technology

Googzilla Gets Social

January 11, 2012

I scanned the “official” line of Google’s most recent social play. I flipped through the long list of comments, views, opinions, etc. My reaction? What’s the big surprise? Here’s an anchor post: “Antitrust+,” which appeared in Parislemon. The main idea seems to be that pundits recognize Google, an outfit I called Googzilla back in 2005, is doing the beaver thing. (The notion of Googzilla originated from my research which revealed that Google believed that its “system” would provide the underpinnings for most business processes. Therefore, search was the new infrastructure. When I used this reference in a talk in London, the Googler on the panel with me said, “Cool.” Googzilla is just a big beaver, doing its beaver thing.) You may recall the adage, “Beavers do what beavers do.” Put the beaver in the kitchen of the Cast Iron Grill in Harrod’s Creek, Kentucky, and the beaver starts building a dam. Why? That’s what beavers do. Easy to predict because beavers do their thing. Here’s evidence of the Google-beaver similarity:

Google is using Search to propel their social network. They might say it’s “not a social network, it’s a part of Google”, but no one is going to buy that. They were late to the game in social and this is the best catch-up strategy ever. Given that it’s opt-out, I’m just not sure that this is all that different from Microsoft bundling IE with Windows.

Google is doing the social thing, not because Google is social. Google is doing social in order to remain relevant to the Facebook, Twitter, LinkedIn users. In these systems, content from humans is perceived to be more accurate, less biased, and generally more useful than a list of results in which ads, content, red herrings, and even malware lurk. Hey, some users seem to think, the social information is just “better.” When the user is looking for a short cut, getting mis- or dis-information from a “friend” is probably a better bet than taking what a non-social system generates.

Beavers do what beavers do. Why does one expect the beaver to build a computer when beavers build dams?

My view is that most of the free content available on the Web is dicey stuff. Most users today—including recent library school graduates—lack the skills to determine accurate content in most topic areas, distorted content with bent or shaped “facts”, content with mixed semantic or sentiment coloring, and the most relevant document for a particular query.

In short, “beavers do what beavers do” applies to Google, but the adage also applies to users who take what systems give them because advertisers and other funding sources foot the bill. Ask yourself these questions:

When I am looking for information, do I consult multiple commercial databases, review a representative selection of the documents, and make judgments about which documents warrant further investigation?
When consuming results from any free online system, do I routinely verify facts by looking for another source which can verify the data in which I have an interest?
When accepting “hits” from predictive systems, do I run the same query on another predictive system and evaluate the outputs?

I know from information gathered as recently as last week that few, if any, users perform these actions, even among recent library school graduates.

So Google is getting social because:

Facebook and other “real” competitors are nibbling into Google’s revenue growth system. In 2006, Google had essentially zero competitors. Today, Google is in an uncomfortable position. Amazon, Apple, Facebook, and even the once presumed terminal Microsoft are posing problems, big problems. Google’s management is responding with “me too” solutions in the hopes that sheer imitation will solve the competitive gap problem. The beaver is doing what the beaver does.
Google’s gravity free run is now carrying the ballast of staff retention. With the big paydays coming to employees of pre-IPO companies, 13 year old outfits don’t have that old hiring magnetism any longer. As a result, Google cannot innovate and disrupt. Google is now in the imitate and disrupt mode in my opinion. Aging beavers do what aging beavers do; that is, look for short cuts.
Google must push through increasing friction. The resistance is coming from regulators who can be “managed” but that takes time, mental resources, and effort. No problem, but with legal hassles on every continent except Antarctica, Google finds the legal tar getting harder. Other factors bumping up the coefficient of friction at Google are the cut backs, the about faces, and the multi-front product and service wars the company is fighting. Even beavers grow careless. I saw a squashed one on the way to the post office yesterday.

Wow, I bet everyone using social media for information wishes that the traditional method of research were back in vogue. Online services reflect the user. In short, beavers do what beavers do, and today beavers don’t do “get your hands dirty” research. How inefficient! Let’s get social to find the “truth”. That works?

I find Google interesting and one can make its public search system deliver high value results. However, most online users just accept what the system outputs. When I was younger, I worried that commercial online services like Dialog and LexisNexis would manipulate results to suit their corporate purposes. As risky as placing trust in a commercial online service may be, Dialog and LexisNexis made no effort to filter the content generated by commercial database producers. In fact, the systems made it possible to run a query across multiple commercial files using the 411 command or to run comprehensive searches across a corpus of third party content. It took time and effort to grind through these outputs, but the effort would yield insights, suggestions for further research, and often make visible unintentional or factual errors. In our Business Dateline database, we went so far as to include post publication corrections to the full text article. The idea was to make it clear that even commercial publishers make mistakes, often really big ones.

Today, the online consumer is getting exactly what the online consumer wants. The content finding systems are not built to deliver accurate, unbiased results. The majority of online users want answers, not the time consuming, intellectually exhausting task of figuring out the provenance and accuracy of information. Who wants to do library research and mind numbing data analysis? I want the equivalent of ESPN Newscenter so I “know” what happened in sports. Who has time to watch the games? Why read “long form” content when one can snag information via Flipboard and Pulse?

So let’s knock off the worry about Google and its incursions into social. Put that effort into performing rigorous searching. When the users shift from taking spoon fed, baby food content to more substantive fare, then Google as well as other online services will adapt.

Perhaps this type of sign should be posted on search result pages from ad supported online research services? Image source: http://www.graphicshunt.com/funny/images/stupidity-13135.htm

Right now, Google is doing what beavers do. Users are doing what users do. Hard work, fact based analysis, and exercising judgment are not driving online. Distraction, ease of use, and easy, fast, and fun information access are driving beavers into a frenzy.

Beavers do what beavers do. One can’t change Mother Nature. Complaining about Googzilla is pretty much a waste of energy which can be better spent with more rigorous research. Wow, that will be popular with today’s “average” user looking for pizza in all the wrong places.

Stephen E Arnold, January 11, 2012

Sponsored by Pandia.com, a Web site run by information professionals

Written by Stephen E. Arnold · Filed Under Editorial opinion, Google, Marketing, News, Search, Social, Text processing

SAP: Long and Winding Road for Search

January 5, 2012

In one of the early editions of the Enterprise Search Report, that white elephant of 600 pages containing profiles of more than two dozen vendors, I described TREX, a nifty algorithm for Text Retrieval and Information Extraction. (The link is to the Wikipedia write up, however.) For those of you who are new to search, TREX is not the creature you wished you had as a pet when you were eight years old. The SAP TREX is a natural language processing search and retrieval system which was mostly home grown. Keep in mind that SAP owns the Inxight entity extraction and server technology developed by the adepts at Xerox PARC. I interviewed one of the developers, profiled the system’s approach to content processing, and pointed out that search was a killer in the SAP R/3 environment for three reasons:

SAP assigns its own spiffy metadata to content objects, storing these in the wild and wonderful proprietary R/3 environment.
SAP systems took and probably still take a long time to plan, implement, and impose on the client. My understanding is that the client does not tell SAP how the clients like to work. SAP tells the client how the client will work with the SAP system and method. Nifty for sure.
SAP systems have struggled with a wide range of performance “opportunities.” The idea is that when something goes slowly, then the client has the “opportunity” to make changes which will speed up the large, IBM-inspired system.

A few years ago, before Endeca became the new billion dollar toy at Oracle, Endeca accepted cash infusions from outfits hooked up with Intel (yep, the company with the vision that its chips could crush any computational problem because they were so darned fast) and SAP’s investment unit (an outfit allegedly looking at ways to give SAP a leg up on the future). After watching Endeca do its recursive indexing and faceting processes, Intel and SAP shifted gears. Endeca, as you know, is now part of Oracle along with TripleHop (clustering and indexing), InQuira (natural language processing from two predecessor companies), RightNow (also infused with search technology), Artificial Linguistics, PL/SQL’s wonky command driven search, and probably some technologies I either don’t know about or have forgotten due to advancing senility.

Will SAP slip and fall with its information retrieval solutions? A happy quack to the image source http://personalinjuryclaims1.co.uk/fall-claims/

When you want to run search within an SAP environment, many folks just embrace one of the SharePoint solutions, give TREX a go, or license a system which is compatible with some of the SAP processed content. In short, SAP’s approach to search is not much different from IBM’s or Microsoft’s.

The question to consider is, “What’s next for SAP?”

Several observations:

First, SAP has to pump money into TREX to keep the system in step with today’s information demands. With SAP dabbling in open source and focusing on higher margin products and services, TREX is probably not the long haul solution for SAP. Home grown search is too expensive.

Second, SAP continues to poke around open source software. At some point, SAP may follow in the footsteps of the company which inspired SAP in the first place—IBM. Lucene and Solr look like possible options. This is a trend to watch.

Third, SAP buys or ties up with one of the workman-like search vendors. SAP could either sign a deal to use a third party system on some basis or just buy one of the dozens of information retrieval vendors who are looking for a financial white knight. Despite the chatter about search, many search and retrieval companies are gasping for oxygen. SAP may have a tank and a breathing mask.

What’s my view? Well, since I am a mercenary goose, I don’t have an official opinion. I do find it fascinating that SAP has not moved aggressively to the Lucene Solr solution. So for now, I am going out of town and will wait until my Overflight service provides some solid data about SAP’s next move.

Hopefully it will be more artfully crafted than SAP’s pricing and customer service activities in the last two or three years.

Stephen E Arnold, January 5, 2012

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Business strategy, Editorial opinion, Enterprise search, News, Search, Technology | Comments Off on SAP: Long and Winding Road for Search

Open Access Threatened by Elsevier Backed Legislation

January 3, 2012

Academic publishing, specifically in the fields of science and math, is a big money industry. The whole system hinges on containing the flow of information, a task that grows increasingly difficult with the demand for free access to information. Free access is fueled by the internet and social media, with these influences creating a new generation of young people who assume and demand that information be free. Arxiv.org is an open access archive for academic literature devoted to math and science. It and other open access portals are being threatened by potential legislation. (Open access is a term referring to quality information sources that are not protected by a subscription.) The Quantum Pontiff tells us more in, “Could Elsevier Shut Down Arxiv.org?”

The blogger reports:

They (Elsevier) haven’t yet, but they are supporting SOPA, a bill that attempts to roll back Web 2.0 by making it easy to shut down entire sites like Wikipedia and Craigslist if they contain any user-submitted infringing material.

Splash page of arxiv.org shows the seal of Cornell University and the phrase “We gratefully acknowledge supporting institutions.” See http://arxiv.org/

Social media and copyright are inherently opposing concepts. User-submitted material, as it is referred to above, will almost always infringe upon copyright. In fact, very few submissions aside from the user’s own thoughts and words will not infringe upon copyright. If the legislators supporting SOPA (Stop Online Piracy Act) make good on all their promises, eventual showdowns with social media heavy hitters like Facebook or YouTube could occur.

American copyright was established by the founding fathers in our Constitution to balance the protection of intellectual property with the ability to foster creativity and innovation. However, copyright has evolved in the modern era into a blanket protection policy, primarily serving corporations. Libraries and other institutions of learning champion the cause of open access, but even these civic organizations are threatened by corporate lobbyists in their constant quest to have copyright protection made tighter and extended longer.

Written by Stephen E. Arnold · Filed Under Business strategy, Editorial opinion, Feature, Open source, Publishing, Search | 1 Comment

Predicting Failure: Pot Calls Kettle Black and Blue

January 2, 2012

Fascinating is traditional media’s ability to attack a hopelessly confused big corporation for a failure. The failure documented by the New York Times was Hewlett Packard’s immolation of its mobile strategy. The outfit doing the criticizing—what I call the pot calling the kettle gray lady black and blue—is the New York Times. Ah, irony.

Which is more flawed: the management of HP or the management of the New York Times? Let me try to remember. The New York Times lost its top manager and its head of digital stuff. The home delivery rate is nudging close to $700 a year. The Safari loophole makes its digital content free. The company has muffed the bunny with its indexing, its About.com property, and just about every financial knob and dial setting available.

HP, on the other hand, has engaged in improper behavior, the CEO revolving door game, the tablet fiasco, and the open sourcing of a $1.0 billion plus investment. HP bought Autonomy for $10 billion, creating a mini cash concern for some Wall Street types.

Sounds like a pretty even game of management.

Now to the business at hand: “In Flop of H.P. TouchPad, an Object Lesson for the Tech Sector.” (If the link goes dead, just use Safari. Access to NYT content seems to be “free”. Nifty, eh?) What is the New York Times suggesting? For me, the write up is more about the New York Times itself than about Hewlett Packard. Three points:

HP created a flop due to various management mistakes. Okay, sounds like the NYT’s problem.
HP had a good idea but it “was ahead of its time”. Right. The NYT had a deal with LexisNexis which worked pretty well, but not well enough. So the NYT decided it could go it alone. It was, as the NYT says, “ahead of its time.” No kidding.
HP faced a problem with newcomers who dominated a market. Check. Same with the NYT and its various digital efforts. Being good at one thing does not mean that one is good at another thing.

My take? The NYT is trying to be just like the Harvard Business Review, adding value to what is not even a news story any longer. Going down this path ignores some of the basics of creating high value business and management analysis. The information is not what makes money. It is the other revenue streams. The NYT will learn as Time and Newsweek have that trying to up one’s intellectual game does not automatically make the money flow or the analysis insightful. Business information is often a loss leader or a way to generate consulting revenue.

The write up does explain how the NYT sees the woes of other companies. That is indeed interesting. I wonder if the NYT team remembers its original online search service. I bet Jeff Pemberton does.

Stephen E Arnold, January 2, 2012

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Business process, Business strategy, Editorial opinion, News, Technology | Comments Off on Predicting Failure: Pot Calls Kettle Black and Blue

Big Data Analytics and Sense Making with Synthesys

December 19, 2011

Tim Estes is the CEO and co-founder of Digital Reasoning. Digital Reasoning develops and markets solutions that provide Automated Understanding for Big Data.

There’s a great deal of talk about “big data” today. If you walk into an AT&T store near you, you may see statistics such as users sending over 3 billion text messages a day or over 250 million tweets. Compare that to 100 million or fewer tweets a day a year or two ago, and it’s daunting how rapidly the volume of digital information is increasing. A mobile phone without expandable storage frustrates users who want to keep a contacts list, rich media, and apps in their pocket. In organizations, the appetite for storage is significant. EMC, Hewlett Packard, and IBM are experiencing strong demand for their storage systems. Cloud vendors such as Amazon and Rackspace are also experiencing strong demand from companies offering compelling services to end users on their infrastructure. At a recent Amazon conference in Washington, Werner Vogels revealed that the AWS Cloud has hundreds of thousands of companies/customers running on it at some level. Finally, companies like Digital Reasoning are working on the next generation of Cloud, automated understanding, which shifts the focus from infrastructure to sense-making of data that sits in hosted or private clouds.

While most of the attention has been on infrastructure like virtualization / hypervisors, Hadoop, and NoSQL data storage systems, we think those are really the enablers of the killer app for Cloud, which is making sense of data to solve information overload. Without next generation analytics and supporting technology, it is essentially impossible to:

Analyze a flow of data from multiple sensors deployed in a factory
Process mobile traffic at a telephone company
Make sense of unstructured and structured information flowing through an email system
Identify key entities and their importance in a stream of financial news and transaction data.

These are the real world problems that have engaged me for many years. I founded Digital Reasoning to automatically make sense of data because I believed that someday all software would learn and that would unleash the next great revolution in the Information Age. The demand for this revolution is inevitable because while data has increased exponentially, human attention has been essentially static in comparison. Technology to create better return on attention would go from “nice to have” to utterly essential. And now, that moment is here.

Digging a little deeper, Digital Reasoning has created a way to take human communication and use algorithms to make sense of it without having to depend on a human design, an ontology, or some other structure. Our system looks at patterns and the way a word is used in its context and bootstraps the understanding much like a human child does – creating associations and building into more complex relationships.
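To make the idea of learning from context a bit more concrete, here is a minimal sketch of a generic distributional technique: count which words appear near each word, then compare those counts. This is not the Synthesys algorithm, which the post does not describe in detail. The function names, the window size, and the toy sentences are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, invented for illustration only (hypothetical data).
corpus = [
    "the bank approved the loan",
    "the lender approved the loan",
    "the river flooded the valley",
]

vectors = context_vectors(corpus)
bank_lender = cosine(vectors["bank"], vectors["lender"])
bank_river = cosine(vectors["bank"], vectors["river"])
```

In this toy corpus, "bank" and "lender" appear in identical contexts, so `bank_lender` comes out higher than `bank_river`. No ontology or hand-built structure is involved; the association falls out of usage alone, which is the general flavor of the approach described above.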

In 2009, we migrated onto Hadoop and began taking on the problem of managing very large scale unstructured data and moving the industry beyond counting things that are well structured and toward being able to figure out exactly what the data you are measuring means.

Digital Reasoning asks the question: “How do you take loose, noisy information that is disconnected and unstructured and then make sense of it so that you can then apply analytics to it in a way that is valuable to business?”

We identify actors, actions, patterns, and facts and then put them into the context of space and time in an efficient and scalable way. In the government scenario, that can mean finding and stopping bad guys. In the legal environment, users want to answer the questions of “who”, “what”, “where”, and “when”.

Digital Reasoning initially set its focus on the complex task of making sense out of massive volumes of unstructured text within the US Government Intelligence Community after the events of 9/11. But we also believe that our Synthesys software can be utilized in the commercial sector to create great value from the mountains of unstructured data that sit in the Enterprise and stream in from the Web.

Companies with large-scale data will see value in investing in our technology because they cannot hire 100,000 people to go through and read all of the available material. This matters if you are a bank and trying to make financial trades. This matters for companies doing electronic discovery. This matters for health sectors that need help organizing medical records and guarding against fraud.

We are an emerging firm, growing rapidly and looking to have the best and the brightest join our quest to empower users and customers to make sense of their data through revolutionary software. With the recent investment from In-Q-Tel and partners of Silver Lake, I believe that Digital Reasoning has a great future ahead. We are on the bleeding edge of what is going on with Hadoop and Big Data in the engineering area and how to make sense of data through some of the most advanced learning algorithms in the world. Most of all we care that people are empowered with technology so that they can recover value and time in the race to overcome information overload.

To learn more about Digital Reasoning, navigate to our Web site and download our white paper.

Tim Estes, December 19, 2011

Sponsored by Pandia.com

Written by Stephen E. Arnold · Filed Under Analytics, Editorial opinion, Financial, News, Search, Technology, Text analytics, Text processing | 2 Comments


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
