Why Traditional Print and Database Publishers Are in Even More Trouble Than Thought
November 12, 2014
I read two articles snagged by my Overflight service. The first is “Are You Ready for Marketing in 2020?” The story ran in what I thought was one of the UK’s most eager of the electronic pony riders. The other is a news report about LinkedIn, the social network for those desperate for contract or 9-to-5 work and individuals with a hunger for getting 15 nanoseconds of fame. Yep, the entity “Stephen E Arnold” has a presence on LinkedIn. However, the “entity” is powered by the efforts of two of my research goslings and a real law librarian. We find the response to the “Stephen E Arnold” postings to the LinkedIn faithful amusing and somewhat horrifying.
Let’s look at each news item and then do some social and digital strategery, a neologism from the W era of the US presidency.
The Guardian asks a question and then promptly answers it without any reference whatsoever to the steady erosion of the traditional newspaper and magazine business. The author, a real journalist I presume, shows some grim data about the decline in ad revenue. There is a fix for this. A “real” newspaper or magazine can quit fiddling with the objective journalism stuff and get down to selling “inclusions.”
If you are not familiar with an inclusion in “real” publishing, allow me to explain. Think about those big, fat college guides that parents buy when Jill or James is “looking for a college.” Some of the entries are obese. Ever wonder why? Well, the business model of many college guides is based on selling space to the colleges and universities. Instead of calling these juicy descriptions of caring faculty and well-groomed campuses advertising, the publishers use the euphemism “inclusion.”
How does this fit into the decline of newspaper advertising revenue? Easy. Just sell stories that pitch the advertisers’ view of reality. Then sell social media posts about the inclusion. Keep beating the drum until the inclusion buyer’s money runs out. Rinse. Repeat.
The solution is different from mine. The future in 2020 marketing will be data, content, channels, and technology. I think these are fine words, but the job is to hook these words to money. That will be done by charging for the newspaper or magazine endorsement, brand power, and ability to put out content that has more credibility than a blog produced by an unemployed journalist, a failed Webmaster, or a retired person like moi.
The Guardian “real” news story concludes with a question: “Are you ready for marketing 2020 style?” Well, the answer in my opinion is that “real” newspaper and magazine publishers are not ready for 2020. They were not ready for online content in the 1960s. Now a half century later, these outfits are still struggling in a digital fish bowl. By 2020, most of the “real” newspapers and magazines will either become PR and SEO outfits, get into a different business like real estate, or fail. In my opinion, the very expensive and complex business model of the Monocle will not be viable due to the difficulty of generating enough revenue to keep print, shops, online, and other bits and pieces affordable.
The second article is “LinkedIn acquires Newsle, a Google Alerts-Style Service for You and Your Network.” One good thing about LinkedIn is that it is more focused than Amazon or Google. The company offers the ego- and unemployed focus that sets it apart from other social networks. Also, the company has snagged a couple of content centric properties. I quite like Slideshare because users create content, upload it, and get the benefit of being able to hunt for work or boost their ego. That’s synergistic in the MBA 1975 definition of the term. The Newsle deal, like the Pulse deal, is aimed at service. These have potential to distribute LinkedIn “posts” and news about Slideshare uploads as well as content that some publishers provide. Please, note that the savvy publisher will charge a person or company to write a story, slap the “real” publication’s name on it, and then hose the data to LinkedIn’s services. So I am on board with this type of acquisition for LinkedIn.
But the real impact of this LinkedIn constellation of services is that traditional database publishers like ProQuest and Ebsco Electronic Publishing are likely to find themselves in a deeper hole than the one they are now in. The traditional market for these outfits is a library willing to pay outrageous prices for content produced by others. Publishers are rightly suspicious of these database outfits. If specialized information is the focal point, the audience for ACM or IEEE content remains small. As a percentage of the working population, the specialist markets are more difficult to increase. Selling cheaper mobile devices is a tough business, but these burgeoning prospect pools are looking for ways to reduce their costs of online, not raise them by reading the full text of Elsevier journals.
Raising prices for this specialized content will squeeze both the professional customers and the go-between companies like Cambridge Scientific Abstracts. Westlaw and Lexis already are feeling the effects of having their core market flee for jobs at Uber, Kentucky Fried Chicken, human resources, and trying to make a franchise pay for the kids’ sneakers. Legal information is indeed a very tough business compared to the salad days of expensive online information. I balk at paying $100, $250, or more for a query of US government produced legal documents. I am not alone, I believe.
This means that LinkedIn may benefit from “real” newspapers and magazines charging for inclusions. As LinkedIn’s audience grows, it—not the publishers nor the intermediating database folks—will get the big paydays necessary to live high on the hog.
Good for LinkedIn. Not so good for the folks who have not adapted to the 1970s. By 2020, many of these outfits will be like the snow leopards. LinkedIn could be one of the winners.
Stephen E Arnold, November 12, 2014
Will Two Xooglers Burnish Yahoo?
November 12, 2014
I read an exclusive story. Know how I know the story is “exclusive”? Here’s the title:
Exclusive: Some Unhappy Yahoo Investors Asking AOL for Rescue
Obviously you have to read the foundation’s exclusive. I want to focus on a different question: Can two former Google executives repair Yahoo’s revenues? I am less than optimistic. I used an illustration in one of the briefings I did during the era of Terry Semel. The picture featured a sinking ship with Mr. Semel’s face Photoshopped into a captain’s uniform.
As I pointed out years ago, once an Internet portal service loses its momentum, flat-lining is the upside. The downside is a slow, gentle drift into irrelevance. So the answer to the question, in my opinion, is, “Long shot.”
I like to recall Yahoo’s former chief technology officer railing at me on a conference call about Yahoo’s super-advanced search technology. How is that working out?
Stephen E Arnold, November 12, 2014
IBM and Airstrip Work for Mobile Health Monitoring System
November 12, 2014
The article titled Airstrip and IBM Partner to Develop Predictive Analytics Solution on HIT Consultant explores the announcement of a partnership to develop mobile monitoring of patients in critical condition. The University of Michigan Center for Integrative Research in Critical Care (MCIRCC) will also be involved. The article explains,
“MCIRCC will pioneer the application of this technology with AirStrip by developing the advanced analytics and testing its ability to identify and predict a serious and unexpected complication called hemodynamic decompensation, one of the most common causes of death for critically ill or injured patients. MCIRCC researchers anticipate that the resulting solution may provide the clinical decision support tool that enables clinicians to identify patient risk factors for early intervention. Early intervention can enhance critical care delivery, improve patient outcomes, and reduce ICU admissions.”
The top goals of the research are to reduce healthcare costs while improving patient outcomes. This is to be achieved through the combination of the AirStrip ONE® platform and the IBM® InfoSphere® Streams. Especially exciting is the ability for this technology to assess patients inside and outside of the hospital walls. Patients with conditions including chronic obstructive pulmonary disease (COPD), diabetes, and congestive heart failure could be monitored for “clinical deterioration” and possible complications could be prevented with this technology.
Chelsea Kerwin, November 12, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Enterprise Data Discovery for Self-Service Search
November 12, 2014
The article titled The Five Rules for Data Discovery on Computerworld discusses Enterprise Data Discovery. In the pursuit of fast-paced, accurate data analytics, Enterprise Data Discovery is touted in this article as a ramped up tool for accessing relevant information quickly. The first capability is “governed self-service discovery” which enables users to reformulate their data search on their own. This also allows for the blending of data types including social media and unstructured data. The article also emphasizes the importance of having a dialogue with the data,
“You also discovered that the spike in sales occurred in the middle of the media campaign and during the time of the spike, there was a major sporting event. This new clue prompts a new question – what could a sporting event have to do with the spike? Again, the data reveals its value by providing a new answer – one of the advertisements from the campaign got additional play at the event. Now, you have something solid to work on.”
According to the article, Enterprise Data Discovery offers a view of the road less travelled, enabling users to approach their discovery with new questions. Of course, the question that arises while reading this article is, who has time for this? The emphasis on self-service is interesting, but it also suggests that users will be spending a good chunk of time manipulating the data on their own.
Chelsea Kerwin, November 12, 2014
LinkedIn Enterprise Search: Generalizations Abound
November 11, 2014
Three or four days ago I received a LinkedIn message that a new thread had been started on the Enterprise Search Engine Professionals group. You will need to be a member of LinkedIn and do some good old fashioned brute force search to locate the thread with this headline, “Enterprise Search with Chinese, Spanish, and English Content.”
The question concerned a LinkedIn user information vacuum job. A member of the search group wanted recommendations for a search system that would deliver “great results with content outside of English.” Most of the intelligence agencies have had this question in play for many years.
The job hunters, consultants, and search experts who populate the forum do not step forth with intelligence agency type responses. Compared with a decision making environment where inputs in a range of languages are the norm for the risk averse, the suggestions offered to the LinkedIn member struck me as wide of the mark. I wouldn’t characterize the answers as incorrect. Uninformed or misinformed are candidate adjectives, however.
One suggestion offered to the questioner was a request to define “great.” Like love and trust, great is fuzzy and subjective. The definition of “great,” according to the expert asking the question, boils down to “precision, mainly that the first few results strike the user as correct.” Okay, the user must perceive results as “correct.” But as ambiguous as this answer remains, the operative term is precision.
In search, precision is not fuzzy. Precision has a definition that many students of information retrieval commit to memory and then include in various tests, papers, and public presentations. For a workable definition, see Wikipedia’s take on the concept or L. Egghe’s “The Measures Precision, Recall, Fallout, and Miss as a Function of the Number of Retrieved Documents and Their Mutual Interrelations,” Universiteit Antwerp, 2000.
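For readers who have not committed the definition to memory, here is a minimal sketch in Python of the textbook formulas: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The “first few results” idea from the LinkedIn thread corresponds to precision at a cutoff k. The document IDs and relevance judgments below are invented for illustration only.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(relevant.intersection(retrieved))
    return hits / len(relevant)


def precision_at_k(ranked, relevant, k):
    """Precision computed over only the first k ranked results."""
    return precision(ranked[:k], relevant)


# Hypothetical ranked result list and hypothetical relevance judgments.
ranked = ["d1", "d7", "d3", "d9", "d4"]
relevant = {"d1", "d3", "d4", "d8"}

print(precision(ranked, relevant))          # 3 of 5 retrieved are relevant
print(recall(ranked, relevant))             # 3 of 4 relevant were retrieved
print(precision_at_k(ranked, relevant, 2))  # 1 of the first 2 is relevant
```

Note that every one of these numbers depends on someone having judged which documents are relevant. Without that set of judgments, there is nothing to divide by.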
In simple terms, the system matches the user’s query. The results are those documents that the system determines contain identical or statistically close matches to the user’s query. Old school brute force engines relied on string matching. Think RECON. More modern search systems toss in term matching after truncation, nearness of the terms used in the user query to the occurrence of terms in the documents, and dozens of other methods to determine likely relevant matches between the user’s query and the document set’s index.
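The difference between brute force string matching and the more modern tricks can be sketched in a few lines of Python. This is a toy under stated assumptions, not how RECON or any commercial engine works: truncation here is a crude “keep the first five characters” stand-in for real stemming, nearness is a simple token window, and all function names are hypothetical.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def string_match(query, document):
    """Brute force: the literal query string must appear in the document."""
    return query.lower() in document.lower()


def truncate(term, length=5):
    """Crude truncation: keep only the first `length` characters."""
    return term[:length]


def truncated_positions(query_term, doc_tokens, length=5):
    """Positions of document tokens that match the query term after truncation."""
    stem = truncate(query_term, length)
    return [i for i, tok in enumerate(doc_tokens) if truncate(tok, length) == stem]


def proximity_match(query, document, window=3, length=5):
    """True if every query term matches after truncation and one occurrence
    of each term falls within `window` token positions of the others."""
    doc_tokens = tokenize(document)
    position_lists = [
        truncated_positions(term, doc_tokens, length) for term in tokenize(query)
    ]
    if not position_lists or any(not positions for positions in position_lists):
        return False
    return any(
        max(combo) - min(combo) <= window for combo in product(*position_lists)
    )


doc = "The library indexed multilingual documents for searching."
print(string_match("index document", doc))               # literal string is absent
print(proximity_match("index document", doc))            # truncated terms, 2 tokens apart
print(proximity_match("index document", doc, window=1))  # window too tight
```

Even this toy shows why language matters. Chopping a word to five characters is a guess that happens to work for some English terms and means little for Chinese.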
With a known corpus like ABI/INFORM in the early 1980s, a trained searcher testing search systems can craft queries for that known result set. Then as the test queries are fed to the search system, the results can be inspected and analyzed. Running test queries was an important part of our analysis of a candidate search system; for example, the long-gone DIALCOM system or a new incarnation of the European Space Agency’s system. Rigorous testing and analysis make it easy to spot dropped updates or screw ups that routinely find their way into bulk file loads.
Our rule of thumb was that if an ABI/INFORM index contained a term, a high precision result set on SDC ORBIT would include a hit containing that term. If the result set did not contain a match, it was pretty easy to pinpoint where the indexing process started dropping files.
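The known-corpus check described above can be sketched as a small Python test harness. This illustrates the idea only and is not a reconstruction of the procedure we used. The record IDs, the index terms, and the `fake_search` function standing in for the system under test are all hypothetical.

```python
def find_dropped_records(known_index, search):
    """Compare what the known index says should come back with what the
    search system actually returns.

    known_index: dict mapping an index term to the set of record IDs
                 known to carry that term.
    search:      callable that takes a term and returns the record IDs
                 the system under test retrieves for it.
    Returns a dict mapping each term to the sorted list of missing IDs.
    """
    dropped = {}
    for term, expected in known_index.items():
        returned = set(search(term))
        missing = expected - returned
        if missing:
            dropped[term] = sorted(missing)
    return dropped


# Hypothetical slice of a known corpus: index term -> records carrying it.
known_index = {
    "inventory control": {"rec-001", "rec-002", "rec-003"},
    "leveraged buyout": {"rec-002", "rec-004"},
}


def fake_search(term):
    """Stand-in for the system under test; simulates a bulk load that lost rec-004."""
    return [rec for rec in known_index[term] if rec != "rec-004"]


print(find_dropped_records(known_index, fake_search))
```

The missing record shows up only because the expected answers were known in advance.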
However, when one does not know what’s been indexed, precision drifts into murkier areas. After all, how can one know if a result is on point if one does not know what’s been indexed? One can assume that a result set is relevant via inspection and analysis, but who has time for that today? That’s the danger in defining precision as what the user perceives. The user may not know what he or she is looking for. The user may not know the subject area or the entities associated consistently with the subject area. Should anyone be surprised when the user of a system has no clue what a system output “means,” whether the results are accurate, or whether the content is germane to the user’s understanding of the information needed?
Against this somewhat drab backdrop, the suggestions offered to the LinkedIn person looking for a search engine that delivers precision over non-English content or, more accurately, content that is not in the primary language of the person doing a search, are revelatory.
Here are some responses I noted:
- Hire an integrator (Artirix, in this case) and let that person use the open source Lucene based Elasticsearch system to deliver search and retrieval. Sounds simplistic. Yep, it is a simple answer that ignores source language translation, connectors, index updates, and methods for handling the pesky issues related to how language is used. Figuring out what a source document means in a language with which the user is not fluent is fraught with challenges. Forget dictionaries. Think about the content processing pipeline. Search is almost the caboose at the end of a very long train.
- Use technology from LinguaSys. This is a semantic system that is probably not well known outside of a narrow circle of customers. This is a system with some visibility within the defense sector. Keep in mind that it performs some of the content processing functions. The technology has to be integrated into a suitable information retrieval system. LinguaSys is the equivalent of adding a component to a more comprehensive system. Another person mentioned BASIS Technologies, another company providing multi language components.
- Rely on LucidWorks. This is an open source search system based on SOLR. The company has spun the management revolving door a number of times.
- License Dassault’s Exalead system. The idea is worth considering, but how many organizations are familiar with Exalead or willing to embrace the cultural approach of France’s premier engineering firm? After years of effort, Exalead is not widely known in some pretty savvy markets. But the Exalead technology is not 100 percent Exalead. Third party software delivers the goods, so Exalead is an integrator in my view.
- Embrace the Fast Search & Transfer technology, now incorporated into Microsoft SharePoint. Unmentioned is the fact that Fast Search relied on a herd of human linguists in Germany and elsewhere to keep its 1990s multi lingual system alive and well. Fast Search, like many other allegedly multi lingual systems, relies on rules and these have to be written, tweaked, and maintained.
So what did the LinkedIn member learn? The advice offers one popular approach: Hire an integrator and let that company deliver a “solution.” One can always fire an integrator, sue the integrator, or go to work for the integrator when the CFO tries to cap the cost of a system that must please a user who may not know the meaning of nus in Japanese from a now almost forgotten unit of Halliburton.
The other approach is to go open source. Okay. Do it. But as my analysis of the Danish Library’s open source search initiative in Online suggested, the work is essentially never done. Only a tolerant government and lax budget oversight make this avenue feasible for many organizations with a search “problem.”
The most startling recommendation was to use Fast Search technology. My goodness. Are there not other multi lingual capable search systems dating from the 1990s available? Autonomy, anyone?
Net net: The LinkedIn enterprise search threads often underscore one simple fact:
Enterprise search is assumed to be one system, an app if you will.
One reason for the frequent disappointment with enterprise search is this desire to buy an iPad app, not engineer a constellation of systems that solve quite specific problems.
Stephen E Arnold, November 11, 2014
Google and Search Relevance: Will a Robot Do This Work?
November 11, 2014
I flicked through Drudge Report this morning (November 11, 2014). One story and graphic caught my eye.
Here’s a snap of the animation for “Dawn of the Google Machines.”
Source: Drudge.com
Now this is a pretty friendly robot. I think most children under the age of five would see this device as a variant of a bunny, a deer, or a puppy.
The technology is impressive. My question, “Will the resources flowing into this friendly chap improve query relevance on Google Web search?”
I am confident this cuddly creature will make search really, really better. Perhaps Google can provide this fuzzy creature to pre-schools and kindergarten to explain why Google search is just so darned relevant.
Whir, beep, click.
Stephen E Arnold, November 11, 2014
Take Time Choosing an eDiscovery Solution
November 11, 2014
There is no escaping it: eDiscovery requirements are having a huge impact on today’s law practices. Reporter Shane Schick at Canadian Lawyer tells us why firms must not take the issue lightly in “Chasing Data: Legal Report: E-Discovery.” Though vendors might promise the auto-delivery of everything one needs for any case “at the push of a button,” the reality is much, much more complicated. In fact, the management of eDiscovery is literally a full-time position at many firms and, where it isn’t, it probably should be.
Schick writes:
“It’s probably best if law firms recognize that developing an e-discovery strategy and getting the right products to execute it is going to take some time. [Forensic-services lawyer Peter] Vakof estimates that in some cases, acquiring the tools through standard procurement can take up to 18 months. [eDiscovery pro Susan] Wortzman suggests making it easier by doing all the information gathering upfront to make the right purchasing decision. This includes a thorough look at what kind of cases crop up that typically require e-discovery, the volume of data involved, and which clients are good at self-collecting data versus those who need help with the forensics. [Secure-applications expert Chris] Grossman agrees — even if firms decide to outsource, it’s better to ‘level it out’ by having a vendor on retainer, rather than spend more during a peak period when several e-discovery cases crop up at once.”
See the piece for discussion of the complexities involved in eDiscovery, as well as a helpful list of questions to consider before choosing a solution. Schick notes that the intricacies around eDiscovery will likely affect the qualifications firms look for in employees. Will no prestigious field remain a safe haven for the tech-avoidant?
Cynthia Murrell, November 11, 2014
eDigital Research and Lexalytics Team Up on Real Time Text Analytics
November 11, 2014
Through the News section of their website, eDigitalResearch announces a new partnership in, “eDigitalResearch Partner with Lexalytics on Real-Time Text Analytics Solution.” The two companies are integrating Lexalytics’ Salience analysis engine into eDigital’s HUB analysis and reporting interface. The write-up tells us:
“By utilising and integrating Lexalytics Salience text analysis engine into eDigitalResearch’s own HUB system, the partnership will provide clients with a real-time, secure solution for understanding what customers are saying across the globe. Able to analyse comments from survey responses to social media – in fact any form of free text – eDigitalResearch’s HUB Text Analytics will provide the power and platform to really delve deep into customer comments, monitor what is being said and alert brands and businesses of any emerging trends to help stay ahead of the competition.”
Based in Hampshire, U.K., eDigitalResearch likes to work closely with their clients to produce the best solution for each. The company began in 1999 with the launch of the eMysteryShopper, a novel concept at the time. As of this writing, eDigitalResearch is looking to hire a developer and senior developer (in case anyone here is interested).
Founded in 2003, Lexalytics is proud to have brought the first sentiment analysis engine to market. Designed to integrate with third-party applications, their text analysis software is chugging along in the background at many data-related companies. Lexalytics is headquartered in Amherst, Massachusetts.
Cynthia Murrell, November 11, 2014
Microsoft Points SharePoint Users to Yammer
November 11, 2014
SharePoint is a longstanding leader in enterprise search, but it continues to morph and shift in response to the latest technology and emerging needs. As the move toward social becomes more important, Microsoft is dropping outdated features and shifting its focus toward social components. Read more in the GCN article, “Microsoft Pushes Yammer as it Trims SharePoint Features.”
The article begins:
“Microsoft quietly retired some features from SharePoint Online while it enhanced mobile apps, email integration and collaboration tools of Yammer, the company’s cloud-based enterprise social networking platform. Microsoft MVP and SharePoint expert Vlad Catrinescu posted that the company was removing the Tasks menu option, and the Sync to Outlook button will also be removed. Additionally, SharePoint Online Notes and Tags were deprecated last month.”
Stephen E. Arnold is a longtime leader in search. He keeps a close eye on SharePoint, reporting his findings on ArnoldIT.com. The article hints that Microsoft is leaning toward moving to Yammer all the way, meaning that additional features are likely to be retired and collapsed into the new infrastructure. To keep up with all the changes, including the latest tips and tricks, stay tuned to Arnold’s specific SharePoint feed.
Emily Rae Aldridge, November 11, 2014
Disney and Search
November 10, 2014
I won’t bore you with the Disney InfoSeek adventure. Sigh. If you want to know how Disney is approaching Web search, read “Disney Fights Piracy With New Search Patent.” The system and method is intended to filter out content not licensed by means known to Disney. The write up’s headline suggests that a system and method in the form of a patent will “fight piracy.” Interesting notion, but I think the idea is that Disney has built or will build a system that shows only “official” content.
The notion of building a specialist Web site is an interesting idea. The reality may be that traffic will be very hard to come by. The most recent evidence is Axel Springer’s capitulation to the Google. Axel Springer owns a chunk of Qwant. Again a good idea, but it does not deliver traffic.
If you build a search engine, who will use it? Answer: Not too many people if the data available to me are correct.
Stephen E Arnold, November 10, 2014