An Early Computer-Assisted Concordance

November 17, 2015

An interesting post at Mashable, “1955: The Univac Bible,” takes us back in time to examine an innovative indexing project. Writer Chris Wild tells us about the preacher who realized that these newfangled “computers” might be able to help with a classically tedious and time-consuming task: compiling a book’s concordance, an alphabetical list of key words, their locations in the text, and the context in which each is used. Specifically, Rev. John Ellison and his team wanted to create the concordance for the recently completed Revised Standard Version of the Bible (also newfangled). Wild tells us how it was done:

“Five women spent five months transcribing the Bible’s approximately 800,000 words into binary code on magnetic tape. A second set of tapes was produced separately to weed out typing mistakes. It took Univac five hours to compare the two sets and ensure the accuracy of the transcription. The computer then spat out a list of all words, then a narrower list of key words. The biggest challenge was how to teach Univac to gather the right amount of context with each word. Bosgang spent 13 weeks composing the 1,800 instructions necessary to make it work. Once that was done, the concordance was alphabetized, and converted from binary code to readable type, producing a final 2,000-page book. All told, the computer shaved an estimated 23 years off the whole process.”

The article is worth checking out, both for more details on the project and for the historic photos. How much time would that job take now? It is good to remind ourselves that tagging and indexing data has only recently become a task that can be taken for granted.
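
For a rough sense of the answer, here is a minimal Python sketch of the two steps Wild describes: comparing two independent transcriptions to catch typing mistakes, then listing each key word alphabetically with its location and a little context. This is an illustration, not a reconstruction of the 1955 program. The stop-word list, the function names, the `window` parameter, and the sample verses are all assumptions made for the example.

```python
import re
from collections import defaultdict
from itertools import zip_longest

# Hypothetical stop list. The post does not say how the 1955 team
# decided which words counted as "key words".
STOP_WORDS = {"a", "and", "in", "of", "that", "the", "to", "was"}


def tokenize(text):
    """Return the lowercase word tokens of a text."""
    return re.findall(r"[a-z']+", text.lower())


def find_mismatches(first_pass, second_pass):
    """Compare two independent transcriptions word by word.

    Returns (position, word_in_first, word_in_second) for every
    position where the two transcriptions disagree.
    """
    pairs = zip_longest(tokenize(first_pass), tokenize(second_pass))
    return [(i, x, y) for i, (x, y) in enumerate(pairs) if x != y]


def build_concordance(verses, window=3):
    """Map each key word to a list of (reference, context) pairs.

    verses: dict of reference -> verse text.
    window: number of words of context kept on each side of the key word.
    """
    concordance = defaultdict(list)
    for reference, text in verses.items():
        words = tokenize(text)
        for i, word in enumerate(words):
            if word in STOP_WORDS:
                continue
            context = " ".join(words[max(0, i - window): i + window + 1])
            concordance[word].append((reference, context))
    # Alphabetize the key words, as a printed concordance would.
    return dict(sorted(concordance.items()))


if __name__ == "__main__":
    # Sample text for illustration only.
    sample = {
        "Gen 1:1": "In the beginning God created the heavens and the earth.",
        "Gen 1:3": "And God said, Let there be light; and there was light.",
    }
    for key_word, entries in build_concordance(sample, window=2).items():
        for reference, context in entries:
            print(f"{key_word:<10} {reference:<8} {context}")
```

The fixed-width context window is the simplest possible answer to what the quote calls the biggest challenge, gathering the right amount of context. A real concordance would likely need smarter rules for where the context starts and stops.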

Cynthia Murrell, November 17, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

No Microfiche Required

November 16, 2015

Longstanding publications are breathing new life into their archives by re-publishing key stories online, we learn from NiemanLab’s article, “Esquire Has a Cold: How the Magazine is Mining its Archives with the Launch of Esquire Classics.” Esquire has been posting older articles on its Esquire Classics website, timed to coincide with related current events. For example, on the anniversary of Martin Luther King Jr.’s death last April, the site republished a 1968 article about his assassination.

Other venerable publications are similarly tapping into their archives. Writer Joseph Lichterman notes:

“Esquire, of course, isn’t the only legacy publication that’s taking advantage of archival material once accessible only via bound volumes or microfiche. Earlier this month, the Associated Press republished its original coverage of Abraham Lincoln’s assassination 150 years ago…. Gawker Media’s Deadspin has The Stacks, which republishes classic sports journalism originally published elsewhere. For its 125th anniversary last year, The Wall Street Journal published more than 300 archival articles. The New York Times runs a Twitter account, NYT Archives, that resurfaces archival content from the Times. It also runs First Glimpses, a series that examines the first time famous people or concepts appeared in the paper.”

This is one way to adapt to the altered reality of publication. Perhaps with more innovative thinking, the institutions that have kept us informed for decades (or centuries) will survive to deliver news to our great-grandchildren. But will it be beamed directly into their brains? That is another subject entirely.

 

Cynthia Murrell, November 16, 2015


The Guardian Recycles Binney and Seems to Omit Reddit Link to Original Content

November 12, 2015

I am not a subscriber to the Guardian. Perhaps the online article I viewed a moment ago is spoofed in some way. Anyway, navigate to “NSA Whistleblower Reveals Details of American Spying during Reddit AMA Session.” You can read a recycling of the Reddit Ask Me Anything. The link to the source on Reddit is here. Information has a way of disappearing, so if the link to the AMA is a goner, there’s not much I can do.

The Guardian does take the time to provide links to its articles and to USA Today, an outstanding publication. Heck, yes, that “real” journalism stuff is just better than the original source.

Quick question: I find it interesting that real journalists are aggressively recycling social media content. Why not include an explicit link? Oh, I know. Pride, haste, a misplaced sense of providing “real” information. Pick one.

Stephen E Arnold, November 12, 2015

Ravel, Harvard, and Indigestion for Lexis and Westlaw

October 31, 2015

If you are a lucky online maven with free Lexis and Westlaw access, you do not want to waste your time reading “Harvard Law School Launches ‘Free the Law’ Project with Ravel Law To Digitize US Case Law, Provide Free Access.”

But if you pay hard cash to run queries on certain court documents, you may want to pay attention to the Ravel-Harvard plan to provide access to US case law.

Ravel wants to catch the attention of the big guns at Reed Elsevier and Thomson Reuters. I assume the executives at these companies are on top of the Ravel plan to unravel their money machines.

According to the Harvard write up:

Harvard Law School’s collection comprises 40,000 books containing approximately forty million pages of court decisions, including original materials from cases that predate the U.S. Constitution. It is the most comprehensive and authoritative database of American law and cases available anywhere except for the Library of Congress, containing binding judicial decisions from the federal government and each of the fifty states, from the founding of each respective jurisdiction. The Harvard Law School Library—the largest academic law library in the world—has been collecting these decisions over the past two hundred years.

Where there is legal information and the two leading for-fee online legal services, my hunch is that there will be some legal eagles taking flight.

According to Techdirt:

Harvard “owns” the resulting data (assuming what’s ownable), and while there are some initial restrictions that Ravel can put on the corpus of data, that goes away entirely after eight years, and can end earlier if Ravel “does not meet its obligations.” Beyond that, Harvard is making everything available to non-profits and researchers anyway. Ravel is apparently looking to make some money by providing advanced tools for sifting through the database, even if the content itself will be freely available.

What will the professional publishing outfits do to preserve their market? I can think of several actions. Sure, litigation is one route. But taking Harvard to court might generate some bad vibes. Perhaps Reed Elsevier and Thomson Reuters will finally bite the bullet, merge, and then buy out Ravel? We have Walgreen Boots, why not LexisWestlaw? Is that a scary Halloween thought? Let the Department of Justice unravel that deal. Don’t lawyers enjoy that sort of challenge?

Stephen E Arnold, October 31, 2015

Reclaiming Academic Publishing

October 21, 2015

Researchers and writers are at the mercy of academic publishers who control the venues that print their work, select the content of their work, and often control the funds behind their research. Even worse, academic research is locked behind database walls that require a subscription well beyond the price range of a researcher not associated with a university or research institute. One researcher was fed up enough with academic publishers that he decided to return the publishing and distribution of work to the common people, says Nature in “Leading Mathematician Launches arXiv ‘Overlay’ Journal.”

The new mathematics journal Discrete Analysis peer reviews and publishes papers free of charge on the preprint server arXiv.  Timothy Gowers started the journal to avoid the commercial pressures that often distort scientific literature.

“ ‘Part of the motivation for starting the journal is, of course, to challenge existing models of academic publishing and to contribute in a small way to creating an alternative and much cheaper system,’ he explained in a 10 September blog post announcing the journal. ‘If you trust authors to do their own typesetting and copy-editing to a satisfactory standard, with the help of suggestions from referees, then the cost of running a mathematics journal can be at least two orders of magnitude lower than the cost incurred by traditional publishers.’ ”

Some funds are required to keep Discrete Analysis running. Costs are ten dollars per submitted paper, which pays for the software that manages peer review and the journal Web site, and arXiv requires an additional ten dollars a month to keep running.

Gowers hopes to extend the journal model to other scientific fields and he believes it will work, especially for fields that only require text.  The biggest problem is persuading other academics to adopt the model, but things move slowly in academia so it will probably be years before it becomes widespread.

Whitney Grace, October 21, 2015

Traffic Pattern Information: News on the Hot Seat

October 15, 2015

I read the write up “Platforms for Everyone, Publications for No One.” The article contains some interesting information. Let me pull out a few points I highlighted in accounting department red:

  • News site traffic is “tanking”
  • Charts included in the write up show a decline in traffic for Thought Catalog and Elite Daily
  • Facebook sharing is declining
  • Digital big dogs are juggling their offerings to deal with “engagement”
  • Apps are fluid with outfits like Facebook and Twitter trying to respond to what seems to be an opportunity
  • Advertising based content may be the future of information.

What’s the impact on companies which specialize in developing and selling products based on the old school print approach to information? My hunch is that life is going to focus more narrowly on generating revenue. Which of the over extended vendors will find a solution before falling off “the platform’s edge?”

Net net: Apple, Facebook, and Google may be the 21st century version of traditional news organizations. The fact that news organizations are just now grasping the shift is fascinating. I wonder if some of these outfits can emulate Buzzfeed-type content, embrace draft brewing, or jump into the restorod business? Another interesting question, “Will Twitter make the transition?”

Stephen E Arnold, October 15, 2015

Computational Journalist? Stanford Has Your Ticket to Ride to Fame

October 6, 2015

Regular journalism going nowhere? After losing a job at a nifty PR agency, do you want to get back into the “real journalism” environment? Former newspaper person tired of writing baloney for enterprise and Big Data outfits?

Navigate to “Deep and Interesting Datasets for Computational Journalists: A Quick List.” Stanford University, birthplace of the Alphabet Google thing, has just what you need to ignite your career. Many interesting links; for example, Every thing and person paid for by Congressional office funds.

Now I did some work for a congress person, so I am not sure about the “every thing.” But, hey, it’s marketing even in academia.

Stephen E Arnold, October 6, 2015

Yandex TweetedTimes Is Back for Now

October 6, 2015

I noted about a week ago that Tweeted Times, now part of the Yandex operation, was dark. My magic pinger alerted me that the service is back up again as of October 5, 2015. I look forward to more tweets collected under such headings as Law Experts (aren’t all attorneys experts?) and Matt Cutts (yep, the Google “SEO is neither good nor bad” specialist). Enjoy http://tweetedtimes.com.

Stephen E Arnold, October 6, 2015

Big Data: Systems of Insight

October 6, 2015

I read “All Your Big Data Will Mean Nothing without Systems of Insight.” The title reminded me of the verbiage generated by mid tier consulting firms and adjuncts teaching MBA courses at some institutions of higher learning. Malarkey, parental advice, and Big Data: a Paula Deen-type recipe for low-calorie intellectual fare.


Can one live on the outputs of mid tier consulting firm lingo prepared to be fudgier?

The notion of a system of insight is not particularly interesting. The rhetorical trick of moving from a particular to a more general concept fools some beginning debaters. For a more experienced debater, the key is to keep the eye on the ball, which, in this case, is the tenuous connection between Big Data and strategic management methods. (I am not sure these exist even after reading every one of Peter Drucker’s books.)

But I like to deal with particulars.

Computerworld is a sister or first cousin unit of the IDC outfit which sold my research on Amazon without asking my permission. My valiant legal eagle was able to disappear the report. I was concerned with the connection of my name and the names of two of my researchers with the IDC outfit. I have presented some of the back story in previous blog posts. I included screenshots along with the details of not issuing a contract, using content in ways to which I would never agree, and engaging in letters with my attorney offering inducements to drop the matter. Wow. A big company is unable to get organized and then pays its law firm to find a solution to the self created problem.

The report in question was a limp wristed, eight pages in length and available to Amazon’s eager readers of romance novels for a mere $3,500. Hey, the good stuff in our research was chopped out, leaving a GrapeNut flakes experience for those able to read the document. I am a lousy writer, but I try to get my points across in a colorful way. Cereal bowl writing is not for me.

What does this have to do with Big Data and a system of insights?

Aren’t Amazon’s sales data big? Isn’t it possible to look at what sells on Amazon by scanning the company’s public information about books? Won’t a casual Google search reveal information about Amazon’s best selling eBooks? Best sellers’ lists rarely feature eight pages of watered down analysis of a search vendor with some soul bonding with the outstanding Fast Search & Transfer operation. How many folks visiting the digital WalMart buy $3,500 reports with my name on them?

Er, zero. So what’s the disconnect between basic data about what sells on Amazon, issuing appropriate contractual documents, and selling research with my name and the names of two of my goslings on the $3,500, eight page document? That’s brilliant data analysis for sure.

The write up explains:

Businesses want to use data to understand customers, but they can’t do that without harnessing insights and consistently turning data into effective action.

That sort of makes sense except that the company which owns Computerworld, under the keen-eyed Dave Schubmehl, appeared to ignore this step when trying to sell a report with my name on it to the Amazon faithful. Do the folks at Computerworld and the company’s various knowledge properties connect data with their colleagues’ decisions?


The Nikkei Financial Times Deal: Journalism and Quality

September 29, 2015

I read a Harry Quebert type article this morning called, with typical big time journalism understatement, “The Financial Times and the Future of Journalism.”

Yep, the future. Of journalism.

The set up is an interview which has been converted to a chatty, informed narrative with commentary from the person asking the questions (a New Yorker “contributor”, which I think means contract worker) and statements from a full-time equivalent at the Financial Times, a salmon colored newspaper consumed by 750,000 quality-centric readers.

The quotes in this blog post come from the CEO of the Financial Times, which was sold to Nikkei, a Japanese outfit, for about 40 times the FT’s 2014 operating income. So $37 million netted Pearson, the former owner of the FT, about $1.4 billion. Like the HP purchase of Autonomy, I will be interested to see how the purchase plays out. Obviously Pearson was neither willing nor able to put the FT on a pedestal of cash. The former owner of Madame Tussaud’s wax museum sold the newspaper. Let Nikkei realize the long term benefits of FT ownership, I assume.

The write up by the New Yorker magazine, which has pretty good cartoons, is a darned interesting journalistic artifact itself. But I want to focus on some of the statements in the write up, allegedly made by the FT CEO who played a big part in the deal with the Japanese buyer.

I noted this statement:

Nikkei wanted to prepare for the transition to digital, which has been slow in its home market.

My recollection may be fuzzy, but I thought that the Japanese were exploring the digital world, databases, and all sorts of software based activities in Japan’s Fifth Generation Project in the 1980s and early 1990s. Hey, that was like yesterday in traditional print publishing.

The FT executive allegedly said to the reporter:

I think if you were to summarize the vision that we both share, it would be about growth. We both think there is a very good growth opportunity for the F.T. That requires a long-term perspective. It requires investment. They have committed to that. And for a news organization like the F.T. right now, that’s music to one’s ears, frankly.

I like the long term growth perspective. Apparently Pearson was not on board with this concept about investment without significant payoff. As a result, Pearson shopped the FT and netted a ten figure payout.

Why did Pearson opt to sell and not pump cash into the FT? Here’s the explanation from the FT executive:

What is lacking is some fuel in the tank and the ability to spread our wings a bit.

Pearson apparently lacked “fuel”. I wonder if the “fuel” is patience, money, financial resources, or wisdom. The billion dollar deal looked pretty snappy to me. Imagine. More than one billion for a newspaper. That’s the color of money.

Apparently the FT boss perceives those from the Far East—that is, beyond Dover—as adopting Adam Carolla’s “In 50 Years We’ll All Be Chicks” approach. None of this City and Wall Street aggressiveness toward revenues and profits. Here’s a passage I highlighted in green, the color of money:

But Japanese newspapers, including the ones owned by Nikkei, are also known for taking a less aggressive stance toward news than many of their Western counterparts.

Will the FT remain independent as other newspapers and real journalistic endeavors do the inclusion, sponsored content, advertising thing? According to the FT executive:

“Editorial independence is absolutely fundamental to the way we operate,” he replied.

He allegedly added:

But I think the most important thing is they understand our values and editorial independence. I’m not going to tell them how to run Nikkei, and they are not going to tell us how to do editorial independence at the F.T. They are very clear about that.

I like that understanding. The owner will not assert control over something the person owns. Shared values among the quality journalists effervesce from this factoid.

The most important passage concerns the FT business model. Here’s the explanation of the FT’s “vindicated”, super-charged approach to generating at some time in the future oodles of dough from a global market of discerning news consumers:

The replacement was cheap trial subscriptions. If you go to the F.T.’s Web site today and try to read a story, you will be prompted to take out a month-long subscription for a dollar….“We are now able to measure, optimize, and track all of these readers and changes with real insight that we could never do before. It sounds dry, but it’s not. It’s really understanding readers, what matters to them. We are never going to edit by numbers, but we are going to inform all of our decisions around data.”

Not only that, the FT is going back to the model for the newspapers which have become pedestal mounted historical artifacts. Newspapers are back as the “trusted” folks in the information business. I know that I trusted newspapers until I read about US diplomacy and yellow journalism in the period from 1895 to 1898.

“Precisely because of digital disruption, precisely because there is so much information and news and information out there, the value of a trusted guide, the value of a trusted brand” has gone up, he said.

Yep, those families losing sons in the dust up among the US, Spain, and Cuba understand that trust stuff.

Then there is a statement which seems to bring the future payday for the new owner of the FT tantalizingly near:

But we fundamentally believed that if it’s quality journalism, people will pay for it. That’s been vindicated.

From my point of view, what’s been vindicated is that there was a buyer willing to pony up more than $1 billion for a brand, several hundred thousand readers, and a Web site offering a $1.00 trial subscription. I assume that is the definition of vindication from Pearson’s point of view. I am not sure about Nikkei’s point of view. If the FT’s senior management had an agreement with Pearson designed to keep the FT’s senior management on board, was some of the money shared with the FT leadership? Good question.

I also highlighted in red-ink red, not money green, this statement attributed to the FT executive:

But he also insisted that things are changing in journalism and that the business climate is improving. “There is a belief in journalism,” he said.

Stepping back I thought about the New Yorker’s analysis of the FT deal. Much of the verbiage could be used to describe how the New Yorker feels about its approach to news and information.

I asked myself, “Is this article about the FT or is it about the New Yorker’s perception of quality journalism?” Another good question.

And what about search? Does anyone recall the Endeca FT Newssift project? I do. Moving on.

Stephen E Arnold, September 29, 2015
