Taxodiary: At Last a Taxonomy News Service

August 3, 2010

I have tried to write about taxonomies, ontologies, and controlled term lists. I will be the first to admit that my approach has been to comment on the faux pundits, the so-called experts, and the azurini (self-appointed experts in metatagging and indexing). The problem with the existing content flowing through the datasphere is that it is uninformed.

What makes commentary about tagging informed? Three attributes. First, I expect those who write about taxonomies to have built commercially successful systems to manage term lists, and I expect those term lists to be in wide use and to conform to standards from ISO, ANSI, and similar outfits. Second, I expect those running the company to have broad experience in tagging for serious subjects, not the baloney that smacks of search engine optimization and snookering humans and algorithms with their alleged cleverness. Third, I expect the systems used to build taxonomies, manage classification schemes, and maintain term lists to work; that is, a user can figure out how to get information out of a system relevant to his or her query.


Splash page for the Taxodiary news and information service.

How rare are these attributes?

Darned rare. When I worked on ABI/INFORM, Business Dateline, and the other database products, I relied on two people to guide my team and me. The first person was Betty Eddison, one of the leaders in indexing. May she rest in indexing heaven where SEO is confined to Hell. Betty was one of the founders of InMagic, a company on whose board I served for several years. Top notch. Care to argue? Get ready for a rumble, gentle reader.

The second person was Margie Hlava. Now Ms. Hlava, like Ms. Eddison, is one of the top guns in indexing. In fact, I would assert that on my yardstick she is either near the top of this discipline or holds the top spot. Please keep in mind that her company Access Innovations and her partner Dr. Jay ven Eman are included in my reference to Ms. Hlava. How good is Ms. Hlava? Very good saith the goose.


Pundit Ignores Information Retrieval Reality

August 1, 2010

Short honk: I don’t have the energy to deal with “Cookie Madness”, an essay that appeared in the Buzz Machine. Maybe academics are afflicted with “a certain blindness,” to use William James’s brilliant phrase? Maybe academics forget that most of the people using computers don’t know that their online activities can be tracked, including hover time, mouse movement, and cursor movement patterns?

More important is the penchant of publishers and reporters for embracing the roots of American journalism. The catch phrase for this approach to information was summed up at the Courier-Journal’s WHAS television unit as “If it bleeds, it leads.” Why? Money. Simple. Fear, controversy, and explosive allegations are the chemicals that feed the modern Venus Fly Trap of journalism. Nothing is more effective than creating an issue and then huffing with indignation about that issue. Quite an information ecosystem, right?

The Wall Street Journal is owned by a modern media mogul, presumably an owner of properties employing journalism school graduates, new media specialists, and even PhDs in social collaboration (whatever that means). When these rosy-cheeked warriors arrive, those titanium-tipped diggers will ferret out what sells.

The Wall Street Journal is focusing on fear and breathless explanations of how a computer system can track a user’s every online action. Hey, as long as it generates sales and gets the pundits’ panties in a bind, the Wall Street Journal’s story about tracking is doing its job. At least the journalists working on the story have jobs, for a while anyway.

Sigh. Next Hyde Park moment coming up. Film at 11. Now this word.

Stephen E Arnold, August 1, 2010

Freebie

When Domains Collide: The Apple-Time Bonk

July 30, 2010

My NFAIS lecture will appear in a forthcoming book of essays. I won’t drag you through the argument in that write-up. I do want to call your attention to “Apple Blocks Time, Others from Running iPad Subscriptions.” The issue is that Apple is taking a somewhat arbitrary stand with regard to a publisher using the Apple ecosystem to sell an Apple device user a subscription without Apple being in the middle. For me the key passage in the write-up was:

Why Apple would reject subscriptions isn’t known, but it’s speculated that the company may be worried about how publishers would use the consumer data collected with each subscription, even though such collection is standard in the print world. Apple might alternately be worried about missed revenue opportunities, since allowing direct payment for subscriptions would cut the company out of some or a lot of income. The latter approach would be incongruous though, since Amazon and the Wall Street Journal can already bill customers directly in some cases.

The view from the goose pond is easy to explain. Publishers want control but no longer own the distribution channel. Apple owns the distribution channel, including the bank, the customer, and the information pipeline. Apple wants control and will take it until someone or something prevents Apple from doing what it wants. Publishers, on the other hand, think they are in control.

When domains collide, the question of who is on top guarantees friction between those doing the crushing and those being crushed. In this battle, the publishers, like the music folks, are learning that the outfit with power pretty much calls the shots.

Solution: find a way to regain control. This might be tough because a core competency in distributing information in a 19th century world does not mean much in the data-centric world of 2010. Apple, which looked like a dead-on-arrival company two or three times in the last 20 years, found a way to survive. Now publishers have to find their solution.

Complaining and trying to be clever in order to “work around” the Apple guidelines won’t work. Anyway, it may be too late.

Stephen E Arnold, July 30, 2010

Freebie, unlike a single copy of Time

Google Books Israel Edition

July 29, 2010

Nobody ever said the next frontier of literature would be smooth, but it is a realm that will be conquered nonetheless. Google is learning all about the highs and lows of digital books these days. A recent Globes article, “Google Books Reaches Israel,” [NOTE: Link may be dead when you read this Beyond Search post] highlighted the search giant’s new foray into Hebrew texts: a deal to allow full or partial downloads of many books published in Israel. This victory is offset by the legal hot water Google Books finds itself in back in the United States. There, Google reached a settlement of a suit that “claimed that Google’s scanning of texts was copyright infringement,” so that it can release many more titles. While Google is making forward progress, its boat seems to be taking on water. But the Google is persistent.

Ken Toth, July 28, 2010

Freebie

Content Risks and Rewards

July 28, 2010

My field is open source intelligence. I can’t reveal my sources, but I have heard that an intelligence unit can duplicate anywhere from 80 to 90 percent of its classified information from open sources. The trick, of course, is to know what is important. Most people can look at an open source document, dismiss it, and go about their day unaware of the key item of information that was right in front of them.

For that reason, this blog and my other blogs are open source. I use my Overflight system to suck in publicly accessible content. I look at what the system spits out and I highlight the important stuff. The magic in the system is neither the software nor the writers whom I pay to create most of the content in Beyond Search and my other writings; it is knowing what is important. I am sufficiently confident in my method that when I talk with a so-called expert or an executive from a company, I am skeptical about what that person asserts. In most cases, experts lack the ability to put their information in context. Without context, even good information is useless.

When I read about Wikileaks publishing allegedly classified information, I wondered about the approach. Point your browser at “Next Step for Wikileaks: Crowdsourcing Classified Data” and learn what is ahead for information dissemination. The idea is that lots of people will contribute secrets.

Baloney.

The more stuff that is described as secret and sensitive, the more difficult it will be to figure out what is on the money and what is not. I have some nifty software, but I know from my tests that when information is weaponized, neither humans nor software can pinpoint where the train went off the tracks.

In my view, folks publishing allegedly classified information are in for some rough sledding. Furthermore, the more baloney that gets pumped into the system, the greater the likelihood of disinformation.

If these documents had become known to me, I would have kept the puppies to myself. I would have used my Overflight system to verify points that my method identified as important. I would not accept any assertion, fact, or argument as valid until some more work was done.

Wikileaks is now famous, and sometimes fame can be tough. Just ask John Belushi if you can find him. People ask me why I don’t provide some color for some of my remarks. Well, that is because some information is not appropriate for a free blog. This is a lesson that I think some folks are going to learn in the School of Hard Knocks.

Stephen E Arnold, July 29, 2010

Freebie and open source

JustSystems Expresses Its Love for XML

July 26, 2010

JustSystems, now a unit of Keyence Corporation, posted “Beyond PDFs – Reach Your Audience with Multiple Output Formats” to pump up excitement for XML. The goslings and I love XML. We even have one or two clients who think XML is the way to handle content, not a “bohica”. The hitch seems to be getting legacy content into well formed XML without plunging the information technology department’s budget into the red inkwell.

According to one of my correspondents, the main point is:

XML, and particularly the Darwin Information Typing Architecture (DITA) XML language, enables you to optimize content for different media. Because an XML-based system handles formatting and content separately, it lets you create one set of source files and then generate PDF files for print and HTML files for the web. Background: PDF is a print-oriented format, and what works for print often doesn’t work for the web, for mobile devices, or for other electronic media. PDF is not the answer to every content delivery question.

The article has a number of useful links, including the pointer to DITA on Wikipedia, which I can never remember. The rundown of output features may be useful if you don’t think in terms of information objects assembled into “documents.”
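The single-source idea in the quoted passage (content kept apart from formatting, one set of source files rendered to several outputs) can be sketched in a few lines. This is a minimal illustration only, assuming a simplified, hypothetical topic structure loosely modeled on a DITA topic (`topic`, `title`, `body`, `p`); it is not real DITA processing, which is normally handled by a toolchain such as the DITA Open Toolkit. The functions `parse_topic`, `render_html`, and `render_plain` are hypothetical names, and the plain-text renderer merely stands in for a print/PDF pipeline.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical source topic, loosely modeled on a DITA <topic>.
# The element names are a simplification, not the full DITA schema.
SOURCE = """<topic id="pdf-limits">
  <title>Beyond PDF</title>
  <body>
    <p>PDF is a print-oriented format.</p>
    <p>One source can feed many outputs.</p>
  </body>
</topic>"""


def parse_topic(xml_text):
    """Pull the title and paragraphs out of the source; no formatting lives here."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="").strip()
    paras = [p.text.strip() for p in root.iter("p") if p.text]
    return title, paras


def render_html(title, paras):
    """Web output: wrap the same content in HTML tags, escaping the text."""
    parts = [f"<h1>{html.escape(title)}</h1>"]
    parts += [f"<p>{html.escape(p)}</p>" for p in paras]
    return "\n".join(parts)


def render_plain(title, paras):
    """Print-style stand-in: an underlined heading and plain paragraphs.
    A real pipeline would hand this step to a PDF formatter instead."""
    lines = [title, "=" * len(title), ""]
    for p in paras:
        lines.append(p)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    title, paras = parse_topic(SOURCE)
    print(render_html(title, paras))
    print(render_plain(title, paras))
```

Both outputs come from the same parsed content; only the renderers differ. Adding a mobile or e-book format would mean adding another renderer, not rewriting the source.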

Worth a look.

Stephen E Arnold, July 26, 2010

Freebie

CMS Vendors Face Old Age, Maybe Need HGH?

July 20, 2010

Content management systems and CMS consultants are an interesting mix. On the lower digit end of the CMS spectrum are the lightweight content management systems. Four years ago, the capabilities of even Google’s vaunted Blogger.com, which seems frozen in time to me, were like Lance Armstrong’s 2010 Tour de France.

At the other end of the spectrum, where the big numbers are, sat the industrial-strength records management systems. The addled goose honks about IBM, but when properly configured, IBM’s FileNet can perform some nifty CMS tricks.

So the CMS spectrum ran from the citizen journalism functions to the mad scientist mode. The consultants followed suit. I don’t recall getting spam from IBM about FileNet. Sure, IBM, like any $100 billion outfit, has its weak moments, but shoving FileNet at the addled goose has never happened. Probably won’t ever happen, opine I.

The reason is that when you move to the double-digit end of the CMS spectrum, you enter a world where a document error can shut down a nuclear power plant after a US government inspection or send a really friendly CEO to spend time with prisoners in the “yard.” The vast majority of CMS consultants trample around in the lightweight end of the CMS market.

The problem is that the lightweight systems are now looking more sophisticated, and some venture firms and corporations are taking a hard look at these former wimps.

Don’t believe me? Navigate to “Squarespace Gets $38M to Compete With WordPress and Six Apart”. The write-up calls attention to three outfits with content management systems that can do interesting things and seem to be growing as my son did when he was in the third grade. Every day he needed a new pair of sneakers with the French chicken on them. That’s Le Coq Sportif, for those who are not into suburban Maryland fashions. I noted this passage in the write-up:

The size of the investment that Squarespace has managed to attract from Accel and Index indicates that these investors see the potential to take the company’s software and services beyond simple blogging and into the broader world of content-management systems. Although some media companies have been experimenting with open-source software such as Drupal and Joomla for web publishing, both of these are fairly complex to manage, and a hosted solution could appeal to publishers such as the Telegraph Group, which is already using a number of cloud-based services.

Squarespace is quite interesting. The company makes it dead simple to create a blog, a photo gallery, even a complete Web site. The user can drag and drop. Sure, Squarespace allows coders to fiddle, but the company seems to draw the line at some potentially interesting live database action from its pages. Aside from that prudent step, Squarespace is a CMS for the person or company frustrated with a traditional CMS.

Is the Squarespace system right for managing nuclear power plant records? Maybe, but I wouldn’t use the system for that purpose. Nor would I rely on Squarespace for information likely to be probed for effective safeguards against spoliation. For other work, Squarespace looks mighty tasty as it is.

What will happen with $38 million? Traditional content management vendors may want to pay some attention to the fun-loving folks at this outfit. Also, the CMS consultants may find themselves having to work much harder to get those high-paying, wild and crazy CMS product reviews. Squarespace makes it dead simple to play with the system any time, for free, for a couple of weeks.

Times are a’changin’ in CMS and CMS consulting, I conclude.

Stephen E Arnold, July 20, 2010

Freebie

Former AOL Top Dog Marks the Territory

July 20, 2010

One week from now it will be a year since AOL changed its CEO, giving the new one 100 days to turn things around and restore the tech industry leader to its rightful place at the top. One of the really interesting developments is that during the last 12 months almost every executive left and was replaced with someone who had once worked for Google.

The article in BusinessInsider.com, “The Inside Story: An Anonymous Ex-AOL Exec Tells All,” does just what its title promises. The company was demoralized and its employees down and out before a Google influence injected new life into the beleaguered firm. It’s an interesting read, and one that raises the question of whether the Googlization of America is something that will work or just one of those things we might have to get used to.

Rob Starr, July 20, 2010

Freebie

Tech Bug Bites Aging Reader’s Digest

July 20, 2010

It’s been happening since the 1990s, when newspapers attempted to go digital, so it’s no surprise that one of the most popular magazines we all grew up with is adopting an app.

Reader’s Digest UK is launching an iPad app in concert with YUDU Media that will even add video to the digital version of the print magazine everyone remembers.

The magazine’s staff is understandably excited. Gill Hudson, the editor in chief, says the iPad app will help the magazine transform and develop. The marketing director for YUDU Media is equally enthusiastic in the article “Reader’s Digest UK Accelerates Digital Transformation with YUDU iPad App.”

No wonder. As the population ages, imagine all the crosswords that will be done electronically in senior centers all across the land. There might be a demographic rift with iPad users here.

Rob Starr, July 20, 2010

Freebie

Wikipedia Looks Ahead To Web 3.0

July 15, 2010

As far as the Wikimedia Foundation is concerned, one of the cornerstones for moving the global resource to the next level and Web 3.0 will be making the data in the site’s 15 million articles decipherable to computers as well as to the humans pushing their buttons.

Last month’s 2010 Semantic Technology conference in San Francisco saw developers showcasing how the needed semantic structure might be added to Wikipedia. It’s a big idea for a big database. Still, there is a question about the real value of the move.

The Wikipedia representatives at the conference were also actively recruiting help to make the site’s underlying data more accessible to both humans and software.

One of the questions is how to determine the benefits when the service is implemented.

Rob Starr, July 15, 2010

Freebie
