Open Source Works

August 14, 2010

Eric Gries recently posted a commentary on the ideals behind open source. And I must say, I can’t agree more.

He begins with a simple question, “Is open source more democratic or meritocratic?” OK, well maybe the question isn’t all that simple? First he compares it to a democracy because programmers campaign and vote on specific pieces of code—much like the way our government is supposed to work.

He then goes on to explain how it is also a merit-based system because participants get rewarded for their individual levels of contribution, not to mention the quality of their work. It ensures that the programmers who get to contribute are not just the idea men, but also those capable of seeing the ideas through.

The bottom line is, open source works and will continue to do so. Mr. Gries and I see eye to eye. Quite a feat for a wizard and a goose.

Stephen E Arnold, August 14, 2010

Freebie

Google Likes Model Aircraft in General

August 14, 2010

“Google Denies Plans to Use Aerial Drones to Gather Data” sets the record straight. The “record” is reports that Google has an interest in micro-drones. Here’s the key passage:

“Google is not testing or using this technology. This was a purchase by a Google executive with an interest in robotics for personal use,” the company said this morning.

The addled goose was surprised. Fun-loving Wirtschaftswoche found some information about the type of gizmo much loved in some military circles. Small, ultra-light, radio-controlled devices can do some amazing things. How amazing? Just watch a 12-year-old play a first-person shooter and you can get a sense of war fighting in the future. The “eyes” and “ears” could be micro drones.

So Google has an executive interested in robotics. No surprise. The goose is disappointed. I have a picture of the Google white board with a bunch of ideas. I will have to look at it to see if the tiny airplane on that picture is a micro drone or a big commercial aircraft. Probably a big commercial aircraft. Micro drones. StreetView. Government projects. In my opinion, silly. Quite silly.

Stephen E Arnold, August 10, 2010

Freebie

Java with NLP?

August 14, 2010

Jeff’s Search Engine Caffe: Java Open Source NLP and Text Mining Tools is a mother lode of Java open-source natural language processing and text mining tools. Jeff is a PhD student at UMass Amherst’s prestigious Center for Intelligent Information Retrieval and maintains a blog that is so well researched it can serve as a reference point. Jeff’s site features a link to an interesting Apache Lucene Mahout project, which is designed to create highly scalable machine learning libraries. Currently, Mahout specializes in recommendation mining, clustering, classification, and item set mining. The Mahout site welcomes contributors and looks to facilitate discussions on the project and realize potential use cases. One of the most popular text classification frameworks is Weka, a collection of machine learning algorithms.

This site contains many useful links to incubator and implemented projects, and is worth a bookmark here in Harrod’s Creek.

Bret Quinn, August 14, 2010

Has Demand Media Been Nipped By Googzilla?

August 13, 2010

PEHub.com ran an interesting story, “What Happened to Demand Media’s Traffic?” The focus of the story is a nifty chart designed to catch the attention of the search engine optimization crowd. After a long run up, Demand Media’s traffic fell. According to the write up, “The massive drop off occurred a few days prior to the [Demand Media] IPO filing.”

The PEHub.com story speculates that Google is planning a content play itself. Hardly news here in the goose pond. We wrote Google: The Digital Gutenberg and made that case last year. But news has little to do with research, so the PEHub.com story with references to Google patents is as fresh as a spring blossom.

Our view is slightly different. What did you expect from the addled goose? Chopped liver? (Oh, that is a faux pas related to foie gras.)

  1. Google is chasing money and really is intent on generating revenues. Demand Media type content produces clicks and the Google is keenly interested in this aspect of certain Web plays; that is, clicks equal money.
  2. The Demand Media content is different from a Jeffrey A. Dean Google technical paper. Ergo: Google has some tricks up its lab jacket to become a player in the content stream for which Demand Media has become known.
  3. The shift makes clear the power of traffic reports.

Exciting for sure. I think the traffic drop is a coincidence. The Google has its mind on Korean bar-b-q.

Stephen E Arnold, August 13, 2010

DC-X Goes with Lucene and Solr

August 13, 2010

Digital Collections’ newest DC-X product is saying “bye-bye!” to Oracle Text and “well, hello there…” to Lucene and Solr. This is Digital Collections’ first venture away from Oracle Text.

In this useful write up, Digital Collections explained that Oracle Text’s limitations were starting to catch up with them. Among the Oracle Text drawbacks are unstable query performance, difficulty scaling, and inadequate support.

Digital Collections searched for alternatives, but were hard pressed to find a better option. Finally they heard about Lucene and Solr through the proverbial grapevine and gave them a try. They were thoroughly impressed and have decided to give it a go with their new DC-X product.

Oracle is still being used to hold much of the data—just not for the search engine. It’s not in production use yet, so it will be interesting to see if any bugs pop up that need fixing. Anyway, add this to your migration reference material.

Stephen E Arnold, August 12, 2010

Oracle Text 11g Bed Time Reading

August 13, 2010

You can get your hands on the Oracle Text 11g reference guide here. There’s plenty of useful information that makes this a valuable addition to your digital reference library.

Most interesting, perhaps, is the section entitled “What’s New in Oracle Text?” It details—you guessed it!—what they’ve added this go around.

One of the most notable additions is the new Oracle Text Manager in Oracle Enterprise Manager. This functional offering allows you to monitor and modify indexes, manage logs, figure out and fix failed operations, and rebuild indexes, among other things. As an added bonus, retrieving a list of words generates entries with the same word appearing multiple times. (There are more sophisticated ways to handle term lists to be sure.)
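For readers wondering what a tidier term list looks like, here is a minimal sketch in Python. It has nothing to do with Oracle Text's own API; the function name `collapse_terms` and the sample words are made up for illustration. The idea is simply to fold repeated words into one entry each, with a count.

```python
from collections import Counter


def collapse_terms(terms):
    """Collapse a term list so each word appears once, paired with its count.

    Hypothetical helper for illustration; not part of Oracle Text.
    """
    # Normalize case and trim whitespace so "Oracle" and "oracle " match.
    normalized = [t.strip().lower() for t in terms if t.strip()]
    counts = Counter(normalized)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample list with the same word appearing several times.
raw_terms = ["search", "Oracle", "search", "index", "oracle", "search"]
for word, count in collapse_terms(raw_terms):
    print(word, count)
```

Whether case folding is appropriate depends on the index; a case-sensitive term list would skip the `lower()` step.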

So do yourself a favor and check out the Oracle Text 11g reference manual. Just don’t read the entire thing and become a consultant. You may be zapped. No, make that SAP’ed if you try to sell your consulting expertise without an Oracle seal of approval.

Stephen E Arnold, August 14, 2010

ZL Systems and TREC

August 13, 2010

I don’t write anything about TREC, the text retrieval conference “managed” by NIST (US Department of Commerce’s National Institute of Standards and Technology). The participants in the “tracks”, as I understand the rules, may not use the data for Madison Avenue-style cartwheels and reality distortion exercises.

The TREC work is focused on what I characterize as “interesting academic exercises.” Over the years, the commercial marketplace has moved in directions that are different from the activities for the TREC “tracks”. A TREC exercise is time consuming and expensive. The results are difficult for tire kickers to figure out. In the last three years, the commercial market has moved in a manner different from academic analyses. You may recall my mentioning that Autonomy had 20,000 customers and that Microsoft SharePoint has tens of millions of licensees. Each license contains search technology and cultivates a fiercely competitive ecosystem to “improve” findability in SharePoint. Google is chugging along without much worry about what’s happening outside of the Googleplex unless it involves Apple, money, and lawyers. In short, research is one thing. Commercial success is quite another.

I was, therefore, interested to see “Study Finds that E-Discovery Using Enterprise-Wide Search Improves Results and Reduces Costs.” The information about this study appeared in the ZL Technologies’ blog The Modern Archivist in June 2010. You can read the story “New Scientific Paper for TREC Conference”, which was online this morning (August 10, 2010). In general, information about TREC is hard to find. Folks who post links to TREC presentations often find that the referenced document is a very short item or no longer available. However, you can download the full “scientific paper” from the TREC Web site.

The point of the ZL write up is summarized in this passage:

Using two fully-independent teams, ZL tested the increased responsiveness of the enterprise-wide approach and the results were striking:  The enterprise-wide search yielded 77 custodians and 302 responsive email messages, while the custodian approach failed to identify 84% of the responsive documents.

The goose translates this to mean that there’s no shortcut when hunting for information. No big surprise to the goose, but probably a downer to those who like attention deficit disorder search systems.
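To put the quoted numbers in perspective, here is a back-of-the-envelope calculation. It assumes, and this is only an assumption because the passage does not say so explicitly, that the 84% is measured against the 302 responsive messages found by the enterprise-wide search.

```python
# Figures quoted in the ZL write up.
enterprise_wide_hits = 302  # responsive email messages found enterprise-wide
missed_fraction = 0.84      # share the custodian approach failed to identify

# Assumption: the 84% applies to the 302 messages above.
missed = round(enterprise_wide_hits * missed_fraction)
found = enterprise_wide_hits - missed

print(f"custodian approach missed about {missed} messages")
print(f"custodian approach found about {found} messages")
```

Under that assumption, the custodian approach would have surfaced only a few dozen of the messages the enterprise-wide search turned up.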

So what’s a ZL Technologies? The company says:

[It] provides cutting-edge enterprise software solutions for e-mail and files archiving for regulatory compliance, litigation support, corporate governance, and storage management. ZL’s Unified Archive, offers a single unified platform to provide all the above capabilities, while maintaining a single copy and a unified policy across the enterprise. With a proven track record and enterprise clients which include top global institutions in finance and industry, ZL has emerged as the specialized provider of large-scale email archiving for eDiscovery and compliance.

Some information about TREC 2010 appears in “TREC 2010 Web Track Guidelines”. The intent is to describe one “track”, but the information provides some broader information about what’s going on for 2010. The “official” home page for TREC may be useful to some Beyond Search readers.

For more TREC information, you will have to attend the conference or contact TREC directly. The goose is now about to get his feathers ruffled about the availability of presentations that point out that search and retrieval has a long journey ahead.

Reality is often different from what the marketers present, in my opinion.

Stephen E Arnold, August 12, 2010

Freebie

Alleged Police Raid on Google Korea

August 12, 2010

The BBC’s “Google Offices Raided by Korean Police” won’t make much difference to Verizon and probably not much difference to the search engine optimization gangs. However, if the story is true, a Google legal eagle will have some work in lovely South Korea.

The reason for the alleged raid is Google’s alleged WiFi sniffing for the not-so-alleged StreetView service. The math club crowd probably thinks that Korean authorities were acting irrationally. That’s okay. Points of view and differences of opinion cause people to see actions in different ways.

Here’s a snippet from the BBC story that I noted:

A police statement said they suspected Google has been collecting and storing data on “unspecified internet users from wi-fi networks”…. “[We] have been investigating Google Korea on suspicion of unauthorized collection and storage of data on unspecified Internet users from Wi-Fi networks,” the Korean National Police Agency (KNPA) said in a statement. Korean media reported that 19 KNPA agents raided the office, seizing hard drives and related documents.  Authorities said they plan to summon Google officials for investigation once analysis on the confiscated items is complete.

The way the Korean authorities acted reminded me of Norwegian police tactics in October 2008 when the Fast Search & Transfer offices experienced what I recall was an “action”, maybe a “raid.” (Another similarity Google shares with Microsoft, I wonder?)

For Google to keep those revenues flowing, one wonders if lighting up the sensors of various law enforcement and governmental professionals is a revenue plus or minus. For sure, there may be some added friction for the Googlers in countries where authorities conduct raids. Sending 19 officers is either a typographical error or an indication that the Korean authorities were not in the mood for a Google foosball game, more like rugby.

Stephen E Arnold, August 12, 2010

Blog Battle: Big Blue versus an Azurini

August 12, 2010

I was not going to write this short item, but the battle of the blues was too enticing to ignore. First, I don’t know what’s right and what’s wrong. I read this story, “IBM, Gartner in Blog Tiff Over Notes Report.” The link may be dead when you read this story. Blame Yahoo, not me. The story describes a bloggy spat about Lotus Notes and a consultant’s view thereof. Now, I love Lotus Notes. At Ziff in the late 1980s, we were early adopters, and I got to know the importance of authorized engineering outfits like Kinderhook. Anyone remember that group? So Lotus Notes worked reasonably well, and it delivered functionality that was remarkable.

Over the years, IBM has continued to invest in Lotus Notes, and the system has a pretty good grip on some US government agencies and big companies. Today, Lotus Notes must deal with Microsoft SharePoint. Google wants to crush the Lotus blossom too. The fact is that none of these systems is particularly bad. Each company can make a strong case for itself and against the others. The reality is that software from big, smart outfits does not vary a great deal. Features vary but technology advances in a reasonably measured way for enterprise applications.

What’s the point of the Blog Tiff write up?

IBM seems not to be thrilled with a report from the azure chip consulting firm. Here’s a snippet from the article, and you will want to read the complete article to get the complete picture:

A Gartner report about users looking at migrating from Lotus Notes and Domino didn’t sit well with IBM’s Ed Brill, who spoke his mind in a Friday blog post. But his spin on the contents of the report is too one-sided, according to Gartner’s Tom Austin, who shot back over the weekend.

On Thursday, Gartner published a report called “Migrating off Notes/Domino e-mail may make sense in some circumstances,” saying that more Lotus customers come to Gartner for advice about moving to other e-mail systems. The report is much ado about nothing, according to Brill, director of product marketing at IBM Lotus. A headline that better describes the content of the report would be: “Migrating off Notes/Domino doesn’t make sense in most circumstances,” according to Brill’s blog post. However, that name probably wouldn’t sell as much consulting time, Brill said.

Here’s my take, and keep in mind that this is an opinion.

First, consultants love buzz. A dust up creates awareness. True, some outfits get annoyed, but the idea is that the buzz generates new leads.

Second, Big Blue does not like criticism. With $100 billion in revenue, the revenues of an azure chip consulting firm are chopped liver. Picking a fight with Big Blue allows IBM legal eagles to flap their wings. On a bad day, the eagles may want to get themselves a squirrel. Snack time. This means that the Blog Tiff could become a much bigger deal on a slow August afternoon.

Third, the folks on the sidelines like Google and Microsoft may have some fun emulating folks on the Comedy Channel. This type of casual humor may do more harm than good, however. Both Big Blue and the azure chip outfit will want it to stop. A misstep can make what is a tiny story grow to the Zeus-scale.

In short, I want to avoid this squabble and the companies involved. Back to the goose pond, gentle reader.

Stephen E Arnold, August 12, 2010

Freebie.

The Value of Service Level Agreements: Maybe Not So Much?

August 12, 2010

Whip out your iPad and load up “Inside American Eagle Outfitter’s 8-Day Website Nightmare.” If this information is on the money, you will want to read the whole story. In America, back to school starts in the middle of summer. When a company sells trendy stuff to school-age people, big money is at stake. Like many outfits, the top dogs are not chewing on computer chips. These outfits go with big names in the warm embrace of an American TV drama. Pretty crazy stuff for 40 minutes and then, bang, everything is okay again. Well, life does not work that way. Service Level Agreements or SLAs are not written like US couch potato snacks.

Big outfit inked deals with IBM and Oracle. Bang. Problems. Here’s the key passage in the story:

atypical and concurrent failures with IBM’s hosting servers and backup plans as well as with Oracle’s Data Guard utility program ultimately proved to be the sources of problems.

If true, will outfits like Amazon and Google pitch American Eagle? What about the old saw that big vendors can offer SLAs that mean something? As long as the “something” is ambiguous, what’s not to like? Err, the outage? The time required to get back online? What’s the value of an SLA with a big vendor? Maybe not so much?

Stephen E Arnold, August 12, 2010

Unlike an SLA, the write up is free, has minimal downtime, and won’t put you out on a limb.
