Solr: Useful Introduction

September 21, 2008

A happy quack to the person in the Netherlands who alerted me to this Softpedia write up about Solr. Solr is a Lucene-based enterprise search server written in Java. The description is here. The entry provides a comprehensive list of Solr features. Missing, however, is a link to download the system for all platforms. You can find Solr here.

Stephen Arnold, September 21, 2008

SharePoint: Picture Perfect Search

September 21, 2008

You have SharePoint search configured, optimized, and humming like a top. You have scaled up and out. Now you are ready for the next level in SharePoint. You are on the starting line for image search. For a useful guide to implementing image search within SharePoint, you will want to read and save Matthew McDermott’s “SharePoint Image Search.” This is a four part series, and you will need all four parts to round out your knowledge of the operation. You can retrieve the series here.

For me, the most useful part of the series was Mr. McDermott’s discussion of the procedures required to index images. Tips include finding and installing iFilters and then troubleshooting the iFilters. What sticks in my mind is that multiple crawls and index inspection are necessary in order to find out exactly what has been retrieved and processed. Multiple crawls are trivial and quick on small SharePoint installations. When the SharePoint installation sprawls over hundreds or thousands of servers, the recrawls are non trivial. The procedure for mapping crawled properties to managed properties is a must save bit of explanation. Part 4’s sample code is just as important. In fact, without this write up, the likelihood of a mere mortal getting SharePoint to deliver images in search results is pretty close to zero.

Three thoughts:

  1. This is a lot of work, particularly for large SharePoint installations. I personally would not go through these procedures and the recrawls, manual inspection, and code twiddling. Third party vendors deliver image results without this hassle.
  2. It’s clear that anyone with the programming knowledge, patience, and SharePoint bug can hack the system to perform some clever search operations. In my experience, large SharePoint installations and hacking are mutually exclusive. A glitch can be expensive to locate and remediate.
  3. Microsoft should add image search to its SharePoint service. The omission is egregious. I also want to locate other file types without the hoop jumping.

Mr. McDermott deserves a happy quack. Microsoft earns a goose gift for not building this function into the system.

Stephen Arnold, September 21, 2008

Oh, My. Google Personal News

September 21, 2008

Newspapers worldwide no longer ignore Google. Nope. The “kill more trees” crowd sees the LCD message. The GOOG does news. If you have not explored Google’s personal news service, here’s the URL: http://news.google.com.my/news

The Star Online here has a good summary of what the service delivers to users worldwide. The Star is published in Malaysia. Users in more than 48 countries can use the service in their country’s native language. If you find a story you can’t read, you can use Google Translate to sort out the meaning here.

After some clicking you can configure a nifty summary of what’s happened in the last 36 hours. Some headlines turn up more quickly, but for the personalized topics that I track, Google lags me by about eight hours. Your mileage may vary.

There are no advertisements on the Personal News page that I could spot. Even Google seems reluctant to jab more digital lances into the media bulls’ necks. What’s interesting is that I can replicate most of the Google functionality with other free services. What sets the GOOG’s service apart is the easy to use configuration tool and the speed with which headlines, images, and snippets render on my cheap laptop in Utrecht via a snagged, open WiFi signal.

Will the global media titans be able to stop Googzilla? In my opinion, the media titans are about 10 years too late and a Googleplex of technology savvy short. But, just for goose fun, let’s assume that the newspaper titans get this Google News “my” service turned off. Here’s a scenario for you:

Google offers to share revenue for stories posted by freelance journalists, retired journalists, or Web log people whom Google certifies. Slap a few ad slots on the “my” page and call it a day.

I know many people love Yahoo News. I have a personal Yahoo news page, and I find headlines that don’t update, weird configuration tools that don’t give me control, and a content selection function that makes me do too much work. The standard news page features weird pop ups, which I dislike, and the tab that I have to select to see stories from services that are not featured. In the last year, I have learned to put up with Yahoo and love newsreaders. Now “my” Google News is flirting with me.

In this scenario, three constituencies may have some trouble:

  1. The media titans are in for a long slog through a revenue Sahara
  2. Web 2.0 newsreader providers may have to do some additional work
  3. Yahoo, long number one in Web news, may face some competition

Check out the “my” service. I used to work for a traditional newspaper which was purchased by a global media titan. After watching the daily news hole shrink, talented journalists fired, and the paper chopped down to the size of a legal pad, I must say, “Well, dudes, you are in a bit of a pickle now.” Chuckle. Chuckle.

Stephen Arnold, September 22, 2008

Earth to Forrester, Earth to Forrester. The Economic Slowdown Is Here. Copy.

September 20, 2008

Update September 23, 2008: More on the “mild” downturn in the IT world. When Silicon Valley gets the flu, might it be contagious? More information here and here.

[original post]

I just stepped off a fun filled flight from Amsterdam to New York. I fire up my browser and read this headline: “Mild Tech Slowdown Ahead”. The author is the super-guru, mid range consulting firm edition, George F. Colony. You can read his exegesis here. The headline says it clearly. The balance of 2008 and 2009 is headed for a “mild” economic downturn. Technology will be affected. I, as the official addled goose of information access, can’t dispute the lofty thoughts of Forrester. I can add several observations, and perhaps my two or three readers can add a few of their own. You don’t have to agree with me. You may be rejoicing in the “mild” slowdown which has little material impact on your technology centric activities.

image

Which is it in IT? Happy face or sad face? I vote for sad.

My observations:

  1. Information technology is in crisis. Major projects are not delivering. Users–up to two thirds of those struggling with search and content processing–are dissatisfied with those systems. The issues are noticeable in the desultory attitudes of trade show attendees (at least the trade shows I attend) and the “we can do anything” pitch of the vendors. There’s a problem, Houston, and “mild” doesn’t capture the situation.
  2. I took a quick look at an analysis of eDiscovery firms in a late 2006 report. Of the 48 vendors mentioned, 26 have seen substantive changes in their material circumstances, which indicates that in this one sector there are too many hungry chiefs and not enough to eat. The same revenue starvation is evident in business intelligence, enterprise search, and even that Burger King Whopper of fuzziness: content management. The downturn began in 2007 and is now accelerating. If the trend continues, 2009 will find the chiefs killing and consuming one another. Charles Darwin in action. “Mild” is not the word I choose for the tension building among small and large vendors alike.
  3. The information technology budgets are in shambles. One Fortune 500 company has solved the problem of inflationary systems and software costs in a very simple way: budget caps. No matter what happens, the IT folks have a finite amount of money. The same bean counter approach may be found at US national laboratories, Federal agencies, start ups without significant revenue, and big companies. There’s not enough pay off from many of the zippy new investments to make venture sharks and terrified fund managers throw money into systems and services that don’t pay off.

Maybe my view of the information technology world is skewed. I get asked to comment on how to fix such excesses by organizations that own multiple search systems that don’t work or play well with one another, by financial outfits trying to figure out how certain companies report record revenues without a concomitant payoff to the bottom line, and by innovative companies that can’t figure out how to close a deal and get the client to pay on time because the system doesn’t meet user needs.

When this “mild” downturn is put in the context of the broader economic challenges in the US and now elsewhere, I see some rough seas ahead. Actually that’s not a good metaphor. I see a category 5 hurricane building. It’s heading right at information technology implementations that fail. It’s going to hit the vendors who promise anything and then deliver disappointments. It will strike directly at companies who deploy yesterday’s bread as today’s freshly baked donut.

image

This is a Siebel smart probe. Do you want to deploy this on your watch or dial in to one of the new cloud based options? I am leaning toward the cloud based solutions. On premises installations are too tough to manage and keep on a reasonable management track. What’s your experience?

The interest in cloud based applications is growing. One reason is that cost control may be easier. Another is that if a cloud solution works, an organization can trim some fat from its IT budget: people, licenses, consultants, and hardware. Five years ago geospatial meant on premises and expensive solutions; today, think Google. Five years ago CRM meant PeopleSoft and Siebel; today, think Salesforce.com. Other sectors are gaining steam because the old IT model is not right for the challenges that face companies now and tomorrow.

What’s your experience? Vindaloo or no spice in the IT, information access, and content processing world? Help me learn. Share your data, please.

Stephen Arnold, September 20, 2008

Search: Moving Up the Buzzword Chain of Being

September 20, 2008

In one of my university required courses, the professor revealed the secrets of “the great chain of being”. After 45 years, my recollections of Dr. Pearce’s lecture are fuzzy, but I recall that at the top of the chain was God, then angels, and then a pecking order of creatures. Down at the bottom were paramecia like me.

Search terminology works like this, I concluded after giving my talk at Erik Hartmann’s conference in Utrecht. I prepared for my remarks by talking with a dozen vendors exhibiting at the conference. I also listened to various presenters for five to 15 minutes. I had to limit my listening in order to get a representative sampling of the topics and interests of the conference attendees.

What I concluded was:

  1. People perceive Google as a Web search company that sells ads. In this biased sample, I noted a discomfort about Google’s growing dominance of digital information. I did not hear anyone criticize Google, but I sensed a growing concern about privacy, scope, traffic, etc. I remain excited about Google and probably come across as a Google cheerleader, which annoyed some of the people with whom I spoke.
  2. Vendors and consultants who once hawked content management, records management, and enterprise search have changed their tune. Instead of talking about CMS, EDM, and other smart sounding acronyms, the vendors are pulling terminology out of MBA lexicons. (More about this in a moment.)
  3. The people listening to these talks, including mine, hunger–even plead–for solutions to challenges arising from their inability to find needed information, manage terabytes of digital “stuff” in their offices, and create a solution that does not require constant spoon feeding.

The result is that “old” solutions and half baked solutions are wrapped in new terminology taken from a higher level in the “great chain of buzzwords”. Here’s an example: instead of saying “enterprise search” or “behind the firewall search”, some vendors talked about “information access” and “findability”, whatever that means. The lesser word is search, which most people seemed to agree was uninteresting, which is a code word for “does not work”. The words “information access” come from a loftier position on the buzzword “great chain of being”. The vendors are sounding more like McKinsey and Booz, Allen know nothings than subject matter experts.

great chain of being

A representation of the Great Chain of Being. Image source: http://www.kheper.net/topics/greatchainofbeing/Steps.gif

Consider this example: “business process management”. This is definitely a buzzword from a loftier position on the buzzword “great chain of being”. “BPM” is in the Heaven category, not the Stone or Flame category. But I don’t know what BPM means. I think the folks using this word want to avoid precise definitions because that limits their freedom. Implying that “BPM” will solve a problem is easier than actually diagnosing the problem and solving it. “BPM” was the acronym of the conference. Presenters from publishers, consultancies, and vendors inserted this three letter token for what seemed like a pretty basic notion; that is, the steps needed to complete a task. Since search and content management are losers in the revenue generating department, folks engaged in these activities now talk about BPM. Old wine, new bottles, but the labels have buzzwords from higher in the “great chain of being”.

Easy Ask and Progress Software’s Declining Earnings

September 20, 2008

Easy Ask is a search system owned by Progress Software. Like Endeca and Mercado, Easy Ask delivers a system that can make an e-commerce site generate more revenue. But Easy Ask, like other search companies gobbled by larger firms, has to fight for respect in a contentious, volatile world. You can get more information about the Easy Ask e-commerce system here.

The company has been enhancing the product. The full name is Progress EasyAsk, but my spelling checker balks at the squishing of two words together. The product has been sent to the gym and fed protein milk shakes. Its beefier functionalities are described by the company this way:

Progress Easy Ask for Operational Business Intelligence (BI) fills the gap between search and BI systems to allow executives, analysts, business managers, and professional staff to access the information they need to improve business operations.

Progress released its third quarter results, and you can scan Forbes’ take on the report card here. The news was not bad but not good either. Progress’ revenue slipped, down from $127 million in Q3 2007 to $123 million in Q3 2008. Progress has been diversifying and leveraging its acquisitions, but if this trend continues, I wonder if Easy Ask will get the financial injections to which search and content processing companies are addicted. If not, Easy Ask may face a tough 2009.
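For readers who want the slip as a percentage, here is a minimal sketch of the arithmetic. It uses only the two revenue figures quoted above; the function name `percent_change` is my own label, not anything from Progress or Forbes.

```python
def percent_change(old: float, new: float) -> float:
    """Return the percentage change from old to new (negative means a decline)."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


# Revenue in millions of US dollars, as quoted in the post.
q3_2007 = 127.0
q3_2008 = 123.0

change = percent_change(q3_2007, q3_2008)
print(f"Year over year change: {change:.1f}%")  # prints "Year over year change: -3.1%"
```

A decline of roughly three percent fits the “not bad but not good either” reading of the quarter.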

Can Progress get its revenue back on track? I think it will take significant management work. Progress has to justify its products’ value when clients are looking at low cost tools from other vendors, open source, Microsoft’s array of servers, Dot Net, and programming tools, and brutal competition in search.

It’s too early to sound a stronger warning about Progress, but the company’s results for the next 12 weeks will be interesting to review when those data become available.

Stephen Arnold, September 20, 2008

SharePoint Positives

September 20, 2008

Software Development Times published on September 15, 2008, “SharePoint Scores with Developers” here. The writer, David Worthington, provides an excellent summary of the case for SharePoint. The explanation of SharePoint benefits may be a better presentation than Microsoft’s own sales collateral. One point that caught my attention was this statement:

“Microsoft has run QA over it, and the CRUD [create, read, update and delete] operations are built into the quality life cycle. There are standard data access operations against SharePoint lists; you don’t have to spend nearly as much time testing.”

Rapid application development is important in many information department activities. Eliminating testing, in my experience, may have some downsides. Because of the differences that exist within Microsoft’s own products, time savings in one place may result in an unexpected time cost in another. Tailoring search in SharePoint can be a particularly tricky job. Agree? Disagree?

Stephen Arnold, September 20, 2008

Google: Unchrome Chrome’s Tracking Functions

September 20, 2008

Silicon.com has a useful article, “Google Browser’s Tracking Feature Alarms Developers, Privacy Advocates”, about Google Chrome. The writer, Elise Ackerman, references the browser’s “phoning home” feature. But the most important item in the write up is the unearthing of a piece of software that can be used, she asserts, to blunt some of Google’s usage tracking functions. She writes:

“I firmly believe that it is better to have control over your own privacy without having to trust that Google doesn’t do anything bad with your data,” said Sven Abels, president of Abelssoft, a software company in Delmenhorst, Germany, that is offering free downloads of its UnChrome software.

This download link worked for me at 5:18 am Utrecht time on September 20, 2008. I have not tested this script, so use it at your own risk. More information about Google’s tracking, ad injection technology, and usage data models appears in various Google technical papers and documents available from the USPTO. I included a description of some of the Google methods in my 2005 study The Google Legacy, which is still available from the publisher here. Usage tracking is a bit of old news given new life because of Google’s Chrome release. Chrome, as I noted in my speech on Thursday afternoon at the Hartmann Conference, is perceived as a browser. In my opinion, it is an umbilical that connects any computing device to the Google data centers. The computing device and its operating system are little more than booster rockets that get the user into Google orbit.

Stephen Arnold, September 20, 2008

HP Gets Googley

September 20, 2008

The story in InfoWorld “HP Applies Google Model to New Storage System” marks a turning point for vendors of expensive, brand name iron. You can read the story by Mikael Ricknas here. For me the most important point in the article was:

Hewlett-Packard’s ExDS storage system is an online content repository that will cost less than $2 per gigabyte or $2,000 per terabyte.
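The two prices in that statement line up only if a terabyte is counted as 1,000 gigabytes. A minimal sketch of the conversion follows; the constant and function names are my own, and the binary figure is shown only for comparison, not as anything HP quoted.

```python
GB_PER_TB_DECIMAL = 1000  # storage vendor convention: 1 TB = 1,000 GB
GB_PER_TB_BINARY = 1024   # binary convention, shown for comparison only


def cost_per_tb(price_per_gb: float, gb_per_tb: int) -> float:
    """Scale a per-gigabyte price up to a per-terabyte price."""
    return price_per_gb * gb_per_tb


price_per_gb = 2.00  # US dollars, the upper bound quoted in the article

print(f"Decimal terabyte: ${cost_per_tb(price_per_gb, GB_PER_TB_DECIMAL):,.0f}")
print(f"Binary terabyte:  ${cost_per_tb(price_per_gb, GB_PER_TB_BINARY):,.0f}")
```

With the decimal convention the result matches the quoted $2,000 per terabyte; the binary convention would come out slightly higher.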

HP sells significant amounts of hardware to Microsoft. At this bargain basement price, HP must think it can make money on what may be knife edge margins. As important, Hewlett-Packard appears to be emulating Google’s approach to storage. Google’s technical papers revealed significant details about its storage methods five years ago. If you read between the lines, Google references its storage techniques in its discussion of other Google innovations.

The HP technology could assist Microsoft and other companies wrestling with storage. I wonder if Google has improved its storage methods in the same 60 month interval. Catching up to where Google was won’t provide a substantive payoff over the long term.

Can HP innovate to leapfrog Google? Let me know your thoughts.

Stephen Arnold, September 19, 2008

Google: Limits of the Googley Technical Management Model

September 19, 2008

You can read the statement by the former Digital Equipment wizard and now Google senior vice president of engineering and research Alan Eustace here. The article “Changes in Phoenix” announces the closing of the Google office in Phoenix. The office opened several years ago, and you can refresh your memory about the promise of this new engineering facility here. The statement is clear and gentle. For me the most important statement in the article was:

But we’ve found that despite everyone’s best efforts, the projects our engineers have been working on in Arizona have been, and remain, highly fragmented. So after a lot of soul searching we have decided to incorporate work on these projects into teams elsewhere at Google. We will therefore be closing our Arizona office on November 21, 2008.

My take on this is that Phoenix may be an anomaly. Maybe it was the weather? Maybe it was water or lack of it? With this office closing, I asked myself, “Has the Googley management method reached its sustainable limit?” With one office closing, I just have a question. I need more data. A single instance. Nothing more. But what if…

Stephen Arnold, September 19, 2008
