Security: Search a Factor

May 10, 2009

Security of online information is critical to any company that operates on the Internet, from large corporations to medical institutions to the federal government. Remember the stolen laptop? Security online, especially when setting up a database of searchable, confidential material, is a herculean task, because if it's online, someone can search for it and find it. Case in point, a headline from May 7: US Med Data Held Hostage by Hackers; Ransom: $10M. See the article at http://bit.ly/16IoZi. Hackers stole more than eight million prescription drug records, Social Security numbers, and driver's license details from Virginia on April 30. It was reported that several layers of protection failed and allowed the hackers access. It's not the first time something like this has happened. Data security online must be improved, or we're all going to be facing a lot more fraud in the future.

Jessica Bratcher, May 10, 2009

SharePartXXL Taxonomy Component

May 10, 2009

Some azure chip consultants tout a taxonomy as the spike that will kill the werewolf of information retrieval. A number of vendors have recognized the hunger of organizations with disappointing search systems. I cast an eye over the offerings, and I have visited with developers of these systems. A large number of SharePoint taxonomy solutions exist in the Microsoft ecosystem.

SharePoint Reviews covers quite a few SharePoint add-ins. Jeremy Caney does a good job describing a product available from SharePartXXL. You can read "Taxonomy Extension by SharePartXXL Integrates Nicely with MOSS 2007" here. The product snaps into SharePoint and adds taxonomy management functions not included in SharePoint. Mr. Caney points to some shortcomings in the product. In my experience, only a few industrial strength taxonomy tools provide comprehensive control of term lists. Even fewer are able to generate ANSI standard taxonomies.

You can get information about SharePartXXL’s solutions here. These range in cost from about $1,500 to $3,500.

If you need the horsepower for managing ANSI standard term lists, taxonomies, and controlled vocabularies, you will want to take a look at the products available from Access Innovations here.

Stephen Arnold, May 10, 2009

YAGG Plagues Google Gmail

May 9, 2009

Short honk: The BBC reported here that Gmail suffered an outage. You can read the May 8, 2009 story here. The Beeb's headline tells the tale: "Google Email Service Back Up After GFail." For other Google glitches, search this Web log for YAGG, yet another Google glitch.

Stephen Arnold, May 9, 2009

New Media, Old Media Spoofed

May 7, 2009

The story “Student’s Wikipedia Hoax Quote Used Worldwide in Newspaper Obituaries” here underscored for me the precarious nature of “information” in today’s world. The Irish Times reported that a fake quote in Wikipedia popped up in newspapers around the world. New media and old media both fell into the comfortable assumption that if it is online, information is correct, accurate, true, and well-formed.

At a conference yesterday, I spoke with a group of information professionals. The subject was the complexity of information. One of the people in the group said, “Most of the technical information goes right over my head. At work, people turn to me for answers.”

I don’t want to dip into epistemological waters. I can observe that the rising amount of digital information (right or wrong is irrelevant) poses some major challenges to individuals and organizations. The push for cost reduction fosters an environment suitable for short cuts.

Last Sunday, one of my publishers (Harry Collier, managing director, Infonortics Ltd.) and I were talking about the change in how large news organizations operate. He had worked for book and newspaper publishers earlier in his career, as I had. He noted that the days of investigative reporting seem to have passed.

I had to agree. The advent of online has made research something that takes place within the cheerful confines of the Web browser. Interviews I once conducted face to face now take place via email. Even the telephone has fallen from favor because it is difficult to catch a person when he or she is not busy.

A number of companies involved in content processing are experimenting with systems that can provide some clues to the “provenance” or “sentiment” of information. The idea is that tireless software can provide some guideposts one can use to determine if a statement is right or wrong, hot or cold, or some similar soft characteristic.

The quote story from the Irish Times highlights the pervasiveness of online short cuts. In this case, the quote is unlikely to do much lasting harm. Can the same be said of other information short cuts that are taken each day? Will one of these short cuts trigger a chain of unexpected events? Will the work processes that encourage short cuts create ever more chaotic information systems that act like a brake on performance? Who is spoofing whom? Maybe ourselves?

Stephen Arnold, May 8, 2009

Twitter Pumps Search

May 7, 2009

Newsfactor here and other Web news services posted stories about Twitter getting a dose of search steroids. You will want to read “Not-for-Sale Twitter Is Expanding Search Functionality” by Patricia Resende to get the details. Ms. Resende wrote:

Twitter Search will be used to crawl information from links by Twitters to analyze and then index the content for future use, Jayaram, a former vice president for search quality at Google, told Webware. Currently Twitter Search is only used to search words included in tweets, but not words in links. Along with its new crawling functionality, Twitter Search will also get a ranking system. When users do a search on trending topics — the top-10 topics people tweet about, which get their own link on the Twitter sidebar — Twitter will analyze the reputation of the tweet writer and rank search results partially based on that.

I think this scoring will be an important step. Here's why:

  1. Clickstream metrics by individuals about topics, links, and words provide important clues to smart software.
  2. Individuals with large numbers of followers provide "stakes in the sand" for making some type of subjective, value-centric calculation; for example, a person with more followers can be interpreted as an "authority."
  3. Individuals who post in large volumes and who have both followers and topics add further scoring dimensions for calculating "reputation" and other squishy notions.
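To make points 2 and 3 concrete, here is a toy ranking function. It is a minimal sketch under stated assumptions, not Twitter's method: the `Tweet` fields, the log-scaled follower "reputation" formula, and the 0.3 blending weight are all hypothetical choices made only to show how an author's reputation might nudge otherwise similar search results. Clickstream data (point 1) is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    author: str
    text: str
    followers: int  # hypothetical field: the author's follower count
    posts: int      # hypothetical field: the author's total number of posts


def relevance(query: str, text: str) -> float:
    """Fraction of query terms that appear as words in the tweet (0.0 to 1.0)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


def reputation(tweet: Tweet) -> float:
    """Hypothetical author reputation, squashed into the range [0, 1).

    Followers are log-scaled so a huge audience does not swamp everything;
    posting volume adds a small second scoring dimension.
    """
    raw = math.log1p(tweet.followers) + 0.1 * math.log1p(tweet.posts)
    return raw / (1.0 + raw)


def rank(query: str, tweets: list[Tweet], weight: float = 0.3) -> list[Tweet]:
    """Order tweets by a blend of text relevance and author reputation.

    `weight` is the share of the score given to reputation; the rest goes to
    relevance, so results are ranked only partially on who wrote them.
    """
    def score(tweet: Tweet) -> float:
        return (1 - weight) * relevance(query, tweet.text) + weight * reputation(tweet)

    return sorted(tweets, key=score, reverse=True)


if __name__ == "__main__":
    tweets = [
        Tweet("casual_user", "waiting in line at the restaurant", followers=10, posts=5),
        Tweet("popular_user", "waiting in line at the restaurant", followers=50_000, posts=1_000),
        Tweet("celebrity", "new album out today", followers=2_000_000, posts=3_000),
    ]
    for t in rank("waiting line restaurant", tweets):
        print(t.author)  # prints popular_user, casual_user, celebrity (one per line)
```

With identical text, the author with more followers ranks first. The off-topic post from the most popular author still ranks last, because text relevance carries most of the weight in this sketch.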

A number of commercial content processing companies are in the “reputation” and subjective scoring game, but Twitter is a free (for now) real time service with a large volume of posts. The combination makes Twitter a potential dark horse in the reputation analysis game. Believe me. That game has some high stakes. Nonsense about waiting in line at a restaurant becomes high value data when one can identify high score folks standing in line multiple times per week. You don’t have to be a rocket scientist to figure out that the restaurant is doing something right. The score may not be a Zagat type report, but it works pretty well for making certain types of marketing scans useful.

Twitter on steroids plus real time search. More than teen craziness, I assert.

Stephen Arnold, May 8, 2009

Google and Publishing: Some Metrics

May 7, 2009

The Guardian reported some metrics about Google and publishing. You will find the summary of a Googler’s speech at a publishing conference here. The article is “Google: We Give Publishers £3.3bn”. Highlights from the news story include:

  • A quote attributed to Googler Matt Brittin, “We want to help publishers make money online”
  • Google sends a billion clicks to publishers each month
  • Google wants to “work with publishers to improve their digital revenues and help close the gap between print and online advertising returns”.

For me, the most interesting comment in the article was this passage:

He [Matt Brittin, Googler] said publishers should look to use technology to help their digital publications move at a greater pace and keep up with consumer demand, but that while it could help, Google could not offer all the necessary solutions.

The challenge that I see is that publishers think about technology in terms of putting color pictures in newspapers and slashing costs. Technology, as Googlers use the term, may mean a more sophisticated approach.

I don’t think the audience was able to see a path through the swamp. I wonder if any of those Google billions were jingling in the pockets of the attendees?

Stephen Arnold, May 7, 2009

Alternatives to Google Web Search

May 7, 2009

Abhijeet Mukherjee wrote "Ditch Google For A Day: 10 Amazing Search Engines to Try Out" here. This article provides a list of search engines that may be useful. The premise of the essay is that a reader may want to set Google aside for a day or two and use these systems. I don't want to reproduce the list. Please visit the original write-up. I would like to mention three of the systems and offer a brief comment.

First, Docjax is a metasearch system. I have noticed that Google’s coverage of PowerPoint files has been changing over the last year. There are fewer PowerPoints available and Google does not do a particularly stellar job of indexing the contents of presentations on services such as Scribd. Docjax may be useful to you. I find it helpful for certain queries.

Second, Yahoo Glue is one of those Yahoo search experiments that deliver some useful search features. I used Mindset, now removed from the Yahoo Labs site, for certain types of technical queries. I don't like Yahoo Glue as much, but you may find that Yahoo is more useful with the Glue service. When I first saw Glue, I thought it was a variation on Google's universal search.

Third, Freshbargains is a useful bargain search engine. I would classify this as a vertical search system. Some results are spot on, others less useful. Worth a shot when looking for deals.

In my opinion, none of these is a leapfrog service, and I don't think any of these systems can scale like Google, for a couple of reasons. First, the cost would be high, and the economy is not too good in my view. Second, Google has a magnetic brand. Trucks of ad dollars would be needed to catch users' attention.

Marginally better won’t close the gap between these systems’ market share and Google’s. That 10 year lead looks more formidable each day.

Stephen Arnold, May 8, 2009

Visualization: Interesting and Mostly Misleading

May 7, 2009

I fund the EVvie award for excellence in search system analysis. This year's number two winner was Martin Baumgartel for his paper "Advanced Visualization of Search Results: More Risks, or More Chances?". You can read the full text here and a brief interview with him here. You will want to read Mr. Baumgartel's paper and then the Fast Company article "Is Information Visualization the Next Frontier for Design?" by Michael Cannell here.

These two write ups point out the difference between a researcher’s approach to the role of visualization and a journalist’s approach to the subject. In a nutshell, visualization works in certain situations. In most information retrieval applications, visualization is not a benefit.

The Fast Company approach hits on the eye candy value of visualization. The title wisely references design. Visualization as illustration can enhance a page of text. Visualization as an aid to information analysis may not deliver the value desired.

Which side of the fence is right for your search system? Selective use of visualization or eye candy? The addled goose favors selectivity. Most visualizations of data distract me, but your response may be different. Functionality, not arts and crafts, appeals to the addled goose.

Stephen Arnold, May 7, 2009

Microsoft and the Twitter Imperative

May 7, 2009

I found Nicholas Carlson’s “Microsoft Must Buy Twitter” here an interesting analysis. My thought was that deal makers would have a day at the State Fair if Apple, Google, and Microsoft began a bidding war over Twitter.com. Mr. Carlson offers five reasons why Microsoft has a Twitter imperative. I can’t reproduce the five points here, but I can comment on two of them and invite you to navigate to Mr. Carlson’s article to get the full story.

Mr. Carlson suggests that Microsoft can’t make its dream of attending Google’s funeral a reality with Yahoo as its principal weapon. Microsoft needs the T bomb; that is, the Twitter user base, buzz, and monetization opportunity. I find the idea intriguing, but Microsoft has not made any progress in closing the gap between itself and Google in Web search. Now the GOOG is aiming at Microsoft’s enterprise business. Twitter could be, in my opinion, an expensive distraction that leaves Microsoft vulnerable in a business sector it can ill afford to see slip downhill.

Twitter, Mr. Carlson implies, will get more expensive. So, buy now and save. Twitter is definitely hot at this moment. The challenge for a company like Microsoft is to acquire something hot and then prevent it from getting cold. Hot properties in the hands of big, slow moving entities often lose their zippiness. My hunch is that if Microsoft owned Twitter, Twitter would be surpassed by another real time messaging service, and quickly.

Microsoft may buy Twitter. That opens the door for another Twitter and Microsoft is poorer and more vulnerable as a result. Microsoft needs to leapfrog to stay where it is. An acquisition won’t do the job in my opinion.

Stephen Arnold, May 7, 2009

Open Text Vignette: Canadian Omelet

May 6, 2009

A happy quack to the reader who alerted me to this big buck ($300 million plus) deal for Open Text to purchase the financially challenged content management vendor Vignette. You can read the Canadian press take on the announcement here. This story is hosted by Google, so it may disappear after a short time. I recall seeing a story by Matt Asay in August 2008 here reporting that the two companies were talking. Well, the deal appears to be done.

Open Text is a vendor with a collection of search systems, including Tim Bray's SGML engine, Information Dimension's BASIS, the mainframe-centric BRS search, the Fulcrum system, and some built-in query systems that came with acquisitions. Vignette, on the other hand, is a complex and expensive content platform. The company has some customers who love it and some people, like me, who think that it is a pretty messy bowl of clam linguini.

The question is, "What will Open Text do with Vignette?" Autonomy snagged Interwoven, picked up some upsell prospects, and fattened its eDiscovery calf. Open Text has systems that can manage content. Can Open Text manage the money-losing Vignette? In my opinion, Autonomy is pressuring Open Text. Open Text now has to manage the Vignette system and marshal its forces against the aggressive Autonomy. Joining me on the skepticism skateboard is ZDNet's Stephen Powers. In "Can Open Text Turn the Page on Vignette's Recent History?" here, he wrote:

The other interesting question raised by this announcement: what to do about the Vignette brand? The press release states that Vignette will be run as a wholly-owned subsidiary. But will Open Text continue to invest in what some argue is a damaged brand? Or will they eventually go through a rebranding, as they did with their other ECM acquisitions, and retire the purple logo? Time will tell.

Mr. Powers is gentle. I think the complexity of the Vignette system, its money-losing 2008, and the pushback some of its licensees have directed at the company are bigger problems than his measured tone suggests. Does Open Text have the management skill and the resources to deal with integration, support, and product stability issues? Will Open Text customers grasp the differences among Open Text's overlapping product lines?

My hunch is that this deal is viewed by Autonomy as an opportunity, not a barrier.

Stephen Arnold, May 6, 2009
