Open Source XML Flaws

August 6, 2009

I want to stay out of the squabble between commercial software vendors (Microsoft, Computer Associates, and others) and the open source crowd (Lucene, Drupal, and Eclipse, among others). I do want to call your attention to the story in ITWorld “XML Flaw Threatens Apps Built with Sun, Apache, Python Libraries”. Microsoft, no tyro in the world of online security, has suggested that it fears Linux and maybe other open source software as well. If this security story by Ellen Messmer is spot on, the open source crowd may have some explaining to do. If there are flaws, are there backdoors? What other surprises for commercial and governmental entities will open source spring on the unaware? What are the implications for XML and its wide use? Will venture firms become more cautious about funding open source plays with consulting as the main engine of financial growth? Worth a read for sure.

Stephen Arnold, August 6, 2009

Online Trail

August 6, 2009

Short honk: Here is an example of the type of online trail that one leaves. Note that the Google search history was obtained by fiddling with user name and password. The link may be removed. I verified it at 10 pm Eastern on August 6, 2009.

Stephen Arnold, August 6, 2009

Bing and Censorship

July 20, 2009

Short honk: A reader alerted me to the Bing.com filter that chops out certain content and creates a collection, in effect a mini vertical search engine, for the segmented content. The filter is now applied to X rated content. You can read about the filter in Network World’s story “Bing Gets Porn domain to Filter Out Explicit Images and Videos”. There are a number of complicated issues in play. The present solution creates an interesting revenue generating opportunity for Bing.com. Will Microsoft exploit it? I wonder how different this type of filtering is from the Amazon filtering of certain content.

Stephen Arnold, July 20, 2009

Profit in Data Theft

July 8, 2009

I read Dancho Danchev’s “Microsoft Study Debunks Profitability of the Underground Economy” here. I have not read the Microsoft study, but I want to recommend both documents to my addled geese and you, gentle reader. The arguments strike me as germane to online information and data. Mr. Danchev’s view is that there is money in underground endeavors. Check out the article. My question is, “If there weren’t money in underground plays, why are there so many efforts?”

Stephen Arnold, June 7, 2009

Performance Fireworks: Microsoft Fast Fizzles, Google Explodes

July 4, 2009

I was sitting in an airport, and I clicked on a link for Microsoft Fast ESP. A video ran and presented me with a couple of professional fellows talking about Microsoft Fast search. The video was interesting, but I went back and snagged one screen frame from the presentation because it struck me as a way to explain the distance between the performance of Microsoft Fast and the performance of Google’s system. Now performance data for search systems is a murky area. I don’t want to get into a squabble about something being five times faster. The difference here makes a point, and I will leave it to Googlers and Microsofties to post corrected performance data in the Comments section of this Web log, assuming those companies’ professionals have time to read the thoughts of the addled goose.

First, the Microsoft data. Here’s the screenshot, and I want you to notice that the performance that is presented is five to 20 queries per second. That is pretty modest for a performance threshold even for a Microsoft team in Charlotte, North Carolina, where I have heard the pace of life is on par with Harrod’s Creek.

[Image: fast performance]

Source: http://www.youtube.com/watch?v=kTbcCNby8xE

I ask you to click here to look at the performance data I calculated for Google. The key point is that if the Google data are reasonably accurate, the Google is cranking along at about 1,700 queries per second. Even Yahoo appears to perform better than Microsoft Fast. See my write up here.

That’s a big gap. Assume the Google data are off by a factor of four. The Google is then handling about 400 queries per second. If we boost the Microsoft Fast performance by a factor of four, from five to 20 queries per second to 20 to 80 queries per second, the Google still appears to be the speed demon.
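For readers who want to check the handicap arithmetic, here is a minimal Python sketch. It uses only the figures quoted in this post (about 1,700 queries per second for Google, five to 20 for Microsoft Fast, and a factor of four). The function and variable names are my own, invented for illustration, and the inputs are rough estimates, not measurements.

```python
# Figures quoted in the post; treat them as rough estimates, not benchmarks.
GOOGLE_QPS = 1700          # estimated Google queries per second
FAST_QPS_RANGE = (5, 20)   # Microsoft Fast queries per second, from the video
HANDICAP = 4               # factor applied against Google and in favor of Fast


def handicapped_gap(google_qps, fast_range, factor):
    """Cut the Google figure by `factor`, boost the Fast range by `factor`,
    and return the adjusted numbers plus the remaining Google-to-Fast ratios."""
    google_adjusted = google_qps / factor
    fast_low, fast_high = (q * factor for q in fast_range)
    # Smallest ratio compares against Fast's best case, largest against its worst.
    ratios = (google_adjusted / fast_high, google_adjusted / fast_low)
    return google_adjusted, (fast_low, fast_high), ratios


google_adjusted, fast_adjusted, ratios = handicapped_gap(
    GOOGLE_QPS, FAST_QPS_RANGE, HANDICAP
)
print(f"Google after handicap: {google_adjusted:.0f} queries per second")
print(f"Fast after boost: {fast_adjusted[0]} to {fast_adjusted[1]} queries per second")
print(f"Google still ahead by {ratios[0]:.1f}x to {ratios[1]:.1f}x")
```

Even with the deck stacked this way, the sketch shows the Google figure staying several times higher than the best boosted Fast number, provided the underlying estimates are in the right ballpark.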

If you want performance fireworks, my thought is that the Google is the fire cracker if the data are correct.

Stephen Arnold, July 4, 2009

Monitoring, Snooping, and Search

July 2, 2009

Every time I mention to an audience of information professionals the value of monitoring information flow, I see lots of rolling eyes and disgusted looks. Too bad. Snooping, monitoring, and search are fast friends. Don’t believe it? Click here and read “IT Staff Snooping on Colleagues on Rise: Survey”. Tarmo Virki summarized a number of data points. Among those that I found interesting was this factoid: One third of IT professionals abuse administrative passwords. The findings of the Cyber-Ark study are almost identical to a study run in 2008. The passage that jumped out at me was:

Cyber-Ark said the most common areas respondents indicated they access are HR records, followed by customer databases, M&A plans, layoff lists and lastly, marketing information.

Troubled? Concerned? More information appeared in the original article. Accurate? A spoof?

Stephen Arnold, June 14, 2009

How to Avoid Enterprise Social Network Sin

July 2, 2009

Network World’s “Seven Deadly Sins of Social Networking Security” reminded me of the assurances about the security of social networks for the enterprise. I did not believe their assurances, and after reviewing Bill Brenner’s article, I wonder how long it will be before the hyperbolists accept some grim realities. One of these is that where humans are involved, security is actually up in the air, maybe non existent.

Mr. Brenner wrote:

By sharing too much about your employer’s intellectual property, you threaten to put it out of business by tipping off a competitor who could then find a way to duplicate the effort or find a way to spoil what they can’t have by hiring a hacker to penetrate the network or by sneaking a spy into the building.

Yep, humans. His two page article runs through a number of actions that individuals can take to button up security loopholes.

My take: social networks in the enterprise can create some exciting situations. He does not dig into the legal and life threatening issues, preferring the more tame world of legal liability. Not me. I think that social networks can create a world of excitement for pharma companies and intelligence professionals. I don’t have an answer. The 20 somethings just point out that I am an old addled goose and the vulnerabilities multiply like gerbils.

The notion of real time search of posted social comments fresh from Intranets is quite interesting, however.

Stephen Arnold, July 1, 2009

SEO Mavens Embarrassing Themselves

June 29, 2009

I am not sure if search engine optimization is as fraught with risks as hooking up with Nigerian email scammers, but SEO may be getting close. I am not sure what business www.absoluteSEO.net is in, but the addled goose plans to steer clear. Two reasons:

First, navigate (at your own risk, please) to Prudent Press Agency (great name that). Read the story “Addition of Advanced SEO Services in AbsoluteSEO.net”. The article was stuffed with silly generalizations and claims that struck this addled goose as wacky. But, hey, that is what makes SEO such a tasty sector for those with a good nose for an easy buck, euro, or eek. Consider this passage:

In AbsoluteSeo there has been made addition of advanced SEO Services to beat up the competitors. In this era of stiff competition every big or small company wish to have a website of its own which helps in boosting up the business. But only having the website does not solve the purpose, but it need to be perfect in every aspect as only then it will top in the search engines and consumer will be able to reach the site easily. For this, AbsoluteSeo has introduced the latest SEO Services which will help to optimize the website in each and every way.

Well reasoned for some but not the addled goose.

Second, the browser I am using flagged www.AbsoluteSEO.com as a reported attack site. Here’s the message I saw when I poked around this online offering:

[Image: attack]

I received a call from a journalist working on an SEO story. I mentioned that I thought SEO was mostly baloney sold to those who could not create substantive content or who lacked the insight needed to provide surfers with useful services. He thought I was in the minority because some of the high profile “search experts” were on board with various methods, statistical tools, and proprietary techniques.

Baloney. In this goose’s opinion, as the economy declines, the cream of the scammers rises like the nasties in the Harrod’s Creek mine run off pond.

Stephen Arnold, June 29, 2009

Library Teaches Search – More Instruction Needed

June 22, 2009

My recollection is that libraries taught search as far back as 1980. I recall that either database vendors would run demonstrations or that librarians skilled in the use of online would provide guidance to those who asked. I recall running a class in ABI/INFORM at Chicago Public Library and there was an overflow crowd of both staff and research minded patrons. I was delighted, therefore, to see an article in the Sacramento Bee that described the Sutter Library’s classes in finding health and medical information online. The class is a reminder to me that:

  1. Librarians and information professionals often know how to search and have an interest in sharing that knowledge
  2. Patrons are smart enough to know that, despite the marketing hype and the pundits’ assertions that search is a “done deal”, additional instruction attracts people and finds its way into The Sacramento Bee

We have a long way to go before information professionals will be relics of a long gone time. The people who tell me that they “know how to search” and “can locate almost anything online” are kidding themselves. I think I am a reasonably good researcher. But if you spend time monitoring how I find information, you will learn quickly that I turn to experts who make my search skills look primitive. Even my nifty Overflight system pales in comparison with the type of information that my research team generates by:

  • Knowing what content is located where
  • Understanding the editorial method behind or absent from certain online systems
  • Leveraging hard-to-manipulate resources such as information from government repositories, specialized services, and individual experts.

I would like to see more libraries move aggressively into online instruction, market those programs, and raise the level of expertise. Most of the people who claim to be experts at search are clueless about how bad their skills are. Among the worst offenders are self appointed search experts who have trouble figuring out when something is likely to be baloney and when something is just plain wrong. Enterprise search, content management, and text mining are three disciplines where better research will be most beneficial in my opinion. Then we need critical thinking skills. Schools have dropped the ball. Maybe libraries can help in this area as well? Search procurement teams will be well served if the team has one or more librarians in the huddle.

Stephen Arnold, June 22, 2009

Wolfram Alpha Update: Not a Search Engine

June 10, 2009

The world wants a company to compete with Google. I survived the Cuil.com hype. Then I endured the Wolfram Alpha début. Then a drop off. Wolfram Alpha owes Wired Magazine a hug or maybe a double truck color ad. Ryan Singel’s “Wolfram Adds Updates, Still Not a Search Engine” here helps keep the company in the search game. Oops, Mr. Singel pointed out that Wolfram Alpha is not a search engine. He wrote:

In this first update, the engine is adding some new data sets, changing how it handles comparisons between units that can’t be compared, and tweaking how it handles what users type into the search box. It’s also fixing some touchy issues about geography, including the borders of India and China and “naming for certain politically sensitive countries and regions.” For certain queries, it’s unbeatable. Try for instance, “tides Santa Cruz tomorrow” or “odds three aces” or “weather san francisco march 14 2008.”

Wisely Mr. Singel included some sample queries. I have spoken with a number of people about the Wolfram Alpha search system. One theme was that queries returned no useful data or a cute response. The naked search box and formulating a query that makes use of the Mathematica engine may be at odds with what the folks with whom I spoke want in a finding system.

My hunch is that Web search systems have to deliver the goods to those who sit smack in the middle of a Gaussian distribution. A bit of a drift right may not be too harmful, but aiming at the outliers is not going to make Jim and Jerry Normal happy with their search results.

Several observations:

  1. Bing.com may be closer to what the folks splat in the middle of a normal probability density function require. Wolfram Alpha is too abstract for that crowd. The notion of the central limit theorem has to be delivered to the user no matter what wackiness is typed in the search box.
  2. Wolfram Alpha requires the user to think. Google, on the other hand, does not. Which of the two systems has more math is a topic for a bar argument. In terms of market share, the GOOG’s approach wins hands down.
  3. The sadness of those who must explain that Web search engines intended to nibble at Google’s scales and claws is becoming more bittersweet. Those writing about Web search may want to include anigifs of tears running down the page because contenders can’t last one full round of mixed martial arts against the Google.

The good news is that Wolfram Alpha was the subject of a follow up story in Wired. Poor Cuil.com seems to have dropped off the edge of the “wired” earth. Kosmix? Where’s Kosmix? ChaCha? What happened to ChaCha?

Stephen Arnold, June 10, 2009
