Monitoring, Snooping, and Search

July 2, 2009

Every time I mention to an audience of information professionals the value of monitoring information flow, I see lots of rolling eyes and disgusted looks. Too bad. Snooping, monitoring, and search are fast friends. Don’t believe it? Click here and read “IT Staff Snooping on Colleagues on Rise: Survey”. Tarmo Virki summarized a number of data points. Among those that I found interesting was this factoid: one third of IT professionals abuse administrative passwords. The findings of the Cyber-Ark study are almost identical to a study run in 2008. The passage that jumped out at me was:

Cyber-Ark said the most common areas respondents indicated they access are HR records, followed by customer databases, M&A plans, layoff lists and lastly, marketing information.

Troubled? Concerned? More information appeared in the original article. Accurate? A spoof?

Stephen Arnold, June 14, 2009

How to Avoid Enterprise Social Network Sin

July 2, 2009

Network World’s “Seven Deadly Sins of Social Networking Security” reminded me of the assurances about the security of social networks for the enterprise. I did not believe those assurances, and after reviewing Bill Brenner’s article, I wonder how long it will be before the hyperbolists accept some grim realities. One of these is that where humans are involved, security is up in the air, maybe nonexistent.

Mr. Brenner wrote:

By sharing too much about your employer’s intellectual property, you threaten to put it out of business by tipping off a competitor who could then find a way to duplicate the effort or find a way to spoil what they can’t have by hiring a hacker to penetrate the network or by sneaking a spy into the building.

Yep, humans. His two page article runs through a number of actions that individuals can take to button up security loopholes.

My take: social networks in the enterprise can create some exciting situations. He does not dig into the life threatening issues, preferring the tamer world of legal liability. Not me. I think that social networks can create a world of excitement for pharma companies and intelligence professionals. I don’t have an answer. The 20 somethings just point out that I am an old addled goose and that the vulnerabilities multiply like gerbils.

The notion of real time search of posted social comments fresh from Intranets is quite interesting, however.

Stephen Arnold, July 1, 2009

SEO Mavens Embarrassing Themselves

June 29, 2009

I am not sure if search engine optimization is as fraught with risks as hooking up with Nigerian email scammers, but SEO may be getting close. I am not sure what business www.absoluteSEO.net is in, but the addled goose plans to steer clear. Two reasons:

First, navigate (at your own risk, please) to Prudent Press Agency (great name that). Read the story “Addition of Advanced SEO Services in AbsoluteSEO.net”. The article was stuffed with silly generalizations and claims that struck this addled goose as wacky. But, hey, that is what makes SEO such a tasty sector for those with a good nose for an easy buck, euro, or eek. Consider this passage:

In AbsoluteSeo there has been made addition of advanced SEO Services to beat up the competitors. In this era of stiff competition every big or small company wish to have a website of its own which helps in boosting up the business. But only having the website does not solve the purpose, but it need to be perfect in every aspect as only then it will top in the search engines and consumer will be able to reach the site easily. For this, AbsoluteSeo has introduced the latest SEO Services which will help to optimize the website in each and every way.

Well reasoned for some but not the addled goose.

Second, the browser I am using flagged www.AbsoluteSEO.com as a reported attack site. Here’s the message I saw when I poked around this online offering:

[Screenshot: the browser’s reported attack site warning]

I received a call from a journalist working on an SEO story. I mentioned that I thought SEO was mostly baloney sold to those who could not create substantive content or who lacked the insight needed to provide surfers with useful services. He thought I was in the minority because some of the high profile “search experts” were on board with various methods, statistical tools, and proprietary techniques.

Baloney. In this goose’s opinion, as the economy declines, the cream of the scammers rises like the nasties in the Harrod’s Creek mine run off pond.

Stephen Arnold, June 29, 2009

Library Teaches Search – More Instruction Needed

June 22, 2009

My recollection is that libraries taught search as far back as 1980. I recall that either database vendors would run demonstrations or that librarians skilled in the use of online would provide guidance to those who asked. I recall running a class in ABI/INFORM at Chicago Public Library and there was an overflow crowd of both staff and research minded patrons. I was delighted, therefore, to see an article in the Sacramento Bee that described the Sutter Library’s classes in finding health and medical information online. The class is a reminder to me that:

  1. Librarians and information professionals often know how to search and have an interest in sharing that knowledge
  2. Patrons are smart enough to know that despite the marketing hype and the pundits’ assertions that search is a “done deal,” additional instruction attracts people and finds its way into The Sacramento Bee

We have a long way to go before information professionals will be relics of a long gone time. The people who tell me that they “know how to search” and “can locate almost anything online” are kidding themselves. I think I am a reasonably good researcher. But if you spend time monitoring how I find information, you will learn quickly that I turn to experts who make my search skills look primitive. Even my nifty Overflight system pales beside the type of information that my research team generates by:

  • Knowing what content is located where
  • Understanding the editorial method behind or absent from certain online systems
  • Leveraging hard-to-manipulate resources such as information from government repositories, specialized services, and individual experts.

I would like to see more libraries move aggressively into online instruction, market those programs, and raise the level of expertise. Most of the people who claim to be experts at search are clueless about how bad their skills are. Among the worst offenders are self appointed search experts who have trouble figuring out when something is likely to be baloney and when something is just plain wrong. Enterprise search, content management, and text mining are three disciplines where better research will be most beneficial in my opinion. Then we need critical thinking skills. Schools have dropped the ball. Maybe libraries can help in this area as well? Search procurement teams will be well served if the team has one or more librarians in the huddle.

Stephen Arnold, June 22, 2009

Wolfram Alpha Update: Not a Search Engine

June 10, 2009

The world wants a company to compete with Google. I survived the Cuil.com hype. Then I endured the Wolfram Alpha début. Then a drop off. Wolfram Alpha owes Wired Magazine a hug or maybe a double truck color ad. Ryan Singel’s “Wolfram Adds Updates, Still Not a Search Engine” appeared here to keep the company in the search game. Oops, Mr. Singel pointed out that Wolfram Alpha is not a search engine. He wrote:

In this first update, the engine is adding some new data sets, changing how it handles comparisons between units that can’t be compared, and tweaking how it handles what users type into the search box. It’s also fixing some touchy issues about geography, including the borders of India and China and “naming for certain politically sensitive countries and regions.” For certain queries, it’s unbeatable. Try for instance, “tides Santa Cruz tomorrow” or “odds three aces” or “weather san francisco march 14 2008.”

Wisely Mr. Singel included some sample queries. I have spoken with a number of people about the Wolfram Alpha search system. One theme was that queries returned no useful data or a cute response. The naked search box and formulating a query that makes use of the Mathematica engine may be at odds with what the folks with whom I spoke want in a finding system.
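To see why a computational engine differs from a document retrieval system, consider one plausible reading of the “odds three aces” query from the quoted passage: the probability of drawing exactly three aces in a five-card poker hand. This sketch is my own arithmetic, not Wolfram Alpha’s actual interpretation of that query.

```python
from math import comb

# Probability of exactly three aces in a five-card hand from a 52-card deck.
# One plausible reading of the "odds three aces" query; the engine may
# interpret the phrase differently.
favorable = comb(4, 3) * comb(48, 2)   # choose 3 of the 4 aces, 2 of the 48 non-aces
total = comb(52, 5)                    # all five-card hands
p = favorable / total
print(f"{p:.6f}")  # about 0.001736, roughly 1 in 576
```

The point is that the answer is computed from first principles, not retrieved from an indexed document, which is exactly why the system resists comparison with a conventional Web search engine.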

My hunch is that Web search systems have to deliver the goods to those who sit smack in the middle of a Gaussian distribution. A bit of a drift right may not be too harmful, but aiming at the outliers is not going to make Jim and Jerry Normal happy with their search results.

Several observations:

  1. Bing.com may be closer to what the folks splat in the middle of a normal probability density function require. Wolfram Alpha is too abstract for that crowd. The notion of the central limit theorem has to be delivered to the user no matter what wackiness is typed in the search box.
  2. Wolfram Alpha requires the user to think. Google, on the other hand, does not. Which of the two systems has more math is a topic for a bar argument. In terms of market share, the GOOG’s approach wins hands down.
  3. The sadness of those who must explain why Web search engines intended to nibble at Google’s scales and claws have come up short is becoming more bittersweet. Those writing about Web search may want to include anigifs of tears running down the page because contenders can’t last one full round of mixed martial arts against the Google.

The good news is that Wolfram Alpha was the subject of a follow up story in Wired. Poor Cuil.com seems to have dropped off the edge of the “wired” earth. Kosmix? Where’s Kosmix? ChaCha? What happened to ChaCha?

Stephen Arnold, June 10, 2009

Google Search Appliance Gains Muscle

June 2, 2009

Update: June 3, 2009: Version 6.0 of the GSA software includes a SharePoint Web part.

Original story below:

If you looked at the line up of the Google Search Appliances on offer in February 2009, you probably noticed that the pricing discouraged organizations from indexing more than 30 million documents per appliance. To scale the system with the GB 7007s and GB 8008s cost millions.

Version 6.0 and a new GB 9009 were announced today. You can read Google’s own write up here. You can download a data sheet here. The features fall under the banner of universal search, but you will need a cheerful authorized reseller or partner to get the most from your GSA.

You can get some other information from several IDG publications.

Google today revealed that it has created a GSA on steroids to handle larger indexing jobs. The GB 8008 is no more. The new model is the GB 9009, and it is built on Dell’s PowerEdge R710 platform. Google is not into customer support, so the Dell crowd gets the honor of explaining what to do when a GB 9009 goes south.

The system consists of two components: one for content processing and one for storing the index. Until ArnoldIT.com can get up close and personal with one of these two part set ups, it is not clear what indexing and query processing changes may be necessary.


A PowerEdge in gray, wondering if it will be Googley.
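The two-part arrangement described above can be sketched as a minimal pipeline: one stage processes content into an inverted index, the other serves queries against the stored index. This is purely illustrative; the function names and logic are my own and reflect nothing about Google’s actual implementation.

```python
from collections import defaultdict

def process_content(doc_id, text, index):
    # Stage 1: content processing -- tokenize, normalize, and index.
    for token in text.lower().split():
        index[token].add(doc_id)

def query(index, term):
    # Stage 2: serve queries against the stored index.
    return sorted(index[term.lower()])

# Toy corpus standing in for crawled enterprise documents.
index = defaultdict(set)
process_content(1, "Google Search Appliance gains muscle", index)
process_content(2, "Dell PowerEdge R710 hosts the appliance", index)

print(query(index, "appliance"))  # [1, 2]
print(query(index, "Dell"))       # [2]
```

Splitting the two stages onto separate boxes lets indexing throughput and query capacity scale independently, which is presumably the appeal of the two-component design.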

We do know that Version 6.0 of the GSA software won’t run on low end models or the older GB 8008s. This seems to suggest that an organization cannot mix and match the GB 7007s and the GB 8008s. If you haven’t been keeping up with the GSA software, Version 6.x gives system administrators more control over security, customization, and hit boosting. In the older versions of the software, Google decided its relevancy was exactly the relevancy the licensee needed. Period. There were some clunky Fast Search & Transfer type workarounds, but Version 6.0 makes the system’s controls a bit more flexible.


The PowerEdge R710 gussied up for the enterprise prom.

Autonomy, Endeca, and other companies were previously able to point out that Google’s enterprise solution was less configurable than other high end systems. That’s still true, but to a lesser degree. Keep in mind that the GSA is not a box of components that can be assembled like Legos. An appliance is designed to eliminate the expensive and time consuming set up, tuning, and customizing that some high end systems permit. The GB 9009 is a search toaster, bigger, faster, and more capable, but still a toaster.

Google’s distribution channel will be selling the two part set up in the morning. I don’t want to estimate the cost of the GB 9009. Google has a fuzzy wuzzy approach to some pricing, and it is better to wait until the authorized resellers close some deals for the gizmos and the “street price” becomes clearer. My hunch is that the Dell gear will up the cost. With GB 8008s coming out of the blocks at $659,000 for a 15 million document GB 8008 with two years of support and about $300,000 for a fully supported hot back up GB 8008, the GB 9009 will be in the same ballpark.

What’s interesting to me is that these prices convert to about what a fully loaded enterprise search system license with customization can cost from one of the blue chip search vendors. Expensive to perform search, isn’t it? I wonder why the actual cost of industrial strength search is not included in the reports from the azure chip consulting firms or those who witlessly insist that “search is simple. Yes, a no brainer.”

I look for another upgrade early in 2010. At that time, the blue chip vendors will have to start sweating the fact that Googzilla is finally getting serious about the enterprise search market. One indication of the shift is that the GB 1001 is a goner and that the new software won’t work with even numbered GSAs.

Stephen Arnold, June 2, 2009

Search Archaeology

May 30, 2009

I find it amusing to look at articles about search, content processing and text mining. Perhaps I am tired or just confused. The past to me stretches back to cards with holes and wire rods and to the original NASA RECON system. For Computer Active, the past stretches all the way back to Lycos. You may find this revisionist view of history interesting. Click here to read “Bunch of Fives: Forgotten Search Engines.”

Let me comment on the five search engines, adding a bit of addled goose color to the authors’ view of search:

  • Cuil.com. Cuil is a product of a Googler (Anna Patterson), her husband, and some other wizards. The company had a connection to Google. Dr. Patterson’s patents are still stumbling out of the USPTO with Google as an assignee. Xift, Dr. Patterson’s search system, was not mentioned in Computer Active. It was important for its semantic method and it exposed Dr. Patterson to the Alta Vista team. Alta Vista played some role in Google’s rise to success and its current plumbing. Cuil has improved, and I thought I saw a result set including some Google content before the system became publicly available. I use Cuil.com, and I am not sure if “forgotten” is a good word for it or its technology.
  • MSN Live. I have lost count of Microsoft’s search systems. Microsoft search initiatives have moved through many iterations. The important point for me is that Microsoft is persistent. The search technology is an amalgamation of home grown, licensed, purchased, and reworked components. The search journey for Microsoft is not yet over. Bing is a demo. The rebuild of Fast as a SharePoint product is now in demo stage but not yet free of its Web and Linux roots. More to come on this front and, believe me, Microsoft search is not forgotten by Google or others in the search business.
  • Alta Vista. Yep, big deal. The reason is that Alta Vista provided the Googlers with a pool of experienced and motivated talent. The job switch from the hopelessly confused Hewlett Packard to the freewheeling Google was an easy one. Alta Vista persists today, and I still use the service for certain types of queries. What’s interesting is that Alta Vista may have been one of the greatest influences on both Google and Microsoft. Again. Not forgotten.
  • Lycos. We sold our Point system to Lycos, so I have some insight into that company’s system. The key point for me is that Fuzzy and his fellow band of coders from Carnegie Mellon sparked the interest in more timely and comprehensive Web search. Lycos was important as a sparkplug, but the company was among the first to add some important index update features and expanded snippets for each hit. Lycos has had a number of owners, but I won’t forget it. When we sold Point to the outfit, the check cleared the bank. That I will remember along with the fact that architectural issues hobbled the system just as the Excite Architext system was slowed. These are search as portal examples today.
  • Ask Jeeves. I can’t forget. One of the first Ask Jeeves execs used to work at Ziff. I followed the company’s efforts to create query templates that allowed the system to recognize a question and then deliver an answer. The company was among the first to bill this approach “natural language” but it wasn’t. Ask Jeeves was a look up service and it relied on humans to find answers to certain questions. Ask.com is the descendant of Ask Jeeves’ clunky technology, but the system today is supported by ace entrepreneur Barry Diller who, like Steve Ballmer, is persistent. The key point about Ask Jeeves is that it marketed old technology with a flashy and misleading buzzword “natural language”. Marketers of search systems today practice this type of misnaming as a standard practice. Who can forget this when a system is described one way and then operates quite another way?

Enjoy revisionism. Much easier in a Twitter- and Facebook-centric world with a swelling bulge of under 40 experts, mavens, and pundits. These systems failed in some ways and succeeded in others. I remember each. I still use each, just not frequently.

Stephen Arnold, May 31, 2009

Social Search and Security

May 27, 2009

Might these terms comprise an oxymoron? Some organizations are plunging forward with social networking, social search, and open collaboration. You may find Vanessa Ho’s “Risks Associated with Web 2.0” here to be a useful article. She summarizes the results of a study by an outfit called Dynamic Markets conducted for WebSense. With a sample of 1,300 business professionals, the report contained some interesting information. This statement from the article struck a chord in me:

“The thing about the web is once it is out there, it is out there [forever],” Meizlik [a WebSense executive] noted. Other findings of the survey include 80 per cent of respondents reported feeling confident in their organizations’ web security, despite the fact that the numbers show that they are ill-equipped to protect against Web 2.0 security threats. For example, 68 per cent do not have real-time analysis of web content; 59 per cent cannot prevent URL re-directs; 53 per cent do not have security solutions that stop spyware from sending information to bots; 52 per cent do not have solutions to detect embedded malicious code on trusted websites; and 45 per cent do not have data loss prevention technology to stop company-confidential information from being uploaded to sites like blogs and wikis, hosted on unauthorized cloud computing sites or leaked as a result of spyware and phishing attacks.

I learned from a chirpy 30 year old conference manager last week that security is not an issue of interest to that conference attendee audience. Yep, those 30 somethings set me straight again.

Stephen Arnold, May 28, 2009

Useful SQL Injection Info

May 27, 2009

At Los Alamos National Lab several years ago, a fellow speaker for an in-house conference gave a brilliant analysis of SQL injection. The talk was not made public. I came across a Bitpipe white paper from Breach. I have a Bitpipe user name and password, so locating the document was no problem. If you don’t have access to Bitpipe, click here, fill out the form, and download the seven page document. Useful.
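For readers who have not seen the attack in action, the core of SQL injection is string splicing: attacker-supplied text becomes part of the SQL statement itself. This toy sketch (my own illustration, not material from the white paper) shows the vulnerable pattern and the parameterized fix side by side.

```python
import sqlite3

# In-memory database with a toy users table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'staff')")

def find_user_unsafe(name):
    # Vulnerable: the attacker's text is spliced directly into the SQL string.
    query = "SELECT name, role FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # Safe: the driver binds the value; it is never parsed as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # the classic payload dumps every row
print(find_user_safe(payload))    # the bound parameter matches nothing
```

The parameterized version is the standard defense: the database receives the query shape and the value separately, so no input can rewrite the statement.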

Stephen Arnold, May 27, 2009

EDI Data Transformation

May 27, 2009

Most of the mavens and pundits write about a handful of search vendors. Not me. I grub around in the dark and often very important corners of search. If you have to transform data for EDI in an XML environment, you will find Alex Woodie’s “MegaXML Looks to Drive Expense Out of EDI” here useful. Mr. Woodie describes a new product, which if it works as described can eliminate some sleepless nights and a long weekend or two. The article describes Task Performance Group’s MegaXML utility. For me, the key passage in the article was:

Task Performance Group launched MegaXML a decade ago to take advantage of the flexibility of XML. On the front end, the Windows-based product can generate and send EDI documents, such as purchase orders and invoices, over VANs or the Internet using protocols like AS2. And on the backend, MegaXML can translate EDI documents to the format needed for specific platforms, such as flat files for AS/400-based ERP systems on DB2.

MegaXML has a hybrid or semi-cloud option that may be worth investigating. Mr. Woodie wrote:

With the outsourcing option, MegaXML will reside on a Windows server in Task Performance Group’s data center near Chicago. After mapping the EDI documents to the customer’s systems (a process that takes a few days), the customer will upload and download documents to the MegaXML data center using Secure FTP (S/FTP). MegaXML, in turn, will handle the translation to EDI formats and the distribution via AS2 or another method.

Data transformation consumes a significant portion of an information technology group’s time and budget. MegaXML may be a partial solution in some situations. More information is available at www.megaXML.com.
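The kind of EDI-to-XML translation the article describes can be sketched in a few lines. The segment names, delimiters, and XML tags below are illustrative assumptions in the style of X12, not MegaXML’s actual formats or mappings.

```python
import xml.etree.ElementTree as ET

def edi_to_xml(edi_text):
    """Translate a tiny slice of an X12-style purchase order into XML.

    Segments end with '~'; elements are separated by '*'. Segment and tag
    names are illustrative, not MegaXML's actual mapping.
    """
    root = ET.Element("PurchaseOrder")
    for segment in filter(None, edi_text.strip().split("~")):
        elements = segment.split("*")
        tag, values = elements[0], elements[1:]
        if tag == "BEG":  # beginning segment: purpose, type, PO number, date
            root.set("number", values[2])
            root.set("date", values[3])
        elif tag == "PO1":  # line item: line no., quantity, unit, unit price
            item = ET.SubElement(root, "Item")
            ET.SubElement(item, "Quantity").text = values[1]
            ET.SubElement(item, "UnitPrice").text = values[3]
    return ET.tostring(root, encoding="unicode")

sample = "BEG*00*NE*4501234*20090527~PO1*1*10*EA*9.95~"
print(edi_to_xml(sample))
```

Real EDI translation adds trading-partner-specific maps, acknowledgments, and validation, which is presumably where a product like MegaXML earns its keep; the sketch only shows why XML is a convenient intermediate format.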

Stephen Arnold, May 27, 2009
