Cazoodle: Semantic Search
April 3, 2009
A happy quack to the reader who sent me a link to Euwyn’s “Cazoodle – Semantic Data-aware Search” here. Developed by Chambana wizards, Cazoodle “looks to create semantic data-aware search for various verticals, starting with apartments, events, and shopping (electronics, for the most part).” Euwyn makes clear that Cazoodle is a vertical search engine; that is, the content focuses on a specific topic such as apartments. Cazoodle said:
[It is] a startup company from the University of Illinois at Urbana-Champaign (UIUC), aims to enable “data-aware” search– to access the vast amount of structured information beyond the reach of current search engines. The company is co-founded by Prof. Kevin C. Chang and his research team of graduate and undergraduate students, with the support of the University and technology transfer from the MetaQuerier research at UIUC. Cazoodle is located at EnterpriseWorks, an incubator facility of the University, on the Research Park of UIUC in Champaign, Illinois.
The company seems to be going in the same direction as Classifieds.com, a Web start up that I found quite interesting. Cazoodle delivers a “semantic data-aware search.” I ran a query for an apartment in Urbana, where I worked on my PhD many years ago. The Cazoodle results looked like this:
The service looks interesting, demonstrating that dataspaces can be useful. I detected a few Google influences as well. Click here to try the beta search.
Stephen Arnold, April 3, 2009
Amazon Embraces Hadoop
April 2, 2009
The fleet-footed Amazon surprised me. I read Larry Dignan’s “Amazon Launches Hadoop Data Crunching Service” here. What interested me was Amazon’s use of the Hadoop framework. According to Mr. Dignan’s write up,
The service, called Amazon Elastic MapReduce, is designed for businesses, researchers and analysts trying to conduct data intensive number crunching (statement). Hadoop, which is used by companies like Google and Yahoo, is trying to be pushed into the enterprise data center by startups like Cloudera.
I found this interesting for three reasons:
- Amazon has consistently beaten Google to the punch in the cloud computing push for developers and startups. Google has, in my opinion, watched from the sidelines.
- Google influenced Hadoop, an open source framework modeled on the Google MapReduce system. You can find a description in my book The Google Legacy (2005) here.
- Amazon, despite its early somewhat unusual approach to infrastructure, has gotten its act together. The clearest indication of this is that the company can integrate new technology into its existing data centers and not go down.
In my view, Amazon is making the transition from digital retail operation to a more serious online force.
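For readers who have not bumped into the MapReduce idea behind Hadoop, here is a minimal sketch in plain Python. It is not Hadoop code and not Amazon's Elastic MapReduce interface; the function names (map_phase, shuffle, reduce_phase, word_count) are my own assumptions for illustration. It shows the three steps the model relies on: emit key-value pairs, group them by key, and collapse each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in one process; a real cluster spreads them over machines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input lines, invented for this example.
    print(word_count(["amazon embraces hadoop", "hadoop follows mapreduce"]))
```

The appeal of the model is that map_phase and reduce_phase touch one line or one key at a time, so a framework can run many copies in parallel without the functions knowing about each other.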
Stephen Arnold, April 2, 2009
Google Leximo Tie Up
April 2, 2009
Leximo is a social dictionary; specifically, “a Multilingual User Collaborated Dictionary that lets you search, discover and share your words with the World.” Google snapped up the company. You can read the Leximo manifesto here. One of the tenets is:
Open community-based and user-friendly functions promote participation, accountability and trust.
What’s Google need a dictionary for? In my opinion, the GOOG wants a flow of new words plus definitions to fatten up its existing knowledgebases. I am confident the idealism of Leximo will persist at the GOOG.
Stephen Arnold, April 2, 2009
Digital Video Delivery Cost
April 2, 2009
Short honk: ZDNet Blogs ran a short item called “$400 Mln Spent on Delivering Video via CDNs in 2008” here. Note: the link to the story is longer than the news item. So, $400 million spent hosing digital video. Question: who pays for this stuff? Not me. I prefer text which allows me to acquire information quickly, not at a fixed speed in linear streams. Will Google continue to subsidize YouTube.com? Stakeholders may want some of that money returned as dividends or invested in services that return a profit. Just my opinion and I am not a video person.
Stephen Arnold, April 2, 2009
QuePlix: Legacy Data Search
April 2, 2009
Several years ago I listened to a presentation from Index Engines. The company developed an appliance that sat in a back up stream. The idea was that an authorized user could search for a document processed by the back up system. I thought the idea was an interesting one. A number of eDiscovery firms address the legacy data issue via other methods. Today the organization wanting to query legacy information has a number of options.
QuePlix offers a search system for legacy data. Troy Dreir’s “QueSearch: A Search Engine for your Legacy Data” here alerted me to another vendor in this market space. Mr. Dreir wrote:
QuePlix has just released the second of its platform-agnostic programs which are each designed to retrieve information from legacy applications. The first solution was QueWeb, which not only extracts legacy application metadata, but then builds a user interface on top of it. The allows the company to make a transition toward new applications while still using the data from legacy apps. Because it’s based on existing systems, there’s no need to train staff on how to use it and that allows for a smoother migration. The program’s simplicity and usefulness translates into a huge ROI, Tenberg says. QueWeb was launched in 2001 and is already up to its third version.
I did have some information in my files about this company. The key points I had noted when I got a demo in 2007 included:
- The company is a Google partner so there’s an integration capability available to its customers
- Customers can use QuePlix’s cloud option and shift some of the hassles to hosted services such as Amazon’s S3
- A white paper provides more detail. You can get it here.
More information as I locate it.
Stephen Arnold, April 2, 2009
Traditional Journalism Is Dead, Well, Not Exactly
April 1, 2009
Short take: the Huffington Post has a way to keep investigative journalism alive. I hope so. Since Gannett bought the Courier Journal & Louisville Times Co. in 1986, the investigative and the journalism have disappeared from the newspaper. Click here to read “Huffington Post Launches Investigative Journalism Venture.” I think this warrants close observation. Great idea.
Stephen Arnold, March 31, 2009
Cutting Information Technology Costs
April 1, 2009
I read CIO Magazine’s “Five Things You Need to Know: Budget Cutting” here and realized that there are journalists and there are more sophisticated financial types. Before you read the article, click here and learn how Qantas slashed its information technology costs.
Now you can romp through the tips and come away vulnerable to the machinations of a cost analyst. These tips provide a sense of false security which may give way to some even bigger organizational realignments in certain situations. One realignment may be suggesting the person who followed these tips find his or her future elsewhere. In short, there’s more to cost cutting than the tips suggest.
I don’t feel comfortable parroting the five tips. I want to highlight one and offer a few comments. Let’s look at item three:
Break down exactly how you spend your budget. Start by identifying your expenses. A CIO might know she has employees with BlackBerrys, but how much is each person spending on their phone bills? With more specifics, you can set more accurate goals for saving.
Sounds pretty safe, right? The problem is that analysis of a segment of a budget means that the numbers reflect the person’s understanding of costs. Most information technology managers are clueless about the secondary costs triggered by routine information technology actions.
Let’s look at one example. A marketing team must produce a proposal. The marketers named Trent and Wendy need to output the document in two forms: PDF and to an ftp server. The client wants to get ftp access on a Monday and receive the FedEx package with the six copies of the proposal no later than Tuesday, before 5 pm. The Trent and Wendy duo understand the PDF part, but neither knows how to move files to the organization’s ftp server. Trent calls the information technology department and asks for help. The IT person says, “We’ll be there before 2 pm.” The IT person does not arrive. At 5 pm, Wendy calls. The phone rings in space. Trent and Wendy call a consultant, explain the problem, agree to a fee for the ftp part of the job, and complete their work. The cost for the ad hoc consultant is lost in the organization’s budget. The IT manager and the CIO are clueless about the cost their unit created.
Sound familiar?
The problem with the recommendations is that the actions will not return a comprehensive picture. A true cost analysis will surface dependent costs and indirect costs, not just the obvious direct costs. Once tallied, these costs can be tracked back to the root cause of the over or under run.
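To make the point concrete, here is a rough sketch in Python of a ledger that tags each cost with its kind and its root cause. Everything in it is an assumption for illustration: the Cost record, the two helper functions, and the dollar figures are invented, not drawn from the CIO Magazine article or any real budget.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cost:
    description: str
    amount: float    # dollars; every figure used with this sketch is invented
    kind: str        # "direct", "dependent", or "indirect"
    root_cause: str  # the unit or event that triggered the cost


def tally_by_root_cause(costs):
    """Sum costs per root cause, split by kind."""
    totals = defaultdict(lambda: defaultdict(float))
    for cost in costs:
        totals[cost.root_cause][cost.kind] += cost.amount
    return {cause: dict(kinds) for cause, kinds in totals.items()}


def hidden_share(costs, root_cause):
    """Fraction of a root cause's total that a direct-costs-only review would miss."""
    relevant = [c for c in costs if c.root_cause == root_cause]
    total = sum(c.amount for c in relevant)
    hidden = sum(c.amount for c in relevant if c.kind != "direct")
    return hidden / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical ledger for the Trent and Wendy episode.
    ledger = [
        Cost("IT staff time logged to the ticket", 500.0, "direct", "missed IT call"),
        Cost("Ad hoc ftp consultant", 300.0, "indirect", "missed IT call"),
        Cost("Marketing overtime while waiting", 200.0, "dependent", "missed IT call"),
    ]
    print(tally_by_root_cause(ledger))
    print(hidden_share(ledger, "missed IT call"))
```

With the invented figures above, half of what the missed call cost the organization never shows up in the IT unit's own numbers.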
In my experience, that’s too much work. As I learn about companies going out of business, the casual approach to cost analysis bites back and bites hard. Read the five tips. Track down a cost analyst and enlist his or her help. The present financial climate does not look kindly on those involved in projects such as search. It’s easy to blow through $500,000 in a matter of three or four months and have a non functioning system. Undisciplined thinking about enterprise systems and their costs can crush a career and an organization. The five tips omit that point.
Stephen Arnold, April 1, 2009
Passwords List
April 1, 2009
Short honk: you can get three lists of common passwords here. These lists often come in handy when filtering government information prior to putting documents online. ArnoldIT.com has used this method for years. If you are indexing an organization’s documents, you might want to filter test your corpus. Might be helpful.
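A filter test can be quite small. The sketch below is an assumption about how such a check might work, not a description of the ArnoldIT.com method: it loads a password list, tokenizes each document, and reports the documents that contain a listed string. The function names, the token pattern, and the six character minimum are all my own choices.

```python
import re

# Letters, digits, and a few symbols that often appear in passwords.
TOKEN = re.compile(r"[A-Za-z0-9!@#$%^&*_+-]+")


def load_passwords(lines, min_length=6):
    """Build a lowercase lookup set, skipping very short entries that match too much."""
    cleaned = (line.strip().lower() for line in lines)
    return {entry for entry in cleaned if len(entry) >= min_length}


def flag_documents(documents, passwords):
    """Return {document name: sorted list of listed passwords found in its text}."""
    hits = {}
    for name, text in documents.items():
        tokens = {token.lower() for token in TOKEN.findall(text)}
        found = sorted(tokens & passwords)
        if found:
            hits[name] = found
    return hits


if __name__ == "__main__":
    # Hypothetical list entries and documents, invented for this example.
    passwords = load_passwords(["123456", "letmein", "abc"])
    documents = {
        "memo.txt": "Login with letmein today.",
        "plan.txt": "Nothing sensitive here.",
    }
    print(flag_documents(documents, passwords))
```

Many common passwords are ordinary words, so a hit is a prompt for a human to look at the document, not proof of a leak.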
Stephen Arnold, April 1, 2009
Microsoft Search Community Toolkit
March 31, 2009
SharePoint search is like Baskin-Robbins. Lots of flavors. To create the ice cream treat that suits your taste, you need a selection of toppings. A reader sent me a link to a list of code snippets contributed by SharePoint faithful. You may want to click here and peruse what’s available. Some of the choices:
- SharePoint WildCard Search Web Part – http://www.codeplex.com/WildcardSearch
- Faceted Search V2 – http://www.codeplex.com/FacetedSearch
- Smart Search for SharePoint – http://www.codeplex.com/smartsearch
Not an April 1, 2009, spoof.
Stephen Arnold, April 1, 2009
Teen Codes
March 31, 2009
Short honk: I know among my two or three readers I have at least one person with a teenager. For this person, I want to point out “50 Sexting/IM Acronyms Every Parent/Teenager/Person Should Know” here. For example, WYCM? Might be useful when trying to sort out the Twitter thing young folks do. Not an April 1, 2009 joke.
Stephen Arnold, April 1, 2009