IBM and Online Shopping
November 12, 2009
I was quite surprised about IBM’s technology to enrich online shopping. Navigate to “New IBM Software Enriches Online Shopping” and learn about “new software that delivers a personalized and more interactive shopping experience for the exploding population of mobile users worldwide. The software incorporates new social networking capabilities and the ability for retailers to reach consumers with personalized promotions, coupons and other content, regardless of how or where the customer chooses to shop with them.”
The technology is part of WebSphere Commerce. The news story asserts:
WebSphere Commerce comes with and leverages the strength of the IBM WebSphere Application Server and DB2 to attain high transaction volumes, reliable, and highly available operation as well as the integration to back-end systems and applications using SOA interfaces. WebSphere Commerce 7 includes new out-of-the-box integrations to social networking offerings such as IBM Lotus Connections, Bazaarvoice and Pluck SiteLife. Recognized as an industry leader, IBM® WebSphere® Commerce software provides companies of all sizes with a powerful customer interaction platform for cross-channel and online commerce, supporting all of a company’s business models while providing a rich, differentiated customer experience. Powerful out-of-the-box capabilities for marketing, catalog management and merchandising help companies revolutionize customer shopping experiences across all sales channels from online and call centers to mobile and in-store.
I dug back through my search archive and noticed references to IBM’s use of Endeca for its online shopping service. I had anecdotal information that indicated IBM had dallied with Fast Search & Transfer’s technology for eCommerce as well.
This announcement does not compute for me. Take a look at this online site, http://www.ibm.com/products/shop/software/in/en/. This is a subsite that allows me to “Shop for Software”. I click on the “view software pricing and buy” link. This is what I see:
An IBM shopping site. Try to buy STAIRS or SearchManager. I couldn’t figure out how to do it.
There are three choices plus some suggestions in the left hand and right hand panels. There are two search boxes. One does not search for software to buy. The user searches everything in the subsite’s index. In fact, there is no way to search for software like FileNet from this shopping page. Sure, I can search for FileNet from the search box, but that results list jumbles documentation, marketing collateral, and other extraneous information.
I can search by “business need”. I guess that’s okay if the potential customer knows what problem is going to be solved by the nebulous “need”. With regard to FileNet, what does that product do? Well, it does quite a number of functions. So “product category” is fuzzy to me. The notion of searching by name is good. My problem is that I am not sure of what some IBM products are now called. One example is STAIRS. So I search the “S” listings and see mostly products that begin with an “R”. At the bottom of the list are the products that start with an “S”, but STAIRS is not a product. To find it, I search for STAIRS in the Web site search box. I get this hit, but I don’t know how to buy it because Search Manager is not listed as a product in the shopping subsite.
My opinion is that this “new” product has not been given the type of shakedown it requires. The subsite used in this example reminds me of Endeca’s “guided navigation” method. Either the IBM shopping site is running Endeca or a service that sure looks a lot like Endeca. Either way, I am not sure what’s “new”. The Endeca system is pretty good, but it is about a decade in the oven. If the system is IBM’s, I wonder why the site does not manifest some of the new features such as those I see when I use Amazon.com.
More information is available at www.ibm.com/websphere/commerce. My hunch is that I will take cautious steps toward the product described in this IBM news release.
Stephen Arnold, November 11, 2009
I want to tell the St Louis Federal Reserve Bank that I was not compensated in pennies which cost more to make than their cash value for this article. Do we need pennies? Do we need this article of mine?
Writing about Online Revenue Is Easier than Generating Revenue from Online
November 12, 2009
The Guardian’s new media group took a kick in the kidney. Navigate to “Guardian News & Media to Cut More Than 100 Jobs”. Note these phrases, please:
- revenues have fallen by a worse-than-anticipated £33m
- the Guardian’s Thursday Technology print section will cease publication
- we cannot offer clarity about who is leaving and who is redeploying
- If we do the right things now, which I believe we are doing
- the organization should “not be paralyzed by change, but galvanized by change”
- December 9
Yep, easier to write about online than make it generate revenue. Happy run up to Christmas?
Stephen Arnold, November 12, 2009
I got my change back from Stuart Schram who took $5, paid for lunch, and gave me a bottle of Rooibee Red Tea. I must write the Prospect Police and inform them of this financial transaction as I wrote this blog post.
The Google Treadmill System
November 12, 2009
The Google is not in the gym business. The company’s legal eagles find ways of converting wizard whimsy into patents. The tokenspace suite of patent documents does not excite the “Sergey and Larry eat pizza” style of Google watcher. For those who want to get a glimpse of the nuts and bolts in Google’s data management system, check out the treadmill invention by ace Googler, Jeffrey Dean. He had help, of course. The Google likes teams, small teams, but teams nevertheless. Here’s how the invention is described in US7,617,226, “Document Treadmilling System and Method for Updating Documents in a Document Repository and Recovering Storage Space from Invalidated Documents.”
A tokenspace repository stores documents as a sequence of tokens. The tokenspace repository, as well as the inverted index for the tokenspace repository, uses a data structure that has a first end and a second end and allows for insertions at the second end and deletions from the front end. A document in the tokenspace repository is updated by inserting the updated version into the repository at the second end and invalidating the earlier version. Invalidated documents are not deleted immediately; they are identified in a garbage collection list for later garbage collection. The tokenspace repository is treadmilled to shift invalidated documents to the front end, at which point they may be deleted and their storage space recovered.
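The mechanism in that abstract can be sketched in a few lines of Python. This is a toy model, not Google's implementation: the class and method names are hypothetical, whitespace splitting stands in for real tokenization, and the step that re-appends still-valid documents when the front advances is my assumption about how "treadmilling" moves stale entries toward the front.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)  # compare entries by identity, not by field values
class Entry:
    doc_id: str
    tokens: list
    valid: bool = True


class TreadmillRepository:
    """Toy repository: insert at the back, reclaim space at the front."""

    def __init__(self):
        self.entries = deque()   # left = front (oldest), right = back (newest)
        self.latest = {}         # doc_id -> current valid Entry
        self.garbage = []        # invalidated entries awaiting collection

    def put(self, doc_id, text):
        """Insert a new version at the back and invalidate any earlier one."""
        old = self.latest.get(doc_id)
        if old is not None:
            old.valid = False
            self.garbage.append(old)   # not deleted yet, only listed
        entry = Entry(doc_id, text.split())
        self.entries.append(entry)
        self.latest[doc_id] = entry

    def treadmill(self, steps):
        """Advance the front by up to `steps` entries.

        Invalid entries reaching the front are dropped and their tokens
        counted as reclaimed; valid ones are re-appended at the back
        (an assumption made for this sketch).
        """
        reclaimed = 0
        for _ in range(min(steps, len(self.entries))):
            entry = self.entries.popleft()
            if entry.valid:
                self.entries.append(entry)
            else:
                self.garbage.remove(entry)
                reclaimed += len(entry.tokens)
        return reclaimed
```

Updating a document with `put` never touches the old copy in place; the storage comes back only when `treadmill` walks the front end past the invalidated version.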
There are some interesting innovations in this patent document. Manual steps to reclaim storage space are not the main focus. The big idea is that a digital treadmill allows the Google to perform some magic for content updates. The tokenspace is a nifty idea, but the Google has added the endless chain notion. Oh, and there is scale, compression, and access associated with the invention. You can locate the document at http://www.uspto.gov. In my opinion, the tokenspace technology is pretty important. Ah, what’s a tokenspace you ask? Sorry, not in the blog, gentle reader.
Stephen Arnold, November 11, 2009
I don’t think my AdSense check this month was intended for me to write a short blog post calling attention to a system and method that Google would prefer to remain off the radar. Report me to the USPTO. That outfit pushed the info via RSS to me. So, a freebie.
Deflation or Price War? You Decide
November 12, 2009
Stan Schroeder’s “Google Cuts Prices of Cloud Storage, Increases Cap to 16 Terabyte” summarizes a Google pricing action. Mr. Schroeder writes:
We’re talking about extra storage; for example if free storage that comes with Picasa Web Albums or Gmail isn’t enough for you, you can purchase extra storage space for a price. Today, Google is dramatically slashing that price.
Interesting but not as interesting as thinking about the implications of a price cut. The economy remains uncertain. Competition in the buzzy cloud world is increasing. Google chops prices as Amazon did recently and boosts capacity. Are there implications? Sure there are, but the write ups steer clear of the core of this action. My hunch is that it is neither deflation nor a price war. I keep thinking about the behavior of the hungry big cats when the herd of gazelles galloped along. Snack time?
Stephen Arnold, November 11, 2009
No one paid me for this observation about nature red in tooth and claw. The goose is a gentle creature, but he will alert the Department of Transportation that this was an uncompensated endeavor. Yikes, I smell hot tar on the information highway.
Exclusive Interview with Exorbyte CTO
November 11, 2009
An exclusive interview with Exorbyte’s founder and chief technical officer, Benno Nieswand, is now available in the Search Wizards Speak series. Exorbyte makes high-performance search software for most databases and structured data formats. The company has been expanding in the US market, and it has been attracting quite a bit of attention in the last few months.
In the exclusive interview, Mr. Nieswand said:
Half of our implementations occur in back-office processes, where error-tolerance increases automation rates. One example: A healthcare claims processing center handles inbound documents (40,000 / day) and other processes (like electronic status inquiries) for over 120 health insurance plans. Matching claims with procedure codes, patient records, and other data types can be very difficult to fully automate. They saved 1 Million USD in two years by increasing the automation rates through our error-tolerant data matching with their central data repository. The other half of our implementations are systems with user interaction, like e-commerce search for which we have developed leading search products such as the SearchNavigator, an incremental search AJAX framework.
In the interview he revealed:
Over the last year Exorbyte went 64-bit, which led to a significant increase in speed of our core algorithms. In addition we added improvements in navigation generation (faceted search) and entity extraction. Besides the continuous improvement of our search engine, Exorbyte developed a data quality solution that flexibly handles data quality tasks. It can be applied to processing address enhancement, deduplication, dictionary collection, document processing and more. The underlying MatchMaker search empowers it to achieve excellent results for each of these tasks. Our core algorithms also were enhanced by the capability to match things like “nieswandkonstnzbenno” with “Benno Nieswand, Konstanz” which is called Block-Edit-Distance calculation pertaining the same speed as for regular Levenshtein calculation. This greatly improves single field entry support. We use this for CRM applications for instance.
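For readers who want the baseline Mr. Nieswand refers to, here is a minimal sketch of the regular Levenshtein calculation in Python. It is the textbook dynamic-programming version, not Exorbyte's code, and it does not implement the Block-Edit-Distance method, which the interview does not describe in detail. The `normalize` helper is a hypothetical convenience added for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalize(s: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return "".join(ch for ch in s.lower() if ch.isalnum())


# The example from the interview: the name and city blocks are reordered
# and one letter is missing.
query = normalize("nieswandkonstnzbenno")
record = normalize("Benno Nieswand, Konstanz")
print(levenshtein(query, record))
```

Plain Levenshtein charges for every character of a block that has moved, so the reordered query scores far worse than its single missing letter would suggest. That gap is presumably what a block-aware distance is meant to close.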
For more information about Exorbyte, visit the company’s Web site at http://www.exorbyte.com. For the full text of the interview, navigate to http://www.arnoldit.com/search-wizards-speak/exorbyte.html.
Stephen Arnold, November 11, 2009
In theory, the next time I am in Germany, the Exorbyte team will feed me Gans auf elsaesser Art and shower me with euros. In theory, of course. Notify the Senate Police that I am responding to promises of goose cuisine. Wow, I am delighted I admitted this. I am not sure about the goose meal, however.
Getting in the Google Index
November 11, 2009
I read “Q and A: Why Doesn’t Google Index My Entire Site?”. After two days of meetings at a company working to generate Web traffic, this question was apropos to me. I have concluded that if a Web site is not in Google, that Web site may be quite difficult to find. Microsoft and Yahoo offer Web indexes, and these companies make an effort to be competitive with Google. The operative word is “competitive”. Quite a few people rely on what Google displays and remain content to stick with a Google result set. Running the same query on multiple search engines may not be some searchers’ idea of fun.
What I found interesting about the article was the suggestions for getting a site in the Google index. Let me outline two of the suggestions and offer several observations.
- “Add all your pages to your XML sitemap and change all the priority tags from 1” The site map is important. Quite a few sites rely on an auto generated site map. Some auto generation programs are good; some, not so good.
- “Open a Google Webmaster Tools account and verify your site. You’ll be able to see exactly how many pages of your site Google has indexed and when Googlebot last visited. If Google is having trouble indexing the site, you’ll learn about it and be given advice for how to fix it.” Another good suggestion.
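For sites that would rather not trust an auto generation program, the first suggestion can be sketched in a few lines of Python. This is a minimal example that assumes the sitemaps.org 0.9 schema. The function name, the example.com URLs, and the priority values are placeholders of mine, not the article's recommendation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, priority) pairs and return it as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        # The protocol allows priority values from 0.0 to 1.0.
        ET.SubElement(node, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    ("http://www.example.com/", 1.0),       # placeholder URL and value
    ("http://www.example.com/about", 0.5),  # placeholder URL and value
])
print(sitemap_xml)
```

Generating the file from the same list of pages the site actually serves is one way to keep the site map and the site from drifting apart.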
Several observations:
- Some Web sites are coded with errors. Errors, even small ones in capitalization in style sheets, can create some issues. Accurate coding should be a priority.
- Writing and content. Lots of text does not immediately translate to an improved position in a Google result list. Content, urls, and tags should be cut from the same semantic cloth. Getting words that generate a tidy semantic vector takes work and time. The effort can pay off.
- Backlinks. A change in the Google indexing method is approaching. There will be dislocations and some unexpected ranking alterations. One way to prepare for the shift is to have quality inbound links to your Web site. Backlink tricks can backfire. Quality backlinks are important.
Are you ready for the Google indexing tweak?
Stephen Arnold, November 11, 2009
Don Anderson made me buy him lunch. I disclose that it cost me money to write this article. Federal Research, do you have your ears on?
SQL Databases: Model Ts for 2010
November 11, 2009
I have been increasingly nervous about Dr. Codd’s beautiful baby, now getting close to Medicare qualification. The knock against SQL databases is that these systems can take quite a bit of work to get in shape for big data. If you want more detail about the limitations of SQL databases, navigate to Adam Wiggins’ “SQL Databases Don’t Scale”. One of the most interesting comments in the article, in my opinion, was:
So where do we go from here? Some might reply “keep trying to make SQL databases scale.” I disagree with that notion. When hundreds of companies and thousands of the brightest programmers and sysadmins have been trying to solve a problem for twenty years and still haven’t managed to come up with an obvious solution that everyone adopts, that says to me the problem is unsolvable. Like Kirk facing the Kobayashi Maru, we can only solve this problem by redefining the question.
My answer is different data management systems: Aster Data, Exalead, InfoBright, or similar next generation systems.
Stephen Arnold, November 11, 2009
If you think anyone paid me for stating the obvious, you, dear enforcement official, are wrong.
YAGG: Google Fined for False Content
November 11, 2009
YAGG is goose talk for yet another Google goof. The search giant has been “ordered to pay 500 000 in damages to Formula 1 racer Rubens Barrichello for hosting fake online profiles of him on its social network Orkut.” You can read “Google to Pay F1 Racer” and decide whether the Brazilian court was on the right track or stuck in a pit stop. Google’s algorithms for assigning “trust” scores seem to be suspect if these methods were used by the search firm. For me the most interesting comment in the write up was:
Brazilian specialists said the amount of damages was the biggest yet awarded for false web profiles and online libel.
I have quite a file of Orkut items. Google’s social service skills continue to irritate legal den mothers and fathers. I noticed that my Google log in now granted me access to Orkut. I sped out of there, however.
Stephen Arnold, November 11, 2009
Nope, not a farthing for this YAGG write up. I suppose I should report it to former F1 sponsor, Honda.
Darknet Left Unexplained
November 11, 2009
For content mavens, nothing acts more like digital catnip than the idea of access to information most people do not have. I found the Techradar article “Off the Grid: The Darknet Exposed. Explore the Uses and Abuses of the Net’s Darkest Corners” interesting. The story explains how a darknet works with particular emphasis on hacking. The article explains honeypots; that is, computers set up to attract bad folks and their scripts. The third page of the article dips into the information world I find less frequently described. Techradar said:
BitTorrent has become the most widely used darknet protocol on the internet, and it accounts for around 40 per cent of all traffic.
The write up is a bit misleading. Hopefully someone will tackle the subject in a way that interests me. I watched the article redirect three times, so you may want to make sure you are working from a protected system.
Stephen Arnold, November 10, 2009
A freebie I fear.
Google, Its Linux, and Some Open Source Angles
November 10, 2009
What a nice write up! Navigate to LWN.net and read “KS2009: How Google uses Linux”. The write up has a wealth of information. I am not sure that the Googler who presented the information summarized in the write up is spot on across the topics referenced. I do think the write up provides some insight into the chaotic, almost loose approach to certain technical processes at Google. Keep in mind that there are 19,000 people and that certain units may have more or less stringent work processes in place. Add to that, the segregation of certain Google initiatives. In short, Google is wild and crazy, but it has folks who know how to lock down certain methods and procedures.
Nevertheless, you will get quite a bit of useful information in the article. I want to point out one passage that struck me as important:
In the area of CPU scheduling, Google found the move to the completely fair scheduler to be painful. In fact, it was such a problem that they finally forward-ported the old O(1) scheduler and can run it in 2.6.26. Changes in the semantics of sched_yield() created grief, especially with the user-space locking that Google uses. High-priority threads can make a mess of load balancing, even if they run for very short periods of time. And load balancing matters: Google runs something like 5000 threads on systems with 16-32 cores.
Yep, scale.
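To make the sched_yield() remark concrete, here is a rough Python sketch of the spin-then-yield pattern that user-space locks often use. It is an illustration only and says nothing about Google's actual locking code. The class and function names are hypothetical, `os.sched_yield()` is available on Unix builds of Python, and Python's global interpreter lock means the timing will not resemble a C program's.

```python
import os
import threading
import time


def yield_cpu():
    """Give up the CPU; use sched_yield where available, else a zero sleep."""
    if hasattr(os, "sched_yield"):
        os.sched_yield()
    else:
        time.sleep(0)


class YieldingLock:
    """User-space lock that yields the CPU while contended instead of blocking."""

    def __init__(self):
        self._flag = threading.Lock()

    def __enter__(self):
        # Try to take the flag without blocking; yield and retry on failure.
        while not self._flag.acquire(blocking=False):
            yield_cpu()
        return self

    def __exit__(self, *exc):
        self._flag.release()
        return False


def run_workers(n_threads=8, increments=1000):
    """Have several threads bump a shared counter under the yielding lock."""
    lock = YieldingLock()
    total = 0

    def work():
        nonlocal total
        for _ in range(increments):
            with lock:
                total += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

A waiter in this pattern depends entirely on what the scheduler does when it yields. If the yield no longer hands the processor to the thread holding the lock, the waiters burn time and the lock holder waits longer, which is one plausible reading of the "grief" in the passage.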
When you read the article, be sure to scan the comments. I found this comment particularly stimulating to my thinking:
jmm82 (subscriber, #59425) [Link] I believe the reasons were outlined why they are not contributing code into the kernel. 1. They are not using kernels that are close to linus git head. 2. Some code would not be wanted in the mainline kernel. 3. Some code is not good enough to get into the mainline kernel. 4. They don’t want to have 30 people saying the code will only get in if it does this. Aka. They don’t want to make it support the features they are not using. 5. Some code is proprietary and they want to protect the IP. As was stated above as long as they are not distributing the code the changes are their property. 6. A lot of there patches are code backported from mainline, so it is already in the kernel. I think moving forward that you will see Google have a few developers working on mainline to try and influence future kernels because it will be financially cheaper to carry as few patches as possible. Also, I feel they will always have some patches that they feel are too valuable IP to gave back and will continue to maintain those outside the mainline.
Google and its open source “issues” may be an interesting topic for an investigative soul to explore.
Stephen Arnold, November 10, 2009