
From Search to Facebook, Yeah, Yeah, Yeah

December 6, 2010

From the Beyond Search really important behavior desk:

With the Beatles catalog finally released on iTunes after years of negotiations, it’s interesting to note that people do not seem to be turning primarily to Google to find their favorite tunes.  “Apple iTunes Beatles Success Driven by Facebook Not Search” reports on British Internet traffic the day the Beatles catalog was released.  The article states: “On the day the Beatles content came to iTunes, 26.32 percent of total Apple traffic came from social networks compared to just 16.59 percent two days previously.  Specifically, Apple saw a huge spike in traffic coming from Facebook. On the day that the Beatles content was available to download from iTunes, 1 in every 200 visits that left Facebook went directly to Apple.”

Wowza.

These stats focus only on the UK, but since FB is so popular in the U.S. my guess would be that the phenomenon was the same on this side of the pond as well.  What does this mean for Google?  Maybe it will make sure the Google social networking platform debuts faster than the Beatles catalog. Yeah, yeah, yeah, I hope you understand.

Alice Wasielewski, December 6, 2010

Freebie

Street View as a Ground Hog Event

December 6, 2010

The cute little ground hog pops up and events repeat themselves. Google, a cute creature indeed, is caught in a ground hog event. We refer to Street View and its ability to pop up, recycle, pop up and recycle the same sequence of activities. Is it us or the ground hog effect?

Google takes another hit with revelations of Google Street View cars again gathering private information through WiFi networks. “Google Privacy Breach: Damage to Brand ‘Substantial’” reports that unlike the spring 2009 data gathering, this UK privacy breach was significant enough to warrant investigation, although the British Information Commissioner’s Office did not subject Google to a fine. According to the article, though, the main punishment for this offense will not be found in legal penalties:

“Jack Adams, SEO consultant at Greenlight told Web User that although Google seems to have been ‘let off lightly’, he believes the breach could have a major impact on the brand’s reputation: ‘There can’t be any denying that some extent of damage has been done to users’ confidence in the brand and its squeaky-clean image, built around the company’s “don’t be evil” motto.’”

In addition, the FCC is investigating similar accusations.  Does this add to what seems to be Google’s recent losing streak with Buzz, Wave, Google TV, etc.?  After all, there are objections to Street View itself with nearly a quarter of a million Germans opting out of having their homes shown.  Put all this together, and  I’m not sure that reputation damage is the main point to take away from Street View’s private data gathering fiasco.  We are beginning to think the bigger issue is an emerging pattern of poor judgment calls.  Solving problems in math does not require social savvy. Perhaps services like Street View do?

Alice Wasielewski, December 6, 2010

Safe Search Engines for Kids

December 6, 2010

In 1993 Chris Kitze and Stephen E Arnold had the idea for a directory of family friendly Web sites. That product—The Point (Top 5% of the Internet)—was acquired by Lycos along with assorted people, code, and advertisers. Flash forward to 2010.

“Top 10 Safe Search Engines by David Kapuler” lists some child-safe search engines. Not included in the list is Kid’s Click, which is mentioned on the American Library Association website. I tried some quick searches of terms that have an innocuous as well as an obscene meaning and found the best results from his Number One site, Sweet Search. As Kapuler says, “However, keep in mind when dealing with students and surfing the internet that no site is 100% safe — even when using filtered search engines.” Yes, filters aren’t that smart and kids are clever too.

Filtered search does not go out of style. Just like Project Runway.

Alice Wasielewski, December 6, 2010

First XML, Then the iPad: Another Life Preserver for Publishers

December 6, 2010

Publishers love XML. Well, not the coding of XML. Publishers love the versatility and slicing – dicing functions of XML. Now the publishers have another life preserver as traditional cost structures and marketing methods come under increased pressure.

“Why the iPad Newspaper is Doomed” is broken down into a long list of all the reasons why Rupert Murdoch’s latest news venture is destined to be an epic fail. Gawker, a publisher working to reinvent itself, asserts:

“Rupert Murdoch is putting $30 million and 100 journalists behind an iPad newspaper called ‘The Daily.’ He even has support from Apple CEO Steve Jobs. But no one really believes this thing will last.”

The reasons are that the morning news will be compiled the evening before, the scope is too broad, Murdoch has not had success online before, huge numbers of subscriptions will have to be sold, links will be non-existent, costs are too high to maintain, the cost-free news competition is too tough, and the staff is all traditional news, not tech.

Yet, the post does also have a short list of reasons for optimism, which are: Steve Jobs, huge iPad sales, Murdoch’s unexpected success with Fox News, and the success of some other iPad publications.  It’s the Fox News angle that interests me most.  Rupert Murdoch, love him or hate him, has been known to sniff out an opportunity and keep putting his extensive resources behind it until it pans out.  This iPad newspaper seems like a long shot, but sometimes Murdoch knows something we don’t. The story, as real journalists say, is still being written.

Alice Wasielewski, December 6, 2010

SAP: Will Technology Renew the Company?

December 6, 2010

My view: Nope.

Bloomberg’s “SAP’s Hana Speeds the Database Race” describes in a quite accepting way talk from Oracle’s chief technical officer about “technology renewing Oracle.”

The Johnny Appleseed product is a big honking appliance that is chock full of RAM. The idea is to put lots of data in memory and get outputs really fast. Now I have heard this type of claim before. You will want to read the article and make up your own mind. Just don’t forget to ask about the cost of the appliance. My code phrase for really expensive, as you may know, is “big honking.”

Here’s an interesting passage from the Bloomberg “let’s believe everything a big company tells us” story:

Users can run calculations from computers that include Apple’s (AAPL) iPad and see tables and charts containing their data, fed over the Internet, from servers running SAP programs. “You can process a staggering number of records and get transformational kinds of results,” says Jim Shepherd, an analyst at Gartner (IT). “An analysis that might have taken 30 minutes with conventional technology you can now do in seconds, or sub seconds. It changes the whole character of the decision-making process.”

There you have it. The truth delivered from a mid tier consulting firm. When I read the passage, I glanced at the nearest mountain to see if there were a consultant with two iPads containing the “truth”. The mountain did not seem to have either consultants or iPads. Make up your own mind about these pronouncements, please.

Several observations:

  • Big data is a big problem and throwing hardware at engineering developed with 1950 traffic planning in mind won’t do the job in my experience.
  • Appliances are becoming a problem. Each is different, so there are management costs involved. These big honkers are not toasters. Think headcount and the need for redundancy.
  • Access to big data is not going to be improved as long as the traditional query methods are used. The future, in my opinion, is a search type of interface. In case you have not been in touch with planet earth’s business intelligence community for a while, these folks are not embracing hard core programming the way their kids are adopting their smartphones.

To wrap up: SAP’s problems have everything to do with technology and its associated costs. SAP is a company I like to monitor because it has that old IBM DNA, and it is a management case study in action. The firm flips from multi year deployments to messy legal hassles to up and down pricing, to appliances.

Maybe you dig this? Certainly the mid tier consulting firms chow down on these assertions the way I do when someone in Harrod’s Creek throws me a crust of bread. I am not so sure that the clients will be as willing as they have been in the past, however.

Here’s an interesting statement from the Bloomberg write up:

SAP has been quicker to deliver compelling technology and more open to customers’ needs since the management change this year, says Thorsten Poetter, a vice-president at Bayer responsible for data analysis. “SAP wasn’t very open to listening to what we said. That has changed dramatically.” The first version of Hana still requires too much extra work to load financial information from Bayer’s databases, and the company is waiting for an improvement next year that speeds the process, Poetter says. Bayer may use Hana to give its salespeople more current information as they head into customer meetings. “The general performance of a business intelligence system can’t be fast enough,” says Poetter. “I always compare it to Google. It has to be as fast as typing.”

Hmmm. Does this mean not ready for prime time even though a mid tier consultant says the gizmo is ready to roll? I will go with the Bayer view for now. Mid tier = marketing? A hypothesis.

Stephen E Arnold, December 6, 2010

Freebie

Goose Defeathered: Real Time Truth Revealed

December 5, 2010

The goose returned from snowy England and France. Alas, the trip to sunny Luxembourg was not possible. Luxembourg, as you know, is Saint-Tropez North. The trip was uneventful. I wanted to call attention to the work of a sketch artist who heard my talk about real time search. I don’t recall using the phrase “no bullshit”, but I was cold and without adequate supplies of Diet Coke. The Skinker’s crew had something called beer, which I don’t drink.

I want to reproduce a summary of my talk which is now on Flickr at this link. The screenshot from my ancient browser renders the goose without feathers, a nice pair of bald spots and a pregnant tummy. Perfect!

[Image: skinkers]

I have posted one of the screenshots from my talk, and I will pepper my blog and its two or three readers with other screenshots from my winter wonderland Euro adventure in the next few days.

Where are those feathers?

Stephen E Arnold, December 5, 2010

Freebie, just like the talk at Skinker’s.

Greed Feedback Loops: Web Indexing, SEO, and Content

December 5, 2010

Wow, I thought the teeth gnashing over “objective search results” was a dead issue. Objectivity is not part of the “free” Web search method. Uninformed people accept results as factual, relevant, and worth an invitation to have lunch with Plato. Wrong. Objective search results are a bit of a myth and have been for decades.

Some education, gentle reader. A commercial database exercises editorial control. If you ran a query for ESOP on the Dialog system for File 15, you got a list of results in which the controlled term was applied or, if you were a savvy searcher, in documents in which the string ESOP appeared in a field or an abstract/full text field. The only objectivity involved was that Dialog matched on a string. No string. No match.

Online information is rife with subjectivity.

In the commercial database world, the subjectivity comes into play in the article the database producer selects to summarize, the controlled terms the producer applies, how the searcher frames his or her query, and what file the searcher uses in the first place. In ABI/INFORM the content set guaranteed that you would get only articles from magazines and journals we thought were important. The terms were the domain of the editors. The searcher controlled the query. Dialog was passive.

Flash forward to free Web search.

Search is expensive, and the money to pay for content processing and the other bits and pieces of the so called “free” system has to come from somewhere. The most used Web search services get money mostly from advertising; that is, from third party payers. The reason advertisers pay money is to get access to Web search users. The present Web search system is largely built to maximize the money that flows to the search service provider. Nothing about the process is objective in my opinion. Unlike Dialog, free Web search meddles with the search results anywhere it can in order to derive benefit for itself. A happy user is not the goal of the system. A happy advertiser is the main focus in my opinion.

In the good old days, there was overt meddling, but it was the user’s query and the database producer’s editorial policy. The timesharing company providing the service selected some databases for its service and excluded others. Users had no control over the timesharing vendor. Dialog and LexisNexis did what was necessary to maximize revenues and control the customer, the database producer, and the revenues.

But even in the good old days most online searchers did not worry much about the database producers’ editorial policies. Today almost no one thinks about the provenance of a content object. The Web search service wants clicks and advertisers. The advertiser wants clicks, leads, and sales. The content is not the main concern of the advertiser. Getting traffic is the main concern. And the Webmaster of an individual Web site wants traffic. The user wants information for free. The SEO industry sprang up to help anyone with money spoof the free Web indexes in order to get more traffic for a Web site which had little or no traffic in many cases. These are the ingredients of the feedback loop that has made free Web search the biased service it is. And the feedback loop almost guarantees a lack of objectivity.

Now read “When Businesses Attack Their Customers” or one of the dozens of other write ups by English majors, failed programmers, and search engine optimization experts. The notion of a Web search system fiddling the results seems to be a real light bulb moment. Give me a break. Consider these typical functions in Web indexing and posting today:

  • Lousy content created to get clicks from the clueless. There’s big money in crap content because of programs like Google AdWords. But those annoying pop up ads, those are just variations on the crap content scheme. Lousy content exists because search engines give the creators of this content an incentive. Users are unable to think critically about information, preferring to take whatever is dished up as gospel.
  • The Web indexes are not in the education business. Web indexes are in the traffic and advertising business, and these outfits will do what’s necessary to get traffic. If the National Railway Retirement Board adds an important document, that document may wait a long time before a Web search engine indexes it. Put up a post about Mel Gibson’s court battle, and that document is front and center really fast. Certain content attracts clicks, and that content gets the limelight.
  • People who use the Web describe themselves as good researchers. Baloney. Most people look for information the way a Stone Age person made a fire: Wait for a lightning strike, steal or borrow a burning stick from a tribesman, or get two rocks and bash them together. Primitive queries cause Web search systems to deliver what the user wants without the user having to think about source, provenance, accuracy, or freshness. By delivering what users may want, Web search engines create a way to offer advertisers what appears to be a great sales advantage. I think the present approach delivers advertisers meaningless clicks, big bills, and lots of wacky metrics. Sales. Not so much.

I don’t think the commercial online search systems and the commercial database producers have a future filled with exploding revenues and ever higher quality content. I think the feedback loop set up and fed by free Web search is broken. In its wake is the even more subjective and probably easier to manipulate “social search” method. If you don’t know something, just ask a friend. That will work really well on certain topics. The uninformed are now leading the uninformed. Stupid is as stupid does.

I use the Exalead Web index. No index is perfect, but I am more confident in Exalead’s approach because the company is not into the ad game. I also use DuckDuckGo and Blekko. Neither is perfect. I have more confidence in the relevancy of the results, but I don’t know the scope of the companies’ indexes, nor their respective editorial policies. The other Web indexes are little more than ad engines.

And SEO or search engine optimization? That “discipline” was created to get a Web page to the top of a results list. Never was the SEO motivation precision, recall, or relevancy. Accuracy of the content was not a primary concern. Clicks were it. As SEO “experts” trashed relevancy methods, the Web search engines abandoned objectivity and went for the clicks and money. I don’t have a problem with this. What I have a problem with is the baloney manufactured about bias, lousy search results, and other problems. These problems, in my opinion, complement the naive and uninformed approach to research most users of Web search systems rely upon.

A failure in some education systems virtually ensures that critical thinking is in danger of becoming extinct. In an iPad mad world with attention deficit disorder professionals running rampant, I suppose the howls of outrage may be news. For me, this is an old story and an indication of the state of Web search.

The feedback loop is up and operating. Irrelevancy will increase in the quest for ad revenue. No easy fix in sight for a problem that’s been around for a decade. Now the Web search providers want to push search results to users before the users search. Gee, that’s a great opportunity to deliver subjectively ordered results based on advertiser needs. The scary part is that many Web users neither know nor care about provenance, precision, recall, or relevance.

Welcome to a future with lots of lousy searchers who think they are experts.

Give me a break.

If you know an information professional, sometimes called a librarian, take a moment and get some advice from a real pro about searching. Too much work? Maybe that’s why so many bad decisions are evident today? Bad data, uninformed decisions, a lack of critical thinking, and flawed information skills are nutrients for big and bad mistakes.

Stephen E Arnold, November  30, 2010

Freebie

What Yahoo Users Needed to Know in 2010

December 5, 2010

When a search giant such as Google hints that it knows what users want, I often bristle. Then after I read such articles as “Yahoo’s Top Mobile Searches of 2010 Reveal Mobile’s Real-Time Nature,” I think Google may be right. In my work, I am not sure what I will be researching from one day to the next. I poked around for a Victor Bout sidekick. I probed the fat underbelly of health fraud. I followed links on RedTram that delivered me to surprising Web sites.

However, I have never run a query for the topics highlighted in this Yahoo centric write up about mobile search topics in 2010. I knew I was old and clueless, but my cluelessness cannot be remediated by these topics:

BP oil spill
World Cup
Miley Cyrus
Kim Kardashian
Lady Gaga
iPhone
Megan Fox
Justin Bieber
American Idol
Britney Spears.

Who is Justin Bieber? I don’t think he works at Google. And Kim Kardashian? Perhaps she is on the staff at Brookhaven National Lab.

Scary and enlightening simultaneously. And what does this list tell me about Yahoo’s users? Just that I don’t want to sit next to one on a long, international flight.

Stephen E Arnold, December 5, 2010

Freebie

Is Clever a Way to Cheat in the Page Display Race?

December 5, 2010

“Google and Microsoft Cheat on Slow Start. Should You?” explores how big Web players speed their page load times. Fascinating stuff. Google has tried to make speed one of its distinguishing characteristics. I did not know that Google “cheated” by ignoring certain Internet conventions. The assertion is interesting and triggers in my mind some thoughts about how Google’s corporate culture operates.

Here’s the passage that I thought was interesting with regard to the speed up trick and suggestive about Google’s approach to problems it finds “frustrating”.

The Google engineers on the mailing list have taken on a more frustrated tone recently, so it’s possible that they decided the best way to make forward progress was to just turn it on and see whether the internet actually melts down or not. It’s also possible that I happen to be part of an ongoing test that they’re running.

My view is that the Math Club crowd assumes it is better to say “we’re sorry” instead of fooling around with troublesome and annoying work processes that involve other humans, often not in the Math Club.

I do not think Google’s “let’s do it anyway” approach is confined to this single instance. Try explaining this technical issue to your local elected official.

Stephen E Arnold, December 4, 2010

Freebie

GGF: An Acronym of Angst

December 4, 2010

Mr. Goose is back in the US of A. I got more hugs from TSA than I did from the legions of people at my various talks in Paris and London. What did I spy when I opened my RSS reader after a fun filled nine hour flight from CDG?

Groupon is playing hard to get. You can read the cornucopia of posts yourself. I found this one amusing: “It All Changes When the Founder Drives a Porsche.” Here’s a sample:

With Groupon, with the money problem solved, they can “go for it.” Basically, the motivation for a big exit is no longer motivated by “how much money can I get,” it is motivated by “what is my legacy.” That simple shift makes their rejection of Google’s $6B offer not that surprising.

Yep, solved. For now.

My take on the deal was GGF. The first G is Groupon, of course. The hammer dial, people centric coupon outfit. The second G is Google. The company is running into some rapids, and it needs a home run. Heck, if the Math Club can’t come up with something that works, just throw billions at the problem. The method used to work really well for Microsoft.

The F is Facebook. Contemplate friends and members couponing their teenaged hearts out. So GGF may be a triad to watch.

Stephen E Arnold, December 4, 2010

Freebie just like Facebook
