hakia’s Founder Riza Berkan on Search
August 12, 2008
Dr. Riza Berkan, founder of hakia, a company engaged in search and content processing, reveals the depth of engineering behind the firm’s semantic technology. Dr. Berkan said here:
If you want broad semantic search, you have to develop the platform to support it, as we have. You cannot simply use an index and convert it to semantic search.
With its unique engineering foundation, the hakia system goes through a learning process similar to that of the human brain. Dr. Berkan added:
We take the page and content, and create queries and answers that can be asked to that page, which are then ready before the query comes.
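In other words, likely questions and their answers are computed at index time, before any user query arrives. Here is a minimal sketch of that idea in Python; the single pattern rule and the data are my own hypothetical illustrations, not hakia's actual technology:

```python
# Sketch: precompute question/answer pairs from a page at index time.
# The toy "X is Y" rule stands in for hakia's far richer semantic analysis.
import re

def precompute_qa(page_text):
    """Derive simple question/answer pairs from declarative sentences."""
    qa_index = {}
    for sentence in re.split(r"(?<=[.!?])\s+", page_text):
        match = re.match(r"(.+?)\s+is\s+(.+?)[.!?]?$", sentence)
        if match:
            subject, _ = match.groups()
            # The question is ready before anyone asks it.
            qa_index[f"what is {subject.strip().lower()}"] = sentence
    return qa_index

index = precompute_qa("hakia is a semantic search company. Lund is a city in Sweden.")
print(index.get("what is hakia"))  # the answer was prepared before the query
```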
He emphasized that “there is a level of suffering and discontent with the current solutions”. He continued:
I think the next phase of the search will have credibility rankings. For example, for medical searches, first you will see government results – FDA, National Institutes of Health, National Science Foundation – then commercial – WebMD – then some doctor in Southern California – and then user contributed content. You give users such results with every search; for example, searching for Madonna, you first get her site, then her official fan site, and eventually fan Web logs.
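To make the notion concrete, here is a hedged sketch of what such credibility-tiered ranking might look like. The tier assignments are illustrative guesses on my part, not hakia's actual rules:

```python
# Sketch: group results by source-credibility tier first, relevance second.
CREDIBILITY_TIERS = {
    "fda.gov": 0, "nih.gov": 0, "nsf.gov": 0,  # government
    "webmd.com": 1,                            # commercial
    "socal-doctor.example": 2,                 # individual practitioners
}
USER_CONTENT_TIER = 3                          # fan blogs, forums, etc.

def credibility_rank(hits):
    """Sort (domain, relevance) pairs by tier, then by relevance."""
    return sorted(
        hits,
        key=lambda hit: (CREDIBILITY_TIERS.get(hit[0], USER_CONTENT_TIER),
                         -hit[1]),
    )

hits = [("fanblog.example", 0.9), ("nih.gov", 0.7), ("webmd.com", 0.8)]
print(credibility_rank(hits))  # government first, user content last
```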
You can read the full text of the interview with Dr. Riza Berkan on the ArnoldIT.com Web site in the Search Wizards Speak series. The interview was conducted by Avi Deitcher for ArnoldIT.com.
Stephen Arnold, August 12, 2008
Data Centers: Part of the Cost Puzzle
August 11, 2008
The “commentary” is “Servers: Why Thrifty Isn’t Nifty”, which appears here. The “commentary” is by a wizard, Kenneth G. Brill, and he takes a strong stand on the topic of data center costs. The “commentary” is sponsored by SAP, an outfit that exercises servers to the max. Mr. Brill is the executive director of the highly regarded Uptime Institute in Santa Fe, New Mexico. Santa Fe is a high-tech haven. The Santa Fe Institute and numerous think tanks populate this city, a reasonable drive from LANL (Los Alamos National Laboratory). LANL is world famous for its security, as you may know. With chaos theory and technical Jedis in every nook and cranny of the city except the art galleries, I am most respectful of ideas from that fair city’s intelligentsia.
The hook for the “commentary” is a report called Revolutionizing Data Center Efficiency. The guts of the report are recommendations to chief information officers about data centers. With the shift to cloud computing, data centers are hotter than a Project Runway winner’s little black dress. For me the most interesting part of this “commentary” was this statement:
One of these recommendations is to dramatically improve cost knowledge within IT…The facility investment required to merely plug-in the blades was an unplanned $54 million. An additional unplanned $30 million was required to run the blades over three years. So what appeared to be a $22 million decision was really an enterprise decision of over $106 million.
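To make the arithmetic explicit, here is Mr. Brill’s example as a trivial script; the figures come straight from the quotation above:

```python
# The visible server purchase is a fraction of the full commitment.
server_purchase = 22       # $ millions: the "visible" decision
facility_buildout = 54     # unplanned facility investment to plug in the blades
three_year_operation = 30  # unplanned cost to run the blades for three years

total = server_purchase + facility_buildout + three_year_operation
print(f"sticker price: ${server_purchase}M, true cost: ${total}M")  # $106M
```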
The “commentary” includes a table with data that backs up his analysis. The data are useful but, as you will learn at the foot of this essay, offer only a partial glimpse of a more significant cost issue. You may want to read my modest essay about cost here.
What baffles me is the headline “Servers: Why Thrifty Isn’t Nifty”. Forbes’s editors are more in the know about language than I. I’m not sure about the use of the word thrifty because the “commentary” uses servers as an example of the cost analysis problem facing organizations when folks make assumptions without experience or adequate accounting methods, leaving a rat pack of 25-year-old MBAs to calculate costs.
Let me make this simple: cost estimates usually have little connection to the actual expenditures required to make a data center work. This applies to the data centers themselves, the applications, and the add-ons that organizations layer on their information technology infrastructure.
Poor cost analysis can sink the ship.
Mr. Brill has done a fine job of pointing out one cost hockey stick curve. There are others. Until folks like the sponsors of Mr. Brill’s “commentary” spell out what’s needed to run bloated and inefficient enterprise applications, cost overruns will remain standard operating procedure in organizations.
Before I close this encomium to Santa Fe thinking, may I point out:
- Engineering data centers is not trivial
- Traditional methods work neither particularly well nor economically in a world of multi-core servers and petascale storage devices stuffed into poorly engineered facilities
- Buying high-end equipment increases costs because when one of those exotic gizmos dies, it is often tough to get a replacement or a fix quickly. The better approach is to view hardware like disposable napkins.
Which is better?
[a] Dirt cheap hardware that delivers 4X to 15X the performance of exotic name brand servers or [b] really expensive hardware that both fails and runs slowly at an extremely high price? If you picked the disposable napkin approach, you are on the right track. Better engineering can do more than reduce the need for expensive, high-end data center gear. By moving routine tasks to the operating system, other savings can be found. Re-engineering cooling mechanisms can extend drive and power supply life and reduce power demands. There are other engineering options to exercise. Throwing money at a problem works if the money is “smart”. Stupid money just creates more overruns.
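For readers who like the numbers spelled out, here is a toy price/performance comparison. Every figure is hypothetical, chosen only to show the metric, not to describe any vendor:

```python
# Sketch: dollars spent per unit of delivered throughput.
def cost_per_throughput(unit_price, relative_throughput):
    """Lower is better: price divided by throughput per box."""
    return unit_price / relative_throughput

commodity = cost_per_throughput(unit_price=2_500, relative_throughput=4.0)
exotic = cost_per_throughput(unit_price=25_000, relative_throughput=1.0)
print(f"commodity: ${commodity:,.0f} per throughput unit")  # $625
print(f"exotic:    ${exotic:,.0f} per throughput unit")     # $25,000
```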
Mr. Brill’s “commentary” provides one view of data center costs, but I trust that he has the brand name versus generic costing in the report he references. If not, there’s always an opportunity in Santa Fe for opening an art gallery or joining LANL’s security team.
Stephen Arnold, August 11, 2008
Microsoft SharePoint Olympic Watch
August 11, 2008
Microsoft’s plan to get Silverlight on millions of personal computers is now underway. It’s too soon to determine if it wins the gold for software downloads. One of my sources reports that the Mojave Web site runs on Flash. Hmmm. Meanwhile, a BSOD (blue screen of death) appears at the Chinese Olympics. If the image is real and not Photoshopped, I guess most attendees know what that translucent blue screen shot is. You can see the ghostly image here, courtesy of PowerApple.com. In case the image 404s, here’s what I saw.
If you have any additional information about this “image”, please, let me know.
Stephen Arnold, August 11, 2008
Hot News: Google Is Interested in Content
August 11, 2008
That wild and wonderful New York Times has a rear view mirror article that you must read. It’s here and called “Is Google a Media Company?” by Miguel Helft, a really good writer. For me, the key point in the article is this statement:
Google has long insisted that it has no plans to own or create content, and that it is a friend, not a foe, of media companies. The Google search engine sends huge numbers of users to the digital doorsteps of thousands of media companies, many of which also rely on Google to place ads on their sites.
This is, of course, Google’s standard verbiage, its “game plan” talk.
Mr. Helft quotes a range of experts who offer a contrary view. A Harvard professor (David B. Yoffie), surely in the know, is quoted as saying:
‘If I am a content provider and I depend upon Google as a mechanism to drive traffic to me, should I fear that they may compete with me in the future?’ Professor Yoffie asked. ‘The answer is absolutely, positively yes.’
I devote some attention (20 or 25 pages, as I recall) to Google’s publishing and content acquisition and distribution inventions in my August 2007 study Google Version 2.0. If you are curious, there’s more information here. Outsell, a nifty consulting outfit in Burlingame, California, recycled some of my research late last year. There is a bit of dissonance between what my research suggested and the tasty sound bites in the New York Times article.
The key point is that Google’s been beavering away in “publishing” for quite a while. Actually, publishing, even the word media, is too narrow. Google has somewhat wider vistas in mind if I understand its patent documents and technical papers.
It’s exciting to know that the paper of record has now made it official. Google has some media thoughts in its Googzilla brain.
Stephen Arnold, August 11, 2008
New Era in Visualization Emerging
August 11, 2008
Traditional tools can’t deal with petabyte and larger data sets. The Department of Homeland Security and the National Science Foundation have tapped Georgia Tech “as the lead academic research institution for all national Foundations of Data and Visual Analytics (FODAVA) research efforts. Seven other FODAVA Partnership Awards will be announced later this year, all working in conjunction with eleven Georgia Tech investigators to advance the field.” News of an initial grant of $3 million was reported by the university earlier this month. You can read one version of the announcement here in PhysOrg.com’s article “New Grant Supports Emerging Field of Massive Data Analysis and Visual Analytics.”
I think this is important because US government funding does have an impact on information-related innovation. Data mining, text mining, and search have blossomed with government support. A recipient of US government money is asked to look for ways to push the innovations into the commercial channel. The idea is for government funds to “give back” to citizens.
My view of visualization is mixed. Most of the startling visualizations, such as the Australian National University’s three-dimensional rock, are interesting but not practical. Last week I marveled at a collection of wild and crazy visualizations. The problem is that most visualizations get in the way of my understanding the data. A good example is Indiana University’s visualization of movies. I still have a heck of a time figuring out what the visualization depicts. For me, using it is out of the question. You can see this visualization here.
My hunch is that visualization will be in my face in the months and years ahead.
Stephen Arnold, August 11, 2008
Google and Hosted Telephony
August 11, 2008
Network World’s Matthew Nickasch wrote an interesting article “Will Google Consider Hosted Telephony?”. You will want to read it in its entirety. The story is here. The premise of the story is that Google may offer a range of wireless services from the cloud. Mr. Nickasch asserts:
While no official plans, or even rumors have been released, a Google-hosted VoIP environment may be incredibly popular for organizations that utilize Google Apps for all other collaboration needs. We’ve seen our fair share of free hosted VoIP environments, like Skype, Free World Dialup, etc, but Google has yet to venture into such a market.
My own research into Google’s telephony activities suggested to me that:
- Google started working on mobile and other telephony services as early as 1999
- Telephony, based on my analysis of Google patent documents, has been an area of intense activity for almost a decade
- Google’s innovations extend deeper than hosted applications; for example, Google has a clever invention for routing calls in a distributed mesh environment (a sketch of the general idea follows this list)
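The mesh routing idea can be illustrated in a few lines of code. This is purely my own sketch of generic least-cost routing (Dijkstra’s algorithm), not Google’s patented method; the mesh and link costs are hypothetical:

```python
# Sketch: route a call across a mesh by finding the cheapest path.
import heapq

def route_call(mesh, source, destination):
    """Return (cost, path) of the lowest-cost route, or None."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in mesh.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue,
                               (cost + link_cost, neighbor, path + [neighbor]))
    return None  # no route available

mesh = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}}
print(route_call(mesh, "A", "D"))  # (3, ['A', 'B', 'C', 'D'])
```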
Mr. Nickasch ends his article with several questions. What’s your take? Has Google lost its chance to become a telco, or does Google have a different game underway? In Google Version 2.0, I discuss options for Google’s “other game”. Hosted services are already here, and I think Googzilla is watching and learning.
Stephen Arnold, August 11, 2008
QlikTech: More Business Intelligence from Sweden
August 11, 2008
QlikTech is one of the fastest-growing business intelligence companies in the world. The company’s Web site here asserts that it has more than 7,300 customers. Based in Lund, QlikTech has morphed from consulting to software. Its core technology is software that makes exploring data a point-and-click affair. Most graphical interfaces require that the user know what specific statistical processes can do and how they work. QlikTech’s approach exposes options. When a user clicks on an option, excluded or inappropriate options are grayed out. A typical manager can point and click her way through an analysis of a Web site’s traffic or an exploration of cash flow.
The company offers a number of demos here. One caution. Not all of the demos work. In my tests, latency played a part in the Java demos I tried. The Ajax demos were for the most part acceptable, but several rendered empty browser screens. You will need to explore these on your computer.
Sybase has inked a deal with QlikTech for the company’s analytics system. You can read the article from the WebNewsWire here. Sybase will use QlikTech to provide “dashboards”, giving Sybase licensees point-and-click interfaces and graphical displays that show important data at a glance.
Sybase offers its own analytic tools (Sybase IQ), but a typical user needs training and technical expertise that most managers cannot acquire quickly. So, QlikTech to the rescue. QlikView operates in-memory, thus eliminating the hassle of building cubes and the delays associated with traditional queries. The QlikView system automatically associates related data as a user clicks on options in the interface. With the in-memory approach, a user can whip through data in a more fluid manner.
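Here is a minimal sketch of that associative behavior, assuming a single flat table; QlikView’s actual engine is, of course, far more sophisticated:

```python
# Sketch: one click recomputes which values remain possible in every
# field; everything else would be grayed out in the interface.
rows = [
    {"region": "EMEA", "product": "A", "year": 2007},
    {"region": "EMEA", "product": "B", "year": 2008},
    {"region": "US",   "product": "A", "year": 2008},
]

def associate(rows, field, value):
    """Return the still-possible values per field after one selection."""
    matching = [r for r in rows if r[field] == value]
    return {f: sorted({r[f] for r in matching}) for f in rows[0]}

# Clicking "EMEA": products A and B stay live; the US rows drop out
print(associate(rows, "region", "EMEA"))
```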
Business intelligence is becoming the new “search”. QlikView’s technology can manipulate most structured data.
Stephen Arnold, August 11, 2008
Beyond Search’s Search Function Back On Track
August 10, 2008
I have had many positive comments about the search function for my Web log “Beyond Search”. Last week, we had reports of current postings not appearing in the index. Our hosting company had in place a method to block certain clickstreams when certain conditions were detected by its automated systems. The increasing demand for access to the site and the additional content indexed by the Blossom search system caused a slowdown in “Beyond Search.” The hosting company, Blossom.com, and my engineering team have resolved the problem. Thanks for your patience.

Blossom.com’s Web log indexing system continues to delight me. If you are looking for a search system for a Web site or a Web log, please navigate to http://www.blossom.com and check out the company. Feel free to mention that Beyond Search is happy. I’m sufficiently happy to award the Blossom.com team three happy quacks. We’re back to normal, but my normal may be different from your normal. Anyway, you can search for posts about SearchCloud, Sprylogics, and, of course, my favorite, SharePoint. Enjoy.
Stephen Arnold, August 10, 2008
Search Fundamentals: Cost
August 10, 2008
Set aside the fancy buzz words like taxonomies, natural language processing, and automatic classification. I want to relate one anecdote from a real-life conversation last week and then review five search fundamentals.
Anecdote
I’m sitting in a fancy conference room near Tyson’s Corner. The subject is large-scale information systems, not search. But search was assumed to be a function that would be available to the larger online system. And that’s where the problem with search fundamentals became a time bomb. The people in the room assumed that search was not a problem. One could send an email to one of the 300 vendors in the search and content processing market, negotiate a licensing deal, install the software, and move on to more important activities. After all, search was a mud flap on a very exotic sports car. Who gets excited about mud flaps?
The situation is becoming more and more common. I think it is a consequence of Googling. Most of the people with whom I meet in North America use Google for general Web search. The company’s name has become a verb, and Google use grows more pervasive each day. If I open Firefox, I have a Google search box available at all times.
If Google works, how hard can search be?
Five Fundamentals
I have created a table that lists five search fundamentals. Feel free to scan it, even recycle it in your search procurement background write-ups. I want to make a few comments about each fundamental and then wrap up this essay with what seems to me an obvious caution. Table after jump.
NTT Video Search
August 10, 2008
Video search evokes thoughts of Autonomy and Google, rarely NTT, the Japanese communications giant. According to DigitalBroadcasting.com, NTT has a robust media search technology cleverly named Robust Media Search. You can read about the invention here. The write-up is a news release, so it has a pro-NTT spin. Imagine that.
NTT is no newcomer to search. The company has been pecking away at media search, based on proprietary NTT technologies, since 1996.
Video search is important. In my own research, I find that many organizations plop a 20-something in front of a Web cam and capture minutes of video about a topic. Google, home of the search wizards, is among the worst offenders. Google records presentations accompanied by almost unreadable visuals about many topics. To find the occasional gem, I have to let the video drone on as I listen for an interesting fact. The video search engines are not too good. There are false drops, and often I must run the entire video to locate the single point referenced in the search system. Grrr. I hate video.
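One obvious fix for the “watch the whole thing” problem is time-coded transcript search: index what is said and when, then jump the player to that offset. Whether NTT’s system works this way I cannot say; the snippet below is only my illustration of the general approach, with made-up data:

```python
# Sketch: search a time-coded transcript to jump straight to the moment
# a term is spoken, instead of letting the whole video drone on.
transcript = [
    (0.0,   "welcome to the talk"),
    (312.5, "the interesting fact about latency is this"),
    (840.0, "questions from the audience"),
]

def seek_term(transcript, term):
    """Return playback offsets (in seconds) where the term occurs."""
    return [offset for offset, text in transcript if term in text]

print(seek_term(transcript, "latency"))  # [312.5] -- jump to that second
```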
Will the NTT invention make my life easier as I try to cope with the rising tide of rich media? If you want to learn more about NTT’s technology, you can navigate here and deal with the registration process. I refused to do this.
Here’s what I have pieced together about this new search technology:
- Search is one component of a number of rich media services. These range from distribution to digital fingerprinting. More about this is here.
- The wizard identified with the search technology is Takayuki Kurozumi, Ph.D. Bio here.
- A field trial with BayTSP began in April 2008. More about the trial is here. Information about BayTSP is here. The “TSP” is an acronym for “track, secure, and protect”.
I have not been able to locate public information about the outcome of this test. Based on my experience with Japanese search systems, the technology may find its way into a network service. The “search” does not necessarily mean that I will be able to look for a video. The “search” may be a function for a copyright holder to locate and track video or audio content used without permission.
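Fingerprint-based tracking of this sort can be sketched in a few lines. The chunk hashing below is only a toy stand-in; real systems such as NTT’s Robust Media Search use perceptual fingerprints designed to survive re-encoding and editing:

```python
# Sketch: fingerprint media in fixed-size chunks so even partial copies
# share hashes with the registered original.
import hashlib

def fingerprint(media_bytes, chunk_size=4096):
    """Return the set of chunk hashes for a media blob."""
    return {hashlib.sha1(media_bytes[i:i + chunk_size]).hexdigest()
            for i in range(0, len(media_bytes), chunk_size)}

registered = fingerprint(b"original broadcast footage " * 500)
suspect = fingerprint(b"original broadcast footage " * 500)
overlap = len(registered & suspect) / len(registered)
print(f"match score: {overlap:.0%}")  # identical copies score 100%
```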
Stephen Arnold, August 10, 2008