Company Profiles Coming to Beyond Search

August 8, 2008

I talked with the team working on this Web log today at lunch. After I bought everyone super burritos, I was able to gather some ideas for making the Beyond Search Web site more useful to me, the team, and the two or three readers out there.

The Search Wizards Speak series on ArnoldIT.com has been well received. Several of the interviews have been recycled and turned up in Web logs in lands far from rural Kentucky and our lone “authentic” Mexican restaurant. One of the people working there has a non-hill folk accent, so the Cantina Kentucky must be muy authentico.

The idea that emerged between mouthfuls of “authentic” burritos was to post one or two page profiles of the companies mentioned in the stories in the Web log. I thought the idea was pretty awful, but the burrito-sated Beyond Search team thought it was wonderful.

Here’s the plan.

I have developed on a restaurant napkin a rough outline for what should be included in each of the company profiles. A team member or one of the writers who work on this Web log will write the profile. I have gigabytes of info about search, and I will let the lucky journalist grind through these data and then tap other sources.

Each profile will have a comments section. If you want to add information or correct an error, use the comments form. Once a year, we will roll the comments into the baseline profile. In this way, you can get some basic information about the companies mentioned in the Web log. You can also update or correct the basic entry.

I think we will be cutting and pasting from the company information on search vendors’ Web sites. I am thinking about adding my unique stamp to each write up with my personal “likes and dislikes” for each system. My attorney says he wants to think about this “likes and dislikes” stuff, so stay tuned on that point.

Keep in mind that I do really meaty analyses of companies in the search and content processing business. The profiles, like the interviews in Search Wizards Speak, will provide some useful information but the juicy stuff will not be included.

So what’s juicy?

Well, I just completed ripping through Endeca’s patent documents. I have identified some upsides and downsides to the inventions disclosed. I have then worked through the publicly available information about Endeca, made a couple of calls, and thought about what I have learned. That type of detail is not going to be in these free two-page profiles. Some lucky or silly outfit is going to have to pay me for the slog through the golden prose of lawyers and engineers. The prose makes Henry James’s novels look like the script to the new Batman movie.

I want to post a couple of test profiles and invite comments. I will go slowly at first, but if I can get the kinks worked out, my goal is to have one profile every week or two.

One of the burrito eaters suggested I sell profiles to companies who want the Beyond Search team to write about a specific firm. I am a greedy goose, but I want to put that idea on the back burner until I figure out if this is a feasible activity. There’s a lot of email and chasing required to get an interview completed. I’m not sure about search company profiles. The idea of money is easier to experience than the actual process of squeezing a beet for nectar.

Watch this Web log for a link to the first profile. I’m thinking next week. Comments? Suggestions? Let me know in the comments section below this article.

Stephen Arnold, August 8, 2008

Will Microsoft Bring Home the Gold in the SharePoint Olympics?

August 8, 2008

The Olympics are underway. If you have any questions, you will want to navigate to the Beijing Organizing Committee for the Olympic Games’ portal here. Ooops. That’s not the SharePoint site, and this MSDN article “SharePoint Server 2007 Powers Beijing 2008 Olympic Games” does not include a link to the SharePoint site. You can read this post, dated August 5, 2008, here. The screenshot featured on the site does not look like any of the pages on the “official” site at http://en.beijing2008.cn/.

Here’s the “official” site’s look and feel:

[Screenshot: the official Beijing 2008 Olympics site]

And here’s the screen shot of Microsoft SharePoint and its “official” site:

[Screenshot: the SharePoint page featured in the MSDN post]

I think I have figured out what’s going on, but it would be nice if the MSDN post contained links to pages, not screenshots without a url or trackback link. You can navigate to a July 2008 case study here and learn more about this high profile opportunity for SharePoint. Here’s the architecture diagram for the Microsoft system:

[Diagram: the architecture of the Microsoft system for the Beijing Olympics]

Compared to the SharePoint placemat diagram here, it seems to me that this Olympics’ diagram is a simplified schematic.

One oddity is that the drop down box that one uses to specify the viewer’s country is tough to control. The video won’t play until you click on the country, and the scroll function is somewhat immature. The video is displayed on the NBColympics.com Web site, and I was puzzled by the design of that page.

A happy quack to the SharePoint team. Nothing but smooth sailing for the next couple of weeks.

Stephen Arnold, August 8, 2008

SearchCloud Updated

August 7, 2008

SearchCloud.net has updated its search system. You can now adjust term weightings from the results page. If you are not familiar with this system, you will want to navigate to http://www.searchcloud.net. The idea is to enter a term and then select a font size for that term. The system puts the term in the cloud and makes an internal notation to weight that term in proportion to its font size. The larger the font, the more significant the term is to your query. Term weighting has been available to search system administrators, but the function is usually excluded from user facing controls. I wrote a profile of the company’s system, and you can read that essay here.
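To make the weighting idea concrete, here is a minimal sketch of how a font size chosen by a user might be translated into a query term weight. This is my own illustration, not SearchCloud’s code; the size range and the linear mapping are assumptions.

```python
# Minimal sketch: map user-chosen font sizes to query term weights.
# SearchCloud's actual implementation is not public; this only shows
# the idea that a larger font means a heavier weight for that term.

def weights_from_font_sizes(cloud, min_size=8, max_size=36):
    """cloud maps each query term to the font size (in points) the user picked."""
    weights = {}
    for term, size in cloud.items():
        size = max(min_size, min(max_size, size))            # clamp to the range
        scaled = (size - min_size) / (max_size - min_size)   # 0.0 .. 1.0
        weights[term] = round(0.1 + 0.9 * scaled, 2)         # keep a floor weight
    return weights

# Example: "enterprise" in a big font, "appliance" in a small one.
print(weights_from_font_sizes({"enterprise": 36, "search": 18, "appliance": 8}))
# {'enterprise': 1.0, 'search': 0.42, 'appliance': 0.1}
```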

Other changes in the update include:

  • Tweaks to the interface
  • A hint box
  • Results can be copied.

I will keep you posted about developments. A happy quack to those who support term weighting.

Stephen Arnold, August 7, 2008

Sprylogics’ CTO Zivkovic Talks about Cluuz.com

August 7, 2008

The popular media fawn over search and content processing companies with modest demos that work on limited result sets. Cluuz.com–a company I profiled several weeks ago here–offers heartier fare. The company uses Yahoo’s index to showcase its technology. You can take Cluuz.com for a test drive here. I was quite interested in the company’s approach because it uses Fancy Dan technology in a way that was immediately useful for me. Cluuz.com is a demonstration site from Toronto-based Sprylogics International Inc. The company trades on the TSX Venture Exchange under the symbol TSXV:SPY.

Because Sprylogics has roots in the intelligence community, unlocking the company took some work. Once I established contact with Alex Zivkovic, I was impressed with his responsiveness and his candor.

In the interview you can read about the origins of the Cluuz.com service as well as some of the company’s other interesting content processing technology. The company offers a search system, but the real substance of the firm is how it processes content, turning even the Yahoo search index into a significantly more useful form.

The Cluuz.com system puts on display the firm’s proprietary semantic graph technology. You can see relationships for specific subsets of relevant content. I often use the system to locate information about a topic and then explore the identified experts and their relationships. This feature saves me hours of work trying to find a connection between two people. Cluuz.com makes this a trivial task.
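To show why that graph view matters, here is a toy sketch of the relationship-spotting idea. It is my own illustration, not Sprylogics’ technology, and the entity pairs and the second person are invented for the example.

```python
# Toy illustration of relationship analysis over an entity co-occurrence graph.
# This is not Cluuz.com's method; it only shows why a graph representation
# turns "find a connection between two people" into a short-path lookup.
import networkx as nx

# Pretend these entity pairs were extracted from processed documents.
cooccurrences = [
    ("Alex Zivkovic", "Sprylogics"),
    ("Sprylogics", "Cluuz.com"),
    ("Cluuz.com", "Yahoo"),
    ("Yahoo", "Jane Analyst"),   # hypothetical second person
]

g = nx.Graph()
g.add_edges_from(cooccurrences)

# The connection between the two people falls out as the shortest path.
print(nx.shortest_path(g, "Alex Zivkovic", "Jane Analyst"))
# ['Alex Zivkovic', 'Sprylogics', 'Cluuz.com', 'Yahoo', 'Jane Analyst']
```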

Mr. Zivkovic told me:

So, we have clustering. We have entity extraction. We have relationship analysis in a graph format. I want to point out that for enterprise applications, the Cluuz.com functions are significantly more rich. For example, a query can be run across internal content and external content. The user sees that the internal information is useful but not exactly on point. Our graph technology makes it easy for the user to spot useful information from an external source such as the Web in conjunction with the internal information. With a single click, the user can be looking into those information objects.

I probed into the “guts” of the system. Mr. Zivkovic revealed:

Our engineers have worked hard to perform multiple text processing operations in an optimized way. Our technology can, in most cases, process content and update the indexes in a minute or less. We keep the details of our method close to the vest. I can say that we use some of the techniques that you and others have identified as characteristic of high-speed systems like those at Google, for example.

You can read the full interview with Mr. Zivkovic in the Search Wizards Speak interview collection on the ArnoldIT.com Web site. The full text of this exclusive interview is here. A complete index of the interviews in this series is here.

Google Search Appliance: Showing Some Fangs

August 6, 2008

Assorted wizards have hit the replay button for Google’s official description of the Google Search Appliance (GSA).

If you missed the official highlights film, here’s a recap:

  • $30,000 starting price, good for two years of “support” and a 500,000 document capacity. The bigger gizmos can each handle 10 million documents. These work like Christmas tree lights. When you need more, just buy more GSAs and plug them in. This is the same type of connectivity “big Google” enjoys when it scales.
  • Group personalization; for example, marketing wizards see brochure-type information and engineers see documents with equations
  • Metadata extraction so you can search by author, department, and other discovered index points.

If you want to jump right into Google’s official description, just click here. You can even watch a video about Universal Search, which is Google’s way of dancing away from the far more significant semantic functionality that will be described in a forthcoming white paper from a big consulting firm. This forthcoming report–alas–costs money, and it even contains my name in very small type as a contributor. Universal Search was the PR flash created for Google’s rushed Searchology conference not long after an investment bank published a detailed report of a far larger technical search initiative (Programmable Search Engine) within the Googleplex. For true Google watchers, you will enjoy Google’s analysis of complexity. The title of the video is a bit of Googley humor because when it comes to enterprise or behind the firewall search, complexity is really not that helpful. Somewhere between 50 and 75 percent of the users of a search system are dissatisfied with it. Complexity is one of the “problems” that Google wants to resolve with its GSA.

When you buy the upscale versions of the GSA, you can implement failover to another GSA. GSAs can be distributed geographically as well. The GSA comes with support for various repositories such as EMC Documentum. This means that the GSA can index the Documentum content without custom coding. The GSAs support the OneBox API, which is an important component in Google’s enterprise strategy. A clever programmer can use the GSA to create Vivisimo-style federated search results, display live data from a Microsoft Exchange server so a “hit” on a person shows that person’s calendar, integrate Web and third-party commercial content with the behind-the-firewall information, and perform other important content processing tasks.
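To give a flavor of how a federated “hit” like the Exchange calendar example can work, here is a generic sketch of a provider the appliance could call at query time. The element names, URL, and port are mine, not the official OneBox schema, so treat this as an assumption-laden illustration rather than Google’s API.

```python
# Generic sketch of a OneBox-style federated provider: the appliance forwards
# the user's query to this HTTP endpoint and merges the XML reply into its
# result page. Element names, port, and parameter name are illustrative only.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class CalendarProvider(BaseHTTPRequestHandler):
    def do_GET(self):
        person = parse_qs(urlparse(self.path).query).get("q", [""])[0]
        # A real provider would look the person up in Exchange here.
        body = (
            "<results>"
            f"<result><title>Calendar for {person}</title>"
            "<snippet>Next meeting: 10:00 project review</snippet></result>"
            "</results>"
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/xml")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), CalendarProvider).serve_forever()
```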

Google happily names some of its larger customers, including Adobe Systems, Kimberly-Clark, and Sunnybrook Health. The company does not, however, mention the deep penetration of the GSA into government agencies, police organizations, and universities.

Good “run the game plan” write ups are available from CNet here, my favorite TechCrunch with Eric Schonfeld’s readable touch here, and the “still hanging in there” eWeek write up here.

[Screenshot: splash page for the enterprise videos]

After registering for the enterprise videos, you will see this splash page. You can get more information about the upgrade to Version 5 of the GSA.

My Take

Now, here’s my take on this upgrade:

First, Google is responding to demands for better connectivity, more administrative control, and better security. With each upgrade to the GSA, Google has added features that have been available for a quarter century from outfits like Verity (now part of the Autonomy holdings). The changes are important because Google is often bad mouthed for offering a poor enterprise search solution. With this release, I am not so sure that the negatives competitors heap on these cheerful yellow boxes are warranted. This version of the GSA is better than most of the enterprise search appliances with which I am familiar and a worthy competitor where administrative and engineering resources are scarce.


Vadlo: New Bio Search Engine

August 6, 2008

In the last year, I’ve noticed that Google is doing less PowerPoint indexing. The cause may be low usage, so the expense of dealing with the pick-up-sticks file format is not worth the computational effort. Another idea is that more people–including me–are dumping PowerPoint decks to Adobe Portable Document Format files and saving some time while reducing bloated PowerPoint files to a more manageable size. Whatever the reason, PowerPoints do contain useful information. Vadlo, a biology-oriented vertical search engine, indexes PowerPoints, training classes, references to protocols, seminars, databases, and software. Like Dieselpoint, Vadlo is a product of Chicago. I think the city is trying to change its metaphorical association from “meat packer” and “big shoulders” to search and content processing. Vadlo is owned by two biology scientists, and its mission is “to locate biology research related information on the Web”. The company may get a boost because Google is not doing a particularly thorough job in this area, and as noted, the Google is doing an even poorer job of keeping up its PowerPoint indexing. Vadlo includes cartoons which are undoubtedly real side splitters for biologically-enriched wizards. You can download them here or wait until these turn up in the New Yorker Magazine.

[Image: a Vadlo cartoon]

My recollection is that vádló is a Hungarian word which can be a name or mean “accuser”. My hunch is that clever biologists have unearthed this word to find a five letter domain name. I did a bit of poking around, and I found references to this system in comments appended to Cuil.com articles, in a couple of Google Groups, and on a handful of Web logs. One post was dated 2007, so this Vadlo is not really a newcomer to search like the outfit I will be visiting later this month. Check out the system. Just don’t accuse me of having lousy language translation skills. I live in rural Kentucky, shoot squirrels, and eat burgoo.

Stephen Arnold, August 6, 2008

SiSense: Shows Good Sense

August 6, 2008

I am gathering information about Google’s slow but steady progress in the enterprise software sector. My research indicates that the tasty Gummy Bear that lures people to Googzilla is maps, satellite imagery, and the nifty overlays that a licensee can plop on a Google map. You may want to look at what SiSense–a business intelligence start up–is doing with the oft-reviled Google Spreadsheets. A SiSense customer can navigate to http://www.sisense.com, learn about the tight integration between the SiSense software and Google spreadsheets as a data source, and download a software widget.

SiSense has concluded that some of its customers use Google spreadsheets to hold data which can then be crunched using SiSense’s business intelligence routines. One application is for a distributor with seven sales reps who use Google spreadsheets to hold various data. The SiSense licensee can pull such data from these spreadsheets, roll it up, and crunch away. SiSense hit my radar because it uses the Amazon S3 service as well.
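Here is a rough sketch of that roll-up idea, assuming each rep’s spreadsheet can be exported as CSV. It is my own illustration, not SiSense’s integration; the export URL pattern, spreadsheet keys, and column names are all made up.

```python
# Illustrative only: pull several reps' Google spreadsheets as CSV exports,
# stack them, and roll up sales by region. The URL pattern, keys, and column
# names are assumptions, not SiSense's actual mechanism.
import pandas as pd

CSV_EXPORT_URL = "https://spreadsheets.google.com/export?key={key}&format=csv"  # hypothetical
rep_sheet_keys = ["key-rep-1", "key-rep-2", "key-rep-3"]                        # hypothetical

frames = [pd.read_csv(CSV_EXPORT_URL.format(key=k)) for k in rep_sheet_keys]
all_sales = pd.concat(frames, ignore_index=True)

# Roll up: total and average order value per region across all reps.
summary = all_sales.groupby("region")["order_value"].agg(["sum", "mean"])
print(summary)
```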

Prices are not available. SiSense is gearing up. My hunch is that if Amazon introduces a spreadsheet, SiSense may jump from Google to Amazon. Conversely, if Google makes public its remarkable data management ecosystem, SiSense may say “Sayonara” to Amazon’s pretty darned exciting Web services.

Here’s my take on how Google is attacking the enterprise. In the hospital where my mother is recovering from a heart attack, there’s a sign in the elevator to the cardiology unit. It says, “Don’t stack boxes on the floor. In case of fire, the box may continue to burn. Water will also damage the contents of the cartons.”

Google is entering into deals that work like the fire in a carton. Once the base has been penetrated, top down efforts might not extinguish Google. Once the little Google flame starts to burn, the entire box is at risk. IBM, Microsoft, and Oracle are quite happy with their big fire hoses. These hoses can put out any annoying Google fires, or so the assumption goes. I am not so sure. One part of Google’s enterprise strategy is to set a bunch of small boxes on fire with the excitement of Google functionality. Google can then sit back and wait for the heat to build and then cook some goose–hopefully not this Web log’s logo.

Stephen Arnold, August 6, 2008

Search Options: Betting in a Down Economy

August 5, 2008

Paula Hane, who writes for Information Today, the same outfit paying me for my KMWorld column, has a very interesting run down of search engine options here. I agree with most of her points, and I think highly of the search systems which she has flagged as an option to Google.

But I want to take another look at search options and, true to my rhetorical approach, I want to take the opposite side of the argument. Ms. Hane, who knows me, has remarked on this aspect of my way of looking at information. Keep in mind I am not critical of her or Information Today. I want to be paid for my most recent column about Google’s geospatial services, the subject of the next column for KMWorld.

Here goes. Let’s get ready to rumble.

First, search is no longer “about search”. Search has become an umbrella term to refer to what I see as the next supra national monopoly. If you are looking for information, you probably use search less than 20 percent of the time. Most people locate information by asking someone or browsing through whatever sources are at hand. Search seems to be the number one way to get information, but people navigate directly to sites where the answer (an answer) may be found. I routinely field phone calls from sharp MBAs who prefer to be told something, not hunt for it.

Second, fancy technology is neither new nor fancy. Google has some rocket science in its bakery. The flour and the yeast date from 1993. Most of the zippy “new” search systems are built on “algorithms”. Some of Autonomy reaches back to the 18th century. Other companies just recycle functions that appear in books of algorithms. What makes something “new” is putting pieces together in a delightful way. Fresh, yes. New, no. Software lags algorithms and hardware. With fast and cheap processors, some “old” algorithms can be used in the types of systems Ms. Hane identifies; for example, Hakia, Powerset, etc. Google is not inventing “new” things; Google is cleverly assembling bits and pieces that are often well known to college juniors taking a third year math class.

Third, semantics–like natural language processing–is a hot notion. My view is that semantics work best in the plumbing. Language is slippery, and the semantic tools in use today add some value, but often the systems need human baby sitters. No one–including me–types well formed questions into a search box. I type two or three words, hit enter, and start looking at hits in the result list.

Fourth, social search sounds great. Get 200 smart people to be your pals and you can ask them for information. We do this now, or at least well connected people do. As soon as you open up a group to anyone, the social content can be spoofed. I understand the wisdom of crowds, and I think the idea of averaging guesses for the number of jelly beans in a jar is a great use of collective intelligence. For specialized work, let me ask a trusted expert in the subject. I don’t count jelly beans too often, and I don’t think you do either. Social = spoof.

Fifth, use a search system because a company pays you. Sorry, I don’t think this is a sustainable business model. Search is difficult. Search requires that a habit be formed. If the pay angle worked, the company would find that it becomes too expensive. The reason pay for search works is that not too many people search to get paid. When a person searches, there’s a reason. Getting a few pennies is not going to make me change my habits.

What’s this mean for Google competitors?

My contrarian analysis implies:

  1. Competitors have to leapfrog Google. So far no one has been able to pull this off. Maybe some day. Just not today or for the foreseeable future.
  2. Google is not a search system. It’s an application platform. Search with the search box is just one application of the broader Google construct.
  3. Google will lose its grip on search. As companies get larger, those companies lose their edge. This is happening to Google now. Look at how many of its services have no focus. Talk to a company that wants to get customer support. Google is losing its luster, and this means that the “next big thing” could come from a person who is Googley, just not working at Google.

So, Ms. Hane, what are we going to do with lame duck search solutions in a world dominated by a monopolistic supra national corporation that’s working on its digital arteriosclerosis? Dear reader, what do you say? Agree with me? Agree with Ms. Hane? Have another angle? Let me know.

Stephen Arnold, August 5, 2008

Intel: Cloud Factoid

August 4, 2008

I tracked down an Intel presentation from 2006 that was also used in 2007. The link is to ZDNet here. The presentation offers some interesting insights into Intel’s data center problems and opportunities in mid 2006; namely:

  • Intel has 136 of these puppies with an average cost pegged in the $100 million to $200 million range
  • Average idle capacity was about 200 million CPU hours with capacity at 900 million CPU hours, give or take a few hundred thousand hours
  • In 2006, 62 percent of the 136 data centers were 10 years old or older.
  • Plans in 2006 were to move to eight strategic hub centers.

My initial reaction to this 2006 presentation was that Intel’s zippy new chips might find a place in Intel’s own data centers. It would be interesting to calculate the cost of power across the old data centers with the aging chips versus the newer “green” chips. I expect that the money flying out the air conditioning duct is trivial to a giant like Intel.
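A back-of-the-envelope version of that calculation, with entirely made-up unit figures, suggests why the answer may indeed be small change to Intel:

```python
# Back-of-the-envelope power cost comparison. Every figure except the
# 900 million CPU hours of capacity (from the 2006 slide) is an assumption
# chosen only to show the shape of the calculation, not Intel's numbers.
capacity_cpu_hours = 900_000_000   # from the Intel presentation
watts_per_cpu_old = 150            # assumed draw for an aging server CPU
watts_per_cpu_new = 80             # assumed draw for a newer "green" chip
cooling_overhead = 2.0             # assumed facility/cooling multiplier
dollars_per_kwh = 0.08             # assumed industrial power rate

def power_cost(watts_per_cpu):
    kwh = capacity_cpu_hours * watts_per_cpu / 1000 * cooling_overhead
    return kwh * dollars_per_kwh

old, new = power_cost(watts_per_cpu_old), power_cost(watts_per_cpu_new)
print(f"old chips: ${old:,.0f}, new chips: ${new:,.0f}, difference: ${old - new:,.0f}")
# Roughly $21.6 million versus $11.5 million: real money, but modest next to
# 136 data centers costing $100 million to $200 million apiece.
```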

More on this issue appeared in Data Center Knowledge in 2007 here. In 2007, according to Data Center Knowledge, Google had about 93,000 servers in its data centers.

In April 2008, Travis Broughton, Intel, wrote here:

Our cost-cutting measures tend to be related to at least two of the three “R’s” – reducing what we consume, many times by reusing what we already have.

I’m not sure what this means in the context of the Cloud Two initiative, but I will keep poking around.

Stephen Arnold, August 4, 2008

Knol: A Google Geologic Hillock with an Interesting Core

August 3, 2008

I am heading to Illinois, and I vowed I would hit the road and post a comment upon arrival in America’s most scenic area: the prairie between Bloomington and Chillicothe, Illinois. Breathtaking. Almost as stunning as the discussion about Knol, Google’s alleged Wikipedia “killer”.  Apophenia here weighs in with the “standing on the shoulders of giants” argument. The idea is that Google should have done more with Knol. For me the key point in his write up was:

What makes me most annoyed about Knol though is that it feels a bit icky. Wikipedia is a non-profit focused on creating a public good. Google is a for-profit entity with a lot of power in controlling where on the web people go. Knol content is produced by volunteers who contribute content for free so that Google can make money directly from ads and indirectly from search traffic. In return for ?

The challenge is valid if Knol were designed to generate revenue from ads. At the risk of being accused of recycling information that I have been speaking and writing about for six years, let me remind myself that Google has a voracious hunger for information and data in any form. Knol fits into this Google-scape. I will return to this point after I refer you to iAppliance Web here and Bernard Cole’s “Google Knol Takes on Wikipedia’s Online Encyclopedia”. The key point for me in this good article was:

Knol’s collaboration model is also more hierarchical. Article collaborators can suggest changes but cannot make them without the author’s approval. While this bottleneck may lead to Knol being less timely than Wikipedia, it should prevent the revision wars that plague controversial Wikipedia articles.

I absolutely agree that Google will get something for nothing when people contribute. A Knol article, as Mr. Cole notes, will have an “owner”, a person who has met some Googley criterion as an individual qualified to write a Knol essay.

Let’s step back. When I worked my way through Google’s patent documents and the publicly available technical papers here, I noticed that a great many of these Google writings refer to storage and data management systems that hold a wide range of metadata. Google wants data about the user’s context. Google wants data about user behavior. Google wants data from books in libraries. In its quest for data, Google has been the focal point of a firestorm about copyright. Google knows that for many queries, Wikipedia with its faults pops up at the top of various Google reports listing “important” sites.

Google is a publisher and has been for a long time. The company has a wide range of mechanisms to obtain “content” from users. With the purchase of JotSpot, Google gained access to a publishing system, not a Web log tool, but a system that allowed users to input specific items in a form. The resulting information is nicely structured and ready for additional Google massaging.

When I learned about Knol, my research gave me the foundation to see Knol as a typical Google Swiss Army knife play. Let me highlight a few of the functions that I noted. Keep in mind that Google keenly desires that a coal mine under my log cabin in rural Kentucky explodes and converts me to assorted quarks and leptons:

  1. Knol has an author, so Google can figure out that anything a Knol author posts has some degree of “quality”. Knowing the author, therefore, provides a hook to add a quality score to other writings by a Knol author. Google doesn’t have legions of subject matter experts. Knol provides a content source that can help with the “quality” scoring that Google now does, sometimes in an unsatisfactory manner.
  2. Knol gives Google a hook to get copyrighted material that it owns, not some Jurassic publisher who sees Google as the cause of the pitiful condition of book, magazine, and journal publishers. Once a Knol author gets some content in the system and maybe a stroke from Google or a colleague, Katie, bar the door. I would publish my next monograph on Google in a heartbeat. The money would be okay if Google used its payment system to sell my work, but the visibility would be significant. In my business, visibility is reasonably important.
  3. Knol gives Google a clump of information to analyze. Google wants to know the type of things that a company like Attensity or SAS can ferret out of text. These “nuggets” provide useful values to set thresholds in other, separate or dependent processes within Google.

Notice that I did not focus on Wikipedia. Google, as I understand the company, floats serenely above the competition. The thrashings of companies threatened by Google are irrelevant to Google’s forward motion. I think Wikipedia needs some fixes, and I don’t think Knol will rush to do much more than what it is now doing. Knol is sitting there waiting to see if its “magnetism” is sufficiently strong to merit additional Google effort. If not, Knol’s history. If there is traffic, Google will over time nudge the service forward.

I also ignored the ad angle. Google’s patent documents contain scores of inventions for selling ads. There’s a game-based ad planning interface that to my knowledge remains behind closed doors. Everything Google does can have an ad stuck in it. So Knol may or may not have ads. Knol is not purpose built to sell more ads, but that’s an option for Google.

Based on my research, Google has a good sense of video content. Google has not figured out how to monetize it, but Google knows who makes hot videos, the traffic a hot video pulls, and similar metrics. Google knows similar data about Web logs. Now Google wants to know about individual authors’ willingness to generate original content and how the users will behave with regard to that content.

Scroll forward two years and think about Google as a primary publisher. Knol is one cog in a far larger exploration of the feasibility of Google’s becoming a combination of the old newspaper barons and the more financially frisky Robert Maxwells of the publishing world. Toss in a bit of motion picture studio and you have a new type of publishing company taking shape.

Granted Google Publishing may never come into being. Lawyers, Google’s own management, or a technical challenge from Jeff Bezos or a legal eagle could bring Googzilla down. But narrowing one’s view of Knol to a Wikipedia killer is not going to capture Knol, what it delivers, and where it may lead.

Knol is exciting for these reasons not because it is an ersatz Wikipedia. Okay, tell me I’m recycling old information, living in a dream world, or just plain wrong. Any of these is okay with me. Remember the disclaimer for this personal Web log.

Stephen Arnold, August 3, 2008
