Good SharePoint in the Cloud Forecast

March 7, 2009

I try to look at what’s new from the Microsoft SharePoint, Fast Search, and related content processing units once a week. Since the Fast Forward 2009 road map, not much has grabbed my attention. I am fascinated with road maps. These are easier to create and deploy than software. I did come across a very useful set of PowerPoint slides here. The focus is SharePoint from the cloud. My hunch is that Microsoft will be packing SharePoint with search technology when the road map converts to shipping code. If this url doesn’t work for you, navigate to http://cid-0ddc65de8785e94e.skydrive.live.com/self.aspx/Public/mpdc-bpos%20-%20DIWUG20090217.pdf and click the faint Download link. Note that this information is on a Microsoft SkyDrive in Adobe PDF format, a fact I find amusing. The presentation is by Serge van den Oever of Macaw. Parts of the talk are in Dutch, but the meaty stuff is in the diagrams. Here’s an example of the type of information available. Note: this is a portion of a single slide; there’s more in the original:

sharepoint

Another useful slide shows the pricing in US dollars. Navigate to the original for this information. I don’t know how touchy the Microsoft legal eagles are about folks reproducing a Dutch presentation with US SharePoint costs. There’s a screen shot of an application from Metavistech which looks interesting as well. There’s even a “pimp my SharePoint” slide for those with a yen to customize SharePoint and a sense of the California car culture. Instead of a hot tub, the slide suggests adding a wiki to SharePoint. Sounds cool.

Stephen Arnold, March 7, 2009

Deep Peep

March 7, 2009

A happy quack to the reader who sent me a link to the Deep Peep beta. You can try the beta of the deep Web search engine here. The site says:

DeepPeep is a search engine specialized in Web forms. The current beta version tracks 13,000 forms across 7 domains. DeepPeep helps you discover the entry points to content in Deep Web (aka Hidden Web) sites, including online databases and Web services. This search engine is designed to cater to the needs of casual Web users in search of online databases (e.g., to search for forms related to used cars), as well as expert users whose goal is to build applications that access hidden-Web information (e.g., to obtain forms in job domain that contain salary, or discover common attribute names in a domain). The development of DeepPeep has been funded by National Science Foundation award #0713637 III-COR: Discovering and Organizing Hidden-Web Sources.

Deep Web is one of those buzz words that waxes and wanes. For many years Bright Planet and Deep Web Technologies have been the systems I associated with indexing content behind passwords and user names. I wrote a report about Google’s programmable search engine in 2007. The PSE contains some “deep Web” functionality, but the GOOG exposes only a fraction of its “deep Web” capabilities to the adoring millions who use the Google search system. An example of a typical “deep Web” data set might be the flight information and prices available at an airline site or the information available to a registered user of an online service. Dealing with “deep Web” issues is a lot of work. Manual fixes to spider scripts are expensive and time consuming. The better “deep Web” systems employ sophisticated methods that eliminate most of the human fiddling required to navigate certain services.
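The heavy lifting that systems like DeepPeep automate is discovering the forms themselves. As a rough illustration (not any vendor’s actual code), here is a minimal Python sketch that pulls form entry points and field names out of a page, the metadata a deep Web index must gather before it can probe a database hiding behind a search box. The HTML snippet, action URL, and field names are all hypothetical.

```python
from html.parser import HTMLParser

class FormExtractor(HTMLParser):
    """Collect each form's action URL and the names of its input fields,
    the kind of entry-point metadata a deep Web index gathers."""
    def __init__(self):
        super().__init__()
        self.forms = []      # list of {"action": ..., "fields": [...]}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action", ""), "fields": []}
        elif tag in ("input", "select") and self._current is not None:
            name = attrs.get("name")
            if name:
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None

# Hypothetical airline-style search form, the classic deep Web entry point.
page = """
<form action="/flights/search">
  <input name="origin"><input name="destination">
  <select name="date"></select>
</form>
"""
parser = FormExtractor()
parser.feed(page)
print(parser.forms)
```

Once a crawler has this inventory of actions and field names, it can cluster forms by domain and generate probe queries automatically, which is exactly the human fiddling the better systems eliminate.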

Today quite a few systems have “deep Web” capability but don’t use that phrase to describe their systems. Here’s a screen shot from my test query for “search”. I used the single word “search” because the word pair “enterprise search” returned results that were not useful to me.

deep peep

Give the new system a spin and share your opinions in the comments section of this Web log.

Stephen Arnold, March 7, 2009

Censoring Search

March 7, 2009

The Japan Today Web site ran “Google, Yahoo!, Microsoft Urged Not to Censor Search” here. The article does a good job of summarizing the hoo hah over various Internet filtering efforts. The most interesting paragraph to me was:

RSF [Reporters without Borders] and Amnesty said that currently, “there are more than two dozen countries restricting Internet access on a regular basis.” They said they “understand the challenges of operating in countries that restrict Internet access; these countries are trying to pressure you to obey local laws that do not comport with international law and standards that protect freedom of expression.” “But complying with local demands that violate international law does not justify your actions,” they said.

The point that struck me was the implicit assumption that Web indexes are not now filtered or in some way shaped. The broader filtering is not so much new as it is in the public eye. Consequently write ups that want a free Internet with sites available may want to do a bit more digging into what has been done by Web indexing and directory outfits for a long time.

At The Point (Top 5% of the Internet) in 1993 (yep, that’s 16 years ago, folks) we built a business on filtering out porn, hate, and other types of sites we defined as inappropriate in our editorial policy. Since those early days of online directories and indexes, content is either not processed, skipped by the crawler, or blocked in the indexes.

Free and open. Sounds great. Not part of the fabric of most indexing operations. If you can’t figure out why, you qualify as an azure chip consultant, fully equipped to advise government entities, non profit institutions, and commercial entities about search, online access, and content. For me, filtering is the *only* way to approach online content. I filter for behind-the-firewall search with a vengeance. Why? You want the stuff in your laptop’s folders in the organization’s index? I filter with the force of legal guidance for eDiscovery. Why? You want to run afoul of the applicable laws as they apply to eDiscovery and redacting? I filter for libraries. Why? You want the library to create problems for patrons with problematic Web sites or malware? No, I didn’t think so.
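To make the point concrete, here is a hedged sketch of index-time filtering. The policy terms, hostnames, and documents are all invented for illustration; a production system would use a far richer editorial policy. The idea is simply that every document is checked against the policy before it ever reaches the index.

```python
# Hypothetical editorial policy: terms and hosts that keep a
# document out of the index entirely.
BLOCKED_TERMS = {"malware", "phish"}
BLOCKED_HOSTS = {"bad.example.com"}

def admit(doc):
    """Index-time filter: return True only if the document passes policy."""
    host = doc["url"].split("/")[2]
    if host in BLOCKED_HOSTS:
        return False          # skipped by the crawler's host blocklist
    words = set(doc["text"].lower().split())
    return not (words & BLOCKED_TERMS)  # blocked by the term policy

docs = [
    {"url": "http://good.example.com/page", "text": "library hours and events"},
    {"url": "http://bad.example.com/page",  "text": "anything at all"},
    {"url": "http://other.example.com/x",   "text": "free malware download"},
]
indexed = [d for d in docs if admit(d)]
print([d["url"] for d in indexed])
```

Only the first document survives. That is the fabric of most indexing operations: the filter runs before the index exists, which is why the “free and open” index is so hard to find.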

Free and open. Silliness. Poke around and find out what the guidelines are for content at some of the high profile Web indexing and content companies. If you find a free and open index other than a dark net, shoot me an email at seaky2000 at yahoo dot com. I will check it out.

Stephen Arnold, March 7, 2009

Clouds Dissipate at HP

March 7, 2009

Hewlett Packard joined Yahoo in bailing out of the cloud storage business. You can read the ComputerWorld story “HP Shuts Down Upline Online Storage Service” here. HP has the distinction of going zero for two in the online game. First, the company muffed the bunny with AltaVista.com. When the wizards escaped the HP Compaq DEC set up, Google and other companies surged forward. Now HP pulls the plug on a service that did not work as well as Amazon’s service. HP beaten by an eCommerce company. The most interesting comment in the write up in my opinion was:

HP’s Upline service had trouble from the start. Three weeks after opening in April last year, it went down for a week. Users at the time reported problems in the client software to upload and synchronize files with the hosted service — calling Upline a good idea that was horribly executed.

HP has some big customers, one of which is alleged to be Microsoft. I thought HP was an ink company.

Stephen Arnold, March 7, 2009

iPhone and Web Use: 66% Share

March 7, 2009

I missed this March 1, 2009, story. I wanted to snag the date and the data. My hunch is that both will become useful at some point in the future. You can read the story “Apple iPhone Controls over 66% of All Mobile Web Use” here.

Net Applications’ February results show the iPhone operating system having managed over nine times the usage of its next smartphone competitor, Windows Mobile, which had just 6.91 percent of the traffic measured across tens of thousands of sites. Other smartphone platforms haven’t fared any better, according to the metrics. Google’s Android and Symbian were both locked in a tie for 6.15 percent. Research in Motion’s email-centric BlackBerry OS was used less often at just 2.24 percent and was even outmatched by PalmOS devices, which represented 2.37 percent of cellular web use last month.

To me, we have an interesting duality in two different sectors. Google in Web search mirrors Apple in mobile. The other duality is Facebook and Twitter in the social space. Will the four become three? What happens to the also rans?

Stephen Arnold, March 7, 2009

Dead Tree Update: Times Roman Edition

March 7, 2009

Robert K. Blechman’s “The Decline and Fall of the Times Roman Empire” seemed at first glance to have little to do with my interests in search, content analysis, and text processing. You will want to read the essay in BlogCritics here. The article begins with the Times’s decision to sell its building. Once this was a great MBA notion. Now it suggests moving from a Long Island mansion to a trailer park in New Jersey. My metaphor, not the cultured Mr. Blechman’s. The Times has a number of businesses that are performing in a sub par way. Mr. Blechman provides useful background information about the information environment. His analysis is sound. For me, the most important point was:

image

How I see traditional media working to make newspapers, broadcast radio and television, and blockbuster motion pictures money earners in the Twitter Era. Source: http://thusagricola.com/wp-content/uploads/sisyphus.jpg

Having consolidated their smaller competitors out of existence, the declining newspapers can’t use the same trick that they used in the face of broadcast journalism, that is exploiting “local advantages in providing information to readers and connecting advertisers and consumers in a city.” This opportunity has been sucked away by the Internet.

I quite liked the phrase “sucked away by the Internet.”

Good writing. Incorrect view of reality in my opinion.

My view of this situation is distorted by my interest in search and my experience in traditional and electronic publishing. Points of importance to me not referenced in the write up include:

  • Electronic aggregators tried to work with established traditional media. The Business Dateline crafted by Ric Manning (Courier Journal & Louisville Times Co.) with some modest inputs from me and others on the team had to work quite hard to [a] explain what online meant as a revenue opportunity and [b] how electronic content differed from print media. Believe me. We tried, and we arrived with the seal of approval of an old line monopolistic newspaper company. Didn’t matter. The mental leap was too great for those steeped in print. Sad thing is that even today, the leap is too great. Most traditional print wizards are clueless about the differences in the media.


SurfRay: Direction Becoming Clearer

March 6, 2009

A reader in Denmark asked me, “Have you read that Vaekstfonden has acquired SurfRay?” I had not read that nor had I heard the rumor. I checked the investment firm’s Web page this morning and I did not see any information about this step, which makes sense. I recall seeing Vaekstfonden’s name referenced in the documents I gathered when I started tracking the SurfRay activity. As you may know, SurfRay offers the Mondosoft, Ontolica, and Speed of Mind search and content processing technology. I located a ComputerWorld story here that appears to confirm the sale. My Danish is not too good, but I think the gist is that with the change in ownership, Vaekstfonden sees the Danish technology as solid. New management will be installed. The article references other SurfRay products with which I was not familiar; namely, IdleSurf. More information will be finding its way to Harrod’s Creek. If a reader has additional information, please, use the Comments section of this Web log. It’s a help to me if you have a source and can include it. With new information about old companies I need some guidance.

Stephen Arnold, March 6, 2009

Google and Its Personal Touch

March 6, 2009

You will want to navigate here, read “Why I Sued Google (and Won)”, and then make a copy for reference. I don’t know if the story is 100 percent accurate. My view is that this narrative provides possible insight into working with the world’s largest online ad company. Mr. Greenspan may want to create a T shirt with the statement:

But it’s not fair!–Google

I find the notion of fairness quite interesting. A happy quack to Mr. Greenspan, president of Think Computer here.

Stephen Arnold, March 6, 2009

Twitting Ain’t Search and Google Used to Suck

March 6, 2009

I am an addled goose, an OLD addled goose. I liked some of the points in “Twitter Ain’t Search” but I had some qualms about accepting the assertion that Twitter is not search. You must read the article here. For me the most interesting comment in the write up was:

I kind of view Twitter as dead simple blog platform for the masses (hence the adoption of it by the masses). Blog platforms like the one for this blog (Movable Type) can be complicated – especially for the mainstream folks who don’t know/ want to learn html commands.

My view is that Twitter is indeed micro blogging. But the significance of Twitter is in the information flows and the access thereto. Here’s why:

I have learned that electronic information generates enough paradoxes to give Epimenides a headache. Example: online information gave way to CD ROMs. The commercial online giants said, “CDs suck. Too small.” Yep, CDs then changed some unexpected sectors of the information industry and this was in the 1983 to 1985 time period. Then Lycos came along and people said, “Lycos sucks. No updates.” AltaVista.com came along, figured out the update thing and HP said, “AltaVista.com sucks.” So Google.com came online. Some people said, “Google sucks. It’s not a portal.” On and on.

Twitter is an example of the type of information opportunity that occurs when a sufficient number of users generate information flows. Who cares whether an individual Twitter message is “right” or “wrong”? Who cares if Twitter crashes and burns or whether it is bought by Verizon and turned into a subscriber only service. The US is not where the action is in information flows in case you haven’t heard.

Twitter is important because it represents a model of what one or more companies can use as an example. Google cracked Web search, but the real time SMS flows are new territory. Where information flows, money exists. Quick example: you are a law enforcement professional. You are dealing with a person of interest aged 17 in Rio de Janeiro. The person of interest coordinates a group of eight to 10 year olds. The “pack of kids” distracts a tourist, probably a complacent American pundit. Whilst the tourist is engaged, the kids take the passport, billfold, and camera and scamper off. The whole deal is organized by text messages sent on disposable mobile phones thoughtfully provided by the person of interest. A system that permits searching of these SMS messages (Tweets, in Twitter speak) *could* be helpful to law enforcement. The messages could be baloney. But a search takes a short amount of time. If useful information becomes available, that’s a plus. If none becomes available, the law enforcement professional has learned something useful about the person of interest. I am sure one can think of other examples of the benefit of real time information flows generated by the technically hip, the permanently young, and middle school to college people who just see Twitter as another part of the everyday dataspace.
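The kind of real time scan I have in mind can be sketched in a few lines. This is an illustration only; the messages, senders, search terms, and time window are invented, and a production flow search system would index the stream rather than scan it.

```python
from datetime import datetime, timedelta

# Hypothetical message stream: (timestamp, sender, text) tuples.
now = datetime(2009, 3, 6, 12, 0)
stream = [
    (now - timedelta(minutes=40), "user_a", "meet at the plaza entrance"),
    (now - timedelta(minutes=9),  "user_b", "tourist with camera near fountain"),
    (now - timedelta(minutes=2),  "user_b", "go now, grab the camera"),
]

def search_recent(stream, terms, window_minutes, now):
    """Return messages from the last window_minutes that contain any term,
    the basic operation of a real time flow search."""
    cutoff = now - timedelta(minutes=window_minutes)
    terms = [t.lower() for t in terms]
    return [m for m in stream
            if m[0] >= cutoff and any(t in m[2].lower() for t in terms)]

hits = search_recent(stream, ["camera"], window_minutes=15, now=now)
print(hits)
```

The value is not any single message; it is that the scan is cheap, the window is fresh, and a miss is itself informative, which is the point about flows versus static pages.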

I am coming around to the view that Twitter-type systems are important and are likely to reshape the notion of real time search.

Stephen Arnold, March 6, 2009

MyRoar: NLP Financial Information Centric Service

March 6, 2009

A happy quack to the reader who alerted me to MyRoar.com. This is a vertical search service that relies on natural language processing. I did some sleuthing and learned that François Schiettecatte joined the company earlier this year. Mr. Schiettecatte has a distinguished track record in search, natural language processing, and content processing. French by birth, he went to university in the UK and has lived and worked in the US for many years. Here’s what the company says about MyRoar.com:

In today’s current political and economic environment people have never had more questions. MyRoar helps people sort through the hype to find just the answers they are looking for. Extraneous information is eliminated, while saving hours of time or abandonment of search. We provide a fun new interface that keeps users up to date on current news, which helps them formulate the best questions to ask. MyRoar is a Natural Language Processing Question Answering Search Engine. Using integrated technologies we are able to offer high precision allowing users to ask questions relating to finance and news. MyRoar integrates proprietary Question Answer matching techniques with the best English NLP tools that span the globe.

You can use the system here. The system performed quite well on my test queries; for example, “What are the current financials for Parker Hannifin?” returned two results with the data I wanted. I will try to get Mr. Schiettecatte to participate in the Search Wizards Speak interview series. Give the system a whirl.

Stephen Arnold, March 6, 2009
