Exclusive Interview with CTO of BrightPlanet Now Available
October 13, 2009
William Bushee, BrightPlanet’s Vice President of Development and the company’s chief technologist, spoke with Stephen E. Arnold. The exclusive interview appears in the Search Wizards Speak series. Mr. Bushee was among the first search professionals to tackle Deep Web information harvesting. The “Deep Web” refers to content that traditional Web indexing systems cannot access. Deep Web sites include most major news archives as well as thousands of specialized sources. These sources typically represent the best, most definitive content sources for their subject area. For example, in the health sciences field, the Centers for Disease Control, National Institutes of Health, PubMed, Mayo Clinic, and American Medical Association are all Deep Web sites, often inaccessible to conventional Web crawlers like those of Google and Yahoo. BrightPlanet supported the ArnoldIT.com analysis of the firm’s system. As a result of this investigation, the technology warranted an in-depth discussion with Mr. Bushee.
The wide-ranging interview focuses on BrightPlanet’s search, harvest, and OpenPlanet technology. Mr. Bushee told Search Wizards Speak: “As more information is being published directly to the Web, or published only on the Web, it is becoming critical that researchers and analysts have better ways of harvesting this content.”
Mr. Bushee told Search Wizards Speak:
There are two distinct problems that BrightPlanet focuses on for our customers. First we have the ability to harvest content from the Deep Web. And second, we can use our OpenPlanet framework to add enrichment, storage and visualization to harvested content. As more information is being published directly to the Web, or published only on the Web, it is becoming critical that researchers and analysts have better ways of harvesting this content. However, harvesting alone won’t solve the information overload problems researchers are faced with today. The answer to a research project cannot be simply finding 5,000 raw documents, no matter how good they are. Researchers are already overwhelmed with too many links from Google and too much information in general. The answer needs to be better harvested content (not search), better analytics, better enrichment and better visualization of intelligence within the content – this is where BrightPlanet’s OpenPlanet framework comes into play. While BrightPlanet has a solid reputation within the Intelligence Community helping to fight the “War on Terror” our next mission is to be known as the commercial and academic leaders in harvesting relevant, high quality content from the Deep Web for those who need content for research, business intelligence or analysis.
You can read the full text of the interview at http://www.arnoldit.com/search-wizards-speak/brightplanet.html. More information about the company’s products and services is available at http://www.brightplanet.com. Mr. Bushee’s technology has gained solid support from some professional researchers and intelligence agencies. BrightPlanet has moved “beyond search” with its suite of content processing technology.
Stephen Arnold, October 13, 2009
Google Wave as a Publishing Tool
October 12, 2009
Yep, sooner or later someone was going to realize that Google Wave is a component of the “digital Gutenberg”. If you want to read the breathless prose of a professional journalist, navigate to “Exploring Google Wave – How Could It Transform Journalism and Publishing?” My reaction to the write up is that, like most Google analyses, the comparisons are based on what is familiar, comfortable. Google Wave is one component of a larger data management capability. Publishing will not be transformed. The Google platform creates a way to push beyond what’s familiar and comfortable. That’s going to be deeply disturbing and disruptive across a number of information centric business sectors. Wave is a subsystem. The real powerhouse is the Google data management system. We need a new term to describe what this platform makes possible. “Publishing” does not carry the freight of meaning in my opinion.
Stephen Arnold, October 12, 2009
SharePoint: The Enterprise Platform
October 12, 2009
I read “SharePoint 2010: The Enterprise Platform” with an open mind. Microsoft is “all over” the US Federal government. Many of the information technology savvy folks with whom I speak point out the advantages of the SharePoint solution. Programming is getting easier. Users are comfortable with the basic features and functions of the system. Competitors’ products are often more expensive to license. SharePoint is easily shaped into what an information professional needs to solve a particular problem. Microsoft makes available a large number of software “MRE”s; that is, ready to eat, no extra effort required to get certain capabilities or functionality.
Jeremy Thake’s article provides some useful background for SharePoint 2010. This release of SharePoint adds a number of new capabilities to an already richly endowed system. He did make a comment that I found interesting:
In my opinion and a lot of others SharePoint is “a jack of all trades and a master of none”, much like most of the other vendors who played the same card. SharePoint is extremely strong in the collaboration area from an End User perspective, but is weak for example in Records Management, Business Intelligence and Digital Asset Management. The days of purchasing a product for a specific area have clearly gone which is a shame because you pick one of the Enterprise Platforms and suffer in the weaker areas.
He concludes his write up with a reference to MOSS 2007 “horror stories” and makes clear that he loves SharePoint “anyway”.
My thought is that overburdened information technology professionals may find the charms of SharePoint fading when complexity and costs begin to rise. These two issues may be the stepping stones for Google, despite its flaws and weaknesses, to make significant gains at a time when Microsoft is hoping that SharePoint 2010 blunts the appeal of Google’s enterprise offerings.
Google is no match for Microsoft in terms of marketing. But Google does a much better job with the technology for a hybrid platform in my opinion. Can Google deal with the buzz saw of SharePoint 2010? Interesting face off to watch in the last weeks of 2009.
Stephen Arnold, October 12, 2009 No dough
Yahoo: A Case Study in the Effects of Delayed Investment in Infrastructure
October 12, 2009
In my client reports, I have pointed out that Yahoo has been behind the eight ball because of its information technology decisions. Panama is an excellent case in point. But there are other examples such as recoding Delicious.com, the multiple search and retrieval systems, and the inability to deliver advertisers the type of user ad pinpointing the Mad Ave crowd has wanted for years.
I don’t think too much about Yahoo because it is not in the search and content processing game I play. The article “Yahoo Pays Its ‘Technical Debt’ with IT Overhaul” triggered my interest in the company and the thoughts captured in this short post. ZDNet’s point was that Yahoo had a “rat’s nest” of systems. The focus recently has been rationalization of the infrastructure. For me the key point in the write up was:
Pullara [a Yahoo executive] went into details about Hadoop, which he called a love story. Yahoo started building its own MapReduce platform, but decided to go open source.
When I read this paragraph, three points came to mind:
- Yahoo is trying to follow Google. The notion of a Google legacy is directly relevant. The problem is that Yahoo is trying to tap into the Google legacy late in the game. I think it may be too late for Yahoo.
- Yahoo, if the article is accurate, is now openly admitting that its numerous technical gurus had not taken steps to reduce the complexity and associated costs of the fragmented, disconnected chunks of its infrastructure.
- The open source play may come back to bite Yahoo. The company’s ability to monetize strikes me as less effective because its new system has to stay one step ahead of those who can out do Yahoo with Yahoo’s own technology.
In short, Yahoo’s delay in tackling its information technology infrastructure problem has given Google plenty of time to build a big lead over Yahoo. Now the “new” Yahoo may be creating competitors who may find it easier to suck Yahoo’s blood than pursue Googzilla. The cost to Yahoo for its information technology blunders has been high and will become higher. Useful lesson for other firms in my opinion.
Stephen Arnold, October 12, 2009
Searchtastic: Twitter Search System
October 12, 2009
TechCrunch’s “Searchtastic Throws Its Hat Into The Twitter Search Engine Ring” called my attention to another real time search engine provider. The screenshot below shows the unique feature of the new service. I can search for Twitter users by their name:
I ran several test queries and found the results useful. When queries return null results, the system displays the search terms with a strikethrough and the message “remove words from search”. Useful but the interface was initially confusing. In comparative tests against my current favorite real time search system Collecta.com, I thought Searchtastic was useful but Collecta seemed more mature in this evolving sector of the search market.
Stephen Arnold, October 12, 2009
Google and Content Processing
October 12, 2009
I find the buzz about Google’s upgrades to its existing services and the chatter about Google Books interesting but not substantive. My interest is hooked when Google provides a glimpse of what its researchers are investigating. I had a conversation last week that pivoted on the question, “Why would anyone care what researchers or graduate students working with Google do?” The question is a good one and it illustrates how angle of view determines what is or what is not important. The media find Google Books fascinating. The Web log authors focus on incremental jumps in Google’s publicly accessible functions. I look for deeper, tectonic clues about this trans-national, next generation company. I sometimes get lonely out on my frontier of research and analysis, but, as I said, perspective is important.
That’s why I want to highlight a dense, turgid, and opaque patent application with the fetching title “Method and System for Processing Published Content on the Internet”. The document was published on October 8, 2009, by the ever efficient USPTO. The application was filed on June 9, 2009, but its technology drags like an earthworm through a number of previous Google filings in 2004 and more recent disclosures such as the control panel for a content owner’s administering of a distribution and charge back for content. As an isolated invention, the application is little more than a different charge at the well understood world of RSS feeds. The problem Google’s application resolves is inserting ads into RSS content without creating “unintended alerts”. When one puts the invention in a broader context, the system and method of the invention are more flexible and have a number of interesting applications. These are revealed in the claims section of the patent application.
Keep in mind that I am not a legal eagle. I am an addled goose. Nevertheless, what I found suggestive is that the system and method hooks into my analysis of Google’s semantic functions, its data management systems, and, of course, the guts of the Google computational platform itself for scale, performance, and access to other Google services. In short, this is a nifty little invention. The component that caught my attention is the controls made available to publishers. The idea is that a person with a Web log can “steer” or “control” some of the Google functions. The notion of an “augmented” feed in the context of advertising speaks to me of Google’s willingness to allow a content producer to use the Google system like a giant information facility. Everything is under one roof and the content producer can derive revenue by using this facility like a combination production, distribution, and monetization facility. In short, the invention builds out the “digital Gutenberg” aspect of the Google platform.
Here’s how Google explains this invention:
The invention is a method for processing content published on-line so as to identify each item in a unique manner. The invention includes software that receives and reads an RSS feed from a publisher. The software then identifies each item of content in the feed and creates a unique identifier for each item. Each item then has third party content or advertisements associated with the item based on the unique identifier. The entire feed is then stored and, when appropriate, updated. The publisher then receives the augmented feed which contains permanent associations between the third party advertising content and the items in the feed so that as the feed is modified or extended, the permanent relationships between the third party content and previously existing feed items are retained and readers of the publisher’s feed do not receive a false indication of new content each time the third party advertising content is rotated on an item.
The claims wander into the notion of a unique identifier for content objects, item augmentation, and other administrative operations that have considerable utility when applied at scale within the context of other Google services such as the programmable search engine. This is a lot more interesting than a tweak to an existing Google service. Plumbing is a foundation, but it is important in my opinion.
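To make the abstract's mechanics concrete, here is a minimal sketch of the idea as I read it, not Google's implementation. Each feed item gets a stable identifier, the association between that identifier and an ad slot is stored once, and rotating the ad never touches the identifier, so a feed reader has no reason to flag the item as new. Everything in the sketch is my assumption: the hash of link plus title as the identifier, the `sponsor` element, the `rotation` argument, and the function names are hypothetical and do not come from the patent application.

```python
import hashlib
import xml.etree.ElementTree as ET

# Tiny sample feed, used only for illustration.
FEED = """<rss version="2.0"><channel><title>Demo</title>
<item><title>First post</title><link>http://example.com/1</link></item>
<item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""


def item_id(item):
    """Derive a stable identifier from the item's link and title (an assumed choice)."""
    link = item.findtext("link", default="")
    title = item.findtext("title", default="")
    return hashlib.sha256(f"{link}\n{title}".encode("utf-8")).hexdigest()[:16]


def augment_feed(rss_xml, ads, associations, rotation=0):
    """Return the feed with one ad attached to each item.

    `associations` maps item identifier -> ad slot and is kept by the caller
    between runs, so the item-to-slot relationship is permanent.
    `rotation` changes which ad fills a slot without changing any identifier.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        uid = item_id(item)
        # Assign a slot only the first time this item is seen.
        if uid not in associations:
            associations[uid] = len(associations)
        # The guid is tied to the item, never to the ad content.
        if item.find("guid") is None:
            guid = ET.SubElement(item, "guid")
            guid.set("isPermaLink", "false")
            guid.text = uid
        # Hypothetical element carrying the third party content.
        sponsor = ET.SubElement(item, "sponsor")
        sponsor.text = ads[(associations[uid] + rotation) % len(ads)]
    return ET.tostring(root, encoding="unicode")


def texts(rss_xml, tag):
    """Collect the text of every element named `tag` in the feed."""
    return [element.text for element in ET.fromstring(rss_xml).iter(tag)]


if __name__ == "__main__":
    stored = {}
    ads = ["ad A", "ad B", "ad C"]
    before = augment_feed(FEED, ads, stored, rotation=0)
    after = augment_feed(FEED, ads, stored, rotation=1)
    print(texts(before, "guid") == texts(after, "guid"))
    print(texts(before, "sponsor"), texts(after, "sponsor"))
```

Running the feed through `augment_feed` twice with different `rotation` values changes the ads but leaves every `guid` alone, which is the "no false indication of new content" behavior the abstract describes. A production system would persist `associations` in a data store rather than a dictionary.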
Stephen Arnold, October 12, 2009
The AP Snaps and Snarls
October 11, 2009
Dogs can be surprising. TechCrunch explains that the AP is “yapping again”. Read “You Can Ignore the AP’s Bluster. It Is Just a Negotiating Bluff” and get a good analysis of the 2009 Don Quixote event of the day. I think some content should be free. This Web log, written by an addled goose, is offered without charge. Complain and I refund your money and quack at you. Other information should carry a fee. If people don’t want to pay that fee, well, that’s a form of market research.
I try not to quote the AP in this Web log. I am a goose and terrified of those qualified to practice law. TechCrunch, as I recall, also avoids AP content.
In my opinion, the new types of services that I write about in my column for Information World Review, an Incisive Media property in London, England, present a user with interesting and accessible services. These next generation services, whether Tweetmeme.com or Trendsmap.com, represent what information delivery mechanisms are becoming. The notion that new services will embrace older business models has to be proven.
The AP is about to prove its hypothesis; namely, users will pay for AP content. The outcome of that test will be Googley. Data are going to make obvious what works and what does not work. In my opinion, the AP has a great opportunity to prove that its strategists are able to generate sufficient new revenue to make up for the lost revenue the firm has experienced. Furthermore, the AP will be able to prove or disprove the assertion that profitability will be sufficient to fund research and development, increased salaries, and staff additions.
The stakes are interesting. AP is betting the farm. Most of the poker games on TV hold this type of play to the end of the show. Maybe the AP is in Act III of a three act play. Exciting.
Stephen Arnold, October 11, 2009 No dough
Legal and Government Pressure Mounts on Google
October 11, 2009
How tough is Google? Another question might be, “How much money will Google spend on litigation?” I just read – scanned actually – the Reuters story “Germany Criticizes Google for Copyright Infringement”. The story reports that the German top politico used a podcast to rain on Google’s parade. The thorn in the German Chancellor’s paw is copyright or, as I understand it, Google’s stance on copyright. Reuters points out that the Google has scanned a truckload of books. Both German and French publishers are grousing. The FCC is poking around. And I have lost track of the various legal matters in which the Google plays a part. In short, maybe my assertion that legal eagles could cut off Googzilla’s oxygen was ahead of its time in 2004.
Stephen Arnold, October 11, 2009
SharePoint and User Adoption
October 11, 2009
I have been working through the SharePoint search related items that Overflight generates. One article “Ideas to Increase End Use Adoption” did not interest me when I first read it. I went back this morning and reviewed the article in SharePoint Buzz because it connected with a remark I heard in a meeting in Arlington, Virginia, last week. The article is straightforward. Some SharePoint features don’t get a quick uptake by users. In order to boost use of a SharePoint system, the author presents a number of ideas. These range from in person training to creating FAQs and other textual information to help users understand the features and functions of a SharePoint system. The article identifies multimedia content as a useful idea. A community-based support service is another good idea.
Now the question, “Why did I return to what is a common sense article about a software system?”
The answer is, “Users resist systems that create more hassles than they solve.”
The SharePoint blog post underscored three points:
- User ignored systems are a problem, not problem solvers. Maybe training will help resolve this problem, but if users don’t use a system, there’s a deeper issue to resolve. It may be interface. It may be performance. It may be the functions are unrelated to the work task. I don’t know but I know there is a problem.
- Vendors are trying to resolve marketing issues by pushing users in certain directions. When I looked at the list of ways to boost adoption and usage of SharePoint functions, I thought about how some of my grade school teachers approached subjects.
- Microsoft’s new emphasis on UX or the user experience may be a lower cost way to solve deeper issues of a system’s design. The system itself may not deliver a solution, so the easier route is to put some lipstick on the beast and call it a day.
In my opinion, the discussion of user acceptance of certain SharePoint-based applications may point to a deeper and more troubling set of issues within the architecture of SharePoint itself. A developer may like what he or she has built. But users who ignore the service are making clear that something is off base. I am not sure training or an interface can do much if the problem resides within the deeper core of the SharePoint suite.
Stephen Arnold, October 11, 2009
Attensity Video
October 11, 2009
Videos about search and retrieval are challenging. Attensity’s latest video focuses on using Attensity’s “deep extraction” technology to find out what customers want. The video explains that Attensity’s software “diagrams sentences.” The use case focuses on email analysis. The benefit of Attensity is that it provides a way “to look for a needle within a stack of needles.” The video works in the concept of “customer sentiment”. If you want to watch the video, navigate to Scoopler and run a query for Attensity or click this link (may go dead after some period of time). Pretty slick five minute video. The marketing ante keeps getting raised as the economy waddles along.
Stephen Arnold, October 11, 2009

