Google Extends Government Indexing

June 18, 2010

Google has a better index of US government content than either the government or the vendors who are beavering away on this treasure trove. Now Google has added another chunk of content to its system. You can benefit from these data, but I would assert that Google’s MOMA Intranet may make even better use of the information. How? Just ask your local Googler for a demo.

The US Patent and Trademark Office (USPTO) is entering into a two-year, no-cost agreement with Google to make bulk electronic patent and trademark public data available. In this arrangement, the USPTO provides the data, and Google hosts it for the public.

Research Buzz reported in its post, “Google Teaming Up With USPTO To Make Patent and Trademark Data Available,” that the data will total an estimated ten terabytes. This not-so-humble chunk of data will include patent grants and applications, trademark applications, and patent and trademark assignments, with more data (like trademark file histories) available in the future.

Google noted that it is only hosting the data provided by the USPTO; it isn’t altering or changing it in any way. It should also be noted that this bulk hosting is provided in zip files. It appears that Google wants you to download the data to your own machines before you start analyzing it.
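For readers who do download the bulk files, Python’s standard zipfile module can inspect an archive and peek inside a member without unpacking everything to disk. This is a minimal sketch; the member name sample_grants.xml and its contents are invented stand-ins built in memory, not the actual USPTO file layout.

```python
import io
import zipfile


def list_members(archive) -> dict[str, int]:
    """Map each file inside a zip archive to its uncompressed size in bytes."""
    with zipfile.ZipFile(archive) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


def head(archive, member: str, n: int = 3) -> list[str]:
    """Return the first n lines of one member without extracting the archive."""
    with zipfile.ZipFile(archive) as zf, zf.open(member) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8")
        return [line.rstrip("\n") for _, line in zip(range(n), text)]


# Hypothetical stand-in for a downloaded bulk file (contents are invented)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(
        "sample_grants.xml",
        "<grant id='1'/>\n<grant id='2'/>\n<grant id='3'/>\n<grant id='4'/>\n",
    )

print(list_members(buffer))
print(head(buffer, "sample_grants.xml"))
```

Passing a file-like object rather than a path works because ZipFile leaves caller-supplied file objects open when it closes; with a real download you would pass the path to the zip file instead.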

Skeptical geese might ask, “Why not crunch that content with the Guha / Halevy methods?” I think making the data available with the benefit of semantic processing would be slightly more useful than a big zip file.

Melody K. Smith, June 18, 2010

Freebie

FleeQ, a Semantic Search Engine

June 17, 2010

FleeQ is “a Web 3.0” search engine. The company’s Web site says, “Search everything in real time!” Another universal affirmative. According to the company’s Web site, “FleeQ pays 20X the CPC of AdSense.” I am a simple goose, but it seems to me that one site is describing itself in two different ways. The company is based in Palo Alto, California.

The system, according to the firm’s Web site:

“FleeQ is a new kind of network. It powers your websites search/discovery for your users.”

In order to get a better sense of the system, I ran a number of test queries. You can follow along but make certain you enter the address: http://www.fleeq.com. Once you enter the site, it is a bit of work to get back to the search box.

Here’s the splash page which points out that I am using the Flash version of the service:

[Screenshot: FleeQ splash page, Flash version]

My most interesting test query was for the term “taxonomy.” The list of hits includes two references to Wikipedia. This is the default results list:

[Screenshot: default results list for “taxonomy”]

The points to note are the two tabs that allow one-click access to images and videos. There is a list of tabs across the screen below the search box. A click on the Facebook tab displays hits from Facebook that include the string “taxonomy”.

[Screenshot: Facebook tab results for “taxonomy”]

I did not discuss FleeQ.com in my lecture at the SLA’s Spotlight session. There are other real time search engines that illustrate the concepts in my talk.

I found FleeQ.com useful. The system strikes me as a metasearch with considerable plumbing designed to generate revenue from partners’ Web traffic.
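To make “metasearch” concrete: a metasearch engine sends one query to several back ends and merges the ranked lists it gets back. The sketch below is a guess at the general pattern, not FleeQ’s code; the two source functions are hypothetical stubs, and round-robin interleaving with duplicate removal is just one simple merge strategy among many.

```python
from itertools import zip_longest
from typing import Callable

# A "source" takes a query and returns a ranked list of result URLs
Source = Callable[[str], list[str]]


def web_source(query: str) -> list[str]:
    # Hypothetical stub standing in for a web search back end
    return [f"https://en.wikipedia.org/wiki/{query.capitalize()}",
            f"https://example.com/{query}-guide"]


def social_source(query: str) -> list[str]:
    # Hypothetical stub standing in for a social network search back end
    return [f"https://example.com/social/{query}-post",
            f"https://en.wikipedia.org/wiki/{query.capitalize()}"]


def metasearch(query: str, sources: list[Source]) -> list[str]:
    """Fan the query out, then interleave the ranked lists round-robin, dropping duplicate URLs."""
    ranked_lists = [source(query) for source in sources]
    merged: list[str] = []
    seen: set[str] = set()
    for tier in zip_longest(*ranked_lists):  # tier holds the i-th hit from each source
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


results = metasearch("taxonomy", [web_source, social_source])
print(results)
```

The Wikipedia page returned by both stubs appears once in the merged list, which mirrors how a metasearch front end hides overlap among its partners.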

FleeQ is worth a look, and the revenue-generating options may be of interest. You can find some monetization information at http://www.fleeq.com/new/publishers.php. I am not sure I noticed the “semantic” angle of the system, but you may be more discerning than I.

Stephen E Arnold, June 17, 2010

Freebie

Cut That Security Budget, Says Azure Chip Consultancy

June 17, 2010

Now, I don’t know about you, but when a licensee fires up a modern search and content processing system, its security system has to be in World Cup form. Active Directory is a popular method. Some search systems put their moist noses in the air, sniff the Active Directory settings, ingest them, and happily index content. Then, when a user runs a query, the search system respects the Active Directory security settings. The idea is that a user with certain permissions can see only the content to which that person has access. Goof up the security and permissions, and you have addled geese looking at golf club contributions, drafts of documents related to some hush-hush matter, or personal information about that last visit to the local doctor.
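The query-time filtering described above is often called security trimming. Here is a minimal sketch, assuming the indexer has already copied each document’s allowed groups out of the directory; the classes, group names, and documents are hypothetical, and real deployments also resolve nested groups and explicit deny entries, which this toy ignores.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    # Groups allowed to read the document, copied from the directory at index time
    allowed_groups: frozenset[str]


@dataclass
class User:
    name: str
    # Group memberships resolved when the user logs in (hypothetical)
    groups: set[str] = field(default_factory=set)


def security_trim(hits: list[Document], user: User) -> list[Document]:
    """Keep only hits whose ACL shares at least one group with the user."""
    return [doc for doc in hits if doc.allowed_groups & user.groups]


index = [
    Document("d1", "Company picnic schedule", frozenset({"all-staff"})),
    Document("d2", "Board golf outing expenses", frozenset({"board"})),
    Document("d3", "Employee medical claim", frozenset({"hr-benefits"})),
]
employee = User("goose", {"all-staff", "marketing"})
visible = security_trim(index, employee)
print([doc.title for doc in visible])
```

Get the group data wrong at index time and the filter faithfully enforces the wrong permissions, which is why the security settings matter as much as the search engine.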

I read “Enterprises Advised to Reduce IT Security Budgets” and wondered if the headline was a typographical error. Nope. The azure chip outfit Gartner apparently recommends “a three percent cut as economic situation improves.” What? The economy is improving, so the recommendation is to cut the security budget? What about exposing those contract terms to eyes not authorized to see them? What happens if medical information seeps into search results when an employee is looking for information about the company picnic? What happens when the financial details of the Board of Directors’ golf outing find their way into the hands of a committee working on reduction in force issues?

You should navigate to this article and read it for yourself. For me the most interesting comment in the write up was:

Vic Wheatman, a research director at Gartner, explained that the average percentage of IT spending on security in 2010 is five per cent, down from six per cent last year. “In 2009, in the face of a significant IT spending downturn, security spending grew slightly as a percentage of the IT budget, while many other IT spending areas were gutted,” he added. “With the economic situation projected to improve in 2010, organizations are ramping up investments in other spending areas faster than they are for IT security.”

I am not sure I am comfortable with this recommendation or the analysis itself. But I am an addled goose. The crazy stuff I write is a direct result of drinking mine run-off effluent. The news story may be the flight of fancy of an azure chip marketing person.

As for me, I will keep my security spending at its pre-crash level, thank you. The risk of creating a more costly problem by chopping security spending is too high for my operation. Your mileage may differ. In that case, rely on the “real” consultants at the azure chip outfits. My unsolicited opinion: Avoid Harrod’s Creek.

Stephen E Arnold, June 17, 2010

Freebie.

Endeca Crosses the Finish Line

June 17, 2010

Short honk: I heard that Endeca’s engine powers Finish Line. Navigate to the MarketWatch story “Finish Line Launches Mobile Commerce Site in Partnership with Endeca and Unbound Commerce.” The MarketWatch story said that the new site, m.finishline.com, was “built on top of the same Endeca eCommerce search and merchandising technology that powers www.finishline.com.”


For me, the key passage in the write up was:

Unbound Commerce used the open application-programming interfaces (APIs) of the Endeca Information Access Platform to connect its Mobile Presence platform to Endeca’s search and merchandising technology. This new, integrated product helped Finish Line quickly adapt its existing search and navigation infrastructure over to its new mobile commerce site. This consistency allowed the mobile site to feature Finish Line’s online customer reviews, which help shoppers to make a decision between competing brands while visiting one of Finish Line’s 670 stores. From a smartphone, shoppers can read reviews by other shoppers with similar tastes to determine the best product while standing in the aisle of the store.

Endeca’s system can generate facets with aplomb, and now the company is combining structured and unstructured data in a mobile implementation. I have one source who says that the mobile system was built by Unbound Commerce. Endeca is demonstrating its platform’s versatility in the mobile eCommerce market.
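For readers new to faceted navigation, the core idea is to count attribute values across the current result set and let a click on a value narrow that set. This is a toy sketch under that assumption, not Endeca’s API (the write-up does not show it); the catalog, SKUs, and brand names are hypothetical.

```python
from collections import Counter

# Hypothetical catalog records; field names and values are invented for illustration
catalog = [
    {"sku": "S1", "brand": "BrandA", "category": "running"},
    {"sku": "S2", "brand": "BrandB", "category": "running"},
    {"sku": "S3", "brand": "BrandA", "category": "basketball"},
    {"sku": "S4", "brand": "BrandC", "category": "running"},
]


def facet_counts(results: list[dict], field: str) -> Counter:
    """Count results per value of a field (the numbers shown beside each facet link)."""
    return Counter(item[field] for item in results)


def refine(results: list[dict], field: str, value: str) -> list[dict]:
    """Apply a facet selection: keep only results whose field equals the chosen value."""
    return [item for item in results if item[field] == value]


print(facet_counts(catalog, "category"))
running = refine(catalog, "category", "running")
print(facet_counts(running, "brand"))
```

Recomputing the counts after each refinement is what keeps the facet links honest as the shopper drills down, whether on a desktop site or a phone.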

Stephen E Arnold, June 17, 2010

Freebie

Google Austria Book Scanning Deal

June 17, 2010

Google keeps on plugging away with its book scanning project. I have been one of the people who think that Google has flowed into a vacuum. The sniping and legal flaps have not taken my eye off the ball, however. Google wants to scan, ingest the content, and make money from its effort. I think a big part of the book scanning effort is directed at Google’s knowledge base initiative. The more content processed by the Google, the better its numerical recipes become at making decisions. The making money part is important but not the whole story.

Google, according to India’s Economic Times, has a deal with Austria. The story “Google to Scan 400,000 Austrian Library Books” said:

Austria’s national library said on Tuesday it has struck a 30-million-euro deal with US Internet giant Google to digitize 400,000 copyright-free books, a vast collection spanning 400 years of European history. Johanna Rachinger, the head of the ONB library, hailed what she called an “important step,” arguing at a news conference that “there are few projects on such a scale elsewhere in Europe.” The Austrian library project concerns one of the world’s five biggest collections of 16th- to 19th-century literature, totaling some 120 million pages, the ONB said in a statement.

Important points. This is a 30 million euro deal. The content is non-exclusive. The library solves a preservation problem along with some access and money issues.

Look ahead 10 years. When you want a book from this collection, will you use Google or some other service? Google is aiming for the long haul and a much bigger play. What about the “regular” scanning activity? It just keeps on clicking along, in my opinion.

Stephen E Arnold, June 17, 2010

Freebie

Another Mom Says Article about Google

June 17, 2010

eWeek demonstrated that it is not going to win the MBA version of Dancing with the Stars. The article “10 Web Companies That Google Should Acquire” is one weird set of “shoulds”. I could almost hear my mom and the other moms in the neighborhood shouting, “Sergey, Larry, you should do this.” Like the kids in my neighborhood, I don’t think Sergey or Larry will listen. Even if both zillionaires did listen, Google is preoccupied with telling countries what to do. Do you think Google is in listening mode when eWeek recommends buying Facebook and Twitter? Why stop there? eWeek tells Google, like a legally inept and technically challenged mom, that it should snap up Pandora, Zoho, and Expedia. Yikes.

A look at Google’s acquisitions reveals that Google marches to the beat of its own drummer. Obvious candidates like Catch Media slip off the hook or out of the net. Not-so-obvious outfits get ignored in the list of Google’s acquisitions; for example, Transformics. Highly publicized buyouts get little follow-up or analysis; for example, Jotspot.

eWeek was once a fat must-read. Now I am not sure what it is, but this article sounds like mom-think: wacky and ultimately ignored. If Google were to make a move on Facebook, I wonder if Microsoft would flex its muscles. Redmond has a piece of the Facebook action and Google doesn’t. Assume Google tries to buy Twitter. Think any regulator would wake up and investigate? I do. Articles designed to generate traffic and clicks by putting each specious, wacky point on a separate Web page illustrate the sad state of search engine optimization, analytic thinking, and substantive commentary. Too bad. Times change.

Stephen E Arnold, June 17, 2010

Freebie

Attensity SAS Staff Shuffle

June 16, 2010

I learned recently that SAS lost Manya Mayes to Attensity. No big deal, but Ms. Mayes had been at SAS for 15 years. Attensity seems to be serious about its text analytics business. You can get more information in the write-up “Attensity Group Appoints Manya Mayes as Director of Advanced Analytics.” Here’s what an Attensity officer said about the new hire:

Her SAS expertise and customer and product experience will be a great asset to the Attensity team. Her addition will build on Attensity’s current analytic capabilities, bringing advanced analytics expertise to the team.

A couple of thoughts. First, I wonder why Ms. Mayes is not an officer of Attensity. Second, will SAS push back and make some noise about competition? When Google hired an Endeca expert in eCommerce, I received email suggesting that any connection between the Endeca hire and Google’s aspirations in markets where Endeca has a presence was silly.

That’s a silly goose for you I suppose.

Stephen E Arnold, June 16, 2010

Freebie

TUI, DUI, and PUI: Not Disney. User Experience

June 16, 2010

A happy quack to the reader who sent me a link to the June 8, 2010, write-up “Three Types of GUIs – Past, Present and the Future.” My sense-of-humor subsystem processed the acronyms as “Toohee”, “Dewey”, and “Poohee”. I was incorrect. The acronyms refer to three types of graphical user interfaces.

  • TUI is the tool user interface. Think of this interface as the Microsoft Word and Excel type of iconage and menu systems.
  • DUI is the desktop user interface. The idea is that enterprise software systems present the user with an environment. A user of a payroll system or a customer relationship management system lives and works within the application’s interface. In some MBA-dense businesses, Excel becomes the environment.
  • PUI is a process user interface. The example is using a browser to navigate to an application to make an airline reservation.

I found the write-up interesting, and I quite liked the notion of a PUI, or process-centric user interface. I hear a lot about search-based applications and search embedded in enterprise systems.

For me, the most interesting comment in the write up was:

Now imagine business or enterprise apps that are process based, not single task items knit together by DIY process: An app that can pick up an idea, issue or request and run it through an unpredictable process that might look like a ball of yarn all the way to an implemented idea, a solved issue or a happy customer. For these, forget DUIs and TUIs, think PUIs. Imagine wizardly step by step, think two choices and a submit button, think that you will get exactly the information and the choices to make, or fields to fill in at the right time, then add what the iPhone and now the iPad has done to interface thinking. That is the future of business and enterprise apps and UIs. Bye bye to a million blog posts using the term “intuitive”, hello “just do it”.

I agree.

Stephen E Arnold, June 16, 2010

Another First for Autonomy

June 16, 2010

Here’s the headline: “Autonomy Has Fastest Archiving Revenue Growth among the Top Six Suppliers.” I was intrigued. I think of archiving as the sort of work that takes place when an organization performs a records management function. Archiving is more than a back up. The other point that I did not immediately grasp was the phrase “top six suppliers.” The guts of the write-up is a report from a consulting firm that specializes in technical fields. I did a bit of poking around and learned that the storage software vendors in the study were:

  • EMC with a reported / estimated market share of 21.7 percent and first quarter 2010 revenue of about $700 million
  • Symantec with a reported / estimated market share of 18.5 percent and first quarter revenue of about $530 million
  • IBM with a reported / estimated market share of 14.2 percent and first quarter revenue of about $430 million
  • NetApp with a reported / estimated market share of 8.3 percent and first quarter revenue of about $250 million
  • Computer Associates (CA, as the firm prefers to be called today) with a reported / estimated market share of 3.5 percent and first quarter revenue of about $105 million

This is an estimated / projected market of about $3.1 billion. The top five vendors account for about two thirds of the revenue; their listed shares sum to 66.2 percent.
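The figures above can be cross-checked with a little arithmetic. This sketch uses only the shares and revenues listed; dividing the top five’s combined revenue by their combined share gives an implied total market of roughly $3.04 billion, which leaves about $1.03 billion for the “Others”. The implied total is a rough estimate, since each share is itself rounded.

```python
# Market share (percent) and first quarter 2010 revenue (USD millions), as listed above
vendors = {
    "EMC": (21.7, 700),
    "Symantec": (18.5, 530),
    "IBM": (14.2, 430),
    "NetApp": (8.3, 250),
    "CA": (3.5, 105),
}

top_share = sum(share for share, _ in vendors.values())      # combined share, percent
top_revenue = sum(revenue for _, revenue in vendors.values())  # combined revenue, $M
implied_market = top_revenue / (top_share / 100)             # total market implied by those two
others = implied_market - top_revenue                        # what is left for everyone else

print(f"Top five share: {top_share:.1f}%")
print(f"Top five revenue: ${top_revenue:,} million")
print(f"Implied market: ${implied_market:,.0f} million")
print(f"Left for 'Others': ${others:,.0f} million")
```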

What surprised me was IBM’s third-place ranking. Symantec may be the vendor in the Top Three most vulnerable to the incursions of competitors. The final point that I noted is that with $1 billion divided among “Others”, newcomers may face considerable competition from incumbents and smaller players alike. The summary I saw talked about some business points such as the growth of storage revenues and IDC’s business services. Final question: “And the cloud?”

Stephen E Arnold, June 16, 2010

Freebie

Server Innovation May Boost Search Performance

June 16, 2010

The goslings and I like hardware, but we focus on the antics of the search and content processing sector. Truth be told, we would rather build a gizmo than write about how to filter Chatroulette results. We want to highlight what we think is one of those hardware innovations that could open the door to more sophisticated content processing. Instead of waiting for a quantum computer to crunch Big Data, you will be able to buy a SeaMicro SM10000 for $140,000 and get your hands on a low-power, 512-processor Atom fire breather.

You can get a quite good summary of the SeaMicro innovations in “SeaMicro Drops an Atom Bomb on the Server Industry.” Let me highlight three points from the write up and offer an observation from a flight departure lounge, not my usual goose pond.

First, the SeaMicro approach uses Intel’s Atom chip, a small, low-power gizmo. With proprietary hardware and software, SeaMicro chops the power requirement so that high performance comes with lower energy consumption. The cost of power for data centers is now a deal breaker: electricity bills can add up to more than the cost of the server hardware in 36 months or less.

Second, the SeaMicro server uses some clever virtualization methods to make small, low-power processors work like their larger, less efficient big brothers. A SeaMicro server plugs in and operates like a data center in a box. Hook several SeaMicro servers together and you get the type of horsepower that AT&T has embedded in its most sophisticated infrastructure. Imagine: an Ashburn and a Dallas in your local hosting facility.

Third, the gizmos are plug and play. Remember the exciting days of Sun Microsystems system certification? Gone. Total cost of ownership on the SeaMicro servers is reduced; the article says “by 75 percent.” Half that would make me turn cartwheels.

What are the implications?

First, content processing bottlenecks could become less of a problem if the information retrieval system runs on a SeaMicro server. Search and retrieval is a weird combination of hurry up and wait with nasty demand spikes. Why wait for a software vendor to improve its software? Throw SeaMicro methods at the problem.

Second, the company has rolled out an innovation that I thought would come from the companies with the big, old fashioned data centers. I expect some scrambling by those caught flat footed by SeaMicro.

Finally, those in research computing will want to get their hands on one of the SeaMicro gizmos. With more firepower, ideas that were impractical on older servers may become doable.

SeaMicro may not end up the big winner, but its innovations warrant some attention.

Stephen E Arnold, June 16, 2010

Freebie
