
Google Press, O’Reilly, and a Possible Info Discontinuity

January 4, 2010

Google’s book on HTML5 is moving along. Soon it will be available for sale. At that moment, a seismic shock will be triggered in the already Jello-like world of traditional publishing. Oh, if you don’t know about the Google Press imprint, you can catch up on your reading by looking at:

For a more robust discussion of the tools Google will use as it solves the copyright problem for new, significant content, check out Google: The Digital Gutenberg, September 2009. Better yet, write me at seaky2000 at yahoo dot com and inquire about a 90 minute briefing on Google’s publishing technology and the disruptions these technologies are likely to let loose in 2010.

First, let me provide some context.

In Google: The Digital Gutenberg I pointed out that Google’s infrastructure works like a digital River Rouge. Put stuff in at one end and things come out the other. Google’s steady progress toward a clean, tidy solution to copyright hassles leads to one place: Google becomes a publisher. What goes in at one end are content objects, and what comes out the other can be just about anything Google can program its manufacturing system to produce.

Now I know that the publishers want Google to [a] quit being Google, which is tough since the Google is little more than a manifestation of technology anybody could have glued together 11 years ago, [b] subsidize publishers so the arbiters of what’s smart and what’s stupid can continue as museum curators of information, and [c] give publishers some of the profits from advertising so publishers can shop for white shoes and vintage motor yachts.

image

Google uses algorithms like a fishmonger to convert the beastie into tasty, easily sold fillets. Image source: http://www.fishingkites.co.nz/cleaning-fish/filleting_fish/fillet_2.jpg

The solution is simpler. When Google signs up an author, Google offers terms. The author takes the terms or leaves the terms. Now the Google does not go quickly into that good night. The Google takes baby steps. Google has a fondness for Tim O’Reilly, and it supports a number of O’Reilly ventures, including the somewhat interesting Government 2.0 conference.


An Original Aggregator Teeters on the Brink

December 28, 2009

I sat on this write up for about a week. I read the December 20, 2009, “Revised and Condensed” write up in the New York Times. I don’t know if the piece is available online because I don’t use traditional media’s online services. I am more interested in how the traditional newspapers and magazines to which I subscribe present information about the challenges consumer publishing in the US faces.

For your information, I ran a quick query before scheduling this write up for release on Beyond Search on December 28, 2009, and, to my surprise, this link on the New York Times’s Web site worked. Glory be!

My plan for this write up is to highlight some of the more striking points set forth in the article with the subtitle “A Reader’s Digest That Grandma Never Dreamed Of.” I won’t point out that Ms. Sperling, my anti-Arnold English teacher in high school, would have given the headline writer an F and inked in red: “A Reader’s Digest about Which Grandma Never Dreamed.” But why fiddle around with the small stuff when the overall point of the article is of larger import? I will comment on that at the end of this short write up.

Now that you have the plan of attack, let’s look at the passages I found interesting.

This passage captures exquisitely the decay, the loss of a future, and the end of a traditional information company:

Walking the hallways now, it’s hard to imagine the bustle. More than half of the building is empty, a ghostly warren of empty cubicles and unused bathrooms. You can walk for long stretches without seeing anyone. A stand-alone brick addition has been condemned because of mold, a company spokesman said.

Ms. Sperling would have inked a circle around “mold” and written, “You have confused a frame or model with a saprotrophic fungus.” Man, she was a picky one. She wanted the word spelled “mould”.

image

Image source: http://i.pbase.com/g5/61/391661/2/67960731.z5oyTWFv.jpg


Preliminary List of Beyond Search Evaluated Social Search Systems

December 23, 2009

The goslings and I had some disagreements about what to include and what to exclude. If you read my column in Incisive Media’s Information World Review, I have mentioned many of these systems. In London earlier this month a person asked me to run a table of the social search systems. I anticipate that a large number of azure chip consultants, poobahs, satraps, and SEO mavens will have a field day recycling these links. The addled goose is too old and too disinterested to honk much about short cuts.

As with our list of European enterprise search vendors, we will add to this list over time. I will not include my ratings for each system in this list. I have not decided about using my goose ratings as part of the Overflight service or one of the listings on my archive Web site. If you don’t agree with a site’s inclusion or if you have a site to suggest, use the comments section of the Web log. There will be some weird breaks and spacing issues. WordPress often baffles me with its handling of table code. If the breaks annoy you, the addled goose says, “Create your own list.” Honk.


Will Mr. Google Rustle the Adobe Cash Cow

December 18, 2009

I think most business intelligence write ups are dull. Corporate catastrophes can be fun! Just ask Bain, Boston Consulting Group, and other blue chip firms. I want to give you a glimpse of another Google disruption that is not in the “Sergey and Larry eat pizza” books. The information in this write up comes from open sources. The difference between this analysis of a single Google invention and telling anecdotes about advertising is that the Google is poised to put some major pain on some large outfits in a business sector not generally associated with Google. In this article, I refer to Google as Mr. Google and Googzilla. I find that making light of what may be one of the more significant capabilities of this company is fun for me. Enjoy. Oh, if you are annoyed by my writing style, may I remind you that this is my personal Web log and it is available to you for free. Therefore, don’t write to complain about my approach, just go read something more appetizing to you.

Anyone remember Andrew Hertzfeld? Earlier this year, the New York Times mentioned Andrew Hertzfeld, “who helped develop the original Macintosh and now works at Google,” and suggested that Mr. Google was looking for different cash cows. Graphical interfaces and related software wizardry are nothing new to Mr. Google. But Mr. Hertzfeld is a bit like Vint Cerf or Jeff Dean. These are humans with brains that dwarf the addled goose’s pitiable gray matter. Mr. Hertzfeld is a wizard. In addition to the Macintosh work, he founded General Magic and then Eazel in 1999. He donned his Google T shirt in 2005. Not exactly an average Googler, but you get the idea that Mr. Hertzfeld has some graphics savvy amidst the Haskell crowd.

So, what’s Mr. Hertzfeld doing at Googzilla’s magic factory? Picasa? An in-browser image editor? I don’t know much, but I do know how to look at certain types of open source information. A recent example is US Patent 7631252, filed in July 2006. The title is “Selective Image Editing in a Browser”. To give you some context for Mr. Hertzfeld’s interests, he has a patent called “Graphical User Interface for Navigating between Levels Displaying Hallway and Room Metaphors.” After looking at these two documents, my hunch is that the Google wants to visit the feedlot where Adobe’s cash cow Photoshop is getting fat.

You can read these documents and draw your own conclusion, but I am going to snap this invention into my Google capabilities matrix under “Graphics Disruption”. Hey, I am an addled goose, so those folks with image editing systems that run on the desktop or in the cloud can tell me I am off base. No problemo.

But, just for fun, let’s look at what the crystal clear prose of US7631252 tries to communicate.

Here’s the abstract:

Methods, tools, and systems are provided for editing an image in a browser. One method provides editing an image in a browser including maintaining a list of transformations applied to the image including a last transformation, receiving a selection from a user to rollback a transformation, the selection not including the last transformation, generating a unique identifier associated with the edited image without the selection and requesting a page using the unique identifier.

Not too exciting, right?

Now Mr. Google employs a junior poobah named Cyrus. This bright lad insists that I create illustrations for my books and lectures using Photoshop. The reason for this interesting assertion is that Cyrus does not read patent documents. Here’s a Google illustration that supports the patent:

hertzfeld

If you know about online image editing, you can figure out that the simplified interface supports a number of controls. The feature seems to be that behind the “simple” facade are some Photoshop-like functions.

What makes the patent interesting to me is that Mr. Google is supporting some computationally intensive and storage gobbling functions. Browser based roll back is one example.
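For readers who like to see the moving parts, here is a minimal Python sketch of the roll back idea as the abstract describes it: keep a list of transformations, let the user drop one that is not the last, derive an identifier for the edited image without that transformation, and request a page with the identifier. Everything specific is my assumption, not Mr. Google's method: the function names, the use of a SHA-256 hash as the "unique identifier", and the example.com URL are hypothetical stand-ins.

```python
import hashlib


def rollback(transformations, index):
    """Return a new edit list without the transformation at `index`.

    Per the abstract, the selection must not be the last transformation.
    """
    if not 0 <= index < len(transformations) - 1:
        raise ValueError("selection must be an earlier transformation, not the last one")
    return transformations[:index] + transformations[index + 1:]


def edit_identifier(image_id, transformations):
    """Derive a repeatable identifier from the image and its remaining edits.

    Hashing is an assumption; the abstract only says an identifier is generated.
    """
    payload = image_id + "|" + "|".join(transformations)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def page_url(image_id, transformations):
    """Build the page request; the URL pattern is purely hypothetical."""
    return "https://example.com/edit?id=" + edit_identifier(image_id, transformations)


edits = ["crop", "rotate90", "sharpen"]
remaining = rollback(edits, 1)  # drop "rotate90" but keep the later "sharpen"
print(remaining)
print(page_url("img-1", remaining))
```

The point of the sketch is that undoing a middle step means replaying the surviving steps against the original image, which is where the computation and storage costs pile up.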

The other aspect of the invention that I noted was that there is some smart software clanking around in the background. One quick example is the auto recognition capability that invokes certain functions. Mr. Google provides 21 claims for this invention. Most of these till earth that other image editing outfits have trampled into hard packed clay. A couple of them are going to allow Mr. Google to exert some disruptive forces in the image editing markets.

To put this in some perspective, Mr. Google has a vector capability. Mr. Google has a bitmap editing capability. Mr. Google has a plan for something. I wonder if there is a confection called the “creative sweet” in Mr. Google’s candy shop.

Stephen E. Arnold, December 18, 2009

Oyez, oyez, I want to report to the Jet Propulsion Lab that I was not paid to write about this invention, the Googler who does not read patents, or the coming pressure for the kids from Adobe. I would like to get paid for this type of serious patent analysis. I won’t even get a lump of coal for Christmas.

Google 2010: Speed Becomes a Competitive Advantage

December 12, 2009

In real estate, the keys to success are location, location, and location. In software, the keys to success are speed, speed, and speed. Which vendor is speed crazed, addicted like pro football players to the fawning of the media? The Google. In fact, the facet of Google which few appreciate is the giddiness a Googler experiences when he or she can point out how few milliseconds a process consumes, how quickly a bloated JavaScript function performs, and how rapidly petabytes of dross can be shoved through a hydraulic data mining system.
The dirty little secret of the major vendors is that their systems are piggy. Going fast is not in the code’s DNA. I can identify some examples that will make vendors squirm. Ready?

How do you make Oracle, IBM DB2 and Microsoft SQL Server go fast? Easy. Throw hardware at the problem. The good news is that this solution is good for the economy, resellers, and the professional babysitters who work as apologists for these aging and ponderous systems. The bad news is that one never finishes throwing hardware at these members of the Kubanochoerus clan.

image

How do you make a content management system go fast? Easy. Throw hardware at the problem, strip out custom scripts, and reduce the number of users. This works like a champ but it means that already inefficient components remain inefficient. Sigh.

How do you make some enterprise search systems go faster? Easy. Throw hardware at the problem, chop out custom scripts, reduce the number of supported sessions, eliminate certain hog like sub functions, reduce the number of documents processed, and limit the number of queries supported in a time slice.

Bet you don’t hear this type of information from your ever optimistic and cheerful vendors.

The reason I want to call attention to the issue of performance is that Google in 2010 is going to ride the performance pony. It makes little difference if you try to ignore the Google’s pitch; your chief financial officer is going to look at the bills for the status quo. If that guy or gal makes the connection between the traditional approach to IT and the new methods advocated by Google, the Google is going to do a loop-de-doop around the Oracle DBAs, the Microsoft Certified Professionals, and the phalanxes of IBM sales engineers. Google’s marketing allies are CFOs who need relief from IT cost pressure and anyone born before 1994.

You can see some of Google’s method in “Google Updates Web App Toolkit for Speed.” There are other examples which range from the Chrome browser being faster than Internet Explorer 8 to the Google DNS play. Google not only wants to become the Internet.

image

Google wants to become the love child of the original business drivers of the original odd couple, IBM and Microsoft. Oracle, SAP, and other firms will be orphans in this brave new world unless established enterprise software vendors:

  • Find out how to deliver speed at a very competitive cost
  • Find a way out of the crazy Sergio Leone code that they distribute as the next big thing
  • Embrace and adapt to the new economic realities, abandoning the sit-around-the-campfire approach of talking about the good old days.

There you have it. Google will attack using performance as the pointy end of its marketing. Once the pointy end sticks an established vendor in a vulnerable spot like the buttocks, the Google will move in with cost and ease of use. You can get lots of detail in my Google trilogy.

Okay, now tell me how wrong I am. Just bring facts. For example, I point out in my monographs specific Google innovations related to performance. Don’t tell me about SharePoint’s or Oracle’s legendary speed. Show me data not created by house pet consultants. I am waiting. Honk, honk.

Stephen Arnold, December 12, 2009

I wish to disclose to the US Department of the Treasury that I was not paid to point out the fool’s gold that is presented as the real McCoy by mainstream vendors. Yep, a freebie.

The Google Gong Rings for ProQuest and Dissertation Content

December 7, 2009

A MOVIE CAMERA BUSINESS TRIES TO ADAPT

In June 1986, I was sold along with the electronic publishing assets of the Courier Journal & Louisville Times Co. to Bell+Howell. B+H owned a new media company, which in the late 1980s did business as University Microfilms with the acronym UMI. At that time, the company’s product line up spanned a number of media. At one end of the spectrum was the original business based on creating microfilm replicas of documents. These microfilms were sold to libraries. Generations of students used technology perfected during World War II for their access to information not in a library’s collection. At the other end were the electronic products from the Courier Journal: ABI/INFORM, Pharmaceutical News Index, and Business Dateline (the first full text local business news database with corrections made when the source updated the story’s facts).

image

Now this is an efficient research tool for today’s student. Source: http://www.archivalsuppliers.com/images/Picture%20284.jpg

When I was a student, I did research with microfilm. It was okay but it took a long time to get the reels, get them set up, and reviewed. Getting a hard copy of a document was a hassle. Some of the prints turned black and became unreadable quickly. I once dropped a reel and watched in horror as it unspooled, picked up dirt, and was unusable. I had to pay the library for a replacement. I think in the 1960s, a single reel cost me about $45 which was more than I made in my part time job. I loathed the stuff.

At the recent Online Information 2009 event in London, my colleague Ulla de Stricker was the keynoter for the “Publishers Delivering Value” track on December 3, 2009. In her talk, she mentioned the Google move into dissertations. Her reference inspired me to write this opinion piece. You can get information about her at DeStricker.com. One of her examples was the fact that Stanford University students may now submit their dissertations to Google while it is optional to submit them to ProQuest.

So I wandered over to the exhibit hall to visit with ProQuest, all the while reminiscing about my past experience with that company – known as UMI at the time.

MICROFILM: HARD TO USE, EASY TO DAMAGE AND MISFILE

When I was working on my PhD, I remember my fellow students talking about the costs of getting their dissertations “published” and then included in the Dissertation Abstracts index. I never had this problem because I took a job with the nuclear unit of Halliburton, never bothering to submit my dissertation once I got a real job.

image

A microfilm reader. Source: http://www.ucar.edu/library/collections/archive/media/photographs/481_1976_microfilm_lg.jpg

The whole system was a money making machine. When a library burned down, backfiles could be acquired when physical copies were not available. When a university got a grant for a new field of study, a collection of journals could be purchased from UMI on microfilm. Bang. Instant academic reference material. I don’t recall how much content the “old” UMI moved to microfilm. My recollection is that there were books, journals, newspapers, and, of course, dissertations. With all this film, I understood why B+H had paid tens of millions for the Courier Journal’s electronic publishing expertise. Buying expertise and using it are two different things, in my opinion.

MECHANICAL PRODUCTION WRONG FOR DIGITAL PRODUCTS

The production process for creating a microfilm was quite complicated and involved specialized cameras, film, and chemicals. The image I have of the UMI facility in Ann Arbor, Michigan, the first time I visited was a modern day castle surrounded by a moat. The castle was a large, one-story building surrounded by a settling pond. The chemicals from the film processing were pumped into the moat in order to separate certain high value residue from other chemicals. UMI processed so much film that the residue silver from the photographic process warranted this recycling effort.

image

Dinosaurs struggle with the concept of an apocalypse. Adapt or get grilled I suppose.

UMI had a silver mine in its monopoly on certain types of content. My recollection of UMI was that its core product was getting universities to require or maybe strongly recommend that doctoral dissertations be “published” by UMI. The microfilm copies of the dissertations were sold back to the doctoral students and to libraries interested in having a compact, relatively easy way to store volumes on a mind boggling range of topics. I did a project that required me to use a microfilm copy of something called the Elisaeis by a wild and crazy religious poet named William Alabaster, and several dissertations written about that nearly forgotten literary work. I also did a project for the Vatican and worked through microfilms of sermons from the Middle Ages in Latin. Now that was fun! Pretty sporty content too. Nothing like a hot omelie.


Some Thoughts About Real Time Content Processing

December 2, 2009

I wanted to provide my two or three readers with a summary of my comments about real time content processing at the Incisive international online information conference. I arrived more addled than normal due to three mechanical failures on America’s interpretation of a joint venture between Albanian and Galapagos Airlines. That means Delta Airlines I think.

What I wanted to accomplish in my talk was to make one point—real time search is here to stay. Why?

First, real time means lots of noise and modest information payload. To deal with lots of content requires a robust and expensive line up of hardware, software, and network resources. Marketers have been working overtime by slapping “real time” on any software product conceivable in the hopes of making another sale. And big time search vendors essentially ignored the real time information challenge. Plain vanilla search on content updated when the vendor decided was an easier game.

Real time can mean almost anything. In fact, most search and content processing systems are not even close to real time. The reason is that slow downs can occur in any component of a large, complex content processing system. As long as the user gets some results, for many of the too-busy 30 somethings that is just fine. Any information is better than no information. Based on the performance of some commercial and governmental organizations, the approach is not working particularly well in my opinion.

Let me give you an example of real time. In the 1920s, America decided that no booze was good news. Rum runners filled the gap. The US Coast Guard learned that it could tune a radio receiver to a frequency used by the liquor smugglers. The intercepts were in real time, and the Coast Guard increased its interdiction rate. The idea was that a bad guy talked and the Coast Guard listened in real time even though there was a slight delay in wireless transmissions. The same idea is operative today when good guys intercept mobile conversations or listen to table talk at a restaurant.

The problem is that communications and content believed to be real time are not. SMS may be delivered quickly, but I have received SMS sent a day or more earlier. The telco takes considerable license in billing for SMS and delivering SMS. No one seems to be the wiser.

A content management system often creates this type of conversation in an organization. Jack: “I can’t find my document.” Jill: “Did you put it in the system with the ‘index me’ metatag?” Jack: “Yes.” Jill: “Gee, that happens to me all the time.” The reason is that the CMS indexes when it can or on a specific schedule. Content in some CMSs is not findable. So much for real time in the organization.

An early version of the Google Search Appliance could index so aggressively that the network was choked by the googlebot. System administrators solved the problem by indexing once a day, maybe twice a day. Again, the user perceives one thing and the system is doing another.

This means that real time will have a specific definition depending on the particular circumstances in which the system is installed and configured.

Several business sectors are gung ho for real time information.

Financial services firms will pay $500,000 for a single Exegy high speed content processing server. When that machine is saturated, just buy another Exegy server. Microsoft is working on a petascale real time content processing system for the financial services industry which will compete with such established vendors as Connotate and Relegence. But a delay of a millisecond or two can spoil the fun.

Accountants want to know exactly what money is where. Purchase order systems and accounts receivable have to be fast. Speed does not prevent accidents. The implosion of such corporate giants as Enron and Tyco makes it clear that going faster does not make information or management decisions better.

Intelligence agencies want to know immediately when a term on a watch list appears in a content stream. A good example is “Bin Ladin” or “Bin Laden” or a variant. A delay can cost lives. Systems from Exalead and SRA can handle this type of problem and a range of other real time tasks without breaking a sweat.
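To make the watch list idea concrete, here is a minimal Python sketch that scans a stream of messages and flags any spelling variant of a listed term the moment it shows up. It illustrates the general technique only; it says nothing about how Exalead, SRA, or any agency actually does the job. The `WATCH_LIST` table, the `scan` function, and the regular expression are my own hypothetical choices.

```python
import re

# Hypothetical watch list: one canonical term mapped to a pattern that
# tolerates the "Ladin" / "Laden" spelling variants and uneven spacing.
WATCH_LIST = {
    "bin laden": re.compile(r"\bbin\s+lad[ei]n\b", re.IGNORECASE),
}


def scan(stream):
    """Yield (position, term) as soon as a watched term appears in a message."""
    for position, message in enumerate(stream):
        for term, pattern in WATCH_LIST.items():
            if pattern.search(message):
                yield position, term


messages = [
    "weather report for the coast",
    "a mention of Bin Ladin in the transcript",
    "BIN  LADEN appears again here",
]
hits = list(scan(messages))
print(hits)
```

Because `scan` is a generator, each hit is reported as its message arrives rather than after the whole batch is processed, which is the difference between an alert and a report.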

The problem is that there is no certifying authority for “real time”. Organizations trying to implement real time may be falling for a pig in a poke or buying a horse without checking to see if it has been enhanced at a horse beauty salon.

In closing, real time is here to stay.

First, Google, Microsoft, and other vendors are jumping into indexing content from social networks, RSS feeds, and Web sites that update when new information is written to their databases. Like it or not, real time links or what appear to be real time links will be in these big commercial systems.

Second, enterprise vendors will provide connectors to handle RSS and other real time content. This geyser of information will be creating wet floors in organizations worldwide.

Third, vendors in many different enterprise sectors will be working to make fresh data available. You may not be able to escape real time information even if you work with an inventory control system.

Finally, users, particularly recent college graduates, will get real time information their own way, like it or not.

To wrap up, “what’s happening now, baby?” is going to be an increasingly common question you will have to answer.

Stephen Arnold, December 2, 2009

Oyez, oyez, I disclose to the National Intelligence Center that the Incisive organization paid me to write about real time information. In theory, I will get some money in eight to 12 weeks. Am I for sale to the highest bidder? I guess it depends on how good looking you are.

Circumvallation: Reed Elsevier and Thomson as Vercingetorix

November 27, 2009

Google Scholar Gets Smart in Legal Information

One turkey received a presidential pardon. Other turkeys may not be so lucky on November 26, 2009, when the US celebrates Thanksgiving. I am befuddled about this holiday. There are not too many farmers in Harrod’s Creek. The fields contain the abandoned foundations of McMansions that the present economic meltdown has left like Shelley’s statue of Ozymandias. The “half buried in the sand” becomes half built homes in the horse farm.

As Kentuckians in my hollow give thanks for a day off from job hunting, I am sitting by the goose pond trying to remember what I read in my copy of Caesar’s De Bello Gallico. I know Caesar did not write this memoir, but his PR bunnies did a pretty good job. I awoke this morning thinking about the connection between the battle of Alesia and what is now happening to the publishing giants Reed-Elsevier and Thomson Reuters. The trigger for this mental exercise was Google’s announcement that it had added legal content to Google Scholar.

vercingetorix

What’s Vercingetorix got to do with Google, Lexis, and Westlaw? Think military strategy. Starvation, death, surrender, and ritual killing. Just what today’s business giants relish.

Google has added the full text of US federal cases and state cases. The coverage of the federal cases, district and appellate, is from 1924 to the present. US state cases cover 1950 to the present. Additional content will be added; for example, I have one source that suggested that the Commonwealth of Virginia Supreme Court will provide Google with CD ROMs of cases back to 1924. Google, according to this source, is talking with other sources of US legal information and may provide access to additional legal information as well. What are these sources? Possibly Public.Resource.Org and possibly Justia.org, among others.

The present service includes:

  • The full text of the legal document
  • Footnotes in the legal document
  • Page numbers in the legal document
  • Page breaks in the legal document
  • Hyperlinks in the legal document to cases
  • A tab to show how the case was cited in other documents
  • Links to non legal documents that cite a case.

You can read various pundits, mavens, and azure chip consultants’ comments on this Google action at this link.

You may want to listen to a podcast called TWIL. The November 23, 2009, show discussed Google Scholar for about a half hour. You can find that discussion on iTunes. Just search for TWIL and download the program “Social Lubricants and Frictions.”

On the surface, the Google push into legal information is a modest amount of data in terms of Google’s daily petabyte flows. The service is easy to use, but the engineering required to provide access to the content strikes me as non-trivial. Content transformation is an expensive proposition, and the cost of fiddling with legal information is one of the primary reasons commercial online services have had to charge hefty fees to look at what amounts to taxpayer supported, public information.

The good news is that the information is free and easily accessible even from an iPhone or other mobile device. The Google service does the standard Google animal tricks of linking, displaying content with minimal latency, and updating new content within a minute or so of that content becoming available to Google’s software Dyson vacuum cleaner.

So what?

This service is similar to others I have written about in my three Google monographs. Be aware. My studies are not Sergey-and-Larry-eat-pizza books. I look at the Google open source technical and business information. I ignore most of what Google’s wizards “say” in public. These folks are “running the game plan” and add little useful information for my line of work. Your mileage may differ. If so, stop reading this blog post and hunt down a cheerful non-fiction Google book by a real live journalist. That’s not my game. I am an addled goose.

Now let me answer the “so what”.

First, the Google legal content is an incremental effort for the Google. This means that Google’s existing infrastructure, staff, and software can handle the content transformation, parsing, indexing, and serving. No additional big-buck investment is needed. In fact, I have heard that the legal content project, like Google News, was accomplished in the free time for play that Google makes available to its full time professionals. A bit of thought should make clear to you that commercial outfits who have to invest to handle legal content in a Google manner have a cost problem right out of the starting blocks.

Second, Google is doing content processing that should be the responsibility of the US government. I know. I know. The US government wants to create information and not compete with commercial outfits. But the outfits manipulating legal information have priced it so that most everyday Trents and Whitneys cannot afford to use these commercial services. Even some law firms cannot afford these services. Pro bono attorneys don’t have enough money to buy yellow pads to help their clients. Even kind hearted attorneys have to eat before they pay a couple of hundred bucks to run a query on the commercial online services from publicly traded companies out to give their shareholders a great big financial payday. Google is operating like a government when it processes legal information and makes it available without direct charge to the user. The monetization takes place but on a different business model foundation. That also spells T-R-O-U-B-L-E for the commercial online services like Lexis and Westlaw.


Microsoft and News Corp.: A Tag Team of Giants Will Challenge Google

November 23, 2009

Government regulators are powerless when it comes to online. The best bet, in my opinion, is for large online companies to act as if litigation and regulator hand holding were a cost of doing business. While the legal eagles flap and the regulators meet bright, chipper people, the business of online moves forward.

According to “Microsoft Offers To Pay News Corp To ‘De-List’ Itself From Google” and other “experts”, News Corp. and Microsoft want to form a digital World Wrestling Federation tag team. In the “fights” to come, these champions, Steve Ballmer and Rupert Murdoch, will take on the unlikely upstarts, Sergey the Algorithm Guy and Larry the Math Whiz.

image

Which of these two tag teams will grace the cover of the WWF marketing collateral? What will their personas become? Source: http://www.x-entertainment.com/pics5/wwe11click.jpg

The idea is to “pull” News Corp. content from Google or make it pay through its snout for the right to index News Corp. content. The deal will probably encompass any News Corp. content. Whatever Google deal is in place with News Corp. would be reworked. News Corp., like other traditional media companies, is struggling to regain its revenue traction.

For Microsoft a new wrestling partner makes sense. Bing is gaining market share, but at the expense of Yahoo’s search share. Microsoft now faces Google’s 1,001 tiny cuts. The most recent is the browser based operating system. There is the problem of developers with Microsoft’s former employees rallying the Google faithful. There’s the pesky Android phone thing that went from a joke to a coalition of telephone-centric outfits. There’s the annoyance of Google in the US government. On and on. No one Google nick has to kill Microsoft. Nope. Google just needs to let a trickle of revenue slip away from the veins of Microsoft. The company’s rising blood pressure will do the rest. Eventually, the losses from the 1,001 tiny cuts will force the $70 billion Redmond wrestler to take a break. That “rest” may be what gives Google the opportunity to do significant damage with its as-yet-unappreciated play for the TV, cable, and independent motion picture business. Silverlight 4.0 may not be enough to overcome the structural changes in rich media. That’s real money. Almost as much as the telephony play promises to deliver to the somewhat low key team of Sergey the Algorithm Guy and Larry the Math Whiz.

image

Sergey the Algorithm Guy and Larry the Math Whiz take a break from discussing the Kolmogorov-Smirnov test of normality. Training is tough for this duo. Long hours of solitary computation may exhaust the team before it tackles the Ballmer-Murdoch duo, which may be the most dangerous opponent the Math Guys have faced.

I look forward to the fight promoter pulling out all the stops. One of the Buffers will be the announcer. The cut man will be the master, Stitch Duran. The venue will be Las Vegas, followed by other world capitals of money, power, and citizen concern.

Nicholas Carlson reported:

Still, if News Corp were to “de-list” from Google, we’d expect to see all kinds of ads touting Bing as the only place to find the Wall Street Journal and MySpace pages online. Maybe that’d swing search engine share some, but we doubt it.


MarkLogic Tames Big Data

November 20, 2009

I spent several hours on November 18, 2009, at the MarkLogic client conference held in Washington, DC. I was expecting another long day of me-too presentations. What a surprise! The conference attracted about 250 people and featured presentations by a number of MarkLogic customers and engineers. There were several points that struck me:

First, unlike the old-fashioned trade show, this program was a combination of briefings, audience interaction, and informal conversations fueled by genuine enthusiasm. Much of that interest came from the people who had used the MarkLogic platform to deliver solutions in very different big data situations. Booz, Allen & Hamilton was particularly enthusiastic. As a former laborer in the BAH knowledge factory, I know the enthusiasm originates in one place: the client. BAH professionals are upbeat *only* when the firm’s customers are happy. BAH described using the MarkLogic platform as a way to solve a number of different client problems.

image

MarkLogic’s platform applied to an email use case caught the attention of audiences involved in certain types of investigative and data forensics work. Shown is the default interface, which can be customized to the licensee’s requirements.

Second, those in the audience were upfront about their need to find solutions to big data problems—scale, analytics, performance. I assumed that those representing government entities would be looking for ways to respond to President Obama’s mandates. There was an undercurrent of responding to the Administration, but the imperative was the realization that tools like relational databases were not delivering solutions. Some in the audience, based on my observations, were actively looking for new ways to manipulate data. In my view, the MarkLogic system had blipped the radar in some government information technology shops, and the people with problems showed up to learn.

