Enterprise Translation Systems

December 10, 2008

Update: December 14, 2008 I came across Nice Translator at http://www.nicetranslator.com/

Original Post

I received an email from a colleague who wanted to know about translation systems. I fired back an answer, but I thought you might want to have my short list of vendors to peruse. If you run a search on Google for “enterprise translation software”, you get more than 400,000 hits. That’s not too useful. If you want to experiment with free translation services, download this file.

BASIS Technologies licenses its various translation components to a number of search and content processing vendors; for example, Fast Search & Transfer was a customer. BASIS has been a leader in providing machine translation of Arabic and related languages. The Federal government has been a fan of BASIS’s systems. You can get some very specialized translation and language components; for example, a Japanese address analyzer.

Google provides a pretty good translation system. Right now, it is free, which is a plus. Some of the translation systems shoot into six figures pretty quickly if you pack on the language packs and custom tuning. You can use the Google system by navigating here: http://translate.google.com. You can fiddle around and automate translation, but I have heard that Google monitors its translation system, so if you push too much through the system, the Googlers follow up. You can feed it a line of text or a URL.

Language Weaver offers automated language translation. The company serves digital industries and enterprise customers directly and through strategic partnerships. You can hook this system into other enterprise software. Employees can access documents in their native language. The company recently added new language pairs:

  • Bulgarian to/from English
  • Hebrew to/from English
  • Serbian to/from English
  • Thai to/from English
  • Turkish to/from English.

Systran has been a player in translation for years. You have to buy Systran’s software. The desktop version works quite well. The enterprise system involves some fiddling, but you can automate the translation and perform some useful operations on the machine-generated files. You can get more information about Systran here. Systran is used for the Babel Fish online translation function in AltaVista.com and Yahoo.

How good are these systems?

None of the systems is perfect. None of the systems translates as well as a human with deep knowledge of the language pairs being translated. However, the speed of these systems and their “good enough” translations can cope with the volume of data flowing into an organization. I use several of these systems. I can get a sense of the document and then turn to a native speaker to clarify the translation.

I have unsubstantiated information that suggests Google has been making considerable progress with its online translation system. Because the system is available without charge, Google is becoming the default system. AltaVista.com still offers an online translation system, but Google has surpassed that system in speed and language pair support. When Google integrates its online translation system with its other enterprise services, I think Google will continue to chew away at the established vendors’ market share. The GOOG, however, seems happy to let customers find its online translation service. The economic downturn may shift the Google into higher gear.

Stephen Arnold, December 10, 2008

Scaling SQL Server

December 10, 2008

It is official. SQL Server does not scale too well. Sure, you can whip about some data, but when it comes to petascale data management, SQL Server is like the kids in the lower quartile of the SAT–good enough to compete but probably not cut out for computational chemistry in the first year of college. Microsoft, according to Eric Lai, has hired David DeWitt, a database wizard, to fix what Microsofties have not been able to fix in the last six or seven years. Microsoft marketing asserts that SQL Server is a super-charged database management system. Anyone who has tried to get this puppy to handle petabytes of XML knows that SQL Server is a wee thing. You must read the ComputerWorld story here about this surprising admission of SQL Server’s weaknesses.

For me, the most interesting comment in this article was this statement:

DeWitt concedes today that [Google’s] MapReduce “does scale pretty well.” He hails its ability to continue queries without interruption if a particular server fails, which most clustered databases cannot do. But he stands by his argument, which is that true relational databases “give you a lot more leverage and good features.” And DeWitt said he will soon release research to back that up.

The article contains quite a few jibes and juicy items about Google, Microsoft, and assorted DBMS gurus. But the fact remains that Google had its MapReduce data management system in place and working pretty well before the firm’s initial public offering in 2004. Since that time, Google has invested heavily in dataspace technology, which in my opinion is the problem Microsoft and Dr. DeWitt have to address–Google’s head start. I think Microsoft’s new database lab is a good idea. But trying to narrow a five or six year gap is a big job. Microsoft may pull even with MapReduce and find that Google has disappeared into dataspace. Will Microsoft be able to leapfrog Google? Possibly. But Microsoft has a problem now and may be facing the unpleasant situation of catching up to Google only to find the Google has managed to extend its dataspace management lead. Google is in “as is” mode. Microsoft is in “to be” mode. I think there is a significant difference in the two firms’ positions in DBMS.

Stephen Arnold, December 9, 2008

More YouTube Data

December 10, 2008

Mark Cuban, blog maverick, wrote “YouTube’s Desperation.” You can read the article here. Mr. Cuban wrote a thoughtful analysis of the challenges YouTube presents to Google’s management. Mr. Cuban’s discussion of YouTube options is also interesting. He identifies two options. First, keep spending money, a lot of money. Or, split YouTube into two sites. One site would host user generated content; the other, commercial content. I may be simplifying Mr. Cuban’s analysis, and I suggest you read his remarks directly. I don’t want to pull a quote from context because that might distort his point of view.

For me, the most interesting part of the article was the run down of YouTube facts. Google does not provide much information about its services. I am not comfortable latching on to these data as 100 percent accurate, but I found them quite interesting. Here’s my summary of the juicy data in Mr. Cuban’s write up:

  • Google receives 13 gigabytes of YouTube content every second
  • “At .12 cents per gbs, thats [sic]  about 5mm dollars per year in upload costs,” wrote Mr. Cuban
  • “If over the coarse [sic] of a year, each of that video is watched 100 times, thats another $500mm in bandwidth costs,” added Mr. Cuban.
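The jump from $5mm to $500mm in the quoted figures is plain multiplication. Here is a minimal Python sketch of that back-of-the-envelope step. It assumes that a gigabyte served to a viewer costs the same as a gigabyte uploaded, which is my assumption and not something Mr. Cuban's write up confirms. The function name is made up for illustration.

```python
def yearly_viewing_cost(upload_cost_per_year: float, views_per_video: int) -> float:
    """Scale the yearly upload cost by how many times each video is watched.

    Assumption: serving a gigabyte costs the same as receiving one.
    """
    return upload_cost_per_year * views_per_video


upload_cost = 5_000_000  # "about 5mm dollars per year", as quoted above
views = 100              # "watched 100 times", as quoted above

viewing_cost = yearly_viewing_cost(upload_cost, views)
print(f"${viewing_cost:,.0f}")  # prints $500,000,000
```

Change the views figure and the bandwidth bill moves in lockstep, which is why the viewing side dwarfs the upload side in these estimates.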

With the Viacom lawsuit chugging forward, Google may have to use some of its nifty monetization inventions disclosed in its patent documents. If YouTube cannot be monetized, Google may have to make some tough decisions about the service. Video search is a separate service, but the costs are almost certain to continue to rise.

When the YouTube deal was announced, Mr. Cuban was an early critic. At least he did not write an “I told you so” line in his Web log post.

Stephen Arnold, December 7, 2008

Google a Bandwidth Piggie or Player

December 9, 2008

A reader sent me a link to a 27 page report called “Estimating Google’s U.S. Consumer Internet Usage & Cost — 2007-2010” by Scott Cleland, president of Precursor. You can download the study here. The thesis of the report is that Google uses lots of bandwidth. Mr. Cleland makes a case for Google paying more money for its bandwidth usage. Mr. Cleland does a good job tracking Google, providing a link to the interesting FaberNovel analysis of Google’s business.

Google, however, has rousted its legal and communications wizards and responded to Mr. Cleland’s analysis of Google as a bandwidth piggie. You can read Sharon Gaudin’s “Google Fires Back at Analyst Claim It’s a Bandwidth Hog” here. The core of Google’s response is well stated by Ms. Gaudin:

“First and foremost, there’s a huge difference between your own home broadband connection, and the Internet as a whole. It’s the consumers voluntarily choosing to use our applications who are actually using their own broadband bandwidth — not Google. To say that Google somehow ‘uses’ consumers’ home broadband connections shows a fundamental misunderstanding of how the Internet actually works.”

In my opinion, one can shape bandwidth data in many ways. I am not sure I buy into Mr. Cleland’s analysis. I know that Google’s argument contains assertions, not substantive data. So, we have a bit of saber rattling from both camps.

What I find interesting is that this issue was sufficiently contentious that the first shots were fired late in the week of December 1, 2008. I expect the verbal sniping to continue. A full scale war between Google and some of the telcos is increasingly likely. These skirmishes are interesting. Bigger confrontations are almost sure to follow.

Stephen Arnold, December 9, 2008

Bad News for Commercial Database Vendors

December 9, 2008

Garett Rogers delivered a holiday surprise to commercial database vendors. On December 3, 2008, when some of the big guns in online news were handing out business cards to budget challenged librarians and documentalists, Mr. Rogers reported here that Google acquired from an outfit called Paper of Record the company’s archives of indexed newspapers. The Paper of Record’s Web site explains the purpose of the archive:

Conceived by electronic publishing and web pioneer, R.J. (Bob) Huggins in a local Ottawa, Mexican restaurant in 1999, PaperofRecord.com is a Global pioneer of searchable newspaper image documents presented in their original published form. The Toronto Star, (circulation 650,000) became the first newspaper in the world to have its entire history from 1892 to present, digitized for the world to see and search. This revolutionary process changed forever how large metropolitan newspapers conduct their research and became the genesis for PaperofRecord.

My thought is that this acquisition may be like putting a toe in the water. If it “feels” good, the GOOG may start making commercial databases free to users. The content becomes a platform for the online ads. With commercial database publishers hanging on to an outmoded business model, the commercial database sector could suffer sharp revenue drops. Libraries will point users to “free” services and if these prove satisfactory, commercial databases may be starved for revenue. What can the commercial database publishers do to “slow” Googzilla? I do not have any bright ideas. Do you?

Stephen Arnold, December 9, 2008

Stratify Adds Cloud Storage Services

December 9, 2008

On December 3, 2008, Stratify–a unit of Iron Mountain–announced new services for its thriving eDiscovery business. You can read the Stratify news release here. The core of the service is disaster recovery. Attorneys apparently have a need to make sure that the legions of attorneys who pore through electronic documents obtained as part of the discovery process can’t nuke the data. Stratify said:

To safeguard client eDiscovery data Stratify has invested in and deployed a fully replicated production datacenter with more than 250 terabytes of storage, 200 servers and redundant 100MB Internet access, coupled with highly trained personnel and security procedures.

Stratify (which once did business as Purple Yogi) now wears a blue suit and polished shoes; no sneakers now. IDC’s Sue Feldman weighs in with an observation that the new service “raises the bar” for the companies competing for eDiscovery accounts.

Stratify’s news release added:

Stratify can restore access to client matters within four hours after a potential disaster, recover 100 percent of processed and loaded documents and system metadata, and lose no more than 59 minutes worth of review work product.

In my opinion, the eDiscovery sector is undergoing rapid change. The need for end-to-end solutions and bullet proof systems means that specialist vendors may be forced to add sophisticated new features in order to compete. The problem is that eDiscovery systems are selling to corporations. With the technology and market changing, well funded organizations with a strong client list may have an advantage. Stratify said that it had more than 250 matters underway at this time.

eDiscovery, like business intelligence, is becoming a magnet for search and content processing companies who want to find a way to pump up revenues.

Stephen Arnold, December 9, 2008

Overflight Enhancements

December 9, 2008

ArnoldIT.com’s Google monitoring service made some changes over the last few days. You can access the service by clicking here. Overflight Google allows you to look at the most recent Web log posts on more than 70 Google Web logs. The change is the addition of a link that says, “Show Overflight Update Stream”. When you click it, we display the additions to Google Web logs and put the date on each item. The Update Stream function has been added for each of the Google Web log clusters. If you want to scan headlines, you can browse the most recent items for each of the Google Web logs.

The other enhancement is the addition of entity extraction to the Exalead search system’s index of the corpus of Google Web logs. I am not too happy with the phrase “vertical search”, but I must admit, the Exalead index of more than 70 Web logs is a sharply focused vertical search engine. Here’s a screen shot of the Exalead entity extraction. You can use it to learn the name of the Google customer at Genentech and similar interesting ways to learn about the GOOG.

[Screen shot: entity extraction 1]

A happy quack to the Exalead team. More enhancements are coming. If you would like an Overflight service on your Web site, write seaky2000 at Yahoo dot com.

Stephen Arnold, December 9, 2008

Danish Software Excitement

December 9, 2008

I have been watching the posts about SurfRay in the comments section of this Web log. I also have included links to stories about the Microsoft Fast issue. IT Factory is–er, was–an integrator and software vendor. I visited the company’s Web site at 5:41 pm Eastern time and was greeted with this message:

On 1 December 2008 IT Factory A/S was adjudicated bankrupt by the Maritime and Commercial Court, Bankruptcy Division. The court appointed Boris Frederiksen, Attorney-at-Law as the trustee in bankruptcy. Please direct all inquiries regarding the bankrupt estate to Boris Frederiksen, Attorney-at-Law (bor@kammeradv.dk), Rune Derno, Attorney-at-Law (der@kammeradv.dk) or Cathrine Wollenberg Zittan, Assistant Attorney (cwz@kammeradv.dk).  We ask the public to appreciate that the employees of IT Factory A/S in bankruptcy are still in an employment relationship for which reason they are subject to the general rules of secrecy.

Most of the Web pages describing IT Factory’s services on which I clicked returned little useful information. I did find a snippet in my files from LinkedIn. Here’s that description:

IT factory A/S provides information technology consultancy and software development services. The company offers customer relationship and human resources management, online procurement, and office automation solutions. Additionally, it provides project implementation, management information systems, training, and support services. IT factory partners with IBM, Cisco, Computer Associates, Sun Microsystems, and Oracle. The company was founded in 1997 and is based in Birkeroed, Denmark with an additional office in New Delhi, India.

A list of some of the people who used to work at the company is here. There is a link to an outfit called Elastictime on the LinkedIn page. I lacked the motivation to cross check between the two companies. Let me know if you have information about a connection. Elastictime is a Microsoft partner.

I noted a reference to IT Factory in the comments to this Web log. I don’t know if the information in the comments is accurate. I recall the suggestion that SurfRay’s owner had some interaction with Stein Bagger, the top dog at IT Factory.

In any event, Denmark is certainly working overtime to capture the lead in questionable practices in information-centric software companies. The Los Angeles Times here reported that Mr. Bagger, who fled Denmark, walked into a Los Angeles police station and turned himself in. Needless to say, the LA police were not familiar with the plight of IT Factory. After a bit of checking, Mr. Bagger was permitted to enjoy the delights of the LA jail.

I have an informal search engine death watch on a sheet of paper. Maybe I should develop a list of alleged criminal practices among information-centric companies. In my opinion, as the economy continues to deteriorate, I may be forced to learn about other companies’ missteps. For me, the most interesting comment in the LA Times’s story was:

At the time, he was on top of the world. Bagger, chief executive of Copenhagen-based IT Factory, had recently been named Danish Entrepreneur of the Year by Ernst & Young after leading the previously crumbling computer company into what seemed to be an incredible turnaround. The Danish press reported that IT Factory had doubled its revenue and profit for each of its last three fiscal years.

I am curious about the due diligence performed by Ernst & Young when vetting executives for the prestigious award. Perhaps I should say, “previously prestigious” award?

Stephen Arnold, December 10, 2008

Microsoft Layoffs

December 9, 2008

Todd Bishop’s Microsoft Blog, produced by the Puget Sound Business Journal, reported on December 4, 2008, that Razorfish cut some fat. The most recent layoffs follow the 40 employees made redundant in October 2008. You can read the full story here. Razorfish is in the business of helping clients build high traffic Web sites. Microsoft got possession of the company with its acquisition of aQuantive. The most recent layoffs affect West Coast offices. As with the Google, the economic downturn may provide a convenient smoke screen for trimming some staff. The aQuantive deal has gone silent in my opinion. As with the Fast Search and Powerset deals, Microsoft appears to act quickly and then permit the newly acquired units to sink or swim.

Stephen Arnold, December 9, 2008

Zell Hell: The Tribune on Life Support

December 9, 2008

I am now making good progress on my Google and Publishing study for Infonortics Ltd. I took a break from things Google to write the April 2008 Beyond Search report for the Gilbane Group and to work with Martin White on our Successful Enterprise Search Management, available in a matter of days. I have been thinking about Google’s business model, which seems to be insulated from the problems of certain traditional media; for example, the Chicago Tribune, once the world’s greatest newspaper or so a radio station’s call letters suggested. The financial genius of Sam Zell met the shift in advertiser and customer behavior. He learned that online can be a stern taskmaster. Street smarts don’t apply to some of the micro climates created when digital information flows through a tidy market like Chicago. Even the original Mayor Daley might find his ward power ineffectual when facing online’s oddities.

The Chicago Tribune itself reported here that the Tribune’s bankruptcy would not stop the company from selling the Chicago Cubs. Now that is a heck of a story. Little wonder that 20 somethings use Craigslist.org to find apartments for rent, not the Tribune classifieds. Business Week ran “Tribune Bankruptcy Snares Employees” here. The angle for this story in my opinion is nicely stated in this paragraph:

The bright side may be that Tribune employees didn’t have a lot of time to put much money toward ESOP contributions. “In a weird way, the employees are better off that the company crashed today instead of seven years from now,” says a banker familiar with the deal, who asked not to be named.

Now that’s how to find good news where I see a somewhat disturbing situation. In fact, in my Google and Publishing monograph, I set technology aside. The Tribune and soon more traditional publishing companies will find themselves under the wheels of a speeding business model. The information superhighway–to drag up a metaphor from the 1994 to 1996 period–has a huge dinosaur staggering across six lanes of digital traffic. Google is not to blame, nor is any single company. The Tribune’s sharp management team is doing the 21st century equivalent of making buggy whips. The Tribune does good buggy whips, but the customers don’t want buggy whips. The customers want free apartment listings or other nifty things that a different business model coupled with whizzy technology delivers to their iPhones or BlackBerries. Mr. Zell and his crack management team are now living in a Harvard Business School case study that begins, “From his office overlooking the murky river, Mr. Zell ponders how he can escape from the financial catastrophe escalating by the minute.” This is no ivory tower Starbucks problem. This is the real deal, and the traditional tools aren’t working at the moment.

The question becomes more urgent because Google is pushing beyond books and into magazines. There are hundreds of posts about this move. In my opinion, Zell hell is about to invite others to the party. The GOOG is on the prowl for eyeballs and traffic. Microsoft dropped out of the race. Yahoo is busy rationalizing its work force. The commercial database publishers are oblivious. Google is disrupting the traditional information order. Who’s next for Zell hell?

Stephen Arnold, December 9, 2008
