Bing Kumo Brings Life to Old Domain

May 27, 2009

Busy day on the phone with media types and professional journalists. Bing Kumo, Microsoft’s tough new search dude, is expected soon. I enjoyed JR Raphael’s “Microsoft Bing Would Bring New Life to Old Domain.” You can read the story here. I found the history of the domain name interesting. Microsoft’s approach to Bing will become clear when Steve Ballmer demos the new system at the D Conference. (D means “digital” for the Wall Street Journal, owner of the show.) Given the domain’s turnover in owners, let’s hope this use sticks.

Stephen Arnold, May 27, 2009

Useful SQL Injection Info

May 27, 2009

At Los Alamos National Lab several years ago, a fellow speaker at an in-house conference gave a brilliant analysis of SQL injection. The talk was not made public. I came across a Bitpipe white paper from Breach. I have a Bitpipe user name and password, so locating the document was no problem. If you don’t have access to Bitpipe, click here, fill out the form, and download the seven-page document. Useful.
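
The white paper sits behind a registration form, so here is a minimal sketch of the classic injection pattern the topic covers, using Python’s built-in sqlite3 module. The table and values are invented for illustration; this shows the general vulnerability and the parameterized-query fix, not the paper’s specific content.

    import sqlite3

    # Throwaway in-memory database for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "admin"), ("bob", "user")])

    user_input = "bob' OR '1'='1"  # attacker-supplied value

    # Vulnerable: splicing input into the SQL string turns the OR clause
    # into live SQL, so the query returns every row, not just bob's.
    query = "SELECT * FROM users WHERE name = '" + user_input + "'"
    print(conn.execute(query).fetchall())  # leaks all rows

    # Safer: a parameterized query treats the input as a literal value.
    print(conn.execute("SELECT * FROM users WHERE name = ?",
                       (user_input,)).fetchall())  # no rows match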

Stephen Arnold, May 27, 2009

EDI Data Transformation

May 27, 2009

Most of the mavens and pundits write about a handful of search vendors. Not me. I grub around in the dark and often very important corners of search. If you have to transform data for EDI in an XML environment, you will find Alex Woodie’s “MegaXML Looks to Drive Expense Out of EDI” here useful. Mr. Woodie describes a new product, which, if it works as described, can eliminate some sleepless nights and a long weekend or two. The article describes Task Performance Group’s MegaXML utility. For me, the key passage in the article was:

Task Performance Group launched MegaXML a decade ago to take advantage of the flexibility of XML. On the front end, the Windows-based product can generate and send EDI documents, such as purchase orders and invoices, over VANs or the Internet using protocols like AS2. And on the backend, MegaXML can translate EDI documents to the format needed for specific platforms, such as flat files for AS/400-based ERP systems on DB2.
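
MegaXML’s actual mappings are proprietary and not shown in the article. As a rough illustration of the translation step described above, here is a toy Python sketch that turns an X12-style segment stream into XML; the segment layout and element names are invented for the example.

    import xml.etree.ElementTree as ET

    # A toy X12-style purchase order: segments end with "~",
    # elements are separated by "*".
    edi = "BEG*00*NE*PO-1001~PO1*1*24*EA*9.95*VP*WIDGET-7~"

    root = ET.Element("PurchaseOrder")
    for segment in filter(None, edi.split("~")):
        elements = segment.split("*")
        seg = ET.SubElement(root, elements[0])  # segment ID becomes the tag
        for i, value in enumerate(elements[1:], start=1):
            ET.SubElement(seg, f"E{i:02d}").text = value  # positional elements

    print(ET.tostring(root, encoding="unicode"))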

MegaXML has a hybrid or semi-cloud option that may be worth investigating. Mr. Woodie wrote:

With the outsourcing option, MegaXML will reside on a Windows server in Task Performance Group’s data center near Chicago. After mapping the EDI documents to the customer’s systems (a process that takes a few days), the customer will upload and download documents to the MegaXML data center using Secure FTP (S/FTP). MegaXML, in turn, will handle the translation to EDI formats and the distribution via AS2 or another method.

Data transformation consumes a significant portion of an information technology group’s time and budget. MegaXML may be a partial solution in some situations. More information is available at www.megaXML.com.

Stephen Arnold, May 27, 2009

Ramp Time for Web Killers: Google to Alta Vista, X to Google

May 26, 2009

Harry McCracken’s “How Long Did It Take for the World to Identify Google as an Alta Vista Killer?” here asks an interesting question. His write up provides some examples of early positive Google evaluations in trade and news publications. His conclusion was that no one figured out how good Google was until several years raced by. I agree with his concluding remark:

A Google killer may well be out there even as we speak. We may even be saying nice things about it. But it would amaze me if we’ve figured out yet that it’s going to kill Google…

Several ideas raced through my mind as I reviewed his chronological list of early Google references; namely:

  1. Google pushed into search at a time when the leading Web sites were becoming portals, an evolutionary arc that reached its zenith with the Yahoo.com and MSN.com Web sites in the mid-2000s. Both companies were in effect mini-AOLs with search relegated to a “search box” that wasn’t all that useful or interesting to me.
  2. The leading Web search engines were running aground on two problems well known to those familiar with Web indexing: the cost of scaling to keep pace with the growing volume of new and changed content and the baked-in problems of traditional server architecture. Google tackled input/output, failure, and cheap scaling early in its history. The company did not reveal what it did until the job was done. This put the company several years ahead of its competition at the time of its 2004 IPO.
  3. Existing search vendors were looking for exits from Web indexing. The most notable challenger after Hewlett Packard muffed the Alta Vista project was Fast Search & Transfer. At the time of 9/11, Fast Search had indexed breaking news before Google, and the Fast Search system was, in terms of Web indexing, the equal of Google. What did Fast Search do? It sold its advertising and Web search business to concentrate on enterprise search, a decision that cut a path to the financial quagmire in which Fast Search became stuck and to the police action about which most people know nothing.
  4. Other search vendors ran out of cash, ran into index updating problems similar to those encountered by Excite and Lycos, or changed business direction.

Google’s emergence, as I have written in my Google trilogy here, was a combination of several factors: luck, technical acumen, talent availability from the Alta Vista effort, and business savvy on the part of Google’s investors. Killing Google, therefore, will take more than a simple technical innovation. A specific moment in time combined with other ingredients will be needed.

For some of the big players today, time has run out. A Google killer may be in someone’s garage, but until the other chemicals are mixed together, the GOOG has won. Every time I make this statement, I get howls of outrage from conference organizers, venture firms, and pundits. I stand by my claim that Web search is now effectively in Google’s paws. Let me excite some readers on a related front: Google is poised to pull the same 70 percent market share trick in other business sectors. Digital goodies from Yahoo and the Microsoft Bing Kumo play notwithstanding, embrace Googzilla or stay out of its way.

Stephen Arnold, May 26, 2009

LBS I: Embedded Search Stubs

May 26, 2009

Editor’s Note: LBS or Little Bafflers in Search is a new feature. The idea is to answer a question that pops up in discussions about search, content processing, and text mining. The answers are designed to be broadly informative, but they will not be definitive. This is, after all, a free Web log. If you have a fix or work-around for an LBS, use the comments section of the Web log for the information. Please use the Roman numeral to identify the LBS to which your comment applies.

Question: My enterprise software system includes an enterprise search and retrieval function. Is this function the same as the system that I can license directly from the search vendor? If there are differences, what are they?

Answer: The addled goose wants to tell you that search stubs are *not* the full search system available from a vendor. OEM or original equipment manufacturer licenses (sometimes called embedded search deals) differ from the full-boat, bells-and-whistles system you can license from a vendor or a vendor’s authorized reseller/integrator. These stubs work within the enterprise application you licensed. So, if you license a content management system or a customer support system, that software vendor may include search, but the search system:

  • Is slimmed down to meet the requirements of the third party licensing the search and content processing component; for example, certain functions such as analytics, visualization, and hit boosting are not available
  • Limits what can be indexed; for example, only content within the specific file store, so that other repositories are not accessible to the indexing subsystem
  • Includes fewer connectors or filters than the full-scale product, so that certain content types cannot be processed.

There are other variations, but I think of these stubs as “trial versions”. I anticipate that vendors offering search systems on an original equipment manufacturer basis make an effort to balance features and performance with what the licensee will pay.
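
To make the contrast concrete, here is a hypothetical sketch in Python of how a stub license might differ from a full one. No vendor’s actual configuration schema appears here; the repositories, connectors, and feature names are invented.

    # Invented configurations contrasting an OEM stub with a full license.
    OEM_STUB = {
        "repositories": ["cms_file_store"],  # only the host application's content
        "connectors": ["native"],            # no SharePoint, Notes, JDBC, etc.
        "features": {"analytics": False, "visualization": False,
                     "hit_boosting": False},
    }

    FULL_LICENSE = {
        "repositories": ["cms_file_store", "file_shares", "email", "databases"],
        "connectors": ["native", "sharepoint", "notes", "jdbc"],
        "features": {"analytics": True, "visualization": True,
                     "hit_boosting": True},
    }

    def can_index(repository, config):
        """Return True if the licensed configuration permits indexing a repository."""
        return repository in config["repositories"]

    print(can_index("email", OEM_STUB))      # False: outside the stub's scope
    print(can_index("email", FULL_LICENSE))  # True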

The fix? Upgrade to the full-boat system. A stub is a sales lead generator in many cases.

Stephen Arnold, May 26, 2009

Description of Data.gov

May 26, 2009

A happy quack to the reader who sent me a link to ProPublica’s “Gov’s Got Data” here. Data.gov is a data portal created by the US government. ProPublica reported that there were 47 raw data sets on the site plus another 27 software utilities. Click here for a sample data set.

For me, the most important comment was:

There’s not a lot there yet, but the new federal Web site, which the Obama administration had promised to create, is up and running. The site is designed to be a clearinghouse of data from federal agencies.

Data sets are the type of content that Wolfram Alpha and some of Google’s more sophisticated systems ingest to generate value-added outputs.
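
As a rough sketch of that ingestion step, the Python fragment below pulls a raw CSV data set and computes a simple value-added summary. The URL and the “agency” column are placeholders, not an actual Data.gov endpoint.

    import csv
    import io
    import urllib.request

    DATASET_URL = "https://example.gov/sample_dataset.csv"  # hypothetical endpoint

    # Download the raw data set and parse it into dictionaries.
    with urllib.request.urlopen(DATASET_URL) as response:
        rows = list(csv.DictReader(io.TextIOWrapper(response, encoding="utf-8")))

    # Value-added output: record counts per agency, highest first.
    counts = {}
    for row in rows:
        counts[row["agency"]] = counts.get(row["agency"], 0) + 1

    for agency, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(f"{agency}: {n} records")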

The challenge is that US government agencies are silos, and sharing is often a lengthy administrative process. After all, why share when headcount could be reduced due to the trimming of tasks for an agency? In my experience, government entities want to preserve data, tasks, and services to keep the bean counters from chopping a manager’s staff.

Long slog ahead for Data.gov, I think.

Stephen Arnold, May 26, 2009

Cyberwarfare Attack Devices

May 26, 2009

If you worry about enterprise search, you won’t find much of interest in this Aviation Week story. The addled goose, on the other hand, sees the story “Network Attack Weapons Emerge” here by David Fulghum as a precursor of similar information initiatives in the business arena. Information is a strategic asset, and methods to locate, disrupt, intercept, and analyze those assets are going to become increasingly significant. The core of the Aviation Week story was this comment:

Devices to launch and control cyber, electronic and information attacks are being tested and refined by the U.S. military and industry in preparation for moving out of the laboratory and into the warfighter’s backpack.

Mr. Fulghum added:

“The Russians conducted a cyberattack that was well coordinated with what Russian troops were doing on the ground,” says a longtime specialist in military information operations. “It was obvious that someone conducting the cyber[war] was talking to those controlling the ground forces. They knew where the [cyber]talent was [in Russia], how to use it, and how to coordinate it. That sophisticated planning at different levels of cyberwarfare surprised a lot of people in the Defense Dept.,” he says. “It looked like a seamless, combined operation that coordinated the use of a range of cyberweapons from the sophisticated to the high school kids that thought it was cool to deface official web sites. The techniques they used everybody knows about. The issue was how effective they were as part of a combined operation.”

I found interesting his description of the components of a cyberattack toolkit:

The three major elements of a cyberattack system are its toolbox, planning and execution capabilities. The toolbox is put together by the hardware and software experts in any organization to address specific missions. They maintain the database of available capabilities.

Worth reading.

Stephen Arnold, May 26, 2009

Tweetmeme: Snapshot

May 26, 2009

Tweetmeme is a service built on top of Twitter that gathers the links posted on Twitter and determines which are the most popular. It then categorizes those links on its front page, making it easier to find what you’re looking for. Readers can easily subscribe to each of the available categories, gaining access to the most popular, up-to-the-minute content through their Twitter account.

Twitter and its tools are the latest rage in social networking, and businesses should be taking full advantage of what they can offer. If your business publishes a blog, Tweetmeme surfaces the freshest, most relevant topics to use as inspiration for blog posts. Businesses can also use Tweetmeme’s service to send out time-sensitive information to large groups of customers or prospects.

Melanie Van Nuys, May 26, 2009

Enkia: Early Player in Smart Search

May 26, 2009

Last week, I received a call from a defrocked MBA looking for work. (No surprise that!) The young wizard wanted to know about Enkia, a spin-out of Georgia Tech’s incubator program in the late 1990s. If you poke around Web traffic reports, you see a surge for Enkia in the year 2000 and then a flat line. In November 2008, a person sent this Twitter message that plopped into my tracking system: “Enkia is alive.” I told the job hunter that I would poke through my search archives to see what information I had. I will be in Atlanta in June, and I will try to swing by the company’s office at 85 Fifth Street in Atlanta to see what’s shakin’. (The last time I tried this approach, the TeezIR folks kept the door locked. Big addled geese are often not welcome. Gee, maybe it’s because the addled geese don’t believe the chunks of marketing food tossed at them by vendors.)

The Company

According to an August 2000 article here, the company was

building the foundation of the Intelligent Internet(TM) based on the latest discoveries in cognitive science and artificial intelligence. Enkia’s middleware products overcome the limitations of current Internet search technology by sensing what a browser or shopper wants and recommending information quickly and automatically. This software enables portal providers to create personalized experiences that encourage return site visits and increased sales. Founded in 1998, Enkia is a member of the Advanced Technology Development Center (ATDC), the Georgia Institute of Technology’s high-tech business incubator.

What It Does

Enkia, named for a Sumerian god with special brain power, was an early entrant in the “artificial intelligence for the Web” movement. If you have been following the exploits of Google, Microsoft, and Yahoo, you know the notion of smart software is with us today. The marketing verbiage is different, but the notion is the same as it was for Enkia.

Here’s a description from a year 2000 business journal story:

The software [Dr. Ashwin Ram and his students developed], called Enkion, has a type of ESP, if you will, sensing browsers’ needs by what they click. Enkion builds on techniques of artificial intelligence to model the human mind. The technology automatically recommends relevant information so that users don’t have to wade through hundreds of search results.

The company put a demo online, and I had a screen shot of the service. I thought I had results screen shots, but my memory deteriorates more quickly than the value of a US government Treasury note.

Screen shot of the Enkia Search Orbit interface, no longer available.

When the service rolled out, Dr. Ram said here:

“EnkiaGuide helps anyone find their ‘needles’ in haystacks of data on and off the Internet,” Dr. Ram adds. “It can help users find their way through technical support libraries or large e-commerce sites, and allow corporations to organize pathways through their large proprietary databases. The EnkiaGuide can make sense out of information chaos.”

The Technology

In my archive, I had a copy of an older white paper, which is still available online as of May 25, 2009, here:

The IRIA architecture builds upon and extends the experience-based agent approach by embedding it in a knowledge discovery and presentation engine using techniques from artificial intelligence and machine learning. Crushing demands on resources limit the amount of “smarts” typical web search engines can apply to any particular information resource requests.  IRIA’s design overcomes this problem by leveraging existing search engines for the brute force work of indexing and searching the web and by focusing its “smarts” on modeling and understanding the efforts of an individual or workgroup. The core of IRIA that makes this understanding possible is its reminding engine.  The reminding engine directly applies the experience-based agent approach to the problem of information search, consisting of a context-sensitive search mediator which uses a unified semantic knowledge base called a knowledge map to represent indexed pages, queries, and even browsing sessions in a single format.  This uniform representation enables the development of an experience-based map of available information resources, along with judgments about their relevance, allowing precise searches based on the history of research for an individual, group or online community.  The knowledge map is furthermore a browsable information resource in its own right, accessible by standard internetworking protocols; with appropriate security precautions, this enables workgroups at remote sites to view and exploit information collected by another workgroup.
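
The passage is dense, so here is a toy Python interpretation of the unified-representation idea: pages, queries, and browsing sessions share one bag-of-words format, and a reminding engine reranks an external engine’s results against the accumulated history. This is my reading of the white paper’s description, not Enkia’s actual code.

    from collections import Counter
    import math

    def vectorize(text):
        """One representation for pages, queries, and sessions alike."""
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    class RemindingEngine:
        def __init__(self):
            self.history = Counter()  # the "knowledge map": everything seen so far

        def observe(self, text):
            """Fold a viewed page or issued query into the history."""
            self.history += vectorize(text)

        def rerank(self, results):
            """Reorder a brute-force engine's results by similarity to history."""
            return sorted(results,
                          key=lambda r: cosine(vectorize(r), self.history),
                          reverse=True)

    engine = RemindingEngine()
    engine.observe("enterprise search connectors indexing")
    engine.observe("search engine index scaling")
    hits = ["celebrity gossip round up", "scaling an enterprise search index"]
    print(engine.rerank(hits))  # history pushes the search-related hit first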


The Boyle Conundrum: Old Media vs New Media

May 26, 2009

My New York Times today (May 25, 2009) contained an announcement of a price hike. The hard copy of the paper contained a story by Brian Stelter with an amazing quotation. I found the statement indicative of the pickle in which traditional newspapers and “old” media find themselves. The story was “Payoff over a Web Singing Sensation Is Elusive.” It is on the first page of the business section, and you may be able to find an online version of the story here. No guarantees, of course. The article is about FremantleMedia Enterprises’ inability to monetize Susan Boyle, a contestant on the Britain’s Got Talent TV show. Ms. Boyle, a “frumpy Scotswoman” according to the New York Times, is a Web sensation. Despite that popularity, no cash flows to the show’s owners. The key statement in the write up, in my opinion, was:

The case reflects the inability of big media companies to maximize profit from supersize Internet audiences that seem to come from nowhere. In essence, the complexities of TV production are curbing the Web possibilities. “Britain’s Got Talent” is produced jointly by three companies and distributed in Britain by a fourth, ITV, making it difficult to ascertain which of the companies can claim a video as its own.

Maybe litigation will provide the solution to the Gordian knot of “old media” and its business methods. Meanwhile, the price of the New York Times goes up, and Susan Boyle videos get downloaded. Why not blame Google, where a search for “Susan Boyle” returned nine million hits?

Stephen Arnold, May 26, 2009
