Teragram: Growth Strong Despite Downturn

December 16, 2008

I enjoy contrarians. I say the economy is lousy. A contrarian tells me that the economy is wonderful. I say that financial fraud undermines investor confidence. The contrarian tells me to trust American Express. In fact, American Express is one of the most trusted companies in the United States. In my view, I wouldn’t trust this outfit to walk my dog.

Teragram issued an interesting news release containing information contrary to the information I have compiled. Specifically, Teragram, now a unit of SAS, the statistics outfit, said here:

At a time when enterprises are concerned about a lagging economy and the bottom line, Teragram has consistently provided proven, money-saving knowledge management tools. Teragram helps knowledge workers automatically organize unstructured data sources, making information more accessible and enabling faster and more accurate knowledge and information sharing. This helps enterprises efficiently manage their growing amounts of information, saving time, resources and money.

I profiled Teragram in one of my studies for a teen-aged publisher and reported that the company had some solid clients, interesting technology, and a hosted option to give its customers flexibility. But the economy is lousy and I am not inclined to trust big companies. Therefore, I will keep my eye on Teragram to make sure that it continues to move smoothly against the currents that are carrying some search and content processing companies over Victoria Falls. Yahoo is in some trouble with its world class search system. I reported on TeezIR’s elusiveness. SurfRay remains a mystery. Delphes seems to be on hiatus. Entopia is a flat out goner. And I know of one “big name” that is literally fighting for its life. So what explains Teragram’s growth? Could it be good public relations? The marketing clout of SAS? Teragram’s Harvard connection? If anyone knows Teragram’s secret, please, share it.

Stephen Arnold, December 16, 2008

Wall Street Journal Figures Out What Google Is Doing, Gets Criticized

December 15, 2008

The Wall Street Journal’s Vishesh Kumar and Christopher Rhoads stumbled into a hornet’s nest. I think surprise may accompany these reporters and their editor for the next few days. Their story, “Google Wants Its Own Fast Track on the Web,” is here at the moment, but it will probably disappear or be unavailable due to heavy click traffic. Read it quickly so you have the context for the hundreds of comments the story has generated. Pundits whose comments I found useful are the Lessig Blog, Om Malik’s GigaOM, and Google’s own comment here.

The premise of the article is that the GOOG wants to create what Messrs. Kumar and Rhoads call “a fast lane.” In effect, the GOOG wants to get preferential treatment for its traffic. The story wanders forward with references to network neutrality, which is probably going to die like a polar bear sitting on an ice chunk in the Arctic Circle. Network neutrality is a weird American term for rules designed to prevent a telco from charging people based on arbitrary benchmarks. The Bell Telephone Co. figured out long ago that differential pricing was the way to keep the monopoly in clover. The lesson has not been forgotten by today’s data barons. The authors drag in the president-elect and wrap up with the Google-coined phrase “OpenEdge.”

Why the firestorm? Here are my thoughts:

First, I prepared a briefing for several telcos in early 2008. My partner at the Mercer Island Group and I did a series of briefings for telecommunication companies. In that briefing, I showed a diagram from one of Google’s patent documents, enriched with information from Google’s technical papers. The diagram showed Google as the intermediary between a telco’s mobile customers and the Internet. In effect, with Google in the middle, the telco would get low latency rendering of content in the Googleplex (my term for Google’s computer and software infrastructure). The groups to a person snorted derision. I recall one sophisticated telco manager saying in the jargon of the Bell head, “That’s crap.” I had no rejoinder because I was reporting what my analyses of Google patents and technical papers said. So, until this Wall Street Journal story appeared, the notion of Google becoming the Internet was not on anyone’s radar. After all, I live in Kentucky, and the Mercer Island Group is not McKinsey & Co. or Boston Consulting Group in terms of size and number of consultants. But MIG has some sharp nails in its toolkit.

Second, in my Google Version 2.0, which is mostly a summary of Google’s patent documents from August 2005 to June 2007, I reported on a series of five patent documents, filed the same day and eventually published on the same day by the ever efficient US Patent & Trademark Office. The five documents disclosed a big, somewhat crazy system for sucking in data from airline ticket sellers, camera manufacturers, and other structured data sources. The invention figured out the context of each datum and built a great big master database containing the data. The idea was that some companies could push the data to Google. Failing that, Google would use software to fill in the gaps and therefore have its own master database. Bear Stearns was sufficiently intrigued by this analysis to issue a report to its key clients about this innovation. Google’s attorneys asserted that the report contained proprietary Google data, an objection that went away when I provided the patent document number and the url to download the patent documents. Google’s attorneys, like many Googlers, are confident but sometimes uninformed about what the GOOG is doing with one paw while the other paw adjusts the lava lamps.
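The patent documents describe this system functionally, not in code. As a rough sketch of the “context inference” idea only (my illustration, not Google’s method, with invented patterns and data), consider how a program might guess the semantic type of each incoming datum before merging feeds into a master table:

```java
import java.util.*;
import java.util.regex.*;

// Illustrative sketch only: infer the "context" (semantic type) of raw data
// arriving from structured feeds, then merge the data into one master table.
public class ContextGuesser {
    static final Pattern PRICE  = Pattern.compile("\\$\\d+(\\.\\d{2})?");
    static final Pattern FLIGHT = Pattern.compile("[A-Z]{2}\\d{2,4}");
    static final Pattern DATE   = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    static String contextOf(String datum) {
        if (PRICE.matcher(datum).matches())  return "price";
        if (FLIGHT.matcher(datum).matches()) return "flight_number";
        if (DATE.matcher(datum).matches())   return "date";
        return "unknown"; // a production system would fall back to learned models
    }

    public static void main(String[] args) {
        Map<String, List<String>> master = new HashMap<>();
        for (String datum : List.of("$499.00", "UA1422", "2008-12-15")) {
            // group each datum under its inferred context
            master.computeIfAbsent(contextOf(datum), k -> new ArrayList<>()).add(datum);
        }
        System.out.println(master);
    }
}
```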

Third, in my Beyond Search study for the Gilbane Group, I reported that Google had developed the “dataspace” technology to provide the framework for Google to become the Internet. Sue Feldman at IDC, the big research firm near Boston, was sufficiently interested to work with me to create a special IDC report on this technology and its implications. The Beyond Search study and the IDC report went to hundreds of clients and were ignored. The idea of a dataspace with metadata about how long a person looks at a Web page and the use of meta metadata to make queries about the lineage and certainty of data was too strange.
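Again, the patents and my study describe concepts, not schemas. For readers who find the notion abstract, here is a toy illustration (field names are hypothetical) of a dataspace fact that carries lineage and certainty “meta metadata,” plus a query over it:

```java
import java.util.*;

// Toy illustration of a dataspace fact carrying "meta metadata":
// where the fact came from (lineage) and how much the system trusts it.
record Fact(String subject, String predicate, String value,
            String lineage, double certainty) {}

public class DataspaceDemo {
    public static void main(String[] args) {
        List<Fact> dataspace = List.of(
            new Fact("page:recipes", "dwellSeconds", "42", "browser-log", 0.9),
            new Fact("hotel:xyz", "address", "10 Main St", "web-crawl", 0.6));

        // A lineage/certainty query: keep only facts the system trusts above 0.8.
        dataspace.stream()
                 .filter(f -> f.certainty() > 0.8)
                 .forEach(f -> System.out.println(
                     f.lineage() + " -> " + f.predicate() + "=" + f.value()));
    }
}
```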

What the Wall Street Journal has stumbled into is a piece of the Google strategy. My view is that Google is making an honest effort to involve the telcos in its business plan. If the telcos pass, then the GOOG will simply keep doing what it has been doing for a decade; that is, building out what I called in January 2008 in my briefings “Google Global Telecommunications”. Yep, Google is the sibling of the “old” AT&T model of a utility. Instead of just voice and data, GGT will combine smart software with its infrastructure and data to marginalize quite a few business operations.

Is this too big an idea today? Not for Google. But the idea is sufficiently big to trigger the storm front of comments. My thought is, “You ain’t seen nothing yet.” Ignorance of Google’s technology is commonplace. One would have thought that the telcos would take Google seriously by now. Guess not. If you want to dig into Google’s technology, you can still buy copies of my studies:

  1. The Google Legacy: How Google’s Internet Search Is Transforming Application Software, Infonortics, 2005 here
  2. Google Version 2.0: The Calculating Predator, Infonortics, 2007 here
  3. Beyond Search: What to Do When Your Enterprise Search System Doesn’t Work, Gilbane Group, 2008 here

Bear Stearns is out of business, so I don’t know how you can get a copy of that 40-page report. You can order the dataspaces report directly from IDC. Just ask for Report 213562.

If you want me to brief your company on Google’s technology investments over the last decade, write me at seaky2000 at yahoo dot com. I have a number of different briefings, including the telco analysis and a new one on Google’s machine learning methods. These are a blend of technology analysis and examples from Google’s open source publications. I rely on my analytical methods to identify key trends and use only open source materials. Nevertheless, the capabilities of Google are–shall we say–quite interesting. Just look at what the GOOG has done in online advertising. The disruptive potential of its other technologies is comparable. What do you know about containers, janitors, and dataspaces? Not much, I might suggest, if I were not an addled goose.

Oh, let me address Messrs. Kumar and Rhoads: “You are somewhat correct, but you are grasping at straws when you suggest that Google requires the support and permission of any entity or individual. The GOOG is emerging as the first digital nation state.” Tough to understand, tough to regulate, and tough to thwart. Just ask the book publishers, suggest I.

Stephen Arnold, December 15, 2008

Google Recipes

December 15, 2008

Last week I showed some “in the wild” functions on Google. These are test pages on which certain Google features appear. Finding an “in the wild” service is a hit and miss affair. I was curious about the query “recipes”. On Wednesday, December 10, 2008, I ran the query and got the ho-hum, regular Google laundry list format. Today (Sunday, December 14, 2008), the query generated an interesting result page. First, the Programmable Search Engine drop down box appears. Second, the source of the recipes is a Web site at http://allrecipes.com. Third, a hot link to a definition of recipes appears under the line about customized search results; for example, Results 1-10 of about 148,000,000 for recipes [definition]. (0.15 seconds). When I clicked the definition, I was directed here.

[Screenshot: Google results page for the query “recipes”, December 14, 2008]

Advertisers may be willing to pay extra to be featured with the Google categories for their Web site or the “definition” hot link. Add to this the insertion of AdWords into the drop down suggestion box and what have you got? Subtle monetization. The GOOG is going to hit its revenue targets by offering advertisers some very tasty ad options. Ads, like Web pages, are losing their zing. The GOOG is responding.

Stephen Arnold, December 15, 2008

Expert System’s COGITO Answers

December 12, 2008

Expert System has launched COGITO Answers, which streamlines search and provides customer assistance on Web sites, e-mail, and mobile interfaces such as cell phones and PDAs while creating a company knowledge base. The platform allows users to search across multiple resources with a handy twist: it uses semantic analysis to absorb and understand a customer’s lingo, analyzing the meaning of the text to process search results rather than just matching keywords. It interprets word usage in context. The program also tracks customer interactions and stores all requests so the company can anticipate client needs and questions, thus cutting down response time and increasing accuracy. You can get more information by e-mailing answers@expertsystem.net.
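Expert System has not disclosed COGITO’s internals, so the following is only a crude sketch of the difference between keyword matching and matching on meaning. The concept lexicon and stop word list are invented for the example:

```java
import java.util.*;

// Crude sketch: map surface words to concepts, then match on concept overlap
// rather than exact keywords. The lexicon here is invented for illustration.
public class SemanticMatchSketch {
    static final Map<String, String> CONCEPTS = Map.of(
        "bill", "invoice", "invoice", "invoice", "statement", "invoice",
        "phone", "mobile_device", "cell", "mobile_device", "handset", "mobile_device");
    static final Set<String> STOP = Set.of("how", "do", "i", "my", "on", "a", "see", "read");

    static Set<String> concepts(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase().split("\\W+"))
            if (!STOP.contains(w)) out.add(CONCEPTS.getOrDefault(w, w));
        return out;
    }

    public static void main(String[] args) {
        Set<String> overlap = concepts("How do I read my invoice on a handset?");
        overlap.retainAll(concepts("see my bill on my cell"));
        // prints the shared concepts, invoice and mobile_device,
        // even though the two sentences share no content keywords
        System.out.println(overlap);
    }
}
```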

Jessica Bratcher, December 12, 2008

Stratify Adds Cloud Storage Services

December 9, 2008

On December 3, 2008, Stratify–a unit of Iron Mountain–announced new services for its thriving eDiscovery business. You can read the Stratify news release here. The core of the service is disaster recovery. Attorneys apparently need to make sure that the legions of attorneys who pore through electronic documents obtained as part of the discovery process can’t nuke the data. Stratify said:

To safeguard client eDiscovery data Stratify has invested in and deployed a fully replicated production datacenter with more than 250 terabytes of storage, 200 servers and redundant 100MB Internet access, coupled with highly trained personnel and security procedures.

Stratify (which once did business as Purple Yogi) now wears a blue suit and polished shoes, not sneakers. IDC’s Sue Feldman weighs in with an observation that the new service “raises the bar” for the companies competing for eDiscovery accounts.

Stratify’s news release added:

Stratify can restore access to client matters within four hours after a potential disaster, recover 100 percent of processed and loaded documents and system metadata, and lose no more than 59 minutes worth of review work product.

In my opinion, the eDiscovery sector is undergoing rapid change. The need for end-to-end solutions and bulletproof systems means that specialist vendors may be forced to add sophisticated new features in order to compete. The challenge is that eDiscovery systems are now being sold to corporations. With the technology and market changing, well funded organizations with a strong client list may have an advantage. Stratify said that it had more than 250 matters underway at this time.

eDiscovery, like business intelligence, is becoming a magnet for search and content processing companies who want to find a way to pump up revenues.

Stephen Arnold, December 9, 2008

Arnold White Study Published

December 8, 2008

Galatea has published Successful Enterprise Search Management by Stephen E. Arnold and Martin White. The authors are widely known for their research and consulting in search and information management. An interview with Martin White is here.

The study addresses the management aspects of search in information-dense environments: ineffective information access can make the difference between an organization meeting its goals and actually going out of business. Managers spend up to two hours a day searching for information, and more than 50% of the information they obtain has no value to them.

To support its advice, the book outlines case studies and references to specific vendors’ systems while offering practical guidance on how to better manage key elements of enterprise search including planning, preparation, implementation, and adaptation. Specific topics addressed include text mining and advanced content processing, information governance, and the challenges language itself presents.

“This book will be of value to any organization seeking to get the best out of its current search implementation, considering whether to upgrade the implementation or starting the process of specifying and selecting enterprise search software,” co-author Martin White said.

A detailed summary of the contents of the 130-page report is available on the Galatea Web site here. You can order a copy, which costs about US$200, here. A number of the longer essays in the Beyond Search Web log consist of information excised from the final report.

Stephen Arnold, December 8, 2008

Yahoo Jumping Ahead of Google

December 7, 2008

On December 7, 2008, PCWorld reported that Yahoo will offer abstracts, not laundry lists of search results. The news story I saw appeared in the Yahoo technology news service. You can read “Yahoo Technology Will Offer Abstracts of Search Results” here. If the link goes dead, try the PCWorld site itself here. When I saw the story, the search engine on the PCWorld site couldn’t locate the story. Nothing new there, of course. The key point in the unsigned article was that Yahoo’s Bangalore research facility has figured out how to abstract key information on the page. The idea is that when a user searches for “hotel”, the system would provide an address, map, and other information. I described a similar function in my description of Google’s dossier function. See US20070198481. According to the news story, Yahoo will roll out this service in 2009. My thought is that these types of smart services work really well when described on paper. The challenge for these “report” or “answer” type systems is that language can be tricky. Google’s approach relies on “context”, a system and method disclosed in the February 2007 patent documents filed by Google’s Ramanathan Guha. My hunch is that Yahoo went public because of the rumors that Google was starting to use some of its niftier technology in certain public facing services. The Googler with whom I had interaction in London knew zero about the dossier function. Maybe Yahoo is trying to jump ahead of Google. We’ll see. I think Yahoo needs to address the shortcomings of its core search service first.
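Yahoo has not described its method, so here is only a back of the envelope sketch of what “abstracting key information” from a hotel page might involve. The patterns and sample text are invented, and a real system would need far more robust extraction:

```java
import java.util.regex.*;

// Toy sketch: pull an address and phone number out of a hotel page so the
// result shows key facts instead of a generic snippet. Patterns are naive.
public class ResultAbstractSketch {
    public static void main(String[] args) {
        String page = "Welcome to the Grand Hotel. 221 River Rd, Louisville, KY 40202. "
                    + "Call (502) 555-0199 to book.";
        Matcher addr  = Pattern.compile("\\d+ [A-Z][\\w ]+, [A-Z][a-z]+, [A-Z]{2} \\d{5}").matcher(page);
        Matcher phone = Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}").matcher(page);
        if (addr.find())  System.out.println("Address: " + addr.group());
        if (phone.find()) System.out.println("Phone:   " + phone.group());
    }
}
```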

Stephen Arnold, December 7, 2008

New Open Source Search Vendor

December 5, 2008

Hans-Christian Brockmann, founder of brox, an open source search and content processing company, reveals his vision for his company here. Mr. Brockmann wants to provide organizations with an alternative to proprietary search systems. In an exclusive interview with ArnoldIT.com, a management consulting firm founded by Stephen E. Arnold, Mr. Brockmann said:

Having spent 10 years catering to enterprise customers, we are very much aware of the strategic and political aspects of deploying IT-infrastructures to large organizations. Our project SMILA (SeMantic Information Logistics Architecture) surely can be considered an infrastructure project which embraces open source search within it. Open source solutions are being absorbed in the enterprise, as evidenced by Lucene interest.

The statistical significance of the 180,000 Lucene projects is it underscores the sheer amount of “do it yourself” projects out there, actually massively more than there are professional commercial search implementations.

We want to provide these “do it yourself” search projects with the tools they all are missing.

Mr. Brockmann’s estimate of the number of Lucene installations is one of the first ArnoldIT.com has been able to obtain.

Mr. Brockmann added:

Putting in the plumbing for the next generation of semantic applications is something every organization will have to do in order to remain competitive. In the course of this they will more or less all stumble across the same issues. To name a few: security, scalability, connectivity, longevity and, of course, maintenance and support for such a large infrastructure. We suggest it is best to share the cost of implementing and maintaining the infrastructure – which is not part of the strategic competencies of any company – on a shared basis. If you look at the top 1000 companies globally and ask them to deliver a list of application software they are using, the lists will be almost identical. The breadth and depth of use of certain products may be different, but basically the tools are similar.

More information about brox, the company Mr. Brockmann founded, is here. You can read the full text of the interview here.

Stephen Arnold, December 5, 2008

Information 2009: Challenges and Trends

December 4, 2008

Before I was once again sent back to Kentucky by President Bush’s appointees, I recall sitting in a meeting when an administration official said, “We don’t know what we don’t know.” When we think about search, content processing, assisted navigation, and text mining, that catchphrase rings true.

Successes

But we are learning how to deliver some notable successes. Let me begin by highlighting several.

Paginas Amarillas is the leading online business directory in Colombia. The company has built a new system using technology from a search and content processing company called Intelligenx. Similar success stories can be identified for Autonomy, Coveo, Exalead, and ISYS Search Software. Exalead has deployed a successful logistics information system which has made customers’ and employees’ information lives easier. According to my sources, the company’s chief financial officer is pleased as well because certain time consuming tasks have been accelerated, which reduces operating costs. Autonomy has enjoyed similar success at the US Department of Energy.

Newcomers such as Attivio and Perfect Search also have satisfied customers. Open source companies can also point to notable successes; for example, Lemur Consulting’s use of Flax for a popular UK home furnishing Web site. In Web search, how many of you use Google? I can conclude that most of you are reasonably satisfied with ad-supported Web search.

Progress Evident

These companies underscore the progress that has been made in search and content processing. But there are some significant challenges. Let me mention several which trouble me.

These range from legal inquiries into financial improprieties at Fast Search & Transfer, now part of Microsoft, to open Web squabbles about the financial stability of a Danish company which owns Mondosoft, Ontolica, and Speed of Mind. Other companies have shut their doors; for example, Alexa Web search, Delphes, and Lycos Europe. Some firms, such as one vendor in Los Angeles, have had to slash staff to three employees and take steps to sell the firm’s intellectual property, which rightly concerns some of the company’s clients.

User Concerns

Another warning may be found in the results from surveys such as the one I conducted for a US government agency in 2007 that found dissatisfaction with existing search systems in the 65 percent range. AIIM, a US trade group, reported slightly lower levels of dissatisfaction. Jane McConnell’s recently released study in Paris reports data in line with my findings. We need to be mindful that user expectations are changing in two different ways.

First, most people today know how to search with Google and get useful information most of the time. The fact that Google handles search for upwards of 65 percent of North American users and almost 75 percent of European Union users means that Google is the system by which users measure other types of information access. Google’s influence has been essentially unchecked by meaningful competition for 10 years. In my Web log, I have invested some time in describing Microsoft’s cloud computing initiatives from 1999 to the present day.

For me and maybe many of you, Google has become an environmental factor, and it is disrupting, possibly warping, many information spaces, including search, content processing, data management, applications like word processing, mapping, and others.

[Image: time-space warping]

Microsoft is working to counter Google, and its strategy is a combination of software and low adoption costs. I believe that Microsoft’s SharePoint has become the dominant content management, collaboration, and search platform, with 100 million licenses in organizations. SharePoint, however, is not well understood; it is technically complex and a work in progress. Anyone who asserts that SharePoint is simple or easy is misrepresenting the system. Here’s a diagram from a Microsoft Certified Gold vendor in New Zealand. Simple this is not.

[Diagram: SharePoint platform components, from a Microsoft Certified Gold vendor in New Zealand]

Accelerating XML Parsing

December 4, 2008

I have received a number of comments about the high speed indexing referenced in the interview with Perfect Search. One reader asked me to call attention to the open source XML parser VTD-XML. The acronym means Virtual Token Descriptor for eXtensible Markup Language. The suite of open source software may not meet the needs of some content processing applications because large documents impose additional work on the developer. However, for database records and other record types, the method can eliminate redundant parsing, which is computationally expensive. One reader sent me a link to a useful description of VTD-XML. Here are the links to this write up by James Zhang. The original series–“Index XML Documents with VTD-XML”–was published by SOA World Magazine, whose url is www.soa.sys-con.com. (Note: Sys-con has republished at least one of the articles from this Beyond Search Web log.) The explanation of the method is in five parts. The first section provides a general description and the last section spells out the performance improvements:

  1. Part 1 here — How to turn the indexing capability on in your application
  2. Part 2 here — Sample code
  3. Part 3 here — Sample code
  4. Part 4 here — A discussion of application scenarios
  5. Part 5 here — The benchmark table

The conclusion to the write up made this point:

It’s not uncommon that those overheads [redundant parsing of XML] account for 80%-90% or more of the total CPU cycles of running the application. VTD-XML obliterates those overheads since there’s not much overhead left to optimize. Using VTD-XML as a parser reduces XML parsing overhead by 5x-10x. Next VTD-XML’s incremental update uniquely eliminates the roundtrip overhead of updating XML. Moreover, this article shows VTD-XML’s innovative non-blocking, stateless XPath engine significantly outperforming Jaxen and Xalan. With the addition of the indexing capability, XML parsing has now become “optional.”
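For readers who want to see what “optional” parsing looks like in practice, here is a minimal sketch using the VTD-XML Java API as I understand it; the file names, XML layout, and XPath are invented for illustration. The point is the shape of the workflow: parse once, persist the index, then reload it on later runs without touching the XML text again.

```java
import com.ximpleware.*;

// Minimal sketch of VTD-XML's persistent index feature. The first run parses
// the XML and writes a .vxl index; later runs load the index and skip parsing.
public class VtdIndexDemo {
    public static void main(String[] args) throws Exception {
        VTDGen gen = new VTDGen();
        if (gen.parseFile("orders.xml", true)) { // true = namespace aware
            gen.writeIndex("orders.vxl");        // persist the tokenized form
        }

        VTDGen gen2 = new VTDGen();
        gen2.loadIndex("orders.vxl");            // no re-parsing of the text
        VTDNav nav = gen2.getNav();

        AutoPilot ap = new AutoPilot(nav);       // stateless XPath engine
        ap.selectXPath("/orders/order/total");
        int node;
        while ((node = ap.evalXPath()) != -1) {
            int text = nav.getText();            // token index of the text node
            if (text != -1) System.out.println(nav.toString(text));
        }
    }
}
```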

A happy quack to the reader who called the VTD-XML method to my attention.

Stephen Arnold, December 4, 2008
