Is XML Running Out of Steam in Search?

July 22, 2011

XML is probably the best-known web technology in the world, but users are discovering that, depending on their needs, other technologies can be quite valuable. According to the article “JSON vs XML – A Jason vs Freddie Sequel,” JSON is a functional and feature friendly web technology. XML/XMLHttpRequest refers to “the world wide XML standard for data” and is used to describe both a data format and a transport pattern.

XMLHttpRequest is needed to obtain information from servers. Unless a proxy server featuring an AJAX XML toolkit is used, “the server has to be in the same domain as the web page.” JSON stands for JavaScript Object Notation, and this data format makes information obtained from any server native JavaScript. When information arrives from the server, it is already in JavaScript object format and ready to be used. In addition, users can add tools such as methods and procedures to JavaScript depending on their needs.
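
To make the contrast concrete, here is a minimal sketch in Python rather than browser JavaScript, using only the standard library. The record and its field names (vendor, products) are hypothetical and invented for illustration. The point is only that parsed JSON arrives as native objects, while the same data in XML has to be walked out of a tree.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads carrying the same record in both formats.
JSON_PAYLOAD = '{"vendor": "ExampleCo", "products": ["Search", "Index"]}'
XML_PAYLOAD = (
    "<record>"
    "<vendor>ExampleCo</vendor>"
    "<products><product>Search</product><product>Index</product></products>"
    "</record>"
)


def from_json(text):
    # One call yields native dicts, lists, and strings.
    return json.loads(text)


def from_xml(text):
    # The XML parser returns a tree; the caller pulls values out by tag name.
    root = ET.fromstring(text)
    return {
        "vendor": root.findtext("vendor"),
        "products": [p.text for p in root.findall("./products/product")],
    }


if __name__ == "__main__":
    print(from_json(JSON_PAYLOAD))
    print(from_xml(XML_PAYLOAD))
```

Both functions end up with the same dictionary. The XML version needs the caller to know the tag layout in advance, which is the extra step the article credits JSON with removing.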

JSON allows users to gain flexibility and build technology that meets their specific needs. “You can call it ‘serverless’ programming.” Users drop small pieces of JavaScript into their HTML to get big functionality. “XML is still highly used and is a good choice but JSON definitely gives them a run for their money.”

So what?

With certain XML centric vendors repositioning or taking a low, low, low profile, maybe XML for search is running out of steam or in search of a new way to generate revenues? Want me to identify some XML search engines which have drifted out of the spotlight? Well, I won’t. Let’s just look for vendors who are repositioning or telling me, “Just because we have no blog posts and no tweets, we are really cruising along.” Okay with me.

Stephen E Arnold, July 22, 2011

Sponsored by Pandia.com, publishers of the New Landscape of Enterprise Search

Google+: Strong on Social, Weak on Search? What?

July 22, 2011

According to the Computerworld article “Elgan: What I Lost On the Google+ Diet,” Mike Elgan thinks Google+ could be the next big thing. The main point of Elgan’s weeklong diet “was to see if consolidating and streamlining all social activity into Google+ was possible and, if so, desirable.”

He gave the platform high marks and found it quite functional and even a little addictive as a social media platform for all types of users. However, Elgan did admit that “Google+ is still a work in progress” and lacks a key feature: search. One of the most vital parts of any social media platform is its search capability. People go to Twitter and Facebook not only to talk but to search for jobs, news and other pertinent information.

Being able to search and network with clients in your industry, old high school friends or even potential employers is priceless. Though networking is key, search is the cornerstone of social media platforms. Elgan may think Google+ could be the next social networking giant, but without search, Google+ can’t seriously compete and is just an afterthought.

April Holmes, July 22, 2011

Sponsored by Pandia.com, publishers of the New Landscape of Enterprise Search

A Big Market Goes Beyond Google

July 22, 2011

Google may reign supreme in the US and other countries when it comes to the search engine world, but it hasn’t achieved world dominance just yet. According to the Search Engine Watch article “In China, Baidu Continues Search Market Domination Over Google,” Google’s position continues to decline in China’s growing search market. “While Google’s share fell from 19.2 percent to 18.9 percent, Baidu’s share grew to 75.9 percent, up slightly from 75.8 percent in Q1.”

As recently as 2009, the numbers showed that Google held 35.6 percent of China’s search market compared to Baidu’s 58.4 percent. However, Google has seen a steady decline due to clashes with China’s government. Amid censorship battles and accusations of Gmail hacking, Google began to redirect its searches to Hong Kong.

On the bright side, Google’s Android operating system “runs on about half of all new smartphones sold – roughly 5 million units per quarter,” which helps keep the company alive in China’s market. The search engine Sogou, which grew to 2.4 percent, aims to overtake Google’s spot within a year. It is an ambitious goal, but even with its numbers declining, one can be sure that Google is not waving the white flag just yet.

And there is Jike.com. It indexes ArnoldIT.com too.

Stephen E Arnold, July 22, 2011

Sponsored by Pandia.com, publishers of the New Landscape of Enterprise Search

SharePoint Records Governance and System Recovery

July 22, 2011

Microsoft SharePoint delivers content and collaboration services to hundreds of millions of users each day. Modern systems are complex, and most systems suffer hiccups at one time or another. In the midst of record heat waves, unexpected power issues are surfacing in North America. In Chicago and Louisville, power problems plagued a number of organizations. Human error can create issues such as the recent downtime for Google Apps. No organization is completely insulated from hardware or human error.

We found the information in “Your SharePoint Records Governance Plan and ISO 15489: Disaster Preparation and Recovery” Part One and Part Two useful and timely. The CMSWire story focuses on references to disaster preparation and recovery in ISO 15489. The article does an excellent job of summarizing the key steps in a SharePoint disaster and recovery plan. We found it interesting that this technical system and method is slotted under the umbrella of “records governance”. Regardless of the nomenclature, the information is essential for any SharePoint administrator or SharePoint certified professional.

The write up provides a summary of the information provided by Microsoft in its SharePoint 2010 Administrator’s Companion. One of the key passages we noted was:

SharePoint 2010 performs full and differential backups on the entire farm, farm configuration information, service applications, web applications and content databases. What’s more, SharePoint 2010 performs granular backups. For the Records and Information Manager, this means site collections, sites, libraries and lists can be copied. Through the Backup and Restore interface, the administrator role can start a site collection backup, export a site or list, recover data from an unattached content database or check the status of a currently running granular backup. Also, SharePoint 2010 introduces a backup file containing antivirus settings, information rights management (IRM), outbound email settings, diagnostic loggings and workflow. The Companion advises, though, that there are limitations: Backups/restores are not scheduled, more than one Web application or service application cannot be backed up simultaneously without performing an entire farm backup and SharePoint 2010 doesn’t allow you to make backups directly to tape.

To be fully informed, the CMSWire story recommends that a SharePoint records governance plan be documented with a SharePoint dependent plan (software, hardware, software) and a SharePoint component plan (essentially settings and configuration information).

One of the most interesting suggestions is the recommendation to put in place a method for preserving the SharePoint implementation documentation. The four tricks are valuable. We don’t want to repeat each of them in this write up, but you will want to think about a “two stage recycle bin” and “versioning.”

This is a useful article and it will reside in our SharePoint tips folder. Well done!

Stephen E Arnold, July 22, 2011

Sponsored by SurfRay, the developer of the Ontolica search and content processing system.

DtSearch Marketing Tweak

July 21, 2011

Blogger Greg Duncan focuses on enterprise search provider dtSearch in his July 6, 2011, iteration of “Greg’s Cool [Insert Clever Name] of the Day.” Here he cites I Programmer’s article, “Getting started with dtSearch” as the only recent piece he’s seen that instructs on how to begin using dtSearch. The original is indeed a thorough resource.

The piece is interesting to us, though, because it shows that dtSearch is trying a different marketing angle to developers. I Programmer’s Ian Elliot wrote:

“I also have a great interest in desktop search – or rather how it generally doesn’t work under Windows. Since Vista, Window’s desktop search has been difficult to use, difficult to configure and difficult to manage. I’ve tried alternatives such as Windows Search 4.0 and Solr but there are problems with both. They tend to over complex and simply not worth the effort. Now I’m investigating dtSearch and I can tell you now, it’s a refreshing return to simplicity.”

dtSearch has also added to its proprietary file parsers. Let’s see whether these efforts keep the firm as the darling of the Microsoft developers’ search and retrieval eye. Founded in 1991, dtSearch is a stalwart in Microsoft-centric search.

Cynthia Murrell, July 21, 2011

Sponsored by ArticleOnePartners.com, the leader in crowdsourced patent research and litigation support.

Autonomy-Repsol Articles at E-Business

July 21, 2011

We’ve found an interesting roundup of Autonomy-related information on the Repsol deal at E-Business Library. What is interesting is that the page looks as if it were assembled automatically. Does Panda have a way to discern auto generated pages?

But automated or not, there’s a lot of information, and Autonomy should be quite happy with whoever created the Repsol page. Here’s an example from one of the documents snippetized by the service. The source is this press release, which sums up the Autonomy Repsol agreement this way:

“Autonomy Corporation plc (LSE: AU. or AU.L), a global leader in infrastructure software for the enterprise, today announced that Repsol, Spain’s largest oil and gas company, has selected Autonomy’s cornerstone technology, IDOL (Intelligent Data Operating Layer) and Autonomy Virage for knowledge management across the enterprise.”

Repsol is a huge company with a LOT of infrastructure to manage. Autonomy provides expert tools for managing and analyzing information, including unstructured data, with their IDOL suite of products. In addition, Autonomy Virage is one of the leaders in video and audio search. Repsol employees will now be able to harness this power to manage their wealth of information and to share across their global operation. Sounds like a good choice.

Check out the roundup of articles at E-Business for more information. If you want to know what Autonomy is doing, you can navigate to Autonomy.com. The firm does a good job of posting information in a timely manner about its deals.

Programmers at Web indexing engines have their work cut out for them. Novices in search may have difficulty discerning the gems published by the addled goose from the pages generated from unknown methods.

Cynthia Murrell, July 21, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Google Alerts

July 21, 2011

In the dust and haze of Google+, we wanted to make sure we captured this source of Google information. The service is called Code Google, and you can locate it at http://google-alert.blogspot.com/.

The idea is that a Google hungry person can navigate to the site and see pointers to Google-related information.

A bit of clicking around revealed an unusual mix of information. When we looked at the site on July 20, 2011 at 9 pm Eastern, the most recent update was earlier in the afternoon. We were able to locate the service in the Google index, which struck us as interesting. The site popped up when we used the site operator.

We tucked this into our folder marked “Possible spider bait.” We like more focused information services or what we call “sites with a tight semantic vector.” Take a look. Make your own decision.

Cynthia Murrell, July 21, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Attensity Command Center Gives Clients Control

July 21, 2011

“Attensity Looks to Give Brands a Window into Social Media,” reports the Silicon Valley BizBlog. Attensity is touting its new Command Center software, which takes social media analysis a step further. It’s designed to display real time information continuously to its customers’ employees. What caught my eye was this passage:

The Attensity Command Center is basically a bank of monitors and the back end software to run the monitors. Using proprietary, patented text analysis algorithms, the platform categorizes incoming tweets by subject, sentiment, and geography, etc. The goal is to aggregate and visualize what’s being said online, so that the customers can know in real time how many people are talking about them and what they’re saying.
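
The quoted description boils down to tagging each incoming message and keeping running counts. Attensity’s algorithms are proprietary, so the Python sketch below is only a toy stand-in. The keyword lists, the sample tweets, the fictional Acme brand, and the function names are all made up for illustration, and simple word matching is nowhere near real text analysis.

```python
from collections import Counter

# Hypothetical keyword lists; a real system would use trained text analytics.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "broken", "slow"}


def sentiment(text):
    """Label a message by counting positive and negative keywords."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aggregate(tweets):
    """Count messages by sentiment and by self-reported location."""
    by_sentiment = Counter(sentiment(t["text"]) for t in tweets)
    by_place = Counter(t["place"] for t in tweets)
    return by_sentiment, by_place


# Made-up sample messages about a fictional brand.
SAMPLE = [
    {"text": "I love the new Acme phone!", "place": "Chicago"},
    {"text": "Acme support is slow and broken.", "place": "Louisville"},
    {"text": "Anyone tried Acme yet?", "place": "Chicago"},
]

if __name__ == "__main__":
    by_sentiment, by_place = aggregate(SAMPLE)
    print(dict(by_sentiment))
    print(dict(by_place))
```

The counters are what a bank of monitors would visualize. Everything that makes a commercial product valuable sits in the classification step this sketch fakes with keyword sets.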

Writer Jon Xavier experienced a demo of the product and was suitably impressed. His only issue was that the passing tweets moved too fast to read. He noted that to make full use of the software, a company would have to dedicate a couple of employees to monitoring and acting on the information.

Nope, it is not virtual. Will social media augment this reality? Image source: http://goo.gl/i3TIb

The interest in social media is fascinating. Once the Internet was for rocket scientists. Now the Internet is the place to stroll. A digital Las Ramblas. When gizmos are embedded in the human body, the Information Highway takes on an interesting shape. The metaphors used to describe the next big thing will be interesting. For now, Attensity touts control.

With this offering, Attensity amps up marketing in the ad sector. Will it be enough to make headway against the Google+ marketing cyclone?

Stephen E Arnold, July 21, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

Webtrends Analyzes SharePoint

July 21, 2011

SharePoint has many partnerships with outside companies, and Microsoft recently announced another one with Webtrends. We’ve been following the new business relationship and found two articles on the topic. CMS Newswire has “Microsoft Prefers Webtrends for SharePoint 2010” that reads more like an advertisement than a brief description of the deal. (You can get the Webtrends’ write up at this link.)

Webtrends is a Web analytics and related software solutions company. Microsoft chose it to be the preferred analytics solution for SharePoint. It tracks SharePoint 2010 pages and captures data in key reports: web parts viewed, breadcrumbs report, document action, activity by user, and user specific document interaction. We learned from the article:

“Webtrends has built an integrated analytics and optimization solution for our growing SharePoint community to help our customers get the most from SharePoint 2010,” said Jared Spataro, director, SharePoint product marketing at Microsoft. “Many SharePoint customers worldwide are Webtrends customers. We expect that this alliance will help drive business results for SharePoint 2010 customers across the globe.”

The second article comes from the Webtrends blog: “The Webtrends and Microsoft Alliance for SharePoint Provides a Game-Changing Platform.” It provides more details about the importance of the partnership. It outlines three reasons:

  1. Optimization for collaboration and governance by collecting data and creating reports for clients
  2. Compatibility with SharePoint. More companies are using SharePoint 2010 to develop their web sites and the lines are blurring between Internet and intranet. Webtrends tracks the data with a variety of products in one self-contained platform
  3. Snap in approach. By having SharePoint and Webtrends under the same collaboration umbrella it guarantees a seamless transition between programs for business intelligence.

Like a number of other SharePoint centric write ups, the author seems to be following a sidewalk on Madison Avenue. We think that a handy tech can make Google Analytics perform some useful tricks for free?

Stephen E Arnold, July 21, 2011

Sponsored by SurfRay, developers of the Ontolica search system for SharePoint

Symantec Snaps Up Clearwell to Enter E Discovery Market

July 20, 2011

I do some odd jobs for Enterprise Technology Management. Among them is hosting podcasts on various topics. Last week we did a podcast with several luminaries in the e discovery market. E Discovery is a term used to describe the content and text processing required to figure out what is in unstructured content gathered in a legal matter. There doesn’t have to be a lawsuit to trigger a company’s running an e Discovery project, but unlike search, e Discovery beckons legal eagles.

We read the article “Symantec acquires Clearwell Systems for $390m.” Perhaps best known for its antivirus software, Symantec also offers an array of information management solutions. Clearwell Systems specializes in e-discovery tools, used in response to litigation and other legal/investigative matters.

Symantec gains much with the acquisition:

Symantec notes the acquisition will add archiving, backup and eDiscovery offerings to its existing offerings, enabling it to offer a broader set of information management capabilities to customers. The deal will help Symantec provide future product integration opportunities with Symantec backup and security, Symantec NetBackup, Data Loss Prevention and Data Insight, the company said.

This acquisition moves e-discovery to the cloud, while continuing the appliance approach.

On the podcast I learned:

  • There will be a push for more hosted services. Autonomy has done a good job with its Zantaz acquisition and its hosted services, so Symantec is going down a route that leads to a pay off.
  • The Clearwell approach will continue to feature its rapid deployment model. I associated the phrase “rocket docket” with Clearwell, which connotes speedy service.
  • The Clearwell report and user audit functions will be expanded and enhanced. I saw a Clearwell report and watched an attorney pop it in an envelope for delivery to another attorney. The system impressed me because the report did not require any fiddling by the attorney. Good stuff.

Naturally, other new services are planned. Stay tuned.

Cynthia Murrell, July 14, 2011
