Inteltrax: Top Stories, June 18 to June 22
June 25, 2012
Inteltrax, the data fusion and business intelligence information service, captured three key stories germane to search this week, specifically, some fast-moving news in the big data business.
Our story, “Small Businesses Need Analytics Too,” showcases the rising tide of small companies improving business through big data.
With the rise in big data business, “Analytic Customer Support Reaches New Heights” shows how helping the customer is helping vendors differentiate themselves.
Perhaps no news is bigger than the money IBM is spending on big data, as we covered in “IBM Sees the Future and Invests.”
The news landscape is always changing in big data. We’ll keep an eye on the small businesses and the IBMs and everyone in between to keep readers up to date, every day.
Follow the Inteltrax news stream by visiting www.inteltrax.com
Patrick Roland, Editor, Inteltrax.
June 25, 2012
A Closer Look at Data Sovereignty Issues across Geographical Borders
June 25, 2012
At GigaOM.com, Barb Darrow weighs in on data sovereignty issues around the globe in her post, “Data Sovereignty Issues Still Weigh on Cloud Adoption.” Darrow points out that many large enterprises may embrace cloud computing, just not for key jobs because of restrictive regulations.
The author explains:
These laws…mandate that a company keep a customer’s data in that customer’s home country. One oft-cited reason is to prevent that data from being subpoenaed by a foreign power… Multiple regulations governing where a company can store customer data means that multinationals have to field data centers in every country where they have a presence — a trend that flies in the face of the appeal of borderless clouds.
One takeaway regarding the issue is that cloud service providers have to be able to meet regulatory obligations specific to the business sectors they address. Darrow also points out that until such hurdles are cleared and cloud providers can provide detailed assurance about where data resides, many businesses will keep their data on premises or in private clouds.
Darrow brings some good points to the discussion and highlights issues that need to be addressed in our global information age, where data easily spans physical borders. You may want to consider a third-party solution built by experts in search and data management in the cloud with the European Union in mind. With Mindbreeze, you have options for on-premises and cloud usage:
Our information pairing technology makes you unbeatable. Information pairing unites enterprise information and Cloud information. This results in a complete overview of a company’s knowledge – the basis for your competitive advantage – allowing you to act quickly, reliably, dynamically and profitably in all business matters.
Compared to U.S. solutions, Mindbreeze seems to be on the right track. Read more about the full suite of solutions at http://www.mindbreeze.com/.
Philip West, June 25, 2012
Sponsored by Pandia.com
Connotate Comments on Data Scraping
June 22, 2012
There are a variety of ways for businesses to store and manage their data. The recent Connotate blog post, “Don’t Miss Out on Key Information With Data Scraping Solutions,” signals the company’s shift in positioning from smart agents to smart agents plus data scraping.
Data scraping is a technique in which a computer program extracts data from human-readable output coming from another program. According to the post, data scraping tools allow businesses to stay ahead of their competition by finding and collecting information online that can be housed in an internal database or shared with customers and employees. Data scraping technology also helps businesses better understand their customers’ needs.
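For readers who have not seen the technique, here is a minimal sketch of what "extracting data from human-readable output" can look like. It is an illustration only, not Connotate's method: the HTML fragment, the class names `route` and `price`, and the helper `scrape_prices` are all hypothetical, and the sketch uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<table>
  <tr><td class="route">LEX-ORD</td><td class="price">$129</td></tr>
  <tr><td class="route">LEX-LGA</td><td class="price">$214</td></tr>
</table>
"""


class PriceScraper(HTMLParser):
    """Collects (route, price) pairs from table cells marked by class."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # class of the <td> being read, if any
        self.row = {}              # fields gathered for the current <tr>
        self.rows = []             # finished (route, price) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_field = dict(attrs).get("class")

    def handle_data(self, data):
        # Keep text only when it sits inside a cell we care about.
        if self.current_field in ("route", "price") and data.strip():
            self.row[self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_field = None
        elif tag == "tr" and self.row:
            self.rows.append((self.row.get("route"), self.row.get("price")))
            self.row = {}


def scrape_prices(html):
    """Return the (route, price) pairs found in an HTML string."""
    parser = PriceScraper()
    parser.feed(html)
    return parser.rows


print(scrape_prices(SAMPLE_HTML))
```

The sketch works only because the sample page labels its cells consistently. Pages that change layout or are built to resist copying would need more than this.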
The article states:
“Things like current prices on travel or directories of doctors and other resources can help you give complete information to your customers so they continue doing business with you. When you work with Connotate, you can get the scraping solutions you want that will work the way you need them to.”
After reading this post, I’m interested to know, does data scraping suggest pulling content from web pages which have been set up to make it less easy to copy content?
Jasmine Ashton, June 22, 2012
Sponsored by PolySpot
Critical Patches Fend off Microsoft Active Attacks
June 22, 2012
Internet Explorer has long been a target for cyber attacks, malware and the like, but Microsoft has just announced a large batch of patches in order to address vulnerabilities across a wide variety of its software offerings. ComputerWorld UK provides a complete breakdown in “Microsoft Patches 26 Bugs, Warns Users of Active Attacks.”
Giving attention to all of the vulnerabilities, the author reports on what may be the most critical update, the one in need of adoption first. He quotes Andrew Storms, director of security operations at nCircle Security:
‘Certainly, [MS12-036] makes it to the top of the worrisome list,’ said Storms. That update, also rated critical, patches just one vulnerability in the Remote Desktop Protocol (RDP), a Windows component that lets users remotely access a PC or server. RDP is frequently used by corporate help desks, off-site users and IT administrators to manage servers at company data centers and those the enterprise farms out to cloud-based service providers.
Implications for those organizations that use Remote Desktop Protocol in any manner in their infrastructure, but especially in their enterprise SharePoint deployment, are obvious. There is need for concern and quick action in order to plug the security gap. However, it’s also reported that oddly, the updates must be manually downloaded.
The author continues:
All of the patches must be downloaded manually from Microsoft’s Download Center. They’re not served up through the usual Windows Update service or the enterprise-grade Windows Server Update Service (WSUS) software.
It seems that such critical updates, especially for those who use the ubiquitous SharePoint, should be made more readily accessible, and that users should receive prompt notification.
For enterprises that are concerned about their security needs, consider a smart third party solution like Fabasoft Mindbreeze. Smaller and more agile, these companies can devote greater attention to security needs. Additionally, in the interest of being fair, Microsoft is always going to be a target for malware and viruses because of its sheer size. It is truly a huge target. However, adding Fabasoft Mindbreeze Enterprise to an existing SharePoint infrastructure will not only make the whole enterprise more secure, but also more easily accessible.
Read more about the security adherence of Fabasoft Mindbreeze, including relevant ISO standards. Just one example is as follows:
ISO 27001: The ISO standard 27001 is a worldwide recognized standard for the evaluation of the security of IT environments. For customers the certification means the adherence to clearly defined technical and security-based standards regarding all IT and business processes as well as all the company’s confidential information.
Sometimes bigger is not necessarily better, and this is one instance in which it definitely proves true. Move away from Microsoft, the major target of viruses and malware, and move toward a more agile, more secure solution. Fabasoft Mindbreeze Enterprise, and the whole suite of Mindbreeze products, can not only ease your security concerns, but also provide a more satisfying user experience.
Emily Rae Aldridge, June 22, 2012
Sponsored by Pandia.com
EU Data Laws Threaten Enterprise on the Cloud
June 21, 2012
Various cultures hold differing opinions on the nature of information and its implications for security. The United States has earned a reputation as increasingly difficult to deal with, as the quest for national security has led to regulations that create hardships for software developers and users alike. However, there is now talk about the implications of new European Union laws for Cloud platforms and enterprise software. IT World gives a full report in “EU Data Laws are Latest Threat to Cloud.”
Kevin Fogarty, the author, introduces the issue:
European data-sovereignty laws requiring international companies to keep data on customers in the customer’s own country are not only causing headaches for database managers, they’re holding back adoption of cloud computing in many large companies according to a story in GigaOm yesterday. Corporate IT managers have been wary of European data-privacy laws since the early 2000s, when requirements designed to limit the degree to which corporations could move or exploit the personal data of customers came into vogue on the Continent.
Europe is clearly still struggling with the idea of national sovereignty versus sovereignty of the union. It is bleeding over into IT development and causing headaches. However, we are also told that the United States is driving some of these regulations and subsequent frustrations:
More recently, fears of U.S. prosecutors subpoenaing private data on European customers in European countries has accelerated the priority of data sovereignty laws as well.
So for multi-national companies who truly need an enterprise solution that will allow the transfer and access of data across country borders, what is to be done? For now, it may be that some of the legalities of the new regulations need time to be vetted and moderated. In the meantime, it would be wise for organizations to choose a smart third-party solution that can increase the efficiency of their SharePoint platform without running up against these regulations.
Fabasoft Mindbreeze is a great solution for organizations on either side of the Atlantic. For customers in the United States, Fabasoft Mindbreeze Enterprise guarantees the highest level of security. For European users, Mindbreeze was truly written with EU standards in mind, thereby ensuring compliance without added stress or workload.
Daniel Fallmann addresses some of the concerns surrounding the use of the US Patriot Act to access international data:
During the development of Fabasoft Mindbreeze we focused 100% exclusively on European values – not one single bit of American software product is to be found. The US Patriot Act doesn’t apply. Not using US American manufactured software ensures that US authorities have no right to access European Cloud data.
While most users need not be concerned with the Patriot Act and its potential implications for unauthorized access of data, some European customers will rest assured knowing that Fabasoft Mindbreeze engineers products with EU standards in mind, ensuring fewer headaches.
Emily Rae Aldridge, June 21, 2012
Sponsored by Pandia.com
Understanding Search Features Available in Out of the Box SharePoint
June 20, 2012
In “SharePoint 2010 Search: Relevance, Refinement, People,” Jennifer Mason takes a closer look at powerful search features available in SharePoint to help you locate and access the data and content you’ve added to the farm. Mason explains her approach to the topic:
Search is everywhere, and SharePoint is no exception. By providing your users with a way to easily find their content you are able to greatly increase the usability and user adoption within your organization. This article will highlight the specific ways that SharePoint search enhances your environment.
She also explains basic content and people search options in out-of-the-box SharePoint that can be a big help to many users. But she points out that the advanced, higher-level search functionality that can improve the search and navigation experience for your users depends on the type of licensing you have and on whether you choose to invest in implementing FAST Search Server for SharePoint.
Mason goes on to comment on search strategies:
In most organizations a Search strategy is developed that includes information on what content sources need to be created as well as what scopes should be implemented. A good practice is to also have a primary resource that is responsible for reviewing the Search Analytics reports and taking steps to provide continuous improvements to the overall search experience. Search is an area in SharePoint that can potentially cross many teams and require multiple resources so it is a good idea to spend some time planning to ensure that your environment is scoped appropriately.
We agree that search is always deserving of improvement and attention as it is users’ means to access and reuse valuable business knowledge. Depending on your organization, you may not want to devote the time and effort for extensive configurations and training to develop a powerful search feature. We think it would be easier to go with a simple third-party solution like Mindbreeze, cutting down on the costly man hours.
Fabasoft Mindbreeze Enterprise provides consistent and comprehensive information access to both corporate and Cloud sources and . . .
finds every scrap of information within a very short time, whether document, contract, note, e-mail or calendar entry, in intranet or internet, person- or text-related. The software solution finds all required information, regardless of source, for its users. Get a comprehensive overview of corporate knowledge in seconds without redundancy or loss of data.
The seamless Cloud solution makes sure you find the right information you need at any time. Check out the full suite of solutions at Fabasoft Mindbreeze.
Philip West, June 20, 2012
Sponsored by Pandia.com
Richard Paterson Assesses Mobile Access Options for your SharePoint Site
June 19, 2012
Mobile access is no doubt becoming ubiquitous and a go-to source for users as the office extends well beyond physical office walls. In “SharePoint on Mobile Devices: The Options,” Richard Paterson takes a look at ways to expose SharePoint portals on mobile devices, including out-of-the-box mobile SharePoint views, Responsive Design, Mobile Web Apps, and native apps.
Paterson points out that mobile views are not ‘one size fits all,’ rather the delivery style you choose depends on your target audience, target device, and site content. While an out-of-the-box SharePoint mobile view is simple to implement, it gives you only basic navigation around sites, lists, and document libraries.
Paterson has this to say about a framework solution:
A framework such as Mobile Entrée can provide a richer experience than the out of the box views and will normally provide an API to code against, making extensibility and customization of the functionality intuitive. They will normally operate on templates which allows for powerful front end customization. Views can be made available for different devices, e.g. you might want a two column layout on a tablet, whereas on a phone you would want just one. Frameworks will make form building simpler and also presenting back graphs and tables often used in Business Intelligence easy and more cost effective to implement.
If your site is geared toward news, marketing, or communication, Paterson suggests that a Responsive Design may be best, but it is lacking if you need forms or functional parts. A native device app can give you big results for the user experience, but Paterson explains it also requires the largest investment.
Paterson’s reviews are not comprehensive, but definitely provide a good introduction to mobile options. You may consider the read if you’re looking to beef up your site’s mobile experience. You may also consider adding a third party solution as a means to expand mobile offerings while also saving valuable time and investment resources. We like the feedback we’ve seen about Fabasoft Mindbreeze Mobile.
Here you can read about the Mobile Search solution from Mindbreeze:
Smartphones and tablets are constant companions, indispensable in the business world. Information needs to be able to be exchanged at all times and wherever you are. Easily. Quickly. Securely. Fabasoft Mindbreeze Mobile makes company data available on all mobile devices. Regardless of whether you have a BlackBerry®, iPhone®, Windows Phone or Android™ Smartphone or a tablet such as the Apple iPad, Samsung Chromebook/GalaxyTab or Blackberry Playbook, you can act independently and freely – yet always securely. Irrespective of what format the data is in. Full functionality: the display of the search results is homogenous to the tried and tested web client in terms of clear design and intuitive navigation.
Read more about the Fabasoft Mindbreeze Search Solutions that easily integrate into your SharePoint farm at http://www.mindbreeze.com/.
Philip West, June 19, 2012
Sponsored by Pandia.com
Deduplication: Flawed Method or Just More Woes for HP?
June 19, 2012
Is HP duping consumers or is SEPATON waging a product war? It’s sales versus sales in the article “SEPATON: HP Offers ‘Least Capable’ Dedupe in the Industry,” with SEPATON full of fire and not pulling any punches. HP did a little dodge and duck, but mainly stayed straightforward.
Linda Mentzer and Peter Quirk from SEPATON stated:
“The HP B6200 offers the least capable de-dupe in the industry. Each tape device on a B6200 is effectively a distinct de-dupe domain. Backups sent to a one drive don’t de-dupe against backups sent to another drive on the same node! What happens when you’re back up exceeds the capability of one drive?”
Beauty is in the eye of the beholder, and apparently HP finds the B6200 much more appealing than its rival does. Sean Kenney from HP Storage challenged SEPATON’s comments, stating:
“It is important to note that the B6200 is a single logical system. Any emulated tape drive within a VTL completely de-duplicates with all other tape drives in that VTL. The B6200 is a terrific solution for database and multiplexed workloads.”
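The disagreement between the two quotes turns on the size of the "dedupe domain," that is, the set of backups checked against each other for duplicate data. The sketch below is a toy illustration of that idea only. It does not model the B6200 or any SEPATON product, and the drive names, chunk contents, and the function `stored_chunks` are hypothetical.

```python
import hashlib


def stored_chunks(backups_by_drive, shared_domain):
    """Count chunks physically stored after deduplication.

    backups_by_drive maps a drive name to a list of data chunks (bytes).
    shared_domain=True uses one fingerprint index for all drives;
    shared_domain=False gives every drive its own index, so each drive
    is a separate dedupe domain.
    """
    shared_index = set()
    stored = 0
    for drive, chunks in backups_by_drive.items():
        index = shared_index if shared_domain else set()
        for chunk in chunks:
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in index:
                # First time this domain has seen the chunk: store it.
                index.add(fingerprint)
                stored += 1
    return stored


# Hypothetical backups: two chunks are identical across the two drives.
backups = {
    "drive_1": [b"os-image", b"database-page-1", b"database-page-2"],
    "drive_2": [b"os-image", b"database-page-1", b"mail-store"],
}

print(stored_chunks(backups, shared_domain=False))  # 6
print(stored_chunks(backups, shared_domain=True))   # 4
```

With separate domains, the duplicated chunks are stored once per drive. With one shared domain, they are stored once in total. Which picture describes the B6200 is exactly what the two vendors dispute.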
Ah, hardware performance and software issues it seems. HP doesn’t claim the system to be perfect, but instead presents realistic options to prevent issues. Though SEPATON argues that B6200 lacks functionality, they also make it a point to promote their other products. Could this be a coincidence or just more woes for HP?
Jennifer Shockley, June 19, 2012
The Alleged Received Wisdom about Predictive Coding
June 19, 2012
Let’s start off with a recommendation. Snag a copy of the Wall Street Journal and read the hard copy front page story in the Marketplace section, “Computers Carry Water of Pretrial Legal Work.” In theory, you can read the story online if you don’t have Sections A-1, A-10 of the June 18, 2012, newspaper. A variant of the story appears as “Why Hire a Lawyer? Computers Are Cheaper.”
Now let me offer a possibly shocking observation: The costs of litigation are not going down for certain legal matters. Neither bargain basement human attorneys nor Fancy Dan content processing systems make the legal bills smaller. Your mileage may vary, but for those snared in some legal traffic jams, costs are tough to control. In fact, search and content processing can impact costs, just not in the way some of the licensees of next generation systems expect. That is one of the mysteries of online that few can penetrate.
The main idea of the Wall Street Journal story is that “predictive coding” can do work that human lawyers do for a higher cost but sometimes with much less precision. That’s the hint about costs in my opinion. But the article is traditional journalistic gold. Coming from the Murdoch organization, what did I expect? i2 Group has been chugging along with relationship maps for case analyses of important matters since 1990. Big alert: i2 Ltd. was a client of mine. Let’s see: basic discovery functions were available more than a couple of weeks ago.
The write up quotes published analyses which indicate that when humans review documents, those humans get tired and do a lousy job. The article cites “experts” from Thomson Reuters, a firm steeped in legal and digital expertise, who point out that predictive coding is going to be an even bigger business. Here’s the passage I underlined: “Greg McPolin, an executive at the legal outsourcing firm Pangea3 which is owned by Thomson Reuters Corp., says about one third of the company’s clients are considering using predictive coding in their matters.” This factoid is likely to spawn a swarm of azure chip consultants who will explain how big the market for predictive coding will be. Good news for the firms engaged in this content processing activity.
What goes faster? The costs of a legal matter or the costs of a legal matter that requires automation and trained attorneys? Why do companies embrace automation plus human attorneys? Risk certainly is a turbo charger?
The article also explains how predictive coding works, offers some cost estimates for various actions related to a document, and adds some cautionary points about predictive coding proving itself in court. In short, we have a touchstone document about this niche in search and content processing.
My thoughts about predictive coding are related to the broader trends in the use of systems and methods to figure out what is in a corpus and what a document is about.
Most content processing is driven by two quite human needs. First, the cost of coping with large volumes of information is high and going up fast. Second, there is the need to reduce risk. Most professionals find quips about orange jump suits, sharing a cell with Mr. Madoff, and the iconic “perp walk” downright depressing. When a legal matter surfaces, the need to know what’s in a collection of content like corporate email is high. The need for speed is driven by executive urgency. The cost factor clicks in when the chief financial officer has to figure out the costs of determining what’s in those documents. Predictive coding to the rescue. One firm used the phrase “rocket docket” to communicate speed. Other firms promise optimized statistical routines. The big idea is that automation is faster and cheaper than having lots of attorneys sifting through documents in printed or digital form. The Wall Street Journal is right. Automated content processing is going to be a big business. I just hit the two key drivers. Why dance around what is fueling this sector?
More Predictive Silliness: Coding, Decisioning, Baloneying
June 18, 2012
It must be the summer vacation warm and fuzzies. I received another wild analytics news release today. This one comes from 5WPR, “a top 25 PR agency.” Wow. I learned from the spam: PeekAnalytics “delivers enterprise class Twitter analytics and help marketers understand their social consumers.”
What?
Then I read:
By identifying where Twitter users exist elsewhere on the Web, PeekAnalytics offers unparalleled audience metrics from consumer data aggregated not just from Twitter, but from over sixty social sites and every major blog platform.
The notion of algorithms explaining anything is interesting. But the problem with numerical recipes is that those who use outputs may not know what’s going on under the hood. Widespread knowledge of the specific algorithms, the thresholds built into the system, and the assumptions underlying the selection of a particular method is in short supply.
Analytics is the realm of the one percent of the population trained to understand the strengths and weaknesses of specific mathematical systems and methods. The 99 percent are destined to accept analytics system outputs without knowing how the data were selected, shaped, formed, and presented given the constraints of the inputs. Who cares? Well, obviously not some marketers of predictive analytics, automated indexing, and some trigger trading systems. Too bad for me. I do care.
When I read about analytics and understanding, I shudder. As an old goose, each body shake costs me some feathers, and I don’t have many more to lose at age 67. The reality of fancy math is that those selling its benefits do not understand its limitations.
Consider the notion of using a group of analytic methods to figure out the meaning of a document. Then consider the numerical recipes required to identify a particular document as important from thousands or millions of other documents.
When companies describe the benefits of a mathematical system, the details are lost in the dust. In fact, bringing up a detail results in a wrinkled brow. Consider the Kolmogorov-Smirnov Test. Has this nonparametric test been applied to the analytics system which marketers have presented to you in the last “death by PowerPoint” session? The response from 99.5 percent of the people in the world is, “Kolmo who?” or “Isn’t Smirnov a vodka?” Bzzzz. Wrong.
Mathematical methods which generate probabilities are essential to many business sectors. When one moves fuel rods at a nuclear reactor, the decision about what rod to put where is informed by a range of mathematical methods. Specially trained experts, often with degrees in nuclear engineering plus postgraduate work, handle the fuel rod manipulation. Take it from me. Direct observation is not the optimal way to figure out fuel pool rod distribution. Get the math “wrong” and some pretty exciting events transpire. Monte Carlo anyone? John Gray? Julian Steyn? If these names mean nothing to you, you would not want to sign up for work in a nuclear facility.
Why then would a person with zero knowledge of numerical recipes or of the oddball outputs from particular types of algorithms, and with little or no experience with probability methods, use the outputs of a system as “truth”? The outputs of analytical systems require expertise to interpret. Looking at a nifty graphic generated by Spotfire or Palantir is NOT the same as understanding what decisions have been made, what limitations exist within the data display, and what blind spots are generated by the particular method or suite of methods. (Firms which do focus on explaining and delivering systems which make methods, constraints, and considerations clear to users include Digital Reasoning, Ikanow, and Content Analyst. Others? You are on your own, folks.)
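For the “Kolmo who?” crowd, a minimal sketch may help. The two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical cumulative distribution functions of two samples. It is nonparametric because it assumes nothing about the shape of the underlying distributions. The function name `ks_statistic` and the sample values below are hypothetical, and the sketch computes only the statistic, not the p-value a full test would report.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the two empirical
    cumulative distribution functions (ECDFs). Both ECDFs are step
    functions that change only at observed values, so checking every
    observed value is enough to find the maximum gap.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample_b <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical samples: overlapping, so the gap is partial.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

A value near 0 says the two samples look alike. A value near 1 says they barely overlap. Deciding whether a given value is "significant" for given sample sizes is where the expertise comes in.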
Today I have yet another conference call with 30 somethings who are into analytics. Analytics is the “next big thing.” Just as people assume coding up a Web site is easy, people assume that mathematical methods are now the mental equivalent of clicking a mouse to get a document. Wrong.
The likelihood of misinterpreting the outputs of modern analytic systems is higher than it was when I entered the workforce after graduate school. The reasons include:
- A rise in the “something for nothing” approach to information. A few clicks, a phone call, and chit chat with colleagues makes many people expert in quite difficult systems and methods. In the mid 1960s, there was limited access to systems which could do clever stuff with tricks from my relative Vladimir Ivanovich Arnold. Today, the majority of the people with whom I interact assume their ability to generate a graph and interpret a scatter diagram equips them as analytic mavens. Math is and will remain hard. Nothing worthwhile comes easy. That truism is not too popular with the 30 somethings who explain the advantages of analytics products they sell.
- Sizzle over content. Most of the wild and crazy decisions I have learned about come from managers who accept analytic system outputs as a page from old Torah scrolls from Yitzchok Riesman’s collection. High ranking government officials want eye candy, so modern analytic systems generate snazzy graphics. Does the government official know what the methods were and the data’s limitations? Nope. Bring this up and the comment is, “Don’t get into the weeds with me, sir.” No problem. I am an old advisor in rural Kentucky.
- Entrepreneurs, failing search system vendors, and open source repackagers are painting the bandwagon and polishing the tubas and trombones. The analytics parade is on. From automated and predictive indexing to surfacing nuggets in social media, the music is loud and getting louder. With so many firms jumping onto the bandwagon or joining the parade, the reality of analytics is essentially irrelevant.
The bottom line for me is that the social boom is at or near its crest. Marketers—particularly those in content processing and search—are desperate for a hook which will generate revenues. Analytics seems to be as good as any other idea which is converted by azure chip consultants and carpetbaggers into a “real business.”
The problem is that analytics is math. Math is easy as 1-2-3; math is as complex as MIT’s advanced courses. With each advance in computing power, more fancy math becomes possible. As math advances, the number of folks who can figure out what a method yields decreases. The result is a growing “cloud of unknowing” with regard to analytics. Putting this into a visualization makes clear the challenge.
Stephen E Arnold, June 18, 2012