Chiliad’s Discovery/Alert 8.2 Packed With “Enhancements” and “Fixes”
February 20, 2014
The press release from Chiliad titled Chiliad Introduces a New Discovery/Alert Release announces the data analysis company’s latest release. Discovery/Alert 8.2 is promoted as a more advanced version of the discovery and alert service the company is known for. The new product features include both upgrades and fixes, such as the Push Web Service, which enables adding documents to collections, and an improved user interface for better “configuration of external (e.g. Google, Solr) collections”. The article also names the following features:
“New operators with alternative ranking algorithms…support for range queries and faceting of concepts…Entity Resolution Framework…Ability to do searches across internal Chiliad and external Lucene/Solr systems with global ranking…Supported platforms include: Red Hat Linux 5.2 or later… Red Hat Linux 6.4 on IBM Power platform, ppc64 architecture (beta version)… Chiliad active-agent technology forms a virtual consolidated data center that enables multidimensional analysis and global ranking across all sources in real time.”
Along with these platform enhancements, Chiliad has updated its web site and uses a variant of the connecting-the-dots pitch. Chiliad ‘connects “all” the dots’, if you will. Pricing is available through an email contact. Chiliad promises that its technology eliminates any need to search collection by collection. Instead, users are able to accurately consolidate collections into a virtual data center.
Chelsea Kerwin, February 20, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Blackberry Adds SharePoint Access
February 20, 2014
Enterprise is moving toward mobile at a rapid pace, and applications that hope to stay in the game have to adapt. SharePoint has made great strides in mobile in the last two years particularly. And in response, Blackberry is enabling SharePoint mobile functions. Read the story in, “Work Drives for BlackBerry 10 Adds Sharepoint Access for BES Users.”
The article says:
“BlackBerry has updated their Work Drives application to v2.0 today for devices running OS 10.2+. This new version extends the application to allow remote file access to SharePoint sites. This is on top of the network drive access they had in the previous versions. Sadly you still need BlackBerry Enterprise Server 10.2 to use the network shares even though BlackBerry could easily expand the user base for the application by allowing all users to mount network drives.”
Stephen E. Arnold is a longtime leader in search and follows the latest on SharePoint on his Web site, ArnoldIT.com. Arnold finds that organizations are increasingly motivated by mobile technologies, as work and employees are moving increasingly off-site and outside of regular hours.
Emily Rae Aldridge, February 20, 2014
Frequentists Versus Bayesians: Is HP Amused?
February 19, 2014
I read a long report and then a handful of spin-off reports about HP and Autonomy, mid-February 2014 version. The Financial Times’s story is a for-fee job. You can get a feel for the information in “HP Executives Knew of Autonomy’s Hardware Sales Losses: Report.” There are clever discussions of this allegedly “new information” in a number of blogs. What is interesting is an allegedly accurate chunk of information in “HP Explores Settlement of Autonomy Shareholder Lawsuit.” My head is spinning. HP buys something. Replaces the person who was on watch when the deal was worked out. HP gets a new boss and makes changes to its board of directors. HP then blames everyone except itself for buying Autonomy for a lot of money. HP then whips up the regulators, agitates accounting firms, and pokes Michael Lynch with a cattle prod.
As this activity was in the microwave, it appears that HP knew how the hardware/software deals were handled. If the reports are accurate, Dell hardware was more desirable than HP’s hardware.
But there is a more interesting twist. I refer you, gentle reader, to “A Fervent Defense of Frequentist Statistics.” Autonomy’s “black box” consists of Bayesian methods and what I call MCMC, or Markov chain Monte Carlo techniques. The idea is that once some judgment calls are made, the Integrated Data Operating Layer or IDOL can chug away without human involvement. When properly resourced and trained, the Autonomy system works for certain types of content processing and information retrieval applications. You can read more about IDOL in our for-fee analysis of IDOL. This document reviews several important patents germane to the Autonomy system. You can purchase a copy of this analysis at https://gumroad.com/l/autonomy.
In “A Fervent Defense,” an old battle line is reactivated. The “frequentists” are not exactly thrilled with the rise of Bayesian methods. Autonomy emerged from Cambridge University when some of the Bayesian methods were revealed as crucial to World War II activities. Frequentists point out that there are some myths about Bayesian methods. The write up is not for MBAs, failed Web masters, and unemployed middle school teachers. For example, the myths allegedly dispelled in the article are:
- “Bayesian methods are optimal.
- Bayesian methods are optimal except for computational considerations.
- We can deal with computational constraints simply by making approximations to Bayes.
- The prior isn’t a big deal because Bayesians can always share likelihood ratios.
- Frequentist methods need to assume their model is correct, or that the data are i.i.d.
- Frequentist methods can only deal with simple models, and make arbitrary cutoffs in model complexity (aka: “I’m Bayesian because I want to do Solomonoff induction”).
- Frequentist methods hide their assumptions while Bayesian methods make assumptions explicit.
- Frequentist methods are fragile, Bayesian methods are robust.
- Frequentist methods are responsible for bad science
- Frequentist methods are unprincipled/hacky.
- Frequentist methods have no promising approach to computationally bounded inference.”
The key point is that HP is going to learn, already has learned, or learned and just forgotten that Bayesian methods are not suitable for every single information processing application. In fact, using a Bayesian method when a frequentist method is more appropriate can produce unsatisfactory results for a discriminating data scientist. Using a frequentist method when a Bayesian one is more appropriate can yield equally dissatisfying outputs.
The point is that if one buys a system built on one method and then applies it inappropriately, the knowledgeable user is going to be angry. It is possible that some disappointed users will take legal action, demand a license refund, or just hit the conference circuit and explain why such and such a system was a failure.
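To make the difference concrete, here is a minimal sketch of the two schools estimating the same rate from the same data. It is a toy illustration, not a description of how IDOL or any Autonomy component works. The function names, the uniform Beta(1, 1) prior, and the example counts are my own assumptions.

```python
from fractions import Fraction


def frequentist_estimate(hits, trials):
    """Maximum likelihood estimate of a rate: observed hits over trials."""
    return Fraction(hits, trials)


def bayesian_estimate(hits, trials, prior_a=1, prior_b=1):
    """Posterior mean of a rate under a Beta(prior_a, prior_b) prior.

    The default Beta(1, 1) prior is uniform; it is an assumption made
    for this sketch, not something taken from the article.
    """
    return Fraction(hits + prior_a, trials + prior_a + prior_b)


# Hypothetical counts: every document judged so far was relevant.
for hits, trials in [(3, 3), (300, 300)]:
    freq = frequentist_estimate(hits, trials)
    bayes = bayesian_estimate(hits, trials)
    print(f"{hits}/{trials}: frequentist={float(freq):.3f} bayesian={float(bayes):.3f}")
```

With three observations the two answers differ noticeably, because the prior still carries weight. With three hundred they nearly agree. Which behavior is preferable depends on the application, which is the point of the paragraphs above.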
Will HP put the three ring circus of buying Autonomy to rest and then find itself mired in the jaws of a Bayesian versus frequentist dispute? My hunch is, “Yep.”
Could HP have convinced itself that Autonomy was a universal fix it kit for information processing problems? If the answer is, “Yes,” then HP is going to have to come to grips with licensees who are going to point out that the solution did not cure the problem.
In short, HP faces more excitement. The company will not be “idle” any time soon. HP may not be amused, but I am. Search is indeed a bit more difficult than some would have customers believe.
Stephen E Arnold, February 19, 2014
Thomson Reuters Acquires Entagen, Builds Cortellis Data Fusion Technology
February 19, 2014
The press release on ThomsonReuters.com titled Thomson Reuters Cortellis Data Fusion Addresses Big Data Challenges by Speeding Access to Critical Pharmaceutical Content announces the embrace of big data by revenue hungry Thomson Reuters. The new addition to the suite of drug development technologies will offer users a more intuitive interface through which they will be able to analyze large volumes of data. Chris Bouton, General Manager at Thomson Reuters Life Sciences, is quoted in the article:
“Cortellis Data Fusion gives us the ability to tie together information about entities like diseases and genes and the connections between them. We can do this from Cortellis data, from third party data, from a client’s internal data or from all of these at the same time. Our analytics enable the client to then explore these connections and identify unexpected associations, leading to new discoveries… driving novel insights for our clients is at the very core of our mission…”
The changes at Thomson Reuters are the result of the company’s acquisition of Entagen, according to the article. That company is a leader in the field of semantic search and has been working with biotech and pharmaceutical companies offering both development services and navigation software. Cortellis Data Fusion promises greater insights and better control over the data Thomson Reuters holds, while maintaining enterprise information security to keep the data safe.
Chelsea Kerwin, February 19, 2014
New Managers, Products from Centrifuge Systems
February 19, 2014
The announcement from Centrifuge titled Centrifuge Systems Strengthens Big Data Discovery and Security promotes the release of Centrifuge 2.10. The new features of the link analysis and visualization software include the ability to block access as well as grant access to specific individuals, a more flexible method of login validation and the ability to “define hidden data sources, data connections and connection parameters.” Stan Dushko, Chief Product Officer at Centrifuge, explains the upgrades and the reasoning behind them,
“With organizations steadily gathering vast amounts of data and much of it proprietary or sensitive in nature, exposing it within visualization tools without proper security controls in place may have unforeseen consequence…Can we really take the chance of providing open access to data we haven’t previously reviewed? Not knowing what’s in the data, is all the more reason to enforce proper security controls especially when the data itself is used to grant access or discover its existence altogether.”
The Big Data business intelligence software provider promises customers peace of mind and total confidence in its technology. The company believes its system goes above and beyond the dashboard management systems of “traditional business intelligence solutions” because its displays can be reorganized in a more interactive way. Speaking of organization, you may notice that finding Centrifuge Systems in Google is an interesting exercise.
Chelsea Kerwin, February 19, 2014
Altegrity, Owner of Kroll, Faces Litigation and 2015 Debt Maturities
February 19, 2014
The article titled Debt Maturities May Blow the Whistle on Altegrity on the Deal Pipeline explores the possibilities facing the Virginia-based company. Altegrity is a private-equity-owned background check company with the subsidiary US Investigations Services Inc. (USIS). The company was responsible for vetting both Edward Snowden and Aaron Alexis (the Washington Navy Yard shooter). The article explains:
“In a complaint filed on Jan. 22 in the U.S. District Court … the DOJ suggested penalties of $5,500 to $11,000 per violation of the firm’s quality review protocol for 665,000 cases, or about 40% of the cases Altegrity subsidiary, US Investigations Services Inc., has handled. On the low end of that spectrum, the damages could reach $3.7 billion. However, the fund manager noted that Altegrity could seek to settle the case for a much lower amount, curtailing…a lengthy litigation process.”
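The low-end figure in the quote can be checked with simple multiplication. A minimal sketch, using only the case count and per-violation penalties quoted above; the names are mine, and the high-end total is my own calculation, not a figure from the article.

```python
# Figures as quoted from the DOJ complaint in the article above.
CASES = 665_000
PENALTY_LOW = 5_500    # dollars per violation, low end
PENALTY_HIGH = 11_000  # dollars per violation, high end


def total_exposure(cases, penalty_per_violation):
    """Total potential damages in dollars if every case counts as one violation."""
    return cases * penalty_per_violation


low_total = total_exposure(CASES, PENALTY_LOW)
high_total = total_exposure(CASES, PENALTY_HIGH)

print(f"Low end:  ${low_total / 1e9:.2f} billion")
print(f"High end: ${high_total / 1e9:.2f} billion")
```

The low end works out to about $3.66 billion, which matches the article’s rounded $3.7 billion. The high end, on the same assumption of one violation per case, would be roughly double.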
A USIS spokesperson defended the company with the statement that the allegations only refer to a small group of people over a short period of time. One looming factor mentioned by S&P analyst Brian Milligan is Altegrity’s 2015 debt maturities, which allow the company some wiggle room for negotiation. We should also note that this is the company that has owned Kroll, the corporate security firm, since about 2007. Kroll has a search component acquired from Engenium in 2006.
Chelsea Kerwin, February 19, 2014
Public Facing Web Sites in SharePoint Online
February 19, 2014
There’s more than meets the eye when it comes to SharePoint Online. SharePoint is no doubt the enterprise solution with the largest chunk of the market, but many organizations do not use the implementation to its full potential. Read more in the Search Content Management article, “Creating Public-Facing Websites in SharePoint Online.”
The article says:
“Although SharePoint is probably best known for its user sites and team sites, SharePoint Online supports the creation of a publicly facing website. This SharePoint site is publicly accessible, even to those who do not have an Office 365 account. In fact, some organizations host their corporate website through SharePoint Online.”
Stephen E. Arnold is the man behind ArnoldIT.com and a longtime search expert. His SharePoint research shows that customers want better customization and a better end-user experience. By expanding SharePoint beyond its regular parameters, such as using it to build a public-facing website, organizations can get more bang for their buck.
Emily Rae Aldridge, February 19, 2014
Decree 72 in Vietnam Broadens Censorship
February 18, 2014
Communist governments are not exactly considered advocates for free-flowing information within their borders. Quartz does nothing to dispel this image with its article, “Vietnam’s New Social Media Crackdown Takes Aim at News Aggregators (and Enemies of the State).” The piece tells us about Vietnam’s new Decree 72, which declares that websites must not “quote, gather or summarize information from press organizations or government websites.” Writer Adam Pasick reports:
“If you think that sounds ominously vague, you’re not alone. Critics of the new law noted that it would essentially ban any links to a news article on Facebook, which has 12 million users in Vietnam. ‘We are deeply concerned by the decree’s provisions that appear to limit the types of information individuals can share via personal social media accounts and on websites,’ the US Embassy in Hanoi said. Reporters Without Borders called Decree 72 ‘both nonsensical and extremely dangerous,’ saying it would require ‘massive and constant government surveillance of the entire internet.'”
Ostensibly, the decree is meant to fight copyright violations. However, critics note that its vagueness grants plenty of wiggle room to a government known for a heavy hand in this area; Pasick reminds us that this country threw 35 bloggers in jail for writing about things it would have preferred to keep hidden. Not to be overlooked is this gem—Pasick writes:
“And there is also the chilling language of the second half of Decree 72, which bans ‘information that is against Vietnam, undermining national security, social order, and national unity … or information distorting, slandering, and defaming the prestige of organizations, honor and dignity of individuals.'”
Ugh. Vietnam was already near the bottom of the press freedom index maintained by Reporters Without Borders. It seems the country’s government will not relinquish censorship any time soon; after all, it is an effective tool for controlling one’s citizens.
Cynthia Murrell, February 18, 2014
Free Textbooks from Yumpu
February 18, 2014
Yay, free books! We love pointing out free resources, and now two textbooks relevant to content processing are available (in embedded PDF form) without charge via Yumpu.com. One, Information Technology for Management, from Turban, McLean, and Wetherbe, aims to bridge a crucial knowledge gap that plagues many businesses. It begins its task of educating managers in the mysteries of IT with a chapter titled, “Strategic Use of Information Technology in the Digital Economy.” As one might expect from a textbook, each chapter includes discussion questions and exercises. The authors illustrate their points with real-life examples from recognizable companies.
Another book, Data Mining Methods and Models, by Daniel T. Larose at Central Connecticut State University, tackles the building of data models. Like the above book, questions and exercises are provided for your enjoyment. This is the second book in a series. Its preface specifies that this volume:
“…explores the process of data mining from the point of view of model building: the development of complex and powerful predictive models that can deliver actionable results for a wide range of business and research problems…. [It provides]
*Models and techniques to uncover hidden nuggets of information
*Insight into how the data mining algorithms really work
*Experience of actually performing data mining on large data sets.”
If either of these sounds like it could be of use to you (or someone you’ve been trying to explain these things to), don’t miss out on these free resources.
Cynthia Murrell, February 18, 2014
Search and Management Appliances from InfoLibrarian
February 18, 2014
Folks looking for affordable data-management and search solutions should check out InfoLibrarian. You can get their Metadata Management Appliance and pair it with their Search Appliance, both for about $3,500. Just to be clear, these are not software applications; they are hardware units you would plug into your network like a hard drive. The description for the Management Appliance tells us:
“The InfoLibrarian Metadata Appliance takes enterprise search and metadata management to a whole new level. Manage and synchronize metadata, documents, files, source code, and virtually any digital asset. You name it… InfoLibrarian catalogs it. Hundreds of Adapters and document crawlers are available to automatically index, categorize and keep history of changes over time. Business friendly search engine/portal to navigate categories; perform search, impact analysis and data lineage analysis across disparate systems.”
The page goes on to emphasize certain features, like centralized, role-based security controls; automation options; simplified collaboration; and classification tools that go beyond those normally found in enterprise indexing products. To search your impeccably managed data, you could choose the corresponding Search Appliance. That description reads:
“The InfoLibrarian Search Appliance is ready to go, just plug it into your network and setup indexing of files, databases and web sites. Almost instantly, you can begin searching. It’s Fast … Powerful and Easy!
Hundreds of document crawlers are available to automatically index, categorize and keep history of changes over time. Bundled with all the features you expect including a simple search interface with integrated spell checking, advanced searching and configurable results.”
The page notes that you can customize this appliance with either templates or an API. The highlight for me is InfoLibrarian’s vow that this device provides the “most secure search available.” That’s reason enough to look into it. See each product’s page for the full list of its features.
Headquartered in Rochester, New York, InfoLibrarian has been helping organizations in a range of industries to manage and analyze data since 1998. The privately held company strives to provide its clients with the best metadata-integrated solutions on the market.
Cynthia Murrell, February 18, 2014