Big Data on National Security
January 15, 2013
Big Data is all the buzz these days, and its impact on national security, and on security in general, is growing. The security implications are obvious when technologists start talking about extracting information from minute data. On that note, Cloudera is hosting a forum on the national security implications of Big Data on January 30th. The conversation is focused on Apache Hadoop. Read all the details in Bob Gourley’s blog entry, “Are You Architecting Sensemaking Solutions in the National Security Space? Register for 30 Jan Federal Big Data Forum Sponsored by Cloudera.”
Gourley begins:
“Friends at Cloudera are lead sponsors and coordinators of a new Big Data Forum focused on Apache Hadoop. The first, which will be held 30 January 2013 in Columbia Maryland, will be focused on lessons learned of use to the national security community. This is primarily for practitioners and leaders fielding real working Big Data solutions on Apache Hadoop and related technologies.”
The forum would be worth a look for those in this line of work. Many open source vendors, particularly those who deal with Big Data, are trying to address the issue of national security. LucidWorks is another company making an impact on security with its Big Data work. Their partnership with ISS brings their Big Data solutions to the federal government to tackle Special Operations, Counter-Drug, and Counter-Terrorism among others.
Emily Rae Aldridge, January 15, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
The Delicate Sensibilities of Google Play
January 15, 2013
Is Google practicing big-brother protection or is this just a misapplied algorithm? Mashable charges, “Google Play Scan-and-Match Feature Censoring Explicit Lyrics.” Designed as a way for users to get their personal music libraries into the cloud, the new Google Play Music Manager works by analyzing uploaded songs and matching them to licensed songs in its huge database. Some users, though, are complaining that Google has taken it upon itself to swap out songs with explicit lyrics for their sanitized (or “radio edit”) versions.
The video embedded in this post proposes that Google is attempting to shield itself from parental wrath. It also points out that Google Play’s contract hedges:
“Please note that when applying our policies, we may make exceptions. . . These exceptions are of course especially important in the context of music; they are also especially difficult lines to draw. We reserve the right to apply our policies in our sole discretion, and we are always evolving our views on complex issues.”
It is no surprise Google included the “you can’t sue us for capricious behavior” language; they see the insides of courtrooms quite often enough as it is. However, that doesn’t really make these unauthorized swaps any more acceptable, no matter how “complex” the issues. (Is censorship really such a complex issue?)
There is a “Fix Incorrect Match” button that seems to have resolved the issue for some users, but the folks at Mashable do not know whether it is an across-the-board fix. For what it’s worth, both iTunes Match and Amazon Cloud Player have also been said to produce incorrect matches, though not specifically because they were playing language-police.
Cynthia Murrell, January 15, 2013
Bad News Affects Corporate Reputations in the UK
January 15, 2013
Google loses ground in the UK, but at least they are in good company. The Hollywood Reporter reveals, “BBC, Google, Apple, Amazon Drop in Annual U.K. Brand Ranking.” It seems each of these behemoths suffered from “negative buzz” last year, according to YouGov’s BrandIndex. The relevant study polled 2,000 folks each day over the course of the year, asking whether they had heard anything positive or negative about many different brands.
Google’s slide from the previous year was significant, falling from fourth place to tenth. Writer Georg Szalai explains:
“The Guardian said the YouGov researchers cited public anger in the U.K. amid a recent debate over measures by the likes of Google, Amazon and Starbucks to avoid taxes as a key factor hurting the scores of those companies. . . .
“In the case of Google, its brand buzz was also negative amid a change in privacy policies that have sparked a European probe.”
Indeed, we should not be surprised that Google took such a hit in Europe. Is the company concerned with improving public opinion across the Atlantic, or is it more interested in the short-term bottom line?
Apple’s expulsion from the top ten (from sixth place last year) seems the result of its mapping debacle, the disappointment that is the iPhone 5, and legal tangles with Samsung. The slip Amazon saw was less dramatic; it only went from first to third place. The BBC faced a couple of scandals last year, and its overall brand suffered. However, two satellite projects, the BBC iPlayer radio and the BBC.co.uk Web page, both remain in the top ten.
I wonder what 2013 will bring for each of these companies. Will they behave themselves?
Cynthia Murrell, January 15, 2013
SEO Community Jumps to Conclusions About Google and Press Releases
January 15, 2013
Are press releases the red-headed stepchild of Google, or just misunderstood from a lack of complete information? An SEO pro schools his colleagues in Search Engine Journal’s “Get Over Yourself—Matt Cutts did Not Just Kill Another SEO Kitten.” His is a voice of reason in a field that tends to defensively vilify Google’s attempts to serve up only quality content.
The latest dustup began in the Google forums, where one poster asked about press release companies that only push their stories to “legitimate” (quality content) sites. Google’s Matt Cutts (probably unintentionally) stirred things up with his simple statement: “Note: I wouldn’t expect links from press release web sites to benefit your rankings, however.” Hyperbole ensued.
Many in the SEO community took those words to mean that Google will now ignore all links in every press release it encounters, and were quite perturbed. Writer and SEO veteran Alan Bleiweiss takes the alarmists to task, and it is entertaining to read. I’m more interested, though, in his comments on press releases. After acknowledging the wealth of garbage that is now often distributed as “press releases,” he wrote:
“REAL press releases, that communicate TRULY time sensitive newsworthy information, have, and always will be a valuable means of spreading information that deserves to be spread. REAL press releases don’t get written purely for the links. REAL press releases are designed to communicate with legitimate news people. REAL press releases are designed to let others know valid updated information.
“And a well-crafted press release, targeting truly accurate niche recipients can lead to legitimate journalists, bloggers and social media influencers contacting a site’s owners, or doing their own write-up on the subject, and potentially even generating their own links.
“So from a sustainable SEO perspective, press releases are STILL an SEO best practice recommendation. As part of a comprehensive marketing solution that is vital to providing multiple layers of direct and indirect signals for SEO purposes. But ONLY when those releases are executed properly.”
It is good to see such reasonable sentiments from someone in the search engine optimization field. Will Bleiweiss succeed in talking sense into his colleagues?
Cynthia Murrell, January 15, 2013
Dr. Jerry Lucas: Exclusive Interview with TeleStrategies ISS Founder
January 14, 2013
Dr. Jerry Lucas, founder of TeleStrategies, is an expert in digital information and founder of the ISS World series of conferences. “ISS” is shorthand for “intelligence support systems.” The scope of Dr. Lucas’ interests ranges from the technical innards of modern communications systems to the exploding sectors for real time content processing. Analytics, fancy math, and online systems underpin Dr. Lucas’ expertise and form the backbone of the company’s training and conference activities.
What makes Dr. Lucas’ viewpoint of particular value is his deep experience in “lawful interception, criminal investigations, and intelligence gathering.” The perspective of an individual with Dr. Lucas’ professional career offers an important and refreshing alternative to the baloney promulgated by many of the consulting firms explaining online systems.
Dr. Lucas offered a more “internationalized” view of the Big Data trend, which is preoccupying many US marketers and sales professionals. He said:
“Big Data” is an eye catching buzzword that works in the US. But as you go east across the globe, “Big Data” as a buzzword doesn’t get traction in the Middle East, Africa and Asia Pacific Regions if you remove Russia and China. One interesting note is that Russian and Chinese government agencies only buy from vendors based in their countries. The US Intelligence Community (IC) has big data problems because of the obvious massive amount of data gathered that’s now being measured in zettabytes. The data gathered and stored by the US Intelligence Community is growing beyond what typical database software products can handle as well as the tools to capture, store, manage and analyze the data. For the US, Western Europe, Russia and China, “Big Data” is a real problem and not a hyped up buzzword.
Western vendors have been caught in the boundaries between different countries’ requirements. Dr. Lucas observed:
A number of western vendors made a decision because of the negative press attention to abandon the global intelligence gathering market. In the US Congress Representative Chris Smith (R, NJ) sponsored a bill that went nowhere to ban the export of intelligence gathering products period. In France a Bull Group subsidiary, Amesys, legally sold intelligence gathering systems to Libya but received a lot of bad press during Arab Spring. Since Amesys represented only a few percent of Bull Group’s annual revenues, they just sold the division. Amesys is now a UAE company, Advanced Middle East Systems (Ames). My take away here is governments particularly in the Middle East, Africa and Asia have concerns about the long term regional presence of western intelligence gathering vendors who desire to keep a low public profile. For example, choosing not to exhibit at ISS World Programs. The next step by these vendors could be abandoning the regional marketplace and product support.
The desire for federated information access, based on the vendors’ marketing efforts, is high. Dr. Lucas made this comment about the existence of information silos:
Consider the US where you have 16 federal organizations collecting intelligence data plus the oversight of the Office of Director of National Intelligence (ODNI). In addition there are nearly 30,000 local and state police organizations collecting intelligence data as well. Data sharing has been a well identified problem since 9/11. Congress established the ODNI in 2004 and funded the Department of Homeland Security to set up State and Local Data Fusion Centers. To date Congress has not been impressed. DNI James Clapper has come under intelligence gathering fire over Benghazi and the DHS has been criticized in an October Senate report that the $1 Billion spent by DHS on 70 state and local data fusion centers has been an alleged waste of money. The information silo or the information stovepipe problem will not go away quickly in the US for many reasons. Data cannot be shared because one agency doesn’t have the proper security clearances, job security which means “as long as I control access the data I have a job,” and privacy issues, among others.
The full text of the exclusive interview with Dr. Lucas is at http://www.arnoldit.com/search-wizards-speak/telestrategies-2.html, on the ArnoldIT.com subsite “Search Wizards Speak.” Stephen E Arnold interviewed Dr. Lucas on January 10, 2013. An earlier interview with Dr. Lucas, from 2011, is also available.
Worth reading.
Donald Anderson, January 14, 2013
PolySpot Covers Enterprise Big Data Needs with Real Time Insights
January 14, 2013
When trying to identify a definition for big data, many people turn to Gartner’s popular 3 V’s. However, we heard word from Information Week of something that goes above and beyond a simple definition in their article “Big Data 101: New Vendor-Neutral Guide,” which addresses a new handbook for enterprises on big data from the Open Data Center Alliance (ODCA).
Despite the fact that many technologies are poised to address big data and are currently functioning with success, there are still organizations that want more information on the basics of big data. The ODCA has answered their pleas with its new “Big Data Consumer Guide.”
The article informed us:
The consumer guide summarizes how big data platforms can help a variety of industries. Banks, for instance, can correlate data from multiple, unrelated sources to potentially spot credit card fraud. In addition, the guide provides common definitions and lingo that organizations can use when working with big data providers.
This endeavor will help enterprises get a clear picture of the landscape at the time it is published, but new technological solutions in the big data arena pop up all the time. Innovative information delivery in the enterprise will continue to start and end with PolySpot, however.
Megan Feil, January 14, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Open Source Continues to Drive National Standards
January 14, 2013
Cloudant offers a managed database service that enhances search options on top of content management. Last fall Cloudant announced integrated text indexing and search based on Apache Lucene, which brings the power of the leading open source search architecture. Building on its success, Cloudant has now joined the Open Geospatial Consortium in an effort to accelerate innovation and development of location-aware apps and dynamic geo queries. Read the Directions Magazine article, “Cloudant Joins the OGC to Promote Geospatial Standards and Location-Based Applications,” for all of the details.
The article gets to the heart of the announcement:
“Cloudant today announced that it has joined the Open Geospatial Consortium (OGC), an international consortium that serves as a forum for collaboration on the development of geospatial interoperability standards. By joining the OGC, Cloudant aims to integrate geospatial standards into the Cloudant NoSQL database as a service (DBaaS) so Web, mobile and proprietary app developers can more easily introduce new geospatial features and analytics into their applications.”
Cloudant is doing well, but the greater issue is the continuing importance of open source technology on the national stage. Open source is contributing to standards and informing progress in all facets of technology. Another powerful software solution based on Apache Lucene is LucidWorks. They offer LucidWorks Search to satisfy enterprise search needs and LucidWorks Big Data to answer that emerging problem.
Emily Rae Aldridge, January 14, 2013
Latest Desktop Version from dtSearch Available
January 14, 2013
We spotted dtSearch’s latest desktop version, v7.72.8085-Lz0, for sale at Release BB. Will this new release be a splash or a flash?
The product description reads:
“The dtSearch product line can instantly search terabytes of text across a desktop, network, Internet or Intranet site. dtSearch products also serve as tools for publishing, with instant text searching, large document collections to Web sites or portable media. Developers can embed dtSearch’s instant searching and file format support into their own applications.”
A few of the product’s features include a variety of helpful search options, data exports in several formats, and specialized forensic indexing and searching tools. See the company’s official Desktop product page for more details.
Incorporated in 1991, dtSearch began its R&D in 1988. They have since become a major provider of information management software, supplying award-winning solutions to firms in several fields and to numerous government agencies in the areas of defense, law enforcement, and space exploration. The company also makes its products available for incorporation into other commercial applications. dtSearch has distributors worldwide, and is headquartered in Bethesda, Maryland.
Cynthia Murrell, January 14, 2013
Google Pays Hefty Sum to Belgian Newspapers
January 14, 2013
For Google, much rests on the “fair use” of third-party content. It is what allows them to display excerpts from, and to link to, other sites without paying a slew of fees every time a user searches the Web with their search engine. It is no surprise, then, that they would frame their deal with Belgian newspapers as something other than compensation for content. Jeff John Roberts at paidContent examines the issue in “Did Google Pay Belgian Newspapers a $6M Copyright Fee? Sure Looks Like It.”
Newspapers in Belgium have for some time demanded copyright fees for each display of a link to or excerpt from their publications. A recent deal, wherein Google has ponied up about $6.5 million plus legal fees, has ended the dispute. The company is adamant that their payout is in no way compensation for content, but rather payment for advertising with those papers. What choice do they have, really? Legal cases set precedents. Take a moment to imagine the costs if Google had to pay every site that ever appeared on a Google results page. The article observes:
“On its face, this is not a bad deal for Google. Given the anti-American regulatory climate in Europe, the company had a weak hand to play. Paying $6 million to end the Belgian headache may be a good investment, especially as the company can still claim (technically at least) that it still does not pay copyright fees for newspaper excerpts.
“The danger, of course, is that the rest of Europe will soon be beating a path to Google’s door demanding similar payouts. As we’ve noted, France and Germany are already kicking up dust over the copyright issue too (so is Brazil).”
Yes, no matter how Google frames it, the settlement is bound to have repercussions down the road. Roberts, however, points out that Google’s fair-use argument is actually pretty strong. European papers, he says, would do better to place their focus on catching up with North America in the digitization process. Perhaps. It will be interesting to see how this all plays out.
Cynthia Murrell, January 14, 2013
Biotechnology News Reports on Vital Natural Language Processing Developments
January 14, 2013
Several biotechnology companies raced to release new products before the end of 2012, and Bio IT World filled us in on these releases in its recent summary, “December Product and News Briefs.” The summary also describes a few important briefings related to the industry.
In addition to reporting industry news from big players like IBM and announcing job opportunities, the article places most of its attention on new products from Linguamatics, PerkinElmer, Titan Software, SoftGenetics, and Optibrium. These were all launched in the final month of 2012; most notably, Linguamatics has rolled out version 4.0 of its text mining software platform, I2E.
The article discusses how natural language text mining will be opened up to a variety of users with various needs. Continuing on this topic, the author states:
“Regardless of how many disparate data sources need to be mined, I2E now has the power to analyze and extract information and knowledge from all of them simultaneously. Linguamatics I2E can now deal with recognition of novel compounds, which will give informaticians, researchers and patent analysts the power to investigate uncharted areas of innovation.”
Overall, the website offered a nice summary of some new products, with Linguamatics contributing developments worth noting in the land of natural language processing. We shall see what 2013 holds.
Megan Feil, January 14, 2013