National Library of Norway Makes Digital Copies to Share with Norway
December 17, 2013
The article titled Norway is Digitizing All Its Books and Making Them Free to Read on The Verge explains the effort by the National Library of Norway to make each and every book searchable and readable online for people with Norway IP addresses. The effort extends even to the oldest texts in the collection, which date back to the Middle Ages. The article states,
“It’s similar to the mass digitization efforts in the UK and Finland, but Norway has taken the extra step of making agreements with many publishers to allow anyone with a Norway IP address to access copyrighted material. The library owns equipment for scanning and text structure analysis of the books. It’s also adding metadata and storing the files in a database for easy retrieval.”
The project began in 2006, and librarians have estimated that digitizing the entire collection will take between 20 and 30 years. It is questionable whether this online library will affront publishers, but none are consulted in the article. Many of the texts, like public records and historical documents, would no longer carry copyrights, but the library also contains content of all media published. If Google was sued for merely trying to make books searchable online, without even supplying the entire contents of the texts, it seems likely that Norway will face some opposition to its project.
Chelsea Kerwin, December XX, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
A Translation Guide for Scientific Papers
December 17, 2013
I think this is supposed to be funny. I am not sure that Elsevier, ACM, and other real academic publishers will see the humor, however. You may be able to find the original translation table at this Twitter link. No guarantees, so you know that I am indifferent to Google’s rules and regulations for important content. Let me highlight four of the “translations” from @sehnaoui:
- “It has long been known” means “I did not look up the original reference”
- “Correct within an order of magnitude” translates as “Wrong”
- “It is clear that much additional work will be required before a complete understanding of this phenomenon occurs” means “I don’t understand.”
- “It is hoped that this study will stimulate further investigations in this field” connotes “I quit.”
These phrases appear in information retrieval papers and in text mining, predictive analytics, and natural language processing studies.
Stephen E Arnold, December 17, 2013
Patch Fixes Microsoft and SharePoint Vulnerabilities
December 17, 2013
Patches are common with any software, and even more frequent with a suite as large as the one offered by Microsoft. Information Week covers the latest round of patches in its article, “Microsoft Patches Windows, Office, IE, SharePoint.”
The article gives more details on the specific vulnerability to SharePoint:
“Ultimately, the company discovered that the Office 365 desktop client, and in particular Microsoft Word, wasn’t verifying authentication headers by comparing them against SSL certificates. As a result, attackers were able to tell a Word client that they were a SharePoint server, when in reality the server was malicious.”
The latest patch fixes known issues. However, with software as massive and ubiquitous as SharePoint, it is important to stay on top of the latest news and problems. Stephen E. Arnold of ArnoldIT stays on top of the latest in search, including SharePoint. Stay tuned for the latest problems and solutions for your SharePoint deployment.
Emily Rae Aldridge, December 17, 2013
Instant Coffee as Business Intelligence Vendor Demo Metaphor
December 17, 2013
The article Instant Coffee on CooksInfo.com relates the history of instant coffee and the attitudes of consumers toward it. One interesting note in the article is that the aroma of coffee is actually sprayed on the top of the instant coffee jars before they are sealed, to capture the smell that is so vital to the coffee drinker’s experience. This might be a stretch, but isn’t this spray-on odor a metaphor for search and business intelligence vendors’ demos?
A commenter who claims to have worked for Nestle defends the process:
“Do we spray coffee aroma on the powder to make it smell and taste like fresh coffee? Yes. Does every instant coffee manufacturer do this? If they have a half decent product yes. (Kraft are the other big producer) Where does this aroma come from? From the coffee in the jar…
If we sold coffee that had lost all its smell it would not only smell bland but taste bland too. So we capture the aroma, literally the molecules given off when extracting the liquor and taken from the air and stored.”
So this superficial approach to capturing the smell of coffee with coffee aroma in a can is similar to the demos created by vendors for search and business intelligence. Not quite the thing itself, but a little taste to make you thirsty.
Chelsea Kerwin, December 17, 2013
Old Media Scripps Buys Newsy for Digital Video Platform
December 17, 2013
An article on TechCrunch titled Scripps Buys Newsy For $35M to Expand from TV and Newspapers to Digital Video explains the acquisition of Newsy, a media startup that runs a digital video news platform, by Scripps, the TV and newspaper company. A YouTube video heralds the acquisition of Newsy, which should be made final at the beginning of 2014.
The article explains:
“This is about Scripps… buying an asset that gives it a digital video component to complement its existing TV and online services — effectively a bridge between the three areas where it already does business if you also count newspapers. It also gives the company access into an audience that consumes their news (and video) on devices like tablets, and has largely turned away from some of those more traditional platforms where Scripps still bases a majority of its business.”
Old media is on the move (quite old: Scripps was founded in 1879). The company spent $35 million on the acquisition, which it believes will bring it into the next generation of digital audiences. Newsy’s ad-supported videos are presently delivered through web, mobile, tablet and certain TV platforms. The article suggests that the partnerships Newsy had with AOL, Microsoft and Mashable may continue, but the companies haven’t announced their plans yet.
Chelsea Kerwin, December 17, 2013
DataStax and Google Create a Dream Team for Running Cassandra in the Cloud
December 17, 2013
The article titled DataStax Tests Enterprise-Grade Cassandra Database on Google Compute Engine on Yahoo Finance discusses the recent collaboration between Google and DataStax engineers. The results of the test were positive, with expected response times, operational stability and strong disk I/O performance under load.
The article explains the tests of DataStax Enterprise with Google Compute Engine:
“which recently became generally available to all developers. The combination of DataStax Enterprise and Google Compute Engine allows companies to deploy their critical applications on the Google Cloud Platform and grow their data to incredible levels while making sure they remain online at all times. DataStax and Google engineers collaborated to test and validate the scalability, reliability and performance of mission-critical online applications that are built on DataStax Enterprise with Google Compute Engine.”
DataStax boasts over 300 customers for its work powering big data apps. These include Adobe, eBay and Netflix. This collaboration with Google is intended to ease the use of DataStax in the cloud. Senior vice-president David Kloc of DataStax voiced his confidence in the new relationship, calling the platform “more reliable than ever before.” He has no reason to be humble: the NoSQL database that DataStax sells works securely with Apache Cassandra, enterprise search and visual management.
Chelsea Kerwin, December 17, 2013
A Non Search Person Explains Why Search Is a Lost Cause
December 16, 2013
The author of “2013: the Year ‘the Stream’ Crested” is focused on tapping into flows of data. Twitter and real time “Big Data” streams are the subtext for the essay. I liked the analysis. In one 2,500 word write up, the severe weaknesses of enterprise and Web search systems are exposed.
The main point of the article is that “the stream”—that is, flows of information and data—is what people want. The flow is of sufficient volume that making sense of it is difficult. Therefore, an opportunity exists for outfits like The Atlantic to provide curation, perspective, and editorial filtering. The write up’s code for this higher-value type of content process is “the stock.”
The article asserts:
This is the strange circumstance that obtained in 2013, given the volume of the stream. Regular Internet users only had three options: 1) be overwhelmed 2) hire a computer to deploy its logic to help sort things 3) get out of the water.
The take away for me is that the article makes clear that search and retrieval just don’t work. Some “new” is needed. Perhaps this frustration with search is the trigger behind the interest in “artificial intelligence” and “machine learning”? Predictive analytics may have a shot at solving the problem of finding and identifying needed information, but from what I have seen, there is a lot of talk about fancy math and little evidence that it works at low cost in a manner that makes sense to the average person. Data scientists are not a dime a dozen. Average folks are.
Will the search and content processing vendors step forward and provide concrete facts that show a particular system can solve a Big Data problem for Everyman and Everywoman? We know Google is shifting to an approach to search that yields revenue. Money, not precision and recall, is increasingly important. The search and content vendors who toss around the word “all” have not been able to deliver unless the content corpus is tightly defined and constrained.
Isn’t it obvious that processing infinite flows and changes to “old” content is likely to cost a lot of money? Google, Bing, and Yandex search are not particularly “good.” Each is becoming a system designed to support other functions. In fact, looking for information that is only five or six years “old” is an exercise in frustration. Where has that document “gone”? What other data are not in the index? The vendors are not talking.
In the enterprise, the problem is almost as hopeless. Vendors invent new words to describe a function that seems to convey high value. Do you remember this catchphrase: “One step to ROI”? How do you think that company performed? The founders were able to sell the company and some of the technology lives on today, but the limitations of the system remain painfully evident.
Search and retrieval is complex, expensive to implement in an effective manner, and stuck in a rut. Giving away a search system seems to reduce costs? But are license fees the major expense? Embracing fancy math seems to deliver high value answers? But are the outputs accurate? Users just assume these systems work.
Kudos to The Atlantic for helping to make clear that in today’s data world, something new is needed. Changing the words used to describe such out of favor functions as “editorial policy,” controlled terms, scheduled updates, and the like is more popular than innovation.
Stephen E Arnold, December 16, 2013
Big Data: Is Grilling Better with Math?
December 16, 2013
Is there a connection between Big Data and grilling? Is there a connection between Big Data and your business?
I read “Big Data Beyond Business Intelligence: Rise Of The MBAs.” The write up is chock full of statements about large data sets and the numerical recipes required to tame them. But none of the article’s surprising comments matches one point I noticed.
Here’s the quote:
Software automation can’t improve without reorganizing a company around its data. Consider it organizational self-reflection, learning from every interaction humans have with work-related machines. Collaborative, social software is at the heart of this interaction. Software must find innovative ways to interface data with employees, visualization being the most promising form of data democratization.
I will be the first to admit that the economic revolution has left some businesses reeling, particularly in rural Kentucky. Other parts of the country are, according to some pundits, bursting with health.
Is a business reorganization better with Big Data?
Will Big Data deliver better grilled meat? Buy a copy of this book by Lilly and Gibson and see if there are ways to reorganize the business of grilling around self reflection. If Big Data cannot deliver a sure-fire winning steak, will Big Data deliver for other businesses?
But for the business that is working hard to make sales, meet payroll, and serve its customers, Big Data as a concept is one facet of senior managers’ work. Information is important to a business. The idea that more information will contribute to better decisions is one of the buttons that marketers enjoy mashing. Software is useful, but it is by itself not a panacea. Software can sink a business as well as float it.
However, figuring out the nuances buried within Big Data, a term that is invoked, not defined, is difficult. The rise of the data scientist is a reminder that having volumes of data to review requires skills many do not possess. Data integrity is one issue. Another is the selection of mathematical tools to use. Then there is the challenge of configuring the procedures to deliver outputs that make sense.
Putting SharePoint in the Palm of Your Hand
December 16, 2013
Technology is moving toward mobile at a rapid rate. It comes as no surprise that enterprise technology is expected to keep up with the trend. And while major players like SharePoint are more mobile-friendly than before, they are still playing catch-up with mobile-born applications and software. GCN covers the latest in SharePoint mobile in its article, “How to Put SharePoint in the Palm of your Hand.”
The article begins:
“It is only logical that users would want access to SharePoint via their mobile devices. So how do you put an enterprise platform such as SharePoint, literally, in the hands of users? . . . SharePoint’s Mobile Browser View checks if the user’s mobile browser supports HTML5. If it does, then a contemporary mobile view is shown. If it does not, then a text-based view is shown. For more complex sites, developers can use SharePoint’s device channel feature to create a single site, but map the content to use different master pages and style sheets that are specific to a device or group of devices.”
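The logic in that quote can be roughed out in a few lines of Python. This is only a sketch of the decision flow GCN describes, not SharePoint code: the function names, channel names, user agent substrings, and master page file names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeviceChannel:
    """Hypothetical stand-in for a device channel: a name, the user agent
    substrings that select it, and the master page it maps to."""
    name: str
    user_agent_substrings: tuple
    master_page: str


# Assumed example channels, checked in order; the first match wins.
CHANNELS = [
    DeviceChannel("tablet", ("iPad",), "tablet.master"),
    DeviceChannel("phone", ("iPhone", "Windows Phone"), "phone.master"),
]
DEFAULT_MASTER_PAGE = "default.master"


def pick_mobile_view(supports_html5: bool) -> str:
    """Contemporary view for HTML5-capable browsers, text-based view otherwise."""
    return "contemporary" if supports_html5 else "text-based"


def pick_master_page(user_agent: str) -> str:
    """Return the master page of the first channel matching the user agent."""
    for channel in CHANNELS:
        if any(s in user_agent for s in channel.user_agent_substrings):
            return channel.master_page
    return DEFAULT_MASTER_PAGE
```

The point of the second function is the one the article makes: a single site, with the device deciding which master page and style sheets are applied.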
Stephen E. Arnold of ArnoldIT is a longtime leader in search. He frequently covers SharePoint and helps users stay up to date on the latest in all things search, including enterprise. In much of his coverage, it is clear that SharePoint is improving in mobile, but still lags behind.
Emily Rae Aldridge, December 16, 2013
Goldman Sachs Web Conference Leaves Out Search Vendors
December 16, 2013
What tech companies does the financial sector think are on top right now? TechCrunch discussed invitees ahead of the recent Goldman Sachs Private Internet Company Conference in Las Vegas in “Here Are the Hottest Companies in Tech Right Now, According to Goldman Sachs.” Reporter Colleen Taylor reproduces for us the conference schedule, which apparently should have been kept on the down-low but which TechCrunch got hold of somehow.
She writes:
“The Goldman Sachs conference for private web firms is one of the most high-end and hush-hush events in the tech world. It’s essentially like the Hackers Conference or dinners at Sheryl Sandberg’s house or Fight Club, except for tech executives who are likely to soon go through an IPO or big M&A deal. If you’re on the invite list, you’re in pretty good company — and the first rule is that you don’t talk about it to others.
[…] It bears mention that companies attending this conference have not necessarily engaged in an exclusive relationship with Goldman to manage their potential upcoming IPOs or M&A deals. In fact, most of them are free agents, fielding offers from any number of firms.”
Taylor points out a few notable absences, like Square, Dropbox, and Box. We, however, noticed something different: not a single search company is represented. Well, humph.
Cynthia Murrell, December 16, 2013