German Data Protection Authority Challenges Google
November 5, 2014
Well, this is interesting. The Inquirer reports that the Germans are taking a stand against Google’s practice of consolidating users’ Web-wide data in, “Germany Tells Google to Pause for Permission Before Profiling People.” The Hamburg Data Protection Authority has a particular problem with Google’s one-privacy-policy-fits-all-countries stance. For its part, Google continues to assert that the “simpler, more effective services” it can provide by pulling the threads of our online presences are worth the privacy tradeoff. I’m sure the increased ad revenue is just a nice side-effect.
Reporter Dave Neal quotes Johannes Caspar, the Hamburg commissioner of data protection and freedom:
“On the substantial issue of combining user data across services, Google has not been willing to abide to the legally binding rules and refused to substantially improve the user’s controls. So we had to compel Google to do so by an administrative order. Our requirements aim at a fair balance between the concerns of the company and its users. The issue is up to Google now. The company must treat the data of its millions of users in a way that respects their privacy adequately while they use the various services of the company.”
I suppose we’ll see about that. What will be the next step in the struggle between Google and the world’s privacy advocates?
Cynthia Murrell, November 05, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Visit the Digital Vatican
November 5, 2014
The Vatican possesses one of the most extensive historical and religious archives in the world. Researchers are eager to visit the archive and read the documents. The Vatican, however, is selective about visitors due to the documents’ fragility and age. Message To Eagle, a paranormal and historical news blog, reports in the post “Vatican Library Puts 4,000 Manuscripts Available Online For Free” about how the Vatican is going digital.
The Vatican Apostolic Library will digitize 3,000 records and people will be able to view them for free via the Web. The records will be stored in the DigitaVaticana program, which is based on a format designed by NASA to store images and astronomical data. Prior to the new digitization efforts, only 1,100 of the library’s more than 80,000 items were available online. The current digital library is outdated and requires users to manually click through each file in order to view an item image. The new system will include a new search tool and be more graphics heavy.
The Vatican will draw on its own funds as well as crowdfunding to complete the project. They estimate it will take 50 million euros, over fifteen years, and 150 experts to digitize the entire library.
The Vatican writes on the DigitaVaticana website, “Thanks to technology we can preserve the past and bequeath it to the future. The manuscripts will be freely available to everyone on the Vatican Library website and the world’s knowledge will truly become humanity’s heritage.”
Whitney Grace, November 05, 2014
Attivio Highlights Content Intake Issues
November 4, 2014
I read “Digesting Ingestion.” The write up is important because it illustrates how vendors with roots in traditional information retrieval like Attivio are responding to changing market demands.
The article talks about the software required to hook a source like a Web page or a dynamic information source to a content processing and search system. Most vendors provide a number of software widgets to handle frequently encountered file types; for example, Microsoft Word content, HTML Web pages, and Adobe PDF documents. However, less frequently encountered content types may call for a specialized software widget.
Attivio states:
There are a number of multiplicative factors to consider from the perspective of trying to provide a high-quality connector that works across all versions of a source:
· The source software version, including patches, optional modules, and configuration
· Embedded or required 3rd party software (such as a relational database), including version, patches, optional modules and configuration
· Hardware and operating system version, including patches, optional modules, and configuration
· Throughput/capacity of the repository APIs
· Throughput/capacity and ability to operate in parallel.
This is useful information. In a real world example, Attivio reports that a number of other factors can come into play. These range from a lack of appropriate computing resources to corrupt data that connectors send to the exception folder and, my favorite, Big Data.
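For readers who want to picture the connector idea, here is a minimal sketch in Python. It is not Attivio's code. The function names, the extension-to-connector map, and the in-memory "exception folder" list are all assumptions made for illustration. The sketch routes a document to a connector by file type and parks anything unsupported or corrupt in the exception list.

```python
from pathlib import Path
from typing import Optional

# Hypothetical connectors: each turns raw bytes into plain text.
def read_html(data: bytes) -> str:
    return data.decode("utf-8")

def read_text(data: bytes) -> str:
    return data.decode("utf-8")

# Assumed map of file extension -> connector; real systems ship many more.
CONNECTORS = {".html": read_html, ".txt": read_text}

def ingest(name: str, data: bytes, exceptions: list) -> Optional[str]:
    """Route a document to its connector; park failures in `exceptions`."""
    connector = CONNECTORS.get(Path(name).suffix.lower())
    if connector is None:
        # No widget for this content type: a specialized connector is needed.
        exceptions.append((name, "no connector for this content type"))
        return None
    try:
        return connector(data)
    except UnicodeDecodeError as err:
        # Stand-in for the "exception folder" that collects corrupt data.
        exceptions.append((name, f"corrupt data: {err.reason}"))
        return None
```

Even in this toy form, every new source type or failure mode adds a branch, which hints at why the multiplicative factors in Attivio's list get expensive quickly.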
Attivio is to be credited for identifying these issues. Search-centric vendors have to provide solutions to these challenges. I would point out that there are a number of companies that have leapfrogged search-centric approaches to high volume content intake.
These new players, not the well known companies providing search solutions, are the next generation in information access solutions. Watch for more information about automated collection and analysis of Internet accessible information and the firms redefining information access.
Stephen E Arnold, November 4, 2014
Connotate: Automated Data Extraction That Seems to Work Like a Traditional Alert
November 4, 2014
I found the write up “Using Automated Data Extraction to Find Out Who Makes How Much and Where They Make It” suggestive of what search systems will have to do to survive. The blog post presents information about Connotate’s automation functions.
I learned that Connotate has a client interested in gathering information about salaries. The write up reported:
They’re [the client] trying to scale up and found they could look into salaries and titles only in downtimes, and that wasn’t very often. In fact, they’ve been able to go to only a couple of websites and get information for just two job titles in two countries. But their plans call for learning about hundreds, if not thousands, of job titles across 75 countries. Since they were doing this manually, and only when time permitted, getting to where they needed to be was almost impossible.
The shift to automation as a key feature of information access is important. However, note that the client had a known problem and knew what information was required. Connotate then performed a standing query on accessible content and provided outputs to the client.
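A standing query can be sketched in a few lines of Python. This is an illustration only, not Connotate's method. The class name, the term-matching rule, and the sample documents are assumptions. The saved query is re-run against each new batch of content and reports only documents it has not flagged before.

```python
from dataclasses import dataclass, field

@dataclass
class StandingQuery:
    """A saved query that is re-run against every new batch of content."""
    terms: set
    seen: set = field(default_factory=set)

    def run(self, batch: dict) -> list:
        """Return ids of unseen documents that contain every query term."""
        hits = []
        for doc_id, text in batch.items():
            words = set(text.lower().split())
            if doc_id not in self.seen and self.terms <= words:
                hits.append(doc_id)
                self.seen.add(doc_id)
        return hits

# Usage: the client already knows the terms it cares about.
query = StandingQuery({"salary", "engineer"})
first = query.run({"d1": "Engineer salary survey", "d2": "weather report"})
second = query.run({"d1": "Engineer salary survey",
                    "d3": "salary data for engineer roles"})
```

Note that the terms must be supplied up front. The sketch only works because the client knows what it is looking for.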
However, what about clients who do not know what information is germane to their business? How can automation that mimics knowing what to look for assist with pinpointing unknowns?
Search vendors will have to shift into a different development mode in order to provide services that deal with high volatility and unknowns in today’s business climate.
Stephen Arnold, November 4, 2014
Google Road Show: Annoying the Audience?
November 4, 2014
I wonder if the European officials are finding Google’s methods annoying. I read “Google’s ‘Right to Be Forgotten’ Roadshow Is Just a ‘Distraction’ – EU Digital Rights Group.”
I found this comment interesting:
“Google is in the perfect position to drive much needed change in this area, but the truth is that Google doesn’t want to restrict its own freedom to maneuver. It doesn’t want more regulation and misrepresenting this whole ‘right to be forgotten’ ruling is a way to distract from that,” claimed McNamee [EDRi executive director Joe McNamee].
The Google approach was quirky and charming to some. No more, it seems.
Stephen E Arnold, November 4, 2014
Google Scholar Makes Caselaw Collection Free
November 4, 2014
Google Scholar is Google’s answer to an academic database. While it includes many scholarly articles and citations, access is often denied to the articles because they require a subscription fee or they are locked down in another way. It sucks for struggling researchers who do not have access to a public or university library. The official Google Scholar Blog details in the post, “Caselaw Is Set Free, What Next?” how paralegals, law students, and firms now have free access to loads of legal information. They rejoiced in finally having access to information and it made the law world realize the importance of legal information.
While the free legal information makes it easier to research cases, it opens another can of worms:
“We need to hugely increase the amount of freely-available material that explains the law. And we need to — in ways both trivial, and not — make it possible for people to find the laws that affect them using things they already know.”
Connections between information are also required to help people navigate the legal collection. The article provides an example about researching Tylenol, but because it is a brand name the researcher needs to know about acetaminophen and related connections. One way to solve the problem is to follow the model of the categorization system science librarians use for agriculture. All this information is available, but it is still difficult to access.
Will Google launch a project that attempts to connect information collections? Would Google try to monetize this idea or continue offering free information?
Whitney Grace, November 04, 2014
X1 Social Discovery Posited as Solution to Social Media Discovery Problem
November 4, 2014
The article titled “Why Printing Social Media Pages is a Bad Idea and May Violate an Attorney’s Duty of Competence” on the X1 Discovery blog delves into the waste incurred by counsel when they take the time to print social media pages. The example used is the recent case of Stallings v. City of Johnston, a wrongful termination case wherein Jayne Stallings’s legal team spent a full week printing out some 500 pages of Facebook content. The article explains,
“This exercise was obviously costly. A week of paralegal and lawyer time could easily run $25,000 and no client should pay anywhere near that amount for a task that, with best practices technology, would require minutes instead of days to perform. But the high cost is only the tip of the iceberg…Many legal commentators note that the duty of competence arguably requires lawyers to conduct online investigations of opposing parties, key witnesses, jurors, including looking at social media.”
The article goes on to question the reliability of print screen images, which are still time-consuming. Screen shot images have not been allowed in several courts, but the answer proposed by the article is X1 Social Discovery. This is a single interface designed to address social media content from the leading sites while holding on to metadata and allowing for search. They also note the price is certainly preferable to that incurred in the Stallings case.
Chelsea Kerwin, November 04, 2014
SharePoint Does Not Always Increase Collaboration
November 4, 2014
In the quest for greater collaboration, some organizations have an “if you build it, they will come” mentality. But SharePoint is not a field of dreams and many organizations are finding that simply adding the infrastructure is not enough. This idea is covered in the No Jitter article, “SharePoint = Collaboration? Not Always.”
The author gives many reasons for SharePoint’s inability to create collaboration:
“For one, not all employees will mesh well in the collaborative environment. Two, you need to understand how employees work before picking a portal. And three, simply making SharePoint a place to put documents for the sake of sharing and granting user permissions doesn’t ensure that collaboration will improve.”
The moral of the story is that software can only do so much, and it only really works at its capacity when an organization does the hard work of introspection. Stephen E. Arnold has committed his life’s work to following search, including SharePoint. He has a lot of great insight on enterprise software and reports many of his findings on ArnoldIT.com. SharePoint end users and managers alike will benefit from keeping a close eye on his SharePoint feed, featuring the latest tips, tricks, and news.
Emily Rae Aldridge, November 4, 2014
Another Amazing Interface from Archive.org
November 3, 2014
The Internet Archive has a design motif: A postage stamp album. You can see the Internet Arcade implementation at https://archive.org/details/internetarcade. I can do a screen shot of a very long screen, but I want to show you a snippet of the postage-stamp or card design motif. The interface presents about 890 hot links in the form of postage stamps pasted in a 1950s-style album.
Click a link and you will be able to play an arcade game in your browser. Performance can be interesting. The white rectangles in the screenshot indicate that a graphic did not render. I grabbed this image after a period of 10 minutes. Rendering was leisurely. I think the horsepower for the system was munching hay.
There is a search box. A search for “anteater” returned a hit to the arcade game and to other content about anteaters, cockroaches, pest control, and other related concepts. Well, related to anteaters, not to the arcade game.
Stephen E Arnold, November 3, 2014
Desperate for Traffic? Paid Links Work, Just Do Not Get Caught
November 3, 2014
I read “Should Hotels.com Get a Google Slap for Soliciting Paid Links From Bloggers?” The main idea is that Google does not like this practice. However, my reading of the write up took a different turn.
First, the write up makes clear that paid links do work. So if you are desperate for traffic, the trick is to finesse the watchful eye of Mother Google.
Second, perhaps Google should formalize paid links and charge.
With revenue a growing concern for a very large one-trick pony, I can envision this action as soon as Google figures out how to monetize YouTube, News, etc.
Stephen E Arnold, November 4, 2014