Hidden from Google: Interesting but Thin

July 15, 2014

I learned about the Web site Hidden from Google. You can check out the service and maybe submit some results that have disappeared. You may not know if the deletion or hiding of the document is a result of the European Right to Be Forgotten action, but if content disappears, this site could be a useful checkpoint.

Here’s what the service looks like as of 9:21 am Eastern on July 15, 2014.

[Image: screenshot of the Hidden from Google site]

According to the Web site:

The purpose of this site is to list all links which are being censored by search engines due to the recent ruling of “Right to be forgotten” in the EU. This list is a way of archiving the actions of censorship on the Internet. It is up to the reader to decide whether our liberties are being upheld or violated by the recent rulings by the EU.

I noticed that dear old BBC appeared in the list, along with a handful of media superstars and some Web sites unknown to me. The “unknown” censored search term is intriguing, but I was not too keen on poking around when I was not sure what I was seeking. Perhaps one of the fancy predictive search engines can provide the missing information. Or not.

When I clicked on the “source” link, sometimes I got a story that seemed germane; for example, http://bbc.in/1xhjKyK linked to one of those tiresome banker misdeed stories. Others pointed to stories that did not seem negative; for example, a Guardian article that redirected to a story in Entrepreneur Magazine (http://bit.ly/1jukI7T). Teething pains, I presume, or my own search ineptness.

I did some clicking around and concluded that the service is interesting but lacks in-depth content. I looked for references to the US health care Web sites. I am interested in tracking online access to RFPs, RFQs, and agreements with vendors. These contracts are fascinating because the contractors extend the investigative capabilities of certain US law enforcement entities. Since I first researched the RAC, MIC, and ZPIC contractors, among others, I have noticed that content has become increasingly difficult to find. Content I could pinpoint in 2009 and 2010 now eludes me. Of course, I may be the problem. There could be latency issues when spiders come crawling. There can be churn among the contractors maintaining Web sites. There can be many other issues, including a 21st century version of Adam Smith’s invisible hand. The paw might be connected to an outfit like Xerox or some other company providing services to these programs.

Several questions:

First, if the service depends on crowdsourcing, I am not sure how many of today’s expert searchers will know when a document has gone missing. Unless I had prior knowledge of a Medicare Integrity Contractor statement of work, how would I know I could not find it? Is this a flaw the site will be able to work around?

Second, I am not sure the folks who filled out Google’s form and sent proof of their identity want an archive of information that was to go into the waste basket. Is there some action a forgotten person will take when he or she learns he or she is remembered?

Third, the idea is a good one. What happens when Google makes it uncomfortable to provide access to data that Google has removed? Maybe Mother Google is toothless and addled with its newfound interest in Hollywood and fashionable Google Glass gizmos. On the other hand, Google has lots of attorneys in trailers not too far from where the engineers work.

Stephen E Arnold, July 15, 2014

Steps Offered to Improve Government Data Sites

July 8, 2014

The article on FlowingData titled How to Make Government Data Sites Better uses the Centers for Disease Control website to illustrate measures the government should take to make its data more accessible and manageable. The first suggestion is to provide files in a useable format. By avoiding PDFs and providing CSV files (or even raw data), the site puts the user in a much better position to work with the data. Another suggestion is simply losing or simplifying the multipart form that makes search nearly impossible. The author also proposes clearer and more consistent annotation, using the following scenario to illustrate the point:

“The CDC data subdomain makes use of the Socrata Open Data API,… It’s weekly data that has been updated regularly for the past few months. There’s an RSS feed. There’s an API. There’s a lot to like… There’s also a lot of variables without much annotation or metadata … When you share data, tell people where the data is from, the methodology behind it, and how we should interpret it. At the very least, include a link to a report in the vicinity of the dataset.”
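To see why the file format matters, here is a minimal Python sketch. It is not from the FlowingData article, and it assumes a small made-up CSV; the column names (week, region, cases) and every value are invented for illustration and are not CDC data. The point is only that a CSV file can be read and summarized with a few lines of standard-library code, something a table locked in a PDF does not allow.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
# The column names and values are invented for illustration only.
SAMPLE_CSV = """week,region,cases
2014-W25,Northeast,12
2014-W25,South,30
2014-W26,Northeast,15
2014-W26,South,28
"""


def total_by_region(csv_text):
    """Sum the 'cases' column for each region in a CSV string."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        region = row["region"]
        totals[region] = totals.get(region, 0) + int(row["cases"])
    return totals


print(total_by_region(SAMPLE_CSV))
```

With a real file, the same function would work on the text returned by opening the file and reading it, provided the column names match.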

Overall, the author makes many salient points about transparency, consistency and clutter. But there is an assumption in the article that the government actually desires to make data sites better, which may be the larger question. If no one implements these ideas, perhaps that will be answer enough.

Chelsea Kerwin, July 08, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Bill Suggests Replacing NTIS with Google Search

May 15, 2014

The article titled There’s a ‘Let Me Google That For You’ Bill on Talking Points Memo relates the substance of a bipartisan bill (sponsored by Tom Coburn and Claire McCaskill). The bill’s purpose is to save the taxpayer money by resorting to Google and eliminating the National Technical Information Service (NTIS). The article states,

“The bill is meant to cut down on “the collection and distribution of government information” by prioritizing using Google over spending money to obtain information from the National Technical Information Service (NTIS). NTIS, run by the Department of Commerce, is a repository of 3 million scientific, technical, engineering, and business texts. The bill would abolish the NTIS and move essential functions of the agency to other agencies like the National Archives.”

If the bill’s name sounds familiar, you have probably heard of the website it is named after, Let Me Google That For You, which redirects you to Google. The bill is put forward to prevent waste by federal agencies that pay to obtain government documents that are available online free of charge. Sounds like a no-brainer, especially since NTIS was founded in 1950, decades before the Internet was even a possibility. You can read the full bill here.

Chelsea Kerwin, May 15, 2014


India: The Future of Search

April 28, 2014

I read “New RTI Search Engine Makes the Task Tougher.” A number of government sites have made changes that seem to make finding information more difficult. In some cases, locating information may be almost impossible. When I lived in Washington, DC, as a grade school student, I remember my father stopping at a government agency and walking in to obtain some information. I am not sure how my father’s approach would be received today.

In the Pune Mirror article, I noted this passage about India’s Right to Information finding system:

a new search engine has been put in place that makes it mandatory for visitors to know the specific date, topic, category and sub-category in order to track a particular circular. Also, information like mode of payment for RTI fees, circulars, advertisements and office memorandums, that were up front as per their date of issuance from the year 2005, have gone missing.

In my experience, most users are not able to provide sufficiently narrow terms or provide key details about a needed item of information. As a result, it is now trivially easy for a governmental entity to drop an old-school photographer’s cloak over some information. I noted this comment in the article:

“With the new system in place, you need to know the exact date, topic, category and sub category in order to find the circular. Considering the level of literacy in this country, who will know all details?” he demanded. “We are all stake-holders and they should have asked before making these changes. All political parties have opposed the RTI Act.”

The article points to an opinion that the new Indian search system is designed to “harass” users. I don’t agree. More commercial and governmental entities are fearful of user access to some information.

Is the use of the word “transparency” a signal that finding information is not in the cards? For me, I am not too concerned. I have developed a turtle-like approach to these “retrieval enhancements.” I no longer look for information online as often as I did when I was but a callow lad.

I am pulling my head into my shell now. There. That’s better. Predictive search delivers pizza and sports scores. What more does a modern person require?

Stephen E Arnold, April 28, 2014

Google Promptly and Quietly Erases Lists of Government Partners

April 21, 2014

A pair of articles at PandoDaily tell an interesting story. First they published a piece titled, “Google Distances Itself from the Pentagon, Stays in Bed with Mercenaries and Intelligence Contractors.” In that article, reporter Yasha Levine reveals that, despite Google’s attempts to dissociate itself from the military-industrial complex after last year’s NSA kerfuffle, the search giant is still working closely with several of those agencies, and their contractors. He writes:

“In some cases — like the company’s dealings with the NSA and its sister agency, the NGA — Google deals with government agencies directly. But in recent years, Google has increasingly taken the role of subcontractor: selling its wares to military and intelligence agencies by partnering with established military contractors. It’s a very deliberate strategy on Google’s part, allowing it to more effectively sink its hooks into the nepotistic, old boy government networks of America’s military-intelligence-industrial complex.

“Over the past decade, Google Federal (as the company’s DC operation is called) has partnered up with old school establishment military contractors like Lockheed Martin, as well as smaller boutique outfits — including one closely connected to the CIA and former mercenary firm, Blackwater.”

Levine goes into detail, and that article is an interesting read. However, it was his follow-up piece, “Google Apparently Scrubs Military Contractor Partner Listing, After Pando Report,” that really caught our attention. This story shares screenshots taken before and after the revelatory article was posted a couple of days earlier. The first shows Google’s Enterprise Government page displaying lists of government partners. The second shows a page in perpetual-load mode. Levine tells us:

“Later [on the day the first article was posted], I noticed a strange thing: The official Google ‘Enterprise Government’ webpage that had listed some of the company’s military contractor partners no longer loaded. The page worked just fine less than a week ago, but now all it shows is some text up top telling government agencies to ditch their dinosaur IT services and get with Google — ‘Help your agency move fast and innovate’! — and then nothing but empty white space….

“I’ve asked several people to access the page from different parts of the United States and they all come back with the same answer: the page framework partially loads, but all the information is missing. It appears to be the only Google Enterprise page that does not load. I’ve looked around, but could not find this missing list of contractors displayed anywhere else on the Google Enterprise website.”

So, was this glitch purposeful? Well, as of this writing, the page is functioning. However, it no longer includes lists of partners, just links to more info for potential customers. Like Levine, I can find no such list elsewhere on the site. (The closest I found is a page where city reps laud Google for use in running local governments — much less controversial.) Good catch, Pando.

Cynthia Murrell, April 21, 2014


Secrecy News Talks Declassification

April 15, 2014

Declassified records are of interest to the public. But there is more to declassification than simply putting the records out there for the public to find. Findability and search also play a role. Secrecy News takes up the topic in its blog entry, “Putting Declassified Records to Good Use.”

The article says:

“The final, climactic step in the declassification of government records is not the formal removal of classification markings or even the transfer of the declassified documents to public archives. The culmination of the declassification process is when the records are finally examined by an interested reader and their contents are absorbed into the body of public knowledge.”

Secrecy News is an FAS project on government secrecy. They provide documentary resources on secrecy, intelligence, and national security. Interested readers can subscribe for regular updates. Secrecy is a hot topic due to the Snowden case, but this blog has been in business for years, and offers a steady flow of information, even if not completely original in scope.

Emily Rae Aldridge, April 15, 2014


Government Tackles Acquisition Inefficiencies

April 6, 2014

Given evidence like the vile backlog on veterans’ benefits and the still-operating paperwork bunker in Pennsylvania, one could be forgiven for suspecting that no one in government is even trying to bring our bureaucracy into this century. You may be surprised to know there is a plan in place for at least part of the problem, as evidenced by the Integrated Award Environment: the Path Forward from the U.S. General Services Administration (GSA). That document, which looks suspiciously like a PowerPoint presentation converted to PDF, outlines the GSA’s recommendations for improving the federal government’s acquisition procedures.

Anyone interested in the details should check out the document, but the list of “our principles” summarizes the organization’s targets:

  • Open (source code, data, APIs)
  • Data as an asset
  • Continuous improvement
  • Effective user experience
  • Measurable transactions
  • Security is foundational
  • Build value over maintaining status quo

The paper expounds on each of these points, defining the implications of each goal, a point or two on maintaining balance, and questions workers should ask themselves going forward. For example, the section on “Open” notes that users must balance the stability of, say, Oracle with the agility of open source solutions and security with openness. For the data-enthused among us, the section on “Data as an asset” reads:

“Accurate, timely, complete, and authoritative”

Implies:

  • Significant effort to manage data quality; implementers must have data-oriented SLAs
  • Change control of the data needs to be transparent
  • Will follow the data->information->knowledge chain

Balance:

  • Our flexibility has to account for the strong change management of our data

Ask ourselves:

  • “How do we ensure that we are providing timely and accurate data?”
  • “How are we enabling decision-making through use of our data?”

So, next time you’re tempted to think our government is doomed to be stuck in the 20th century, remember that some folks within the bureaucracy are on the case. Soon, it may be time for them to party like it’s 1999.

Cynthia Murrell, April 06, 2014


New York Public Library Posts Maps

April 5, 2014

The New York Public Library has a massive collection of beautiful maps, and it is not keeping them locked in an archive. Motherboard reports, “The New York Public Library Releases 20,000 Beautiful High Resolution Maps.”

All of the 20,000 maps are available via open access. What is even more amazing is that the NYPL decided to release the maps under the Creative Commons CC0 1.0 Universal Public Domain Dedication. If you are unfamiliar with this Creative Commons dedication, it means that users are free to download content and do whatever they want with it.

“Combined with its existing historical GIS program, the NYPL wants its users to engage with the maps, and allows them to warp (fitting together based on corresponding anchor points) and overlay the historic maps with modern geoweb services like Google and Open Street Map. Users can export WMS, KML files, and high-quality TIFFs. The historic map appears side by side with the modern maps, and users are invited to mark corresponding points on each, so you can overlay the historic map over the current day’s.”

Overlaying old maps on Google Maps lets users explore the world of the past. It is yet another amazing use of modern technology and makes one wonder what people of yesterday would have thought about exploring their world via a small box.

Whitney Grace, April 5, 2014

Darpa Prods Big Data Experts

March 29, 2014

I read “Darpa Calls for Advanced Big Data Ideas.” If the write up is accurate, Darpa is not on board with the marketing innovations about Big Data, whatever the term means. Darpa wants more. According to the TechRadar story:

According to V3, DARPA director Arati Prabhakar told a briefing on emerging threats with the House Armed Services Committee’s Subcommittee on Intelligence that it is looking to come up with some advanced big data ideas. She said that DARPA is creating a new set of cyber security capabilities that will ensure that networked information is trustworthy.

Addressing “big data” may be easier if those talking about it would define the term and the context in which the phrase is being used. Those who chant “Big Data,” including Darpa, are just empowering the sales people, the self appointed experts, and the failed middle school teachers who write “reports” for mid tier consulting firms.

Stephen E Arnold, March 29, 2014

US Government Content Processing: A Case Study

March 24, 2014

I know that the article “Sinkhole of Bureaucracy” is a single-case example. Nevertheless, the write up tickled my funny bone. With fancy technology, USA.gov, and the hyper modern content processing systems used in many Federal agencies, reality is stranger than science fiction.

This passage snagged my attention:

inside the caverns of an old Pennsylvania limestone mine, there are 600 employees of the Office of Personnel Management. Their task is nothing top-secret. It is to process the retirement papers of the government’s own workers. But that system has a spectacular flaw. It still must be done entirely by hand, and almost entirely on paper.

One of President Obama’s advisors is quoted as describing the manual operation as “that crazy cave.”

And the fix? The article asserts:

That failure imposes costs on federal retirees, who have to wait months for their full benefit checks. And it has imposed costs on the taxpayer: The Obama administration has now made the mine run faster, but mainly by paying for more fingers and feet. The staff working in the mine has increased by at least 200 people in the past five years. And the cost of processing each claim has increased from $82 to $108, as total spending on the retirement system reached $55.8 million.
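For scale, the two per-claim figures in that passage can be turned into a percentage. The short Python check below uses only the $82 and $108 amounts quoted above; the function name is my own, and nothing else in it comes from the article.

```python
def percent_increase(old, new):
    """Return the percentage change from old to new."""
    return (new - old) / old * 100


cost_before = 82   # dollars per claim, as quoted in the article
cost_after = 108   # dollars per claim, as quoted in the article

increase = percent_increase(cost_before, cost_after)
print(f"Per-claim cost rose about {increase:.1f} percent")
```

That works out to an increase of roughly 32 percent in the cost of handling a single claim.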

One of the contractors operating the system is Iron Mountain. You may recall that this outfit has a search system and caught my attention when Iron Mountain sold the quite old Stratify (formerly Purple Yogi) automatic indexing system to Autonomy.

My observations:

  1. Many systems have a human component that managers ignore, do not know about, or lack the management horsepower to address. When search systems or content processing systems generate floods of red ink, human processes are often the culprit.
  2. The notion that modern technology has permeated organizations is false. The cost friction in many companies is directly related to small decisions that grow like a snowball rolling down a hill. When these processes reach the bottom, the mess is no longer amusing.
  3. Moving significant information from paper to a digital form and then using those data in a meaningful way to answer questions is quite difficult.

Do managers want to tackle these problems? In my experience, keeping up appearances and cost cutting are more important than old fashioned problem solving. In a recent LinkedIn post I pointed out that automatic indexing systems often require human input. Forgetting about those costs produces problems that are expensive to fix. Simple indexing won’t bail out the folks in the cave.

Stephen E Arnold, March 24, 2014

