September 22, 2014
The Internet makes it easier to access information, including documents from the government. While accessing government documents might cost a few cents, it is amazing that the information can be accessed within a few mouse clicks. BoingBoing, run by the infamous Cory Doctorow, notes that five important US courts are removing their documents from the Internet in “As Office Of US Courts Withdraws Records For Five Top Benches, Can We Make Them Open?”
The court documents are housed on the PACER system, most notable for charging users ten cents a page to access information. Doctorow advocates for free information and for stopping governments from spying on their citizens. It is not surprising that he supports reopening these documents, along with the Free Law Project, Internet Archive, and Public.Resource.Org.
The plea reads:
“Our judiciary is based on the idea that we conduct justice public, not in star chambers and smoke-filled back rooms. Our system of justice is based on access to the workings of our courts, and when you hide those workings behind a pay wall, you have imposed a poll tax on access to justice. Aaron [Swartz] and many others believed very deeply in this principle and we will continue to fight for access to justice, equal protection, and due process. These are not radical ideas and the Administrative Office of the U.S. Courts should join us in our commitment.”
Swartz is known for working against Internet censorship bills, so joining Doctorow and the others will get the right backers to make these documents available again. You can fight city hall and win, especially if you are a technology enthusiast with legal aid.
September 20, 2014
Forget running queries on Yandex.ru if Russia disconnects from the Internet. Sure, there may be workarounds, but these might invite some additional scrutiny. Why am I suggesting that some Russian content becomes unsearchable? Well, I believed the story “Russia to Be Disconnected from the Internet.” Isn’t Pravda a go-to source for accurate, objective information?
The story asserts:
This is not a question of disconnecting Russia from the international network, yet, Russian operators will need to set up their equipment in a way to be able to disconnect the Russian Internet from the global network quickly in case of emergency, the newspaper wrote. As for the state of emergency, it goes about both military actions and large-scale riots in the country. In addition, the government reportedly discusses a possibility to empower the state with the function to administer domains. Currently this is a function of a public organization – the Coordination Center for the National Domain of the Internet. The purpose of the possible measure is not to isolate Russia from the outside world, but to protect the country, should the USA, for example, decide to disconnect Russia from the system of IP-addresses. It will be possible to avoid this threat, if Russia has a local regulator to distribute IP-addresses inside the country, rather than the ICANN, controlled by the United States government. This requires operators to set up “mirrors” that will be able to receive user requests and forward them to specific domain names.
Interesting. Who is being kept in the information closet? I suppose it depends on one’s point of view. Need an update for Sphinx Search? There will be a solution because some folks will plan ahead.
Stephen E Arnold, September 20, 2014
September 15, 2014
Say, here’s a thought: After spending billions for big-data software, federal managers are being advised to do their research before investing in solutions. We learn about this nugget of wisdom from Executive Gov in their piece, “Report: Fed Managers Should Ask Data Questions, Determine Quality/Impact Before Investing in Tech.” Writer Abba Forrester sums up the Federal Times report:
“Rutrell Yasin writes that the above managers should follow three steps as they seek to compress the high volume of data their agencies encounter in daily tasks and to derive value from them. According to Shawn Kingsberry, chief information officer for the Recovery Accountability and Transparency Board, federal managers should first determine the questions they need to ask of data then create a profile for the customer or target audience.
“Finally, they should consider the potential impact of the data, the insights and resulting technology investments on the agency.”
For any managers new to data management, the article notes they should choose a platform that includes data analysis tools and compiles data from multiple sources into one repository. It also advises agencies to employ a dedicated chief data officer and data scientists/architects. Good suggestions, all. Apparently, agencies need to be told that a cursory or haphazard approach to data is almost certain to require more time, effort, and expense down the line.
Cynthia Murrell, September 15, 2014
August 26, 2014
I find the readers who send me links to the UK Daily Mail stories helpful. Are these referrers easily fooled?
The story in question has a Google friendly headline:
“‘It’s all been a big lie!’ Obama administration lawyer now admits ‘missing’ Lois Lerner emails WERE backed up but claims it’s too hard to search for them”
The US government is a busy beaver when it comes to search. You can explore USA.gov at your leisure or seek information on myriad dot Gov Web sites without my inputs.
Here’s a passage from the write up. You determine if it is on the money:
‘The Department of Justice attorney told the Judicial Watch attorney on Friday,’ Fitton said during a Monday afternoon Fox News broadcast, ‘that it turns out the federal government backs up all computer records in case something terrible happens in Washington and there is a catastrophe, so the government can continue operating.’ The catch, he added, is that the DOJ attorney also claimed ‘it would be too hard to go and get Lois Lerner’s emails from that backup system.’
This search and retrieval stuff seems to be difficult. Perhaps these folks should turn to a real expert like Dave Schubmehl, the Arnold surfer for real insight?
Stephen E Arnold, August
August 26, 2014
I wanted to document a report that ICREACH exists. For information, see The Intercept’s report. No further comment from Beyond Search.
Stephen E Arnold, August 26, 2014
August 1, 2014
I saw a discussion thread describing a proposed action to allow Google to become the National Technical Information Service. Although I served on the Board of NTIS years ago, I don’t think too much about the operation. Apparently Google does. Apparently there are some folks who ignore the repository aspect of NTIS. Google is search, and pretty darned good, the argument goes.
The idea is presented in S 2206, but the officials elected by the people are occupied with a number of weighty issues.
I did note this item about the efficacy of US government management. Take a look at “Poorly Managed HealthCare.gov Construction Cost $840 Million, Watchdog Finds.” A billion here and a billion there may not be a big deal as the economy improves according to some pundits.
What Google does not do, perhaps USA.gov will? What about the Library of Congress, various government document repositories, and, of course, the funding entities themselves?
Let Google do it? Why not?
Stephen E Arnold, August 4, 2014
July 15, 2014
I learned about the Web site Hidden from Google. You can check out the service and maybe submit some results that have disappeared. You may not know if the deletion or hiding of the document is a result of the European Right to Be Forgotten action, but if content disappears, this site could be a useful checkpoint.
Here’s what the service looks like as of 9:21 am Eastern on July 15, 2014.
According to the Web site:
The purpose of this site is to list all links which are being censored by search engines due to the recent ruling of “Right to be forgotten” in the EU. This list is a way of archiving the actions of censorship on the Internet. It is up to the reader to decide whether our liberties are being upheld or violated by the recent rulings by the EU.
I noticed that dear old BBC appeared in the list, a handful of media superstars, and some Web sites unknown to me. The “unknown” censored search term is intriguing, but I was not too keen on poking around when I was not sure what I was seeking. Perhaps one of the fancy predictive search engines can provide the missing information or not.
When I clicked on the “source” link sometimes I got a story that seemed germane; for example, http://bbc.in/1xhjKyK linked to one of those tiresome banker misdeed stories. Others pointed to stories that did not seem negative; for example, a Guardian article (http://bit.ly/1jukI7T) that redirected to a story in Entrepreneur Magazine. Teething pains, I presume, or my own search ineptness.
I did some clicking around and concluded that the service is interesting but lacks in-depth content. I looked for references to the US health care Web sites. I am interested in tracking online access to RFPs, RFQs, and agreements with vendors. These contracts are fascinating because the contractors extend the investigative capabilities of certain US law enforcement entities. Since I first researched the RAC, MIC, and ZPIC contractors, among others, I have noticed that content has become increasingly difficult to find. Content I could pinpoint in 2009 and 2010 now eludes me. Of course, I may be the problem. There could be latency issues when spiders come crawling. There can be churn among the contractors maintaining Web sites. There can be many other issues, including a 21st century version of Adam Smith’s invisible hand. The paw might be connected to an outfit like Xerox or some other company providing services to these programs.
First, if the service depends on crowdsourcing, I am not sure how many of today’s expert searchers will know when a document has gone missing. Unless I had prior knowledge of a Medicare Integrity Contractor statement of work, how would I know I could not find it? Is this a flaw the site will be able to work around?
Second, I am not sure the folks who filled out Google’s form and sent in their proof want an archive of information that was to go into the waste basket. Is there some action a forgotten person will take when he or she learns he or she is remembered?
Third, the idea is a good one. What happens when Google makes it uncomfortable to provide access to data that Google has removed? Maybe Mother Google is toothless and addled with its newfound interest in Hollywood and fashionable Google Glass gizmos. On the other hand, Google has lots of attorneys in trailers not too far from where the engineers work.
Stephen E Arnold, July 15, 2014
July 8, 2014
The article on FlowingData titled How to Make Government Data Sites Better uses the Centers for Disease Control website to illustrate measures the government should take to make its data more accessible and manageable. The first suggestion is to provide files in a useable format. By avoiding PDFs and providing CSV files (or even raw data), the government puts the user in a much better position to work with the data. Another suggestion is simply losing or simplifying the multipart form that makes search nearly impossible. The author also proposes clearer and more consistent annotation, using the following scenario to illustrate the point:
“The CDC data subdomain makes use of the Socrata Open Data API,… It’s weekly data that has been updated regularly for the past few months. There’s an RSS feed. There’s an API. There’s a lot to like… There’s also a lot of variables without much annotation or metadata … When you share data, tell people where the data is from, the methodology behind it, and how we should interpret it. At the very least, include a link to a report in the vicinity of the dataset.”
Overall, the author makes many salient points about transparency, consistency and clutter. But there is an assumption in the article that the government actually desires to make data sites better, which may be the larger question. If no one implements these ideas, perhaps that will be answer enough.
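To make the CSV-versus-PDF point concrete, here is a minimal Python sketch of what a user can do in a few lines once data arrives as plain CSV. The column names and figures are invented for illustration and are not taken from any CDC dataset; the sketch assumes a simple file with a header row.

```python
import csv
import io

# Hypothetical sample: made-up column names and values, not real CDC data.
SAMPLE_CSV = """week,region,cases
2014-01,Northeast,12
2014-01,South,30
2014-02,Northeast,15
2014-02,South,28
"""


def total_by_region(csv_text):
    """Sum the 'cases' column for each region in a CSV string."""
    totals = {}
    # DictReader uses the header row to name each field.
    for row in csv.DictReader(io.StringIO(csv_text)):
        region = row["region"]
        totals[region] = totals.get(region, 0) + int(row["cases"])
    return totals


if __name__ == "__main__":
    print(total_by_region(SAMPLE_CSV))
```

The same tally from a PDF table would first require extracting and cleaning the text, which is the extra work the FlowingData author wants agencies to spare their users.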
Chelsea Kerwin, July 08, 2014
May 15, 2014
The article titled There’s a ‘Let Me Google That For You’ Bill on Talking Points Memo relates the substance of a bipartisan bill (sponsored by Tom Coburn and Claire McCaskill). The bill’s purpose is to save the taxpayer money by resorting to Google and eliminating the National Technical Information Service (NTIS). The article states,
“The bill is meant to cut down on “the collection and distribution of government information” by prioritizing using Google over spending money to obtain information from the National Technical Information Service (NTIS). NTIS, run by the Department of Commerce, is a repository of 3 million scientific, technical, engineering, and business texts. The bill would abolish the NTIS and move essential functions of the agency to other agencies like the National Archives.”
If the bill’s name sounds familiar, you have probably heard of the website it is named after, which redirects you to Google. The bill is put forward to prevent waste by federal agencies that pay for government documents when those documents are available online free of charge. Sounds like a no-brainer, especially since NTIS was founded in 1950, decades before the Internet was even a possibility. You can read the full bill here.
Chelsea Kerwin, May 15, 2014
April 28, 2014
I read “New RTI Search Engine Makes the Task Tougher.” A number of government sites have made changes that seem to make finding information more difficult. In some cases, locating information may be almost impossible. When I lived in Washington, DC, as a grade school student, I remember my father stopping at a government agency and walking in to obtain some information. I am not sure how my father’s approach would be received today.
In the Pune Mirror article, I noted this passage about India’s Right to Information finding system:
a new search engine has been put in place that makes it mandatory for visitors to know the specific date, topic, category and sub-category in order to track a particular circular. Also, information like mode of payment for RTI fees, circulars, advertisements and office memorandums, that were up front as per their date of issuance from the year 2005, have gone missing.
In my experience, most users are not able to provide sufficiently narrow terms or provide key details about a needed item of information. As a result, it is now trivially easy for a governmental entity to drop an old-school photographer’s cloak over some information. I noted this comment in the article:
“With the new system in place, you need to know the exact date, topic, category and sub category in order to find the circular. Considering the level of literacy in this country, who will know all details?” he demanded. “We are all stake-holders and they should have asked before making these changes. All political parties have opposed the RTI Act.”
The article points to an opinion that the new Indian search system is designed to “harass” users. I don’t agree. More commercial and governmental entities are fearful of user access to some information.
Is the use of the word “transparency” a signal that finding information is not in the cards? For me, I am not too concerned. I have developed a turtle-like approach to these “retrieval enhancements.” I no longer look for information online as often as I did when I was but a callow lad.
I am pulling my head in my shell now. There. That’s better. Predictive search delivers pizza and sports scores. What more does a modern person require?
Stephen E Arnold, April 28, 2014