Arolsen Archives

May 22, 2019

The online collection of documents from concentration camps has been expanded. The Arolsen Archives (the new name of the International Tracing Service) makes available 13 million documents pertaining to more than two million people, according to the Daily Beast (Newsweek). This is “the world’s most comprehensive archive on the Holocaust’s victims and survivors.” You can explore the collection at this link.

Stephen E Arnold, May 22, 2019

Google: History? Backfiles Do Not Sell Ads

April 29, 2019

We spotted a very interesting article in Tablix: “Google Index Coverage”. We weren’t looking for the article, but it turned up in a list of search results and one of the DarkCyber researchers called it to my attention.

Background: Years ago we did a bit of work for a company engaged in data analysis related to the health and medical sectors. We had to track down the names of the companies that were hired by the US government to do some outsourced fraud investigation. We were able to locate the government statements of work and even some of the documents related to investigations. A couple of years ago, we noticed that our bookmarks to some government documents did not resolve. We checked the Bing index. We tried US government Web sites related to the agencies involved. Nope. The information had disappeared, but in one case we did locate documents on a US government agency’s Web site. The data were “there,” but the data were not in Bing, Exalead, Google, or Yandex. We also checked the recyclers of search results: Startpage, the DuckDuck thing, and MillionShort.

We had other information about content disappearing from sites like the Wayback Machine too. From our work for assorted search companies and our own work years ago on a service we sold to Lycos, we had considerable insight into the realities of paying for indexing that did not generate traffic or revenue. The conclusion we had reached, and which we assumed other vendors would reach, was:

Online search is not a “free public library.”

A library is/was/should be an archiving entity; that is, someone has to keep track of and store physical copies of books and magazines.

Online services are not libraries. Online services sell ads, as we did to Zima, which wanted its drink in front of our users. This means one thing:

Web indexes dump costs.

The Tablix article makes clear that some data are expendable. Delete them.

Our view is:

Get used to it.

There are some knock-on effects from the simple logic of reducing costs and increasing the efficiency of the free Web search systems. I have written about many of these, and you can search the 12,000 posts on this blog or pay to search commercial indexes for information in my more than 100 published articles related to search. You may even have a copy of one of my more than a dozen monographs; for example, the original Enterprise Search Reports or The Google Legacy.

  1. Content is disappearing from indexes on commercial and government Web sites. Examples range from the Tablix experience to the loss of the MIC contracts which detail exclusives for outfits like Xerox.
  2. Once the content is not findable, it may cease to exist for those dependent on free search and retrieval services. Sorry, Library of Congress, you don’t have the content, nor does the National Archives. The situation is worse in countries in Asia and Eastern Europe.
  3. Individuals — particularly the annoying millennials who want me to provide information for free — do not have the tools at hand to locate high value information. There are services which provide some useful mechanisms, but these are often affordable only by certain commercial enterprises, some academic research organizations, and law enforcement and intelligence agencies. This means that most people are clueless about the “accuracy”, “completeness,” and “provenance” of certain information.

Net net: If data generate revenue, they may be available online and findable. If the data do not, hasta la vista. The situation is one that gives me and my research team considerable discomfort.

How will smart software trained on the available data behave? Probably in a pretty stupid way. Information is not what people believe it to be. Now we have a generation or two of people who think research is looking something up on a mobile device. Quite a combo: ill-informed humans and software trained on incomplete data.

Yeah, that’s just great.

Stephen E Arnold, April 28, 2019

The EU, the Internet Archive Dust Up: One Fact Overlooked

April 11, 2019

I read “EU Tells Internet Archive That Much Of Its Site Is ‘Terrorist Content’.” The main point is that Europol’s European Union Internet Referral Unit pointed out that the Internet Archive contains problematic information. The article reports the Internet Archive’s position that:

there’s simply no way that (1) the site could have complied with the Terrorist Content Regulation had it been law last week when they received the notices, and (2) that they should have blocked all that obviously non-terrorist content. [emphasis in the original]

DarkCyber wants to point out a fact that may be of interest to the EUIRU and the Internet Archive; to wit: The site has information, but the site’s search system and interface make it very difficult to locate information. For EUIRU, the inadequate search system makes finding the potentially harmful information a challenge. For the Internet Archive, the findability system makes it equally difficult for IA staff to locate items so each can be reviewed.

What will the Internet Archive do? The options are limited and some are unpalatable: Fight the EU? Ignore the request? Block access from Europe? Go out of business? Address the issue head on? Worth watching how this develops.

Stephen E Arnold, April 11, 2019

Centralizing and Concentrating: Works Great Until It Does Not

April 1, 2019

No joke or joke? Let’s assume the story is true.

US airlines are proving that centralizing and concentrating online services works great until the system fails. I read “Computer Outage Affecting Major US Airlines including Southwest, Delta and United Causes Hundreds of Flight Delays Nationwide.” (I first saw the news in a UK stream from the Daily Mail, a British newspaper.) As I write this at 9:10 am US Eastern (April 1, 2019), the story is now appearing in other feeds. The problem appears to be one with software called Aerodata. By 8:40 am US Eastern time, more than 700 flights were affected.

What seems to be lousy systems administration, engineering, or business processes have made April 1, 2019, into unpleasant anecdotes, not frothy jokes.

Aerodata’s Web site cheerfully reports my public IP address, which, not surprisingly, is not my actual IP address. The Web site requires Flash, a super insecure piece of software in my opinion. I was not able to locate current news from the company. I noticed that VMware mentions that the company uses vSAN to power a modern software-defined data center. You can read the marketing-inspired explanation at this link, or at least you could at 9:17 am US Eastern on April 1, 2019.

According to a Chicago NBC outlet, all is well again. You can get this take at this link.

What happens if a cyber attack takes down a concentrated service?

Stephen E Arnold, April 1, 2019

Hashing Videos and Images Explained

March 17, 2019

A quite lucid explanation of video and image identification appears in “How Hashing Could Stop Violent Videos from Spreading.” Here’s one passage from the article:

Video hashing works by breaking down a video into key frames and giving each a unique alphanumerical signature, or hash. That hash is collected into a central database, where every video or photo that is uploaded to a platform is then compared against that dataset. The system requires a database of images and doesn’t use artificial intelligence to identify what is in an image — it only identifies a match between images and videos.

CNN emphasizes Microsoft’s PhotoDNA technology. Information about that system may be found at this link. The write up points out that Facebook and Google use “this technology.”
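The matching idea in the quoted passage can be sketched in a few lines of Python. This is not PhotoDNA, whose algorithm is proprietary; it swaps in a simple “average hash” perceptual technique as an assumption for illustration. The sketch also assumes key frames have already been extracted from the video, converted to grayscale, and shrunk to 8x8 pixels. The function names and the 5-bit match threshold are hypothetical choices.

```python
from typing import Dict, List, Optional

# A key frame, assumed already extracted from the video, converted to
# grayscale (0-255), and shrunk to a small fixed size such as 8x8.
Frame = List[List[int]]


def average_hash(frame: Frame) -> str:
    """Hex signature: one bit per pixel, set when the pixel is brighter than the frame's mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    hex_digits = (len(pixels) + 3) // 4  # four bits per hex digit
    return format(bits, f"0{hex_digits}x")


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two hex signatures."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def find_match(signature: str, database: Dict[str, str], max_distance: int = 5) -> Optional[str]:
    """Return the label of the closest known signature within max_distance bits, if any."""
    best_label: Optional[str] = None
    best_distance = max_distance + 1
    for label, known in database.items():
        distance = hamming(signature, known)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def check_video(key_frames: List[Frame], database: Dict[str, str], max_distance: int = 5) -> List[str]:
    """Compare every key frame of an upload against the central database of known signatures."""
    matches: List[str] = []
    for frame in key_frames:
        label = find_match(average_hash(frame), database, max_distance)
        if label is not None:
            matches.append(label)
    return matches


if __name__ == "__main__":
    known = [[10 * (r + c) for c in range(8)] for r in range(8)]
    database = {"known-clip-frame-1": average_hash(known)}
    reupload = [[p + 3 for p in row] for row in known]  # slightly brightened copy
    print(check_video([reupload], database))  # prints ['known-clip-frame-1']
```

The tolerance threshold is the design choice that matters. An exact hash such as SHA-256 would miss a re-encoded or slightly brightened copy, while a perceptual signature compared by Hamming distance still flags it. Note that, as the quoted passage says, nothing here “understands” the image; it only detects a match.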

One question is, “If the technology is available and in use, why are offensive videos and images finding their way into public facing, easily accessible systems?”

The answer according to an expert quoted in the CNN story is:

The decision not to do this [implement more effective hashing filter methods] is a question of will and policy—not a question of technology.”

The answer is that platforms are one way to avoid the editorial responsibility associated with old school methods of communication; for example, wire services, newspapers, and magazines. These types of communication were not perfect, but in many cases, an editorial process prevented certain types of information from appearing in certain publications. So far, the hands off approach of some digital channels and the over hyped use of smart software have not been as effective as the hopelessly old fashioned processes used by some traditional media outlets.

So will? Policy?

Nah, money, expediency, and the high school science club approach to management.

Stephen E Arnold, March 17, 2019

GPS: Ubiquitous and Helpful in Surprising Ways

March 6, 2019

Here’s a little write-up that highlights the power of GPS and WiFi tracking. Digital Trends reports, “It Turns Out That Find My iPhone Is Really Good at Finding a Stolen Car, Too.” Writer Andy Boxall relates:

“After stopping at an intersection, Chase Richardson was carjacked by an armed man who shouted for him to get out of the vehicle. Sensibly complying, Richardson got out, but at the same time left his work-issued Apple iPhone in the car. The criminal also demanded Richardson’s wallet and his own personal phone, then got in the car and drove away. The police arrived after Richardson called 911 at a Walgreens store, which is when the Find My iPhone feature was called into action. The service uses GPS to generally locate a registered device, which in this case was the work phone. The police apparently used Find My iPhone in real time to track down the stolen car. A police helicopter was called in to assist after the car was located, as the thief tried to evade arrest.”

We are pleased to learn Mr. Richardson was not hurt during the carjacking. Boxall mentions other cases where Find My iPhone has led to arrests, and notes similar tools like Google’s Find My Device, Samsung’s device location service, and third-party companies like Cerberus Anti-Theft. Such tools can be a huge help if someone makes off with your phone—or your car. Just remember that tracking software can have unintended consequences; the article closes with this kind wish:

“Whichever you choose, we hope it will only ever be used to find your phone down the back of a couch, and nothing more serious.”

We agree.

Cynthia Murrell, March 6, 2019

Moving the Google: Right to Be Forgotten Has an Impact

January 23, 2019

I have heard that it can be difficult to reach a human at Google. It appears that a Dutch surgeon and her attorneys were successful. “Right to Be Forgotten Used to Force Google to Remove Medical Negligence Link” states:

Amsterdam’s district court has forced Google to remove search results relating to a Dutch surgeon’s past medical suspension…

The difference between printed information and digital information is becoming discernible. Print can exist in multiple copies in tangible form in many places; for example, university libraries, archives, and personal information collections. Making a change to a printed document can be tricky, but it can be done.

However, changing the digital record is a bit easier; for example, deleting a pointer in an index.
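What “deleting a pointer in an index” means can be shown with a toy inverted index. This Python sketch is a hypothetical illustration, not how Google’s index actually works; `TinyIndex` and `delist` are invented names. Delisting strips the document ID from every posting list, so searches stop finding it, yet the stored text is untouched.

```python
from collections import defaultdict
from typing import Dict, Set


class TinyIndex:
    """A toy inverted index: each term points to the set of document IDs containing it."""

    def __init__(self) -> None:
        self.documents: Dict[str, str] = {}  # the stored record itself
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # the pointers

    def add(self, doc_id: str, text: str) -> None:
        self.documents[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def delist(self, doc_id: str) -> None:
        # Remove only the pointers; the document stays in storage.
        for doc_ids in self.postings.values():
            doc_ids.discard(doc_id)

    def search(self, term: str) -> Set[str]:
        # .get() avoids creating empty entries in the defaultdict.
        return set(self.postings.get(term.lower(), set()))
```

After `delist`, a query returns nothing, but the record still exists for anyone with direct access to storage. That is the gap between “not findable” and “gone.”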

The question becomes, “What happens when a person wants to reconstruct the details of a particular matter?”

The answer is that information is relative. Figuring out what happened becomes a bit more difficult and expensive.

What happens?

I can’t look up the answer online, but I could ask IBM Watson. In a world of Silly Putty information, answers of this type may have to suffice.

A court decision may leave behind a paper trail. But the actions of a single system administrator may be impossible to identify and verify.

Epistemology may be due for a renascence when setting the record straight.

Stephen E Arnold, January 23, 2019

About Those VPNs

December 26, 2018

News and chatter about VPNs are plentiful. We noted a flurry of stories about Chinese ownership of VPNs. We receive offers of VPN deals that are almost too good to be true. We noted this write up from AT&T (a former Baby Bell) and its AlienVault unit: “The Dangers of Free VPNs.”

The idea behind a VPN is hiding traffic from those able to gain access to that traffic. But there is a VPN provider in the mix. From that classic man in the middle position, the VPN may not be as secure as the user thinks.

The AT&T AlienVault viewpoint is slightly different: VPNs are the cat’s pajamas as long as the VPN is AT&T’s.

We learned from the write up:

Technically, VPN providers have the capacity to see everything you do while connected. If it really wanted to, a VPN company could see what videos you watched, read emails you send, or monitor your search history.

The write up points out, without reference to lawful intercept orders, national security letters, or the ho-hum everyday work in cheerful Ashburn, Virginia:

Thankfully, reputable providers don’t do this. A good provider shouldn’t take any logs of your activity, which means that although they could theoretically access your data, they discard it instead. These “no-log” companies don’t keep copies of your data, so even if they get subpoenaed by a government agency, they have no data that they can hand over. VPN providers may take different types of logs, so you need to be careful when reading the fine print of any potential provider. These logs can include your traffic, DNS requests, timestamps, bandwidth and IP address.

The write up includes a “How do I love thee” approach to the dangers of free VPNs.

Net net: Be scared. Just navigate to this link. AT&T provides VPN service with the goodness one expects.

By the way, note the reference to “logs.” Many gizmos in a data center offering VPN services maintain logs. Processing these auto-generated files can yield quite useful information. Perhaps that’s why there are free and low-cost services.
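How easily logs turn into a profile can be illustrated with a minimal Python sketch. The log format, IP addresses (from reserved documentation ranges), and domains below are all hypothetical assumptions; no real VPN provider’s format is implied. A dozen lines group DNS requests and timestamps by client IP, which is roughly a browsing history.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical log format (an assumption for illustration only):
#   <ISO timestamp> <client IP> DNS <queried domain>
SAMPLE_LOG = """\
2018-12-26T09:01:12 203.0.113.7 DNS news.example.com
2018-12-26T09:01:40 198.51.100.23 DNS mail.example.org
2018-12-26T09:02:05 203.0.113.7 DNS bank.example.net
"""


def profile_by_client(log_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group (timestamp, domain) pairs by client IP address."""
    activity: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[2] != "DNS":
            continue  # skip blank lines and non-DNS records
        timestamp, client_ip, _, domain = fields
        activity[client_ip].append((timestamp, domain))
    return dict(activity)


if __name__ == "__main__":
    for ip, visits in profile_by_client(SAMPLE_LOG).items():
        print(ip, [domain for _, domain in visits])
```

Add bandwidth and session timestamps, as the AT&T write up lists, and the picture gets sharper.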

Zero logs strikes Beyond Search as something that is easy to say but undesirable and possibly difficult to achieve.

Are VPNs secure? Is Tor?

In January 2019, Beyond Search will cover more dark cyber related content. More news is forthcoming. Let’s face it: enterprise search is a done deal. The Beyond Search goose is migrating to search-related content plus adjacent issues like AT&T promoting its cheerful, unmonitored, we’re-really-great approach to online.

Stephen E Arnold, December 26, 2018

Deep Fakes: Technology Is Usually Neutral

December 18, 2018

Ferreting out fake news has become an obsession for search and AI jockeys around the globe. However, that task pales next to the wave of fake photos and videos that grow increasingly convincing as technology helps to iron out the wrinkles. That’s a scary prospect to more than a few experts, as we discovered in a recent MIT Technology Review article, “Deepfake Busting Apps Can Spot Even A Single Pixel Out of Place.”

According to the story:

“That same technology is creating a growing class of footage and photos, called “deepfakes,” that have the potential to undermine truth, confuse viewers, and sow discord at a much larger scale than we’ve already seen with text-based fake news.”

Deepfakes are fun and possibly threatening to some. The “experts” at high tech firms will use their management expertise to reduce any anxieties the deepfakes spark. But some Luddites think these videos and images have the potential to disrupt governments and elections in countries where online is pervasive. Beyond Search is comforted by the knowledge that bright, objective, ethical minds are on the case. One question: What if these whiz kids are angling for a more selfish outcome?

Patrick Roland, December 18, 2018

Quote to Note: Experts from UK Take a Look at US Social Media

December 17, 2018

I read “Silicon Valley’s ‘Belated and Uncoordinated’ Efforts at Dealing with Russian Fake News Revealed.” The report was created by experts in the UK and leaked to the Washington Post.

Here’s a quote which suggests the principal finding:

“Social media have gone from being the natural infrastructure for sharing collective grievances and coordinating civic engagement to being a computational tool for social control, manipulated by canny political consultants and available to politicians in democracies and dictatorships alike,” the authors of the report wrote.

The idea is that technology is neutral until a person figures out how to use it as a weapon or to his or her advantage.

In the case of social media, the companies, managed as if they were high school science clubs’ entries in a science fair, have created some interesting tools. A few of these tools recall the wizard who creates a death ray, uses it to cook a burger, and gives the gizmo away at a yard sale. A clever person picks it up and starts vaporizing the pets and the neighbors.

Remember that “technology is neutral” mantra? It is repeated by individuals who have not read The Technological Bluff by Jacques Ellul.

Does one want to access “all the world’s information”? Not me. Selectivity, editorial controls, policy controls, and informed decision making are helpful.

Anyone remember that Pandora’s box thing? In January 2019, Beyond Search is switching focus, and we are introducing a Web log to complement our video series “DarkCyber.”

Times, they are a-changin’.

Stephen E Arnold, December 17, 2018
