Yandex Learns Search Can Be Exciting

June 6, 2017

I am not sure if this Thomson Reuters “real news” story is accurate. I found it amusing. You are on your own with this item, gentle reader.

I read “Investigators Search Ukrainian Offices of Russia’s Yandex.” The main point struck me as:

Ukraine’s State Security Service (SBU) raided the local offices of Russia’s top search site Yandex on Monday in an operation that SBU spokesman Olena Gitlyanska said was part of a treason investigation.

The operative word is treason. Exciting, right?

Yandex has previously said it operates fully in accordance with Ukrainian law. It does not expect sanctions to have a material negative impact on its business.

Let’s assume that the “real news” is accurate. The idea that a Web indexing company is guilty of treason is interesting. I know that in my work with a parents’ group to identify potentially harmful sites for their children, I use Yandex as an example.

Ukrainian officials did not reference Yandex’s more interesting indexing policies. That’s a shame. Treason may be more important to the Ukrainian government than links to certain interesting types of videos.

Treason can have a “material negative impact,” however.

Stephen E Arnold, June 5, 2017

Google: Administrivia Is Hard and Expensive

May 29, 2017

I read “Accused of Underpaying Women, Google Says It’s Too Expensive to Get Wage Data.” The real journalism outfit The Guardian revealed:

Google argued that it was too financially burdensome and logistically challenging to compile and hand over salary records that the government has requested, sparking a strong rebuke from the US Department of Labor (DoL), which has accused the Silicon Valley firm of underpaying women.

An attorney representing the government allegedly said:

“Google would be able to absorb the cost as easy as a dry kitchen sponge could absorb a single drop of water.”

It seems that Google is not into administrivia. It seems that Google wants to husband its resources. Solving death and Loon balloons need funding.

Tough luck, US Department of Labor.

Google allegedly explained:

“This is obviously a very time-consuming and burdensome project,” said Lisa Barnett Sween, one of Google’s attorneys, claiming that the company has already worked 2,300 hours at a cost of nearly $500,000 to partially comply with the government’s demands, which she argued were broad and unconstitutional. “Our courts must act to check this abuse of power.”

Absolutely. Obvious.

Google did promote Dr. Anna Patterson, the founder of Cuil and Xift (both search engines), recently. See. Progress. How long has Dr. Patterson been laboring at the GOOG? I think it is creeping up on a decade, more or less.

Google and women. A perfect match. Why can’t the lawyers representing the US Department of Labor understand this simple fact? Equality, the hallmark of a high school science club.

Administrative detail. My hunch is that it is not interesting and maybe, just maybe… Never mind.

Stephen E Arnold, May 28, 2017

AI Not to Replace Lawyers, Not Yet

May 9, 2017

Robot or AI lawyers may be effective in locating relevant cases for references, but they are far from replacing lawyers, who still need to go to court and represent clients.

In a recently published analytical article titled “Look at All the Amazing Things AI Can (and Can’t yet) Do for Lawyers,” ReadWrite says:

Even if AI can scan documents and predict which ones will be relevant to a legal case, other tasks such as actually advising a client or appearing in court cannot currently be performed by computers.

The author further explains what the present generation of AI tools, or robots, does. They merely find relevant cases based on indexing and keywords, a task that used to be time-consuming and cumbersome. Thus, what robots do is eliminate the tedious work that was performed by interns or lower level employees. Lawyers still need to collect evidence, prepare the case, and argue in court to win. The robots are coming, but only to do lower level work, not to snatch lawyers’ jobs.

Vishal Ingole, May 9, 2017

Google: The Male Female Thing

April 10, 2017

I have fond memories of my high school’s science club. My hunch is that some Google-type companies do too.

I look back and remember the days of Donald Jackson, who with his brother Bernard, published an article in a peer reviewed astronomy journal. Those guys were fixated on the moon. Go figure.

There was a canny lad named Phil Herbst, who shifted to fuzzy science with his interest in anthropology. Misguided. Anthropology. Who cares about that?

There was Steve Connett, who was into electrical engineering and the goodies that hobby required his parents to provide.

And the others? Males. Every one of them.

I don’t recall any females in the science club. Super smart Hope Davis, one of the females in my advanced physics class, had perfect pitch, a knack for mathematics, and a well founded disdain for the males in the science club.

My experience with her as a lab partner is that she was smarter than most of the fellows who gathered a couple of times a month to discuss explosives, corrosive chemical compounds, circuits which could terminate certain creatures with a zap, and the other nifty things the dozen or so regulars found fascinating.

Why was science club in the rust belt in 1958 a no go zone for really smart people like Hope Davis?


My favorite line from the motion picture “Revenge of the Nerds” is, “Nerds.” Poetic.

My answer is that the males in my science club were not exactly hot social items. Although I was the dumbest person in the club, I shared three qualities with the real brainiacs in the group:

  1. Zero awareness of females and their abilities. I was an only child, had zero exposure to females outside of class, and lived within my own weird little world of books and model airplanes
  2. My notion of conversation was my ability to repeat almost anything I read verbatim. (Alas, as I age, that wonderful automatic function does not work as well as it did. But when it was in high gear, absolutely no female in any of my classes wanted to speak with me. Who wanted a fat, nearsighted meatware audio book for a friend?)
  3. I was deeply uncomfortable around anyone not in the oddball special classes my high school offered for students who seemed to get A grades and did not participate in [a] sports, [b] school governance, [c] social activities like parties and dances, and [d] activities understood by the high school administrators.

I thought of my high school science club when I read “Google Accused of ‘Extreme’ Gender Pay Discrimination by US Labor Department.” I quite like the word “extreme.” Quite charged and suggestive. I learned:

Google has discriminated against its female employees, according to the US Department of Labor (DoL), which said it had evidence of “systemic compensation disparities”.

Making a leap from the particular allegation against Google to a fuzzy swath of California, the real journalists, who are struggling with their own demons, state:

The explosive allegation against one of the largest and most powerful companies in Silicon Valley comes at a time when the male-dominated tech industry is facing increased scrutiny over gender discrimination, pay disparities and sexual harassment.

Does the word “extreme” up the ante?


Why Do We Care More About Smaller Concerns? How Quantitative Numbing Impacts Emotional Response

February 14, 2017

The affecting article on Visual Business Intelligence titled “When More Is Less: Quantitative Numbing” explains a phenomenon that many of us have probably witnessed on the news, in our friends and family, and even personally experienced in ourselves. A local news story about the death of an individual might provoke a stronger emotional response than news of a mass tragedy involving hundreds or thousands of deaths. Scott Slovic and Paul Slovic explore this in their book Numbers and Nerves. According to the article, this response is “built into our brains.” Another example illustrates the Donald Trump effect:

Because he exhibits so many examples of bad behavior, those behaviors are having relatively little impact on us. The sheer number of incidents creates a numbing effect. Any one of Trump’s greedy, racist, sexist, vulgar, discriminatory, anti-intellectual, and dishonest acts, if considered alone, would concern us more than the huge number of examples that now confront us. The larger the number, the lesser the impact…This tendency… is automatic, immediate, and unconscious.

The article suggests that the only way to overcome this tendency is to engage with large quantities in a slower, more thoughtful way. A quote from Abel Herzberg helps convey this approach when considering the large-scale tragedy of the Holocaust: “There were not six million Jews murdered: there was one murder, six million times.” The difference between considering individual murders and considering the total number is stark, and it needs to enter into the way we process daily events happening all over the world if we want to hold on to any semblance of compassion and humanity.

Chelsea Kerwin, February 14, 2017

Another Untraceable Dark Web Actor Put Behind Bars

January 19, 2017

A prison librarian in England who purchased drugs and weapons over the Dark Web to supply to prisoners was sentenced to seven years in prison.

The Register in a news report Prison Librarian Swaps Books for Bars After Dark-Web Gun Buy Caper says:

Dwain Osborne, of Avenue Road, Penge, in London, was nabbed in October of 2015 after he sought to procure a Glock 19 – a staple of police and security forces worldwide – and 100 rounds of ammunition on the dark web. A search of Osborne’s house revealed the existence of a storage device, two stolen passports, and a police uniform.

Osborne was under the impression that, like other Dark Web actors, he too was untraceable. What made the sleuths suspicious is not known; however, the swift action and prosecution are commendable. Law enforcement agencies are challenged by this new facet of crime, in which most perpetrators manage to remain anonymous.

Most arrests related to the purchase of arms and drugs over the Dark Web were the result of undercover operations. However, investigators urgently need methods that go beyond this modus operandi.

Systems like Apache Tika seem promising, but it is premature to say how such systems will evolve and, most importantly, how they will be implemented.

Vishal Ingole, January 19, 2017

Autonomy and Hewlett Packard: A How To from Fortune

January 16, 2017

I read “How Autonomy Fooled Hewlett-Packard.” The article was written by Jack T. Ciesielski, who is president of R.G. Associates, Inc. in Baltimore, Maryland. Mr. Ciesielski’s company publishes “The Analyst’s Accounting Observer,” which is described as “a research service for institutional investors.” The company offers this example return on a $1 million investment:


The caption for the chart is “All performance data is net of advisory fees.  3, 5, 10 year returns are annualized total returns.  Inception is the annualized total return since 12/31/1992.  S&P 500 Total Return sourced from  Past performance is not indicative of future results.”

I am not sure if the write up is a Fortune-edited article, a Fortune-commissioned article, or an inclusion in Fortune which an entity purchased. For the purposes of Beyond Search, I will assume that the article is an example of “real” reporting and spot on in its objectivity and accuracy. I recognize that where one sits, and the tools and information available, affect what one perceives. This is the viewshed problem: in a viewshed illustration, each color shows what the respective observer “sees.”


I was interested in the write up because the legal dispute between the “old” Hewlett Packard and executives of Autonomy is ongoing. Obviously neither Mr. Ciesielski nor Fortune wants to be caught in the legal crossfire. My assumption is, therefore, that Fortune’s “real” journalists have figured out some of the nuances of the HP-Autonomy matter. I would point out that these nuances were overlooked or misinterpreted by HP’s executives, Board members, advisers, lawyers, and accountants. Too bad neither HP nor Autonomy had Fortune-caliber experts assisting when the $11 billion deal was conceived, executed, understood, and prosecuted. Some outfits have smarter, more thorough investigators, researchers, and analysts.

The write up points out that the former top dog of Autonomy USA (Christopher Egan) had to pay, in November 2016, $800,000 he garnered from the HP buyout. The prime mover in this check writing was the US Securities & Exchange Commission. The Fortune article states:

HP relied on figures he had helped inflate. The facts of the case are now public.

Here’s the method used by Autonomy as reported by Fortune:

Autonomy’s UK-based senior managers directed a program swelling revenues by almost $200 million. Autonomy sold its software through “value-added” resellers, legitimate businesses providing additional services and support to product end users while also selling Autonomy’s software. Just five resellers, in 30 transactions, provided services to Autonomy that couldn’t be called legitimate.


Nagging Google for Relevance Ranking Secrets

January 3, 2017

I read “Good Luck in Making Google Reveal Its Algorithm.” The title is incorrect. I think the words I expected were “algorithms and administrative interfaces.” The guts of Google’s PageRank system appear in the PageRank patent assigned to Stanford University’s Board of Trustees. Because the “research” for PageRank is based in part on a US government grant, the PageRank patent discloses the basic approach of the Google. If one looks at the “references” to other work, one will find mentions of Eugene Garfield (the original citation value wizard), the IBM Almaden Clever team, and a number of other researchers and inventors who devised a way to figure out what’s important in the context of linked information.
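The core idea in the published PageRank work, scoring a page by the importance of the pages that link to it, can be sketched in a few lines. This is a minimal, textbook power-iteration version for illustration only; it is not Google’s production ranking, which, as this post argues, is a far larger and layered system. The function name `pagerank`, the toy link graph, the iteration count, and the treatment of pages with no outbound links (their score is spread evenly across all pages) are assumptions; the 0.85 damping factor is the value commonly cited from the original PageRank paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative sketch, not Google's system).

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets the "random jump" share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped score evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outbound links): spread its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical four-page web: three pages link to "c", nothing links to "d".
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In the toy graph, page `c` ends up on top largely because three pages link to it, while `d`, which nothing links to, keeps only the baseline (1 − 0.85)/4 share. Even this tiny version has knobs (damping, iterations, dangling-page policy), which hints at why the real system is hard to explain.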

What folks ignore is that it is expensive to reengineer the algorithmic plumbing at an outfit like Google. Think in terms of Volkswagen rewriting its emissions code and rebuilding its manufacturing plants to produce non-cheating vehicles. That’s the same problem the Google has faced, but magnified by the rate at which changes have been required to keep the world’s most loved Web search system [a] working, [b] ahead of the spoofers who can manipulate Mother Google’s relevance ranking, [c] able to handle diverse content, including videos and the social Plus stuff, and [d] ready for mobile.

The result is that Google has taken its Airstream trailer and essentially added tailfins, solar panels, and new appliances; that is, the equivalent of a modern microwave instead of the old, inefficient toaster oven. But the point is that the Google Airstream is still an Airstream just “new and improved.”

The net net is that Google itself cannot easily explain what happens within its 15-year-old, fast-ageing relevance Airstream. Outsiders essentially put up content, fiddle with whatever controls are available, and then wait to see what happens when they run a query for the content.

The folks driving the Ford F-150 pulling the trailer have controls in the truck. The truck has a dashboard. The truck has extras. The truck has an engine. The entire multi-part assembly is the Google search system.

The point is that Google’s algorithm is not ONE THING. It is a highly complex system, and there are not many people around who know the entire thing. The fact that it works is great. Sometimes, however, the folks driving the Ford F-150 have to fiddle with the dials and knobs. That administrative control panel is hooked to some parts of the gear in the Airstream. Other dials just do things to deal with what is happening right now. Love bugs make it hard to see out of the windscreen, so the driver squirts bug remover fluid and turns on the windshield wipers. The Airstream stuff comes along for the ride.

The article cited above explains that Google won’t tell a German whoop-de-doo how it works. Well, the author has got the “won’t tell” part right. Even if Google wanted to explain how its “algorithm” works, the company would probably just point to a stack of patents and journal articles and say, “There you go.”

The write up states:

We know that search results – and social media news feeds – are assembled by algorithms that determine the websites or news items likely to be most “relevant” for each user. The criteria used for determining relevance are many and varied, but some are calibrated by what your digital trail reveals about your interests and social network and, in that sense, the search results or news items that appear in your feed are personalized for you. But these powerful algorithms, which can indeed shape how you see the world, are proprietary and secret, which is wrong. So, Merkel argues, they should be less opaque.

The article also is correct when it says:

So just publishing secret stuff doesn’t do the trick. In a way, this is the hard lesson that WikiLeaks learned.

The write up uses Google as a whipping post. The issue is not math. The issue is the gap between those who use methods that are “obvious” and those who look for fuzzy solutions. Why not focus on other companies which use “obvious” systems and methods? Answer: Google is a big, fat, slow moving, predictable, ageing target.

Convenient for real journalists. Oh, 89 percent of this rare species does their research via Google, clueless about how the sausage is made. Grab those open source documents and start reading.

Stephen E Arnold, January 4, 2017

For Sale: Government Web Sites at a Bargain

December 21, 2016

We trust that government Web sites are safe and secure with our information as well as the data that keeps our countries running. We also expect that government Web sites have top-of-the-line security software and that, if they did get hacked, they would be able to rectify the situation in minutes. Sadly, this is not the case, according to Computerworld, which posted an article entitled “A Black Market Is Selling Access To Hacked Government Servers For $6.”

If you want to access a government server or Web site, all you need to do is download the Tor browser, access the xDedic marketplace on the Dark Web, and browse their catalog of endless government resources for sale.  What is alarming is that some of these Web sites are being sold for as little as six dollars!

How did the xDedic “merchants” get access to these supposedly secure government sites? Through basic trial and error: trying different passwords until they scored a hit. Security firm Kaspersky Lab weighs in:

“It is a hacker’s dream, simplifying access to victims, making it cheaper and faster, and opening up new possibilities for both cybercriminals and advanced threat actors,” Kaspersky said.

Criminal hackers can use the servers to send spam, steal data such as credit card information, and launch other types of attack…Once buyers have done their work, the merchants put the server back up for sale. The inventory is constantly evolving.

It is believed that the people who built xDedic are Russian speakers, possibly from a country where Russian is spoken. The Web site mostly sells access to government sites in Europe, Asia, and South America. The majority of the Web sites are marked as “other,” however. Kaspersky tracked down some of the victims and notified them of the stolen information.

The damage is already done. Governments should be investing in secure Web software and testing whether their own sites can be broken into, to prevent future attacks. The Dark Web scores again.

Whitney Grace, December 21, 2016

IBM Open Sourciness Goes Only So Far

December 19, 2016

I love IBM, Big Blue, creator of Watson. Watson, as you may know, is a confection consisting of goodies from IBM’s internal code wizards, acquired technologies like the instantly Big Data friendly Vivisimo, and Lucene. Yep, like Attivio and many other “search” vendors, open source Lucene is the way to reduce the costs for basic information retrieval.

I assume you know about OpenLava, an open source system for scheduling jobs on compute clusters; its code descends from Platform LSF, which IBM now owns. The OpenLava Web page states:

With an active community of users and developers, OpenLava development is accelerating, delivering high-quality implementations of important new features including:

  • Fair-share scheduling – allocate resources between users and groups according to configurable policies
  • Job pre-emption – Ensure that critical users, jobs and groups have the resources they need – when they need them
  • Docker support – Providing application isolation, fast service deployment and cloud mobility
  • Cloud & VM friendly auto-scaling – Easily add or remove cluster nodes on the fly without cluster re-configuration

These features are in addition to the many advanced capabilities already in OpenLava including job arrays, run-windows, n-way host failover, job limits, dependencies for multi-step workflows, parallel job support and much more.

I read “OpenLava under IBM Attack.” I believe everything I read on the Internet. The write up explains that Big Blue wants the OpenLava open source code removed. The write up states:

IBM claims that the versions of OpenLava starting from 3.0 infringe their copyright and that some source code have been stolen from them, copied, or otherwise taken from their code base.

Several thoughts:

  1. The folks involved with OpenLava did knowingly and intentionally rip off IBM’s software, and the marketer of the open source tinged Watson is taking a logical and appropriate action against the open source alternative to IBM’s own management software.
  2. IBM is unhappy with OpenLava’s adoption by IBM customers. IBM customers should buy only software from IBM-authorized sources. Other old school enterprise software companies have this philosophy too.
  3. There is a failure to communicate. OpenLava is not making its case understandable to the outfit poised to hire 25,000 more employees and IBM is not making itself clear to the crafty folks at OpenLava.

I don’t have a dog in the fight. But I find it interesting that IBM Watson with its Lucene tinged capabilities is finding open source distasteful in some circumstances.

Life was far simpler when open source projects were more malleable. Next stop? The legal eagles’ nests.

Stephen E Arnold, December 19, 2016
