Russian Meddling Across Platforms
November 13, 2017
During our last presidential election, Russia sowed American division through online propaganda appearing well beyond Facebook. An article in Ubergizmo reports, “Google Finds Evidence of Russia-Linked Ads on Search, YouTube, and Gmail.” Leave it to the search company to find these clues. Writer Adnan Farooqui tells us:
The Washington Post reports that Google has discovered evidence that a campaign by the Russian government spread propaganda through advertising on its platforms. A recent report revealed that Twitter had uncovered similar ads as well. The scribe mentions that Google’s investigation into the matter is in early stages for now. It’s said to be in the process of separating ads from legitimate Russian sources from the ones used to spread propaganda.
For its part, Google assures us they are working with researchers and with other companies to investigate ways bad actors have abused the Google ecosystem. They also emphasize their “strict” policies on targeted advertising; political ads cannot be targeted by race or religion, for example. Will their efforts be enough to stop foreign interference in its tracks?
Cynthia Murrell, November 13, 2017
Google and Search Trust: Math Is Objective, Right?
November 11, 2017
I trust Google. No, I really trust Google. The reason is that I have a reasonable grasp of the mechanism for displaying search results. I also have developed some okay behaviors when I cannot locate PowerPoint files, PDF files, or find current information from pastesites. I try to look quickly at ads on a page and then discard hits which point to those “relevant” inclusions. I even avoid Google’s free services because these, despite some Xoogler protests, can and do disappear without warning.
Trust, however, seems to mean different things to different people. Consider the write up “It’s Time to Stop Trusting Google Search Already.” The write up suggests that people usually trust Google. The main point is that those people should not trust Google. I like the “already” too. Very hip. Breezy like almost, gentle reader.
I noted this passage:
Alongside pushing Google to stop “fake news,” we should be looking for ways to limit trust in, and reliance on, search algorithms themselves. That might mean seeking handpicked video playlists instead of searching YouTube Kids, which recently drew criticism for surfacing inappropriate videos.
I find the notion of trusting algorithms interesting. Perhaps the issue is not “algorithms” but:
- Threshold values which determine what’s in and what’s out
- Data quality
- Administrative controls which permit “overrides” by really bright sales “engineers”
- The sequence in the work flow for implementing particular algorithms or methods
- Inputs from other Google systems which function in a manner similar to human user clicks
- Quarterly financial objectives.
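The first item in the list above can be made concrete with a minimal sketch in Python. The document IDs, scores, and cutoff values are hypothetical and invented for illustration; nothing here describes how Google actually scores or filters results. The point is only that a small change in a threshold changes what the user sees.

```python
# Hypothetical relevance scores for four documents (invented for illustration).
scored_hits = [("doc_a", 0.91), ("doc_b", 0.62), ("doc_c", 0.58), ("doc_d", 0.30)]


def apply_threshold(hits, cutoff):
    """Return the IDs of hits whose score meets or exceeds the cutoff."""
    return [doc_id for doc_id, score in hits if score >= cutoff]


# Two cutoffs that differ by only 0.05 produce different result lists.
strict = apply_threshold(scored_hits, 0.60)
loose = apply_threshold(scored_hits, 0.55)
print(strict)
print(loose)
```

Nothing about the algorithm changed between the two calls. Only the administrative setting did, and doc_c is either in or out.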
Trust is good; knowledge of systems and methods, engineer bias, sequence influence, and similar topics might be more fruitful than this fatalistic viewpoint:
But when something like search screws up, we can’t just tell Google to offer the right answers. We have to operate on the assumption that it won’t ever have them.
By the way, was Google’s search system and method “objective” when it integrated the GoTo, Overture, Yahoo pay to play methods which culminated in the hefty payment to the Yahooligans in 2004? Was Google ever more than “Clever”?
Stephen E Arnold, November 11, 2017
Elastic Remains Strategically Bouncy
November 10, 2017
Enterprise search remains a dull and rusty sword in the museum of enterprise applications. Frankly, other than wordsmithing with wild and crazy jargon, the technology for finding information in an organization works a bit like the blacksmith under the spreading chestnut tree.
The big news from my point of view has been the uptake in open source enterprise search software. The lead dog is Lucene. Even the much hyped free version of Fast Search technology pitched as Solr is built on Lucene.
Yep, there are proprietary solutions, but where are these folks? Outfits with search technology are capturing the hearts and minds of decision makers who want solutions to findability problems, not the high speed sleet of buzzwords like ontology, taxonomy, natural language processing, facets, semantics, yada, yada, yada.
I read an article, which I assume is true, because I believe everything I read on the Internet and in white papers. The write up is “Elastic Acquires SaaS Site Search Leader Swiftype.” Elastic is the result of a bold search experience called Compass. The champion of this defunct system was Shay Banon, who created Elasticsearch.
For many people, Elasticsearch and the for fee “extras” available from the company Elastic are Lucene. Disagree? Everyone is entitled to an opinion, gentle reader.
The write up informed me:
Elastic, the company behind Elasticsearch, and the Elastic Stack, the most widely-used collection of open source products for solving mission-critical use cases like search, logging, and analytics, today announced that it has acquired Swiftype, a San Francisco-based startup founded in 2012 and backed by Y Combinator and New Enterprise Associates (NEA). Swiftype is the creator of the popular SaaS-based Site Search and the recently introduced Enterprise Search products.
Swiftype used Elastic to capture some customers with its search solution. According to the write up, even Dr. Pepper found a pepper upper with Swiftype’s Elasticsearch based system.
Why’s this important? I jotted down three reasons as I was watching a group of confused deer trying to cross a busy highway. (Deer, like investors in enterprise search dream spinners, are confused by the movement of fast moving automobiles and loud pick up trucks.)
First, compare Elastic’s acquisition with Lucidworks’ purchase of an interface company. Elastic bought people, a solution, and customers. Interfaces are okay, but those who want to find information need a system that springs into action quickly and can be used to deal with real world information problems. Arts and crafts are important, but not as important as search that returns relevant results and performs useful functions like chopping log files into useful digital lumber.
Second, Elastic has been on a roll. We profiled the company for a wonky self appointed blue chip consulting firm years ago. The report went nowhere due to the managerial expertise of a self appointed search expert. See this link for details of this maven. In that report, my team of researchers verified that large companies were adopting Elasticsearch because those firms had the most to gain from an open source product which could be supported by third party engineers. Another plus was that the Elasticsearch product could be extended and amplified without the handcuffs of a proprietary search vendor’s license restrictions.
Third, Elasticsearch worked. Sure, it was a hassle to become familiar with the system. But if there were an issue, the Lucene community was usually available for advice and often for prompt fixes. Mr. Banon pushed innovations down the trail as well. It was clear five years ago and it is clear today that Elastic and Elasticsearch are the go to systems for some savvy people. Contrast that with the floundering of outfits flogging their search systems on LinkedIn or on vapid webinars about concepts.
Net net: Elastic is an outfit to watch. For most of Elastic’s competitors, watching is easy when one is driving a Model T behind the race leader in one of those zippy Hellcats with 700 horsepower.
Even blacksmiths take notice when this baby roars down the highway. And the deer? The deer run the other way.
Stephen E Arnold, November 10, 2017
Google Search and Hot News: Sensitivity and Relevance
November 10, 2017
I read “Google Is Surfacing Texas Shooter Misinformation in Search Results — Thanks Also to Twitter.” What struck me about the article was the headline; specifically, the implication for me was that Google was not responding to user queries. Google is actively “surfacing” or fetching and displaying information about the event. Twitter is also involved. I don’t think of Twitter as much more than a party line. One can look up keywords or see a stream of content containing a keyword or, to use Twitter speak, a “hashtag.”
The write up explains:
Users of Google’s search engine who conduct internet searches for queries such as “who is Devin Patrick Kelley?” — or just do a simple search for his name — can be exposed to tweets claiming the shooter was a Muslim convert; or a member of Antifa; or a Democrat supporter…
I think I understand. A user inputs a term and Google’s system matches the user’s query to the content in the Google index. Google maintains many indexes, despite its assertion that it is a “universal search engine.” One has to search across different Google services and their indexes to build up a mosaic of what Google has indexed about a topic; for example, blogs, news, the general index, maps, finance, etc.
Developing a composite view of what Google has indexed takes time and patience. The results may vary depending on whether the user is logged in, searching from a particular geographic location, or has enabled or disabled certain behind the scenes functions for the Google system.
The write up contains this statement:
Safe to say, the algorithmic architecture that underpins so much of the content internet users are exposed to via tech giants’ mega platforms continues to enable lies to run far faster than truth online by favoring flaming nonsense (and/or flagrant calumny) over more robustly sourced information.
From my point of view, the ability to figure out what influences Google’s search results requires significant effort, numerous test queries, and recognition that Google search now balances on two pogo sticks. One “pogo stick” is blunt force keyword search. When content is indexed, terms are plucked from source documents. The system may or may not assign additional index terms to the document; for example, geographic or time stamps.
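The keyword “pogo stick” can be illustrated with a toy inverted index in Python. This is a sketch under stated assumptions, not a description of Google’s system: the function names `build_index` and `search`, the `extra_terms` field, and the sample documents are all hypothetical. Terms are plucked from each document’s text, and optional additional index terms (a geographic tag here) are filed alongside them.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document IDs in which it appears."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        # Pluck terms from the source text, lowercased and stripped of punctuation.
        for raw in doc["text"].lower().split():
            term = raw.strip('.,;:!?"')
            if term:
                index[term].add(doc_id)
        # Optionally assigned extra index terms (hypothetical field name).
        for tag in doc.get("extra_terms", []):
            index[tag.lower()].add(doc_id)
    return index


def search(index, query):
    """Blunt force keyword match: return documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: {"text": "Shooting in Texas reported.", "extra_terms": ["geo:texas"]},
    2: {"text": "Texas barbecue festival"},
}
index = build_index(docs)
print(search(index, "texas"))
print(search(index, "texas shooting"))
print(search(index, "geo:texas"))
```

The matcher has no notion of whether a document is accurate. It returns whatever contains the terms.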
The other “pogo stick” is discovery and assignment of metadata. I have explained some of the optional tags which Google may or may not include when processing a content object; for example, see the work of Dr. Alon Halevy and Dr. Ramanathan Guha.
But Google, like other smart content processing systems today, has a certain sensitivity. This means that streams of content processed may contain certain keywords.
When “news” takes place, the flood of content allows smart indexing systems to identify a “hot topic.” The test queries we ran for my monographs “The Google Legacy” and “Google Version 2.0” suggest that Google is sensitive to certain “triggers” in content. Feedback can be useful; it can also cause smart software to wobble a bit.
T shirts are easy; search is hard.
I believe that the challenge Google faces is similar to the problem Bing and Yandex are exploring as well; that is, certain numerical recipes can over react to certain inputs. These over reactions may increase the difficulty of determining what content object is “correct,” “factual,” or “verifiable.”
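One way to picture a numerical recipe that over reacts is a simple “hotness” ratio: recent mentions of a term divided by its historical average. The Python sketch below is hypothetical. The function name `hotness`, the `smoothing` parameter, and the counts are assumptions made for illustration and are not drawn from any search vendor’s method.

```python
def hotness(recent_count, baseline_counts, smoothing=1.0):
    """Ratio of recent mentions to the historical average, with additive smoothing."""
    baseline = sum(baseline_counts) / len(baseline_counts)
    return (recent_count + smoothing) / (baseline + smoothing)


# A rarely seen term: five new mentions against a near-zero history.
rare = hotness(5, [0, 0, 1, 0])

# A common term: five extra mentions on top of a history of about 100.
common = hotness(105, [100, 110, 90, 100])

# The same rare term with a smaller smoothing constant swings much harder.
rare_twitchy = hotness(5, [0, 0, 1, 0], smoothing=0.01)

print(round(rare, 2), round(common, 2), round(rare_twitchy, 2))
```

Under these assumptions, a handful of posts about a previously obscure name scores far “hotter” than the same number of extra posts about a familiar one, and the tuning constant decides how far the needle jumps.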
Expecting a free search system, regardless of its owner, to know what’s true and what’s false is understandable. In my opinion, making this type of determination with today’s technology, system limitations, and content analysis methods is impossible.
In short, the burden of figuring out what’s right and what’s not correct falls on the user, not exclusively on the search engine. Users, on the other hand, may not want the “objective” reality. Search vendors want traffic and want to generate revenue. Algorithms want nothing.
Mix these three elements and one takes a step closer to understanding that search and retrieval is not the slam dunk some folks would have me believe. In fact, the sensitivity of content processing systems to comparatively small inputs requires more discussion. Perhaps that type of information will come out of discussions about how best to deal with fake news and related topics in the context of today’s information retrieval environment.
Free search? Think about that too.
Stephen E Arnold, November 10, 2017
Ichidan Simplifies Dark Web Searches
November 10, 2017
Now there is an easier way to search the Dark Web, we learn from a write-up at Cylance, “Ichidan, a Search Engine for the Dark Web.” Cybersecurity pro and writer Kim Crawley informs us:
Ichidan is a search engine for looking up websites that are hosted through the Tor network, which may be the first time that’s been done at this scale. Websites on Tor usually have the .onion top level domain and you typically need a web browser with the Tor plugin or Tor’s own configured web browser in order to access them. … The search engine is less like Google and more like Shodan, in that it allows users to see technical information about .onion websites, including their connected network interfaces, such as TCP/IP ports.
Researchers at BleepingComputer explored the possibilities of this search engine. They were able to reproduce OnionScan’s findings on the shrinkage of the Dark Web: the number of Dark Web services decreased from about 30,000 in April 2016 to about 4,400 not quite a year later (so by about 85%). Researchers found this alarming capability, too:
BleepingComputer was also able to use Ichidan to find a website with a lot of exposed ports, including OpenSSH, an email server, a Telnet implementation, vsftpd, and an exposed Fritzbox router. That sort of information is very attractive to cyber attackers. Using Ichidan is a lot easier than command line pentesting tools, which require more specific technical know-how.
Uh-oh. Crawley predicts that use of Ichidan will grow as folks on both sides of the law discover its possibilities. She advises anyone administering a .onion site to strengthen their cyber defenses posthaste, “if they want to survive.”
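For readers who like to check the arithmetic, the “about 85%” shrinkage follows directly from the two approximate counts cited above. A minimal Python check, with the helper name `percent_decrease` chosen here for illustration:

```python
def percent_decrease(before, after):
    """Percentage drop from an earlier count to a later count."""
    return (before - after) / before * 100


# Approximate service counts cited in the write-up.
drop = percent_decrease(30_000, 4_400)
print(f"{drop:.1f}%")
```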
Cynthia Murrell, November 10, 2017
Reddit Search Improves with Lucidworks
November 10, 2017
YouTube might swallow all of your free time with videos, but Reddit steals your entire life with videos, plus images, GIFs, posts, jokes, and cute pictures of doggos, danger noodles, trash pandas, and floofs. If you do not know what those are, then shame on you. If you are a redditor, then you might have noticed that the search function stinks worse than a troll face. According to TechCrunch’s “Reddit Teams With Lucidworks To Build New Search Framework,” Reddit has finally given its search function a facelift.
Reddit has some serious stats when it comes to user searches and postings. The online discussion platform has more than 500 million users, generates 5 million comments, and handles 40 million searches each day. While one of Reddit’s search challenges is dealing with the varied content, another is returning personalized search results without redditors having to explicitly write them in the search box.
Reddit’s poor search performance is legendary and its head honchos wanted to improve it, but trying to find the time to fix it was a problem. That is why they hired Lucidworks to do the job for them:
Caldwell said that the company went with the Lucidworks Fusion platform because it had the right combination of technology and the ability to augment his engineering team, while helping search to continually evolve on Reddit. Buying a tool was only part of the solution though. Reddit also needed to hire a group of engineers with what Caldwell called “world class search and relevance engineering expertise.” To that end, he has set up a 30-person engineering search team devoted to maximizing the potential of the new search platform.
Lucidworks currently remains in charge of fixing Reddit’s search issues, but eventually, Reddit will take over. Within a few searches for danger noodle, floof, and doggo, you will not only get more accurate results, but you can also learn the aww lingo through them.
Whitney Grace, November 10, 2017
Facebook and Foreign Policy
November 9, 2017
I knew online was important when I became involved in the commercial database sector in 1981. At that time, I did not imagine that accessing online information to look up citations in Pharmaceutical News Index would mature into a policy crushing machine.
After reading “Facebook Can’t Cope with the World It’s Created,” I realized that online has arrived at the big dance. The company, however, lacks the jazzy moves of a John Travolta stayin’ alive.
Foreign Policy does not do fluffy “real news” write ups. You will have to navigate to the original at the link provided or make your way to a real library where the snappy publication is available.
I noted this assertion—well, maybe “real” news—in the article about everyone’s favorite social network:
On an earnings call earlier last week, Zuckerberg told investors and reporters “how upset I am that the Russians tried to use our tools to sow mistrust,” adding that he was “dead serious” about finding ways to tackle the problem. That would be a positive step — but it must also extend to examining Facebook’s tricky impacts in the rest of the world.
But the ace statement in the article is this observation, which I assume is 100 percent on the money:
In Myanmar today, Facebook is the internet.
There are some interesting groups in Myanmar, and it is reassuring to know that Facebook has everyone’s interests in mind. Free communication flows, friends, and nifty private groups.
What could possibly be untoward with these essential, unregulated modern functions? The government authorities are probably avid Facebookers too.
Stephen E Arnold, November 9, 2017
Google and VW: How the Quantum World Turns
November 9, 2017
I read “Google and VW Team Up on Quantum Computers.” The main idea is that two of the companies on EU litigators’ radar have become BFFs. Self driving cars? Clever advertising featuring the Pixel phone and VW campers driven by Woodstock types?
Neither.
The article informed me:
The two corporate heavyweights will work together using quantum computing as they try to solve complex puzzles related to the future of traffic.
I noted this statement:
“Volkswagen has enormous expertise in solving important, real-world engineering problems, and it is an honor for us to collaborate on how quantum computing may be able to make a difference in the automotive industry,” added Hartmut Neven, director of the Google Quantum Artificial Intelligence Laboratory.
Google is pretty good at cracking the problem of fake news and solving death. VW has the diesel emission technology nailed.
The fruits of this collaboration will improve the quality of life for those who have to commute in one of those autonomous autos on streets designed for medieval carts in the Italian town of Siena. Here in Harrod’s Creek, deep in rural Kentucky, I just walk. No almost unusable Google Maps. No cute VW bus with happy hippies. No worries.
Stephen E Arnold, November 9, 2017
Yahoo: The Folks in the East Did It. Maybe
November 9, 2017
I have not thought too much about Yahoot. Sorry, I meant Yahoo. Oh, right, Yahoo is now Oath. I did revisit the online outfit when I read “Former Yahoo CEO Apologizes for Data Breaches, Blames Russians.” The real news and professional publishing outfit Thomson Reuters informed me:
Former Yahoo Chief Executive Marissa Mayer apologized on Wednesday for two massive data breaches at the internet company, blaming Russian agents for at least one of them, at a hearing on the growing number of cyber attacks on major U.S. companies.
I noted this passage:
Mayer said Yahoo has not been able to identify how the 2013 intrusion occurred and that the company did not learn of the incident until the U.S. government presented data to Yahoo in November 2016. She said even “robust” defenses are not enough to defend against state-sponsored attacks and compared the fight with hackers to an “arms race.”
There is nothing like taking action, telling users, and assuming responsibility in a timely manner. Oh, 2013 is just like yesterday.
Yahooooot. Silicon Valley sets the pace.
Stephen E Arnold, November 9, 2017
UX (That Means Interface) Excitement
November 9, 2017
I read an article in Thread Reader. The first person essay is titled, I think, “Graviscera.” In theory, you can find the story at the link provided in the previous sentence.
The subject of the write up is the UX or what oldsters like me call an “interface.” The concept is simple, but like most digital thingies, it is a challenge to some. My father, before he died, struggled with using a mouse. He was a keyboard type of person. I find that I am okay with a mouse, but every once in a while, I long for XyWrite III+. IBM bought this fine word processor and, well, you can pretty much figure out the fate of that nifty, speedy piece of code.
In Graviscera, a person with a strong sense of what works expresses opinions about a number of the silly, perhaps stupid, interfaces foisted on users. I enjoyed the write up because it has oomph.
Here are three points, and I urge you to read the full essay. If you are under the age of 35, you will probably disagree with the ideas in the essay. If you are a bit older, you may recall keyboard centric and command line interfaces which did not require moving a cursor to and fro or putting a large finger on a Lilliputian icon in order to view a document. Believe me, old fingers and tiny icons on a zippy mobile phone can frustrate even a manic Facebook user or twitchy tweeter.
Now the three points I highlighted with an old fashioned orange marker:
- Google Maps is unusable. Yep, Graviscera nailed it. I am not sure what the Googlers are trying to accomplish with maps, but performing certain operations is impossible for Graviscera and me. Don’t believe me? Try to figure out what’s on a route from a mobile version of Google Maps. Give up? Now try the same thing on a desktop version of Google Maps. Give up? I have. As miserable as Bing is, I find its mapping function slightly less worse than Google’s.
- Use a mouse to view Twitter content. Graviscera points out that a keyboard interface would make life easier. That’s true. The low contrast of Twitter adds an additional usability challenge. Those with perfect eyesight probably love the mousing around thing on pale blue text. Well, I don’t, and Graviscera seems likely to agree.
- Keyboards and function keys work for many applications. Graviscera nails this. The focus is on point of sale terminals. But there are many applications which would benefit from consistent keyboard functions. Even the crazy IBM keyboard with the two dozen function keys was easier to operate than some mobile interfaces. Graviscera does not mention Fitbit, but I think it is a poster child for mobile wonkiness.
I recommend that you read Graviscera. Let me conclude with this quote from the write up:
Nobody will agree with me, citing anecdotes and examples that are meaningless in the current zeitgeist.
No need to fret. I agree with you.
Stephen E Arnold, November 9, 2017