February 28, 2017
This week’s HonkinNews considers the Facebook “manifesto.” Our interpretation is that companies like Facebook are countries too. Aren’t we lucky? The IBM security conference is scheduled for March 2017, and Beyond Search was invited. We assume that the data science root access breach will be one highlighted case study. The program also comments on the Pinterest Lens technology. Now, after “pintering,” one can locate and buy a product. No words required. Two stories illustrate the depth or shallowness of thinking about online research. We present a list of “must use” search engines and note some notable omissions. Then we consider a comparison of conducting research on an ad-supported system versus the commercial databases, books, and journals at a first-rate research library like Dartmouth’s. The subject of Google’s Loon balloons drifts in as well. We consider the question: Will Facebook’s free-Internet drones engage in combat with Google’s free-Internet Loon balloons? You can find it at this link.
Kenny Toth, February 28, 2017
February 28, 2017
I enjoy the “next frontier”-type article about search and retrieval. Consider “The Next Frontier of Internet and Search,” a write up in the estimable “real” journalism site Huffington Post. As I read the article, I heard “Scotty, give me more power.” I thought I heard 20-somethings shouting, “Aye, aye, captain.”
The write up told me, “Search is an everyday part of our lives.” Yeah, maybe in some demographics and geo-political areas. In others, search is associated with finding food and water. But I get the idea. The author, Gianpiero Lotito of FacilityLive, is talking about people with computing devices, an interest in information like finding a pizza, and the wherewithal to pay the fees for zip zip connectivity.
And the future? I learned:
The future of search appears to be in the algorithms behind the technology.
I understand algorithms applied to search and content processing. Since humans are expensive beasties, numerical recipes are definitely the go-to way to perform many tasks. For indexing, however, humans remain better at fact checking, curating, and indexing textual information. The math does not work the way some expect when algorithms are applied to images and other rich media. Hey, sorry about that false drop in the face recognition program used by Interpol.
I loved this explanation of keyword search:
The difference among the search types is that: the keyword search only picks out the words that it thinks are relevant; the natural language search is closer to how the human brain processes information; the human language search that we practice is the exact matching between questions and answers as it happens in interactions between human beings.
This is as fascinating as the fake information about Boolean being a probabilistic method. What happened to string matching and good old truncation? The truism about people asking questions is intriguing as well. I wonder how many mobile users ask questions like, “Do manifolds apply to information spaces?” or “What is the chemistry allowing multi-layer ion deposition to take place?”
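For readers who have forgotten, string matching with right-hand truncation is about as simple as retrieval gets. A minimal sketch (the function name and sample text are mine, not from any search vendor):

```python
import re

def truncation_match(term, text):
    """Match a query term against text, treating a trailing '*' as
    good old right-hand truncation (e.g. 'index*' hits 'indexing')."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]

doc = "Indexing and indexes are core to keyword search; index early."
print(truncation_match("index*", doc))  # ['Indexing', 'indexes', 'index']
```

No probability, no brain simulation: just characters lined up against characters.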
The write up drags in the Internet of Things. Talk to one’s Alexa or one’s thermostat via Google Home. That’s sort of natural language; for example, Alexa, play Elvis.
Here’s the paragraph I highlighted in NLP crazy red:
Ultimately, what the future holds is unknown, as the amount of time that we spend online increases, and technology becomes an innate part of our lives. It is expected that the desktop versions of search engines that we have become accustomed to will start to copy their mobile counterparts by embracing new methods and techniques like the human language search approach, thus providing accurate results. Fortunately these shifts are already being witnessed within the business sphere, and we can expect to see them being offered to the rest of society within a number of years, if not sooner.
Okay. No one knows the future. But we do know the past. There is little indication that mobile search will “copy” desktop search. Desktop search is a bit like digging in an archeological pit on Cyprus: Fun, particularly for the students and maybe a professor or two. For the locals, there often is a different perception of the diggers.
There are shifts in “the business sphere.” Those shifts are toward monopolistic, choice-limited solutions. Users of these search systems are unaware of content filtering and lack the training to work around advertising-centric systems.
I will just sit here in Harrod’s Creek and let the future arrive courtesy of a company like FacilityLive, an outfit engaged in changing Internet searching so I can find exactly what I need. Yeah, right.
Stephen E Arnold, February 28, 2017
February 28, 2017
I read “Google’s Search Algorithm Is Like a Soccer Team.” Interesting notion but an old one. Years ago Google patented a system and method for deploying communication software agents. Some of these were called “janitors.” The name was cute. The idea was that the “janitors” would clean up some of the mess left when unruly bots left litter in a file structure.
The write up ignores Google’s technical documentation, journal papers, and wild and crazy patent documents. The author has a good sense of how algorithms work and how clever folks can hook them together to create a business process or manufacturing system to further the sale of online advertising.
The discussion of Google’s search algorithm (please note the singular noun) surprised me. I thought that Google had a slightly more sophisticated approach to providing search and retrieval in its various forms to its billions of information foragers.
I remember a time in the late 1990s, when co-workers would ask one another which search engine they used. Lycos? AltaVista? Yahoo? Dogpile? Ask Jeeves? The reason there was such a time, and the reason there is no longer such a time, is that Google had not yet introduced its search algorithm. Google’s search algorithm helped Google gain market share on its way to search engine preeminence. Imagine you were searching the internet in the mid 1990s, and your search engine of choice was Ask Jeeves.
Yep, that’s an interesting point: Ask Jeeves. As I recall, Ask Jeeves used manually prepared answers to a relatively small body of questions. Ask Jeeves was interesting but fizzled trying to generate money with online customer service. This is a last-ditch tactic that many other search vendors have tried. How is that customer service working for you, gentle reader? Great, I bet.
So how does Google’s algorithm compare to a soccer team? I learned:
The search algorithm looks at a website’s incoming links and how important those pages are. The higher the number of quality page links coming in, the higher the website ranks. Think of a soccer team playing a match. Each player on one team represents a web page. And every pass made to a player on the team represents links from another website. A player’s ranking depends upon the amount of passes (links) they receive. If the player receives many passes from other important players, then the player’s score rises more than if they received passes from less talented players, i.e. those who receive fewer passes by lesser quality players. Every single time there is a pass, the rankings are updated. Google’s search algorithm uses links instead of passes.
Yep, that’s a shot on goal, but it is wide. The conclusion of this amazing soccer game metaphor is that “thus SEO was born.” And the reason? Algorithms.
That shot rolled slow and low only to bounce off the goal post and wobble wide. Time to get another forward, pay for a referee, and keep the advertising off the field. Well, that won’t work for the GOOG, will it?
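The soccer metaphor maps onto the classic PageRank iteration. Here is a toy sketch, assuming a made-up four-page link graph; nothing here comes from Google’s actual production system:

```python
# A minimal PageRank sketch (hypothetical 4-page link graph): each
# page's score flows to the pages it links to, like passes in a match.
damping = 0.85
links = {            # page -> pages it links to (the "passes" it makes)
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):  # power iteration until the scores settle
    new = {p: (1 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new[target] += share
    rank = new

# C receives "passes" from everyone, so it ends up with the top score.
print(max(rank, key=rank.get))  # C
```

The player who receives passes from other well-passed players climbs the table; swap “pass” for “link” and you have the idea the article is reaching for.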
Stephen E Arnold, February 28, 2017
February 28, 2017
The article on Sys-Con Media titled Delivering Comprehensive Intelligent Search examines the accomplishments of World Wide Technology (WWT) in building a better search engine for the business organization. The Enterprise Search Project Manager and Manager of Enterprise Content at WWT discovered that the average employee will waste over a full week each year looking for the information they need to do their work. The article details how they approached a solution for enterprise search:
We used the Gartner Magic Quadrants and started talks with all of the Magic Quadrant leaders. Then, through a down-selection process, we eventually landed on HPE… It wound up being that we went with the HPE IDOL tool, which has been one of the leaders in enterprise search, as well as big data analytics, for well over a decade now, because it has a very extensible platform, something that you can really scale out and customize and build on top of.
Trying to replicate what Google delivers in an enterprise is a complicated task because of how siloed data is in the typical organization. The new search solution offers vast improvements in presenting employees with all of the relevant information and prevents major time waste through comprehensive and intelligent search.
Chelsea Kerwin, February 28, 2017
February 28, 2017
We thought Google was left-leaning, but an article at the Guardian, “How Google’s Search Algorithm Spreads False Information with a Rightwing Bias,” seems to contradict that assessment. The article cites recent research by the Observer, which found neo-Nazi and anti-Semitic views prominently featured in Google search results. The Guardian followed up with its own research and documented more examples of right-leaning misinformation, like climate-change denials, anti-LGBT tirades, and Sandy Hook conspiracy theories. Reporters Olivia Solon and Sam Levin tell us:
The Guardian’s latest findings further suggest that Google’s searches are contributing to the problem. In the past, when a journalist or academic exposes one of these algorithmic hiccups, humans at Google quietly make manual adjustments in a process that’s neither transparent nor accountable.
At the same time, politically motivated third parties including the ‘alt-right’, a far-right movement in the US, use a variety of techniques to trick the algorithm and push propaganda and misinformation higher up Google’s search rankings.
These insidious manipulations – both by Google and by third parties trying to game the system – impact how users of the search engine perceive the world, even influencing the way they vote. This has led some researchers to study Google’s role in the presidential election in the same way that they have scrutinized Facebook.
Robert Epstein from the American Institute for Behavioral Research and Technology has spent four years trying to reverse engineer Google’s search algorithms. He believes, based on systematic research, that Google has the power to rig elections through something he calls the search engine manipulation effect (SEME).
Epstein conducted five experiments in two countries to find that biased rankings in search results can shift the opinions of undecided voters. If Google tweaks its algorithm to show more positive search results for a candidate, the searcher may form a more positive opinion of that candidate.
This does add a whole new, insidious dimension to propaganda. Did Orwell foresee algorithms? Further complicating the matter is the element of filter bubbles, through which many consume only information from homogenous sources, allowing no room for contrary facts. The article delves into how propagandists are gaming the system and describes Google’s response, so interested readers may wish to navigate there for more information.
One particular point gives me chills: Epstein states that research shows the vast majority of readers are not aware that bias exists within search rankings; they have no idea they are being manipulated. Perhaps those of us with some understanding of search algorithms can spread that insight to the rest of the multitude. It seems such education is sorely needed.
Cynthia Murrell, February 28, 2017
February 27, 2017
Let’s create a scenario. You are a person trying to figure out how to index a chunk of content. You are working with cancer information sucked down from PubMed or a similar source. You run an extraction process and push the text through an indexing system. You use a system like Leximancer and look at the results. Hmmm.
Next you take a corpus of blog posts dealing with medical information. You suck down the content and run it through your extractor, your indexing system, and your Leximancer set up. You look at the results. Hmmm.
How do you figure out what terms are going to be important for your next batch of mixed content?
You might navigate to “Selecting Forecasting Methods in Data Science.” The write up does a good job of outlining some of the numerical recipes taught in university courses and discussed in textbooks. For example, the article includes a nifty graphic giving an overview of the methods, and it shows sample outputs from each of the methods identified.
What’s missing? For the person floundering away, like an employee at one government agency where I worked years ago, you pick the trend line you want. Then you try to plug in the numbers and generate some useful data. If that is too tough, you hire your friendly GSA schedule consultant to do the work for you. Yep, that’s how I ended up looking at:
- Manually selected data
- Lousy controls
- Outputs from different systems
- Misindexed text
- Entities which were not really entities
- A confused government employee.
Here’s the takeaway. Just because software is available to output stuff in a log file and Excel makes it easy to wrangle most of the data into rows and columns, none of the information may be useful, valid, or even in the same ball game.
When one then applies different forecasting methods without understanding them, we have an example of how an individual can create a pretty exciting data analysis.
Descriptions of algorithms do not correlate with high-value outputs. Data quality, sampling, understanding why curves are “different,” and other annoying details don’t fit into some busy work lives.
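To make the point concrete, consider two of the simplest recipes from the forecasting textbooks. The series below is made up for illustration; the lesson is that even sound methods disagree, and garbage in stays garbage out:

```python
# A toy illustration (my own numbers, not from the article): two simple
# forecasting recipes applied to the same series give different answers,
# and neither is "right" if the underlying data are junk.
series = [12, 13, 15, 14, 16, 18, 17, 19]

def naive_forecast(y):
    """Tomorrow equals today."""
    return y[-1]

def moving_average_forecast(y, window=3):
    """Tomorrow equals the mean of the last `window` points."""
    return sum(y[-window:]) / window

print(naive_forecast(series))            # 19
print(moving_average_forecast(series))   # (18 + 17 + 19) / 3 = 18.0
```

Neither recipe tells you whether the eight numbers were misindexed, manually selected, or pulled from different systems in the first place.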
Stephen E Arnold, February 27, 2017
February 27, 2017
Analytics are catching up to content. In a recent ZDNet article, “Digimind Partners with Ditto to Add Image Recognition to Social Media Monitoring,” we are reminded that images reign supreme on social media. On Pinterest, Snapchat, and Instagram, messages are often conveyed through images as opposed to text. Capitalizing on this, the intelligence software company Digimind has announced a partnership with Ditto Labs to introduce image-recognition technology into its social media monitoring software, Digimind Social. We learned,
“The Ditto integration lets brands identify the use of their logos across Twitter no matter the item or context. The detected images are then collected and processed on Digimind Social in the same way textual references, articles, or social media postings are analysed. Logos that are small, obscured, upside down, or in cluttered image montages are recognised. Object and scene recognition means that brands can position their products exactly where their customers are using them. Sentiment is measured by the amount of people in the image and counts how many of them are smiling. It even identifies objects such as bags, cars, car logos, or shoes.”
It was only a matter of time before these types of features emerged in social media monitoring. For years now, images have been shown to increase engagement even on platforms that began with a focus on text. Will we see more watermarked logos on images? More creative ways to visually identify brands? Both are likely, and we will be watching to see what transpires.
Megan Feil, February 27, 2017
February 27, 2017
Researchers are working to fix the problem of bias in software, we learn from the article, “He’s Brilliant, She’s Lovely: Teaching Computers to Be Less Sexist” at NPR’s blog, All Tech Considered. Writer Byrd Pinkerton begins by noting that this issue of software reflecting human biases is well-documented, citing this article from his colleague. He then informs us that Microsoft, for one, is doing something about it:
Adam Kalai thinks we should start with the bits of code that teach computers how to process language. He’s a researcher for Microsoft and his latest project — a joint effort with Boston University — has focused on something called a word embedding. ‘It’s kind of like a dictionary for a computer,’ he explains. Essentially, word embeddings are algorithms that translate the relationships between words into numbers so that a computer can work with them. You can grab a word embedding ‘dictionary’ that someone else has built and plug it into some bigger program that you are writing. …
Kalai and his colleagues have found a way to weed these biases out of word embedding algorithms. In a recent paper, they’ve shown that if you tell the algorithms to ignore certain relationships, they can extrapolate outwards.
And voila, a careful developer can teach an algorithm to fix its own bias. If only the process were so straightforward for humans. See the article for more about the technique.
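The core move in the debiasing work is geometric: estimate a bias direction from definitional pairs like he/she, then subtract each word vector’s component along that direction. A toy sketch with made-up three-dimensional vectors (real embeddings have hundreds of dimensions and are trained, not typed in):

```python
import numpy as np

# Toy 3-d "embeddings" (invented numbers, not a real trained model) to
# illustrate the debiasing move: project out a bias direction.
vec = {
    "he":       np.array([ 1.0, 0.2, 0.1]),
    "she":      np.array([-1.0, 0.2, 0.1]),
    "engineer": np.array([ 0.4, 0.9, 0.3]),
}

# Estimate the bias direction from a definitional pair.
g = vec["he"] - vec["she"]
g = g / np.linalg.norm(g)

def debias(v, direction):
    """Remove the component of v that lies along the bias direction."""
    return v - np.dot(v, direction) * direction

neutral = debias(vec["engineer"], g)
print(np.dot(neutral, g))  # ~0.0: "engineer" no longer leans he-ward
```

The paper’s contribution is deciding which words should be neutralized and which (like “he” and “she” themselves) should keep their gender component; the projection itself is one line of linear algebra.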
Ultimately, though, the problem lies less with the biased algorithms themselves and more with the humans who seek to use them in decision-making. Researcher Kalai points to the marketing of health-related products as a project for which a company might actually want to differentiate between males and females. Pinkerton concludes:
For Kalai, the problem is not that people sometimes use word embedding algorithms that differentiate between gender or race, or even algorithms that reflect human bias. The problem is that people are using the algorithms as a black box piece of code, plugging them in to larger programs without considering the biases they contain, and without making careful decisions about whether or not they should be there.
So, though discoveries about biased software are concerning, it is good to know the issue is being addressed. We shall see how fast the effort progresses.
Cynthia Murrell, February 27, 2017
February 27, 2017
The article on InfoQ titled Amazon Introduces Rekognition for Image Analysis explores the managed service aimed at the explosive image market. According to research cited in the article, over 1 billion photos are taken every single day on Snapchat alone, compared to the 80 billion total taken in the year 2000. Rekognition’s deep learning power is focused on identifying meaning in visual content. The article states,
The capabilities that Rekognition provides include Object and Scene detection, Facial Analysis, Face Comparison and Facial Recognition. While Amazon Rekognition is a new public service, it has a proven track record. Jeff Barr, chief evangelist at AWS, explains: Powered by deep learning and built by our Computer Vision team over the course of many years, this fully-managed service already analyzes billions of images daily. It has been trained on thousands of objects and scenes. Rekognition was designed from the get-go to run at scale.
The facial analysis features include markers for image quality, facial landmarks like facial hair and open eyes, and sentiment expressed (smiling = happy). The face comparison feature includes a similarity score that estimates the likelihood of two pictures being of the same person. Perhaps the most useful feature is object and scene detection, which Amazon believes will help users find specific moments by searching for certain objects. The use cases also span vacation rental markets and travel sites, which can now tag images with key terms for improved classifications.
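As a sketch of how one might consume the face comparison output: the dictionary below is hand-made to mirror the documented CompareFaces response shape (a real call requires boto3 and AWS credentials, which are omitted here, and the threshold is my own choice):

```python
# A hand-made stand-in for a Rekognition CompareFaces response; the
# "FaceMatches"/"Similarity" keys follow the documented response format.
sample_response = {
    "FaceMatches": [
        {"Similarity": 97.3, "Face": {"Confidence": 99.9}},
        {"Similarity": 51.2, "Face": {"Confidence": 98.1}},
    ],
    "UnmatchedFaces": [],
}

def likely_same_person(response, threshold=90.0):
    """Treat any match at or above `threshold` similarity as a hit."""
    return any(m["Similarity"] >= threshold
               for m in response["FaceMatches"])

print(likely_same_person(sample_response))  # True
```

Picking that threshold is where the real decisions live; Amazon hands back a number, and the caller decides what counts as “the same person.”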
Chelsea Kerwin, February 27, 2017
February 26, 2017
I read some of the Facebook manifesto. About half way through the screed I thought I was back in a class I audited decades ago about alternative political structures. That class struck me as intellectual confection, a bit like science fiction in 1962. The Facebook manifesto shared some ingredients, but it is an altogether different recipe for a new type of political construct. Facebook, not Google, is the big dog of information control. Lots of folks will not be happy; for example, traditional “real” journalists who want to pull the info-yarn and knit their vision of the perfect muffler, and countries that want to manage their information flows.
I thought about my “here we go again” reaction when I read “Facebook Plans to Rewire Your Life. Be Afraid.” Sorry, I am not afraid. Maybe when I was a bit younger, but 74 years of “innovative” thinking have dulled my senses. The write up, which is from the “real” journalism outfit Bloomberg, is more sensitive than I am. If you are a Facebooker, you will be happy with the Zuck’s manifesto. If you are struggling to figure out what is going on with hundreds of millions of people checking their “friends” and their “likes,” you will want to read the “real news” about Facebook.
Spoiler: Facebook is a new type of country.
The write up “reports”:
Facebook — launched, in Zuckerberg’s own words five years ago, to “extend people’s capacity to build and maintain relationships” — is turning into something of an extraterritorial state run by a small, unelected government that relies extensively on privately held algorithms for social engineering.
Yep, the same “we can do it better” thinking has infused many other high technology companies. Some see the attitude as arrogance. I see the approach as an extension of a high school math team. No one in the high school cares that much about the boys and girls who do not struggle to understand calculus. Those in the math club know that the other kids in the school just don’t “get it.”
The thinking has created some nifty technology. There’s the GOOG. There’s Palantir. There’s Uber. No doubt these companies have found traction in a world which seems to lack shared cultural norms and nation states which seem to be like a cookie jar from which elected officials take handfuls of cash.
The write up points out:
As for the “rewired” information infrastructure, it has helped to chase people into ideological silos and feed them content that reinforces confirmation biases. Facebook actively created the silos by fine-tuning the algorithm that lies at its center — the one that forms a user’s news feed. The algorithm prioritizes what it shows a user based, in large measure, on how many times the user has recently interacted with the poster and on the number of “likes” and comments the post has garnered. In other words, it stresses the most emotionally engaging posts from the people to whom you are drawn — during an election campaign, a recipe for a filter bubble and, what’s more, for amplifying emotional rather than rational arguments.
The traditional real journalists are supposed to do this job. Well, that’s real news. The New York Times wants to be like Netflix. Sounds great. In practice, well, the NYT is a newspaper with some baggage and maybe not enough cash to buy a ticket to zip zip land.
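The ranking recipe quoted above can be caricatured in a few lines. The weights and posts below are invented for illustration; Facebook’s actual feed model is not public:

```python
# A crude sketch of the ranking logic described in the quote (my own
# weights, not Facebook's): posts score higher with more likes, more
# comments, and more recent interactions with the poster.
posts = [
    {"poster": "aunt",   "likes": 250, "comments": 40, "recent_interactions": 12},
    {"poster": "news",   "likes": 30,  "comments": 2,  "recent_interactions": 0},
    {"poster": "friend", "likes": 5,   "comments": 1,  "recent_interactions": 30},
]

def feed_score(post):
    # Emotional engagement (likes, comments) plus affinity to the poster.
    return post["likes"] + 3 * post["comments"] + 10 * post["recent_interactions"]

feed = sorted(posts, key=feed_score, reverse=True)
print([p["poster"] for p in feed])  # ['aunt', 'friend', 'news']
```

Notice what never appears in the scoring function: accuracy. The most emotionally engaging post wins, which is the filter-bubble mechanism the write up describes.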
The real news story makes an interesting assertion:
It’s absurd to expect humility from Silicon Valley heroes. But Zuckerberg should realize that by trying to shape how people use Facebook, he may be creating a monster. His company’s other services — Messenger and WhatsApp — merely allow users to communicate without any interference, and that simple function is the source of the least controversial examples in Zuckerberg’s manifesto. “In Kenya, whole villages are in WhatsApp groups together, including their representatives,” the Facebook CEO writes. Well, so are my kids’ school mates, and that’s great.
But great translates to “virtual identity suicide.”
The fix? Get those billion people to cancel their accounts. Yep, that will work in the country of Facebook. I am, however, not afraid. Of course, I don’t use Facebook, worry about likes, or keep in touch with those folks from that audited class.
From my point of view, Facebook, and Google to a lesser extent, have been chugging along for years. Now the railroad wants to lay new track. Your farm in the way? Well, there is a solution. Build the track anyway.
Stephen E Arnold, February 26, 2017