Yandex Incorporates Semantic Search

March 15, 2017

Apparently ahead of a rumored IPO launch, Russian search firm Yandex is introducing “Spectrum,” a semantic search feature. We learn of the development from “Russian Search Engine Yandex Gets a Semantic Injection” at the Association of Internet Research Specialists’ Articles Share pages. Writer Wushe Zhiyang observes that, though Yandex claims Spectrum can read users’ minds, the tech appears to be a mix of semantic technology and machine learning. He specifies:

The system analyses users’ searches and identifies objects like personal names, films or cars. Each object is then classified into one or more categories, e.g. ‘film’, ‘car’, ‘medicine’. For each category there is a range of search intents. [For example] the ‘product’ category will have search intents such as buy something or read customer reviews. So we have a degree of natural language processing, taxonomy, all tied into ‘intent’, which sounds like a very good recipe for highly efficient advertising.

But what if a search query has many potential meanings? Yandex says that Spectrum is able to choose the category and the range of potential user intents for each query to match a user’s expectations as close as possible. It does this by looking at historic search patterns. If the majority of users searching for ‘gone with the wind’ expect to find a film, the majority of search results will be about the film, not the book.

‘As users’ interests and intents tend to change, the system performs query analysis several times a week’, says Yandex. This amounts to Spectrum analysing about five billion search queries.
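The majority-rules idea in the quoted passage can be sketched in a few lines. This is only an illustration of the described behavior, not Yandex’s implementation: the click history, the category names, the intent lists, and the function names are all invented for the example.

```python
from collections import Counter

# Hypothetical log of (query, category the searcher ended up choosing).
# The data is invented for illustration only.
HISTORY = [
    ("gone with the wind", "film"),
    ("gone with the wind", "film"),
    ("gone with the wind", "book"),
    ("gone with the wind", "film"),
]

# Hypothetical mapping from a category to its range of search intents.
INTENTS = {
    "film": ["watch", "read reviews"],
    "book": ["buy", "read reviews"],
}


def category_shares(query, history):
    """Return each category's share of past choices for the query."""
    counts = Counter(cat for q, cat in history if q == query)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def allocate_results(query, history, slots=8):
    """Split result slots across categories in proportion to past choices."""
    shares = category_shares(query, history)
    return {cat: round(share * slots) for cat, share in shares.items()}


def likely_intents(query, history):
    """Return the intents attached to the most frequently chosen category."""
    shares = category_shares(query, history)
    if not shares:
        return []
    top = max(shares, key=shares.get)
    return INTENTS.get(top, [])
```

With this toy history, three of four past searchers wanted the film, so `allocate_results("gone with the wind", HISTORY)` gives the film six of eight result slots and the book two. Rerunning the count on fresh logs several times a week would be one simple way to track shifting interests.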

Yandex has been busy. The site recently partnered with VKontakte, Russia’s largest social network, and plans to surface public-facing parts of VKontakte user profiles, in real time, in Yandex searches. If the rumors of a plan to go public are true, will these added features help make Yandex’s IPO a success?

Cynthia Murrell, March 15, 2017

Enterprise Search in the Cloud: Which Service Provider?

March 9, 2017

In the wake of Amazon’s glitch, a number of publications rushed to report on the who, what, where, and why. ZDNet took a different approach in “Which Cloud Will Give You the Biggest Bang for the Buck?” The write up recycled, in the best tradition of “real” journalism, a report from a vendor named Cloud Spectator. I won’t ask too many questions about sample size, methodology, the meaning assigned to “value,” and statistical validity. I will assume that the information is not Facebook news.

The guts of the write up is this chart, which is impossible to read in this blog post, but the original is reasonably legible:

[Image: Cloud Spectator chart]

What this chart reveals about hosting is that the 1&1 system is the big dog. I would point out that the naming of the service is “1+1” in the chart; the “real” name of the company is “1&1”, a real joy to search using free Web search systems.

Okay, 1+1 was on my radar as a very low cost provider of Web page hosting and other services. Now the company remains a low cost provider and has added a range of new services. Cloud Spectator finds the company A Number One. I was tempted to type ANo1, another keen string to plug into a Web search system.

What interested me was the cluster of outfits which the Cloud Spectator survey pegged as small dogs; for example, Amazon Web Services, the very same outfit that nuked some major Web sites. (Send in a two pizza team, Mr. Bezos.)

Close to Amazon’s lower third ranking was Microsoft Azure. Somehow that seems par for the new Microsoft. Google and the financially challenged Rackspace were in the middle of the pack. (What happened to Rackspace’s love affair with Robert Scoble, recently removed from the Gillmor Gang?)

But the major news for me was that IBM, yep, the owner of the famed and much admired Watson thing, was darn near last. IBM nosed out Dimension Data for the “Also Participated” badge.

Net net: Maybe 1&1 should get more attention. Perhaps the company will change its name to minimize the likelihood of misspellings. Alternatively 1&1 can hire Recode to endlessly repeat that one spells embarrassed with two r’s and two esses.

When it comes to search in the cloud, the question becomes, “How does one deploy an enterprise class search and content processing system on the 1&1 system?” Good question.

Stephen E Arnold, March 9, 2017

Dark Web Explosives Buyer Busted Through FBI Infiltration

March 9, 2017

Here is the story of another successful Dark Web bust. Motherboard reports, “Undercover FBI Agent Busts Alleged Explosives Buyer on the Dark Web.” The 50-year-old suspect was based in Houston, and reporter Joseph Cox examined the related documents from the Southern District of Texas court. We are not surprised to learn that the FBI found this suspect through its infiltration of AlphaBay. Cox writes:

The arrest was largely due to the work of an undercover agent who posed as an explosives seller on the dark web marketplace AlphaBay, showing that, even in the age of easy-to-use anonymization technology, old-school policing tactics are still highly effective at catching suspects.

According to the complaint, on August 21, an FBI Online Covert Employee (OCE)—essentially an undercover agent—located outside Houston logged into an AlphaBay vendor account they were running and opened an unsolicited private message from a user called boatmanstv. ‘looking for wireless transmitter with detonator,’ the message read. ‘Everything I need to set of a 5 gallon can of gas from a good distance away [sic].’ The pair started a rapport, and boatmanstv went into some detail about what he wanted to do with the explosives.

One thing led to another, and the buyer and “seller” agreed to an exchange after communicating for a couple of weeks. (Dark Web sting operations require patience. Lots of patience.) It became clear that boatmanstv had some very specific plans in mind for a very specific target, and that he’d made plenty of purchases from AlphaBay before. The FBI was able to connect the suspect’s email account to other accounts, and finally to his place of business. He was arrested shortly after receiving and opening the FBI’s package, so it would appear there is one fewer violent criminal on the streets of Houston.

It is clear that the FBI, and other intelligence organizations, are infiltrating the Dark Web more and more. Let the illicit buyer be wary.

Cynthia Murrell, March 9, 2017

Index Is Important. Yes, Indexing.

March 8, 2017

I read “Ontologies: Practical Applications.” The main idea in the write up is that indexing is important. Now indexing is labeled in different ways today; for example, metadata, entity extraction, concepts, etc. I agree that indexing is important, but the challenge is that most people are happy with tags, keywords, or systems which return a result that has made a high percentage of users happy. Maybe semi-happy. Who really knows? Asking about search and content processing system satisfaction returns the same grim news year after year; that is, most users (roughly two thirds) are not thrilled with the tools available to locate information. Not much progress in 50 years it seems.

The write up informs me:

Ontologies are a critical component of the enterprise information architecture. Organizations must be capable of rapidly gathering and interpreting data that provides them with insights, which in turn will give their organization an operational advantage. This is accomplished by developing ontologies that conceptualize the domain clearly, and allows transfer of knowledge between systems.

This seems to mean a classification system which makes sense to those who work in an organization. The challenge which we have encountered over the last half century is that the content and data flowing into an organization change, often rapidly, over time. At any one point in time, the information needed today is not yet available. The organization sucks in what’s needed and hopes the information access system indexes the new content right away and makes it findable and usable in other software.

That’s the hope anyway.

The reality is that a gap exists between what’s accessible to a person in an organization and what information is being acquired and used by others in the organization. Search fails for most system users because what’s needed now is not indexed or, if indexed, is not findable.

An ontology is a fancy way of saying that a consultant and software can cook up a classification system and use those terms to index content. Nifty idea, but what about that gap?

This is the killer for most indexing outfits. They make a sale because people are dissatisfied with the current methods of information access. An ontology or some other jazzed up indexing component is sold as the next big thing.

When an ontology, taxonomy, or other solution does not solve the problem, the company grouses about search and content processing again.

Is there a fix? Who knows. But after 50 years in the information access sector, I know that jargon is not an effective way to solve very real problems. Money, know how, and old school methods are needed to make certain technologies deliver useful applications.

Ontologies. Great. Silver bullet. Nah. Practical applications? Nifty concept. Reality is different.

Stephen E Arnold, March 8, 2017

New Technologies Meet Resistance in Business

March 3, 2017

Trying to sell a state of the art, next-gen search and content processing system can be tough. In the article, “Most Companies Slow to Adopt New Business Tech Even When It Can Help,” Digital Trends demonstrates that a reluctance to invest in something new is not confined to Search. Writer Bruce Brown cites the Trends vs. Technologies 2016 report (PDF) from Capita Technology Solutions and Cisco. The survey polled 125 ICT [Information and Communications Tech] decision-makers working in insurance, manufacturing, finance, and the legal industry. More in-depth interviews were conducted with a dozen of these folks, spread evenly across those fields.

Most higher-ups acknowledge the importance of keeping on top of, and investing in, worthy technological developments. However, that awareness does not inform purchasing and implementation decisions as one might expect. Brown specifies:

The survey broke down tech trends into nine areas, asking the surveyed execs if the trends were relevant to their business, if they were being implemented within their industry, and more specifically if the specific technologies were being implemented within their own businesses. Regarding big data, for example, 90 percent said it was relevant to their business, 64 percent said it was being applied in their industry, but only 39 percent reported it being implemented in their own business. Artificial intelligence was ranked as relevant by 50 percent, applied in their industry by 25 percent, but implemented in their own companies by only 8 percent. The Internet of Things had 70 percent saying it is relevant, with 50 percent citing industry applications, but a mere 30 percent use it in their own business. The study analyzed why businesses were not implementing new technologies that they recognized could improve their bottom line. One of the most common roadblocks was a lack of skill in recognizing opportunities within organizations for the new technology. Other common issues were the perception of security risks, data governance concerns, and the inertia of legacy systems.
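The gap between “relevant” and “implemented” is the interesting number here, and the arithmetic is easy to check. The snippet below uses only the percentages quoted above; the variable and function names are made up for the illustration.

```python
# Percentages quoted in the passage above, per trend:
# (relevant to the business, applied in the industry, implemented in own business)
SURVEY = {
    "big data": (90, 64, 39),
    "artificial intelligence": (50, 25, 8),
    "Internet of Things": (70, 50, 30),
}


def adoption_gap(relevant, implemented):
    """Percentage points between seeing a trend as relevant and using it."""
    return relevant - implemented


gaps = {
    trend: adoption_gap(relevant, own)
    for trend, (relevant, _industry, own) in SURVEY.items()
}

# Print the trends from the widest gap to the narrowest.
for trend, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{trend}: {gap} percentage points")
```

On these three trends, big data shows the widest gap at 51 points, with artificial intelligence at 42 and the Internet of Things at 40.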

The survey also found the stain of mistrust, with 82 percent of respondents sure that much of what they hear about tech trends is pure hype. It is no surprise, then, that they hesitate to invest resources and impose change on their workers until they are convinced benefits will be worth the effort. Perhaps vendors would be wise to dispense with the hype and just lay out the facts as clearly as possible; potential customers are savvier than some seem to think.

Cynthia Murrell, March 3, 2017

 

Inside Loon Balloons

March 2, 2017

You may have heard about Google X’s Project Loon, which aims to bring Internet access to underserved, rural areas using solar-powered balloons. The post, “Here’s How Google Makes its Giant, Internet-Beaming Balloons,” at Business Insider takes us inside that three-year-old project, describing some of how the balloons are made and used. The article is packed with helpful photos and GIFs. We learn that the team has turned to hot-air-balloon manufacturer Raven Aerostar for their expertise. The write-up tells us:

The balloons fly high in the stratosphere at about 60,000 to 90,000 feet above Earth. That’s two to three times as high as most commercial airplanes. Raven Aerostar creates a special outer shell for the balloons, called the film, that can hold a lot of pressure — allowing the balloons to float in the stratosphere for longer. The film is as thin as a typical sandwich bag. … The film is made of a special formulation of polyethylene that allows it to retain strength when facing extreme temperatures of up to -112 degrees Fahrenheit.

We like the sandwich bag comparison. The balloons are tested in sub-freezing conditions at the McKinley Climatic Lab—see the article for dramatic footage of one of their test subjects bursting. We also learn about the “ballonet,” an internal compartment in each balloon that controls altitude and, thereby, direction. Each balloon is equipped with a GPS tracker, of course, and all electronics are secured in a tiny basket below.

One caveat is a bit disappointing—users cannot expect to stream high-quality videos through the balloons. Described as “comparable to 3G,” the service should be enough for one to visit websites and check email. That is certainly far better than nothing and could give rural small-business owners and remote workers the Internet access they need.

Cynthia Murrell, March 2, 2017

Search Like Star Trek: The Next Frontier

February 28, 2017

I enjoy the “next frontier”-type article about search and retrieval. Consider “The Next Frontier of Internet and Search,” a write up in the estimable “real” journalism site Huffington Post. As I read the article, I heard “Scotty, give me more power.” I thought I heard 20 somethings shouting, “Aye, aye, captain.”

The write up told me, “Search is an everyday part of our lives.” Yeah, maybe in some demographics and geo-political areas. In others, search is associated with finding food and water. But I get the idea. The author, Gianpiero Lotito of FacilityLive, is talking about people with computing devices, an interest in information like finding a pizza, and the wherewithal to pay the fees for zip zip connectivity.

And the future? I learned:

The future of search appears to be in the algorithms behind the technology.

I understand algorithms applied to search and content processing. Since humans are expensive beasties, numerical recipes are definitely the go to way to perform many tasks. For indexing, humans are fact checking, curating, and indexing textual information. The math does not work the way some expect when algorithms are applied to images and other rich media. Hey, sorry about that false drop in the face recognition program used by Interpol.

I loved this explanation of keyword search:

The difference among the search types is that: the keyword search only picks out the words that it thinks are relevant; the natural language search is closer to how the human brain processes information; the human language search that we practice is the exact matching between questions and answers as it happens in interactions between human beings.

This is as fascinating as the fake information about Boolean being a probabilistic method. What happened to string matching and good old truncation? The truism about people asking questions is intriguing as well. I wonder how many mobile users ask questions like, “Do manifolds apply to information spaces?” or “What is the chemistry allowing multi-layer ion deposition to take place?”
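For the record, Boolean matching with truncation is deterministic, not probabilistic. A minimal sketch, assuming a trailing `*` marks truncation and using invented sample documents and function names:

```python
import re


def matches(term, text):
    """Exact word match, or prefix match when the term ends with '*'."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    term = term.lower()
    if term.endswith("*"):
        stem = term[:-1]
        return any(word.startswith(stem) for word in words)
    return term in words


def boolean_and(terms, documents):
    """Return the documents that contain every term (Boolean AND)."""
    return [doc for doc in documents if all(matches(t, doc) for t in terms)]


# Invented sample documents for illustration.
DOCS = [
    "Indexing enterprise content",
    "Indexes and ontologies for search",
    "Pizza delivery near me",
]
```

Here `boolean_and(["index*", "search"], DOCS)` returns only the second document, and it returns the same thing every time. A document either satisfies the query or it does not; no guessing about what the system “thinks” is relevant.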

Yeah, right.

The write up drags in the Internet of Things. Talk to one’s Alexa or one’s thermostat via Google Home. That’s sort of natural language; for example, Alexa, play Elvis.

Here’s the paragraph I highlighted in NLP crazy red:

Ultimately, what the future holds is unknown, as the amount of time that we spend online increases, and technology becomes an innate part of our lives. It is expected that the desktop versions of search engines that we have become accustomed to will start to copy their mobile counterparts by embracing new methods and techniques like the human language search approach, thus providing accurate results. Fortunately these shifts are already being witnessed within the business sphere, and we can expect to see them being offered to the rest of society within a number of years, if not sooner.

Okay. No one knows the future. But we do know the past. There is little indication that desktop search will “copy” mobile search. Desktop search is a bit like digging in an archeological pit on Cyprus: Fun, particularly for the students and maybe a professor or two. For the locals, there often is a different perception of the diggers.

There are shifts in “the business sphere.” Those shifts are toward monopolistic, choice limited solutions. Users of these search systems are unaware of content filtering and lack the training to work around the advertising centric systems.

I will just sit here in Harrod’s Creek and let the future arrive courtesy of a company like FacilityLive, an outfit engaged in changing Internet searching so I can find exactly what I need. Yeah, right.

Stephen E Arnold, February 28, 2017

Google and Its Search Soccer Team: Shot Hits the Post

February 28, 2017

I read “Google’s Search Algorithm Is Like a Soccer Team.” Interesting notion but an old one. Years ago Google patented a system and method for deploying communication software agents. Some of these were called “janitors.” The name was cute. The idea was that the “janitors” would clean up some of the mess left when unruly bots left litter in a file structure.

The write up ignores Google’s technical documentation, journal papers, and wild and crazy patent documents. The author has a good sense of how algorithms work and how clever folks can hook them together to create a business process or manufacturing system to further the sale of online advertising.

The discussion focuses on Google’s search algorithm (please, note the singular noun). I thought that Google had a slightly more sophisticated approach to providing search and retrieval in its various forms to its billions of information foragers.

I remember a time in the late 1990s, when co-workers would ask one another which search engine they used. Lycos? AltaVista? Yahoo? Dogpile? Ask Jeeves? The reason there was such a time, and the reason there is no longer such a time, is that Google had not yet introduced its search algorithm. Google’s search algorithm helped Google gain market share on its way to search engine preeminence. Imagine you were searching the internet in the mid 1990s, and your search engine of choice was Ask Jeeves.

Yep, that’s an interesting point: AskJeeves. As I recall, AskJeeves used manually prepared answers to a relatively small body of questions. AskJeeves was interesting but fizzled trying to generate money with online customer service. This is a last ditch tactic that many other search vendors have tried. How is that customer service working for you, gentle reader? Great, I bet.

So how does Google’s algorithm compare to a soccer team? I learned:

The search algorithm looks at a website’s incoming links and how important those pages are. The higher the number of quality page links coming in, the higher the website ranks. Think of a soccer team playing a match. Each player on one team represents a web page. And every pass made to a player on the team represents links from another website. A player’s ranking depends upon the amount of passes (links) they receive. If the player receives many passes from other important players, then the player’s score rises more than if they received passes from less talented players, i.e. those who receive fewer passes by lesser quality players. Every single time there is a pass, the rankings are updated. Google’s search algorithm uses links instead of passes.
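The passes-as-links idea in the quote is a loose description of the published PageRank method. Here is a simplified sketch of that textbook calculation, not Google’s production system: the three “players” and their passes are invented, and 0.85 is the damping value commonly cited from the original PageRank paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    links maps each page (player) to the pages (players) it links (passes) to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Invented example: who passes to whom.
players = {
    "keeper": ["striker"],
    "defender": ["striker", "keeper"],
    "striker": ["keeper"],
}
scores = pagerank(players)
```

In this toy match the defender receives no passes and ends up with the lowest score, while the keeper and striker, who pass to each other, share the top. Note that the sketch recomputes all scores in batches over a fixed link graph; it does not update a ranking “every single time there is a pass.”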

Yep, that’s a shot on goal, but it is wide. The conclusion of this amazing soccer game metaphor is that “thus SEO was born.” And the reason? Algorithms.

That shot rolled slow and low only to bounce off the goal post and wobble wide. Time to get another forward, pay for a referee, and keep the advertising off the field. Well, that won’t work for the GOOG will it?

Stephen E Arnold, February 28, 2017

Comprehensive, Intelligent Enterprise Search Is Already Here

February 28, 2017

The article on Sys-Con Media titled “Delivering Comprehensive Intelligent Search” examines the accomplishments of World Wide Technology (WWT) in building a better search engine for the business organization. The Enterprise Search Project Manager and Manager of Enterprise Content at WWT discovered that the average employee will waste over a full week each year looking for the information they need to do their work. The article details how they approached a solution for enterprise search:

We used the Gartner Magic Quadrants and started talks with all of the Magic Quadrant leaders. Then, through a down-selection process, we eventually landed on HPE… It wound up being that we went with the HPE IDOL tool, which has been one of the leaders in enterprise search, as well as big data analytics, for well over a decade now, because it has very extensible platform, something that you can really scale out and customize and build on top of.

Trying to replicate what Google delivers in an enterprise is a complicated task because of how siloed data is in the typical organization. The new search solution offers vast improvements in presenting employees with the relevant information, and all of the relevant information, and prevents major time waste through comprehensive and intelligent search.

Chelsea Kerwin, February 28, 2017

When AI Spreads Propaganda

February 28, 2017

We thought Google was left-leaning, but an article at the Guardian, “How Google’s Search Algorithm Spreads False Information with a Rightwing Bias,” seems to contradict that assessment. The article cites recent research by the Observer, which found neo-Nazi and anti-Semitic views prominently featured in Google search results. The Guardian followed up with its own research and documented more examples of right-leaning misinformation, like climate-change denials, anti-LGBT tirades, and Sandy Hook conspiracy theories. Reporters Olivia Solon and Sam Levin tell us:

The Guardian’s latest findings further suggest that Google’s searches are contributing to the problem. In the past, when a journalist or academic exposes one of these algorithmic hiccups, humans at Google quietly make manual adjustments in a process that’s neither transparent nor accountable.

At the same time, politically motivated third parties including the ‘alt-right’, a far-right movement in the US, use a variety of techniques to trick the algorithm and push propaganda and misinformation higher up Google’s search rankings.

These insidious manipulations – both by Google and by third parties trying to game the system – impact how users of the search engine perceive the world, even influencing the way they vote. This has led some researchers to study Google’s role in the presidential election in the same way that they have scrutinized Facebook.

Robert Epstein from the American Institute for Behavioral Research and Technology has spent four years trying to reverse engineer Google’s search algorithms. He believes, based on systematic research, that Google has the power to rig elections through something he calls the search engine manipulation effect (SEME).

Epstein conducted five experiments in two countries to find that biased rankings in search results can shift the opinions of undecided voters. If Google tweaks its algorithm to show more positive search results for a candidate, the searcher may form a more positive opinion of that candidate.

This does add a whole new, insidious dimension to propaganda. Did Orwell foresee algorithms? Further complicating the matter is the element of filter bubbles, through which many consume only information from homogenous sources, allowing no room for contrary facts. The article delves into how propagandists are gaming the system and describes Google’s response, so interested readers may wish to navigate there for more information.

One particular point gives me chills: Epstein states that research shows the vast majority of readers are not aware that bias exists within search rankings; they have no idea they are being manipulated. Perhaps those of us with some understanding of search algorithms can spread that insight to the rest of the multitude. It seems such education is sorely needed.

Cynthia Murrell, February 28, 2017

 

 
