Google and Amping the Pressure in the Ad Fire Hose

December 7, 2017

Screen real estate for mobile devices is limited. The number of queries on desktop boat anchor computers has flatlined, even for “real” researchers. What’s the fix?

A partial answer may appear in “Improving Search and Discovery on Google.” I learned from the write up:

More related searches. Google helps a busy person consider alternative ways of obtaining needed information.
Featured snippets. Google decides what’s important so a busy person does not have to think or assess too much.
Knowledge panels. Google helps a user obtain “real” knowledge. No thinking required.

Each of these search boosters allows Google to line up and display more advertising. Each time one clicks or swipes, Google obtains another item of data to allow its system to “predict” what a user wants and needs.

Now that’s relevance. Ads and feedback.

Why? To the user, search is just “there.” To Google, it’s a way to consume that AdWords inventory, in my opinion.

Relevance? What could be more relevant than information which makes thinking easy?

Keep the money flowing in, I say.

Stephen E Arnold, December 7, 2017

Written by Stephen E. Arnold · Filed Under Google, News, Search | Comments Off on Google and Amping the Pressure in the Ad Fire Hose

Neural Network Revamps Search for Research

December 7, 2017

Research is a pain, especially when you have to slog through millions of results to find specific and accurate ones. It takes time and a lot of reading, but neural networks could cut down on the investigation phase. The Economist wrote a new article about how AI will benefit research: “A Better Way To Search Through Scientific Papers.”

The Allen Institute for Artificial Intelligence developed Semantic Scholar to aid scientific research. Semantic Scholar’s purpose is to discover the scientific papers most relevant to a particular problem. How does Semantic Scholar work?

Instead of relying on citations in other papers, or the frequency of recurring phrases to rank the relevance of papers, as it once did and rivals such as Google Scholar still do, the new version of Semantic Scholar applies AI to try to understand the context of those phrases, and thus achieve better results.

Semantic Scholar relies on a neural network, a system that mirrors real neural networks and learns by trial and error. To make Semantic Scholar work, the Allen Institute team annotated ten and sixty-seven abstracts. From this test sample, they found 7,000 medical terms with which 2,000 could be paired. The information was fed into the Semantic Scholar neural network, which then found more relationships based on the data. Through trial and error, the neural network learns more patterns.

The Allen Institute added 26 million biomedical research papers to the 12 million already in the database. The plan is to make scientific and medical research more readily available to professionals, but also to regular people.

Whitney Grace, December 7, 2017

Written by Stephen E. Arnold · Filed Under AI, News, Search, Semantic | Comments Off on Neural Network Revamps Search for Research

No More International Google Searches

December 6, 2017

One of the better things about Google is that when you needed to search for results in a different country, all you needed to do was change the domain tag. Google has decided it does not want to do that anymore, shares The Verge in the article, “Google No Longer Lets You Change Domains To Search Other Countries.”

Google, instead, will deliver localized results based on your location.

If you need to access international results, however, the option can be changed in the settings menu at the bottom of google.com. Yes, you have to look for it, but it is there. Why does Google want to do this?

Google says it’s making the change because one out of five searches “is related to location,” and the company feels it’s critical to offer local information to provide the best results. The feature seems to be tailored most toward travelers: Google says that if you visit another country, it’ll automatically serve results local to where you’re visiting, then switch back again as soon as you arrive home. Before, if a traveler had kept typing in their home country’s Google domain, they may not have gotten what Google sees as ideal search results.

Before you think this is another way Google is trying to control search content, apparently Alphabet Inc. has already been doing this with YouTube and Gmail. The procedure has just been carried over to search results, but at least there is a way out of the localized content.

Whitney Grace, December 6, 2017

Written by Stephen E. Arnold · Filed Under Google, News, Search, Search quality | Comments Off on No More International Google Searches

Big Data and Search Solving Massive Language Processing Headaches

December 4, 2017

Written language can be a massive headache for those needing search strength. Different spoken languages can complicate things when you need to harness a massive amount of data. Thankfully, language processing is the answer, as software architect Federico Thomasetti wrote in his essay, “A Guide to Natural Language Processing.”

According to the story:

…the relationship between elements can be used to understand the importance of each individual element. TextRank actually uses a more complex formula than the original PageRank algorithm, because a link can be only present or not, while textual connections might be partially present. For instance, you might calculate that two sentences containing different words with the same stem (e.g., cat and cats both have cat as their stem) are only partially related.

The original paper describes a generic approach, rather than a specific method. In fact, it also describes two applications: keyword extraction and summarization. The key differences are:

the units you choose as a foundation of the relationship
the way you calculate the connection and its strength
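
The idea of "partially present" connections can be sketched in a few lines of Python. This is a minimal illustration, not the formula from the original TextRank paper or from the essay: the units are sentences, the connection strength is an assumed Jaccard overlap of crudely stemmed words, and the names `crude_stem`, `similarity`, and `textrank` are hypothetical.

```python
import re
from itertools import combinations


def crude_stem(word):
    """Rough stand-in for a real stemmer (assumption): strip a trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def stems(sentence):
    """Return the set of crude stems found in a sentence."""
    return {crude_stem(w) for w in re.findall(r"[a-z]+", sentence.lower())}


def similarity(a, b):
    """Edge weight between 0.0 and 1.0: shared stems over all stems."""
    sa, sb = stems(a), stems(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def textrank(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over a sentence graph, by simple power iteration."""
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(sentences[i], sentences[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor passes on a share of its score proportional
            # to the strength of the connection, not just its presence.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


sentences = [
    "The cat sat on the mat.",
    "Cats sit on mats.",
    "Stock prices fell sharply today.",
]
scores = textrank(sentences)
```

The two cat sentences share stems, so they reinforce each other, while the unrelated third sentence keeps only the baseline score. Swapping the units (words instead of sentences) or the similarity function gives the keyword extraction variant the quote mentions.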

Natural language processing is a tricky concept to wrap your head around. But it is becoming a thing that people have to recognize. Currently, millions of dollars are being funneled into perfecting this platform. Those who can really lead the pack here will undoubtedly have a place at the international tech table and possibly take over. This is a big deal.

Patrick Roland, December 4, 2017

Written by Stephen E. Arnold · Filed Under algorithms, News, Search, Technology | Comments Off on Big Data and Search Solving Massive Language Processing Headaches

Google Maps Misses the Bus

December 4, 2017

Google Maps is the preferred GPS system for millions of people. It uses real-time information to report accidents and stay updated on road conditions. It is great when you are driving or walking around a city, but when it comes to public transportation, especially to the airports, Google ignores it. CityLab discusses, “Why Doesn’t Google Maps Know The Best Way To the Airport?”

Speaking from personal experience on a recent trip to New York City, I had to get from Queens to LaGuardia airport. Google Maps took me the most roundabout way possible, instead of routing me to direct trains and buses. Google’s directions may have required less train switching, but it took me in the opposite direction of my destination.

Google Maps has a problem listing airport specific transportation in its app, but it really should not be a problem.

As Google describes things, putting those city-to-terminal routes into its mapping apps shouldn’t be that hard. A transit operator has to apply to be listed in Google Transit, publish its schedule in the standard General Transit Feed Specification (GTFS) format, and have Google run some quality tests on that feed before factoring it into directions.

But some smaller transit operations don’t get to the first step. They don’t even know it’s an option.

Transportation services may not know how to be added to Google, but Google also has not reached out to them. Historically, Google has only reached out to large transportation entities, because it meant more business on its end. Google also has this weird clause transportation services need to sign before their information is added to Google Maps. It absolves Google of “any defects in the data” and it sounds like Google does not want to be held responsible for misinformation displayed on Google Maps.

Whitney Grace, December 4, 2017

Written by Stephen E. Arnold · Filed Under Digital Library, Google, News, Search | Comments Off on Google Maps Misses the Bus

The World’s Wealthiest People Should Fear Big Data

November 24, 2017

One of the strengths that the planet’s elite and wealthy have is secrecy. In most cases, average folks and media don’t know where big money is stored or how it is acquired. However, that recently changed for the Queen of England, several Trump cabinet members, and other powerful men and women. And they should be afraid of what big data and search can do with their info, as we learned in the Guardian’s piece, “Paradise Papers Leak Reveals Secrets of the World’s Elite Hidden Wealth.”

The story found a lot of fishy dealings with political donors and those in power, Queen Elizabeth having tax-free money in the Caymans and more. According to the story:

At the centre of the leak is Appleby, a law firm with outposts in Bermuda, the Cayman Islands, the British Virgin Islands, the Isle of Man, Jersey and Guernsey. In contrast to Mossack Fonseca, the discredited firm at the centre of last year’s Panama Papers investigation, Appleby prides itself on being a leading member of the “magic circle” of top-ranking offshore service providers.

Appleby says it has investigated all the allegations, and found “there is no evidence of any wrongdoing, either on the part of ourselves or our clients”, adding: “We are a law firm which advises clients on legitimate and lawful ways to conduct their business. We do not tolerate illegal behaviour.”

Makes you wonder what would happen if some of the brightest minds in search and big data got ahold of this information? We suspect a lot of the financial knots this money ties to keep itself concealed would untangle. In an age of increasing transparency, we wouldn’t be shocked to see that happen.

Patrick Roland, November 24, 2017

Written by Stephen E. Arnold · Filed Under Big data, cybersecurity, News, Search | 2 Comments

Google Relevance: A Light Bulb Flickers

November 20, 2017

The Wall Street Journal published “Google Has Chosen an Answer for You. It’s Often Wrong” on November 17, 2017. The story is online, but you have to pay money to read it. I gave up on the WSJ’s online service years ago because at each renewal cycle, the WSJ kills my account. Pretty annoying because the pivot of the WSJ write up about Google implies that Google does not do information the way “real” news organizations do. Google does not annoy me the way “real” news outfits handle their online services.

For me, the WSJ is a collection of folks who find themselves looking at the exhaust pipes of the Google Hellcat. A source for a story like “Google Has Chosen an Answer for You. It’s Often Wrong” is a search engine optimization expert. Now that’s a source of relevance expertise! Another useful source is the terse posts by Googlers authorized to write vapid, cheery comments in Google’s “official” blogs. The guts of Google’s technology are described in wonky technical papers, the background and claims sections of Google’s patent documents, and systematic queries run against Google’s multiple content indexes over time. A few random queries do not reveal the shape of the Googzilla in my experience. Toss in a lack of understanding about how Google’s algorithms work and their baked in biases, and you get a write up that slips on a banana peel of the imperative to generate advertising revenue.

I found the write up interesting for three reasons:

Unusual topic. Real journalists rarely address the question of relevance in ad-supported online services from a solid knowledge base. But today everyone is an expert in search. Just ask any millennial, please. Jonathan Edwards had less conviction about his beliefs than a person skilled in the use of locating a pizza joint on a Google Map.
SEO is an authority. SEO (search engine optimization) experts have done more to undermine relevance in online than any other group. The one exception is the teams who have to find ways to generate clicks from advertisers who want to shove money into the Google slot machine in the hopes of an online traffic pay day. Using SEO experts’ data as evidence grinds against my belief that old fashioned virtues like editorial policies, selectivity, comprehensive indexing, and a bear hug applied to precision and recall calculations are helpful when discussing relevance, accuracy, and provenance.
You don’t know what you don’t know. The presentation of the problems of converting a query into a correct answer reminds me of the many discussions I have had over the years with search engine developers. Natural language processing is tricky. Don’t believe me? Grab your copy of Gramática didáctica del español and check out the “rules” for el complemento circunstancial. Online systems struggle with what seems obvious to a reasonably informed human, but toss in multiple languages for automated question answering, and “Houston, we have a problem” echoes.

I urge you to read the original WSJ article yourself. You decide how bad the situation is at ad-supported online search services, big time “real” news organizations, and among clueless users who believe that what’s online is, by golly, the truth dusted in accuracy and frosted with rightness.

Humans often take the path of least resistance; therefore, performing high school term paper research is a task left to an ad supported online search system. “Hey, the game is on, and I have to check my Facebook” takes precedence over analytic thought. But there is a free lunch, right?

In my opinion, this particular article fits in the category of dead tree media envy. I find it amusing that the WSJ is irritated that Google search results may not be relevant or accurate. There’s 20 years of search evolution under Googzilla’s scales, gentle reader. The good old days of the juiced up CLEVER methods and Backrub’s old fashioned ideas about relevance are long gone.

I spoke with one of the earlier Googlers in 1999 at a now defunct (thank goodness) search engine conference. As I recall, that confident and young Google wizard told me in a supercilious way that truncation was “something Google would never do.”

What? Huh?

Guess what? Google introduced truncation because it was a required method to deliver features like classification of content. Mr. Page’s comment to me in 1999 and the subsequent embrace of truncation make clear that Google was willing to make changes to increase its ability to capture the clicks of users. Kicking truncation to the curb and then digging through the gutter trash told me two things: [a] Google could change its mind for the sake of expediency prior to its IPO and [b] Google could say one thing and happily do another.

I thought that Google would sail into accuracy and relevance storms almost 20 years ago. Today Googzilla may be facing its own Ice Age. Articles like the one in the WSJ are just belated harbingers of push back against a commercial company that now has to conform to “standards” for accuracy, comprehensiveness, and relevance.

Hey, Google sells ads. Algorithmic methods refined over the last two decades make that process slick and useful. Selling ads does not pivot on investing money in identifying valid sources and the provenance of “facts.” Not even the WSJ article probes too deeply into the SEO experts’ assertions and survey data.

I assume I should be pleased that the WSJ has finally realized that algorithms integrated with online advertising generate a number of problematic issues for those concerned with factual and verifiable responses.

Written by Stephen E. Arnold · Filed Under algorithms, Business strategy, Feature, Search | Comments Off on Google Relevance: A Light Bulb Flickers

Searx: Another Privacy Oriented Web Search System

November 13, 2017

There are a number of privacy oriented Web search systems. If you want to poke around, try the quirky Unbubble or give Gibiru a whirl. I noted another entrant called Searx. There are some important differences. Searx is a system which takes a page from peer to peer access systems. You host it yourself. The system is a metasearch engine like Ixquick (Startpage). This means that the user’s query is converted to the query syntax used by search systems like Bing.com. The results are merged and a results list displayed. Deduplication is a slippery fish. You will need to scan the results and run through the familiar, but much maligned procedure of scan, click, browse, and save the Web page with the information you want. If you are like a millennial, you will take the first result because everything on the Web is true.
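
The merge and deduplication step of a metasearch engine can be sketched as follows. This is an illustration under stated assumptions, not Searx's actual code: the function names `normalize` and `merge_results` are hypothetical, the ranked lists are plain URL strings, and two URLs count as duplicates when they match after dropping the scheme, a leading `www.`, and a trailing slash.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key (assumed duplicate rule)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def merge_results(result_lists):
    """Interleave ranked lists round-robin, skipping URLs already seen."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


engine_a = ["https://www.example.com/page/", "https://example.org/a"]
engine_b = ["http://example.com/page", "https://example.net/b"]
merged = merge_results([engine_a, engine_b])
```

The slippery part is the duplicate rule. Mirrors, tracking parameters, and syndicated copies of the same article all slip past a key this simple, which is why a human still ends up scanning the list.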

Stephen E Arnold, November 13, 2017

Written by Stephen E. Arnold · Filed Under News, Search | Comments Off on Searx: Another Privacy Oriented Web Search System

Ichidan Simplifies Dark Web Searches

November 10, 2017

Now there is an easier way to search the Dark Web, we learn from a write-up at Cylance, “Ichidan, a Search Engine for the Dark Web.” Cybersecurity pro and writer Kim Crawley informs us:

Ichidan is a search engine for looking up websites that are hosted through the Tor network, which may be the first time that’s been done at this scale. Websites on Tor usually have the .onion top level domain and you typically need a web browser with the Tor plugin or Tor’s own configured web browser in order to access them. … The search engine is less like Google and more like Shodan, in that it allows users to see technical information about .onion websites, including their connected network interfaces, such as TCP/IP ports.

Researchers at BleepingComputer explored the possibilities of this search engine. They were able to reproduce OnionScan’s findings on the shrinkage of the Dark Web: the number of Dark Web services decreased from about 30,000 in April 2016 to about 4,400 not quite a year later (so by about 85%). Researchers found this alarming capability, too:

BleepingComputer was also able to use Ichidan to find a website with a lot of exposed ports, including OpenSSH, an email server, a Telnet implementation, vsftpd, and an exposed Fritzbox router. That sort of information is very attractive to cyber attackers. Using Ichidan is a lot easier than command line pentesting tools, which require more specific technical know-how.

Uh-oh. Crawley predicts that use of Ichidan will grow as folks on both sides of the law discover its possibilities. She advises anyone administering a .onion site to strengthen their cyber defenses posthaste, “if they want to survive.”

Cynthia Murrell, November 10, 2017

Written by Stephen E. Arnold · Filed Under cybersecurity, Dark Web, News, Search | 1 Comment

Reddit Search Improves with Lucidworks

November 10, 2017

YouTube might swallow all of your free time with videos, but Reddit steals your entire life with videos, plus images, GIFs, posts, jokes, and cute pictures of doggos, danger noodles, trash pandas, and floofs. If you do not know what those are, then shame on you. If you are a redditor, then you might have noticed that the search function stinks worse than a troll face. According to TechCrunch, Reddit has finally given its search function a facelift, “Reddit Teams With Lucidworks To Build New Search Framework.”

Reddit has some serious stats when it comes to user searches and postings. The online discussion platform has more than 500 million users, generates 5 million comments, and handles 40 million searches each day. While one of Reddit’s search challenges is dealing with the varied content, another is returning personalized search results without redditors having to explicitly write them in the search box.

Reddit’s poor search performance is legendary and its head honchos wanted to improve it, but trying to find the time to fix it was a problem. That is why they hired Lucidworks to do the job for them:

Caldwell said that the company went with the Lucidworks Fusion platform because it had the right combination of technology and the ability to augment his engineering team, while helping search to continually evolve on Reddit. Buying a tool was only part of the solution though. Reddit also needed to hire a group of engineers with what Caldwell called “world class search and relevance engineering expertise.” To that end, he has set up a 30-person engineering search team devoted to maximizing the potential of the new search platform.

Lucidworks currently remains in charge of fixing Reddit’s search issues, but eventually, Reddit will take over. Soon, searches for danger noodle, floof, and doggo should not only return more accurate results, but also teach you the aww lingo along the way.

Whitney Grace, November 10, 2017

Written by Stephen E. Arnold · Filed Under News, Search, Technology, Video | Comments Off on Reddit Search Improves with Lucidworks

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.

