Build an Alternative Google: How To Wanted

April 6, 2018

Hacker News presented an interesting question, “How would you build an internet scale web crawler?” We have been talking with companies which have developed Internet search systems that are not available for free Web search. Those conversations have produced some fascinating information. Some of the data will be included in my upcoming lecture for a government agency and then in my two presentations at the June 2018 Telestrategies ISS Conference in Prague.

First, the question itself is interesting because my team’s research for my new presentations on deanonymizing encrypted chat and deanonymizing digital currency transactions pivots on comprehensive Internet indexing. In fact, more companies are indexing Internet content than at any time in the last 10 years.

Second, the post made me realize that only a handful of people jumped on the topic. This low response is itself interesting. With more activity in indexing, why aren’t more people helping out JustinGarrison? That’s a question worth thinking about.

Third, one of the responses to the Hacker News question was a pointer to the YaCy.net open source project. We once included this technology in our Internet Research for Law Enforcement training program. My recollection of the system is fuzzy, so I will get one of my team to take a look.

The final thought the Hacker News’ story triggered was, “Have people just accepted Bing, Google, Qwant, and a handful of metasearch systems as too dominant to challenge?” My view is that an opportunity exists to create a public facing Internet search and retrieval system. The reason? Outstanding alternatives to Bing, Google, and Qwant are available for those who qualify as customers and who are willing to pay the license fees.

My hunch is that just as enterprise search has coalesced around the open source Lucene/Solr technologies, free Web search has become “game over” because the ad supported model has won.

The problem, of course, is that a person looking for information usually does not realize that free Web search results are neither comprehensive, timely, nor objective.

I hope individuals like JustinGarrison get the information needed to seize an opportunity in Internet search.

Stephen E Arnold, April 6, 2018

Written by Stephen E. Arnold · Filed Under News, Search, search engine | Comments Off on Build an Alternative Google: How To Wanted

Google and Search: More Churn Turmoil

April 4, 2018

I read “John Giannandrea, Head of Google’s Cornerstone Web-Search Unit, Steps Down.” I found the phrase “steps down” amusing. I think the wizard went to the Apple orchard. Since Mr. Giannandrea ran search, Google search has become less useful to me. Now I have to use multiple search systems to locate what I think are slam dunk queries. Nope. I get some pretty off the wall Google search results.

Two points jumped out of this story for me.

First, Google is forced to go back to one of the early Googlers from the AltaVista.com team. (I did some work for an outfit called PersimmonIT, which was a provider to AltaVista.com.) What’s interesting is that Jeff Dean is one of Google’s really old guard. I know he’s bright and capable, but that raises this question: “Aren’t there younger, smarter, and equally or more capable professionals to get the overhyped Google artificial intelligence operation underway?” I can suggest at least one candidate from the DeepMind team. But, hey, who really cares?

Second, search must be pretty broken. The job has fallen to another old timer at the GOOG. Same question: “Aren’t there younger, more with-it technical wizards who can handle the massively complex, software-wrapped, advertising-centric systems?” (Yep, systems, because there is “regular” search and “mobile” search. Two search systems are part of the index puzzle Google has built over the years.) Plus, do you remember Google’s “universal” search which, as a Bear Stearns legend has it, was cooked up over a weekend to deal with a PR problem triggered by an analyst’s report to which yours truly contributed? You know, “universal”: one query gets you blog content, new Web sites, Google Books, Google Scholar, yada yada. That doesn’t exist and probably will never come to pass for some pretty good reasons. But saying something is just as good as delivering, I assume.

Net net: Google is now a mature company. The founders have distanced themselves from the legal troubles in which the company is mired. The company is caught in the Silicon Valley backlash. The Oracle Java thing is a Freddy Krueger thing for the GOOG. Management change is a companion to the craziness which seems to characterize some units of the company.

I wonder if a query launched from a desktop computer will return on point results in the near future. I sure hope so.

Stephen E Arnold, April 4, 2018

Written by Stephen E. Arnold · Filed Under Google, Governance, Management, News, Search | Comments Off on Google and Search: More Churn Turmoil

Hidden Webs May Be a Content Escape Hatch

March 28, 2018

Beyond Search and the Dark Cyber research team discussed a topic which raised some concern among the team. Censorship may be nudging some individuals to the hidden Webs; for example, the Dark Web, i2p, ZeroWeb, etc.

In the wake of several US school shootings, the outcry for more control over gun sales has grown louder. Many organizations have begun to distance themselves from firearms-related topics; YouTube, for example, recently restricted firearms content. The response has created a strange subculture, as we discovered in this recent NPR story, “Restricted by YouTube, Gun Enthusiasts are Taking Their Videos to Pornhub.”

According to the story:

“InRangeTV, which has some 144,000 subscribers on its YouTube channel, has chosen to publish videos on an adult website called Pornhub…InRangeTV also recently wrote on Facebook […] ‘Why are we seeing continuing restrictions and challenges towards content about something demonstrably legal yet not against that which is clearly illegal?’ It then posted links to YouTube videos on synthesizing meth and other illicit acts.”

This is an odd place for a freedom of speech battle to take place, but not completely. It seems right in line with something Larry Flynt would have pursued. Meanwhile, as far-right content moves closer and closer to the Dark Web (pornography is not the Dark Web, but it feels like that’s the direction this is heading), Dark Web communities are beginning to try to take down YouTube with extreme right-wing trolling. What all this means for average citizens is that search is going to get more complicated, no matter what you are hunting for.

We also noted that a site dedicated to off color content has become the new home for those who are interested in weaponry. We think the shift may be gaining momentum. How does one “find” these types of content? Perhaps encrypted chat or old fashioned word of mouth messaging. Worth watching this possible shift.

Patrick Roland, March 28, 2018

Written by Stephen E. Arnold · Filed Under News, Search, Security | Comments Off on Hidden Webs May Be a Content Escape Hatch

Million Short: A Metasearch Option

March 22, 2018

An interview at Forbes delves into the story behind Million Short, an alternative to Google for Internet search. As concerns grow about online privacy, information accuracy, and filter bubbles, options that grant the user more control appeal to many. Contributor Julian Mitchell interviews Million Short founder and CEO Sanjay Arora in his piece, “This Search Engine Startup Helps You Find What Google Is Missing.” Mitchell informs us:

Founded in 2012, Million Short is an innovative search engine that takes a new and focused approach to organizing, accessing, and discovering data on the internet. The Toronto-based company aims to provide greater choices to users seeking information by magnifying the public’s access to data online. Cutting through the clutter of popular searches, most-viewed sites and sponsored suggestions, Million Short allows users to remove up to the top one million sites from the search set. Removing ‘an entire slice of the web’, the company hopes to balance the playing field for sites that may be new, suffer from poor SEO, have competitive keywords, or operate a small marketing budget. Million Short Founder and CEO Sanjay Arora shares the vision behind his company, overthrowing Google’s search engine monopoly, and his insight into the future of finding information online.

The subsequent interview gets into details, like Arora’s original motivation for creating Million Short: search is too important to be dominated by just a few companies, he insists. The pair explores both advantages and challenges the company has seen, as well as a look to the future. See the article for more.
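Million Short’s core idea, removing up to the top one million sites from the result set, is easy to picture in code. The Python sketch below is not Million Short’s implementation, which the interview does not describe. It assumes a hypothetical ranked list of popular domains (`ranked_domains`) and a hypothetical `Result` record, and drops any result hosted on one of the top `n` domains or on a subdomain of one. Host matching is deliberately simplified (only a leading `www.` is stripped); a production system would likely need something like the Public Suffix List.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Result:
    """A hypothetical search hit: URL plus title."""
    url: str
    title: str


def host_of(url: str) -> str:
    """Lower-case hostname with a leading 'www.' removed (simplified matching)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_blocked(host: str, blocked: set[str]) -> bool:
    """True if the host, or any parent domain of it, is in the blocked set."""
    labels = host.split(".")
    # Check "a.b.example.com", "b.example.com", "example.com", "com" in turn.
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def remove_top_sites(results: list[Result], ranked_domains: list[str], n: int) -> list[Result]:
    """Keep only results that are not hosted on the n highest-ranked domains."""
    blocked = {d.lower() for d in ranked_domains[:n]}
    return [r for r in results if not is_blocked(host_of(r.url), blocked)]


if __name__ == "__main__":
    ranked = ["google.com", "youtube.com", "wikipedia.org"]  # hypothetical ranking
    hits = [
        Result("https://en.wikipedia.org/wiki/Web_search_engine", "Web search engine"),
        Result("https://www.youtube.com/watch?v=abc", "A video"),
        Result("https://smallblog.example.net/post", "A small blog post"),
    ]
    for r in remove_top_sites(hits, ranked, 3):
        print(r.title)  # prints "A small blog post"
```

Checking each suffix of the hostname against a set keeps the per-result cost proportional to the number of labels in the host rather than to `n`, which matters when `n` is a million.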

Cynthia Murrell, March 22, 2018

Written by Stephen E. Arnold · Filed Under Data, News, Search, search engine | Comments Off on Million Short: A Metasearch Option

Digital Antique Coca Cola Signs for Search

March 21, 2018

In a turn that is just about the most human thing we’ve ever heard, just as the world is on the cusp of an AI revolution, many are starting to look backward toward simpler times. We got a sideways glance at our fear of change from a PC Magazine story, “Download Your Entire Google Search History.”

The story is primarily about why on Earth anyone would want to see everything they have ever searched for. But it also touches on our desire for nostalgia in this lightning quick era:

“Users can now download their entire saved search history “to see a list of the terms you’ve searched for,” the company said. “This gives you access to your data when and where you want…For safety’s sake, don’t download past searches on a public computer—at the library, an Internet cafe, or even a friend’s house. Save the curiosity for home.”

This, oddly, isn’t the only place where nostalgia and AI are blending. Remember Nokia, the flip phone people? They are back and reintroducing a line of old school not-smart phones. On top of that, the company is dabbling in new tech like AI, which leads us to wonder where these two can possibly intersect. It’s an interesting move and one that will likely have antique hunters quivering.

Patrick Roland, March 21, 2018

Written by Stephen E. Arnold · Filed Under News, Search | Comments Off on Digital Antique Coca Cola Signs for Search

Google: Search Civility

March 21, 2018

Among the many fake news battles that organizations like Facebook and Google are fighting is the one against far-right racist organizations. More often than not, hate groups are more clever at exposing flaws in algorithms than most companies give them credit for. Big tech is still trying to find solutions to these issues, but the problems keep cropping up, as we learned in a recent Phys.org story, “Google Under Fire for Anti-Semitic Search Results in Sweden.”

According to the story:

“A search on Google for the Holocaust showed an anti-Semitic blog post high up containing information about Swedish Jews. With their names, pictures and occupations listed, dozens of them were described in a humiliating and threatening manner, according to local media.

Searches for the neo-Nazi Nordic Resistance Movement’s propaganda website also appeared as news with “top stories from Nordfront.se.”

This isn’t the only occasion on which algorithms have been infiltrated by offensive material. Take, for example, the story of Facebook users who typed in “Videos of…” and saw their search bar autofill with suggestions for sex acts. We are clearly still a long way from social media and big search cleaning up their act, and once they do (if they do), we will be in a controversial world of free speech violations.

What headaches will loom in the future?

Patrick Roland, March 21, 2018

Written by Stephen E. Arnold · Filed Under News, Search | Comments Off on Google: Search Civility

Search History: Flipping That Digital Stone May Reveal Interesting Things

March 12, 2018

A PC Magazine story, “Download Your Entire Google Search History,” explains how users can retrieve everything they have ever searched for:

“Users can now download their entire saved search history “to see a list of the terms you’ve searched for,” the company said. “This gives you access to your data when and where you want… For safety’s sake, don’t download past searches on a public computer—at the library, an Internet cafe, or even a friend’s house. Save the curiosity for home.”

A search history provides a useful pool of information about the user of Google search. Among the items of data which may be available are:

Time behavior signals; that is, when a person ran searches and which topics he or she looked for in those time periods

Topic analysis; that is, what subjects the searcher sought and how frequently those topics were queried

Link analysis; that is, what other sites were searched when a particular site was queried
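The three signals above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of Google’s export format: the `(timestamp, query)` record layout, the keyword-to-topic map, the set of site names, and the 30-minute co-occurrence window are all hypothetical choices made for the example.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta

# Hypothetical record layout: (timestamp, query text). A real export would
# need its own parsing step before these functions could be applied.
History = list[tuple[datetime, str]]


def time_profile(history: History) -> Counter:
    """Time behavior: number of searches in each hour of the day."""
    return Counter(ts.hour for ts, _ in history)


def topic_profile(history: History, topics: dict[str, str]) -> Counter:
    """Topic analysis: how many queries touch each topic (keyword -> topic map)."""
    counts: Counter = Counter()
    for _, query in history:
        words = set(query.lower().split())
        # Count each topic at most once per query.
        counts.update({topic for keyword, topic in topics.items() if keyword in words})
    return counts


def co_queried_sites(history: History, target: str, sites: set[str],
                     window: timedelta = timedelta(minutes=30)) -> Counter:
    """Link analysis: other sites searched within `window` of a search for `target`."""
    def mentioned(query: str) -> set[str]:
        q = query.lower()
        return {s for s in sites if s in q}

    anchors = [ts for ts, q in history if target in mentioned(q)]
    counts: Counter = Counter()
    for ts, q in history:
        if any(abs(ts - a) <= window for a in anchors):
            counts.update(mentioned(q) - {target})
    return counts


if __name__ == "__main__":
    history = [
        (datetime(2018, 4, 6, 9, 5), "yacy.net peer to peer search"),
        (datetime(2018, 4, 6, 9, 20), "news.ycombinator.com web crawler"),
        (datetime(2018, 4, 6, 21, 0), "million short metasearch"),
        (datetime(2018, 4, 7, 9, 10), "qwant.com privacy"),
    ]
    topics = {"search": "search", "metasearch": "search",
              "crawler": "crawling", "privacy": "privacy"}
    sites = {"yacy.net", "news.ycombinator.com", "qwant.com"}

    print(time_profile(history))  # hour 9 three times, hour 21 once
    print(topic_profile(history, topics))
    print(co_queried_sites(history, "yacy.net", sites))
```

Even on four made-up queries, the output hints at a working pattern (morning research sessions) and at which sites travel together, which is the kind of “snapshot” an analyst could assemble from a real history.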

Other useful pieces of information can be extracted from a search history. When an analyst reviews the search history of the computers used by a group of people, such as those individuals working on our studies of CyberOSINT, it is possible to develop a reasonable “snapshot” or “picture” of the topics we are investigating and the particular companies whose products we are researching.

If you have not probed your search history, you might find that flipping over that digital rock may reveal some interesting insights.

Patrick Roland, March 12, 2018

Written by Stephen E. Arnold · Filed Under Google, News, Search | 1 Comment

A Step Forward but Museum Image Collections Remain a Search Challenge

March 8, 2018

For a few decades, art and history museums have been struggling with their online presences. The experience of seeing a JPEG of a painting or sculpture is not the same as seeing it in person. That’s true. But there is one area where museums hold a lot of valuable data, and that data is just now starting to become searchable. We discovered this recently when we came across the Metropolitan Museum of Art’s database, “MetPublications.”

According to the page:

“MetPublications includes a description and table of contents for most titles, as well as information about the authors, reviews, awards, and links to related Met titles by author and by theme. Current book titles that are in-print may be previewed and fully searched online, with a link to purchase the book. The full contents of almost all other book titles may be read online, searched, or downloaded as a PDF.”

This includes over five hundred books about exhibits spanning the last five decades. These slim volumes, usually released in conjunction with exhibits, are fully searchable, a huge score for art lovers and historians. Previously, the task was seen as too daunting and potentially impossible. As far back as 2002, Computer Weekly was bemoaning the fact that museums had missed the digital boat. It turns out museums like the Met didn’t miss the boat; their ship just sails a little more slowly than the white-knuckle world of Silicon Valley. Better late than never, we say.

Patrick Roland, March 8, 2018

Written by Stephen E. Arnold · Filed Under News, Rich media, Search | 1 Comment

The New York Times Wants to Change Your Google Habit

March 1, 2018

Sunday is a slightly less crazy day. I took time to scan “The Case Against Google.” I had the dead tree edition of the New York Times Magazine for February 25, 2018. You may be able to access this remarkable hybridization of Harvard MBA think, DNA engineered to stick pins in Google, and good old establishment journalism toasted at Yale University.

Charles Duhigg is a wildly successful author. He loves his family, makes time for his children, writes advice books, and immerses himself in a single project at a time. When he comes up for air, he breathes deeply of Google outputs in order to obtain information. If the Google fails, he picks up the phone. I assume those whom he calls answer the ring tone. I find that most people do not answer their phones, but that’s another habit which may require analysis.

I worked through the write up. I noted three things straight away.

First, the timeline structure of the story is logical. However, leaving it up to me to figure out which date matched which egregious Google action was annoying. Fortunately, after writing The Google Legacy, Google Version 2.0, and Google: The Digital Gutenberg, I had the general timeline in mind. Other readers may not.

Second, the statement early in the write up reveals the drift of the essay’s argument. The best selling author of The Power of Habit writes:

Within computer science, this kind of algorithmic alchemy is sometimes known as vertical search, and it’s notoriously hard to master. Even Google, with its thousands of Ph.D.s, gets spooked by vertical-search problems.

I am not into arguments about horizontal and vertical search. I ran around that mulberry tree with a number of companies, including a couple of New York investment banks. Been there. Done that. There are differences in how the components of a findability solution operate, but the basic plumbing is similar. One must not confuse search with the specific technology employed to deliver a particular type of output. Want to argue? First, read The New Landscape of Search, published by Pandia before the outfit shut down. Then, send me an email with your argument.

Third, cherry picking from Google’s statements makes it possible to paint a somewhat negative picture of the great and much loved Google. With more than 60,000 employees, many blogs, many public presentations, oodles of YouTube videos, and a library full of technical papers and patents, the Google folks say a lot. The problem is that finding a quote to support almost any statement is not hard; it just takes persistence. Here’s an example:

We absolutely do not make changes to our search algorithm to disadvantage competitors.

Written by Stephen E. Arnold · Filed Under Feature, Google, Natural language processing, Search | Comments Off on The New York Times Wants to Change Your Google Habit

Amazon Beats Google for Holiday Advertising

February 28, 2018

When Google first started out, it earned the majority of its income from online ads. Online advertising used to be a surefire source of regular income, but ad blockers, private browsing, and changes in the Internet of things have made Internet ad profits dwindle from dollars to cents. Google used to be on top, but now Amazon might be angling its way there. AdTechDaily published the article, “Amazon Leads The Crowd For Holiday Paid Search Advertising,” which reports who dominated the 2017 holiday advertising market.

The data in the article is about Amazon UK, but the UK usually bears a strong resemblance to its American counterpart. Kantar Media conducted a survey about click rates for UK retailers in the 2017 holiday season. Amazon captured 8.8% of mobile ad clicks and 7.5% of desktop clicks. The data collection for the survey was quite enlightening:

Kantar Media found that 4,259 advertisers sponsored the keywords via text ads on mobile search, compared with 3,798 advertisers sponsoring the same keywords via desktop search. Of these, only seven retailers generated a click share higher than 1% for both desktop and mobile search text advertising. Together, these retailers captured a combined 26% share of all desktop clicks and 28% of mobile clicks on the 990 retail keywords studied. Online giant Amazon.co.uk held a significant lead ahead of Argos, the retailer in second place for both desktop and mobile search ad clicks. Currys, John Lewis and online marketplace AO.com completed the top five in the list.

Google is a competitive advertising marketplace, but large retailers have the deep pockets and large inventories to give rivals a run (or a “click”) for their money. The retailers sponsor a higher number of keywords based on their inventories, so they can run bigger ad campaigns with bigger budgets. It also does not hurt to have well-known brands in their inventories. Luxury brands are always reliable.

Google is struggling with its online ads. Shall we call this the Froogle Fumble?

Whitney Grace, February 28, 2018

Written by Stephen E. Arnold · Filed Under Advertising, Amazon, Google, News, Search | 1 Comment


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.

