Google Struggles with Indexing?

November 14, 2018

You probably know that Google traffic was routed to China. The culprit was something obvious. In this case, Nigeria. Yep, Nigeria. You can read about the mistake that provided some interesting bits and bytes to the Middle Kingdom. Yeah, I know. Nigeria. “A Nigerian Company Is in Trouble with Google for Re-Routing Traffic to Russia, China” provides some allegedly accurate information.

But the major news I noted here in Harrod’s Creek concerned Google News and its indexing. Your experience may be different from mine, but Google indexing can be interesting. I was looking for an outfit identified as Inovatio, which is a university-anchored outfit in China. A Google query for Inovatio pointed me to a rock band and a design company in Slovenia. Google’s smart search system changed Inovatio to innovation even when I used quote marks. I did locate the Inovatio operation using a Chinese search engine. I was able to track down Ampthon.com, which listed Inovatio and provided the university affiliation. That allowed me to get some info about an outfit providing surveillance and intercept services to countries in need of this capability.
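The Inovatio problem comes down to a query-rewriting choice: should a spelling corrector touch a term the user put in quote marks? Below is a minimal sketch. It assumes a toy corrector built on Python's `difflib` and a hypothetical `VOCABULARY`. It does not show how Google's system works. It only shows a rewriter that corrects bare terms and leaves quoted phrases alone.

```python
import difflib
import re

# Hypothetical vocabulary; real engines learn corrections from query logs.
VOCABULARY = ["innovation", "surveillance", "intercept", "university"]


def correct_term(term: str) -> str:
    """Return the closest vocabulary word, or the term unchanged if none is close."""
    matches = difflib.get_close_matches(term.lower(), VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else term


def rewrite_query(query: str) -> str:
    """Spell-correct bare terms but pass quoted phrases through verbatim."""
    parts = []
    # Each match fills exactly one group: a "quoted phrase" or a bare word.
    for quoted, bare in re.findall(r'("[^"]*")|(\S+)', query):
        parts.append(quoted if quoted else correct_term(bare))
    return " ".join(parts)
```

`rewrite_query('Inovatio surveillance')` returns `'innovation surveillance'`. `rewrite_query('"Inovatio" surveillance')` returns the query unchanged, which is what the quote marks asked for.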

Google. Indexing. Yeah.

“Google News Publishers Complaining About Indexing Issues” highlights another issue with the beloved Google. I learned:

In the past few days there has been an uptick in complaints from Google News publishers around Google not indexing their new news content. Gary Illyes from Google did a rare appearance on Twitter to say he passed along the feedback to the Google News team to investigate. You can scan through the Google News Help forums and see a nice number of complaints. Also David Esteve, the SEO at the Spanish newspaper El Confidencial, posted his concerns on Twitter.

The good news is that the write up mentions that this indexing glitch is a known issue.

Net net: Many people with whom I speak believe that Google’s index is comprehensive, timely, and consistent.

Yeah, also smart because Inovatio is really innovation.

Stephen E Arnold, November 14, 2018

Written by Stephen E. Arnold · Filed Under Google, Indexing, News

Indexing Matters: The Investment Sector Analysis

October 15, 2018

I read reports which explain why large monopolistic or oligopolistic companies alter the behavior of certain ecosystems. I don’t see that many because analysts are preoccupied with more practical matters; namely, their bonuses, appearances on Bloomberg TV or CNBC, and riding their hobby horses.

I read and then reread “Platform Giants and Venture Backed Startups.” The premise struck me as obvious. The whales of online are functioning like giant electromagnets. These companies pull traffic, attention, and money. At the same time, they emit beacons which are tuned to the inner ears of investors.


Looks tasty but only semi-organized. And from what is this confection fabricated? Answer: Cow hooves. Intellectual Jello, lovingly crafted to delight the eye.

The squeaks of these ultra-high-frequency waves alert those looking for big paydays to put their money into startups which do not compete head on with the outfits operating like electromagnets.

The “Platform Giant” write up assembles observations from a report which asserts the opposite; that is, big electromagnets do not have an impact on startups and most investors.

Put that aside.

The core of the write up makes clear that indexing and classification make a difference. The idea is that if one classifies and marshals data, the classification creates a way to look at the data, the world, and in this particular case the way investments flow or do not flow.

What gets classified as “Internet software” triggers the conclusion. Invest to compete against the Google? Not a good idea.

The question becomes, “Who does the indexing, classification, ontology, and related bits of the taxonomy?”

Indexing is important. But more important is the creation of the knowledge structure and the categories which will be used to chop, slice, and organize data for analysis.

Get the knowledge structure wrong, and the flawed categorization creates findings that are probably misleading at best and just off base at worst.

Who takes the time to work out the knowledge structure before training humans and smart software to assign metadata?

The write up suggests that humans (either with agenda or without, with expertise or not, or with a wonky knowledge superstructure or not) do.

Net net: Counting is verifiable. Pegging what to count may be more like organizing cubes of a gelatin dessert.
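A toy example shows that the category scheme, not the counting, drives the conclusion. The deals, tags, and both taxonomies below are hypothetical and do not come from the report. The code counts the same deals under two knowledge structures.

```python
from collections import Counter

# Hypothetical deals, each tagged with descriptive keywords.
DEALS = [
    ("StartupA", {"search", "ads"}),
    ("StartupB", {"ecommerce", "payments"}),
    ("StartupC", {"search", "enterprise"}),
    ("StartupD", {"social", "ads"}),
    ("StartupE", {"payments"}),
]

# Two hypothetical knowledge structures. Rules are checked in order;
# the first sector whose trigger tags overlap the deal's tags wins.
BROAD = [
    ("Internet software", {"search", "ads", "social", "ecommerce"}),
    ("Fintech", {"payments"}),
]
NARROW = [
    ("Fintech", {"payments"}),
    ("Enterprise", {"enterprise"}),
    ("Internet software", {"search", "ads", "social"}),
]


def classify(tags, taxonomy):
    """Return the first sector whose triggers overlap the tags, else 'Other'."""
    for sector, triggers in taxonomy:
        if tags & triggers:
            return sector
    return "Other"


def sector_counts(deals, taxonomy):
    """Count deals per sector under one taxonomy."""
    return Counter(classify(tags, taxonomy) for _, tags in deals)
```

The deals and the counting function stay the same. Under `BROAD`, four deals land in "Internet software". Under `NARROW`, only two do.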

Stephen E Arnold, October 15, 2018

Written by Stephen E. Arnold · Filed Under Indexing, News

The Semantic Web: Technology Roadkill or a Roadside Snack?

September 24, 2018

I spotted a quote to note. Here it is:

The Semantic Web is as dead as last year’s roadkill.

The statement appears in “Whatever Happened to the Semantic Web?” The write up provides a run through of the starts and stops associated with making the Web into a more organized place.

I would point out that the state of the Semantic Web can be glimpsed in the TweetedTimes’ auto-generated list of articles called “Semantic Search.” The collection of items focuses on a range of topics, but the thrust seems to be getting traffic for a Web site; for example, “How to Optimize Content for Semantic SEO.”

If you are an adherent of the Semantic Web, check out the included footnotes. I would point out that the Google has a number of Guha patents in its portfolio. I think the Semantic Web may be of interest to some at the online ad search giant.

Guha’s patents plus the work by Alon Halevy may suggest some interesting use cases for the markup, triple, and smart agent systems and methods.
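For readers who have not worked with Semantic Web markup, the triple idea is simple. Each fact is a subject, predicate, and object. A software agent answers questions by matching patterns and joining the results. This is a toy in-memory sketch with a hypothetical `ex:` vocabulary. Real deployments use RDF stores and SPARQL, and nothing here reflects the Guha or Halevy patents.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples using an illustrative "ex:" vocabulary.
TRIPLES: List[Triple] = [
    ("ex:page1", "ex:about", "ex:Indexing"),
    ("ex:page1", "ex:author", "ex:Alice"),
    ("ex:page2", "ex:about", "ex:Indexing"),
    ("ex:page2", "ex:author", "ex:Bob"),
    ("ex:page3", "ex:about", "ex:SEO"),
    ("ex:page3", "ex:author", "ex:Carol"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def authors_of_topic(triples: List[Triple], topic: str) -> List[str]:
    """Two-hop query an agent might run: pages about a topic, then their authors."""
    pages = [s for s, _, _ in match(triples, p="ex:about", o=topic)]
    return sorted({o for page in pages
                   for _, _, o in match(triples, s=page, p="ex:author")})
```

`authors_of_topic(TRIPLES, "ex:Indexing")` returns `["ex:Alice", "ex:Bob"]`.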

Stephen E Arnold, September 24, 2018

Written by Stephen E. Arnold · Filed Under Google, Indexing, News

Bing: No More Public URL Submissions

September 19, 2018

Ever wondered why some Web site content is not indexed? Heck, ever talk to a person who cannot find their Web site in a “free” Web index? I know that many people believe that “free” Web search services are comprehensive. Here’s a thought: The Web indexes are not comprehensive. The indexing is selective, disconnected from meaningful date and time stamps, and often limited to following links to a specified depth; for example, three levels down or fewer in many cases.
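Depth-limited link following can be sketched as a breadth-first crawl. The same sketch shows why an unlinked page stays invisible unless someone hands the crawler its URL, which is the job a submission form did. The link graph is hypothetical and held in memory. Real crawlers also handle robots.txt, politeness delays, and recrawl scheduling.

```python
from collections import deque
from typing import Dict, Iterable, List

# Hypothetical site: each page links one level deeper; "/orphan" has no inbound links.
LINK_GRAPH: Dict[str, List[str]] = {
    "/": ["/a"],
    "/a": ["/a/b"],
    "/a/b": ["/a/b/c"],
    "/a/b/c": ["/a/b/c/d"],
}


def crawl(link_graph: Dict[str, List[str]], seeds: Iterable[str],
          max_depth: int = 3) -> Dict[str, int]:
    """Breadth-first crawl that stops following links past max_depth.

    Returns {url: depth} for every page the crawler would index.
    """
    seen: Dict[str, int] = {}
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in seen or depth > max_depth:
            continue
        seen[url] = depth  # BFS reaches each page first at its shallowest depth
        for nxt in link_graph.get(url, []):
            if nxt not in seen:
                queue.append((nxt, depth + 1))
    return seen
```

With `max_depth=3`, `/a/b/c/d` is never indexed. `/orphan` appears only when it is supplied as a seed.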

I thought about the perception of comprehensiveness when I read “Bing Is Removing Its Public URL Submission Tool.” The tool allowed a savvy SEO professional or an informed first-time Web page creator to let Bing know that a site was online and ready for indexing.

No more.

How do “free” Web indexes find new sites? Now that’s a good question, and the answers range from “I don’t know” to “Bing and Google are just able to find these sites.”

A couple of thoughts:

Editorial or spidering policies are not spelled out by most Web indexing outfits
Users assume that if information is available online, that information is accurate
“Free” Web indexing services are not set up to deliver results that are necessarily timely (indexed on a daily basis) or comprehensive.

Bing’s allegedly turning off public URL submissions is a small thing. My question: “Who looked at these submissions and made a decision about what to index or exclude from indexing?” Perhaps the submission form operated like a thermostat control in a hotel room?

Stephen E Arnold, September 18, 2018

Written by Stephen E. Arnold · Filed Under Indexing, Microsoft, News

Semantic Struggles and Metadata

August 31, 2018

I have noticed the flood of links and social media posts about semantics from David Amerland. I found many of the observations interesting; a few struck me as a wildly different view of indexing. A recent essay by David Amerland “Snipers Use Metadata Much Like Semantic Search Does” caught the Beyond Search team’s attention.

Learn about “The Sniper Mind” at this link.

According to the story:

“There are two key takeaways here [about metadata and trained killers]: First, such skills are directly transferable in the business domain and even in most life situations. Second, in order to use their brain in this way snipers need training. The mental training and the psychological aids that are developed as a result of it is what I detailed…”

We must admit that it is a fresh metaphor: Comparing killers’ use of indexing with semantic search. In our experience with professional indexing systems and human indexers, the word “sniper” has not to our recollection been used.

Watch your back, your blindside, and your ontology. Oh, also metaphors.

Patrick Roland, August 31, 2018

Written by Stephen E. Arnold · Filed Under Indexing, News

Deindexing SEO Delivers Revenue Results

June 7, 2018

SEO still matters to Google’s algorithm and to other search engines’ crawlers. In my opinion, tweaking Web pages can result in a boost for content in some queries. I have a hunch that Google’s system then ignores subsequent tweaks. The Webmaster then has an opportunity to buy Google advertising, and the content becomes more findable. But that’s just an opinion.

The received wisdom is that the key to great SEO is to generate great content, which the crawlers then index. Robin Rozhon shares that technical SEO has a big impact on your Web site, especially if it is large. In his article, “Crawling & Indexing: Technical SEO Basics That Drive Revenue (Case Study),” Rozhon discusses how to maximize technical SEO, including the benefits of deindexing.

Rozhon ran an experiment in which the team deindexed more than 400,000 of the site’s 500,000 URLs, roughly 80 percent, because search engines were indexing them as duplicate category URLs. Organic traffic increased significantly. Before you deindex pages, check Google Analytics to determine how well they are performing.

To determine which pages to deindex, collect data about each URL, including its parameters. Use Google Analytics, Google Search Console, Screaming Frog, log files, and other sources to understand how each URL performs.
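The selection step can be sketched as a filter over exported analytics rows. This sketch assumes rows of `(url, organic_sessions)` and a hypothetical `min_sessions` cutoff. Rozhon's actual criteria combined several data sources and are not reproduced here.

```python
from typing import Iterable, List, Tuple
from urllib.parse import parse_qs, urlsplit


def deindex_candidates(rows: Iterable[Tuple[str, int]],
                       min_sessions: int = 10) -> List[str]:
    """Flag parameterized URLs whose organic sessions fall below a threshold.

    rows: (url, organic_sessions) pairs, e.g. exported from an analytics tool.
    min_sessions is a hypothetical cutoff; pick one from your own data.
    """
    candidates = []
    for url, sessions in rows:
        params = parse_qs(urlsplit(url).query)  # empty dict when no query string
        if params and sessions < min_sessions:
            candidates.append(url)
    return candidates


# Hypothetical export rows.
ROWS = [
    ("https://shop.example/shirts", 500),
    ("https://shop.example/shirts?color=red", 40),
    ("https://shop.example/shirts?color=red&size=xl&sort=price", 0),
    ("https://shop.example/about", 3),
]
```

Only the three-parameter URL is flagged. The low-traffic `/about` page is kept because it is not a parameter variant.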

Facets and filters are another major source of URLs:

“Faceted navigation is another common troublemaker on ecommerce websites we have been dealing with. Every combination of facets and filters creates a unique URL. This is a good thing and a bad thing at the same time, because it creates tons of great landing pages but also tons of super specific landing pages no one cares about.”
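The point that every facet combination creates a unique URL is simple combinatorics. This sketch uses hypothetical facets. With single-select facets, the count is the product of (values + 1) across facets. With multi-select facets, it is the product of 2 raised to each facet's value count.

```python
from itertools import product
from math import prod

# Hypothetical facets for a t-shirt category page.
FACETS = {
    "brand": ["adidas", "nike", "puma"],
    "color": ["black", "red"],
    "size": ["s", "m", "l", "xl"],
}


def single_select_urls(facets, base="/t-shirts"):
    """Enumerate every URL when at most one value per facet can be selected."""
    names = sorted(facets)
    options = [[None] + facets[name] for name in names]  # None means facet unused
    urls = []
    for combo in product(*options):
        params = [f"{name}={value}" for name, value in zip(names, combo)
                  if value is not None]
        urls.append(base + ("?" + "&".join(params) if params else ""))
    return urls


def multi_select_count(facets):
    """Count URLs when any subset of each facet's values can be selected."""
    return prod(2 ** len(values) for values in facets.values())
```

Three small facets already produce 60 single-select URLs and 512 multi-select combinations. That suggests why the multi-select pages are kept out of crawlers' reach.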

Facets and filters both have pros and cons. I learned this about facets:

Facets are discoverable, crawlable, and indexable by search engines;
Wait! Facets are not discoverable if multiple items from the same facet are selected (e.g. Adidas and Nike t-shirts).
Facets contain self-referencing canonical tags;

And what about filters?

Filters are not discoverable;
Filters contain a “noindex” tag;
Filters use URL parameters that are configured in Google Search Console and Bing Webmaster Tools.
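The facet and filter rules above can be expressed as one decision function. This is one reading of those rules, using a hypothetical `decide` helper and URL format. The `index, follow` value for facet pages is an assumed default that the source does not state. The Search Console parameter settings are console configuration and are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IndexingDecision:
    discoverable: bool        # expose crawlable links to this URL?
    robots: str               # robots meta tag value
    canonical: Optional[str]  # self-referencing canonical, if any


def facet_url(path: str, facets: Dict[str, List[str]]) -> str:
    """Build a stable URL: facet names sorted, multiple values comma-joined."""
    params = [f"{name}={','.join(sorted(values))}"
              for name, values in sorted(facets.items()) if values]
    return path + ("?" + "&".join(params) if params else "")


def decide(path: str, facets: Dict[str, List[str]],
           filters: Dict[str, str]) -> IndexingDecision:
    """Apply one reading of the facet/filter rules (hypothetical helper)."""
    if filters:
        # Filters: not discoverable and tagged noindex.
        return IndexingDecision(discoverable=False, robots="noindex", canonical=None)
    url = facet_url(path, facets)
    # Facets: indexable with a self-referencing canonical, but not
    # discoverable when several values of the same facet are selected.
    multi_select = any(len(values) > 1 for values in facets.values())
    return IndexingDecision(discoverable=not multi_select,
                            robots="index, follow",  # assumed default
                            canonical=url)
```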

As a librarian, I believe that old-school ideas have found their way into the zippy modern approach to indexing via humans and semi-smart software.

In the end, consolidate pages and remove any dead weight to drive traffic to the juicy content and increase sales. Why did they not say that to begin with, instead of putting us through the technical jargon?

Whitney Grace, June 7, 2018

Written by Stephen E. Arnold · Filed Under Indexing, News

Fake News May Be a Forever Feature

June 4, 2018

While the world’s big names in social media go on tour to tout the ways in which they are snuffing out fake news, the fake news machine keeps rolling along. Mark Zuckerberg and company can do all the testifying in Washington they want, but that does not mean the criminal element will just curl up and go away. They certainly aren’t going anywhere when there is money to be made and there is plenty of that, according to a surprising BoingBoing story, “It’s Laughably Simple to Buy Thousands of Cheap, Plausible Facebook Identities.”

According to the story:

“[F]or $13, a Buzzfeed reporter was able to buy the longstanding Facebook profile of a fake 23 year old British woman living in London with 921 friends and a deep, plausible dossier of activities, likes and messages. The reporter’s contact said they could supply 5,000 more Facebook identities at any time.”

The danger is that there is essentially no way to stop this, as bot makers get more sophisticated and adjust to algorithm changes at Facebook and other social media outlets. Some experts even fear that this unstoppable tide of bots will have deadly consequences. We’ll keep watching this story, but we don’t have a lot of faith that things will get better any time soon.

Patrick Roland, June 4, 2018

Written by Stephen E. Arnold · Filed Under Indexing, News

Are Auto Suggestions Inherently Problematic?

June 3, 2018

Politics is a dangerous subject to bring up in any social situation. My advice is to keep quiet and nod; then you can avoid loudmouths trying to force their agendas down your throat. Despite attempts to remain polite, the Internet always brings out the worst in people, and The Sun shows how a simple search engine function does it in “‘Trump Should Be Shot’ Google And Bing Searches For ‘Trump’ And ‘Conservatives’ Offer Disgusting Auto-Suggestions.”

Auto-complete is notorious for making hilarious mistakes, and the same is true of auto-suggest on search engines, but these suggestions turn out to be more gruesome than a misspelling.

If you want to see some interesting suggestions, type “Trump should be…” into a blank search bar. The suggestions seem endless, including shot, arrested, killed, in jail, and banned from Twitter (okay, the last one might be a little funny).

Typing in “conservatives need…” produces less derogatory terms, but the auto-suggestions include: to die, to go, a new party, and not apply.

Hmmm.

What creates these auto-suggestions?

“These are based on a number of factors including real-time searches, trending results, your location, and previous activity. The intuitive predictions change in “response to new characters being entered into the search box” explains Google. And the company also has its own set of “autocomplete policies” in case something untoward should pop up. Along with prohibiting predictions that contain sexually explicit, violent, and harmful terms, Google says it also removes hateful suggestions against groups and individuals. ‘We remove predictions that include graphic descriptions of violence or advocate violence generally,’ states the firm.”
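The mechanism Google describes draws predictions from past queries and then screens them against policies. It can be sketched as frequency-ranked prefix matching plus a blocklist. The query log and `BLOCKED_TERMS` are hypothetical. Real systems also weigh location, trends, and personal history, and Google's policy checks are far more involved than a word list.

```python
from collections import Counter
from typing import List

# Hypothetical query log counts.
QUERY_LOG = Counter({
    "robots should be regulated": 120,
    "robots should be shot": 90,
    "robots should be taxed": 60,
    "robots shouldn't vote": 10,
})

# Hypothetical policy word list standing in for real autocomplete policies.
BLOCKED_TERMS = {"shot", "killed"}


def violates_policy(query: str) -> bool:
    """True if any word in the query is on the blocklist."""
    return any(word in BLOCKED_TERMS for word in query.lower().split())


def suggest(prefix: str, log: Counter = QUERY_LOG, k: int = 3) -> List[str]:
    """Up to k most frequent past queries starting with prefix, minus policy violations."""
    prefix = prefix.lower()
    hits = [(q, n) for q, n in log.items()
            if q.startswith(prefix) and not violates_policy(q)]
    hits.sort(key=lambda item: (-item[1], item[0]))  # most frequent first, then A-Z
    return [q for q, _ in hits[:k]]
```

`suggest("robots should be")` returns `["robots should be regulated", "robots should be taxed"]`. The violent completion is filtered out even though it is more frequent than the "taxed" query.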

Google and Bing deserve some credit for removing the slander from auto-complete, but sometimes they only do it when they are pushed. Trolls and bigots create these terms, and it would be nice to see them scrubbed from auto-suggest, but that is nearly impossible. Hey, Bing and Google, try scrubbing 4chan!

Whitney Grace, June 3, 2018

Written by Stephen E. Arnold · Filed Under Indexing, News, Search

Google: Excellence Evolves to Good Enough

May 25, 2018

I read “YouTube’s Infamous Algorithm Is Now Breaking the Subscription Feed.” I assume the write up is accurate. I believe everything I read on the Internet.

The main point of the write up seems to me to be that good enough is the high water mark.

I noted this passage, allegedly output by a real, thinking Googler:

Just to clarify. We are currently experimenting with how to show content in the subs feed. We find that some viewers are able to more easily find the videos they want to watch when we order the subs feed in a personalized order vs always showing most recent video first.

I also found this statement interesting:

With chronological view thrown out, it’s going to become even more difficult to find new videos you haven’t seen — especially if you follow someone who uploads at a regular time each day.
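The subscription-feed complaint comes down to which sort key the feed uses. This sketch uses hypothetical videos and a made-up `predicted_interest` score. YouTube's ranking model is not public.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Video:
    title: str
    uploaded: datetime
    predicted_interest: float  # hypothetical model score between 0 and 1


def chronological(feed: List[Video]) -> List[Video]:
    """Newest upload first: relies only on time metadata."""
    return sorted(feed, key=lambda v: v.uploaded, reverse=True)


def personalized(feed: List[Video]) -> List[Video]:
    """Highest predicted interest first: upload time is ignored."""
    return sorted(feed, key=lambda v: v.predicted_interest, reverse=True)


# Hypothetical channel that uploads at the same time each day.
FEED = [
    Video("Monday upload", datetime(2018, 5, 21, 9, 0), 0.9),
    Video("Tuesday upload", datetime(2018, 5, 22, 9, 0), 0.4),
    Video("Wednesday upload", datetime(2018, 5, 23, 9, 0), 0.6),
]
```

In chronological order, the newest Wednesday upload comes first. Under the personalized order, it slips to second behind Monday's video, which is the "harder to find new videos" effect quoted above.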

I would like to mention that Google, along with In-Q-Tel, invested in Recorded Future. That company has some pretty solid date and time stamping capabilities. Furthermore, my hunch is that the founders of the company know the importance of time metadata to some of the Recorded Future customers.

What would happen if Google integrated some of Recorded Future’s time capabilities into YouTube and into good old Google search results?

From my point of view, good enough means “sells ads.” But I am usually incorrect, and I expect to learn just how off base I am when I explain how one eCommerce giant is about to modify the landscape for industrial strength content analysis. Oh, that company’s technology does the date and time metadata pretty well.

More on this mythical “revolution” on June 5th and June 6th. In the meantime, try to find live feeds of the Hawaii volcano event using YouTube search. Helpful, no?

Stephen E Arnold, May 25, 2018

Written by Stephen E. Arnold · Filed Under Advertising, algorithms, Google, Indexing, News

LightTag Helps AI Developers Label Training Data

May 16, 2018

The creators of LightTag are betting on the AI boom, we learn from TechCrunch’s post, “LightTag Is a Text Annotation Platform for Data Scientists Creating AI Training Data.” Built by a former natural language researcher for Citigroup, the shiny new startup hopes to assist AI developers with one of their most labor-intensive and error-prone tasks: labeling the data used to train AI systems. Since the job is carried out by teams of imperfect humans, errors often abound. LightTag’s team-based workflow, user interface, and quality controls are designed to mitigate these imperfections. Writer Steve O’Hear cites founder Tal Perry as he reports:

“Perry says LightTag’s annotation interface is designed to keep labelers ‘effective and engaged’. It also employs its own ‘AI’ to learn from previous labeling and make annotation suggestions. The platform also automates the work of managing a project, in terms of assigning tasks to labelers and making sure there is enough overlap and duplication to keep accuracy and consistency high. ‘We’ve made it dead-simple to annotate with a team (sounds obvious, but nothing else makes it easy),’ he says. ‘To make sure the data is good, LightTag automatically assigns work to team members so that there is overlap between them. This allows project managers to measure agreement and recognize problems in their project early on. For example, if a specific annotator is performing worse than others’.”
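Perry describes overlapping assignments that let managers measure agreement and spot a weak annotator. That idea can be sketched in two functions. This sketch uses round-robin assignment and simple pairwise percent agreement as stand-ins. The write-up does not describe LightTag's actual assignment logic or metrics. Cohen's or Fleiss' kappa would be the usual chance-corrected alternatives.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Dict, Hashable, List


def assign_with_overlap(items: List[Hashable], annotators: List[str],
                        overlap: int = 2) -> Dict[Hashable, List[str]]:
    """Round-robin assignment so each item goes to `overlap` distinct annotators."""
    if overlap > len(annotators):
        raise ValueError("overlap cannot exceed the number of annotators")
    pool = cycle(annotators)
    return {item: list(islice(pool, overlap)) for item in items}


def agreement_by_annotator(labels: Dict[Hashable, Dict[str, str]]) -> Dict[str, float]:
    """Share of pairwise comparisons on shared items where an annotator agrees
    with a co-annotator (simple percent agreement, not chance-corrected)."""
    agree: Counter = Counter()
    total: Counter = Counter()
    for by_annotator in labels.values():
        for a, label_a in by_annotator.items():
            for b, label_b in by_annotator.items():
                if a != b:
                    total[a] += 1
                    agree[a] += int(label_a == label_b)
    return {a: agree[a] / total[a] for a in total}


# Hypothetical labels following the assignment for items 1-3 and annotators A, B, C.
LABELS = {
    1: {"A": "ORG", "B": "ORG"},
    2: {"C": "PER", "A": "ORG"},
    3: {"B": "LOC", "C": "PER"},
}
```

With the sample labels, annotator C scores 0.0 against co-annotators, while A and B score 0.5. That is the kind of early warning the quote describes.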

For organizations in industries like healthcare, law, and banking that simply cannot risk outsourcing the task, LightTag offers an on-premises version. The write-up includes a couple of GIFs of the software at work, so check it out if curious. Though it only recently launched publicly, the beta software has been tried out by select clients, including these noteworthy uses: an energy company is using it to predict drilling issues at certain depths with data from oil-rig logs, and a medical imaging company has used it to label MRI-scan reports. We are curious to see whether the young startup will be able to capitalize on the current AI boom, as Perry predicts.

Cynthia Murrell, May 16, 2018

Written by Stephen E. Arnold · Filed Under Indexing, News


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.

