A Librarian Looks at Google Dorking

August 24, 2020

To find solutions for their jobs, many people simply conduct a Google search. Everyone from teachers to executives to software developers does it. Software developers spend an inordinate amount of their time searching for code libraries and language tutorials. One developer named Alec had the brilliant idea to create “dorking.” What is dorking?

“Use advanced Google Search to find any webpage, emails, info, or secrets

cost: $0

time: 2 minutes

Software engineers have long joked about how much of their job is simply Googling things

Now you can do the same, but for free”

Dorking is free! That is great! How does it work? Dorking is a tip guide using Boolean operators and other Google advanced search options to locate information. Dorking, however, does need a bit of coding knowledge to understand how it works.
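To make the idea concrete, here is a minimal sketch of how such queries might be assembled. It is not taken from Alec's guide. The helper names `build_dork` and `search_url` are hypothetical, and `example.com` is a placeholder domain. The operators used (quoted phrases, `intitle:`, `site:`, `filetype:`, and a leading minus) are standard Google search operators.

```python
from urllib.parse import quote_plus


def build_dork(terms=(), phrase=None, intitle=None, site=None,
               filetype=None, exclude=()):
    """Assemble a Google query string from common advanced operators.

    Hypothetical helper for illustration only.
    """
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')            # exact phrase match
    if intitle:
        parts.append(f'intitle:"{intitle}"')   # phrase must appear in the page title
    if site:
        parts.append(f"site:{site}")           # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict results to one file extension
    parts.extend(f"-{word}" for word in exclude)  # exclude pages containing a word
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a Google search URL (URL-encoded)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    query = build_dork(phrase="enterprise search", site="example.com",
                       filetype="pdf")
    print(query)
    print(search_url(query))
```

The resulting string can be pasted straight into the Google search box, which is the "no coding required" end of the dorking spectrum.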

Some of these tips can be plugged into a Google search box, such as finding similar sites and finding specific pages that must include a phrase in the title text. Others need that coding knowledge to make them work. For example, finding every email on a Web page requires this:

image

Yep, dorking for everyone.

After a few practice trials, these dorking tips are sure to work for even the most novice of Googlers. They will also make anyone, not just software developers, look like an expert. As a librarian, I ask: why not assign field types and codes, return to Boolean logic, and respect existing Google operators? Putting a word in quotes and then getting a result without the word is — how should I frame it? I know — dorky.

Whitney Grace, MLS, August 24, 2020

Surprising Google Data

August 20, 2020

DarkCyber is not sure if these data are accurate. We have had some interesting interactions with NordVPN, and we are skeptical about this outfit. Nevertheless, let’s look beyond a dicey transaction with the NordVPN outfit and focus on the data in “When Looking for a VPN, Chinese Citizens Search for Google.”

The article asserts:

New research by NordVPN reveals that when looking for VPN services on Baidu, the local equivalent of Google, the Chinese are mostly trying to get access to Google – in fact, 40,35% of all VPN service-related searches have to do with Google. YouTube comes second on the list, accounting for 31,58% of all searches. Other research by NordVPN has shown that YouTube holds the most desired restricted content, with 82,7% of Internet users worldwide searching for how to unblock this video sharing platform.

If valid, these data suggest that Google’s market magnetism is powerful. Perhaps a type of quantum search entanglement?

Stephen E Arnold, August 20, 2020

SlideShare: Some Work to Do

August 12, 2020

DarkCyber noted “Scribd Acquires Presentation Sharing Service SlideShare from LinkedIn.” In 2004, one could locate presentations on Google by searching for the extension ppt and its variants. In 2006, SlideShare became available. Then something happened. PowerPoints became more difficult to locate. When an online search pointed to a PowerPoint deck, the content was:

  1. Marketing fluff
  2. Incorrectly rendered with weird typography and wonky graphics
  3. Corrupted files.

What about today? DarkCyber’s most recent foray into the slide deck content wilderness produced zero; for example, SlideShare search produced identical pages of search results. The query retrieved slide decks on unrelated topics. Even worse, a query would result in SlideShare’s sending email upon email pointing to other slide decks. The one characteristic of these related slide decks was/is that they were unrelated to the information we sought.

There are online presentation services. There are open source presentation tools like SoftMaker’s. There is the venerable Keynote which never quite converts a PowerPoint file correctly.

Is there a future in a searchable collection of slide decks? In theory, yes. In reality, the cost of finding, indexing, and making searchable presentations faces some big hurdles; for example:

  1. Many organizations — for example, DARPA — standardize on PDF file formats. These are okay, but indexing these can be an interesting challenge
  2. Some presenters put their talks in the cloud, hoping that an Internet connection will allow their slides to display
  3. The Zoom world puts PowerPoints and other presentation materials on the user’s computer, never to make it into a more publicly accessible repository.

Like the dream of collecting conferences, presentations, and poster sessions, some content remains beyond the reach of researchers and analysts. The desire to get anyone looking for a slide deck to subscribe to a service gives operators of this service a chance to engage in spreadsheet fever. Here’s how this works: If there are X researchers and we get Y percent of them, we can charge Z per year. By substituting guesstimates for the variables, the service becomes a winner.
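The spreadsheet fever arithmetic is simple enough to sketch. The function name `projected_revenue` and every number below are invented for illustration; none comes from Scribd, SlideShare, or LinkedIn. The point is how much the "winner" depends on the guesstimate plugged in for Y.

```python
def projected_revenue(researchers, capture_rate, price_per_year):
    """X researchers * Y (as a fraction) * Z dollars per year."""
    return researchers * capture_rate * price_per_year


if __name__ == "__main__":
    x = 100_000   # hypothetical number of researchers
    z = 100       # hypothetical annual subscription price in dollars
    # The same market looks very different depending on the guess for Y.
    for y in (0.01, 0.05, 0.20):
        print(f"capture rate {y:.0%}: ${projected_revenue(x, y, z):,.0f} per year")
```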

The reality is that finding information in slide decks is more difficult today than it was in 2004. Access to information is becoming more difficult. DarkCyber would like to experience a SlideShare with useful content, more effective search and retrieval, and far fewer one-page duplicates of ads for books.

Someday. Maybe?

Stephen E Arnold, August 12, 2020

NetDocuments Employs BA Insight Tech for Enterprise Search

August 10, 2020

For a secure, cloud-based data solution, many law firms, legal departments, and compliance teams turn to NetDocuments. Now the platform has adopted technology from a familiar name to simplify its clients’ access to information. A post at PRWeb reveals, “NetDocuments Introduces NetKnowledge Enterprise Search Powered by BA Insight.” We find it interesting that the 16-year-old BA Insight is licensing its askable-knowledge system to create the new tool, NetKnowledge. The press release describes the system’s advantages:

“Eliminate Downloading and Indexing Data for Search: No longer does content within NetDocuments need to be downloaded and indexed to be part of an organization’s enterprise search. Simply search within the NetDocuments platform, and NetKnowledge will find relevant data–along with information from other sources —and present it to users.

“Enforce Access Controls on Sensitive Information: Sensitive information may need to be restricted to certain individuals, but that data also needs to be available to others via enterprise search. NetKnowledge respects data restriction policies at the source and will only present data to individuals with proper access rights.

“Manage Large and Disparate Data Sets Across the Organization: NetKnowledge helps organizations bring all its data together to form a single source of truth, so users do not have to perform multiple searches in different places to get the information they need.”

Founded in 2004, BA Insight is based in Boston, Massachusetts. The company is dedicated to making information easier to find for organizations of all stripes. NetDocuments is headquartered in Lehi, Utah. The company was founded in 1999 and acquired by Clearlake Capital Group in 2017.

Cynthia Murrell, August 10, 2020

Search Engines: Plumbing Becomes a Thing Again

August 10, 2020

Two search related items.

The first is Hndex. If you want to locate articles posted to HackerNews, a tech-oriented headline aggregation site, you have an option. This is an example of what might be labeled a “site specific search” solution: One site, search it. Navigate to https://hndex.org and plug in a search term. We entered a query for “enterprise search” and retrieved on point results. The comments are available; however, these are not indexed. Click the “cached” button, and you can view the original article. Click the “comments” button and you can view the comments. HackerNews provides its own search service, which is weirdly located at the bottom of the page. DarkCyber will reserve further comments until we have experimented with the system for a few days.

The second is Infinity Search, another metasearch engine positioned as a free Web search system. DarkCyber finds metasearch engines interesting, but these often pretend to be running their own crawlers. To Infinity Search’s credit the company states:

When you search for something on our site, we take the results from other search engines and our own indexes, organize it, and display it directly to you without logging any information about you.

Metasearch systems have to deduplicate results lists and find a way to remain in the good graces of companies running primary Web crawlers. Disclaimer: My son worked for Vivisimo (now the heart and soul of one of IBM’s marketing confections). He has moved to other adventures, but I remember our talks about the issues metasearch presents. Examples include latency, screwed up query interpolation, and wonky deduplication which deduplicates useful results out of the results list. I think Vivisimo lives on in Yippy.com, but I am not a fan of metasearch systems which recycle others’ indexes and remain vulnerable to partners who pull out of deals, thus putting a dent in results.
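The deduplication issue can be illustrated with a minimal sketch. This is not how Infinity Search or Vivisimo works; the function names `normalize` and `merge_results`, the normalization rules, and the example URLs are all assumptions made for illustration. Results from several engines are merged in order, and a URL is dropped when its normalized form has already been seen.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key (hypothetical rules).

    Ignores the scheme, a leading "www.", and a trailing slash.
    The query string is kept on purpose: dropping it would collapse
    distinct pages into one key, which is the kind of wonky
    deduplication that removes useful results.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def merge_results(*result_lists):
    """Concatenate result lists in order, keeping the first copy of each URL."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


if __name__ == "__main__":
    engine_a = ["https://www.example.com/page/", "https://example.org/a"]
    engine_b = ["http://example.com/page", "https://example.org/b?id=1"]
    for url in merge_results(engine_a, engine_b):
        print(url)
```

Every normalization rule is a trade-off: the looser the key, the more near-duplicates vanish, and the more likely a distinct, useful result vanishes with them.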

Stephen E Arnold, August 10, 2020

Why Enterprise Search Remains a Problem

August 8, 2020

I read “Let’s Build a Full-Text Search Engine.” The write up does a reasonable job of walking through the basics of building a search engine. The focus is full text search, but I think in terms of an organization and its content. As a result, the system summarized will not handle video, images, and other types of content. The code examples are clear, and I liked the straightforward approach.

However, there is a potential bump in the information superhighway. Here’s a Venn diagram from the article. Notice the work you have to do to find documents with small, wild cat?

image

If I search for “smith”, “order”, “tile”, I want only the documents that contain all three terms; that is, the Boolean AND should be applied by default. I want Smith’s orders for tile. I have to call the person. I don’t want to go on a scavenger hunt. (There are other minor nits too, but the AND’ing thing is huge to me.)
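What AND by default means can be shown with a minimal sketch of an inverted index. This is not the code from the article under discussion. The function names, the tokenizer, and the three sample documents are assumptions made for illustration, and there is no stemming, so “orders” would not match “order”.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_and(index, query):
    """Return only the documents that contain every query term."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


if __name__ == "__main__":
    docs = {
        1: "Smith order for tile",
        2: "Smith order for carpet",
        3: "Jones tile order",
    }
    index = build_index(docs)
    print(search_and(index, "smith order tile"))
```

Only document 1 contains all three terms, so it is the only one returned. An OR-style search would return all three documents and leave the scavenger hunt to the user.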

Stephen E Arnold, August 6, 2020

Do Not Gamble. Own the Casino. The Google Way?

August 3, 2020

I read “Google’s Top Search Result?” What a surprise? No, not the fact that Google presents Google-centric results at the top of mobile search results. The surprise is that until July 28, 2020, no one knew that Google’s magical algorithmic, math-is-objective, super duper relevance scooper got more Google goodies than any other “content producer.” Amazing.

In the good old days of big desktop anchor computers and monitors, there was screen real estate. Google filled the screen with objective results and, of course, some advertisements.

That was then; this is now. Mobile screens are mostly squint-generators. In order to be seen and generate clicks, the Google has to work overtime.

The challenges include:

  • Traffic, eyeballs, and individuals who will go ga-ga over that which is Googley.
  • Sizzle that will burn the greedy fingertips of competitors who want to be placed front and center.
  • Useful information for consumers. Yep, what Google displays eliminates the need to think. Advertisers who want to be listed on a Google Map. Something can be worked out.

A number of organizations have groused about Google’s magical algorithmic, math-is-objective, super duper relevance scooper.

What’s fascinating is that it has taken two decades for some people to understand the wisdom embedded in the observation, “Own the casino.”

Pretty good advice and someone at the GOOG took it.

Stephen E Arnold, August 3, 2020

Search and Predicting Behavior

August 3, 2020

DarkCyber is interested in predictive analytics. Bayesian and other “statistical methods” are a go-to technique, and they find their way into many of the smart software systems. Developers rarely explain that systems share many features and functions. Marketers, usually kept in the dark like mushrooms, are free to formulate an interesting assertion or two.

I read “Google Searches During Pandemic Hint at Future Increase in Suicide,” and I was not sure about the methodology. Nevertheless, the write up provides some insight into what can be wiggled from Google search data.

Specifically, Columbia University experts have concluded that financial distress is “strongly linked to suicide.”

Okay.

I learned:

The researchers used an algorithm to analyze Google trends data from March 3, 2019, to April 18, 2020, and identify proportional changes over time in searches for 18 terms related to suicide and known suicide risk factors.

What algorithm?

The method is described this way:

The proportion of queries related to depression was slightly higher than the pre-pandemic period, and moderately higher for panic attack.

Perhaps the researchers looked at the number of searches and noted the increase? So comparing raw numbers? Tenure tracks and grants await! Because that leap between search and future behavior…
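The difference between comparing raw numbers and comparing proportions can be sketched in a few lines. This is not the researchers’ algorithm, which the write up does not describe. The function names and all of the counts are invented for illustration.

```python
def proportional_change(baseline, current):
    """Relative change from baseline to current, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


def share(term_count, total_count):
    """Fraction of all queries that mention the term."""
    return term_count / total_count


if __name__ == "__main__":
    # Hypothetical counts: queries for one term, and all queries, per period.
    before_term, before_total = 1_000, 100_000
    after_term, after_total = 1_200, 150_000

    raw = proportional_change(before_term, after_term)
    by_share = proportional_change(share(before_term, before_total),
                                   share(after_term, after_total))
    print(f"raw count change: {raw:+.0%}")
    print(f"share of queries change: {by_share:+.0%}")
```

With these made-up numbers the raw count rises while the term’s share of all queries falls, because overall search volume grew faster. Which of the two a study reports matters a great deal.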

Stephen E Arnold, August 3, 2020

Untangling Streaming: Responses to a Huge Web Search Fail

July 22, 2020

More and more users rely on a patchwork of internet streaming services for their video entertainment. Anyone who subscribes to several of these knows the time-wasting tedium of combing through different menus, each with a different UI, just to find something to watch. With even more proprietary streaming services on the horizon, it seems that problem is poised to grow. However, there are at least two apps that provide viable solutions—Reelgood and JustWatch. “These Two Underdog Apps Have Solved Streaming TV’s Biggest Headache,” Fast Company observes. Writer Jared Newman reports:

“Instead of making you bounce between disparate apps, both services can tell you what’s available on practically any streaming service. You can then add movies and shows to a watch list, get more suggestions based on your viewing habits, and even load their apps on your television to use as a centralized streaming menu. Compared to the app overload of most streaming devices, the universal guides offered by JustWatch and Reelgood seem like the ideal way to watch TV in the streaming era.”

Sounds helpful. But why does it take “underdog” apps to do what common sense suggests devices like Roku and Amazon Fire TV should already offer? There are several business reasons, we’re told, like Netflix’s resistance to the aggregation of its content or the fact that streaming services pay for placement on those platforms. As for Reelgood and JustWatch, they each have their own business models. It comes as no surprise that each involves user data. Newman writes:

“JustWatch says that … about 70% of its revenue comes from targeting users with movie trailers based on their viewing habits. For every movie or TV show users click on, JustWatch builds up a taste profile, then separates users into anonymized groups based on what they might like. Movie studios such as Universal and Paramount then give JustWatch a budget to target users with relevant video trailers on sites like Facebook and YouTube. … Reelgood, meanwhile, started from more of a Silicon Valley mindset of building up the product first and finding ways to monetize it later. Sanderson, a former ad product manager at Facebook, initially thought that would take the shape of recommendation-style targeted ads within the service, but lately the company’s been leaning more into selling access to its data.”

See the write-up for more on the business considerations and plans for each of these entities, big and small. There are other notable players in this arena, including TV Time, Simkl, Watchworthy, Wander, and VUniverse. It will be interesting to see where the market, and the technology, go from here.

Cynthia Murrell, July 22, 2020

Google Alerts: Lost in Cyber Space?

July 16, 2020

Check out these headlines from my Google Alert for the phrase “enterprise search”.

image

The Covid angle is back. Who publishes this type of news? An outfit called Daily Research Chronicles. An outstanding SEO outfit? Maybe?

And how about these high relevance links to my enterprise search alert?

image

Silicon steel, analog cameras, and dental film.

Sure, the alerts are a free service. Sure, an item every week or three points to something relevant.

But the spoofiness of the service from outfits like Daily Research Chronicles prompts me to ask:

What about those quality and relevance algorithms, dearest Google?

Stephen E Arnold, July 16, 2020
