USAFacts Centralizes Access to Data on Government Spending

May 12, 2017

Former Microsoft CEO Steve Ballmer’s recent project was inspired by his wife, Connie, who wished him to practice more philanthropy. Wouldn’t it help to know what our government is already doing with its (our) money, he wondered? Out of this question has sprung USAFacts, a website that serves up “federal, state, and local data from over 70 government sources.” I appreciate the presentation, which ties data to four specific directives embedded in the Preamble to our Constitution. For example, the heading Establish Justice and Ensure Domestic Tranquility leads to stats on Crime and Disaster, Safeguarding Consumers and Employees, and Child Safety and Social Services. Tying such information to our founding document will prompt many to consider these data points in a more thoughtful way.

The site’s About page describes its team’s approach and methodology. The effort has not been easy; we’re told:

With his business background, Steve searched for solid, reliable, impartial numbers to tell the story… but eventually realized he wasn’t going to find them. He put together a small team of people – economists, writers, researchers – and got to work.

We soon discovered that dealing with something as big and complex as government – with its more than 90,000 jurisdictions and 23 million employees – required an organizing framework. What better place to look than the Constitution, and, more specifically, the preamble to the Constitution? … While we don’t make judgments about policy, we all agree on the broad purposes of government as laid out in the preamble to the Constitution.

Still in beta, USAFacts is partnering with academic institutions like the Stanford Institute for Economic Policy Research, the Penn Wharton Budget Model, and Lynchburg College. The team is working to document its process and controls, and plans to have its methods reviewed by a “prominent” accounting firm for accuracy. We look forward to watching this project grow.

Cynthia Murrell, May 12, 2017

Sci-Tech Queries Versus Google Queries

May 8, 2017

I saw a reference to an academic paper. Its title is “Academic Search in Response to Major Scientific Events.” The main point is that Web searchers, based on Google Trends data, are “bursty” and demonstrate “surging interests.” The sci-tech crowd, like college professors in search of tenure, use a “gradually growing search pattern.”

My thought when I read the write up was that more than half of Google’s online traffic comes from mobile devices; for example, “Where can I buy a pizza?”-type queries. The academics, based on the information in the write up, paw through journal literature using more traditional methods. I have used the phrase “boat anchor” computers to characterize “real” academic research.

The write up does not address mobile queries, which seems to me to be important. I am fuzzy on how Google hooks mobile queries delivered via voice, apps, and icons on a pixel with its Google Trends data. And about that Google Trends data. Is it accurate as Google works overtime to distribute ads in more places because users’ displays on mobile devices are small compared to the boat anchor gizmos?

The other point I hoped would be addressed is the role of personalization in Google queries. In this week’s HonkinNews, we give the example of searching for “filters.” Google’s smart and invasive system delivered Bloom filters, not water filters. We wanted information about water filters. Helpful.

My thought is that the researchers need more focused data collection. Three word queries. Hasn’t that been the norm for a while?

Stephen E Arnold, May 8, 2017

Revealing the Google Relevance Sins

May 2, 2017

I was surprised to read “Google’s Project Owl”. Talk about unintended consequences. An SEO centric publication reported that Google was going to get on the stick and smite fake news and “problematic content.” (I am not sure what “problematic content” is because I think a person’s point of view influences this determination.)

The write up states in real journalistic rhetoric:

Project Owl is Google’s internal name for its endeavor to fight back on problematic searches. The owl name was picked for no specific reason, Google said. However, the idea of an owl as a symbol for wisdom is appropriate. Google’s effort seeks to bring some wisdom back into areas where it is sorely needed.

Right, wisdom. From a vendor of content wrapped in pay to play advertising and “black box” algorithms which mysteriously have magical powers on sites like Foundem and the poor folks who were trying to make French tax forms findable.

My view of the initiative and the real journalistic write up is typical of what folks in Harrod’s Creek think about Left Coast types:

  1. The write up underscores the fact that Google’s quality function, which I wrote about in my three Google monographs, does not work. What determines the clever PageRank method? Well, a clever way to determine a signal of quality. Heh heh. Doesn’t work.
  2. Google is now on the hook to figure out what content is problematic and then find a way to remove that content from the Google indexes. Yep, not one index, but dozens. Google Local (crooked shops, anyone), YouTube (the oodles of porn which is easily findable by an enterprising 12 year old using the Yandex video search function), news (why are there no ads on Google News? Hmmm.), and other fave services from the GOOG.
  3. Relevance is essentially non existent for most queries. I like the idea of using “authoritative sources” for obscure queries. Yep, those Lady Gaga hits keep on rocking when a person searches for animal abuse and meat dresses.

Let me boil this down.

If you rely on a free, ad supported Web search system for information, you may be getting a jolt from which your gray matter will not recover.

What’s the fix? I know the write up champions search engine optimization and explaining how to erode relevance for a user’s online query. But I am old fashioned. Multiple sources, interviews, reading of primary sources, and analytical thinking.

Hey, boring. Precision and recall are sure less fun than relaxing queries to amp up the irrelevance output.

Tough.

Stephen E Arnold, May 2, 2017

Keyword Search vs. Semantic Search for Patent Seekers

April 26, 2017

The article on BIP Counsels titled “An Introduction to Patent Search, Keyword Search, and Semantic Searches” offers a brief overview of the differences between keyword and semantic search. The article is geared towards inventors and technologists in the early stages of filing a patent application. The article states,

If an inventor proceeds with the patent filing process without performing an exhaustive prior art search, it may hamper the patent application at a later point, such as in the prosecution process. Hence, a thorough search involving all possible relevant techniques is always advisable… Search tools such as ‘semantic search assistant’ help the user find similar patent families based on freely entered text. The search method is ideal for concept based search.

Ultimately the article fails to go beyond the superficial when it comes to keyword and semantic search. One almost suspects that the author (BananaIP patent attorneys) wants to send potential DIY-patent researchers running into their office for help. Yes, terminology plays a key role in keyword searches. Yes, semantic search can help narrow the focus and relevancy of the results. If you want more information than that, you may want to visit the patent attorney. But probably not the one who wrote this article.

Chelsea Kerwin, April 26, 2017

Google Search Quality: Heading South?

April 25, 2017

Forbes, the capitalist tool, ran this article or sponsored content on April 17, 2017: “Is Google’s Search Quality Starting to Decline?” My first reaction was the question, “Compared to what? Precision and recall scores? Other free, ad supported Web search systems? Looking up information in a commercial database?”

My questions were just off base or from another dimension.

The capitalist tool does not fool around when it comes to explaining why something is good or bad. The capitalist tool walks like Commodore Vanderbilt; that is, somewhat unsteadily in his dotage.

I learned from the capitalist tool:

Individual users, companies and organizations, and even governments have stepped up to blame Google for not providing quality results.

The “quality” idea comes from Search Engine Land, a publication which embraces Web search and search engine optimization. That orientation is okay with me, but it has very little to do with relevance. There is that annoying precision calculation. Plus, there is the equally annoying recall calculation. Some die hards actually create a statistically valid sample and attempt to determine if results from queries delivered the information the person running the query expected. There are library schools and researchers who worry about these silly methods. Not so much with the SEO crowd.
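For readers who have not met those two annoying calculations, here is a minimal sketch in Python. It assumes a person has already judged which documents are relevant for one query. The document identifiers and the `precision_recall` helper are hypothetical and exist only for illustration; they do not come from Forbes, Search Engine Land, or Google.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids the search system returned
    relevant:  document ids a human judge marked as relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents actually returned

    # Precision: what share of the returned results is relevant?
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what share of the relevant documents was returned?
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical judged sample for one query (illustrative ids only).
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up sample, two of the four returned documents are relevant, and two of the three relevant documents were found. A real evaluation would repeat the calculation over a statistically valid sample of queries, which is the step the die hards bother with.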

Back to the argument in the capitalist tool. I highlighted this passage:

users have always had the ability to report offensive auto complete suggestions, but now, Google has made the process more visible and immediate. In an even bigger push, Google has employed more than 10,000 independent contractors to serve as “quality raters,” responsible for identifying and flagging inaccurate and offensive material including fake news, for various search queries.

Ah, Google’s quality scores determined by Google’s smart software and its well crafted algorithms are no longer enough? Well, that’s a surprise. I thought the fake news, the mismatched ads, and the relaxation of queries to make that ad inventory shrink more rapidly were not much of an issue. Well, there is that push back from outfits like AT&T, but what’s a few cancelled ads from a minnow like AT&T?

The capitalist tool knows where its next Whopper is coming from. I circled this statement:

It’s important to realize just how sophisticated Google is, and how far it’s come from its early stages, as well as the impossibility of having a “perfect” search platform. Humans are flawed creatures, and our actions are what are dictating the shape of search. We can patchwork some of these problems, but the Google search quality crisis won’t disappear overnight, and can’t be blamed for being anything more than the byproduct of a sufficiently sophisticated machine designed to serve us.

Interesting idea—blame.

My takeaway from this scintillating analysis is that the capitalist tool needs to do a few queries about “quality”. Just a thought. By the way, the databases to use will not be part of the Google.com result set. Google partitions its indexes so that a researcher has to run queries across different Google silos. Also, commercial databases are likely to provide more comprehensive results from sources Google does not index. Hey, who cares about this precision and recall stuff when writing about offensive answers to queries, Google’s auto complete mechanism, rich snippets, and popularity?

Not too many at Forbes I surmise. Maybe SEO is search to these smart people who can demystify SEO and mystify information retrieval.

Stephen E Arnold, April 25, 2017

HonkinNews for April 18, 2017 Now Available

April 18, 2017

From the friendly skies of rural Kentucky, this week’s HonkinNews talks about the benefits of a visit to Louisville, Kentucky. Injuries are possible. HonkinNews reports that a mid tier consulting firm has decided that people do not search. When you look for information online, you really “insight.” Yep, that sounds pretty crazy to Beyond Search as well. Even more startling are the companies the thrashing consulting firm identifies as leaders in “insight.” Spoiler: Recorded Future, Palantir Technologies, and other companies of this ilk are not included. Why? Insight means enterprise search. HonkinNews also takes a quick look at what we call the “high school science club disorder” or HSSCD. Although not on the list of official medical conditions, we report on some striking parallels between Stephen E Arnold’s high school science club in 1958 and Google’s response to allegations from the US Department of Labor about Google’s compensation plan. From the Beyond Alexa service, HonkinNews recycles some information about must-use Amazon Alexa skills. Fancy some Eastern philosophy or words from fashionistas? You will learn what to have Alexa deliver for your auditory delight. A technological news flash about pizza adds flavor to this week’s show. You will want to use DRU to get your slice. No, DRU is not based on “drool”, although one of the Beyond Search team does drool when someone mentions pizza. DRU is a Domino Robotic Unit. Yummy. HonkinNews speculates about rumored “new” functions for those who write using Microsoft Word. If you like Windows 10’s start menu ads, you will love LinkedIn information displayed next to that memo you are trying to finish so you can leave early. View the program to find out if Clippy will return. You can view the program here.

NB. One viewer of the program wanted to know why the program is in black and white and is pretty lousy. The reason is that we film on a Bell & Howell camera. We are in rural Kentucky, and we use what we have. Enough said. You can “insight” old fashioned eight mm film too.

Kenny Toth, April 18, 2017

Yikes! Google Skeptics Amp Up

April 6, 2017

Beyond providing search, email, office suite services, and not doing any evil, another of Google’s goals is to ramp up its search speed. Media Post shares via its Search Marketing Daily column that “Search Experts Skeptical Of Google Amp Updates.” Although updates to Google’s Accelerated Mobile Pages (AMP) might make it easier to access the original URL from search results, companies that rely on mobile search for marketing and advertising are not happy with AMP.

AMP reduces a Web site’s functionality by caching the content, and Google prioritizes AMP pages in search results. Companies are losing potential clients when they are unable to display their wares in the growing mobile market. It also does not bode well for Google, which draws a significant profit from ad revenue. Why would Google hinder its own clients? It is all in an effort to make the end user’s Google mobile search experience better.

The clients want to forgo the AMP experience:

‘If load times and user experience is really the issue here, then Google should prioritize based on load speed,’ wrote Yee Cheng Chin. ‘An AMP site with tons of images isn’t necessarily better than a simple minimal static page Web site served over CDN. I also want to use Google to look for relevant content, not whether a website conforms to Google’s own proprietary standards when searching.’ Chin, along with others, simply want to know how to disable the feature.

End users are frustrated as well because AMP changes the original URL’s content and does not always show what would be available on a full page.

The load times might be fast and the pages easier to read, but original intent and content are lost. What is the solution? Wait for technology to be upgraded enough to handle the original Web pages and bigger screens.

Whitney Grace, April 6, 2017

Palantir Technologies: 9000 Words about a Secretive Company

April 3, 2017

Palantir Technologies is a search and content processing company. The technology is pretty good. The company’s marketing is pretty good. Its public profile is now darned good. I don’t have much to say about Palantir’s wheel interface, its patents, or its usefulness to “operators.” If you are not familiar with the company, you may want to read or at least skim the weirdo Fortune Magazine Web article “Donald Trump, Palantir, and the Crazy Battle to Clean Up a Multibillion Dollar Military Procurement Swamp.” The subtitle is a helpful statement:

Peter Thiel’s software company says it has a product that will save soldiers’ lives—and hundreds of millions in taxpayer funds. The Army, which has spent billions on a failed alternative, isn’t interested. Will the president and his generals ride to the rescue?

The article, minus the pull quotes, is more than 9000 words long. The net net of the write up is that the US government’s method of purchasing goods and services may be tough to change. I used to work at a Beltway Bandit outfit. Legend has it that my employer helped set up the US Department of the Navy and many of the business processes so many contractors know and love.

One has to change elected officials, government professionals who operate procurement processes, outfits like Beltway Bandits, and assorted legal eagles.

Why take 9000 words to reach this conclusion? My hunch is that the journey was fun: Fun for the Fortune Magazine staff, fun for the author, and fun for the ad sales person who peppered the infinite page with ads.

Will Palantir Technologies enjoy the write up? I suppose it depends on whom one asks. Perhaps a reader connected to IBM could ask Watson about the Analyst’s Notebook team. What are their views of Palantir? For most folks, my thought is that the Palantir connection to President Trump may provide a viewshed from which to assess the impact of this real journalism essay thing.

Stephen E Arnold, April 3, 2017

Seventeen Visions of the Future From Microsoft Researchers

March 31, 2017

Here’s a bit of PR from Microsoft that could pay off in many ways, should the company be wise enough to listen to these women. Microsoft’s blog posts, “17 for ’17: Microsoft Researchers on What to Expect in 2017 and in 2027.” As part of their Computer Science Education Week, the company shares 17 well-informed perspectives on the future of tech, presented by 17 talented researchers. On the way to introducing these insights, the post reminds us:

In this ‘age of acceleration,’ in which advances in technology and the globalization of business are transforming entire industries and society itself, it’s more critical than ever for everyone to be digitally literate, especially our kids. This is particularly true for women and girls who, while representing roughly 50 percent of the world’s population, account for less than 20 percent of computer science graduates in 34 OECD countries, according to this report. This has far-reaching societal and economic consequences.

Consequences like a worldwide shortage of qualified computer scientists, which could be eased by a surge of women entering the field. That’s why they call personnel management “human resources,” after all.

We are pleased to see one particular researcher on the list, Sue Dumais, who happens to be an alum of the historic Bell Labs. Dumais now works as deputy managing director at Microsoft’s Redmond, Washington, lab. Her view for 2017 makes perfect sense—more progress in, and reliance upon, deep learning models. Among other things, she expects these models to continue improving internet search results. What about further down the road? Here’s Dumais’ vision:

What will be the key advance or topic of discussion in search and information retrieval in 2027?

The search box will disappear. It will be replaced by search functionality that is more ubiquitous, embedded and contextually sensitive. We are seeing the beginnings of this transformation with spoken queries, especially in mobile and smart home settings. This trend will accelerate with the ability to issue queries consisting of sound, images, or video, and with the use of context to proactively retrieve information related to the current location, content, entities, or activities without explicit queries.

The post urges readers to share this list, in the hope that it will inspire talented kids of all genders to pursue careers in computer science.

Cynthia Murrell, March 31, 2017

Diffeo Incorporates Meta Search Technology

March 24, 2017

Will search-and-discovery firm Diffeo’s recent acquisition give it the edge? Yahoo Finance shares, “Diffeo Acquires Meta Search and Launches New Offering.” Startup Meta Search developed a local computer and cloud search system that uses smart indexing to assign index terms and keep the terms consistent. Diffeo provides a range of advanced content processing services based on collaborative machine intelligence. The press release specifies:

Diffeo’s content discovery platform accelerates research analysts by applying text analytics and machine intelligence algorithms to users’ in-progress files, so that it can recommend content that fills in knowledge gaps — often before the user thinks of searching. Diffeo acts as a personal research assistant that scours both the user’s files and the Internet. The company describes its technology as collaborative machine intelligence.

Diffeo and Meta’s services complement each other. Meta provides unified search across the content on all of a user’s cloud platforms and devices. Diffeo’s Advanced Discovery Toolbox displays recommendations alongside in-progress documents to accelerate the work of research analysts by uncovering key connections.

Meta’s platform integrates cloud environments into a single keyword search interface, enabling users to search their files on all cloud drives, such as Dropbox, Google Drive, Slack and Evernote all at once. Meta also improves search quality by intelligently analyzing each document, determining the most important concepts, and automatically applying those concepts as ‘Smart Tags’ to the user’s documents.

This seems like a promising combination. Founded in 2012, Diffeo made Meta Search its first acquisition on January 10 of this year. The company is currently hiring. Meta Search, now called Diffeo Cloud Search, is based in Boston.

Cynthia Murrell, March 24, 2017
