Keyword Search vs. Semantic Search for Patent Seekers

April 26, 2017

The article on BIP Counsels titled “An Introduction to Patent Search, Keyword Search, and Semantic Searches” offers a brief overview of the differences between keyword and semantic search. The article is geared toward inventors and technologists in the early stages of filing a patent application. The article states:

If an inventor proceeds with the patent filing process without performing an exhaustive prior art search, it may hamper the patent application at a later point, such as in the prosecution process. Hence, a thorough search involving all possible relevant techniques is always advisable… Search tools such as ‘semantic search assistant’ help the user find similar patent families based on freely entered text.  The search method is ideal for concept based search.

Ultimately the article fails to go beyond the superficial when it comes to keyword and semantic search. One almost suspects that the author (BananaIP patent attorneys) wants to send potential DIY-patent researchers running into their office for help. Yes, terminology plays a key role in keyword searches. Yes, semantic search can help narrow the focus and relevancy of the results. If you want more information than that, you may want to visit a patent attorney. But probably not the one who wrote this article.

Chelsea Kerwin, April 26, 2017

Google Search Quality: Heading South?

April 25, 2017

Forbes, the capitalist tool, ran this article or sponsored content on April 17, 2017: “Is Google’s Search Quality Starting to Decline?” My first reaction was the question, “Compared to what? Precision and recall scores? Other free, ad supported Web search systems? Looking up information in a commercial database?”

My questions were just off base or from another dimension.

The capitalist tool does not fool around when it comes to explaining why something is good or bad. The capitalist tool walks like Commodore Vanderbilt; that is, somewhat unsteadily in his dotage.

I learned from the capitalist tool:

Individual users, companies and organizations, and even governments have stepped up to blame Google for not providing quality results.

The “quality” idea comes from Search Engine Land, a publication which embraces Web search and search engine optimization. That orientation is okay with me, but it has very little to do with relevance. There is that annoying precision calculation. Plus, there is the equally annoying recall calculation. Some die hards actually create a statistically valid sample and attempt to determine if results from queries delivered the information the person running the query expected. There are library schools and researchers who worry about these silly methods. Not so much with the SEO crowd.
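For readers who have not met those annoying calculations, here is a minimal sketch of how precision and recall are computed for a single query. The document ids and the relevance judgments are made up for illustration, and the function name `precision_recall` is my own, not something from the Forbes or Search Engine Land write ups.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the search system returned
    relevant:  document ids a human assessor judged relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were returned

    # Precision: share of the returned documents that are relevant.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of the relevant documents that were returned.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical judgments for a single query (illustrative data only).
returned = ["d1", "d2", "d3", "d4"]
judged_relevant = ["d2", "d4", "d7", "d8", "d9"]

p, r = precision_recall(returned, judged_relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

A system can score well on one measure and poorly on the other, which is why the die hards run both over a statistically valid sample of queries rather than eyeballing a results page.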

Back to the argument in the capitalist tool. I highlighted this passage:

users have always had the ability to report offensive auto complete suggestions, but now, Google has made the process more visible and immediate. In an even bigger push, Google has employed more than 10,000 independent contractors to serve as “quality raters,” responsible for identifying and flagging inaccurate and offensive material including fake news, for various search queries.

Ah, Google’s quality scores determined by Google’s smart software and its well crafted algorithms are no longer enough? Well, that’s a surprise. I thought the fake news, the mismatched ads, and the relaxation of queries to make that ad inventory shrink more rapidly were not much of an issue. Well, there is that push back from outfits like AT&T, but what’s a few cancelled ads from a minnow like AT&T.

The capitalist tool knows where its next Whopper is coming from. I circled this statement:

It’s important to realize just how sophisticated Google is, and how far it’s come from its early stages, as well as the impossibility of having a “perfect” search platform. Humans are flawed creatures, and our actions are what are dictating the shape of search. We can patchwork some of these problems, but the Google search quality crisis won’t disappear overnight, and can’t be blamed for being anything more than the byproduct of a sufficiently sophisticated machine designed to serve us.

Interesting idea—blame.

My takeaway from this scintillating analysis is that the capitalist tool needs to do a few queries about “quality”. Just a thought. By the way, the databases to use will not be part of the Google.com result set. Google partitions its indexes so that a researcher has to run queries across different Google silos. Also, commercial databases are likely to provide more comprehensive results from sources Google does not index. Hey, who cares about this precision and recall stuff when writing about offensive answers to queries, Google’s auto complete mechanism, rich snippets, and popularity?

Not too many at Forbes I surmise. Maybe SEO is search to these smart people who can demystify SEO and mystify information retrieval.

Stephen E Arnold, April 25, 2017

HonkinNews for April 18, 2017 Now Available

April 18, 2017

From the friendly skies of rural Kentucky, this week’s HonkinNews talks about the benefits of a visit to Louisville, Kentucky. Injuries are possible. HonkinNews reports that a mid tier consulting firm has decided that people do not search. When you look for information online, you really “insight.” Yep, that sounds pretty crazy to Beyond Search as well. Even more startling are the companies the thrashing consulting firm identifies as leaders in “insight.” Spoiler: Recorded Future, Palantir Technologies, and other companies of this ilk are not included. Why? Insight means enterprise search. HonkinNews also takes a quick look at what we call the “high school science club disorder” or HSSCD. Although HSSCD is not on the list of official medical conditions, we report on some striking parallels between Stephen E Arnold’s high school science club in 1958 and Google’s response to allegations from the US Department of Labor about Google’s compensation plan. From the Beyond Alexa service, HonkinNews recycles some information about must-use Amazon Alexa skills. Fancy some Eastern philosophy or words from fashionistas? You will learn what to have Alexa deliver for your auditory delight. A technological news flash about pizza adds flavor to this week’s show. You will want to use DRU to get your slice. No, DRU is not based on “drool”, although one of the Beyond Search team does drool when someone mentions pizza. DRU is a Domino’s Robotic Unit. Yummy. HonkinNews speculates about a rumored “new” function for those who write using Microsoft Word. If you like Windows 10’s start menu ads, you will love LinkedIn information displayed next to that memo you are trying to finish so you can leave early. View the program to find out if Clippy will return. You can view the program here.

NB. One viewer of the program wanted to know why the program is in black and white and is pretty lousy. The reason is that we film on a Bell & Howell camera. We are in rural Kentucky, and we use what we have. Enough said. You can “insight” old fashioned eight mm film too.

Kenny Toth, April 18, 2017

Yikes! Google Skeptics Amp Up

April 6, 2017

Beyond providing search, email, office suite services, and not doing any evil, another of Google’s goals is to ramp up its search speed. Media Post shares via its Search Marketing Daily column that “Search Experts Skeptical Of Google Amp Updates.” Although Google’s Accelerated Mobile Pages (AMP) project might make it easier to access the original URL from search results, companies that rely on mobile search for marketing and advertising are not happy with it.

AMP reduces a Web site’s functionality by caching the content, and Google prioritizes AMP pages in search results. Companies are losing potential clients when they are unable to display their wares in the growing mobile market. It also does not bode well for Google, which draws a significant profit from ad revenue. Why would Google hinder its own clients? It is all in an effort to make the end user’s Google mobile search experience better.

The clients want to forgo the AMP experience:

‘If load times and user experience is really the issue here, then Google should prioritize based on load speed,’ wrote Yee Cheng Chin. ‘An AMP site with tons of images isn’t necessarily better than a simple minimal static page Web site served over CDN. I also want to use Google to look for relevant content, not whether a website conforms to Google’s own proprietary standards when searching.’ Chin, along with others, simply want to know how to disable the feature.

End users are frustrated as well because AMP changes the original URL’s content and does not always show what would be available on a full page.

The load times might be fast and the pages easier to read, but original intent and content are lost. What is the solution? Wait for technology to be upgraded enough to handle the original Web pages and bigger screens.

Whitney Grace, April 6, 2017

Palantir Technologies: 9000 Words about a Secretive Company

April 3, 2017

Palantir Technologies is a search and content processing company. The technology is pretty good. The company’s marketing is pretty good. Its public profile is now darned good. I don’t have much to say about Palantir’s wheel interface, its patents, or its usefulness to “operators.” If you are not familiar with the company, you may want to read or at least skim the weirdo Fortune Magazine Web article “Donald Trump, Palantir, and the Crazy Battle to Clean Up a Multibillion Dollar Military Procurement Swamp.” The subtitle is a helpful statement:

Peter Thiel’s software company says it has a product that will save soldiers’ lives—and hundreds of millions in taxpayer funds. The Army, which has spent billions on a failed alternative, isn’t interested. Will the president and his generals ride to the rescue?

The article, minus the pull quotes, is more than 9000 words long. The net net of the write up is that changing the US government’s method of purchasing goods and services may be tough. I used to work at a Beltway Bandit outfit. Legend has it that my employer helped set up the US Department of the Navy and many of the business processes so many contractors know and love.

One has to change elected officials, government professionals who operate procurement processes, outfits like Beltway Bandits, and assorted legal eagles.

Why take 9000 words to reach this conclusion? My hunch is that the journey was fun: Fun for the Fortune Magazine staff, fun for the author, and fun for the ad sales person who peppered the infinite page with ads.

Will Palantir Technologies enjoy the write up? I suppose it depends on whom one asks. Perhaps a reader connected to IBM could ask Watson about the Analyst’s Notebook team. What are their views of Palantir? For most folks, my thought is that the Palantir connection to President Trump may provide a viewshed from which to assess the impact of this real journalism essay thing.

Stephen E Arnold, April 3, 2017

Seventeen Visions of the Future From Microsoft Researchers

March 31, 2017

Here’s a bit of PR from Microsoft that could pay off in many ways, should the company be wise enough to listen to these women. Microsoft’s blog posts, “17 for ’17: Microsoft Researchers on What to Expect in 2017 and in 2027.” As part of their Computer Science Education Week, the company shares 17 well-informed perspectives on the future of tech, presented by 17 talented researchers. On the way to introducing these insights, the post reminds us:

In this ‘age of acceleration,’ in which advances in technology and the globalization of business are transforming entire industries and society itself, it’s more critical than ever for everyone to be digitally literate, especially our kids. This is particularly true for women and girls who, while representing roughly 50 percent of the world’s population, account for less than 20 percent of computer science graduates in 34 OECD countries, according to this report. This has far-reaching societal and economic consequences.

Consequences like a worldwide shortage of qualified computer scientists, which could be eased by a surge of women entering the field. That’s why they call personnel management “human resources,” after all.

We are pleased to see one particular researcher on the list, Sue Dumais, who happens to be an alum of the historic Bell Labs. Dumais now works as deputy managing director at Microsoft’s Redmond, Washington, lab. Her view for 2017 makes perfect sense—more progress in, and reliance upon, deep learning models. Among other things, she expects these models to continue improving internet search results. What about further down the road? Here’s Dumais’ vision:

What will be the key advance or topic of discussion in search and information retrieval in 2027?

The search box will disappear. It will be replaced by search functionality that is more ubiquitous, embedded and contextually sensitive. We are seeing the beginnings of this transformation with spoken queries, especially in mobile and smart home settings.  This trend will accelerate with the ability to issue queries consisting of sound, images, or video, and with the use of context to proactively retrieve information related to the current location, content, entities, or activities without explicit queries.

The post urges readers to share this list, in the hope that it will inspire talented kids of all genders to pursue careers in computer science.

Cynthia Murrell, March 31, 2017

Diffeo Incorporates Meta Search Technology

March 24, 2017

Will search-and-discovery firm Diffeo’s recent acquisition give it the edge? Yahoo Finance shares, “Diffeo Acquires Meta Search and Launches New Offering.” Startup Meta Search developed a local computer and cloud search system that uses smart indexing to assign index terms and keep the terms consistent. Diffeo provides a range of advanced content processing services based on collaborative machine intelligence. The press release specifies:

Diffeo’s content discovery platform accelerates research analysts by applying text analytics and machine intelligence algorithms to users’ in-progress files, so that it can recommend content that fills in knowledge gaps — often before the user thinks of searching. Diffeo acts as a personal research assistant that scours both the user’s files and the Internet. The company describes its technology as collaborative machine intelligence.

Diffeo and Meta’s services complement each other. Meta provides unified search across the content on all of a user’s cloud platforms and devices. Diffeo’s Advanced Discovery Toolbox displays recommendations alongside in-progress documents to accelerate the work of research analysts by uncovering key connections.

Meta’s platform integrates cloud environments into a single keyword search interface, enabling users to search their files on all cloud drives, such as Dropbox, Google Drive, Slack and Evernote all at once. Meta also improves search quality by intelligently analyzing each document, determining the most important concepts, and automatically applying those concepts as ‘Smart Tags’ to the user’s documents.

This seems like a promising combination. Founded in 2012, Diffeo made Meta Search its first acquisition on January 10 of this year. The company is currently hiring. Meta Search, now called Diffeo Cloud Search, is based in Boston.

Cynthia Murrell, March 24, 2017

Bitcoin Alternative Monero Accepted by AlphaBay

March 17, 2017

As institutions like banks and law enforcement come to grips with the flow of Bitcoin, another cyber currency is suddenly gaining ground. Bloomberg Technology reveals, “New Digital Currency Spikes as Drug Dealers Get More Secrecy.” The coin in question, Monero, has been around for a couple of years, but was recently given a boost by the marketplace AlphaBay, one of the most popular destinations for buyers of illicit drugs on the Dark Web. In the two weeks after the site announced it would soon accept Monero, the total worth of that currency in circulation jumped to over $100 million (from about $25 million the previous month). Writer Yuji Nakamura explains why a shift may be underway:

Bitcoin, the most popular digital currency in the world with a total value of $9.1 billion, also allows users to move funds discreetly and uses a network of miners to verify the authenticity of each trade. But its privacy has come under threat as governments and private investigators increase their ability to track transactions across the bitcoin network and trace funds to bank accounts ultimately used to convert digital assets to and from traditional currencies like U.S. dollars.

Monero similarly uses a network of miners to verify its trades, but mixes multiple transactions together to make it harder to trace the genesis of the funds. It also adopts ‘dual-key stealth’ addresses, which make it difficult for third-parties to pinpoint who received the funds.

‘For any two outputs, from the same or different transactions, you cannot prove they were sent to the same person,’ Riccardo Spagni, a lead developer of Monero, wrote by e-mail. Jumbling trades together makes it ‘impossible to tell which transaction, of a set of transactions, a particular input comes from. It appears to come from all of them.’

Though Monero has yet to withstand the trials of AlphaBay-level volumes for long, its security features received praise from investor and prominent digital-currency-advocate Roger Ver. As of this writing, Monero is ranked fifth among digital currencies in overall market value. Click here for a list of digital currencies ranked, in real time, by market cap.

Cynthia Murrell, March 17, 2017

Attivio Takes on SCOLA Repository

March 16, 2017

We noticed that Attivio is back to enterprise search, and now uses the fetching catchphrase, “data dexterity company.” Their News page announces, “Attivio Chosen as Enterprise Search Platform for World’s Largest Repository of Foreign Language Media.” We’ve been keeping an eye on Attivio as it grows. With this press release, Attivio touts a large, recent feather in their cap—providing enterprise search services to SCOLA, a non-profit dedicated to helping different peoples around the world learn about each other. This tool enables SCOLA’s subscribers to find any content in any language, we’re told. The organization regards today’s information technology as crucial to their efforts. The write-up explains: 

SCOLA provides a wide range of online language learning services, including international TV programming, videos, radio, and newspapers in over 200 native languages, via a secure browser-based application. At 85 terabytes, it houses the largest repository of foreign language media in the world. With its users asking for an easier way to find and categorize this information, SCOLA chose Attivio Enterprise Search to act as the primary access point for information through the web portal. This enables users, including teachers and consumers, to enter a single keyword and find information across all formats, languages and geographical regions in a matter of seconds. After looking at several options, SCOLA chose Attivio Enterprise Search because of its multi-language support and ease of customization. ‘When you have 84,000 videos in 200 languages, trying to find the right content for a themed lesson is overwhelming,’ said Maggie Artus, project manager at SCOLA. ‘With the Attivio search function, the user only sees instant results. The behind-the-scenes processing complexity is completely hidden.’

Attivio was founded in 2007, and is headquartered in Newton, Massachusetts. The company’s client roster includes prominent organizations like UBS, Cisco, Citi, and DARPA. They are also hiring for several positions as of this writing.

Cynthia Murrell, March 16, 2017

Yandex Incorporates Semantic Search

March 15, 2017

Apparently ahead of a rumored IPO launch, Russian search firm Yandex is introducing “Spectrum,” a semantic search feature. We learn of the development from “Russian Search Engine Yandex Gets a Semantic Injection” at the Association of Internet Research Specialists’ Articles Share pages. Writer Wushe Zhiyang observes that, though Yandex claims Spectrum can read users’ minds,  the tech appears to be a mix of semantic technology and machine learning. He specifies:

The system analyses users’ searches and identifies objects like personal names, films or cars. Each object is then classified into one or more categories, e.g. ‘film’, ‘car’, ‘medicine’. For each category there is a range of search intents. [For example] the ‘product’ category will have search intents such as buy something or read customer reviews. So we have a degree of natural language processing, taxonomy, all tied into ‘intent’, which sounds like a very good recipe for highly efficient advertising.

But what if a search query has many potential meanings? Yandex says that Spectrum is able to choose the category and the range of potential user intents for each query to match a user’s expectations as close as possible. It does this by looking at historic search patterns. If the majority of users searching for ‘gone with the wind’ expect to find a film, the majority of search results will be about the film, not the book.

‘As users’ interests and intents tend to change, the system performs query analysis several times a week,’ says Yandex. This amounts to Spectrum analysing about five billion search queries.
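The “historic search patterns” idea in the passage above can be illustrated with a toy sketch. This is not Yandex’s implementation, which the write-up does not describe in detail; the click log format, the sample data, and the function name `rank_categories` are all assumptions made for illustration. The sketch simply counts which category past users chose for a query and ranks the categories by their share.

```python
from collections import Counter

# Hypothetical click log: (query, category of the result the user chose).
HISTORY = [
    ("gone with the wind", "film"),
    ("gone with the wind", "film"),
    ("gone with the wind", "book"),
    ("gone with the wind", "film"),
    ("jaguar", "car"),
]


def rank_categories(query, history):
    """Return [(category, share)] for a query, most popular category first.

    The share is the fraction of past users of this query who chose
    a result in that category. Unknown queries give an empty list.
    """
    counts = Counter(category for q, category in history if q == query)
    total = sum(counts.values())
    return [(category, n / total) for category, n in counts.most_common()]


print(rank_categories("gone with the wind", HISTORY))
```

With this sample log, three of four past users picked the film, so a results page built from these shares would lean toward the film rather than the book. Rerunning the count on fresh logs is one plausible way to track the shifting intents the quote mentions.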

Yandex has been busy. The site recently partnered with VKontakte, Russia’s largest social network, and plans to surface public-facing parts of VKontakte user profiles, in real time, in Yandex searches. If the rumors of a plan to go public are true, will these added features help make Yandex’s IPO a success?

Cynthia Murrell, March 15, 2017
