Techspert: Search and Experts

April 6, 2020

“How Our AI Search Technology Finds Experts Others Can’t” provides a crunchy description of an application of artificial intelligence. Techspert.io provides a diagram of its approach:

[Diagram: the Techspert.io approach]

The idea is that the approach operates with pinpoint precision. Then a semantic search engine is used to identify context. The old school lingo was Endeca’s Guided Search or maybe side search. Then a social graph is generated. That’s a relationship map like those used by i2 Ltd’s Analyst’s Notebook in the early 1990s. The i2 Ltd outfit had some Cambridge grads on its team. Finally the system can identify candidates.

What’s interesting is that the pinpoint angle appears to focus on a narrow domain; that is, individuals in STM with a focus on the M (medicine, biotechnology, etc.). This approach avoids the difficulty of indexing every business or technical discipline. First, focus means that descriptive terms are narrower than general business lingo. Second, the crawling for specialized personnel becomes somewhat easier because many sites can be ignored: they are not related to medicine and allied fields; for example, the garden gnome site www.designsoscano.com. Plus, the social graph complexity can be reduced by applying qualifiers that NOT out individuals and other entities unrelated to the focus of Techspert.io; for example, David Drummond and Jennifer Blakely.
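The two pruning steps described above can be sketched in a few lines. This is a minimal illustration, not Techspert.io’s method: the allow-list of hosts, the function names `in_focus` and `prune_graph`, and the sample graph are all assumptions made up for the example.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of in-domain hosts (an assumption, not Techspert.io's list).
MEDICAL_HOSTS = {"pubmed.ncbi.nlm.nih.gov", "clinicaltrials.gov"}


def in_focus(url: str) -> bool:
    """Keep a URL for crawling only if its host is on the in-domain allow-list."""
    host = urlparse(url).hostname or ""
    return host in MEDICAL_HOSTS


def prune_graph(edges, excluded):
    """Drop every relationship that touches an excluded entity (the NOT step)."""
    return [(a, b) for a, b in edges if a not in excluded and b not in excluded]


# Illustrative social graph: pairs of related entities (placeholder names).
edges = [("Researcher A", "Researcher B"), ("Researcher B", "Unrelated Person")]
kept = prune_graph(edges, excluded={"Unrelated Person"})
```

The point of the sketch is that a narrow focus turns both crawling and graph building into cheap set-membership tests; a general-purpose system has no such short list to lean on.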

Several observations are warranted:

  1. The implemented method is useful when deployed in a focused way; that is, vertical search for different “terminologies”.
  2. Scaling the approach across different content domains may require innovative engineering. And the engineering solutions will be expensive to implement, update, and enhance.
  3. Generating market magnetism will require effective marketing and sales programs. Business development must generate sufficient revenue because once certain hires are made by a company, the recruiting service is put on ice; and sustainable revenues will have to come from recruiting services which offer lower costs, perquisites to customers, etc. These factors may inhibit some venture cash investments.

Worth monitoring this firm. A pivot may be necessary due to the uncertain economic environment.

Stephen E Arnold, April 6, 2020

Semantic Search: From Whence to What

April 2, 2020

A post from semantic SEO firm InLinks traces “The Evolution of Semantic Search.” The buzzword-filled summary does relate an interesting saga, which prompts us to wonder why enterprise search results are generally still pretty poor.

The write-up traces the evolution from the card-catalogue-like directories of early Yahoo to today’s semantic search. Along the way it details these concepts and milestones: directory-based search vs. text-based search; the crawl and discover phase; JavaScript challenges; turning text into math; the continuous bag of words (CBOW) and nGrams; vectors; semantic markup; and trusted seed sets. See the post for elaboration on any of these headings.
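For readers who want a feel for “turning text into math,” here is a minimal sketch of word-count vectors, nGrams, and cosine similarity. It is a plain bag of words, not the neural continuous bag of words model the post mentions, and the function names are assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def tokens(text):
    """Lowercase and split on whitespace; real systems use far richer tokenizers."""
    return text.lower().split()


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


query = Counter(tokens("semantic search engine"))
page = Counter(tokens("a semantic search engine for the enterprise"))
similarity = cosine(query, page)  # higher means more shared vocabulary
```

Word counts like these capture shared vocabulary only; the entity-based approach the post describes is an attempt to get past that limit.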

The piece concludes:

“We started the journey of search by discussing how human-led web directories like Yahoo Directory and the Open Directory Project was surpassed by full-text search. The move to Semantic search, though, is a blending of the two ideas. At its heart, Google’s Knowledge-based extrapolates ideas from web pages and augments its database. However, the initial data set is trained by using ‘trusted seed sets’. the most visible of these is the Wikipedia foundation. Wikipedia is curated by humans and if something is listed in Wikipedia, it is almost always listed as an entity in Google’s Knowledge Graph. … So in many regards. the Knowledge Graph is the old web Directory going full circle. The original directories used a tree-like structure to give the directory and ontology, whilst the Knowledge Graph is more fluid in its ontology. In addition, the smallest unit of a directory structure was really a web page (or more often a website) whilst the smallest unit of a knowledge graph is an entity which can appear in many pages, but both ideas do in fact stem from humans making the initial decisions.”

Here is where we are reminded of the post’s source: for the SEO platform, the takeaway is that what Google considers an “entity” has become key to effective SEO marketing. For our part, we look forward to the continuation of the saga, hopefully resulting in truly effective enterprise search solutions. Some day.

Cynthia Murrell, April 2, 2020

April Surprise: PhpSearch Images

April 1, 2020

For an interesting search experience, navigate to this link which is powered by the SRCH2 search system. The content available from the search box inside the fish is interesting. Running queries on the image search system can be particularly interesting.

I suppose I could provide some queries for you to test, but I will leave that to you, gentle reader.

The SRCH2 technology has been around for a number of years. I tracked down the company when I was working on the New Landscape of Search, but I decided not to include the company because it was focusing on mobile.

For information about the company, navigate to this link.

Stephen E Arnold, April 1, 2020

Swagiggle? Nope, Not an April Fooler

April 1, 2020

Big ecommerce sites like eBay and Amazon depend on a robust, accurate, and functional search engine. Without a powerful search application, searching for items on eBay and Amazon is like looking through every page of a printed catalog. The only difference is that there are millions of items compared to the thousands in one catalog. Amazon and eBay are not always accurate, especially when users edit and add content without being monitored. That means there is room for improvement and for a startup to worm its way into the big leagues. Enter Swagiggle:

“Swagiggle is a precision shopping search and product discovery website created by WAND, Inc. to demonstrate the capabilities of its taxonomy based product data organization and enrichment abilities featured in the WAND eCommerce Taxonomy Portal and PIM. WAND, Inc. is the world’s leading provider of pre-defined taxonomies, including the WAND Product and Service Taxonomy.

Have you ever had the experience of going to a category on an online retail site and seeing mis-categorized items? Or, a bunch of items dumped into a catch-all “Accessories” category. At Swagiggle, our goal is to provide accurate and specific categories so that our users can quickly find exactly the products they are looking for. From there, we assign product specifications so that users can filter through the items in a category and find exactly what they want.”

WAND’s Swagiggle sounds like an awesome product. Using products from its clients, Swagiggle offers an online catalog for users to search for products they wish to buy. These products range from clothing to cleaning products. The items are organized by large categories, then users can drill down to specific items or search with key words. It is a pretty standard search engine, but it has one major problem. The drilling down aspect does feel dated, and half the time pictures and content would not load. The loading time is extraordinarily long too. Plus, due to the variety of its clients, items offered on Swagiggle are very random. Swagiggle needs to fix the broken pictures and figure out how to make itself faster.

Whitney Grace, April 1, 2020

Dark Web Search: Specialized Services Are Still Better

March 26, 2020

Free Dark Web search is a hit-and-miss solution. In fact, “free” Dark Web search is often useless. Some experts do not agree with DarkCyber’s view, however. The reason is that these experts may not be aware of the specialized services available to government agencies and qualified licensees.

Here’s a recent example of cheerleading for a limited Dark Web search system.

A search engine did not exist for the Dark Web until now, says Digital Shadows in the article, “Dark Web Search Engine Kilos: Tipping The Scales In Favor Of Cybercrime.” Back in 2017, there used to be a search engine dubbed Grams that specialized in searching the Dark Web. It was taken down in 2017; its creator was allegedly Larry Harmon, the supposed operator of Helix, the Bitcoin tumbling service. The Dark Web was search engine free until November 2019, when Kilos debuted.

Kilos piggybacks on the same concept as Grams: using a Google-like search structure to locate illegal goods and services, bad actors, and cybercriminal marketplaces. Kilos has indexed more platforms, offers more search functions, and includes many ways to ensure that users remain anonymous. Grams and Kilos are clearly linked by their names, both of which are units of measure.

Grams was the prominent search engine to use for the Dark Web because it searched everywhere, including Dream Market, Hansa, and AlphaBay, and users could also hide their Bitcoin transactions via Helix. Grams did not have a powerful structure to crawl and index the Internet. Also it was expensive to maintain. This resulted in it going dark in 2017.

The argument is that Kilos is killing the Dark Web search scene as a more robust and powerful crawler/indexer. It already has indexed Samsara, Versus, Cannazon, CannaHome, and Cryptonia. Plus it has way more search functions to filter search results. Every day Kilos indexes more of the Dark Web’s content and has a unique feature Grams did not:

“Since the site’s creation in November 2019, the Kilos administrator has not only focused on increasing the site’s index but has also implemented updates and added new features and services to the site. These updates and features ensure the security and anonymity of its users but have also added a human element to the site not previously seen on dark web-based search engines, by allowing direct communication between the administrator and the users, and also between the users themselves.”

Kilos is adding more services to keep its users happy and anonymous. Among the upgrades are a CAPTCHA ranking system, a faster search algorithm, a new Bitcoin mixer service, live chat, and ways to communicate directly with the administration.

Kilos sounds like an impressive search application startup, but wipe away the technology and it’s another tool to help bad actors hurt and break the system.

So what’s the issue? Kilos focuses on Dark Web storefronts, not the higher-value content in other Dark Web, difficult-to-index content pools.

But PR is PR, even in the Dark Web world.

Whitney Grace, March 26, 2020

Cloud Search Magic

March 26, 2020

Storing files on the cloud is a marvelous way to back up files and also free up valuable memory on devices. There is one big problem if you offload files on the cloud: finding them. There are various platforms to store files in the cloud, but Popular Science explains in the article “Find Any File In The Cloud” that if you are unfamiliar with the platform, it will be harder to find files.

The article explores popular cloud hosting platforms and walks readers through how to locate and search for files. The platforms examined are Google Drive, Dropbox, iCloud, and OneDrive. Each platform has its intricacies, which are important to master:

“But if you haven’t taken the time to explore a platform in depth, or if you use several and often get confused, you might find it harder to track down particular files compared to having them on a local hard drive. It doesn’t have to be this way, though. All the big cloud storage providers have useful tools for searching through your files and folders, whether you’re using a web browser, a desktop computer, or your phone.”

Be aware that these platforms can change based on the device accessing them. Many have both mobile and desktop interfaces, so things move around if you switch from one machine to another. None of these platforms is superior to the others, but users will prefer one over another based on the type of machine they are using.

Another thing to consider when selecting a platform is the security each one provides. The platform could be easy to use, but it also might be easy to hack.

Whitney Grace, March 26, 2020

Daedalus Enterprise Search Appliance with ElasticSearch Inside

March 25, 2020

Open source software is a boon to companies and organizations that cannot afford the steep price tag of proprietary software. Open source, however, does have its drawbacks, including a lack of customer support, software that is only as good as its developer, and security issues. PR Web describes how the Department of Defense is getting an overdue search upgrade: “PSSC Labs Launches Daedalus Enterprise Search Appliance.”

The Department of Defense relied on Elasticsearch for many digital tasks, including cybersecurity and logistics. Elasticsearch was providing the one and done solution the Department of Defense needed for its advanced workloads. Enter the PSSC Labs with its Daedalus Enterprise Search Appliance to the rescue. PSSC Labs designs and builds custom big data and high performance computing solutions. Daedalus Enterprise Search Appliance is a new platform powered by Elastic and compatible with Elastic Cloud Enterprise.

The Daedalus Enterprise Search Appliance will upgrade the Department of Defense’s system components. It also will not be a huge investment and will be a reasonable upgrade cost. The Department of Defense went with PSSC Labs because:

“ ‘We chose Elasticsearch as the foundation of the platform because it offers the flexibility and simplicity other application packages do not. With Elastic, everything is included in one simple per node price. This means companies can utilize the high-performance Elastic Stack for a variety of workloads including log analysis, cybersecurity, simple distributed storage, geospatial data analysis, and other concepts that are still yet to be discovered,’ said Alex Lesser, PSSC Labs Vice President.”

Other than the reasonable cost and product quality, the Department of Defense selected PSSC Labs’ Daedalus Enterprise Search Appliance because it was built on Elastic. Elastic is open source software, but many proprietary software companies build their own products on free technology. The move to the Daedalus Enterprise Search Appliance should be relatively simple as the current Department of Defense system is based on Elasticsearch.

Whitney Grace, March 25, 2020

Semantic Sci-Fi: Search Is Great

March 23, 2020

I read “Keyword Search is DEAD; Semantic Search Is Smart.” I assume the folks at Medium consider each article, weigh its value, and then release only the highest value content.

Semantic search is better than any other type of search in the galaxy.

Let’s assume that the write up is correct and keyword search is dead. Further, we shall ignore the syntax of SQL queries, the dependence of policeware and intelware systems on users’ looking for named entities, and overlook the interaction of people using an automobile’s navigation service by saying, “Home.” These are examples of keyword search, and I decided to give a few examples, skipping how keyword search functions in desktop search, chemical structure systems, medical research, and good old, bandwidth trimming YouTube.

Okay, what does the write up say beyond “keyword search is dead”?

Here are some points I extracted as I worked my way through the write up. I required more than three minutes (the Medium estimate) because my blood pressure was spiking, and I was hyper ventilating.

Factoid 1 from the write up:

If you do semantic search, you can get all information as per your intent.

What’s with this “all”? Content domains, no matter what the clueless believe, are incomplete. There is no “all” when it comes to online information which is indexed.

Factoid 2 from the write up:

semantic search seeks to understand natural language the way a human would.

Yep, natural language queries are possible within certain types of content domains. However, the systems I have worked with and have had an opportunity to use in controlled situations exhibit a number of persistent problems. These begin with computational constraints: one system could support four simultaneous users on a corpus of fewer than 100,000 text documents. Others simply output “good enough” results. Not surprisingly, when a physician needs an antitoxin to save a child’s life, keywords work better than “good enough” in my experience. NLP has been getting better, but the idea that systems can integrate widely different data which may be incomplete, incorrect, or stale and return a useful output is a big hurdle. So far no one has gotten over it on a consistent, affordable basis. Short cuts to reduce index look ups can be packaged as semantics and NLP, but mostly these are clever ways to improve “efficiency.” Understanding? Sometimes. Precision and recall? Not yet.
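For anyone who has not had to compute them, precision and recall are simple ratios over a result set and a set of known relevant documents. The sketch below uses made-up document IDs and a hypothetical function name; it illustrates the measures themselves, not any vendor’s evaluation procedure.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: four results returned, three documents actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
```

In this toy case two of the four results are relevant, and two of the three relevant documents were found. A “good enough” system can look fine on one measure while failing the other.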


Semantic Search Allegedly Adds A Boost To Product Discovery

March 20, 2020

Semantic search is one of the old reliable pieces of jargon for improving a search application, but it appears to be old hat. Semantic search, however, can, when correctly implemented, add a much needed boost for product discovery.

Grid Dynamics explains semantic magic in the article, “Boosting Product Discovery With Semantic Search.” We all know that human language is a complicated beast, which is why it has taken decades to develop decent voice-to-text and automated foreign language translation algorithms.

Humans learn from infancy to process speech based on the context and life experience. As technology has progressed, search engines are expected to perform the same actions which is where semantic search enters the game. Semantic search not only matches key words and phrases, but it brings meaning to them. Ecommerce Web sites require more than keyword and phrase search. Customers want to sort products based on price, brands, ratings, etc.

I am a librarian, and I know that irrelevant results often appear in any search. There are two types of these results: obviously irrelevant values and values with subtle differences. A simple solution does not exist to fix all the irrelevant results.

Solutions are usually built as a hybrid of semantic search and unstructured data. The semantic search part has several requirements: single words that belong to multi-word phrases must be kept together as unbreakable units, business domain knowledge should restrict or enhance query options, and ambiguous matches need to be resolved with saliency to match attributes. Boolean queries also can be implemented in new ways to alter searches. Semantic search can also be used with different physical properties and merchandising rules.
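The “unbreakable multi-word phrase” requirement is easy to picture with a toy tokenizer. This is a minimal sketch assuming a small, hypothetical phrase dictionary and a greedy longest-match rule; it is not the Grid Dynamics implementation.

```python
# Hypothetical phrase dictionary (an assumption for illustration only).
PHRASES = {("dress", "shirt"), ("laptop", "bag")}
MAX_LEN = max(len(p) for p in PHRASES)


def tokenize(query):
    """Split a query into tokens, keeping known multi-word phrases intact."""
    words = query.lower().split()
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, down to two words.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in PHRASES:
                out.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here, so emit the single word.
            out.append(words[i])
            i += 1
    return out


parts = tokenize("blue dress shirt")
```

Without the phrase step, a query for a dress shirt matches dresses and shirts alike, which is one source of the “subtle difference” irrelevant results mentioned above.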

Semantic search is a powerful tool for ecommerce Web sites, but:

“However, the power of semantic search largely depends on the richness and quality of the domain data – product attribution as well as synonyms. If your customers often perform out-of-dictionary search, then semantic search quality will suffer. It can include

• searches by subjective features like occasion of clothing (church dress) or age group for hi-tech device (laptops for kids)

• searches for brands which aren’t carried by your site, but it has similar products which can be suggested instead of just dropping the brand value from a query”

Never doubt how semantic search can improve an ecommerce search engine, but be sure to instill proper parameters for it to work correctly. Semantic search will remain a favorite of marketing whether a system is helping the person looking for information or hindering relevancy.

Whitney Grace, March 20, 2020

A New Horizon for Verizon: Swizzled Search Results

March 19, 2020

DarkCyber read “Yahoo, AOL, OneSearch Results Biased in Favor of Parent Company Verizon Media’s Web Sites.” The main idea seems to be that, as bakers in 11th century France knew, a thumb on the scales could pay dividends. A gram here, a gram there.

The article asserts:

You may not be surprised to learn that the search results from all three of Verizon Media’s search engines are biased in favor of Verizon Media websites. Yahoo!, AOL, and OneSearch all boosts the ranking of Verizon Media brands in organic search results. That is to say, regular web results excluding ads, news, shopping, image, and video search results.

Surprised? Nope. The revelatory factoid is that Bing indexes the Verizon content. Neither Bing nor Google reveals exactly how many Web sites their respective systems index. Useless information like how many links the crawlers follow in a Web site is not made explicit.

DarkCyber’s test queries suggest that Bing indexes only sites with a higher probability of being clicked. We have noted that for some queries, the Bing results closely parallel Google’s. Bing search administrators, are you monitoring Mother Google?

Therefore, what a happy coincidence that Bing indexes and displays in a favorable position the Verizon owned sites. In the good old days, the approach was called hit boosting. Today it probably has the words artificial intelligence and semantic technology obfuscating the shaping of content to meet a specific business need.
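Hit boosting in its old school form is not complicated. The sketch below multiplies the relevance score of any result from a favored host and re-sorts. The host names, scores, and the 1.5 factor are invented for illustration; nothing here describes how Bing or Verizon Media actually rank results.

```python
def boost(results, favored_hosts, factor=1.5):
    """Re-rank (host, score) pairs after multiplying favored hosts' scores."""
    rescored = [
        (host, score * factor if host in favored_hosts else score)
        for host, score in results
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Invented organic ranking: the independent site scores higher before boosting.
organic = [("independent.example", 0.9), ("owned.example", 0.7)]
ranked = boost(organic, favored_hosts={"owned.example"})
```

After boosting, the owned site outranks the independent one even though its organic score was lower. A thumb on the scales, in one line of arithmetic.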

Progress in search? Absolutely. Just search engine optimization, however.

Stephen E Arnold, March 19, 2020

 

https://www.ctrl.blog/entry/verizon-media-search.html
