NetDocuments Employs BA Insight Tech for Enterprise Search
August 10, 2020
For a secure, cloud-based data solution, many law firms, legal departments, and compliance teams turn to NetDocuments. Now the platform has adopted technology from a familiar name to simplify its clients’ access to information. A post at PRWeb reveals, “NetDocuments Introduces NetKnowledge Enterprise Search Powered by BA Insight.” We find it interesting that the 16-year-old BA Insight is licensing its askable-knowledge system to create the new tool, NetKnowledge. The press release describes the system’s advantages:
“Eliminate Downloading and Indexing Data for Search: No longer does content within NetDocuments need to be downloaded and indexed to be part of an organization’s enterprise search. Simply search within the NetDocuments platform, and NetKnowledge will find relevant data–along with information from other sources —and present it to users.
“Enforce Access Controls on Sensitive Information: Sensitive information may need to be restricted to certain individuals, but that data also needs to be available to others via enterprise search. NetKnowledge respects data restriction policies at the source and will only present data to individuals with proper access rights.
“Manage Large and Disparate Data Sets Across the Organization: NetKnowledge helps organizations bring all its data together to form a single source of truth, so users do not have to perform multiple searches in different places to get the information they need.”
Founded in 2004, BA Insight is based in Boston, Massachusetts. The company is dedicated to making information easier to find for organizations of all stripes. NetDocuments is headquartered in Lehi, Utah. The company was founded in 1999 and acquired by Clearlake Capital Group in 2017.
Cynthia Murrell, August 10, 2020
Search Engines: Plumbing Becomes a Thing Again
August 10, 2020
Two search related items.
The first is Hndex. If you want to locate articles posted to HackerNews, a tech-oriented headline aggregation site, you have an option. This is an example of what might be labeled a “site specific search” solution: One site, search it. Navigate to https://hndex.org and plug in a search term. We entered a query for “enterprise search” and retrieved on point results. The comments are available; however, these are not indexed. Click the “cached” button, and you can view the original article. Click the “comments” button and you can view the comments. HackerNews provides its own search service, which is weirdly located at the bottom of the page. DarkCyber will reserve further comments until we have experimented with the system for a few days.
The second is Infinity Search, another metasearch engine positioned as a free Web search system. DarkCyber finds metasearch engines interesting, but these often pretend to be running their own crawlers. To Infinity Search’s credit the company states:
When you search for something on our site, we take the results from other search engines and our own indexes, organize it, and display it directly to you without logging any information about you.
Metasearch systems have to deduplicate results lists and find a way to remain in the good graces of companies running primary Web crawlers. Disclaimer: My son worked for Vivisimo (now the heart and soul of one of IBM’s marketing confections). He has moved to other adventures, but I remember our talks about the issues metasearch presents. For example, latency, screwed up query interpolation, and wonky deduplication which deduplicates useful results out of the results list. I think Vivisimo lives on in Yippy.com, but I am not a fan of metasearch systems which recycle others’ indexes and remain vulnerable to partners who pull out of deals, thus putting a dent in results.
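For readers who want to see how deduplication can go wonky, here is a minimal sketch in Python. It is not Infinity Search’s or Vivisimo’s method. The normalization rules, the function names, and the example URLs are all assumptions made up for illustration. The idea: merge two ranked lists, reduce each URL to a comparison key, and keep the first hit per key. Make the key too coarse and a useful result vanishes.

```python
from urllib.parse import urlsplit


def normalize(url, drop_query=False):
    """Reduce a URL to a comparison key (hypothetical rules, for illustration only)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    key = host + path  # the scheme is ignored, so http and https collapse together
    if parts.query and not drop_query:
        key += "?" + parts.query
    return key


def merge_results(result_lists, drop_query=False):
    """Interleave ranked lists round-robin, keeping the first URL seen for each key."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize(results[rank], drop_query)
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Invented result lists from two imaginary upstream engines.
engine_a = ["https://www.example.com/search?q=tile", "https://example.org/a/"]
engine_b = ["http://example.com/search?q=slate", "https://example.org/a"]

careful = merge_results([engine_a, engine_b])
sloppy = merge_results([engine_a, engine_b], drop_query=True)

print(len(careful))  # three distinct pages survive
print(len(sloppy))   # the "slate" page was deduplicated away
```

With the query string kept, the two example.org entries collapse into one and three results remain. With the query string dropped, the “tile” and “slate” pages look identical, and the second one disappears from the list.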
Stephen E Arnold, August 10, 2020
Why Enterprise Search Remains a Problem
August 8, 2020
I read “Let’s Build a Full-Text Search Engine.” The write up does a reasonable job of walking through the basics of building a search engine. The focus is full text search, but I think in terms of an organization and its content. As a result, the system summarized will not handle video, images, and other types of content. The code examples are clear, and I liked the straightforward approach.
However, there is a potential bump in the information superhighway. Here’s a Venn diagram from the article. Notice the work you have to do to find documents with small, wild cat?
If I search for “smith”, “order”, “tile”, I want the Boolean AND applied by default, so that I get only the documents containing all three terms. I want Smith’s orders for tile. I have to call the person. I don’t want to go on a scavenger hunt. (There are other minor nits too, but the AND’ing thing is huge to me.)
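What I mean by AND as the default can be sketched in a few lines of Python. This is not the code from the article. The tokenizer, the function names, and the three sample documents are my own assumptions, and there is no stemming, so “orders” would not match “order.”

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: token -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_and(index, query):
    """Return only the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Intersect posting sets, smallest first, so the working set shrinks quickly.
    postings = sorted((index.get(term, set()) for term in terms), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Invented documents for illustration.
docs = {
    1: "Smith placed an order for tile.",
    2: "Jones placed an order for tile.",
    3: "Smith called about a late invoice.",
}

index = build_index(docs)
hits = search_and(index, "smith order tile")
print(hits)  # {1}
```

Documents 2 and 3 each match some of the terms, but only document 1 matches all of them, so it is the only one returned.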
Stephen E Arnold, August 6, 2020
Do Not Gamble. Own the Casino. The Google Way?
August 3, 2020
I read “Google’s Top Search Result?” What a surprise? No, not the fact that Google presents Google-centric results at the top of mobile search results. The surprise is that until July 28, 2020, no one knew that Google’s magical algorithmic, math-is-objective, super duper relevance scooper got more Google goodies than any other “content producer.” Amazing.
In the good old days of big desktop anchor computers and monitors, there was screen real estate. Google filled the screen with objective results and, of course, some advertisements.
That was then; this is now. Mobile screens are mostly squint-generators. In order to be seen and generate clicks, the Google has to work overtime.
The challenges include:
- Traffic, eyeballs, and individuals who will go ga-ga over that which is Googley.
- Sizzle that will burn the greedy fingertips of competitors who want to be placed front and center.
- Useful information for consumers. Yep, what Google displays eliminates the need to think. Advertisers who want to be listed on a Google Map. Something can be worked out.
A number of organizations have groused about Google’s magical algorithmic, math-is-objective, super duper relevance scooper.
What’s fascinating is that it has taken two decades for some people to understand the wisdom embedded in the observation, “Own the casino.”
Pretty good advice and someone at the GOOG took it.
Stephen E Arnold, August 3, 2020
Search and Predicting Behavior
August 3, 2020
DarkCyber is interested in predictive analytics. Bayesian and other “statistical methods” are a go-to technique, and they find their way into many of the smart software systems. Developers rarely explain that systems share many features and functions. Marketers, usually kept in the dark like mushrooms, are free to formulate an interesting assertion or two.
I read “Google Searches During Pandemic Hint at Future Increase in Suicide,” and I was not sure about the methodology. Nevertheless, the write up provides some insight into what can be wiggled from Google search data.
Specifically Columbia University experts have concluded that financial distress is “strongly linked to suicide.”
Okay.
I learned:
The researchers used an algorithm to analyze Google trends data from March 3, 2019, to April 18, 2020, and identify proportional changes over time in searches for 18 terms related to suicide and known suicide risk factors.
What algorithm?
The method is described this way:
The proportion of queries related to depression was slightly higher than the pre-pandemic period, and moderately higher for panic attack.
Perhaps the researchers looked at the number of searches and noted the increase? So comparing raw numbers? Tenure tracks and grants await! Because that leap between search and future behavior…
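The difference between raw numbers and proportions is easy to show. The sketch below is not the researchers’ algorithm, which the write up does not describe. The counts are invented and the function names are my own. It only illustrates that a term’s raw query count can rise while its share of all queries falls.

```python
def share(term_count, total_count):
    """Fraction of all queries in a period that mention the term."""
    return term_count / total_count


def relative_change(before, after):
    """Change from before to after, expressed as a fraction of before."""
    return (after - before) / before


# Invented counts for illustration only; these are not figures from the study.
baseline = {"term": 50, "total": 1000}
pandemic = {"term": 80, "total": 2000}

raw_change = relative_change(baseline["term"], pandemic["term"])
share_change = relative_change(
    share(baseline["term"], baseline["total"]),
    share(pandemic["term"], pandemic["total"]),
)

print(f"raw count change: {raw_change:+.0%}")
print(f"share change: {share_change:+.0%}")
```

In this made-up case the raw count climbs 60 percent, yet the term’s share of all queries drops 20 percent because total search volume doubled. Which number one reports matters.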
Stephen E Arnold, August 3, 2020
Untangling Streaming: Responses to a Huge Web Search Fail
July 22, 2020
More and more users rely on a patchwork of internet streaming services for their video entertainment. Anyone who subscribes to several of these knows the time-wasting tedium of combing through different menus, each with a different UI, just to find something to watch. With even more proprietary streaming services on the horizon, it seems that problem is poised to grow. However, there are at least two apps that provide viable solutions—Reelgood and JustWatch. “These Two Underdog Apps Have Solved Streaming TV’s Biggest Headache,” Fast Company observes. Writer Jared Newman reports:
“Instead of making you bounce between disparate apps, both services can tell you what’s available on practically any streaming service. You can then add movies and shows to a watch list, get more suggestions based on your viewing habits, and even load their apps on your television to use as a centralized streaming menu. Compared to the app overload of most streaming devices, the universal guides offered by JustWatch and Reelgood seem like the ideal way to watch TV in the streaming era.”
Sounds helpful. But why does it take “underdog” apps to do what common sense suggests devices like Roku and Amazon Fire TV should already offer? There are several business reasons, we’re told, like Netflix’s resistance to the aggregation of its content or the fact that streaming services pay for placement on those platforms. As for Reelgood and JustWatch, they each have their own business models. It comes as no surprise that each involves user data. Newman writes:
“JustWatch says that … about 70% of its revenue comes from targeting users with movie trailers based on their viewing habits. For every movie or TV show users click on, JustWatch builds up a taste profile, then separates users into anonymized groups based on what they might like. Movie studios such as Universal and Paramount then give JustWatch a budget to target users with relevant video trailers on sites like Facebook and YouTube. … Reelgood, meanwhile, started from more of a Silicon Valley mindset of building up the product first and finding ways to monetize it later. Sanderson, a former ad product manager at Facebook, initially thought that would take the shape of recommendation-style targeted ads within the service, but lately the company’s been leaning more into selling access to its data.”
See the write-up for more on the business considerations and plans for each of these entities, big and small. There are other notable players in this arena, including TV Time, Simkl, Watchworthy, Wander, and VUniverse. It will be interesting to see where the market, and the technology, go from here.
Cynthia Murrell, July 22, 2020
Google Alerts: Lost in Cyber Space?
July 16, 2020
Check out these headlines from my Google Alert for the phrase “enterprise search”.
The Covid angle is back. Who publishes this type of news? An outfit called Daily Research Chronicles. An outstanding SEO outfit? Maybe?
And how about these high relevance links to my enterprise search alert?
Silicon steel, analog cameras, and dental film.
Sure, the alerts are a free service. Sure, an item every week or three points to something relevant.
But the spoofiness of the service from outfits like Daily Research Chronicles prompts me to ask:
What about those quality and relevance algorithms, dearest Google?
Stephen E Arnold, July 16, 2020
Visual Search Engines Provide Different POV Than the Google
July 15, 2020
Google image search is the standard visual search tool people use. It does not, however, provide the extra kick needed for deeper dives, especially with all the Pinterest results. Tech Funnel explains how visual search engines give businesses an advantage and points out nine great ones in “Popular 9 Visual Search Engines To Know.”
Visual search has many benefits. It connects with younger generations, who engage with images when they use social media and apps and are far more likely to purchase an item through these platforms than through a Web site. Visual search also lets people connect with a brand more emotionally than standard text does, and it boosts revenue because it will be the next way people search for items, along with voice search.
Popular visual search engines include Pinterest Lens, which lets users take photos of items and then find, save, or shop for them. Fashion retailers are already using it, so Pinterest users can find the clothing their models wear. Google Lens is similar to Pinterest Lens, except its applications are more diverse. It can be used for translation and for searching for items, places, people, etc.
Amazon Rekognition, Instagram Shopping, Snapchat Camera Search, and eBay (powered by the Cassini search engine) offer visual search dedicated to locating items from photos. They differ in the details, but all perform the same function. Bing appears to be different:
“From the viewpoint of a user, the experience gotten from Bing Visual Search is similar to other various visual search platforms. However, its feature of an extensive developer platform makes it preferable by a lot of developers.
With Bing Visual Search, developers are enabled to instruct the search engine on the particular data people can get from a specific photo. This means that if Bing Visual Search directs an individual to a certain product on your website, the developer has the ability to determine what information should be provided to the visitor.”
CamFind and EasyJet are the most original engines because they are associated with neither shopping nor Google. CamFind is the first successful mobile visual engine that uses image detection. EasyJet allows people to book flights based on photos, so now you can finally discover where your screen wallpaper is located.
Whitney Grace, July 15, 2020
Search History: Mostly Forgotten and Definitely of Zero Interest to the Smart Software Crowd
July 10, 2020
There’s an interesting, if selective, write up about online information search and retrieval. Navigate to “The Bourne Collection: Online Search Is Older Than You Think.”
An interesting statement appears in the write up:
Founder Roger Summit had been part of Lockheed Missiles and Space Corporation’s mid-1960s Information Sciences Laboratory (1964). He had built his ideas about iterative search—a “dialog” between the user and the computer—into a separate online search division for Lockheed. (This was very different from the “take your best shot” approach of modern search engines, where you generally need to run a new search to refine irrelevant results). Dialog licensed access to leading databases in a variety of fields, which you could search with its powerful tools. While the overall amount of information was far smaller than on the modern web, it was far, far more relevant and better organized.
For the modern online experts, such a quaint, irrelevant, and inefficient concept.
Stephen E Arnold, July 10, 2020
The Myth of Data Federation: Not a New Problem, Not One Easily Solved
July 8, 2020
I read “A Plan to Make Police Data Open Source Started on Reddit.” The main point of this particular article is:
The Police Data Accessibility Project aims to request, download, clean, and standardize public records that right now are overly difficult to find.
Interesting, but I interpreted the Silicon Valley centric write up differently. If you are a marketer of systems which purport to normalize disparate types of data, aggregate them, federate indexes, and make the data accessible, analyzable, retrievable, and bang on dead simple — stop reading now. I don’t want to deal with squeals from vendors about their superior systems.
For the individual reading this sentence, a word of advice. Fasten your seat belt.
Some points to consider when reading the article cited above, listening to a Vimeo “insider” sales pitch, or just doing techno babble with your Spin class pals:
- Dealing with disparate data requires time and money as well as NOT ONE but multiple software tools.
- Even with a well resourced and technologically adept staff, exceptions require attention. A failure to deal with the stuff in the Exceptions folder can skew the outputs of some Fancy Dan analytic systems. Example: How about that Detroit facial recognition system? Nifty, eh?
- The flows of real time data are a big problem — are you ready for this — a challenge to the Facebooks, Googles, and Microsofts of the world. The reason is that the volume of data and CHANGES TO THOSE ALREADY PROCESSED ITEMS OF INFORMATION is a very, very tough problem. No, faster processors, bigger pipes, and zippy SSDs won’t do the job. The trouble lies within, the intradevice and intra software module flow. The fix is to sample, and sampling increases the risk of inaccuracies. Example: Remember Detroit’s facial recognition accuracy. The arrested individual may share some impressions with you.
- The baloney about “all” data or “any” type is crazy talk. When one deals with more than 18,000 police forces in the US, outputs from surveillance devices from different vendors, and the geodumps of individuals and their ad tracking beacons — this is going to be mashed up and made usable. Noble idea. There are many noble ideas.
Why am I taking the time to repeat what anyone with experience in large scale data normalization and analysis knows?
Baloney can be thinly sliced, smeared with gochujang, and served on Delft plates. Know what? Still baloney.
Gobble this:
Still, data is an important piece of understanding what law enforcement looks like in the US now, and what it could look like in the future. And making that information more accessible, and the stories people tell about policing more transparent, is a first step.
But the killer assumption is that the humans involved don’t make errors, systems remain online, and file formats are forever.
That baloney. It really is incredible. Just not what you think.
Stephen E Arnold, July 8, 2020