Qwant Pitches Map Privacy

July 14, 2019

Digital maps are an indispensable tool, especially if you ceaselessly use a GPS. While digital maps are accurate, fast, and reliable, they also track and store user information. One semi-logical argument is that if you have nothing to hide, what is the big deal about information being stored? On the other hand, you should have the right to protect your privacy whether or not you have anything to hide. Qwant Maps believes in preserving user privacy, so it is an open source and privacy-preserving map tool. Qwant Maps was created so users have exclusive control over their geolocated data.

Qwant Maps built its tool on OpenStreetMap, a free and collaborative geographical database supported by more than one million voluntary contributors. OpenStreetMap is not an out-of-the-box solution and requires some tech savviness to use. Qwant Maps’s team developed a geoparsing engine to make OpenStreetMap more user friendly.

“To overcome these shortcomings and to meet the needs of most of people, Qwant Maps has developed — or participated to the development — its own software components. The will of Qwant Maps is to create a virtuous synergy between Qwant Maps and OpenStreetMap. Thus Qwant Maps uses OpenStreetMap data to generate its own vector tiles, its own base map, its own web APIs. Also Qwant Maps feeds its geoparsing web service as well as its online applications thanks to OpenStreetMap data.”

All of the code for both the Qwant Maps geoparsing tool and OpenStreetMap is open source. Qwant Maps also uses Mimirsbrunn as its search engine, Kartotherian as a visual rendering tool based on vector tiles, and Idunn to highlight information on the tiles.

Whitney Grace, July 5, 2019

A Complete List of Google Alternatives: Not Just Incomplete but a Reflection of Misinformation

July 10, 2019

Let’s start with the title: “A Complete List of Alternatives To The Google Search Engine.” Why? Google is not a particularly useful system if I understand the argument in the Collective Evolution write up. DarkCyber believes that Google is useful, but it is one source of Web content pointers.

What is on the complete list? How about 10 search systems. Here they are:

  • StartPage – StartPage gives you Google search results, but without the tracking (based in the Netherlands).
  • Searx – A privacy-friendly and versatile metasearch engine that’s also open source.
  • MetaGer – An open source metasearch engine with good features, based in Germany.
  • SwissCows – A zero-tracking private search engine based in Switzerland, hosted on secure Swiss infrastructure.
  • Qwant – A private search engine based in France.
  • DuckDuckGo – A private search engine based in the US.
  • Mojeek – The only true search engine (rather than metasearch engine) that has its own crawler and index (based in the UK).
  • YaCy – A decentralized, open source, peer-to-peer search engine.
  • Givero – Based in Denmark, Givero offers more privacy than Google and combines search with charitable donations.
  • Ecosia – Ecosia is based in Germany and donates a part of revenues to planting trees.

Several observations:

  1. StartPage (formerly IxQuick, created by a former Wall Street type) is a metasearch system. The company passes the query to other sources, collects their results, and displays a single list. As with DuckDuckGo, MetaGer, and similar systems, one is getting spider output from third parties. Sources can include Bing, Common Crawl, and others. DarkCyber is not enthusiastic about metasearch engines because it is very difficult to know what’s in and what’s out, whether the de-duplication function actually works, and how often the system refreshes its content.
  2. Omissions include Bing.com, Yandex.com, plus Exalead. Despite the unusual marketing of the Exalead Web search system — that is to say, none — you can use it at this link. DarkCyber recommends running queries against Google as well as these systems for general search results.
  3. DarkCyber makes use of specialist search systems as well. Some of these are provided to us by the intelware companies with which we interact. Two sources worth mentioning are Talkwalker and Webhose. Neither is based in the US, but each provides affordable and useful content to those serious about information spidered from open sources.
  4. Those who want to access information may find the list of tools compiled by MK Bergman a helpful place to begin. Many are not specific to searching via an ad supported system, but there are some gems in the list. DarkCyber also consults sources like Swiss Leaks, which can be quite useful.
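The de-duplication concern in the first observation can be made concrete. Here is a minimal sketch, not any vendor's actual method, of how a metasearch system might merge ranked lists from several sources and drop repeated URLs. The function names, the normalization rules, and the sample URLs are assumptions made up for illustration.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparison key (assumed rules: ignore scheme,
    a leading 'www.', letter case of the host, trailing slash, fragment)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping only the first
    occurrence of each normalized URL."""
    seen = set()
    merged = []
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results):
                key = normalize_url(results[rank])
                if key not in seen:
                    seen.add(key)
                    merged.append(results[rank])
    return merged


# Hypothetical output from two upstream engines for the same query.
engine_a = ["https://www.example.com/a/", "http://example.org/b"]
engine_b = ["http://example.com/a", "https://example.net/c"]
merged = merge_results([engine_a, engine_b])
```

Even this toy version shows why the user cannot easily tell what is in and what is out: the result depends on which sources were queried and on normalization rules the user never sees.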

DarkCyber’s point is that calling the “complete” list “complete” is not helpful. In fact, the list is filled with information that is unlikely to result in a thorough search.

Is DarkCyber surprised? Nah, par for the “experts” who are writing about search today.

Stephen E Arnold, July 10, 2019

Enterprise Search: Decades of Disappointment? Yep, Yep

June 25, 2019

I stumble across allegedly accurate factoids about enterprise search once in a while. The coverage of the systems which can index and make findable data and information in an enterprise is a shadow of its former self. I noted the story in a blog which caters to the content management industry (no, I don’t know what that means). The write up was “9 Takeaways from the Digital Workplace Experience Conference 2019.” No, I don’t know what a “digital experience” is either. But hold on, there was a reference to a study by Simpler Media Group (no, I never had heard of this outfit before I read the article). That study included data. One factoid is allegedly accurate and it makes clear that the outfits selling enterprise search systems face a bit of a challenge. Here’s the passage I noted. Remember. This is an alleged factoid, not something I can verify because I no longer maintain my data and files about the fantastical world of enterprise search:

Enterprise search followed document management in the survey but, again, only 11% said it was effective. “This pattern repeated itself over and over across many tools and technologies,” Fagan [a person who is an expert in Simpler Media] said. “In each instance, the tool’s reported importance far exceeded the actual efficacy.”

Enterprise search has been around a long time. IBM STAIRS? InQuire? NetWeaver? Sophia? My all time fave Entopia. (Search apparently has some relationship to utopia I assume.) Despite a remarkable history of 40 or more years, people still cannot use a system to find the information they need within the organization for which they work.

As non text data, encrypted content, and intercepted information proliferate, finding is more difficult today than at any time in my professional career.

There’s a lot of wonky enterprise software floating around. Just read through some Capterra listings. But 11 percent. That’s special. Is the number “true”? Sure, who can doubt Simpler Media and a blog about something I can’t define, content management.

After decades of disappointment, it seems that enterprise search has some opportunities for improvement. Simple? Right? Just manage the content? Make information findable? One size fits all? Sure.

Stephen E Arnold, June 25, 2019

Keeeb: A Personal Google for Everyone

June 24, 2019

That line “a personal Google for everyone” allegedly appeared in the Wall Street Journal. The phrase was a description of Keeeb, a company offering an enterprise intelligence platform. I remembered the phrase when I read the news release titled “Keeeb Adds Former Co-Founder and CTO of Leading Cognitive Search Provider Attivio and VP Technology of FAST Search & Transfer.” According to the news item:

The New York enterprise intelligence company Keeeb reinforces its technical leadership with the addition of Sid Probstein, former co-founder and CTO of the leading cognitive search provider Attivio and former VP Technology of FAST Search & Transfer (acquired by Microsoft in 2017). As new CTO Probstein will lead Keeeb’s global software development team and drive the next generations of the unique platform that unleashes enterprise intelligence.

What’s interesting is that the company Keeeb uses the name “Keeeb Deutschland GmbH” and describes itself as a New York company.

Attivio’s tag line “cognitive search provider” struck me as a bit of a piggyback ride on the wild and crazy IBM Watson cognitive computing marketing blitz which has largely slowed to a crawl. Remember H&R Block using Watson or Watson curing cancer? I do. Attivio, before embracing cognitiveness, dabbled in customer support, analytics, and a number of other “disciplines” as it worked to grow its sustainable revenue.

Fast Search & Transfer is also an interesting company. Some of the Fast Search & Transfer technology lives on in Microsoft, which bought the company in 2008. There was some legal and law enforcement agitation about Fast Search’s finances. Ultimately there was embarrassment for the founder of that firm.

DarkCyber will add Keeeb to its list of enterprise search vendors, a list that is now growing less rapidly than it did in the heyday of “search” between 2002 and 2011. Why did the pace slow?

Several reasons:

  • The huge financial payoffs from search did not materialize. In fact, the largest of the search vendors is now embroiled in a high profile trial in England.
  • Elasticsearch (which I think of as the son of Compass) became available as open source. Proprietary offerings from outfits like Fast Search & Transfer looked less appealing in terms of support and freedom to fiddle with code.
  • The promises that search vendors made about easy access to enterprise content were impossible to meet. Clients ran out of patience, money, or time. The few healthy search vendors were bought. Others tightened their belts and carried on.

Where will Keeeb fit into the information access landscape? I don’t know. It seems to me, as the author of the first three Enterprise Search Reports, that companies like DataWalk and Diffeo are what search should have become. Maybe Keeeb will be forward leaning too?

Stephen E Arnold, June 24, 2019

Have Fun Searching Nonprofit Tax Records

June 21, 2019

If you work at a nonprofit organization, the word free is magical! Databases are also a magical source of information and the life blood for anyone writing grants. A free, authoritative database is like a magic wand. ProPublica is a nonprofit news organization, and it recently published a story about a free way to search IRS records: “You Can Now Search The Full Text Of 3 Million Nonprofit Tax Records For Free.”

Along with running a newsroom, ProPublica has launched a new feature: its Nonprofit Explorer database now searches the full text of three million digitally filed IRS nonprofit tax filings. Nonprofit Explorer contains tax records from more than 1.8 million nonprofits as well as names of key employees and organization directors. Users can search for terms anywhere in the tax records. The only catch is that the nonprofits needed to file their taxes digitally, but nearly two-thirds do so.

How can you use the Nonprofit Explorer?

“For one, this feature lets you find organizations that gave grants to other nonprofits. Any nonprofit that gives grants to another must list those grants on its tax forms — meaning that you can research a nonprofit’s funding by using our search. A search for “ProPublica,” for example, will bring up dozens of foundations that have given us grants to fund our reporting (as well as a few filings that reference Nonprofit Explorer itself).

Just another example: When private foundations have investments or ownership interest in for-profit companies, they have to list those on their tax filings as well. If you want to research which foundations have investments in a company like ExxonMobil, for example, you can simply search for the company name and check which organizations list it as an investment.”

Usually a database like this requires a yearly subscription. Most nonprofits cannot afford subscription fees, so ProPublica is providing a public service that will assist millions. ProPublica probably uses its own database to apply for grants to fund it.

Keep in mind that some bad actors set up nonprofit organizations for some interesting purposes. Access to these records may prove useful to some investigators.

Whitney Grace, June 21, 2019

Google Has Changed Search Results Again

June 20, 2019

Google cannot let anything rest. Whether that is a good thing or a bad thing is a matter of opinion; to some it is simply an annoyance. Have Google’s latest changes to search results confounded its users once again? Inc. looks into Google’s newest change in the article, “Google Just Announced A Major Change To Your Search Results.”

The new change appears to be a simple one. Google will no longer show multiple results from the same Web site, except occasionally. What is even more uncanny is that Google deployed it right under our noses. The search engine did not change anything in the search algorithm. However, the change could be bad for content creators and businesses centered on content curation.

More unique Web sites will be introduced into the top search results, which matters because most people click on the first few results displayed. What this means for content people is that:

“A lot of you spend a lot of resources creating content for the very purpose of showing up at the top of organic search results. This change means that you’ll have to consider how to adjust your content and search engine optimization strategy knowing that less of your pages will show up for relevant searches…Google says there is an exception. When the company’s search algorithm thinks that more than one result from the same site is especially relevant to a search, it will continue to display them in the top results. It doesn’t give specific examples as to what this means or when it will make this exception, but Google does say they will continue to make adjustments as it rolls out across search results.”
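The rule described in the quote, a cap on results per site with an exception for especially relevant pages, can be sketched in a few lines. This is an illustration only: Google has not published its method, and the cap of two results, the 0.9 exception threshold, the function name, and the scored sample URLs are all assumptions.

```python
from urllib.parse import urlsplit


def diversify(results, max_per_site=2, exception_score=0.9):
    """Filter a ranked list of (url, score) pairs.

    Keeps at most `max_per_site` results per host unless a result's
    score reaches `exception_score` (both values are hypothetical).
    """
    counts = {}
    kept = []
    for url, score in results:
        site = urlsplit(url).netloc.lower()
        under_cap = counts.get(site, 0) < max_per_site
        if under_cap or score >= exception_score:
            kept.append((url, score))
            counts[site] = counts.get(site, 0) + 1
    return kept


# Made-up ranked results, best first.
ranked = [
    ("https://a.example/1", 0.95),
    ("https://a.example/2", 0.92),
    ("https://a.example/3", 0.91),
    ("https://a.example/4", 0.50),
    ("https://b.example/1", 0.40),
]
top = diversify(ranked)
```

In the sample, the fourth page from the dominant site is dropped, which lets a lower-scoring page from a second site move up. That is the effect content producers will notice.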

Google fiddles. The ad giant has made assurances that multiple results from the same site will display when they are the most pertinent. As a librarian, I wonder: perhaps some of these changes will restore relevance and add a pinch of precision and recall?
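For readers who do not live in library land, precision and recall are simple ratios. Precision is the share of returned results that are relevant; recall is the share of all relevant documents that were returned. A small sketch, with made-up document identifiers, shows the arithmetic:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the system returned
    relevant:  document ids a human judged relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: four results returned, three documents truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Here two of the four returned documents are relevant (precision of one half), and two of the three relevant documents were found (recall of two thirds).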

Whitney Grace, June 20, 2019

HP-Autonomy and the KPMG Due Diligence Document

June 15, 2019

I noted this article in The Register, a UK online publication: “HP CFO Cathie Lesjak Didn’t Even Read KPMG’s Autonomy Due Diligence Before $11bn Biz Gobble.” The write up reports that Hewlett Packard professionals did not read a report about Autonomy prepared by the accounting and consulting services firm KPMG. DarkCyber finds the information in the article interesting. We noted this statement in the Register’s write up:

Barrister Robert Miles QC asked her: “I think you didn’t, yourself, read a due diligence report prepared by KPMG, is that right?” Lesjak replied: “I did not.”

As intriguing as this exchange between Autonomy’s attorney and an HP executive involved in the astounding $11 billion purchase is, the Register also provides a link to the “confidential” and “draft” report about the finances of Autonomy.

The document is available at this link. Note that confidential documents can be removed from public access at any time. DarkCyber, an organization with more time but fewer resources than HP, read the document online.

DarkCyber’s conclusion is that HP’s failure to read the KPMG draft deprived the HP executives of information germane to the purchase price of $11 billion.

Other items of interest to DarkCyber in the KPMG document dated August 9, 2011, were:

  • KPMG itself lacked access to certain information; for example, certain details related to Autonomy’s income taxes
  • Autonomy’s financials (top line revenue and profits) were softening after the $870 million in revenue reported in FY2010
  • Autonomy used a method known as “Tower” in order to achieve certain financial objectives; namely, obtain maximum financial benefits from its activities such as loans.

The KPMG report is a “draft” and its authors presented sufficient information (even though that information is incomplete) to call into question the purchase of Autonomy for $11 billion.

The deal did not work out for either HP or Autonomy. HP lost traction with its shareholders. Autonomy found itself mired in an unpleasant and highly visible legal battle.

DarkCyber’s view is that companies engaged in search, retrieval, content processing, and allied disciplines have an unusual track record. For example, a number of little known companies simply failed to meet their revenue objectives and went out of business. Examples include Delphes (Canada), Entopia (Israel), InQuire, and others.

Other firms engaged in Autonomy-type software and services sought buyers in order to avoid financial problems. Examples include Exalead (acquired by Dassault), Vivisimo (acquired by IBM), and others.

Convera and Fast Search & Transfer are examples of enterprise search and Autonomy-type services caught in the same business quagmire as Autonomy; that is, robust promises about technology, difficulties generating sustainable revenue, problems in satisfying customers, and problems controlling infrastructure, R&D, and customer support costs. Convera (once Excalibur) was rescued by Allen & Company but was unable to deliver satisfactory solutions to information processing needs at Intel and the NBA. Fast Search & Transfer was involved in a financial investigation related to the company’s balance sheets. Microsoft stepped in and bought Fast Search in 2008.

Most of these problems with Autonomy-type companies stemmed from a combination of these miscalculations, errors in judgment, or over optimistic marketing:

  1. Search and retrieval is difficult to define; therefore, whatever system is installed at an organization will disappoint most of a system’s users. For this reason, large companies have a specialized system for legal, one for bench chemists, one for marketing, etc. Due to disenchantment, competitors can make a sale only to face clamors for engineering fixes or termination of the contract. Sustainable revenues are, therefore, elusive for Autonomy-type companies. (The KPMG report makes clear that Autonomy relied on acquisitions to increase its top line revenue.)
  2. Enterprise search vendors typically over promise and under deliver. Sales professionals and marketers glibly explain the value of unlocking the hidden value of an organization’s data. The reality is that the costs of determining what data are available, who can view certain data, cleansing and validating that data, indexing the data, and then keeping the indexes up to date and in line with access privileges are a significant burden. The cost of “unlocking” exceeds the available resources and appetite for investment in many licensees of Autonomy-type search systems. (The KPMG report rolls these costs into undifferentiated line items, a serious omission. These costs help explain the “you can’t get there from here” problem inherent in Autonomy-type software.)
  3. Autonomy-type systems from the period covered in the KPMG report were mostly proprietary code. Over time, these code bases became increasingly complex and at the same time more fragile. As a result, the costs of standing up a system, fine tuning it, and then tailoring it to the needs of the licensee grew over time. Like the content preparation work in item 2, the ongoing costs of the Autonomy-type system added another set of hard to control costs. (The KPMG report does not provide detail related to the costs of triage engineering to fix urgent problems, on-going fixes, and work needed to keep the foundation system current with competitors’ innovations.)

There are other issues with the KPMG report which DarkCyber noticed.

Net net: KPMG did a good job making clear that the deal was likely to be a difficult one due to the tax methods, the intra company financial processes, and the mechanisms used to allow Autonomy to demonstrate growth and reasonable margins over the period of time covered by the KPMG professionals.

HP seemed oblivious to the issues “enterprise search” posed; specifically, enterprise search is a niche business delivering expensive, proprietary solutions which rarely satisfy its users regardless of the vendor involved.

HP wanted to buy and buy big and fast. Autonomy appeared to be the solution to HP’s problems. KPMG identified the issues. Impulse buy? Maybe. Uninformed buy? Looks like it. Did Autonomy buff its show car software? Of course, getting the customer to buy is the objective.

Profiles of selected Autonomy-type software vendors are available without charge at the Xenky.com Vendors Web page. You can find that collection of vendor profiles at this link.

Stephen E Arnold, June 15, 2019

Windows and Search: A Work in Progress, Slow Progress

June 13, 2019

Unless you know a file’s specific name, trying to find it using the Windows search function sucks. The Windows search function is notoriously bad in each version from 1995 to the latest Windows 10. Searching on a Windows PC is so bad that Apple makes a point of stating how fast and accurate its Spotlight Search function is. In June 2019, Microsoft debuted its latest Windows version dubbed 1903. MS Power User explores how Windows 1903 has changed search (or so Microsoft claims) in the article, “How To Use The Enhanced Windows 10 Search in 1903.”

It is hard to understand how a company that revolutionized how people interact with computers cannot get a simple function correct. Yes, search has its own complexities that require well written code, but it remains one of the simplest machine learning functions compared to language translation, photo editing, and processing audio files. MS Power User agrees that Microsoft let the ball drop when it comes to search, but 1903 might be the software patch it needs:

“Microsoft’s Windows 10 has had search as one of its pain points ever since it debuted. Search was often panned for being slow, inaccurate and sometimes just for not finding anything at all. With Windows 10 1903, Microsoft has tackled that. First. Cortana and Search were split apart so the Windows team could tackle both individually. This means that Cortana gets better at Cortana things, while search gets better at Search things. With 1903, those seeds have already borne some fruit.”

To improve search with 1903, users have to adjust the search settings. Windows 1903 has two options: “classic search” and “enhanced search.” By selecting the enhanced search option, the full power of Windows search is projected over a computer’s entire hard drive. Windows classic search sucks. Why is Microsoft still including it in its OS when there is a better option? In fact, why is it even forcing users to choose between the classic and the enhanced search?

A good OS should not make its user work harder. A good OS is a tool that is supposed to easily organize and communicate information. Windows, you are letting me down.

Whitney Grace, June 13, 2019

Google: Can Semantic Relaxing Display More Ads?

June 10, 2019

For some reason, vendors of search systems have shuddered if a user’s query returns a null set. The idea is that a user sends a query to a system or more correctly an index. The terms in the query do not match entries in the database. The system displays a message which says, “No results match your query.”

For some individuals, that null set response is high value information. One can bump into null sets when running queries on a Web site; for example, send the anti fungicide query to the Arnold Information Technology blog at this link. Here’s the result:


From this response, one knows that there is no content containing the search phrase. That’s valuable for some people.

To address this problem, modern systems “relax” the query. The idea is that the user did not want what he or she typed in the search box. The search system then changes the query and displays those results to the stupid user. Other systems take action and display results which the system determines are related to the query. You can see these relaxed results when you enter the query shadowdragon into Google. Here are the results:


Google ignored my spelling and displays information about a video game, not the little known company Shadowdragon. At least Google told me what it did and offers a way to rerun the query using the word I actually entered. But the point is that the search was “relaxed.”
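The difference between a strict search that returns a null set and a “relaxed” search can be shown with a toy index. This is a sketch under stated assumptions, not how Google or any vendor does it: the index contents, function names, and similarity cutoff are made up, and the relaxation here is simple fuzzy matching from Python’s standard library.

```python
import difflib

# Hypothetical inverted index: term -> set of document ids.
INDEX = {
    "shadow": {"doc1"},
    "dragon": {"doc1", "doc2"},
    "fungicide": {"doc3"},
}


def strict_search(term, index):
    """Exact match only; an empty set is itself an answer."""
    return set(index.get(term.lower(), set()))


def relaxed_search(term, index, cutoff=0.8):
    """Try an exact match first, then fall back to the closest indexed
    term. Returns (documents, term actually used) so the user can see
    that the query was changed."""
    exact = strict_search(term, index)
    if exact:
        return exact, term.lower()
    close = difflib.get_close_matches(term.lower(), list(index), n=1, cutoff=cutoff)
    if not close:
        return set(), None
    return set(index[close[0]]), close[0]


null_set = strict_search("fungicde", INDEX)
docs, used = relaxed_search("fungicde", INDEX)
```

Returning the substituted term alongside the results is the important design choice. A system that silently swaps the query hides the fact that the strict answer was “nothing,” which, as noted above, is high value information for some users.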

The purpose of semantic expansion is a variation of Endeca’s facets. The idea is that a key word belongs to a category. If a system can identify a category, then the user can get more results by selecting the category and maybe finding something useful. Endeca’s wine demonstration makes this function and its value clear.
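The facet idea is easy to sketch: count how many records fall into each category value, then let the user narrow the result set by picking one. The wine records and field names below are invented for illustration and are not taken from Endeca’s demonstration.

```python
from collections import Counter

# Made-up records standing in for a product catalog.
WINES = [
    {"name": "Wine A", "type": "red", "region": "Napa"},
    {"name": "Wine B", "type": "red", "region": "Bordeaux"},
    {"name": "Wine C", "type": "white", "region": "Napa"},
]


def facet_counts(records, field):
    """Count records per value of one field, e.g. wines per type."""
    return Counter(record[field] for record in records)


def refine(records, field, value):
    """Narrow the result set to records matching the chosen facet value."""
    return [record for record in records if record[field] == value]


type_facet = facet_counts(WINES, "type")
napa_wines = refine(WINES, "region", "Napa")
```

The counts tell the user what is in the collection before he or she clicks, which is the opposite of a relaxed query that guesses on the user’s behalf.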

Google Makes Search, Mmmmm, Better

June 7, 2019

“First AR Objects Launch in Google Search with 3D Animals” reports that Google makes search better again. Search for an animal on a supported device while you are doing the Google Lens thing and you will see a three dimensional animal. I would be thrilled if a query returned relevant results. Plus, I am okay with relevant links directing me to a relevant document which may or may not contain an illustration. Ah, progress. What happens if Google reconnects with a robot company so that as one looks at an AR rendering of a tiger, a robot tiger comes to the user’s location and snarls? Relevant? Heck, yes.

Stephen E Arnold, June 7, 2019
