Calling Out Search: Too Little, Too Late

January 20, 2020

The write up’s title is going to be censored in DarkCyber. We are not shrinking violets, but we think that stop word lists do exist. Problem? Buzz your favorite ad supported search vendor and voice your complaints.

The write up is “How Is Search So #%&! Bad? A ‘Case Study’.” The author appears to be frustrated with the outputs of ad supported and probably other types of seemingly “free” search systems providing links to Web content. This is what some people call “open source intelligence online.” There are other information resources available, but most of the consumer-oriented, eyeball-hungry vendors ignore i2p, forums with minimal traffic, what some experts call the Dark Web, and even some government information services. How many people pay any attention to the US National Archives? Be honest in your assessment.

Here’s a passage we noted:

Google Search is ridiculously, utterly bad.

This seems clear.

The write up provides some examples, but I anticipate that some other people have found that the connection between a user’s query and the Google search outputs is tenuous at best. One criticism DarkCyber has of the write up is that it mentions Google, shifts to Reddit, and then to metadata. The key point for us was the focus on time.

Now time is an interesting issue in indexing. Years ago I did a research project on the “meaning” of “real time” in online services. I think my research team identified five or six different types of time. I will skip the nuances we identified and focus only on the date or freshness of an item in a results list.

Let’s be sympathetic to the indexing company. Here’s why:

First, many documents do not provide an explicit date in the text of the article. In Beyond Search and DarkCyber, you will notice that we provide the author’s name and the date on which the article was posted. Many write ups on the open Web don’t bother. Often there is no easy way to determine, from the content displayed in a browser, when the author posted the story. Don’t you love news releases which do not include a date, time, and time zone?

Second, many write ups include only implicit dates and times in the text of an article. For example, a reference to Day 2 of the recent CES trade show stands in for the explicit date January 8, 2020, of a product announcement. The approach is similar to using CES without spelling out “Consumer Electronics Show.” But, hey, these folks are busy, and everyone in the know understands the what and when, right?

Third, auto-assigned dates by operating systems may be “correct” when a file or content object is created. But what happens when a file or drive is restored? The original dates and metadata may be replaced with the time stamp of the restore. What about date last accessed or date last changed? Too much detail. Yada yada.

Fourth, time sorting is possible. Google invested in Recorded Future (now part of Insight Partners). I had heard that someone at the GOOG thought Recorded Future’s time functions were nifty. Guess not. Google did not implement more sophisticated time functions in any service other than those related to advertising. For the great unwashed masses who don’t work at Google, tough luck, I suppose.

Fifth, when was the content first indexed? More significantly, when was the content last updated? Important? Maybe, gentle reader. Maybe.

There are several other conditions as well. For the purposes of a blog post, I want to make clear: The person who is annoyed with search should have been annoyed decades ago. These time problems are not new, and they are persistent.
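The restore problem in the third point is easy to reproduce. Here is a minimal Python sketch using only the standard library; the file names and the January 8, 2020 date are hypothetical. It backdates a file, then “restores” it two ways. `shutil.copy` copies content and permission bits but not timestamps, so the copy carries the date of the restore. `shutil.copy2` also tries to preserve metadata such as the modification time. Which date an indexer sees depends on which kind of copy happened.

```python
import os
import shutil
import tempfile
from datetime import datetime, timezone


def restore_demo() -> dict:
    """Return the modification dates of an original file and of two restored copies."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "article.html")  # hypothetical file name
        with open(original, "w", encoding="utf-8") as handle:
            handle.write("<p>A news release with no date in the text.</p>")

        # Backdate the original: pretend it was saved at 00:00 UTC on January 8, 2020.
        saved = datetime(2020, 1, 8, tzinfo=timezone.utc).timestamp()
        os.utime(original, (saved, saved))

        # "Restore" 1: shutil.copy copies content and permission bits only,
        # so the copy's modification time is the moment of the restore.
        plain = os.path.join(workdir, "restored_plain.html")
        shutil.copy(original, plain)

        # "Restore" 2: shutil.copy2 also copies metadata, including the modification time.
        preserved = os.path.join(workdir, "restored_preserved.html")
        shutil.copy2(original, preserved)

        def modified_date(path: str):
            stamp = os.path.getmtime(path)
            return datetime.fromtimestamp(stamp, tz=timezone.utc).date()

        return {
            "original": modified_date(original),
            "copy": modified_date(plain),
            "copy2": modified_date(preserved),
        }


if __name__ == "__main__":
    for label, day in restore_demo().items():
        print(f"{label:>8}: {day}")
```

Run today, the `copy` line shows today’s date while `original` and `copy2` show 2020-01-08. Same content, two different “dates.”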

The author with a penchant for tardy profanity stated:

Part of the issue in this specific case is that they’ve started ignoring settings for displaying results from specific time periods. It’s definitely not the whole issue though, and not something new or specific to phone searches. Now, I’ve always been biased towards the new – books, tech, everything, but I can’t help but feel that a lot of things which were done pretty well before are done worse today. We do have better technology, yet we somehow build inferior solutions with it all too often. Further, if they had the same bias of showing me only recent results I’ll understand it better, but that’s not even the case. And yes, I get that the incentives of users and providers don’t align perfectly, that Google isn’t your friend, etc. But what is DDG’s excuse? As for the Case Study part, and me saying this isn’t simply a rant – I lied, hence the quotation marks in the title. Don’t trust everything you read, especially the goddamn dates on your search results.

The write up omits a few other minor problems with modern search and retrieval systems. Yep, this includes Reddit, LinkedIn, and a bunch of others. Let me provide a few dot points:

  • Poorly implemented Boolean search
  • Zero information about what’s in an index
  • Zero information about what’s excluded from an index and why
  • Minimal auto linking to information about an “author” or the “source” of the content
  • No data to make a precision or recall calculation possible and reproducible
  • No data to make it possible to determine overlap among Web indexes. Analyses must be brute forced. Due to the volatility, latency, and editorial vagaries of ad supported Web search systems, data are mostly suggestive.
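The arithmetic behind the last two dot points is not the hard part; the missing data is. A minimal Python sketch, in which every document ID, result list, and relevance judgment is made up for illustration: precision is the share of returned items that are relevant, recall is the share of relevant items that were returned, and overlap between two indexes is measured here with the Jaccard coefficient, one common choice among several.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of one result list against a set of relevance judgments."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def jaccard_overlap(index_a: set[str], index_b: set[str]) -> float:
    """Size of the intersection of two indexes divided by the size of their union."""
    union = index_a | index_b
    return len(index_a & index_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical top five results for one query from one engine.
    results = ["d1", "d2", "d3", "d4", "d5"]
    # Hypothetical judgments: every document that should have been found.
    judged_relevant = {"d1", "d3", "d7", "d9"}
    p, r = precision_recall(results, judged_relevant)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50

    # Hypothetical samples of what two engines have indexed.
    engine_a = {"d1", "d2", "d3", "d4"}
    engine_b = {"d3", "d4", "d5", "d6"}
    print(f"overlap={jaccard_overlap(engine_a, engine_b):.2f}")  # overlap=0.33
```

Recall needs the full set of relevant documents, and overlap needs to know what each index contains. That is exactly the information the vendors do not disclose, which is why outside analyses must be brute forced.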

Why? Why are none of these dot points operative?

Answer: Too expensive, too hard, not appropriate for our customers, and “What are you talking about? We never heard of half these issues you identified.”

Net net: Years ago I wrote an article for Searcher Magazine, edited at the time by Barbara Quint, a bit of an expert in online information retrieval. She worked at RAND for a number of years as an information expert. She said, “Do you really want me to use the title ‘Search Sucks’ on your article?” I told her, use whatever title you want. But if you agree with me, go with “sucks.” She used “sucks.” Let’s see, that was a couple of decades ago.

Did anyone care? Nope. Does anyone care today? Nope. There you go.

Stephen E Arnold, January 20, 2020

European Commission Facial Recognition White Paper

January 20, 2020

The EC is trying to herd ducks. The facial recognition issue may become less of a backburner issue and more of a mass of congealed spaghetti. A white paper, allegedly the real deal, of course, has surfaced. You can download the document from this link: https://www.euractiv.com/wp-content/uploads/sites/2/2020/01/AI-white-paper-EURACTIV.pdf. If it is not available, DarkCyber doesn’t have any bright ideas. The document has no title and is dated “12/12”. As you know, none of the Web search engines are very good when it comes to traditional bibliographic metadata.

Stephen E Arnold, January 19, 2020

The Clearview Write Up: A Great Quote

January 20, 2020

DarkCyber does not want to join in the hand waving about the facial recognition company called Clearview. Instead, we want to point out that the article is available without a pay wall from this link: https://bit.ly/2TO26H1

Also, the write up contains a great quote about technology like facial recognition. Here it is:

It’s creepy what they’re doing, but there will be many more of these companies. There is no monopoly on math.—Al Gidari, a privacy professor at Stanford Law School

DarkCyber wants to point out that a number of companies have gathered collections of images from a wide range of sources. The write up points to investors who may or may not be the power grid behind this particular technology application.

The inventor fits a stereotype: College dropout, long hair, etc.

The write up also identifies officers who allegedly found the database of images and the services helpful.

The New York Times continues to report on specialized technology. There are upsides and downsides to the information. One upside is that the write ups inform people about technology and its utility. The downside is that the information presented may put individuals at risk or cast a negative tint on something that is applied math and publicly accessible data.

It is interesting to consider combining services; for example, brand monitoring and image search. Perhaps that is another story for the New York Times?

Stephen E Arnold, January 20, 2020

Answering This Question: What Does Country X Export? Now Easy

January 20, 2020

Economic complexity is not just the bane of college fresh persons. Nope. Investors, New Age snake oil vendors, and quick-buck artists need to answer questions like “What does Peru export?” Answering the question requires work, including interacting with the wonderful resources of online government agencies and non governmental organizations. Now you can answer the Peru question, and the same question for any other country, with a mouse click. Head over to the tree map visualization tool from the Observatory of Economic Complexity. You will find the Web system at this link. Data are not real time, but, for now, the reports are free. Hours of fun, just like those cram sessions at university for the Econ 101 midterm.

Stephen E Arnold, January 20, 2020

Twitter: Embracing Management Maturity?

January 20, 2020

Twitter has a new initiative in 2020 to keep academic researchers honest, although it is not advertised in that manner. TechCrunch shares the details in the article, “Twitter Offers More Support To Researchers-To ‘Keep Us Accountable.’” Twitter’s new support for academic researchers is a new hub called “Twitter Data for Academic Researchers,” which offers easier access to Twitter’s information and support about its APIs. Within the hub, one can apply for a developer account and find links to researcher tools and information about the APIs Twitter offers.

Twitter apparently added the Twitter Data for Academic Researchers hub this year in response to researchers’ demands. The social media platform states it wants to encourage communication with developers and offer them more support. One reason Twitter wants more transparency and easier communication with its developers is the United States’ 2020 presidential election. Twitter, like most social media platforms, wants to cut down the number of bots and false news reports that affected the 2016 election. There is also the need to tamp down these accounts on a regular basis:

“Tracking conversation flow on Twitter also still means playing a game of ‘bot or not’ — one that has major implications for the health of democracies. And in Europe Twitter is one of a number of platform giants which, in 2018, signed up to a voluntary Code of Practice on disinformation that commits it to addressing fake accounts and online bots, as well as to empowering the research community to monitor online disinformation via “privacy-compliant” access to platform data.”

Twitter wants to support its developer community, but the transparency also makes it easier for Twitter to hold people responsible for their actions. They are keeping tabs on how their technology is used, while also assisting developers with their work. It is a great idea, and if trouble arises, it might make it easier to track down the bad actors who started the mess. It is also another score for Twitter, because Facebook does not support academics well. Facebook has altered its APIs for researchers, and Facebook does not want to stop false information from spreading.

Whitney Grace, January 20, 2020

New Chinese Facial Recognition Camera Reduces False Positives

January 19, 2020

In a move that should surprise nobody, China has created the ultimate facial recognition hardware. The Telegraph reports, “China Unveils 500 Megapixel Camera that Can Identify Every Face in a Crowd of Tens of Thousands.” Researchers revealed the “super camera,” which can see four times more detail than the human eye, at China’s International Industry Fair. Of course, no surveillance tech is complete without an AI; writer Freddie Hayward tells us:

“The camera’s artificial intelligence will be able to scan a crowd and identify an individual within seconds. Samantha Hoffman, an analyst at the Australian Strategic Policy Institute, told the ABC that the government has massive databases of people’s images and that data generated from surveillance video can be ‘fed into a pool of data that, combined with AI processing, can generate tools for social control, including tools linked to the Social Credit System’.”

Yes, the Social Credit System. China is no stranger to spying on its people, and this development will only make their current practices more effective. We learn:

“China currently has an estimated 200 million CCTV cameras watching over its citizens. For the past few years the country has been building a social credit system that will generate a score for each citizen based upon data about their lives, such as their credit score, whether they donate to charity, and their parenting ability. Punishments and rewards that citizens will receive based upon their score include access to better schools and universities and restricted travel. The current CCTV network is a central tool in gathering data about its citizens, but the cameras aren’t always powerful enough to take a clear picture of someone’s face in a crowd. The new 500 megapixel, or 500 million pixel, camera will help to remedy this.”

Indeed it will. I suppose if you are going to build a social system around snooping on the people, it should be as accurate as possible. You wouldn’t want to keep one citizen out of a good school because someone who looked like them was caught littering.

Cynthia Murrell, January 19, 2020

Google Allegedly Ostracized

January 18, 2020

I worked in the San Francisco area once affectionately known as Plastic Fantastic. My recollection is that most of the people with whom I worked and socialized were flexible. There was the occasional throwback who longed for the rigidity of Midwestern farm life. But overall, chill was the word. The outfit that paid me for whatever they thought my skill was turned out to be an easy going money machine. Most of the high technology outfits were just starting to get a sense of the power and impact afforded those who were comfortable with online technologies, nifty must-have gadgets, and a realization that members of the high school science club could call the shots.

Imagine my surprise when I read the allegedly accurate “San Francisco Pride Members Pass Resolution to Ban Google, YouTube from Future Parades.” The write up states:

Members of the LGBTQ+ organization say they passed an amendment to ban Google, YouTube and Alphabet, as well as the Alameda County Sheriff’s Office, from future celebrations after a vote at their monthly membership meeting Wednesday night. In a statement released to SFGATE on Thursday, SF Pride members and former Google engineers Laurence Berland and Tyler Breisacher said they are now urging the board of directors to formally approve the motion at their upcoming meeting on Feb. 5.

Remarkable if true. The Google HR and marketing departments will have to step up their efforts. Recruitment may become more difficult. The PR vibes are doing the Hopf fibration thing. (This is a nice way of saying, “Difficult to understand.”)

Stephen E Arnold, January 18, 2020

Amazon and Microsoft: Different Ways to Leverage $1 Billion

January 17, 2020

Author and big gun Brad Smith, president of Microsoft, allegedly wrote “Microsoft Will Be Carbon Negative by 2030.” To achieve this goal, the company will spend $1 billion. Okay, that appears to work out to $8.3 million per month for 10 years. That’s about 11 Azure Cognitive S4 transactions. Impressive. I suppose it depends on one’s point of view. From the PR perspective, this is probably a decent billion. From other points of view, one’s mileage may vary.
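The monthly figure is simple division. A minimal Python check, assuming the $1 billion is spread evenly across the 120 months between 2020 and 2030 (the actual announcement may schedule the money differently):

```python
def monthly_spend(total_dollars: float, years: int) -> float:
    """Average spend per month if a total is spread evenly over a number of years."""
    return total_dollars / (years * 12)


if __name__ == "__main__":
    per_month = monthly_spend(1_000_000_000, 10)
    print(f"${per_month / 1e6:.1f} million per month")  # $8.3 million per month
```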

Now contrast this Microsoft $1 billion with Amazon’s. DarkCyber noted “During Bezos Visit, India minister Says Amazon’s $1 Billion Investment Is No Big Favour.” The write up states something that is a PR downer:

Amazon and Walmart’s Flipkart are facing mounting criticism from India’s brick-and-mortar retailers, which accuse the U.S. giants of violating Indian law by racking up billions of dollars of losses to fund deep discounts and discriminating against small sellers. The companies deny the allegations.

Amazon’s reaction? Read on:

Bezos said on Wednesday [January 15, 2020] Amazon would invest $1 billion to bring small businesses online in the country, adding to the $5.5 billion the company had committed since 2014.

Stepping back, Microsoft is going for good ink. Amazon seems to be going after what may be the second or third largest market in the world for Amazon services and battery powered Ring doorbells.

Interesting uses of $1 billion.

Stephen E Arnold, January 17, 2020

The New Doing Gooder Google

January 17, 2020

Google’s cheerleading unit likes to remind us, amid the constant criticisms, that the company makes some positive contributions to society. For example, it seems their AI has gotten good at detecting cancer. We learn from AndroidCentral that “Google’s AI Is Better at Detecting Cancer than Doctors, Says Study.” About the same research, Ausdroid reports, “Google Publish their Impressive Breast Cancer Screening Using AI Results.” The capabilities are courtesy of technology developed by Google acquisition DeepMind. The study was performed by Google Health in conjunction with Cancer Research UK Imperial Centre, Northwestern University, and Royal Surrey County Hospital. Researchers used deep-learning tools to create AI detection models and applied them to almost 30,000 patients for whom results were already known. Muhammad Jarir Kanji of AndroidCentral writes:

“The system was trained using a large dataset of mammograms from women in the two countries. Even more telling than its better accuracy than doctors was the fact that it did so with far less information than the radiologists it was competing with, who also had access to the patients’ medical history and previous mammograms in their deliberations. … While the paper noted that ‘AI may be uniquely poised to help with’ the challenge of detecting breast cancer, Darzi said the system was not yet at a stage where it could replace a human reader.”

Emphasis on “yet.” Meanwhile, Ausdroid’s Scott Plowman emphasizes:

“The data sets were also NOT used to train the AI system and thus we totally unknown to the system.

Comparing the positive results from the AI to those patients who ended up having biopsy-confirmed breast cancer the AI demonstrated a ‘statistically significant’ improvement in ‘absolute specificity’ of 1.2% (UK – double read), and 5.7% (USA – single read) and an improvement in absolute sensitivity of 2.7% (UK) and 9.4% (USA). For reference, sensitivity is the ability to correctly identify lesions and specificity is how accurate it is at identifying those without lesions. This means that it has a reduction in both false positives and false negatives.”
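For readers who want the two terms in the quote pinned down, here is a minimal Python sketch of the standard formulas: sensitivity is true positives divided by all patients who actually have cancer, and specificity is true negatives divided by all patients who do not. The counts are invented for illustration and are not data from the Google Health study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    true_pos: int   # cancer present, flagged
    false_neg: int  # cancer present, missed
    true_neg: int   # no cancer, correctly cleared
    false_pos: int  # no cancer, flagged anyway

    def sensitivity(self) -> float:
        """Share of actual cancers that are flagged: TP / (TP + FN)."""
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        """Share of cancer-free patients who are cleared: TN / (TN + FP)."""
        return self.true_neg / (self.true_neg + self.false_pos)


if __name__ == "__main__":
    # Hypothetical counts for 1,000 screenings; not figures from the study.
    reader = ConfusionCounts(true_pos=40, false_neg=10, true_neg=900, false_pos=50)
    print(f"sensitivity={reader.sensitivity():.3f} specificity={reader.specificity():.3f}")
    # sensitivity=0.800 specificity=0.947
```

An “absolute” improvement, as in the quote, is the difference in percentage points between two such rates. Higher specificity on the same patients means fewer false positives; higher sensitivity means fewer false negatives.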

If Google’s PR team spins more stories like this one, they just might be able to burnish the company’s reputation.

Cynthia Murrell, January 08, 2020

US China Deal: The Honeymoon Will Not Last Long

January 17, 2020

DarkCyber spotted a write up called “China Bracing for US Tech War with Plan to Cut Reliance on Imports of Key Components to Just 25 Per Cent.” If the information in the write up is accurate, the implications for certain countries and companies selling to China could be interesting. We noted this statement in the article:

China is aiming to increase its reliance on domestic production for key components, including chips and controlling systems, to 75 per cent by 2025, according to a former minister.

So of each dollar China spends on key components, perhaps to shore up its Great Firewall, only $0.25 will allegedly go to imports in 60 months or less.

This statement seemed to be more of a warning and less of an olive branch extended to the US:

The move, which includes a series of plans to improve weak links in the areas of hi-tech research and crucial component development “one by one”, is seen as part of China’s preparation for a intensifying technology war with the United States.

(“China Must Rein in SOEs to Gain Upper Hand in Tech War, Help Private Firms like Huawei to Innovate” provides some color on China’s desire to become the dominant technology player in the future.)

To support the knowledge sector, the write up reveals:

China will also increase the number of “national manufacturing innovation centers” to 40 by 2025 from 11 at the end of 2019 “to cover all major industries”. China’s first national manufacturing innovation centre was launched in 2016, focusing on making and researching electric vehicle batteries.

The concluding section of the write up states the obvious:

is increasingly clear that a technology rivalry between China and US is set to deepen…with competition in next generation communication, 5G and artificial intelligence key areas of contention.

Net net: A calm before the storm.

Stephen E Arnold, January 17, 2020
