Free Dissertation? Act Fast or You May Have to Pay Up and a Lot
June 20, 2020
DarkCyber spotted “Discovering Dennis Ritchie’s Lost Dissertation.” The main point of the write up is that a wizard failed to hand over a copy of his dissertation to the institution library. As a result, no PhD and no scanning, indexing, and selling of the good student’s work by University Microfilms. I have no clue what this outfit is called today, but in the 1960s, the outfit zoomed through Kodak film and helped animate environmental controls on photoprocessing chemicals. Silver and all that, of course.
The main point of the write up for me is the link to the aforementioned dissertation. Free and online as of June 20, 2020, at Ritchie_dissertation.pdf. Miss this chance and you may have to pony up some hard cash for a professional publishing/database company’s honest work of making money by converting students’ fear and perspiration into an online charge.
Oh, what did the student go on to cook up? The C language.
Stephen E Arnold, June 20, 2020
Short Cut Debater Delight: URL to a Snippet
June 19, 2020
Let us journey back in time. I was a high school and college debate person. I think one of my “advisors” called us “debaters,” but I think he was saying, “De-daters.” Yeah, popular.
The year is 1964, and my debate partner was a silver tongued Greek American named Nick G. I was a fat, bespectacled trailer court person who hid in the library. My job was to read stuff and write summaries on 5×8 note cards. Remember those?
If I spotted a useful fact about the National Defense Education Act or similar burning topic for a 19 year old, I would cross reference the factoid, index it with a color tinted pencil, and organize the note cards in my really big wooden box. Cool, right?
Flash forward to a debate at some empty campus in January and a “debate tournament.” Sad affairs? You bet. Nick and I were listening to a couple of swifties from Dartmouth explain that Nick and I were stupid, losers from an intellectual nowheresville, and candidates for life in a tuna packing plant owned by one of the Dartmouth wizard’s family.
I spotted a note card, a snippet, and a cross reference. Coincidence, maybe. Cut to the punch line: The Dartmouth rebuttal person changed the factoids and quoted an edited version of the information I had recorded in my blissful hours of alone-ness in the library.
My turn to speak arrived, and I began by pointing out that snippets out of context were not the stuff of which an Ivy Leaguer was fabricated. The “fabrication” of misstatements, misquotes, and misrepresentations was proof that the arguments constructed by the shortcut artists from Hanover, New Hampshire (wherever that was) were fluff.
Bingo. I summed up our case and sat down.
We won the debate and the tournament. I think my father-in-law used the trophy as a tie rack.
I thought of Eleazar’s losers. Nick and I ate a pizza at some joint before the bus ride back to the frozen Midwest where our one-horse college pumped information into hungry Illinoisans.
Google is allegedly going to facilitate short cut thinking if the information in “Google’s New Chrome Extension Lets You Link Directly to Specific Text on a Page” is accurate, but today, who knows?
The idea is that a person creates or fabricates a factoid, creates a link, and leads the Dartmouth-type researcher to just what is needed to support a castle of clips.
The old fashioned approach mostly required finding information, reading something, copying or photocopying the pages, converting the information to a note card, and going through the indexing thing.
The process had the effect of imprinting the information on the mind. If one had a good memory as Nick or I did, we could pull information, find the source, and convert that item into a useful addition to our argument.
What happens if one takes a shortcut? You get the Dartmouth approach to information; that is, fix it up and skip the work.
The write up states:
The Google extension builds upon a new feature that was recently added to Chromium called Text Fragments, which works by appending extra linking information to a URL after a #. It’s the same technology that Google now sometimes uses to link to specific parts of a webpage in search results. However, these URLs can be long and difficult to manually create if you’re linking to longer sections of text, or complex web pages where the same words are repeated multiple times. This extension simplifies the creation process.
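For the curious, the link format the quoted passage describes can be sketched in a few lines of Python. This is a minimal illustration of the Text Fragments syntax (`#:~:text=` followed by an optional prefix, the start text, an optional end text, and an optional suffix), not the extension's own code. The function names and the example.com address are made up for illustration.

```python
from urllib.parse import quote


def _encode(text: str) -> str:
    # Percent-encode everything outside the unreserved set, then also encode
    # "-", which the fragment syntax reserves for marking prefix and suffix.
    return quote(text, safe="").replace("-", "%2D")


def text_fragment_url(page_url, start, end=None, prefix=None, suffix=None):
    """Build a link that points at a specific passage on a page (hypothetical helper)."""
    parts = []
    if prefix:
        parts.append(_encode(prefix) + "-")   # context that precedes the match
    parts.append(_encode(start))              # first words of the passage
    if end:
        parts.append(_encode(end))            # last words of the passage
    if suffix:
        parts.append("-" + _encode(suffix))   # context that follows the match
    base = page_url.split("#", 1)[0]          # drop any existing fragment
    return base + "#:~:text=" + ",".join(parts)


print(text_fragment_url("https://example.com/page", "short cut"))
```

The longer the passage or the more often its words repeat on the page, the more of the optional pieces are needed, which is the tedium the extension is said to remove.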
Right, who needs context? Also, what happens when Google “hides” URLs so one has to use Google search to locate a source?
Any wonder why some of the arguments presented by “real” lawyers and journalists are so stupid?
The intellectual rigor has not just relaxed; it has checked into Hotel California and is chilling out. Bump, bump, bump. Hanover arrives in La La Land.
Stephen E Arnold, June 19, 2020
IBM: No Facial Recognition? Why? Just Ask Watson
June 19, 2020
Racial tensions are at an all time high due to the recent unnecessary deaths of many black people at the hands of law enforcement. People of all backgrounds are calling for an elimination of technology that amplifies racial profiling tactics, particularly facial recognition technology. IBM, along with other technology companies, wants to support diversity and oppose police violence, so Live Mint shares how: “IBM Gets Out Of Facial Recognition Business, Oppose Use For Mass Surveillance.”
CEO of IBM Arvind Krishna stated that his company would no longer sell facial recognition or analysis software for racial profiling and mass surveillance. Mass surveillance is another major fear that people hold when it comes to facial recognition technology. The potential for a huge spy network is real, and if government backs the bill, it becomes scarier than Orwell’s imagination.
Facial recognition also perpetuates racial profiling, because the data used to train the AI is usually biased. Most of the data only contains information related to white individuals, because they are the ones designing the technology, which makes their data more readily available. Due to the lack of diverse facial data, current facial recognition technology is prone to more errors with ethnic minorities.
Krishna is right to not offer the technology, especially when it could do more harm than good:
“ ‘IBM firmly opposes and will not condone uses of any technology, including facial recognition technology offered by other vendors, for mass surveillance, racial profiling, violations of basic human rights and freedoms, or any purpose which is not consistent with our values and Principles of Trust and Transparency,’ he was quoted as saying in the letter.”
Facial recognition is off the radar for most law enforcement agencies right now, but that does not mean the technology should not be developed. The most important step toward accuracy is acquiring more diverse data.
One of my DarkCyber colleagues said at Shake Shack a day ago, “Maybe the IBM facial recognition technology does not work, and this was an easy way to get out of a sticky ditch.”
Maybe? Watson, what do you think?
Whitney Grace, June 19, 2020
Smart Software Developers Need to Up Their Training Regimen
June 19, 2020
Oh, the irony. The Next Web reports, “Microsoft’s AI Editor Uses Photo of Wrong Mixed-Race Popstar in Story About Racism.” Not a good look, Microsoft. The brief write-up tells us:
“According to The Guardian, Microsoft software used to replace the human journalists running news site MSN.com confused two mixed-race members of British pop group Little Mix. In an MSN.com article headlined ‘Little Mix star Jade Thirlwall says she faced horrific racism at school’, the software mistakenly picked a photo of Thirwall’s fellow band member Leigh-Anne Pinnock. MSN.com has since replaced the incorrect image. The outlet’s remaining human staff have been warned that the software could automatically publish the Guardian’s article again, and told to remove the story when it does. However, they were also informed that their AI overlord could overrule their attempts to delete it.”
That warning is mildly disturbing. That Skynet-like point aside, MSN is facing scrutiny for the display of bias. Thirlwall called MSN out for the mix-up on Instagram, noting it is one that happens so often it has “become a running joke.”
One may conclude this is one job we should leave to human editors, but they tend to make similar mistakes. In fact, we’re reminded, AI’s inability to distinguish between individuals of color reflects its training at the hands of mostly white developers. Microsoft plans to replace the AI that made the error with a newer version—as it replaces dozens of human workers with the same software. Let us hope this iteration is better trained.
Cynthia Murrell, June 19, 2020
Mindbreeze: Big News from Austria
June 19, 2020
Moving enterprise search and data analysis to the cloud means security becomes an even greater concern, and one provider recently had an audit performed on its platform. Olean Times Herald reports, “Mindbreeze InSpire SaaS Receives SOC2 Type 1 Attestation.” A System and Organization Controls 2 audit assesses how well a system complies with certain standards on the handling of data. “Type 1” means the assessment reports on the design of controls at a single point in time rather than on how they operate over an extended period. Consulting company KPMG completed the audit report. The write-up tells us:
“In the context of the auditing process, KPMG examined whether the Trust Services Criteria (TSC) for security – issued by the American Institute of Certified Public Accountants (AICPA) – are observed. This involved inspecting and documenting the existing internal control mechanisms for the services offered, such as those relating to risk minimization, access controls, monitoring measures, and communication. The audit took the form of an ISAE 3000 Type 1 audit (testing the design and the implementation for a specific deadline) and was conducted over a period of roughly four weeks. Mindbreeze received the final test results as an ISAE 3000 SOC2 Type 1 Report.”
The report will provide information to Mindbreeze’s clients and auditors. Founder and CEO Daniel Fallmann emphasizes that tight security and adherence to operating standards are priorities for his firm. The company’s platforms rely on AI tech to produce business insights for its clients. Founded in 2005, Mindbreeze is headquartered in Austria and has a US office in Chicago.
Cynthia Murrell, June 19, 2020
Real Estate Firm Wants to Be Real
June 19, 2020
Digital business listings can be just as lucrative as physical property holdings. The right domain name can sell for thousands and videogames sell digital objects and upgrades in micro transactions. When a digital holding that belongs to you, however, is “stolen” it can be difficult to reclaim it. The Fisher Group shares how this happened to them in the blog post, “Google Gave Away Our Business Listing To A Competitor And Our Fight To Get It Back.”
A real estate firm associated with Summit Sotheby’s had its Google Business account merged with another agent’s. There was no explanation for the sudden merger, and the firm was forced to rely on webmaster support manned entirely by volunteers. The forums offered no help, and the real estate firm spent over a year trying to track down someone who could assist.
The real estate firm wanted its old page back, because it had positive reviews from clients and, through hard work, had reached the top of searches for its area. Without anywhere else to turn, the firm was forced to write a plea:
“So at this point we’ve decided to write about it more publicly in hopes of getting the attention of someone at Google who can help us out with this unique and annoying situation. We’ll be also seeing what the SEO community thinks and if they’ve had this experience before, we certainly couldn’t find anyone else with this issue searching around on Reddit, Facebook groups and other SEO forums.
While we’re sad we’re losing out on some business, we would like to see that this doesn’t happen to anyone else in the future because we know how precious those reviews can be to businesses of all sizes.”
The issue still is not resolved. Remember: If one is not findable in Google, one may not exist or be “real.”
Whitney Grace, June 19, 2020
Web Analytics: A Fancy Way of Saying You Have a Blue Ribbon Winning Bloodhound Tracking You
June 18, 2020
DarkCyber is easily confused. Every day brings more incredible cyber security marketing hoo-hah. And each day more incredible security issues come to light. A good example was the Wall Street Journal’s story “Russian Hackers Evaded Firms’ Detection Tools”, Wednesday, June 18, 2020. Yeah, those cyber tools are special.
The story “Lightweight Alternatives to Google Analytics” is a helpful round up of digital bloodhounds. If you are looking for ways to make sense of Web site log files, you can work through the snapshots of such systems as GoatCounter, Plausible, Simple Analytics, and Fathom.
The intriguing segment of the write up is, in DarkCyber’s opinion, this statement:
Google tracks and stores a huge amount of information about users.
A 2018 paper [PDF] by Douglas Schmidt highlights the extent of Google’s tracking, with location tracking on Android devices as one example:
Both Android and Chrome send data to Google even in the absence of any user interaction. Our experiments show that a dormant, stationary Android phone (with Chrome active in the background) communicated location information to Google 340 times during a 24-hour period, or at an average of 14 data communications per hour.
The paper distinguishes between “active” and “passive” tracking. Active tracking is when the user directly uses or logs into a Google service, such as performing a search, logging into Gmail, and so on. In addition to recording all of a user’s search keywords, Google passively tracks users as they visit web sites that use GA and other Google publisher tools. Schmidt found that in an example “day in the life” scenario, “Google collected or inferred over two-thirds of the information through passive means”.
Schmidt’s paper details how GA cookie tracking works, noting the difference between “1st-party” and “3rd-party” cookies — the latter of which track users and their ad clicks across multiple sites:
While a GA cookie is specific to the particular domain of the website that user visits (called a “1st-party cookie”), a DoubleClick cookie is typically associated with a common 3rd-party domain (such as doubleclick.net). Google uses such cookies to track user interaction across multiple 3rd-party websites. When a user interacts with an advertisement on a website, DoubleClick’s conversion tracking tools (e.g. Floodlight) places cookies on a user’s computer and generates a unique client ID. Thereafter, if the user visits the advertised website, the stored cookie information gets accessed by the DoubleClick server, thereby recording the visit as a valid conversion.
Because such a large percentage of web sites use Google advertising products as well as GA, this has the effect that the company knows a large fraction of users’ browsing history across many web sites, both popular sites and smaller “mom and pop” sites.
In short, Google knows a lot about what you like, where you are, and what you buy. Google does provide ways to turn off features like targeted advertising and location tracking, as well as to delete the personalized profile associated with an account. However, these features are almost entirely opt-out, and most users either don’t know about them or just never bother to turn them off. Of course, just switching away from GA won’t eliminate all of these privacy issues (for example, it will do nothing to stop Android location tracking or search tracking), but it’s one way to reduce the huge amount of data Google collects. In addition, for site owners that use a GA alternative, Google does not get a behind-the-scenes look at the site’s traffic patterns — data which it could conceivably use in the future to build a competing tool.
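The 1st-party versus 3rd-party distinction in the quoted passage is easy to picture with a toy check. The Python sketch below is an assumption-laden illustration, not how any browser or Google product works: it uses a naive domain suffix match, where real software consults the Public Suffix List. The function name and the example.com site are hypothetical.

```python
from urllib.parse import urlsplit


def cookie_party(page_url: str, cookie_domain: str) -> str:
    """Label a cookie relative to the page being visited (naive suffix match)."""
    host = (urlsplit(page_url).hostname or "").lower()
    domain = cookie_domain.lower().lstrip(".")   # ".example.com" -> "example.com"
    # Same host, or the page host is a subdomain of the cookie's domain.
    if host == domain or host.endswith("." + domain):
        return "1st-party"
    return "3rd-party"


# A cookie set for the site itself versus one set for an ad domain.
print(cookie_party("https://www.example.com/news", ".example.com"))
print(cookie_party("https://www.example.com/news", "doubleclick.net"))
```

The same ad domain turning up as a 3rd-party cookie on site after site is what lets one company stitch visits together.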
A paywall may be protecting this write up. Nevertheless, if the information in the passage quoted above is accurate, Google’s senior management may have to do some explaining as the company executes some “Dancing with the Stars” footwork if regulators decide to dig into such assertions.
And the bloodhound, “Who me?” Woof.
Stephen E Arnold, June 18, 2020
AI Failures: Fast and Furious Arrivals
June 18, 2020
What can go wrong with AI? Quite a lot, actually, including errors and biases that can cause harm if left unchecked. ImmuniWeb’s Security Blog discusses what it considers the “Top 10 Failures of AI.” Entries range from “AI fails to do image recognition” to “AI that hated humans.” Examples of racial bias and misogyny are included, as well as dangerously flawed medical advice from IBM’s famous Watson. See the article for details on these cases and more.
The post goes on to discuss reasons AI fails: bad or insufficient data, bad engineering, or the wrong area of application. To avoid these perils, we’re advised:
“Never overestimate the capabilities of AI. It doesn’t make miracles and it is nowhere close to those ‘strong AI’ smarties from Hollywood blockbusters. You need a lot of relevant, cleaned and verified data to train an adequate model. The data is crucial for machine learning, but it is not all you need. Choosing a correct algorithm and tuning its parameters need a lot of tests and trials by a team of highly qualified experts. Most importantly, an AI system has a very limited capability of replacing humans. It can replace humans in simple, but tedious tasks, that consist of a lot of repeating routines. Any complex task that requires non-trivial approach to solution may lead to a high level of errors by AI. The best role an AI can play now is an assistant to humans who use AI as a tool to do a lot of routines and repeating operations.”
The article concludes, sensibly, by tooting ImmuniWeb’s own horn. It mentions a couple of awards, and emphasizes it views AI as a way to augment, not replace, human capabilities. We’re told it tests and updates its AI models “relentlessly.” Focused on AI for application security, the small company was founded just last year in Geneva, Switzerland.
Cynthia Murrell, June 18, 2020
Hulbee Is In the Enterprise Search Derby
June 18, 2020
Enterprise search should be an easy, out-of-the-box, deployable solution, but more often it is a confusing mess. Companies like Hulbee Enterprise Search develop search programs that eliminate the guesswork and function immediately:
“Hulbee Enterprise Search not only provides a simple search software, but also consolidates our experience and knowledge, which has been accumulated for over 17 years and combines intelligent search, format diversity, different corporate infrastructures, security, etc. in areas such as document management.
Our goal is to create a timely software technology for you that meets all security requirements. We would be very pleased if you test our software. Request a Proof of Concept.
Our software complements existing software products from other manufacturers such as SharePoint, Exchange, DMS etc. through the innovation of the search. It is thus not a competition, but an addition to and completion of the optimal search in the company.”
The purpose of enterprise search is to quickly locate information, so it can be employed by a business. Information includes structured and unstructured data, so enterprise search needs to be robust and smart enough to filter relevant results. Search must also be compliant with security measures, especially as more businesses host their data on clouds.
Enterprise search solutions like Hulbee must be flexible enough to adjust to changing security measures, but also continue to offer the same and better features for search.
Customization is key to being a contender in the market for enterprise search.
Whitney Grace, June 18, 2020
BLEURT Blabs NLG Well
June 18, 2020
Humans want their technological offspring to sound like them. AI and machine learning have advanced to where computers can carry simple conversations, but they are far from fluent. Natural language generation has grown in recent years, and the performance of its models is measured by humans and automatic metrics. Neowin discusses Google’s BLEURT, an automatic metric for natural language models, in “BLEURT Is A Metric For Natural Language Generators Boasting Unprecedented Accuracy.”
BLEURT works like some other smart software:
“At the heart of BLEURT is machine learning. And for any machine learning model, perhaps the most important commodity is the data it trains on. However, the training data for an NLG performance metric is limited. Indeed, on the WMT Metrics Task dataset, which is currently the largest collection of human ratings, contains approximately 260,000 human ratings apropos the news domain only. If it were to be used as the sole training dataset, the WMT Metric Task dataset would lead to a loss of generality and robustness of the trained model.”
BLEURT’s research team employed transfer learning to improve upon the WMT Metrics Task dataset and also used a novel pre-training scheme to better its robustness and accuracy. It underwent two training phases: language modeling followed by evaluating NLG models. BLEURT scored the highest amongst similar technology. BLEURT’s goal is to improve Google’s language abilities.
Whitney Grace, June 18, 2020