Amazon: Buying More Innovation
February 26, 2020
DarkCyber noted the article “Amazon Acquires Turkish Startup Datarow.” The word “startup” is rather loosely applied. Datarow was founded in 2016; a four-year-old outfit is not a spring chicken in DarkCyber’s view.
What’s interesting about this acquisition is that it provides the sometimes unartful Amazon with an outfit that specializes in making easier-to-use data tools. The firm appears to have been built around AWS Redshift.
The company’s quite wonky Web site says:
We’re proud to have created an innovative tool that facilitates data exploration and visualization for data analysts in Amazon Redshift, providing users with an easy to use interface to create tables, load data, author queries, perform visual analysis, and collaborate with others to share SQL code, analysis, and results. Together with AWS, we look forward to taking our tool to the next level for customers.
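For readers curious about what the quoted workflow looks like in practice, here is a minimal sketch of querying Redshift from Python. Because Redshift speaks the PostgreSQL wire protocol, the widely used psycopg2 driver works; the cluster endpoint, credentials, and table name below are placeholders, not anything drawn from Datarow or AWS documentation.

```python
import psycopg2  # standard PostgreSQL driver; Redshift is wire-compatible

# Placeholder connection details -- substitute your own cluster endpoint.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,              # Redshift's default port
    dbname="analytics",
    user="analyst",
    password="********",
)

with conn, conn.cursor() as cur:
    # A routine exploration query of the sort a Datarow-style UI would author.
    cur.execute(
        "SELECT event_date, COUNT(*) AS events "
        "FROM web_events GROUP BY event_date ORDER BY event_date DESC LIMIT 10"
    )
    for event_date, events in cur.fetchall():
        print(event_date, events)

conn.close()
```

The point of a tool like Datarow is that analysts never have to write this plumbing themselves; the interface authors and shares the SQL for them.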
The company provides what it calls “data governance,” a term which DarkCyber takes to mean “get your act together” with regard to information. This is easier said than done, but it is a hot button among companies struggling to reduce costs, comply with assorted rules and regulations, and figure out what’s actually happening in their lines of business. Profit and loss statements are not up to the job of dealing with diverse content, audio, video, real-time data, and tweets. Well, neither is Amazon, but that’s not germane.
Will Amazon AWS Redshift (love the naming, don’t you?) become easier to use? Perhaps Datarow will become responsible for the AWS Web site?
Stephen E Arnold, February 26, 2020
Facial Recognition: Those Error Rates? An Issue, Of Course
February 21, 2020
DarkCyber read “Machines Are Struggling to Recognize People in China.” The write up asserts:
The country’s ubiquitous facial recognition technology has been stymied by face masks.
One of the unexpected consequences of the Covid-19 virus is that citizens wearing face masks cannot be recognized.
“Unexpected” is a stretch, given that adversarial fashion has been getting some traction among those who wish to move about anonymously.
The write up adds:
Recently, Chinese authorities in some provinces have made medical face masks mandatory in public and the use and popularity of these is going up across the country. However, interestingly, as millions of masks are now worn by Chinese people, there has been an unintended consequence. Not only have the country’s near ubiquitous facial-recognition surveillance cameras been stymied, life is reported to have become difficult for ordinary citizens who use their faces for everyday things such as accessing their homes and bank accounts.
Now an “admission” by a US company:
Companies such as Apple have confirmed that the facial recognition software on their phones need a view of the person’s full face, including the nose, lips and jaw line, for them to work accurately. That said, a race for the next generation of facial-recognition technology is on, with algorithms that can go beyond masks. Time will tell whether they work. I bet they will.
To sum up: Masks defeat facial recognition. The future is a method of identification that works with whatever is not covered plus any other data available to the system; for example, gait patterns and geo-location.
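To make the “face plus other signals” idea concrete, here is a minimal sketch of score-level fusion, assuming each signal has already been reduced to a similarity score between 0 and 1 by some upstream model. The weights and threshold are illustrative assumptions, not any vendor’s published method.

```python
# Hypothetical score-level fusion of partial-face, gait, and location signals.
# Each score is assumed to be a similarity in [0, 1] from an upstream model.

WEIGHTS = {"partial_face": 0.5, "gait": 0.3, "geo": 0.2}  # assumed weights
THRESHOLD = 0.7                                            # assumed cutoff

def fused_identity_score(scores: dict) -> float:
    """Weighted average over whichever signals are present."""
    present = {k: v for k, v in scores.items() if k in WEIGHTS}
    total_weight = sum(WEIGHTS[k] for k in present)
    if total_weight == 0:
        return 0.0
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_weight

# A masked subject: weak partial-face match, strong gait and location matches.
observation = {"partial_face": 0.55, "gait": 0.85, "geo": 0.9}
score = fused_identity_score(observation)
print(f"fused score {score:.2f}", "match" if score >= THRESHOLD else "no match")
```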
For now, though, masks mean lousy facial recognition accuracy and more effort to find workarounds.
The author of the write up is a — wait for it — venture capital professional. And what country leads the world in facial recognition? China, according to the VC professional.
The future is better person recognition of which the face is one factor.
Stephen E Arnold, February 21, 2020
Map Economics: Useful Content and One Major Omission
February 13, 2020
DarkCyber spotted a paper called “The Economics of Maps.” The authors have presented some extremely useful and interesting information about depicting the real world.
One of the most useful aspects of the article is the list of companies providing different types of mapping services and data. The list of firms in this business includes such providers, vendors, and technology companies as:
Airbus
Farmers Edge
Mapbox
Pitney Bowes
There are some significant omissions; for example, the category of geo-analytics for law enforcement and intelligence applications, which includes the low-profile Geogence and investigative tools like those available from Verint.
Worth reading and tucking into one’s intelligence folder in our opinion.
Stephen E Arnold, February 13, 2020
Easy Facial Recognition
February 11, 2020
DarkCyber spotted a Twitter thread. You can view it here (verified on February 8, 2020). The main point is that, using open source software, an individual was able to obtain (scrape; that is, copy) images from publicly accessible services. Then the images were “processed.” The idea was to identify a person from an image. Net net: People can object to facial recognition, but once a technology migrates from “little known” to publicly available, there may be difficulty putting the tech cat back in the bag.
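As an illustration of how little code this now takes, here is a minimal sketch using the open source face_recognition library (a wrapper around dlib). The file names are placeholders, and the thread’s author may have used entirely different tooling.

```python
# pip install face_recognition  (wraps dlib's face detector and encoder)
import face_recognition

# A "known" photo of the target and an image scraped from a public service.
# Both file names are placeholders for illustration only.
known_image = face_recognition.load_image_file("known_person.jpg")
scraped_image = face_recognition.load_image_file("scraped_photo.jpg")

known_encodings = face_recognition.face_encodings(known_image)
scraped_encodings = face_recognition.face_encodings(scraped_image)

if known_encodings and scraped_encodings:
    # Compare every face found in the scraped photo against the known face.
    matches = face_recognition.compare_faces(scraped_encodings, known_encodings[0])
    print("Person appears in scraped image:", any(matches))
else:
    print("No face detected in one of the images.")
```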
Stephen E Arnold, February 11, 2020
Math Resources
January 27, 2020
One of the DarkCyber team spotted a list of available math resources. Some cost money; others are free. Math Vault lists courses, platforms, tools, and question-answering sites. Some are relatively mainstream, like Wolfram Alpha; others are less well publicized, like ProofWiki. You can find the listing at this link.
Kenny Toth, January 26, 2020
Google and Data: Doing Stuff Without Data?
January 25, 2020
The Verge has been one of the foot soldiers carrying a pointy stick toward the Google. A few days ago, Google “mobilized” its desktop search results; that is, restyled them to look like the mobile version. The idea was to make ads and organic results look the same; that is, to make it virtually impossible to determine where a link came from, who paid for it, and whether it was tied to a finger tap or an honest-to-goodness thumb-typed word or phrase.
The Verge caught the change because its experts compared a page of results on a tiny display device with the same page on a bigger device. “Google’s Ads Just Look Like Search Results Now,” published January 23, 2020, stated:
In what appears to be something of a purposeful dark pattern, the only thing differentiating ads and search results is a small black-and-white “Ad” icon next to the former.
Yikes, a dark pattern. Tricking users. Changing to match mobile.
A day later, The Verge reported that “Google is backtracking on its controversial desktop search results redesign.” The write up stated:
The company says it will experiment with favicon placement.
But the point is not the Verge’s useful coverage of the Google shift. For DarkCyber, the new interface illustrates that the baloney about Google using data to determine its actions, the importance of A/B testing, and the overall brilliance of Googlers is just that: baloney. The GOOG does what it wants.
If Google’s “data” cannot inform the company that an interface change will irritate outfits like the Verge, users, and denizens of the Twitter thing — maybe the company’s data dependence is a shibboleth?
If Google cannot interpret A/B data in a way that avoids backlash and crawfishing, maybe Google’s data skills are not what the PR machine says they are?
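For readers who have not run one, the arithmetic behind a basic A/B decision is modest. Here is a minimal sketch of a two-proportion z-test on made-up click-through counts; the numbers, and the notion that this is how Google weighs interface changes, are assumptions for illustration only.

```python
# Two-proportion z-test on hypothetical A/B click-through data.
from math import sqrt, erf

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Made-up numbers: old layout vs. new "ad-look-alike" layout.
z, p = two_proportion_z(clicks_a=4_800, n_a=100_000, clicks_b=5_150, n_b=100_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # small p suggests a real difference in CTR
```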
DarkCyber thought experimenting and analysis came first at the Google. It seems that these steps come after guessing. Ah, the Google.
Stephen E Arnold, January 25, 2020
Abandoned Books: Yep, Analytics to the Rescue
January 6, 2020
DarkCyber noted “The Most ‘Abandoned’ Books on GoodReads.” The idea is that by using available data, a list of books people could not finish reading can be generated. Disclosure: I will try free or $1.99 books on my Kindle and bail out if the content does not make me quiver with excitement.
The research, which is presented in academic finery, reports that the author of Harry Potter’s adventures churned out a book few people could finish. The title? The Casual Vacancy by J.K. Rowling. I was unaware of the book, but I will wager that the author is happy enough with the advance and any royalty checks which clear the bank. Success is not completion; success is money, I assume.
I want to direct your attention, gentle reader, to the explanation of the methodology used to award this singular honor to J.K. Rowling, who is probably pleased as punch with the bank interaction referenced in the preceding paragraph.
Several points merit brief, very brief comment:
- Bayesian. A go-to method. Works reasonably well; guessing has its benefits. A sketch of the general idea appears after this list.
- Data sets. Not exactly comprehensive. Amazon? What about the Kindle customer data, including time to abandonment, page of abandonment, etc.? Library of Congress? Any data to share? Top 20 library systems in the US? Got some numbers; for example, number of copies in circulation?
- Communication. The write up is a good example of why some big-time thinkers ignore the inputs of certain analysts.
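The sketch promised above: a minimal example of Bayesian smoothing of abandonment rates, so that a book with three abandons out of four ratings does not outrank one with 300 out of 1,000. The Beta prior parameters and the sample counts are assumptions for illustration; the paper’s actual model may differ.

```python
# Bayesian (Beta-Binomial) smoothing of "abandonment rate" estimates.
# The prior Beta(alpha, beta) pulls small samples toward a baseline rate.

ALPHA, BETA = 2.0, 18.0   # assumed prior: roughly a 10% baseline abandonment rate

def smoothed_abandon_rate(abandons: int, total_readers: int) -> float:
    """Posterior mean of the abandonment probability."""
    completions = total_readers - abandons
    return (abandons + ALPHA) / (abandons + completions + ALPHA + BETA)

books = [
    ("Obscure title", 3, 4),            # tiny sample, raw rate 75%
    ("The Casual Vacancy", 300, 1000),  # made-up counts, raw rate 30%
]
for title, abandons, total in books:
    print(f"{title}: raw {abandons/total:.0%}, "
          f"smoothed {smoothed_abandon_rate(abandons, total):.0%}")
```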
To sum up, perhaps The Casual Vacancy will make a great gift when offered by Hamilton Books? A coffee table book, perhaps?
Stephen E Arnold, January 6, 2020
Megaputer Spans Text Analysis Disciplines
January 6, 2020
What exactly do we mean by “text analysis”? That depends entirely on the context. Megaputer shares a useful list of the most popular types in its post, “What’s in a Text Analysis Tool?” The introduction explains:
“If you ask five different people, ‘What does a Text Analysis tool do?’, it is very likely you will get five different responses. The term Text Analysis is used to cover a broad range of tasks that include identifying important information in text: from a low, structural level to more complicated, high-level concepts. Included in this very broad category are also tools that convert audio to text and perform Optical Character Recognition (OCR); however, the focus of these tools is on the input, rather than the core tasks of text analysis. Text Analysis tools not only perform different tasks, but they are also targeted to different user bases. For example, the needs of a researcher studying the reactions of people on Twitter during election debates may require different Text Analysis tasks than those of a healthcare specialist creating a model for the prediction of sepsis in medical records. Additionally, some of these tools require the user to have knowledge of a programming language like Python or Java, whereas other platforms offer a Graphical User Interface.”
The list begins with two of the basics—Part-of-Speech (POS) Taggers and Syntactic Parsing. These tasks usually underpin more complex analysis. Concordance or Keyword tools create alphabetical lists of a text’s words and put them into context. Text Annotation Tools, either manual or automated, tag parts of a text according to a designated schema or categorization model, while Entity Recognition Tools often use knowledge graphs to identify people, organizations, and locations. Topic Identification and Modeling Tools derive emerging themes or high-level subjects using text-clustering methods. Sentiment Analysis Tools diagnose positive and negative sentiments, some with more refinement than others. Query Search Tools let users search text for a word or a phrase, while Summarization Tools pick out and present key points from lengthy texts (provided they are well organized). See the article for more on any of these categories.
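Two of the foundational tasks named above, POS tagging and entity recognition, take only a few lines with an open source library such as spaCy. This is a generic illustration, not Megaputer’s PolyAnalyst, and it assumes the small English model has been installed.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Megaputer, based in Bloomington, Indiana, sells text analysis software.")

# Part-of-speech tags for each token.
print([(token.text, token.pos_) for token in doc])

# Named entities: people, organizations, locations, and so on.
print([(ent.text, ent.label_) for ent in doc.ents])
```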
The post concludes by noting that most text analysis platforms offer one or two of the above functions, but that users often require more than that. This is where the article shows its PR roots—Megaputer, as it happens, offers just such an all-in-one platform called PolyAnalyst. Still, the write-up is a handy rundown of some different text-analysis tasks.
Based in Bloomington, Indiana, Megaputer launched in 1997. The company grew out of AI research at Moscow State University and Bauman Technical University. A few of its many prominent clients include HP, Johnson & Johnson, American Express, and several US government offices.
Cynthia Murrell, January 02, 2020
Visual Data Exploration via Natural Language
November 4, 2019
New York University announced a natural language interface for data visualization. You can read the rah rah from the university here. The main idea is that a person can use simple English to create complex machine learning based visualizations. Sounds like the answer to a Wall Street analyst’s prayers.
The university reported:
A team at the NYU Tandon School of Engineering’s Visualization and Data Analytics (VIDA) lab, led by Claudio Silva, professor in the department of computer science and engineering, developed a framework called VisFlow, by which those who may not be experts in machine learning can create highly flexible data visualizations from almost any data. Furthermore, the team made it easier and more intuitive to edit these models by developing an extension of VisFlow called FlowSense, which allows users to synthesize data exploration pipelines through a natural language interface.
You can download (as of November 3, 2019, but no promises the document will be online after this date) “FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System.”
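To illustrate the general idea of a natural language interface to data visualization (and emphatically not the VIDA lab’s FlowSense implementation), here is a toy sketch that maps one sentence pattern onto a pandas/matplotlib pipeline. The command grammar, sample data, and column names are invented for the example.

```python
# Toy natural-language-to-chart mapper; FlowSense itself is far more capable.
import re
import pandas as pd
import matplotlib.pyplot as plt

def plot_from_command(df: pd.DataFrame, command: str):
    """Handle one pattern: 'show average <measure> by <dimension>'."""
    match = re.match(r"show average (\w+) by (\w+)", command.lower())
    if not match:
        raise ValueError("Command not understood by this toy parser.")
    measure, dimension = match.groups()
    df.groupby(dimension)[measure].mean().plot(kind="bar")
    plt.ylabel(f"average {measure}")
    plt.show()

# Invented sample data and an invented command for illustration.
data = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "sales": [100, 140, 90, 130],
})
plot_from_command(data, "show average sales by month")
```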
DarkCyber wants to point out that talking to a computer to get information continues to be of interest to many researchers. Will this innovation put human analysts out of their jobs?
Maybe not tomorrow but in the future. Absolutely. And what will those newly-unemployed people do for money?
Interesting question and one some may find difficult to consider at this time.
Stephen E Arnold, November 4, 2019
Tools and Tips for Google Analytics Implementations
September 16, 2019
Here is a handy resource to bookmark for anyone with Google Analytics in their future. Hacking Analytics describes “The Complexity of Implementing Google Analytics.” Writer and solution architect/data manager Julien Kervizic explains:
“There is more than just placing a small snippet on a website to implement Google analytics. There are different integration patterns in order to capture the data into Google Analytics, and each integration is subject to a lot of pitfalls and potential regressions needed to guard against. There are also question as to whether or how to use the different APIs provided by GA.”
Kervizic begins by detailing three primary integration patterns: scraping a website, pushing events into a JavaScript data layer, and tapping into structured data. Next are several pitfalls one might run into and ways to counter each. See the write-up for those details.
Of course, your tracking setup is futile if it is not maintained. We learn about automated tests and monitoring tools to help with this step. Last but not least are Google Analytics APIs; Kervizic writes:
“Implementing Google analytics, sometimes requires integrating with Google Analytics APIs, be it for reporting purpose, to push some backend data, or to provide cost or product information. Google Analytics has 3 main APIs for these purposes.”
These are the three main APIs: the Reporting API, augmented with the Dimensions & Metrics Explorer for checking field names; the Measurement Protocol, with its Hit Builder tool for setting up requests; and the Management API for automating data imports, managing audiences, and uploading cost data from third-party ad providers.
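As a small illustration of the second item, here is a minimal sketch of sending a hit server-side via the Universal Analytics Measurement Protocol. The tracking ID and client ID are placeholders, and a production integration would follow Kervizic’s cautions about validation and monitoring.

```python
# Send a pageview hit to Universal Analytics via the Measurement Protocol (v1).
import requests

payload = {
    "v": "1",                  # protocol version
    "tid": "UA-XXXXXXX-1",     # placeholder tracking ID
    "cid": "555",              # placeholder anonymous client ID
    "t": "pageview",           # hit type
    "dp": "/server-side-test", # document path
}

# /debug/collect validates the hit without recording it; /collect records it.
response = requests.post("https://www.google-analytics.com/debug/collect", data=payload)
print(response.status_code, response.text)
```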
Cynthia Murrell, September 16, 2019