Map Economics: Useful Content and One Major Omission

February 13, 2020

DarkCyber spotted a paper called “The Economics of Maps.” The authors have presented some extremely useful and interesting information about depicting the real world.

One of the most useful aspects of the article is the list of companies providing different types of mapping services and data. The list of firms in this business includes such providers, vendors, and technology companies as:

Airbus

Farmers Edge

Mapbox

Pitney Bowes

There are some significant omissions; for example, the category of geo-analytics for law enforcement and intelligence applications, which includes the low profile Geogence and investigative tools like those available from Verint.

Worth reading and tucking into one’s intelligence folder in our opinion.

Stephen E Arnold, February 13, 2020

Easy Facial Recognition

February 11, 2020

DarkCyber spotted a Twitter thread. You can view it here (verified on February 8, 2020). The main point is that, using open source software, an individual was able to obtain (scrape; that is, copy) images from publicly accessible services. Then the images were “processed.” The idea was to identify a person from an image. Net net: People can object to facial recognition, but once a technology migrates from “little known” to publicly available, there may be difficulty putting the tech cat back in the bag.
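The thread does not name the exact tools, but the general workflow is easy to sketch. The following is a minimal illustration using the open source face_recognition library; the file names and the folder of scraped images are hypothetical placeholders, not the individual’s actual setup.

```python
# Minimal sketch of matching a probe photo against previously scraped images.
# Assumes the open source face_recognition library (pip install face_recognition);
# the file paths below are hypothetical placeholders.
import glob
import face_recognition

# Encode the face in the photo to be identified.
probe_image = face_recognition.load_image_file("probe.jpg")
probe_encodings = face_recognition.face_encodings(probe_image)
if not probe_encodings:
    raise SystemExit("No face found in the probe image")
probe_encoding = probe_encodings[0]

# Compare against every image scraped from publicly accessible services.
for path in glob.glob("scraped_images/*.jpg"):
    candidate_image = face_recognition.load_image_file(path)
    for encoding in face_recognition.face_encodings(candidate_image):
        # compare_faces returns [True] when two encodings fall within tolerance.
        if face_recognition.compare_faces([probe_encoding], encoding)[0]:
            print(f"Possible match: {path}")
```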

Stephen E Arnold, February 11, 2020

Math Resources

January 27, 2020

One of the DarkCyber team spotted a list of available math resources. Some cost money; others are free. Math Vault lists courses, platforms, tools, and question-answering sites. Some are relatively mainstream, like Wolfram Alpha; others are less well publicized, like ProofWiki. You can find the listing at this link.

Kenny Toth, January 26, 2020

Google and Data: Doing Stuff Without Data?

January 25, 2020

The Verge has been one of the foot soldiers carrying a pointy stick toward the Google. A few days ago, Google mobilized its desktop search results. The idea was to make desktop results look like mobile results; that is, to make it virtually impossible to determine where a link came from, who paid for it, and whether it was tied to a finger tap or an honest-to-goodness thumb-typed word or phrase.

The Verge noticed the difference because its experts looked at a page of results on a tiny display device and then on a bigger device and compared the two. “Google’s Ads Just Look Like Search Results Now,” published on January 23, 2020, stated:

In what appears to be something of a purposeful dark pattern, the only thing differentiating ads and search results is a small black-and-white “Ad” icon next to the former.

Yikes, a dark pattern. Tricking users. Changing to match mobile.

A day later, The Verge reported that “Google is backtracking on its controversial desktop search results redesign.” The write up stated:

The company says it will experiment with favicon placement.

But the point is not the Verge’s useful coverage of the Google shift. For DarkCyber, the new interface illustrates that the baloney about Google using data to determine its actions, the importance of A/B testing, and the overall brilliance of Googlers is just that: baloney. The GOOG does what it wants.

If Google’s “data” cannot inform the company that an interface change will irritate outfits like the Verge, users, and denizens of the Twitter thing — maybe the company’s data dependence is a shibboleth?

If Google cannot interpret A/B data in a way that avoids backlash and crawfishing, maybe Google’s data skills are not what the PR machine says?

DarkCyber thought experimenting and analysis came first at the Google. It seems that these steps come after guessing. Ah, the Google.

Stephen E Arnold, January 25, 2020

Abandoned Books: Yep, Analytics to the Rescue

January 6, 2020

DarkCyber noted “The Most ‘Abandoned’ Books on GoodReads.” The idea is that by using available data, a list of books people could not finish reading can be generated. Disclosure: I will try free or $1.99 books on my Kindle and bail out if the content does not make me quiver with excitement.

The research, which is presented in academic finery, reports that the author of Harry Potter’s adventures churned out a book few people could finish. The title? The Casual Vacancy by J.K. Rowling. I was unaware of the book, but I will wager that the author is happy enough with the advance and any royalty checks that clear the bank. Success is not completion; success is money, I assume.

I want to direct your attention, gentle reader, to the explanation of the methodology used to award this singular honor to J.K. Rowling, who is probably pleased as punch with the bank interaction referenced in the preceding paragraph.

Several points merit brief, very brief comment:

  • Bayesian. A go-to method. Works reasonably well. Guessing has its benefits. (A generic sketch of the approach follows this list.)
  • Data sets. Not exactly comprehensive. Amazon? What about the Kindle customer data, including time to abandonment, page of abandonment, etc.? Library of Congress? Any data to share? Top 20 library systems in the US? Got some numbers; for example, number of copies in circulation?
  • Communication. The write up is a good example of why some big-time thinkers ignore the inputs of certain analysts.
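As a rough illustration of why a Bayesian treatment is the go-to method here, consider a generic Beta-Binomial estimate of an abandonment rate. This is not the study’s actual model; the prior values and counts below are invented to show how small samples get pulled toward an overall average instead of producing extreme raw ratios.

```python
# Generic Beta-Binomial sketch of estimating an abandonment rate.
# This is NOT the cited study's model; prior values and counts are invented.
def abandonment_estimate(abandoned, started, prior_abandoned=2.0, prior_started=20.0):
    """Posterior mean of the abandonment rate under a Beta prior.

    The prior behaves like `prior_abandoned` abandonments out of `prior_started`
    starts, so a book with only a handful of readers is shrunk toward the prior
    mean (0.10 here) rather than receiving an extreme raw ratio.
    """
    return (abandoned + prior_abandoned) / (started + prior_started)

# A book abandoned by 3 of 4 readers versus one abandoned by 300 of 400:
print(abandonment_estimate(3, 4))      # ~0.21, heavily shrunk toward the prior
print(abandonment_estimate(300, 400))  # ~0.72, the data dominate the prior
```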

To sum up, perhaps The Casual Vacancy may make a great gift when offered by Hamilton Books? A coffee table book perhaps?

Stephen E Arnold, January 6, 2020

Megaputer Spans Text Analysis Disciplines

January 6, 2020

What exactly do we mean by “text analysis”? That depends entirely on the context. Megaputer shares a useful list of the most popular types in its post, “What’s in a Text Analysis Tool?” The introduction explains:

“If you ask five different people, ‘What does a Text Analysis tool do?’, it is very likely you will get five different responses. The term Text Analysis is used to cover a broad range of tasks that include identifying important information in text: from a low, structural level to more complicated, high-level concepts. Included in this very broad category are also tools that convert audio to text and perform Optical Character Recognition (OCR); however, the focus of these tools is on the input, rather than the core tasks of text analysis. Text Analysis tools not only perform different tasks, but they are also targeted to different user bases. For example, the needs of a researcher studying the reactions of people on Twitter during election debates may require different Text Analysis tasks than those of a healthcare specialist creating a model for the prediction of sepsis in medical records. Additionally, some of these tools require the user to have knowledge of a programming language like Python or Java, whereas other platforms offer a Graphical User Interface.”

The list begins with two of the basics—Part-of-Speech (POS) Taggers and Syntactic Parsing. These tasks usually underpin more complex analysis. Concordance or Keyword tools create alphabetical lists of a text’s words and put them into context. Text Annotation Tools, either manual or automated, tag parts of a text according to a designated schema or categorization model, while Entity Recognition Tools often use knowledge graphs to identify people, organizations, and locations. Topic Identification and Modeling Tools derive emerging themes or high-level subjects using text-clustering methods. Sentiment Analysis Tools diagnose positive and negative sentiments, some with more refinement than others. Query Search Tools let users search text for a word or a phrase, while Summarization Tools pick out and present key points from lengthy texts (provided they are well organized.) See the article for more on any of these categories.
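For a concrete feel of two of the basics, POS tagging and entity recognition, here is a minimal sketch using the open source spaCy library. This is not Megaputer’s PolyAnalyst; the sample sentence and model name are simply common defaults chosen for illustration.

```python
# Minimal sketch of POS tagging and entity recognition with the open source
# spaCy library (not a Megaputer product).
# Setup: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Megaputer, based in Bloomington, Indiana, launched PolyAnalyst in 1997.")

# Part-of-speech tag for each token.
for token in doc:
    print(token.text, token.pos_)

# Named entities: organizations, locations, dates, and so on.
for ent in doc.ents:
    print(ent.text, ent.label_)
```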

The post concludes by noting that most text analysis platforms offer one or two of the above functions, but that users often require more than that. This is where the article shows its PR roots—Megaputer, as it happens, offers just such an all-in-one platform called PolyAnalyst. Still, the write-up is a handy rundown of some different text-analysis tasks.

Based in Bloomington, Indiana, Megaputer launched in 1997. The company grew out of AI research from the Moscow State University and Bauman Technical University. Just a few of their many prominent clients include HP, Johnson & Johnson, American Express, and several US government offices.

Cynthia Murrell, January 02, 2020

Visual Data Exploration via Natural Language

November 4, 2019

New York University announced a natural language interface for data visualization. You can read the rah rah from the university here. The main idea is that a person can use simple English to create complex machine learning based visualizations. Sounds like the answer to a Wall Street analyst’s prayers.

The university reported:

A team at the NYU Tandon School of Engineering’s Visualization and Data Analytics (VIDA) lab, led by Claudio Silva, professor in the department of computer science and engineering, developed a framework called VisFlow, by which those who may not be experts in machine learning can create highly flexible data visualizations from almost any data. Furthermore, the team made it easier and more intuitive to edit these models by developing an extension of VisFlow called FlowSense, which allows users to synthesize data exploration pipelines through a natural language interface.

You can download (as of November 3, 2019, but no promises the document will be online after this date) “FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System.”
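To give a feel for the general idea, not the FlowSense implementation itself, here is a toy sketch that maps a simple English request onto a chart. The data frame, the column names, and the “show X by Y” grammar are all invented for illustration.

```python
# Toy sketch of a natural language interface for visualization.
# This is NOT FlowSense; the data, columns, and grammar are invented.
import re
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "sales": [120, 80, 95, 130],
})

def plot_from_request(request: str, data: pd.DataFrame) -> None:
    """Handle requests of the form 'show <measure> by <dimension>'."""
    match = re.match(r"show (\w+) by (\w+)", request.lower())
    if match is None:
        raise ValueError("Request not understood")
    measure, dimension = match.groups()
    data.groupby(dimension)[measure].sum().plot(kind="bar")
    plt.title(f"{measure} by {dimension}")
    plt.show()

plot_from_request("show sales by region", df)
```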

DarkCyber wants to point out that talking to a computer to get information continues to be of interest to many researchers. Will this innovation put human analysts out of their jobs?

Maybe not tomorrow but in the future. Absolutely. And what will those newly-unemployed people do for money?

Interesting question and one some may find difficult to consider at this time.

Stephen E Arnold, November 4, 2019


Tools and Tips for Google Analytics Implementations

September 16, 2019

Here is a handy resource to bookmark for anyone with Google Analytics in their future. Hacking Analytics describes “The Complexity of Implementing Google Analytics.” Writer and solution architect/data manager Julien Kervizic explains:

“There is more than just placing a small snippet on a website to implement Google analytics. There are different integration patterns in order to capture the data into Google Analytics, and each integration is subject to a lot of pitfalls and potential regressions needed to guard against. There are also question as to whether or how to use the different APIs provided by GA.”

Kervizic begins by detailing three primary integration patterns: scraping a website, pushing events into a JavaScript data layer, and tapping into structured data. Next are several pitfalls one might run into and ways to counter each. See the write-up for those details.

Of course, your tracking setup is futile if it is not maintained. We learn about automated tests and monitoring tools to help with this step. Last but not least are Google Analytics APIs; Kervizic writes:

“Implementing Google analytics, sometimes requires integrating with Google Analytics APIs, be it for reporting purpose, to push some backend data, or to provide cost or product information. Google Analytics has 3 main APIs for these purposes.”

These are the three main APIs: the Reporting API, augmented with the Dimensions & Metrics Explorer for checking field names; the Measurement Protocol, with its Hit Builder tool for setting up requests; and the Management API for automating data imports, managing audiences, and uploading cost information from third-party ad providers.
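As one concrete example of the above, a Measurement Protocol hit is just an HTTP request. The sketch below assumes the Universal Analytics era endpoint the article discusses and sends a single pageview with Python’s requests library; the tracking ID and client ID are placeholders.

```python
# Minimal sketch of a Google Analytics Measurement Protocol pageview hit
# (Universal Analytics style). The tracking ID and client ID are placeholders.
import requests

payload = {
    "v": "1",               # protocol version
    "tid": "UA-XXXXXX-Y",   # placeholder property / tracking ID
    "cid": "555",           # anonymous client ID
    "t": "pageview",        # hit type
    "dp": "/example-page",  # document path
    "dt": "Example Page",   # document title
}

response = requests.post("https://www.google-analytics.com/collect", data=payload)
print(response.status_code)
# Note: the endpoint returns 200 even for malformed hits, so during development
# the /debug/collect endpoint is the safer way to validate a payload.
```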

Cynthia Murrell, September 16, 2019

Graph Theory: Moving to the Mainstream

August 21, 2019

Physics helps engineers master their craft and binary is the start of all basic code, but graph theory is the key to understanding data science. Few people understand the power behind data science, but it powers Web sites they visit every day: eBay, Facebook, and the all-powerful Google. Graph theory is part of mathematics and allows data to be presented in a clear, concise manner. Analytics India shares a list of graph theory software that will make any data scientist’s job easier: “Top 10 Graph Theory Software.” The article explains that:

“Apart from knowing graph theory, it is necessary that one is not only able to create graphs but understand and analyze them. Graph theory software makes this job much easier. There are plenty of tools available to assist a detailed analysis. Here we list down the top 10 software for graph theory popular among the tech folks. They are presented in a random order and are available on major operating systems like Windows, MacOS and Linux.”

Among the recommended software are TikZ and PGF, used in scientific research to create vector-style graphs. Gephi is free to download and is best used for network visualization and data exploration. NetworkX is a reliable Python library for graphs and networks. LaTeXDraw is for document preparation and typesetting with a graphics editor; it is built on Java. One popular open source tool for mathematics projects is Sage, which is used for outlining graphs and hypergraphs.

MATLAB requires a subscription, but it is an extremely powerful tool for creating graph theory visualizations and has a bioinformatics toolbox packed with more ways to explore graph theory functions. Graphic designers favor Inkscape for its ease of use and ability to create many different diagrams. GraphViz is famous for its various graphical options for graph theory and also has customizable options. NodeXL is a Microsoft Excel template used exclusively for network graphs; one only has to enter a network edge list and a graph is generated. Finally, MetaPost is used as a programming language and an interpreter program. It can use macros to make graph theory features.

Most of these graph theory tools are available as free downloads, with upgraded subscription services.
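For readers who want to try one of these tools right away, here is a minimal NetworkX sketch; the graph itself is made up purely for illustration.

```python
# Minimal NetworkX sketch: build a small graph and compute a few basics.
# The edges are invented for illustration only.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

print(nx.shortest_path(G, "A", "D"))  # e.g. ['A', 'C', 'D']
print(nx.degree_centrality(G))        # normalized degree for each node
print(nx.is_connected(G))             # True
```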

Whitney Grace, August 21, 2019

Hadoop Fail: A Warning Signal in Big Data Fantasy Land?

August 11, 2019

DarkCyber notices when high profile companies talk about data federation, data lakes, and intelligent federation of real time data with historical data. Examples include Amazon and Anduril to name two companies offering this type of data capability.

“What Happened to Hadoop and Where Do We Go from Here?” does not directly discuss the data management systems in Amazon and Anduril, but the points the author highlights may be germane to thinking about what is possible and what remains just out of reach when it comes to processing the rarely defined world of “Big Data.”

The write up focuses on Hadoop, the elephant logo thing. Three issues are identified:

  1. Data provenance was tough to maintain and therefore tough to determine. This is a variation on the GIGO theme (garbage in, garbage out).
  2. Creating a data lake is complicated. With talent shortages, the problem of complexity may hardwire failure.
  3. The big pool of data becomes the focus. That’s okay, but the application to solve the problem is often lost.

Why is a discussion of Hadoop relevant to Amazon and Anduril? The reason is that despite the weaknesses of these systems, both companies are addressing the “Hadoop problem” but in different ways.

These two firms, therefore, may be significant because of their approach and their different angles of attacks.

Amazon is providing a platform which, in the hands of a skilled Amazon technologist, can deliver a cohesive data environment. Furthermore, the digital craftsman can build a solution that works. It may be expensive and possibly flakey, but it mostly works.

Anduril, on the other hand, delivers the federation in a box. Anduril is a hardware product, smart software, and applications. License, deploy, and use.

Despite the different angles of attack, both companies are making headway in the data federation, data lake, and real time analytics sector.

The issue is not what will happen to Hadoop; the issue is how quickly competitors will respond to these different ways of dealing with Big Data.

Stephen E Arnold, August 11, 2019
