Easy Monitoring for Web Site Deltas
February 9, 2023
We have been monitoring selected Clear Web pages for a research project. We looked at a number of solutions and settled on VisualPing.io. The system is easy to use. Enter the url of the Web page for which you want a notification of a delta (change). Enter an email address, and the system will send you a notification when the page changes. The service is free if you want to monitor five Web pages per day. The company has a pricing FAQ which explains the cost of additional notifications. The Visual Ping service assumes a user wants to monitor the same Web site or sites on a continuous basis. Killing monitoring for a single site requires a bit of effort. Our approach was to have a different team member sign up and enter the revised monitor list. There may be an easier way, but without an explicit method, a direct solution worked for us.
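Visual Ping does not publish its internals, but the core delta check behind any page monitor can be sketched in a few lines: fetch the page on a schedule, fingerprint it, and compare fingerprints. A minimal sketch (the function names are ours, not Visual Ping’s):

```python
import hashlib

def page_digest(html: str) -> str:
    """Return a stable fingerprint of a page's content."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def has_delta(previous_snapshot: str, current_snapshot: str) -> bool:
    """True when the page content changed between polls."""
    return page_digest(previous_snapshot) != page_digest(current_snapshot)
```

A real monitor would also normalize away timestamps and ad markup before hashing, or it would alert on every poll.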
Stephen E Arnold, February 9, 2023
Consumer Image Manipulation: Deep Fakes or Yes, That Is Granny!
September 7, 2022
I find deep fake services interesting. Good actors can create clever TikTok and YouTube videos. Bad actors can whip up a fake video résumé and chase a work-from-home job. There are other uses as well; for example, a zippy video professional can create a deep fake of a “star” who may be dead or just stubborn and generate a scene. Magic and maybe cheaper.
I read “Use This Free Tool to Restore Faces in Old Family Photos.” The main idea is that a crappy old photo with blurry faces can be made almost like “new.” The write up says:
This online tool—called GFPGAN—first made it onto our radar when it was featured in the August 28 edition of the (excellent) Recomendo newsletter, specifically, a post by Kevin Kelly. In it, he says that he uses this free program to restore his own old family photos, noting that it focuses solely on the faces of those pictured, and “works pretty well, sometimes perfectly, in color and black and white.”
The service has another trick amidst its zeros and ones:
According to the ByteXD post, in addition to fixing or restoring faces in old photos, you can also use GFPGAN to increase the resolution of the entire image. Plus, because the tool works using artificial intelligence, it can also come in handy if you need to fix AI art portraits. ByteXD provides instructions for both upscaling and improving the quality of AI art portraits, for people interested in those features.
Will it work on passport photos and other types of interesting documents? We will have to wait until the bad actors explore and innovate.
Stephen E Arnold, September 8, 2022
Why Investigative Software Is Expensive
December 3, 2020
In a forthcoming interview, I explore industrial-strength policeware and intelware with a person who was Intelligence Officer of the Year. In that interview, which will appear in a few weeks, the question of the cost of policeware and intelware is addressed. Systems like those from IBM’s i2, Palantir Technologies, Verint, and similar vendors are pricey. Not only is there a six or seven figure license fee; the client also has to pay for training, often months of instruction. Plus, these i2-type systems require systems and engineering support. One tip off to the fully loaded costs is the phrase “forward deployed engineer.” The implicit message is that these i2-type systems require an outside expert to keep the digital plumbing humming along. But who is responsible for the data? The user. If the user fumbles the data bundle, bad outputs are indeed possible.
What’s the big deal? Why not download Maltego? Why not use one of the $100 to $3,000 solutions from jazzy startups founded by former intelligence officers? These are “good enough,” some may assert. One facet of the cost of industrial strength systems available to qualified licensees is a little appreciated function: dealing with data.
“Keep Data Consistency During Database Migration” does a good job of explaining what has to happen in a reliable, consistent way when one of the multiple data sources contributes “new” or “fresh” data to an intelware or policeware system. The number of companies providing middleware to perform these functions is growing. Why?
Most companies wanting to get into the knowledge extraction business have to deal with the issues identified in the article. Most organizations do not handle these tasks elegantly, rapidly, or accurately.
Injecting incorrect, stale, inaccurate data into a knowledge centric process like those in industrial strength policeware causes those systems to output unreliable results.
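The article’s point about consistency can be illustrated with a toy check: after copying records from a source to a target store, compare row counts and a content checksum before trusting the target. This is our own minimal sketch, not code from any policeware vendor:

```python
import hashlib
import sqlite3

def table_fingerprint(conn: sqlite3.Connection, table: str) -> tuple[int, str]:
    """Row count plus an order-independent checksum of a table's rows."""
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):  # sort so insert order is irrelevant
        digest.update(row.encode("utf-8"))
    return len(rows), digest.hexdigest()

def migration_is_consistent(source: sqlite3.Connection,
                            target: sqlite3.Connection,
                            table: str) -> bool:
    """True only when the target holds exactly the source's data."""
    return table_fingerprint(source, table) == table_fingerprint(target, table)
```

Production middleware does far more (type coercion, deduplication, change data capture), but even this crude check catches a dropped or duplicated record before it pollutes downstream analysis.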
What’s the consequence?
Investigators and analysts learn to ignore certain outputs.
Why? The consequences of bad outputs can be more serious than a flawed diagram whipped up by an MBA who worries only about the impression he or she makes on a group of prospects attending a Zoom meeting.
Data consistency is a big deal.
Stephen E Arnold, December 2, 2020
Update for TemaTres, a Taxonomy Tool
March 25, 2020
In order to create and maintain a Web site, database, or other information source, a powerful knowledge management application is needed. There are numerous proprietary knowledge management packages on the market, but the problem is often the price tag, and the solutions rarely work out of the box. Open source software is one way to save money and tailor a knowledge management application to your specifications. The question remains: which open source knowledge management software should you download?
One of the top knowledge management packages available via open source is TemaTres. TemaTres is described as a:
“Web application for management formal representations of knowledge, thesauri, taxonomies and multilingual vocabularies.”
TemaTres allows users to manage, publish, and share ontologies, taxonomies, thesauri, and glossaries. TemaTres includes numerous features that are designed for the best taxonomy development experience. Among these features are: MARC21 XML Schema, search function, keyword suggestions, user management, multilingual interface, scope notes, relationship visualizations, term reports, terminology mapping, unique code for each term, free terms control, vocabulary harmonization features, no limits on delimiters, integration into web tools, and more.
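The core structure a tool like TemaTres manages is a controlled vocabulary built from broader/narrower term relationships. As a rough illustration (plain Python, not TemaTres code, with invented example terms), a thesaurus can be modeled as a mapping from each term to its broader term, from which narrower terms and the path to the top term can be derived:

```python
# Each term maps to its broader term (None marks a top term).
THESAURUS = {
    "vehicles": None,
    "cars": "vehicles",
    "trucks": "vehicles",
    "sports cars": "cars",
}

def narrower_terms(term: str) -> list[str]:
    """All terms one level below the given term (NT in thesaurus lingo)."""
    return sorted(t for t, broader in THESAURUS.items() if broader == term)

def path_to_root(term: str) -> list[str]:
    """The chain of broader terms (BT) up to the top term."""
    path = [term]
    while THESAURUS[path[-1]] is not None:
        path.append(THESAURUS[path[-1]])
    return path
```

TemaTres layers editing, multilingual labels, scope notes, and export formats such as MARC21 XML on top of exactly this kind of BT/NT graph.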
TemaTres requires some programming knowledge to set up. Data governance is an important part of knowledge management, and TemaTres gives editors control over content. It is an underrated but valuable tool.
Whitney Grace, March 25, 2020
Into R? A List for You
May 12, 2019
Computerworld, which runs some pretty unusual stories, published “Great R Packages for Data Import, Wrangling and Visualization.” “Great” is an interesting word. In the lingo of Computerworld, a real journalist did some searching, talked to some people, and created a list. As it turns out, the effort is useful. Looking at the Computerworld table is quite a bit easier than trying to dig information out of assorted online sources. Plus, people are not too keen on the phone and email thing now.
The listing includes a mixture of different tools, software, and utilities. There are more than 80 listings. I wasn’t sure what to make of XML’s inclusion in the list, but the source is Computerworld, and I assume that the “real” journalist knows much more than I do.
Two observations:
- Earthworm lists without classification or alphabetization are less useful to me than listings which are sorted by tags and alphabetized within categories. Excel can perform this helpful trick.
- Some items in the earthworm list have links and others do not. Consistency, I suppose, is the hobgoblin of some types of intellectual work.
- An indication of which items are free and which are for fee would be useful too.
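The sorting complaint in the first item is easy to address once each entry carries a tag. A quick sketch (the package tags here are invented for illustration, not taken from the Computerworld table):

```python
tools = [
    ("ggplot2", "visualization"),
    ("dplyr", "wrangling"),
    ("readr", "import"),
    ("data.table", "wrangling"),
]

# Group by tag, then alphabetize within each category.
organized = sorted(tools, key=lambda entry: (entry[1], entry[0].lower()))
```

The same one-liner logic is what an Excel two-column sort performs.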
Despite these shortcomings, you may want to download the list and tuck it into your “Things I love about R” folder.
Stephen E Arnold, May 12, 2019
Machine Learning Frameworks: Why Not Just Use Amazon?
September 16, 2018
A colleague sent me a link to “The 10 Most Popular Machine Learning Frameworks Used by Data Scientists.” I found the write up interesting despite the author’s failure to define the word “popular” and the bound phrase “data scientists.” But few folks in an era of “real” journalism fool around with my quaint notions.
According to the write up, the data come from an outfit called Figure Eight. I don’t know the company, but I assume its professionals adhere to the basics of Statistics 101. You know, the boring stuff like sample size, objectivity of the sample, sample selection, data validity, etc. In our time of “real” news and “real” journalists, these annoying aspects of churning out data often get skipped, which makes it hard for an old geezer like me to have much confidence in the numbers. You know, like the 70 percent accuracy of some US facial recognition systems. Close enough for horseshoes, I suppose.
Here’s the list. My comments about each “learning framework” appear in italics after each “learning framework’s” name:
- Pandas — an open source, BSD-licensed library
- Numpy — a package for scientific computing with Python
- Scikit-learn — another BSD licensed collection of tools for data mining and data analysis
- Matplotlib — a Python 2D plotting library for graphics
- TensorFlow — an open source machine learning framework
- Keras — a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano
- Seaborn — a Python data visualization library based on matplotlib
- Pytorch & Torch
- AWS Deep Learning AMI — infrastructure and tools to accelerate deep learning in the cloud. Not to be annoying, but defining AMI as Amazon Machine Image might be useful to some
- Google Cloud ML Engine — neural-net-based ML service with a typically Googley line up of Googley services.
Stepping back, I noticed a handful of what I am sure are irrelevant points which are of little interest to a “real” journalist creating “real” news.
First, notice that the list is self-referential with Python love. Frameworks depend on other Python-loving frameworks. There’s nothing inherently bad about this self-referential approach to whipping up a list, and it makes the list a heck of a lot easier to create in the first place.
Second, the information about Amazon is slightly misleading. In my lecture in Washington, DC on September 7, I mentioned that Amazon’s approach to machine learning supports Apache MXNet and Gluon, TensorFlow, Microsoft Cognitive Toolkit, Caffe, Caffe2, Theano, Torch, PyTorch, Chainer, and Keras. I found this approach interesting, but of little interest to those creating a survey or developing an informed list about machine learning frameworks; for example, Amazon is executing a quite clever play. In bridge, I think the phrase “trump card” suggests what the Bezos momentum machine has cooked up. Notice the past tense because this Amazon stuff has been chugging along in at least one US government agency for about four, four and one half years.
Third, Google comes in dead last. What about IBM? What about Microsoft and its CNTK? Ah, another acronym, but I, as a non-real journalist, will reveal that this acronym means Microsoft Cognitive Toolkit. More information is available in Microsoft’s wonderful prose at this link. By the way, the Amazon machine learning spinning momentum thing supports CNTK. Imagine that? Right, I didn’t think so.
Net net: The machine learning framework list may benefit from a bit of refinement. On the other hand, just use Amazon and move down the road to a new type of smart software lock in. Want to know more? Write benkent2020 @ yahoo dot com and inquire about our for fee Amazon briefing about machine learning, real time data marketplaces, and a couple of other mostly off the radar activities. Have you seen Amazon’s facial recognition camera? It’s part of the Amazon machine learning initiative, and it has some interesting capabilities.
Stephen E Arnold, September 16, 2018
Useful AI Tools and Frameworks
July 6, 2018
We have found a useful resource: DZone shares “10 Open-Source Tools/Frameworks for Artificial Intelligence.” We do like open-source software. The write-up discusses the advantages offered by each entry in detail, so navigate there to compare and contrast the options. For example, regarding the popular TensorFlow, writer Somanath Veettil describes:
“TensorFlow is an open-source software library, which was originally developed by researchers and engineers working on the Google Brain Team. TensorFlow is for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow provides multiple APIs. The lowest level API — TensorFlow Core — provides you with complete programming control. The higher level APIs are built on top of TensorFlow Core. These higher level APIs are typically easier to learn and use than TensorFlow Core. In addition, the higher level APIs make repetitive tasks easier and more consistent between different users. A high-level API like tf.estimator helps you manage data sets, estimators, training, and inference. The central unit of data in TensorFlow is the tensor. A tensor consists of a set of primitive values shaped into an array of any number of dimensions. A tensor’s rank is its number of dimensions.”
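The closing point of the quoted passage, that a tensor’s rank is its number of dimensions, is easy to illustrate without installing TensorFlow. A plain-Python sketch using nested lists as stand-ins for tensors:

```python
def rank(tensor) -> int:
    """Number of dimensions: 0 for a scalar, 1 for a vector, 2 for a matrix."""
    if not isinstance(tensor, list):
        return 0  # a bare number is a rank-0 tensor (a scalar)
    return 1 + rank(tensor[0])  # each level of nesting adds one dimension

scalar = 3.0                        # rank 0
vector = [1.0, 2.0, 3.0]            # rank 1
matrix = [[1.0, 2.0], [3.0, 4.0]]   # rank 2
```

TensorFlow’s actual tensors are typed, GPU-aware arrays, but the rank concept is exactly this counting of nesting levels.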
The rest of Veettil’s entries are these: Apache SystemML, Caffe, Apache Mahout, OpenNN, Torch, Neuroph, Deeplearning4j (the “j” is for Java), Mycroft, and OpenCog. I note that several options employ a neural network, but approach that technology in different ways. It is nice to have so many choices for implementing AI; now the challenge is to determine which system is best for one’s particular needs. This list could help with that.
Cynthia Murrell, July 6, 2018
The Spirit of 1862 Seems to Live On
July 2, 2018
Years ago I learned about a Confederate spy who worked in a telegraph office used by General Henry Halleck and General US Grant. The Confederate spy allegedly “filtered” orders. This man-in-the-middle exploit took place in 1862. You can find some information about this incident at this link. The Verge dipped into history for its 2013 write up “How Lincoln Used the Telegraph Office to Spy on Citizens Long Before the NSA.” Information about the US Signals Corps and Bell Telephone / AT&T is abundant.
Why am I dipping into history?
The reason is that I read several articles similar to “8 AT&T Buildings That Are Central to NSA Spying.” The Intercept’s story, which struck me as a bit surprising, triggered this cascade of “wow, what a surprise” copycat articles.
Even though I live in rural Kentucky, the “spy hubs” did not strike me as news, a surprise, or different from systems and methods in use in many countries. Just as Cairo, Illinois, was important to General Grant, cities with large populations and substantial data flows are important today.
Stephen E Arnold, July 2, 2018
CyberOSINT: Next Generation Information Access Explains the Tech Behind the Facebook, GSR, Cambridge Analytica Matter
April 5, 2018
In 2015, I published CyberOSINT: Next Generation Information Access. This is a quick reminder that the profiles of the vendors who have created software systems and tools for law enforcement and intelligence professionals remain timely.
The 200-page book provides examples, screenshots, and explanations of the tools available to analyze social media information. The book is the most comprehensive rundown of the open source, commercial, and cloud based systems which can make sense of social media data, lawful intercept data, and general text and imagery content.
Companies described in this collection of “tools” include:
- Cyveillance (now LookingGlass)
- Decisive Analytics
- IBM i2 (Analysts Notebook)
- Geofeedia
- Leidos
- Palantir Gotham
- plus more than a dozen other developers of commercial and open source, high impact CyberOSINT tools.
The book is available for $49. Additional information is available on my Xenky.com Web site. You can buy the PDF book online at this link: gum.co/cyberosint.
Get the CyberOSINT monograph. It’s the standard reference for practical and effective analysis, text analytics, and next generation solutions.
Stephen E Arnold, April 5, 2018
Multi-purpose Search Tool Is Like Magic
March 2, 2018
The Internet has evolved from an entertaining gimmick for instantly accessing information into an indispensable tool for daily life. Search engines like Google and DuckDuckGo make searching the open Internet simple, but in closed systems like databases and storage silos, searching is still complicated. Usually, individual systems have their own out-of-the-box search engines, but their accuracy is so-so. Cloud computing complicates search even more. Instead of searching just one system, cloud computing requires search software that can handle multiple systems at once. The search technology is out there, but can it really perform as well as Google or even DuckDuckGo?
The Code Project wrote about a new, multi-faceted search tool in the post, “Multidatabase Text Search Tool.” Searching text in all files across many systems is one of the most complicated procedures for a search engine, especially if you want accuracy and curated results. That is what DBTextFinder was developed for:
DBTextFinder is a simple tool that helps you to perform a precise search in all the stored procedures, functions, triggers, packages and views code, or a selected subset of them, using regular expressions. Additionally, you can search for a given text in all the text fields of a selected set of tables, using regular expressions too. The application provides connections to MySQL, SQL Server and Oracle servers, and supports remote connections via WCF services. You can easily extend the list of available DBMS writing your own connectors without having to change the application code.
DBTextFinder appears to have it all. It is programmable, gets along well with other computer languages, and was designed to be user-friendly. What more could you ask for?
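DBTextFinder targets MySQL, SQL Server, and Oracle, but the underlying idea, running a regular expression over every text field of selected tables, can be sketched against a stdlib SQLite database. This is our own illustration, not DBTextFinder code:

```python
import re
import sqlite3

def search_text_columns(conn: sqlite3.Connection, table: str, pattern: str):
    """Yield (rowid, column, value) for every TEXT field matching the regex."""
    regex = re.compile(pattern)
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
    text_cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")
                 if row[2].upper() == "TEXT"]
    for col in text_cols:
        for rowid, value in conn.execute(f"SELECT rowid, {col} FROM {table}"):
            if value is not None and regex.search(value):
                yield rowid, col, value
```

A production tool adds per-DBMS connectors (and, as the quote notes, searches procedure and trigger source code too), but the scan-every-text-column loop is the heart of it.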
Whitney Grace, March 2, 2018