AI SLIDE: A Breakthrough or a Shaped Insight
March 4, 2020
DarkCyber noted an interesting, although sketchy, summary of a CPU and hash table approach to machine learning. “Deep Learning Rethink Overcomes Major Obstacle in AI Industry” suggests that Amazon and Google are barking up the wrong artificial intelligence method.
The innovation is the use of hash tables for deep learning. The idea is that one looks up an item, perfect for Intel CPUs. The “old” way relies on matrix mathematics, perfect for nVidia graphics chips. In fact, the solution is a search problem, a point in the write up which may annoy the Googlers; to wit:
“You don’t need to train all the neurons on every case,” Medini [a Rice wizard] said. “We thought, ‘If we only want to pick the neurons that are relevant, then it’s a search problem.’ So, algorithmically, the idea was to use locality-sensitive hashing to get away from matrix multiplication.”
This insight is important because, if it proves useful and can flip the opinions of those innovators with tens of thousands of GPUs generating heat, machine learning becomes less expensive. (How much does it cost to cool lots of GPUs doing math? Answer: A lot.)
The approach is dubbed SLIDE. The acronym is about as slick as relying on an Intel processor: Sub-Linear Deep Learning Engine. Too bad AMD. You have Linus, the YouTube star, as your cheerleader.
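The “search problem” framing in the Medini quote can be sketched in a few lines. What follows is a minimal illustration, not the Rice team’s code: it assumes signed random projections as the locality-sensitive hash, and every name in it (make_planes, build_tables, active_neurons) is hypothetical. Each neuron’s weight vector is hashed into buckets ahead of time; at run time the input is hashed the same way, and only the neurons that land in the same buckets are selected, so there is no full matrix multiplication over every neuron.

```python
import random
from collections import defaultdict


def make_planes(num_bits, dim, rng):
    """Draw num_bits random hyperplanes in dim-dimensional space."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """Signed random projection: one bit per hyperplane (which side vec is on)."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_tables(weights, num_tables, num_bits, rng):
    """Hash every neuron's weight vector into several independent hash tables."""
    dim = len(weights[0])
    tables = []
    for _ in range(num_tables):
        planes = make_planes(num_bits, dim, rng)
        buckets = defaultdict(list)
        for neuron_id, w in enumerate(weights):
            buckets[signature(w, planes)].append(neuron_id)
        tables.append((planes, buckets))
    return tables


def active_neurons(x, tables):
    """Look up the input in each table; the union of matching buckets is the
    small set of neurons worth computing for this input."""
    selected = set()
    for planes, buckets in tables:
        selected.update(buckets.get(signature(x, planes), []))
    return selected


# Toy layer: 200 neurons with 16 inputs each (sizes are arbitrary assumptions).
rng = random.Random(0)
dim, num_neurons = 16, 200
weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_neurons)]
tables = build_tables(weights, num_tables=4, num_bits=6, rng=rng)

x = list(weights[42])  # an input that points the same way as neuron 42
chosen = active_neurons(x, tables)
print(len(chosen), "of", num_neurons, "neurons selected")
```

The sketch also hints at the memory disadvantage in the list that follows: each extra hash table is another copy of the neuron index held in RAM.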
Advantages include:
- Cheaper
- Faster
- More efficient training.
Disadvantages revealed include:
- Memory is needed, lots of memory
- Unexpected cache thrashing (data are here, oops, data are not here, rinse and repeat)
- Access to Intel engineers reduced the inefficiency by 50 percent, but 50 percent of what? Misses, latency, halts, other?
The point of the announcement is to make clear that Amazon and Google are going about machine learning the wrong way. Does anyone at either firm care? Sure, and it will be fun for the researchers to check out their approach, look up what was investigated in the past, and figure out if it is better to switch than fight.
Net net: Seems interesting and definitely a rah rah for Intel. The write up makes no reference to IBM or other machine learning outfits. Marketing or shaped insight? It is too soon to answer this question definitively.
Stephen E Arnold, March 4, 2020
Cyber Security Marketing: About to Get Much Noisier in 2020
March 4, 2020
“Businesses at Risk for Cyber attack but Take Few Precautions” states:
Although businesses are increasingly at risk for cyber attacks on their mobile devices, many aren’t taking steps to protect smartphones and tablets.
Let’s assume this statement is accurate and based on verifiable data.
Given this assumption, what will 2020 mean for the hundreds of vendors selling cyber “early warning” intelligence, smart cyber moats, and tools to prevent phishing emails from snapping confidential information?
The answer is, “More marketing.”
Another possible answer is, “More insight into how some organizations respond to threats like ransomware and loss of data.”
Interesting disconnect which does not seem to slow venture firms’ appetites for smart cyber intelligence firms.
If the risk is high, why not take action? Perhaps priorities, cost, and complexity have an impact?
Stephen E Arnold, March 4, 2020
Adobe PDF: Maybe as Interesting As Flash?
March 4, 2020
Adobe Portable Document Format files flashed on DarkCyber’s radar in the mid 1980s. Adobe pitched the virtues of PDF to big publishing companies, and Stephen E Arnold worked at such an organization at this time. I was given the job of examining the early version of PDF, referred to by the code name Trapeze.
Trapeze artists fall to their death. Adobe Acrobat pulled off a spectacular trick, survived, became sort of open, and now seems to be a permanent part of the landscape decorated with the dumpsters burning Microsoft XPS Document Writer files.
A very good write up about the problems with PDF files is FilingDB’s “What’s So Hard about PDF Text Extraction?” The information in this write up makes explicit why PDFs are not easy to manipulate, analyze, and mine.
The write up provides the data needed to understand that when a vendor says, “We process the hidden content in PDF files”, those vendors do not explain how much and what is omitted, ignored, and unindexed.
People believe that specifying a filetype: command to Bing or Google delivers comprehensive content from PDF files. No way, sad to say. The same problem exists for any search or content processing vendor’s connectors for PDF files.
This is important when one is conducting mission critical data analysis, certain investigations, and other types of work in which “zero error” is the goal. Will the problem be remediated? Maybe, but I spotted it in the 1980s, and it persists today.
Stephen E Arnold, March 4, 2020
BA Insight: Interesting Spin for Enterprise Search
March 4, 2020
DarkCyber noted BA Insight’s blog post “Make Federation A Part Of Your Single Pane Of Glass.” What’s interesting in the write up are the assertions about enterprise search. Note that the BA Insight Web site includes search along with a number of other terms, including “knowledge,” “seekers,” “connectors”, “smart hub”, and “auto classification.”
Let’s look at the assertions which attracted DarkCyber’s attention.
- “Many have considered enterprise search to be too complex.” Interesting but a number of companies have failed because what people want a search system to deliver is inherently tricky. The Google Search Appliance was “easier” to implement than a local install of Entopia, for example, but the GSA failed because meeting information needs is difficult in many cases.
- Users want a “single pane of glass.” Plus “This improved unified view will dramatically improve the search experience.” The problem remains that information is not equal. Lawyers have to guard litigation information. Drug researchers have to keep pharma research under wraps. Human resources, what some millennials call “people” jobs, have to guard employee health data, salary information, data related to hiring distributions. The “single pane of glass” is an interesting assertion, but federation is more difficult to achieve than some believe… until the services and consulting fees are tallied.
- “And, you go live quickly, instantly adding value (you don’t wait six months for crawling to complete).” The speed with which a customer can go live depends upon a number of factors; for example, dealing with security levels, processing content so that it is “findable” by a user, and latencies which creep into distributed systems. Instantly is an appealing term like new. But instantly?
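Two of the difficulties named above, guarded content and results arriving from unlike systems, can be made concrete with a small sketch. This is an illustration only, not BA Insight’s method: the Hit record, the group names, and the max-score normalization are all assumptions made for the example. Each source scores hits on its own scale, so the scores are rescaled per source before merging, and any hit the user’s groups may not see is dropped before it reaches the “single pane.”

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str                # which repository returned the hit
    title: str
    score: float               # raw relevance score, on the source's own scale
    allowed_groups: frozenset  # groups permitted to see this item


def federated_search(results_by_source, user_groups):
    """Merge per-source result lists into one ranked list the user may see."""
    merged = []
    for hits in results_by_source.values():
        if not hits:
            continue
        top = max(h.score for h in hits)
        for h in hits:
            # Security trimming: skip anything the user's groups cannot see.
            if not (h.allowed_groups & user_groups):
                continue
            # Rescale so a 0-100 source and a 0-1 source become comparable.
            merged.append((h.score / top if top else 0.0, h))
    merged.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in merged]


# Hypothetical repositories with different score scales and access rules.
results = {
    "legal": [
        Hit("legal", "Litigation memo", 87.0, frozenset({"legal"})),
        Hit("legal", "Public policy FAQ", 40.0, frozenset({"legal", "staff"})),
    ],
    "wiki": [
        Hit("wiki", "Onboarding guide", 0.9, frozenset({"staff"})),
    ],
}

visible = federated_search(results, frozenset({"staff"}))
for hit in visible:
    print(hit.source, "|", hit.title)
```

Even this toy version has to make choices (how to normalize, when to trim) that a real deployment must settle for every connected repository, which is one reason “instantly” deserves a raised eyebrow.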
Several observations:
- BA Insight is a vendor of search and retrieval services for organizations. The company has worked very hard to explain that search is more than search.
- The benefits of the BA Insight approach read like a checklist of the types of problems which once plagued most enterprise search vendors from Autonomy to Verity. Unfortunately many of these challenges remain today.
- BA Insight has moved from its SharePoint centric approach to a wider range of platforms.
The marketing is interesting. Data backing the assertions would be helpful.
Stephen E Arnold, March 4, 2020
Enterprise Document Management: A Remarkable Point of View
March 3, 2020
DarkCyber spotted “What Is an Enterprise Document Management (EDM) System? How to Implement Full Document Control.” The write up is lengthy, running about 4,000 words. There are pictures like this one:
ECM is enterprise content management and in the middle is Enterprise Document Management which is abbreviated DMS, not EDM.
The idea is that documents have to be managed, and DarkCyber assumes that most organizations do not manage their content — regardless of its format — particularly well until the company is involved in a legal matter. Then document management becomes the responsibility of the lawyers.
In order to do any type of document or content management, employees have to follow the rules. The rules are the underlying foundation of the article. A company manufacturing interior panels for an automaker will have to have a product management system, a system to deal with drawings (paper and digital), supplier data, and other bits and pieces to make sure the “door cards” are produced.
The problem is that guidelines often do not translate into consistent employee behavior. One big reason is that the guidelines don’t fit into the work flows and the incentive schemes do not reward the time and effort required to make sure the information ends up in the “system.” Many professionals write something, text it, and move on. Enterprise systems typically do not track fine grained information very well.
Like the enterprise search crowd, the “document management” folks try to impose rules on workers. But workers who may be concerned about becoming redundant, a sick child, an angry boss, or any other perturbation missing from the consultant’s checklist ignore many information rules.
There is an association focused on records management. There are companies concerned with content management. There are vendors who focus on images, videos, audio, and tweets.
The myth that an EDM, ECM, or enterprise search system can create an affordable, non invasive, legally compliant, and effective way to deal with the digital fruit cake in organizations is worth lots of money.
The problem is that these systems, methods, guidelines, data lakes, federation technologies, smart software, etc. etc. don’t work.
The article does a good job of explaining what a consultant recommends. The information it presents provides fodder for the marketing animals who are going to help sell systems, training, and consulting.
The reality is that humans generate information and use a range of systems to produce content. Tweets about a missed shipment from a person’s mobile phone may be prohibited. Yeah, explain that to the person who got the order in the door and kept the commitment to the customer.
There are conferences, blogs, consulting firms, reports, and BrightPlanet videos about managing information.
The write up states:
There is no use documenting and managing poor workflows, processes, and documentation. To survive in business, you have to adapt, change and improve. That means continuously evaluating your business operations to identify shortfalls, areas for improvements, and strengths for continuous investment. Regular internal audits of your management systems will enable you to evaluate the effectiveness of your Enterprise Document Management solution.
Right. When these silver bullet, pie-in-the-sky solutions cost more than budgeted, employees quit using them, and triage costs threaten the survival of the company — call in the consultants.
Today’s systems do not work with the people actually doing information creation. As a result, most fail to deliver. Sound familiar? It should. You, gentle reader, will never follow the information rules unless you are specifically paid to follow them or given an ultimatum like “do this or get fired.”
Tweet that and let me know if you managed that information.
Stephen E Arnold, March 3, 2020
Import.io and Connotate: One Year Later
March 3, 2020
There has been an interesting shift in search and content processing. Import.io, founded in 2012, purchased Connotate. Before you ask, “Connotate what?”, let me say that Connotate was a content scraping and analysis firm. I paid some attention to Connotate when it acquired Fetch, an outfit with an honest-to-goodness Xoogler on its team. Fetch processed structured data and Connotate was mostly an unstructured data outfit. I asked a Connotate professional when the company would process Dark Web content, only to be told, “We can’t comment on that.” Secretive, right?
Connotate was founded in 2000 and required about $25 million in funding. The amount Import.io paid was not revealed in a source to which DarkCyber has access. Import.io has ingested about $38 million. DarkCyber assumes that the stakeholders are confident that 1 + 1 will equal 3 or more.
Import.io says:
We are funded by some of the greatest minds in technology.
The great minds include AME Cloud Ventures, Open Ocean, IP Group, and several others.
The company explains:
Starting from a simple web data extractor and evolving to an enterprise level solution for concurrently getting data that drives business, industry, and goodness.
What does the company provide? The answer is Web data integration: Identify, extract, prepare, integrate, and consume content from a user-provided list of URLs. To illustrate the depth of the company’s capabilities, Import.io defines “prepare” this way:
Integrate prepared data with a library of APIs to support seamless integration with internal business systems and workflows or deliver it to any data repository to develop robust data sets for advanced analytics capabilities.
The firm’s Web site makes it clear that it serves the online travel, retail, manufacturing, hedge fund, advisory services, data scientists, analysts, journalists, marketing and product, hospitality, and media producers. These are a mix of sectors and industries, and DarkCyber did not create the grammatically inconsistent listing.
Import.io offers videos which provide some information about one of its important innovations, “interactive extractors.” The idea is to convert script editing to point-and-click choices.
The company is growing. About a year ago, Import.io said that it experienced record sales growth. The company provided a link to its Help Center, but a number of panels contained neither information nor links to content.
The company offers a free version and a premium version. Price quotes are provided by the company.
Like Amplyfi and maybe ServiceMaster, Import.io is a company providing search and content processing with a 21st century business positioning. A new buzzword is needed to convey what Import.io, Amplyfi, and ServiceMaster are providing. DarkCyber believes that these companies are examples of where search and content processing has begun to coalesce.
The question is, “Is acquiring, indexing, and analyzing OSINT content a truck stop or a destination like Miami Beach?”
Worth monitoring the trajectory of the company.
Stephen E Arnold, March 3, 2020
Microsoft Azure: Search, Artificial Intelligence, and Some Mystical Magic
March 3, 2020
DarkCyber spotted “Microsoft Announcements on Azure Artificial Intelligence.” The article is a summary of assorted Microsoft Azure assertions. Note that the article did not offer any information about the semi-failure of Cortana and Windows 10 search to thrill their users. But Azure is different. Microsoft does Azure better than Windows 10 updates… sometimes.
There were several highlights in the article.
First, Azure has artificial intelligence. The approach is open, interoperable, workflow, and “easy adaptation.” Is this why certified Microsoft Azure professionals are buying new houses and fancier automobiles?
Second, Azure does machine learning. The idea is that there are agents, applications, a machine learning model engine, support for R, and an enterprise edition. DarkCyber does not know a single person running Azure to make life better, faster, and cheaper except Azure consultants. But the big assertion is that Azure’s ML “delivers a unified data science experience.” DarkCyber wonders, “Does this include Outlook attachments?”
Third, Azure has updated some of its “old” features. There’s nothing like constant improvement like the flow of Windows 10 updates, uninstalls, and reinstalls. Now Azure does better decision making. Sentiment analysis has more deep learning and natural language processing. The system can do image analysis, and it has some of that Cortana goodness which has been repositioned in Windows 10 because it was so darned wonderful.
Fourth, Azure does knowledge mining. Azure does cognitive search. Azure recognizes forms.
The showcase client is a publishing company. The Atlantic has gone all in on the Azure systems. Another happy camper is AutoTrader.ca. Plus Archive 360 is tickled with the ability to use Azure cognitive search quickly and cost effectively. Yep, DarkCyber believes this was a smooth, easy implementation.
If you doubt that Microsoft is number one, read the article. If not, you will enjoy some of the ironies. How many search systems does Microsoft offer? How many of them are super? Who remembers Fast Search & Transfer?
Yep, super search the Azure way. It’s just like using Word’s numbering feature or figuring out PowerPoint backgrounds.
Stephen E Arnold, March 3, 2020
Canadian Government Computers: Getting Arthritic
March 3, 2020
Government organizations are usually the last to upgrade their computer systems to anything resembling state of the art technology. Lack of new technology prevents the government organizations from implementing new, streamlined procedures and even catching bad actors who cheat the system. CBC explains that something much worse could happen with old government technology in the article: “Aging Government Computer Systems At Risk Of ‘Critical Failure,’ Trudeau Warned.”
One would expect Canada to be on top of its computer systems compared to the United States, but in many ways the country has just as many problems as its southern neighbor. Canadian Prime Minister Justin Trudeau was told that his country’s computer infrastructures are outdated and on the brink of collapsing. Many of the computer systems are almost sixty years old, and the technology can no longer be maintained.
The Canadian Press accessed documents through the Access to Information Act, but most of the obtained information was blacked out except lines like “to stabilize mission-critical systems.”
Outdated computer systems are not high on politicians’ priority list, but they are learning that their constituents are happier when government organizations work. One of Canada’s most notoriously antiquated and heavily used computer systems is the Employment and Social Development system that manages insurance benefits for every citizen. Upgrades are in the budget, but they cost more than anticipated.
“The Liberals have already made multiple changes to the federal social safety net that required programming changes to old systems. The documents to Trudeau suggest the aged systems pose a problem for more changes the Liberals have promised.
“The complex array of existing programs and services means that future program changes, to continue providing Canadians with the programs and services they expect when interacting with their government, will need to account for pressures on legacy IT systems, which are facing rust-out and critical failure,” part of the briefing binder says.
“These aging platforms neither meet the desired digital interaction nor are capable of full automation, and thus are unable to deliver cost-savings through back-office functions.”
Upgrades are planned, but the projects are complex. Also funding is required to keep systems running that were not meant to be used for so long. Funding is pulled away from upgrades to keep the legacy systems running. Canada did not attempt to update its systems as long as they worked.
There is not a need to panic yet, but the warning signs are starting to blink. Canada’s government tried to update some IT-related projects in the past but they did not do well. It is estimated that upgrading all of Canada’s systems will take a decade. The problem will be finding the money and the right people to handle the project. Once the decade is over and everything is upgraded, the Canadian government will have to start all over again because technology advances so quickly. At least there will be a better system to upgrade from.
Whitney Grace, March 3, 2020
Facebook Is Definitely Evil: Plus or Minus Three Percent at a 95 Percent Confidence Level
March 2, 2020
The Verge Tech Survey 2020 allegedly and theoretically reveals the deepest thoughts, preferences, and perceptions of people in the US. The details of these people are sketchy, but that’s not the point of the survey. The findings suggest that Facebook is a problem. Amazon is a problem. Other big tech companies are problems. Trouble right here in digital city.
The survey findings come from a survey of 1123 people “nationally representative of the US.” There was no information about income, group with which the subject identifies, or methodology. But the result is a plus or minus three percent at a 95 percent confidence level. That sure seems okay despite DarkCyber’s questions about:
- Sample selection. Who pulled the sample, from where, were people volunteers, etc.
- “Nationally representative” means what? Was it the proportional representation method? How many people from Montana and the other “states”? What about Puerto Rico? Who worked for which company?
- Plus or minus three percent. That’s a swing at a 95 percent confidence level. In terms of optical character recognition that works out to three to six errors per page about 95 percent of the time. Is this close enough for a drone strike or an enforcement action? Oh, right, this is a survey about big tech. Big tech doesn’t think the DarkCyber way, right?
- What were the socio economic strata of the individuals in the sample?
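For readers who want to see where “plus or minus three percent” comes from, here is the textbook arithmetic as a short sketch. It is not taken from the survey’s own documentation; it assumes a simple random sample, the worst-case proportion of 0.5, and the usual z value of 1.96 for 95 percent confidence. That first assumption is exactly what the questions above put in doubt.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion.

    Assumes a simple random sample of size n; p=0.5 is the worst case
    and z=1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


def sample_size_for(margin, p=0.5, z=1.96):
    """Smallest simple random sample that achieves the given margin."""
    return math.ceil(z ** 2 * p * (1.0 - p) / margin ** 2)


moe = margin_of_error(1123)
needed = sample_size_for(0.03)
print(f"n=1123 gives a margin of {moe:.1%}")
print(f"a 3 percent margin needs n of at least {needed}")
```

The formula says nothing about who was sampled or how. If the 1123 people were volunteers or otherwise not drawn at random, the tidy three percent figure does not measure the real uncertainty.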
What’s revealed or discovered?
First, people love most of the high profile “names” or “brands.” Amazon is numero uno, the Google is number two, and YouTube (which is the Google in case you have forgotten) is number three. So far, the data look like a name recognition test. “Do you prefer this unknown lye soap or Dove?” Yep, people prefer Dove. But lye soap may be making a come back.
The stunning finding is that Facebook and Twitter impact society in a negative way. Contrast this with lovable Google and Amazon: 72 percent are favorable to the Google and 70 percent are favorable to Amazon.
Here’s the data about which companies people trust. Darned Amazing. People trust Microsoft and Amazon the most.
Which companies do the homeless and people in rural West Virginia trust?
Plus 72 percent of the sample believe Facebook has too much “power.” What does power mean? No clue for the context of this survey.
Gentle reader, please, examine the article containing these data. I want to go back in time and reflect on the people who struggled in my statistics classes. Painful memories but I picked up some cash tutoring. I got out of that business because some folks don’t grasp numerical recipes.
Stephen E Arnold, March 2, 2020
Here Is a Cheery Observation: Everything Is Hackable
March 2, 2020
We noted Vineet Kumar’s observations about security. “From Needle to Airplane, Everything Is Hackable, Says India’s Leading Cybersecurity Guru” includes this statement:
Every industry is hackable today. From the needle to the airplane, everything is hackable today. Smart technology penetration into organizations and even into homes leaves everyone susceptible to hacking.
Is there a fix?
Yep, embrace Mr. Kumar’s Cyber Peace Foundation.
What’s the outfit deliver?
Cyber Peace Foundation is a leading multi-stakeholders initiative and is crowdsourcing cybersecurity needs for civil society. The organization has over 12,000 members and 1,200 volunteers, from different parts of the world. It engages in spreading awareness and promoting technical research and in bringing together the government, industry experts, and academia.
There’s also a conference and a global cyber challenge:
Throwing light on the need for safer cyberspace: There are different ways and means through which your data can be stolen. By just clicking on one link, all your data can be gone and you may not even realize that your data is gone.
If everything is hackable, presumably his conference registration and its other Web forms are security risks. Odd that he did not emphasize the security of his operation, its bug bounty hunters, and its ethical hackers exempt from his glittering generality about “everything.”
Gurus are exempt perhaps?
Stephen E Arnold, March 2, 2020