Hyperbolic Reasoning: Smart Software and Blockchain

March 22, 2018

What pairing of words can rival “blockchain” and “artificial intelligence”? I submit that this word duo could become the next peanut butter and jelly, Ma and Pa Kettle, or semantic search. (Yeah, I know “semantic search” is a bit fuzzy, but, as with smart software and blockchain, marketing and hyperbolic reasoning are mostly unbounded.)

I read “How Blockchain Can Transform Artificial Intelligence.” Now I don’t know what “artificial intelligence” is. I think I understand that blockchain is a distributed database. Blockchain has the charming characteristic of housing malware, stolen videos, and CP (that’s child pornography, I believe).

I agree that a database and data management system are important to many smart software systems. I am not sure that blockchain is the right dog for the Iditarod race, however.

The write up begs to differ. I learned:

By creating segments of verified databases, models can be successfully built and implemented upon only datasets which have been verified. This will detect any faults or irregularity in the data supply chain. It also helps to reduce the stress of troubleshooting and finding abnormal datasets since the data stream is available in segments. Finally, blockchain technology is synonymous with immutability, this means the data is traceable and auditable.
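
The “immutability” claim boils down to hash chaining. Here is a minimal sketch (my own illustration, not anything from the write up) of why fiddling with one “verified” data segment is detectable:

    import hashlib
    import json

    def chain(segments):
        """Link data segments so each block commits to the one before it."""
        blocks, prev = [], "0" * 64
        for seg in segments:
            digest = hashlib.sha256(
                (prev + json.dumps(seg, sort_keys=True)).encode()).hexdigest()
            blocks.append({"data": seg, "prev": prev, "hash": digest})
            prev = digest
        return blocks

    def verify(blocks):
        """Recompute every hash; any edited segment breaks the chain."""
        prev = "0" * 64
        for b in blocks:
            expected = hashlib.sha256(
                (prev + json.dumps(b["data"], sort_keys=True)).encode()).hexdigest()
            if b["prev"] != prev or b["hash"] != expected:
                return False
            prev = b["hash"]
        return True

    ledger = chain([{"rows": 100, "source": "sensor-a"},
                    {"rows": 250, "source": "sensor-b"}])
    print(verify(ledger))           # True
    ledger[0]["data"]["rows"] = 99  # tamper with a "verified" segment
    print(verify(ledger))           # False

Note that the same trick works in any append-only database with checksums; nothing here requires a blockchain.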

And the article identifies other benefits. But won’t other types of data management systems work as well as or better than the much-flogged blockchain?

I would suggest that some public blockchains leak information. Furthermore, blockchain technology can house “attachments”: unwanted fellow travelers that accompany the encrypted data and the assorted impedimenta the technology requires.

Some organizations like GSR and Cambridge Analytica prefer to keep their data and access to those data under wraps. The firestorm about Cambridge Analytica’s use of social media data certainly suggests to me that a blockchain approach may not have been an enhancement to the Cambridge Analytica system.

But read the write up. Make your own judgment.

For me, the this-plus-that approach to buzzwordisms does not convince. The promise, indeed the hope, that zippy technologies will deliver synergies is an example of hyperbolic reasoning.

Stephen E Arnold, March 22, 2018

Schmidt Admits It Is Hard to Discern Between Fact and Fiction

March 15, 2018

One basic research essential is learning how to tell the difference between fact and fiction. It used to be easier to control and verify news because information dissemination was limited to physical media. The Internet blew everything out of the water and made it more difficult to discern fact from fiction. Humans can be taught the tricks, but AI still has a lot to learn. The Daily Mail reports that, “Alphabet Chairman Eric Schmidt Admits It Is ‘Very Difficult’ For Google’s Algorithm To Separate Fact From Fiction In Its Search Results.”

Millions of articles and other pieces of content are posted online daily. Google’s job is to sift through them and deliver the most accurate results. When opposing viewpoints are shared, Google’s algorithm has difficulty figuring out the truth. Eric Schmidt says that can be fixed with tweaking: he views fact-versus-fiction problems as bugs that need repair and believes that, with some work, they can be fixed. The article highlights some of the more infamous examples of Google’s failings, such as the AutoComplete feature and how conspiracy theories can be presented as fact.

Search results displaying only hard truth will be as elusive as accurate sentiment analytics.

Schmidt added:

That is a core problem of humans that they tend to learn from each other and their friends are like them. And so until we decide collectively that occasionally somebody not like you should be inserted into your database, which is sort of a social values thing, I think we are going to have this problem.

Or we can just wait until we make artificial intelligence smarter.

Whitney Grace, March 15, 2018

Oracle: Sparking the Database Fire

October 3, 2017

Hadoop? Er, what? And Microsoft SQL Server? Or MarkLogic’s XML, business intelligence, analytics, and search offering? Amazon’s storage complex? IBM’s DB2? The recently endowed MongoDB?

I thought of these systems when I read “Targeting Cybersecurity, Larry Ellison Debuts Oracle’s New ‘Self-Driving’ Database.”

For me, the main point of the write up is that a new Oracle database is coming. There’s nothing like an announcement to keep the Oracle faithful in the fold.

If the write up is accurate, Oracle is embracing buzzy trends, storage that eliminates the guesswork, and security. (Remember Secure Enterprise Search, the Security Server, and the nifty credential verification procedures? I do.)

The new version of Oracle, according to the write up, will deliver self-driving capability. Cars don’t do this too well, but the Oracle database will, and darned soon.

The 18c Autonomous Database or 18cad will:

  • Fix itself
  • Cost less than Amazon’s cloud
  • Go faster
  • Be online 99.995 percent of the time

And more, of course.
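
A bit of arithmetic puts that uptime claim in perspective. This is my back-of-the-envelope calculation, not an Oracle figure:

    minutes_per_year = 365 * 24 * 60           # 525,600
    downtime = minutes_per_year * (1 - 0.99995)
    print(round(downtime, 1))                  # about 26.3 minutes of downtime a year

Roughly 26 minutes a year, if the claim holds.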

Let’s assume that Oracle 18cad works as described. (Words are usually easier to produce than software, I remind myself.)

The customers look to be big winners. Better, faster, cheaper. Oracle believes its revenues will soar because happy customers just buy more Oracle goodies.

Will there be a downside?

What about database administrators? Some organizations may assume that 18cad will allow some expensive database administrator (DBA) heads to roll.

What about the competition? I anticipate more marketing fireworks, or at least some open source “sparks” and competitive flames to heat up the cold autumn days.

Stephen E Arnold, October 3, 2017

Short Honk: Database Cost

September 26, 2017

If you want to get a sense of the time and computational cost under the covers of Big Data processing, please read “Cost in the Land of Databases.” Two takeaways for me were [a] real time is different from what some individuals believe, and [b] if you want to crunch Big Data, bring money and technical expertise, not assumptions that data are easy.

Stephen E Arnold, September 26, 2017

AI to Tackle Image Reading

September 11, 2017

The new frontier in analytics might just be pictures. The ability to break pictures into recognizable parts and then use those parts to derive meaning has baffled even the most advanced AI systems and has been a quest for many for some time. It appears that Disney Research, in cahoots with UC Davis, believes it is near a breakthrough.

Phys.org quotes Markus Gross, vice president at Disney Research, as saying,

We’ve seen tremendous progress in the ability of computers to detect and categorize objects, to understand scenes and even to write basic captions, but these capabilities have been developed largely by training computer programs with huge numbers of images that have been carefully and laboriously labeled as to their content. As computer vision applications tackle increasingly complex problems, creating these large training data sets has become a serious bottleneck.

A perfect example of an application is MIT’s attempt to use AI to share recipes and nutritional information just by viewing a picture of food. The sky is the limit when it comes to possibilities if Disney and MIT can help AI over the current hump of limitations.

Catherine Lamsfuss, September 11, 2017

Blockchain Quote to Note: The Value of Big Data as an Efficient Error Reducer

September 6, 2017

I read “Blockchains for Artificial Intelligence: From Decentralized Model Exchanges to Model Audit Trails.” The foundation of the write up is that blockchain technology can be used to bring more control to data and models. The idea is an interesting one. I spotted a passage tucked into the lower 20 percent of the article which I judged to be a quote to note. Here’s the passage I highlighted:

as you added more data — not just a bit more data but orders of magnitude more data — and kept the algorithms the same, then the error rates kept going down, by a lot. By the time the datasets were three orders of magnitude larger, error was less than 5%. In many domains, there’s a world of difference between 18% and 5%, because only the latter is good enough for real-world application. Moreover, the best-performing algorithms were the simplest; and the worst algorithm was the fanciest. Boring old perceptrons from the 1950s were beating state-of-the-art techniques.

Bayesian methods date from the 18th century and work well. Despite Laplacian and Markovian bolt-ons, the drift problem bedevils some implementations. The solution? Pump in more training data, and the centuries-old techniques work like a jazzed millennial with a bundle of venture money.
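
Curious readers can watch the effect on toy data. The sketch below (my own, on synthetic data, not the numbers from the write up) trains a plain perceptron on training sets that grow by orders of magnitude and prints the test error:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import Perceptron

    # One synthetic problem; hold out a fixed test set, then grow the training set.
    X, y = make_classification(n_samples=110_000, n_features=20, flip_y=0.05,
                               random_state=0)
    X_test, y_test = X[:10_000], y[:10_000]
    for n in (100, 1_000, 10_000, 100_000):
        model = Perceptron(random_state=0).fit(X[10_000:10_000 + n],
                                               y[10_000:10_000 + n])
        print(f"{n:>7} examples -> test error {1 - model.score(X_test, y_test):.3f}")

Same boring 1950s algorithm each time; only the data budget changes.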

Care to name a large online outfit which may find this an idea worth nudging forward? I don’t think it will be Verizon Oath or Tronc.

Stephen E Arnold, September 6, 2017

An Automatic Observer for Neural Nets

August 25, 2017

We are making progress in training AI systems through the neural net approach, but exactly how those systems make their decisions remains difficult to discern. Now, TechCrunch reveals, “MIT CSAIL Research Offers a Fully Automated Way to Peer Inside Neural Nets.” Writer Darrell Etherington recalls that, a couple of years ago, the same team of researchers described a way to understand these decisions using human reviewers. A fully automated process will be much more efficient and should lead to greater understanding of what works and what doesn’t. Etherington explains:

Current deep learning techniques leave a lot of questions around how systems actually arrive at their results – the networks employ successive layers of signal processing to classify objects, translate text, or perform other functions, but we have very little means of gaining insight into how each layer of the network is doing its actual decision-making. The MIT CSAIL team’s system uses doctored neural nets that report back the strength with which every individual node responds to a given input image, and those images that generate the strongest response are then analyzed. This analysis was originally performed by Mechanical Turk workers, who would catalogue each based on specific visual concepts found in the images, but now that work has been automated, so that the classification is machine-generated. Already, the research is providing interesting insight into how neural nets operate, for example showing that a network trained to add color to black and white images ends up concentrating a significant portion of its nodes to identifying textures in the pictures.
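
The “doctored neural nets that report back” part is easy to picture in code. A minimal PyTorch sketch (my illustration of the general idea, not the CSAIL system) uses a forward hook to record how strongly one unit fires for each input, then pulls out the inputs that excite it most:

    import torch
    import torch.nn as nn

    # A toy network standing in for a real vision model.
    model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

    activations = {}
    def record(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    # "Doctor" the net so the hidden layer reports back its responses.
    model[1].register_forward_hook(record("relu1"))

    inputs = torch.randn(100, 64)   # stand-ins for 100 images
    model(inputs)
    unit = 7
    strongest = activations["relu1"][:, unit].topk(5).indices
    print("inputs that excite unit", unit, "most:", strongest.tolist())

The automation in the research replaces the Mechanical Turk step: labeling what those strongest-responding images have in common.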

The write-up points us to MIT’s own article on the subject for more information. We’re reminded that, because the human thought process is still largely a mystery to us, AI neural nets are based on hypothetical models that attempt to mimic ourselves. Perhaps, the piece suggests, a better understanding of such systems could inform the field of neuroscience. Sounds fair.

Cynthia Murrell, August 25, 2017

Google and Apple Narrow Search Results

August 11, 2017

In technology, remaining relevant means making money, and Google and Apple are not about to be outdone by Amazon, though appearances suggest that may be the case. In an effort to stem the potential loss of revenue, both Apple and Google are re-engineering their search capabilities to “buttress the value of traditional search.”

According to GeoMarketing, the two tech giants are approaching the same problem from different angles:

In a sense, the battle between the mobile web and apps is a proxy war between Google and Apple.

For Google,

The (Q&A box) fits right in with the current idea of getting direct, personalized responses to queries as opposed to the traditional method of showing infinite hypertext listings based on general popularity. It follows a path that Google has already taken with its search functions, including the automatic addition of the term “near me” into the search box as well as providing searchable menu listings for restaurants and direct bookings to salons and spas.

Apple is focusing on apps rather than search, but with the same end in mind.

As consumers are demanding local results and more organic answers to their search questions, search giants have to continually find ways to accommodate. As long as it results in more revenue, the infinite chase is worth it, we suppose.

Catherine Lamsfuss, August 11, 2017

Palantir Settles Discrimination Case

May 15, 2017

Does this count as irony? Palantir, which has built its data-analysis business largely on its relationships with government organizations, has a Department of Labor analysis to thank for recent charges of discrimination. No word on whether the department used Palantir software to “sift through” the reports. Now, Business Insider tells us, “Palantir Will Shell Out $1.7 Million to Settle Claims that It Discriminated Against Asian Engineers.” Writer Julie Bort reports that, in addition to that payout, Palantir will make job offers to eight unspecified Asians. She also explains:

The issue arose because, as a government contractor, Palantir must report its diversity statistics to the government. The Labor Department sifted through these reports and concluded that even though Palantir received a huge number of qualified Asian applicants for certain roles, it was hiring only small numbers of them. Palantir, being the big data company that it is, did its own sifting and produced a data-filled response that it said refuted the allegations and showed that in some tech titles 25%-38% of its employees were Asians. Apparently, Palantir’s protestations weren’t enough to satisfy government regulators, so the company agreed to settle.

For its part, Palantir insists on its innocence but says it settled in order to put the matter behind it. Bort notes the unusual nature of this case: according to the Equal Employment Opportunity Commission, African-Americans, Latin-Americans, and women are more underrepresented in tech fields than Asians. Is the Department of Labor making it a rule to analyze the hiring patterns of companies required to report diversity statistics? If it is consistent, there should soon be a number of such lawsuits regarding discrimination against other groups. We shall see.

Cynthia Murrell, May 15, 2017

To Make Data Analytics Sort of Work: Attention to Detail

March 10, 2017

I read “The Much-Needed Business Facet for Modern Data Integration.” The write up presents some useful information. Not many of the “go fast and break things” crowd will relate to some of the ideas and suggestions, but I found the article refreshing.

What does one do to make modern data-centric activities sort of work? The answers are ones that, I have found, more youthful wizards often elect to ignore.

Here they are:

  1. Do data preparation. Yikes. Normalization of data. I have fielded this question in the past: “Who has time for that?” Answer: Too few, gentle reader. Too few.
  2. Profile the data. Another gasp. In my experience it is helpful to determine what data are actually germane to the goal. Think about the polls for the recent US presidential election.
  3. Create data libraries. Good idea. But it is much more fun to just recreate data sets. Very Zen like.
  4. Have rules, which are now explained as “data governance.” The jargon does not change the need for editorial and data guidelines.
  5. Take a stab at data quality. This is another way of saying, “Clean up the data.” Even whiz bang modern systems are confused by differences like I.B.M. and International Business Machines or numbers with decimal points in the incorrect place. (A tiny sketch of this chore follows the list.)
  6. Get colleagues in the game. This is a good idea, but in many organizations in which I have worked, “team” is spelled “my bonus.”
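
As promised in point five, here is what the unglamorous cleanup looks like. The alias table and names are hypothetical; real lists run to thousands of entries:

    import re

    # Hypothetical alias table mapping variant spellings to one canonical form.
    CANONICAL = {
        "ibm": "International Business Machines",
        "i.b.m.": "International Business Machines",
        "international business machines": "International Business Machines",
    }

    def normalize_company(raw):
        """Lowercase, trim, collapse whitespace, then look up the canonical name."""
        key = re.sub(r"\s+", " ", raw.strip().lower())
        return CANONICAL.get(key, raw.strip())

    for name in ("I.B.M.", " ibm ", "International  Business Machines", "Tronc"):
        print(f"{name!r} -> {normalize_company(name)!r}")

Dog work, as I said. No unicorns involved.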

Useful checklist. I fear that those who color unicorns will not like the dog work which accompanies implementing the ideas. That’s what makes search and content processing so darned interesting.

Stephen E Arnold, March 10, 2017
