Oracle: Sparking the Database Fire

October 3, 2017

Hadoop? Er, what? And Microsoft SQL Server? Or MarkLogic’s XML, business intelligence, analytics, and search offering? Amazon’s storage complex? IBM’s DB2? The recently-endowed MongoDB?

I thought of these systems when I read “Targeting Cybersecurity, Larry Ellison Debuts Oracle’s New ‘Self-Driving’ Database.”

For me, the main point of the write up is that the Oracle database is coming. There’s nothing like an announcement to keep the Oracle faithful in the fold.
If the write up is accurate, Oracle is embracing buzzy trends, storage that eliminates the guesswork, and security. (Remember Secure Enterprise Search, the Security Server, and the nifty credential verification procedures? I do.)
The new version of Oracle, according to the write up, will deliver self-driving. Cars don’t do this too well, but the Oracle database will, and darned soon.

The 18c Autonomous Database or 18cad will:

  • Fix itself
  • Cost less than Amazon’s cloud
  • Go faster
  • Be online 99.995 percent of the time

And more, of course.
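
The 99.995 percent figure sounds abstract, so here is a quick back-of-the-envelope conversion into allowed downtime (a sketch, assuming a 365-day year):

```python
# Convert a 99.995% availability claim into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

availability = 0.99995
downtime_minutes = MINUTES_PER_YEAR * (1 - availability)

print(f"Allowed downtime: {downtime_minutes:.1f} minutes per year")
# Roughly 26 minutes of downtime per year.
```

That is a little under half an hour per year, a demanding target for any database.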

Let’s assume that Oracle 18cad works as described. (Words are usually easier to produce than software, I remind myself.)

The customers look to be big winners. Better, faster, cheaper. Oracle believes its revenues will soar because happy customers just buy more Oracle goodies.

Will there be a downside?

What about database administrators? Some organizations may assume that 18cad will allow some expensive database administrator (DBA) heads to roll.
What about the competition? I anticipate more marketing fireworks or at least some open source “sparks” and competitive flames to heat up the cold autumn days.

Stephen E Arnold, October 3, 2017

Short Honk: Database Cost

September 26, 2017

If you want to get a sense of the time and computational cost under the covers of Big Data processing, please, read “Cost in the Land of Databases.” Two takeaways for me were [a] real time is different from what some individuals believe, and [b] if you want to crunch Big Data bring money and technical expertise, not assumptions that data are easy.

Stephen E Arnold, September 26, 2017

AI to Tackle Image Reading

September 11, 2017

The new frontier in analytics might just be pictures. Images are known to baffle even the most advanced AI systems, and the ability to break them into recognizable parts and then derive meaning from those parts has long been a quest. It appears that Disney Research, in cahoots with UC Davis, believes it is near a breakthrough.

Phys.org quotes Markus Gross, vice president at Disney Research, as saying,

We’ve seen tremendous progress in the ability of computers to detect and categorize objects, to understand scenes and even to write basic captions, but these capabilities have been developed largely by training computer programs with huge numbers of images that have been carefully and laboriously labeled as to their content. As computer vision applications tackle increasingly complex problems, creating these large training data sets has become a serious bottleneck.

A perfect example of an application is MIT’s attempt to use AI to share recipes and nutritional information just by viewing a picture of food. The sky is the limit when it comes to possibilities if Disney and MIT can help AI over the current hump of limitations.

Catherine Lamsfuss, September 11, 2017

Blockchain Quote to Note: The Value of Big Data as an Efficient Error Reducer

September 6, 2017

I read “Blockchains for Artificial Intelligence: From Decentralized Model Exchanges to Model Audit Trails.” The foundation of the write up is that blockchain technology can be used to bring more control to data and models. The idea is an interesting one. I spotted a passage tucked into the lower 20 percent of the article which I judged to be a quote to note. Here’s the passage I highlighted:

as you added more data — not just a bit more data but orders of magnitude more data — and kept the algorithms the same, then the error rates kept going down, by a lot. By the time the datasets were three orders of magnitude larger, error was less than 5%. In many domains, there’s a world of difference between 18% and 5%, because only the latter is good enough for real-world application. Moreover, the best-performing algorithms were the simplest; and the worst algorithm was the fanciest. Boring old perceptrons from the 1950s were beating state-of-the-art techniques.

Bayesian methods date from the 18th century and work well. Despite Laplacian and Markovian bolt-ons, the drift problem bedevils some implementations. The solution? Pump in more training data, and the centuries-old techniques work like a jazzed millennial with a bundle of venture money.
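
The “boring old perceptron from the 1950s” the quote praises really is a few lines of code. Here is a minimal sketch of Rosenblatt’s algorithm, trained on hypothetical toy data (both the data and the parameter choices are illustrative, not from the article):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic Rosenblatt perceptron: nudge weights on each misclassified point."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):          # labels yi are -1 or +1
            if yi * (xi @ w + b) <= 0:    # misclassified (or on the boundary)
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy linearly separable data (hypothetical): the sign of the first
# feature determines the class, with a margin of 0.5 around zero.
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=200)
X = np.column_stack([signs * rng.uniform(0.5, 2.0, size=200),
                     rng.normal(size=200)])
y = signs

w, b = train_perceptron(X, y)
preds = np.where(X @ w + b > 0, 1.0, -1.0)
print("training accuracy:", (preds == y).mean())
```

The quote’s point stands: the algorithm has not changed in decades; what changed is the amount of data fed to it.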

Care to name a large online outfit which may find this an idea worth nudging forward? I don’t think it will be Verizon Oath or Tronc.

Stephen E Arnold, September 6, 2017

An Automatic Observer for Neural Nets

August 25, 2017

We are making progress in training AI systems through the neural net approach, but exactly how those systems make their decisions remains difficult to discern. Now, Tech Crunch reveals, “MIT CSAIL Research Offers a Fully Automated Way to Peer Inside Neural Nets.” Writer Darrell Etherington recalls that, a couple years ago, the same team of researchers described a way to understand these decisions using human reviewers. A fully automated process will be much more efficient and lead to greater understanding of what works and what doesn’t. Etherington explains:

Current deep learning techniques leave a lot of questions around how systems actually arrive at their results – the networks employ successive layers of signal processing to classify objects, translate text, or perform other functions, but we have very little means of gaining insight into how each layer of the network is doing its actual decision-making. The MIT CSAIL team’s system uses doctored neural nets that report back the strength with which every individual node responds to a given input image, and those images that generate the strongest response are then analyzed. This analysis was originally performed by Mechanical Turk workers, who would catalogue each based on specific visual concepts found in the images, but now that work has been automated, so that the classification is machine-generated. Already, the research is providing interesting insight into how neural nets operate, for example showing that a network trained to add color to black and white images ends up concentrating a significant portion of its nodes to identifying textures in the pictures.
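
The bookkeeping described in the quote, record each node’s response strength per image, then pull the images that excite each node most, is simple to sketch. The activation matrix below is hypothetical, standing in for the responses a “doctored” network would report:

```python
import numpy as np

def top_images_per_node(activations, k=3):
    """For each node (column), return the indices of the k images that
    excite it most strongly. activations has shape (n_images, n_nodes)."""
    order = np.argsort(activations, axis=0)  # ascending per column
    # take the k largest rows, reverse so the strongest image comes first
    return order[-k:][::-1].T                # shape (n_nodes, k)

# Hypothetical responses: 1,000 images, 64 nodes.
rng = np.random.default_rng(42)
acts = rng.random((1000, 64))

top = top_images_per_node(acts, k=3)
print(top.shape)  # one row of strongest-image indices per node
```

In the MIT CSAIL work, the images selected this way were then labeled, first by Mechanical Turk workers, now by machine.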

The write-up points us to MIT’s own article on the subject for more information. We’re reminded that, because the human thought process is still largely a mystery to us, AI neural nets are based on hypothetical models that attempt to mimic ourselves. Perhaps, the piece suggests, a better understanding of such systems could inform the field of neuroscience. Sounds fair.

Cynthia Murrell, August 25, 2017

Google and Apple Narrow Search Results

August 11, 2017

Remaining relevant means making money in technology, and Google and Apple are not about to be outdone by Amazon, even though it appears that may be the case. In an effort to stem the potential loss of revenue, both Apple and Google are re-engineering their search capabilities to “buttress the value of traditional search.”

According to GeoMarketing, the two tech giants are approaching the same problem from different angles:

In a sense, the battle between the mobile web and apps is a proxy war between Google and Apple.

For Google,

The (Q&A box) fits right in with the current idea of getting direct, personalized responses to queries as opposed to the traditional method of showing infinite hypertext listings based on general popularity. It follows a path that Google has already taken with its search functions, including the automatic addition of the term “near me” into the search box as well as providing searchable menu listings for restaurants and direct bookings to salons and spas.

Apple is focusing on apps rather than search, but with the same end in mind.

As consumers are demanding local results and more organic answers to their search questions, search giants have to continually find ways to accommodate. As long as it results in more revenue, the infinite chase is worth it, we suppose.

Catherine Lamsfuss, August 11, 2017

Palantir Settles Discrimination Case

May 15, 2017

Does this count as irony? Palantir, which has built its data-analysis business largely on its relationships with government organizations, has a Department of Labor analysis to thank for recent charges of discrimination. No word on whether that department used Palantir software to “sift through” the reports. Now, Business Insider tells us, “Palantir Will Shell Out $1.7 Million to Settle Claims that It Discriminated Against Asian Engineers.” Writer Julie Bort tells us that, in addition to that payout, Palantir will make job offers to eight unspecified Asians. She also explains:

The issue arose because, as a government contractor, Palantir must report its diversity statistics to the government. The Labor Department sifted through these reports and concluded that even though Palantir received a huge number of qualified Asian applicants for certain roles, it was hiring only small numbers of them. Palantir, being the big data company that it is, did its own sifting and produced a data-filled response that it said refuted the allegations and showed that in some tech titles 25%-38% of its employees were Asians. Apparently, Palantir’s protestations weren’t enough to satisfy government regulators, so the company agreed to settle.

For its part, Palantir insists on its innocence but says it settled in order to put the matter behind it. Bort notes the unusual nature of this case—according to the Equal Employment Opportunity Commission, African-Americans, Latin-Americans, and women are more underrepresented in tech fields than Asians. Is the Department of Labor making it a rule to analyze the hiring patterns of companies required to report diversity statistics? If they are consistent, there should soon be a number of such lawsuits regarding discrimination against other groups. We shall see.

Cynthia Murrell, May 15, 2017

To Make Data Analytics Sort of Work: Attention to Detail

March 10, 2017

I read “The Much-Needed Business Facet for Modern Data Integration.” The write up presents some useful information. Not many of the “go fast and break things” crowd will relate to some of the ideas and suggestions, but I found the article refreshing.

What does one do to make modern data centric activities sort of work? The answers are ones that I have found many more youthful wizards often elect to ignore.

Here they are:

  1. Do data preparation. Yikes. Normalization of data. I have fielded this question in the past, “Who has time for that?” Answer: Too few, gentle reader. Too few.
  2. Profile the data. Another gasp. In my experience it is helpful to determine what data are actually germane to the goal. Think about the polls for the recent election.
  3. Create data libraries. Good idea. But it is much more fun to just recreate data sets. Very Zen like.
  4. Have rules which are now explained as “data governance.” The jargon does not change the need for editorial and data guidelines.
  5. Take a stab at data quality. This is another way of saying, “Clean up the data.” Even whiz bang modern systems are confused by differences like I.B.M. and International Business Machines or numbers with decimal points in the wrong place.
  6. Get colleagues in the game. This is a good idea, but in many organizations in which I have worked “team” is spelled “my bonus.”
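
Items 1, 2, and 5 above amount to dog work in code. A minimal sketch of the kind of name normalization that trips up whiz bang modern systems (the alias table and record list are hypothetical):

```python
# Canonicalize company-name variants before analytics (hypothetical aliases).
ALIASES = {
    "i.b.m": "International Business Machines",
    "i.b.m.": "International Business Machines",
    "ibm": "International Business Machines",
}

def normalize_name(raw):
    """Map known variants to one canonical form; pass unknowns through."""
    key = raw.strip().lower()
    return ALIASES.get(key, raw.strip())

records = ["IBM", "I.B.M.", "International Business Machines", "Oracle"]
print([normalize_name(r) for r in records])
# All three IBM variants collapse to one canonical form; "Oracle" passes through.
```

Unglamorous, yes. But without this step the analytics engine counts one company as three.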

Useful checklist. I fear that those who color unicorns will not like the dog work which accompanies implementing the ideas. That’s what makes search and content processing so darned interesting.

Stephen E Arnold, March 10, 2017

ScyllaDB Version 1.3 Available

March 8, 2017

According to Scylla, its latest release is currently the fastest NoSQL database. We learn about the update from SiliconAngle’s article, “ScyllaDB Revamps NoSQL Database in 1.3 Release.” To support the claim, the company points to a performance benchmark test executed by the Yahoo Cloud Serving Benchmark project. That group compared ScyllaDB to the open source Cassandra database and found Scylla to be 4.6 times faster than a standard Cassandra cluster.

Writer Mike Wheatley elaborates on the product:

ScyllaDB’s biggest differentiator is that it’s compatible with the Apache Cassandra database APIs. As such, the creators claim that ScyllaDB can be used as a drop-in replacement for Cassandra itself, offering users the benefit of improved performance and scale that comes from the integration with a light key/value store.

The company says the new release is geared toward development teams that have struggled with Big Data projects, and claims a number of performance advantages over more traditional development approaches, including:

  • 10X throughput of baseline Cassandra – more than 1,000,000 CQL operations per second per node
  • Sub 1msec 99% latency
  • 10X per-node storage capacity over Cassandra
  • Self-tuning database: zero configuration needed to max out hardware
  • Unparalleled high availability, native multi-datacenter awareness
  • Drop-in replacement for Cassandra – no additional scripts or code required

Wheatley cites Scylla’s CTO, who points to better integration with graph databases and improved support for Thrift, Date Tiered Compaction Strategy, Large Partitions, Docker, and CQL tracing. I notice the company is hiring as of this writing. Don’t let the Tel Aviv location of Scylla’s headquarters stop you from applying if you don’t happen to live nearby—they note that their developers can work from anywhere in the world.

Cynthia Murrell, March 8, 2017

IBM and Root Access Misstep?

March 2, 2017

Maybe this is fake news? Maybe. Navigate to “Big Blue’s Big Blunder: IBM Accidentally Hands Over Root Access to Its Data Science Servers.” When I read the title, my first reaction was, “Hey, Yahoot is back in the security news.” Wrong.

According to the write up, which I assume to be exposing the “truth”:

IBM left private keys to the Docker host environment in its Data Science Experience service inside freely available containers. This potentially granted the cloud service’s users root access to the underlying container-hosting machines – and potentially to other machines in Big Blue’s Spark computing cluster. Effectively, Big Blue handed its cloud users the secrets needed to potentially commandeer and control its service’s computers.
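
Leaving a private key inside a shipped container image is the sort of thing a routine audit can catch. A minimal sketch (the scan path is hypothetical) that walks an unpacked container filesystem looking for PEM private-key headers:

```python
import os

KEY_MARKERS = (b"-----BEGIN RSA PRIVATE KEY-----",
               b"-----BEGIN PRIVATE KEY-----",
               b"-----BEGIN OPENSSH PRIVATE KEY-----")

def find_leaked_keys(root):
    """Walk a directory tree (e.g. an exported container filesystem)
    and report files containing PEM private-key headers."""
    leaks = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    data = fh.read(65536)  # key headers sit at the top of the file
            except OSError:
                continue  # unreadable file; skip it
            if any(marker in data for marker in KEY_MARKERS):
                leaks.append(path)
    return leaks

# Usage (hypothetical path to an unpacked container image):
# print(find_leaked_keys("/tmp/container-rootfs"))
```

Nothing sophisticated, which is rather the point: the check is cheap relative to handing users the keys to the cluster.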

IBM hopped to it. Two weeks after the stumble was discovered, IBM fixed the problem.

The write up includes this upbeat statement, attributed to the person using a demo account which exposed the glitch:

I think that IBM already has some amazing infosec people and a genuine commitment to protecting their services, and it’s a matter of instilling security culture and processes across their entire organization. That said, any company that has products allowing users to run untrusted code should think long and hard about their system architecture. This is not to imply that containers were poorly designed (because I don’t think they were), but more that they’re so new that best practices in their use are still being actively developed. Compare a newer-model table saw to one decades old: The new one comes stock with an abundance of safety features including emergency stopping, a riving knife, push sticks, etc, as a result of evolving culture and standards through time and understanding.

Bad news. Good news.

Let’s ask Watson about IBM security. Hold that thought, please. Watson is working on health care information. And don’t forget the March 2017 security conference sponsored by those security pros at IBM.

Stephen E Arnold, March 2, 2017
