Data Lake and Semantics: Swimming in Waste Water?

November 6, 2015

I read a darned fascinating write up called “Use Semantics to Keep Your Data Lake Clear.” There is a touch of fantasy in the idea of importing heterogeneous “data” into a giant data lake. The result is, in my experience, more like waste water in a pre-treatment plant in Saranda, Albania. Trust me. Distasteful.

Looks really nice, right?

The write up invokes a mid tier consultant and then tosses in the fuzzy term “governance.” We are now on semi solid ground, right? I do like the image of a data swamp, which contrasts nicely with the images from On Golden Pond.

I noted this passage:

Using a semantic data model, you represent the meaning of a data string as binary objects – typically in triplicates made up of two objects and an action. For example, to describe a dog that is playing with a ball, your objects are DOG and BALL, and their relationship is PLAY. In order for the data tool to understand what is happening between these three bits of information, the data model is organized in a linear fashion, with the active object first – in this case, DOG. If the data were structured as BALL, DOG, and PLAY, the assumption would be that the ball was playing with the dog. This simple structure can express very complex ideas and makes it easy to organize information in a data lake and then integrate additional large data stores.

Okay.
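
For readers who want to see what the quoted passage is getting at, here is a minimal sketch in Python. It assumes the conventional subject, predicate, object ordering for a triple (the passage only says the active object comes first), and the names `Triple`, `describe`, and `subjects_of` are my own illustrations, not anything from the write up or from a particular semantic product.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One statement: who does what to what (hypothetical structure)."""
    subject: str    # the "active object" in the quoted passage
    predicate: str  # the relationship or action
    obj: str        # the thing acted upon


def describe(t: Triple) -> str:
    """Render a triple in reading order."""
    return f"{t.subject} {t.predicate} {t.obj}"


def subjects_of(triples: List[Triple], predicate: str, obj: str) -> List[str]:
    """Return every subject that has the given relationship to obj."""
    return [t.subject for t in triples if t.predicate == predicate and t.obj == obj]


# Position carries the meaning: swap the ends and the claim changes.
dog_plays = Triple("DOG", "PLAY", "BALL")
ball_plays = Triple("BALL", "PLAY", "DOG")

# Nothing in this structure checks whether a statement is sensible;
# both triples are stored without complaint.
lake = [dog_plays, ball_plays]

print(describe(dog_plays))
print(describe(ball_plays))
print(subjects_of(lake, "PLAY", "BALL"))
```

Note that the structure accepts the ball playing with the dog just as readily as the reverse. The ordering tells the tool how to read a statement, not whether the statement is true.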

Next I circled:

A semantic data lake is incredibly agile. The architecture quickly adapts to changing business needs, as well as to the frequent addition of new and continually changing data sets. No schemas, lengthy data preparation, or curating is required before analytics work can begin. Data is ingested once and is then usable by any and all analytic applications. Best of all, analysis isn’t impeded by the limitations of pre-selected data sets or pre-formulated questions, which frees users to follow the data trail wherever it may lead them.

Yep, makes perfect sense. But there is one tiny problem: garbage in, garbage out. Not even modern jargon can solve this decades-old computer challenge.

Fantasy is much better than reality.

Stephen E Arnold, November 6, 2015
