Braiding Big Data

October 26, 2015

An apt metaphor for big data is the act of braiding. Braiding requires a person to take three or more locks of hair and weave them together in alternation. The end result is a clean, pretty hairstyle that keeps a person’s hair in place and off the face. Big data is like braiding because specially tailored software takes an unruly mess of data, including the combed and uncombed strands, and organizes it into a legible format. Perhaps this is why TopQuadrant named its popular big data software TopBraid. Read more about its software upgrade in “TopQuadrant Launches TopBraid 5.0.”

TopBraid Suite is an enterprise Web-based solution set that simplifies the development and management of standards-based, model driven solutions focused on taxonomy, ontology, metadata management, reference data governance, and data virtualization. The newest upgrade for TopBraid builds on the current enterprise information management solutions and adds new options:

“‘It continues to be our goal to improve ways for users to harness the full potential of their data,’ said Irene Polikoff, CEO and co-founder of TopQuadrant. ‘This latest release of 5.0 includes an exciting new feature, AutoClassifier. While our TopBraid Enterprise Vocabulary Net (EVN) Tagger has let users manually tag content with concepts from their vocabularies for several years, AutoClassifier completely automates that process.’”

The AutoClassifier makes it easier to add and edit tags before making them a part of the production tag set. Other new features are for TopBraid Enterprise Vocabulary Net (TopBraid EVN), TopBraid Reference Data Manager (RDM), TopBraid Insight, and the TopBraid platform, including improvements in internationalization and a new component for increasing system availability in enterprise environments, TopBraid DataCache.

TopBraid might be the solution an enterprise system needs to braid its data into style.

Whitney Grace, October 26, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Written by Stephen E. Arnold · Filed Under Analytics, Big data, Data, Metadata, News | Comments Off on Braiding Big Data

Libraries’ Failure to Make Room for Developer Librarians

October 23, 2015

The article titled Libraries’ Tech Pipeline Problem on Geek Feminism explores the lack of diverse developers. The author, a librarian, is extremely frustrated with the approach many libraries have taken. Rather than refocusing their hiring and training practices to emphasize technical skills, many are simply hiring more and more vendors, hardly a solution. The article states,

“The biggest issue I see is that we offer a fair number of very basic learn-to-code workshops, but we don’t offer a realistic path from there to writing code as a job. To put a finer point on it, we do not offer “junior developer” positions in libraries; we write job ads asking for unicorns, with expert- or near-expert-level skills in at least two areas (I’ve seen ones that wanted strong skills in development, user experience, and devops, for instance).”

The options available are that librarians either learn to code in their spare time (not viable) or enter the tech workforce temporarily and bring their skills back after a few years. The second option is also full of drawbacks, especially because even white women are marginalized in the tech industry. Instead, the article argues that libraries need to make more room for hiring and promoting people with coding skills and interests while also joining coding communities like Code4Lib.

Chelsea Kerwin, October 23, 2015

Written by Stephen E. Arnold · Filed Under Big data, Business strategy, Database, Digital Library, Library automation, Marketing, News, Security | Comments Off on Libraries’ Failure to Make Room for Developer Librarians

University Partners up with Leidos to Investigate How to Cut Costs in Healthcare with Big Data Usage

October 22, 2015

The article on News360 titled Gulu Gambhir: Leidos Virginia Tech to Research Big Data Usage for Healthcare Field explains the partnership, which will research whether big data can reduce healthcare costs. Obviously, healthcare costs in this country have gotten out of control, and perhaps that is clearer to students who grew up watching the cost of a single pain pill grow larger and larger without regulation. The article doesn’t go into detail on how the application of big data from electronic health records might ease costs, but Leidos CTO Gulu Gambhir sounds optimistic.

“The company said Thursday the team will utilize technical data from healthcare providers to develop methods that address the sector’s challenges in terms of cost and the quality of care. Gulu Gambhir, chief technology officer and a senior vice president at Leidos, said the company entered the partnership to gain knowledge for its commercial and federal healthcare business.”

The partnership also affords excellent opportunities for Virginia Tech students to gain real-world, hands-on knowledge of data research, hopefully while innovating in the healthcare industry. Leidos has supplied funding to the university’s Center for Business Intelligence and Analytics as well as a fellowship program for grad students studying advanced information systems related to healthcare research.

Chelsea Kerwin, October 22, 2015

Written by Stephen E. Arnold · Filed Under Analytics, Big data, Business intelligence, Business strategy, Corporate Concerns, Data, News, Search, Technology | 1 Comment

Algorithmic Bias and the Unintentional Discrimination in the Results

October 21, 2015

The article titled When Big Data Becomes Bad Data on Tech In America discusses the legal ramifications for companies of relying on algorithms. The “disparate impact” theory has been used in the courtroom for some time to ensure that discriminatory policies are struck down whether or not they were created with the intention to discriminate. Algorithmic bias occurs all the time, and according to the spirit of the law, it discriminates, albeit unintentionally. The article states,

“It’s troubling enough when Flickr’s auto-tagging of online photos label pictures of black men as “animal” or “ape,” or when researchers determine that Google search results for black-sounding names are more likely to be accompanied by ads about criminal activity than search results for white-sounding names. But what about when big data is used to determine a person’s credit score, ability to get hired, or even the length of a prison sentence?”

The article also reminds us that data can often be a reflection of “historical or institutional discrimination.” The only thing that matters is whether the results are biased. This is where the question of human bias becomes irrelevant. There are legal scholars and researchers arguing on behalf of ethical machine learning design that roots out algorithmic bias. Stronger regulations and better oversight of the algorithms themselves might be the only way to prevent time in court.

Chelsea Kerwin, October 21, 2015

Written by Stephen E. Arnold · Filed Under algorithms, Big data, Data, Google, Image search, News, Search, Search quality | Comments Off on Algorithmic Bias and the Unintentional Discrimination in the Results

Spark Burns Down Hadoop

October 20, 2015

I read “Apache Spark vs Hadoop.” I conceptualized Ronda Rousey climbing in the octagon with Ramazan Emeev. A big gate. As a certain presidential candidate might say, “Huge.”

Alas, the dust up between Spark (MapReduce on steroids) and Hadoop (a batch operation clustering system) was not much of a contest, according to the article.

I highlighted this passage:

With Apache Spark, you can act on your data in whatever way you want. Want to look for interesting tidbits in your data? You can perform some quick queries. Want to run something you know will take a long time? You can use a batch job. Want to process your data streams in real time? You can do that too.

The key to the Spark wonderfulness is RDDs or resilient distributed datasets. I underlined this definition:

They’re fine-grained, keeping track of all changes that have been made from other transformations such as map or join. This means that it’s possible to recover from failures by rebuilding from these transformations (which is why they’re called Resilient Distributed Datasets).
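The rebuild-from-transformations idea in that definition can be sketched in a few lines. What follows is a toy, single-machine illustration and not Spark's API: the `ToyRDD` class and its `lose_data` method are hypothetical names invented here. Each dataset remembers only its parent and the transformation that produced it, so a lost result can be recomputed from that recorded lineage.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; this is not Spark's API)."""

    def __init__(self,
                 source: Optional[List[int]] = None,
                 parent: Optional["ToyRDD"] = None,
                 transform: Optional[Callable[[Iterable], Iterable]] = None,
                 name: str = "source") -> None:
        self._source = source        # raw data, only set on the root dataset
        self._parent = parent        # dataset this one was derived from
        self._transform = transform  # how to derive this dataset from its parent
        self.name = name
        self._cache: Optional[list] = None  # materialized rows, may be lost

    def map(self, fn: Callable) -> "ToyRDD":
        # Record the transformation instead of running it right away.
        return ToyRDD(parent=self,
                      transform=lambda rows: (fn(r) for r in rows),
                      name="map")

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self,
                      transform=lambda rows: (r for r in rows if pred(r)),
                      name="filter")

    def lineage(self) -> List[str]:
        # Walk back to the root to list every recorded step in order.
        chain = [] if self._parent is None else self._parent.lineage()
        return chain + [self.name]

    def collect(self) -> list:
        # Rebuild from the recorded lineage whenever the cached rows are gone.
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source)
            else:
                self._cache = list(self._transform(self._parent.collect()))
        return self._cache

    def lose_data(self) -> None:
        # Simulate a failure that wipes the materialized rows.
        self._cache = None


numbers = ToyRDD(source=[1, 2, 3, 4, 5, 6])
evens_squared = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

first = evens_squared.collect()    # computed from the lineage
evens_squared.lose_data()          # pretend a node died
rebuilt = evens_squared.collect()  # recomputed from the same lineage
```

In this sketch `first` and `rebuilt` hold the same rows, because nothing beyond the recorded steps is needed to reproduce them. A real cluster would track lineage per partition across machines, which this toy ignores.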

My goodness, with these features, poor, old Hadoop may not stand a chance. Now who would win a fight between Rousey and Emeev? One could, I assume, input data about the two fighters, perform some quick queries, and get an “answer.”

Like most NoSQL confections, will the answer match what happens in the ring?

Stephen E Arnold, October 20, 2015

Written by Stephen E. Arnold · Filed Under Big data, Database, News | Comments Off on Spark Burns Down Hadoop

Big Data: The Seven Step Method

October 19, 2015

I hear quite a bit about methods with steps. I, therefore, was not surprised to read “The Seven ‘Simple’ Steps To Big Data.” The main idea is that Big Data can be complicated. I suppose most things can be complicated if certain fundamentals are not mastered. My great grandmother could tat, which I learned was a way to make weird things placed on chairs to prevent the fabric from wearing or staining. I watched her.

Why write a book explaining how to “do” Big Data? It takes seven easy steps. Making a lace thing requires a book, good vision, supplies, and skill.

She explained. I did not get it 60 years ago, and I don’t know how to tat. I can, however, write about it. Most of the comments about Big Data fall into this category. Folks cannot “do” Big Data, but, by golly, many people can write about Big Data.

The article presents seven steps. Before you try to follow these steps, you may want to consider whether you or your organization has the resources to get the foundational knowledge and processes in place before you “do” Big Data.

Here are the steps:

1. Get a “business rationale.” I think this means that one should have a reason to “do” Big Data and then explain how Big Data will make an immediate and direct contribution to one’s organization. Accountants may not understand Big Data, but they do get the idea of cost overrun and spending for something that generates grousing in carpetland.
2. Learn the lingo. Yep, knowing what words mean can be important. However, if one employs a mid tier consultant, why not let that expert translate? Works well, at least for the compensated consultant.
3. “Care about data lineage.” With regard to terminology, I am not sure what data lineage means. My hunch is that data should be valid, in a processable form, and fresh.
4. When and where factor. This is another puzzler to me. The idea remains murky, which may inhibit one’s ability to “do” Big Data. But maybe not?
5. Correlation does not imply causation. Ah, a chestnut from various classes which taught me about mathy things. The idea is that bonehead mistakes occur in Statistics 101 and real life in Fortune 1000 outfits. See www.TylerVigen.com/spurious-correlations.
6. Be a trained seal: Balance “new innovation” with “hardened enterprise grade tech.” I think this means use what is in the text book and whizzy new system.
7. Rely on “reference architectures.” I assume this means buying a name brand Big Data system to “do” one’s Big Data activity.
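The correlation chestnut in the list above is easy to demonstrate. Here is a minimal sketch, assuming two invented series that have nothing to do with each other except that both drift upward over time. The figures and variable names are made up for illustration and come from no real data set.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented yearly figures (assumption: illustration only, not real data).
big_data_reports_sold = [10, 12, 15, 18, 22, 27]
antimacassars_tatted = [3, 4, 5, 6, 8, 9]

r = pearson(big_data_reports_sold, antimacassars_tatted)
print(f"correlation: {r:.3f}")
```

The coefficient comes out close to 1, yet nobody would argue that tatting drives report sales. Both series simply rise with the passage of time, which is the hidden common factor.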

Does the list appear simple? Not to me. Tatting is a walk in the park compared to figuring out how this list of sophisms makes Big Data easy. Maybe tatting will make a comeback? Is there a market for tatted antimacassars into which terminated Big Data experts can dive?

Stephen E Arnold, October 15, 2015

Written by Stephen E. Arnold · Filed Under Big data, Marketing, News | Comments Off on Big Data: The Seven Step Method

US General Services Administration: Changes Ahead

October 17, 2015

I read “David Shive: GSA Ramps Up IT Consolidation through Acquisition Process Updates, Analytics Adoption.” Then I read “Mary Davie: GSA Updates Federal Acquisition Gateway Platform.” Ah, memories of FAR. You are familiar with the rich, informative compendium known as the Federal Acquisition Regulation.

The write ups indicate that changes are afoot at the GSA, where 18f.gov is busy inventing point-and-click Web site services and cloud computing.

According to the Shive write up:

“We are looking at our data management strategies so we can effectively coalesce that data, and putting good predictive analytics on top of that so that we can make good decisions about things that are happening, and predicting things that are going to happen and drive down costs for things like maintenance of infrastructure,” Shive told the station [part of ExecutiveGov maybe?] in an interview.

Efficiency in government is welcomed by those who have the opportunity to interact with the professionals at their stations. One innovation is interesting:

GSA is also working to implement a statement of work library for multiple procurement categories and a click-and-pay service on the site.

No word about the search system, and not much information about who pays whom and for what.

Stephen E Arnold, October 17, 2015

Written by Stephen E. Arnold · Filed Under Analytics, Big data, Government, News | Comments Off on US General Services Administration: Changes Ahead

Meg Whitman, President of HP, Gets Flak for Partial Follow-Through on Ultimatum

October 14, 2015

The article titled HP Didn’t Actually Fire All the Employees It Threatened to Cut on Business Insider details the management teachings from Hewlett Packard. To summarize, HP recently delivered an ultimatum to several hundred employees that they had to shift off HP’s payroll and become contract workers for significantly lower pay with HP’s partner Ciber. If they refused, they would be let go. Except that the employees mutinied and complained, resulting in HP negotiating for higher salaries from Ciber as well as holding on to a few employees who refused the deal. The article states,

“On top of that, HP is also shipping most of the jobs in this business unit offshore. Whitman wants 60% of the Enterprise Services division jobs to be in low-cost areas of the world, compared to less than 40% today. Employees in this unit fully expect HP to line up more take-it-or-leave it contract jobs, they tell us, so we’ll see how HP handles the next one if it does materialize.”

This is all in the midst of HP’s massive layoffs of over 80,000 employees, 51,000 of whom have already been let go. Morale must be under the building. The non-negotiable ultimatum strategy did not seem to work, and at any rate it is bad business, especially when the ultimatum is later reversed in a handful of instances.

Chelsea Kerwin, October 14, 2015

Written by Stephen E. Arnold · Filed Under Acquisition, Analytics, Big data, Enterprise, Marketing, News | Comments Off on Meg Whitman, President of HP, Gets Flak for Partial Follow-Through on Ultimatum

The Question: How Big Is Big?

October 9, 2015

I recall one of my math teachers yapping about infinity. At the time, I did not care. My reaction was that as long as I could add a one to another number, I could just keep on going. Boring.

Later I became interested in the work of a German who did not drive a Volkswagen diesel. His name was Georg Cantor. As I worked through some of the articles discussing his thinking about comparing the infinite set of rational numbers with the infinite set of natural numbers by a rather brain dead procedure of listing and enumerating all the rationals, I realized why the esteemed thinker fell into a programmer-type state politely described as mental illness. When he died in 1918, he thought about infinity and worried about the author of Romeo and Juliet.
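The listing procedure is simple enough to sketch. This is a minimal illustration, assuming we enumerate only the positive rationals by walking the diagonals on which numerator plus denominator is constant and skipping fractions that are not in lowest terms. The function name `positive_rationals` is made up for this example.

```python
from fractions import Fraction
from itertools import islice
from math import gcd
from typing import Iterator


def positive_rationals() -> Iterator[Fraction]:
    """Yield every positive rational exactly once, diagonal by diagonal."""
    total = 2  # numerator + denominator for the current diagonal
    while True:
        for numerator in range(1, total):
            denominator = total - numerator
            # Skip duplicates such as 2/2, which is just 1/1 again.
            if gcd(numerator, denominator) == 1:
                yield Fraction(numerator, denominator)
        total += 1


# Pair the natural numbers 1, 2, 3, ... with the first few rationals.
first_ten = list(islice(positive_rationals(), 10))
for position, value in enumerate(first_ten, start=1):
    print(position, value)
```

Every positive fraction shows up at some finite position in the list, which is the sense in which the rationals are no more numerous than the counting numbers.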

I thought of the infinity thing when I read “Is Big Data Too Big?” Where some folks saw a legitimate question, I noted a lack of sensitivity to the perils of thinking about infinity. I assume the author wanted to avoid throwing himself into Joseph of Arimathea’s role in history, ordinality, and cardinality. As a result, the write up ignores math and focuses on what I call business philosophy. Socrates, Nietzsche, and Russell were just so short sighted.

I noted this passage:

For major corporations the amount of data they can collect on each of us is vast. Indeed, estimates made about the amount of data collection show that the total volume of data in the world doubled between the start of this year and last month. By 2020 the amount of data in the world will be doubling every seven days. That is just too difficult to get our heads around…!

This exclamation mark approach is a good way to sidestep irritating mathy stuff.

I learned that big data can pose some psychological challenges. Cantor’s acquaintances would probably agree if they were still around and had a Facebook page.

But here’s the killer paragraph:

The vast amount of data and the ever-increasing number of reports causes another psychological issue of being overwhelmed. That leads to disinterest and lack of attention. And that, in turn, means we stop gaining from the data because we are not analyzing it properly. The more data we collect, the less valuable it can become because of our brains. As the article, How Can Big Data Trigger Positive Emotions explains, it is possible to make data interesting and appeal to our staff, but we have to work at it. If we don’t take steps to make data more psychologically engaging we are in danger of just producing data for data’s sake and not getting anything from it other than the desire to collect more data.

I agree. It is important to make numbers more psychologically engaging. Is there a downside? I keep thinking about Georg pondering that the power set of a countably infinite set is uncountably infinite.

Thus, is big data too big? Nope, big data can never be big enough because it will be bigger. Georg, Georg, are you with me on this?

Stephen E Arnold, October 9, 2015

Written by Stephen E. Arnold · Filed Under Big data, News | Comments Off on The Question: How Big Is Big?

Big Data: Systems of Insight

October 6, 2015

I read “All Your Big Data Will Mean Nothing without Systems of Insight.” The title reminded me of the verbiage generated by mid tier consulting firms and adjuncts teaching MBA courses at some institutions of higher learning. Malarkey, parental advice, and Big Data: a Paula Deen-type recipe for low-calorie intellectual fare.

Can one live on the outputs of mid tier consulting firm lingo prepared to be fudgier?

The notion of a system of insight is not particularly interesting. The rhetorical trick of moving from a particular to a more general concept fools some beginning debaters. For a more experienced debater, the key is to keep the eye on the ball, which, in this case, is the tenuous connection between Big Data and strategic management methods. (I am not sure these exist even after reading every one of Peter Drucker’s books.)

But I like to deal with particulars.

Computerworld is a sister or first cousin unit of the IDC outfit which sold my research on Amazon without asking my permission. My valiant legal eagle was able to disappear the report. I was concerned with the connection of my name and the names of two of my researchers with the IDC outfit. I have presented some of the back story in previous blog posts. I included screenshots along with the details of not issuing a contract, using content in ways to which I would never agree, and engaging in letters with my attorney offering inducements to drop the matter. Wow. A big company is unable to get organized and then pays its law firm to find a solution to the self created problem.

The report in question was limp wristed, eight pages in length, and available to Amazon’s eager readers of romance novels for a mere $3,500. Hey, the good stuff in our research was chopped out, leaving a Grape-Nuts Flakes experience for those able to read the document. I am a lousy writer, but I try to get my points across in a colorful way. Cereal bowl writing is not for me.

What does this have to do with Big Data and a system of insights?

Aren’t Amazon’s sales data big? Isn’t it possible to look at what sells on Amazon by scanning the company’s public information about books? Won’t a casual Google search reveal information about Amazon’s best selling eBooks? Best sellers’ lists rarely feature eight pages of watered down analysis of a search vendor with some soul bonding with the outstanding Fast Search & Transfer operation. How many folks visiting the digital WalMart buy $3,500 reports with my name on them?

Er, zero. So what’s the disconnect between basic data about what sells on Amazon, issuing appropriate contractual documents, and selling research with my name and those of two of my goslings on the $3,500, eight page document? That’s brilliant data analysis for sure.

The write up explains:

Businesses want to use data to understand customers, but they can’t do that without harnessing insights and consistently turning data into effective action.

That sort of makes sense except that the company which owns Computerworld, under the keen-eyed Dave Schubmehl, appeared to ignore this step when trying to sell a report with my name on it to the Amazon faithful. Do the folks at Computerworld and the company’s various knowledge properties connect data with their colleagues’ decisions?

Written by Stephen E. Arnold · Filed Under Big data, Feature, Financial, Publishing | 2 Comments


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
