Machine Learning Does Not Have All the Answers

November 25, 2016

Despite our broader knowledge, we still believe that if we press a few buttons and hit enter, computers can do all the work for us.  The advent of machine learning and artificial intelligence has not dispelled this belief; instead, big data vendors rely on this image to sell their wares.  Big data, though, has its weaknesses, and before you deploy a solution you should read Network World’s “6 Machine Learning Misunderstandings.”

Juniper Networks security intelligence software engineer Roman Sinayev explains some of the pitfalls to avoid before implementing big data technology.  It is important to take into consideration all the variables, including the unexpected ones; otherwise, one forgotten factor could wreak havoc on your system.  Also, do not forget to actually understand the data you are analyzing and its origin.  Pushing forward on a project without understanding the data’s background is a guaranteed failure.

Other practical advice is to build a test model and to add more data when the model does not deliver, but some of the advice is new even to us:

One type of algorithm that has recently been successful in practical applications is ensemble learning – a process by which multiple models combine to solve a computational intelligence problem. One example of ensemble learning is stacking simple classifiers like logistic regressions. These ensemble learning methods can improve predictive performance more than any of these classifiers individually.

Employing more than one algorithm?  It makes sense and is practical advice; why did that not cross our minds?  The rest of the advice offered is general stuff that can be applied to any project in any field; just change the lingo and the expert providing it.
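For readers who want to see the idea in code, here is a minimal stacking sketch using scikit-learn (our choice of library; the article names none): two simple base classifiers feed a logistic regression meta-learner.

```python
# A minimal stacking sketch with scikit-learn (our assumption; the
# article names no library). Two simple base classifiers feed a
# logistic-regression meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("tree", DecisionTreeClassifier(max_depth=3)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner combines base predictions
)
stack.fit(X_train, y_train)
print("stacked accuracy:", stack.score(X_test, y_test))
```

In practice one would compare this score against each base classifier alone, which is exactly the improvement the quoted passage describes.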

Whitney Grace, November 25, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 

Do Not Forget to Show Your Work

November 24, 2016

Showing your work is a messy but necessary step to prove how one arrived at a solution.  Most of the time it is never reviewed, but with big data, people wonder how computer algorithms arrive at their conclusions.  Engadget explains that computers are being forced to prove their results in “MIT Makes Neural Networks Show Their Work.”

Understanding neural networks is extremely difficult, but MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed a way to map these complex systems.  CSAIL figured the task out by splitting a network into two smaller modules: one extracts text segments and scores them according to their length and coherence, and the second predicts each segment’s subject and attempts to classify it.  The mapping modules sound almost as complex as the actual neural networks.  To alleviate the stress and add a giggle to their research, CSAIL had the modules analyze beer reviews:

For their test, the team used online reviews from a beer rating website and had their network attempt to rank beers on a 5-star scale based on the brew’s aroma, palate, and appearance, using the site’s written reviews. After training the system, the CSAIL team found that their neural network rated beers based on aroma and appearance the same way that humans did 95 and 96 percent of the time, respectively. On the more subjective field of “palate,” the network agreed with people 80 percent of the time.

One set of data is as good as another to test CSAIL’s network mapping tool.  CSAIL hopes to fine-tune the machine learning project and use it in breast cancer research to analyze pathology data.
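To make the two-module idea concrete, here is a toy sketch of the extractor-plus-predictor pattern (our reconstruction in PyTorch, not CSAIL’s code; the module names and sizes are ours):

```python
# Toy sketch of the two-module idea (our reconstruction, not CSAIL's
# code): an extractor scores each token, and a predictor classifies
# using only the highly scored tokens.
import torch
import torch.nn as nn

VOCAB, EMB, SEQ = 100, 16, 12

class Extractor(nn.Module):
    """Scores each token; high scores mark the extracted 'rationale'."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.score = nn.Linear(EMB, 1)

    def forward(self, tokens):
        emb = self.embed(tokens)                # (batch, seq, emb)
        gates = torch.sigmoid(self.score(emb))  # (batch, seq, 1)
        return emb * gates, gates               # keep only scored text

class Predictor(nn.Module):
    """Classifies the segment from the gated embeddings alone."""
    def __init__(self, n_classes=5):
        super().__init__()
        self.classify = nn.Linear(EMB, n_classes)

    def forward(self, gated_emb):
        return self.classify(gated_emb.mean(dim=1))  # pool, then classify

extractor, predictor = Extractor(), Predictor()
tokens = torch.randint(0, VOCAB, (4, SEQ))       # a fake batch of reviews
gated, gates = extractor(tokens)
logits = predictor(gated)
print(logits.shape, gates.mean().item())         # (4, 5) and the average gate
```

The gates make the network’s reasoning inspectable: the highly scored tokens are the “work” the network shows.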

Whitney Grace, November 24, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Big Data Teaches Us We Are Paranoid

November 18, 2016

I love election years!  Actually, that is sarcasm.  Election years bring out the worst in Americans.  The media runs rampant with predictions that each nominee is the equivalent of the anti-Christ and will “doom America,” “ruin the nation,” or “destroy humanity.”  The sane voter knows that whoever the next president is will probably not destroy the nation or everyday life…much.  Fear, hysteria, and paranoia sell better than puff pieces, and big data supports that theory.  Popular news site Newsweek shares that “Our Trust In Big Data Shows We Don’t Trust Ourselves.”

The article starts with a new acronym: DATA.  It is not that new, but Newsweek puts a new spin on it.  D means dimensions, or different datasets: the ability to combine multiple data streams for new insights.  A is for automatic, which is self-explanatory.  T stands for time, as in data processed in real time.  The second A is for artificial intelligence, which discovers all the patterns in the data.

Artificial intelligence is where the problems start to emerge.  Big data algorithms can be unintentionally programmed with bias.  In order to interpret data, artificial intelligence must learn from prior datasets.  These older datasets can show human bias, such as racism, sexism, and socioeconomic prejudices.
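A toy demonstration of the point (our sketch, not Newsweek’s; the data and feature names are invented): train a model on historical decisions that penalized one group, and the model reproduces the penalty even for equally qualified candidates.

```python
# Toy illustration (our sketch) of how bias in historical labels
# survives training: past hiring decisions penalized group B, and the
# trained model learns to do the same.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                  # 0 = group A, 1 = group B
skill = rng.normal(0, 1, n)                    # skill distributed equally
# Historical labels: skill matters, but group B was penalized.
hired = (skill - 1.0 * group + rng.normal(0, 0.5, n)) > 0

model = LogisticRegression().fit(np.column_stack([group, skill]), hired)
equal_skill = np.array([[0, 0.5], [1, 0.5]])   # same skill, different group
print(model.predict_proba(equal_skill)[:, 1])  # group B scores lower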

Our machines are not as objective as we believe:

But our readiness to hand over difficult choices to machines tells us more about how we see ourselves.

Instead of seeing a job applicant as a person facing their own choices, capable of overcoming their disadvantages, they become a data point in a mathematical model. Instead of seeing an employer as a person of judgment, bringing wisdom and experience to hard decisions, they become a vector for unconscious bias and inconsistent behavior.  Why do we trust the machines, biased and unaccountable as they are? Because we no longer trust ourselves.

Newsweek really knows how to be dramatic.  We no longer trust ourselves?  No, we trust ourselves more than ever, because we rely on machines to make our simple decisions so we can concentrate on more important topics.  However, what we deem important is biased.  Taking the Newsweek example, what a job applicant considers an important submission, an HR representative will see as the 500th submission that week.  Big data should provide us with better, more diverse perspectives.

Whitney Grace, November 18, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Hard and Soft Clustering Explained

November 17, 2016

I read “An Introduction to Clustering and Different Methods of Clustering.” Clustering, it seems, remains a popular topic among the quasi-search and content processing crowd. What’s interesting about this write up is that it introduces hard clustering and soft clustering. I had assumed that clustering was neither hard nor soft. Here’s the distinction:

  • In hard clustering, each data point either belongs to a cluster completely or not. For example, in the above example each customer is put into one group out of the 10 groups.
  • In soft clustering, instead of putting each data point into a separate cluster, a probability or likelihood of that data point to be in those clusters is assigned.

The write up then highlights these go-to methods of clustering:

  • K-means clustering
  • Hierarchical clustering
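To make the distinction and the methods concrete, here is a minimal scikit-learn sketch (ours, not the write up’s): K-means assigns each point to exactly one cluster, a Gaussian mixture assigns a probability per cluster, and agglomerative clustering covers the hierarchical method.

```python
# Hard vs. soft clustering in scikit-learn (our sketch, not the
# article's code). K-means gives one label per point; a Gaussian
# mixture gives a probability for each cluster.
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=7)

hard = KMeans(n_clusters=3, n_init=10, random_state=7).fit_predict(X)
print("hard labels:", hard[:5])                 # one cluster per point

soft = GaussianMixture(n_components=3, random_state=7).fit(X)
print("soft memberships:", soft.predict_proba(X[:2]).round(2))

# Hierarchical clustering, the other go-to method the write up names.
hier = AgglomerativeClustering(n_clusters=3).fit_predict(X)
print("hierarchical labels:", hier[:5])
```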

The write up introduces the idea of supervised learning. I noted that the article did not point out that training is a time-consuming and often expensive exercise. The omission complements the “quick look” approach in the write up.

I am not sure that a person interested in clustering will be able to make a giant leap forward. Perhaps the effort will result in a hard soft landing?

Stephen E Arnold, November 17, 2016

AI to Profile Gang Members on Twitter

November 16, 2016

Researchers from the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) claim that an algorithm they developed is capable of identifying gang members on Twitter.

Vice.com recently published an article titled “Researchers Claim AI Can Identify Gang Members on Twitter,” which describes:

A deep learning AI algorithm that can identify street gang members based solely on their Twitter posts, and with 77 percent accuracy.

The article then points out the shortcomings of the algorithm:

According to one expert contacted by Motherboard, this technology has serious shortcomings that might end up doing more harm than good, especially if a computer pegs someone as a gang member just because they use certain words, enjoy rap, or frequently use certain emojis—all criteria employed by this experimental AI.
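Those criteria describe a fairly conventional text classifier. Here is a naive sketch of that kind of model (entirely our illustration; the Kno.e.sis deep learning system, its features, and its training data are not public), which shows how easily surface features drive the label:

```python
# A naive tweet classifier of the kind the criticism describes (our
# illustration; the actual Kno.e.sis deep learning model is not public).
# It keys on words and emojis -- surface features, not ground truth.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["new mixtape out 🎤", "lyrics all day", "family bbq today", "game night"]
labels = [1, 1, 0, 0]  # hypothetical labels; the real training data is unknown

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # char n-grams also capture emojis
    LogisticRegression(),
)
clf.fit(tweets, labels)
print(clf.predict(["mixtape drop"]))  # likely flagged on word choice alone
```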

The shortcomings do not end there. The data on Twitter is being analyzed in a silo. For example, let us assume that a few gang members are identified using the algorithm (remember, no location information is taken into consideration by the AI); what next?

Is it not necessary then to also identify other social media profiles of the supposed gang members, look at the big data they generate, analyze their communication patterns, and then form some conclusion? Unfortunately, the AI does none of this. It would, in fact, be a mammoth task to extrapolate data from multiple sources just to identify people with certain traits.

And most importantly, what if the AI is put in place and someone, just for fun, paints an innocent person as a gang member? As the article rightly points out, machines trained on prejudiced data tend to reproduce those same, very human, prejudices.

Vishal Ingole, November 16, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Lawyers Might Be Automated Too

November 14, 2016

The worry with artificial intelligence is that it will automate jobs and leave people without a way to earn income.  The general belief is that AI will automate manufacturing, retail, food service, and other industries, but what about law?  One would think that lawyers would never lose their jobs, because a human is required to navigate litigation and represent a person in court, right?  According to The Inquirer article “UCL Creates AI ‘Lawbot’ That Rules on Cases With Surprising Accuracy,” lawyers might be automated too.

On a level akin to Watson, researchers at University College London, led by Dr. Nikolaos Aletras, created an algorithm that peruses case information and predicts verdicts with notable accuracy.  The UCL team fed the algorithm litigation information from cases about torture, degrading treatment, privacy, and fair trials.  They hope the algorithm will be used to identify patterns in human rights abuses.
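For flavor, here is a minimal sketch of outcome prediction from case text (our assumption of the general setup; the case snippets and labels are invented, and this is not the UCL team’s code):

```python
# A minimal sketch of predicting case outcomes from text (our
# assumption of the setup; not the UCL team's code or data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

cases = [
    "applicant detained without trial for months",
    "authorities provided prompt judicial review",
    "prolonged solitary confinement and no counsel",
    "hearing held within statutory deadline",
]
outcomes = [1, 0, 1, 0]  # hypothetical: 1 = violation found

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(cases, outcomes)
print(model.predict(["no access to counsel during detention"]))  # likely 1
```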

Dr. Aletras does not think AI will replace judges and lawyers, but it could be used as a tool to identify patterns in cases with specific outcomes.  The algorithm has a 79% accuracy rate, which is not bad considering the amount of documentation involved.  Also, the downside is:

At a wider level, although 79 percent is a bit more ED-209 than we’d like for now, it does suggest that we’re a long way towards being able to install an ethical and moral code that would allow AI to … you know, not kill us and that.  With so many doomsayers warning us that the closer that we get to the so-called ‘singularity’ between humans and machines, the more likely we are to be toast as a race, it’s something of a good news story to see what’s being done to ensure AI stays on the straight and narrow.

Automation in the legal arena is a strong possibility for when “…implementation and interpretation of the law that is required, less so than the fact themselves.”  The human element is still needed to decide cases, but perhaps it would cut down on the number of lenient verdicts for pedophiles, sex traffickers, rapists, and other bad guys.  It does make one wonder what alternative fields lawyers would consider.

Whitney Grace, November 14, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

IBM Watson: Cruel, Cruel Caveats

November 12, 2016

There’s nothing like a cruel caveat applied to IBM Watson. Navigate to “Cognitive Computing Applications Present New Business Challenges.” These challenges are not “new”; what’s new is that naive smart software licensees are discovering that training software is difficult, time-consuming, and expensive. Best of all, the training is not forever. Smart systems need to be retrained because language and data change.
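A toy illustration of why retraining matters (our sketch, not from the write up; the synthetic data stands in for drifting language): a model trained once degrades as the data shifts, and retraining restores accuracy.

```python
# Toy illustration of drift (our sketch): a model trained once degrades
# as the data distribution shifts; retraining recovers the accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def sample(shift):
    X = rng.normal(shift, 1, (500, 3))
    y = (X.sum(axis=1) > shift * 3).astype(int)
    return X, y

X0, y0 = sample(0.0)                     # data at deployment time
model = LogisticRegression().fit(X0, y0)

X1, y1 = sample(1.5)                     # the data drifted later
print("stale model:", model.score(X1, y1))   # near chance
print("retrained:  ", LogisticRegression().fit(X1, y1).score(X1, y1))
```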

The write up reports that an executive involved in smart software at Rabobank, a Dutch outfit, offered this observation at the World of Watson conference held at the end of October 2016:

AI is everywhere, and people think it’s so fantastic. And these companies, including IBM, come in and then you go to do a project and see that it’s not really that great yet,” Serrurier Schepper said. “You have to train a model, and it takes time.”

The story continues:

After building a centralized AI unit, teams should look for quick wins and then publicize their success, Serrurier Schepper said. Models may take a long time to train, but once they’re delivering strong results, sharing this with the rest of the company can help build support for future initiatives.

Yep, time. Time is money, which is a statement any bank professional with Excel can understand.

How does one avoid failing? That’s easy. The write up reports:

Choosing the right use cases for cognitive computing applications is also important. There is a general notion that AI software can perform just about any task. And while that may be the ultimate goal of the technology, today’s tools are a ways off from that. Enterprises need to identify business problems where the technology is competent, and that’s not always a simple proposition.

The point is that no matter how general-purpose smart software like Watson is perceived to be, the licensee has to figure out exactly what problem to attack. The reason is that the time and cost of creating a model and then training the smart software will put the project deep into a swamp of red, mercury-tinged muck.

But be prepared to spend money. The write up quotes another Watson aware executive as saying:

“If you get too hung up on ROI, you’ll never do anything.”

I disagree. Those involved in the project may have an opportunity to look for a new job. It’s the time and cost thing that creates these new horizons for some smart software champions.

Stephen E Arnold, November 12, 2016

Lucidworks Hires Watson

November 7, 2016

One of our favorite companies to track is Lucidworks, due to their commitment to open source technology and development in business enterprise systems.  The San Diego Times shares that “Lucidworks Integrates IBM Watson To Fusion Enterprise Discovery Platform.”  This means that Lucidworks has integrated IBM’s supercomputer into their Fusion platform to help developers create discovery applications to capture data and discover insights.  In short, they have added a powerful big data algorithm.

While Lucidworks is built on open source software, adding a proprietary supercomputer will only benefit their clients.  Watson has proven itself an invaluable big data tool and, paired with the Fusion platform, will do wonders for enterprise systems.  Data is a key component of every industry, but understanding and implementing it is difficult:

Lucidworks’ Fusion is an application framework for creating powerful enterprise discovery apps that help organizations access all their information to make better, data-driven decisions. Fusion can process massive amounts of structured and multi-structured data in context, including voice, text, numerical, and spatial data. By integrating Watson’s ability to read 800 million pages per second, Fusion can deliver insights within seconds. Developers benefit from this platform by cutting down the work and time it takes to create enterprise discovery apps from months to weeks.

With the Watson upgrade to Lucidworks’ Fusion platform, users gain natural language processing and machine learning.  It makes the Fusion platform act more like a Star Trek computer that can provide data analysis and even interpret results.
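Conceptually, the integration pattern looks something like the following sketch (entirely our illustration; the names `watson_enrich` and `index_into_fusion` are hypothetical, not Lucidworks’ or IBM’s API): documents get enriched with NLP annotations before they are indexed for discovery.

```python
# Conceptual sketch of the integration pattern (entirely our
# illustration; Lucidworks' actual Fusion-Watson connector is not
# shown): enrich documents with NLP annotations, then index them.
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    annotations: dict = field(default_factory=dict)

def watson_enrich(doc: Document) -> Document:
    """Hypothetical stand-in for a Watson NLP call (entities, sentiment,
    etc.); a real deployment would call the Watson API here."""
    doc.annotations["tokens"] = doc.text.lower().split()
    doc.annotations["length"] = len(doc.annotations["tokens"])
    return doc

def index_into_fusion(doc: Document) -> None:
    """Hypothetical indexing step; Fusion exposes its own APIs for this."""
    print("indexed:", doc.annotations)

for raw in ["Watson reads quickly", "Fusion serves discovery apps"]:
    index_into_fusion(watson_enrich(Document(raw)))
```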

Whitney Grace, November 7, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Is Your Company a Data Management Leader or Laggard?

November 4, 2016

The article titled “Companies Are Falling Short in Data Management” on IT ProPortal describes the obstacles facing many businesses when it comes to data management optimization. Why does this matter? The article states that big data analytics and the internet of things will combine to form an over $300 billion industry by 2020. Companies that fail to build up their capabilities will lose out—big. The article explains,

More than two thirds of data management leaders believe they have an effective data management strategy. They also believe they are approaching data cleansing and analytics the right way…The [SAS] report also says that approximately 10 per cent of companies it calls ‘laggards’, believe the same thing. The problem is – there are as many ‘laggards’, as there are leaders in the majority of industries, which leads SAS to a conclusion that ‘many companies are falling short in data management’.

In order to avoid this trend, company leaders must identify the obstacles impeding their path. A better focus on staff training and development is only possible after recognizing that a lack of internal skills is one of the most common issues. Additionally, companies must clearly define their data strategy and disseminate the vision among all levels of personnel.

Chelsea Kerwin, November 4, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Self Service Business Intelligence: Some Downers

November 2, 2016

Perhaps I am looking at a skewed sample of write ups. I noted another downer about easy-to-use, do-it-yourself business intelligence systems. These systems allow anyone to derive high value insights from data with the click of a mouse.

That’s been a dream of some for many years. I recall one of my colleagues at Halliburton NUS, a civil engineer with a focus on wastewater, repeating to anyone who would listen, “I want to walk into my office and have the computer tell me what I need to know today.”

Yep, how’s that coming along?

The write up “9 Ways Self Service BI Solutions Fall Short” suggests that the comment made by the sewage expert in 1972 is not yet a reality. The write up identifies nine “reasons,” but I circled three as of particular interest to me and my research goslings. You will need to read the original “Fall Short” article for the full complement of downers, or “challenges” in today’s parlance.

  1. Hidden complexity. Yep, folks who don’t know what they don’t know but just want a good enough answer struggle with the realities of data integrity, mathematics, and assumptions. A pretty chart may be eye-catching and “simple.” But is it on point? Well, that’s part of the complexity the pretty chart is doing its best to keep hidden. Out of sight, out of mind, right?
  2. Customization. Yep, the chart is pretty but it does not answer the question of a particular user. Now the plumbing must be disassembled in order to get what the self service BI user wants. Okay, but what if that self service user who is in a hurry cannot put the plumbing together again. Messy, right?
  3. Cost and scalability. The problem with self service is that low cost comes from standardization. You can have any color so long as it is black. The notion of mass customization persists even though every Apple iPhone is the same. The user has to figure out how to set up the phone to do what the user wants. The result is that most iPhone users make minimal changes to the software on the phone. Default settings are the settings for the vast majority of a system’s users. When a change has to be made, that change comes at a cost, and neither users nor the accountants are too keen on the unique snowflake approach to hardware or software. The outputs from a BI system, therefore, get used with zero or minimal modifications.

What are the risks of self service business intelligence? These range from governmental flops like 18F to Google’s failure with its fiber play. Think of the inefficiency resulting from the use of business intelligence systems marketed as the answer to the employee’s need for on-point information.

When I walk into my office, no system tells me what I need to know. Nice idea, though.

Stephen E Arnold, November 2, 2016
