Scrape the Content

July 17, 2014

What is artoo.js? It borrows its name from everyone’s favorite Star Wars droid, the one that speaks in beeps. What it does is completely different. It is a piece of JavaScript code that runs in your browser’s console and provides you with scraping utilities. This is a fine example of what happens when a Star Wars fan combines computer savvy with an entertainment preference.

While the developer’s geek cred is established, does this make it a good tool? Let us study the features: download methods for scraped data, Web crawls, scraping of any Web page, downloadable instructions, and built-in jQuery. Not bad, but why use artoo.js?

“Using browsers as scraping platforms comes with a lot of advantages:

• Fast coding: You can prototype your code live thanks to JavaScript browsers’ REPL and peruse the DOM with tools specifically built for web development.

• No more authentication issues: No longer need to deploy clever solutions to enable your spiders to authenticate on the website you intent to scrape. You are already authenticated on your browser as a human being.

• Tools for non-devs: You can easily design tools for non-dev people. One could easily build an application with a UI on top of artoo.js. Moreover, it gives you the possibility to create bookmarklets on the fly to execute your personnal scripts.”
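To make "scraping utilities" concrete, here is a minimal sketch of the basic job a scraper does: turning page markup into rows of data. It is not artoo.js and does not use its API; artoo.js itself is JavaScript that runs in the browser console. This sketch uses only Python's standard library, and the names LinkScraper, scrape_links, and SAMPLE are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collect the text and href of every <a href=...> element in a page."""

    def __init__(self):
        super().__init__()
        self.rows = []      # scraped results, one dict per link
        self._href = None   # href of the link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.rows.append({"title": title, "url": self._href})
            self._href = None


def scrape_links(html):
    """Return a list of {"title", "url"} dicts for the links in the markup."""
    parser = LinkScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup standing in for a live page.
SAMPLE = (
    '<ul><li><a href="/a">First post</a></li>'
    '<li><a href="/b">Second <b>post</b></a></li></ul>'
)
rows = scrape_links(SAMPLE)
```

The rows list can then be written out as CSV or JSON, which is the kind of "data download" step the feature list mentions.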

We are sold! It offers more features than the average scraper and it makes the job easier. This is the scrape utility you are looking for.

Whitney Grace, July 17, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

TextTeaser Goes Open Source

July 16, 2014

If you are looking for an auto-summarization tool, TechCrunch reports that “Auto-Summarization Tool TextTeaser Relaunches As Open Source Code.” Jolo Balbin is the creator of TextTeaser, and he added it to GitHub after experiencing scalability issues with the API. Balbin recoded the program, and the process is now faster. Developers have two plan options: one is $12 for every 1,000 articles summarized, while the enterprise plan is $250/month and comes with a dedicated server to store the article source.

“ ‘In this TextTeaser, you can train your own summarizer,’ Balbin explains. ‘You can provide the category and source of the article that will be used to improve the quality of the summaries. In the future, users might also have the ability to provide what keyword is important and what is not.’ ”

TextTeaser is used in reader apps, such as Gist. Balbin hopes to optimize the program for medical, financial, and legal documents.

TextTeaser sounds like it makes reading faster. The code is a valuable tool. We will stay tuned to see how else it is used.

Whitney Grace, July 16, 2014

Connotate Shows Growth And Webdata Browser

June 20, 2014

In February 2014, NJTC TechWire published an article titled “Connotate Announces 25% YOY Growth In Total Contract Value For 2013.” Connotate has made a name for itself as a leading provider of Webdata extraction and monitoring solutions. The company’s total contract value grew 25% in 2013, and other positives for Connotate included the release of Connotate 4.0, a new Web site, and new multi-year deal renewals. On top of the record growth, BIIA reports that “Connotate Launches Connotate4,” a Web browser that simplifies and streamlines Webdata extraction. Connotate4 will do more than provide users with a custom browser:

• “Inline data transformations within the Agent development process is a powerful new capability that will ease data integration and customization.

• Enhanced change detection with highlighting can be requested during the Agent development process via a simple point-and-click checkbox, enabling highlighted change detection that is easily illustrated at the character, word or phrase level.

• Parallel extraction tasks makes it faster to complete tasks, allowing even more scalability for even larger extractions.

• Build and expand capabilities turn the act of re-using a single Agent for related extraction tasks a one-click event, allowing for faster Agent creation.

• A simplified user interface enabling simplified and faster Agent development.”
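For readers wondering what word-level change detection with highlighting looks like in practice, here is a minimal sketch. It is not Connotate's implementation, which the announcement does not describe; it assumes a simple comparison of two text snapshots using Python's standard difflib module, and the function name highlight_changes and the [[ ]] markers are hypothetical.

```python
import difflib


def highlight_changes(old, new):
    """Return the new text with inserted or replaced words wrapped in [[ ]]."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Unchanged words pass through as they are.
            out.extend(new_words[j1:j2])
        elif op in ("replace", "insert"):
            # New or changed words are marked as one highlighted run.
            out.append("[[" + " ".join(new_words[j1:j2]) + "]]")
        # "delete": words dropped from the old text leave nothing to mark.
    return " ".join(out)


# Hypothetical before-and-after snapshots of a monitored page.
before = "Price is 10 dollars"
after = "Price is 12 dollars"
marked = highlight_changes(before, after)
```

Splitting on characters or on phrases instead of words would give the other two levels of detail the announcement mentions.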

Connotate brags that the new browser will give users access to around 95% of Webdata and can adapt as new technologies emerge. Connotate aims to place itself in the next wave of indispensable enterprise tools.

Whitney Grace, June 20, 2014

Recent Innovations At KB Crawl

May 30, 2014

It is not an uncommon thought in the technology sector that search tools could become more important than business intelligence. Veille Mag reports that KB Crawl President Bruno Etienne does not agree with this idea. The article, “KB Crawl Or How To Structure Unstructured Data,” states that most Web sites are designed these days to make finding information easier than typing keywords into a search engine. Information is categorized so finely that it leads more to business intelligence solutions than to search.

Such thinking might have led to KB Crawl’s “new look,” described as a way for data to meet the needs of many departments:

“KB Crawl “new look” for example prepare data for Excel that contains a mapping tool as PowerView will connect to publishing systems or online booking. The last application is that of a client who has financed a portion of the development. The software meets the needs of marketing, documentation, ereputation, strategy and decision support that are fundamental to economic intelligence. It allows you to make the right decisions.”

KB Crawl has designed its software as a SaaS offering with a simple user interface, and a new version is due for release soon.

While information might be easy to find, if it is not readily available users will turn to a search function. Is KB Crawl depending on people to have a certain amount of information literacy? Clearly, they have forgotten that search is a business intelligence tool.

Whitney Grace, May 30, 2014

Alternatives to Google Popping up Everywhere

April 25, 2014

It’s a golden era for search alternatives. For a while there folks were worried about Google monopolizing the internet, but it’s not shaking out that way. Far from it, in fact. We are currently living in a golden age of niche search tools, as we discovered from a recent Virtual Strategy Magazine story, “MaxxCAT Raises the Bar for Search Performance with MaxxCAT 5.0.”

According to the story:

“The 5.0 performance enhancements really come into their own when you begin looking at the scalability of our appliance in the enterprise…Sure, if you can build an index for a small amount of data in 5 minutes instead of 10, it’s nice, but it’s just 5 minutes. However, if you can index terabytes of data in 5 hours instead of 10 hours, that’s a huge difference.”

MaxxCAT isn’t the only boat on this sea of Google alternatives; in fact, it isn’t even the biggest of the bunch. It’s not tough to find alternatives, since there are articles everywhere. The trickier part is finding one that fits your needs. Each serves a purpose, whether it is open source technology or privacy protection, that suits some users and repels others. This trial and error period is part of the fun, in our books.

Patrick Roland, April 25, 2014

Litigation Software dtSearch Demo

April 16, 2014

The dtSearch Desktop Demonstration Video on nlsblog.org shows how to set up and search with dtSearch for Windows. The 12-minute video begins with an introduction to dtSearch, which is able to “recognize text in over 200 common file types.” By indexing the locations of words in different files, dtSearch is able to build an almost limitless index of documents. The demo walks through the setup of dtSearch. After naming the index,

“It is important to keep in mind that when we add items here, dtSearch is not creating copies… but links to those files. A good practice is to put the files and folder that we want to run searches on into a single centralized location, before we create the index… all we need to do is add this discovery folder, and the subfolders and files will be automatically included…dtSearch reads the text in the linked files and creates a searchable words list.”

Then you are able to select which index to search, and limit the search to one case or all cases. Each word appears with a number showing how often it occurs in the index. Then you can add the keyword to the search request to find the documents in which the word appears. You are able to preview a document, copy a file, and create a search report. The demo goes into great detail about all of the search options, and should certainly be viewed in full to learn the best methods, but it does not provide metrics for the time required to build the initial index or update it. These metrics are useful.
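The idea of indexing word locations and showing a count beside each word can be sketched in a few lines. This is a generic inverted index, offered as an illustration only; it is not dtSearch's implementation, and the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each word to {document name: [positions of the word in that document]}."""
    index = defaultdict(dict)
    for name, text in documents.items():
        words = re.findall(r"[a-z0-9']+", text.lower())
        for position, word in enumerate(words):
            index[word].setdefault(name, []).append(position)
    return index


def word_count(index, word):
    """Total number of times a word occurs across the whole index."""
    hits = index.get(word.lower(), {})
    return sum(len(positions) for positions in hits.values())


def documents_containing(index, word):
    """Sorted names of the documents in which the word appears."""
    return sorted(index.get(word.lower(), {}))


# Hypothetical stand-ins for files gathered in one discovery folder.
DOCS = {
    "memo.txt": "The contract was signed. The contract is binding.",
    "email.txt": "Please review the contract today.",
}
index = build_index(DOCS)
```

Here word_count(index, "contract") gives the number shown beside the word, and documents_containing(index, "contract") gives the files a search on that keyword would return. The index stores only names and positions, not copies of the text, which mirrors the demo's point that the index links to files rather than duplicating them.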

Chelsea Kerwin, April 16, 2014

New Data Integration Tool from BA Insight

April 11, 2014

A new data integration platform promises to simplify the process of deploying search-driven applications, save organizations time and money, and improve security. BA Insight posts, “BA Insight Announces Knowledge Integration Platform 2014 for Rapid Implementation of Search-Driven Applications.” No definition of “knowledge” is included, however.

The press release specifies:

“The BAI Knowledge Integration Platform turns enterprise search engines into knowledge engines by transforming the way information is found to get the right information to the right people at the right time. It has the flexibility to function as a comprehensive solution or be implemented in a phased approach to meet growing organizational needs. The platform consists of three robust engines:

*User Experience Engine – drives remarkable user experiences for finding and exploring knowledge or experts via an extensible engine and a library of powerful components

*Content Intelligence Engine – increases findability using automated classification, metadata generation, and text analytics

*Content Connectivity Engine – provides secure connectivity to a wide variety of content systems, enabling unified views of all knowledge assets”

The press release notes that several prominent global companies are using this platform, including the Apache Corporation. (No, that has nothing to do with open source software; it is a huge energy-exploration enterprise.) The write-up also emphasizes that the platform builds on an organization’s existing infrastructure to present users with an integrated view of their data.

BA Insight aims to make enterprise search more comprehensive and easier to use. Founded in 2004, the company is headquartered in Boston with offices in Chicago, Washington, DC, and Sacramento, California.

Cynthia Murrell, April 11, 2014

SharePoint Adds PDF Converter

April 7, 2014

One of the major complaints about SharePoint is that users often have to leave the platform in order to accomplish basic tasks. SharePoint is getting closer to complete, and Microsoft is making some needed improvements. However, add-ons are also filling an important role in improving the user experience. Virtual Strategy covers one addition in their article, “SharePoint Now More Killer With PDF Document Converter; It’s No Fool’s Joke.”

The article begins:

“Today, PortalFront Tru Apps announces a new ‘Convert to PDF’ feature in SharePoint, bringing SharePoint a step closer to maturity . . . Converting documents from Word (doc, docx), Excel (xslx), PowerPoint and other formats to PDF directly in SharePoint libraries was not possible. The app also allows batch conversion and supports many other file types to PDF.”

Add-ons have been the key to SharePoint satisfaction according to many experts. Stephen E. Arnold is one of those experts, and he puts his thoughts down on the Web site, ArnoldIT.com. He covers a lot of SharePoint news and has found that user experience is highest when customization is at its best. But since many organizations cannot fully support internal customization, add-ons are key.

Emily Rae Aldridge, April 7, 2014

Documentation Toolkit for SharePoint

April 2, 2014

Documentation Toolkit for SharePoint 4.0 was released this week by Acceleratio Ltd. SharePoint 2007 is supported and new best practices and features were added. Read all the details in the PRWeb release, “Documentation Toolkit for SharePoint 4.0 – New SharePoint Best Practices, Enhanced Permissions Reports and Completely New Interface Design.”

The release says:

“Acceleratio Ltd., an innovative software development company, released a new version of Documentation Toolkit for SharePoint. Version 4.0 comes with improved Permissions Reports, a redesigned interface and an improved compare wizard. New Best Practices were added for more efficient analysis of the SharePoint farm configuration.”

Stephen E. Arnold of ArnoldIT.com has made a name for himself following and analyzing all things search, including SharePoint. One thing is certain from the coverage: SharePoint gets more powerful and more complicated at the same time. This opens a wide space for add-ons and value-added software that improve the user experience and customization of SharePoint without adding a lot of hassle.

Emily Rae Aldridge, April 2, 2014

Partnership with SAP, SharePoint, and Open Text

March 24, 2014

SharePoint is improved by customization. Third-party add-ons are often the backbone of this customization, since SharePoint has become such a complex infrastructure. In the latest news, SharePoint is partnering with other vendors to increase efficiency. Read more in the Fierce Content Management article, “OpenText Brings Governance to SharePoint, SAP.”

The article begins:

“In one of the more odd product announcements in some time, three giants of enterprise software–OpenText, SharePoint and SAP–have come together around a governance, content management and an ERP solution. This three-headed monster is called SAP Content Management for Microsoft SharePoint by OpenText. You can view SAP data inside SharePoint or SharePoint content inside SAP, and OpenText takes care of the governance bits to make sure everything is done within the rules of the organization.”

Stephen E. Arnold has made a career out of reporting and analyzing all things search. His SharePoint coverage also points to the importance of customization, especially through add-ons. Read more on his Web site ArnoldIT.com.

Emily Rae Aldridge, March 24, 2014
