On the Hunt for Thesauri

December 15, 2016

How do you create a taxonomy? These curated lists do not write themselves, although they may seem to these days.  Companies that specialize in file management and organization develop taxonomies.  Usually they offer customers an out-of-the-box option that can be customized with additional words, categories, and so on.  Taxonomies can be generalized lists, a one-size-fits-all deal.  Certain industries, however, need specialized taxonomies that include words, phrases, and other jargon particular to that field.  As with the generalized taxonomies, there are canned industry-specific lists, but the more specialized the industry, the less likely a canned list exists.

This is where taxonomy lists need to be created from scratch.  Where do the taxonomy writers get the content for their lists?  They turn to the tried-and-true resources that have aided researchers for generations: dictionaries, encyclopedias, technical manuals, and thesauri.  Thesauri are perhaps the most important tools for taxonomy writers, because they include not only words and their meanings, but also synonyms and antonyms for words within a field.
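As a rough illustration of the kind of structure a thesaurus gives a taxonomy writer, here is a minimal sketch of a single taxonomy entry as a data structure. The field names follow common thesaurus conventions (synonyms plus broader, narrower, and related terms); the class and the example term are hypothetical, not drawn from MultiTes or any other product.

```python
# Minimal sketch of a thesaurus-style taxonomy entry (hypothetical structure).
from dataclasses import dataclass, field
from typing import List

@dataclass
class TaxonomyTerm:
    term: str                                          # preferred label
    synonyms: List[str] = field(default_factory=list)  # non-preferred labels ("use for")
    broader: List[str] = field(default_factory=list)   # more general terms
    narrower: List[str] = field(default_factory=list)  # more specific terms
    related: List[str] = field(default_factory=list)   # associated terms

# Hypothetical entry for a legal-industry taxonomy.
tort = TaxonomyTerm(
    term="Tort",
    synonyms=["civil wrong"],
    broader=["Civil law"],
    narrower=["Negligence", "Defamation"],
    related=["Liability"],
)
print(tort.term, "->", tort.narrower)
```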

If you need to write a taxonomy and are at a loss, check out MultiTes.  It is a Web site that includes tools and other resources to get your taxonomy job done.  Multisystems built MultiTes, and they:

…developed our first computer program for Thesaurus Management on PC’s in 1983, using dBase II under CPM, predecessor of the DOS operating system.  Today, more than three decades later, our products are as easy to install and use. In addition, with MultiTes Online all that is needed is a web connected device with a modern web browser.

In other words, they have experience and know their taxonomies.

Whitney Grace, December 15, 2016

Is Sketch Search the Next Big Thing?

December 5, 2016

There’s text search and image search, but soon, searching may be done via hand-drawn sketching. Digital Trends released a story, Forget keywords — this new system lets you search with rudimentary sketches, which covers an emerging technology. Two researchers at Queen Mary University of London’s (QMUL) School of Electronic Engineering and Computer Science taught a deep learning neural network to recognize queries in the form of sketches and then return matches in the form of products. Sketch search may have an advantage over both text and image search, as one of the researchers explains:

“Both of those search modalities have problems,” he says. “Text-based search means that you have to try and describe the item you are looking for. This is especially difficult when you want to describe something at length, because retrieval becomes less accurate the more text you type. Photo-based search, on the other hand, lets you take a picture of an item and then find that particular product. It’s very direct, but it is also overly constrained, allowing you to find just one specific product instead of offering other similar items you may also be interested in.”
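The article does not spell out how the QMUL system ranks results, but the general pattern behind sketch-to-product search is a shared embedding space: the sketch and every product photo are encoded as vectors, and the closest product vectors come back as matches. Below is a hypothetical sketch of that retrieval step only; the random vectors stand in for the output of real sketch and image encoders.

```python
# Hypothetical sketch of embedding-based retrieval: encode the query sketch and
# each product image into one vector space, then rank products by cosine similarity.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_products(sketch_vec: np.ndarray, product_vecs: dict, top_k: int = 5) -> list:
    """Return the ids of the top_k products most similar to the query sketch."""
    scores = {pid: cosine_similarity(sketch_vec, vec) for pid, vec in product_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy catalog: random 128-dimensional vectors standing in for encoder output.
rng = np.random.default_rng(seed=0)
catalog = {f"shoe_{i}": rng.normal(size=128) for i in range(100)}
query_sketch = rng.normal(size=128)
print(rank_products(query_sketch, catalog, top_k=3))
```

Ranking by similarity rather than demanding an exact match is what gives sketch search the breadth described in the quote: near neighbors come back alongside the closest item.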

This search technology is positioning itself for online retail commerce, and perhaps only for users with the ability to sketch. Yes, why read? Drawing pictures works really well for everyone. We think this might present monetization opportunities for Pinterest.

Megan Feil, December 5, 2016

Facial Recognition Fraught with Inaccuracies

November 2, 2016

Images of more than 117 million adult Americans are held by law enforcement agencies, yet the rate of accurately identifying people is minuscule.

A news report by The Register titled Meanwhile, in America: Half of adults’ faces are in police databases says:

One in four American law enforcement agencies across federal, state, and local levels use facial recognition technology, the study estimates. And now some US police departments have begun deploying real-time facial recognition systems.

Though facial recognition software vendors claim accuracy rates anywhere between 60 and 95 percent, statistics tell an entirely different story:

“Of the FBI’s 36,420 searches of state license photo and mug shot databases, only 210 (0.6 per cent) yielded likely candidates for further investigations,” the study says. “Overall, 8,590 (4 per cent) of the FBI’s 214,920 searches yielded likely matches.”
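Those figures are easy to verify with back-of-the-envelope arithmetic:

```python
# Quick check of the match rates quoted in the study.
license_hits, license_searches = 210, 36_420
overall_hits, overall_searches = 8_590, 214_920

print(f"{license_hits / license_searches:.1%}")   # ~0.6%
print(f"{overall_hits / overall_searches:.1%}")   # ~4.0%
```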

Some of the impediments to accuracy include low-light conditions in which the images are captured, limited processing power, numerous simultaneous search requests, and slow search algorithms. The report also reveals that human involvement reduces the overall accuracy by more than 50 percent.

The report also touches on a very pertinent point: privacy. Police departments and other law enforcement agencies are increasingly deploying real-time facial recognition. Not only is this an invasion of privacy, but the vulnerable networks can also be tapped into by non-state actors. Facial recognition should be used only in cases of serious crime; using it indiscriminately is an absolute no-no. It can be used in many ways to track people, even though they may not be criminals. Thus, the question remains: who will watch the watchmen?

Vishal Ingole, November 2, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Another Robot Finds a Library Home

August 23, 2016

Job automation has its benefits and downsides.  Among the benefits are that it frees workers up to take on other tasks, along with cost-effectiveness, efficiency, and quicker turnaround.  The downside is that it could take jobs and remove the human factor from customer service.   When it comes to libraries, automation and books/research appear to be the antithesis of each other.  Automation, better known as robots, is invading libraries once again, and people are up in arms that librarians are going to be replaced.

ArchImag.com shares the story “Robot Librarians Invade Libraries In Singapore” about how the A*Star Research library uses a robot to shelf read.  If you are unfamiliar with library lingo, shelf reading means scanning the shelves to make sure all the books are in their proper order.  The shelf reading robot has been dubbed AuRoSS.  During the night AuRoSS scans books’ RFID tags, then generates a report about misplaced items.  Humans are still needed to put materials back in order.
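The underlying check is easy to picture: compare the order in which the robot reads RFID tags on a shelf against the expected call-number order and flag anything out of sequence. Here is a hypothetical sketch of that comparison; it is not AuRoSS’s actual software, and real call-number sorting is more involved than a plain string sort.

```python
# Hypothetical shelf-reading check: flag items whose scanned position differs
# from their position in the expected (sorted) shelf order.
def misplaced_items(scanned_call_numbers):
    expected_order = sorted(scanned_call_numbers)
    return [scanned for scanned, expected in zip(scanned_call_numbers, expected_order)
            if scanned != expected]

shelf_scan = ["QA76.2", "QA76.9", "QA77.1", "QB54.3", "QA79.5"]
print(misplaced_items(shelf_scan))   # ['QB54.3', 'QA79.5'] are out of place
```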

The fear, however, is that robots can fulfill the same role as a librarian.  Attach a few robotic arms to AuRoSS and it could place the books in the proper places by itself.  There already is a robot named Hugh answering reference questions:

New technologies thus seem to be storming the libraries. Recall that one of the first librarian robots, Hugh, was due to officially take up his position at the university library in Aberystwyth, Wales, at the beginning of September 2016. Designed to respond to students’ spoken requests, he can tell them where a desired book is stored or show them which shelf holds books on the topic that interests them.

It is going to happen.  Robots are going to take over the tasks of some current jobs.  Professional research and public libraries, however, will still need someone to teach people the proper way to use materials and find resources.  It is not as easy as one would think.

Whitney Grace, August 23, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden /Dark Web meet up on August 23, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233019199/

Read the Latest Release from…Virgil

August 18, 2016

The Vatican Library is one of the world’s greatest treasures, because it archives much of western culture’s history.  It probably is on par with the legendary Library of Alexandria, beloved by Cleopatra and burned to the ground.  How many people would love the opportunity to delve into the Vatican Library for a private tour?  Thankfully the Vatican Library shares its treasures with the world via the Internet and now, according to Archaeology News Network, the “Vatican Library Digitises 1600 Year-Old Manuscript Containing Works Of Virgil.”

The digital version of Virgil’s work is not the only item the library plans to scan and post online, but it does promise donors who pledge 500 euros or more a faithful reproduction of the 1,600-year-old manuscript by the famous author.  NTT DATA is working with the Vatican Library on Digita Vaticana, the digitization project.  NTT DATA has worked with the library since April 2014 and plans to create digital copies of over 3,000 manuscripts to be made available to the general public.

“ ‘Our library is an important storehouse of the global culture of humankind,’ said Monsignor Cesare Pasini, Prefect of the Vatican Apostolic Library. ‘We are delighted the process of digital archiving will make these wonderful ancient manuscripts more widely available to the world and thereby strengthen the deep spirit of humankind’s shared universal heritage.’”

Projects like these point to the value of preserving the original work as well as making it available for research to people who might not otherwise make it to the Vatican.  The Vatican also limits the number of people who can access the documents.

Whitney Grace, August 18, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph


Libraries Will Save the Internet

June 10, 2016

Libraries are more than a place to check out free DVDs and books and use a computer.  Most people do not believe this, and if you try to tell them otherwise, their eyes glaze over and they start chanting “obsolete” under their breath.  BoingBoing, however, argues otherwise in “How Libraries Can Save The Internet Of Things From The Web’s Centralized Fate.”  For the past twenty years, the Internet has become more centralized, and content is increasingly reliant on proprietary sites, such as social media, Amazon, and Google.

Back in the old days, the greatest fear was that the government would take control of the Internet.  The opposite has happened, with corporations consolidating the Internet.  Decentralization efforts are taking place, mostly to keep the Internet anonymous, and usually they are tied to the Dark Web.  The next big thing is the “Internet of things,” which will be mostly decentralized and can be protected if the groundwork is laid now.  Libraries can protect decentralized systems, because

“Libraries can support a decentralized system with both computing power and lobbying muscle. The fights libraries have pursued for a free, fair and open Internet infrastructure show that we’re players in the political arena, which is every bit as important as servers and bandwidth.  What would services built with library ethics and values look like? They’d look like libraries: Universal access to knowledge. Anonymity of information inquiry. A focus on literacy and on quality of information. A strong service commitment to ensure that they are available at every level of power and privilege.”

Libraries can teach people how to access services like Tor and disseminate that information more widely than many other institutions in the community.  While this is possible, in many ways it is not realistic, for several reasons.  Many decentralized tools are associated with the Dark Web, which is held in a negative light.  Libraries also have limited budgets, and a program like this would need funding the library board might not want to commit.  Then comes the problem of finding someone to teach these services.  Many libraries are staffed by librarians whose technical knowledge is limited, although they can learn.

It is possible; it would just be hard.

Whitney Grace, June 10, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

A Dead Startup Tally Sheet

March 17, 2016

“Startup” is the buzzword for new companies in the tech industry, usually launched with an innovative idea that garners them several million in investments.  Some startups are successful, others plod along, and many simply fail.  CB Insights makes an interesting (and valid) comparison between today’s tech startups and the dot-com-era companies that fizzled out quicker than a faulty firecracker.

While most startups appear to be run by competent teams, sometimes they fizzle out or are acquired by a larger company.  Many of them will not make it as headlining companies.  As a result, CB Insights created “The Downround Tracker: Which Companies Are Not Living Up To The Expectations?”

CB Insights named this tech boom the “unicorn era,” probably from the rare and mythical sightings of some of these companies.  The Downround Tracker follows unicorn-era startups that have folded or were purchased.  Since 2015, fifty-six companies have made the Downround Tracker list, including LiveScribe, Fab.com, Yodle, Escrow.com, eMusic, Adesto Technologies, and others.

Browse through the list and some of the names will be familiar, while others will make you wonder what those companies did in the first place.  Companies come and go more quickly than in any previous generation.  At least it shows that human ingenuity is still working; cue Kansas’s “Dust in the Wind.”

Whitney Grace, March 17, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Authors Guild Loses Fair Use Argument, Petitions Supreme Court for Copyright Fee Payment from Google

January 12, 2016

The article on Fortune titled Authors Guild Asks Supreme Court to Hear Google Books Copyright Case continues coverage of the 10-year battle over Google’s massive book scanning project. As recently as October 2015, the Google project received a ruling in its favor from a unanimous appeals court due to the “transformative” nature of the scanning. Now the Authors Guild, with increasing desperation to claim ownership over its members’ work, takes the fight to the Supreme Court for consideration. The article explains,

“The Authors Guild may be hoping the high profile nature of the case, which at one time transfixed the tech and publishing communities, will tempt the Supreme Court to weigh in on the scope of fair use… “This case represents an unprecedented judicial expansion of the fair-use doctrine that threatens copyright protection in the digital age. The decision below authorizing mass copying, distribution, and display of unaltered content conflicts with this Court’s decisions and the Copyright Act itself.”

In the petition to the Supreme Court, the Authors Guild is now requesting payment of copyright fees rather than a halt to the scanning of 20 million books. Perhaps they should have asked for that first, since Google has all but won this one already.
Chelsea Kerwin, January 12, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Data Managers as Data Librarians

December 31, 2015

The tools of a librarian may be the key to better data governance, according to an article at InFocus titled, “What Librarians Can Teach Us About Managing Big Data.” Writer Joseph Dossantos begins by outlining the plight data managers often find themselves in: executives can talk a big game about big data, but want to foist all the responsibility onto their overworked and outdated IT departments. The article asserts, though, that today’s emphasis on data analysis will force a shift in perspective and approach—data organization will come to resemble the Dewey Decimal System. Dossantos writes:

“Traditional Data Warehouses do not work unless there a common vocabulary and understanding of a problem, but consider how things work in academia.  Every day, tenured professors  and students pore over raw material looking for new insights into the past and new ways to explain culture, politics, and philosophy.  Their sources of choice:  archived photographs, primary documents found in a city hall, monastery or excavation site, scrolls from a long-abandoned cave, or voice recordings from the Oval office – in short, anything in any kind of format.  And who can help them find what they are looking for?  A skilled librarian who knows how to effectively search for not only books, but primary source material across the world, who can understand, create, and navigate a catalog to accelerate a researcher’s efforts.”

The article goes on to discuss the influence of the “Wikipedia mindset”; data accuracy and whether it matters; and devising structures to address different researchers’ needs. See the article for details on each of these (especially on meeting different needs). The write-up concludes with a call for data-governance professionals to think of themselves as “data librarians.” Is this approach the key to more effective data search and analysis?

Cynthia Murrell, December 31, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

An Early Computer-Assisted Concordance

November 17, 2015

An interesting post at Mashable, “1955: The Univac Bible,” takes us back in time to examine an innovative indexing project. Writer Chris Wild tells us about the preacher who realized that these newfangled “computers” might be able to help with a classically tedious and time-consuming task: compiling a book’s concordance, or alphabetical list of key words, their locations in the text, and the context in which each is used. Specifically, Rev. John Ellison and his team wanted to create the concordance for the recently completed Revised Standard Version of the Bible (also newfangled). Wild tells us how it was done:

“Five women spent five months transcribing the Bible’s approximately 800,000 words into binary code on magnetic tape. A second set of tapes was produced separately to weed out typing mistakes. It took Univac five hours to compare the two sets and ensure the accuracy of the transcription. The computer then spat out a list of all words, then a narrower list of key words. The biggest challenge was how to teach Univac to gather the right amount of context with each word. Bosgang spent 13 weeks composing the 1,800 instructions necessary to make it work. Once that was done, the concordance was alphabetized, and converted from binary code to readable type, producing a final 2,000-page book. All told, the computer shaved an estimated 23 years off the whole process.”
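For a sense of what that process looks like in algorithmic terms, here is a rough modern sketch of a keyword-in-context concordance builder. It is a drastic simplification of the 1955 workflow, with the tape transcription and verification steps omitted.

```python
# Rough sketch of a keyword-in-context concordance: for each key word, record
# where it occurs and a small window of surrounding words as context.
from collections import defaultdict

def build_concordance(text, key_words, window=4):
    words = text.split()
    entries = defaultdict(list)
    for i, word in enumerate(words):
        token = word.strip(".,;:!?").lower()
        if token in key_words:
            context = " ".join(words[max(0, i - window): i + window + 1])
            entries[token].append((i, context))
    return dict(sorted(entries.items()))   # alphabetized, like the printed 1955 volume

verse = "In the beginning God created the heaven and the earth."
print(build_concordance(verse, {"god", "earth"}))
```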

The article is worth checking out, both for more details on the project and for the historic photos. How much time would that job take now? It is good to remind ourselves that tagging and indexing data has only recently become a task that can be taken for granted.

Cynthia Murrell, November 17, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph