
An Early Computer-Assisted Concordance

November 17, 2015

An interesting post at Mashable, “1955: The Univac Bible,” takes us back in time to examine an innovative indexing project. Writer Chris Wild tells us about the preacher who realized that these newfangled “computers” might be able to help with a classically tedious and time-consuming task: compiling a book’s concordance, or alphabetical list of key words, their locations in the text, and the context in which each is used. Specifically, Rev. John Ellison and his team wanted to create the concordance for the recently completed Revised Standard Version of the Bible (also newfangled). Wild tells us how it was done:

“Five women spent five months transcribing the Bible’s approximately 800,000 words into binary code on magnetic tape. A second set of tapes was produced separately to weed out typing mistakes. It took Univac five hours to compare the two sets and ensure the accuracy of the transcription. The computer then spat out a list of all words, then a narrower list of key words. The biggest challenge was how to teach Univac to gather the right amount of context with each word. Bosgang spent 13 weeks composing the 1,800 instructions necessary to make it work. Once that was done, the concordance was alphabetized, and converted from binary code to readable type, producing a final 2,000-page book. All told, the computer shaved an estimated 23 years off the whole process.”

The article is worth checking out, both for more details on the project and for the historic photos. How much time would that job take now? It is good to remind ourselves that tagging and indexing data has only recently become a task that can be taken for granted.
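For a rough sense of how small the core task has become, here is a minimal Python sketch of a key-word-in-context concordance. The stop-word list, the context width, and the function name are assumptions made for illustration; they are not details of the 1955 Univac program.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the 1955 project's real key-word criteria
# are not described in the article.
STOP_WORDS = {"the", "and", "of", "in", "a", "to", "was", "with"}


def build_concordance(text, context_words=3):
    """Map each key word to a list of (word position, surrounding context)."""
    words = re.findall(r"[A-Za-z']+", text)
    concordance = defaultdict(list)
    for i, word in enumerate(words):
        key = word.lower()
        if key in STOP_WORDS:
            continue
        # Keep a fixed window of words on either side as the context.
        start = max(0, i - context_words)
        context = " ".join(words[start:i + context_words + 1])
        concordance[key].append((i, context))
    # Alphabetize the key words, as a printed concordance would.
    return dict(sorted(concordance.items()))


sample = (
    "In the beginning God created the heavens and the earth. "
    "The earth was without form."
)

for key, entries in build_concordance(sample).items():
    for position, context in entries:
        print(f"{key:10} {position:3}  {context}")
```

A fixed window of words is the simplest possible answer to the “right amount of context” problem the quotation mentions; a real concordance would more likely cut the context at verse or phrase boundaries.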

Cynthia Murrell, November 17, 2015

Sponsored by, publisher of the CyberOSINT monograph


Product Hunt Adds Collections to Its Search Results

November 13, 2015

Product Hunt is a website for the cutting-edge consumer, where users share information about the latest and greatest in the tech market. The Next Web tells us, “Product Hunt Now Lets You Follow and Search for Collections.” A “collection” can be established by any user to curate and share groups of products. An example would be a selection of website-building tools, or of the best accessories for charging electronic devices. The very brief write-up reveals:

“Product Hunt, the Web’s favorite destination to discover new apps, gadgets and connected services, has updated its Collections feature, allowing users to follow and search for curated lists. You can now follow any collection you find interesting to receive notifications when new products are added to them. Collections will also show up in search results alongside products. In addition, curators can add comments to products in their collections to describe them or note why they’ve included them in their list.”

So now finding the best of the latest is even easier. An important tool for anyone with a need, and the means, to keep in front of the technology curve. Launched in 2013, Product Hunt is based in San Francisco. Their Collections feature was launched last December, and this year the site also added sections specifically for books and for games.

Cynthia Murrell, November 13, 2015



Libraries’ Failure to Make Room for Developer Librarians

October 23, 2015

The article titled Libraries’ Tech Pipeline Problem on Geek Feminism explores the lack of diverse developers. The author, a librarian, is extremely frustrated with the approach many libraries have taken. Rather than refocusing their hiring and training practices to emphasize technical skills, many are simply hiring more and more vendors, hardly a solution. The article states,

“The biggest issue I see is that we offer a fair number of very basic learn-to-code workshops, but we don’t offer a realistic path from there to writing code as a job. To put a finer point on it, we do not offer “junior developer” positions in libraries; we write job ads asking for unicorns, with expert- or near-expert-level skills in at least two areas (I’ve seen ones that wanted strong skills in development, user experience, and devops, for instance).”

The options available to librarians are to learn to code in their spare time (not viable) or to enter the tech workforce temporarily and bring their skills back after a few years. The second option is also full of drawbacks, especially since even white women are marginalized in the tech industry. Instead, the article argues that libraries need to make more room for hiring and promoting people with coding skills and interests while also joining coding communities like Code4Lib.


Chelsea Kerwin, October 23, 2015



Funding Granted for American Archive Search Project

September 23, 2015

Here’s an interesting project: we received an announcement about funding for Pop Up Archive: Search Your Sound. A joint effort of the WGBH Educational Foundation and the American Archive of Public Broadcasting, the venture’s goal is nothing less than to make almost 40,000 hours of Public Broadcasting media content easily accessible. The American Archive, now under the care of WGBH and the Library of Congress, has digitized that wealth of sound and video. Now, the details are in the metadata. The announcement reveals:

“As we’ve written before, metadata creation for media at scale benefits from both machine analysis and human correction. Pop Up Archive and WGBH are combining forces to do just that. Innovative features of the project include:

*Speech-to-text and audio analysis tools to transcribe and analyze almost 40,000 hours of digital audio from the American Archive of Public Broadcasting

*Open source web-based tools to improve transcripts and descriptive data by engaging the public in a crowdsourced, participatory cataloging project

*Creating and distributing data sets to provide a public database of audiovisual metadata for use by other projects.

“In addition to Pop Up Archive’s machine transcripts and automatic entity extraction (tagging), we’ll be conducting research in partnership with the HiPSTAS center at University of Texas at Austin to identify characteristics in audio beyond the words themselves. That could include emotional reactions like laughter and crying, speaker identities, and transitions between moods or segments.”

The project just received almost $900,000 in funding from the Institute of Museum and Library Services. This loot is on top of the grant received in 2013, from the Corporation for Public Broadcasting, that got the project started. But will it be enough money to develop a system that delivers on-point results? If not, we may be stuck with something clunky, something that resembles the old Autonomy Virage, Blinkx, Exalead video search, or Google YouTube search. Let us hope this worthy endeavor continues to attract funding so that, someday, anyone can reliably (and intuitively) find valuable Public Broadcasting content.

Cynthia Murrell, September 23, 2015


A Search Engine for College Students Purchasing Textbooks

August 27, 2015

The article on Lifehacker titled TUN’s Textbook Search Engine Compares Prices from Thousands of Sellers reviews TUN, or the “Textbook Save Engine.” It’s an ongoing issue for college students that tuition and fees are only the beginning of the expenses. Textbook costs alone can skyrocket for students who have no choice but to buy the assigned books if they want to pass their classes. TUN offers students all of the options available from thousands of booksellers. The article says,

“The “Textbook Save Engine” can search by ISBN, author, or title, and you can even use the service to sell textbooks as well. According to the main search page…students who have used the service have saved over 80% on average buying textbooks. That’s a lot of savings when you normally have to spend hundreds of dollars on books every semester… TUN’s textbook search engine even scours other sites for finding and buying cheap textbooks; like Amazon, Chegg, and Abe Books.”

After typing in the book title, you get a list of editions. For example, when I entered Pride and Prejudice, which I had to read for two separate English courses, TUN listed an annotated version, several versions with different forewords (which are occasionally studied in the classroom as well) and Pride and Prejudice and Zombies. After you select an edition, you are brought to the results, laid out with shipping and total prices. A handy tool for students who leave themselves enough time to order their books ahead of the beginning of the class.

Chelsea Kerwin, August 27, 2015


Library Design Improves

June 10, 2015

I like libraries. If you enjoy visiting them as well, navigate to “These Modern Libraries Look Like Alien Spaceships On The Inside.” Among the libraries featured are the Beinecke Rare Book and Manuscript Library (Yale), Bibliotheca Alexandrina, and Biblioteca España.

Stephen E Arnold, June 9, 2015

Reading in the Attention Deficit World

May 12, 2015

The article on Popist titled Telling the Truth with Charts outlines the simplest and most effective way to present the data on the waning of book-reading among Americans. While the article focuses on the effectiveness of the chart, the information in the chart is disturbing as well: the share of Americans who read zero books in 2014 is up to 23% from 8% in 1987. The article links to another article on The Atlantic titled The Decline of the American Book Lover. That article presents an argument for some hope:

“The percentage of young folks reading for pleasure stopped declining. Last year, the NEA found that 52 percent of 18-24 year-olds had read a book outside of work or school, the same as in the pre-Facebook days of 2002. If book culture were in terminal decline, this is the demographic where you’d expect it to be fading fastest. Perhaps the worst of the fall is over.”

The article demonstrates the connection between education level and reading for pleasure, which may be validation for many teachers and professors. However, there also seems to be a growing tendency among students to read, even homework, without absorbing anything, or in other words, to skim texts instead of paying close attention. This may be the effect of too much TV or Facebook, or even the No Child Left Behind generation entering college. Students are far more interested in their grades than in their education, and just tallying up the numbers of books they or anyone else read is not going to paint an accurate portrait. Similarly, what books are the readers reading? If they are all Twilight and 50 Shades of Grey, do we still celebrate the accomplishment?

Chelsea Kerwin, May 12, 2015


Research Like the Old School

April 24, 2015

There was a time before the Internet when, if you wanted to research something, you had to go to the library, dig through old archives, and check encyclopedias for quick facts. It now seems that all information is at your disposal with a few keystrokes, but search results are often polluted with paid ads, and unless your information comes from a trusted source, you can’t count it as fact.

Lifehacker, like many of us, knows that if you want to get the truth behind a topic, you have to do some old school sleuthing. The article “How To Research Like A Journalist When The Internet Doesn’t Deliver” drills down into tried and true research methods that will continue to withstand the sands of time or the wrecking ball (depending on how long libraries remain brick and mortar buildings).

The article pushes using librarians as resources and even goes as far as recommending petitioning government agencies and filing FOIA requests for information. Its claim that some information is only available in person or strictly for other librarians is both true and false. Many libraries are trying to digitize their information, but budgets limit their resources. Also, unless the librarian works in a top secret archive, most of the information is readily available to anyone, with or without the MLS degree.

Old school interviews are always great, especially when you have to cite a source. You can always cite your own interview and verify it came straight from the horse’s mouth. One useful way to team the Internet with interviews is tracking down the interviewees.

Lastly, this is the best piece of advice from the article:

“Finally, once you’ve done all of this digging, visited government agencies, libraries, and the offices of the people with the knowledge you need, don’t lose it. Archive everything. Digitize those notes and the recordings of your interviews. Make copies of any material you’ve gotten your hands on, then scan them and archive them safely.”

The Internet is full of false information. A little more diligence makes the information safer to use or to claim as the truth.

These tips are useful, even if a little obvious, but they still fail to mention the important step that all librarians know: doing the actual footwork and using proper search methods to find things.

Whitney Grace, April 24, 2015


Worrying about Losing Obsolete Information

March 9, 2015

Ready to hear another side to the endangered library argument that has been tossed around since the 1990s? In “The Near And Far Future Of Libraries,” Hopes and Fears revives people’s worries about losing data from obsolete mediums and looks at how libraries are evolving rather than disappearing. The article points out the same old fears that some obsolete mediums have not yet been transitioned to a digital archive and might be forgotten. It also mentions that libraries are transforming their spaces into gathering places for people to study, read, and meet (as if that were new).

Mixed in with the fear of disappearing libraries is a discussion of new ways that artificial intelligence is helping to preserve knowledge and helping people learn to harness their information. The article offers some new insights about how libraries are changing, but the bulk of it is disorganized and hard to tie together.

Some valid ideas include the point that centralizing too much information on Web sites like Wikipedia, social media networks, and even the Internet Archive is dangerous, because one Web site is easier to block than hundreds. Another important point is that more interactive technology tools are actually helping people better use their information. Robots like Vincent and Nancy from Westport Library are an example of how people can better physically interact with information and use it to their advantage.

The most interesting archival idea presented is the Rosetta Disk, a thin nickel disk three inches in diameter that holds over 14,000 pages of information. While it is meant to preserve knowledge for ready access in the future, it is also a good backup:

“We aren’t creating the Rosetta Disk specifically with an apocalypse in mind, or for a society that’s undergoing major upheaval, but over the span of millennia, I think you have to expect that to happen occasionally. In that case, the Rosetta Disk is a good long-term backup. You might think of it as a “secret decoder ring” for information we leave for the future in human language form.”

Libraries and information are changing. We do have to preserve obsolete knowledge before it degrades, and we have to upgrade libraries for them to remain relevant. The situation is very similar to that of old historical sites with low visitor attendance. They are changing the way they interact with people and present their historical information in order to draw people to them. Do not be fearful; embrace the change.

Whitney Grace, March 09, 2015
Sponsored by, developer of Augmentext

Early English Texts Now Available Online

February 16, 2015

The phrase “early English literature” encompasses texts written from the mid-fifteenth century to 1700. Now, the University of Oxford’s Bodleian Libraries tells us about its exciting project to make such works available to anyone with Internet access in, “Thousands of Early English Books Released Online to Public by Bodleian Libraries and Partners.” The University of Michigan Library is also involved in the project, which will release some 25,000 texts. The fully searchable files can be downloaded in different formats or read online.

The works were compiled some time ago by the Early English Books Online Text Creation Partnership (EEBO-TCP), which spent 15 years manually entering and XML-encoding the texts. The results were made available to users of academic libraries at the time, but were released into the public domain at the turn of the new year. The post informs us:

“Members of the public, teachers and researchers around the world can now have access to thousands of transcriptions of English texts published during the first two centuries of printing in England. The corpus includes important works by literary giants like Chaucer and Bacon, but also contains many rare and little-known materials that were previously only available to those with access to special collections at academic libraries.

“The text-only files are a unique resource for members of the public to browse for curious and interesting topics and titles ranging from witchcraft and homeopathy to poetry and recipes. In addition to browsing and reading text-only versions of these early English books, users of EEBO-TCP can also search the entire corpus, which contains more than two million pages and nearly a billion words. The text has been encoded with Extensible Markup Language (XML), allowing individuals to search for keywords and themes across the entire collection of works, in individual books or even within specific sections of text such as stage directions or tables of contents.”

Michael Popham, head of the Bodleian Libraries’ digital collections, is excited about the full-search functionality. He expects the tool will allow users to make connections, cross-references, and discoveries like never before.

Cynthia Murrell, February 16, 2015

