How Forensic Linguistics Helped Unmask Rowling
August 23, 2013
By now most have heard that J.K. Rowling, famous for her astoundingly successful Harry Potter books, has been revealed as the author of the well-received crime novel “The Cuckoo’s Calling.” Time spoke to one of the analysts who discovered that author Robert Galbraith was actually Rowling, and shares what they learned in, “J.K. Rowling’s Secret: a Forensic Linguist Explains how He Figured it Out.”
It started with a tip. Richard Brooks, editor of the British “Sunday Times,” received a mysterious tweet claiming that “Robert Galbraith” was a pen name for Rowling. Before taking the claim to the book’s publisher, Brooks called on Patrick Juola of Duquesne University to linguistically compare “The Cuckoo’s Calling” with the Potter books. Juola has had years of experience with forensic linguistics, specifically authorship attribution. Journalist Lily Rothman writes:
“The science is more frequently applied in legal cases, such as with wills of questionable origin, but it works with literature too. (Another school of forensic linguistics puts an emphasis on impressions and style, but Juola says he’s always worried that people using that approach will just find whatever they’re looking for.)
“But couldn’t an author trying to disguise herself just use different words? It’s not so easy, Juola explains. Word length, for example, is something the author might think to change — sure, some people are more prone to ‘utilize sesquipedalian lexical items,’ he jokes, but that can change with their audiences. What the author won’t think to change are the short words, the articles and prepositions. Juola asked me where a fork goes relative to a plate; I answered ‘on the left’ and wouldn’t ever think to change that, but another person might say ‘to the left’ or ‘on the left side.'”
One tool Juola uses is the free Java Graphical Authorship Attribution Program. After taking out rare words, names, and plot points, the software calculates the hundred most-used words from an author under consideration. Though a correlation does not conclusively prove that two authors are the same person, it can certainly help make the case. “Sunday Times” reporters took their findings to Galbraith’s/Rowling’s publisher, who confirmed the connection. Though Rowling has said that using the pen name was liberating, she (and her favorite charities) may be happy with the over 500,000 percent increase in “Cuckoo’s Calling” sales since her identity was uncovered.
The article notes that, though folks have been statistically analyzing text since the 1800s, our turn to e-books may make for a sharp increase in such revelations. Before that development, the process was slow even with computers, since textual analysis had to be preceded by the manual entry of texts via keyboard. Now, though, importing an entire tome is a snap. Rowling may just be the last famous author to enjoy the anonymity of a pen name, even for just a few months.
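For readers curious how a frequent-word comparison might look in practice, here is a minimal Python sketch. It is not the Java Graphical Authorship Attribution Program's actual method; the function names, the tokenizer, and the choice of cosine similarity are all assumptions made for illustration. It takes the most frequent words of a known text (which tend to be the short articles and prepositions Juola mentions) and checks how closely a disputed text matches their relative frequencies. It does not strip names or plot words, which a real analysis would.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count alphabetic word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def profile(counts, vocabulary):
    """Relative frequency of each vocabulary word within one text."""
    total = sum(counts.values()) or 1
    return [counts[word] / total for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compare_texts(known, disputed, top_n=100):
    """Score two texts on the top_n most frequent words of the known text."""
    known_counts = word_counts(known)
    disputed_counts = word_counts(disputed)
    vocabulary = [word for word, _ in known_counts.most_common(top_n)]
    return cosine_similarity(profile(known_counts, vocabulary),
                             profile(disputed_counts, vocabulary))
```

A score near 1 means the two texts use the common words in similar proportions. As the post says, that is supporting evidence and not proof of shared authorship.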
Cynthia Murrell, August 23, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Locate That Pesky Web Server
August 17, 2013
Where are Web sites hosted? The average user has no idea how to harness the right tools to find where a server is located, but there might be a common solution. Makeuseof.com, gotta love that Web site, wrote the article, “Find Out Where A Web Site’s Server Is Located With FlagFox And Flag For Chrome.” Made for two browsers, Firefox and Chrome, the FlagFox and Flag plugins are rather simple. Whenever you visit a Web site, the URL bar displays its server’s country of origin. Judging by the plugin’s name you can tell it displays the flag.
Pretty neat, huh? It is also pretty useful:
“This little flag isn’t just cool to show off, but it can also serve some interesting purposes, for example it can let you know which country a server is located in (especially when the server location doesn’t match the top-level domain like .co.uk, .de, etc.), help you troubleshoot why a certain connection may be acting slow, or help you identify when you’ve accidentally landed on a phishing Web site. Say you try to visit your bank’s website which usually shows your country’s flag, but suddenly you see a completely different flag. The chances that you’ve landed on a phishing site are very high. The flag shown by the extension also serves as a reminder of where our data goes — you practically visit the world through your browsing habits!”
It does more than show colorful flags too. Clicking on the flag displays technical data about the server: postal code, Web hosting provider, location, IP address, and ISP. It also has the Web of Trust rating and embeds other techy features. That is just for the Firefox version. The Google plug-in has a few more features that are specific to Google.
Common users can use this tool as a way to prevent identity theft and catch phishing Web sites. It is another simple tool to keep your Internet experience safe.
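The mismatch check mentioned in the quote (server country versus top-level domain) can be sketched in a few lines of Python. This is not how FlagFox or Flag for Chrome are implemented; the function names are hypothetical, and the server's country is assumed to come from a geolocation lookup that is not shown here. Only the hostname parsing, the DNS lookup, and the comparison are illustrated.

```python
import socket
from urllib.parse import urlparse

# Assumption: a ccTLD normally equals the ISO country code. One known
# exception is handled here: .uk belongs to the country coded GB.
TLD_ALIASES = {"UK": "GB"}


def hostname_of(url):
    """Return the host part of a URL, e.g. 'www.example.co.uk'."""
    return urlparse(url).hostname or ""


def country_code_tld(hostname):
    """Return a two-letter country-code TLD in upper case, or None."""
    last_label = hostname.rsplit(".", 1)[-1]
    if len(last_label) == 2 and last_label.isalpha():
        code = last_label.upper()
        return TLD_ALIASES.get(code, code)
    return None


def resolve_ip(hostname):
    """Look up one IPv4 address for the host (needs network access)."""
    return socket.gethostbyname(hostname)


def location_mismatch(url, server_country):
    """True when the ccTLD and the reported server country disagree."""
    tld_country = country_code_tld(hostname_of(url))
    if tld_country is None:
        return False  # generic TLDs such as .com say nothing about country
    return tld_country != server_country.upper()
```

A mismatch is only a hint. Plenty of legitimate sites are hosted outside the country their domain suggests, so treat a `True` result as a reason to look closer, not as proof of phishing.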
Whitney Grace, August 17, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Compressor Contest
August 15, 2013
If you want to squish text, here’s a useful resource. Blogger and tech strategist Matt Mahoney hosts a contest that puts lossless data compression programs to the test. Using a particular text dump, the English version of Wikipedia from March 3, 2006, he examines the compressed size of the data’s first billion bytes. He explains the reason for the initiative:
“The goal of this benchmark is not to find the best overall compression program, but to encourage research in artificial intelligence and natural language processing (NLP). A fundamental problem in both NLP and text compression is modeling: the ability to distinguish between high probability strings like recognize speech and low probability strings like reckon eyes peach. . . .
“Compressors are ranked by the compressed size of enwik9 (10^9 bytes) plus the size of a zip archive containing the decompresser. Options are selected for maximum compression at the cost of speed and memory. Other data in the table does not affect rankings. This benchmark is for informational purposes only. There is no prize money for a top ranking.”
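To get a feel for the ranking rule quoted above, here is a small Python sketch that compares three compressors from the standard library on the same bytes. It is only an illustration and not the benchmark's own tooling: the function names are made up, the sample text is tiny compared with enwik9, and the decompressor sizes are hypothetical numbers supplied by the caller.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data):
    """Compress the same bytes with three standard-library codecs."""
    return {
        "zlib": len(zlib.compress(data, 9)),  # highest zlib level
        "bz2": len(bz2.compress(data, 9)),    # highest bz2 level
        "lzma": len(lzma.compress(data)),     # default preset, modest memory
    }


def rank(data, decompressor_sizes=None):
    """Sort codecs by compressed size plus a caller-supplied decompressor size."""
    decompressor_sizes = decompressor_sizes or {}
    totals = {
        name: size + decompressor_sizes.get(name, 0)
        for name, size in compressed_sizes(data).items()
    }
    return sorted(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = ("recognize speech is more likely than reckon eyes peach. " * 500)
    for name, total in rank(sample.encode("utf-8")):
        print(name, total)
```

Counting the decompressor's size in the total keeps an entry from hiding the text inside the program itself, which is why the sketch accepts that figure as a separate input.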
Still, bragging rights themselves will be worth it for the winner. See the write-up for all the technical details, including a detailed rundown of each compressor.
Cynthia Murrell, August 15, 2013
Sponsored by ArnoldIT.com, developer of
Useful Suggestion For The Magic Bullet Hadoop
August 13, 2013
Downloading Hadoop and expecting it to solve all your problems is a dumb way to use the software. Silicon Angle has some suggestions on how to use Hadoop in, “To Succeed With Hadoop: Find Specific Problem Areas And Solve Them.” The advice comes from Datameer CEO/Founder Stefan Groschupf at the recent Hadoop Summit 2013.
Groschupf acknowledged that Hadoop is just another tool in the big data toolbox. The real power of a company does not come from its tools alone. A company also has to build its customer base, offer quality products with an edge that no one else has, and stay in the black. Most important is to find a problem no one else has resolved and solve it yourself.
That seems to be the only advice the article offers. The rest is an advertisement for Datameer 3.0, which is the newest tool for big data analytics:
“Datameer 3.0 adds new Smart Analytic functions. With a single click, it automatically identifies patterns, relationships, and recommendations based on data stored in Hadoop. For the first time, four advanced machine learning techniques become self-service and accessible for data-driven business users: Clustering, Decision Trees, Column Dependencies and Recommendations. Until now, these advanced analytics required highly specialized data scientists to build custom functionality, which was a costly and time-consuming process.”
So get a gimmick, kids! Once you have that you will succeed. It worked for Bill Gates, Walt Disney, Steve Jobs, and Lady Gaga.
Whitney Grace, August 13, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Google Pays AdBlock Enough Said
August 12, 2013
Pulling from the German Web site Horizont comes an interesting insight into Google’s dealings with AdBlock: “Google Is Funding AdBlock Plus.” AdBlock is a plugin users can download to block ads on their Internet browsers, but it still allows ads through that it deems “acceptable.” What makes an ad acceptable? Apparently if you pay Eyeo, the company that created AdBlock, enough money it will allow your ads to pass through the plugin. In a not too surprising deal, Google pays up.
Google is not the only one that pays out of pocket. Amazon, Reddit, and Yandex are also on the acceptable list. So money oils the squeaky wheel, but what makes Google stand out is what might be a suspect transaction. Forgive the translation below from Google. Deutsch ist eine schwierige Sprache.
“However, as a glance at the forum said of AdBlock Plus to receive Google AdWords shows, there are a total of nine entries. Till user – probably AdBlock Plus boss Till Faida personally – put the post on 18 Online in June 2013, six anonymous users briefly discussed what Google AdWords are ever exactly brought no reasons why it would violate the Acceptable Ads Rules, and three days later, on 21 June 2013, wrote Till user “Added” in the forum thread – Google’s advertising has since no longer filtered by AdBlock Plus. The same applies for the AdSense advertising program in which third party sites advertising Google can embed in their web sites, and be compensated for it.”
Suspect? Yes. Good business practice? No. If you do not want targeted ads, should you switch to a different plugin? Not a bad idea.
Whitney Grace, August 12, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Rocket Software Adds National Language Support
August 2, 2013
Rocket Software is making it easier for clients to do business across language barriers, we learn from “Rocket Software’s System Builder Extensible Architecture 6.2.2 Provides National Language Support” at Database Trends and Applications. This latest iteration of the company’s platform embraces national language support technology, as well as other improvements. We learn from the write-up:
“Along with the new SB/XA Designer and integration with Rocket CorVu Business Intelligence tools, SB/XA now provides National Language Support (NLS) to UniVerse customers, enabling them to store data in many character sets. The new support addresses the increasing globalization of the world economy, requiring software developers to implement solutions that can be easily adapted to different languages, cultures, customs, and regulations in order to establish a stronger presence in the worldwide market. With multi-language translation in SB/XA, NLS support allows organizations to run their business in the language and data format of their choice.”
The addition of NLS was prompted by feedback from a customer overseas, who wished to furnish an application to its call-center employees in their native language—a step that I’m sure considerably reduces misunderstandings.
Founded in 1990, Rocket Software distributes its enterprise software and hardware worldwide through independent software vendor (ISV) partners. The company is headquartered in Newton, Massachusetts, and maintains offices around the world. Rocket Software focuses on helping each organization get the most from its unique IT situation.
Cynthia Murrell, August 02, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Information Manager Enterprise User Edition Improves on SharePoint Usability
July 22, 2013
An article on MarketWatch titled “MetaVis Makes It Easier to Organize Content in SharePoint 2013” refers to the announcement by MetaVis that its product Information Manager Enterprise User Edition supports SharePoint. Working to simplify SharePoint for users, this edition lets them copy, upload, download and classify content in bulk. The article states,
“Organizing information in SharePoint does not need to be hard,” said Peter Senescu, President and Co-founder of MetaVis Technologies. “MetaVis Information Manager Enterprise Edition provides users with more control to manage content directly from the SharePoint 2013 user interface minimizing the learning curve and increasing the use of metadata. For a SharePoint deployment to be successful, content needs to be properly tagged and easily searched.”
A free trial version is available, touting such features as Remap Content, which enables the user to move content easily into new fields; Security Trimmed, which limits access to locations or items as permitted by the user; and Hide/Show Features, which works with the permission levels to only reveal features to users with permission. In spite of some concern that SharePoint is at the end of its usefulness, MetaVis has continued to stick with it. Whether or not this is the right choice remains to be seen.
Chelsea Kerwin, July 22, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Norconex Offers Open Source HTTP Crawler
July 16, 2013
Most commercial enterprise search vendors offer their own HTTP crawler, and several are open-source. One new entry to the field stands out, though, for its odd blend of web and enterprise search functionality. In the post, “Norconex Gives Back to Open-Source,” Norconex describes their crawler and associated libraries:
“The Norconex HTTP Collector is an HTTP Crawler meant to give the greatest flexibility possible for developers and integrators. It makes it easy for Java developers to add custom features, so no one will get stuck again when dealing with odd requirements, difficult websites, or close-source crawler limitations. . . . The HTTP collector can be used stand-alone or embedded as a library in your own software.
“Norconex may release other collectors for various data sources in the future. In the meantime, we have encapsulated the document parsing process and sending of parsed data to your target search engine or repository into two separate libraries. We are releasing them as Norconex Importer and Norconex Committer.”
Norconex tells us that they focused on a simple configuration, as well as providing features that cannot be found in some existing crawlers. The enterprise search firm was founded in 2007 and is based in Ottawa, Canada.
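The three-part split described in the quote (collect pages, parse them, send the results to a search engine) can be sketched as follows. This is not Norconex code and does not use its API. It is written in Python, not the Java that Norconex targets, and every name in it (`PageParser`, `crawl`, the `fetch` and `commit` callbacks) is a hypothetical stand-in chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Importer step: pull text and outgoing links out of an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(start_url, fetch, commit, max_pages=10):
    """Collector loop: fetch breadth-first, parse, then hand off to commit."""
    queue, seen = [start_url], {start_url}
    processed = 0
    while queue and processed < max_pages:
        url = queue.pop(0)
        html = fetch(url)  # caller decides how pages are retrieved
        if html is None:
            continue
        parser = PageParser(url)
        parser.feed(html)
        # Committer step: the caller decides where parsed documents go.
        commit({"url": url, "text": " ".join(parser.text_parts)})
        processed += 1
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return processed
```

Passing `fetch` and `commit` in as functions mirrors the idea of keeping collecting, importing and committing separate, so any one piece can be swapped without touching the others.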
Cynthia Murrell, July 16, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Use Rapportive Mailtester and Connect to Track Down Any Email Address
June 23, 2013
The article titled “The Art of Finding Anyone’s Email Address” on Life After Cubes directs all would-be spammers to several helpful tools. Three sites mentioned are Rapportive.com, Mailtester.com and Connect.Data.com, which are used to track down social media profiles, test e-mail addresses for legitimacy and search through a massive online directory, respectively. The article explains the early steps, beginning with finding the contact,
“For this example, let’s say I find Raymond Stuoper, who is the Senior Director of Technology Partnerships (real title, fake person)… Rapportive integrates directly into Gmail and not only gives you social media information about people who email you, but it also tells you social media information about people you’re emailing. You can type in any email address and Rapportive will look for social media accounts associated with that address. If it finds one, you know the email address is correct.”
It goes on with mailtester, checking different versions to see if they work. If neither of these options is successful, Connect is available. There you can “buy” the contact’s business card (with the name, phone number and email address) with their points system. We hope all PR honchettes are perking up their ears at these tips for adding to their already sagging address books.
Chelsea Kerwin, June 23, 2013
Sponsored by ArnoldIT.com, developer of Augmentext.
The Fastest Windows Desktop Search
June 11, 2013
The MakeUseOf article “What Are the Fastest Tools for Windows Desktop Search?” gives readers a glimpse of several desktop search tools and tries to determine whether Windows desktop search really is faster or comes up short when compared to third-party tools. Windows search is easy to use. Open any Explorer window or folder and you will find a search bar at the top right corner of the window. Searches can also be initiated from the Start Menu. The average search time for a Windows search was 3m 30s for an un-indexed search and <1s for an indexed search. Windows search indexing also keeps a continual index of all files and folders, which can improve overall search speeds.
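Why an index makes such a difference can be shown with a short Python sketch. It does not reflect how Windows search or any of the reviewed tools work internally; the function names are hypothetical and only file names are matched. The un-indexed version walks the disk on every query, while the indexed version walks it once and then answers from memory.

```python
import os


def scan_search(root, term):
    """Un-indexed search: walk the whole folder tree on every query."""
    term = term.lower()
    hits = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if term in name.lower():
                hits.append(os.path.join(folder, name))
    return hits


def build_index(root):
    """Walk the tree once and keep (lowercased name, full path) pairs."""
    index = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            index.append((name.lower(), os.path.join(folder, name)))
    return index


def indexed_search(index, term):
    """Indexed search: answer from the in-memory list, no disk access."""
    term = term.lower()
    return [path for name, path in index if term in name]
```

The trade-off is freshness. An index built this way goes stale as files change, which is why a real tool has to keep updating it in the background.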
The next program featured is the search tool Everything. The simplistic search interface provides an empty window that has a search bar across the top and that delivers results below as you type. This simple yet effective search tool produces instantaneous real-time results. It also works by indexing to produce even faster results. Listary was the third search tool reviewed and unlike the previous two it does not have a separate search interface. You simply start typing and it can determine whether you want to search or not. The average search time was <1s for a computer-wide search. Though all three are great tools the author has a clear winner.
“My winner? I prefer Everything. Listary offers the same “find as you type” instantaneous search results but the interface can sometimes be intrusive, especially when you accidentally bring it up. I like how Everything is both fast and compact and only shows up when I open it myself.”
Both third-party tools seem worth a try, but neither made the June 2013 Publisher Information today article about desktop search, which makes one wonder what other potential winners are out there just waiting to be discovered.
April Holmes, June 11, 2013
Sponsored by ArnoldIT.com, developer of Augmentext