Managing Tweets as records part 2: how to capture

Keywords: ERM, Records Management, Twitter

In my last post, I talked about Twitter and records management, with emphasis on whether a given Tweet might rise to the level of a record and need to be managed. In this post I will talk more about the technical aspects of capturing and managing Tweets. 

One way to manage Tweets more effectively is to translate them into email messages (technically posts, at least in Outlook 2007). An Outlook plugin called TwInbox (formerly OutTwit) does exactly this. Once the plugin is installed, it downloads all Twitter content from a particular account. This includes Tweets sent from that account, mentions, direct messages, and the main Twitter stream of the users that the account follows, even those whose streams are not public. The posts are stored in the local Outlook message store and can then be moved into file shares, ECRM repositories, SharePoint libraries, etc. And since the Tweets are converted into Outlook items, they can be managed with rules, folders, and Search Folders in Outlook.
 
Another way is to subscribe to a Tweet stream or query results using RSS. The end results will vary depending on the RSS client used to subscribe to the stream, but Outlook 2007 converts RSS posts into Outlook items which can be managed as described above. Other clients can output RSS streams into Excel spreadsheets or flat files, XML documents, email messages, or even PDFs, all of which could then be managed as with other similar types of records. 
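
As a rough illustration of the flat-file option, here is a minimal Python sketch that reads an RSS 2.0 document and writes one line per item in CSV form. The sample feed, its URLs, and the function names rss_items and items_to_csv are made up for this example; a real feed may use different or additional elements, so treat the field list as an assumption to check against the actual stream.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 sample; a real feed would be downloaded or read from disk.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example stream</title>
    <item>
      <title>First example post</title>
      <link>http://example.com/status/1</link>
      <pubDate>Mon, 03 May 2010 14:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second example post</title>
      <link>http://example.com/status/2</link>
      <pubDate>Tue, 04 May 2010 09:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""

# Standard RSS 2.0 item elements kept for each record (an assumed selection).
FIELDS = ["pubDate", "title", "link"]


def rss_items(rss_text):
    """Return one dict per <item> element in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        {field: item.findtext(field, default="") for field in FIELDS}
        for item in root.iter("item")
    ]


def items_to_csv(items):
    """Render the items as CSV text: a header row, then one row per item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


if __name__ == "__main__":
    print(items_to_csv(rss_items(SAMPLE_RSS)))
```

The resulting text can be saved with a .csv extension and opened in Excel, at which point it can be filed and retained like any other spreadsheet record.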
 
A number of vendors have begun to target this issue with solutions designed to archive Twitter and other social media websites. As of this post, vendors in the market include but are probably not limited to Autonomy, Backupify, FaceTime, Iterasi, Smarsh, Socialware, Sonian, and ZL Technologies (using FaceTime's Unified Security Gateway). There are also any number of Twitter-specific websites that offer to back up and/or archive Twitter posts, including BackupMyTweets, Tweetake, TweetBackup, TweetScan, and TwitterBackup.
 
Some of these solutions are cloud-based, which could result in the same issues with discovery and compliance that Twitter itself presents in terms of how to make Tweets available for review and production. In others, the Tweets are archived locally to an application server or an appliance. In either case, users need to confirm three things: the format used to store and manage Tweets and other archived content; whether Tweets can be kept selectively or whether only the entire stream is kept; and how that content could be exported and/or made available to other parties in the case of discovery, public records, audit, or some other reason.
 
And the format will vary substantially between providers. Backupify, for example, allows users to export the stream as a single XML file. This is readable, but results in essentially an all-or-nothing approach requiring archiving and production of the entire stream within the backup rather than individual items. Tweetake, on the other hand, exports to Excel, one item per line. 
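
To show that a single-file XML export does not have to stay all-or-nothing, here is a minimal Python sketch that breaks such a file into one standalone XML fragment per item and then keeps only selected items. The element names (statuses, status, id, created_at, text) are assumptions invented for the example, not the documented layout of any vendor's export, and split_export and keep_selected are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical export layout; check the real file for its actual element names.
SAMPLE_EXPORT = """<statuses>
  <status>
    <id>101</id>
    <created_at>2010-05-03T14:00:00Z</created_at>
    <text>First example post</text>
  </status>
  <status>
    <id>102</id>
    <created_at>2010-05-04T09:30:00Z</created_at>
    <text>Second example post</text>
  </status>
</statuses>"""


def split_export(xml_text, item_tag="status", id_tag="id"):
    """Map each item's id to a standalone XML string for that item alone."""
    root = ET.fromstring(xml_text)
    parts = {}
    for item in root.iter(item_tag):
        item_id = item.findtext(id_tag)
        # strip() drops the whitespace that followed the element in the source.
        parts[item_id] = ET.tostring(item, encoding="unicode").strip()
    return parts


def keep_selected(parts, ids_to_keep):
    """Return only the items whose ids were chosen for retention."""
    return {item_id: xml for item_id, xml in parts.items() if item_id in ids_to_keep}


if __name__ == "__main__":
    parts = split_export(SAMPLE_EXPORT)
    for item_id, xml in keep_selected(parts, {"102"}).items():
        print(item_id)
        print(xml)
```

Deciding which ids belong in the keep set is still a records management judgment; the script only makes it possible to act on that decision item by item.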
 
Finally, all organizations should be aware that the Library of Congress (LoC) has acquired Twitter's archive in its entirety. Every public Tweet ever sent will be stored and made available through the LoC's website at some point. The announcement can be found here. As of this post, the archive is not available yet; per the agreement, "Only after a six-month delay can the Tweets be used for internal library use, for non-commercial research, public display by the library itself, and preservation." I bring this up because regardless of what retention period your organization assigns to a given Tweet, once the LoC has it, it will be preserved and presumably could be made available. This is no different from any other communication that crosses the firewall, but it is something to take into consideration.

Comments

Jesse Wilkins

I should have noted that the use of XML is good for a number of reasons; more immediately, it would be a trivial exercise (less than an hour) for an organization to examine the XML stream and write a stylesheet that broke the stream into each individual Tweet. The organization would still have the issue of determining which ones to save vs. discard, but this would be doable and somewhat automate-able.

Julie Colgan

Jesse, great post, as always!

I just wanted to add a quick point about contracts for any 3rd-party backup service you might choose to invite into your Twitter stream (or your Facebook stream, or your whatever ...). Even for those cool free services (like one listed above that I use personally), you need to be reading ALL of the fine print and understanding what control you have over the content you pass to them.

In particular, if you ever decide to terminate the relationship, understand how you will get your content back (assuming you can get it at all), whether or not they will keep a copy of it and if so for how long and in what way, how they wipe your data off of their machines, etc. Even though the LoC is planning to archive all public Tweets, what about services that back up your DMs, mentions, your Facebook account, your Gmail account, and so on? You need to be aware of where that content lives as well as your relative ability to determine its destiny.

And now, back to you, Jess!

This post and comment(s) reflect the personal perspectives of community members, and not necessarily those of their employers or of AIIM International