Migrating Gmail Between Accounts

Using OfflineIMAP to migrate email between Gmail accounts while retaining labels and message dates.

TL;DR

You can just jump ahead to view the results or the configuration used.

Leading a Double Life

I’ve been leading a double life for 13 years.

I’ve had my own domain since April 2000, and ran my own mail server. In February 2005 I signed up for a Gmail account when it was a beta service. This means that I’ve had two email accounts in parallel.

In January 2012 I moved my domain to Google Apps (now G Suite), so I have two Gmail accounts each with their accumulated emails.

I decided it was finally time to move everything across and use my own domain as my primary account.

I chose OfflineIMAP to move the messages across, simply installing version 7.0.2 from the Debian stable repository.

I have used OfflineIMAP in the past. I think it was so I could read/compose email using Mutt while offline in the days when WiFi was not ubiquitous.

Approach

I searched the Internet to see if this had been done already, read the documentation, and looked through the example configuration file.

When I needed to understand some of the internal details of OfflineIMAP (normally when my configuration had triggered an exception), I had the source code available: I examined the installed Python sources locally, though I could equally have cloned the source repository.

I would pull from the Gmail folders [Gmail]/All Mail and [Gmail]/Sent Mail, and then push to Gmail folders Xfer/All Mail and Xfer/Sent Mail.

I discovered that OfflineIMAP would synchronise the labels, so I would not lose the benefit of years of extensive filtering on the source account. The messages would also gain an Xfer/All Mail or Xfer/Sent Mail label, but that could be useful to know in the future.

I configured OfflineIMAP with two accounts. One would be used to pull from the source Gmail account to a local folder, and the second would be used to push from the local folder to the target Gmail account.

This had the added benefit that I could run the Pull account first and inspect the messages, and then run the Push account to see the result of the upload.

Initially I specified maxage so I could limit my experiments to recent messages.
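To pick a maxage relative to today rather than hard-coding a date, GNU date can do the arithmetic. This is a minimal sketch. It assumes GNU coreutils date (as on Debian) and that maxage accepts a YYYY-MM-DD date, which the commented-out value in the configuration suggests. The helper name maxage_for_days is hypothetical.

```shell
#!/bin/sh
# maxage_for_days N
# Hypothetical helper: print a maxage line covering the last N days.
# Relies on GNU date's relative "-d 'N days ago'" parsing.
maxage_for_days() {
    days="$1"
    # Calendar date N days before now, formatted as YYYY-MM-DD.
    printf 'maxage = %s\n' "$(date -d "$days days ago" +%Y-%m-%d)"
}
```

For example, `maxage_for_days 30` prints a line that can be pasted into the [Account Pull] section.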

Tweaks

My first attempt ran quickly. I pulled down messages from the source and pushed them up to the target. The messages had the original labels, and they retained their “Unread”, “Flagged”, and “Important” status.

Message Dates

I noticed that the date displayed against the message (in the list view, and in message view) was the time I ran the import. That had to change.

Date Header

I updated the configuration, and added utime_from_header = yes. This sets the file modification time on the message file from the “Date” header when downloaded, and then that timestamp is used as the received timestamp when the message is uploaded.
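A quick way to spot-check this is to compare a downloaded message's Date header with the file's modification time. This is a minimal sketch that assumes GNU date (for `-r`) and GNU sed. show_message_times is a hypothetical helper name, and you supply the message path yourself.

```shell
#!/bin/sh
# show_message_times FILE
# Hypothetical helper: print a message's Date header next to its file
# modification time (in UTC) so the two can be compared by eye.
show_message_times() {
    message="$1"
    # Search only the header block: quit at the first empty line.
    date_header="$(sed -n -e '/^$/q' -e '/^Date:/{s/^Date:[[:space:]]*//;p;q;}' "$message")"
    # GNU date -r reports the file's last modification time.
    mtime="$(date -u -r "$message" '+%Y-%m-%d %H:%M:%S UTC')"
    printf 'Date header: %s\nFile mtime:  %s\n' "$date_header" "$mtime"
}
```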

This passed a quick inspection, but I came across a message that hadn’t had the timestamp set. Viewing the message revealed the date header in the email was:

Date: Fri, 30 3 7 15:4:26 -4

It was no wonder that this date had failed to be parsed.

Received Header

This led me to inspect the results more closely. There were some minor discrepancies between the date displayed on the source account and the target account.

The date displayed by Gmail is not taken from the email message; it is the “Created on” date (use “Show original” to see it). This means you see when the message arrived in the mailbox, not the time the message claims, which can be wrong because of incorrect clocks or delivery delays.

Fortunately, Gmail correctly adds “Received” headers, so this information is available in the message.

Received: by 10.100.91.11 with SMTP id o11cs412060anb;
        Sun, 1 Apr 2007 08:17:24 -0700 (PDT)

I wrote the helper script set-received-mtime, which takes the date from the first “Received” header or, if there is none, from the “Date” header. This date is used to set the modification time on the message file.
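The date part of a Received header is everything after the last semicolon. GNU date, which shares its parser with `touch -d`, understands that format, including the trailing “(PDT)” comment. This is a minimal sketch using the example header above. received_to_utc is a hypothetical helper that takes the unfolded header value as a string rather than reading a message file.

```shell
#!/bin/sh
# received_to_utc 'HEADER VALUE'
# Hypothetical helper: convert an unfolded Received header value into a UTC
# timestamp. Assumes GNU date.
received_to_utc() {
    # Drop everything up to and including the last ';' and any following space.
    when="$(printf '%s\n' "$1" | sed -e 's/.*;[[:space:]]*//')"
    # GNU date skips the parenthesised comment and applies the numeric offset.
    date -u -d "$when" +%Y-%m-%dT%H:%M:%SZ
}
```

For the example header, `received_to_utc 'by 10.100.91.11 with SMTP id o11cs412060anb; Sun, 1 Apr 2007 08:17:24 -0700 (PDT)'` prints `2007-04-01T15:17:24Z`.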

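This is combined with presynchook and postsynchook so that only newly downloaded messages are updated. The pre-sync hook creates a timestamp file if one does not already exist. The post-sync hook passes every message file newer than that timestamp to the script, and then removes the timestamp.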
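The selection step can be tried on its own against a scratch directory. This is a minimal sketch that mirrors the two hooks as shell functions. mark_sync_start and list_new_messages are hypothetical names. The `*,FMD5=*` pattern is the one used in the configuration.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the presynchook and the find in the postsynchook.

# Like the presynchook: create the marker only if it is missing, so an
# interrupted run keeps its original start time.
mark_sync_start() {
    [ -f "$1" ] || touch "$1"
}

# Like the postsynchook's find: message files modified after the marker.
# Prints one path per line; the hook itself uses -print0 with xargs -0.
list_new_messages() {
    find "$1" -newer "$2" -name '*,FMD5=*' -print
}
```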

Running

Customise the configuration for your accounts.

If you are using 2-Step Verification to sign into your Google account you will need to create App passwords. Otherwise you will need to turn on access for less secure apps.

Check the configuration:

$ offlineimap -c offlineimaprc --info

Run a download and check message contents and timestamps:

$ offlineimap -c offlineimaprc -o -a Pull
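After the Pull run, a simple sanity check is to count the message files in each downloaded folder and compare the totals with the source account. This is a minimal sketch. It assumes the Maildir layout OfflineIMAP writes locally, with messages under each folder's cur and new subdirectories. count_messages is a hypothetical helper.

```shell
#!/bin/sh
# count_messages ROOT
# Hypothetical helper: print "count<TAB>folder" for each Maildir folder under
# ROOT, where a folder is any directory that contains a cur/ subdirectory.
count_messages() {
    root="$1"
    find "$root" -type d -name cur | sort | while IFS= read -r cur; do
        folder="${cur%/cur}"
        # Messages live in cur/ and new/; a missing new/ is silently ignored.
        n="$(find "$folder/cur" "$folder/new" -type f 2>/dev/null | wc -l | tr -d ' ')"
        printf '%s\t%s\n' "$n" "${folder#"$root"/}"
    done
}
```

For example, `count_messages ~/GmailMigration` lists the All Mail and Sent Mail counts.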

Run a push and verify uploaded messages:

$ offlineimap -c offlineimaprc -o -a Push

To verify on a smaller dataset before the full migration, set maxage in the [Account Pull] section. When you are ready for the full migration, comment out maxage again and remove both the metadata in ~/.offlineimap and the downloaded folders All Mail and Sent Mail.
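The reset can be scripted. This is a minimal sketch that assumes the folder names All Mail and Sent Mail produced by the nametrans in the configuration. reset_migration is a hypothetical helper. It deletes data, so it takes both paths as explicit arguments. If you use OfflineIMAP for anything else, remove only this migration's entries from the metadata directory instead.

```shell
#!/bin/sh
# reset_migration METADATA_DIR LOCALFOLDERS
# Hypothetical helper: discard OfflineIMAP state and the downloaded folders so
# the next Pull starts from scratch. Destructive: double-check both paths.
reset_migration() {
    metadata="$1"
    local_root="$2"
    # Refuse empty paths, which would make rm target "/All Mail" and so on.
    if [ -z "$metadata" ] || [ -z "$local_root" ]; then
        echo "usage: reset_migration METADATA_DIR LOCALFOLDERS" >&2
        return 2
    fi
    rm -rf -- "$metadata"
    rm -rf -- "$local_root/All Mail" "$local_root/Sent Mail"
}
```

With the paths used here, that would be `reset_migration ~/.offlineimap ~/GmailMigration`. The configuration file and set-received-mtime in the migration directory are left alone.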

Results

How Long Did It Take?

I downloaded a total of ~120,000 messages totalling ~4GB in around 6½ hours.
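For a sense of scale, the download figures work out roughly as follows. They are approximate because the totals are, and 4 GB is taken as 4096 MB:

```latex
\frac{120\,000\ \text{messages}}{6.5 \times 3600\ \text{s}}
  = \frac{120\,000\ \text{messages}}{23\,400\ \text{s}}
  \approx 5.1\ \text{messages/s},
\qquad
\frac{4096\ \text{MB}}{23\,400\ \text{s}} \approx 0.175\ \text{MB/s}
```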

The upload took considerably longer. I ran it over a couple of days, and had to restart the process many times because of exceptions (often the dreaded “Too many read 0”). Fortunately OfflineIMAP is designed to pick up from where it left off when next run. I estimate the actual execution time was ~40 hours.
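Because OfflineIMAP resumes where it left off, the restarts can be automated. This is a minimal sketch of a retry wrapper. retry is a hypothetical helper, and the attempt limit and pause are arbitrary choices, not values from the original migration. It assumes offlineimap exits with a non-zero status when a run fails, so check that for your version before relying on it.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max="$1"
    delay="$2"
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max" ]; then
            echo "retry: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $attempt failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For the upload, that might be `retry 50 60 offlineimap -c offlineimaprc -o -a Push`.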

Making an incremental run to fetch any newer messages takes just a couple of minutes (but see limitations below).

Tweaking

There was some minor tweaking to do following the upload.

I’d used nested labels, and they became flattened. So instead of the ‘group’ label having the nested labels ‘item1’ and ‘item2’, I had two labels ‘group/item1’ and ‘group/item2’. This was easily rectified by manually creating the ‘group’ label.

I needed to update the label settings for “Show in label list” (“show”, “hide”, “show if unread”) and “Show in message list” (“show”, “hide”) to match the source account.

Limitations

Although you can run the synchronisation again, it only transfers new messages; it does not update the flags or labels of messages already transferred.

This is due to the message filename being changed to match the destination Gmail account UID, so it no longer matches the source Gmail account. Fortunately a combination of OfflineIMAP’s ‘FMD5’ and the readonly = True configuration means source messages were not deleted or infinitely duplicated.
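The renaming is visible in the local message filenames. As I understand OfflineIMAP's Maildir naming, each file carries a `U=` field holding the IMAP UID and an `FMD5=` field identifying the folder (an MD5 hash of its name), followed by the `:2,` flags suffix. Comparing `U=` before and after a Push run shows the change. This is a minimal sketch that extracts those fields. maildir_field is a hypothetical helper, and the example filename in the test is made up.

```shell
#!/bin/sh
# maildir_field NAME FILENAME
# Hypothetical helper: print the value of ",NAME=value" from an OfflineIMAP
# Maildir filename. Any leading directories and the ":2,flags" suffix are ignored.
maildir_field() {
    name="$1"
    file="$(basename -- "$2")"
    # Drop the flags suffix, then keep the text between "NAME=" and the next comma.
    printf '%s\n' "${file%%:*}" | sed -n -e "s/.*,$name=\([^,]*\).*/\1/p"
}
```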

Future Work

OfflineIMAP

It should be possible to incorporate the parsing of the “Received” date into OfflineIMAP, and then this timestamp could be used by the utime_from_header configuration.

If ‘Gmail’ were supported as a local repository (by introducing ‘MappedGmailRepository’) then a direct Gmail to Gmail synchronisation could be possible.

imaplib2

I believe the “Too many read 0” exception thrown by imaplib2 arises because imaplib2 assumes that a socket reported as ready for reading will have data available to read. That assumption does not hold for a wrapped SSL socket.

I think a simple script connecting to Gmail, with an accompanying Wireshark capture, could confirm or refute my theory.

Configuration

My solution requires an offlineimaprc configuration file and a helper script, set-received-mtime.

This worked successfully for me, but YMMV. Take some time to understand it as OfflineIMAP is a powerful tool.

offlineimaprc

This is the configuration file used by OfflineIMAP.

Place this into the directory created for performing the migration, and set the value of localfolders in the [DEFAULT] section to match.

Set the correct credentials for the [Repository Source] and [Repository Target].

If you want to limit the email pulled back for initial testing, set the value of maxage in the [Account Pull] section.

[DEFAULT]
localfolders = ~/GmailMigration
timestamp = %(localfolders)s/syncstart.timestamp

sslcacertfile = /etc/ssl/certs/ca-certificates.crt
auth_mechanisms = PLAIN
synclabels = yes

[general]
accounts = Pull, Push
socktimeout = 60

[Account Pull]
remoterepository = Source
localrepository = In
presynchook = [ -f %(timestamp)s ] || touch %(timestamp)s
postsynchook = \
    find %(localfolders)s -newer %(timestamp)s -name '*,FMD5=*' -print0 \
    | xargs -r0 %(localfolders)s/set-received-mtime \
    && rm %(timestamp)s
# maxage = 2018-09-01

[Account Push]
remoterepository = Target
localrepository = Out

[Repository Source]
type = Gmail
remoteuser = myaccount@gmail.com
remotepass = mysecretpassword
folderfilter = lambda folder: folder in ['[Gmail]/All Mail', '[Gmail]/Sent Mail']
nametrans = lambda folder: folder.replace('[Gmail]/', '', 1)
readonly = True

[Repository In]
type = GmailMaildir

[Repository Out]
type = Maildir
nametrans = lambda x: 'Xfer/' + x
readonly = True

[Repository Target]
type = Gmail
remoteuser = me@example.com
remotepass = mysecretpassword
folderfilter = lambda folder: folder in ['Xfer/All Mail', 'Xfer/Sent Mail']
nametrans = lambda folder: folder.replace('Xfer/', '', 1)

set-received-mtime

Place this script in the migration directory, and make it executable.

#! /bin/sh -e

receivedHeader() {
    < "$1" sed -n -e '
        /^Received:/{
            h
            : o
            n
            /^[ \t]/ {
                H
                b o
            }
            x
            s/\n//g
            s/Received:\s*//
            s/.*;\s*//
            p
            q
        }
        /^$/q'
}

dateHeader() {
    < "$1" sed -n -e '
        /^Date:/{
            s/Date:\s*//
            p
            q
        }
        /^$/q'
}

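for message
do
    # Prefer the first (topmost) Received header, falling back to Date.
    mtime="$(receivedHeader "$message")"
    [ -n "$mtime" ] || mtime="$(dateHeader "$message")"
    # Warn about an unparseable date instead of letting -e abort the batch.
    if [ -n "$mtime" ]; then
        touch -m -d "$mtime" "$message" || echo "set-received-mtime: cannot parse date in $message" >&2
    fi
done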