It used to be easy to analyze your Twitter data: you'd go to your settings and ask for a download, and there among all the files would be a CSV file full of your tweets and the associated metadata. You could then load the CSV into Excel, convert it into a table, and save the resulting workbook. Once it was all in hand you were able to apply filters, searches, and, well, whatever analytical techniques you liked.
But things have changed at Twitter, and if you request a download of your data it comes as a set of JSON files. That shouldn't be a problem, after all, Excel can handle JSON. So, you try to open the file in Excel using its data import tooling, which usually works very well indeed as a basic ETL system. Only it doesn't open.
SEE: 30 things you should never do in Microsoft Office (free PDF)
That error led me down a garden path of Stack Overflow articles and a whole raft of different JSON developer tools, all to find that Twitter wasn't quite following the standards. That exercise also led me to discover that most JSON developer tools aren't worth the download, and that I probably needed to know quite a bit more Python than I do if I wanted to get anywhere. At least until I got an intriguing error message from one tool that pointed to me to a possible solution.
It turns out that Twitter is using a non-standard version of JSON lists for its exports, and you're going to need to make a quick change to the file it delivers if you want to be able to use your data in any analytical tooling. If you've been on Twitter as long as I have, that can mean editing at the very least a 100GB plus file (in my case two) to remove the text Twitter has added to the start of what would otherwise be a relatively normal list formatted set of JSON documents. You'll need a decent text editor to make the changes. I recommend something like Visual Studio Code, which can happily load very large text files.
Once loaded, delete the text up to the first "[". That's the start of a JSON list block, and all the content after it is the JSON documents that represent each tweet you've sent. I'd recommend saving the edited file as .json rather than the original .js to be sure.
Excel's JSON import tools can now load the data, but you're going to have to construct a data transformation to get at the information you want. In the Data section of the ribbon choose Get Data and in the drop down pick From File and then from JSON. The JSON will load and you'll be presented with a list of records.
As it stands that's not particularly helpful, and you're going to need to do some additional work to build a transform query that can bring the data you want into Excel. I'm going to be looking for just the date a tweet was posted and its content, as I've got a specific need in mind for my data (I've been reviewing every book I've read in tweet form for the last 11 years or so, and I want to extract that data and use it elsewhere).
To build my transform I need to first convert the list to a table. Click the Convert to table button in the ribbon. This gives you a single column view of the JSON records, which you can now use to choose the elements you want to use in your final spreadsheet. Once that first conversion is complete you'll see the list has been converted to a column, and there's a new icon in the table column header. Click this to get a list of the columns you can expand the JSON data into, using JSON keys as column names and the values as the data.
I only need the created_at and full_text columns, so I can deselect all the fields and choose the two I need. Once you've chosen the columns you want, click OK. The resulting data is then displayed. You can see what steps have been applied in the Query Settings view.
There's one problem with the data that's been extracted, it uses a ddd mmm dd hh:mm:ss +TZ YYY format that's not handled by Excel. We'll need to use more of Excel's transformation tools to convert that into an easier to parse format, removing the timestamp and the time zone offset. We can use the Split column function here to first make two new columns, one for the date and one for the year. Luckily for us the fixed ddd mmm dd format used for the day, month, and year gives us a 10 character offset from the start of the string. We can use this to split the column by number of characters, using the option to do this as far left as possible.
We now have two columns, so we can use the same function to take the last four yyyy characters and split the column again, using the last four characters from the right. We can then delete the time column, leaving us with a date column and a year column. These can then be concatenated, using the Merge Columns option in the Transform ribbon. Give the new column an appropriate name, choose a space separator and click OK. You can then use the Detect Data Type function to convert it to a date field.
We're now ready to import the original Twitter JSON into Excel. Click Close and Load. You should see a file size counter as the file loads. When it's complete, you're presented with a list of the number of loaded rows, and a populated Excel table. You can then use standard Excel techniques, like filters to extract the information you want.
Twitter's switch to pure JSON exports makes sense, as it's how it provides data to partners and is likely to be how it populates its own applications. Dropping CSV from exports will save time and compute, even if it does put more of a burden on the end user. However, once you get round the initial loading issues and how it stores certain key data, you're able to use Excel's data transformation tools to give you a way of analyzing your data.