This is the second installment of a series I’m creating on “de-googling”. My goal here is not to eradicate Google from my life, but rather to limit my dependence on them. You can read some back-story in the previous article Untangling my life from the Google ecosystem.
My first goal here is to try to get a bunch of pictures back from Google Photos. When I started shutting down home-based services, I chose Google Photos as my replacement. My goal here isn’t to remove all of my photos from Google Photos, simply to remove my dependence on it. I am invested in the Google Assistant technology, part of which is a smart display in my kitchen. When it’s idle, it displays pictures from Google Photos. Which I kind of like!
Finding a 100% replacement for all that Google Photos does for me feels unlikely. GPhotos features facial recognition and AI that makes it easy to find images later. It even recognizes my dogs! Which is a little creepy. The goal is to find a nice place to store all of my photos that has sharing capabilities. A mobile app with instant upload or sync features is not just a bonus, but a requirement. GPhotos will automatically sync images I take with my phone’s camera. This acts as a nice, convenient place to get at the photos later, or share them. That functionality is important to me.
I looked at a number of photo hosting solutions. Many of them had some of the recognition and AI features that GPhotos does. Most of those did not include the instant upload feature. Nextcloud ended up as my winner. I was already starting to build it for photography storage. We’d bought a Canon M50 and were starting to experiment with photography. Google Photos scales your images down if they’re not uploaded from a Pixel. I wanted to store my pictures at full res. I also have prior experience with Nextcloud. The project has come a long way in the few years since I used it last. This may be a great solution for other things in the future.
One down-side: I don’t want to get back into hosting at home if I can avoid it. The power consumption alone makes it a wash financially. A cloud provider like DigitalOcean also means my services are running in an actual datacenter, instead of my basement with limited power backup and cooling. Gigs and gigs and gigs of attached storage on any cloud provider get expensive, though. Photo and maybe video storage would blow up that budget quickly. So I wanted to back Nextcloud with an S3-like storage bucket. This proved to be a bit of a journey.
Getting my data out of Google Photos
But first, let’s take a step back. My photos are still sitting in GPhotos. Many of these I have no other copies of. I was in the process of consolidating ALL of my photos in GPhotos, until I got cold feet. You can download images from the GPhotos UI, but this would quickly become a tedious task. So I started looking at how to get them out.
Google offers a neat service called Google Takeout, which I do believe existed before GDPR forced such things. It lets you download ALL of your Google data. The process of using Google Takeout is pretty well documented by Google themselves. I thought I’d share some of my experience here.
My goal was to download just Google Photos, which was easy enough to select in the Google Takeout wizard. They ask how you’d like to download the data; the obvious choice was a Zip archive. They also ask what size to break the archive up into. However, one of the things Google Takeout does not tell you is just how much data you can expect to be downloading. At least I didn’t see it. So I picked 2GB chunks. It seemed like a good choice at the time.
The problem with that was… The resulting archive was almost 300GB. Do the math there. After over 24 hours of waiting, I received an email with individual links to 140 archive chunks, each one a 2GB zip file. And in the case of videos larger than 2GB, they’re actually split. So an 8GB video file is chopped up and would then need to be re-joined. Not to mention they don’t give you some easy way to queue up all of the files. You need to click on each one and download it in your browser. A programmer could probably have figured out how to automate this, but that’s not me.
So I re-started the process, choosing 8GB chunks instead. I also went into GPhotos and deleted all of the videos that were from my YouTube recordings. I had copies of those backed up with my source media anyway. This helped a ton; I got it down to something like a dozen archives instead of 140.
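For anyone who wants the math spelled out before picking a chunk size, here is a minimal Python sketch. The 280GB total is my own assumption (it is the figure that lines up with 140 chunks of 2GB, and with "almost 300GB"), and `archives_needed` is just a hypothetical helper name, not anything Takeout provides.

```python
import math


def archives_needed(total_gb: float, chunk_gb: float) -> int:
    """Return how many archive files an export splits into at a given chunk size."""
    if chunk_gb <= 0:
        raise ValueError("chunk_gb must be positive")
    # A partially filled final chunk still counts as one more download.
    return math.ceil(total_gb / chunk_gb)


# 280 is an assumed total size in GB, not a number reported by Google Takeout.
for chunk in (2, 8):
    print(f"{chunk}GB chunks -> {archives_needed(280, chunk)} archives")
```

The point is simply that the number of links to click scales inversely with the chunk size, so it pays to estimate the total before starting the export.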
So what was actually in the archives?
So what was in the archives? Each instant upload photo was grouped into a folder based on the date it was taken. So I had thousands of folders that looked like ‘2019-02-12’. These contained all the photos that were taken that day. Each photo had a JSON file as well. These files contained the date and location metadata for each image, along with things like comments or tags if there were any. I didn’t care so much about that metadata, and deleted all of them using find.
Then there were albums. I had done a lot of work to group things, like events, into albums. And there are smart albums that Google Photos creates, like “Friends and Family”. In my mind, I guess I pictured these like metadata. Something I’d find in those JSON files. It didn’t work out that way though. How they’re organized on the back-end I don’t know, but in the archives I ended up with photo duplication. Which actually worked out perfectly for my needs. Albums were individual folders, and all of the images and videos in those albums were duplicated inside of those folders. So our trip to Disney in 2019 was all lumped into a folder called Disney 2019. It was ALSO inside of the dated folders that coincided with when I took the pictures.
I even found a PDF of a photo-book we’d ordered from Google Photos at one point! Kinda neat that this was included. In all I’d say that these archives could have been organized more efficiently. They were good enough for my needs though! If I had a ton of metadata I was trying to import, on the other hand, I think that would have been a little more trying.
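I did the cleanup with find, but the same idea can be sketched in Python for anyone more comfortable there. This is a rough sketch, not what I actually ran: `delete_json_sidecars` is a hypothetical helper name, and it assumes every `.json` file under the extracted folder is metadata you are happy to lose. It defaults to a dry run so you can look at the list before anything is removed.

```python
from pathlib import Path


def delete_json_sidecars(root: str, dry_run: bool = True) -> list[Path]:
    """Collect every *.json file under root; delete them unless dry_run is True."""
    # rglob walks all sub-folders, so the dated folders and album folders are both covered.
    matches = sorted(p for p in Path(root).rglob("*.json") if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


# Example usage (the folder name is an assumption about where the archives were unpacked):
#   to_remove = delete_json_sidecars("Takeout/Google Photos")   # just list them
#   delete_json_sidecars("Takeout/Google Photos", dry_run=False)  # really delete
```

If you do care about the dates, locations, comments, or tags, keep the JSON files around, since that is where they live.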
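If the duplication is a problem for you rather than a convenience, something like the following Python sketch could list which files appear more than once. It assumes the album copies are byte-for-byte identical to the ones in the dated folders, which I have not verified, and the function names are hypothetical. It only reports duplicates; it does not delete anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(root: str) -> list[list[Path]]:
    """Group files under root by content; return only groups with more than one file."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


# Example usage (the folder name is an assumption):
#   for group in find_duplicates("Takeout/Google Photos"):
#       print([str(p) for p in group])
```

Hashing a few hundred gigs of photos and videos will take a while, so this is the kind of thing to leave running in the background.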
I’ll start off by saying that I’ll make a deeper dive article about the minutiae of building Nextcloud. Here are the basics though. My sites are running in Podman containers on a CentOS 8 system on DigitalOcean. So I wanted Nextcloud to just fit right into that platform. I ended up using the official Nextcloud container, and paired it with a MariaDB container. Just for kicks, I added a Collabora container. Collabora is an online collaborative document editing suite for Nextcloud, much like Google Docs, that leverages LibreOffice. And all of this is running behind an nginx vhost. I mapped storage into Podman for Nextcloud’s config and data. The data mapping later became less important, but I’ll get to that.
I pretty quickly figured out that the 80GB data volume allocated to my droplet would be a problem when I started collecting photos over time. It was large enough to hold what I had today, but without a ton of room for growth. Especially considering my other sites are all sharing that storage. So as I mentioned, I decided that I wanted to back it with S3, or an S3-like bucket. DigitalOcean has a service called Spaces, which uses an S3 API, though it is simplified. I felt like it should serve my needs, and it was right at hand. It did not go well, at least not at first.
Putting Photos… In a bucket
When I started looking into how to use an S3 compatible storage back-end, I quickly found information about Nextcloud’s External Storage plugin. External Storage is meant to.. well… bring in external storage. Like CIFS mounts, and S3. So I gave that a shot. I quickly ran into a number of time-out errors, and even errors about how many requests I was making as I was importing photos. It was clear that this was not as robust as I’d hoped. I wasn’t sure if this was a DO limitation, or an S3 external storage limitation. So I bit the bullet and made an actual S3 bucket on AWS, and pointed the config there. It worked SO much better, but there was still a problem I ran into regarding deleting files.
I was starting to get a little dismayed. Then, as I was researching this external storage deletion problem, I came across someone who mentioned how they got around it by using S3 as Primary Storage instead of External Storage. Wait.. What? This was exactly what I wanted from the start! Backing the primary Nextcloud storage with S3.
So I decided to give DO’s Spaces a second shot, as primary storage instead. I reconfigured Nextcloud, which was a little challenging in itself, but once I got it working, it was BEAUTIFUL! I was able to upload a swath of a dozen or so folders at once, no errors. So I decided to give it a real test, and configured the Nextcloud sync client on my laptop. Then I dragged ALL 40GB of photos into the sync folder, and went to bed.
The next morning I found ALL of the photos had successfully sync’d. All of them were accessible from within the web UI. AWESOME! I turned sync back off for those folders. That way I wouldn’t be forced to keep all that space duplicated between Nextcloud and my laptop. The client cleaned up the sync directory. I still have them all backed up on an external hard drive of course. Two is, after all, one.
Nextcloud is great for my needs, and if I decide to drop, or at least limit, use of Google Docs, I’ll have this as a platform for that as well. Nextcloud does not, in my experience, offer the best photo management or gallery functionality. So if that’s what you’re looking for, you may want to dig a little deeper. For my needs though, it’s perfect. As I mentioned, I’ll go a bit deeper on how exactly I configured Nextcloud in another article. Thanks for reading!