Repository of boxes

Linux, at its base, is just a kernel, but what does the kernel do without additional packages? The kernel enables everything, but the software that interacts with the kernel is what makes a Linux distribution.  Every distribution has its own way of handling things.  Some focus on stability as a platform for critical workloads, others on offering the cutting edge of what a Linux desktop has to offer.  All of them have a few things in common, however.

I am going to talk about one of those commonalities: software repositories, specifically the RPM repository.  Some distributions give you source trees so that you can build the packages you intend to run on your system from the source code produced by the open-source community.  However, many others do the building for you and provide binaries that simply need to be placed on your system.  The latter, a binary distribution, needs a way to place these binaries and their supporting files on your system so that they are usable.  When you build from source, the build files included with the source generally give you a mechanism for installing the binaries once they are built.  If you're not building them, how are you to know where they go?  This is where packages come into play.

A package is usually an archive of some sort, with some metadata around it.  The metadata gives you (or, more accurately, your package manager) information about how to install the binaries and their supporting files, along with scripts that take some basic steps to make the software runnable on your system.  Another bit of metadata describes the other software that the package you are installing depends on.  These are called package dependencies.  If a package needs another package installed, that package will be listed as a dependency.
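
If you happen to have an RPM file on hand, you can peek at this metadata yourself with the rpm tool. The file name below just matches the example package used later in this post:

$ rpm -qpi nginx-1.24.0-1.fc37.x86_64.rpm    # show the package's general metadata
$ rpm -qpR nginx-1.24.0-1.fc37.x86_64.rpm    # list what the package requires (its dependencies)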

There was a time when package management was still fairly manual.  Red Hat based distributions use a package type called RPM.  This once stood for "Red Hat Package Manager" and has since been renamed to the recursive acronym "RPM Package Manager," since it is not exclusive to Red Hat distributions; SUSE, for example, also uses RPM packages.  RPM packages by themselves can be installed with the command-line tool, rpm.  In the early days, that was it: if an RPM package had a dependency on another package, you would be told about the missing dependencies and have to go hunt down those RPMs and install them.  If they had other dependencies, you'd have to go find those too.  This led to what was referred to as "dependency hell," following a seemingly endless tree of dependencies until you got everything installed.
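
To give you a sense of what that looked like, installing a single RPM by hand is just one command. If the package's dependencies aren't already on the system, rpm simply refuses and lists what's missing, and hunting those packages down is left entirely to you:

$ sudo rpm -ivh nginx-1.24.0-1.fc37.x86_64.rpm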

Thankfully, every civilized Linux distribution now includes a package manager backed by a software repository.  A repository is just what it sounds like: a cataloged collection of software packages that makes it easier to find and install software designed to work with your distribution, including dependency resolution.  Now, on a Red Hat based distribution, you can use dnf (previously yum) to query a software repository, find the software you're looking for, and install it.
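
For example, on a Red Hat based system, that whole dance becomes a couple of commands, with dependency resolution handled for you:

$ dnf search nginx          # find matching packages in the configured repositories
$ sudo dnf install nginx    # install it, pulling in any dependencies automatically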

So, I may have rambled a bit there, but I believe you can better appreciate where we are today by understanding where we came from.  And I promise, things are about to get technical: let's build an RPM repository.

Repository structure

There are several ways in which you might choose to serve your repository: local files, an NFS share, HTTP, or FTP, and each may help inform how you lay out your files. However, the basics are this: you'll want a directory to store all of your RPMs in, and beneath that directory, a repodata directory that contains the repository metadata.  Where you choose to place this directory tree is up to you, but the tree must remain intact: a collection of RPM packages, with the repodata directory beneath it.  Keep this in mind.
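
As a sketch, with made-up package names, the layout ends up looking something like this:

myrepo/
├── package-one-1.0-1.x86_64.rpm
├── package-two-2.3-1.noarch.rpm
└── repodata/
    └── repomd.xml (plus the rest of the metadata that createrepo generates)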

Creating a repository

For my example, I am going to use some pre-existing RPMs, pulled from the repositories provided by my distribution.  You may be re-serving RPMs provided to you, or perhaps RPMs that you created yourself; the procedure is the same.
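
If you'd like to follow along and need a few RPMs to work with, one option (assuming the download plugin from dnf-plugins-core is available) is to pull them from your distribution's repositories into the current directory:

$ dnf download nginx nginx-core nginx-filesystem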

$ ls

nginx-1.24.0-1.fc37.x86_64.rpm   nginx-filesystem-1.24.0-1.fc37.noarch.rpm

nginx-core-1.24.0-1.fc37.x86_64.rpm

Creating a repository is actually quite simple.  There is a package you need to install, createrepo_c, which provides the createrepo command.  This command will, as expected, create a repo.  As you can see below, I already have it installed.

[gangrif@shinji repos]$ sudo dnf install createrepo_c

[sudo] password for gangrif:

Last metadata expiration check: 2:00:39 ago on Sun 02 Jul 2023 08:28:16 PM EDT.

Package createrepo_c-0.21.1-1.fc37.x86_64 is already installed.

Dependencies resolved.

Nothing to do.

Complete!

Now, to actually create the repo, we simply run createrepo, and give it a path to create the repo in. 

$ createrepo .

Directory walk started

Directory walk done - 3 packages

Temporary output repo path: ./.repodata/

Preparing sqlite DBs

Pool started (with 5 workers)

Pool finished

And now, if we look at the directory structure, we see that the repodata sub-directory exists, and within it we have... well... the repository metadata.

$ tree

.

├── nginx-1.24.0-1.fc37.x86_64.rpm

├── nginx-core-1.24.0-1.fc37.x86_64.rpm

├── nginx-filesystem-1.24.0-1.fc37.noarch.rpm

└── repodata

    ├── 232501a16de6f5935a2a6796a8667bebe552c8dbd679989b4d1b7df041641613-filelists.xml.gz

    ├── 506e44035a86e159451afa682575a82cc8602bb4b6307cc99cfd8e7ec086230f-primary.xml.gz

    ├── 8dde4448bc62acb0a65c9a2cf4fbcffc342c7d8d11ea8bfa88ec56debd8d6109-filelists.sqlite.bz2

    ├── a7567bbfd8131765cbd4bf629069e49a82295402bbeef288d2a663c5c3f81c9b-primary.sqlite.bz2

    ├── bcd92bf30ac0a078e460e5a2eb03082ec0888d8329e6640b2bbe45fe595a06df-other.sqlite.bz2

    ├── e787ad258b41edfa415c52ae0295397156180b0d6fed6ef8ff32a026d1d544ed-other.xml.gz

    └── repomd.xml

2 directories, 10 files

Now that we have the metadata, we can do something with it! 

Configuring the repository on a client

A repository isn't much use unless you set it up on a client.  Now, a client doesn't have to be remote.  In fact, setting up a local repository is a great way to test whether your repo is actually working as expected.  The following example is about as simple as it gets: I am pointing dnf at a local file repository.

In /etc/yum.repos.d/ you can place an additional .repo file to define more repositories. 

$ cat /etc/yum.repos.d/mylocal.repo

[mylocal]

name=My Local Repo

baseurl=file:///home/gangrif/repos

enabled=1

gpgcheck=0

I've given my repo a name, pointed the URL to a local path, enabled it, and told it not to bother with GPG checks.  Signed RPMs could take up another blog post; I may tackle that at a later date if there is interest.

Now, if we check dnf’s list of repos, we should have a new repo listed. You can see “mylocal” in the list of repositories. 

$ dnf repolist

repo id                                 repo name

fedora                                  Fedora 37 - x86_64

fedora-cisco-openh264                   Fedora 37 openh264 (From Cisco) - x86_64

fedora-modular                          Fedora Modular 37 - x86_64

google-chrome                           google-chrome

mylocal                                 My Local Repo

phracek-PyCharm                         Copr repo for PyCharm owned by phracek

rpmfusion-nonfree-nvidia-driver         RPM Fusion for Fedora 37 - Nonfree - NVIDIA Driver

rpmfusion-nonfree-steam                 RPM Fusion for Fedora 37 - Nonfree - Steam

updates                                 Fedora 37 - x86_64 - Updates

updates-modular                         Fedora Modular 37 - x86_64 - Updates

And if we look up the package info for nginx (one of the packages I added to my repo), you'll see two results: one for the official package from Fedora updates, and one from my local repo.  Dnf does not care that these are the same package; it just tells me about everything it knows.

$ dnf info nginx

My Local Repo                                                          749 kB/s | 2.3 kB 00:00    

Available Packages

Name     : nginx

Epoch    : 1

Version  : 1.24.0

Release  : 1.fc37

Architecture : x86_64

Size     : 34 k

Source   : nginx-1.24.0-1.fc37.src.rpm

Repository   : mylocal

Summary  : A high performance web server and reverse proxy server

URL      : https://nginx.org

License  : BSD

Description  : Nginx is a web server and a reverse proxy server for HTTP, SMTP, POP3 and

          : IMAP protocols, with a strong focus on high concurrency, performance and low

          : memory usage.

Name     : nginx

Epoch    : 1

Version  : 1.24.0

Release  : 1.fc37

Architecture : x86_64

Size     : 34 k

Source   : nginx-1.24.0-1.fc37.src.rpm

Repository   : updates

Summary  : A high performance web server and reverse proxy server

URL      : https://nginx.org

License  : BSD

Description  : Nginx is a web server and a reverse proxy server for HTTP, SMTP, POP3 and

          : IMAP protocols, with a strong focus on high concurrency, performance and low

          : memory usage.
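
If you just want to see everything your new repository is serving, you can also ask dnf to query that one repository directly (the --repo option restricts the operation to the repository id given):

$ dnf repoquery --repo mylocal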

Updating your repository

It is important to remember that whenever you make a change to your repository, you need to re-run createrepo. You do this in the same way that you did when you first created the repo. This will re-scan the files in your repository and re-build the metadata. Even for seemingly simple changes, you will need to update this repository metadata. Part of the data in the repository is the checksum of each RPM in the repo, and even the smallest change to an RPM package, even just adding a GPG signature, will change that checksum. So, remember to re-run createrepo whenever the contents of your repository change.
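
For this example, that just means running the same command again from the top of the repository directory. createrepo_c also has an --update option, which reuses the existing metadata for packages that haven't changed and can speed things up on large repositories:

$ createrepo --update .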

Remote access

Now, obviously, a local repository is only so useful.  If you are managing hundreds of machines, or even just two, you're going to want to make this repository remotely accessible.  I am not going to cover the details in this post (you can read up on the topic here though), but I will say that the most common way this is done is over HTTP or HTTPS.  For this, you'd simply set up an HTTP server (like nginx or httpd) and place the entire directory structure of your repository into the web root of your HTTP server.  This is where the pathing I mentioned earlier matters.  If you have more than one repo, you'll want them separated into different directories so that their repodata does not clash.  A good example might be if you are publishing repositories for two different architectures of your packages: you might have a myrepo/x86 and a myrepo/x86_64, each with its own collection of RPMs and each with its own repodata.
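
As a rough sketch of what that could look like with nginx, assuming its default web root of /usr/share/nginx/html and a made-up hostname of repo.example.com, you would copy the whole repository tree into the web root:

$ sudo cp -r /home/gangrif/repos /usr/share/nginx/html/myrepo

The client-side .repo file then looks just like the local one, only with an http URL for the baseurl:

[myrepo-remote]
name=My Remote Repo
baseurl=http://repo.example.com/myrepo
enabled=1
gpgcheck=0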

Closing

Thanks for reading!  I hope you’ve found this article useful.  If you have, feel free to leave me a comment below!  Let me know if you’d like to see more content around package repositories.