Using Sourcegraph to Search 34,000+ Fedora Repositories

Photo by Markus Winkler on Unsplash

In October 2021, a Fedora Linux user asked a question about licensing. Fedora Project Leader Matthew Miller responded: “Since we don’t have a complete, exploded, searchable repository of all of the packages in Fedora, I don’t have a quick way to check.”

He followed up with: “…or possibly pay Sourcegraph to do it for us. They seem like nice people.” He is correct: we (Sourcegraph) are nice people, but we don’t want your money. Instead, we wanted to team up with the Fedora community.

The Fedora community can now search its universe of open source code: currently over 34,000 repositories and counting.

Introduction to code search

For those unfamiliar with the concept, code search helps teams onboard to a new codebase, find answers faster, and identify security risks, among many other use cases. Sourcegraph has indexed over two million repositories across multiple code hosts, such as GitHub and GitLab. This article focuses strictly on code search for src.fedoraproject.org. Sourcegraph provides both a web app and a command-line interface (CLI).

Using the Web app

When using the Sourcegraph web app, start each search with repo:^src.fedoraproject.org/ before entering any other search terms. Following this link to the web app pre-fills that string, as shown here:

Sourcegraph web app interface

The following sections will provide some web app examples of searches that might be of interest.

Find repositories using popular OSI-approved licenses 

The following query searches RPM spec files for a License: tag that names one of several popular licenses approved by the Open Source Initiative (OSI) as meeting the Open Source Definition (OSD).

repo:^src.fedoraproject.org/ lang:"RPM Spec" ^License:.*(apache|bsd|gpl|lgpl|mit|mpl|cddl|epl)
License search
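A quick way to see which lines a License pattern picks out is to run it through grep on a few sample lines before searching every repository. This is a minimal sketch assuming GNU or BSD grep, using the grouped form ^License:.*(apache|bsd|gpl|lgpl|mit|mpl|cddl|epl). The grouping matters because | has the lowest precedence in a regular expression, so without parentheses each alternative stands on its own. grep -E uses POSIX extended regular expressions rather than the RE2 syntax Sourcegraph uses, but the two agree on a simple pattern like this one. The license_matches helper and the sample spec lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print lines that start with a License: tag naming
# one of several popular OSI-approved licenses. -i mirrors Sourcegraph's
# default case-insensitive matching. Reads stdin when no files are given.
license_matches() {
  grep -iE '^License:.*(apache|bsd|gpl|lgpl|mit|mpl|cddl|epl)' "$@"
}

# Hypothetical spec-file lines: only the License: tag should match,
# even though the Summary line also contains "gpl".
printf '%s\n' \
  'Name:    example' \
  'License: MIT' \
  'Summary: example tool with gpl in its summary' |
  license_matches
# prints: License: MIT
```

Because the alternatives match substrings, a tag such as License: Unlimited also matches (it contains "mit"). Word boundaries, written \b in RE2, would tighten the pattern if that kind of false positive gets in the way.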

Find files with TODOs

The following query finds TODOs across all 34,000+ repositories. This is a good starting point for anyone looking to contribute to projects that need help.

repo:^src.fedoraproject.org/ "TODO"
Search for TODO

Find files being served via FTP

A former co-worker once told me that “FTP is a dead protocol.” Is it? To search for other protocols, such as irc or https, add them to the (?:ftp) group in this query, for example (?:ftp|irc|https).

repo:^src.fedoraproject.org/ (?:ftp)://[A-Za-z0-9-]{0,63}(\.[A-Za-z0-9-]{0,63})+(:\d{1,4})?/*(/*[A-Za-z0-9._-]+/*)*(\?.*)?(#.*)?
Search for protocol
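The query above is written in RE2 syntax. A simplified POSIX ERE stand-in (not the exact query from this article) makes it easy to experiment locally with the scheme alternation before searching all repositories. This sketch assumes a grep that supports -o (GNU and BSD grep both do); the find_urls helper and the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each URL whose scheme is in the given
# alternation, e.g. 'ftp' or 'ftp|irc|https'. This is a simplified
# stand-in for the Sourcegraph query, not a drop-in replacement.
# Usage: find_urls SCHEMES [FILE...]   (reads stdin when no files are given)
find_urls() {
  schemes=$1
  shift
  grep -oE "($schemes)://[A-Za-z0-9.-]+(:[0-9]{1,4})?(/[^[:space:]]*)?" "$@"
}

# Hypothetical spec-file lines: only the ftp:// URL should be printed.
printf '%s\n' \
  'Source0: ftp://ftp.example.org/pub/foo-1.0.tar.gz' \
  'URL:     https://example.org/foo' |
  find_urls 'ftp'
# prints: ftp://ftp.example.org/pub/foo-1.0.tar.gz
```

Calling find_urls 'ftp|irc|https' prints URLs for all three schemes, which is the same kind of extension suggested for the Sourcegraph query.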

Find files with a vulnerable version of Log4j

This query finds files that mention Log4j (org.apache.logging.log4j) with a 2.x version from 2.0 through 2.15, and which may therefore be vulnerable to CVE-2021-44228, better known as Log4Shell (false positives can happen). You can also search for other vulnerabilities and report what you find to project maintainers.

repo:^src.fedoraproject.org/ org.apache.logging.log4j 2\.((0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15)(\.[0-9]+)) count:all
Search for log4j
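The version part of the query is easy to get subtly wrong, so it can help to check a few version strings against it locally. This minimal sketch anchors the same minor-version alternation (0 through 15, dots escaped) to a bare version string. The flagged_by_query helper is hypothetical, and POSIX ERE stands in for Sourcegraph's RE2 syntax; the two agree on this pattern.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a bare version string matches
# the query's version pattern: 2.<minor>.<patch> with minor 0 through 15.
flagged_by_query() {
  printf '%s\n' "$1" |
    grep -qE '^2\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15)\.[0-9]+'
}

# Print whether each sample version would be flagged.
for v in 2.14.1 2.15.0 2.16.0 2.17.1; do
  if flagged_by_query "$v"; then
    echo "$v flagged"
  else
    echo "$v not flagged"
  fi
done
```

The pattern only looks at version numbers, so patched security releases such as 2.12.2 are still flagged, which is one source of the false positives mentioned above. Pre-release strings such as 2.0-beta9 are not flagged at all.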

Use the CLI

Sourcegraph also has a command-line tool called src, which can do everything mentioned above, plus extras such as returning results as JSON for programmatic consumption.

src search -json 'repo:^src.fedoraproject.org/ lang:"RPM Spec" ^License:.*(apache|bsd|gpl|lgpl|mit|mpl|cddl|epl)'

JSON output

Search Syntax

These examples are a starting point, not an exhaustive list. The Sourcegraph search query syntax documentation covers every filter and operator, so you can create your own queries as needed.

Conclusion

As you can see, with Sourcegraph the Fedora Linux community can now quickly search all code hosted at src.fedoraproject.org, whether with literal strings or complex regular expressions.

I appreciate how helpful and welcoming the Fedora Linux community has been. If you have questions or anything to add, my team and I will be in the comments section below. You can also join us on Slack.

Special thanks to Vanesa Ortiz for making this collaboration happen, Ben Venker for his help fixing my broken regex (multiple times), as well as Rebecca Dodd and Nick Moore for their help with editing.


10 Comments

  1. wefrwe

    How do I download all packages to disk (rsync, wget, curl, etc.)
    for offline installation or updates?

    • Justin Dorfman

      Try this:

      docker run \
      --publish 7080:7080 --publish 127.0.0.1:3370:3370 --rm \
      --volume ~/.sourcegraph/config:/etc/sourcegraph \
      --volume ~/.sourcegraph/data:/var/opt/sourcegraph \
      sourcegraph/server:3.37.0

  2. Darvond

    Neat. Seems like something the Debian Project could use, given the recent accusations of poor maintainership that have been heard across the winds.

    https://unixsheikh.com/articles/the-delusions-of-debian.html

    The arduous TL;DR is that Debian has more abandoned, unmaintained, and orphaned packages than one would initially be led to believe. This creates headaches for everyone.

    • Justin Dorfman

      Thanks for pointing this out, I was not aware that the Debian community was struggling. Do you think it’s worth reaching out?

      • Darvond

        Sure, why not. It’d at the least be an option for them to consider.

  3. Michael Gruber

    First of all thanks for this great service!

    src.fedoraproject.org hosts packaging sources, that is spec files, source identifiers (archive name and sha256 checksum), patches and possibly other “small items” checked into “dist-git” as we call it. Being able to search them in one place is quite useful already. (plus the tests namespace …)

    Unlike the indexed GitHub and GitLab repos, this does not include the actual package source (upstream) code. That is somewhat more complicated (and more than somewhat more code), of course.

    • Justin Dorfman

      Our pleasure!

      Yes, I should have been clearer in the article. This is just the first step in indexing the Fedora universe. We will continue to work with the Fedora team to make the integration more useful.

      “This is somewhat more complicated (and more than somewhat more code), of course.”

      Exactly. We don’t have a timeline yet but it is definitely on our radar.

  4. junaruga

    Here is an alternative way to search all the RPM spec files:

    $ wget http://src.fedoraproject.org/repo/rpm-specs-latest.tar.xz
    $ tar xf rpm-specs-latest.tar.xz
    $ cd rpm-specs
    $ grep 'something' *.spec
  5. fedorafree

    sourcegraph itself should be open source and installable on fedora on disk and not just on the cloud. It should be freely downloadable from the fedora store, and its code must be open source and accessible.

  6. Positively “AMAZING”.. !! Benefits the “Universe as a Whole”.. not just Fedora’s!! Keep up the Good Work !!


The opinions expressed on this website are those of each author, not of the author's employer or of Red Hat. Fedora Magazine aspires to publish all content under a Creative Commons license but may not be able to do so in all cases. You are responsible for ensuring that you have the necessary permission to reuse any work on this site. The Fedora logo is a trademark of Red Hat, Inc. Terms and Conditions