In October 2021, a Fedora Linux user asked a question about licensing. Fedora Project Leader Matthew Miller left a response: “Since we don’t have a complete, exploded, searchable repository of all of the packages in Fedora, I don’t have a quick way to check.”
Followed by: “…or possibly pay Sourcegraph to do it for us. They seem like nice people.” He is correct, we (Sourcegraph) are nice people, but we don’t want your money. Instead, we wanted to team up with the Fedora community.
The Fedora Community can now search their universe of open source code—currently over 34,000 repositories and counting.
Introduction to code search
For those who aren’t familiar with the concept of code search, it enables teams to onboard to a new codebase and find answers faster, helps to identify security risks, and many other use cases. Sourcegraph has indexed over two-million repositories across multiple code hosts such as GitHub and GitLab. This article is going to focus strictly on code search for src.fedoraproject.org. Sourcegraph provides both a web app and CLI interface.
Using the Web app
When using the Sourcegraph web app you will need to start each search with repo:^src.fedoraprojects.org before entering any search queries. Using this link to the web app will include this initial string as shown here:
The following sections will provide some web app examples of searches that might be of interest.
Find repositories using popular OSI-approved licenses
The following query will scan all the repositories for software that is compatible with the “Open Source Definition” (OSD).
repo:^src.fedoraproject.org/ lang:"RPM Spec" License: ^.*apache|bsd|gpl|lgpl|mit|mpl|cddl|epl.*$
Find files with TODOs
The following query can find TODOs in 34k repositories. This is great for those looking to contribute to projects that need help.
Find files being served via FTP
A co-worker of mine from back in the day told me “FTP is a dead protocol”. Is it? You can add to this query to find any other protocol such as irc, https, etc.
Find files with a vulnerable version of Log4j
This query will find any files that are possibly vulnerable (false positives can happen) to CVE-2021-44228 aka Log4j. You can also search for other vulnerabilities that can then be reported to project maintainers.
repo:^src.fedoraproject.org/ org.apache.logging.log4j 2.((0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15)(.[0-9]+)) count:all
Use the CLI
Sourcegraph also has a command-line interface tool called src, which allows you to do everything I just mentioned above, plus other useful commands like getting results in JSON for programmatic consumption.
src search -json 'repo:^src.fedoraproject.org/ lang:"RPM Spec" License: ^.*apache|bsd|g pl|lgpl|mit|mpl|cddl|epl.*$'
The examples shown may be a good starting point but are by no means the only queries that may be made. You can view all search query syntaxes and create your own as needed.
As you can see, with Sourcegraph, the Fedora Linux community can now quickly search for all code hosted at src.fedoraproject.org, regardless of whether they are literal or complex regex queries.
I appreciate the Fedora Linux community being so helpful and welcoming. If you have anything you want to add or questions, my team and I will be in the comments section below. You can also join us on Slack.
Special thanks to Vanesa Ortiz for making this collaboration happen, Ben Venker for his help fixing my broken regex (multiple times), as well as Rebecca Dodd and Nick Moore for their help with editing.