In October 2021, a Fedora Linux user asked a question about licensing. Fedora Project Leader Matthew Miller left a response: “Since we don’t have a complete, exploded, searchable repository of all of the packages in Fedora, I don’t have a quick way to check.”
He followed up with: “…or possibly pay Sourcegraph to do it for us. They seem like nice people.” He is correct: we (Sourcegraph) are nice people, but we don’t want your money. Instead, we wanted to team up with the Fedora community.
The Fedora community can now search its universe of open source code, currently over 34,000 repositories and counting.
Introduction to code search
For those unfamiliar with the concept, code search helps teams onboard to a new codebase and find answers faster, helps identify security risks, and supports many other use cases. Sourcegraph has indexed over two million repositories across multiple code hosts such as GitHub and GitLab. This article focuses strictly on code search for src.fedoraproject.org. Sourcegraph provides both a web app and a CLI.
Use the web app
When using the Sourcegraph web app, start each search with repo:^src.fedoraproject.org/ before entering your search terms. Opening the web app from a link that already contains this string saves you from typing it for every search.
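If you want to bookmark or share a search, the prefixed query can be turned into a link yourself. Here is a minimal POSIX shell sketch of that. It assumes the public instance at sourcegraph.com reads the query from the q parameter of its /search page; urlencode and fedora_search_url are hypothetical helpers, not part of any Sourcegraph tool.

```sh
#!/bin/sh
# Percent-encode a string for use in a URL query parameter.
# Only correct for ASCII input: each character is encoded as one byte.
urlencode() {
  rest=$1 out=''
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}   # first character
    rest=${rest#?}          # everything after it
    case $c in
      [A-Za-z0-9.~_-]) out=$out$c ;;                 # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;          # e.g. space -> %20
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: prepend the Fedora repo filter and build a search link.
# Assumes sourcegraph.com takes the query in the q parameter of /search.
fedora_search_url() {
  printf 'https://sourcegraph.com/search?q=%s\n' \
    "$(urlencode "repo:^src.fedoraproject.org/ $1")"
}

fedora_search_url '"TODO"'
```

For '"TODO"' this prints https://sourcegraph.com/search?q=repo%3A%5Esrc.fedoraproject.org%2F%20%22TODO%22.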
The following sections provide some example web app searches that might be of interest.
Find repositories using popular OSI-approved licenses
The following query scans the spec files in all repositories for License: fields that name a popular license approved by the Open Source Initiative as meeting the Open Source Definition (OSD).
repo:^src.fedoraproject.org/ lang:"RPM Spec" License: ^.*apache|bsd|gpl|lgpl|mit|mpl|cddl|epl.*$
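To see what the license pattern is meant to catch, here is a rough local analogue using grep -E over a few made-up spec files. It is not how Sourcegraph evaluates the query: it assumes case-insensitive matching (grep -i), anchors License: at the start of a line, and groups the license names in parentheses so that License: applies to each of them. Short names such as mit can also match inside longer words, so expect some noise.

```sh
#!/bin/sh
# Rough local analogue of the license query (not Sourcegraph's matcher):
# run grep -E over a few made-up spec files in a temporary directory.
specdir=$(mktemp -d)
trap 'rm -rf "$specdir"' EXIT

printf 'Name: hello\nLicense: GPL-3.0-or-later\n' > "$specdir/hello.spec"
printf 'Name: foo\nLicense: Apache-2.0\n' > "$specdir/foo.spec"
printf 'Name: bar\nLicense: Public Domain\n' > "$specdir/bar.spec"

# ^License: anchors on the spec field; the parentheses make the alternation
# apply only to the license names. -i ignores case, -l prints file names only.
license_re='^License:.*(apache|bsd|gpl|lgpl|mit|mpl|cddl|epl)'
matches=$(cd "$specdir" && grep -Eil "$license_re" *.spec)
printf '%s\n' "$matches"   # foo.spec and hello.spec; bar.spec has no match
```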
Find files with TODOs
The following query finds TODOs across all 34,000+ repositories, which is great for anyone looking to contribute to projects that need help.
repo:^src.fedoraproject.org/ "TODO"
Find files being served via FTP
A co-worker of mine from back in the day told me “FTP is a dead protocol.” Is it? To look for other protocols, such as irc or https, add them to the (?:ftp) group, for example (?:ftp|irc|https).
repo:^src.fedoraproject.org/ (?:ftp)://[A-Za-z0-9-]{0,63}(\.[A-Za-z0-9-]{0,63})+(:\d{1,4})?/*(/*[A-Za-z0-9-._]+/*)*(\?.*)?(#.*)?
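The regex is easier to follow when you can try it on sample text. Here is a sketch that rewrites it as a POSIX extended regular expression for grep -E (literal dots and the query-string ? escaped, \d written as [0-9], the non-capturing (?:ftp) written as a plain group) and runs it over two made-up lines. The proto variable is where other protocols would go.

```sh
#!/bin/sh
# POSIX ERE version of the FTP regex for trying it locally with grep -E.
# Add protocols to proto, e.g. proto='ftp|irc|https'.
proto='ftp'
url_re="($proto)://[A-Za-z0-9-]{0,63}(\.[A-Za-z0-9-]{0,63})+(:[0-9]{1,4})?/*(/*[A-Za-z0-9._-]+/*)*(\?.*)?(#.*)?"

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Source0: ftp://ftp.example.org/pub/hello/hello-2.12.tar.gz
URL:     https://www.example.org/hello/
EOF

# -o prints only the matching part of each line.
grep -oE "$url_re" "$sample"   # ftp://ftp.example.org/pub/hello/hello-2.12.tar.gz
```

With proto='ftp|irc|https', the https line would match as well.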
Find files with a vulnerable version of Log4j
This query finds files that may reference a version of Log4j vulnerable to CVE-2021-44228, also known as Log4Shell (false positives can happen). You can search for other vulnerabilities the same way and report what you find to the project maintainers.
repo:^src.fedoraproject.org/ org.apache.logging.log4j 2.((0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15)(.[0-9]+)) count:all
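The version part of the query boils down to “2., then 0 through 15, then a dot and more digits.” The hypothetical in_query_range helper below makes that check explicit for a single version string in POSIX sh. Like the query, it is only a coarse filter: it does not match pre-release versions such as 2.0-beta9, and it cannot tell whether a version in the range was later patched, so each hit still needs a manual look.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if a version string falls in the
# 2.0.x-2.15.x range that the query above looks for.
in_query_range() {
  printf '%s\n' "$1" |
    grep -Eq '^2\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15)\.[0-9]+'
}

for v in 2.14.1 2.15.0 2.17.1 1.2.17; do
  if in_query_range "$v"; then
    echo "$v: in the query's range, check it"
  else
    echo "$v: outside the query's range"
  fi
done
```

Here 2.14.1 and 2.15.0 are reported as in range, while 2.17.1 and 1.2.17 are not.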
Use the CLI
Sourcegraph also has a command-line tool called src, which can do everything mentioned above plus more, such as returning results as JSON for programmatic consumption.
src search -json 'repo:^src.fedoraproject.org/ lang:"RPM Spec" License: ^.*apache|bsd|gpl|lgpl|mit|mpl|cddl|epl.*$'
JSON output
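If you run many Fedora searches from the terminal, a small wrapper saves retyping the repo filter. The function below is a hypothetical sketch, not part of src. It assumes src is installed and already configured to talk to a Sourcegraph instance (for example through the SRC_ENDPOINT environment variable), and the DRY_RUN switch is an illustrative addition for checking a query before sending it.

```sh
#!/bin/sh
# Hypothetical wrapper (not part of the src CLI) that prepends the Fedora
# repo filter, so only the rest of the query has to be typed.
fedora_search() {
  query="repo:^src.fedoraproject.org/ $1"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$query"      # only show the final query
  else
    src search -json "$query"   # run it and print JSON results
  fi
}

# Show the query that would be sent, without contacting any server:
DRY_RUN=1 fedora_search 'lang:"RPM Spec" License:'
```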
Search syntax
These examples are a good starting point, but they are by no means the only queries you can make. You can view the full search query syntax and build your own queries as needed.
Conclusion
As you can see, with Sourcegraph the Fedora Linux community can now quickly search all of the code hosted at src.fedoraproject.org, whether with literal strings or complex regular expressions.
I appreciate the Fedora Linux community being so helpful and welcoming. If you have anything to add or any questions, my team and I will be in the comments section below. You can also join us on Slack.
Special thanks to Vanesa Ortiz for making this collaboration happen, Ben Venker for his help fixing my broken regex (multiple times), as well as Rebecca Dodd and Nick Moore for their help with editing.
wefrwe
How do I download all packages to disk (using rsync, wget, curl, etc.) for offline installs and updates?
Justin Dorfman
Try this:
docker run --publish 7080:7080 --publish 127.0.0.1:3370:3370 --rm \
  --volume ~/.sourcegraph/config:/etc/sourcegraph \
  --volume ~/.sourcegraph/data:/var/opt/sourcegraph \
  sourcegraph/server:3.37.0
Darvond
Neat. Seems like something the Debian Project could use, given the recent accusations of poor maintainership that have been heard across the winds.
https://unixsheikh.com/articles/the-delusions-of-debian.html
The arduous TL;DR is that Debian has more abandoned, unmaintained, and orphaned packages than one would initially be led to believe. This creates headaches for everyone.
Justin Dorfman
Thanks for pointing this out, I was not aware that the Debian community was struggling. Do you think it’s worth reaching out?
Darvond
Sure, why not. At the very least, it would be an option for them to consider.
Michael Gruber
First of all thanks for this great service!
src.fedoraproject.org hosts packaging sources, that is spec files, source identifiers (archive name and sha256 checksum), patches and possibly other “small items” checked into “dist-git” as we call it. Being able to search them in one place is quite useful already. (plus the tests namespace …)
This does not include the actual package source code (the upstream source), as it does for the indexed GitHub and GitLab repos. That is somewhat more complicated (and more than somewhat more code), of course.
Justin Dorfman
Our pleasure!
Yes, I should have been more clear in the article. This is just the first step of indexing the Fedora universe. We will continue to work with the Fedora team to make the integration more useful.
Exactly. We don’t have a timeline yet but it is definitely on our radar.
junaruga
Here is an alternative way to search all the RPM spec files:
$ tar xf rpm-specs-latest.tar.xz
$ cd rpm-specs
$ grep 'something' *.spec
fedorafree
Sourcegraph itself should be open source and installable on Fedora, on disk and not just in the cloud. It should be freely downloadable from the Fedora store, and its code must be open source and accessible.
Nick Vlahos
Positively “AMAZING”.. !! Benefits the “Universe as a Whole”.. not just Fedora’s!! Keep up the Good Work !!