I have a directory of nuget packages that I've downloaded from nuget.org. I'm trying to create a regex that will parse out the package name and version number from the filename. It doesn't seem difficult at first glance; the filenames have a clear pattern:
{PackageName}.{VersionNumber}.nupkg
Edge cases make it challenging, though:
- Package names can have dashes, underscores, and numbers
- Package names can have effectively unlimited parts separated by dots
- Version numbers consist of 3-4 groups of numbers, separated by dots
- Version numbers sometimes are suffixed with pre-release tags (-alpha, -beta, etc)
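To make the version rules above concrete, here is a rough sketch of the version part alone as a standalone pattern, written in Python only for illustration. The name `VERSION_RE`, the helper `is_version`, and the exact character set allowed in a pre-release tag (letters, digits, dashes) are my own assumptions, not anything NuGet specifies here. It only validates an isolated version string; it does not solve the problem of deciding where the package name ends.

```python
import re

# Assumption: a pre-release tag is a dash followed by letters, digits, or dashes.
# 3-4 numeric groups = one group plus 2-3 more groups, each preceded by a dot.
VERSION_RE = re.compile(r"[0-9]+(?:\.[0-9]+){2,3}(?:-[0-9A-Za-z-]+)?")


def is_version(text: str) -> bool:
    """Return True if text is 3-4 dot-separated numeric groups with an optional tag."""
    return VERSION_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["3.4.2", "6.1.7600.16394", "2.7.0-alpha", "2.0", "1.2.3.4.5"]:
        print(candidate, is_version(candidate))
```

The hard part is that a package name such as runtime.tizen.4.0.0-armel.microsoft.netcore.jit can itself contain something that looks like a version.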
Here's a sample list of nuget package filenames:
knockoutjs.3.4.2.nupkg
log4net.2.0.8.nupkg
runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg
nuget.core.2.7.0-alpha.nupkg
microsoft.identitymodel.6.1.7600.16394.nupkg
I want to be able to do a search/replace in a Serious Text Editor where the search is a regex with two groups, one for the package name and one for the version number. The output should be "Package: \1 Version: \2". With the 5 packages above, the output should be:
Package: knockoutjs Version: 3.4.2
Package: log4net Version: 2.0.8
Package: runtime.tizen.4.0.0-armel.microsoft.netcore.jit Version: 2.0.0
Package: nuget.core Version: 2.7.0-alpha
Package: microsoft.identitymodel Version: 6.1.7600.16394
The closest relatively concise regex I've come up with is:
^([^\s]*)\.((?:[0-9]+\.){3,})nupkg$
...which results in the following output:
Package: knockoutjs Version: 3.4.2.
Package: log4net Version: 2.0.8.
Package: runtime.tizen.4.0.0-armel.microsoft.netcore.jit Version: 2.0.0.
nuget.core.2.7.0-alpha.nupkg
Package: microsoft.identitymodel.6 Version: 1.7600.16394.
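It handles the first three decently, although I don't want that trailing dot in the version group. It doesn't match the fourth one at all, since the pattern has no allowance for the -alpha suffix. For the fifth one, the first part of the version number gets lumped in with the package name, because the name group is greedy and the version group only requires three numeric parts.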
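For anyone who wants to reproduce this outside of a text editor, here is a minimal sketch in Python that applies my current regex, unchanged, to the five sample filenames. The names `PATTERN`, `FILENAMES`, and `rewrite` are just for illustration, and I'm assuming Python's `re` module behaves like the editor's regex engine for this pattern (it does for everything used here, as far as I can tell).

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"^([^\s]*)\.((?:[0-9]+\.){3,})nupkg$")

FILENAMES = [
    "knockoutjs.3.4.2.nupkg",
    "log4net.2.0.8.nupkg",
    "runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg",
    "nuget.core.2.7.0-alpha.nupkg",
    "microsoft.identitymodel.6.1.7600.16394.nupkg",
]


def rewrite(filename: str) -> str:
    """Mimic the editor's search/replace; non-matching input is returned as-is."""
    return PATTERN.sub(r"Package: \1 Version: \2", filename)


if __name__ == "__main__":
    for name in FILENAMES:
        print(rewrite(name))
```

Running it prints the same five lines as the output above, including the trailing dots, the untouched fourth filename, and the mis-split fifth one.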