filetools
Set of file tools.
Authors:
- Stijn De Weirdt (Ghent University)
- Dries Verdegem (Ghent University)
- Kenneth Hoste (Ghent University)
- Pieter De Baets (Ghent University)
- Jens Timmerman (Ghent University)
- Toon Willems (Ghent University)
- Ward Poelmans (Ghent University)
- Fotis Georgatos (Uni.Lu, NTUA)
- Sotiris Fragkiskos (NTUA, CERN)
- Davide Vanzo (ACCRE, Vanderbilt University)
- Damian Alvarez (Forschungszentrum Juelich GmbH)
- Maxime Boissonneault (Compute Canada)
ZlibChecksum
adjust_permissions(provided_path, permission_bits, add=True, onlyfiles=False, onlydirs=False, recursive=True, group_id=None, relative=True, ignore_errors=False)
Change permissions for specified path, using specified permission bits.
Add or remove (if add is False) permission_bits from all files (if onlydirs is False) and directories (if onlyfiles is False) in path.

PARAMETER | DESCRIPTION |
---|---|
add | add permissions relative to current permissions (only relevant if 'relative' is set to True). DEFAULT: `True` |
onlyfiles | only change permissions on files (not directories). DEFAULT: `False` |
onlydirs | only change permissions on directories (not files). DEFAULT: `False` |
recursive | change permissions recursively (only makes sense if path is a directory). DEFAULT: `True` |
group_id | also change group ownership to group with this group ID. DEFAULT: `None` |
relative | add/remove permissions relative to current permissions (if False, hard set specified permissions). DEFAULT: `True` |
ignore_errors | ignore errors that occur when changing permissions (up to a maximum ratio specified by the --max-fail-ratio-adjust-permissions configuration option). DEFAULT: `False` |

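
A minimal sketch of how the 'add' and 'relative' options combine for a single path, using only os and stat from the standard library. The helper name adjust_mode is hypothetical; it leaves out recursion, onlyfiles/onlydirs filtering, group_id and the error-ratio handling, so it is an illustration rather than EasyBuild's implementation.

```python
import os
import stat


def adjust_mode(path, permission_bits, add=True, relative=True):
    """Hypothetical helper: adjust permission bits of a single path (no recursion)."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if not relative:
        new_mode = permission_bits  # hard set exactly the specified bits
    elif add:
        new_mode = current | permission_bits  # add bits to the current permissions
    else:
        new_mode = current & ~permission_bits  # remove bits from the current permissions
    os.chmod(path, new_mode)
    return new_mode
```

For example, on a file with mode 0o644, adjust_mode(path, stat.S_IXUSR) yields 0o744, while passing add=False removes the given bits instead.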
apply_patch(patch_file, dest, fn=None, copy=False, level=None, use_git=False)
Apply a patch to source code in directory dest; assumes a unified diff created with "diff -ru old new".
Raises EasyBuildError on any error and returns True on success.
apply_regex_substitutions(paths, regex_subs, backup='.orig.eb', on_missing_match=None, match_all=False, single_line=True)
Apply specified list of regex substitutions.

PARAMETER | DESCRIPTION |
---|---|
paths | list of paths to files to patch (or just a single filepath) |
regex_subs | list of substitutions to apply, specified as `(<regex>, <replacement string>)` tuples |
backup | create backup of original file with specified suffix (no backup if value evaluates to False). DEFAULT: `'.orig.eb'` |
on_missing_match | what to do when no match was found in the file: 'error' to raise an error, 'warn' to print a warning, or 'ignore' to do nothing; if None, the value of --strict is used. DEFAULT: `None` |
match_all | expect to match all patterns in all files, instead of at least one per file, for error/warning reporting. DEFAULT: `False` |
single_line | replace the first match of each pattern on each line, in the order of the patterns; if False, the patterns are applied in order to the full text and may match line breaks. DEFAULT: `True` |

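
The difference between single_line=True and single_line=False can be sketched on an in-memory string. This is a hypothetical helper (apply_subs), not EasyBuild's code: it follows the parameter description above literally (first match per pattern per line in line mode, all matches on the full text otherwise), and it does no file handling, backups or missing-match reporting.

```python
import re


def apply_subs(text, regex_subs, single_line=True):
    """Hypothetical helper: apply (regex, replacement) tuples to a string, in order."""
    compiled = [(re.compile(regex), repl) for regex, repl in regex_subs]
    if single_line:
        # line by line: replace the first match of each pattern on each line
        lines = text.splitlines(keepends=True)
        for idx, line in enumerate(lines):
            for regex, repl in compiled:
                line = regex.sub(repl, line, count=1)
            lines[idx] = line
        return ''.join(lines)
    # full text: each pattern sees the whole text, so it may match across line breaks
    for regex, repl in compiled:
        text = regex.sub(repl, text)
    return text
```

A pattern such as `a\nb` can therefore only match when single_line is False, since in line mode every candidate string ends at a line break.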
back_up_file(src_file, backup_extension='bak', hidden=False, strip_fn=None)
Back up a file by appending a backup extension to its name, plus a timestamp if a backup already exists.

PARAMETER | DESCRIPTION |
---|---|
src_file | file to back up |
backup_extension | extension to use for the backup file (can be empty or None). DEFAULT: `'bak'` |
hidden | make backup hidden (leading dot in filename). DEFAULT: `False` |
strip_fn | strip specified trailing substring from filename of backup. DEFAULT: `None` |

RETURNS | DESCRIPTION |
---|---|
location of backed up file |

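
A minimal sketch of the naming scheme described above, assuming a timestamp format of YYYYmmddHHMMSS plus a counter to keep names unique; both the helper name back_up and that format are hypothetical, and strip_fn is not handled.

```python
import os
import shutil
import time


def back_up(src_file, backup_extension='bak', hidden=False):
    """Hypothetical helper: copy src_file to a backup location that does not exist yet."""
    dirname, fname = os.path.split(src_file)
    if hidden:
        fname = '.' + fname  # leading dot makes the backup hidden
    if backup_extension:
        fname += '.' + backup_extension
    base = os.path.join(dirname, fname)
    candidate, counter = base, 0
    while os.path.exists(candidate):
        # a backup already exists: append a timestamp, plus a counter to guarantee uniqueness
        candidate = '%s_%s_%d' % (base, time.strftime('%Y%m%d%H%M%S'), counter)
        counter += 1
    shutil.copy2(src_file, candidate)
    return candidate
```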
calc_block_checksum(path, algorithm)
Calculate a checksum of a file by reading it in blocks.
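
Reading in blocks keeps memory use bounded for large files. A sketch of the technique, assuming 'algorithm' is a hashlib-style object (for example hashlib.sha256()); the helper name block_checksum and the default block size are assumptions, not EasyBuild's values.

```python
import hashlib


def block_checksum(path, algorithm, blocksize=None):
    """Hypothetical helper: feed the file to a hashlib-style object block by block."""
    if blocksize is None:
        blocksize = 1024 * algorithm.block_size  # assumed default, a multiple of the hash block size
    with open(path, 'rb') as fh:
        # read fixed-size blocks until read() returns b'' at end of file
        for block in iter(lambda: fh.read(blocksize), b''):
            algorithm.update(block)
    return algorithm.hexdigest()
```

The result is identical to hashing the whole file in one go, e.g. hashlib.sha256(data).hexdigest().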
change_dir(path)
Change to directory at specified location.

PARAMETER | DESCRIPTION |
---|---|
path | location to change to |

RETURNS | DESCRIPTION |
---|---|
previous location we were in |

check_lock(lock_name)
Check whether a lock with specified name already exists.
If it exists, either wait until it's released, or raise an error (depending on the --wait-on-lock-* configuration options).
clean_up_locks()
Clean up all locks created in this session that still exist.
clean_up_locks_signal_handler(signum, frame)
Signal handler, cleans up locks & exits with received signal number.
cleanup(logfile, tempdir, testing, silent=False)
Clean up the specified log file and the tmp directory, if desired.

PARAMETER | DESCRIPTION |
---|---|
logfile | path to log file to clean up |
tempdir | path to temporary directory to clean up |
testing | whether we are in testing mode; if so, nothing is actually cleaned up |
silent | be silent (don't print anything to stdout). DEFAULT: `False` |

compute_checksum(path, checksum_type=DEFAULT_CHECKSUM)
Compute checksum of specified file.

PARAMETER | DESCRIPTION |
---|---|
path | path of file to compute checksum for |
checksum_type | type(s) of checksum ('adler32', 'crc32', 'md5', 'sha1', 'sha256', 'sha512', 'size'). DEFAULT: `DEFAULT_CHECKSUM` |

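
A sketch of dispatching on the documented checksum types. The hashlib types map directly to hashlib.new(), while adler32 and crc32 need a small wrapper so they can be fed block by block like a hashlib object; this page lists a ZlibChecksum class for that role, but its interface is not documented here, so ZlibHasher and its '0x%08x' output format are hypothetical, as is returning the size as an integer.

```python
import hashlib
import os
import zlib


class ZlibHasher:
    """Hypothetical hashlib-style wrapper around zlib.adler32 / zlib.crc32."""

    def __init__(self, func):
        self.func = func
        self.value = func(b'')  # start value: 1 for adler32, 0 for crc32

    def update(self, data):
        self.value = self.func(data, self.value)  # running checksum over all blocks

    def hexdigest(self):
        return '0x%08x' % (self.value & 0xffffffff)


def checksum_of(path, checksum_type='sha256'):
    """Hypothetical helper: compute one checksum type for the file at 'path'."""
    if checksum_type == 'size':
        return os.path.getsize(path)
    if checksum_type in ('adler32', 'crc32'):
        hasher = ZlibHasher(getattr(zlib, checksum_type))
    else:
        hasher = hashlib.new(checksum_type)  # md5, sha1, sha256, sha512
    with open(path, 'rb') as fh:
        for block in iter(lambda: fh.read(1 << 16), b''):
            hasher.update(block)
    return hasher.hexdigest()
```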
convert_name(name, upper=False)
Convert name so it can be used as a variable name.
copy(paths, target_path, force_in_dry_run=False, **kwargs)
Copy single file/directory or list of files and directories to specified location.

PARAMETER | DESCRIPTION |
---|---|
paths | path(s) to copy |
target_path | target location |
force_in_dry_run | force running the command during dry run. DEFAULT: `False` |
kwargs | additional named arguments to pass down to copy_dir |

copy_dir(path, target_path, force_in_dry_run=False, dirs_exist_ok=False, check_for_recursive_symlinks=True, **kwargs)
Copy a directory from specified location to specified location.
If the target path does not exist yet, shutil.copytree is used; if the target path already exists, the 'copy' function is used to copy the contents of the source path to the target path. Additional specified named arguments are passed down to shutil.copytree/copy if used.

PARAMETER | DESCRIPTION |
---|---|
path | the original directory path |
target_path | path to copy the directory to |
force_in_dry_run | force running the command during dry run. DEFAULT: `False` |
dirs_exist_ok | boolean indicating whether it's OK if the target directory already exists. DEFAULT: `False` |
check_for_recursive_symlinks | if the symlinks argument is not given or is False, check for recursive symlinks first. DEFAULT: `True` |

copy_easyblocks(paths, target_dir)
Find the right location for easyblock files and copy them there.
copy_file(path, target_path, force_in_dry_run=False)
Copy a file from specified location to specified location.

PARAMETER | DESCRIPTION |
---|---|
path | the original filepath |
target_path | path to copy the file to |
force_in_dry_run | force copying of file during dry run. DEFAULT: `False` |

copy_files(paths, target_path, force_in_dry_run=False, target_single_file=False, allow_empty=True, verbose=False)
Copy list of files to specified target path. Target directory is created if it doesn't exist yet.

PARAMETER | DESCRIPTION |
---|---|
paths | list of filepaths to copy |
target_path | path to copy files to |
force_in_dry_run | force copying of files during dry run. DEFAULT: `False` |
target_single_file | if there's only a single file to copy, copy to a file at target path (not a directory). DEFAULT: `False` |
allow_empty | allow empty list of paths to copy as input (if False: raise error on empty input list). DEFAULT: `True` |
verbose | print a message to report copying of files. DEFAULT: `False` |

copy_framework_files(paths, target_dir)
Find the right location for framework files and copy them there.
create_index(path, ignore_dirs=None)
Create index for files in specified path.
create_lock(lock_name)
Create lock with specified name.
create_non_existing_paths(paths, max_tries=10000)
Create directories with given paths (including the parent directories). When a directory already exists at the location of any of the specified paths, the suffix `_<i>` is appended, with i iteratively picked between 0 and (max_tries-1), until an index is found so that all required paths are non-existing. All created directories have the same suffix.

PARAMETER | DESCRIPTION |
---|---|
paths | list of directory paths to be created |
max_tries | maximum number of tries before failing. DEFAULT: `10000` |

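
A sketch of the common-suffix search described above, assuming the paths are first tried without any suffix. The helper name create_with_common_suffix and the RuntimeError are hypothetical (EasyBuild raises its own error type), and there is no protection against races with other processes.

```python
import os


def create_with_common_suffix(paths, max_tries=10000):
    """Hypothetical helper: create all paths, adding the same '_<i>' suffix if any already exists."""
    paths = [os.path.abspath(p) for p in paths]
    # i == -1 stands for "no suffix"; then suffixes _0 .. _(max_tries-1) are tried
    for i in range(-1, max_tries):
        candidates = paths if i < 0 else ['%s_%d' % (p, i) for p in paths]
        if not any(os.path.exists(p) for p in candidates):
            break
    else:
        raise RuntimeError("no common free suffix found after %d tries" % max_tries)
    for path in candidates:
        os.makedirs(path)  # also creates missing parent directories
    return candidates
```

Because the suffix must be free for every path at once, an existing 'b_0' pushes both 'a' and 'b' to '_1' even if 'a_0' is free.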
create_patch_info(patch_spec)
Create info dictionary from specified patch spec.
create_unused_dir(parent_folder, name)
Create a new folder in parent_folder, using name as the folder name. When a folder with that name already exists, the suffix '_0' is appended, and the number is increased until an unused name is found.
decode_class_name(name)
Return decoded version of class name.
decode_string(name)
Decoding function to revert result of encode_string.
derive_alt_pypi_url(url)
Derive alternative PyPI URL for given URL.
det_common_path_prefix(paths)
Determine common path prefix for a given list of paths.
det_file_size(http_header)
Determine size of file from provided HTTP header info (without downloading it).
det_lock_path(lock_name)
Determine full path for lock with specified name.
det_patched_files(path=None, txt=None, omit_ab_prefix=False, github=False, filter_deleted=False)
Determine list of patched files from a patch. It searches for "+++ path/to/patched/file" lines to determine the patched files. Note: does not correctly handle filepaths with spaces.

PARAMETER | DESCRIPTION |
---|---|
path | the path to the diff. DEFAULT: `None` |
txt | the contents of the diff (either path or txt should be given). DEFAULT: `None` |
omit_ab_prefix | ignore the a/ or b/ prefix of the files. DEFAULT: `False` |
github | only consider lines that start with 'diff --git' to determine list of patched files. DEFAULT: `False` |
filter_deleted | filter out all files that were deleted by the patch. DEFAULT: `False` |

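
The "+++ " scan can be sketched with a multi-line regular expression. This hypothetical helper (patched_files) works on diff text only, assumes a deleted file shows up as "+++ /dev/null", and ignores the 'github' mode; like the documented function, it stops at whitespace and so does not handle paths with spaces.

```python
import re


def patched_files(txt, omit_ab_prefix=False, filter_deleted=False):
    """Hypothetical helper: collect target paths from '+++ ' lines of a unified diff."""
    patched = []
    for match in re.finditer(r'^\+\+\+ (\S+)', txt, re.M):
        path = match.group(1)
        if filter_deleted and path == '/dev/null':
            continue  # file was removed by the patch
        if omit_ab_prefix and path.startswith(('a/', 'b/')):
            path = path[2:]
        patched.append(path)
    return patched
```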
det_size(path)
Determine total size of given filepath (in bytes).
diff_files(path1, path2)
Return unified diff between two files
dir_contains_files(path, recursive=True)
Return True if the given directory contains any files.
If recursive is False, only the path itself is considered; otherwise all subdirectories are searched as well.
download_file(filename, url, path, forced=False, trace=True)
Download a file from the given URL, to the specified path.
dump_index(path, max_age_sec=None)
Create index for files in specified path, and dump it to file (alphabetically sorted).
encode_class_name(name)
Return encoded version of class name.
encode_string(name)
This encoding function handles funky software names ad infinitum, for example: `'0_foo+0x0x#-$__'` becomes `'0_underscore_foo_plus_0x0x_hash__minus__dollar__underscore__underscore_'`. The intention is to have a robust escaping mechanism for names like c++, C# et al.
It has been inspired by the concepts seen at the following locations, but in lowercase style:
- http://fossies.org/dox/netcdf-4.2.1.1/escapes_8c_source.html
- http://celldesigner.org/help/CDH_Species_01.html
- http://research.cs.berkeley.edu/project/sbp/darcsrepo-no-longer-updated/src/edu/berkeley/sbp/misc/ReflectiveWalker.java

It can be extended freely as per ISO/IEC 10646:2012 / Unicode 6.1 names:
- http://www.unicode.org/versions/Unicode6.1.0/

For readability of names with more than two words, it is suggested to use CamelCase style. So, yes, 'GreekSmallLetterEtaWithPsiliAndOxia' could indeed be a fully valid software name; software "electron" in the original spelling anyone? ;-)
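
The escaping scheme can be sketched with a character-to-name table, where each special character becomes `_<name>_` and alphanumerics pass through. CHAR_NAMES below is a hypothetical subset covering only the characters of the example above; the real table is larger, and encode/decode are illustrative names rather than EasyBuild's functions.

```python
import re

# Hypothetical subset of the character-name table (only what the example needs)
CHAR_NAMES = {'_': 'underscore', '+': 'plus', '#': 'hash', '-': 'minus', '$': 'dollar'}


def encode(name):
    """Replace each special character by _<name>_, keeping other characters as-is."""
    return ''.join('_%s_' % CHAR_NAMES[ch] if ch in CHAR_NAMES else ch for ch in name)


def decode(name):
    """Revert encode() by mapping every _<name>_ back to its character."""
    reverse = {v: k for k, v in CHAR_NAMES.items()}
    return re.sub(r'_(%s)_' % '|'.join(reverse), lambda m: reverse[m.group(1)], name)
```

Escaping '_' itself is what keeps decoding unambiguous: a literal "plus" in a name can never be mistaken for an encoded '+'.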
expand_glob_paths(glob_paths)
Expand specified glob paths to a list of unique, non-glob paths (files only).
extract_cmd(filepath, overwrite=False)
Determine the file type of the file at filepath, and return the extract command based on the file suffix.
extract_file(fn, dest, cmd=None, extra_options=None, overwrite=False, forced=False, change_into_dir=False, trace=True)
Extract file at given path to specified directory.

PARAMETER | DESCRIPTION |
---|---|
fn | path to file to extract |
dest | location to extract to |
cmd | extract command to use (derived from filename if not specified). DEFAULT: `None` |
extra_options | extra options to pass to extract command. DEFAULT: `None` |
overwrite | overwrite existing unpacked file. DEFAULT: `False` |
forced | force extraction in (extended) dry run mode. DEFAULT: `False` |
change_into_dir | change into resulting directory. DEFAULT: `False` |
trace | produce trace output for extract command being run. DEFAULT: `True` |

RETURNS | DESCRIPTION |
---|---|
path to directory (in case of success) |

find_backup_name_candidate(src_file)
Return a non-existing file path to be used as destination for backup files.
find_base_dir()
Try to locate a possible new base directory. This is typically a single subdirectory, e.g. from untarring a tarball. When extracting multiple tarballs in the same directory, expect only the first one to give the correct path.
find_easyconfigs(path, ignore_dirs=None)
Find .eb easyconfig files in path
find_eb_script(script_name)
Find EasyBuild script with given name (in easybuild/scripts subdirectory).
find_extension(filename)
Find best match for filename extension.
find_flexlm_license(custom_env_vars=None, lic_specs=None)
Find FlexLM license.
Consider the specified list of environment variables; check for a path to an existing license file or a valid license server specification; duplicate paths are not retained in the returned list of license specs.
If no license is found through environment variables, also consider 'lic_specs'.

PARAMETER | DESCRIPTION |
---|---|
custom_env_vars | list of environment variables to consider (if None, only consider $LM_LICENSE_FILE). DEFAULT: `None` |
lic_specs | list of license specifications. DEFAULT: `None` |

RETURNS | DESCRIPTION |
---|---|
tuple with list of valid license specs found and name of first valid environment variable |

find_glob_pattern(glob_pattern, fail_on_no_match=True)
Find unique file/dir matching glob_pattern (raises error if more than one match is found)
get_cwd(must_exist=True)
Retrieve current working directory
get_easyblock_class_name(path)
Make sure file is an easyblock and get easyblock class name
get_first_non_existing_parent_path(path)
Get first directory that does not exist, starting at path and going up.
get_source_tarball_from_git(filename, target_dir, git_config)
Download a git repository at a specific tag or commit (recursively or not), and make an archive of it.

PARAMETER | DESCRIPTION |
---|---|
filename | name of the archive file to save the code to (including extension) |
target_dir | target directory to save the archive to |
git_config | dictionary containing url, repo_name, recursive, and one of tag or commit |

guess_patch_level(patched_files, parent_dir)
Guess patch level based on list of patched files and specified directory.
has_recursive_symlinks(path)
Check the given directory for recursive symlinks.
That means symlinks to folders inside the path which would cause infinite loops when traversed regularly.

PARAMETER | DESCRIPTION |
---|---|
path | path to directory to check |

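
One way to detect such loops is to walk the tree without following links and flag any symlink whose resolved target is the directory containing it or one of its ancestors. This is a sketch of that technique under the hypothetical name has_symlink_loop, not necessarily how EasyBuild implements the check.

```python
import os


def has_symlink_loop(path):
    """Hypothetical helper: True if a symlink under 'path' points to its own directory or an ancestor."""
    for root, dirs, files in os.walk(path, followlinks=False):
        here = os.path.realpath(root)
        for name in dirs + files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                target = os.path.realpath(full)
                # following a link to an ancestor directory would never terminate
                if os.path.commonpath([here, target]) == target:
                    return True
    return False
```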
install_fake_vsc()
Put fake 'vsc' Python package in place, to catch easyblocks/scripts that still import from vsc.* namespace (vsc-base & vsc-install were ingested into the EasyBuild framework for EasyBuild 4.0, see https://github.com/easybuilders/easybuild-framework/pull/2708)
is_alt_pypi_url(url)
Determine whether specified URL is already an alternative PyPI URL, i.e. whether it contains a hash.
is_binary(contents)
Check whether given bytestring represents the contents of a binary file or not.
is_generic_easyblock(easyblock)
Return whether specified easyblock name is a generic easyblock or not.
is_parent_path(path1, path2)
Return True if path1 is a prefix of path2.

PARAMETER | DESCRIPTION |
---|---|
path1 | absolute or relative path |
path2 | absolute or relative path |

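
"Prefix" here should be read per path component, not per character: '/usr/li' is not a parent of '/usr/lib'. A sketch using os.path.commonpath, under the hypothetical name is_parent; it does not resolve symlinks, which the real function may handle differently.

```python
import os


def is_parent(path1, path2):
    """Hypothetical helper: component-wise prefix test on absolute, normalized paths."""
    path1 = os.path.abspath(path1)
    path2 = os.path.abspath(path2)
    return os.path.commonpath([path1, path2]) == path1
```

With this definition a path counts as a prefix of itself.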
is_patch_file(path)
Determine whether file at specified path is a patch file (based on +++ and --- lines being present).
is_readable(path)
Return whether file at specified location exists and is readable.
is_sha256_checksum(value)
Check whether provided string is a SHA256 checksum.
load_index(path, ignore_dirs=None)
Load index for specified path, and return contents (or None if no index exists).
locate_files(files, paths, ignore_subdirs=None)
Determine full path for list of files, in given list of paths (directories).
make_archive(source_dir, archive_file=None, archive_dir=None, reproducible=True)
Create an archive file of the given directory. The format of the tarball is defined by the extension of the archive file name.

PARAMETER | DESCRIPTION |
---|---|
source_dir | string with path to directory to be archived |
archive_file | string with filename of archive. DEFAULT: `None` |
archive_dir | string with path to directory to place the archive. DEFAULT: `None` |
reproducible | make a tarball that is reproducible across systems (see https://reproducible-builds.org/docs/archives/); requires an uncompressed or LZMA-compressed archive, since gzip is currently not supported due to non-deterministic data injected in its headers (see https://github.com/python/cpython/issues/112346). DEFAULT: `True` |

Default behaviour: reproducible tarball in .tar.xz
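
The core idea of a reproducible archive is to add entries in a fixed order and normalize metadata that differs between systems and runs. A minimal sketch with the standard tarfile module, assuming mtime 0 and root/empty ownership as the normalized values; reproducible_tar_xz is a hypothetical name and not EasyBuild's make_archive.

```python
import os
import tarfile


def reproducible_tar_xz(source_dir, archive_file):
    """Hypothetical helper: write source_dir to a .tar.xz with normalized metadata."""
    def normalize(info):
        info.mtime = 0  # drop timestamps
        info.uid = info.gid = 0  # drop ownership
        info.uname = info.gname = ''
        return info

    arcname = os.path.basename(os.path.normpath(source_dir))
    # GNU format avoids PAX headers; xz output is deterministic for identical input
    with tarfile.open(archive_file, 'w:xz', format=tarfile.GNU_FORMAT) as tar:
        # TarFile.add() recurses into directories in sorted order (Python >= 3.7)
        tar.add(source_dir, arcname=arcname, filter=normalize)
    return archive_file
```

Archiving the same tree twice then yields byte-identical files, even if file timestamps changed in between.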
mkdir(path, parents=False, set_gid=None, sticky=None)
Create a directory; path is the directory to create.

PARAMETER | DESCRIPTION |
---|---|
parents | create parent directories if needed (mkdir -p). DEFAULT: `False` |
set_gid | set group ID bit, to make subdirectories and files inherit group. DEFAULT: `None` |
sticky | set the sticky bit on this directory (a.k.a. the restricted deletion flag), to prevent users from removing/renaming files in this directory that they do not own. DEFAULT: `None` |

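
A sketch of creating a directory and then OR-ing the setgid and sticky bits into its mode with os.chmod. The helper name make_dir is hypothetical, and whether the real function also sets these bits on newly created parent directories is not stated here; this sketch only touches the final directory.

```python
import os
import stat


def make_dir(path, parents=False, set_gid=False, sticky=False):
    """Hypothetical helper: create a directory, optionally with setgid and/or sticky bits."""
    if parents:
        os.makedirs(path, exist_ok=True)  # like 'mkdir -p'
    else:
        os.mkdir(path)
    extra = 0
    if set_gid:
        extra |= stat.S_ISGID  # on Linux, new entries inherit the directory's group
    if sticky:
        extra |= stat.S_ISVTX  # only owners may remove or rename their entries
    if extra:
        os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | extra)
```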
modify_env(old, new)
NO LONGER SUPPORTED: use modify_env from easybuild.tools.environment instead
move_file(path, target_path, force_in_dry_run=False)
Move a file from path to target_path.

PARAMETER | DESCRIPTION |
---|---|
path | the original filepath |
target_path | path to move the file to |
force_in_dry_run | force running the command during dry run. DEFAULT: `False` |

move_logs(src_logfile, target_logfile)
Move log file(s).
normalize_path(path)
Normalize path, removing empty and dot components.
Similar to os.path.normpath, but does not resolve '..', since doing so may yield a wrong path when symlinks are involved.
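
A sketch of that behaviour for POSIX-style paths, under the hypothetical name normalize_components: empty and '.' components are dropped, but '..' is kept, because 'c/..' only equals the parent of 'c' when 'c' is not a symlink.

```python
def normalize_components(path):
    """Hypothetical helper: drop empty and '.' components, keep '..' untouched."""
    start = '/' if path.startswith('/') else ''
    parts = [p for p in path.split('/') if p not in ('', '.')]
    return start + '/'.join(parts) or '.'
```

For comparison, os.path.normpath('/a//b/./c/../d/') collapses to '/a/b/d', which is wrong if c is a symlink to some other directory.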
open_file(path, mode)
Open a (usually) text file. If mode is not binary, then utf-8 encoding will be used for Python 3.x
parse_http_header_fields_urlpat(arg, urlpat=None, header=None, urlpat_headers_collection=None, maxdepth=3)
Recurse into multi-line string "[URLPAT::][HEADER:]FILE|FIELD" where FILE may be another such string or a file containing lines matching the same format, such as "^https://www.example.com::/path/to/headers.txt", and flatten the result to a dict, e.g. {'^https://www.example.com': ['Authorization: Basic token', 'User-Agent: Special Agent']}
path_matches(path, paths)
Check whether given path matches any of the provided paths.
pypi_source_urls(pkg_name)
Fetch list of source URLs (incl. source filename) for specified Python package from PyPI, using 'simple' PyPI API.
read_file(path, log_error=True, mode='r')
Read contents of file at given path, in a robust way.
register_lock_cleanup_signal_handlers()
Register signal handlers for signals that cancel the current EasyBuild session, so the locks that were created can be cleaned up first.
remove(paths)
Remove single file/directory or list of files and directories.

PARAMETER | DESCRIPTION |
---|---|
paths | path(s) to remove |

remove_dir(path)
Remove directory at specified path.
remove_file(path)
Remove file at specified path.
remove_lock(lock_name)
Remove lock with specified name.
resolve_path(path)
Return fully resolved path for given path.

PARAMETER | DESCRIPTION |
---|---|
path | path that (maybe) contains symlinks |

search_file(paths, query, short=False, ignore_dirs=None, silent=False, filename_only=False, terse=False, case_sensitive=False)
Search for files in the specified paths using the specified search query (regular expression).

PARAMETER | DESCRIPTION |
---|---|
paths | list of paths to search in |
query | search query to use (regular expression); matched case-insensitively unless case_sensitive is True |
short | figure out common prefix of hits, use variable to factor it out. DEFAULT: `False` |
ignore_dirs | list of directories to ignore (if None, ['.git', '.svn'] is used). DEFAULT: `None` |
silent | whether or not to remain silent (don't print anything). DEFAULT: `False` |
filename_only | only return filenames, not file paths. DEFAULT: `False` |
terse | stick to terse (machine-readable) output, as opposed to pretty-printing. DEFAULT: `False` |

set_gid_sticky_bits(path, set_gid=None, sticky=None, recursive=False)
Set GID/sticky bits on specified path.
symlink(source_path, symlink_path, use_abspath_source=True)
Create a symlink at the specified path to the given path.

PARAMETER | DESCRIPTION |
---|---|
source_path | source file path |
symlink_path | symlink file path |
use_abspath_source | resolve the absolute path of source_path. DEFAULT: `True` |

verify_checksum(path, checksums, computed_checksums=None)
Verify checksum of specified file.

PARAMETER | DESCRIPTION |
---|---|
path | path of file to verify checksum of |
checksums | checksum values (and type, optionally, default is sha256), e.g., 'af314', ('sha', '5ec1b') |
computed_checksums | optional dictionary of (current) checksum(s) for this file, indexed by the checksum type (e.g. 'sha256'); each existing entry will be used, missing ones will be computed. DEFAULT: `None` |

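
A sketch of checking a single checksum spec, either a plain value (taken to be sha256) or a (type, value) tuple, with an optional cache in the spirit of computed_checksums. The helper name verify is hypothetical; it only accepts hashlib algorithm names (so 'sha1' rather than the 'sha' in the example above), and it does not handle other spec forms the real function may support.

```python
import hashlib


def verify(path, checksum, computed=None):
    """Hypothetical helper: check one checksum spec against the file at 'path'."""
    if isinstance(checksum, tuple):
        checksum_type, expected = checksum
    else:
        checksum_type, expected = 'sha256', checksum  # plain value: assume sha256
    if computed is None:
        computed = {}
    if checksum_type not in computed:
        # only compute what is not cached yet
        with open(path, 'rb') as fh:
            computed[checksum_type] = hashlib.new(checksum_type, fh.read()).hexdigest()
    return computed[checksum_type] == expected
```

Passing the same dictionary to several calls avoids re-reading the file for checksum types that were already computed.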
weld_paths(path1, path2)
Weld two paths together, taking into account overlap between tail of 1st path with head of 2nd path.
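
A sketch of welding by finding the longest run of trailing components of the first path that equals the leading components of the second. The helper name weld and the longest-overlap rule are assumptions for illustration; the sketch also drops empty components along the way.

```python
def weld(path1, path2):
    """Hypothetical helper: join two paths, dropping the overlap between path1's tail and path2's head."""
    lead = '/' if path1.startswith('/') else ''
    parts1 = [p for p in path1.split('/') if p]
    parts2 = [p for p in path2.split('/') if p]
    # try the longest possible overlap first
    for n in range(min(len(parts1), len(parts2)), 0, -1):
        if parts1[-n:] == parts2[:n]:
            parts2 = parts2[n:]
            break
    return lead + '/'.join(parts1 + parts2)
```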
which(cmd, retain_all=False, check_perms=True, log_ok=True, on_error=WARN)
Return (first) path in $PATH for specified command, or None if command is not found.

PARAMETER | DESCRIPTION |
---|---|
retain_all | return all locations of the specified command in $PATH, not just the first one. DEFAULT: `False` |
check_perms | check whether candidate path has read/exec permissions before accepting it as a match. DEFAULT: `True` |
log_ok | log an info message where the command has been found (if any). DEFAULT: `True` |
on_error | what to do if the command was not found; possible values: IGNORE, WARN, ERROR. DEFAULT: `WARN` |

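
A sketch of the $PATH scan behind retain_all and check_perms, under the hypothetical name which_all; logging and the on_error handling are omitted. For simple lookups outside EasyBuild, the standard library's shutil.which covers the first-match case.

```python
import os


def which_all(cmd, retain_all=False, check_perms=True):
    """Hypothetical helper: look up cmd in $PATH; return first match, or all matches."""
    found = []
    for directory in os.environ.get('PATH', '').split(os.pathsep):
        candidate = os.path.join(directory, cmd)
        if not os.path.isfile(candidate):
            continue
        if check_perms and not os.access(candidate, os.R_OK | os.X_OK):
            continue  # present but not readable/executable: not a match
        if not retain_all:
            return candidate
        if candidate not in found:
            found.append(candidate)
    return found if retain_all else None
```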
write_file(path, data, append=False, forced=False, backup=False, always_overwrite=True, verbose=False, show_progress=False, size=None)
Write given contents to file at given path; overwrites current file contents without backup by default!

PARAMETER | DESCRIPTION |
---|---|
path | location of file |
data | contents to write to file; can also be a file-like object of binary data |
append | append to existing file rather than overwrite. DEFAULT: `False` |
forced | force actually writing file in (extended) dry run mode. DEFAULT: `False` |
backup | back up existing file before overwriting or modifying it. DEFAULT: `False` |
always_overwrite | don't require --force to overwrite an existing file. DEFAULT: `True` |
verbose | be verbose, i.e. inform where backup file was created. DEFAULT: `False` |
show_progress | show progress bar while writing file. DEFAULT: `False` |
size | size (in bytes) of data to write (used for progress bar). DEFAULT: `None` |
