
filetools

Set of file tools.

Authors:

  • Stijn De Weirdt (Ghent University)
  • Dries Verdegem (Ghent University)
  • Kenneth Hoste (Ghent University)
  • Pieter De Baets (Ghent University)
  • Jens Timmerman (Ghent University)
  • Toon Willems (Ghent University)
  • Ward Poelmans (Ghent University)
  • Fotis Georgatos (Uni.Lu, NTUA)
  • Sotiris Fragkiskos (NTUA, CERN)
  • Davide Vanzo (ACCRE, Vanderbilt University)
  • Damian Alvarez (Forschungszentrum Juelich GmbH)
  • Maxime Boissonneault (Compute Canada)

ZlibChecksum

Bases: object

Wrapper class for adler32 and crc32 checksums, matching the interface of the hashlib module.

hexdigest()

Return hex string of the checksum

update(data)

Calculates a new checksum using the old one and the new data
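The following is a minimal sketch of how a hashlib-style wrapper around Python's zlib checksums can work. Each `update()` call continues the running checksum from its previous value. The class name `ZlibChecksumSketch` and the zero-padded hex output format are assumptions for illustration. This is not EasyBuild's actual implementation.

```python
import zlib


class ZlibChecksumSketch:
    """Hypothetical hashlib-style wrapper around zlib.adler32 or zlib.crc32."""

    def __init__(self, algorithm):
        # algorithm is zlib.adler32 or zlib.crc32; both accept (data, value)
        self.algorithm = algorithm
        # start from the checksum of empty input (1 for adler32, 0 for crc32)
        self.checksum = algorithm(b'')

    def update(self, data):
        # continue the running checksum from its previous value
        self.checksum = self.algorithm(data, self.checksum)

    def hexdigest(self):
        # mask to 32 bits and render as zero-padded hex (output format is an assumption)
        return format(self.checksum & 0xffffffff, '08x')


adler = ZlibChecksumSketch(zlib.adler32)
adler.update(b'foo')
adler.update(b'bar')
print(adler.hexdigest() == format(zlib.adler32(b'foobar'), '08x'))  # True
```

Feeding data in pieces gives the same result as checksumming it in one go. This is what makes the wrapper usable wherever a hashlib object is expected.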

adjust_permissions(provided_path, permission_bits, add=True, onlyfiles=False, onlydirs=False, recursive=True, group_id=None, relative=True, ignore_errors=False, skip_symlinks=None)

Change permissions for specified path, using specified permission bits.

Add or remove (if add is False) permission_bits from all files (if onlydirs is False) and directories (if onlyfiles is False) in path.

PARAMETER DESCRIPTION
add

add permissions relative to current permissions (only relevant if 'relative' is set to True)

DEFAULT: True

onlyfiles

only change permissions on files (not directories)

DEFAULT: False

onlydirs

only change permissions on directories (not files)

DEFAULT: False

recursive

change permissions recursively (only makes sense if path is a directory)

DEFAULT: True

group_id

also change group ownership to group with this group ID

DEFAULT: None

relative

add/remove permissions relative to current permissions (if False, hard set specified permissions)

DEFAULT: True

ignore_errors

ignore errors that occur when changing permissions (up to a maximum ratio specified by the --max-fail-ratio-adjust-permissions configuration option)

DEFAULT: False
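The add/relative/onlyfiles/onlydirs semantics described above can be sketched with the standard library alone. The function name `adjust_permissions_sketch` is hypothetical. The sketch leaves out group ownership, the error-ratio handling and symlink options, and it simply skips symlinks.

```python
import os
import stat


def adjust_permissions_sketch(path, permission_bits, add=True, onlyfiles=False,
                              onlydirs=False, recursive=True, relative=True):
    """Hypothetical, simplified take on adjust_permissions (no group or error handling)."""
    targets = [path]
    if recursive and os.path.isdir(path):
        # collect everything first, so changing permissions cannot disturb the walk
        for root, dirs, files in os.walk(path):
            targets.extend(os.path.join(root, name) for name in dirs + files)
    for target in targets:
        if os.path.islink(target):
            continue  # leave symlinks alone in this sketch
        is_dir = os.path.isdir(target)
        if (onlyfiles and is_dir) or (onlydirs and not is_dir):
            continue
        current = stat.S_IMODE(os.lstat(target).st_mode)
        if not relative:
            new = permission_bits  # hard-set the specified permissions
        elif add:
            new = current | permission_bits  # add bits to the current mode
        else:
            new = current & ~permission_bits  # remove bits from the current mode
        os.chmod(target, new)
```

For example, `adjust_permissions_sketch(installdir, stat.S_IRGRP | stat.S_IROTH, onlyfiles=True)` would make all files under `installdir` group- and world-readable and leave directories untouched.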

apply_patch(patch_file, dest, fn=None, copy=False, level=None, use_git_am=False, use_git=False)

Apply a patch to source code in directory dest; assumes a unified diff created with "diff -ru old new".

Raises EasyBuildError on any error and returns True on success

apply_regex_substitutions(paths, regex_subs, backup='.orig.eb', on_missing_match=None)

Apply specified list of regex substitutions.

PARAMETER DESCRIPTION
paths

list of paths to files to patch (or just a single filepath)

regex_subs

list of substitutions to apply, specified as (<regex pattern>, <replacement string>) tuples

backup

create backup of original file with specified suffix (no backup if value evaluates to False)

DEFAULT: '.orig.eb'

on_missing_match

Define what to do when no match was found in the file. Can be 'error' to raise an error, 'warn' to print a warning, or 'ignore' to do nothing. Defaults to the value of --strict.

DEFAULT: None
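Below is a minimal sketch of the substitution and backup flow described above, using `re.subn` to count matches per pattern. `apply_regex_substitutions_sketch` is a hypothetical name. It accepts only string patterns and defaults `on_missing_match` to 'warn' instead of consulting --strict; both choices are assumptions.

```python
import re
import shutil
import warnings


def apply_regex_substitutions_sketch(paths, regex_subs, backup='.orig.eb', on_missing_match='warn'):
    """Hypothetical, simplified take on apply_regex_substitutions (string patterns only)."""
    if isinstance(paths, str):
        paths = [paths]  # accept a single filepath as well as a list
    for path in paths:
        with open(path) as fh:
            txt = fh.read()
        missing = []
        for regex, replacement in regex_subs:
            # re.M so that ^ and $ match at line boundaries
            txt, count = re.subn(regex, replacement, txt, flags=re.M)
            if count == 0:
                missing.append(regex)
        if missing:
            msg = f"Nothing matched in {path}: {missing}"
            if on_missing_match == 'error':
                raise ValueError(msg)  # file is left untouched
            elif on_missing_match == 'warn':
                warnings.warn(msg)
        if backup:
            shutil.copy2(path, path + backup)  # keep the original next to the patched file
        with open(path, 'w') as fh:
            fh.write(txt)
```

Raising before anything is written means a failed 'error' check leaves the file exactly as it was.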

back_up_file(src_file, backup_extension='bak', hidden=False, strip_fn=None)

Backs up a file by appending a backup extension to its name, plus a timestamp if a backup already exists.

PARAMETER DESCRIPTION
src_file

file to back up

backup_extension

extension to use for the backup file (can be empty or None)

DEFAULT: 'bak'

hidden

make backup hidden (leading dot in filename)

DEFAULT: False

strip_fn

strip specified trailing substring from filename of backup

DEFAULT: None

RETURNS DESCRIPTION

location of backed up file
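This is a minimal sketch of the backup naming logic described above. The function name and the exact naming scheme are assumptions: `<name>.<ext>`, then `_<timestamp>` and a counter on collision. EasyBuild's own format may differ.

```python
import os
import shutil
import time


def back_up_file_sketch(src_file, backup_extension='bak', hidden=False, strip_fn=None):
    """Hypothetical take on back_up_file; the naming scheme here is an assumption."""
    dirname, fn = os.path.split(src_file)
    if strip_fn and fn.endswith(strip_fn):
        fn = fn[:-len(strip_fn)]  # drop the trailing substring from the backup name
    if hidden:
        fn = '.' + fn  # leading dot makes the backup hidden
    if backup_extension:
        fn = f'{fn}.{backup_extension}'
    base = os.path.join(dirname, fn)
    candidate, stamp, n = base, time.strftime('%Y%m%d%H%M%S'), 0
    # if a backup already exists, append a timestamp (plus a counter on collision)
    while os.path.exists(candidate):
        candidate = f'{base}_{stamp}' if n == 0 else f'{base}_{stamp}_{n}'
        n += 1
    shutil.copy2(src_file, candidate)
    return candidate
```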

calc_block_checksum(path, algorithm)

Calculate a checksum of a file by reading it in blocks.
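Reading in blocks keeps memory use bounded for large source tarballs. The sketch below assumes `algorithm` is a hashlib-style object with `update()` and `hexdigest()`. The `blocksize` parameter is hypothetical.

```python
import hashlib


def calc_block_checksum_sketch(path, algorithm, blocksize=16 * 1024 * 1024):
    """Feed a file into a hashlib-style object block by block (blocksize is hypothetical)."""
    with open(path, 'rb') as fh:
        # iter() with a sentinel keeps reading until read() returns b''
        for block in iter(lambda: fh.read(blocksize), b''):
            algorithm.update(block)
    return algorithm.hexdigest()
```

Calling `calc_block_checksum_sketch(path, hashlib.sha256())` gives the same digest as hashing the whole file contents at once.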

change_dir(path)

Change to directory at specified location.

PARAMETER DESCRIPTION
path

location to change to

RETURNS DESCRIPTION

previous location we were in
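Returning the previous location allows a simple "go there, then come back" pattern. A minimal sketch follows; the name `change_dir_sketch` is hypothetical.

```python
import os


def change_dir_sketch(path):
    """Change to path and return the previous working directory (sketch)."""
    prev_dir = os.getcwd()
    os.chdir(path)
    return prev_dir


# typical pattern: remember where we were, do some work, then go back
# prev = change_dir_sketch('/tmp')
# ...
# change_dir_sketch(prev)
```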

check_lock(lock_name)

Check whether a lock with specified name already exists.

If it exists, either wait until it's released, or raise an error (depending on --wait-on-lock configuration option).

clean_up_locks()

Clean up all still existing locks that were created in this session.

clean_up_locks_signal_handler(signum, frame)

Signal handler, cleans up locks & exits with received signal number.
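The lock helpers above can be sketched on top of the fact that `os.mkdir` is atomic: it fails if the directory already exists. This sketch assumes locks are directories and tracks the ones created in this session so they can be cleaned up. All names, and the `wait_on_lock` timeout standing in for the --wait-on-lock option, are hypothetical.

```python
import os
import time

# locks created during this session, so they can be cleaned up later
_session_locks = set()


def create_lock_sketch(lock_path):
    # os.mkdir is atomic: it raises FileExistsError if the lock already exists
    os.mkdir(lock_path)
    _session_locks.add(lock_path)


def check_lock_sketch(lock_path, wait_on_lock=0, interval=0.1):
    """Wait up to wait_on_lock seconds for lock_path to disappear, then give up."""
    waited = 0.0
    while os.path.exists(lock_path):
        if waited >= wait_on_lock:
            raise RuntimeError(f"Lock {lock_path} still exists")
        time.sleep(interval)
        waited += interval


def remove_lock_sketch(lock_path):
    if os.path.isdir(lock_path):
        os.rmdir(lock_path)
    _session_locks.discard(lock_path)


def clean_up_locks_sketch():
    # iterate over a copy, since remove_lock_sketch modifies the set
    for lock_path in list(_session_locks):
        remove_lock_sketch(lock_path)
```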

cleanup(logfile, tempdir, testing, silent=False)

Clean up the specified log file and the tmp directory, if desired.

PARAMETER DESCRIPTION
logfile

path to log file to clean up

tempdir

path to temporary directory to clean up

testing

are we in testing mode? if so, don't actually clean up anything

silent

be silent (don't print anything to stdout)

DEFAULT: False

compute_checksum(path, checksum_type=DEFAULT_CHECKSUM)

Compute checksum of specified file.

PARAMETER DESCRIPTION
path

Path of file to compute checksum for

checksum_type

type(s) of checksum ('adler32', 'crc32', 'md5' (default), 'sha1', 'sha256', 'sha512', 'size')

DEFAULT: DEFAULT_CHECKSUM

convert_name(name, upper=False)

Converts name so it can be used as a variable name

copy(paths, target_path, force_in_dry_run=False, **kwargs)

Copy single file/directory or list of files and directories to specified location

PARAMETER DESCRIPTION
paths

path(s) to copy

target_path

target location

force_in_dry_run

force running the command during dry run

DEFAULT: False

kwargs

additional named arguments to pass down to copy_dir

DEFAULT: {}

copy_dir(path, target_path, force_in_dry_run=False, dirs_exist_ok=False, check_for_recursive_symlinks=True, **kwargs)

Copy a directory from specified location to specified location.

shutil.copytree is used if the target path does not exist yet; if the target path already exists, the 'copy' function will be used to copy the contents of the source path to the target path.

Additional specified named arguments are passed down to shutil.copytree/copy if used.

PARAMETER DESCRIPTION
path

the original directory path

target_path

path to copy the directory to

force_in_dry_run

force running the command during dry run

DEFAULT: False

dirs_exist_ok

boolean indicating whether it's OK if the target directory already exists

DEFAULT: False

check_for_recursive_symlinks

if the symlinks argument is not given or is False, check for recursive symlinks first

DEFAULT: True
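This minimal sketch shows the described strategy: use `shutil.copytree` for a new target, and copy entry by entry into an existing one. `copy_dir_sketch` is a hypothetical name. The recursive-symlink check and dry-run handling are omitted.

```python
import os
import shutil


def copy_dir_sketch(path, target_path, dirs_exist_ok=False, **kwargs):
    """Sketch of the described behaviour: copytree for a new target, item-wise copy otherwise."""
    if not os.path.exists(target_path):
        # target does not exist yet: shutil.copytree creates it
        shutil.copytree(path, target_path, **kwargs)
    elif dirs_exist_ok:
        # target exists: copy the contents of path into it, entry by entry
        for entry in os.listdir(path):
            src = os.path.join(path, entry)
            dst = os.path.join(target_path, entry)
            if os.path.isdir(src) and not os.path.islink(src):
                copy_dir_sketch(src, dst, dirs_exist_ok=True, **kwargs)
            else:
                # copy symlinks as symlinks rather than following them
                shutil.copy2(src, dst, follow_symlinks=False)
    else:
        raise FileExistsError(f"Target location {target_path} already exists")
```

On Python 3.8+, `shutil.copytree(..., dirs_exist_ok=True)` covers much of the same ground. The explicit loop just makes the two code paths visible.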

copy_easyblocks(paths, target_dir)

Find the right location for an easyblock file and copy it there

copy_file(path, target_path, force_in_dry_run=False)

Copy a file from specified location to specified location

PARAMETER DESCRIPTION
path

the original filepath

target_path

path to copy the file to

force_in_dry_run

force copying of file during dry run

DEFAULT: False

copy_files(paths, target_path, force_in_dry_run=False, target_single_file=False, allow_empty=True, verbose=False)

Copy list of files to specified target path. Target directory is created if it doesn't exist yet.

PARAMETER DESCRIPTION
paths

list of filepaths to copy

target_path

path to copy files to

force_in_dry_run

force copying of files during dry run

DEFAULT: False

target_single_file

if there's only a single file to copy, copy to a file at target path (not a directory)

DEFAULT: False

allow_empty

allow empty list of paths to copy as input (if False: raise error on empty input list)

DEFAULT: True

verbose

print a message to report copying of files

DEFAULT: False

copy_framework_files(paths, target_dir)

Find the right location for a framework file and copy it there

copytree(src, dst, symlinks=False, ignore=None)

DEPRECATED and removed; use copy_dir instead.

create_index(path, ignore_dirs=None)

Create index for files in specified path.

create_lock(lock_name)

Create lock with specified name.

create_patch_info(patch_spec)

Create info dictionary from specified patch spec.

create_unused_dir(parent_folder, name)

Create a new folder in parent_folder, using name as its name. When a folder with that name already exists, '_0' is appended to the name; the numeric suffix is incremented ('_1', '_2', ...) until an unused name is found.
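The retry loop described above can be sketched as follows. The name `create_unused_dir_sketch` is hypothetical. Relying on `os.mkdir` failing for an existing path avoids a race between checking and creating.

```python
import os


def create_unused_dir_sketch(parent_folder, name):
    """Create parent_folder/name, or name_0, name_1, ... if that name is taken."""
    path = os.path.join(parent_folder, name)
    number = 0
    while True:
        try:
            os.mkdir(path)  # atomic: fails if the path already exists
            return path
        except FileExistsError:
            path = os.path.join(parent_folder, f'{name}_{number}')
            number += 1
```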

decode_class_name(name)

Return decoded version of class name.

decode_string(name)

Decoding function to revert result of encode_string.

derive_alt_pypi_url(url)

Derive alternate PyPI URL for given URL.

det_common_path_prefix(paths)

Determine common path prefix for a given list of paths.
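Here is a minimal sketch built on `os.path.commonpath`, which compares whole path components rather than raw characters. Returning None for an empty list, or when nothing is shared, is an assumption about the edge cases. The function name is hypothetical.

```python
import os


def det_common_path_prefix_sketch(paths):
    """Common directory prefix of paths, or None (sketch built on os.path.commonpath)."""
    if not paths:
        return None
    # commonpath compares whole components, so '/foo/bar' and '/foo/baz' share '/foo'
    prefix = os.path.commonpath(paths)
    return prefix or None  # '' means relative paths with nothing in common
```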

det_file_size(http_header)

Determine size of file from provided HTTP header info (without downloading it).

det_lock_path(lock_name)

Determine full path for lock with specified name.

det_patched_files(path=None, txt=None, omit_ab_prefix=False, github=False, filter_deleted=False)

Determine list of patched files from a patch. It searches for "+++ path/to/patched/file" lines to determine the patched files. Note: does not correctly handle filepaths with spaces.

PARAMETER DESCRIPTION
path

the path to the diff

DEFAULT: None

txt

the contents of the diff (either path or txt should be given)

DEFAULT: None

omit_ab_prefix

ignore the a/ or b/ prefix of the files

DEFAULT: False

github

only consider lines that start with 'diff --git' to determine list of patched files

DEFAULT: False

filter_deleted

filter out all files that were deleted by the patch

DEFAULT: False
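The "+++ path/to/patched/file" scan described above can be sketched with a multi-line regex. This hypothetical version works on diff text only and ignores the `github` option. Matching up to the first whitespace reproduces the stated limitation with spaces in paths.

```python
import re


def det_patched_files_sketch(txt, omit_ab_prefix=False, filter_deleted=False):
    """Collect file paths from '+++ path' lines of a unified diff (simplified sketch)."""
    patched = []
    # the path runs up to the first whitespace, so spaces in paths are not handled
    for match in re.finditer(r'^\+\+\+ (\S+)', txt, re.M):
        path = match.group(1)
        if filter_deleted and path == '/dev/null':
            continue  # '+++ /dev/null' marks a deleted file
        if omit_ab_prefix and path.startswith(('a/', 'b/')):
            path = path[2:]
        patched.append(path)
    return patched
```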

det_size(path)

Determine total size of given filepath (in bytes).

diff_files(path1, path2)

Return unified diff between two files
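A unified diff between two text files can be produced with the standard `difflib` module. The sketch below is one way to do it and is not necessarily how EasyBuild does it.

```python
import difflib


def diff_files_sketch(path1, path2):
    """Unified diff of two text files, as a single string (sketch using difflib)."""
    with open(path1) as fh1, open(path2) as fh2:
        lines1, lines2 = fh1.readlines(), fh2.readlines()
    # fromfile/tofile end up in the '---' and '+++' header lines
    return ''.join(difflib.unified_diff(lines1, lines2, fromfile=path1, tofile=path2))
```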

dir_contains_files(path, recursive=True)

Return True if the given directory contains any files.

PARAMETER DESCRIPTION
recursive

if False, only the path itself is considered; otherwise all subdirectories are searched as well

DEFAULT: True
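This minimal sketch shows the recursive and non-recursive behaviour. The function name is hypothetical. Empty subdirectories do not count as files.

```python
import os


def dir_contains_files_sketch(path, recursive=True):
    """Return True if path (or, if recursive, any subdirectory) contains a file."""
    if not recursive:
        return any(os.path.isfile(os.path.join(path, entry)) for entry in os.listdir(path))
    # os.walk yields (dirpath, dirnames, filenames) for path and every subdirectory
    return any(os.path.isfile(os.path.join(root, fn))
               for root, _, files in os.walk(path) for fn in files)
```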

download_file(filename, url, path, forced=False)

Download a file from the given URL, to the specified path.

dump_index(path, max_age_sec=None)

Create index for files in specified path, and dump it to file (alphabetically sorted).

encode_class_name(name)

Return encoded version of class name.

encode_string(name)

This encoding function handles funky software names ad infinitum. For example, '0_foo+0x0x#-$__' becomes '0_underscore_foo_plus_0x0x_hash__minus__dollar__underscore__underscore_'. The intention is to have a robust escaping mechanism for names like c++, C# et al.

It has been inspired by the concepts seen at the following locations, but in lowercase style:

  • http://fossies.org/dox/netcdf-4.2.1.1/escapes_8c_source.html
  • http://celldesigner.org/help/CDH_Species_01.html
  • http://research.cs.berkeley.edu/project/sbp/darcsrepo-no-longer-updated/src/edu/berkeley/sbp/misc/ReflectiveWalker.java

It can be extended freely as per ISO/IEC 10646:2012 / Unicode 6.1 names:

  • http://www.unicode.org/versions/Unicode6.1.0/

For readability of names with more than two words, CamelCase style is suggested. So, yes, 'GreekSmallLetterEtaWithPsiliAndOxia' could indeed be a fully valid software name; software "electron" in the original spelling anyone? ;-)
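The escaping scheme above can be sketched with a small character-to-name table. Each special character becomes `_<name>_`, and decoding scans left to right for those tokens. The five-entry table is an assumed subset, chosen because it reproduces the documented example exactly. The real function likely covers many more characters, and the function names here are hypothetical.

```python
import re

# Assumed subset of the escape table, chosen to reproduce the documented example
CHAR_NAMES = {'_': 'underscore', '+': 'plus', '#': 'hash', '-': 'minus', '$': 'dollar'}
NAME_CHARS = {name: char for char, name in CHAR_NAMES.items()}


def encode_string_sketch(name):
    # replace each special character by '_<name>_'; other characters stay as-is
    return ''.join(f'_{CHAR_NAMES[ch]}_' if ch in CHAR_NAMES else ch for ch in name)


def decode_string_sketch(name):
    # scan left to right, turning each '_<name>_' token back into its character
    pattern = '_(' + '|'.join(NAME_CHARS) + ')_'
    return re.sub(pattern, lambda m: NAME_CHARS[m.group(1)], name)


encoded = encode_string_sketch('0_foo+0x0x#-$__')
print(encoded)  # 0_underscore_foo_plus_0x0x_hash__minus__dollar__underscore__underscore_
print(decode_string_sketch(encoded))  # 0_foo+0x0x#-$__
```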

expand_glob_paths(glob_paths)

Expand specified glob paths to a list of unique non-glob paths, to files only.
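Here is a minimal sketch of the expansion described above: glob each pattern, keep only regular files, and drop duplicates while preserving order. The function name is hypothetical. Sorting each pattern's matches is an assumption added for deterministic output.

```python
import glob
import os


def expand_glob_paths_sketch(glob_paths):
    """Expand glob patterns into a list of unique paths to existing files (order kept)."""
    paths = []
    for glob_path in glob_paths:
        # sorted() makes the output deterministic; glob order is arbitrary
        for path in sorted(glob.glob(glob_path)):
            if os.path.isfile(path) and path not in paths:
                paths.append(path)
    return paths
```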

extract_cmd(filepath, overwrite=False)

Determines the file type of the file at filepath, and returns the extract command based on the file suffix.

extract_file(fn, dest, cmd=None, extra_options=None, overwrite=False, forced=False, change_into_dir=None)

Extract file at given path to specified directory

PARAMETER DESCRIPTION
fn

path to file to extract

dest

location to extract to

cmd

extract command to use (derived from filename if not specified)

DEFAULT: None

extra_options

extra options to pass to extract command

DEFAULT: None

overwrite

overwrite existing unpacked file

DEFAULT: False

forced

force extraction in (extended) dry run mode

DEFAULT: False

change_into_dir

change into resulting directory; None (current default) implies True, but this is deprecated, so this named argument should be set explicitly to True or False (in a future major release, the default will be changed to False)

DEFAULT: None

RETURNS DESCRIPTION

path to directory (in case of success)

find_backup_name_candidate(src_file)

Returns a non-existing file to be used as destination for backup files

find_base_dir()

Try to locate a possible new base directory. This is typically a single subdirectory, e.g. resulting from untarring a tarball. When extracting multiple tarballs in the same directory, expect only the first one to give the correct path.

find_easyconfigs(path, ignore_dirs=None)

Find .eb easyconfig files in path

find_eb_script(script_name)

Find EasyBuild script with given name (in easybuild/scripts subdirectory).

find_extension(filename)

Find best match for filename extension.

find_flexlm_license(custom_env_vars=None, lic_specs=None)

Find FlexLM license.

Considers the specified list of environment variables, and checks for a path to an existing license file or a valid license server specification. Duplicate paths are not retained in the returned list of license specs.

If no license is found through environment variables, also consider 'lic_specs'.

PARAMETER DESCRIPTION
custom_env_vars

list of environment variables to consider (if None, only consider $LM_LICENSE_FILE)

DEFAULT: None

lic_specs

list of license specifications

DEFAULT: None

RETURNS DESCRIPTION

tuple with list of valid license specs found and name of first valid environment variable

find_glob_pattern(glob_pattern, fail_on_no_match=True)

Find unique file/dir matching glob_pattern (raises error if more than one match is found)
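This minimal sketch enforces the "exactly one match" rule. The function name is hypothetical. Raising ValueError and returning None when `fail_on_no_match` is False are assumptions about the error handling.

```python
import glob


def find_glob_pattern_sketch(glob_pattern, fail_on_no_match=True):
    """Return the single path matching glob_pattern (sketch)."""
    matches = glob.glob(glob_pattern)
    if len(matches) == 1:
        return matches[0]
    if len(matches) > 1:
        raise ValueError(f"Multiple matches for {glob_pattern}: {matches}")
    if fail_on_no_match:
        raise ValueError(f"No match for {glob_pattern}")
    return None
```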

get_easyblock_class_name(path)

Make sure file is an easyblock and get easyblock class name

get_source_tarball_from_git(filename, targetdir, git_config)

Downloads a git repository at a specific tag or commit, recursively or not, and makes an archive of it

PARAMETER DESCRIPTION
filename

name of the archive to save the code to (must be .tar.gz)

targetdir

target directory where to save the archive to

git_config

dictionary containing url, repo_name, recursive, and one of tag or commit

guess_patch_level(patched_files, parent_dir)

Guess patch level based on list of patched files and specified directory.

Check the given directory for recursive symlinks.

That is, symlinks to folders inside the path, which would cause infinite loops when traversed regularly.

PARAMETER DESCRIPTION
path

Path to directory to check

install_fake_vsc()

Put fake 'vsc' Python package in place, to catch easyblocks/scripts that still import from vsc.* namespace (vsc-base & vsc-install were ingested into the EasyBuild framework for EasyBuild 4.0, see https://github.com/easybuilders/easybuild-framework/pull/2708)

is_alt_pypi_url(url)

Determine whether specified URL is already an alternate PyPI URL, i.e. whether it contains a hash.

is_binary(contents)

Check whether given bytestring represents the contents of a binary file or not.
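A common heuristic for this kind of check is to look for a NUL byte, which text files practically never contain. The one-liner below assumes that rule. It is a sketch with a hypothetical name, not a statement of EasyBuild's exact criterion.

```python
def is_binary_sketch(contents):
    """Heuristic: treat bytes containing a NUL byte as binary (assumed rule)."""
    return isinstance(contents, bytes) and b'\x00' in contents
```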

is_generic_easyblock(easyblock)

Return whether specified easyblock name is a generic easyblock or not.

is_patch_file(path)

Determine whether file at specified path is a patch file (based on +++ and --- lines being present).

is_readable(path)

Return whether file at specified location exists and is readable.

is_sha256_checksum(value)

Check whether provided string is a SHA256 checksum.
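A SHA256 hex digest is 64 hexadecimal characters, so a regex full match is enough for a sketch. Accepting only lowercase digits, as `hashlib`'s `hexdigest()` produces, is an assumption. The function name is hypothetical.

```python
import re

SHA256_REGEX = re.compile(r'[0-9a-f]{64}')


def is_sha256_checksum_sketch(value):
    """True if value is a string of exactly 64 lowercase hex digits (assumed format)."""
    return isinstance(value, str) and SHA256_REGEX.fullmatch(value) is not None
```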

load_index(path, ignore_dirs=None)

Load index for specified path, and return contents (or None if no index exists).

locate_files(files, paths, ignore_subdirs=None)

Determine full path for list of files, in given list of paths (directories).

mkdir(path, parents=False, set_gid=None, sticky=None)

Create a directory; path is the directory to create.

PARAMETER DESCRIPTION
parents

create parent directories if needed (mkdir -p)

DEFAULT: False

set_gid

set group ID bit, to make subdirectories and files inherit group

DEFAULT: None

sticky

set the sticky bit on this directory (a.k.a. the restricted deletion flag), to prevent users from removing or renaming files they do not own in this directory

DEFAULT: None
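The parents/set_gid/sticky options map onto `os.makedirs` plus the `stat.S_ISGID` and `stat.S_ISVTX` mode bits. In this sketch the bits default to False instead of consulting configuration, and they are set only on the leaf directory. Both choices are assumptions, and the function name is hypothetical.

```python
import os
import stat


def mkdir_sketch(path, parents=False, set_gid=False, sticky=False):
    """Create a directory, optionally with parents and setgid/sticky bits (sketch)."""
    if parents:
        os.makedirs(path, exist_ok=True)  # like 'mkdir -p'
    else:
        os.mkdir(path)
    bits = 0
    if set_gid:
        bits |= stat.S_ISGID  # new files/subdirectories inherit the directory's group
    if sticky:
        bits |= stat.S_ISVTX  # users may only delete or rename files they own
    if bits:
        os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | bits)
```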

modify_env(old, new)

NO LONGER SUPPORTED: use modify_env from easybuild.tools.environment instead

move_file(path, target_path, force_in_dry_run=False)

Move a file from path to target_path

PARAMETER DESCRIPTION
path

the original filepath

target_path

path to move the file to

force_in_dry_run

force running the command during dry run

DEFAULT: False

move_logs(src_logfile, target_logfile)

Move log file(s).

normalize_path(path)

Normalize path removing empty and dot components.

Similar to os.path.normpath, but does not resolve '..' components, since doing so may yield a wrong path when symlinks are involved.
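The difference from `os.path.normpath` shows up with '..'. A minimal POSIX-only sketch that drops empty and '.' components but keeps '..' untouched follows; the function name is hypothetical.

```python
import os


def normalize_path_sketch(path):
    """Drop empty and '.' components but keep '..' (sketch, POSIX paths)."""
    parts = [p for p in path.split(os.sep) if p not in ('', '.')]
    prefix = os.sep if path.startswith(os.sep) else ''  # keep absolute paths absolute
    return prefix + os.sep.join(parts)
```

If `foo` is a symlink to `/elsewhere/foo`, then `foo/../bar` refers to `/elsewhere/bar`. `os.path.normpath` would collapse it to `bar`, while the sketch keeps `foo/../bar`.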

open_file(path, mode)

Open a (usually) text file. If mode is not binary, then utf-8 encoding will be used for Python 3.x

parse_http_header_fields_urlpat(arg, urlpat=None, header=None, urlpat_headers_collection=None, maxdepth=3)

Recurse into multi-line string "[URLPAT::][HEADER:]FILE|FIELD" where FILE may be another such string or file containing lines matching the same format, such as "^https://www.example.com::/path/to/headers.txt", and flatten the result to a dict, e.g. {'^https://www.example.com': ['Authorization: Basic token', 'User-Agent: Special Agent']}

parse_log_for_error(txt, regExp=None, stdout=True, msg=None)

NO LONGER SUPPORTED: use parse_log_for_error from easybuild.tools.run instead

path_matches(path, paths)

Check whether given path matches any of the provided paths.

pypi_source_urls(pkg_name)

Fetch list of source URLs (incl. source filename) for specified Python package from PyPI, using 'simple' PyPI API.

read_file(path, log_error=True, mode='r')

Read contents of file at given path, in a robust way.

register_lock_cleanup_signal_handlers()

Register signal handlers for signals that cancel the current EasyBuild session, so the locks that were created can be cleaned up before exiting.
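Registering a clean-up handler with Python's `signal` module can be sketched as follows. The handler cleans up, then exits using the signal number as exit code, as described for clean_up_locks_signal_handler. The stub `clean_up_locks_stub` and the choice of SIGINT/SIGTERM are assumptions. Note that `signal.signal` must be called from the main thread.

```python
import signal
import sys


def clean_up_locks_stub():
    """Hypothetical stand-in for the real lock clean-up."""
    print("cleaning up locks")


def clean_up_locks_signal_handler_sketch(signum, frame):
    # clean up first, then exit with the received signal number as exit code
    clean_up_locks_stub()
    sys.exit(signum)


def register_lock_cleanup_signal_handlers_sketch(signums=(signal.SIGINT, signal.SIGTERM)):
    """Install the handler for the given signals (the signal list is an assumption)."""
    for signum in signums:
        signal.signal(signum, clean_up_locks_signal_handler_sketch)
```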

remove(paths)

Remove single file/directory or list of files and directories

PARAMETER DESCRIPTION
paths

path(s) to remove

remove_dir(path)

Remove directory at specified path.

remove_file(path)

Remove file at specified path.

remove_lock(lock_name)

Remove lock with specified name.

resolve_path(path)

Return fully resolved path for given path.

PARAMETER DESCRIPTION
path

path that (maybe) contains symlinks

rmtree2(path, n=3)

Wrapper around shutil.rmtree to make it more robust when used on NFS mounted file systems.

run_cmd(cmd, log_ok=True, log_all=False, simple=False, inp=None, regexp=True, log_output=False, path=None)

NO LONGER SUPPORTED: use run_cmd from easybuild.tools.run instead

run_cmd_qa(cmd, qa, no_qa=None, log_ok=True, log_all=False, simple=False, regexp=True, std_qa=None, path=None)

NO LONGER SUPPORTED: use run_cmd_qa from easybuild.tools.run instead

search_file(paths, query, short=False, ignore_dirs=None, silent=False, filename_only=False, terse=False, case_sensitive=False)

Search for files in specified paths using specified search query (regular expression)

PARAMETER DESCRIPTION
paths

list of paths to search in

query

search query to use (regular expression); matched case-insensitively unless case_sensitive is set to True

short

figure out common prefix of hits, use variable to factor it out

DEFAULT: False

ignore_dirs

list of directories to ignore (default: ['.git', '.svn'])

DEFAULT: None

silent

whether or not to remain silent (don't print anything)

DEFAULT: False

filename_only

only return filenames, not file paths

DEFAULT: False

terse

stick to terse (machine-readable) output, as opposed to pretty-printing

DEFAULT: False
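The core of such a search is a regex match on filenames while walking the given paths. Pruning `dirs` in place stops `os.walk` from descending into ignored directories. This hypothetical sketch returns full paths and leaves out the short/terse/filename_only output options.

```python
import os
import re


def search_file_sketch(paths, query, ignore_dirs=None, case_sensitive=False):
    """Return paths of files whose name matches the regex query (simplified sketch)."""
    if ignore_dirs is None:
        ignore_dirs = ['.git', '.svn']
    regex = re.compile(query, 0 if case_sensitive else re.I)
    hits = []
    for path in paths:
        for root, dirs, files in os.walk(path):
            # prune ignored directories in place so os.walk does not descend into them
            dirs[:] = sorted(d for d in dirs if d not in ignore_dirs)
            for fn in sorted(files):
                if regex.search(fn):
                    hits.append(os.path.join(root, fn))
    return hits
```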

set_gid_sticky_bits(path, set_gid=None, sticky=None, recursive=False)

Set GID/sticky bits on specified path.

Create a symlink at the specified path to the given path.

PARAMETER DESCRIPTION
source_path

source file path

symlink_path

symlink file path

use_abspath_source

resolves the absolute path of source_path

DEFAULT: True

verify_checksum(path, checksums, computed_checksums=None)

Verify checksum of specified file.

PARAMETER DESCRIPTION
path

path of file to verify checksum of

checksums

checksum values to compare to, optionally with the checksum type (default is MD5), e.g. 'af314' or ('sha', '5ec1b')

computed_checksums

Optional dictionary of (current) checksum(s) for this file indexed by the checksum type (e.g. 'sha256'). Each existing entry will be used, missing ones will be computed.

DEFAULT: None
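This minimal sketch handles a single checksum given either as a plain value or as a (type, value) tuple. It reuses an entry from `computed_checksums` when present and computes it otherwise. Treating a plain string as MD5 follows the description above. Handling only one checksum, rather than a list, is a simplification, and the function name is hypothetical.

```python
import hashlib


def verify_checksum_sketch(path, checksum, computed_checksums=None):
    """Compare a file against 'value' or ('type', 'value'); plain strings are MD5 here."""
    if isinstance(checksum, tuple):
        typ, expected = checksum
    else:
        typ, expected = 'md5', checksum  # default type, per the description above
    if computed_checksums is None:
        computed_checksums = {}
    if typ not in computed_checksums:
        # only compute what is missing; existing entries are reused as-is
        with open(path, 'rb') as fh:
            computed_checksums[typ] = hashlib.new(typ, fh.read()).hexdigest()
    return computed_checksums[typ] == expected
```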

weld_paths(path1, path2)

Weld two paths together, taking into account overlap between tail of 1st path with head of 2nd path.
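"Welding" can be read as: find the longest run of trailing components of the first path that equals the leading components of the second, and merge them. The sketch below is a component-based take on that idea for POSIX paths. The function name is hypothetical, and edge cases may differ from the real function.

```python
import os


def weld_paths_sketch(path1, path2):
    """Join path1 and path2, merging the longest tail of path1 that equals a head of path2."""
    parts1 = path1.rstrip(os.sep).split(os.sep)
    parts2 = path2.strip(os.sep).split(os.sep)
    # try the largest possible overlap first
    for n in range(min(len(parts1), len(parts2)), 0, -1):
        if parts1[-n:] == parts2[:n]:
            parts2 = parts2[n:]
            break
    return os.sep.join(parts1 + parts2)
```

For example, welding `/foo/bar` with `bar/baz` gives `/foo/bar/baz` rather than `/foo/bar/bar/baz`.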

which(cmd, retain_all=False, check_perms=True, log_ok=True, log_error=None, on_error=None)

Return (first) path in $PATH for specified command, or None if command is not found

PARAMETER DESCRIPTION
retain_all

returns all locations to the specified command in $PATH, not just the first one

DEFAULT: False

check_perms

check whether candidate path has read/exec permissions before accepting it as a match

DEFAULT: True

log_ok

Log an info message with the location where the command was found (if any)

DEFAULT: True

on_error

What to do if the command was not found, default: WARN. Possible values: IGNORE, WARN, ERROR

DEFAULT: None
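The retain_all and check_perms options can be illustrated by walking `$PATH` by hand. The standard `shutil.which` only returns the first hit. In this sketch, `search_path` is a hypothetical override, added so the function can be tested without touching `$PATH`. Logging and on_error handling are omitted.

```python
import os


def which_sketch(cmd, retain_all=False, check_perms=True, search_path=None):
    """Look up cmd in $PATH (search_path is a hypothetical override for testing)."""
    if search_path is None:
        search_path = os.environ.get('PATH', '')
    matches = []
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, cmd)
        if not os.path.isfile(candidate):
            continue
        # require read and execute permissions, as described for check_perms
        if check_perms and not os.access(candidate, os.R_OK | os.X_OK):
            continue
        if not retain_all:
            return candidate  # first match wins
        if candidate not in matches:
            matches.append(candidate)
    return matches or None  # None if the command was not found
```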

write_file(path, data, append=False, forced=False, backup=False, always_overwrite=True, verbose=False, show_progress=False, size=None)

Write given contents to file at given path; overwrites current file contents without backup by default!

PARAMETER DESCRIPTION
path

location of file

data

contents to write to file. Can be a file-like object of binary data

append

append to existing file rather than overwrite

DEFAULT: False

forced

force actually writing file in (extended) dry run mode

DEFAULT: False

backup

back up existing file before overwriting or modifying it

DEFAULT: False

always_overwrite

don't require --force to overwrite an existing file

DEFAULT: True

verbose

be verbose, i.e. inform where backup file was created

DEFAULT: False

show_progress

show progress bar while writing file

DEFAULT: False

size

size (in bytes) of data to write (used for progress bar)

DEFAULT: None
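This minimal sketch handles the three kinds of data mentioned above: text, bytes, and a binary file-like object (streamed with `shutil.copyfileobj`). It also covers append and backup. The `.bak` backup name and creating missing parent directories are assumptions. Dry-run handling, always_overwrite, and progress reporting are left out, and the function name is hypothetical.

```python
import os
import shutil


def write_file_sketch(path, data, append=False, backup=False):
    """Write str, bytes or a binary file-like object to path (simplified sketch)."""
    if backup and os.path.exists(path):
        shutil.copy2(path, path + '.bak')  # hypothetical backup naming
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # assumption: create missing parent dirs
    if hasattr(data, 'read'):
        # file-like object of binary data: stream it in chunks
        with open(path, 'ab' if append else 'wb') as fh:
            shutil.copyfileobj(data, fh)
    elif isinstance(data, bytes):
        with open(path, 'ab' if append else 'wb') as fh:
            fh.write(data)
    else:
        with open(path, 'a' if append else 'w') as fh:
            fh.write(data)
```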