Package resource

Extracts package resource data

  • works!

  • intuitive

  • flexible

  • static type checks

The Problem

Some Python package authors resort to bash scripts, or to Path(__file__).parent, to find the folder which contains their data files.

Understandably, the UX of working with Python package data is beyond their patience level.

This module is for those who want extracting package data to be easy; easy enough that they’ll go back and remove all those ugly hacks and bash scripts.

Note

PackageResource.package_data_folders yields data folders

The first step in extracting package data is to narrow down the (package) folders. The second step is extracting the data files.

Example

Extract package data to local cache

package data folder: data/currency

Note the path is relative

Local cache folder: $HOME/.cache/[package name]

import sys
from typing import TYPE_CHECKING
from collections.abc import Iterator
from functools import partial
from pathlib import Path
from logging_strict.util.package_resource import filter_by_suffix
from logging_strict.util.package_resource import filter_by_file_stem
from logging_strict.util.package_resource import PartSuffix
from logging_strict.util.package_resource import PartStem
from logging_strict.util.package_resource import PackageResource

if sys.version_info >= (3, 9):  # pragma: no cover
    try:
        from importlib.resources.abc import Traversable  # py311+
    except ImportError:
        from importlib.abc import Traversable  # py39+
else:  # pragma: no cover
    msg_exc = "Traversable py39+"
    raise ImportError(msg_exc)

if TYPE_CHECKING:
    package_name: str
    data_folder_path: str
    cb_file_stem: PartStem
    cb_file_suffix: PartSuffix
    pr: PackageResource
    generator_folders: Iterator[Traversable]
    path_entry: Path

# package containing the data files; change to the package of interest
package_name = "decimals"
# relative to the package base folder
data_folder_path = "data/currency"
cb_file_stem = partial(filter_by_file_stem, "crypto_btc_default")
cb_file_suffix = partial(filter_by_suffix, ".bitcoin")

# package_data_folders and cache_extract are PackageResource methods
pr = PackageResource(package_name, "data")
generator_folders = pr.package_data_folders(
    cb_suffix=cb_file_suffix,
    cb_file_stem=cb_file_stem,
    path_relative_package_dir=data_folder_path,
)
for path_entry in pr.cache_extract(
    generator_folders,
    cb_suffix=cb_file_suffix,
    cb_file_stem=cb_file_stem,
    is_overwrite=False,
):
    # path_entry is the extracted file path in local cache
    pass

So our file, data/currency/crypto_btc_default.bitcoin, is extracted into the folder $HOME/.cache/[package name]/data/currency

For finer control, the options are:

Note

DIY

filter_by_file_stem() especially, and possibly filter_by_suffix() as well, cover only the simplest scenario. Both are just normal functions. If/when necessary, roll your own

Note

Package name

Change the package name to whichever package contains the data files you are interested in, not the package used in this example

Module private variables

logging_strict.util.package_resource.__all__: tuple[str, str, str, str, str, str] = ("filter_by_suffix", "filter_by_file_stem", "PackageResource", "PartSuffix", "PartStem", "get_package_data")

Module object exports

logging_strict.util.package_resource.is_module_debug: bool = False

During development, turns on logging. Once unit test coverage reaches 100%, turn off

logging_strict.util.package_resource.g_module: str = "logging_strict.util.package_resource"

logging dotted path

logging_strict.util.package_resource._LOGGER: logging.Logger

Module level logger. This is a complicated module, so it does issue logging warnings

Module objects

class logging_strict.util.package_resource.PackageResource(package, package_data_folder_start)

For a Python package (any package installed into the virtual environment), identifies which package data folder is the base folder in which to start the search for data files. It acts as a fallback folder

Do not assume the default start data folder is data. This class imposes the rule that data files must not be stored in the package base folder; they must be placed into a sub-folder

Variables:
  • package (str) – package name

  • package_data_folder_start (str) – package base data folder name. Not relative path

cache_extract(base_folder_generator, /, cb_suffix=None, cb_file_stem=None, is_overwrite=False)

A generic extractor to local cache folder

package data ‣ cache folder

Parameters:
Returns:

local cached file path

Return type:

collections.abc.Iterator[pathlib.Path]

Caution

Refresh generator

Resources will not be extracted if the generator is exhausted. If running in a loop, reinitialize generator
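
The caution above comes down to ordinary Python generator behaviour. A minimal sketch, using a hypothetical stand-in generator (fake_folders is not part of this module), of what exhaustion looks like and how to reinitialize inside a loop:

```python
from collections.abc import Iterator


def fake_folders() -> Iterator[str]:
    """Hypothetical stand-in for a generator of package data folders."""
    yield "data/currency"
    yield "data/theme"


gen = fake_folders()
first_pass = list(gen)  # consumes the generator
second_pass = list(gen)  # generator is exhausted; nothing is yielded

# Inside a loop, create a fresh generator on every iteration
passes = [list(fake_folders()) for _ in range(2)]
```

Here second_pass is empty, which is the silent failure mode the caution warns about; each entry in passes is complete because the generator is recreated every time.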

get_parent_paths(*, cb_suffix=None, cb_file_stem=None, path_relative_package_dir=None, parent_count=1)

Example: a package contains the resource:

data/theme/size/category/[image file name]

The start folder, in this case data, is stripped; it is relevant only to the package, not to the final file system location. What is of interest is the relative path, not the absolute path from the POV of the package

Remaining path

theme/size/category/[image file name]

resource: [image file name]

Parents: [“theme”, “size”, “category”]

The cb_suffix and cb_file_stem callbacks select the relevant file
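
The split described above can be sketched with pathlib alone. This is only an illustration of the idea, not the method's implementation; split_parents and the file name image.png are hypothetical:

```python
from pathlib import PurePosixPath


def split_parents(resource: str, start_dir: str = "data") -> dict:
    """Hypothetical helper: map a file name to its parent folders.

    The start folder is dropped since it matters only inside the package.
    """
    parts = PurePosixPath(resource).parts
    if len(parts) < 2 or parts[0] != start_dir:
        msg = f"expected a path below {start_dir!r}, got {resource!r}"
        raise ValueError(msg)
    # Everything between the start folder and the file name is a parent
    *parents, file_name = parts[1:]
    return {file_name: parents}


result = split_parents("data/theme/size/category/image.png")
```

result maps "image.png" to the list ["theme", "size", "category"], mirroring the dict[str, Sequence[str]] shape of the documented return value.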

Caution

Location of package data files

CANNOT be in the base folder of a package. Move any package data files into an appropriately named/categorized sub-folder.

There is a strong assumption that there will never be data files in the package base folder. If there are, those aren’t data files, that’s clutter

Parameters:
Returns:

file name and respective parents as a Sequence[str]

Return type:

dict[str, Sequence[str]] | None

property package

Package name

Returns:

package name

Return type:

str

property package_data_folder_start

Package base data folder name

Returns:

package base data folder name. Not relative path

Return type:

str

package_data_folders(*, cb_suffix=None, cb_file_stem=None, path_relative_package_dir=None)

Generic generator for retrieving package data folder paths. Does not do the file extraction.

Caution

Generators delayed execution

Creating a generator will always succeed; the code is not immediately executed. If the code would normally raise an Exception, the generator has to be executed for that to occur.

The generator returned by this function is used as input to PackageResource.resource_extract or PackageResource.cache_extract, so any Exception or logging is delayed until those calls
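
A minimal sketch of that delayed execution, using a hypothetical generator function (folders is illustrative only and does not inspect any real package):

```python
from collections.abc import Iterator


def folders(package_name: str) -> Iterator[str]:
    """Hypothetical generator; its body runs only once iteration begins."""
    if not package_name:
        raise ImportError("package not installed")
    yield f"{package_name}/data"


gen = folders("")  # no exception here; the body has not run yet

try:
    next(gen)  # the body runs now and raises
    raised = False
except ImportError:
    raised = True
```

Wrapping only the call that creates the generator in try/except would therefore miss the ImportError; the handler has to surround the code that iterates.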

Parameters:
Returns:

All importlib.resources.abc.Traversable paths. Possibly filtered by theme

Return type:

collections.abc.Iterator[importlib.resources.abc.Traversable]

Raises:
  • ImportError – package not installed. Before introspecting package data, install package

path_relative(y, /, *, path_relative_package_dir=None, parent_count=None)

Whilst traversing package data, a data file’s path relative to a package folder (usually the root folder) is unavailable. Only the absolute path of the extracted data file is known

This limits flexibility. There might be a need, especially during testing, to move the extracted data file to another folder

An example y is the absolute path of package data extracted by importlib.resources.as_file(), which should be zip safe

[venv path]/lib/python3.9/site-packages/decimals/data/currency/digital_tox_default.ini

The code sample does not extract package data; instead it fakes an absolute path, which needs to contain the folder “data” although the local cache wouldn’t have this folder.

>>> from pathlib import Path
>>> from logging_strict.constants import g_app_name
>>> from logging_strict.util.package_resource import (
...     PackageResource,
...     _extract_folder,
... )
>>> path_local_cache = Path(_extract_folder(g_app_name))
>>> y = path_local_cache.joinpath(
...     "data", "currency", "nonsense", "digital_tox_default.ini"
... )
>>> pr = PackageResource("some package name", "data")
>>> pr.path_relative(y, parent_count=None)
PosixPath('currency/nonsense/digital_tox_default.ini')
>>> pr.path_relative(y, parent_count=0)
PosixPath('digital_tox_default.ini')
>>> pr.path_relative(y, parent_count=1)
PosixPath('nonsense/digital_tox_default.ini')
>>> pr.path_relative(y, parent_count=2)
PosixPath('currency/nonsense/digital_tox_default.ini')
>>> pr.path_relative(y, parent_count=3)  # can't do beyond start dir, "data"
PosixPath('currency/nonsense/digital_tox_default.ini')
Parameters:
  • y (pathlib.Path) – Extracted data file’s path

  • path_relative_package_dir (Path | str | None) – Default “data” (folder). Relative package path. Treated as a base folder

  • parent_count (int | None) – Number of parent folders to return, counting backwards from the file’s parent folder; the file name is not counted. Default None indicates the entire relative path

Returns:

Relative path, excluding path_relative_package_dir

Return type:

pathlib.Path

Raises:
  • TypeError – None, not a type[PurePath] or relative path

  • LookupError – Cannot return relative path from non-existing parent folder
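
The parent_count behaviour shown in the doctest can be approximated with plain pathlib. The following is a sketch under assumptions, not the method's implementation: relative_tail is a hypothetical helper, it takes the first occurrence of the start folder, and the absolute path is made up.

```python
from pathlib import PurePosixPath
from typing import Optional


def relative_tail(
    y: str,
    start_dir: str = "data",
    parent_count: Optional[int] = None,
) -> PurePosixPath:
    """Hypothetical helper: path of y below start_dir, limited to N parents."""
    parts = PurePosixPath(y).parts
    if start_dir not in parts:
        raise LookupError(f"folder {start_dir!r} not found in {y!r}")
    # Everything after the first occurrence of the start folder
    tail = parts[parts.index(start_dir) + 1:]
    if not tail:
        raise LookupError("no file below the start folder")
    *parents, file_name = tail
    if parent_count is None:
        keep = parents
    elif parent_count <= 0:
        keep = []
    else:
        # Slicing clamps, so asking for too many parents stops at start_dir
        keep = parents[-parent_count:]
    return PurePosixPath(*keep, file_name)


y = "/home/user/.cache/app/data/currency/nonsense/digital_tox_default.ini"
one_parent = relative_tail(y, parent_count=1)
too_many = relative_tail(y, parent_count=3)
```

one_parent keeps only the nearest folder, while too_many is clamped at the start folder, the same pattern as the parent_count=1 and parent_count=3 lines of the doctest.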

resource_extract(base_folder_generator, path_dest, /, cb_suffix=None, cb_file_stem=None, is_overwrite=False, as_user=False)

A generic extractor

package data ‣ dest folder

Use task specific resource extractors for a cleaner UX

Parameters:
Returns:

extracted file path within the dest folder

Return type:

collections.abc.Iterator[pathlib.Path]

See also

Generator ‣ Resource folders PackageResource.package_data_folders

cb_suffix filter_by_suffix()

Caution

Refresh generator

Resources will not be extracted if the generator is exhausted. If running in a loop, reinitialize generator

Todo

acl permissions of dest folder

Check acl writable permissions. Is the dest folder tree writable?

class logging_strict.util.package_resource.PartStem(*args, **kwargs)

Type of file stem callback functions. Careful! Will return all files that match the file stem

Usage

from typing import TYPE_CHECKING
from functools import partial
from logging_strict.util.package_resource import filter_by_file_stem
from logging_strict.util.package_resource import PartStem

if TYPE_CHECKING:
    cb_file_stem: PartStem

file_name = "index.theme"  # a file stem or a file name
cb_file_stem = partial(filter_by_file_stem, file_name)
Parameters:
  • file_expected (str) – File stem to search for. Can provide file name

  • test_file_stem (str) – file name or stem to test for

Returns:

True if file stem matches otherwise False

Return type:

bool

class logging_strict.util.package_resource.PartSuffix(*args, **kwargs)

Type of suffix callback functions

Usage

from typing import TYPE_CHECKING
from functools import partial
from logging_strict.util.package_resource import filter_by_suffix
from logging_strict.util.package_resource import PartSuffix

if TYPE_CHECKING:
    cb_suffix: PartSuffix
cb_suffix = partial(filter_by_suffix, (".svg", ".png"))
cb_suffix = partial(filter_by_suffix, ".toml")
Parameters:
  • expected_suffix (str | tuple[str, ...]) – Suffix or suffixes to search for

  • test_suffix (str) – file name or file suffixes concatenated

Returns:

True if suffix(es) match otherwise False

Return type:

bool

logging_strict.util.package_resource.filter_by_file_stem(expected_file_name, test_file_name)

This is the simplest case: simple matching of a package resource file name against the expected file name

Usage

from functools import partial
from logging_strict.util.package_resource import filter_by_file_stem

cb_file_stem = partial(filter_by_file_stem, expected_file_name)
...

cb_file_stem is used extensively within this module

Parameters:
  • expected_file_name (str) – file name or stem being searched for

  • test_file_name (str) – The file name being tested against

Returns:

True if same otherwise False

Return type:

bool

Note

This is the simplest case

For more complex cases, write a lambda or function and use functools.partial() to create a callback
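
As a rough illustration of rolling your own callback: the function below is hypothetical and assumes only what the signatures here show, that a callback built with functools.partial() receives the file name as its remaining positional argument and returns a bool.

```python
from functools import partial


def filter_by_stem_prefix(expected_prefix: str, test_file_name: str) -> bool:
    """Hypothetical callback: case-insensitive match on the start of the stem."""
    # Treat everything before the first dot as the stem
    stem = test_file_name.split(".", 1)[0]
    return stem.lower().startswith(expected_prefix.lower())


cb_file_stem = partial(filter_by_stem_prefix, "crypto_")

matches = cb_file_stem("crypto_btc_default.bitcoin")
no_match = cb_file_stem("index.theme")
```

matches is True and no_match is False. A callback like this would select every file whose stem begins with the prefix, so pair it with a suffix callback when only one file is wanted.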

logging_strict.util.package_resource.filter_by_suffix(expected_suffix, test_suffix)

Usage

from functools import partial
from logging_strict.util.package_resource import filter_by_suffix, PartSuffix

cb_suffix: PartSuffix = partial(filter_by_suffix, expected_suffix)
...

Then use cb_suffix as kwarg to PackageResource.cache_extract

Parameters:
  • expected_suffix (str | tuple[str, ...]) – Suffix (e.g. “.ppn”) being searched for

  • test_suffix (str) – The file name suffix testing against

Returns:

True if same otherwise False

Return type:

bool
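
For comparison, a suffix callback with different matching rules can be sketched the same way. This is an assumption-laden example rather than how filter_by_suffix() itself works: filter_by_suffix_nocase is hypothetical and ignores case, which the library function may or may not do.

```python
from functools import partial
from typing import Tuple, Union


def filter_by_suffix_nocase(
    expected_suffix: Union[str, Tuple[str, ...]],
    test_suffix: str,
) -> bool:
    """Hypothetical callback: case-insensitive suffix match.

    test_suffix may be a file name or just the concatenated suffixes.
    """
    if isinstance(expected_suffix, str):
        expected_suffix = (expected_suffix,)
    wanted = tuple(suffix.lower() for suffix in expected_suffix)
    # str.endswith accepts a tuple and returns True if any entry matches
    return test_suffix.lower().endswith(wanted)


cb_suffix = partial(filter_by_suffix_nocase, (".svg", ".png"))

is_image = cb_suffix("icon.SVG")
is_archive = cb_suffix("archive.tar.gz")
```

is_image is True and is_archive is False. The callback keeps the same (expected, test) argument order as filter_by_suffix(), so it can be bound with functools.partial() in the same way.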

logging_strict.util.package_resource.get_package_data(package_name: str, file_name_stem: str, suffix='.csv', convert_to_path: Sequence[str] = ('data',), is_extract: bool | None = False) -> str | None

Export and read one package file. Exports to /run/user/[current session user id]. This tmp folder is inaccessible to other users and its contents are automagically removed at system shutdown

Parameters:
  • file_name_stem (str) – without any suffixes

  • suffix (str | collections.abc.Sequence[str] | None) – str or tuple. Target file suffixes

  • convert_to_path (collections.abc.Sequence[str] | None) – Default ("data",). relative dotted path to subfolder, excluding package_name.

  • is_extract (bool) –

    Before reading file contents,

    True – extract to tmp folder

    False – read data file contents from within package

Returns:

file contents or on failure None

Return type:

str | None
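
To make the is_extract=False case concrete, here is a sketch of reading a data file from within a package using only the standard library. It is not this function's implementation; the package demo_pkg_resource and the file rates.csv are throwaway names created on the fly so the example is self-contained.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a hypothetical package holding one data file below data/
    pkg_dir = Path(tmp, "demo_pkg_resource")
    (pkg_dir / "data").mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("", encoding="utf-8")
    (pkg_dir / "data" / "rates.csv").write_text(
        "code,rate\nBTC,1\n", encoding="utf-8"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # files() returns a Traversable; "/" walks into sub-folders
        traversable = files("demo_pkg_resource") / "data" / "rates.csv"
        contents = traversable.read_text(encoding="utf-8")
    finally:
        # Undo the import machinery changes made for the demo
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg_resource", None)
```

Reading through the Traversable avoids writing a copy to disk. Extraction becomes relevant when a real file system path is needed, for example to hand the file to an external tool.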