Compare commits

No commits in common. 'master' and 'xmlparsing' have entirely different histories.

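  1. .gitignore (1 line changed)
  2. .pre-commit-config.yaml (27 lines changed)
  3. CHANGES.md (15 lines changed)
  4. Makefile (25 lines changed)
  5. README.md (6 lines changed)
  6. noxfile.py (8 lines changed)
  7. pyproject.toml (79 lines changed)
  8. src/sensospot_parser/__init__.py (34 lines changed)
  9. src/sensospot_parser/csv_parser.py (30 lines changed)
  10. src/sensospot_parser/parameters.py (18 lines changed)
  11. src/sensospot_parser/xml_parser.py (27 lines changed)
  12. tests/conftest.py (20 lines changed)
  13. tests/test_columns.py (1 line changed)
  14. tests/test_csv_parser.py (29 lines changed)
  15. tests/test_parameters.py (4 lines changed)
  16. tests/test_sensospot_data.py (17 lines changed)
  17. tests/test_xml_parser.py (71 lines changed)
  18. tox.ini (14 lines changed)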

1
.gitignore vendored

@@ -39,7 +39,6 @@ pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache

27
.pre-commit-config.yaml

@@ -11,18 +11,33 @@ repos:
- id: detect-private-key
- repo: local
hooks:
- id: isort-project
name: isort project
entry: isort -rc src
language: system
pass_filenames: false
- id: isort-test
name: isort test
entry: isort -rc tests
language: system
pass_filenames: false
- id: black
name: Autoformatting code with "black"
name: black
entry: black src tests
language: system
pass_filenames: false
- id: ruff
name: Linting code with "ruff"
entry: ruff src tests
- id: flake8
name: flake8 project
entry: flake8 --ignore E231,W503 src
language: system
pass_filenames: false
- id: flake8
name: flake8 test
entry: flake8 --ignore S101,W503 tests
language: system
pass_filenames: false
- id: pytest
name: Testing the project with "pytest"
name: pytest
entry: pytest tests
language: system
pass_filenames: false
language: system

15
CHANGES.md

@@ -1,18 +1,3 @@
2.0.0 - xml parsing
-------------------
- The assay results xml file is now parsed first
- CSV parsing is still available as fallback if the XML could not be parsed
1.0.0 - cli cleanup
-------------------
- the cli interface was cleaned up a lot
- default output of cli is now stdout
- multiple sources can be specified instead of the clumsy '-r' option before
0.7.0 - simplifications
-----------------------

25
Makefile

@@ -1,4 +1,4 @@
.PHONY: clean coverage coverall docs devenv install lint prepareenv repo serve-docs test testall testfunctional nox tox
.PHONY: clean coverage coverall docs devenv install lint prepareenv repo serve-docs test testall testfunctional tox
.DEFAULT_GOAL := help
define BROWSER_PYSCRIPT
@@ -54,8 +54,11 @@ clean-test: ## remove test and coverage artifacts
rm -fr htmlcov/
lint: ## reformat with black and check style with flake8
isort src
isort tests
black src tests
ruff src tests
flake8 --ignore E231,W503,E402 src
flake8 --ignore S101,W503 tests
test: lint ## run tests quickly, stop on first error
pytest tests -x -l --last-failed --disable-warnings -m "not functional"
@@ -76,11 +79,8 @@ coverall: lint ## full test suite, check code coverage and open coverage report
coverage html
$(BROWSER) htmlcov/index.html
nox: ## run fully isolated tests with nox
nox
tox: ## old habits die hard: typo-squatting to use nox
nox
tox: ## run fully isolated tests with tox
tox
docs: ## build the documentation using mkdocs
mkdocs build
@@ -88,14 +88,14 @@ docs: ## build the documentation using mkdocs
serve-docs: docs ## build the documentation and serve them in a web server
mkdocs serve
install: ## install updated project.toml
.venv/bin/pip3 install -e ".[docs,dev,test]"
install: ## install updated project.toml with flint
flit install --pth-file
prepareenv: ## setup virtual environment and install packages
rm -fr .venv/
python3 -m venv --prompt sensospot .venv
.venv/bin/pip3 install --upgrade pip wheel
.venv/bin/pip3 install -e ".[docs,dev,test]"
.venv/bin/pip3 install --upgrade pip
.venv/bin/pip3 install "flit>3.2"
.venv/bin/flit install --pth-file
devenv: prepareenv ## setup development environment including precommit hooks
.venv/bin/pre-commit install --install-hooks
@@ -107,4 +107,5 @@ repo: prepareenv ## complete project setup with development environment and git
git branch -m main
git remote add origin https://git.cpi.imtek.uni-freiburg.de/holgi/sensospot_parser.git
git push -u origin main --no-verify
.venv/bin/pre-commit install --install-hooks

6
README.md

@@ -43,11 +43,6 @@ There is a `columns` module available, providing constans that define the column
## Avaliable public functions:
All public functions return a [pandas DataFrame][pandas] object.
Be aware that some columns might contain no values. This is depending on the parsing
method (xml or csv) and if a parameters file could be found or not.
- **parse_folder(path_to_folder)**
Tries the `parse_xml_folder()` function first and if an error occurs,
it falls back to the `parse_csv_folder()`
@@ -93,4 +88,3 @@ To generate the documentation pages use `make docs` or `make serve-docs` for
starting a webserver with the generated documentation
[sensospot]: https://www.miltenyi-imaging.com/products/sensospot
[pandas]: https://pandas.pydata.org/docs/reference/frame.html
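
The fallback behaviour the README describes for `parse_folder` can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `parse_xml_stub` and `parse_csv_stub` are hypothetical stand-ins for `parse_xml_folder()` and `parse_csv_folder()`, and plain dicts replace the pandas DataFrame so the example runs without dependencies.

```python
from pathlib import Path
from typing import Callable, Dict, Union

PathLike = Union[str, Path]
Parser = Callable[[PathLike], Dict[str, str]]


def parse_xml_stub(folder: PathLike) -> Dict[str, str]:
    """Hypothetical stand-in: always fails, as if no result xml file exists."""
    raise ValueError("Could not find assay results xml file")


def parse_csv_stub(folder: PathLike) -> Dict[str, str]:
    """Hypothetical stand-in for the csv fallback parser."""
    return {"source": str(folder), "method": "csv"}


def parse_with_fallback(
    folder: PathLike,
    primary: Parser = parse_xml_stub,
    fallback: Parser = parse_csv_stub,
) -> Dict[str, str]:
    """Try the primary parser first; on ValueError use the fallback."""
    try:
        return primary(folder)
    except ValueError:
        # only ValueError triggers the fallback, other errors propagate
        return fallback(folder)
```

Because only `ValueError` is caught, both parsers have to signal "could not parse" with that exception type for the switch to work.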

8
noxfile.py

@@ -1,8 +0,0 @@
import nox
@nox.session(python=["3.9", "3.10", "3.11"])
def tests(session):
session.install(".[test]")
session.run("pytest", *session.posargs)

79
pyproject.toml

@@ -27,10 +27,10 @@ classifiers = [
]
dependencies = [
"click",
"defusedxml >=0.6.0",
"pandas >=1.0.0",
"defusedxml >=0.6.0",
"tables >=3.6.1",
"click",
]
[project.urls]
Source = "https://git.cpi.imtek.uni-freiburg.de/holgi/sensospot_parser.git"
@@ -39,34 +39,26 @@ Source = "https://git.cpi.imtek.uni-freiburg.de/holgi/sensospot_parser.git"
sensospot_parse = "sensospot_parser:main"
[project.optional-dependencies]
test = [
"pytest >=4.0.0",
"pytest-cov",
"pytest-mock",
"pytest-randomly >=3.5.0",
"tox",
]
dev = [
"black",
"flit",
"flake8",
"flake8-comprehensions",
"flake8-bandit",
"isort >= 5.0.0",
"keyring",
"pre-commit",
"ruff",
]
docs = [
"mkdocs",
"mkdocstrings[python]",
]
test = [
"pytest >=4.0.0",
"pytest-cov",
"pytest-mock",
"pytest-randomly >=3.5.0",
"nox",
]
[tool.pytest.ini_options]
markers = [
"functional: marks tests as functional (deselect with '-m \"not functional\"')",
]
addopts = [
"--strict-markers",
]
[tool.black]
line-length = 79
@@ -82,39 +74,16 @@ extend-exclude = '''
^/.dist
'''
[tool.ruff]
# see https://github.com/charliermarsh/ruff
select = ["ALL"]
ignore = [
# ignored for now, should be activated in the future
# docstrings
"D",
# flake8-annotations
"ANN",
# flake8-type-checking
"TCH",
# ignored, "black" will handle this
# flake8-commas
"COM",
[tool.isort]
line_length=79
multi_line_output=3
length_sort="True"
include_trailing_comma="True"
# ignored, due to Windows / WSL2 setup
# flake8-executable
"EXE",
# project specific ignores
# flake8-import-conventions
"ICN",
[tool.pytest.ini_options]
markers = [
"functional: marks tests as functional (deselect with '-m \"not functional\"')",
]
addopts = [
"--strict-markers",
]
fixable = ["I"]
fix = true
line-length=79
target-version = "py38"
[tool.ruff.per-file-ignores]
# see https://github.com/charliermarsh/ruff
"src/*" = ["SLF001", "G004"]
"tests/*" = ["FBT003", "INP001", "PLR2004", "S101", "SLF001"]
[tool.ruff.pydocstyle]
convention = "pep257" # Accepts: "google", "numpy", or "pep257".

34
src/sensospot_parser/__init__.py

@@ -3,9 +3,9 @@
Parsing the numerical output from Sensovations Sensospot image analysis.
"""
__version__ = "2.0.2"
__version__ = "2.0.0"
import logging
import pathlib
from typing import Union
@@ -16,15 +16,12 @@ from . import columns # noqa: F401
from .csv_parser import parse_csv_folder
from .xml_parser import parse_xml_folder
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("sensospot_parser")
DEFAULT_OUTPUT_FILENAME = "collected_data.csv"
PathLike = Union[str, pathlib.Path]
def parse_folder(source: PathLike, *, quiet: bool = False) -> pandas.DataFrame:
def parse_folder(source: PathLike, quiet: bool = False) -> pandas.DataFrame:
"""parses an assay result folder
The function will first try to use an assay results xml file, and will
@@ -41,10 +38,7 @@ def parse_folder(source: PathLike, *, quiet: bool = False) -> pandas.DataFrame:
return parse_xml_folder(source)
except ValueError:
pass
logger.info(
"Could not parse xml results file, using fall-back csv parsing"
)
return parse_csv_folder(source, quiet=quiet)
return parse_csv_folder(source, quiet)
@click.command()
@@ -74,30 +68,16 @@ def parse_folder(source: PathLike, *, quiet: bool = False) -> pandas.DataFrame:
default=False,
help="Ignore sanity check for csv file parsing",
)
@click.option(
"-v",
"--verbose",
help="Set verbosity of log, add multiple -vv for more verbose logging",
count=True,
)
def main(sources, output, verbose, quiet=False): # noqa: FBT002
def main(sources, output, quiet=False):
"""Parses the measurement results of the Sensospot reader
The resulting output is either echoed to stdout or saved to a file.
At first parsing the assay result xml file is tried.
If this doesn't work, the fallback is to parse the csv files.
I this doesn't work, the fallback is to parse the csv files.
"""
if verbose == 0:
logging.disable()
elif verbose == 1:
logging.disable(level=logging.DEBUG)
else:
logging.disable(level=logging.NOTSET)
paths = (pathlib.Path(source) for source in sources)
collection = (parse_folder(source, quiet=quiet) for source in paths)
collection = (parse_folder(source, quiet) for source in paths)
result = (
pandas.concat(collection, ignore_index=True)
.reset_index()

30
src/sensospot_parser/csv_parser.py

@@ -3,19 +3,16 @@
Parsing the csv result files from Sensovations Sensospot image analysis.
"""
import logging
import pathlib
import re
import pathlib
from typing import Union, TextIO, Optional, Sequence
from collections import namedtuple
from typing import Optional, Sequence, TextIO, Union
import pandas
from . import columns
from .parameters import add_measurement_parameters
logger = logging.getLogger("sensospot_parser")
PathLike = Union[str, pathlib.Path]
REGEX_WELL = re.compile(
@@ -77,11 +74,10 @@ def _extract_measurement_info(data_file: PathLike) -> FileInfo:
named tuple FileInfo with parsed metadata
"""
data_path = pathlib.Path(data_file)
*rest, well, exposure = data_path.stem.rsplit("_", 2)
*rest, well, exposure = data_path.stem.rsplit("_", 2) # noqa: F841
matched = REGEX_WELL.match(well)
if matched is None:
msg = f"not a valid well: '{well}'"
raise ValueError(msg)
raise ValueError(f"not a valid well: '{well}'")
row = matched["row"].upper()
column = int(matched["column"])
exposure = int(exposure)
@@ -103,7 +99,6 @@ def parse_csv_file(data_file: PathLike) -> pandas.DataFrame:
ValueError: if metadata could not be extracted
"""
data_path = pathlib.Path(data_file).resolve()
logger.debug(f"Parsing csv file {data_path}")
measurement_info = _extract_measurement_info(data_path)
data_frame = _parse_csv(data_path)
# normalized well name
@@ -148,8 +143,7 @@ def parse_multiple_csv_files(
pandas data frame with all parsed data combined
"""
if not file_list:
msg = "Empty file list provided"
raise ValueError(msg)
raise ValueError("Empty file list provided")
collection = (_parse_csv_file_silenced(path) for path in file_list)
filtered = (frame for frame in collection if frame is not None)
data_frame = pandas.concat(filtered, ignore_index=True).reset_index()
@@ -192,8 +186,9 @@ def _sanity_check(data_frame: pandas.DataFrame) -> pandas.DataFrame:
spot_positions = len(data_frame[columns.POS_ID].unique())
expected_rows = field_rows * field_cols * exposures * spot_positions
if expected_rows != len(data_frame):
msg = f"Measurements are missing: {expected_rows} != {len(data_frame)}"
raise ValueError(msg)
raise ValueError(
f"Measurements are missing: {expected_rows} != {len(data_frame)}"
)
# set the right data type for measurement columns
for raw_column in columns.NUMERIC_COLUMNS:
data_frame[raw_column] = pandas.to_numeric(data_frame[raw_column])
@@ -201,7 +196,7 @@ def _sanity_check(data_frame: pandas.DataFrame) -> pandas.DataFrame:
def parse_csv_folder(
folder: PathLike, *, quiet: bool = False
folder: PathLike, quiet: bool = False
) -> pandas.DataFrame:
"""parses all csv files in a folder to one large dataframe
@@ -215,15 +210,12 @@ def parse_csv_folder(
Returns:
a pandas data frame with parsed data
"""
logger.info(f"Parsing csv files in folder {folder}")
folder_path = pathlib.Path(folder)
file_list = find_csv_files(folder_path)
try:
data_frame = parse_multiple_csv_files(file_list)
except ValueError as e:
msg = f"No sensospot data found in folder '{folder}'"
logger.warning(msg)
raise ValueError(msg) from e
except ValueError:
raise ValueError(f"No sensospot data found in folder '{folder}'")
data_frame = add_measurement_parameters(data_frame, folder_path)

18
src/sensospot_parser/parameters.py

@@ -3,10 +3,9 @@
Parsing the numerical output from Sensovations Sensospot image analysis.
"""
import logging
import pathlib
from typing import Any, Dict, Optional, Union
from xml.etree.ElementTree import Element as ElementType
from typing import Any, Dict, Union, Optional
from xml.etree.ElementTree import Element as ElementType # noqa: S405
import numpy
import pandas
@@ -16,8 +15,6 @@ from . import columns
PathLike = Union[str, pathlib.Path]
logger = logging.getLogger("sensospot_parser")
def _search_params_file(folder: PathLike) -> Optional[pathlib.Path]:
"""searches for a exposure settings file in a folder
@@ -33,7 +30,10 @@ def _search_params_file(folder: PathLike) -> Optional[pathlib.Path]:
if not params_folder.is_dir():
return None
param_files = list(params_folder.glob("**/*.svexp"))
return param_files[0] if len(param_files) == 1 else None
if len(param_files) == 1:
return param_files[0]
else:
return None
def _get_channel_data(channel_node: ElementType) -> Dict[str, Any]:
@@ -45,9 +45,9 @@ def _get_channel_data(channel_node: ElementType) -> Dict[str, Any]:
Returns:
dict with the information
"""
# Example "ChannelConfig1"
# child.tag == "ChannelConfig1"
exposure_id = int(channel_node.tag[-1])
# Example "Cy3 Green"
# channel_description == "[Cy3|Cy5] Green"
description = channel_node.attrib["Description"]
exposure_channel = description.rsplit(" ", 1)[-1]
# floats can be used for exposure times, not only ints
@@ -68,7 +68,6 @@ def _parse_measurement_params(params_file: PathLike) -> pandas.DataFrame:
Returns:
pandas data frame with the parsed information
"""
logger.debug(f"Parsing parameters file {params_file}")
file_path = pathlib.Path(params_file)
with file_path.open("r") as file_handle:
tree = ElementTree.parse(file_handle)
@@ -88,7 +87,6 @@ def get_measurement_params(folder: PathLike) -> Optional[pandas.DataFrame]:
params_file = _search_params_file(folder)
if params_file is not None:
return _parse_measurement_params(params_file)
logger.debug(f"Could not locate parameters file in folder {folder}")
return None

27
src/sensospot_parser/xml_parser.py

@@ -3,18 +3,15 @@
Parsing the csv result files from Sensovations Sensospot image analysis.
"""
import logging
import pathlib
from typing import Union, Optional
from datetime import datetime
from typing import Optional, Union
import pandas
from defusedxml import ElementTree
from . import columns, parameters
logger = logging.getLogger("sensospot_parser")
PathLike = Union[str, pathlib.Path]
RESULT_TAG_TYPES = {
@@ -79,9 +76,7 @@ class ParserTarget:
def _data_timestamp_parser(self, data: str) -> None:
"""parses the data section of a "Timestamp" tag"""
timestamp = datetime.strptime( # noqa: DTZ007
data.strip(), DATETIME_XML_FORMAT
)
timestamp = datetime.strptime(data.strip(), DATETIME_XML_FORMAT)
self._current[columns.ANALYSIS_DATETIME] = timestamp
def _data_image_name_parser(self, data: str) -> None:
@@ -113,6 +108,7 @@ class ParserTarget:
def closed(self) -> None:
"""the end of the xml file is reached"""
pass
def _find_result_xml_file(folder: PathLike) -> Optional[pathlib.Path]:
@@ -156,12 +152,9 @@ def parse_xml_file(xml_file: PathLike) -> pandas.DataFrame:
Raises:
ValueError if the xml file could not be parsed
"""
logger.info(f"Parsing xml results file {xml_file}")
xml_file = pathlib.Path(xml_file)
if not xml_file.is_file():
msg = "Xml file does not exist"
logger.debug(f"{msg}: {xml_file}")
raise ValueError(msg)
raise ValueError("Xml file does not exist")
target = ParserTarget()
parser = ElementTree.DefusedXMLParser(target=target)
@@ -169,15 +162,11 @@ def parse_xml_file(xml_file: PathLike) -> pandas.DataFrame:
try:
parser.feed(xml_file.read_text())
except (IndexError, KeyError, ValueError, TypeError) as e:
msg = "Malformed data in xml file"
logger.warning(f"{msg} {xml_file}")
raise ValueError(msg) from e
raise ValueError("Malformed data in xml file") from e
data_frame = pandas.DataFrame(data=target.collected).reset_index()
if data_frame.empty:
msg = "Could not parse assay results xml file"
logger.warning(f"{msg} {xml_file}")
raise ValueError(msg)
raise ValueError("Could not parse assay results xml file")
return columns._cleanup_data_columns(data_frame)
@@ -197,9 +186,7 @@ def parse_xml_folder(folder: PathLike) -> pandas.DataFrame:
folder = pathlib.Path(folder)
xml_file = _find_result_xml_file(folder)
if xml_file is None:
msg = "Could not find assay results xml file"
logger.debug(f"{msg} in folder {folder}")
raise ValueError(msg)
raise ValueError("Could not find assay results xml file")
data_frame = parse_xml_file(xml_file)
data_frame = parameters.add_measurement_parameters(data_frame, folder)
return columns._cleanup_data_columns(data_frame)
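
The timestamp parsing in `_data_timestamp_parser` relies on `datetime.strptime` with a format constant that is not part of this diff. A minimal sketch, assuming a US-style 12-hour format that fits the timestamp strings used in the tests; `ASSUMED_FORMAT` and `parse_timestamp` are hypothetical names, and the actual `DATETIME_XML_FORMAT` may differ:

```python
from datetime import datetime

# Assumed format; the project's DATETIME_XML_FORMAT constant is not shown here.
ASSUMED_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_timestamp(data: str) -> datetime:
    """Parse a timestamp such as "3/7/2022 5:31:47 PM" into a naive datetime."""
    # surrounding whitespace from the xml text node is removed first
    return datetime.strptime(data.strip(), ASSUMED_FORMAT)
```

The result is a naive datetime without timezone information, which is what the `DTZ007` suppression on one side of this diff acknowledges. The `%p` directive depends on the active locale, so this sketch assumes an English or C locale.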

20
tests/conftest.py

@@ -14,23 +14,23 @@ EXAMPLE_DIR_XML_WITH_PARAMS = "xml_with_parameters"
@pytest.fixture(scope="session")
def example_dir(request):
root_dir = Path(request.config.rootdir)
return root_dir / "example_data"
yield root_dir / "example_data"
@pytest.fixture()
@pytest.fixture
def example_file(example_dir):
data_dir = example_dir / EXAMPLE_DIR_CSV_WO_PARAMS
return data_dir / "160218_SG2-013-001_Regen1_Cy3-100_1_A1_1.csv"
yield data_dir / "160218_SG2-013-001_Regen1_Cy3-100_1_A1_1.csv"
@pytest.fixture()
@pytest.fixture
def exposure_df():
from pandas import DataFrame
return DataFrame(data={"Exposure.Id": [1, 2, 3]})
yield DataFrame(data={"Exposure.Id": [1, 2, 3]})
@pytest.fixture()
@pytest.fixture
def normalization_data_frame():
from sensospot_parser.columns import RAW_DATA_NORMALIZATION_MAP
@@ -86,10 +86,10 @@ def normalization_data_frame():
data_frame = pandas.DataFrame(overflow_test_data)
data_frame["Exposure.Channel"] = "Cy5"
for value_column in RAW_DATA_NORMALIZATION_MAP:
for value_column in RAW_DATA_NORMALIZATION_MAP.keys():
data_frame[value_column] = data_frame["Value"]
return data_frame
yield data_frame
@pytest.fixture(scope="session")
@@ -106,11 +106,11 @@ def parsed_data_frame_without_params(example_dir):
return parse_csv_folder(example_dir / EXAMPLE_DIR_CSV_WO_PARAMS)
@pytest.fixture()
@pytest.fixture
def data_frame_with_params(parsed_data_frame_with_params):
return parsed_data_frame_with_params.copy()
@pytest.fixture()
@pytest.fixture
def data_frame_without_params(parsed_data_frame_without_params):
return parsed_data_frame_without_params.copy()

1
tests/test_columns.py

@@ -1,5 +1,6 @@
def test_cleanup_data_columns():
from pandas import DataFrame
from sensospot_parser.columns import _cleanup_data_columns
columns = ["Rect.", "Contour", " ID ", "Found", "Dia."]

29
tests/test_csv_parser.py

@@ -4,11 +4,11 @@
import numpy
import pytest
from .conftest import EXAMPLE_DIR_CSV_WITH_PARAMS, EXAMPLE_DIR_CSV_WO_PARAMS
from .conftest import EXAMPLE_DIR_CSV_WO_PARAMS, EXAMPLE_DIR_CSV_WITH_PARAMS
@pytest.mark.parametrize(
("sub_dir", "file_name"),
"sub_dir, file_name",
[
(
EXAMPLE_DIR_CSV_WO_PARAMS,
@@ -65,15 +65,14 @@ def test_parse_csv_no_array(example_dir):
@pytest.mark.parametrize(
("provided", "expected"),
[("", "."), ("..,", "."), (".,,", ","), ("..,,", ".")],
"input, expected", [("", "."), ("..,", "."), (".,,", ","), ("..,,", ".")]
)
def test_guess_decimal_separator_returns_correct_separator(provided, expected):
def test_guess_decimal_separator_returns_correct_separator(input, expected):
from io import StringIO
from sensospot_parser.csv_parser import _guess_decimal_separator
handle = StringIO(f"header\n{provided}\n")
handle = StringIO(f"header\n{input}\n")
result = _guess_decimal_separator(handle)
assert result == expected
@@ -99,17 +98,17 @@ def test_well_regex_ok():
assert result["column"] == "123"
@pytest.mark.parametrize("provided", ["", "A", "1", "1A", "-1", "A-"])
def test_well_regex_no_match(provided):
@pytest.mark.parametrize("input", ["", "A", "1", "1A", "-1", "A-"])
def test_well_regex_no_match(input):
from sensospot_parser.csv_parser import REGEX_WELL
result = REGEX_WELL.match(provided)
result = REGEX_WELL.match(input)
assert result is None
@pytest.mark.parametrize(
("filename", "expected"),
"filename, expected",
[("A1_1.csv", ("A", 1, 1)), ("test/measurement_1_H12_2", ("H", 12, 2))],
)
def test_extract_measurement_info_ok(filename, expected):
@@ -124,7 +123,7 @@ def test_extract_measurement_info_ok(filename, expected):
def test_extract_measurement_info_raises_error(filename):
from sensospot_parser.csv_parser import _extract_measurement_info
with pytest.raises(ValueError): # noqa: PT011
with pytest.raises(ValueError):
_extract_measurement_info(filename)
@@ -179,7 +178,7 @@ def test_parse_file_raises_error(example_dir):
/ "should_raise_value_error.csv"
)
with pytest.raises(ValueError): # noqa: PT011
with pytest.raises(ValueError):
parse_csv_file(csv_file)
@@ -224,6 +223,7 @@ def testparse_multiple_files_ok(example_dir, file_list):
files = [sub_dir / file for file in file_list]
data_frame = parse_multiple_csv_files(files)
print(data_frame["Exposure.Id"].unique())
assert len(data_frame) == 100 * len(files)
assert len(data_frame["Exposure.Id"].unique()) == len(files)
@@ -232,7 +232,7 @@ def testparse_multiple_files_ok(example_dir, file_list):
def testparse_multiple_files_empty_file_list():
from sensospot_parser.csv_parser import parse_multiple_csv_files
with pytest.raises(ValueError): # noqa: PT011
with pytest.raises(ValueError):
parse_multiple_csv_files([])
@@ -242,6 +242,7 @@ def testparse_multiple_files_empty_array(example_dir):
files = [example_dir / "no_array_A1_1.csv"]
data_frame = parse_multiple_csv_files(files)
print(data_frame["Exposure.Id"].unique())
assert len(data_frame) == 1
@@ -305,5 +306,5 @@ def test_sanity_check_raises_value_error(example_dir):
data_frame = parse_multiple_csv_files(files)
data_frame = data_frame.drop(data_frame.index[1])
with pytest.raises(ValueError): # noqa: PT011
with pytest.raises(ValueError):
_sanity_check(data_frame)

4
tests/test_parameters.py

@@ -1,6 +1,6 @@
import pandas
from .conftest import EXAMPLE_DIR_CSV_WITH_PARAMS, EXAMPLE_DIR_CSV_WO_PARAMS
from .conftest import EXAMPLE_DIR_CSV_WO_PARAMS, EXAMPLE_DIR_CSV_WITH_PARAMS
def test_search_params_file_ok(example_dir):
@@ -32,8 +32,8 @@ def test_ssearch_measurement_params_file_parameters_file(tmpdir):
def test_parse_channel_info(example_dir):
from sensospot_parser.parameters import (
_parse_measurement_params,
_search_params_file,
_parse_measurement_params,
)
params = _search_params_file(example_dir / EXAMPLE_DIR_CSV_WITH_PARAMS)

17
tests/test_sensospot_data.py

@@ -5,17 +5,16 @@ from .conftest import EXAMPLE_DIR_CSV_WO_PARAMS, EXAMPLE_DIR_XML_WO_PARAMS
def test_import_api():
from sensospot_parser import (
columns, # noqa: F401
main, # noqa: F401
parse_csv_folder, # noqa: F401
parse_folder, # noqa: F401
parse_xml_folder, # noqa: F401
)
from sensospot_parser import main # noqa: F401
from sensospot_parser import columns # noqa: F401
from sensospot_parser import parse_folder # noqa: F401
from sensospot_parser import parse_csv_folder # noqa: F401
from sensospot_parser import parse_xml_folder # noqa: F401
def test_compare_xml_to_csv(example_dir):
import pandas
from sensospot_parser import parse_csv_folder, parse_xml_folder
folder = example_dir / EXAMPLE_DIR_XML_WO_PARAMS
@@ -27,14 +26,13 @@ def test_compare_xml_to_csv(example_dir):
assert isinstance(xml_df, pandas.DataFrame)
assert len(csv_df) == len(xml_df)
assert set(csv_df.columns) == set(xml_df.columns)
assert set(csv_df["Well.Name"]) == set(xml_df["Well.Name"])
assert set(csv_df["Exposure.Id"]) == set(xml_df["Exposure.Id"])
assert set(csv_df["Spot.Diameter"]) == set(xml_df["Spot.Diameter"])
@pytest.mark.parametrize(
("folder", "length", "hasnans"),
"folder, length, hasnans",
[
(EXAMPLE_DIR_XML_WO_PARAMS, 6400, False),
(EXAMPLE_DIR_CSV_WO_PARAMS, 28800, True),
@@ -42,6 +40,7 @@ def test_compare_xml_to_csv(example_dir):
)
def test_parse_folder_switches_parser(example_dir, folder, length, hasnans):
import pandas
from sensospot_parser import parse_folder
result = parse_folder(example_dir / folder)

71
tests/test_xml_parser.py

@@ -2,7 +2,7 @@ from datetime import datetime
import pytest
from .conftest import EXAMPLE_DIR_XML_WITH_PARAMS, EXAMPLE_DIR_XML_WO_PARAMS
from .conftest import EXAMPLE_DIR_XML_WO_PARAMS, EXAMPLE_DIR_XML_WITH_PARAMS
class DummyDataFunc:
@@ -28,7 +28,7 @@ def test_parser_target_init():
@pytest.mark.parametrize(
("tag", "attributes", "expected"),
"tag, attributes, expected",
[
("UnknownTag", {"ID": "something"}, {}),
(
@@ -84,7 +84,7 @@ def test_parser_target_start_image_file_name():
@pytest.mark.parametrize(
("data_type", "value", "expected"),
"data_type, value, expected",
[
("unknown type", 1, "1"),
("System.Int32", "12", 12),
@@ -108,40 +108,16 @@ def test_parser_target_result_attributes_parser(data_type, value, expected):
@pytest.mark.parametrize(
("value", "expected"),
"value, expected",
[
(
"3/7/2022 5:31:47 PM",
datetime(2022, 3, 7, 17, 31, 47), # noqa: DTZ001
),
(
"03/7/2022 5:31:47 PM",
datetime(2022, 3, 7, 17, 31, 47), # noqa: DTZ001
),
(
"3/07/2022 5:31:47 PM",
datetime(2022, 3, 7, 17, 31, 47), # noqa: DTZ001
),
(
"03/07/2022 5:31:47 PM",
datetime(2022, 3, 7, 17, 31, 47), # noqa: DTZ001
),
(
"3/7/2022 5:3:47 PM",
datetime(2022, 3, 7, 17, 3, 47), # noqa: DTZ001
),
(
"3/7/2022 5:31:4 PM",
datetime(2022, 3, 7, 17, 31, 4), # noqa: DTZ001
),
(
"3/7/2022 5:31:47 pm",
datetime(2022, 3, 7, 17, 31, 47), # noqa: DTZ001
),
(
"3/7/2022 5:31:47 AM",
datetime(2022, 3, 7, 5, 31, 47), # noqa: DTZ001
),
("3/7/2022 5:31:47 PM", datetime(2022, 3, 7, 17, 31, 47)),
("03/7/2022 5:31:47 PM", datetime(2022, 3, 7, 17, 31, 47)),
("3/07/2022 5:31:47 PM", datetime(2022, 3, 7, 17, 31, 47)),
("03/07/2022 5:31:47 PM", datetime(2022, 3, 7, 17, 31, 47)),
("3/7/2022 5:3:47 PM", datetime(2022, 3, 7, 17, 3, 47)),
("3/7/2022 5:31:4 PM", datetime(2022, 3, 7, 17, 31, 4)),
("3/7/2022 5:31:47 pm", datetime(2022, 3, 7, 17, 31, 47)),
("3/7/2022 5:31:47 AM", datetime(2022, 3, 7, 5, 31, 47)),
],
)
def test_parser_target_data_timestamp_parser(value, expected):
@@ -227,6 +203,8 @@ def test_find_result_xml_file_ok(tmp_path):
xml_file = tmp_path / "result.xml"
xml_file.touch()
print(list(tmp_path.iterdir()))
result = _find_result_xml_file(tmp_path)
assert result == xml_file
@@ -279,6 +257,8 @@ def test_find_result_hidden_xsl_file(tmp_path):
xml_file = tmp_path / ".result.xml"
xml_file.touch()
print(list(tmp_path.iterdir()))
result = _find_result_xml_file(tmp_path)
assert result is None
@@ -286,9 +266,10 @@ def test_find_result_hidden_xsl_file(tmp_path):
def test_parse_xml_file_ok(example_dir):
import pandas
from sensospot_parser.xml_parser import (
_find_result_xml_file,
parse_xml_file,
_find_result_xml_file,
)
folder = example_dir / EXAMPLE_DIR_XML_WO_PARAMS
@@ -307,10 +288,10 @@ def test_parse_xml_file_ok(example_dir):
@pytest.mark.parametrize(
("file_name", "message"),
"file_name, message",
[
("not_existing.xml", "Xml file does not exist"),
("defect.xml", "Could not parse assay results xml file"),
("incomplete.xml", "Could not parse assay results xml file"),
("malformed_data.xml", "Malformed data in xml file"),
],
)
@@ -319,14 +300,14 @@ def test_parse_xml_file_raies_error(file_name, message, example_dir):
xml_file = example_dir / file_name
with pytest.raises(ValueError) as e: # noqa: PT011
with pytest.raises(ValueError) as e:
parse_xml_file(xml_file)
assert message in str(e)
assert message in str(e)
def test_parse_xml_folder_with_params(example_dir):
import pandas
from sensospot_parser.xml_parser import parse_xml_folder
folder = example_dir / EXAMPLE_DIR_XML_WITH_PARAMS
@@ -340,6 +321,7 @@ def test_parse_xml_folder_with_params(example_dir):
def test_parse_xml_folder_without_params(example_dir):
import pandas
from sensospot_parser.xml_parser import parse_xml_folder
folder = example_dir / EXAMPLE_DIR_XML_WO_PARAMS
@@ -354,7 +336,6 @@ def test_parse_xml_folder_without_params(example_dir):
def test_parse_xml_folder_non_existing_xml_file(tmp_path):
from sensospot_parser.xml_parser import parse_xml_folder
with pytest.raises(ValueError) as e: # noqa: PT011
with pytest.raises(ValueError) as e:
parse_xml_folder(tmp_path)
assert "Could not find assay results xml file" in str(e)
assert "Could not find assay results xml file" in str(e)

14
tox.ini

@@ -0,0 +1,14 @@
[tox]
envlist = py39, py310
isolated_build = True
[testenv]
deps =
pytest
pytest-cov
pytest-mock
setuptools>=41.2.0
pip>=20.0.2
changedir = {toxinidir}/tests
commands = pytest --cov=sensovation_parser