Sensospot Tools
===============

Some small tools for working with parsed Sensospot data.

## Selecting and splitting a pandas data frame

### select(data: DataFrame, column: str, value: Any) -> DataFrame

Selects the rows of a data frame that have a given value in a column.

Example:

```python
import pandas as pd
from sensospot_tools import select

data = pd.DataFrame(
    {"category": ["dog", "cat", "horse", "cat"], "value": [1, 2, 3, 4]}
)

print(data)
#    category  value
# 0       dog      1
# 1       cat      2
# 2     horse      3
# 3       cat      4

print(select(data, "category", "cat"))
#    category  value
# 1       cat      2
# 3       cat      4
```

### split(data: DataFrame, *on: Any) -> Iterator[tuple[Any, ..., DataFrame]]

Splits a data frame on the unique values in one or more columns.

Returns a generator of tuples with at least two elements.
The _last_ element is the resulting partial data frame,
the element(s) before it are the values used to split up the original data.

Example:

```python
import pandas as pd
from sensospot_tools import split

data = pd.DataFrame(
    {"category": ["dog", "cat", "horse", "cat"], "value": [1, 2, 3, 4]}
)

# the column names are passed as positional arguments (*on);
# with a single column each tuple is (value, partial data frame)
result = dict(split(data, "category"))

print(result["dog"])
#    category  value
# 0       dog      1

print(result["cat"])
#    category  value
# 1       cat      2
# 3       cat      4

print(result["horse"])
#    category  value
# 2     horse      3
```

## Working with data with multiple exposure times

### select_hdr_data(data: DataFrame, spot_id_columns: list[str], time_column: str, overflow_column: str) -> DataFrame

Selects the data for an increased dynamic measurement range.

To increase the dynamic range of a measurement, multiple exposures of one
microarray might be taken.

This function selects the data of only one exposure time per spot, based on
whether the spot is in overflow. It starts with the longest exposure time,
which resolves the weakest signals, and falls back to the next shorter
exposure time if the value in the `overflow_column` is `True`.

This is done for each spot, and therefore a spot needs a way to be identified
across multiple exposure times. Examples of such identifiers are:

- for a single array: the spot id (e.g. "Pos.Id")
- for multiple arrays: the array position and the spot id (e.g. "Well.Name" and "Pos.Id")
- for multiple runs: the name of the run, the array position and the spot id (e.g. "File.Name", "Well.Name" and "Pos.Id")

The function raises a `KeyError` if any of the provided column names is not
present in the data frame.

### normalize(data: DataFrame, normalized_time: Union[int, float], time_column: str, value_columns: list[str], template: str) -> DataFrame

Normalizes values to a common exposure time.

Raises a `KeyError` if any column is not in the data frame;
raises a `ValueError` if no template string was provided.

## Development

To install the development version of Sensospot Tools:

    git clone https://git.cpi.imtek.uni-freiburg.de/holgi/sensospot_tools.git

    # create a virtual environment and install all required dev dependencies
    cd sensospot_tools
    make devenv

To run the tests, use `make tests`, or `make coverage` for a complete report.

To generate the documentation pages, use `make docs`, or `make serve-docs` to
start a web server with the generated documentation.