Misc code snippets

Holger Frey 2547d8ea2a modified the function signature of `split_uniques()`. To specify multiple columns, pass them directly as separate arguments instead of wrapping them in a container. OLD: split_uniques(data, ["A", "B"]) NEW: split_uniques(data, "A", "B"). This removes the need to differentiate between a single string and other containers.		3 years ago
.gitignore	Initial commit	3 years ago
LICENSE	Initial commit	3 years ago
Makefile	setup of infrastructure	3 years ago
README.md	modified the function signature of `split_uniques()`	3 years ago
linear_regression.py	the method `Regression.predict()` requires 'x' or 'y' as keyword argument only	3 years ago
pytest.ini	added test for the `to_dict()` method	3 years ago
requirements.txt	setup of infrastructure	3 years ago
split_uniques.py	modified the function signature of `split_uniques()`	3 years ago

README.md

snippets

Misc code snippets I sometimes need and always have to look up how they work...

linear_regression.py

Calculates a linear regression on two columns of a data frame. The resulting object has a predict() method that calculates the x or y value for a given counterpart; the known value must be passed as the keyword argument x or y.

import pandas as pd
from linear_regression import linear_regression

df = pd.DataFrame({"temperature":[...], "signal":[...]})

regression = linear_regression(df, x="temperature", y="signal")

repr(regression) == "Regression(intercept=1, coefficient=3, score=0.9998)"

regression.predict(x=3) == 10
regression.predict(y=7) == 2
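To make the behaviour of predict() more concrete, here is a minimal sketch of how such an object could work. It is not the code from linear_regression.py: the names `RegressionSketch` and `linear_regression_sketch` are hypothetical, the fit uses `numpy.polyfit` as an assumption, and the sample values are made up so that the numbers match the example above.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass(frozen=True)
class RegressionSketch:
    """Hypothetical stand-in for the Regression object described above."""

    intercept: float
    coefficient: float
    score: float

    def predict(self, *, x=None, y=None):
        # Keyword-only arguments: exactly one of x or y must be given.
        if (x is None) == (y is None):
            raise TypeError("predict() expects exactly one of 'x' or 'y'")
        if x is not None:
            # Forward direction: y = intercept + coefficient * x
            return self.intercept + self.coefficient * x
        # Reverse direction: solve the same line for x
        return (y - self.intercept) / self.coefficient


def linear_regression_sketch(data, x, y):
    """Fit y = intercept + coefficient * x on two columns of a data frame."""
    xs = data[x].to_numpy(dtype=float)
    ys = data[y].to_numpy(dtype=float)
    # polyfit returns the coefficients with the highest power first
    coefficient, intercept = np.polyfit(xs, ys, deg=1)
    # Coefficient of determination (R squared) as the score
    residuals = ys - (intercept + coefficient * xs)
    total = ys - ys.mean()
    score = 1.0 - (residuals @ residuals) / (total @ total)
    return RegressionSketch(float(intercept), float(coefficient), float(score))


# Made-up sample data lying on the line y = 1 + 3 * x
df = pd.DataFrame({"temperature": [0, 1, 2, 3], "signal": [1, 4, 7, 10]})
regression = linear_regression_sketch(df, x="temperature", y="signal")
print(regression.predict(x=3))
print(regression.predict(y=7))
```

With this data the two predictions come out close to 10 and 2, up to floating point rounding. Making x and y keyword-only avoids any ambiguity about which direction a bare number should be converted in.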

split_uniques.py

Splits a data frame on the unique values in one or more columns.

Returns a generator of tuples with at least two elements. The last element is the resulting partial data frame; the preceding element(s) are the values used to split up the original data.

import pandas as pd
from split_uniques import split_uniques

df = pd.DataFrame({
        "A": [1, 2, 2], 
        "B": [3, 4, 3], 
        "C": ["x", "y", "z"]
    })

result = list(split_uniques(df, "B"))

assert len(result) == 2

value, data = result[0]
assert value == 3
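# `==` compares data frames element-wise, so use equals() for the assertion
assert data.reset_index(drop=True).equals(pd.DataFrame({
        "A": [1, 2],
        "B": [3, 3],
        "C": ["x", "z"]
    }))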

value, data = result[1]
assert value == 4
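# `==` compares data frames element-wise, so use equals() for the assertion
assert data.reset_index(drop=True).equals(pd.DataFrame({
        "A": [2],
        "B": [4],
        "C": ["y"]
    }))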

This construct might look a little bit weird, but it makes it easy to use the function in a loop definition:

for well, probe, partial_data in split_uniques(full_data, "Well", "Probe"):
    # partial_data only contains the rows for one well and one probe
    ...
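For reference, a function with this behaviour could be sketched roughly as follows. This is not the code from split_uniques.py: the name `split_uniques_sketch` is hypothetical, and the recursive approach is only one assumed way to produce tuples of the shape described above.

```python
import pandas as pd


def split_uniques_sketch(data, *columns):
    """Yield (value_1, ..., value_n, partial_frame) per unique combination."""
    if not columns:
        raise ValueError("at least one column name is required")
    first, *rest = columns
    # unique() keeps the order in which the values first appear
    for value in data[first].unique():
        # Note: rows with missing values never match this comparison
        subset = data[data[first] == value]
        if rest:
            # Split the subset further on the remaining columns
            for tail in split_uniques_sketch(subset, *rest):
                yield (value, *tail)
        else:
            yield (value, subset)


df = pd.DataFrame({
    "A": [1, 2, 2],
    "B": [3, 4, 3],
    "C": ["x", "y", "z"],
})

# One column: tuples of (value, partial frame)
single = list(split_uniques_sketch(df, "B"))

# Two columns: tuples of (value_a, value_b, partial frame)
double = [(a, b, list(part["C"])) for a, b, part in split_uniques_sketch(df, "A", "B")]
```

Taking the column names as `*columns` is what allows a call like split_uniques(data, "A", "B") without checking whether a single string or a container was passed. Combinations that do not occur in the data, such as A=1 with B=4 here, are not yielded at all.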