Holger Frey
snippets
Misc code snippets I sometimes need but always have to look up how they work...
linear_regression.py
Calculate the linear regression on two columns of a data frame. The resulting
object has a predict() method that calculates the x or y value for a given
counterpart.
import pandas as pd
from linear_regression import linear_regression

df = pd.DataFrame({"temperature":[...], "signal":[...]})
regression = linear_regression(df, x="temperature", y="signal")
repr(regression) == "Regression(intercept=1, coefficient=3, score=0.9998)"

# predict y for a given x, or x for a given y
regression.predict(x=3) == 10
regression.predict(y=7) == 2
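For orientation, here is a minimal sketch of how such a helper could be built on numpy.polyfit. It is not the code in linear_regression.py. The Regression dataclass, its field names (taken from the repr above), and the use of R² as the score are all assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class Regression:
    # Hypothetical result type; field names mirror the repr shown above.
    intercept: float
    coefficient: float
    score: float

    def predict(self, *, x=None, y=None):
        """Return y for a given x, or x for a given y."""
        if (x is None) == (y is None):
            raise ValueError("pass exactly one of x or y")
        if x is not None:
            return self.intercept + self.coefficient * x
        # Invert the line; this fails for a flat line (coefficient == 0).
        return (y - self.intercept) / self.coefficient


def linear_regression(df, x, y):
    """Fit y = intercept + coefficient * x on two columns of df."""
    xs = df[x].to_numpy(dtype=float)
    ys = df[y].to_numpy(dtype=float)
    # polyfit returns the highest power first: [slope, intercept]
    coefficient, intercept = np.polyfit(xs, ys, deg=1)
    # Assumption: "score" is the coefficient of determination (R^2).
    residuals = ys - (intercept + coefficient * xs)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((ys - ys.mean()) ** 2))
    score = 1.0 - ss_res / ss_tot
    return Regression(float(intercept), float(coefficient), score)
```

Making x and y keyword-only in predict() keeps the call sites readable and rules out mixing up the direction of the prediction.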
split_uniques.py
Splits a data frame on the unique values in one or more columns.
Returns a generator of tuples with at least two elements. The last element is the resulting partial data frame; the elements before it are the values used to split the original data.
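import pandas as pd
from split_uniques import split_uniques

df = pd.DataFrame({
    "A": [1, 2, 2],
    "B": [3, 4, 3],
    "C": ["x", "y", "z"]
})
result = list(split_uniques(df, ["B"]))
# column "B" holds two unique values, 3 and 4
assert len(result) == 2

# reset the index before comparing, in case the partial frames keep their original row labels
value, data = result[0]
assert value == 3
assert data.reset_index(drop=True).equals(pd.DataFrame({
    "A": [1, 2],
    "B": [3, 3],
    "C": ["x", "z"]
}))

value, data = result[1]
assert value == 4
assert data.reset_index(drop=True).equals(pd.DataFrame({
    "A": [2],
    "B": [4],
    "C": ["y"]
}))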
This return shape might look a little odd, but it makes the function easy to use in a loop header:
for well, probe, partial_data in split_uniques(full_data, ["Well", "Probe"]):
    ...
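A generator with this return shape could be written roughly as follows. This is only a sketch and not the code in split_uniques.py. It assumes that combinations are yielded in order of first appearance and that the partial frames keep their original row labels. Rows with missing values in the split columns are never matched here.

```python
import numpy as np
import pandas as pd


def split_uniques(df, columns):
    """Yield (value_1, ..., value_n, partial_frame) per unique combination."""
    # drop_duplicates keeps the order of first appearance
    combinations = df[columns].drop_duplicates()
    for values in combinations.itertuples(index=False, name=None):
        # build a row mask that matches every split column at once
        mask = np.ones(len(df), dtype=bool)
        for column, value in zip(columns, values):
            mask &= (df[column] == value).to_numpy()
        # flatten the values and the frame into one tuple, so the
        # result unpacks directly in a for statement
        yield (*values, df[mask])
```

Flattening the values into the tuple, instead of yielding a nested pair like (values, frame), is what allows the loop above to unpack well, probe, and partial_data in a single step.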