Last updated: 2025-04-05
This assignment is due April 26th at 11:59 PM. Submit your solution on Brightspace, under the “Unit 11” assignment.
Please copy your code into the text box, making sure to indent it properly with whitespace so that it appears the same as in IDLE or VSCode or wherever you wrote the code. This will make it easier for me to grade.
You can submit multiple times. I will only grade your last submission.
To do this assignment you will need pandas and matplotlib installed. The helper functions below also use numpy, which is installed together with pandas. We went through how to install these in class. If you look it up online, you should be able to figure it out.
You will use these datasets in your assignment:
Here are some helper functions for plotting. The function plot_regression is tailored to dealing with the data for KKR (and other) stocks, while plot_coords will work for the general “Coordinates” file given above.
When predicting values based on the regression, you may want to print out the coefficients generated by np.polyfit within the plot_regression function, and plug them into Desmos to see how the function behaves. The regression is fit against the numeric row index, not the date, so you will need to figure out how the conversion between the index and the date is being done in order to understand how the function given by the regression coefficients relates to the date. Once you know that, you can evaluate the function in the year 2027.
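As a quick check of how np.polyfit orders the numbers it returns, here is a minimal sketch on made-up points (not the KKR data). The points and the value x_new are assumptions chosen only for illustration. The coefficients come back from the highest power down, and np.polyval gives the same result as writing the polynomial out by hand.

```python
import numpy as np

# Made-up points that lie exactly on y = 2x^2 - 3x + 5 (not real stock data).
x = np.arange(10)
y = 2 * x**2 - 3 * x + 5

# np.polyfit returns coefficients from the highest power down: [m_2, m_1, m_0].
m_2, m_1, m_0 = np.polyfit(x, y, deg=2)
print(m_2, m_1, m_0)

# np.polyval evaluates the polynomial; it matches writing the formula by hand.
x_new = 12
by_hand = m_2 * x_new**2 + m_1 * x_new + m_0
by_polyval = np.polyval([m_2, m_1, m_0], x_new)
print(by_hand, by_polyval)
```

The three printed coefficients are the same numbers you would type into Desmos as m_2, m_1, and m_0.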
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_coords(filename):
    # The coordinates file has no header row, so the columns are named 0 and 1
    df = pd.read_csv(filename, header=None)
    # Call plt.show() afterwards to display the figure when running as a script
    plt.plot(df[0], df[1], '.')

def plot_regression(filename, col_x, col_y, degree, num_days=None, stripchar='$'):
    df = pd.read_csv(filename)
    # Remove the leading currency symbol so the prices can be parsed as numbers
    df[col_y] = df[col_y].str.lstrip(stripchar).astype(float)
    df[col_x] = pd.to_datetime(df[col_x], format='%m/%d/%Y')

    # Sort from earliest to latest
    df = df.sort_values(by=col_x)

    # Filter to the most recent num_days, if specified
    if num_days is not None:
        latest_date = df[col_x].max()
        earliest_date = latest_date - pd.Timedelta(days=num_days)
        df = df[df[col_x] >= earliest_date]

    # Renumber the rows 0, 1, 2, ... in chronological order
    df.reset_index(drop=True, inplace=True)

    # Fit polynomial to the index (which is now chronological)
    coeffs = np.polyfit(df.index, df[col_y], deg=degree)

    # Generate line for plotting
    line_x = df.index[::max(1, len(df)//100)]
    line_y = np.polyval(coeffs, line_x)

    # Plot
    plt.figure(figsize=(10, 6))
    plt.plot(df.index, df[col_y], label="Data")
    plt.plot(line_x, line_y, color='red', label=f"Poly (deg {degree})")
    # Label roughly every tenth of the x-axis with the matching date
    plt.xticks(df.index[::max(1, len(df)//10)], df[col_x].dt.date[::max(1, len(df)//10)], rotation=25)
    plt.legend()
    plt.title(f"Polynomial Regression (Last {num_days} Days)" if num_days else "Polynomial Regression")
    plt.xlabel(col_x)
    plt.ylabel(col_y)
    plt.tight_layout()
    plt.show()

When answering non-code questions, write the answer as a comment in the file next to the relevant pieces of code you wrote to get your answer. You should have one comment for each of the below tasks.
Note: For the stock prediction questions: If you are having trouble predicting the exact values for the year 2027 using the regression coefficients, you can just try extending the graph manually and guessing what the values will be.
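If you do want to work from the coefficients, the following is one possible sketch of the index-to-date conversion, using made-up data rather than the KKR file. It assumes the x values are row numbers with one row per trading day, as in plot_regression, so weekends have no rows. The names rows_per_day and date_to_index are hypothetical helpers introduced here, and the estimate is only approximate because it ignores holidays and other gaps.

```python
import numpy as np
import pandas as pd

# Made-up data: one row per business day, with a price that rises by 0.5 per row.
dates = pd.bdate_range("2024-01-01", "2024-12-31")
df = pd.DataFrame({"Date": dates, "Low": 100 + 0.5 * np.arange(len(dates))})

# Same kind of fit as in plot_regression: x is the row number, not the date.
coeffs = np.polyfit(df.index, df["Low"], deg=1)

# Estimate how many rows pass per calendar day (weekends have no rows).
span_days = (df["Date"].iloc[-1] - df["Date"].iloc[0]).days
rows_per_day = (len(df) - 1) / span_days

def date_to_index(date_string):
    """Approximate row number that a (possibly future) date would have."""
    days_after_last = (pd.Timestamp(date_string) - df["Date"].iloc[-1]).days
    return df.index[-1] + days_after_last * rows_per_day

# Evaluate the fitted polynomial at the estimated future row number.
x_future = date_to_index("2027-01-01")
predicted = np.polyval(coeffs, x_future)
print(f"index for 2027-01-01 is about {x_future:.0f}, predicted value {predicted:.2f}")
```

The same idea applies to a quadratic fit, since np.polyval accepts coefficients of any degree.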
Use the plot_coords function for this.
Perform a linear regression (deg=1) using np.polyfit like we saw in class, and use the resulting parameters to predict what the price of KKR stock will be in 2027.
Use the plot_regression function for this. Add a print statement after the coeffs are calculated to print them out, and use the equation of a line y = m_1 x + m_0 to predict the KKR stock value in 2027.
When calling plot_regression, you will want to use the string Date for col_x and Low for col_y. The filename is the name of the KKR stock CSV file on your computer (make sure it’s in the same folder as your code). The degree is used as the deg parameter to np.polyfit.
Perform a quadratic regression (using np.polyfit and deg=2) on the KKR stock data for the last 240 days. You should get 3 numbers as a result, which correspond to the coefficients m_2, m_1, and m_0 respectively of the polynomial y = m_2 x^2 + m_1 x + m_0. Based on this regression, what do you expect the price of KKR stock to be in 2027?
Use the plot_regression function for this too. Use num_days=240 for this part.
Follow-up question (not graded, for fun): If someone was given $15,000 in KKR RSUs (restricted stock units) on February 14th of 2024, how much money (in unrealized gains) has that person lost by April 4th, 2025, when compared to the peak attained over the ownership timeframe? (The answer should be a little over $10,000.)
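For the follow-up question, one way to set up the arithmetic is sketched below. The prices are made up and are not KKR values, and the function name loss_from_peak is hypothetical. The sketch also assumes the grant is converted to a number of shares at the price on the grant date, which is a simplification.

```python
def loss_from_peak(grant_value, grant_price, peak_price, current_price):
    """Unrealized loss relative to the peak, for a grant bought at grant_price."""
    # Assumption: the dollar grant becomes a share count at the grant-date price.
    shares = grant_value / grant_price
    # Loss is the drop from the peak to the current price, times the share count.
    return shares * (peak_price - current_price)

# Hypothetical prices over the ownership period, oldest first (not real data).
prices = [50.0, 65.0, 80.0, 70.0, 60.0]

loss = loss_from_peak(
    grant_value=15000,
    grant_price=prices[0],
    peak_price=max(prices),
    current_price=prices[-1],
)
print(f"Loss compared to the peak: ${loss:,.2f}")
```

With the real data, the three prices would come from the KKR CSV file for the grant date, the highest value over the ownership period, and April 4th, 2025.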
You should be able to do all of the tasks with only the Python topics we covered in class so far.
If you want to use more complex functionality than what we discussed in class, the Python documentation may be helpful: Python 3.10 documentation
Additionally, the pandas and matplotlib documentation may be helpful: pandas documentation, matplotlib documentation