Last updated: 2025-04-05
This assignment is due April 26th at 11:59 PM. Submit your solution on Brightspace, under the “Unit 11” assignment.
Please copy your code into the text box, making sure to indent it properly with whitespace so that it appears the same as in IDLE or VSCode or wherever you wrote the code. This will make it easier for me to grade.
You can submit multiple times. I will only grade your last submission.
In order to do this assignment you will need pandas and matplotlib installed. We went through how to do that in class. If you look it up online, you should be able to figure it out.
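If you are not sure whether the packages are installed, a quick check like the one below can tell you before you start. This is only a sketch: `missing_packages` is a made-up helper name, and numpy is included in the list because the helper code in this assignment calls np.polyfit.

```python
import importlib.util

def missing_packages(names):
    """Return the package names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Packages used by the helper functions in this assignment
missing = missing_packages(["pandas", "matplotlib", "numpy"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported as not installed, install it the way we did in class and run the check again.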
You will use these datasets in your assignment:
Here are some helper functions for plotting. The function plot_regression is tailored to the data for KKR (and other) stocks, while plot_coords will work for the general “Coordinates” file given above.
When predicting values based on the regression, you may want to print out the coefficients generated by np.polyfit within the plot_regression function and plug them into Desmos to see how the function behaves. To evaluate the function in the year 2027, you will need to figure out how the numeric index is converted to a date, so that you understand how the function given by the regression coefficients relates to the date.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_coords(filename):
    # The coordinates file has no header row: column 0 is x, column 1 is y
    df = pd.read_csv(filename, header=None)
    plt.plot(df[0], df[1], '.')

def plot_regression(filename, col_x, col_y, degree, num_days=None, stripchar='$'):
    df = pd.read_csv(filename)
    # Strip the leading currency symbol and parse the dates
    df[col_y] = df[col_y].str.lstrip(stripchar).astype(float)
    df[col_x] = pd.to_datetime(df[col_x], format='%m/%d/%Y')

    # Sort from earliest to latest
    df = df.sort_values(by=col_x)

    # Filter to the most recent num_days, if specified
    if num_days is not None:
        latest_date = df[col_x].max()
        earliest_date = latest_date - pd.Timedelta(days=num_days)
        df = df[df[col_x] >= earliest_date]

    df.reset_index(drop=True, inplace=True)

    # Fit polynomial to the index (which is now chronological)
    coeffs = np.polyfit(df.index, df[col_y], deg=degree)

    # Generate line for plotting
    line_x = df.index[::max(1, len(df)//100)]
    line_y = np.polyval(coeffs, line_x)

    # Plot
    plt.figure(figsize=(10, 6))
    plt.plot(df.index, df[col_y], label="Data")
    plt.plot(line_x, line_y, color='red', label=f"Poly (deg {degree})")
    plt.xticks(df.index[::max(1, len(df)//10)], df[col_x].dt.date[::max(1, len(df)//10)], rotation=25)
    plt.legend()
    plt.title(f"Polynomial Regression (Last {num_days} Days)" if num_days else "Polynomial Regression")
    plt.xlabel(col_x)
    plt.ylabel(col_y)
    plt.tight_layout()
    plt.show()
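The index-to-date conversion mentioned above can be sketched as follows. Because plot_regression fits the polynomial against the row index rather than the date, a future date has to be turned into an approximate index before it can be plugged into the polynomial. This sketch assumes the rows are spread roughly evenly over time; `date_to_index` is a made-up helper, and the dates and row count are invented for illustration.

```python
from datetime import date

def date_to_index(target, first_date, last_date, num_rows):
    """Estimate the row index `target` would have if the data continued.

    Assumes rows are spread roughly evenly between first_date (index 0)
    and last_date (index num_rows - 1).
    """
    span_days = (last_date - first_date).days
    rows_per_day = (num_rows - 1) / span_days
    # Extend past the last row at the same rows-per-day rate
    return (num_rows - 1) + rows_per_day * (target - last_date).days

# Invented example: 101 rows covering 2025-01-01 through 2025-04-11
first = date(2025, 1, 1)
last = date(2025, 4, 11)
x_2027 = date_to_index(date(2027, 1, 1), first, last, num_rows=101)
print(x_2027)
```

In this invented example there is exactly one row per calendar day. Stock files typically have no rows for weekends or holidays, so with real data the rows-per-day ratio will likely be below 1. With real data you would probably take the first and last dates and the row count from the DataFrame after filtering.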
When answering non-code questions, write the answer as a comment in the file next to the relevant pieces of code you wrote to get your answer. You should have one comment for each of the below tasks.
Note: For the stock prediction questions: If you are having trouble predicting the exact values for the year 2027 using the regression coefficients, you can just try extending the graph manually and guessing what the values will be.
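As a sketch of how printed coefficients turn into a prediction: np.polyfit returns the coefficients from the highest power down, so with deg=1 they are m_1, m_0 and with deg=2 they are m_2, m_1, m_0. The coefficient values and the x value below are made up, and `evaluate_poly` is a hypothetical helper that computes the same thing np.polyval does for a single x.

```python
def evaluate_poly(coeffs, x):
    """Evaluate a polynomial whose coefficients are listed highest power first."""
    result = 0.0
    for c in coeffs:
        # Horner's method: multiply the running total by x, then add the next coefficient
        result = result * x + c
    return result

linear = [0.5, 100.0]            # made-up m_1, m_0
quadratic = [0.25, 0.5, 100.0]   # made-up m_2, m_1, m_0
x = 10                           # made-up index value

print(evaluate_poly(linear, x))     # 0.5*10 + 100
print(evaluate_poly(quadratic, x))  # 0.25*10**2 + 0.5*10 + 100
```

The x you plug in is an index value, not a year, so it has to come from the index-to-date conversion first.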
Use the plot_coords function for this.
Perform a linear regression (deg=1) using np.polyfit like we saw in class, and use the resulting parameters to predict what the price of KKR stock will be in 2027.
Use the plot_regression function for this. Add a print statement after the coeffs are calculated to print them out, and use the equation of a line y = m_1 x + m_0 to predict the KKR stock value in 2027.
When calling plot_regression, you will want to use the string 'Date' for col_x and 'Low' for col_y. The filename is the name of the KKR stock CSV file on your computer (make sure it’s in the same folder as your code). The degree is used as the deg parameter to np.polyfit.
Perform a quadratic regression (using np.polyfit and deg=2) on the KKR stock data for the last 240 days. You should get 3 numbers as a result, which correspond to the coefficients m_2, m_1, and m_0 respectively of the polynomial y = m_2 x^2 + m_1 x + m_0. Based on this regression, what do you expect the price of KKR stock to be in 2027?
Use the plot_regression function for this too. Use num_days=240 for this part.
Follow-up question (not graded, for fun): If someone was given $15,000 in KKR RSUs (restricted stock units) on February 14th, 2024, how much money (in unrealized gains) has that person lost by April 4th, 2025, compared to the peak value attained over the ownership timeframe? (The answer should be a little over $10,000.)
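For the follow-up question, one way to set up the arithmetic is sketched below. It assumes the grant is converted into a number of shares at the grant-date price, and that the loss is the drop in value from the highest price in the period to the price on the final date. The prices are made up and are not KKR data, `loss_from_peak` is a hypothetical helper, and which price column to use is left to you.

```python
def loss_from_peak(grant_value, prices):
    """Drop in value from the peak to the final price.

    prices: chronological list of prices, from the grant date to the end date.
    """
    shares = grant_value / prices[0]      # shares received on the grant date
    peak_value = shares * max(prices)     # value at the highest price
    final_value = shares * prices[-1]     # value on the last date
    return peak_value - final_value

# Made-up prices: grant date, a later peak, and the final date
prices = [50.0, 80.0, 60.0]
print(loss_from_peak(15000, prices))
```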
You should be able to do all of the tasks with only the Python topics we covered in class so far.
If you want to use more complex functionality than what we discussed in class, the Python documentation may be helpful: Python 3.10 documentation
Additionally, the pandas and matplotlib documentation may be helpful: pandas documentation, matplotlib documentation