import pandas as pd
from pandas import Series, DataFrame
import numpy as np
pd.__version__
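pandas reads tabular data with its pd.read_*() family and writes it with the .to_*() methods. The following is a minimal round-trip sketch, not a definitive recipe. The CSV text and the name `airports` are made up for illustration, and an in-memory buffer stands in for a filename or URL so the example needs no file or network access.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory instead of a file or remote URL.
csv_text = "airport,city\nLGA,East Elmhurst\nJFK,Jamaica\nEWR,Newark\n"

# pd.read_csv() accepts a path, a URL, or any file-like object.
airports = pd.read_csv(io.StringIO(csv_text), index_col="airport")

# Calling .to_csv() without a target returns the CSV as a string.
round_trip = airports.to_csv()
```

Passing a path to .to_csv() writes a file instead of returning a string.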
Read data with a pd.read_*() function. pd.read_csv() accepts a filename, including remote URLs.
Write data with the .to_*() methods.
A Series is a fixed-length, ordered dictionary. Series are closely related to the DataFrame class.
Create a Series by calling the pd.Series() constructor with a dict.
nyc_air = pd.Series(
{'LGA': 'East Elmhurst', 'JFK': 'Jamaica', 'EWR': 'Newark'})
nyc_air.index.name = 'airport'
nyc_air.name = 'city'
nyc_air
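To make the "ordered dictionary" description concrete, here is a small sketch of the dict-like operations a Series supports. The variable names are illustrative only; the data mirrors the airport example.

```python
import pandas as pd

airports = pd.Series({"LGA": "East Elmhurst", "JFK": "Jamaica", "EWR": "Newark"})

# Dictionary-like: look up by label; `in` tests membership against the index.
jfk_city = airports["JFK"]
has_ewr = "EWR" in airports

# Ordered: the insertion order of the dict is preserved in the index.
labels = list(airports.index)

# Position-based access goes through .iloc.
first_city = airports.iloc[0]

# Convert back to a plain dict.
as_dict = airports.to_dict()
```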
The DataFrame class is the primary way of representing heterogeneous, rectangular data in Python.
A DataFrame can be thought of as an ordered dictionary of Series (columns) with a shared index (row names).
One way to create a DataFrame directly from data is to pass a dict of equal-length lists, NumPy arrays, or Series to pd.DataFrame().
Use the columns argument to order the columns.
wiki = pd.Series({
'LGA': 'https://en.wikipedia.org/wiki/LaGuardia_Airport',
'EWR': 'https://en.wikipedia.org/wiki/Newark_Liberty_International_Airport',
'JFK': 'https://en.wikipedia.org/wiki/John_F._Kennedy_International_Airport'
})
df = pd.DataFrame({'city': nyc_air, 'wiki': wiki})
df
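The example above builds a DataFrame from two Series whose keys are in different orders. The sketch below, using made-up data and names, shows the two behaviors involved: the columns argument orders the columns, and Series are aligned on their index labels rather than on position.

```python
import pandas as pd

# Hypothetical data: dict keys become column names.
data = {"city": ["East Elmhurst", "Jamaica", "Newark"],
        "state": ["NY", "NY", "NJ"]}

# columns= selects and orders the columns; index= supplies the row labels.
ordered = pd.DataFrame(data, columns=["state", "city"],
                       index=["LGA", "JFK", "EWR"])

# Series with differently ordered indexes are aligned by label.
a = pd.Series({"x": 1, "y": 2})
b = pd.Series({"y": 20, "x": 10})
aligned = pd.DataFrame({"a": a, "b": b})
```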
Select columns using [] with a string (caution) or a list of strings, the .iloc[:, 0:2] indexer, the .loc[:, ["col1", "col2"]] indexer, or df.column (but don't).
city = df['city']
city2 = df['city'].copy()
df_city = df[['city']]
[(city is df['city'], df_city is df[['city']]), (type(city), type(df_city))]
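The column-selection forms listed above can be compared side by side. This is a sketch on a small hypothetical frame (the `state` and `runways` columns are illustrative). It also shows one reason attribute access is discouraged: a column whose name matches a DataFrame method is shadowed by that method.

```python
import pandas as pd

frame = pd.DataFrame(
    {"city": ["East Elmhurst", "Jamaica"], "state": ["NY", "NY"], "runways": [2, 4]},
    index=["LGA", "JFK"],
)

one_col = frame["city"]                        # string -> Series
one_col_frame = frame[["city"]]                # list of strings -> DataFrame
by_position = frame.iloc[:, 0:2]               # first two columns by position
by_label = frame.loc[:, ["city", "runways"]]   # columns by name

# A column named like a method is hidden by attribute access.
clash = pd.DataFrame({"count": [1, 2]})
attribute_is_method = callable(clash.count)    # the method, not the column
column_values = list(clash["count"])           # [] still reaches the column
```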
df.loc[["JFK", "LGA"], "city"] = "JFK"  # setting through .loc modifies df itself
df
Delete columns using the del keyword or (better) the .drop(columns='col', inplace=True) method.
dat = pd.DataFrame({'a': range(5), 'b': np.linspace(0, 5, 5)})
dat['c'] = dat['d'] = dat['a'] + dat['b']
del dat['c']
dat.drop(columns='a', inplace=True)
dat
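As an alternative to inplace=True, .drop() can be used without it, in which case it returns a new DataFrame and leaves the original untouched. A minimal sketch with hypothetical data and names:

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({"a": range(3), "b": np.linspace(0, 1, 3), "c": list("xyz")})

# Without inplace=True, .drop() returns a new DataFrame.
trimmed = table.drop(columns=["a", "c"])

# errors="ignore" skips labels that are absent instead of raising KeyError.
unchanged = table.drop(columns="missing", errors="ignore")
```

Which form to use is a style choice; the non-inplace form makes it easy to keep the original around for comparison.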
Select rows by position using .iloc[0, :] or by index using .loc["a", :].
The Index class is covered in more detail elsewhere.
dat.iloc[0, :] = -1
print(dat.index)
dat.loc[0:5:2, :]  # .loc accepts a label slice here because the RangeIndex labels are integers
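The difference between position-based and label-based row selection is easier to see with a non-integer index. The sketch below uses a hypothetical frame with string labels; note that .iloc slices exclude the end position while .loc slices include the end label.

```python
import pandas as pd

rows = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

first_row = rows.iloc[0, :]             # by position -> Series
row_a = rows.loc["a", :]                # by label -> the same row
by_pos_slice = rows.iloc[0:2, :]        # positions 0 and 1; end excluded
by_label_slice = rows.loc["a":"c", :]   # labels a through c; end included
```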
Filter rows using a Boolean mask or the .query() method.
The argument to .query() is a string containing a Boolean expression involving column names.
b = dat[dat['b'] > 0]
q = dat.query('b > 0')
[b, q]
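Query strings can combine several conditions and can refer to Python variables with an @ prefix. A small sketch, with a hypothetical frame and a hypothetical `threshold` variable, showing that the mask and query forms select the same rows:

```python
import pandas as pd

scores = pd.DataFrame({"a": [1, 2, 3, 4], "b": [-1.0, 0.5, 2.0, -3.0]})

threshold = 0  # hypothetical cutoff, referenced inside the query with @

# Boolean mask: each condition needs parentheses and is combined with &.
masked = scores[(scores["b"] > threshold) & (scores["a"] < 4)]

# Equivalent query string: column names are used directly.
queried = scores.query("b > @threshold and a < 4")
```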
DataFrames are used to represent tidy, rectangular data.
Think of DataFrames as a collection of Series of the same length and sharing an index.
Selecting from a DataFrame returns a Series or a (new) DataFrame.