import pandas as pd
from pandas import Series, DataFrame
import numpy as np
pd.__version__
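pandas reads data with the pd.read_*() family of functions. For example, pd.read_csv() accepts a filename, including a remote URL. Data are written back out with the corresponding .to_*() methods.

A Series is a fixed-length, ordered dictionary: each value is paired with a label in the Series' index. Series are closely related to the DataFrame class. One way to create a Series is to pass a dict to the pd.Series() constructor; the dict keys become the index.
nyc_air = pd.Series(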
{'LGA': 'East Elmhurst', 'JFK': 'Jamaica', 'EWR': 'Newark'})
nyc_air.index.name = 'airport'
nyc_air.name = 'city'
nyc_air
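Because a Series behaves like an ordered dictionary, dict-style operations act on its index labels. A minimal sketch, rebuilding nyc_air so the block runs on its own (the 'BOS' label is just a hypothetical missing key):

```python
import pandas as pd

# Dict keys become the index labels, in insertion order
nyc_air = pd.Series({'LGA': 'East Elmhurst', 'JFK': 'Jamaica', 'EWR': 'Newark'})

# Label lookup works like a dict key lookup
print(nyc_air['JFK'])                  # Jamaica

# `in` tests the index labels, not the values, just as with dict keys
print('EWR' in nyc_air)                # True
print('Newark' in nyc_air)             # False

# The order of the dict is preserved in the index
print(list(nyc_air.index))             # ['LGA', 'JFK', 'EWR']

# .get() returns a default for a missing label instead of raising KeyError
print(nyc_air.get('BOS', 'unknown'))   # unknown
```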
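The DataFrame class is the primary way of representing heterogeneous, rectangular data in pandas. A DataFrame can be thought of as an ordered dictionary of Series (the columns) that share an index (the row names). The most common way to create a DataFrame directly from data is to pass a dict of equal-length lists, NumPy arrays, or Series to pd.DataFrame(). Pass the columns argument to order the columns. When the values are Series, pandas aligns them on their index labels, so wiki below lines up with nyc_air even though its keys are listed in a different order.
wiki = pd.Series({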
'LGA': 'https://en.wikipedia.org/wiki/LaGuardia_Airport',
'EWR': 'https://en.wikipedia.org/wiki/Newark_Liberty_International_Airport',
'JFK': 'https://en.wikipedia.org/wiki/John_F._Kennedy_International_Airport'
})
df = pd.DataFrame({'city': nyc_air, 'wiki': wiki})
df
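The alignment and column-ordering behavior described above can be checked directly. A sketch under stated assumptions: the state Series and the note Series are illustrative data added here, and note is hypothetical, holding a value only for JFK to show what happens when a label is missing.

```python
import pandas as pd

city = pd.Series({'LGA': 'East Elmhurst', 'JFK': 'Jamaica', 'EWR': 'Newark'})
# Same labels as city, deliberately listed in a different order
state = pd.Series({'EWR': 'NJ', 'LGA': 'NY', 'JFK': 'NY'})

# Values are matched by index label, not by position;
# columns= picks and orders the dict keys used as columns
df = pd.DataFrame({'city': city, 'state': state}, columns=['state', 'city'])
print(df.columns.tolist())      # ['state', 'city']
print(df.loc['EWR', 'city'])    # Newark

# Hypothetical column with a value only for JFK: the other rows get NaN
note = pd.Series({'JFK': 'example'})
df2 = pd.DataFrame({'city': city, 'note': note})
print(df2['note'].isna().sum())  # 2
```

The resulting row order comes from pandas' union of the indexes, so the sketch looks rows up by label rather than relying on position.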
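Select columns using [] with a string (caution) or a list of strings, the .iloc[:, 0:2] indexer, the .loc[:, ["col1", "col2"]] indexer, or attribute access, df.column (but don't: it fails for names containing spaces or names that clash with DataFrame methods). A single string returns a Series; a list of strings returns a DataFrame.
city = df['city']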
city2 = df['city'].copy()
df_city = df[['city']]
[(city is df['city'], df_city is df[['city']]), (type(city), type(df_city))]
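A side-by-side sketch of the four selection styles listed above, on a small stand-in frame (the state column is illustrative data, not from the original). It also shows why attribute access is discouraged: a column named count is shadowed by the DataFrame.count method.

```python
import pandas as pd

df = pd.DataFrame({'city': ['East Elmhurst', 'Jamaica', 'Newark'],
                   'state': ['NY', 'NY', 'NJ']},
                  index=['LGA', 'JFK', 'EWR'])

s = df['city']                            # single string -> Series
sub = df[['city']]                        # list of strings -> DataFrame
by_pos = df.iloc[:, 0:2]                  # positional: first two columns
by_label = df.loc[:, ['city', 'state']]   # label-based: named columns

print(type(s).__name__, type(sub).__name__)  # Series DataFrame
print(by_pos.equals(by_label))               # True

# Attribute access returns the method, not the column, when names clash
clash = pd.DataFrame({'count': [1, 2]})
print(callable(clash.count))                 # True
```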
To change values, assign through .loc, which writes into df itself:
df.loc[["JFK", "LGA"], "city"] = "JFK"  # .loc assignment modifies df in place
df
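A minimal sketch contrasting the two behaviors in play here: an explicit .copy() (as with city2 earlier) is independent of the frame, while .loc assignment writes into the frame. The three-row frame is a stand-in for df.

```python
import pandas as pd

df = pd.DataFrame({'city': ['East Elmhurst', 'Jamaica', 'Newark']},
                  index=['LGA', 'JFK', 'EWR'])

# Editing an explicit copy never touches df
city2 = df['city'].copy()
city2['EWR'] = 'changed'

# Assigning through .loc on df changes df itself
df.loc[['JFK', 'LGA'], 'city'] = 'JFK'

print(df['city'].tolist())     # ['JFK', 'JFK', 'Newark']
print(df.loc['EWR', 'city'])   # Newark
```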
Delete columns with the del keyword or (better) the .drop(columns='col', inplace=True) method.
dat = pd.DataFrame({'a': range(5), 'b': np.linspace(0, 5, 5)})
dat['c'] = dat['d'] = dat['a'] + dat['b']
del dat['c']
dat.drop(columns='a', inplace=True)
dat
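One point worth seeing explicitly: without inplace=True, .drop() returns a new DataFrame and leaves the original untouched, whereas del always mutates. A sketch reusing the dat construction from above:

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'a': range(5), 'b': np.linspace(0, 5, 5)})
# Chained assignment stores the same Series under 'c', then 'd'
dat['c'] = dat['d'] = dat['a'] + dat['b']

# .drop() without inplace returns a new frame; dat keeps all columns
slim = dat.drop(columns=['a', 'c'])
print(dat.columns.tolist())    # ['a', 'b', 'c', 'd']
print(slim.columns.tolist())   # ['b', 'd']

# del removes the column from dat itself
del dat['c']
print(dat.columns.tolist())    # ['a', 'b', 'd']
```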
Select rows by position using .iloc[0, :] or by index label using .loc["a", :]. (More on the Index class later.)
dat.iloc[0, :] = -1
print(dat.index)
dat.loc[0:5:2, :]  # label-based slice with a step; works because the RangeIndex labels are integers
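The position/label distinction matters most when labels and positions disagree. A sketch on small illustrative frames (the string and reversed integer indexes are chosen only to make the difference visible): .iloc slices exclude the end position, .loc slices include the end label.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

first_by_pos = df.iloc[0, :]     # row at position 0
first_by_label = df.loc['a', :]  # row whose label is 'a'

pos_slice = df.iloc[0:2]['x'].tolist()        # [10, 20]: end position excluded
label_slice = df.loc['a':'c']['x'].tolist()   # [10, 20, 30]: end label included

# With integer labels in a non-default order, label 0 is not position 0
t = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0])
print(t.iloc[0], t.loc[0])        # 10 40

# On a RangeIndex, a stepped .loc slice picks labels 0, 2, 4
r = pd.DataFrame({'x': range(6)})
print(r.loc[0:5:2, 'x'].tolist())  # [0, 2, 4]
```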
Filter rows with a Boolean mask or the .query() method. The argument to .query() is a string containing a Boolean expression involving column names; both approaches below select the same rows.
b = dat[dat['b'] > 0]
q = dat.query('b > 0')
[b, q]
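With more than one condition the two styles diverge in syntax: a mask combines conditions with & and needs parentheses around each, while .query() accepts `and` and can reference a local Python variable with @. A sketch on a frame shaped like dat (the threshold variable and the cutoff 8 are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'b': np.linspace(0, 5, 5), 'd': np.arange(5) * 2.0})

threshold = 1.0
# Boolean mask: each condition in parentheses, combined with &
mask_rows = dat[(dat['b'] > threshold) & (dat['d'] < 8)]
# .query(): same filter as a string; @threshold refers to the local variable
query_rows = dat.query('b > @threshold and d < 8')

print(mask_rows.equals(query_rows))  # True
print(mask_rows.index.tolist())      # [1, 2, 3]
```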
DataFrames are used to represent tidy, rectangular data. Think of a DataFrame as a collection of Series of the same length that share an index. Selecting a single column returns a Series; selecting a list of columns returns a (new) DataFrame.
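To close the loop with the pd.read_*() and .to_*() functions from the start of this section, here is a sketch of a CSV round trip. It assumes an in-memory io.StringIO buffer stands in for a file; with a real path the calls are the same. Because to_csv() writes the index as the first column, read_csv() is told which column to use as the index.

```python
import io
import pandas as pd

df = pd.DataFrame({'city': ['East Elmhurst', 'Jamaica', 'Newark']},
                  index=pd.Index(['LGA', 'JFK', 'EWR'], name='airport'))

# Write CSV text to the buffer; the index becomes the 'airport' column
buf = io.StringIO()
df.to_csv(buf)

# Rewind, then read it back, restoring 'airport' as the index
buf.seek(0)
back = pd.read_csv(buf, index_col='airport')

print(back.equals(df))
```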