{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# First steps in *pandas*\n", "\n", "***\n", "> __Author__: Joseph Salmon,\n", "> adapted from the work of Joris Van den Bossche:\n", "https://github.com/jorisvandenbossche/pandas-tutorial/blob/master/01-pandas_introduction.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "## Contents\n", "\n", "* __[Introduction and presentation](#intro)__
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "\n", "# Introduction and presentation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib notebook\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "pd.options.display.max_rows = 8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Case 1: Survival on the Titanic" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "file_sizes: 100%|██████████████████████████| 61.2k/61.2k [00:00<00:00, 1.38MB/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading data from http://josephsalmon.eu/enseignement/datasets/titanic.csv (60 kB)\n", "\n", "Successfully downloaded file to ./titanic.csv\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "from download import download\n", "\n", "url = \"http://josephsalmon.eu/enseignement/datasets/titanic.csv\"\n", "path_target = \"./titanic.csv\"\n", "# replace=False: the file is not downloaded again if it already exists\n", "download(url, path_target, replace=False)\n", "\n", "# df stands for DataFrame\n", "df_titanic_raw = pd.read_csv(path_target)" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
       PassengerId    Survived      Pclass         Age       SibSp       Parch        Fare
count   183.000000  183.000000  183.000000  183.000000  183.000000  183.000000  183.000000
mean    455.366120    0.672131    1.191257   35.674426    0.464481    0.475410   78.682469
std     247.052476    0.470725    0.515187   15.643866    0.644159    0.754617   76.347843
min       2.000000    0.000000    1.000000    0.920000    0.000000    0.000000    0.000000
25%     263.500000    0.000000    1.000000   24.000000    0.000000    0.000000   29.700000
50%     457.000000    1.000000    1.000000   36.000000    0.000000    0.000000   57.000000
75%     676.000000    1.000000    1.000000   47.500000    1.000000    1.000000   90.000000
max     890.000000    1.000000    3.000000   80.000000    3.000000    4.000000  512.329200
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Age SibSp \\\n", "count 183.000000 183.000000 183.000000 183.000000 183.000000 \n", "mean 455.366120 0.672131 1.191257 35.674426 0.464481 \n", "std 247.052476 0.470725 0.515187 15.643866 0.644159 \n", "min 2.000000 0.000000 1.000000 0.920000 0.000000 \n", "25% 263.500000 0.000000 1.000000 24.000000 0.000000 \n", "50% 457.000000 1.000000 1.000000 36.000000 0.000000 \n", "75% 676.000000 1.000000 1.000000 47.500000 1.000000 \n", "max 890.000000 1.000000 3.000000 80.000000 3.000000 \n", "\n", " Parch Fare \n", "count 183.000000 183.000000 \n", "mean 0.475410 78.682469 \n", "std 0.754617 76.347843 \n", "min 0.000000 0.000000 \n", "25% 0.000000 29.700000 \n", "50% 0.000000 57.000000 \n", "75% 1.000000 90.000000 \n", "max 4.000000 512.329200 " ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Only the last expression of a cell is displayed: here, describe()\n", "df_titanic_raw.tail(n=3)\n", "df_titanic_raw.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Missing values:\n", "To simplify what follows, we keep only the complete observations, i.e. we drop every row that contains a missing value." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
     PassengerId  Survived  Pclass  Name                                           Sex     Age   SibSp  Parch  Ticket  Fare     Cabin  Embarked
879  880          1         1       Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)  female  56.0  0      1      11767   83.1583  C50    C
887  888          1         1       Graham, Miss. Margaret Edith                   female  19.0  0      0      112053  30.0000  B42    S
889  890          1         1       Behr, Mr. Karl Howell                          male    26.0  0      0      111369  30.0000  C148   C
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "879 880 1 1 \n", "887 888 1 1 \n", "889 890 1 1 \n", "\n", " Name Sex Age SibSp \\\n", "879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "879 1 11767 83.1583 C50 C \n", "887 0 112053 30.0000 B42 S \n", "889 0 111369 30.0000 C148 C " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dropna() drops every row that has at least one missing value\n", "df_titanic = df_titanic_raw.dropna()\n", "df_titanic.tail(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Brief description of the variables:\n", "- Survived - Survival (0 = No; 1 = Yes)\n", "- Pclass - Passenger class (1 = 1st; 2 = 2nd; 3 = 3rd)\n", "- Name - Name\n", "- Sex - Sex\n", "- Age - Age\n", "- SibSp - Number of siblings / spouses aboard\n", "- Parch - Number of parents / children aboard\n", "- Ticket - Ticket number\n", "- Fare - Ticket price (British pounds)\n", "- Cabin - Cabin\n", "- Embarked - Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Quick summary statistics:\n", "- count - number of observations\n", "- mean - mean\n", "- std - **st**andard **d**eviation\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
       PassengerId    Survived      Pclass         Age       SibSp       Parch        Fare
count   183.000000  183.000000  183.000000  183.000000  183.000000  183.000000  183.000000
mean    455.366120    0.672131    1.191257   35.674426    0.464481    0.475410   78.682469
std     247.052476    0.470725    0.515187   15.643866    0.644159    0.754617   76.347843
min       2.000000    0.000000    1.000000    0.920000    0.000000    0.000000    0.000000
25%     263.500000    0.000000    1.000000   24.000000    0.000000    0.000000   29.700000
50%     457.000000    1.000000    1.000000   36.000000    0.000000    0.000000   57.000000
75%     676.000000    1.000000    1.000000   47.500000    1.000000    1.000000   90.000000
max     890.000000    1.000000    3.000000   80.000000    3.000000    4.000000  512.329200
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Age SibSp \\\n", "count 183.000000 183.000000 183.000000 183.000000 183.000000 \n", "mean 455.366120 0.672131 1.191257 35.674426 0.464481 \n", "std 247.052476 0.470725 0.515187 15.643866 0.644159 \n", "min 2.000000 0.000000 1.000000 0.920000 0.000000 \n", "25% 263.500000 0.000000 1.000000 24.000000 0.000000 \n", "50% 457.000000 1.000000 1.000000 36.000000 0.000000 \n", "75% 676.000000 1.000000 1.000000 47.500000 1.000000 \n", "max 890.000000 1.000000 3.000000 80.000000 3.000000 \n", "\n", " Parch Fare \n", "count 183.000000 183.000000 \n", "mean 0.475410 78.682469 \n", "std 0.754617 76.347843 \n", "min 0.000000 0.000000 \n", "25% 0.000000 29.700000 \n", "50% 0.000000 57.000000 \n", "75% 1.000000 90.000000 \n", "max 4.000000 512.329200 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Understanding and visualizing the dataset:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**What is the age distribution of the passengers?**" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Text(0.5,1,\"Density estimate of passenger age\")" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plt.figure()\n", "ax = sns.kdeplot(df_titanic['Age'], shade=True, cut=0, bw=3)\n", "# Age is on the horizontal axis, the estimated density on the vertical one\n", "plt.xlabel('Age')\n", "plt.ylabel('Density')\n", "ax.legend().set_visible(False)\n", "plt.title(\"Density estimate of passenger age\")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ax = sns.kdeplot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**How does the passengers' survival rate differ between the sexes?**" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
        Survived
Sex
female  0.742038
male    0.188908
\n", "
" ], "text/plain": [ " Survived\n", "Sex \n", "female 0.742038\n", "male 0.188908" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Mean of the 0/1 Survived column = survival rate for each sex\n", "df_titanic_raw.groupby('Sex')[['Survived']].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Or how does it differ between the classes?**" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',\n", " 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],\n", " dtype='object')" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic.columns" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Violin plot of age for each class, split by sex\n", "sns.catplot(x=\"Pclass\", y=\"Age\",\n", " hue=\"Sex\", data=df_titanic_raw, kind=\"violin\", legend=False)\n", "plt.title(\"Age distribution by class and sex\")\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Case 2: air quality" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Replace is False and data exists, so doing nothing. 
Use replace==True to re-download the data.\n" ] }, { "data": { "text/plain": [ "'./20080421_20160927-PA13_auto.csv'" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "url = \"http://josephsalmon.eu/enseignement/datasets/20080421_20160927-PA13_auto.csv\"\n", "path_target = \"./20080421_20160927-PA13_auto.csv\"\n", "download(url, path_target, replace=False)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " NO2 O3\n", "DateTime \n", "2008-04-21 00:00:00 28.0 36.0\n", "2008-04-21 01:00:00 13.0 74.0\n", "2008-04-21 02:00:00 11.0 73.0\n", "2008-04-21 03:00:00 13.0 64.0\n", "2008-04-21 04:00:00 23.0 46.0" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "polution_df = pd.read_csv('20080421_20160927-PA13_auto.csv', sep=';',\n", " comment='#', na_values=\"n/d\",\n", " converters={'heure': str})\n", "# check issues with 24:00:\n", "# https://www.tutorialspoint.com/python/time_strptime.htm\n", "\n", "\n", "# Preprocessing: the %H directive only accepts hours 0-23, so hour '24'\n", "# is rewritten as '0' (note: this keeps the same calendar date).\n", "polution_df['heure'] = polution_df['heure'].replace('24', '0')\n", "time_improved = pd.to_datetime(polution_df['date'] +\n", " ' ' + polution_df['heure'] + ':00',\n", " format='%d/%m/%Y %H:%M')\n", "\n", "# Replace the separate date and hour columns by a single timestamp\n", "polution_df['DateTime'] = time_improved\n", "del polution_df['heure']\n", "del polution_df['date']\n", "\n", "# Index by timestamp to get a time series, sorted chronologically\n", "polution_ts = polution_df.set_index(['DateTime'])\n", "polution_ts = polution_ts.sort_index()\n", "polution_ts.head()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " NO2 O3\n", "count 71008.000000 71452.000000\n", "mean 34.453414 39.610046\n", "std 20.380702 28.837333\n", "min 1.000000 0.000000\n", "25% 19.000000 16.000000\n", "50% 30.000000 38.000000\n", "75% 46.000000 58.000000\n", "max 167.000000 211.000000" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Pollution in Paris over the years. Source: Airparif\n", "polution_ts.describe()" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Text(0.5,0,'Years')" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Resample by year ('A' stands for annual) and plot the yearly means\n", "ax = polution_ts['2008':].resample('A').mean().plot(figsize=(4,4))\n", "plt.ylim(0,50)\n", "plt.title(\"Pollution over time: \\n annual mean in Paris\")\n", "plt.ylabel(\"Concentration (µg/m³)\")\n", "plt.xlabel(\"Years\")" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Set the colour palette\n", "sns.set_palette(\"GnBu_d\", n_colors=7)\n", "polution_ts['weekday'] = polution_ts.index.weekday # Monday=0, Sunday=6\n", "\n", "# polution_ts['weekend'] = polution_ts['weekday'].isin([5, 6])\n", "\n", "days = ['Monday', 'Tuesday', 'Wednesday',\n", " 'Thursday', 'Friday', 'Saturday', 'Sunday']\n", "\n", "# Mean concentration per hour of the day (rows) and weekday (columns)\n", "polution_week_no2 = polution_ts.groupby(['weekday', polution_ts.index.hour])[\n", " 'NO2'].mean().unstack(level=0)\n", "polution_week_03 = polution_ts.groupby(['weekday', polution_ts.index.hour])[\n", " 'O3'].mean().unstack(level=0)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { 
"collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Int64Index([4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n", " ...\n", " 9, 9, 9, 9, 9, 9, 9, 9, 9, 9],\n", " dtype='int64', name='DateTime', length=73920)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "polution_ts.index.month" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import calendar\n", "\n", "sns.set_palette(\"GnBu_d\", n_colors=12)\n", "\n", "# Mean concentration per hour of the day (rows) and month (columns),\n", "# built the same way as the weekday profiles\n", "polution_ts['month'] = polution_ts.index.month\n", "polution_month_no2 = polution_ts.groupby(['month', polution_ts.index.hour])[\n", " 'NO2'].mean().unstack(level=0)\n", "polution_month_03 = polution_ts.groupby(['month', polution_ts.index.hour])[\n", " 'O3'].mean().unstack(level=0)\n", "\n", "fig, axes = plt.subplots(2, 1, figsize=(7, 7), sharex=True)\n", "\n", "polution_month_no2.plot(ax=axes[0])\n", "axes[0].set_ylabel(\"Concentration (µg/m³)\")\n", "axes[0].set_xlabel(\"Hour of the day\")\n", "axes[0].set_title(\n", " \"Daily NO2 pollution profile: a seasonal effect?\")\n", "axes[0].set_xticks(np.arange(0, 24))\n", "axes[0].set_xticklabels(np.arange(0, 24), rotation=45)\n", "axes[0].set_ylim(0, 90)\n", "\n", "polution_month_03.plot(ax=axes[1])\n", "axes[1].set_ylabel(\"Concentration (µg/m³)\")\n", "axes[1].set_xlabel(\"Hour of the day\")\n", "axes[1].set_title(\"Daily O3 pollution profile: a seasonal effect?\")\n", "axes[1].set_xticks(np.arange(0, 24))\n", "axes[1].set_xticklabels(np.arange(0, 24), rotation=45)\n", "axes[1].set_ylim(0, 90)\n", "axes[0].legend().set_visible(False)\n", "# ax.legend()\n", "axes[1].legend(labels=calendar.month_name[1:], loc='lower left', bbox_to_anchor=(1, 0.1))\n", "\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas: analysing data with Python" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "For data-intensive work in Python, the pandas library has become essential.\n", "\n", "What is pandas? It is a library built around the DataFrame:\n", "\n", "- A pandas *DataFrame* can be seen as a *numpy* array with labels on its rows and columns, with support for heterogeneous data types.\n", "- It can also be seen as the Python counterpart of R's data.frame.\n", "- It is powerful for handling missing data, working with time series, reading and writing data, and reshaping, grouping and merging data, ...\n", "\n", "Documentation: http://pandas.pydata.org/pandas-docs/stable/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When do you need pandas?\n", "When working with tables or tabular data structures (such as an R data.frame, an SQL table, an Excel spreadsheet, ...):\n", "\n", "- Importing data\n", "- Cleaning \"messy\" data\n", "- Exploring and understanding data\n", "- Processing and preparing data for analysis\n", "- Analysing data (together with scikit-learn, statsmodels, ...)\n", "
\n", "
\n", "\n", "**WARNING / LIMITS:**\n", "\n", "Pandas is well suited to heterogeneous data and 1D/2D tables, but not every kind of data fits these structures!\n", "\n", "Counter-examples:\n", "- When working with **array** data (e.g. images): use *numpy*\n", "- For labeled multidimensional data (e.g. climate data): see [xarray](http://xarray.pydata.org/en/stable/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data structures in pandas: DataFrame and Series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A DataFrame is a tabular data structure (a multi-dimensional object that can hold labeled data) made of rows and columns, similar to a spreadsheet, a database table, or R's data.frame object. You can think of it as several Series objects sharing the same index." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "1 2 1 1 \n", "3 4 1 1 \n", "6 7 0 1 \n", "10 11 1 3 \n", ".. ... ... ... \n", "872 873 0 1 \n", "879 880 1 1 \n", "887 888 1 1 \n", "889 890 1 1 \n", "\n", " Name Sex Age SibSp \\\n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "6 McCarthy, Mr. Timothy J male 54.0 0 \n", "10 Sandstrom, Miss. Marguerite Rut female 4.0 1 \n", ".. ... ... ... ... \n", "872 Carlsson, Mr. Frans Olof male 33.0 0 \n", "879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "1 0 PC 17599 71.2833 C85 C \n", "3 0 113803 53.1000 C123 S \n", "6 0 17463 51.8625 E46 S \n", "10 1 PP 9549 16.7000 G6 S \n", ".. ... ... ... ... ... \n", "872 0 695 5.0000 B51 B53 B55 S \n", "879 1 11767 83.1583 C50 C \n", "887 0 112053 30.0000 B42 S \n", "889 0 111369 30.0000 C148 C \n", "\n", "[183 rows x 12 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Int64Index([ 1, 3, 6, 10, 11, 21, 23, 27, 52, 54,\n", " ...\n", " 835, 853, 857, 862, 867, 871, 872, 879, 887, 889],\n", " dtype='int64', length=183)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic.index" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',\n", " 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],\n", " dtype='object')" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic.columns" ] }, { 
"cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "PassengerId int64\n", "Survived int64\n", "Pclass int64\n", "Name object\n", " ... \n", "Ticket object\n", "Fare float64\n", "Cabin object\n", "Embarked object\n", "Length: 12, dtype: object" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic.dtypes" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 183 entries, 1 to 889\n", "Data columns (total 12 columns):\n", "PassengerId 183 non-null int64\n", "Survived 183 non-null int64\n", "Pclass 183 non-null int64\n", "Name 183 non-null object\n", "Sex 183 non-null object\n", "Age 183 non-null float64\n", "SibSp 183 non-null int64\n", "Parch 183 non-null int64\n", "Ticket 183 non-null object\n", "Fare 183 non-null float64\n", "Cabin 183 non-null object\n", "Embarked 183 non-null object\n", "dtypes: float64(2), int64(5), object(5)\n", "memory usage: 23.6+ KB\n" ] } ], "source": [ "df_titanic.info()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 891 entries, 0 to 890\n", "Data columns (total 12 columns):\n", "PassengerId 891 non-null int64\n", "Survived 891 non-null int64\n", "Pclass 891 non-null int64\n", "Name 891 non-null object\n", "Sex 891 non-null object\n", "Age 714 non-null float64\n", "SibSp 891 non-null int64\n", "Parch 891 non-null int64\n", "Ticket 891 non-null object\n", "Fare 891 non-null float64\n", "Cabin 204 non-null object\n", "Embarked 889 non-null object\n", "dtypes: float64(2), int64(5), object(5)\n", "memory usage: 83.6+ KB\n" ] } ], "source": [ "# Cabin is the least complete column (204 non-null values out of 891), followed by Age (714)\n", "df_titanic_raw.info()" ] }, { "cell_type": 
"code", "execution_count": 140, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[2, 1, 1, ..., 71.2833, 'C85', 'C'],\n", " [4, 1, 1, ..., 53.1, 'C123', 'S'],\n", " [7, 0, 1, ..., 51.8625, 'E46', 'S'],\n", " ...,\n", " [880, 1, 1, ..., 83.1583, 'C50', 'C'],\n", " [888, 1, 1, ..., 30.0, 'B42', 'S'],\n", " [890, 1, 1, ..., 30.0, 'C148', 'C']], dtype=object)" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "array_titanic = df_titanic.values # the underlying values, as a numpy array\n", "array_titanic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# One-dimensional data: Series (one column of a DataFrame)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A Series is the basic container for one-dimensional labeled data." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fare = df_titanic['Fare']" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 71.2833\n", "3 53.1000\n", "6 51.8625\n", "10 16.7000\n", " ... \n", "872 5.0000\n", "879 83.1583\n", "887 30.0000\n", "889 30.0000\n", "Name: Fare, Length: 183, dtype: float64" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fare" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Attributes of a *Series* object: index and values" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 71.2833, 53.1 , 51.8625, 16.7 , 26.55 , 13. ,\n", " 35.5 , 263. 
, 76.7292, 61.9792])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fare.values[:10]" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "51.8625" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fare[6] # label 6 exists, but fare[0] raises a KeyError: row 0 was dropped from the DataFrame because it contained missing values.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unlike a *numpy* array, this index can be something other than an integer:" ] }, { "cell_type": "code", "execution_count": 142, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " PassengerId Survived \\\n", "Name \n", "Cumings, Mrs. John Bradley (Florence Briggs Tha... 2 1 \n", "Futrelle, Mrs. Jacques Heath (Lily May Peel) 4 1 \n", "McCarthy, Mr. Timothy J 7 0 \n", "Sandstrom, Miss. Marguerite Rut 11 1 \n", "... ... ... \n", "Carlsson, Mr. Frans Olof 873 0 \n", "Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 880 1 \n", "Graham, Miss. Margaret Edith 888 1 \n", "Behr, Mr. Karl Howell 890 1 \n", "\n", " Pclass Age SibSp \\\n", "Name \n", "Cumings, Mrs. John Bradley (Florence Briggs Tha... 1 38.0 1 \n", "Futrelle, Mrs. Jacques Heath (Lily May Peel) 1 35.0 1 \n", "McCarthy, Mr. Timothy J 1 54.0 0 \n", "Sandstrom, Miss. Marguerite Rut 3 4.0 1 \n", "... ... ... ... \n", "Carlsson, Mr. Frans Olof 1 33.0 0 \n", "Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 1 56.0 0 \n", "Graham, Miss. Margaret Edith 1 19.0 0 \n", "Behr, Mr. Karl Howell 1 26.0 0 \n", "\n", " Parch Ticket Fare \\\n", "Name \n", "Cumings, Mrs. John Bradley (Florence Briggs Tha... 0 PC 17599 71.2833 \n", "Futrelle, Mrs. Jacques Heath (Lily May Peel) 0 113803 53.1000 \n", "McCarthy, Mr. Timothy J 0 17463 51.8625 \n", "Sandstrom, Miss. Marguerite Rut 1 PP 9549 16.7000 \n", "... ... ... ... \n", "Carlsson, Mr. Frans Olof 0 695 5.0000 \n", "Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 1 11767 83.1583 \n", "Graham, Miss. Margaret Edith 0 112053 30.0000 \n", "Behr, Mr. Karl Howell 0 111369 30.0000 \n", "\n", " Cabin Embarked \n", "Name \n", "Cumings, Mrs. John Bradley (Florence Briggs Tha... C85 C \n", "Futrelle, Mrs. Jacques Heath (Lily May Peel) C123 S \n", "McCarthy, Mr. Timothy J E46 S \n", "Sandstrom, Miss. Marguerite Rut G6 S \n", "... ... ... \n", "Carlsson, Mr. Frans Olof B51 B53 B55 S \n", "Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) C50 C \n", "Graham, Miss. Margaret Edith B42 S \n", "Behr, Mr. 
Karl Howell C148 C \n", "\n", "[183 rows x 10 columns]" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic = df_titanic.set_index('Name')\n", "df_titanic" ] }, { "cell_type": "code", "execution_count": 144, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "26.0" ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "age = df_titanic['Age']\n", "age['Behr, Mr. Karl Howell']" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "35.6744262295082" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "age.mean()" ] }, { "cell_type": "code", "execution_count": 146, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " PassengerId Survived Pclass Age SibSp \\\n", "Name \n", "Becker, Master. Richard F 184 1 2 1.00 2 \n", "Allison, Master. Hudson Trevor 306 1 1 0.92 1 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "Name \n", "Becker, Master. Richard F 1 230136 39.00 F4 S \n", "Allison, Master. Hudson Trevor 2 113781 151.55 C22 C26 S " ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic[age <2]" ] }, { "cell_type": "code", "execution_count": 148, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "S 644\n", "C 168\n", "Q 77\n", "Name: Embarked, dtype: int64" ] }, "execution_count": 148, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic_raw['Embarked'].value_counts()" ] }, { "cell_type": "code", "execution_count": 151, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " PassengerId Survived \\\n", "Name \n", "Cumings, Mrs. John Bradley (Florence Briggs Tha... 2 1 \n", "Harper, Mrs. Henry Sleeper (Myna Haxtun) 53 1 \n", "Ostby, Mr. Engelhart Cornelius 55 0 \n", "Goldschmidt, Mr. George B 97 0 \n", "Greenfield, Mr. William Bertram 98 1 \n", "Baxter, Mr. Quigg Edmond 119 0 \n", "Giglio, Mr. Victor 140 0 \n", "Smith, Mr. James Clinch 175 0 \n", "Isham, Miss. Ann Elizabeth 178 0 \n", "Brown, Mrs. James Joseph (Margaret Tobin) 195 1 \n", "Lurette, Miss. Elise 196 1 \n", "Blank, Mr. Henry 210 1 \n", "Newell, Miss. Madeleine 216 1 \n", "Bazzani, Miss. Albina 219 1 \n", "Natsch, Mr. Charles H 274 0 \n", "Bishop, Mrs. Dickinson H (Helen Walton) 292 1 \n", "Levy, Mr. Rene Jacques 293 0 \n", "Baxter, Mrs. James (Helene DeLaudeniere Chaput) 300 1 \n", "Penasco y Castellana, Mrs. Victor de Satode (Ma... 308 1 \n", "Francatelli, Miss. Laura Mabel 310 1 \n", "Hays, Miss. Margaret Bechstein 311 1 \n", "Ryerson, Miss. Emily Borie 312 1 \n", "Spedden, Mrs. Frederic Oakley (Margaretta Corni... 320 1 \n", "Young, Miss. Marie Grice 326 1 \n", "Hippach, Miss. Jean Gertrude 330 1 \n", "Burns, Miss. Elizabeth Margaret 338 1 \n", "Warren, Mrs. Frank Manley (Anna Sophia Atkinson) 367 1 \n", "Aubart, Mme. Leontine Pauline 370 1 \n", "Harder, Mr. George Achilles 371 1 \n", "Widener, Mr. Harry Elkins 378 0 \n", "Newell, Miss. Marjorie 394 1 \n", "Foreman, Mr. Benjamin Laventall 453 0 \n", "Goldenberg, Mr. Samuel L 454 1 \n", "Jerwan, Mrs. Amin S (Marie Marthe Thuillard) 474 1 \n", "Bishop, Mr. Dickinson H 485 1 \n", "Kent, Mr. Edward Austin 488 0 \n", "Eustis, Miss. Elizabeth Mussey 497 1 \n", "Penasco y Castellana, Mr. Victor de Satode 506 0 \n", "Hippach, Mrs. Louis Albert (Ida Sophia Fischer) 524 1 \n", "Frolicher, Miss. Hedwig Margaritha 540 1 \n", "Douglas, Mr. Walter Donald 545 0 \n", "Thayer, Mr. John Borland Jr 551 1 \n", "Duff Gordon, Lady. (Lucille Christiana Sutherla... 557 1 \n", "Thayer, Mrs. 
John Borland (Marian Longstreth Mo... 582 1 \n", "Ross, Mr. John Hugo 584 0 \n", "Frolicher-Stehli, Mr. Maxmillian 588 1 \n", "Stephenson, Mrs. Walter Bertram (Martha Eustis) 592 1 \n", "Duff Gordon, Sir. Cosmo Edmund (\"Mr Morgan\") 600 1 \n", "Stahelin-Maeglin, Dr. Max 633 1 \n", "Sagesser, Mlle. Emma 642 1 \n", "Harper, Mr. Henry Sleeper 646 1 \n", "Simonius-Blumer, Col. Oberst Alfons 648 1 \n", "Newell, Mr. Arthur Webster 660 0 \n", "Cardeza, Mr. Thomas Drake Martinez 680 1 \n", "Hassab, Mr. Hammad 682 1 \n", "Thayer, Mr. John Borland 699 0 \n", "Astor, Mrs. John Jacob (Madeleine Talmadge Force) 701 1 \n", "Mayne, Mlle. Berthe Antonine (\"Mrs de Villiers\") 711 1 \n", "Endres, Miss. Caroline Louise 717 1 \n", "Lesurer, Mr. Gustave J 738 1 \n", "Ryerson, Miss. Susan Parker \"Suzette\" 743 1 \n", "Guggenheim, Mr. Benjamin 790 0 \n", "Compton, Miss. Sara Rebecca 836 1 \n", "Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 880 1 \n", "Behr, Mr. Karl Howell 890 1 \n", "\n", " Pclass Age SibSp \\\n", "Name \n", "Cumings, Mrs. John Bradley (Florence Briggs Tha... 1 38.0 1 \n", "Harper, Mrs. Henry Sleeper (Myna Haxtun) 1 49.0 1 \n", "Ostby, Mr. Engelhart Cornelius 1 65.0 0 \n", "Goldschmidt, Mr. George B 1 71.0 0 \n", "Greenfield, Mr. William Bertram 1 23.0 0 \n", "Baxter, Mr. Quigg Edmond 1 24.0 0 \n", "Giglio, Mr. Victor 1 24.0 0 \n", "Smith, Mr. James Clinch 1 56.0 0 \n", "Isham, Miss. Ann Elizabeth 1 50.0 0 \n", "Brown, Mrs. James Joseph (Margaret Tobin) 1 44.0 0 \n", "Lurette, Miss. Elise 1 58.0 0 \n", "Blank, Mr. Henry 1 40.0 0 \n", "Newell, Miss. Madeleine 1 31.0 1 \n", "Bazzani, Miss. Albina 1 32.0 0 \n", "Natsch, Mr. Charles H 1 37.0 0 \n", "Bishop, Mrs. Dickinson H (Helen Walton) 1 19.0 1 \n", "Levy, Mr. Rene Jacques 2 36.0 0 \n", "Baxter, Mrs. James (Helene DeLaudeniere Chaput) 1 50.0 0 \n", "Penasco y Castellana, Mrs. Victor de Satode (Ma... 1 17.0 1 \n", "Francatelli, Miss. Laura Mabel 1 30.0 0 \n", "Hays, Miss. 
Margaret Bechstein 1 24.0 0 \n", "Ryerson, Miss. Emily Borie 1 18.0 2 \n", "Spedden, Mrs. Frederic Oakley (Margaretta Corni... 1 40.0 1 \n", "Young, Miss. Marie Grice 1 36.0 0 \n", "Hippach, Miss. Jean Gertrude 1 16.0 0 \n", "Burns, Miss. Elizabeth Margaret 1 41.0 0 \n", "Warren, Mrs. Frank Manley (Anna Sophia Atkinson) 1 60.0 1 \n", "Aubart, Mme. Leontine Pauline 1 24.0 0 \n", "Harder, Mr. George Achilles 1 25.0 1 \n", "Widener, Mr. Harry Elkins 1 27.0 0 \n", "Newell, Miss. Marjorie 1 23.0 1 \n", "Foreman, Mr. Benjamin Laventall 1 30.0 0 \n", "Goldenberg, Mr. Samuel L 1 49.0 1 \n", "Jerwan, Mrs. Amin S (Marie Marthe Thuillard) 2 23.0 0 \n", "Bishop, Mr. Dickinson H 1 25.0 1 \n", "Kent, Mr. Edward Austin 1 58.0 0 \n", "Eustis, Miss. Elizabeth Mussey 1 54.0 1 \n", "Penasco y Castellana, Mr. Victor de Satode 1 18.0 1 \n", "Hippach, Mrs. Louis Albert (Ida Sophia Fischer) 1 44.0 0 \n", "Frolicher, Miss. Hedwig Margaritha 1 22.0 0 \n", "Douglas, Mr. Walter Donald 1 50.0 1 \n", "Thayer, Mr. John Borland Jr 1 17.0 0 \n", "Duff Gordon, Lady. (Lucille Christiana Sutherla... 1 48.0 1 \n", "Thayer, Mrs. John Borland (Marian Longstreth Mo... 1 39.0 1 \n", "Ross, Mr. John Hugo 1 36.0 0 \n", "Frolicher-Stehli, Mr. Maxmillian 1 60.0 1 \n", "Stephenson, Mrs. Walter Bertram (Martha Eustis) 1 52.0 1 \n", "Duff Gordon, Sir. Cosmo Edmund (\"Mr Morgan\") 1 49.0 1 \n", "Stahelin-Maeglin, Dr. Max 1 32.0 0 \n", "Sagesser, Mlle. Emma 1 24.0 0 \n", "Harper, Mr. Henry Sleeper 1 48.0 1 \n", "Simonius-Blumer, Col. Oberst Alfons 1 56.0 0 \n", "Newell, Mr. Arthur Webster 1 58.0 0 \n", "Cardeza, Mr. Thomas Drake Martinez 1 36.0 0 \n", "Hassab, Mr. Hammad 1 27.0 0 \n", "Thayer, Mr. John Borland 1 49.0 1 \n", "Astor, Mrs. John Jacob (Madeleine Talmadge Force) 1 18.0 1 \n", "Mayne, Mlle. Berthe Antonine (\"Mrs de Villiers\") 1 24.0 0 \n", "Endres, Miss. Caroline Louise 1 38.0 0 \n", "Lesurer, Mr. Gustave J 1 35.0 0 \n", "Ryerson, Miss. Susan Parker \"Suzette\" 1 21.0 2 \n", "Guggenheim, Mr. 
Benjamin 1 46.0 0 \n", "Compton, Miss. Sara Rebecca 1 39.0 1 \n", "Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 1 56.0 0 \n", "Behr, Mr. Karl Howell 1 26.0 0 \n", "\n", " Parch Ticket \\\n", "Name \n", "Cumings, Mrs. John Bradley (Florence Briggs Tha... 0 PC 17599 \n", "Harper, Mrs. Henry Sleeper (Myna Haxtun) 0 PC 17572 \n", "Ostby, Mr. Engelhart Cornelius 1 113509 \n", "Goldschmidt, Mr. George B 0 PC 17754 \n", "Greenfield, Mr. William Bertram 1 PC 17759 \n", "Baxter, Mr. Quigg Edmond 1 PC 17558 \n", "Giglio, Mr. Victor 0 PC 17593 \n", "Smith, Mr. James Clinch 0 17764 \n", "Isham, Miss. Ann Elizabeth 0 PC 17595 \n", "Brown, Mrs. James Joseph (Margaret Tobin) 0 PC 17610 \n", "Lurette, Miss. Elise 0 PC 17569 \n", "Blank, Mr. Henry 0 112277 \n", "Newell, Miss. Madeleine 0 35273 \n", "Bazzani, Miss. Albina 0 11813 \n", "Natsch, Mr. Charles H 1 PC 17596 \n", "Bishop, Mrs. Dickinson H (Helen Walton) 0 11967 \n", "Levy, Mr. Rene Jacques 0 SC/Paris 2163 \n", "Baxter, Mrs. James (Helene DeLaudeniere Chaput) 1 PC 17558 \n", "Penasco y Castellana, Mrs. Victor de Satode (Ma... 0 PC 17758 \n", "Francatelli, Miss. Laura Mabel 0 PC 17485 \n", "Hays, Miss. Margaret Bechstein 0 11767 \n", "Ryerson, Miss. Emily Borie 2 PC 17608 \n", "Spedden, Mrs. Frederic Oakley (Margaretta Corni... 1 16966 \n", "Young, Miss. Marie Grice 0 PC 17760 \n", "Hippach, Miss. Jean Gertrude 1 111361 \n", "Burns, Miss. Elizabeth Margaret 0 16966 \n", "Warren, Mrs. Frank Manley (Anna Sophia Atkinson) 0 110813 \n", "Aubart, Mme. Leontine Pauline 0 PC 17477 \n", "Harder, Mr. George Achilles 0 11765 \n", "Widener, Mr. Harry Elkins 2 113503 \n", "Newell, Miss. Marjorie 0 35273 \n", "Foreman, Mr. Benjamin Laventall 0 113051 \n", "Goldenberg, Mr. Samuel L 0 17453 \n", "Jerwan, Mrs. Amin S (Marie Marthe Thuillard) 0 SC/AH Basle 541 \n", "Bishop, Mr. Dickinson H 0 11967 \n", "Kent, Mr. Edward Austin 0 11771 \n", "Eustis, Miss. Elizabeth Mussey 0 36947 \n", "Penasco y Castellana, Mr. 
Victor de Satode 0 PC 17758 \n", "Hippach, Mrs. Louis Albert (Ida Sophia Fischer) 1 111361 \n", "Frolicher, Miss. Hedwig Margaritha 2 13568 \n", "Douglas, Mr. Walter Donald 0 PC 17761 \n", "Thayer, Mr. John Borland Jr 2 17421 \n", "Duff Gordon, Lady. (Lucille Christiana Sutherla... 0 11755 \n", "Thayer, Mrs. John Borland (Marian Longstreth Mo... 1 17421 \n", "Ross, Mr. John Hugo 0 13049 \n", "Frolicher-Stehli, Mr. Maxmillian 1 13567 \n", "Stephenson, Mrs. Walter Bertram (Martha Eustis) 0 36947 \n", "Duff Gordon, Sir. Cosmo Edmund (\"Mr Morgan\") 0 PC 17485 \n", "Stahelin-Maeglin, Dr. Max 0 13214 \n", "Sagesser, Mlle. Emma 0 PC 17477 \n", "Harper, Mr. Henry Sleeper 0 PC 17572 \n", "Simonius-Blumer, Col. Oberst Alfons 0 13213 \n", "Newell, Mr. Arthur Webster 2 35273 \n", "Cardeza, Mr. Thomas Drake Martinez 1 PC 17755 \n", "Hassab, Mr. Hammad 0 PC 17572 \n", "Thayer, Mr. John Borland 1 17421 \n", "Astor, Mrs. John Jacob (Madeleine Talmadge Force) 0 PC 17757 \n", "Mayne, Mlle. Berthe Antonine (\"Mrs de Villiers\") 0 PC 17482 \n", "Endres, Miss. Caroline Louise 0 PC 17757 \n", "Lesurer, Mr. Gustave J 0 PC 17755 \n", "Ryerson, Miss. Susan Parker \"Suzette\" 2 PC 17608 \n", "Guggenheim, Mr. Benjamin 0 PC 17593 \n", "Compton, Miss. Sara Rebecca 1 PC 17756 \n", "Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 1 11767 \n", "Behr, Mr. Karl Howell 0 111369 \n", "\n", " Fare Cabin \\\n", "Name \n", "Cumings, Mrs. John Bradley (Florence Briggs Tha... 71.2833 C85 \n", "Harper, Mrs. Henry Sleeper (Myna Haxtun) 76.7292 D33 \n", "Ostby, Mr. Engelhart Cornelius 61.9792 B30 \n", "Goldschmidt, Mr. George B 34.6542 A5 \n", "Greenfield, Mr. William Bertram 63.3583 D10 D12 \n", "Baxter, Mr. Quigg Edmond 247.5208 B58 B60 \n", "Giglio, Mr. Victor 79.2000 B86 \n", "Smith, Mr. James Clinch 30.6958 A7 \n", "Isham, Miss. Ann Elizabeth 28.7125 C49 \n", "Brown, Mrs. James Joseph (Margaret Tobin) 27.7208 B4 \n", "Lurette, Miss. Elise 146.5208 B80 \n", "Blank, Mr. 
Henry 31.0000 A31 \n", "Newell, Miss. Madeleine 113.2750 D36 \n", "Bazzani, Miss. Albina 76.2917 D15 \n", "Natsch, Mr. Charles H 29.7000 C118 \n", "Bishop, Mrs. Dickinson H (Helen Walton) 91.0792 B49 \n", "Levy, Mr. Rene Jacques 12.8750 D \n", "Baxter, Mrs. James (Helene DeLaudeniere Chaput) 247.5208 B58 B60 \n", "Penasco y Castellana, Mrs. Victor de Satode (Ma... 108.9000 C65 \n", "Francatelli, Miss. Laura Mabel 56.9292 E36 \n", "Hays, Miss. Margaret Bechstein 83.1583 C54 \n", "Ryerson, Miss. Emily Borie 262.3750 B57 B59 B63 B66 \n", "Spedden, Mrs. Frederic Oakley (Margaretta Corni... 134.5000 E34 \n", "Young, Miss. Marie Grice 135.6333 C32 \n", "Hippach, Miss. Jean Gertrude 57.9792 B18 \n", "Burns, Miss. Elizabeth Margaret 134.5000 E40 \n", "Warren, Mrs. Frank Manley (Anna Sophia Atkinson) 75.2500 D37 \n", "Aubart, Mme. Leontine Pauline 69.3000 B35 \n", "Harder, Mr. George Achilles 55.4417 E50 \n", "Widener, Mr. Harry Elkins 211.5000 C82 \n", "Newell, Miss. Marjorie 113.2750 D36 \n", "Foreman, Mr. Benjamin Laventall 27.7500 C111 \n", "Goldenberg, Mr. Samuel L 89.1042 C92 \n", "Jerwan, Mrs. Amin S (Marie Marthe Thuillard) 13.7917 D \n", "Bishop, Mr. Dickinson H 91.0792 B49 \n", "Kent, Mr. Edward Austin 29.7000 B37 \n", "Eustis, Miss. Elizabeth Mussey 78.2667 D20 \n", "Penasco y Castellana, Mr. Victor de Satode 108.9000 C65 \n", "Hippach, Mrs. Louis Albert (Ida Sophia Fischer) 57.9792 B18 \n", "Frolicher, Miss. Hedwig Margaritha 49.5000 B39 \n", "Douglas, Mr. Walter Donald 106.4250 C86 \n", "Thayer, Mr. John Borland Jr 110.8833 C70 \n", "Duff Gordon, Lady. (Lucille Christiana Sutherla... 39.6000 A16 \n", "Thayer, Mrs. John Borland (Marian Longstreth Mo... 110.8833 C68 \n", "Ross, Mr. John Hugo 40.1250 A10 \n", "Frolicher-Stehli, Mr. Maxmillian 79.2000 B41 \n", "Stephenson, Mrs. Walter Bertram (Martha Eustis) 78.2667 D20 \n", "Duff Gordon, Sir. Cosmo Edmund (\"Mr Morgan\") 56.9292 A20 \n", "Stahelin-Maeglin, Dr. Max 30.5000 B50 \n", "Sagesser, Mlle. 
Emma 69.3000 B35 \n", "Harper, Mr. Henry Sleeper 76.7292 D33 \n", "Simonius-Blumer, Col. Oberst Alfons 35.5000 A26 \n", "Newell, Mr. Arthur Webster 113.2750 D48 \n", "Cardeza, Mr. Thomas Drake Martinez 512.3292 B51 B53 B55 \n", "Hassab, Mr. Hammad 76.7292 D49 \n", "Thayer, Mr. John Borland 110.8833 C68 \n", "Astor, Mrs. John Jacob (Madeleine Talmadge Force) 227.5250 C62 C64 \n", "Mayne, Mlle. Berthe Antonine (\"Mrs de Villiers\") 49.5042 C90 \n", "Endres, Miss. Caroline Louise 227.5250 C45 \n", "Lesurer, Mr. Gustave J 512.3292 B101 \n", "Ryerson, Miss. Susan Parker \"Suzette\" 262.3750 B57 B59 B63 B66 \n", "Guggenheim, Mr. Benjamin 79.2000 B82 B84 \n", "Compton, Miss. Sara Rebecca 83.1583 E49 \n", "Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 83.1583 C50 \n", "Behr, Mr. Karl Howell 30.0000 C148 \n", "\n", " Embarked \n", "Name \n", "Cumings, Mrs. John Bradley (Florence Briggs Tha... C \n", "Harper, Mrs. Henry Sleeper (Myna Haxtun) C \n", "Ostby, Mr. Engelhart Cornelius C \n", "Goldschmidt, Mr. George B C \n", "Greenfield, Mr. William Bertram C \n", "Baxter, Mr. Quigg Edmond C \n", "Giglio, Mr. Victor C \n", "Smith, Mr. James Clinch C \n", "Isham, Miss. Ann Elizabeth C \n", "Brown, Mrs. James Joseph (Margaret Tobin) C \n", "Lurette, Miss. Elise C \n", "Blank, Mr. Henry C \n", "Newell, Miss. Madeleine C \n", "Bazzani, Miss. Albina C \n", "Natsch, Mr. Charles H C \n", "Bishop, Mrs. Dickinson H (Helen Walton) C \n", "Levy, Mr. Rene Jacques C \n", "Baxter, Mrs. James (Helene DeLaudeniere Chaput) C \n", "Penasco y Castellana, Mrs. Victor de Satode (Ma... C \n", "Francatelli, Miss. Laura Mabel C \n", "Hays, Miss. Margaret Bechstein C \n", "Ryerson, Miss. Emily Borie C \n", "Spedden, Mrs. Frederic Oakley (Margaretta Corni... C \n", "Young, Miss. Marie Grice C \n", "Hippach, Miss. Jean Gertrude C \n", "Burns, Miss. Elizabeth Margaret C \n", "Warren, Mrs. Frank Manley (Anna Sophia Atkinson) C \n", "Aubart, Mme. Leontine Pauline C \n", "Harder, Mr. 
George Achilles C \n", "Widener, Mr. Harry Elkins C \n", "Newell, Miss. Marjorie C \n", "Foreman, Mr. Benjamin Laventall C \n", "Goldenberg, Mr. Samuel L C \n", "Jerwan, Mrs. Amin S (Marie Marthe Thuillard) C \n", "Bishop, Mr. Dickinson H C \n", "Kent, Mr. Edward Austin C \n", "Eustis, Miss. Elizabeth Mussey C \n", "Penasco y Castellana, Mr. Victor de Satode C \n", "Hippach, Mrs. Louis Albert (Ida Sophia Fischer) C \n", "Frolicher, Miss. Hedwig Margaritha C \n", "Douglas, Mr. Walter Donald C \n", "Thayer, Mr. John Borland Jr C \n", "Duff Gordon, Lady. (Lucille Christiana Sutherla... C \n", "Thayer, Mrs. John Borland (Marian Longstreth Mo... C \n", "Ross, Mr. John Hugo C \n", "Frolicher-Stehli, Mr. Maxmillian C \n", "Stephenson, Mrs. Walter Bertram (Martha Eustis) C \n", "Duff Gordon, Sir. Cosmo Edmund (\"Mr Morgan\") C \n", "Stahelin-Maeglin, Dr. Max C \n", "Sagesser, Mlle. Emma C \n", "Harper, Mr. Henry Sleeper C \n", "Simonius-Blumer, Col. Oberst Alfons C \n", "Newell, Mr. Arthur Webster C \n", "Cardeza, Mr. Thomas Drake Martinez C \n", "Hassab, Mr. Hammad C \n", "Thayer, Mr. John Borland C \n", "Astor, Mrs. John Jacob (Madeleine Talmadge Force) C \n", "Mayne, Mlle. Berthe Antonine (\"Mrs de Villiers\") C \n", "Endres, Miss. Caroline Louise C \n", "Lesurer, Mr. Gustave J C \n", "Ryerson, Miss. Susan Parker \"Suzette\" C \n", "Guggenheim, Mr. Benjamin C \n", "Compton, Miss. Sara Rebecca C \n", "Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) C \n", "Behr, Mr. Karl Howell C " ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.options.display.max_rows = 70\n", "df_titanic[df_titanic['Embarked'] == 'C']  # Passengers who boarded at Cherbourg do not have especially French-sounding names..."
] }, { "cell_type": "code", "execution_count": 152, "metadata": { "collapsed": true }, "outputs": [], "source": [ "pd.options.display.max_rows = 8" ] }, { "cell_type": "code", "execution_count": 157, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.3838383838383838" ] }, "execution_count": 157, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic_raw['Survived'].sum() / df_titanic_raw['Survived'].count()" ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.6721311475409836" ] }, "execution_count": 160, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic['Survived'].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**What was the proportion of women on the ship?**" ] }, { "cell_type": "code", "execution_count": 166, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Sex\n", "female 0.352413\n", "male 0.647587\n", "dtype: float64" ] }, "execution_count": 166, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic_raw.groupby(['Sex']).size() / df_titanic_raw['Sex'].count()" ] }, { "cell_type": "code", "execution_count": 165, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
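<table>\n",
"<tr><th>Sex</th><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Fare</th></tr>\n",
"<tr><th>female</th><td>431.028662</td><td>0.742038</td><td>2.159236</td><td>27.915709</td><td>0.694268</td><td>0.649682</td><td>44.479818</td></tr>\n",
"<tr><th>male</th><td>454.147314</td><td>0.188908</td><td>2.389948</td><td>30.726645</td><td>0.429809</td><td>0.235702</td><td>25.523893</td></tr>\n",
"</table>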
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Age SibSp Parch \\\n", "Sex \n", "female 431.028662 0.742038 2.159236 27.915709 0.694268 0.649682 \n", "male 454.147314 0.188908 2.389948 30.726645 0.429809 0.235702 \n", "\n", " Fare \n", "Sex \n", "female 44.479818 \n", "male 25.523893 " ] }, "execution_count": 165, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# numeric_only=True skips text columns such as Name and Ticket,\n", "# which recent pandas versions refuse to average.\n", "df_titanic_raw.groupby(['Sex']).mean(numeric_only=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data import and export\n", "\n", "pandas natively supports a wide range of input/output formats:\n", "- CSV, text\n", "- SQL databases\n", "- Excel\n", "- HDF5\n", "- JSON\n", "- HTML\n", "- pickle\n", "- SAS, Stata\n", "- ..." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# pd.read_csv?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploration" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
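<table>\n",
"<tr><th></th><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"<tr><th>886</th><td>887</td><td>0</td><td>2</td><td>Montvila, Rev. Juozas</td><td>male</td><td>27.0</td><td>0</td><td>0</td><td>211536</td><td>13.00</td><td>NaN</td><td>S</td></tr>\n",
"<tr><th>887</th><td>888</td><td>1</td><td>1</td><td>Graham, Miss. Margaret Edith</td><td>female</td><td>19.0</td><td>0</td><td>0</td><td>112053</td><td>30.00</td><td>B42</td><td>S</td></tr>\n",
"<tr><th>888</th><td>889</td><td>0</td><td>3</td><td>Johnston, Miss. Catherine Helen \"Carrie\"</td><td>female</td><td>NaN</td><td>1</td><td>2</td><td>W./C. 6607</td><td>23.45</td><td>NaN</td><td>S</td></tr>\n",
"<tr><th>889</th><td>890</td><td>1</td><td>1</td><td>Behr, Mr. Karl Howell</td><td>male</td><td>26.0</td><td>0</td><td>0</td><td>111369</td><td>30.00</td><td>C148</td><td>C</td></tr>\n",
"<tr><th>890</th><td>891</td><td>0</td><td>3</td><td>Dooley, Mr. Patrick</td><td>male</td><td>32.0</td><td>0</td><td>0</td><td>370376</td><td>7.75</td><td>NaN</td><td>Q</td></tr>\n",
"</table>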
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name \\\n", "886 887 0 2 Montvila, Rev. Juozas \n", "887 888 1 1 Graham, Miss. Margaret Edith \n", "888 889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" \n", "889 890 1 1 Behr, Mr. Karl Howell \n", "890 891 0 3 Dooley, Mr. Patrick \n", "\n", " Sex Age SibSp Parch Ticket Fare Cabin Embarked \n", "886 male 27.0 0 0 211536 13.00 NaN S \n", "887 female 19.0 0 0 112053 30.00 B42 S \n", "888 female NaN 1 2 W./C. 6607 23.45 NaN S \n", "889 male 26.0 0 0 111369 30.00 C148 C \n", "890 male 32.0 0 0 370376 7.75 NaN Q " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic_raw.tail()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
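<table>\n",
"<tr><th></th><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"<tr><th>0</th><td>1</td><td>0</td><td>3</td><td>Braund, Mr. Owen Harris</td><td>male</td><td>22.0</td><td>1</td><td>0</td><td>A/5 21171</td><td>7.2500</td><td>NaN</td><td>S</td></tr>\n",
"<tr><th>1</th><td>2</td><td>1</td><td>1</td><td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>PC 17599</td><td>71.2833</td><td>C85</td><td>C</td></tr>\n",
"<tr><th>2</th><td>3</td><td>1</td><td>3</td><td>Heikkinen, Miss. Laina</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>STON/O2. 3101282</td><td>7.9250</td><td>NaN</td><td>S</td></tr>\n",
"<tr><th>3</th><td>4</td><td>1</td><td>1</td><td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>113803</td><td>53.1000</td><td>C123</td><td>S</td></tr>\n",
"<tr><th>4</th><td>5</td><td>0</td><td>3</td><td>Allen, Mr. William Henry</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>373450</td><td>8.0500</td><td>NaN</td><td>S</td></tr>\n",
"</table>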
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_titanic_raw.head()" ] }, { "cell_type": "code", "execution_count": 169, "metadata": { "collapsed": false }, "outputs": [ { "data": { "application/javascript": [ "/* Put everything inside the global mpl namespace */\n", "window.mpl = {};\n", "\n", "\n", "mpl.get_websocket_type = function() {\n", " if (typeof(WebSocket) !== 'undefined') {\n", " return WebSocket;\n", " } else if (typeof(MozWebSocket) !== 'undefined') {\n", " return MozWebSocket;\n", " } else {\n", " alert('Your browser does not have WebSocket support.' +\n", " 'Please try Chrome, Safari or Firefox ≥ 6. ' +\n", " 'Firefox 4 and 5 are also supported but you ' +\n", " 'have to enable WebSockets in about:config.');\n", " };\n", "}\n", "\n", "mpl.figure = function(figure_id, websocket, ondownload, parent_element) {\n", " this.id = figure_id;\n", "\n", " this.ws = websocket;\n", "\n", " this.supports_binary = (this.ws.binaryType != undefined);\n", "\n", " if (!this.supports_binary) {\n", " var warnings = document.getElementById(\"mpl-warnings\");\n", " if (warnings) {\n", " warnings.style.display = 'block';\n", " warnings.textContent = (\n", " \"This browser does not support binary websocket messages. 
\" +\n", " \"Performance may be slow.\");\n", " }\n", " }\n", "\n", " this.imageObj = new Image();\n", "\n", " this.context = undefined;\n", " this.message = undefined;\n", " this.canvas = undefined;\n", " this.rubberband_canvas = undefined;\n", " this.rubberband_context = undefined;\n", " this.format_dropdown = undefined;\n", "\n", " this.image_mode = 'full';\n", "\n", " this.root = $('
');\n", " this._root_extra_style(this.root)\n", " this.root.attr('style', 'display: inline-block');\n", "\n", " $(parent_element).append(this.root);\n", "\n", " this._init_header(this);\n", " this._init_canvas(this);\n", " this._init_toolbar(this);\n", "\n", " var fig = this;\n", "\n", " this.waiting = false;\n", "\n", " this.ws.onopen = function () {\n", " fig.send_message(\"supports_binary\", {value: fig.supports_binary});\n", " fig.send_message(\"send_image_mode\", {});\n", " if (mpl.ratio != 1) {\n", " fig.send_message(\"set_dpi_ratio\", {'dpi_ratio': mpl.ratio});\n", " }\n", " fig.send_message(\"refresh\", {});\n", " }\n", "\n", " this.imageObj.onload = function() {\n", " if (fig.image_mode == 'full') {\n", " // Full images could contain transparency (where diff images\n", " // almost always do), so we need to clear the canvas so that\n", " // there is no ghosting.\n", " fig.context.clearRect(0, 0, fig.canvas.width, fig.canvas.height);\n", " }\n", " fig.context.drawImage(fig.imageObj, 0, 0);\n", " };\n", "\n", " this.imageObj.onunload = function() {\n", " fig.ws.close();\n", " }\n", "\n", " this.ws.onmessage = this._make_on_message_function(this);\n", "\n", " this.ondownload = ondownload;\n", "}\n", "\n", "mpl.figure.prototype._init_header = function() {\n", " var titlebar = $(\n", " '
');\n", " var titletext = $(\n", " '
');\n", " titlebar.append(titletext)\n", " this.root.append(titlebar);\n", " this.header = titletext[0];\n", "}\n", "\n", "\n", "\n", "mpl.figure.prototype._canvas_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "\n", "mpl.figure.prototype._root_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "mpl.figure.prototype._init_canvas = function() {\n", " var fig = this;\n", "\n", " var canvas_div = $('
');\n", "\n", " canvas_div.attr('style', 'position: relative; clear: both; outline: 0');\n", "\n", " function canvas_keyboard_event(event) {\n", " return fig.key_event(event, event['data']);\n", " }\n", "\n", " canvas_div.keydown('key_press', canvas_keyboard_event);\n", " canvas_div.keyup('key_release', canvas_keyboard_event);\n", " this.canvas_div = canvas_div\n", " this._canvas_extra_style(canvas_div)\n", " this.root.append(canvas_div);\n", "\n", " var canvas = $('');\n", " canvas.addClass('mpl-canvas');\n", " canvas.attr('style', \"left: 0; top: 0; z-index: 0; outline: 0\")\n", "\n", " this.canvas = canvas[0];\n", " this.context = canvas[0].getContext(\"2d\");\n", "\n", " var backingStore = this.context.backingStorePixelRatio ||\n", "\tthis.context.webkitBackingStorePixelRatio ||\n", "\tthis.context.mozBackingStorePixelRatio ||\n", "\tthis.context.msBackingStorePixelRatio ||\n", "\tthis.context.oBackingStorePixelRatio ||\n", "\tthis.context.backingStorePixelRatio || 1;\n", "\n", " mpl.ratio = (window.devicePixelRatio || 1) / backingStore;\n", "\n", " var rubberband = $('');\n", " rubberband.attr('style', \"position: absolute; left: 0; top: 0; z-index: 1;\")\n", "\n", " var pass_mouse_events = true;\n", "\n", " canvas_div.resizable({\n", " start: function(event, ui) {\n", " pass_mouse_events = false;\n", " },\n", " resize: function(event, ui) {\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " stop: function(event, ui) {\n", " pass_mouse_events = true;\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " });\n", "\n", " function mouse_event_fn(event) {\n", " if (pass_mouse_events)\n", " return fig.mouse_event(event, event['data']);\n", " }\n", "\n", " rubberband.mousedown('button_press', mouse_event_fn);\n", " rubberband.mouseup('button_release', mouse_event_fn);\n", " // Throttle sequential mouse events to 1 every 20ms.\n", " rubberband.mousemove('motion_notify', mouse_event_fn);\n", "\n", " 
rubberband.mouseenter('figure_enter', mouse_event_fn);\n", " rubberband.mouseleave('figure_leave', mouse_event_fn);\n", "\n", " canvas_div.on(\"wheel\", function (event) {\n", " event = event.originalEvent;\n", " event['data'] = 'scroll'\n", " if (event.deltaY < 0) {\n", " event.step = 1;\n", " } else {\n", " event.step = -1;\n", " }\n", " mouse_event_fn(event);\n", " });\n", "\n", " canvas_div.append(canvas);\n", " canvas_div.append(rubberband);\n", "\n", " this.rubberband = rubberband;\n", " this.rubberband_canvas = rubberband[0];\n", " this.rubberband_context = rubberband[0].getContext(\"2d\");\n", " this.rubberband_context.strokeStyle = \"#000000\";\n", "\n", " this._resize_canvas = function(width, height) {\n", " // Keep the size of the canvas, canvas container, and rubber band\n", " // canvas in synch.\n", " canvas_div.css('width', width)\n", " canvas_div.css('height', height)\n", "\n", " canvas.attr('width', width * mpl.ratio);\n", " canvas.attr('height', height * mpl.ratio);\n", " canvas.attr('style', 'width: ' + width + 'px; height: ' + height + 'px;');\n", "\n", " rubberband.attr('width', width);\n", " rubberband.attr('height', height);\n", " }\n", "\n", " // Set the figure to an initial 600x600px, this will subsequently be updated\n", " // upon first draw.\n", " this._resize_canvas(600, 600);\n", "\n", " // Disable right mouse context menu.\n", " $(this.rubberband_canvas).bind(\"contextmenu\",function(e){\n", " return false;\n", " });\n", "\n", " function set_focus () {\n", " canvas.focus();\n", " canvas_div.focus();\n", " }\n", "\n", " window.setTimeout(set_focus, 100);\n", "}\n", "\n", "mpl.figure.prototype._init_toolbar = function() {\n", " var fig = this;\n", "\n", " var nav_element = $('
')\n", " nav_element.attr('style', 'width: 100%');\n", " this.root.append(nav_element);\n", "\n", " // Define a callback function for later on.\n", " function toolbar_event(event) {\n", " return fig.toolbar_button_onclick(event['data']);\n", " }\n", " function toolbar_mouse_event(event) {\n", " return fig.toolbar_button_onmouseover(event['data']);\n", " }\n", "\n", " for(var toolbar_ind in mpl.toolbar_items) {\n", " var name = mpl.toolbar_items[toolbar_ind][0];\n", " var tooltip = mpl.toolbar_items[toolbar_ind][1];\n", " var image = mpl.toolbar_items[toolbar_ind][2];\n", " var method_name = mpl.toolbar_items[toolbar_ind][3];\n", "\n", " if (!name) {\n", " // put a spacer in here.\n", " continue;\n", " }\n", " var button = $('