{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "ADioeXA4e9DY" }, "source": [ "# Python for Data Analysis" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "Qri5nDu-eSXQ" }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "id": "w5l3lSPkeSXN" }, "source": [ "# The `numpy` library\n", "\n", "The `numpy` package provides $n$-dimensional homogeneous arrays (all elements have the same type). The size of an array is fixed: an element cannot be inserted or deleted at an arbitrary position in place. `numpy` implements many operations on whole arrays. If a task can be solved by a sequence of such whole-array operations, it runs nearly as fast as in `C` or `matlab`, because the lion's share of the time is spent in library functions written in `C`.\n", "\n", "\n", "## 1. One-dimensional arrays\n", "\n", "#### 1.1 Array types and attributes" ] }, { "cell_type": "markdown", "metadata": { "id": "4OYggcLMeSXY" }, "source": [ "A list can be converted into an array." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "60DzfgyMeSXa", "outputId": "a03727c2-4295-43f2-d0ec-07ff0863bc6e" }, "outputs": [ { "data": { "text/plain": [ "(array([0, 2, 1]), numpy.ndarray)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([0, 2, 1])\n", "a, type(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "Zus_bnvFeSXf" }, "source": [ "`print` displays arrays in a compact, readable form."
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3CY4rY0GeSXg", "outputId": "99ed85f5-0c30-4259-8b0c-c904afa37c5d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 2 1]\n" ] } ], "source": [ "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "GWrTlCJieSXk" }, "source": [ "The `ndarray` class has many methods and attributes. Here are those that a plain `object` does not have." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KhZ0kil4eSXl", "outputId": "60efaa7a-91d3-4b63-e4a7-c9097c3fd4e0", "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'T',\n", " '__abs__',\n", " '__add__',\n", " '__and__',\n", " '__array__',\n", " '__array_finalize__',\n", " '__array_function__',\n", " '__array_interface__',\n", " '__array_prepare__',\n", " '__array_priority__',\n", " '__array_struct__',\n", " '__array_ufunc__',\n", " '__array_wrap__',\n", " '__bool__',\n", " '__complex__',\n", " '__contains__',\n", " '__copy__',\n", " '__deepcopy__',\n", " '__delitem__',\n", " '__divmod__',\n", " '__float__',\n", " '__floordiv__',\n", " '__getitem__',\n", " '__iadd__',\n", " '__iand__',\n", " '__ifloordiv__',\n", " '__ilshift__',\n", " '__imatmul__',\n", " '__imod__',\n", " '__imul__',\n", " '__index__',\n", " '__int__',\n", " '__invert__',\n", " '__ior__',\n", " '__ipow__',\n", " '__irshift__',\n", " '__isub__',\n", " '__iter__',\n", " '__itruediv__',\n", " '__ixor__',\n", " '__len__',\n", " '__lshift__',\n", " '__matmul__',\n", " '__mod__',\n", " '__mul__',\n", " '__neg__',\n", " '__or__',\n", " '__pos__',\n", " '__pow__',\n", " '__radd__',\n", " '__rand__',\n", " '__rdivmod__',\n", " '__rfloordiv__',\n", " '__rlshift__',\n", " '__rmatmul__',\n", " '__rmod__',\n", " '__rmul__',\n", " '__ror__',\n", " '__rpow__',\n", " '__rrshift__',\n", " '__rshift__',\n", " '__rsub__',\n", " '__rtruediv__',\n", " '__rxor__',\n", " 
'__setitem__',\n", " '__setstate__',\n", " '__sub__',\n", " '__truediv__',\n", " '__xor__',\n", " 'all',\n", " 'any',\n", " 'argmax',\n", " 'argmin',\n", " 'argpartition',\n", " 'argsort',\n", " 'astype',\n", " 'base',\n", " 'byteswap',\n", " 'choose',\n", " 'clip',\n", " 'compress',\n", " 'conj',\n", " 'conjugate',\n", " 'copy',\n", " 'ctypes',\n", " 'cumprod',\n", " 'cumsum',\n", " 'data',\n", " 'diagonal',\n", " 'dot',\n", " 'dtype',\n", " 'dump',\n", " 'dumps',\n", " 'fill',\n", " 'flags',\n", " 'flat',\n", " 'flatten',\n", " 'getfield',\n", " 'imag',\n", " 'item',\n", " 'itemset',\n", " 'itemsize',\n", " 'max',\n", " 'mean',\n", " 'min',\n", " 'nbytes',\n", " 'ndim',\n", " 'newbyteorder',\n", " 'nonzero',\n", " 'partition',\n", " 'prod',\n", " 'ptp',\n", " 'put',\n", " 'ravel',\n", " 'real',\n", " 'repeat',\n", " 'reshape',\n", " 'resize',\n", " 'round',\n", " 'searchsorted',\n", " 'setfield',\n", " 'setflags',\n", " 'shape',\n", " 'size',\n", " 'sort',\n", " 'squeeze',\n", " 'std',\n", " 'strides',\n", " 'sum',\n", " 'swapaxes',\n", " 'take',\n", " 'tobytes',\n", " 'tofile',\n", " 'tolist',\n", " 'tostring',\n", " 'trace',\n", " 'transpose',\n", " 'var',\n", " 'view'}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set(dir(a)) - set(dir(object))" ] }, { "cell_type": "markdown", "metadata": { "id": "j2Yro8nMeSXo" }, "source": [ "Our array is one-dimensional: `ndim` is the number of axes." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "OMpgblpYeSXp", "outputId": "79d48bf3-26c5-4b0e-caf5-42dc46755289" }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.ndim" ] }, { "cell_type": "markdown", "metadata": { "id": "LB3skyaOeSXs" }, "source": [ "`shape` is a tuple with the size along each axis; for an $n$-dimensional array it has $n$ entries."
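] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a minimal sketch for comparison. It uses a hypothetical two-dimensional array `m` (the name and values are illustrative and not part of the original example). `shape` has one entry per axis, `size` is the product of those entries, and `len` returns only the first one." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# m is a hypothetical 2 x 3 array used only for illustration\n", "m = np.array([[1, 2, 3],\n", "              [4, 5, 6]])\n", "print(m.ndim)   # number of axes: 2\n", "print(m.shape)  # size along each axis: (2, 3)\n", "print(m.size)   # total number of elements: 6\n", "print(len(m))   # size along the first axis only: 2"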
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ivBL1F1PeSXs", "outputId": "bc327013-63c0-47c8-b3fe-4498dee08e70" }, "outputs": [ { "data": { "text/plain": [ "(3,)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "RalMfmjfeSXv" }, "source": [ "`size` is the total number of elements in the array. `len` is the size along the first axis, so in the one-dimensional case the two coincide." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "twpTTZ9ieSXw", "outputId": "493638fb-1468-4dcf-c4ac-692922b185a2" }, "outputs": [ { "data": { "text/plain": [ "(3, 3)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(a), a.size" ] }, { "cell_type": "markdown", "metadata": { "id": "eCtLf6OyeSXx" }, "source": [ "`numpy` provides several integer types (`int16`, `int32`, `int64`) and floating-point types (`float32`, `float64`). The `dtype` attribute gives the element type, and `itemsize` gives the size of one element in bytes. The default integer type depends on the platform (`int64` here)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "IHuuDB2peSXy", "outputId": "f55d13d9-fce6-42a4-f4bf-9bcbb8965262" }, "outputs": [ { "data": { "text/plain": [ "(dtype('int64'), 'int64', 8)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.dtype, a.dtype.name, a.itemsize" ] }, { "cell_type": "markdown", "metadata": { "id": "xmBioWwXeSX9" }, "source": [ "An array of floating-point numbers. A single float among the elements is enough to give the whole array the type `float64`."
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iZ9WpKkVeSX9", "outputId": "e1c12144-7d3b-4d84-8412-19230e99d0ba", "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = np.array([0., 2, 1])\n", "b.dtype" ] }, { "cell_type": "markdown", "metadata": { "id": "EG3m0QNkeSX_" }, "source": [ "Exactly the same array, with the type specified explicitly through `dtype`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "zB1Gle6peSYA", "outputId": "4978351c-7aa4-490b-a858-9ad2fd1806ff" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 2. 1.]\n" ] } ], "source": [ "c = np.array([0, 2, 1], dtype=np.float64)\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": { "id": "T_gqbazheSYC" }, "source": [ "Type conversion: `astype` returns a new array whose elements are converted to the given type." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ok8MtEvaeSYC", "outputId": "99877a6c-19cf-4a86-b70e-326c14fbc6f3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64\n", "[0 2 1]\n", "['0.0' '2.0' '1.0']\n" ] } ], "source": [ "print(c.dtype)\n", "print(c.astype(int))\n", "print(c.astype(str))" ] }, { "cell_type": "markdown", "metadata": { "id": "G1DRYeu5eSX0" }, "source": [ "#### 1.2 Indexing\n", "\n", "Arrays are indexed in the usual way." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Y9tGXwDIeSX1", "outputId": "cb25adaf-3e7c-4ff2-afb3-e86186d9e949" }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[1]" ] }, { "cell_type": "markdown", "metadata": { "id": "rmKD1RweeSX3" }, "source": [ "Arrays are mutable objects."
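] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because an array is homogeneous, assigning to an element cannot change the array's type. This small sketch uses a hypothetical integer array `x`. A float assigned to one of its elements is converted to the integer type, so the fractional part is discarded." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "x = np.array([0, 2, 1])   # hypothetical integer array\n", "x[1] = 3.7                # converted to the array's integer dtype\n", "print(x)                  # [0 3 1]\n", "print(x.dtype.kind)       # i (still an integer array)"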
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QlkGiSjxeSX3", "outputId": "8bb4dfec-01c5-40f0-d378-852198c3c626" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 3 1]\n" ] } ], "source": [ "a[1] = 3\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "Yd31-o_UeSX6" }, "source": [ "Arrays can, of course, be used in `for` loops. This, however, gives up the main advantage of `numpy`: speed. Whenever possible, operate on arrays as a whole." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "c4K_PK_OeSX7", "outputId": "6e5570aa-fa22-4136-838f-2a4886e479a0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "3\n", "1\n" ] } ], "source": [ "for i in a:\n", " print(i)" ] }, { "cell_type": "markdown", "metadata": { "id": "SKlWqaBOe0hu" }, "source": [ "**Exercise:** create a numpy array of the first five prime numbers and print its shape and element type.\n", "\n", "**Solution:**" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fNmCAqpHe0hu", "outputId": "818790e2-f0d1-4496-8fa0-3bfa06b1fb55" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 2 3 5 7 11]\n", "(5,)\n", "int64\n" ] } ], "source": [ "arr = np.array([2, 3, 5, 7, 11])\n", "print(arr)\n", "print(arr.shape)\n", "print(arr.dtype)" ] }, { "cell_type": "markdown", "metadata": { "id": "emMTo7dI3Ovd" }, "source": [ "Filtering an array (in other words, applying a *mask* to it) is a very common task, so let us look at an example with a mask.\n", "\n", "Suppose we have a NumPy array and want to remove its negative values."
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "JKw7KDF33Ovd" }, "outputs": [], "source": [ "a = np.array([5, 7, -3, 4, 2, -4])" ] }, { "cell_type": "markdown", "metadata": { "id": "PjLy2TaK3Ovd" }, "source": [ "To build the mask, it is enough to write the selection condition. The result is an array of boolean values `True` and `False` in which the unwanted elements are marked `False`. Indexing the array with this mask keeps only the elements marked `True`." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "gnEca4_-3Ove", "outputId": "b6cbc36c-d379-4a3f-bf3e-899b7089db66" }, "outputs": [ { "data": { "text/plain": [ "array([ True, True, False, True, True, False])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a > 0" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "9OF_cWDM3Ove", "outputId": "d0e04b44-f086-4d61-d78f-c29af005cf47" }, "outputs": [ { "data": { "text/plain": [ "array([5, 7, 4, 2])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[a > 0]" ] }, { "cell_type": "markdown", "metadata": { "id": "6n7n_2-n3Ove" }, "source": [ "Instead of removing the unwanted values, you can also overwrite them in place, for example with zeros." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3gF4sDRQ3Ove", "outputId": "8060fa35-02b1-4b44-d3a6-1934691e1c01" }, "outputs": [ { "data": { "text/plain": [ "array([5, 7, 0, 4, 2, 0])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[a < 0] = 0\n", "a" ] }, { "cell_type": "markdown", "metadata": { "id": "ViI6VqnXeSYE" }, "source": [ "#### 1.3 Creating arrays\n", "\n", "This section covers arrays filled with zeros, ones or a given value (`full`). An array cannot grow in place, so it is often better to create such an array first and then assign values to its elements."
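] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a sketch of this pattern with illustrative names (`n` and `squares` are hypothetical). The array is allocated once with `np.zeros` and then filled. This avoids growing it with `np.append` on every step, which would copy all the data each time. If the values can be computed by a single whole-array expression, that is usually simpler still." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "n = 5\n", "squares = np.zeros(n, dtype=np.int64)  # allocate once\n", "for i in range(n):\n", "    squares[i] = i ** 2                # then assign element by element\n", "print(squares)                         # the squares 0, 1, 4, 9, 16\n", "\n", "# the same values from one whole-array operation\n", "print(np.array_equal(squares, np.arange(n) ** 2))  # True"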
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "V8yK0FLteSYF", "outputId": "278ad938-6cdd-428a-d9be-d808c509d120" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 0. 0.]\n", "[1 1 1]\n", "[[5 5 5]\n", " [5 5 5]\n", " [5 5 5]]\n" ] } ], "source": [ "a = np.zeros(3)\n", "b = np.ones(3, dtype=np.int64)\n", "c = np.full((3, 3), 5, dtype=int)\n", "print(a)\n", "print(b)\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": { "id": "YW31gGnbeSYI" }, "source": [ "To create an array of zeros with the same shape and type as another array, use `zeros_like`:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0IJVs0EmeSYJ", "outputId": "0f6662cc-79be-4d06-e6f9-48fa94868006" }, "outputs": [ { "data": { "text/plain": [ "array([0, 0, 0])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.zeros_like(b)" ] }, { "cell_type": "markdown", "metadata": { "id": "-nnOn0gNeSYL" }, "source": [ "The `arange` function is similar to `range`, but its arguments may be floating-point numbers. Avoid cases where *(stop - start)/step* is an integer, because then whether the last element is included depends on rounding errors. It is better to place the end of the range somewhere in the middle of a step."
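] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a sketch of this rounding pitfall (the values 1, 1.3 and 0.1 are only an illustration). Mathematically (1.3 - 1)/0.1 = 3, so you might expect three elements. In floating point, however, the quotient comes out slightly larger than 3, and `arange` returns four elements, the last one close to 1.3. Moving the end of the range into the middle of a step, or using `linspace` with an explicit number of points, removes the ambiguity." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "x = np.arange(1, 1.3, 0.1)    # stop falls exactly on a step\n", "print(len(x))                  # 4, not 3: a value close to 1.3 is included\n", "\n", "y = np.arange(1, 1.25, 0.1)   # stop in the middle of a step\n", "print(len(y))                  # 3: 1.0, 1.1, 1.2\n", "\n", "z = np.linspace(1, 1.2, 3)    # number of points given explicitly\n", "print(len(z))                  # 3"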
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "gqTdkX18eSYL", "outputId": "478de6cf-8ae3-4432-ffcc-37332b82a066" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 2 4 6 8]\n" ] } ], "source": [ "a = np.arange(0, 9, 2)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7pY-5MmweSYN", "outputId": "550a7b8d-5e35-4813-99bf-799e4b671e3d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 2. 4. 6. 8.]\n" ] } ], "source": [ "b = np.arange(0., 9, 2)\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": { "id": "vgPAKPaqeSYP" }, "source": [ "Evenly spaced sequences can also be created with the `linspace` function. Both ends of the range are included, and the last argument is the number of points." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "l1ruNluSeSYP", "outputId": "00c64d37-44a6-405c-c235-92bde302958e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 2. 4. 6. 8.]\n" ] } ], "source": [ "a = np.linspace(0, 8, 5)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "7t8OsUzt3Ovf" }, "source": [ "The `np.random.random()` function\n", "\n", "This function creates an array of the given shape and fills it with random floating-point numbers drawn from the continuous uniform distribution on the interval [0, 1)."
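] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each run produces different numbers. This sketch shows two ways to make results reproducible (the seed 42 is arbitrary). You can seed the global generator with `np.random.seed`, or create a separate generator with `np.random.default_rng`, whose `random` method works like `np.random.random`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "np.random.seed(42)\n", "r1 = np.random.random(3)\n", "np.random.seed(42)\n", "r2 = np.random.random(3)\n", "print(np.array_equal(r1, r2))           # True: same seed, same numbers\n", "\n", "rng = np.random.default_rng(42)\n", "r3 = rng.random(3)\n", "print(((r3 >= 0) & (r3 < 1)).all())     # True: values lie in [0, 1)"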
] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "K9nQT1HY3Ovg", "outputId": "a5293cce-2105-4c68-e495-bd22be6584b3" }, "outputs": [ { "data": { "text/plain": [ "array([0.58748789, 0.15412421, 0.38626907, 0.63819212, 0.24204921])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.random(5)" ] }, { "cell_type": "markdown", "metadata": { "id": "98uzmNoj3Ovg" }, "source": [ "Scaling by the interval length and shifting by its left end gives values from an arbitrary interval: `(high - low) * np.random.random(size) + low` lies in `[low, high)`. For example, values from the interval [-10, 7) can be obtained as follows:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fBtknfnw3Ovg", "outputId": "20f59c53-b2ee-4c14-90bd-27ce06a3a5f6" }, "outputs": [ { "data": { "text/plain": [ "array([[-0.47234907, -0.56408655, -6.59297716, -0.53912027, -6.44575578],\n", " [-7.69606568, -0.69767896, -4.45299145, -0.0114709 , -7.40739129],\n", " [ 6.26038194, 2.99676978, 4.07686008, -5.88453811, 5.97626885],\n", " [-2.16405667, -8.20971631, -2.00891086, 0.69906644, -8.07137645],\n", " [-5.29472676, -9.67574588, -5.21191766, 0.70977685, 2.35969228]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(7 - (-10)) * np.random.random((5, 5)) - 10" ] }, { "cell_type": "markdown", "metadata": { "id": "QFhzbGhb3Ovg" }, "source": [ "The `np.random.choice` function\n", "\n", "Generates a random sample from a given one-dimensional array. If an integer `n` is passed instead of an array, the sample is drawn from `np.arange(n)`." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ffjn77Hd3Ovg", "outputId": "dbe6b448-35ce-472a-f244-20e0504689f2" }, "outputs": [ { "data": { "text/plain": [ "array([2, 2, 0, 9])" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.choice(10, 4)" ] }, { "cell_type": "markdown", "metadata": { "id": "zkQDH8XZ3Ovg" }, "source": [ "The `p` parameter sets the probability of each element appearing in the sample. The probabilities must add up to 1:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7TSPRHW03Ovg", "outputId": "00147c7f-1874-4f49-a7bd-0ffbb04dce97" }, "outputs": [ { "data": { "text/plain": [ "array([2, 5, 2, 5])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "probs = [0, 0, 0.1, 0.1, 0.3, 0.3, 0.1, 0.1, 0, 0]\n", "np.random.choice(10, 4, p=probs)" ] }, { "cell_type": "markdown", "metadata": { "id": "17KR3_WA3Ovg" }, "source": [ "The `replace` parameter controls whether elements of the sample may repeat. With `replace=False` all elements are distinct; with `replace=True` (the default) they may repeat." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iVEbZETZ3Ovg", "outputId": "9029dbf0-b7d6-4c4d-d6fd-823866abc62c" }, "outputs": [ { "data": { "text/plain": [ "array([4, 0, 6, 0, 1, 1, 7])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.choice(10, 7, replace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# arrays used in the element-wise examples below\n", "a = np.array([0, 2, 1])\n", "b = np.array([3, 2, 5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mathematical functions such as `np.sin` are universal functions (`ufunc`): they are applied to every element of an array." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6OidJf3_3Ovh", "outputId": "1ab8ed83-bc27-4d3e-bd2c-d79882d9f426" }, "outputs": [ { "data": { "text/plain": [ "(\u003cufunc 'sin'\u003e, numpy.ufunc)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sin, type(np.sin)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "H5IHOdzVeSYf", "outputId": "5aa08a3a-e08b-40a1-84e1-777fbb03ad5f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 0.90929743 0.84147098]\n" ] } ], "source": [ "print(np.sin(a))" ] }, { "cell_type": "markdown", "metadata": { "id": "Cm7c1VILeSYh" }, "source": [ "One of the operands can be a scalar rather than an array. The operation is then applied with that scalar to every element."
] }, { "cell_type": "code", "execution_count": 42, "metadata": { "id": "Wz9kCm0feSYh" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 3 2]\n" ] } ], "source": [ "print(a + 1)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "FeacFeQYeSYi", "outputId": "3f6e05ae-caaf-4546-d0cd-9362df58b751" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 4 2]\n" ] } ], "source": [ "print(2 * a)" ] }, { "cell_type": "markdown", "metadata": { "id": "76IhU9CAeSYk" }, "source": [ "Comparisons are performed element by element and produce boolean arrays." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "69vVqzSgeSYk", "outputId": "52f7c678-b50b-4877-ae13-6d847bf47b01" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False False False]\n" ] } ], "source": [ "print(a > b)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "DwDMZFhSeSYl", "outputId": "bc0da336-f631-4545-d600-64dc6944bfa3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False True False]\n" ] } ], "source": [ "print(a == b)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "TbjPskMceSYm", "outputId": "2ce352d3-480a-461e-8d98-9c679339a978" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False False False]\n" ] } ], "source": [ "c = a > 5\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": { "id": "vQ05bHN4eSYo" }, "source": [ "The quantifiers \"there exists\" (`np.any`) and \"for all\" (`np.all`)."
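] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mask `c` above contains only `False`, so `np.any` and `np.all` give the same answer for it. This sketch uses a hypothetical mask `d` that mixes both values to show the difference. The same checks are also available as the array methods `any()` and `all()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "d = np.array([1, 5, 7]) > 4    # hypothetical mask: False, True, True\n", "print(np.any(d), np.all(d))    # True False: some elements pass, not all\n", "print(d.any(), d.all())        # the same checks as methods"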
] }, { "cell_type": "code", "execution_count": 47, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "HTMkK7wYeSYo", "outputId": "fc93a0d1-47a9-4e21-dd2e-30c38550bb6c" }, "outputs": [ { "data": { "text/plain": [ "(False, False)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.any(c), np.all(c)" ] }, { "cell_type": "markdown", "metadata": { "id": "YdXbXq--eSYp" }, "source": [ "In-place modification: augmented assignments such as `+=` and `*=` change the array itself." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "brP3tijHe0h4", "outputId": "e233f16c-e575-4b80-bced-b0ad28dbef95" }, "outputs": [ { "data": { "text/plain": [ "array([0, 2, 1])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iNT_nYSveSYq", "outputId": "487b93fa-4c98-4226-ec2d-e8be873ce59c" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 3 2]\n" ] } ], "source": [ "a += 1\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "g73lzOzKe0h4", "outputId": "038fa92b-79f4-4f15-d987-41dd7ffe8f02" }, "outputs": [ { "data": { "text/plain": [ "array([3, 2, 5])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "CItC6JrLeSYr", "outputId": "61f4a55f-da5b-4164-f0b3-e55eb8c47389" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 6 4 10]\n" ] } ], "source": [ "b *= 2\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": { "id": "JtiaPtiGeSYu" }, "source": [ "In array operations, division by zero does not raise an exception. Instead it produces `np.nan` or `np.inf` and emits a `RuntimeWarning`." ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "uksiTX96eSYu", "outputId": "37ed7b87-2c42-4b43-d700-c579742a92c0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. nan inf -inf]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ ":1: RuntimeWarning: divide by zero encountered in true_divide\n", " print(np.array([0.0, 0.0, 1.0, -1.0]) / np.array([1.0, 0.0, 0.0, 0.0]))\n", ":1: RuntimeWarning: invalid value encountered in true_divide\n", " print(np.array([0.0, 0.0, 1.0, -1.0]) / np.array([1.0, 0.0, 0.0, 0.0]))\n" ] } ], "source": [ "print(np.array([0.0, 0.0, 1.0, -1.0]) / np.array([1.0, 0.0, 0.0, 0.0]))" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "sXwqQVDHeSYw", "outputId": "0ecb63d5-f780-4ac2-b826-50db4536b8bb" }, "outputs": [ { "data": { "text/plain": [ "(nan, inf, nan, 0.0)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.nan + 1, np.inf + 1, np.inf * 0, 1. / np.inf" ] }, { "cell_type": "markdown", "metadata": { "id": "ukM-LTuAeSYx" }, "source": [ "The sum and product of all elements of the array, the maximum and minimum element, and the mean and standard deviation."
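] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default `std` divides by the number of elements $n$ (the population standard deviation). Passing `ddof=1` divides by $n - 1$ instead (the sample standard deviation). Here is a sketch with a hypothetical array `v` that holds the same values as `b` at this point:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "v = np.array([6, 4, 10])   # same values as b at this point\n", "n = v.size\n", "dev2 = (v - v.mean()) ** 2  # squared deviations from the mean\n", "\n", "print(v.std(), np.sqrt(dev2.sum() / n))               # about 2.4944, divisor n\n", "print(v.std(ddof=1), np.sqrt(dev2.sum() / (n - 1)))  # about 3.0551, divisor n - 1"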
] }, { "cell_type": "code", "execution_count": 54, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JP9gKcSTe0h5", "outputId": "74fbd68a-5800-448b-93cc-5adde9b5404a" }, "outputs": [ { "data": { "text/plain": [ "array([ 6, 4, 10])" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ZIo9nEFCeSYx", "outputId": "4d1170a6-ca34-4ba3-9fec-cefb2f6930ba" }, "outputs": [ { "data": { "text/plain": [ "(20, 240, 10, 4, 6.666666666666667, 2.494438257849294)" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.sum(), b.prod(), b.max(), b.min(), b.mean(), b.std()" ] }, { "cell_type": "markdown", "metadata": { "id": "rju3ggwJeSYy" }, "source": [ "NumPy provides element-wise mathematical functions (`sqrt`, `exp`, `log`, `sin`, ...) and constants such as `np.e` and `np.pi`:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "H3xDbf6PeSYz", "outputId": "baec065c-9065-42d7-cd91-e134f188ad3f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2.44948974 2. 3.16227766]\n", "[ 403.42879349 54.59815003 22026.46579481]\n", "[1.79175947 1.38629436 2.30258509]\n", "[-0.2794155 -0.7568025 -0.54402111]\n", "2.718281828459045 3.141592653589793\n" ] } ], "source": [ "print(np.sqrt(b))\n", "print(np.exp(b))\n", "print(np.log(b))\n", "print(np.sin(b))\n", "print(np.e, np.pi)" ] }, { "cell_type": "markdown", "metadata": { "id": "g6EBDaBWeSY0" }, "source": [ "Partial (cumulative) sums are sometimes needed, and they may come in handy in our courses."
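] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Element $k$ of the cumulative sum is the sum of the first $k + 1$ elements, so its last element equals the total sum. Here is a sketch with a hypothetical array `v` (the same values as `b`). `np.diff` with `prepend=0` undoes `cumsum`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "v = np.array([6, 4, 10])\n", "s = np.cumsum(v)                 # running totals 6, 10, 20\n", "print(s[-1] == v.sum())          # True: the last partial sum is the total\n", "print(np.diff(s, prepend=0))     # 6, 4, 10: the original values again"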
] }, { "cell_type": "code", "execution_count": 57, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JzFVH5DIeSY0", "outputId": "0314f209-9510-4f7e-f65d-09119992fafd" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 6 10 20]\n" ] } ], "source": [ "print(b.cumsum())" ] }, { "cell_type": "markdown", "metadata": { "id": "dyaDa9YXeSY1" }, "source": [ "#### 2.2 Sorting and modifying arrays\n", "\n", "The function `np.sort` returns a sorted copy, while the method `sort` sorts the array in place." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "i7Lw_nane0h6", "outputId": "6365ef7c-b759-4eb9-8007-d3985ba957ab" }, "outputs": [ { "data": { "text/plain": [ "array([ 6, 4, 10])" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XS7IRW09eSY2", "outputId": "f6228618-b2a8-4041-a7ae-e10214852c84" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 4 6 10]\n", "[ 6 4 10]\n" ] } ], "source": [ "print(np.sort(b))\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QENzFihJeSY3", "outputId": "20d00e78-04b9-4470-f638-2de205afc226" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 4 6 10]\n" ] } ], "source": [ "b.sort()\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": { "id": "ZPjBXM_OeSY4" }, "source": [ "Concatenating arrays."
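] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For one-dimensional arrays, `np.hstack` gives the same result as the more general `np.concatenate`, while `np.vstack` places the arrays as the rows of a two-dimensional array. Here is a sketch with hypothetical arrays `x` and `y` that hold the same values as `a` and `b` below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "x = np.array([1, 3, 2])\n", "y = np.array([4, 6, 10])\n", "print(np.array_equal(np.hstack((x, y)), np.concatenate((x, y))))  # True\n", "print(np.vstack((x, y)).shape)                                    # (2, 3)"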
] }, { "cell_type": "code", "execution_count": 61, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "plNqd9H6e0h7", "outputId": "bcf1953f-058d-4fd9-cd74-3c1b36f01ef9" }, "outputs": [ { "data": { "text/plain": [ "array([1, 3, 2])" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ZaD7r1qAe0h7", "outputId": "f7f75cec-9a7a-4f8c-b9e0-c358f82c226b" }, "outputs": [ { "data": { "text/plain": [ "array([ 4, 6, 10])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "_SYn0MdseSY4", "outputId": "e8a0d1ee-e7f3-4be2-e0e0-20c16fc72202" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1 3 2 4 6 10]\n" ] } ], "source": [ "a = np.hstack((a, b))\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "cT0JM6lZeSY6" }, "source": [ "Splitting an array at positions 3 and 6. The array has exactly 6 elements, so the last piece is empty." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "-DVwnZhFeSY6", "outputId": "c55b8ffa-c25d-4b09-9cef-a08d0ccefcd1" }, "outputs": [ { "data": { "text/plain": [ "[array([1, 3, 2]), array([ 4, 6, 10]), array([], dtype=int64)]" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.hsplit(a, [3, 6])" ] }, { "cell_type": "markdown", "metadata": { "id": "eOZAGZV0eSY7" }, "source": [ "The functions `delete`, `insert` and `append` do not modify the array in place. Instead, each returns a new array with elements deleted, inserted in the middle, or appended at the end, respectively."
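] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a short sketch with a hypothetical array `x`. The argument is left unchanged, which is why the cells below assign the result back (`a = np.delete(a, 5)`). Because every call copies the data, calling `np.append` repeatedly in a loop is slow for large arrays." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "x = np.array([1, 2, 3])\n", "y = np.append(x, 4)     # a new, longer array\n", "z = np.delete(y, 0)     # again a new array; y is not modified\n", "print(x)                # [1 2 3]: unchanged\n", "print(y)                # [1 2 3 4]\n", "print(z)                # [2 3 4]"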
] }, { "cell_type": "code", "execution_count": 65, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "G4iDI8hjeSY8", "outputId": "353ba1c1-1016-4127-ffd0-9699ba67afc4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 3 2 4 6]\n" ] } ], "source": [ "a = np.delete(a, 5)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "y2FW3FAYeSY9", "outputId": "a186616e-1a1c-4d4f-f2b9-860e0f941416" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 3 0 0 2 4 6]\n" ] } ], "source": [ "a = np.insert(a, 2, [0, 0])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "DK6c68IxeSY-", "outputId": "5bc5ec8b-ec3d-4961-d777-754f20e58c64" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 3 0 0 2 4 6 1 2 3]\n" ] } ], "source": [ "a = np.append(a, [1, 2, 3])\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "q669pBe0eSY_" }, "source": [ "#### 2.3 Ways of indexing arrays\n", "\n", "An array can be indexed in several ways. Here is an ordinary index." ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "oW8SgWFLeSZA", "outputId": "f070500a-badc-48d6-d7a7-f60e5760d22b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]\n" ] } ], "source": [ "a = np.linspace(0, 1, 11)\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7eQ_xryueSZB", "outputId": "a03b4e1c-2b57-4816-a11c-c5eea54efa9d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.2\n" ] } ], "source": [ "b = a[2]\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": { "id": "corcUOljeSZC" }, "source": [ "A range of indices (a slice). It creates a new array header that points to the same data (a *view*), so changes made through the slice are also visible in the original array." ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "g7sRF3GqeSZC", "outputId": "c0f0da24-fab0-45f6-f0a2-2a642a72249c" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.2 0.3 0.4 0.5]\n" ] } ], "source": [ "b = a[2:6]\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "mNJoJFFOeSZE", "outputId": "59dc0b56-5492-4142-8178-9655b9356300" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.2 0.3 0.4 0.5]\n" ] } ], "source": [ "b[0] = -0.2\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "PdAUJK9XeSZF", "outputId": "1d5dce9a-a2ac-463c-b73b-11a888af07c9" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.1 -0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]\n" ] } ], "source": [ "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "lE9gziSweSZG" }, "source": [ "A range with step 2."
] }, { "cell_type": "code", "execution_count": 73, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wLEQPJsXeSZH", "outputId": "d9c1120d-97bf-472b-9720-a5e2d7c25b6b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.1 0.3 0.5 0.7 0.9]\n" ] } ], "source": [ "b = a[1:10:2]\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Lw1rqM8WeSZI", "outputId": "3e017d9d-cd19-4627-d70d-0718867d88f2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. -0.1 -0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]\n" ] } ], "source": [ "b[0] = -0.1\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "Nh9e4ZWNeSZK" }, "source": [ "The array in reverse order." ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "LKbKi90neSZK", "outputId": "aa8126fe-1a81-4dcb-dc57-1c83e9f983f2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. 0.9 0.8 0.7 0.6 0.5 0.4 0.3 -0.2 -0.1 0. ]\n" ] } ], "source": [ "b = a[::-1]\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": { "id": "Im8dqbkIeSZL" }, "source": [ "A subarray can be assigned a value: either an array of the matching size or a scalar." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cwQlTL-deSZM", "outputId": "f4f67d2a-0ea6-4b04-cfb9-9ba3c9ef7a0d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0. -0.2 0.3 0. 0.5 0.6 0. 0.8 0.9 1. ]\n" ] } ], "source": [ "a[1:10:3] = 0\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "NkDtCWUgeSZN" }, "source": [ "Here, too, only a new header is created; it points to the same data." 
] }, { "cell_type": "code", "execution_count": 77, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "BSI6oafVeSZO", "outputId": "ec7216c1-aed7-43ae-f3e2-a83c1bb495a3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 0.1 -0.2 0.3 0. 0.5 0.6 0. 0.8 0.9 1. ]\n" ] } ], "source": [ "b = a[:]\n", "b[1] = 0.1\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "Zjq__bNXeSZP" }, "source": [ "To copy the array data as well, use the `copy` method." ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "VVds6EI5eSZP", "outputId": "480a6d30-e8fd-4c51-b7fc-c9f6b2e45068" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 0.1 0. 0.3 0. 0.5 0.6 0. 0.8 0.9 1. ]\n", "[ 0. 0.1 -0.2 0.3 0. 0.5 0.6 0. 0.8 0.9 1. ]\n" ] } ], "source": [ "b = a.copy()\n", "b[2] = 0\n", "print(b)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "2ynzvLTmeSZR" }, "source": [ "You can also index with a list of indices. Unlike a slice, this always returns a copy, not a view." ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "id": "fRF4VWZmeSZR" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.2 0.3 0.5]\n" ] } ], "source": [ "print(a[[2, 3, 5]])" ] }, { "cell_type": "markdown", "metadata": { "id": "JF2kGGHDeSZT" }, "source": [ "You can also index with a boolean array of the same size." ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "id": "yeFzrHxfeSZT" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[False True False True False True True False True True True]\n" ] } ], "source": [ "b = a > 0\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "id": "CS_Xass1eSZU" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.1 0.3 0.5 0.6 0.8 0.9 1. 
]\n" ] } ], "source": [ "print(a[b])" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "id": "yvc9keete0iA" }, "outputs": [ { "data": { "text/plain": [ "array([ 0. , 0.1, -0.2, 0.3, 0. , 0.5, 0.6, 0. , 0.8, 0.9, 1. ])" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "id": "G9i7pl7Ue0iB" }, "outputs": [ { "data": { "text/plain": [ "array([False, True, False, True, False, True, True, False, True,\n", " True, True])" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "markdown", "metadata": { "id": "HesCvUkKe0iB" }, "source": [ "**Exercise:** \n", "1) Create an array of numbers from $-2\\pi$ to $2\\pi$.\n", "\n", "2) Compute the elementwise sum of the squared sine and the squared cosine of this array.\n", "\n", "3) Use `np.all` to check that the result contains only ones.\n", "\n", "**Solution:**" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "id": "P9X8TP-ze0iB" }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.linspace(-2 * np.pi, 2 * np.pi, 20)\n", "np.all((np.sin(x)**2 + np.cos(x)**2).round() == 1)" ] }, { "cell_type": "markdown", "metadata": { "id": "zkYCakcF7ITP" }, "source": [ "**Task:** We have seen how to copy one-dimensional arrays. Now consider a question about two-dimensional arrays: what is the difference between `arr[i1:i2, :][:, j1:j2] = 5` and `arr[i1:i2, j1:j2] = 5`?" ] }, { "cell_type": "markdown", "metadata": { "id": "Rl-rdZY17mpO" }, "source": [ "Answer: with plain slices there is no difference in the result. `arr[i1:i2, :]` is a view, so assigning to its slice `[:, j1:j2]` writes into `arr`, exactly as `arr[i1:i2, j1:j2] = 5` does; the first form merely creates an extra intermediate view. A difference appears only when the first indexing step uses a list or a boolean array (for example, `arr[[0, 1], :][:, j1:j2] = 5`): that step returns a copy, so the value is written into the temporary copy and `arr` stays unchanged." ] }, { "cell_type": "markdown", "metadata": { "id": "DUBkecfkeSZW" }, "source": [ "## 3. 
Two-dimensional arrays\n", "\n", "#### 3.1 Creation, simple operations" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "kELaHm5YeSZW", "outputId": "d5644e82-f679-4878-e416-42df3b986f92" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 1.]\n", " [-1. 0.]]\n" ] } ], "source": [ "a = np.array([[0.0, 1.0], [-1.0, 0.0]])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "py9Ei8aSeSZX", "outputId": "4cc6aa24-898c-410f-e010-d364da8905f2" }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.ndim" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0TUKspWjeSZZ", "outputId": "673e25c7-a442-456d-da85-bc65b0f4957a" }, "outputs": [ { "data": { "text/plain": [ "(2, 2)" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a.shape" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "mH7fe1FkeSZa", "outputId": "56251c84-be12-4d7d-eeef-b8cca8bfae74" }, "outputs": [ { "data": { "text/plain": [ "(2, 4)" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(a), a.size" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fYbyOh3geSZc", "outputId": "d77bf277-b7b4-404f-e6e8-2924c2be8b96" }, "outputs": [ { "data": { "text/plain": [ "-1.0" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[1, 0]" ] }, { "cell_type": "markdown", "metadata": { "id": "WaoO-VOLeSZd" }, "source": [ "The `shape` attribute can be assigned a new value: a tuple with the size along each axis. 
This produces a new array header; the data itself does not change." ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Ht7CInofeSZe", "outputId": "4bd6885c-5245-420e-c865-9decc095ac96" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 1. 2. 3.]\n" ] } ], "source": [ "b = np.linspace(0, 3, 4)\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "LmOZ33breSZf", "outputId": "db4e4aa0-bb1e-4d85-8119-76c39b6233a0" }, "outputs": [ { "data": { "text/plain": [ "(4,)" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b.shape" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "_TYvZX8eeSZg", "outputId": "4eb5158e-8e2e-40cf-a586-92bbd1fd67be" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 1.]\n", " [2. 3.]]\n" ] } ], "source": [ "b.shape = 2, 2\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": { "id": "jWVJLH9ZeSZh" }, "source": [ "An array can be flattened into a one-dimensional array:" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "aJdF4hyreSZi", "outputId": "b6d87e4d-c6b5-4a82-b982-e73537165700" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 1. 2. 3.]\n" ] } ], "source": [ "print(b.ravel())" ] }, { "cell_type": "markdown", "metadata": { "id": "ff31TTeHeSZj" }, "source": [ "Arithmetic operations are applied elementwise:" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "a6WXKEj4eSZj", "outputId": "ec1259f0-3491-44c1-f6be-cd0dea07fc15" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 2.]\n", " [0. 1.]]\n", "[[ 0. 2.]\n", " [-2. 0.]]\n", "[[ 0. 2.]\n", " [-1. 1.]]\n", "[[0. 
1.]\n", " [1. 2.]]\n", "[[0. 2.]\n", " [1. 3.]]\n" ] } ], "source": [ "print(a + 1)\n", "print(a * 2)\n", "print(a + [0, 1]) # the second operand is expanded to a matrix by repeating its row\n", "print(a + np.array([[0, 2]]).T) # .T transposes: the row [[0, 2]] becomes a column\n", "print(a + b)" ] }, { "cell_type": "markdown", "metadata": { "id": "nLIlrz-BeSZl" }, "source": [ "#### 3.2 Working with matrices\n", "\n", "Elementwise and matrix multiplication (the `@` operator requires Python 3.5 or later)." ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "xAUzdffSeSZl", "outputId": "2dbeb579-5105-48bb-9feb-12ce02f15d63" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 1.]\n", " [-2. 0.]]\n" ] } ], "source": [ "print(a * b)" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "GAI2uX5PeSZn", "outputId": "50ccf93c-20c0-457c-b5e9-ab3c00f0d802" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 2. 3.]\n", " [ 0. -1.]]\n" ] } ], "source": [ "print(a @ b)" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6ERS9kXLeSZp", "outputId": "76a44c70-534a-4eb8-9271-cf950297709f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-1. 0.]\n", " [-3. 2.]]\n" ] } ], "source": [ "print(b @ a)" ] }, { "cell_type": "markdown", "metadata": { "id": "KiDTXGnge0iF" }, "source": [ "**Exercise:** create the matrices $\\begin{pmatrix} -3 & 4 \\\\ 4 & 3 \\end{pmatrix}$ and $\\begin{pmatrix} 2 & 1 \\\\ 1 & 2 \\end{pmatrix}$. 
Compute their elementwise and matrix products.\n", "\n", "**Solution:**" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Ct4fsUQwe0iF", "outputId": "e20eac8f-d8a5-44db-9906-aea3a9b3b91c" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-6 4]\n", " [ 4 6]]\n", "[[-6 4]\n", " [ 4 6]]\n", "[[-2 5]\n", " [11 10]]\n", "[[-2 11]\n", " [ 5 10]]\n" ] } ], "source": [ "a = np.array([[-3, 4], [4, 3]])\n", "b = np.array([[2, 1], [1, 2]])\n", "print(a * b)\n", "print(b * a)\n", "print(a @ b)\n", "print(b @ a)" ] }, { "cell_type": "markdown", "metadata": { "id": "-3Wm4rIieSZ3" }, "source": [ "Multiplying a matrix by a vector. A one-dimensional vector is treated as a column when it stands to the right of `@` and as a row when it stands to the left." ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "mXgY1mBoeSZ4", "outputId": "222fa6d8-3bef-4ae5-ec4f-5b4416833e5f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. -1.]\n" ] } ], "source": [ "v = np.array([1, -1], dtype=np.float64)\n", "print(b @ v)" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "A0Emfd5ueSZ5", "outputId": "38a74624-7339-4f70-9f84-219a7de3cbc2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1. -1.]\n" ] } ], "source": [ "print(v @ b)" ] }, { "cell_type": "markdown", "metadata": { "id": "uJp4qRrOeSZ6" }, "source": [ "With an older version of Python, you can use the `np.matrix` class for matrices: for it, the `*` operator performs matrix multiplication. Note that the NumPy documentation no longer recommends `np.matrix`; `np.dot` multiplies ordinary arrays as matrices in any version." 
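] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A sketch of an alternative that does not need `np.matrix`: `np.dot` multiplies ordinary two-dimensional arrays as matrices regardless of the Python version, and `np.matmul` gives the same result as the `@` operator. The names `m1` and `m2` are illustrative; they hold the exercise matrices, so the notebook's `a` and `b` are not overwritten." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Illustrative copies of the exercise matrices under new names\n", "m1 = np.array([[-3, 4], [4, 3]])\n", "m2 = np.array([[2, 1], [1, 2]])\n", "\n", "# Matrix product of plain arrays; should match a @ b from the exercise\n", "print(np.dot(m1, m2))\n", "# np.matmul computes the same matrix product as m1 @ m2\n", "print(np.matmul(m1, m2))" 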
] }, { "cell_type": "code", "execution_count": 101, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "a84Su-DneSZ6", "outputId": "8cee68cc-8bd8-44e9-b1d4-b42160ccee3b" }, "outputs": [ { "data": { "text/plain": [ "matrix([[-2, 5],\n", " [11, 10]])" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.matrix(a) * np.matrix(b)" ] }, { "cell_type": "markdown", "metadata": { "id": "ttfS-0_GeSZ7" }, "source": [ "Outer product: $a_{ij}=u_i v_j$" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "hzr6tlLdeSZ8", "outputId": "6b4b15b3-9893-4b62-a249-37a9f0dc1e05" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 2.]\n", "[2. 3. 4.]\n" ] } ], "source": [ "u = np.linspace(1, 2, 2)\n", "v = np.linspace(2, 4, 3)\n", "print(u)\n", "print(v)" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "FcsJmsiGeSZ9", "outputId": "9145d534-3fef-41e5-9244-4bc63020afe2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[2. 3. 4.]\n", " [4. 6. 8.]]\n" ] } ], "source": [ "a = np.outer(u, v)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "Aa_dWlyveSZ-" }, "source": [ "Two-dimensional arrays that depend on only one index: $x_{ij}=u_j$, $y_{ij}=v_i$" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4i6zYBg9eSZ_", "outputId": "3bb8336d-55ba-4eac-a067-76ca2cbe475b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 2.]\n", " [1. 2.]\n", " [1. 2.]]\n", "[[2. 2.]\n", " [3. 3.]\n", " [4. 4.]]\n" ] } ], "source": [ "x, y = np.meshgrid(u, v)\n", "print(x)\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": { "id": "KwK4g9gZeSaA" }, "source": [ "The identity matrix." 
] }, { "cell_type": "code", "execution_count": 105, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ST8KOHBWeSaB", "outputId": "b20ac614-d4d1-4e53-99d0-2e0b39a5d00d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 0. 0. 0.]\n", " [0. 1. 0. 0.]\n", " [0. 0. 1. 0.]\n", " [0. 0. 0. 1.]]\n" ] } ], "source": [ "I = np.eye(4)\n", "print(I)" ] }, { "cell_type": "markdown", "metadata": { "id": "zDOuMhIzeSaC" }, "source": [ "The `reshape` method changes the shape just as assigning to `shape` does, but it returns a new array (a view whenever possible) and leaves the original array unchanged." ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iV0R_x39eSaC", "outputId": "5eb58827-ccc1-44e9-a13a-533c604a036f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1.]\n" ] } ], "source": [ "print(I.reshape(16))" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "uIEi4h1MeSaD", "outputId": "b2115a77-15f9-4f30-96c3-3bd8eb13e204" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 0. 0. 0. 0. 1. 0. 0.]\n", " [0. 0. 1. 0. 0. 0. 0. 1.]]\n" ] } ], "source": [ "print(I.reshape(2, 8))" ] }, { "cell_type": "markdown", "metadata": { "id": "HfQZjZZleSaE" }, "source": [ "A row." ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dU-B5CsdeSaE", "outputId": "60595b08-120f-4fe0-dc1f-7dcc14e3b197" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 1. 0. 0.]\n" ] } ], "source": [ "print(I[1])" ] }, { "cell_type": "markdown", "metadata": { "id": "ptRot99NeSaF" }, "source": [ "Looping over rows." 
] }, { "cell_type": "code", "execution_count": 109, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "CQ3h7yA0eSaG", "outputId": "52d3da4f-42cc-4736-d7c4-9dd457c396ba" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 0. 0. 0.]\n", "[0. 1. 0. 0.]\n", "[0. 0. 1. 0.]\n", "[0. 0. 0. 1.]\n" ] } ], "source": [ "for row in I:\n", " print(row)" ] }, { "cell_type": "markdown", "metadata": { "id": "V6lshMkHeSaH" }, "source": [ "A column." ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "orbMt6WMeSaH", "outputId": "cbd660ee-cdba-4595-85e4-109706cde7e1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 0. 1. 0.]\n" ] } ], "source": [ "print(I[:, 2])" ] }, { "cell_type": "markdown", "metadata": { "id": "nqnUtI-SeSaI" }, "source": [ "A submatrix." ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "p8ycTAw3eSaI", "outputId": "868c26e6-726e-4e79-9c74-efd0cfda2ba8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 0.]\n", " [1. 0.]]\n" ] } ], "source": [ "print(I[0:2, 1:3])" ] }, { "cell_type": "markdown", "metadata": { "id": "wDDkZc3WeSaJ" }, "source": [ "A two-dimensional array can also be built from a function of the indices. Note that `np.fromfunction` calls the function once and passes it whole arrays of indices rather than individual numbers, as the printed `i` and `j` show." 
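] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of what `np.fromfunction` passes to the function: in current NumPy it builds the index arrays the same way as `np.indices`, so the function is called once with whole arrays of row and column indices rather than once per element. The names `ii` and `jj` are illustrative." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# ii[r, c] == r and jj[r, c] == c on a 4 x 4 grid\n", "ii, jj = np.indices((4, 4))\n", "print(ii)\n", "print(jj)\n", "\n", "# Evaluating the formula on the index arrays builds the whole table at once\n", "print(10 * ii + jj)" 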
] }, { "cell_type": "code", "execution_count": 112, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KXCnqDLKeSaK", "outputId": "c57a3061-3341-489a-8aed-07b7bb20410a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 0 0 0]\n", " [1 1 1 1]\n", " [2 2 2 2]\n", " [3 3 3 3]]\n", "[[0 1 2 3]\n", " [0 1 2 3]\n", " [0 1 2 3]\n", " [0 1 2 3]]\n", "[[ 0 1 2 3]\n", " [10 11 12 13]\n", " [20 21 22 23]\n", " [30 31 32 33]]\n" ] } ], "source": [ "def f(i, j):\n", " print(i)\n", " print(j)\n", " return 10 * i + j\n", "\n", "print(np.fromfunction(f, (4, 4), dtype=np.int64))" ] }, { "cell_type": "markdown", "metadata": { "id": "OFz0xwrreSaL" }, "source": [ "The transposed matrix." ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Z4J5zzvNeSaL", "outputId": "67475b6a-0bac-4452-fc0d-51722b42f6a2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[2 1]\n", " [1 2]]\n" ] } ], "source": [ "print(b.T)" ] }, { "cell_type": "markdown", "metadata": { "id": "yGeRknu3eSaM" }, "source": [ "Joining matrices horizontally and vertically." 
] }, { "cell_type": "code", "execution_count": 114, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ObH0A6B3eSaM", "outputId": "9e711514-e2c5-4496-f0b2-f36a443be012" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1]\n", " [2 3]]\n", "[[4 5 6]\n", " [7 8 9]]\n", "[[4 5]\n", " [6 7]\n", " [8 9]]\n" ] } ], "source": [ "a = np.array([[0, 1], [2, 3]])\n", "b = np.array([[4, 5, 6], [7, 8, 9]])\n", "c = np.array([[4, 5], [6, 7], [8, 9]])\n", "print(a)\n", "print(b)\n", "print(c)" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0cpapPNveSaN", "outputId": "476034c1-18b5-410b-e2d8-7b2b9e715f71" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1 4 5 6]\n", " [2 3 7 8 9]]\n" ] } ], "source": [ "print(np.hstack((a, b)))" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "EPxXHWTAeSaO", "outputId": "5f85e045-fabd-4b8c-ffe6-be7664468ede" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1]\n", " [2 3]\n", " [4 5]\n", " [6 7]\n", " [8 9]]\n" ] } ], "source": [ "print(np.vstack((a, c)))" ] }, { "cell_type": "markdown", "metadata": { "id": "i3SBcbGzeSaP" }, "source": [ "The sum of all elements; the column sums (`axis=0`); the row sums (`axis=1`)." 
] }, { "cell_type": "code", "execution_count": 117, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "V4Mgz9GIe0iL", "outputId": "88b3d22a-d71c-4fad-f134-c25a24471f23" }, "outputs": [ { "data": { "text/plain": [ "array([[4, 5, 6],\n", " [7, 8, 9]])" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "kQLBs3mHeSaP", "outputId": "79299428-0996-4211-b731-79b77e1e8313" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "39\n", "[11 13 15]\n", "[15 24]\n" ] } ], "source": [ "print(b.sum())\n", "print(b.sum(axis=0))\n", "print(b.sum(axis=1))" ] }, { "cell_type": "markdown", "metadata": { "id": "rc8otaIBeSaQ" }, "source": [ "`prod`, `max`, `min` and similar methods accept `axis` in the same way." ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "822so2-9eSaQ", "outputId": "9ed50917-5ad7-4e84-8a8d-4d086e43d763" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9\n", "[7 8 9]\n", "[4 7]\n" ] } ], "source": [ "print(b.max())\n", "print(b.max(axis=0))\n", "print(b.min(axis=1))" ] }, { "cell_type": "markdown", "metadata": { "id": "AlzAeJI-eSaR" }, "source": [ "The trace is the sum of the diagonal elements." 
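] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A small sketch that follows the definition directly: `np.diag` extracts the main diagonal of a matrix, and summing it gives the trace. The name `sq` is illustrative; it holds the same values as `a` above, so `a` is not changed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "sq = np.array([[0, 1], [2, 3]])  # illustrative copy of a\n", "diagonal = np.diag(sq)           # main diagonal: [0 3]\n", "print(diagonal.sum())            # 3, the trace computed by definition" 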
] }, { "cell_type": "code", "execution_count": 120, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "-BlaKobheSaR", "outputId": "41709f85-0265-4f22-b504-9b2e358fb6a4" }, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.trace(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "nAr4Sa4Te0iM" }, "source": [ "**Exercise:** \n", "\n", "In statistics and machine learning one often deals with the function $RSS$ (residual sum of squares), computed as $\\sum_{i=1}^{n} (y_i - a_i)^2$, where $y_i$ are the coordinates of a one-dimensional vector $y$ and $a_i$ are the coordinates of a one-dimensional vector $a$. Compute $RSS$ for $y = (1, 2, 3, 4, 5)$, $a = (3, 2, 1, 0, -1)$.\n", "\n", "**Solution:**" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "id": "wdiG3YHQe0iM" }, "outputs": [], "source": [ "# solution\n", "y = np.arange(1, 6)\n", "a = np.arange(3, -2, -1)\n", "rss = np.sum((y - a)**2)  # 4 + 0 + 4 + 16 + 36 = 60" ] }, { "cell_type": "markdown", "metadata": { "id": "7VULr5jSeSaS" }, "source": [ "## 4. 
Tensors (multidimensional arrays)" ] }, { "cell_type": "markdown", "metadata": { "id": "3wwvC1UOQppZ" }, "source": [ "#### 4.1 Creation, simple operations" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "_Vx6sZvjeSaT", "outputId": "270e56a2-15d3-40a7-bc12-27b56d508c93" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[[ 0 1 2 3]\n", " [ 4 5 6 7]\n", " [ 8 9 10 11]]\n", "\n", " [[12 13 14 15]\n", " [16 17 18 19]\n", " [20 21 22 23]]]\n" ] } ], "source": [ "X = np.arange(24).reshape(2, 3, 4)\n", "print(X)" ] }, { "cell_type": "markdown", "metadata": { "id": "6-a_IMz5eSaU" }, "source": [ "Summation (other reductions work the same way)" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XnTt-275eSaU", "outputId": "14391cab-ea20-4806-ac30-f2ebaa37d0f7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[12 14 16 18]\n", " [20 22 24 26]\n", " [28 30 32 34]]\n" ] } ], "source": [ "# sum over axis 0 only: for each fixed j and k\n", "# we add up the elements with indices (*, j, k)\n", "print(X.sum(axis=0))" ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "O5m-SV4Ge0iN", "outputId": "e9001e81-f3b5-47af-f581-a8688deabc05" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 66 210]\n" ] } ], "source": [ "# sum over two axes at once: for each fixed i\n", "# we add up the elements with indices (i, *, *)\n", "print(X.sum(axis=(1, 2)))" ] }, { "cell_type": "markdown", "metadata": { "id": "-FWPCnqzQppZ" }, "source": [ "#### 4.2 Broadcasting" ] }, { "cell_type": "markdown", "metadata": { "id": "uPjV3yjKQppZ" }, "source": [ "So far, in arithmetic operations on arrays such as addition and multiplication, we have mostly combined arrays of the same shape. 
In the simplest case, the operands are one-dimensional arrays of the same length." ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "vjFxZY6JQppa", "outputId": "ef8baac4-3a40-4554-f441-36f90d013e06" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2 4 6]\n" ] } ], "source": [ "# The simplest case\n", "a = np.array([1, 2, 3])\n", "b = np.array([2, 2, 2])\n", "print(a * b)" ] }, { "cell_type": "markdown", "metadata": { "id": "CFUBi7CDQppa" }, "source": [ "The multiplication was elementwise: every element of $a$ was multiplied by $2$. But we know this can be done more simply by multiplying the array by the number $2$." ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "eSGmLzA5Qppa", "outputId": "a58ccfb2-5b71-4e9f-bef2-11b43fd01cde" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2 4 6]\n" ] } ], "source": [ "# Multiplying an array by a number\n", "print(a * 2)" ] }, { "cell_type": "markdown", "metadata": { "id": "u7_r9j_9Qppb" }, "source": [ "In fact, the behavior is the same if a one-dimensional array is multiplied by an array of length $1$." ] }, { "cell_type": "code", "execution_count": 127, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "yg6006goQppb", "outputId": "6b80a310-14ec-47e6-b293-b835d55cc4c7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2 4 6]\n" ] } ], "source": [ "# Multiplying arrays of different lengths\n", "print(a * [2])" ] }, { "cell_type": "markdown", "metadata": { "id": "FoaMvoBmQppc" }, "source": [ "What happens here is called *broadcasting*: one array is \"stretched\" to match the shape of the other.\n", "\n", "![theory.broadcast_1.gif]()" ] }, { "cell_type": "markdown", "metadata": { "id": "vzmLw9c3Qppc" }, "source": [ "The same mechanism works for multidimensional arrays. 
If along some dimension one array has size $1$ and the other has an arbitrary size, the first array can be \"stretched\" along that dimension. Thus two arrays can be multiplied if, in every dimension where their sizes differ, at least one of them has size $1$. The rule is the same for other elementwise operations.\n", "\n", "Note that dimensions are matched from right to left. If the numbers of dimensions differ, the array with fewer dimensions is first padded on the left with dimensions of size 1. For example, when an array of shape $4 \\times 3$ is added to an array of shape $3$, the latter is first treated as an array of shape $1 \\times 3$." ] }, { "cell_type": "code", "execution_count": 128, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "MPaw_M3iQppc", "outputId": "3111ab64-9c1d-45f9-c403-9957b06adae0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0 1 2]\n", " [10 11 12]\n", " [20 21 22]\n", " [30 31 32]]\n" ] } ], "source": [ "\n", "a = np.array([[ 0, 0, 0],\n", " [10, 10, 10],\n", " [20, 20, 20],\n", " [30, 30, 30]])\n", "\n", "b = np.array([0, 1, 2])\n", "\n", "print(a + b)" ] }, { "cell_type": "markdown", "metadata": { "id": "NVxT_nHjQppd" }, "source": [ "The operation can be visualized schematically as follows.\n", "\n", "![theory.broadcast_2.gif]()\n", "\n", "\n", "If the trailing dimensions differ and neither of them is 1, the operation cannot be performed, as in the diagram below.\n", "\n", "![theory.broadcast_3.gif]()\n" ] }, { "cell_type": "markdown", "metadata": { "id": "p-EYlh-03Ov4" }, "source": [ "When the shapes are incompatible, NumPy raises an error." 
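] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a sketch, the compatibility rule described above can be written as a small function. `broadcast_shape` is a hypothetical helper for illustration only, not part of NumPy: it pads the shorter shape with ones on the left and then compares the sizes axis by axis." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def broadcast_shape(shape1, shape2):\n", "    # Hypothetical helper (not a NumPy function): result shape, or None if incompatible\n", "    ndim = max(len(shape1), len(shape2))\n", "    # Pad the shorter shape with 1s on the left\n", "    s1 = (1,) * (ndim - len(shape1)) + tuple(shape1)\n", "    s2 = (1,) * (ndim - len(shape2)) + tuple(shape2)\n", "    result = []\n", "    for d1, d2 in zip(s1, s2):  # after padding the axes line up, so order does not matter\n", "        if d1 == d2 or d2 == 1:\n", "            result.append(d1)\n", "        elif d1 == 1:\n", "            result.append(d2)\n", "        else:\n", "            return None  # sizes differ and neither is 1\n", "    return tuple(result)\n", "\n", "print(broadcast_shape((4, 3), (3,)))  # (4, 3)\n", "print(broadcast_shape((4, 1), (3,)))  # (4, 3)\n", "print(broadcast_shape((4, 3), (4,)))  # None\n", "\n", "# Cross-check against NumPy itself for a compatible pair\n", "print((np.zeros((4, 1)) + np.zeros(3)).shape)  # (4, 3)" 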
] }, { "cell_type": "code", "execution_count": 129, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 182 }, "id": "jPo3OiBZ3Ov4", "outputId": "7e261607-3764-4e80-ffca-2dcd6e7e028b" }, "outputs": [ { "ename": "ValueError", "evalue": "operands could not be broadcast together with shapes (4,3) (4,) ", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0marray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4.0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0ma\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: operands could not be broadcast together with shapes (4,3) (4,) " ] } ], "source": [ "b = np.array([1.0, 2.0, 3.0, 4.0])\n", "a + b" ] }, { "cell_type": "markdown", "metadata": { "id": "Hn5BEycv3Ov4" }, "source": [ "If the shapes are incompatible, you can first reshape one of the arrays so that they become compatible. Below, `a.reshape((-1, 1))` turns `a` into a column of shape $4 \\times 1$, which broadcasts with `b` of shape $3$ to a $4 \\times 3$ result." ] }, { "cell_type": "code", "execution_count": 130, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "BsLoImoW3Ov4", "outputId": "f1f38273-63b0-420c-b924-4b9bdc3a2c57" }, "outputs": [ { "data": { "text/plain": [ "array([[ 1., 2., 3.],\n", " [11., 12., 13.],\n", " [21., 22., 23.],\n", " [31., 32., 33.]])" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([0.0, 10.0, 20.0, 30.0])\n", "b = 
np.array([1.0, 2.0, 3.0])\n", "a.reshape((-1, 1)) + b" ] }, { "cell_type": "markdown", "metadata": { "id": "H_FZ39gJQppd" }, "source": [ "**Exercise:**\n", "\n", "What shape will the result have if an array of shape $4 \\times 1 \\times 3$ is multiplied by an array of shape $12 \\times 1$? Check your answer in practice." ] }, { "cell_type": "code", "execution_count": 131, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4oVm6BevQppd", "outputId": "8914af53-d50e-47f7-fc38-d36927759fa6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4, 12, 3)\n" ] } ], "source": [ "# solution\n", "a = np.ones((4, 1, 3))\n", "b = np.ones((12, 1))\n", "\n", "mul_shape = (a * b).shape\n", "print(mul_shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "sWPEBlaKQppe" }, "source": [ "*Remark*\n", "\n", "You need to know about broadcasting, but use it with care. The stretching itself does not copy data, but the result of a broadcast operation can be much larger than its operands (for example, combining shapes $n \\times 1$ and $1 \\times m$ gives an $n \\times m$ result), so a program can easily become memory-inefficient. This deserves particular attention when working on a GPU." ] }, { "cell_type": "markdown", "metadata": { "id": "qpZqvsLu3Ov5" }, "source": [ "When working with NumPy arrays, you often need to add new axes (dimensions) or remove existing ones. New axes are sometimes most conveniently added with the special object `np.newaxis`. For example, take a one-dimensional array:" ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "id": "ptHIMxD73Ov5" }, "outputs": [], "source": [ "a = np.array([1,2,3,4,5,6,7,8,9,10])" ] }, { "cell_type": "markdown", "metadata": { "id": "wbMn6eA93Ov5" }, "source": [ "It has one axis, that is, one dimension. Let us add another axis, say, at the front. 
With the `np.newaxis` object this can be done as follows:" ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "SEVHK1rc3Ov5", "outputId": "06ea936c-92b4-4b7b-e0a3-5b550202936f" }, "outputs": [ { "data": { "text/plain": [ "(1, 10)" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = a[np.newaxis, :]  # add a new axis at position 0\n", "b.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "osZG_dcW3Ov5" }, "source": [ "Two axes can also be added at once:" ] }, { "cell_type": "code", "execution_count": 134, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "00aotrkP3Ov5", "outputId": "18d662d3-6e0f-4d7c-e853-171350107a96" }, "outputs": [ { "data": { "text/plain": [ "(1, 10, 1)" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = a[np.newaxis, :, np.newaxis]\n", "c.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "tB4Ygdpd3Ov5" }, "source": [ "The `concatenate()` function joins arrays along an existing axis (`axis=0` by default); the arrays must have the same shape along all other axes." 
] }, { "cell_type": "code", "execution_count": 135, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7y-g1SY53Ov5", "outputId": "c3604b9c-06a1-43cf-aea5-956088649b3a" }, "outputs": [ { "data": { "text/plain": [ "((3, 2), (1, 2), (3, 1))" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([[1, 2], [3, 4], [5, 6]])\n", "b = np.array([[0, 0]])\n", "c = np.array([[0], [0], [0]])\n", "a.shape, b.shape, c.shape" ] }, { "cell_type": "code", "execution_count": 136, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "yTpVNtFd3Ov5", "outputId": "e2637b1a-7c1d-45d0-9e69-bfff33b1e32b" }, "outputs": [ { "data": { "text/plain": [ "array([[1, 2],\n", " [3, 4],\n", " [5, 6],\n", " [0, 0]])" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.concatenate([a, b])" ] }, { "cell_type": "code", "execution_count": 137, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Qzy022-P3Ov6", "outputId": "b67d17aa-4164-4147-cff4-a919b35023df" }, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 0],\n", " [3, 4, 0],\n", " [5, 6, 0]])" ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.concatenate([a, c], axis=1)" ] }, { "cell_type": "markdown", "metadata": { "id": "N2GHweQU3Ov6" }, "source": [ "The `split()` function splits an array into several sub-arrays. If the number of sections is given as an integer, it must evenly divide the length of the chosen axis." 
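] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A minimal sketch of the two forms of the `indices_or_sections` argument, assuming an arbitrary 10-element array `x`: a list gives the positions at which to cut, while an integer gives the number of equal parts. The last line previews `array_split`, discussed below, which also accepts a number of sections that does not divide the length." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "x = np.arange(10)\n", "\n", "# A list of indices: cut before positions 2 and 5, i.e. x[:2], x[2:5], x[5:]\n", "print(np.split(x, [2, 5]))\n", "\n", "# np.split(x, 3) would raise ValueError, because 3 does not divide 10\n", "print(np.array_split(x, 3))  # pieces of sizes 4, 3, 3"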
] }, { "cell_type": "code", "execution_count": 138, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7tNf7gPp3Ov6", "outputId": "dd9cbf8d-a276-4afc-8c03-35a3aaa5c1f1" }, "outputs": [ { "data": { "text/plain": [ "[array([0, 1]), array([2, 3]), array([4, 5]), array([6, 7]), array([8, 9])]" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(10)\n", "np.split(a, 5)" ] }, { "cell_type": "markdown", "metadata": { "id": "OiAbD2Kq3Ov6" }, "source": [ "The `array_split()` function also splits an array into sub-arrays. The only difference from `split` is that it relaxes the restriction on the `indices_or_sections` parameter: in `array_split`, `indices_or_sections` may be an integer that does not evenly divide the length of the chosen axis. In that case the first sub-arrays are one element longer than the rest." ] }, { "cell_type": "markdown", "metadata": { "id": "fQu4JWrkeSaV" }, "source": [ "## 5. Linear algebra" ] }, { "cell_type": "code", "execution_count": 139, "metadata": { "id": "KJ9qk8KBe0iO" }, "outputs": [], "source": [ "a = np.array([[0, 1], [2, 3]])" ] }, { "cell_type": "code", "execution_count": 140, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QfoGKtHdeSaV", "outputId": "7931ab04-ea1b-4cf4-e6f5-0331bbf06cce" }, "outputs": [ { "data": { "text/plain": [ "-2.0" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.det(a)" ] }, { "cell_type": "markdown", "metadata": { "id": "fh6WDlobeSaW" }, "source": [ "The inverse matrix:" ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6_JMYc5NeSaX", "outputId": "17aa79fe-c804-4b1a-c6e6-836d4dfec4a7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-1.5 0.5]\n", " [ 1. 0. 
]]\n" ] } ], "source": [ "a1 = np.linalg.inv(a)\n", "print(a1)" ] }, { "cell_type": "code", "execution_count": 142, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "9RK1Y8tOeSaY", "outputId": "c89fefe9-326c-4847-a7ec-d71ac9b2aa9d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 0.]\n", " [0. 1.]]\n", "[[1. 0.]\n", " [0. 1.]]\n" ] } ], "source": [ "print(a @ a1)\n", "print(a1 @ a)" ] }, { "cell_type": "markdown", "metadata": { "id": "hjrrv-DqeSaY" }, "source": [ "Solving the linear system $au=v$. It can be done by multiplying by the inverse matrix, but `np.linalg.solve` solves the system directly without forming $a^{-1}$, which is usually faster and numerically more accurate." ] }, { "cell_type": "code", "execution_count": 143, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "gXVo3PmCeSaZ", "outputId": "afddb964-ad3e-4708-c362-d81e6e3f329d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.5 0. ]\n" ] } ], "source": [ "v = np.array([0, 1], dtype=np.float64)\n", "print(a1 @ v)" ] }, { "cell_type": "code", "execution_count": 144, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "CH5xU8hXeSaa", "outputId": "7c2a8ab6-8822-420e-e01f-b71b72a82a26" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.5 0. ]\n" ] } ], "source": [ "u = np.linalg.solve(a, v)\n", "print(u)" ] }, { "cell_type": "markdown", "metadata": { "id": "n4Qrc4JCeSab" }, "source": [ "Let's check." ] }, { "cell_type": "code", "execution_count": 145, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "FzmdANWPeSab", "outputId": "e697b3e9-980e-45b9-91f0-21985009e27d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 0.]\n" ] } ], "source": [ "print(a @ u - v)" ] }, { "cell_type": "markdown", "metadata": { "id": "nr9z4BbGeSac" }, "source": [ "Eigenvalues and eigenvectors: $a u_i = \\lambda_i u_i$. `l` is a one-dimensional array of the eigenvalues $\\lambda_i$, and the columns of the matrix $u$ are the eigenvectors $u_i$." 
] }, { "cell_type": "code", "execution_count": 146, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0W3HKMpreSac", "outputId": "4f4ed2e3-2f6e-4eea-8044-a4a92778a7d0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.56155281 3.56155281]\n" ] } ], "source": [ "l, u = np.linalg.eig(a)\n", "print(l)" ] }, { "cell_type": "code", "execution_count": 147, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iDz1izeneSad", "outputId": "c1af58ec-32f8-4b74-bb5d-9567c0813f27" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-0.87192821 -0.27032301]\n", " [ 0.48963374 -0.96276969]]\n" ] } ], "source": [ "print(u)" ] }, { "cell_type": "markdown", "metadata": { "id": "dNGJ88UCeSae" }, "source": [ "Let's check." ] }, { "cell_type": "code", "execution_count": 148, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "OAbVlo3PeSae", "outputId": "b242b2b2-a3e4-4b3d-8109-742912a4b718" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.00000000e+00 1.66533454e-16]\n", "[ 0.0000000e+00 -4.4408921e-16]\n" ] } ], "source": [ "for i in range(2):\n", "    print(a @ u[:, i] - l[i] * u[:, i])" ] }, { "cell_type": "markdown", "metadata": { "id": "CTiDbQI_eSaf" }, "source": [ "Applied to a one-dimensional array, `diag` builds a diagonal matrix; applied to a square matrix, it returns a one-dimensional array of its diagonal elements." ] }, { "cell_type": "code", "execution_count": 149, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "gHJlMqRIeSag", "outputId": "a51971a9-50e1-432a-b33a-1cb8af28411b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-0.56155281 0. ]\n", " [ 0. 
3.56155281]]\n", "[-0.56155281 3.56155281]\n" ] } ], "source": [ "L = np.diag(l)\n", "print(L)\n", "print(np.diag(L))" ] }, { "cell_type": "markdown", "metadata": { "id": "tOKmUQDyeSah" }, "source": [ "All the equations $a u_i = \\lambda_i u_i$ can be combined into a single matrix equation $a u = u \\Lambda$, where $\\Lambda$ is the diagonal matrix with the eigenvalues $\\lambda_i$ on its diagonal." ] }, { "cell_type": "code", "execution_count": 150, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "qXSUnhqweSah", "outputId": "610a4e48-1aa3-4a84-e4e0-7c549e96de1b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0.00000000e+00 0.00000000e+00]\n", " [ 1.66533454e-16 -4.44089210e-16]]\n" ] } ], "source": [ "print(a @ u - u @ L)" ] }, { "cell_type": "markdown", "metadata": { "id": "b1vsNPo8eSai" }, "source": [ "Hence, if $u$ is invertible, $u^{-1} a u = \\Lambda$." ] }, { "cell_type": "code", "execution_count": 151, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "D8aJ2LeUeSai", "outputId": "d59eedf4-db0a-4a3e-f5ea-9475a59f2a60" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-5.61552813e-01 2.77555756e-17]\n", " [-2.22044605e-16 3.56155281e+00]]\n" ] } ], "source": [ "print(np.linalg.inv(u) @ a @ u)" ] }, { "cell_type": "markdown", "metadata": { "id": "GDV31jreeSaj" }, "source": [ "Now let's find the left eigenvectors, $v_i a = \\lambda_i v_i$, where $v_i$ is a row vector. Transposing gives $a^T v_i^T = \\lambda_i v_i^T$, so they are the ordinary eigenvectors of $a^T$, and the eigenvalues $\\lambda_i$ are the same." 
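] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A small self-contained sketch (reusing the same matrix $a$) that checks the defining property of left eigenvectors directly: each column of `v`, taken as a row vector, should satisfy $v_i a = \\lambda_i v_i$ up to rounding error. Comparing with `np.allclose` instead of exact equality is a choice made here because of floating-point arithmetic." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "a = np.array([[0, 1], [2, 3]])\n", "\n", "# Left eigenvectors of a are the (right) eigenvectors of a.T\n", "l, v = np.linalg.eig(a.T)\n", "\n", "for i in range(len(l)):\n", "    row = v[:, i]  # the i-th left eigenvector as a 1-D array\n", "    # row @ a multiplies the row vector by a from the left\n", "    print(np.allclose(row @ a, l[i] * row))"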
] }, { "cell_type": "code", "execution_count": 152, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1HyE2M51eSaj", "outputId": "9510275d-9260-4abd-ebcd-ec8b387d8304" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.56155281 3.56155281]\n", "[[-0.96276969 -0.48963374]\n", " [ 0.27032301 -0.87192821]]\n" ] } ], "source": [ "l, v = np.linalg.eig(a.T)\n", "print(l)\n", "print(v)" ] }, { "cell_type": "markdown", "metadata": { "id": "kvTFZXzieSak" }, "source": [ "The eigenvectors are normalized to unit length: the diagonal entries of $u^T u$ and $v^T v$ equal 1. The off-diagonal entries are nonzero, so here the eigenvectors are not orthogonal to each other, which is possible because $a$ is not symmetric." ] }, { "cell_type": "code", "execution_count": 153, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "mtRpa6CSeSak", "outputId": "34e6b625-1c7e-4ce5-c414-15a6052ad885" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. -0.23570226]\n", " [-0.23570226 1. ]]\n", "[[1. 0.23570226]\n", " [0.23570226 1. ]]\n" ] } ], "source": [ "print(u.T @ u)\n", "print(v.T @ v)" ] }, { "cell_type": "markdown", "metadata": { "id": "cF_vDIO0eSal" }, "source": [ "Left and right eigenvectors corresponding to different eigenvalues are orthogonal: $v_i a u_j = \\lambda_i v_i u_j = \\lambda_j v_i u_j$, so $(\\lambda_i - \\lambda_j) v_i u_j = 0$, and therefore $v_i u_j = 0$ whenever $\\lambda_i \\neq \\lambda_j$." ] }, { "cell_type": "code", "execution_count": 154, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Ntj_buO3eSam", "outputId": "574cf26d-6cdc-47f0-a800-178356877b8d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 9.71825316e-01 0.00000000e+00]\n", " [-5.55111512e-17 9.71825316e-01]]\n" ] } ], "source": [ "print(v.T @ u)" ] }, { "cell_type": "markdown", "metadata": { "id": "UPo6GT9Ne0iU" }, "source": [ "**Exercise:**\n", "\n", "In machine learning, linear regression with $L_2$ regularization (ridge regression) has a closed-form solution given by $\\widehat{\\theta} = (X^T \\cdot X + \\lambda \\cdot I_n)^{-1}\\cdot X^T y$. 
Compute $\\widehat{\\theta}$ for $X = \\begin{pmatrix} -3 & 4 & 1 \\\\ 4 & 3 & 1 \\end{pmatrix}$, $y = \\begin{pmatrix} 10 \\\\ 12 \\end{pmatrix}$, where $I_n$ is the identity matrix of size 3 and $\\lambda = 0.1$.\n", "\n", "**Solution:**" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "id": "WTHCzulBi1Jk" }, "outputs": [], "source": [ "X = np.array([[-3, 4, 1], [4, 3, 1]])\n", "y = np.array([10, 12])\n", "I = np.eye(3)  # identity matrix I_n, n = 3 = number of columns of X\n", "lambd = 0.1  # 'lambda' is a reserved word in Python\n", "theta = np.linalg.inv(X.T @ X + lambd * I) @ X.T @ y\n", "print(theta)" ] }, { "cell_type": "markdown", "metadata": { "id": "qw_8kCAIeSam" }, "source": [ "## 6. Integration" ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "id": "3dA2vW9NeSan" }, "outputs": [], "source": [ "from scipy.integrate import quad, odeint\n", "from scipy.special import erf" ] }, { "cell_type": "code", "execution_count": 157, "metadata": { "id": "ZR_J8-LFeSao" }, "outputs": [], "source": [ "def f(x):\n", "    return np.exp(-x ** 2)" ] }, { "cell_type": "markdown", "metadata": { "id": "mC4waOCPeSao" }, "source": [ "`quad` performs adaptive numerical integration; the limits may be infinite (`np.inf`). It returns the value of the integral and `err`, an estimate of the absolute error." 
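] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A quick sketch with both limits infinite, on an integral whose exact value is known: $\\int_{-\\infty}^{\\infty} e^{-x^2}\\,dx = \\sqrt{\\pi}$. The integrand is written as a `lambda` here only to keep the cell self-contained; the `1e-8` tolerance in the last line is an arbitrary choice for illustration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy.integrate import quad\n", "\n", "# quad returns the value of the integral and an estimate of its absolute error\n", "res, err = quad(lambda x: np.exp(-x ** 2), -np.inf, np.inf)\n", "print(res, np.sqrt(np.pi), err)\n", "\n", "# Compare with the exact value sqrt(pi)\n", "print(abs(res - np.sqrt(np.pi)) < 1e-8)"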
] }, { "cell_type": "code", "execution_count": 158, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ybR0uRuneSap", "outputId": "550216ef-e3e6-44cd-fb7a-b471514f63e7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.8862269254527579 0.8862269254527579 7.101318390472462e-09\n" ] } ], "source": [ "res, err = quad(f, 0, np.inf)\n", "print(np.sqrt(np.pi) / 2, res, err)" ] }, { "cell_type": "code", "execution_count": 159, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "-QGAB-m3eSaq", "outputId": "1d11779c-563a-41dd-f8e1-134c7884223d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.7468241328124269 0.7468241328124271 8.291413475940725e-15\n" ] } ], "source": [ "res, err = quad(f, 0, 1)\n", "print(np.sqrt(np.pi) / 2 * erf(1), res, err)" ] }, { "cell_type": "markdown", "metadata": { "id": "huwe8P9LeSaq" }, "source": [ "## 7. Saving to and loading from a file" ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "id": "2XDE5GHzeSar" }, "outputs": [], "source": [ "x = np.arange(0, 25, 0.5).reshape((5, 10))\n", "\n", "# Save x to example.txt with two digits after the decimal point and ';' as the delimiter\n", "np.savetxt('example.txt', x, fmt='%.2f', delimiter=';')" ] }, { "cell_type": "markdown", "metadata": { "id": "B9RcBcSpeSas" }, "source": [ "The resulting file looks like this:" ] }, { "cell_type": "code", "execution_count": 161, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dqoUbX3meSas", "outputId": "0bbc8b9d-09d4-44ed-fae4-9d4b870801a6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.00;0.50;1.00;1.50;2.00;2.50;3.00;3.50;4.00;4.50\r\n", "5.00;5.50;6.00;6.50;7.00;7.50;8.00;8.50;9.00;9.50\r\n", "10.00;10.50;11.00;11.50;12.00;12.50;13.00;13.50;14.00;14.50\r\n", "15.00;15.50;16.00;16.50;17.00;17.50;18.00;18.50;19.00;19.50\r\n", "20.00;20.50;21.00;21.50;22.00;22.50;23.00;23.50;24.00;24.50\r\n" ] } ], "source": [ 
"! cat example.txt" ] }, { "cell_type": "markdown", "metadata": { "id": "EMfZ9-KceSat" }, "source": [ "Now the file can be read back:" ] }, { "cell_type": "code", "execution_count": 162, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "AzLywGeKeSat", "outputId": "62bc5b7d-3076-46d1-a4fe-346581a65aad" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 0.5 1. 1.5 2. 2.5 3. 3.5 4. 4.5]\n", " [ 5. 5.5 6. 6.5 7. 7.5 8. 8.5 9. 9.5]\n", " [10. 10.5 11. 11.5 12. 12.5 13. 13.5 14. 14.5]\n", " [15. 15.5 16. 16.5 17. 17.5 18. 18.5 19. 19.5]\n", " [20. 20.5 21. 21.5 22. 22.5 23. 23.5 24. 24.5]]\n" ] } ], "source": [ "x = np.loadtxt('example.txt', delimiter=';')\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": { "id": "Y4voLe-LeSau" }, "source": [ "## 8. Performance of `numpy`\n", "\n", "Let's start with a simple example: the sum of the integers from $0$ to $10^8 - 1$." ] }, { "cell_type": "code", "execution_count": 163, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "nvCQ_4WReSav", "outputId": "206ff723-f29c-47c7-ce17-b061a9ce95f3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4999999950000000\n", "CPU times: user 10.5 s, sys: 0 ns, total: 10.5 s\n", "Wall time: 10.6 s\n" ] } ], "source": [ "%%time\n", "\n", "sum_value = 0\n", "for i in range(10 ** 8):\n", "    sum_value += i\n", "print(sum_value)" ] }, { "cell_type": "markdown", "metadata": { "id": "v4SQIcy8eSaw" }, "source": [ "A slightly better version uses the built-in `sum`:" ] }, { "cell_type": "code", "execution_count": 164, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ewr9BFMpeSaw", "outputId": "c476b8da-b480-4501-923b-0638f89e7c39" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4999999950000000\n", "CPU times: user 1.61 s, sys: 0 ns, total: 1.61 s\n", "Wall time: 1.6 s\n" ] } ], "source": [ "%%time\n", "\n", "sum_value = sum(range(10 ** 8))\n", "print(sum_value)" ] }, { "cell_type": "markdown", "metadata": { "id": 
"pUjiYs86eSax" }, "source": [ "The same computation with `numpy` functions:" ] }, { "cell_type": "code", "execution_count": 165, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Jv6RFksoeSax", "outputId": "79144d35-18e1-477a-cda9-ea6866dfa371" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4999999950000000\n", "CPU times: user 266 ms, sys: 1.84 s, total: 2.11 s\n", "Wall time: 8.04 s\n" ] } ], "source": [ "%%time\n", "\n", "sum_value = np.arange(10 ** 8).sum()\n", "print(sum_value)" ] }, { "cell_type": "markdown", "metadata": { "id": "ya-V2P3veSay" }, "source": [ "In user CPU time, the simple and readable `numpy` code is about 40 times faster than the explicit loop (266 ms vs. 10.5 s). In this particular run, though, the wall time (8.04 s) is much larger than the CPU time, presumably because of allocating and filling an array of $10^8$ 64-bit integers (about 800 MB), so the observed speedup depends heavily on the machine.\n", "\n", "Let's look at another example. We generate a $500 \\times 1000$ matrix and compute the mean of its column minima.\n", "\n", "First, straightforward code built from plain Python lists and built-in functions.\n", "\n", "*Note*. Below, `scipy.stats` is used to generate random numbers from the uniform distribution on $[0, 1]$. This module is covered in the next notebook." 
] }, { "cell_type": "code", "execution_count": 166, "metadata": { "id": "Ulfmy68keSaz" }, "outputs": [], "source": [ "import scipy.stats as sps" ] }, { "cell_type": "code", "execution_count": 167, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bp_qnnwPeSaz", "outputId": "5c99eb42-2674-4c55-e43e-e36097ceacae" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.004160749803539531\n", "CPU times: user 17.7 s, sys: 180 ms, total: 17.9 s\n", "Wall time: 17.8 s\n" ] } ], "source": [ "%%time\n", "\n", "N, M = 500, 1000\n", "matrix = []\n", "for i in range(N):\n", "    matrix.append([sps.uniform.rvs() for j in range(M)])\n", "\n", "# minimum of each of the M columns\n", "min_col = [min([matrix[i][j] for i in range(N)]) for j in range(M)]\n", "mean_min = sum(min_col) / M\n", "print(mean_min)" ] }, { "cell_type": "markdown", "metadata": { "id": "c47oHwHzeSa0" }, "source": [ "Readable code using `numpy` functions:" ] }, { "cell_type": "code", "execution_count": 168, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "h1WGN9hseSa0", "outputId": "926ee2fe-bb94-4ff5-d2e3-d75ae6ca9109" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0010383116509533313\n", "CPU times: user 18.7 ms, sys: 226 µs, total: 19 ms\n", "Wall time: 24.1 ms\n" ] } ], "source": [ "%%time\n", "\n", "N, M = 500, 1000\n", "matrix = sps.uniform.rvs(size=(N, M))\n", "mean_min = matrix.min(axis=0).mean()  # axis=0: minimum of each column\n", "print(mean_min)" ] }, { "cell_type": "markdown", "metadata": { "id": "ABVSxP6seSa1" }, "source": [ "The simple and readable code is hundreds of times faster: in this run, 24.1 ms vs. 17.8 s of wall time, i.e. roughly 700 times." ] }, { "cell_type": "markdown", "metadata": { "id": "8X4QkUGIe0ib" }, "source": [ "## 9. 
Einstein summation\n", "\n", "Einstein's summation convention lets many common multidimensional linear-algebra operations on arrays be written compactly:\n", "\n", "> If the same index letter appears both as a superscript and as a subscript, the term is summed over all values that this index can take.\n", "\n", "For example, $c_j = a_i b^i_j$ means $c_j = \\sum_{i=1}^n a_i b^i_j$.\n", "\n", "Such operations come up often in data analysis, especially when implementing Bayesian methods.\n", "\n", "In `numpy` they are implemented by the `einsum` function, which makes no distinction between lower and upper indices. It takes the signature of the operation as a string, followed by the arrays with the data.\n", "\n", "Let's go through the example above. Here the signature is `i,ji->j`. Its parts mean the following (a tensor is a multidimensional array):\n", "* `i`: the index labels of tensor $A$. There is one index, so $A$ must be a vector.\n", "* `,`: separates the index labels of one tensor from those of the next.\n", "* `ji`: the index labels of tensor $B$. There are two indices, so $B$ must be a matrix; the element $b^i_j$ is stored as `B[j, i]`.\n", "* `->`: separates the inputs from the output.\n", "* `j`: the output index. Index $i$ appears in the inputs but not in the output, so the element-wise products are summed over it." 
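] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the convention concrete, here is a sketch (with the same $A$ and $B$ as in the next cell) that spells out the signature `i,ji->j` as explicit loops: the output index `j` gets the outer loop, and the summed index `i` an inner loop that accumulates products. The result is compared with `np.einsum` and with the equivalent matrix-vector product `B @ A`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "A = np.array([0, 1, 2])\n", "B = np.array([[10, 15, 20], [100, 150, 200]])\n", "\n", "# c_j = sum_i a_i * b_{ji}\n", "c = np.zeros(B.shape[0], dtype=A.dtype)\n", "for j in range(B.shape[0]):  # output index j\n", "    for i in range(A.shape[0]):  # summed index i (absent from the output)\n", "        c[j] += A[i] * B[j, i]\n", "\n", "print(c)\n", "print(np.array_equal(c, np.einsum('i,ji->j', A, B)))\n", "print(np.array_equal(c, B @ A))"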
] }, { "cell_type": "code", "execution_count": 169, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "frEWaGTpe0ic", "outputId": "7534758d-3408-4ac7-ee74-926be37ff8e1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2]\n", "[[ 10 15 20]\n", " [100 150 200]]\n" ] } ], "source": [ "A = np.array([0, 1, 2])\n", "B = np.array([[10, 15, 20], [100, 150, 200]])\n", "print(A)\n", "print(B)" ] }, { "cell_type": "markdown", "metadata": { "id": "wnYXLx7me0ic" }, "source": [ "For the example above we get:\n", "* $c_0 = a_0 \\cdot b^0_0 + a_1 \\cdot b^1_0 + a_2 \\cdot b^2_0$. Here: $c_0 = 0 \\cdot 10 + 1 \\cdot 15 + 2 \\cdot 20 = 55$.\n", "* $c_1 = a_0 \\cdot b^0_1 + a_1 \\cdot b^1_1 + a_2 \\cdot b^2_1$. Here: $c_1 = 0 \\cdot 100 + 1 \\cdot 150 + 2 \\cdot 200 = 550$." ] }, { "cell_type": "code", "execution_count": 170, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "kUOJs_Ppe0ic", "outputId": "d56832e0-7c7f-46c7-fe4a-bc5bf1f88b44" }, "outputs": [ { "data": { "text/plain": [ "array([ 55, 550])" ] }, "execution_count": 170, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.einsum('i,ji->j', A, B)" ] }, { "cell_type": "markdown", "metadata": { "id": "QbMfu2t3e0ic" }, "source": [ "Sum of the elements of vector $A$:" ] }, { "cell_type": "code", "execution_count": 171, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "f3mqdCsxe0ic", "outputId": "eb3ea242-876a-43f1-b5af-db9147a0be90" }, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 171, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.einsum('i->', A)" ] }, { "cell_type": "markdown", "metadata": { "id": "lRiHtZRJe0id" }, "source": [ "Column sums of matrix $B$ (the row index `j` is summed out):" ] }, { "cell_type": "code", "execution_count": 172, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "2FOx17B8e0id", "outputId": 
"3d2674a1-3321-4ff0-9b9d-0f920a40cc73" }, "outputs": [ { "data": { "text/plain": [ "array([110, 165, 220])" ] }, "execution_count": 172, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.einsum('ji->i', B)" ] }, { "cell_type": "markdown", "metadata": { "id": "NrhfL3pTe0id" }, "source": [ "Consider the following matrices:" ] }, { "cell_type": "code", "execution_count": 173, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KGWiDJh6e0id", "outputId": "aa0667a1-6e9f-45a9-e919-70d6d7c44ba4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1 2]\n", " [3 4 5]]\n", "[[ 0 1]\n", " [ 10 100]\n", " [ 30 70]]\n" ] } ], "source": [ "A = np.array([[0, 1, 2], [3, 4, 5]])\n", "B = np.array([[0, 1], [10, 100], [30, 70]])\n", "print(A)\n", "print(B)" ] }, { "cell_type": "markdown", "metadata": { "id": "_ZLBqQNne0id" }, "source": [ "Transpose of matrix $A$:" ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7UejqaL3e0ie", "outputId": "f3b0b2b7-4af7-4b27-851b-c0d1e376e9fb" }, "outputs": [ { "data": { "text/plain": [ "array([[0, 3],\n", " [1, 4],\n", " [2, 5]])" ] }, "execution_count": 174, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.einsum('ij->ji', A)" ] }, { "cell_type": "markdown", "metadata": { "id": "aMhJgvvwe0ie" }, "source": [ "Matrix product $AB$:" ] }, { "cell_type": "code", "execution_count": 175, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "egjfsoYRe0ie", "outputId": "b699e7e7-dde6-4ee8-f821-16a4a2f48095" }, "outputs": [ { "data": { "text/plain": [ "array([[ 70, 240],\n", " [190, 753]])" ] }, "execution_count": 175, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.einsum('ij,jk->ik', A, B)" ] }, { "cell_type": "markdown", "metadata": { "id": "atnlUnp1e0ie" }, "source": [ "The operands can also be passed in the other order, as long as the index labels match:" ] }, { "cell_type": "code", "execution_count": 176, "metadata": { 
"colab": { "base_uri": "https://localhost:8080/" }, "id": "lRsvxql3e0ie", "outputId": "a3fa91dc-a023-4dbc-a640-670487d3725c" }, "outputs": [ { "data": { "text/plain": [ "array([[ 70, 240],\n", " [190, 753]])" ] }, "execution_count": 176, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.einsum('jk,ij->ik', B, A)" ] }, { "cell_type": "markdown", "metadata": { "id": "UZklEsQ6e0ie" }, "source": [ "A square matrix:" ] }, { "cell_type": "code", "execution_count": 177, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "WH2V7drme0if", "outputId": "965e7eaf-73a2-4920-abae-3711f6d6e6b1" }, "outputs": [ { "data": { "text/plain": [ "array([[0, 1, 2],\n", " [3, 4, 5],\n", " [6, 7, 8]])" ] }, "execution_count": 177, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C = np.arange(9).reshape((3, 3))\n", "C" ] }, { "cell_type": "markdown", "metadata": { "id": "GllXtuXde0if" }, "source": [ "Its diagonal:" ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bfR_xtA9e0if", "outputId": "7af0ffc2-4416-4463-8df0-9a8849df6635" }, "outputs": [ { "data": { "text/plain": [ "array([0, 4, 8])" ] }, "execution_count": 178, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.einsum('ii->i', C)" ] }, { "cell_type": "markdown", "metadata": { "id": "XJQE3Flhe0if" }, "source": [ "Its trace:" ] }, { "cell_type": "code", "execution_count": 179, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Yk_q9fJre0if", "outputId": "c7735343-1170-435a-cc86-1af4cb4ee7a9" }, "outputs": [ { "data": { "text/plain": [ "12" ] }, "execution_count": 179, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.einsum('ii->', C)" ] }, { "cell_type": "markdown", "metadata": { "id": "y5wkPHLte0if" }, "source": [ "A more complex operation, $r_{ilk} = \\sum_j a_{ij} c_{kj} b_{jl}$:" ] }, { "cell_type": "code", "execution_count": 180, "metadata": { "colab": { "base_uri": 
"https://localhost:8080/" }, "id": "sMgajVfue0if", "outputId": "5b13cc76-11fc-45c7-cbb0-bf8b1d8c2614" }, "outputs": [ { "data": { "text/plain": [ "array([[[ 130, 340, 550],\n", " [ 380, 1100, 1820]],\n", "\n", " [[ 340, 910, 1480],\n", " [1100, 3359, 5618]]])" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.einsum('ij,kj,jl->ilk', A, C, B)" ] }, { "cell_type": "markdown", "metadata": { "id": "j_brnaqae0if" }, "source": [ "**Exercise.** Create matrices $A\\in\\mathbb{R}^{3\\times2}, B\\in\\mathbb{R}^{2\\times2}$ and compute $\\text{tr} (ABBA^T)$." ] }, { "cell_type": "code", "execution_count": 181, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "E9mdcyU0-KsL", "outputId": "355cffcb-b552-42a2-b6a9-68db81fdaac9" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0 1]\n", " [2 3]\n", " [4 5]]\n", "[[ 0 1]\n", " [100 30]]\n" ] } ], "source": [ "A = np.array([[0, 1], [2, 3], [4, 5]])\n", "B = np.array([[0, 1], [100, 30]])\n", "print(A)\n", "print(B)" ] }, { "cell_type": "markdown", "metadata": { "id": "oJqxb1iZh0N9" }, "source": [ "The element of the square matrix $ABBA^T$ at position $(i, m)$ can be written as\n", "$$\\sum_j\\sum_k\\sum_l a_{ij}b_{jk}b_{kl}a_{ml}.$$\n", "\n", "The answer is the sum of the diagonal elements of this matrix, i.e.\n", "$$ \\text{tr} (ABBA^T) = \\sum_i\\sum_j\\sum_k\\sum_l a_{ij}b_{jk}b_{kl}a_{il}. $$\n", "\n", "The result is a scalar, so nothing follows `->` in the signature. 
The operation written with Einstein summation:" ] }, { "cell_type": "code", "execution_count": 182, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "jTkZMGJMfkTb", "outputId": "c47fcda4-bf4b-47d6-b713-708b2d015d8c" }, "outputs": [ { "data": { "text/plain": [ "115780" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.einsum('ij,jk,kl,il->', A, B, B, A)" ] }, { "cell_type": "markdown", "metadata": { "id": "FFqg7hzchiKi" }, "source": [ "Let's check:" ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "hN49fr6Uhjyk", "outputId": "abfb4e83-853a-4b87-b026-b59f17ba7f58" }, "outputs": [ { "data": { "text/plain": [ "115780" ] }, "execution_count": 183, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sum(np.diag(A @ B @ B @ A.T))" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "vscode": { "interpreter": { "hash": "33e61429d47ea5072c304948017faf4b8066559ab931d76623e2d35f352f9359" } } }, "nbformat": 4, "nbformat_minor": 1 }