{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "k4kpdDCUGe_i"
},
"source": [
"# Введение в анализ данных\n",
"\n",
"\n",
"## Обработка естественного языка. Генерация текста с помощью модели LLAMA."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "H_h6MAj1SrtP"
},
"source": [
"В предыдущем [ноутбуке](https://miptstats.github.io/courses/ad_fivt/nlp_sem.html) мы научимся строить рекуррентные нейронные сети. В этом ноутбуке мы применим большую языковую модель LLAMA-2, используя GPU.\n",
"Llama 2 — это семейство современных больших языковых моделей с открытым доступом. Почитать оригинальную статью 2023 года можно здесь.\n",
"\n",
"Модель может принимать на вход некоторый текст и продолжать его. Заметьте, что по умолчанию языковые модели не являются *conversational*, то есть их использование отличается от моделей типа Chat-GPT, которые предназначены для интерактивного взаимодействия с пользователем. Часто одну модель выкладывают в нескольких разных конфигурациях — и с обычным, и с conversational интерфейсом."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7xeRF_hSKgzs"
},
"outputs": [],
"source": [
"%pip install --quiet bitsandbytes==0.41.1 transformers==4.34.1 accelerate==0.24.0 sentencepiece==0.1.99 optimum==1.13.2 auto-gptq==0.4.2\n",
"import torch\n",
"import transformers\n",
"\n",
"assert torch.cuda.is_available(), \"you need cuda for this part\"\n",
"device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xfYEmczvjmsk"
},
"source": [
"Загрузим модель `TheBloke/Llama-2-13B-GPTQ` из Hugging Face."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 365,
"referenced_widgets": [
"0a57aa3fb8af4b6d904b879ace771551",
"475a554352744fed8da0ba9c92410898",
"0a200547d48d488f8c44c77d2f73483b",
"accd345ccfc74afaa2b4147ab7dcef9a",
"3a36131f141e4da7afc3e00a12d69a86",
"bf482a3f2e0c4305afbe918d3ede3ee0",
"845cb6c473154f8eb553aa66f9b3fa7d",
"9abb03a508d64b3ca373527ad73eee43",
"29cf52c50a1a4e82957cdca500d00f23",
"93b833aeda394f298cbff99f2b8e04e9",
"bae1737063ee40dd8ef5e6553659cbf7",
"00696440baba428490dc95ebc72e788e",
"67874a98790c41c1adc8cda98c4ca950",
"cab1d4211ec044229324c80a3b14c84c",
"4ed6854cfdf54a8ea3c8114b8b5c6b31",
"df2871c9f3ca4189b9cb7fcb18faf55c",
"d7db1cb1374644708a751e8987c7f3b6",
"1767702750324c6bb94c9d54350e5322",
"d27c5606c07c48039f32803b1bc67dc6",
"0f8939827c5a4b9b9daae69900fab24d",
"280de6ba38614e1aba1dfc73d1b6fc4b",
"e343c08ee7174cf19b4712950fd6f75d",
"ab6941b786604960a6184de5f83d6c6f",
"48fd878592b44c75b8e89012b38a7c98",
"17119b5128b24cfe89b5d6e62f1773cb",
"a6818b39c89b4cbd9d5ce85103cee07a",
"b6186aadad97458e9974cbb5c6127e00",
"393ea146c52c4a1aa96d377e4de9f801",
"13d71cd481b248398131519584911180",
"1a9a79adc5fb470793e06d9c322b6926",
"7c75f36d1f144fe09dd02351a2f74668",
"898bcad4a82642d688464550fc8025c2",
"fee0b5097c2e45a89bd9d71b56472589",
"3e400844696a47edb97fb85404390425",
"369eecf94279469980fe1449f903dc7d",
"64e707df1c954ed39f68dbb37dd5a7c6",
"23f97903fdc94f9088b157c82f05688a",
"e685dee405874ffe8986629bdb06bfa8",
"105c86d260214e68ad024ae5f7c6257e",
"f3489a41493f453fb6de30abb8c55134",
"4d1df5c6848f4cffb64763619206e629",
"ce8335014fb248d29ae14ea5d6146b3e",
"8170e01b38c04d3b988b87c0cd0b33a9",
"d8f24f11cdea4a4c9ecf3c414630dd69",
"9eebf1f4fe884b67a0b04eecc99bf6b5",
"07299e026d794c3ba5ddd41ba2a7bd2a",
"626185f4dd974afab772dea7c58964bd",
"202947386e9b4468af6b5321a1bdfb73",
"84d4624212b8410aa13bbca392dfd694",
"e51ef80b5d4241ae85789d31670ac9de",
"a16c4bbdb48241f8a88433d32ad25823",
"db91a529aa2b4eb0b5764c451a3fbb9f",
"eae35522244f4e778548775a37c9cb73",
"22d5ccba29934652a1e96888ec3741d9",
"7e1e56111053477a8086a397e57f221d",
"feff1892c44a42c787bd9467eabc1712",
"c461603036b0450490912f6741c16235",
"9c5ce1657e7444f7a542831d06fe533c",
"13f0725bffd64a98bbb6a555d4cc4464",
"a631fee570394506ada1cdcabf719fb8",
"b4118e45f3b945d9b95f942dccd815cb",
"326a5fb332be4e41bb75e026d25c9fdd",
"c93f78e236234d3a927c9ab696915f69",
"cd1400d1251d43e7be7496c3ffd67b90",
"256923e9ed844d66b7b4a490a52aefcb",
"8c72760ae196449dbbdb5f5ac81e10a2",
"df504ac591534a5bb4184e93fefafecd",
"67ec9a06b70a49d9b965e74d1fd601f7",
"0dd9103b87f94f4096c1acc76ae2b24e",
"c1dfc401974d4a20b553fbd46ec73cf1",
"e81111dd5105409ba5a71ba192d0ce79",
"bde80d3a6ac14cfcac2fd9dd7dcac1ae",
"365aec1ab7ad4d8984526f9d55d4b9d6",
"1e3c8879103146979caccb8a9265c1a7",
"f8382986155d41c29acdf217d237561f",
"0cdcd08e951844e48545d9c9b7ae2e1b",
"770fdabc09b74f50ac64473ade4548a6"
]
},
"id": "VMzFwx29Kgzu",
"outputId": "12d0cf33-a940-422b-81fe-a8d3117b729b"
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0a57aa3fb8af4b6d904b879ace771551",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading tokenizer_config.json: 0%| | 0.00/727 [00:00, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "00696440baba428490dc95ebc72e788e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading tokenizer.model: 0%| | 0.00/500k [00:00, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ab6941b786604960a6184de5f83d6c6f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading (…)cial_tokens_map.json: 0%| | 0.00/411 [00:00, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3e400844696a47edb97fb85404390425",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading tokenizer.json: 0%| | 0.00/1.84M [00:00, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"You are using the default legacy behaviour of the . This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9eebf1f4fe884b67a0b04eecc99bf6b5",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading config.json: 0%| | 0.00/913 [00:00, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.\n",
" torch.utils._pytree._register_pytree_node(\n",
"/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.\n",
" torch.utils._pytree._register_pytree_node(\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "feff1892c44a42c787bd9467eabc1712",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading model.safetensors: 0%| | 0.00/7.26G [00:00, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:auto_gptq.nn_modules.qlinear.qlinear_cuda_old:CUDA extension not installed.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "df504ac591534a5bb4184e93fefafecd",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading generation_config.json: 0%| | 0.00/132 [00:00, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"model_name = \"TheBloke/Llama-2-13B-GPTQ\"\n",
"\n",
"# Загружаем Llama токенизатор\n",
"tokenizer = transformers.LlamaTokenizer.from_pretrained(\n",
" model_name, device_map=device\n",
")\n",
"tokenizer.pad_token_id = tokenizer.eos_token_id\n",
"\n",
"# И саму модель Llama\n",
"model = transformers.AutoModelForCausalLM.from_pretrained(\n",
" model_name,\n",
" device_map=\"auto\",\n",
" torch_dtype=torch.float16,\n",
" low_cpu_mem_usage=True,\n",
" offload_state_dict=True,\n",
")"
]
},
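{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (an illustrative sketch, not part of the original pipeline, with an arbitrary sample string), we can encode a short string with the loaded tokenizer, look at the resulting token ids and pieces, and decode them back."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative tokenizer round-trip; the sample string is arbitrary\n",
"sample = \"Hello, Llama!\"\n",
"ids = tokenizer(sample)[\"input_ids\"]\n",
"print(\"Token ids:\", ids)\n",
"print(\"Tokens:\", tokenizer.convert_ids_to_tokens(ids))\n",
"print(\"Decoded:\", tokenizer.decode(ids))"
]
},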
{
"cell_type": "markdown",
"metadata": {
"id": "OL25ObCDjtCh"
},
"source": [
"Зададим промпт и выведем сгенерированное моделью продолжение."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gGfyeM-vdq5o",
"outputId": "2b9b31ba-6824-473e-bdc4-bc68130bb37e",
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input batch (encoded): {'input_ids': tensor([[ 1, 450, 937, 10943, 14436, 713, 2834, 689, 3430, 763]],\n",
" device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py:1421: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use and modify the model generation configuration (see https://huggingface.co/docs/transformers/generation_strategies#default-text-generation-configuration )\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Output: The first discovered martian lifeform looks like a \"fancy spheroid\"\n",
"It’s not an alien, but it’s close.\n",
"A methane-rich meteorite from Mars is the planet’s first known lifeform.\n",
"NASA / JSC / SCIENCE PHOTO LIBR\n"
]
}
],
"source": [
"prompt = \"The first discovered martian lifeform looks like\"\n",
"batch = tokenizer(prompt, return_tensors=\"pt\", return_token_type_ids=False).to(\n",
" device\n",
")\n",
"print(\"Input batch (encoded):\", batch)\n",
"\n",
"output_tokens = model.generate(\n",
" **batch, max_new_tokens=64, do_sample=True, temperature=0.8\n",
")\n",
"# greedy inference: do_sample=False)\n",
"# beam search for highest probability: num_beams=4)\n",
"\n",
"print(\"\\nOutput:\", tokenizer.decode(output_tokens[0].cpu()))"
]
},
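{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comments in the cell above mention two deterministic alternatives to sampling. As a minimal sketch (outputs not shown, results depend on the model), the same `model.generate` call can be run with greedy decoding (`do_sample=False`) or with beam search (`num_beams=4`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Greedy decoding: always take the most likely next token (deterministic)\n",
"greedy_tokens = model.generate(**batch, max_new_tokens=64, do_sample=False)\n",
"print(\"Greedy:\", tokenizer.decode(greedy_tokens[0].cpu()))\n",
"\n",
"# Beam search: keep several candidate continuations and return the most probable one\n",
"beam_tokens = model.generate(\n",
"    **batch, max_new_tokens=64, do_sample=False, num_beams=4\n",
")\n",
"print(\"\\nBeam search:\", tokenizer.decode(beam_tokens[0].cpu()))"
]
},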
{
"cell_type": "markdown",
"metadata": {
"id": "6SJ-c-lTj27E"
},
"source": [
"**Вывод:** В этом ноутбуке мы посмотрели, как можно генерировать текст с помощью предобученной языковой модели LLAMA."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"hide_input": false,
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}