{"id":99,"date":"2024-07-24T22:32:45","date_gmt":"2024-07-24T22:32:45","guid":{"rendered":"https:\/\/infonaujiena.lt\/index.php\/2024\/07\/24\/is-anksto-parengto-berto-modelio-naudojimas-teksto-klasifikavimui\/"},"modified":"2024-07-24T22:32:45","modified_gmt":"2024-07-24T22:32:45","slug":"is-anksto-parengto-berto-modelio-naudojimas-teksto-klasifikavimui","status":"publish","type":"post","link":"https:\/\/infonaujiena.lt\/index.php\/2024\/07\/24\/is-anksto-parengto-berto-modelio-naudojimas-teksto-klasifikavimui\/","title":{"rendered":"Using a Pretrained BERT Model for Text Classification"},"content":{"rendered":"<div id=\"\">\n<h2><span class=\"ez-toc-section\" id=\"Ivadas\"><\/span>Introduction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>TinyBERT is a compact version of BERT (Bidirectional Encoder Representations from Transformers), designed to deliver comparable performance at a significantly smaller model size.  
This tutorial shows how to use <code>TinyBERT_General_4L_312D<\/code> for text classification.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Butinos_salygos\"><\/span>Prerequisites<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<ul>\n<li>Python 3.6 or later<\/li>\n<li>PyTorch<\/li>\n<li>The Transformers library from Hugging Face<\/li>\n<li>Datasets for training and testing<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"1_veiksmas_idiekite_reikalingas_bibliotekas\"><\/span>Step 1: Install the Required Libraries<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>First, install the required libraries:<\/p>\n<pre><code class=\"language-bash\">pip install torch transformers datasets\n<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"2_veiksmas_ikelkite_TinyBERT_modeli_ir_tokenizatoriu\"><\/span>Step 2: Load the TinyBERT Model and Tokenizer<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Load the TinyBERT model and its matching tokenizer from the Hugging Face Transformers library. The classification head on top of the pretrained encoder is newly initialized, so the model must be fine-tuned before its predictions are meaningful.<\/p>\n<pre><code class=\"language-python\">from transformers import BertTokenizer, BertForSequenceClassification\n\n# Load the TinyBERT tokenizer and model (two labels for binary classification)\ntokenizer = BertTokenizer.from_pretrained('huawei-noah\/TinyBERT_General_4L_312D')\nmodel = BertForSequenceClassification.from_pretrained('huawei-noah\/TinyBERT_General_4L_312D', num_labels=2)\n<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"3_veiksmas_paruoskite_duomenu_rinkini\"><\/span>Step 3: Prepare the Dataset<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We use the <code>datasets<\/code> library to load and preprocess the dataset.  
This example uses the IMDB dataset for binary sentiment classification.<\/p>\n<pre><code class=\"language-python\">from datasets import load_dataset\n\n# Load the IMDB dataset\ndataset = load_dataset('imdb')\n\n# Select the predefined train and test splits\ntrain_dataset = dataset['train']\ntest_dataset = dataset['test']\n<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"4_veiksmas_suaktyvinkite_duomenu_rinkini\"><\/span>Step 4: Tokenize the Dataset<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The text must be tokenized before it can be fed into the TinyBERT model.<\/p>\n<pre><code class=\"language-python\">def tokenize_function(examples):\n    # Pad or truncate every review to exactly 128 tokens\n    return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=128)\n\n# Tokenize the dataset\ntrain_dataset = train_dataset.map(tokenize_function, batched=True)\ntest_dataset = test_dataset.map(tokenize_function, batched=True)\n<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"5_veiksmas_paruoskite_duomenu_ikelimo_irenginius\"><\/span>Step 5: Prepare the Data Loaders<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The training and evaluation loops consume the data in batches.  
We use the <code>DataLoader<\/code> class for this purpose.<\/p>\n<pre><code class=\"language-python\">import torch\nfrom torch.utils.data import DataLoader\n\n# Collate a list of tokenized examples into one batch of tensors\ndef data_collator(data):\n    return {\n        'input_ids': torch.tensor([f['input_ids'] for f in data], dtype=torch.long),\n        'attention_mask': torch.tensor([f['attention_mask'] for f in data], dtype=torch.long),\n        'labels': torch.tensor([f['label'] for f in data], dtype=torch.long)\n    }\n\n# Create data loaders\ntrain_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True, collate_fn=data_collator)\ntest_dataloader = DataLoader(test_dataset, batch_size=16, shuffle=False, collate_fn=data_collator)\n<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"6_veiksmas_ismokykite_modeli\"><\/span>Step 6: Train the Model<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Now we set up a training loop to fine-tune the TinyBERT model on our dataset.<\/p>\n<pre><code class=\"language-python\">import torch\nfrom torch.optim import AdamW\nfrom tqdm import tqdm\n\n# Set device (GPU or CPU)\ndevice = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')\nmodel.to(device)\n\n# Define optimizer\noptimizer = AdamW(model.parameters(), lr=5e-5)\n\n# Training loop\nmodel.train()\nfor epoch in range(3):  # Train for 3 epochs\n    loop = tqdm(train_dataloader, leave=True)\n    for batch in loop:\n        optimizer.zero_grad()\n\n        # Move batch to device\n        batch = {k: v.to(device) for k, v in batch.items()}\n\n        # Forward pass (the loss is computed because the batch includes labels)\n        outputs = model(**batch)\n        loss = outputs.loss\n\n        # Backward pass\n        loss.backward()\n        optimizer.step()\n\n        # Update progress bar\n        loop.set_description(f'Epoch {epoch + 1}')\n        loop.set_postfix(loss=loss.item())\n<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"7_veiksmas_ivertinkite_modeli\"><\/span>Step 7: Evaluate the Model<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>After training, evaluate the model's performance on the test dataset.<\/p>\n<pre><code class=\"language-python\">model.eval()\ncorrect = 0\ntotal = 0\n\nwith torch.no_grad():\n    for batch in test_dataloader:\n        # Move batch to device\n        batch = {k: v.to(device) for k, v in batch.items()}\n\n        # Forward pass\n        outputs = model(**batch)\n        logits = outputs.logits\n\n        # Calculate accuracy\n        predictions = torch.argmax(logits, dim=-1)\n        correct += (predictions == batch['labels']).sum().item()\n        total += len(batch['labels'])\n\naccuracy = correct \/ total\nprint(f'Test Accuracy: {accuracy:.4f}')\n<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"Isvada\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In this tutorial, we showed how to use <code>TinyBERT_General_4L_312D<\/code> for text classification. We loaded the model and tokenizer, prepared the dataset, trained the model, and evaluated its performance. TinyBERT offers a lightweight yet effective alternative to the original BERT model, which makes it suitable for resource-constrained environments.<\/p>\n<\/div>\n<p><a href=\"https:\/\/techplanet.today\/post\/using-pretrained-bert-model-for-text-classification\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction TinyBERT is a compact version of BERT (Bidirectional Encoder Representations from Transformers), designed to deliver comparable performance at a significantly smaller model size. 
This tutorial shows how to use TinyBERT_General_4L_312D for text classification. Prerequisites Python 3.6 or later PyTorch The Transformers library from Hugging Face Datasets for training and testing Step 1: Install the Required Libraries First, install the required libraries: pip install [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":100,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[3],"tags":[],"class_list":["post-99","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technologijos"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/infonaujiena.lt\/index.php\/wp-json\/wp\/v2\/posts\/99","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/infonaujiena.lt\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/infonaujiena.lt\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/infonaujiena.lt\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/infonaujiena.lt\/index.php\/wp-json\/wp\/v2\/comments?post=99"}],"version-history":[{"count":0,"href":"https:\/\/infonaujiena.lt\/index.php\/wp-json\/wp\/v2\/posts\/99\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/infonaujiena.lt\/index.php\/wp-json\/wp\/v2\/media\/100"}],"wp:attachment":[{"href":"https:\/\/infonaujiena.lt\/index.php\/wp-json\/wp\/v2\/media?parent=99"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/infonaujiena.lt\/index.php\/wp-json\/wp\/v2\/categories?post=99"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/infonaujiena.lt\/index.php\/wp-json\/wp\/v2\/tags?post=99"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}