old-svevijesti/pyth/__pycache__/get_articles.cpython-310.pyc

41 lines
5.1 KiB
Plaintext
2024-01-29 14:55:20 +01:00
o
2024-01-31 12:37:55 +01:00
gz<67>e<00> @s\ddlmZddlZddlmZddlmZddlZddlm Z ddl
m Z m Z m Z mZddlZddlmZddlZddlmZe<11>e<0E>e<07>d <09>Ze<06>Ze <09>Zgd
<EFBFBD>Zd d iZdBdedefdd<11>Zdd<13>Zdd<15>Zdd<17>Z dd<19>Z!e"<22>Z#e"<22>Z$dd<1B>Z%e"<22>Z&eD]Z'e%e'e&<26>Z(e(r<>e#<23>)e(<28>q<>dd<1D>e#D<00>Z*e"e <0A><00>Z+e*e+Z,e,Z*e"e*<2A>Z*e!e*<2A>Z*e-dk<02>r<>e*D]<5D>Z.e.e+v<01>r<>e/de.<2E><00><02>e+<2B>0e.<2E>e<02>1e.e<1A>Z2ee2j3d <20>Z4e4<65>5gd!<21><01>Z6d"<22>7d#d$<24>e6D<00><01>Z8e4<65>5d%g<01>Z9d"<22>7d&d$<24>e9D<00><01>Z:e:Z:e8Z8e e8<65>Z8ee:<3A>Z:e ee:<3A><01>Z:ee:<3A>Z;gd'<27>Z<d(d)d*d+d,d-d.<2E>Z=e;d/k<04>ree8<65>Z8zpej>j?j@d d0d1d2<64>d3d4e8<65>d5e:<3A>d6e<<3C>d7<64>d2<64>gd8<64>ZAeAjBdjCjDZEeeE<65>ZEe<0F>FeE<65>ZGeGd9ZHeGd:ZIeGd;Z3eI<65>J<EFBFBD>e<v<00>rbeI<65>J<EFBFBD>ZKnd<ZKe=<3D>1eKeK<65>L<EFBFBD><00>ZKe<18>MeE<65>ZNe/d=eK<65><00><02>e eHe3e.eNd>d?<3F><05>s<>d@ZOe eHe3e.eNeOeK<65>Wq<>eP<65>y<>ZQz e/dAeQ<65><00><02>WYdZQ[Qq<51>dZQ[Qwwq<>dSdS)C<>)<01> BeautifulSoupN)<01>urljoin)<01>OpenAI)<01>OpenAIEmbeddings)<04> insert_data<74>is_similar_data<74> get_all_links<6B> cleansing)<01> load_dotenv)<01> repair_json<6F>OPENAI_API_KEY)
zhttps://klix.bazhttps://srpskainfo.comzhttps://bljesak.infozhttps://www.index.hrzhttps://avaz.bazhttps://www.telegraf.rszhttps://www.blic.rszhttps://www.vijesti.mezhttps://dnevnik.hrzhttps://24sata.hrz
User-Agentz<74>Mozilla/5.0 (Linux; Android 5.1.1; SM-G928X Build/LMY47X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.83 Mobile Safari/537.36<EFBFBD> gpt-3.5-turbo<62>string<6E>returncCst<00>|<01>}t|<02>|<00><01>S)N)<04>tiktoken<65>encoding_for_model<65>len<65>encode)r<00>model<65>encoding<6E>r<00>2/home/amir/Desktop/svevijesti/pyth/get_articles.py<70>num_tokens_from_strings
rcC<00>Hd}d}t<00>|<01>}|<03>|<00>}t|<04>|kr|gS|d|<02>}|<03>|<05>}|S)Nr i<><00>rrrr<00>decode<64><07>text<78> encoding_name<6D>
max_tokensr<00>tokens<6E> sliced_tokens<6E> sliced_textrrr<00>slice_text_at_2k_tokens<00>

  
r#cCr)Nr <00>drrrrr<00>slice_title_if_needed'r$r&cs d<01>d<02><00>fdd<04>|D<00><01>}|S)NuYABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzČčĆćDždžĐ𩹮ž0123456789 <20>c3s <00>|] }|<01>vr
|ndVqdS)<02> Nr)<02>.0<EFBFBD>char<61><01> allowed_charsrr<00> <genexpr>4s<02>z&replace_with_spaces.<locals>.<genexpr>)<01>join)r<00> cleaned_textrr+r<00>replace_with_spaces2sr0cCs>t<00>}|D]}d|vr|<02>dd<03>}|<01>|<03>q|<01>|<02>q|S)N<>wwwzwww.r')<03>set<65>replace<63>add)<04> links_set<65>modified_links<6B>link<6E> modified_linkrrr<00> fix_links7s   r9c
Cs<>t<00>|t<02>}|jdkr@t|jd<02>}|<03>d<03>}g}|D]#}|jddd<06>}|D]}t||d<00>} | |vr<|<05>| <09>|<01> | <09>q%q|SdS)N<><4E><00> html.parser<65>article<6C>aT)<01>hrefr>)
<EFBFBD>requests<74>get<65>headers<72> status_coderr<00>find_allr<00>appendr4)
<EFBFBD>url<72>already_checked<65>response<73>soup<75>articles<65>
link_storer<<00>linksr7<00>
link_valuerrr<00>get_article_linksDs 
 


<02><02><04>rMcCsh|]}|r|<01>qSrr)r)<00>itemrrr<00> <setcomp>ZsrO<00>__main__zProcessing link: r;)<03>h2<68>h1<68>h3r(cC<00>g|]}|jdd<01><01>qS<00>T)<01>strip<69><01>get_text)r)<00>titlerrr<00>
<listcomp>m<00>rZ<00>pcCrTrUrW)r)rrrrrZpr[)<05>politics<63>business<73>sport<72>magazine<6E>scitech<63>Politika<6B>Biznis<69>Sport<72>MagazinzNauka i tehnologija<6A>Ostalo)r]r^r_r`ra<00>otheril<00>systemz+Data analytic, Journalist and News reporter)<02>role<6C>content<6E>userz>Extract relevant information from the following input: Title: z, Text: z|. Remove any non-news element related to the current text and title and remove 'FOTO' and 'VIDEO' from title and text, from z<> select category in wich that news belong, and provide the cleaned data make sure that its on Bosnian language and valid JSON object with 'title' field, 'category' and 'content' field.)r<00>messagesrY<00>categoryrjrgz
Category: g\<5C><><EFBFBD>(\<5C>?)<01> threshold<6C>NOzError in completion: )r )R<>bs4rr?<00> urllib.parser<00>openair<00>os<6F>langchain_openair<00> db_managementrrrr <00>json<6F>dotenvr
r<00> json_repairr <00>getenvr <00>client<6E>
embeddings<EFBFBD>dlinksrA<00>str<74>intrr#r&r0r9r2<00> total_links<6B>collected_newsrMrF<00>dlink<6E>
temp_links<EFBFBD>update<74> final_links<6B>db_links<6B> new_links<6B>__name__r7<00>printr4r@rGrrHrC<00>titlesr.<00>
title_text<EFBFBD>texts<74> text_text<78>ttk<74>category_options<6E>category_translation<6F>chat<61> completions<6E>create<74>
completion<EFBFBD>choices<65>messagerj<00>generated_text<78>loads<64> response_datarY<00>predicted_category<72>lowerrm<00>
capitalize<EFBFBD> embed_query<72>vector<6F> similar_d<5F> Exception<6F>errrr<00><module>s<>      
  


<02>



    <06>
<02><06>


<04> <08><02><02><04>