Files
old-svevijesti/pyth/__pycache__/scrapingsingle.cpython-310.pyc

27 lines
3.0 KiB
Plaintext

2024-01-02 15:00:07 +01:00
Compiled CPython 3.10 bytecode for scrapingsingle.py. The file is binary; only the embedded names and string constants below are readable.

Source path recorded in the bytecode:
/home/asabani/Desktop/svevijesti-master/pyth/scrapingsingle.py

Imports:
bs4 (BeautifulSoup)
requests
urllib.parse (urljoin)
openai (OpenAI)
os
langchain.embeddings (OpenAIEmbeddings)
langchain.vectorstores.pgvector (PGVector)
vectData (insert_data, is_similar_data)
json
dotenv (load_dotenv)

Environment variable read via getenv:
OPENAI_API_KEY

Module-level names:
client, embeddings, dlinks, headers, total_links, collected_news, get_article_links, dlink, temp_links, final_links, response, soup, titles, title_text, texts, text_text, completion, generated_text, response_data, title, content, vector, e

dlinks:
https://klix.ba
https://srpskainfo.com
https://bljesak.info

headers:
User-Agent: Mozilla/5.0 (Linux; Android 5.1.1; SM-G928X Build/LMY47X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.83 Mobile Safari/537.36

get_article_links(url, already_checked):
Local names: response, soup, articles, link_store, links, link, link_value.
Calls requests.get with the headers above, checks status_code, parses response.text with BeautifulSoup and 'html.parser', finds 'article' elements, finds 'a' elements with href=True inside each, resolves each href with urljoin, and uses append and add to record links not already in already_checked.

Main block (runs when __name__ is '__main__'):
For each link in final_links, the page is fetched and parsed with 'html.parser'. Text is collected with get_text(strip=True) from the h2, h1 and h3 elements (titles) and from the p elements (texts), then joined into title_text and text_text.
client.chat.completions.create is called with model 'gpt-3.5-turbo' and two messages:
system: Data analytic, Journalist and News reporter
user: Extract relevant information from the following input: Title: {title_text}, Text: {text_text}. Remove any non-news element related to the current text and title, and provide the cleaned data as a JSON object with 'title' and 'content' fields.
The reply (choices, message, content) is stored in generated_text and parsed with json.loads into response_data, from which 'title' and 'content' are read and printed with these labels:
*********************************
Title:
---------------------------------
Content :
embeddings.embed_query produces vector. is_similar_data is called with a threshold keyword argument (the value is not recoverable from the dump), and insert_data is called when no similar entry is found.
Any exception is caught and reported as:
Error in completion: {e}
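The names visible in the bytecode (get_article_links, already_checked, link_store, link_value, urljoin, and the 'article' and 'a' tag lookups) suggest a collect-and-deduplicate step over article links. The following is a minimal sketch of that step, not a decompilation of the original source. It makes two assumptions: the standard-library html.parser is swapped in for BeautifulSoup so the sketch runs without third-party packages, and the HTML is passed in as a string instead of being fetched with requests. ArticleLinkParser and extract_article_links are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleLinkParser(HTMLParser):
    """Collects href values of <a> tags nested inside <article> tags.

    Hypothetical helper: the original appears to use BeautifulSoup instead.
    """

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # > 0 while inside at least one <article>
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag == "a" and self.article_depth > 0:
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1


def extract_article_links(html, base_url, already_checked):
    """Return absolute article links not seen before.

    already_checked is a set that is updated in place, so repeated calls
    across several pages skip links that were already collected.
    """
    parser = ArticleLinkParser()
    parser.feed(html)

    link_store = []
    for href in parser.hrefs:
        link_value = urljoin(base_url, href)  # resolve relative hrefs
        if link_value not in already_checked:
            link_store.append(link_value)
            already_checked.add(link_value)
    return link_store


if __name__ == "__main__":
    sample = (
        '<article><a href="/vijesti/1">A</a><a href="/vijesti/1">dup</a></article>'
        '<a href="/outside">not in an article</a>'
    )
    seen = set()
    print(extract_article_links(sample, "https://klix.ba", seen))
```

Sharing one already_checked set across all start pages keeps the final list free of duplicates even when several front pages link to the same article.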