Launching "A Python Renku Project" stops at "requesting token for provider internal_gitlab"

2024-11-12T17:00:07.896040493Z INFO:root:Checking if the repo already exists.
2024-11-12T17:00:07.896105541Z INFO:root:/home/jovyan/work/test3 does not exist, creating it.
2024-11-12T17:00:07.910072751Z INFO:root:Requesting token for provider INTERNAL_GITLAB

The pods are stuck in the Initializing state:

Defaulted container "jupyter-server" out of: jupyter-server, oauth2-proxy, git-proxy, git-sidecar, init-certificates (init), download-image (init), git-clone (init)
Error from server (BadRequest): container "jupyter-server" in pod "xxx-2ec-test3-d15d1159-0" is waiting to start: PodInitializing

Eventually they fail, and retry, but I did see this message:

INFO:root:Checking if the repo already exists.
INFO:root:/home/jovyan/work/flight-31 does not exist, creating it.
INFO:root:Requesting token for provider INTERNAL_GITLAB
Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/app/.venv/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1060, in _validate_conn
    conn.connect()
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connection.py", line 363, in connect
    self.sock = conn = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connection.py", line 179, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7856739898d0>, 'Connection to soc-renku.xx.xx.ie timed out. (connect timeout=None)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 801, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 594, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='soc-renku.xx.xx.ie', port=443): Max retries exceeded with url: /api/auth/gitlab/exchange (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7856739898d0>, 'Connection to soc-renku.xx.xxx.ie timed out. (connect timeout=None)'))

Strangely, when I reinstall and this pod is not deleted, the pod starts up. Any ideas what I'm doing wrong?

Just to note, all important pods are running

NAME                                                READY   STATUS      RESTARTS        AGE
xxx-2ec-test3-d15d1159-0                       0/4     Init:2/3    3 (2m40s ago)   9m44s
renku-amalthea-7d78494545-qcpsd                     1/1     Running     0               12m
renku-authz-6cd66b6c76-7fd24                        1/1     Running     0               12m
renku-commit-event-service-6466b9dd99-6wp6s         1/1     Running     0               12m
renku-core-cleanup-v10-28857180-bt6ds               0/1     Completed   0               7m33s
renku-core-cleanup-v10-28857185-xpkjh               0/1     Completed   0               2m33s
renku-core-v10-659c79ddb9-pndvf                     3/3     Running     2 (12m ago)     12m
renku-core-v10-659c79ddb9-wzp8t                     3/3     Running     4 (12m ago)     12m
renku-data-service-6b448f5648-l4s2d                 1/1     Running     1 (11m ago)     12m
renku-data-service-6b448f5648-wcbn2                 1/1     Running     2 (11m ago)     12m
renku-data-service-background-jobs-28857186-k659g   0/1     Completed   0               93s
renku-event-log-ff788d96-p887w                      1/1     Running     0               12m
renku-gateway-588c9c9cfd-42cbl                      1/1     Running     3 (12m ago)     12m
renku-gateway-588c9c9cfd-n9b58                      1/1     Running     4 (11m ago)     12m
renku-init-keycloak-realms-rev1-wrmkz-z542r         0/1     Completed   0               12m
renku-init-postgres-authz-rev1-wdl6a-684f8          0/1     Completed   0               12m
renku-init-postgres-keycloak-rev1-rf8bh-kmvk6       0/1     Completed   0               12m
renku-init-postgres-renku-rev1-esavs-8mghm          0/1     Completed   0               12m
renku-init-renku-platform-rev1-8palc-xpnbv          0/1     Completed   0               12m
renku-jena-master-0                                 1/1     Running     0               12m
renku-keycloakx-0                                   1/1     Running     0               12m
renku-knowledge-graph-c9b55fc8f-zmbql               1/1     Running     0               12m
renku-notebooks-0                                   1/1     Running     0               12m
renku-notebooks-1                                   1/1     Running     0               12m
renku-notebooks-k8s-watcher-6f74749b5d-m76tg        1/1     Running     0               12m
renku-postgresql-0                                  1/1     Running     0               12m
renku-redis-node-0                                  3/3     Running     0               12m
renku-redis-node-1                                  3/3     Running     0               12m
renku-redis-node-2                                  3/3     Running     0               11m
renku-search-api-864c6755c5-xwpmc                   1/1     Running     0               12m
renku-search-provision-54d665c4f6-nrzxg             1/1     Running     0               12m
renku-secrets-storage-7dd8864f97-xx8db              1/1     Running     1 (11m ago)     12m
renku-solr-0                                        1/1     Running     0               12m
renku-swagger-55cd5b6594-vpcpc                      1/1     Running     0               12m
renku-token-repository-d6c488464-2sb6w              1/1     Running     0               12m
renku-triples-generator-5f4ddf888-wzdrm             1/1     Running     0               12m
renku-ui-5758579fd6-4sjxr                           1/1     Running     0               12m
renku-uiserver-79667d8b5-pqqf6                      1/1     Running     3 (12m ago)     12m
renku-webhook-service-7b5498bb86-hb5lb              1/1     Running     0               12m

Hi @diarmuidcire , as you can see from the logs, the git-clone container cannot contact the Renku API. See "Connection timed out" on host='soc-renku.xx.xx.ie', port=443 with URL /api/auth/gitlab/exchange. The session pods need to be able to make requests on the Renku API so that they can work properly. Is there any change to the Renku helm chart or to the network policy which prevents networking from session pods?
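
One way to check this by hand is to attempt a plain TCP connection to the Renku host on port 443. The sketch below is only an illustration: `can_connect` is a hypothetical helper, not part of Renku, and the hostname in the usage comment is a placeholder. To be representative it would have to run in a pod that is selected by the same network policies as the session pods.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False


# Usage (placeholder hostname, replace with your own Renku host):
#   can_connect("renku.example.org", 443)
```

A result of False with a long wait before it returns would point to traffic being dropped, which matches the "Connection timed out" in the traceback; a quick False is more likely a refused connection or a DNS failure.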

Thanks @leafty. Yes, that is what I have noticed also.
The only thing I changed in the renku-values.yaml file is:

graph:
  tokenRepository:
    aesEncryptionKey: 
  webhookService:
    aesEncryptionKey:

It didn't like this:

graph:
  tokenRepository:
    tokenEncryption:
      secret: <use `openssl rand -hex 8 | base64`>
  webhookService:
    hookToken:
      secret: <use `openssl rand -hex 8 | base64`>

My version of Kubernetes complained about the hookToken. Otherwise, apart from the GitLab info, secretName, and Renku hosts, everything remains the same.

Any ideas welcome.

It looks like hookToken and tokenEncryption in graph might be needed for the networking, so I'll downgrade my Kubernetes and reinstall. I'll let you know how it goes.

Note, this is the version of kubernetes I am running.

Client Version: v1.29.7
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.7

Helm Version:

helm version
version.BuildInfo{Version:"v3.15.3", GitCommit:"3bb50bbbdd9c946ba9989fbe4fb4104766302a64", GitTreeState:"clean", GoVersion:"go1.22.5"}

This is the error I get, which forced the above change:

Error: execution error at (renku/templates/graph/db-encryption-secret.yaml:40:4): The value graph.tokenRepository.tokenEncryption.secret is deprecated. Please move it to graph.tokenRepository.aesEncryptionKey base64 decoded.
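
The error text asks for the old value "base64 decoded". A minimal sketch of that step, assuming the old value was produced with the `openssl rand -hex 8 | base64` command shown earlier. The sample value is made up and `decode_legacy_secret` is a hypothetical helper name. The thread does not say whether the trailing newline that `openssl` emits should be kept, so check the chart's values documentation before stripping it.

```python
import base64


def decode_legacy_secret(encoded: str) -> str:
    """Base64-decode a legacy secret value and return it as ASCII text."""
    return base64.b64decode(encoded).decode("ascii")


# Made-up stand-in for the output of `openssl rand -hex 8 | base64`:
# 16 hex characters plus the newline that openssl prints.
legacy = base64.b64encode(b"0123456789abcdef\n").decode("ascii")

decoded = decode_legacy_secret(legacy)
print(repr(decoded))  # the raw decoded text, including any trailing newline
```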

@diarmuidcire can you post what network policies you have in the namespace where renku is running? I.e. kubectl get netpol -o yaml -n <k8s-namespace-where-renku-is-installed> should do the trick. I suspect that something is misconfigured there and it prevents the HTTP request from completing.

Also please let me know which version of renku you have installed. It will help with troubleshooting this further.

I was using Renku 0.58.0 when I reported the above issue.

I tried moving to 0.60.0 but got this error:

Sessions are currently unavailable
{"error":{"code":3003,"detail":"If this problem persists please contact your administrator.","message":"The jupyter server cache is not available"}}

I am now back on 0.58.0 [where I had the original problem]. The output from netpol command above is:

kubectl get netpol -o yaml -n renku 
apiVersion: v1
items:
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app: renku
      app.kubernetes.io/managed-by: Helm
      chart: renku-0.58.0
      heritage: Helm
      release: renku
    name: postgres-ingress
    namespace: renku
    resourceVersion: "21302569"
    uid: 27e4b5cd-875d-492b-b651-1d2f030d21e1
  spec:
    ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app.kubernetes.io/name: keycloakx
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: event-log
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: triples-generator
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: token-repository
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: post-install-postgres
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: renku-data-service
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: keycloak-sync
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: renku-authz
      ports:
      - port: 5432
        protocol: TCP
    - from:
      - namespaceSelector: {}
        podSelector: {}
      ports:
      - port: 9187
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/name: postgresql
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app.kubernetes.io/instance: renku
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: amalthea
      app.kubernetes.io/version: latest
      helm.sh/chart: amalthea-0.12.3
    name: renku-amalthea-controller
    namespace: renku
    resourceVersion: "21302564"
    uid: abf00a69-ef82-4ad6-9095-ad1fc6c6ee50
  spec:
    podSelector:
      matchLabels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: amalthea
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app.kubernetes.io/instance: renku
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: amalthea
      app.kubernetes.io/version: latest
      helm.sh/chart: amalthea-0.12.3
    name: renku-amalthea-jupyterserver
    namespace: renku
    resourceVersion: "21302566"
    uid: 52a3d8a7-1bfc-45a3-8847-7e8c915ac1a7
  spec:
    egress:
    - ports:
      - port: 53
        protocol: UDP
      - port: 53
        protocol: TCP
    - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
    ingress:
    - from:
      - ipBlock:
          cidr: 0.0.0.0/0
      ports:
      - port: 4180
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/component: jupyterserver
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: amalthea
    policyTypes:
    - Ingress
    - Egress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: Helm
    name: renku-notebooks-k8s-watcher
    namespace: renku
    resourceVersion: "21302561"
    uid: 33da9a34-57b9-4a10-a66b-52675cf7fb43
  spec:
    ingress:
    - from:
      - podSelector:
          matchLabels:
            app: notebooks
            release: renku
      ports:
      - port: http
        protocol: TCP
    podSelector:
      matchLabels:
        app: notebooks-k8s-watcher
        release: renku
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: Helm
    name: renku-notebooks-sessions
    namespace: renku
    resourceVersion: "21302565"
    uid: af90d5fb-0e54-4ad6-b371-5dce5361446d
  spec:
    egress:
    - ports:
      - port: 53
        protocol: UDP
      - port: 53
        protocol: TCP
    - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
    - ports:
      - port: http
        protocol: TCP
      to:
      - podSelector:
          matchLabels:
            app: renku-data-service
    ingress:
    - from:
      - podSelector:
          matchLabels:
            app: notebooks-ssh
      ports:
      - port: ssh
        protocol: TCP
    - from:
      - ipBlock:
          cidr: 0.0.0.0/0
      ports:
      - port: 4180
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/component: jupyterserver
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: amalthea
    policyTypes:
    - Ingress
    - Egress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: Helm
    name: renku-notebooks-ssh-jumphost
    namespace: renku
    resourceVersion: "21302562"
    uid: 67302160-0e3b-4793-8f1b-b82a904d4219
  spec:
    egress:
    - ports:
      - port: ssh
        protocol: TCP
      to:
      - podSelector:
          matchLabels:
            app.kubernetes.io/component: jupyterserver
            app.kubernetes.io/instance: renku
            app.kubernetes.io/name: amalthea
    - ports:
      - port: 53
        protocol: UDP
      - port: 53
        protocol: TCP
    podSelector:
      matchLabels:
        app: notebooks-ssh
    policyTypes:
    - Egress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app.kubernetes.io/component: primary
      app.kubernetes.io/instance: renku
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: postgresql
      app.kubernetes.io/version: 16.2.0
      helm.sh/chart: postgresql-14.2.4
    name: renku-postgresql
    namespace: renku
    resourceVersion: "21302563"
    uid: 4b268c67-4c46-4929-9411-37500c56e293
  spec:
    egress:
    - {}
    ingress:
    - ports:
      - port: 5432
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/component: primary
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: postgresql
    policyTypes:
    - Ingress
    - Egress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app.kubernetes.io/instance: renku
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: redis
      app.kubernetes.io/version: 7.0.7
      helm.sh/chart: redis-17.4.2
    name: renku-redis
    namespace: renku
    resourceVersion: "21302568"
    uid: 89ddeee0-538c-47ff-a507-567539541a96
  spec:
    egress:
    - ports:
      - port: 53
        protocol: UDP
    - ports:
      - port: 6379
        protocol: TCP
      - port: 26379
        protocol: TCP
      to:
      - podSelector:
          matchLabels:
            app.kubernetes.io/instance: renku
            app.kubernetes.io/name: redis
    ingress:
    - from:
      - podSelector:
          matchLabels:
            renku-redis-client: "true"
      - podSelector:
          matchLabels:
            app.kubernetes.io/instance: renku
            app.kubernetes.io/name: redis
      ports:
      - port: 6379
        protocol: TCP
      - port: 26379
        protocol: TCP
    - ports:
      - port: 9121
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: redis
    policyTypes:
    - Ingress
    - Egress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: Helm
    name: renku-secrets-storage
    namespace: renku
    resourceVersion: "21302560"
    uid: bcbf12b9-0213-4ffc-8ec5-391a30ab9e6c
  spec:
    ingress:
    - from:
      - podSelector:
          matchLabels:
            app: notebooks
            release: renku
      ports:
      - port: http
        protocol: TCP
    podSelector:
      matchLabels:
        app: renku-secrets-storage
        release: renku
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app: renku
      app.kubernetes.io/managed-by: Helm
      chart: renku-0.58.0
      heritage: Helm
      release: renku
    name: renku-setup-job
    namespace: renku
    resourceVersion: "21302559"
    uid: 5c1020eb-cbee-4d2a-8745-c40e9adfb497
  spec:
    ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: postgres-setup
      ports:
      - port: 5432
        protocol: TCP
    - from:
      - namespaceSelector: {}
        podSelector: {}
      ports:
      - port: 9187
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/name: postgresql
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T13:23:37Z"
    generation: 1
    labels:
      app.kubernetes.io/instance: renku
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: solr
      app.kubernetes.io/version: 9.5.0
      helm.sh/chart: solr-8.9.2
    name: renku-solr
    namespace: renku
    resourceVersion: "21302567"
    uid: 275316a4-f4b2-49de-87b2-65dd9de1aeaf
  spec:
    egress:
    - {}
    ingress:
    - from:
      - podSelector:
          matchLabels:
            app.kubernetes.io/instance: renku
            app.kubernetes.io/name: solr
      - podSelector:
          matchLabels:
            renku-solr-client: "true"
      ports:
      - port: 8983
        protocol: TCP
      - port: 8983
        protocol: TCP
    - from:
      - podSelector:
          matchLabels:
            app: search-api
      - podSelector:
          matchLabels:
            app: search-provision
      ports:
      - port: 8983
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: solr
    policyTypes:
    - Ingress
    - Egress
kind: List
metadata:
  resourceVersion: ""

@tolevshi, I added the following to the renku-amalthea-jupyterserver network config,

      - port: 443
        protocol: TCP

and the pod launched.
xxxxx-2ec-saas-935bf057-0 4/4 Running 0 3m5s

Is there some way of configuring this for the future?
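
For anyone wondering why that one port entry makes a difference: an egress rule with only `ports` and no `to` matches every destination, while the `ipBlock` rule excludes the private ranges. The sketch below models the renku-amalthea-jupyterserver egress rules posted above in plain Python. It is a simplification (it ignores named ports, pod selectors, and the fact that several policies selecting the same pod are combined), the IP addresses are made up, and whether the Renku host actually resolves to a private address is an assumption the thread does not confirm.

```python
import ipaddress

# Egress rules transcribed from the original renku-amalthea-jupyterserver policy.
EGRESS_RULES = [
    # Only "ports": any destination, but only on these ports.
    {"ports": [(53, "UDP"), (53, "TCP")]},
    # Only "to": any port, but only to these destinations.
    {"to": [{"cidr": "0.0.0.0/0",
             "except": ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]}]},
]


def rule_allows(rule, ip, port, protocol="TCP"):
    """Return True if a single egress rule allows traffic to ip:port."""
    dest = ipaddress.ip_address(ip)
    ports = rule.get("ports")
    if ports is not None and (port, protocol) not in ports:
        return False
    blocks = rule.get("to")
    if blocks is None:
        return True  # no "to" means every destination matches
    for block in blocks:
        in_cidr = dest in ipaddress.ip_network(block["cidr"])
        excluded = any(dest in ipaddress.ip_network(e)
                       for e in block.get("except", []))
        if in_cidr and not excluded:
            return True
    return False


def egress_allowed(rules, ip, port, protocol="TCP"):
    """Traffic is allowed if at least one rule allows it."""
    return any(rule_allows(rule, ip, port, protocol) for rule in rules)


# The change described above: 443/TCP added to the ports-only rule.
PATCHED_RULES = [
    {"ports": EGRESS_RULES[0]["ports"] + [(443, "TCP")]},
    EGRESS_RULES[1],
]

print(egress_allowed(EGRESS_RULES, "10.1.2.3", 443))   # private address, original rules
print(egress_allowed(PATCHED_RULES, "10.1.2.3", 443))  # private address, patched rules
```

Under these assumptions the original rules block port 443 to a private address and the patched rules allow it, which would be consistent with the pod launching after the change. Note that the patched rule opens port 443 to every destination, including other in-cluster services.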

apiVersion: v1
items:
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 1
    labels:
      app: renku
      app.kubernetes.io/managed-by: Helm
      chart: renku-0.58.0
      heritage: Helm
      release: renku
    name: postgres-ingress
    namespace: renku
    resourceVersion: "21322003"
    uid: 52fae3dc-251b-4c2b-8ada-1fc165f6c300
  spec:
    ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app.kubernetes.io/name: keycloakx
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: event-log
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: triples-generator
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: token-repository
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: post-install-postgres
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: renku-data-service
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: keycloak-sync
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: renku-authz
      ports:
      - port: 5432
        protocol: TCP
    - from:
      - namespaceSelector: {}
        podSelector: {}
      ports:
      - port: 9187
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/name: postgresql
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 1
    labels:
      app.kubernetes.io/instance: renku
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: amalthea
      app.kubernetes.io/version: latest
      helm.sh/chart: amalthea-0.12.3
    name: renku-amalthea-controller
    namespace: renku
    resourceVersion: "21321993"
    uid: 8f2d5d6e-0fb5-4221-8653-1af019ccbad9
  spec:
    podSelector:
      matchLabels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: amalthea
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 2
    labels:
      app.kubernetes.io/instance: renku
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: amalthea
      app.kubernetes.io/version: latest
      helm.sh/chart: amalthea-0.12.3
    name: renku-amalthea-jupyterserver
    namespace: renku
    resourceVersion: "21343208"
    uid: c051aee6-3607-4491-83e8-0e3d7caf283e
  spec:
    egress:
    - ports:
      - port: 53
        protocol: UDP
      - port: 53
        protocol: TCP
      - port: 443
        protocol: TCP
    - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
    ingress:
    - from:
      - ipBlock:
          cidr: 0.0.0.0/0
      ports:
      - port: 4180
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/component: jupyterserver
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: amalthea
    policyTypes:
    - Ingress
    - Egress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: Helm
    name: renku-notebooks-k8s-watcher
    namespace: renku
    resourceVersion: "21322002"
    uid: 1c9da1c4-052d-4a96-9579-7ce5be6c649a
  spec:
    ingress:
    - from:
      - podSelector:
          matchLabels:
            app: notebooks
            release: renku
      ports:
      - port: http
        protocol: TCP
    podSelector:
      matchLabels:
        app: notebooks-k8s-watcher
        release: renku
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: Helm
    name: renku-notebooks-sessions
    namespace: renku
    resourceVersion: "21321997"
    uid: 8a27547e-c2e7-4b27-b3d7-58f5daa6fbe8
  spec:
    egress:
    - ports:
      - port: 53
        protocol: UDP
      - port: 53
        protocol: TCP
    - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
    - ports:
      - port: http
        protocol: TCP
      to:
      - podSelector:
          matchLabels:
            app: renku-data-service
    ingress:
    - from:
      - podSelector:
          matchLabels:
            app: notebooks-ssh
      ports:
      - port: ssh
        protocol: TCP
    - from:
      - ipBlock:
          cidr: 0.0.0.0/0
      ports:
      - port: 4180
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/component: jupyterserver
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: amalthea
    policyTypes:
    - Ingress
    - Egress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: Helm
    name: renku-notebooks-ssh-jumphost
    namespace: renku
    resourceVersion: "21321994"
    uid: 1fd56124-4f8b-4368-8f22-0a7a401ba447
  spec:
    egress:
    - ports:
      - port: ssh
        protocol: TCP
      to:
      - podSelector:
          matchLabels:
            app.kubernetes.io/component: jupyterserver
            app.kubernetes.io/instance: renku
            app.kubernetes.io/name: amalthea
    - ports:
      - port: 53
        protocol: UDP
      - port: 53
        protocol: TCP
    podSelector:
      matchLabels:
        app: notebooks-ssh
    policyTypes:
    - Egress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 1
    labels:
      app.kubernetes.io/component: primary
      app.kubernetes.io/instance: renku
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: postgresql
      app.kubernetes.io/version: 16.2.0
      helm.sh/chart: postgresql-14.2.4
    name: renku-postgresql
    namespace: renku
    resourceVersion: "21321996"
    uid: 7736032a-f1cc-44d4-8938-6a119f650206
  spec:
    egress:
    - {}
    ingress:
    - ports:
      - port: 5432
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/component: primary
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: postgresql
    policyTypes:
    - Ingress
    - Egress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 1
    labels:
      app.kubernetes.io/instance: renku
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: redis
      app.kubernetes.io/version: 7.0.7
      helm.sh/chart: redis-17.4.2
    name: renku-redis
    namespace: renku
    resourceVersion: "21322001"
    uid: 283db864-0851-4def-b8bc-0378a489a5d8
  spec:
    egress:
    - ports:
      - port: 53
        protocol: UDP
    - ports:
      - port: 6379
        protocol: TCP
      - port: 26379
        protocol: TCP
      to:
      - podSelector:
          matchLabels:
            app.kubernetes.io/instance: renku
            app.kubernetes.io/name: redis
    ingress:
    - from:
      - podSelector:
          matchLabels:
            renku-redis-client: "true"
      - podSelector:
          matchLabels:
            app.kubernetes.io/instance: renku
            app.kubernetes.io/name: redis
      ports:
      - port: 6379
        protocol: TCP
      - port: 26379
        protocol: TCP
    - ports:
      - port: 9121
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: redis
    policyTypes:
    - Ingress
    - Egress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: Helm
    name: renku-secrets-storage
    namespace: renku
    resourceVersion: "21322000"
    uid: d3e1bbac-5722-4622-8556-d28e57a82e81
  spec:
    ingress:
    - from:
      - podSelector:
          matchLabels:
            app: notebooks
            release: renku
      ports:
      - port: http
        protocol: TCP
    podSelector:
      matchLabels:
        app: renku-secrets-storage
        release: renku
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 1
    labels:
      app: renku
      app.kubernetes.io/managed-by: Helm
      chart: renku-0.58.0
      heritage: Helm
      release: renku
    name: renku-setup-job
    namespace: renku
    resourceVersion: "21321995"
    uid: 4fbe8b0b-cd6c-44f3-89fa-811eb465de4a
  spec:
    ingress:
    - from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: renku
        podSelector:
          matchLabels:
            app: postgres-setup
      ports:
      - port: 5432
        protocol: TCP
    - from:
      - namespaceSelector: {}
        podSelector: {}
      ports:
      - port: 9187
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/name: postgresql
    policyTypes:
    - Ingress
- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
      meta.helm.sh/release-name: renku
      meta.helm.sh/release-namespace: renku
    creationTimestamp: "2024-11-13T14:58:51Z"
    generation: 1
    labels:
      app.kubernetes.io/instance: renku
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: solr
      app.kubernetes.io/version: 9.5.0
      helm.sh/chart: solr-8.9.2
    name: renku-solr
    namespace: renku
    resourceVersion: "21321998"
    uid: eee96a33-26dd-48ee-ad4e-d48aa2bcc1ec
  spec:
    egress:
    - {}
    ingress:
    - from:
      - podSelector:
          matchLabels:
            app.kubernetes.io/instance: renku
            app.kubernetes.io/name: solr
      - podSelector:
          matchLabels:
            renku-solr-client: "true"
      ports:
      - port: 8983
        protocol: TCP
      - port: 8983
        protocol: TCP
    - from:
      - podSelector:
          matchLabels:
            app: search-api
      - podSelector:
          matchLabels:
            app: search-provision
      ports:
      - port: 8983
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: solr
    policyTypes:
    - Ingress
    - Egress
kind: List
metadata:
  resourceVersion: ""

@diarmuidcire the network policy you edited is meant to prevent user sessions from accessing other services that reside within the cluster. The only thing that user sessions are allowed to call inside the cluster is port 53, which is used for DNS resolution.

The change you added makes it so that user sessions can call any other service that is active in the same namespace in the cluster on port 443. This is not safe - you should undo this change. I think very few Renku services listen on port 443 but this is still not safe to use in production.

I will post the network policy here by itself just for reference and clarify:

- apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: renku-amalthea-jupyterserver
    namespace: renku
  spec:
    egress:
    - ports:
      - port: 53
        protocol: UDP
      - port: 53
        protocol: TCP
      - port: 443
        protocol: TCP
    - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16
    ingress:
    - from:
      - ipBlock:
          cidr: 0.0.0.0/0
      ports:
      - port: 4180
        protocol: TCP
    podSelector:
      matchLabels:
        app.kubernetes.io/component: jupyterserver
        app.kubernetes.io/instance: renku
        app.kubernetes.io/name: amalthea
    policyTypes:
    - Ingress
    - Egress

The call that is failing without your edits (that enable egress from sessions to port 443) should not be “caught” by that network policy at all. This is because the call to host='soc-renku.xx.xx.ie', port=443 with url: /api/auth/gitlab/exchange is directed outside of the cluster. I am pretty sure the problem is here:

    - to:
      - ipBlock:
          cidr: 0.0.0.0/0
          except:
          - 10.0.0.0/8
          - 172.16.0.0/12
          - 192.168.0.0/16

The three IP ranges in the except block are the address ranges reserved for private, internal networks (IPv4 Private Address Space and Filtering - American Registry for Internet Numbers). So I think in your case the DNS name of the gateway URL resolves to an internal/private IP address. I am not sure exactly why this is the case. Can you provide some more information about your networking setup?
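To make the reasoning about the two egress rules concrete, here is a minimal Python sketch of how they combine. It is only a simplified model: the function name and the two port sets are made up for illustration, the "original" port set assumes that port 443 was the only entry added by the edit, and real enforcement depends on the CNI plugin and on any address translation done before the policy is applied.

```python
import ipaddress

# The three ranges from the policy's `except` list (private IPv4 space).
EXCEPT_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Assumption: the policy originally opened only port 53, and the edit added 443/TCP.
ORIGINAL_PORTS = {(53, "UDP"), (53, "TCP")}
EDITED_PORTS = ORIGINAL_PORTS | {(443, "TCP")}


def egress_allowed(dest_ip, port, protocol, open_ports):
    """Simplified model of the two egress rules (hypothetical helper)."""
    # Rule 1: the listed ports have no `to` clause, so any destination matches.
    if (port, protocol) in open_ports:
        return True
    # Rule 2: any port, but only to addresses outside the excepted ranges.
    addr = ipaddress.ip_address(dest_ip)
    return not any(addr in net for net in EXCEPT_RANGES)


# A private address on 443 is blocked by the original rules but allowed after the edit.
print(egress_allowed("192.168.1.20", 443, "TCP", ORIGINAL_PORTS))
print(egress_allowed("192.168.1.20", 443, "TCP", EDITED_PORTS))
# A public address (203.0.113.10 is a documentation-only example) passes rule 2 either way.
print(egress_allowed("203.0.113.10", 443, "TCP", ORIGINAL_PORTS))
```

If the gateway hostname resolved to a private address, the second rule would not match it, which is why the request could fail without the extra port 443 entry.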

@tolevski Thanks for getting back to me.
From within Renku, a cat of the resolv.conf file reveals the DNS is 10.96.0.10. No idea what that is, however. It's not our DNS server, which starts with 136.x.x.x.

 base ▶ ~ ▶ work ❯ gpu2-proj ▶ master ▶ $ ▶ cat /etc/resolv.conf 
nameserver 10.96.0.10
search renku.svc.cluster.local svc.cluster.local cluster.local computing.xx.ie
options ndots:5
 base ▶ ~ ▶ work ❯ gpu2-proj ▶ master ▶ $ ▶

All the kubernetes Pod IP addresses are either 192.168.x.x or 136.x.x.x. The 136.x.x.x addresses are in the kube-system namespace and include calico, kube-proxy, kube scheduler, and metallb-system namespace - pods beginning with speaker-xx

soc-renku.xx.xx.ie resolves to a 136.x.x.x address in the kubernetes cluster
ingress command:
renku soc-renku.computing.dcu.ie 136.xx.xx.xx 80, 443
gitlab-webservice-default gitlab-nginx gitlab-gpu.computing.dcu.ie 136.x.x.x 80, 443 8d

Does this help clarify my setup?

Can I set the DNS somewhere within renku?

One thing you can try for debugging is to use a fully-qualified domain name and see if that resolves to the correct IP.

So instead of soc-renku.xx.xx.ie try soc-renku.xx.xx.ie. (notice the . at the end).

Without the . at the end:

  • Your resolv.conf gets used.
  • The ndots option controls the search order: names with fewer than 5 dots are tried against the search list first. It's usually recommended to set that lower, to something like 1 dot, to improve performance if there's lots of external traffic. Since your example URL has 3 dots, this passes to the next step. (more info on this)
  • It will try each of the search domains by appending them to the name, so it'll try soc-renku.xx.xx.ie.renku.svc.cluster.local, then soc-renku.xx.xx.ie.svc.cluster.local etc.
  • If none of them resolve, it will just look up the name as is through the DNS. But sometimes in a cluster one of the earlier ones might reply erroneously that it knows a specific domain.

With the .:

  • As this is an FQDN, it will just ask the DNS for the IP directly, irrespective of dots or anything else. This completely ignores the search list.

If the former fails and the latter works, that’s usually a sign of a not properly configured DNS in the cluster.
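The lookup order described above can be sketched in a few lines of Python. This is a simplified model of a glibc-style stub resolver, not the resolver's actual code; the function name is made up, and the search list and ndots value are the ones from the resolv.conf posted earlier in this thread.

```python
def candidate_names(name, search, ndots):
    """Return the names a stub resolver would try, in order (simplified model)."""
    # A trailing dot marks the name as absolute: the search list is skipped.
    if name.endswith("."):
        return [name]
    with_search = [f"{name}.{domain}." for domain in search]
    as_is = name + "."
    # With at least `ndots` dots the name is tried as-is first, otherwise last.
    if name.count(".") >= ndots:
        return [as_is] + with_search
    return with_search + [as_is]


SEARCH = ["renku.svc.cluster.local", "svc.cluster.local", "cluster.local", "computing.xx.ie"]

for candidate in candidate_names("soc-renku.xx.xx.ie", SEARCH, ndots=5):
    print(candidate)
```

With 3 dots and ndots:5, the four search-list variants are tried before the plain name, so a wrong answer for any of them can hide the real record.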

Thanks @ralf.grubenmann . Where should I specify soc-renku.xx.xx.ie. ?
In the renku-values.yaml file?

Hi @diarmuidcire what Ralf meant above is for you to try and run nslookup soc-renku.xx.xx.ie. or dig soc-renku.xx.xx.ie. inside the cluster and see what happens. And it is a good idea to try the commands both with and without the trailing dot too.

For this, the easiest thing is to create a Pod that contains these tools in the same namespace as the Renku deployment, get a shell in the Pod, run the commands, and tell us what you get.

Here is an example Pod manifest you can use:

apiVersion: v1
kind: Pod
metadata:
  name: test-dns-lookup
spec:
  containers:
  - name: test
    image: azukiapp/dig:latest
    command:
      - sleep
      - "99999999"
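If a container with Python is easier to reach than one with dig (an assumption here; the image above is not expected to include Python), a rough equivalent check can be done with the standard library. The helper names below are made up for illustration. Unlike plain dig, which does not apply the search list unless +search is given, socket.getaddrinfo goes through the system resolver, so the resolv.conf search list and ndots setting are taken into account.

```python
import socket


def resolve(name):
    """Return the sorted IPv4 addresses the system resolver gives for name, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def compare(host):
    """Resolve host both without and with the trailing dot (hypothetical helper)."""
    bare = host.rstrip(".")
    return {bare: resolve(bare), bare + ".": resolve(bare + ".")}
```

Calling compare("<domain-name>") with the real gateway hostname gives both results side by side; differing answers would point at the search list.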

@tolevski
Thanks for the clarification.

Wow, it's super weird what's happening in the container. Is it something Renku is doing? Here is the dig command for soc-renku with and without the . at the end.

root@test-dns-lookup:/# dig soc.computing.xxx.ie

; <<>> DiG 9.18.28-0ubuntu0.22.04.1-Ubuntu <<>> soc.computing.xxx.ie
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 26651
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 62dc08b03f028370 (echoed)
;; QUESTION SECTION:
;soc.computing.xxx.ie.		IN	A

;; AUTHORITY SECTION:
computing.xxx.ie.	30	IN	SOA	ns1.xxx.ie. root.computing.xxx.ie. 2024082663 3600 600 1209600 3600

;; Query time: 3 msec
;; SERVER: 10.96.0.10#53(10.96.0.10) (UDP)
;; WHEN: Tue Dec 03 11:02:50 UTC 2024
;; MSG SIZE  rcvd: 144
root@test-dns-lookup:/# dig soc.computing.xxx.ie.

; <<>> DiG 9.18.28-0ubuntu0.22.04.1-Ubuntu <<>> soc.computing.xxx.ie.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 1145
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 0d3c2af1e1c68324 (echoed)
;; QUESTION SECTION:
;soc.computing.xxx.ie.		IN	A

;; AUTHORITY SECTION:
computing.xxx.ie.	30	IN	SOA	ns1.xxx.ie. root.computing.xxx.ie. 2024082663 3600 600 1209600 3600

;; Query time: 4 msec
;; SERVER: 10.96.0.10#53(10.96.0.10) (UDP)
;; WHEN: Tue Dec 03 11:02:54 UTC 2024
;; MSG SIZE  rcvd: 144

root@test-dns-lookup:/# 

When i check other entries in the soc-xxx.computing.xxx.ie domain, I get a response with IP address.

When I changed the DNS entry to renku.computing.xxx.ie, I get a response in the container, as follows:

root@test-dns-lookup:/# dig renku.computing.xxx.ie

; <<>> DiG 9.18.28-0ubuntu0.22.04.1-Ubuntu <<>> renku.computing.xxx.ie
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7175
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: f57a883f0bf6f06f (echoed)
;; QUESTION SECTION:
;renku.computing.xxx.ie.		IN	A

;; ANSWER SECTION:
renku.computing.xxx.ie.	30	IN	A	136.xxx.xxx.xxx

;; Query time: 2 msec
;; SERVER: 10.96.0.10#53(10.96.0.10) (UDP)
;; WHEN: Tue Dec 03 11:18:43 UTC 2024
;; MSG SIZE  rcvd: 101

root@test-dns-lookup:/# 

I'm going to reinstall the containers now with the new DNS entry renku.computing.xxx.ie, so I will see if that improves the situation or if there is something else amiss, maybe in my renku-values.yaml file.

@diarmuidcire see if things work for you with the new DNS entry.

If they do not then you can do the following:

dig +search +recurse +showsearch <domain-name>

As well as the one with . at the end.

dig +search +recurse +showsearch <domain-name>.

We need more details from the commands, and the added +... options allow us to see better what is going on. And of course, replace <domain-name> with the actual domain name you are using.