MatthausK's comments

MatthausK · on Feb 27, 2024

one of the dltHub founders here - we aim to address this in the coming weeks

MatthausK · on Oct 25, 2023

Pulling from and into production databases is one of the early favourites among our dlt users. Some of the reasons are explained in this MongoDB example (https://dlthub.com/docs/blog/MongoDB-dlt-Holistics).

juliusgeo · on Oct 25, 2023

This is a really cool project. Congrats! A somewhat related project that I worked on at MongoDB is PyMongoArrow, which does some of the same transformations to take unstructured MongoDB data and convert it to tabular formats like Arrow data frames. I’m curious what support looks like for BSON types that do not map cleanly to JSON types. One example I can think of off the top of my head is Decimal128.
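To make the mismatch concrete, here is a minimal standard-library sketch. BSON's Decimal128 corresponds roughly to Python's `decimal.Decimal` (PyMongo's `Decimal128.to_decimal()` returns one), and the stdlib `json` module has no JSON type to map it to. The helper `to_json_safe` is hypothetical and is not how dlt or PyMongoArrow handle the type; it only shows why a string fallback preserves digits that a float would round away.

```python
import json
from decimal import Decimal

# A value with more significant digits than an IEEE 754 double can hold.
price = Decimal("12345678901234567890.123456789")

# The stdlib json module has no mapping for Decimal and refuses to encode it.
try:
    json.dumps({"price": price})
except TypeError as exc:
    print("json.dumps failed:", exc)

# Hypothetical fallback: encode Decimal as a string so no digits are lost.
def to_json_safe(value):
    if isinstance(value, Decimal):
        return str(value)
    raise TypeError(f"Unsupported type: {type(value).__name__}")

encoded = json.dumps({"price": price}, default=to_json_safe)
print(encoded)

# Going through float instead silently rounds the value, so this is False.
print(Decimal(float(price)) == price)
```

The trade-off is that a downstream consumer has to know the string column holds a decimal; a loader with a native decimal column type avoids that ambiguity.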

MatthausK · on Oct 25, 2023

We hear a lot about using dlt with AWS Lambda. We currently have one user working on that use case (see our Slack: https://dlthub-community.slack.com/archives/C04DQA7JJN6/p169...)

MatthausK · on Oct 25, 2023

Thanks for your vote of confidence & support, Max!

MatthausK · on Oct 25, 2023

We took at least one immediately practical piece of advice out of this: we should release a conda package and make sure that dlt works in it.

claytonjy · on Oct 25, 2023

I wouldn't make it a high priority. If there's one thing I know about conda users, it's that "no conda package available" has never stopped them. In fact they prefer to pip install inside their conda environment, and the only conda packages they use are the ones that touch Nvidia drivers (e.g. pytorch).

crabbone · on Oct 25, 2023

> "no conda package available" has never stopped them.

Yes and no. They won't stop because they want to get things done, and those things usually don't involve honing the infrastructure. But installing packages with pip usually breaks the conda installation itself, not just a particular virtual environment. (Usually pip nukes the setuptools that come with conda, and then once you want to install or upgrade anything in the base environment, you discover that it's toast, because conda itself depends on setuptools, which is now broken and cannot be reinstalled.)

So, in practice, if you give up and use pip to install stuff, it means that for the next project you will be reinstalling conda (and you will probably lose all your previous virtual environments). Kinda sucks.

MatthausK · on Oct 25, 2023

1) Yes. We support all the databases and buckets as data sources as well. Some examples:
- get data from any SQL database: https://dlthub.com/docs/dlt-ecosystem/verified-sources/sql_d... or https://dlthub.com/docs/getting-started#load-data-from-a-var...
- do it super quickly with pyarrow: https://dlthub.com/docs/examples/connector_x_arrow/
- get data from any storage bucket: https://github.com/dlt-hub/verified-sources/tree/master/sour...

2) Strictly technical answer: on the code level, sources and destinations are different Python objects, so the answer is no :) but you as a user rarely deal with them directly when coding.
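The general pattern behind "get data from any SQL database" can be sketched with the standard-library `sqlite3` module standing in for a production database: read rows in batches and yield each one as a dictionary keyed by column name, a shape that dlt resources can yield. This is not dlt's `sql_database` source; `iter_table_rows`, the `users` table, and the batch size are hypothetical.

```python
import sqlite3
from typing import Any, Dict, Iterator

def iter_table_rows(conn: sqlite3.Connection, table: str, batch_size: int = 2) -> Iterator[Dict[str, Any]]:
    """Yield each row of `table` as a dict keyed by column name, fetching in batches."""
    # Table names cannot be bound as SQL parameters; this sketch trusts the caller.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    # cursor.description holds one 7-tuple per column; the first item is the name.
    columns = [desc[0] for desc in cursor.description]
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield dict(zip(columns, row))

# Usage: an in-memory database stands in for a production one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
)
rows = list(iter_table_rows(conn, "users"))
print(rows)
```

Batching with `fetchmany` keeps memory bounded on large tables while still exposing a simple row-by-row generator to whatever loads the data.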

MatthausK · on Oct 25, 2023

You can use pydantic models to define schemas and validate data (we also load instances of the models natively): https://dlthub.com/docs/general-usage/resource#define-a-sche...

We have a PR (https://github.com/dlt-hub/dlt/pull/594), about to be merged, that makes the above highly configurable, anywhere between schema evolution and hard stopping:
- you will be able to totally freeze the schema and reject bad rows
- or accept data for existing columns but not new columns
- or accept some fields based on rules
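The first two behaviours in that list, plus plain evolution, can be illustrated with a small standard-library sketch. It is not dlt's API: the real option names and semantics are defined by the linked PR and the dlt docs. Here a dataclass stands in for a pydantic model, and `schema_from_model`, `apply_contract`, and the mode names are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional

@dataclass
class User:
    # Stand-in for a pydantic model: field names and types define the table schema.
    id: int
    email: str

def schema_from_model(model: type) -> Dict[str, type]:
    """Derive a column -> Python type mapping from a dataclass."""
    return {f.name: f.type for f in fields(model)}

def apply_contract(row: Dict[str, Any], schema: Dict[str, type], mode: str) -> Optional[Dict[str, Any]]:
    """Return the row to load, or None if the contract rejects it.

    Hypothetical modes:
      "evolve"              - accept the row and add new columns to `schema` (mutates it)
      "freeze"              - reject rows that introduce new columns
      "discard_new_columns" - keep known columns, drop unknown ones
    In every mode, a value whose type does not match a known column rejects the row.
    """
    for name, expected in schema.items():
        if name in row and not isinstance(row[name], expected):
            return None
    new_columns = [name for name in row if name not in schema]
    if mode == "evolve":
        for name in new_columns:
            schema[name] = type(row[name])  # infer the new column's type from the value
        return row
    if mode == "freeze":
        return None if new_columns else row
    if mode == "discard_new_columns":
        return {k: v for k, v in row.items() if k in schema}
    raise ValueError(f"unknown mode: {mode}")

incoming = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com", "plan": "pro"},  # introduces a new column
    {"id": "3", "email": "c@example.com"},               # wrong type for id
]

results = {}
for mode in ("evolve", "freeze", "discard_new_columns"):
    schema = schema_from_model(User)
    loaded = [r for r in (apply_contract(row, schema, mode) for row in incoming) if r is not None]
    results[mode] = (loaded, sorted(schema))
    print(mode, loaded, sorted(schema))
```

Under these assumptions, "evolve" loads two rows and grows the schema with `plan`, "freeze" loads only the first row, and "discard_new_columns" loads the first row plus the second with `plan` stripped.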