Hi, I am trying to find a guideline or rule of thumb for selecting NoSQL databases and their associated use cases. Is there a document or blog post that provides this information? For example: which use cases fit best for key-value, document, or wide-column databases? And within each category, when should one choose, say, MongoDB or Couchbase, Cassandra or HBase, Redis or Memcached? Please share your thoughts.
Recently a teammate asked me about creating a DAG with schedule_interval set to 1 minute. I told him that Airflow was not really designed to run jobs at that kind of frequency.
But it got me wondering: "what if?"
My first reaction came from the fact that we already have some DAGs that have been running @hourly for a long time. Navigating to the "Graph View" for these DAGs is painful because the browser has to render the huge list of runs that ends up in the select box.
Then I also thought about the overall impact such a schedule would have:
the number of active DAG Runs would increase because of the delay in the Task Instances' scheduling lifecycle;
we might need to raise the "maximum active DAG Runs" setting so that other DAGs are not affected;
more logs would be created (more I/O);
workers would be busier (we would need more workers and maybe Pools);
the metadata database would be accessed more frequently.
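To put rough numbers on the points above, here is a back-of-the-envelope sketch comparing an hourly schedule with a 1-minute one. The function names, the figure of 5 tasks per DAG, and the "one log file per task instance" rule are assumptions made for illustration, not measurements from a real Airflow deployment.

```python
from datetime import timedelta


def runs_per_period(schedule_interval: timedelta, period: timedelta) -> int:
    """Number of DAG Runs a fixed interval produces over the given period."""
    return period // schedule_interval


def estimate_load(schedule_interval: timedelta,
                  tasks_per_dag: int,
                  period: timedelta = timedelta(days=1)) -> dict:
    """Rough daily load for one DAG (hypothetical helper, not an Airflow API)."""
    runs = runs_per_period(schedule_interval, period)
    task_instances = runs * tasks_per_dag
    return {
        "dag_runs": runs,
        "task_instances": task_instances,
        # Assumption: at least one log file per task instance, no retries.
        "log_files": task_instances,
    }


hourly = estimate_load(timedelta(hours=1), tasks_per_dag=5)
minutely = estimate_load(timedelta(minutes=1), tasks_per_dag=5)

print(hourly)    # {'dag_runs': 24, 'task_instances': 120, 'log_files': 120}
print(minutely)  # {'dag_runs': 1440, 'task_instances': 7200, 'log_files': 7200}
```

Under these assumptions the 1-minute DAG produces 60 times as many runs, task instances, and log files as the hourly one, which is also how much faster the run list in the UI would grow.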
Apart from those "issues", I see no other reason not to create a DAG with such a config.
All of this is just speculation; we haven't tested anything yet.
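If we do get to test it, a minimal DAG for the experiment might look something like the sketch below. It assumes Airflow 2.x, where the DAG constructor still accepts schedule_interval. The dag_id, start_date, and the no-op task are made up for illustration. The settings are kept in a plain dict, with the Airflow imports inside the function, so the values can be inspected even where Airflow is not installed.

```python
from datetime import datetime, timedelta

# Hypothetical settings for the experiment; all names and values are illustrative.
DAG_KWARGS = {
    "dag_id": "every_minute_experiment",
    "schedule_interval": timedelta(minutes=1),
    "start_date": datetime(2024, 1, 1),
    # Do not backfill one run per missed minute since start_date.
    "catchup": False,
    # Keep at most one active run of this DAG at a time.
    "max_active_runs": 1,
}


def build_dag():
    """Build the DAG; Airflow is imported lazily so the dict above stays usable without it."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    dag = DAG(**DAG_KWARGS)
    # A single no-op task, enough to exercise the scheduler and a worker.
    BashOperator(task_id="noop", bash_command="true", dag=dag)
    return dag


# In a real DAG file the scheduler needs a module-level DAG object, e.g.:
# dag = build_dag()
```

Setting catchup to False and max_active_runs to 1 is meant to keep a slow scheduler from piling up runs, at the cost of skipped or delayed minutes. Whether that trade-off is acceptable is exactly what the test would have to show.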