Details
Work Item
Status: New
Major
Resolution: Unresolved
None
None
None
All
Small
Description
Pipelines consuming Kafka records only seem to start if the flow of records into Kafka is relatively slow.
I loaded about 18,000 tweets into Kafka in about 6 minutes. The Pipeline I was running (Kafka to S3) didn't pick up a single message in 15 minutes. However, when I cleared the Kafka Topic and started loading data at about 1 record a second, the Pipeline started consuming the messages and sending them to S3. The performance was bad: the Pipeline consumed messages at less than half the rate at which they were added to the Topic.
After a few minutes, I removed the throttle from my Kafka data load script, so tweets were again being loaded at the rate mentioned above (about 3,000 messages a minute). The rate at which the Pipeline consumed and processed these messages went up, and it continued processing until the data load script was switched off. However, the speed was still very poor.
When the Kafka Topic was being fed 1 tweet a second, the Pipeline consumed 78 records in 6 minutes. When the Kafka Topic was fed 3,000 tweets a minute, the Pipeline consumed records at varying rates, in a stop/start manner. The highest rate observed was 50 records per second, but that was rare. On average, it appeared to be reading about 5 records a second.
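To compare the figures above on a common per-second basis, here is a minimal worked calculation that uses only the numbers reported in this issue. The helper name `per_second` is illustrative, not part of any product API.

```python
def per_second(records: float, minutes: float) -> float:
    """Convert a record count observed over some minutes into records per second."""
    return records / (minutes * 60)

# Bulk load: about 18,000 tweets in about 6 minutes (= 3,000 per minute).
bulk_in = per_second(18_000, 6)

# Throttled load: about 1 tweet per second in, 78 records consumed in 6 minutes.
throttled_in = 1.0
throttled_out = per_second(78, 6)

# Unthrottled load: ~3,000 per minute in, ~5 records per second consumed on average.
unthrottled_out = 5.0

print(f"bulk feed rate:           {bulk_in:.1f} records/s")
print(f"throttled consume rate:   {throttled_out:.2f} records/s "
      f"({throttled_out / throttled_in:.0%} of feed)")
print(f"unthrottled consume rate: {unthrottled_out:.1f} records/s "
      f"({unthrottled_out / bulk_in:.0%} of feed)")
```

On these figures, the Pipeline kept up with roughly a fifth of the throttled feed and roughly a tenth of the unthrottled feed.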
So it seems that if you start a Kafka Pipeline reading from a Topic that is reasonably busy, it might not start at all. However, if you start it while the feed is slow and then build up the speed, it does keep processing, but the performance is poor.
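To reproduce the "start slowly, then remove the throttle" load pattern, you can drive a loader from a simple rate schedule. The following is a sketch only. The phase durations, the `send` callback, and the function names are hypothetical. The actual Kafka producer call is left to whichever client the load script already uses.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple

# A phase is (records_per_second, duration_seconds); the rate must be > 0.
Phase = Tuple[float, float]

def ramp_schedule(phases: Iterable[Phase]) -> Iterator[float]:
    """Yield the send offset (seconds from start) of each record, phase by phase."""
    start = 0.0
    for rate, duration in phases:
        count = int(rate * duration)
        interval = 1.0 / rate
        for i in range(count):
            yield start + i * interval
        start += duration

def run_load(records: Iterable[str], phases: List[Phase],
             send: Callable[[str], None],
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> int:
    """Send records on the ramp schedule and return how many were sent."""
    t0 = clock()
    sent = 0
    for offset, record in zip(ramp_schedule(phases), records):
        delay = t0 + offset - clock()
        if delay > 0:
            sleep(delay)  # wait until this record's scheduled send time
        send(record)      # hypothetical hook: wrap the Kafka producer's send here
        sent += 1
    return sent
```

For example, `phases = [(1.0, 180), (50.0, 600)]` would feed 1 record a second for three minutes and then about 3,000 a minute for ten minutes. The report only says "a few minutes", so those durations are assumptions. Injecting `sleep` and `clock` lets you test the schedule without waiting in real time.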
Issue Links
- is child of: TDI-41791 S3 to S3 perfomance is poor (Done)