Notebook exit code 137. Cause and solution

Yes, I’ve been at it again with Microsoft Fabric, and as I try to find the limits of this cool new toy, the limits sometimes get angry with me and throw an error. Most of the time the error is caused by me and I can usually figure out what’s happening, but not always.

Exit code without a real error

In this case, my notebook threw an error at me even though the command seemed to finish without any issue. Sounds vague? It did to me. The notebook cell I tried to run had a lot of stuff happening at the same time.

A lot of work

As you can see in the above screenshot, the status shows green checkmarks, but there’s an error as well. The error message was not really clear to me, though that may well be my lack of deep-level experience. So I logged a call with Microsoft Support to see what they could come up with.

More hardware!

Long story short, the cause can be seen in the first line of the second error: the container ran out of memory. Exit code 137 is the telltale sign, as it means the process was killed with SIGKILL, which is what happens when a container exceeds its memory limit. Well, who would have known that processing this amount of rows would lead to a lack of memory. And yes, I’d created my workspace with a small standard pool. That wasn’t the best idea, with the benefit of hindsight. On the other hand, it does give some good insight into what a standard Small Spark pool can handle.
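The "137" itself tells you what happened. A minimal sketch of how to decode it, assuming a Linux container runtime, where exit codes above 128 encode a fatal signal:

```python
import signal

# Exit codes above 128 mean the process died from a signal:
# exit_code - 128 gives the signal number.
exit_code = 137
sig_number = exit_code - 128

print(sig_number)                    # 9
print(sig_number == signal.SIGKILL)  # True: SIGKILL, typically sent by the OOM killer
```

So a clean-looking cell followed by exit code 137 usually means the kernel's OOM killer terminated the container from outside, which is why no ordinary Python exception shows up.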

So the advice was to create a custom pool, larger than the current one, and retry. I created a Large pool to see what would happen. The start-up time of the pool was around three minutes, and the process itself finished in 16 minutes. Not only quicker, but also without the error message.

Why?

The support engineer was kind enough to provide links to the documentation. There you can find the following:

Node sizes

A Spark pool can be defined with node sizes that range from a small compute node (with 4 vCore and 32 GB of memory) to a large compute node (with 64 vCore and 512 GB of memory per node). Node sizes can be altered after pool creation, although the active session would have to be restarted.

Size      vCores  Memory
Small     4       32 GB
Medium    8       64 GB
Large     16      128 GB
X-Large   32      256 GB
XX-Large  64      512 GB
Source: https://learn.microsoft.com/en-us/fabric/data-engineering/spark-compute

32 GB for a small cluster can be enough, but if it isn’t, you can scale up. But how do these cluster sizes compare to our Fabric capacity units? Because that’s where the key lies.

Every capacity unit provides two vCores for a Spark pool. If you have 2 CUs, you can run one Small Spark cluster of 4 vCores. If you upgrade to 4 CUs, you get 8 vCores and can run either 1 Medium Spark cluster or 2 Small Spark clusters. With each step you move up the capacity unit ladder, your configuration options change. For the F64, these are some options:

Fabric capacity SKU  Capacity units  Spark vCores  Node size  Max number of nodes
F64                  64              128           Small      32
F64                  64              128           Medium     16
F64                  64              128           Large      8
F64                  64              128           X-Large    4
F64                  64              128           XX-Large   2
Source: https://learn.microsoft.com/en-us/fabric/data-engineering/spark-compute
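The arithmetic behind that table is simple enough to sketch. This is just the two-vCores-per-CU rule and the node sizes from the documentation cited above; the function name and dictionary are my own illustration, not a Fabric API:

```python
# vCores per node, from the node-size table in the Fabric docs
NODE_VCORES = {"Small": 4, "Medium": 8, "Large": 16, "X-Large": 32, "XX-Large": 64}

def max_nodes(capacity_units: int, node_size: str) -> int:
    """Maximum node count for a given capacity, assuming 2 Spark vCores per CU."""
    total_vcores = capacity_units * 2
    return total_vcores // NODE_VCORES[node_size]

# F64 = 64 CUs = 128 Spark vCores
for size in NODE_VCORES:
    print(f"F64, {size} nodes: up to {max_nodes(64, size)}")
```

Running this reproduces the table: 32 Small nodes, 16 Medium, 8 Large, 4 X-Large, or 2 XX-Large on an F64.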

Important to remember: pausing your Fabric environment pauses all these clusters too.

In any case, raising my issue with Microsoft support taught me some valuable lessons on both reading error messages and understanding the capacity unit definitions of Microsoft Fabric.

Thanks for reading!
